Q1 Which lines does uniq treat as duplicates?
Text-Shaping Tools — sort / uniq / cut / wc / tr / tee
Practice sort -n ordering, sort | uniq -c duplicate counts, cut -d',' -f1 column extraction, wc -l line counts, tr 'a-z' 'A-Z' translation, and tee's screen-and-file split, one command at a time with diagrams and a terminal.
Sorting Lines — sort
In this article you'll practice six text-shaping commands one at a time: sort / uniq / cut / wc / tr / tee. First up is sort. It rearranges the input lines in dictionary order (character by character). Use sort -n to order lines as numbers, and sort -r for descending order. In dictionary order 10 comes before 2 (the first characters 1 and 2 are compared), so sorting numbers needs -n.
| Form | Meaning |
|---|---|
| sort | Sort lines in dictionary order |
| sort -n | Sort lines by numeric value |
| sort -r | Reverse the order (descending) |
In dictionary order, 10 comes before 2. Add -n for numeric order.
printf 'cherry\napple\nbanana\n' > words.txt # create a 3-line sample file
sort words.txt # apple banana cherry (dictionary order)
sort -r words.txt # cherry banana apple (descending)
printf '100\n9\n25\n' > scores.txt # create a numeric sample file
sort scores.txt # 100 25 9 (dictionary order puts 1 first)
sort -n scores.txt # 9 25 100 (numeric order)
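The options also combine: -n with -r gives numeric order from highest to lowest. The following is a minimal sketch, assuming a scratch directory created with mktemp and a hypothetical scores.txt that adds one extra value so the two orderings differ visibly.

```shell
# Work in a throwaway directory so no existing files are touched (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '100\n9\n25\n3\n' > scores.txt   # hypothetical sample data

# -n compares lines as numbers, -r reverses the result: highest first.
desc=$(sort -n -r scores.txt)
printf '%s\n' "$desc"                   # 100, 25, 9, 3 (one per line)

# Without -n the comparison is character by character, so 100 sorts before 25.
lexical=$(sort scores.txt)
printf '%s\n' "$lexical"                # 100, 25, 3, 9 (one per line)
```

Comparing the two outputs side by side is a quick way to check whether a file needs -n.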
Collapsing Duplicates — uniq
uniq collapses identical lines that sit next to each other into one. It never looks at duplicates further apart, so you normally run sort first to bring equal lines together and pipe the result in. Add uniq -c and each line gets prefixed with its number of occurrences — an instant per-category tally.
| Form | Meaning |
|---|---|
| uniq | Collapse adjacent identical lines into one |
| sort file.txt \| uniq | Sort first so duplicates further apart collapse too |
| uniq -c | Prefix each line with its occurrence count |
uniq only sees adjacent duplicates, so run sort first to bring equal lines together.
printf 'banana\napple\nbanana\napple\n' > items.txt # create a 4-line sample file
sort items.txt # apple apple banana banana
sort items.txt | uniq # apple banana (duplicates removed)
sort items.txt | uniq -c # 2 apple / 2 banana (occurrence counts)
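To see why the sort step matters, it can help to run uniq on the unsorted file as well. This is a small sketch under the assumption of a scratch directory from mktemp; the file name items.txt and its contents are only sample data.

```shell
# Work in a throwaway directory (assumption, keeps the example self-contained).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'banana\napple\nbanana\napple\n' > items.txt

# No two equal lines are adjacent here, so uniq alone removes nothing.
unsorted=$(uniq items.txt)
printf '%s\n' "$unsorted"    # banana, apple, banana, apple (one per line)

# After sorting, equal lines are neighbours and collapse into one.
sorted=$(sort items.txt | uniq)
printf '%s\n' "$sorted"      # apple, banana (one per line)

# -c adds the occurrence count in front of each collapsed line.
counts=$(sort items.txt | uniq -c)
printf '%s\n' "$counts"
```

The width of the padding in front of each count in the uniq -c output varies between implementations, so scripts should not rely on an exact number of leading spaces.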
Extracting Columns — cut
cut splits each line on a delimiter and keeps only the fields you ask for. The delimiter goes after -d and the field number after -f, so cut -d',' -f1 pulls out the first column of a CSV. For multiple fields, list them with commas like -f1,3. It's the tool for grabbing just the columns you need from CSVs or colon-separated config files.
| Form | Meaning |
|---|---|
| cut -d',' -f1 | Extract field 1 of comma-separated lines |
| cut -d':' -f1 | Extract field 1 of :-separated lines |
| cut -d',' -f1,3 | Extract fields 1 and 3 together |
Each line is split on the -d delimiter, and only the fields numbered by -f are output.
printf 'root:x:0\nuser:x:1000\n' > passwd.txt # create a colon-separated sample file
cut -d':' -f1 passwd.txt # field 1 only (the names)
cut -d':' -f1,3 passwd.txt # fields 1 and 3
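The same pattern applies to a CSV. The sketch below assumes a hypothetical users.csv with name, role, and city columns (the file and its contents are invented for illustration) and shows that cut keeps the delimiter between the fields it outputs.

```shell
# Work in a throwaway directory (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical CSV: name,role,city
printf 'alice,admin,tokyo\nbob,staff,osaka\ncarol,staff,tokyo\n' > users.csv

# Field 1 only: the names.
names=$(cut -d',' -f1 users.csv)
printf '%s\n' "$names"        # alice, bob, carol (one per line)

# Fields 1 and 3: the comma between them is kept in the output.
name_city=$(cut -d',' -f1,3 users.csv)
printf '%s\n' "$name_city"    # alice,tokyo / bob,osaka / carol,tokyo

# cut feeds other tools well: list each distinct role once.
roles=$(cut -d',' -f2 users.csv | sort | uniq)
printf '%s\n' "$roles"        # admin, staff (one per line)
```

Note that cut splits on every delimiter character, so this approach only suits simple CSVs without quoted fields that contain commas.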
Counting Lines, Words, and Bytes — wc
wc counts the lines, words, and bytes of its input. With no options it prints all three numbers; wc -l prints only the line count, wc -w only the word count, and wc -c only the byte count. The one you'll use most is wc -l — fed from a pipe, as in ls | wc -l, it answers questions like "how many files are there?" or "how many matching lines?"
| Form | Meaning |
|---|---|
| wc | Print line, word, and byte counts together |
| wc -l | Print only the line count |
| wc -w | Print only the word count |
| wc -c | Print only the byte count |
printf 'one two three\nfour\n' > draft.txt # create a 2-line, 4-word sample file
wc draft.txt # line, word, and byte counts together
wc -l draft.txt # 2 (line count only)
ls | wc -l # fed from a pipe: count the files
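When wc is given a filename it prints the name after the count, whereas input arriving on standard input yields the number alone, which is easier to store in a variable. A minimal sketch, assuming a scratch directory from mktemp and the same two-line sample text:

```shell
# Work in a throwaway directory (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one two three\nfour\n' > draft.txt

# With a filename argument the name is part of the output.
wc -l draft.txt                 # prints the count followed by "draft.txt"

# With input redirection only the number is printed.
lines=$(wc -l < draft.txt)      # 2 lines
words=$(wc -w < draft.txt)      # 4 words
bytes=$(wc -c < draft.txt)      # 19 bytes (14 + 5, newlines included)

echo "lines=$lines words=$words bytes=$bytes"
```

Some implementations pad the number with leading spaces, so compare the values numerically (for example with -eq) rather than as strings.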
Translating Characters — tr
tr replaces characters from standard input character by character. tr 'a-z' 'A-Z' converts lowercase to uppercase. tr -s squeezes runs of the same character into one (collapsing repeated spaces, for example), and tr -d deletes the characters you specify. tr doesn't take a filename argument — it reads from a pipe or input redirection.
| Form | Meaning |
|---|---|
| tr 'a-z' 'A-Z' | Convert lowercase to uppercase |
| tr -s ' ' | Squeeze repeated characters into one |
| tr -d 'x' | Delete the specified characters |
tr handles translation, squeezing (-s), and deletion (-d).
echo 'hello linux' | tr 'a-z' 'A-Z' # HELLO LINUX
echo 'a   b   c' | tr -s ' ' # squeeze repeated spaces -> a b c
echo 'a-b-c' | tr -d '-' # delete - -> abc
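Because tr takes no filename, a file has to reach it through input redirection or a pipe. The sketch below assumes a scratch directory from mktemp and a hypothetical greeting.txt; it also chains two tr calls, since each invocation performs one kind of transformation here.

```shell
# Work in a throwaway directory (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'hello   linux\n' > greeting.txt   # three spaces between the words

# Input redirection supplies the file on standard input.
upper=$(tr 'a-z' 'A-Z' < greeting.txt)
echo "$upper"       # HELLO   LINUX

# Squeeze the run of spaces first, then uppercase the result.
shout=$(tr -s ' ' < greeting.txt | tr 'a-z' 'A-Z')
echo "$shout"       # HELLO LINUX

# Delete every l; the spaces are left untouched.
no_l=$(tr -d 'l' < greeting.txt)
echo "$no_l"        # heo   inux
```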
Splitting Output to Screen and File — tee
tee copies what it receives from a pipe to its standard output and writes it to a file at the same time. At the end of a pipeline, standard output is the screen, so you see the data while it is being saved. Use it when you want a log of intermediate results while still passing the data along to the next command: chained as command | tee out.txt | next command, it records into out.txt while the pipeline keeps going. To append to an existing file instead of overwriting, use tee -a.
| Form | Meaning |
|---|---|
| tee out.txt | Write to both screen and file (overwrite) |
| tee -a out.txt | Show on screen and append to the file |
tee shows what it receives on screen, writes it to the file, and you can check later with cat.
ls | tee list.txt # show the listing and save it to list.txt
cat list.txt # it's in the file too
echo 'extra' | tee -a list.txt # -a appends instead of overwriting
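With tee in the middle of a pipeline, the file captures an intermediate stage while the next command still receives the same data. A minimal sketch, assuming a scratch directory from mktemp; items.txt and sorted.txt are hypothetical file names.

```shell
# Work in a throwaway directory (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'banana\napple\nbanana\n' > items.txt

# sorted.txt records the sorted lines; uniq -c still receives them.
counts=$(sort items.txt | tee sorted.txt | uniq -c)
printf '%s\n' "$counts"      # 1 apple, 2 banana (with leading padding)

cat sorted.txt               # apple, banana, banana (one per line)

# -a appends a line instead of replacing the file's contents.
echo 'cherry' | tee -a sorted.txt
cat sorted.txt               # apple, banana, banana, cherry
```

Keeping the intermediate file makes it possible to inspect what the later stages of a pipeline actually saw.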
Knowledge Check
Answer each question one by one.
Q2 What does cut -d',' -f1 users.csv output?
Q3 What happens when you run echo hi | tee out.txt?