
Text-Shaping Tools — sort / uniq / cut / wc / tr / tee

Practice sort -n ordering, sort | uniq -c duplicate counts, cut -d',' -f1 column extraction, wc -l line counts, tr 'a-z' 'A-Z' translation, and tee's screen-and-file split, one command at a time with diagrams and a terminal.

Sorting Lines — sort

In this article you'll practice six text-shaping commands one at a time: sort / uniq / cut / wc / tr / tee. First up is sort. It rearranges the input lines in dictionary order (character by character). Use sort -n to order lines as numbers, and sort -r for descending order. In dictionary order 10 comes before 2 (the first characters 1 and 2 are compared), so sorting numbers needs -n.

Form       Meaning
sort       Sort lines in dictionary order
sort -n    Sort lines by numeric value
sort -r    Reverse the order (descending)
Figure: sort is dictionary order, sort -n is numeric
Compared as text, sort outputs the lines of nums.txt (10, 2, 30) as 10 2 30. Compared as numbers, sort -n outputs 2 10 30. Dictionary order compares the first characters, so 10 comes before 2. Add -n for numeric order.
printf 'cherry\napple\nbanana\n' > words.txt   # create a 3-line material file
sort words.txt                                   # apple banana cherry (dictionary order)
sort -r words.txt                                # cherry banana apple (descending)
printf '100\n9\n25\n' > scores.txt              # create a numeric material file
sort scores.txt                                  # 100 25 9 (dictionary order puts 1 first)
sort -n scores.txt                               # 9 25 100 (numeric order)
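Options can also be combined. The sketch below reuses the scores.txt material from the example above (recreated so the block runs on its own) and compares reversed dictionary order with reversed numeric order:

```shell
# Recreate the numeric material file so this block is self-contained
printf '100\n9\n25\n' > scores.txt

sort -r scores.txt    # 9 25 100 (reversed dictionary order: '9' > '2' > '1')
sort -nr scores.txt   # 100 25 9 (numeric order, largest first)
sort -rn scores.txt   # same as -nr: single-letter options combine in any order
```

Note that plain sort -r happens to print 9 25 100 here. That looks like ascending numeric order, but it is only reversed dictionary order, so it is not a substitute for -n.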

① Create a three-line numeric material file with printf '10\n2\n30\n' > nums.txt.

② Sort it with sort nums.txt and check that in dictionary order 10 comes before 2.

③ Add the numeric-order option to sort and check that the order becomes 2 → 10 → 30.

④ Then add the reverse option and check that the lines come out largest first.


Collapsing Duplicates — uniq

uniq collapses runs of identical adjacent lines into one. It does not notice duplicates that are separated by other lines, so you normally run sort first to bring equal lines together and pipe the result into uniq. Add -c (uniq -c) and each line is prefixed with its number of occurrences, giving an instant per-category tally.

Form                    Meaning
uniq                    Collapse adjacent identical lines into one
sort file.txt | uniq    Sort first so non-adjacent duplicates also collapse
uniq -c                 Prefix each line with its occurrence count
Figure: Sort first, then collapse with uniq
sort brings equal lines together (a b b, with the two b lines now adjacent), and uniq then removes the duplicate, leaving a b. uniq only sees adjacent duplicates, so run sort first to bring equal lines together.
printf 'banana\napple\nbanana\napple\n' > items.txt   # create a 4-line material file
sort items.txt                                          # apple apple banana banana
sort items.txt | uniq                                   # apple banana (duplicates removed)
sort items.txt | uniq -c                                # 2 apple / 2 banana (occurrence counts)
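The claim that uniq only sees adjacent duplicates is easy to check directly. A minimal sketch; mixed.txt is a hypothetical material file whose two b lines are separated by another line:

```shell
printf 'b\na\nb\n' > mixed.txt   # the two "b" lines are NOT adjacent

uniq mixed.txt                   # b a b (nothing collapses: no adjacent duplicates)
sort mixed.txt | uniq            # a b   (sort made the "b" lines adjacent first)
```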

① Create a five-line material file with duplicates using printf 'pear\nfig\npear\nfig\nfig\n' > fruit.txt.

② Sort it with sort fruit.txt and check that the identical lines end up next to each other.

③ Pipe the sort output into uniq and check that the duplicates collapse into one.

④ In the same pipe, add the option that gives uniq occurrence counts, and check that each line is prefixed with its count.
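For the per-category tally mentioned above, a common follow-up is to sort the uniq -c output numerically so the most frequent line comes first. A minimal sketch; colors.txt and its contents are hypothetical sample data:

```shell
# Hypothetical sample data: one category per line
printf 'red\nblue\nred\ngreen\nred\nblue\n' > colors.txt

sort colors.txt | uniq -c             # counts per color, in alphabetical order
sort colors.txt | uniq -c | sort -rn  # same counts, most frequent first (3 red, 2 blue, 1 green)
```

The final sort -rn works because uniq -c puts the count at the start of each line, and numeric sorting skips the leading blanks before that number.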


Extracting Columns — cut

cut splits each line on a delimiter and keeps only the fields you ask for. The delimiter goes after -d and the field number after -f, so cut -d',' -f1 pulls out the first column of a CSV. For multiple fields, list them with commas like -f1,3. It's the tool for grabbing just the columns you need from CSVs or colon-separated config files.

Form               Meaning
cut -d',' -f1      Extract field 1 of comma-separated lines
cut -d':' -f1      Extract field 1 of colon-separated lines
cut -d',' -f1,3    Extract fields 1 and 3 together
Figure: cut splits on a delimiter and picks fields
cut -d',' -f1 splits sato,30,tokyo into three fields on the commas (field 1: sato, field 2: 30, field 3: tokyo), and -f1 outputs only sato. The line is split on the -d delimiter, and only the fields numbered by -f are output.
printf 'root:x:0\nuser:x:1000\n' > passwd.txt   # create a colon-separated material file
cut -d':' -f1 passwd.txt                          # field 1 only (the names)
cut -d':' -f1,3 passwd.txt                        # fields 1 and 3
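When several fields are selected, cut joins them back together with the same delimiter, and its output is ordinary text that can feed the other commands in this article. A short sketch reusing the passwd.txt material from above:

```shell
# Recreate the colon-separated material file so this block is self-contained
printf 'root:x:0\nuser:x:1000\n' > passwd.txt

cut -d':' -f1,3 passwd.txt            # root:0 / user:1000 (fields rejoined with ':')
cut -d':' -f3 passwd.txt | sort -rn   # 1000 / 0 (cut output piped into a numeric sort)
```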

① Create a comma-separated material file with printf 'sato,30,tokyo\nito,25,osaka\n' > people.csv.

② Use cut with the delimiter set to a comma to extract only field 1 (the names).

③ Then extract fields 1 and 3 together and check how multiple fields are specified.


Counting Lines, Words, and Bytes — wc

wc counts the lines, words, and bytes of its input. With no options it prints all three numbers; wc -l prints only the line count, wc -w only the word count, and wc -c only the byte count. The one you'll use most is wc -l — fed from a pipe, as in ls | wc -l, it answers questions like "how many files are there?" or "how many matching lines?"

Form     Meaning
wc       Print line, word, and byte counts together
wc -l    Print only the line count
wc -w    Print only the word count
wc -c    Print only the byte count
Figure: wc counts three numbers
For memo.txt (good morning / hello), wc -l reports 2 lines, wc -w reports 3 words, and wc -c reports 19 bytes. The input is the same; the option decides what gets counted.
printf 'one two three\nfour\n' > draft.txt   # create a 2-line, 4-word material file
wc draft.txt                                  # line, word, and byte counts together
wc -l draft.txt                               # 2 (line count only)
ls | wc -l                                    # fed from a pipe: count the files
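Two details are worth seeing once. Given a filename, wc prints the name after the count, while input redirection gives it no name to print. Also, wc -l counts newline characters, so a last line without a trailing newline is not counted. A minimal sketch (exact spacing of the numbers differs between wc implementations):

```shell
# Recreate the 2-line, 4-word material file so this block is self-contained
printf 'one two three\nfour\n' > draft.txt

wc -l draft.txt            # 2 draft.txt (count followed by the filename)
wc -l < draft.txt          # 2 (redirection: no filename, just the number)
wc -w < draft.txt          # 4
printf 'one\ntwo' | wc -l  # 1 (no newline after "two", so only one line is counted)
```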

① Create a two-line material file with printf 'good morning\nhello\n' > memo.txt.

② Run wc memo.txt and check that three numbers appear: lines, words, and bytes.

③ Add the option that prints only the line count and check that it shows 2.

④ Then add the option that prints only the word count and check that it shows 3.


Translating Characters — tr

tr replaces characters from standard input character by character. tr 'a-z' 'A-Z' converts lowercase to uppercase. tr -s squeezes runs of the same character into one (collapsing repeated spaces, for example), and tr -d deletes the characters you specify. tr doesn't take a filename argument — it reads from a pipe or input redirection.

Form              Meaning
tr 'a-z' 'A-Z'    Convert lowercase to uppercase
tr -s ' '         Squeeze runs of a repeated character into one
tr -d 'x'         Delete the specified characters
Figure: tr translates, squeezes, and deletes characters
Translation: tr 'a-z' 'A-Z' turns hello into HELLO. Squeezing runs (-s): tr -s ' ' turns a and b separated by three spaces into a b with one space. Deletion (-d): tr -d '-' turns a-b-c into abc.
echo 'hello linux' | tr 'a-z' 'A-Z'     # HELLO LINUX
echo 'a   b   c' | tr -s ' '            # squeeze repeated spaces -> a b c
echo 'a-b-c' | tr -d '-'                # delete - -> abc
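Because tr reads only standard input, a file is given to it with input redirection (or a pipe) rather than as an argument, and several tr calls can be chained. A minimal sketch; greet.txt is a hypothetical material file:

```shell
printf 'hello linux\n' > greet.txt

tr 'a-z' 'A-Z' < greet.txt                   # HELLO LINUX (input redirection)
cat greet.txt | tr 'a-z' 'A-Z'               # same result through a pipe
echo 'a   b' | tr -s ' ' | tr 'a-z' 'A-Z'    # A B (squeeze, then translate)
# tr 'a-z' 'A-Z' greet.txt                   # would fail: tr takes no filename operand
```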

① Pipe the output of echo 'desktech learn' into tr and convert the lowercase letters to uppercase.

② Next, pipe the output of echo 'x   y   z' (with runs of spaces between the letters) into tr and use the option that squeezes repeated spaces into one.

③ Pipe the output of echo 'a-b-c' into tr and use the delete option to turn it into abc.


Splitting Output to Screen and File — tee

tee copies what it receives on standard input to standard output and, at the same time, to a file. At the end of a pipe, that means the data appears on screen and is also saved to the file. Use it when you want a log of intermediate results while still passing the data along to the next command: chained as command | tee out.txt | next command, it records into out.txt while the data keeps flowing down the pipeline. To append to an existing file instead of overwriting it, use tee -a.

Form              Meaning
tee out.txt       Write to both the screen and the file (overwrite)
tee -a out.txt    Write to the screen and append to the file
Figure: tee sends output to both screen and file
In echo hi | tee out.txt | next command, hi is written to out.txt and also passed on through tee's standard output, so the pipeline continues. Running cat out.txt afterwards shows that hi is still in the file.
ls | tee list.txt              # show the listing and save it to list.txt
cat list.txt                   # it's in the file too
echo 'extra' | tee -a list.txt # -a appends instead of overwriting
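tee is most useful in the middle of a pipeline, where it saves an intermediate result that the final output no longer shows. A minimal sketch; sorted.txt is a hypothetical file name:

```shell
# Save the sorted (pre-uniq) lines while the pipeline continues to uniq
printf 'b\na\nb\n' | sort | tee sorted.txt | uniq   # screen shows: a b
cat sorted.txt                                     # a b b (the intermediate result)

# Append a line; redirecting tee's own output keeps it off the screen
printf 'c\n' | tee -a sorted.txt > /dev/null
cat sorted.txt                                     # a b b c
```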

① Pipe the output of echo 'one two' into tee to write it to note.txt, and check that the same content also appears on screen.

② Run cat note.txt to check that the same content is in the file as well.

③ Next, append the output of echo 'three' to note.txt using tee's append option.

④ Run cat note.txt again and check that the file has grown to two lines.


Knowledge Check

Answer each question one by one.

Q1Which lines does uniq treat as duplicates?

Q2What does cut -d',' -f1 users.csv output?

Q3What happens when you run echo hi | tee out.txt?