
Text-Shaping Tools — sort / uniq / cut / wc / tr / tee

Q: Which lines does uniq treat as duplicates?

Only identical lines that are next to each other

Q: What does cut -d',' -f1 users.csv output?

Only the first comma-separated field

Q: What happens when you run echo hi | tee out.txt?

It shows on screen and writes to out.txt at the same time

Practice sort -n ordering, sort | uniq -c duplicate counts, cut -d',' -f1 column extraction, wc -l line counts, tr 'a-z' 'A-Z' translation, and tee's screen-and-file split, one command at a time with diagrams and a terminal.

Sorting Lines — sort

In this article you'll practice six text-shaping commands one at a time: sort / uniq / cut / wc / tr / tee. First up is sort, which rearranges its input lines into dictionary order, comparing them character by character from the left. Use sort -n to order lines by numeric value and sort -r to reverse the order (descending). In dictionary order 10 comes before 2, because the first characters, 1 and 2, already decide the comparison; that is why sorting numbers needs -n.

Form	Meaning
`sort`	Sort lines in dictionary order
`sort -n`	Sort lines by numeric value
`sort -r`	Reverse the order (descending)

sort is dictionary order, sort -n is numeric

Dictionary order compares the first characters, so 10 comes before 2. Add -n for numeric order.

printf 'cherry\napple\nbanana\n' > words.txt   # create a 3-line sample file
sort words.txt                                   # apple banana cherry (dictionary order)
sort -r words.txt                                # cherry banana apple (descending)
printf '100\n9\n25\n' > scores.txt              # create a numeric sample file
sort scores.txt                                  # 100 25 9 (dictionary order puts 1 first)
sort -n scores.txt                               # 9 25 100 (numeric order)
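
One detail worth checking: -r on its own only reverses dictionary order, so it does not produce a numeric descending sort. The sketch below, assuming a POSIX shell with GNU or BSD sort, recreates scores.txt so it runs on its own and compares -r with -n -r.

```shell
# Recreate the sample file so this snippet is self-contained
printf '100\n9\n25\n' > scores.txt
sort -r scores.txt       # 9 25 100 (reversed dictionary order, still not numeric)
sort -n -r scores.txt    # 100 25 9 (numeric order, largest first)
sort -nr scores.txt      # 100 25 9 (the same two options written together)
```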

① Create a three-line numeric sample file with printf '10\n2\n30\n' > nums.txt.

② Sort it with sort nums.txt and check that in dictionary order 10 comes before 2.

③ Add the numeric-order option to sort and check that the order becomes 2 → 10 → 30.

④ Then add the reverse option as well and check that the lines come out largest first.

Collapsing Duplicates — uniq

uniq collapses identical lines that sit next to each other into one. It does not notice duplicates that are separated by other lines, so you normally run sort first to bring equal lines together and pipe the result into uniq. Add -c (uniq -c) and each output line is prefixed with its number of occurrences, which gives you an instant per-category tally.

Form	Meaning
`uniq`	Collapse adjacent identical lines into one
`sort file.txt | uniq`	Sort first so duplicates further apart collapse too
`uniq -c`	Prefix each line with its occurrence count

Sort first, then collapse with uniq

uniq only sees adjacent duplicates, so run sort first to bring equal lines together.

printf 'banana\napple\nbanana\napple\n' > items.txt   # create a 4-line sample file
sort items.txt                                          # apple apple banana banana
sort items.txt | uniq                                   # apple banana (duplicates removed)
sort items.txt | uniq -c                                # 2 apple / 2 banana (occurrence counts)
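
To see why the sort step matters, the sketch below runs uniq on its own against a hypothetical pets.txt in which only some duplicates are adjacent, then appends sort -rn after uniq -c to list the most frequent line first. It assumes a POSIX shell; the padding in front of each count may differ between GNU and BSD uniq.

```shell
printf 'dog\ncat\ndog\ndog\n' > pets.txt   # hypothetical sample file
uniq pets.txt                             # dog cat dog (only the two adjacent dogs at the end collapse)
sort pets.txt | uniq -c                   # 1 cat / 3 dog
sort pets.txt | uniq -c | sort -rn        # 3 dog / 1 cat (most frequent first)
```

sort -rn works on uniq -c output because numeric sorting skips the leading blanks in front of each count.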

① Create a five-line sample file with duplicates using printf 'pear\nfig\npear\nfig\nfig\n' > fruit.txt.

② Sort it with sort fruit.txt and check that the identical lines end up next to each other.

③ Pipe the sort output into uniq and check that the duplicates collapse into one.

④ In the same pipe, add the option that gives uniq occurrence counts, and check that each line is prefixed with its count.

Extracting Columns — cut

cut splits each line on a delimiter and keeps only the fields you ask for. The delimiter goes after -d and the field number after -f, so cut -d',' -f1 pulls out the first column of a CSV. For multiple fields, list them with commas like -f1,3. It's the tool for grabbing just the columns you need from CSVs or colon-separated config files.

Form	Meaning
`cut -d',' -f1`	Extract field 1 of comma-separated lines
`cut -d':' -f1`	Extract field 1 of `:`-separated lines
`cut -d',' -f1,3`	Extract fields 1 and 3 together

cut splits on a delimiter and picks fields

The line is split on the -d delimiter, and only the fields numbered by -f are output.

printf 'root:x:0\nuser:x:1000\n' > passwd.txt   # create a colon-separated sample file
cut -d':' -f1 passwd.txt                          # field 1 only (the names)
cut -d':' -f1,3 passwd.txt                        # fields 1 and 3
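
Two follow-ups, sketched with a hypothetical orders.csv and assuming a POSIX shell: when you pick several fields, cut joins them with the same delimiter in the output, and a single extracted column can be fed straight into the sort | uniq -c tally from the uniq section.

```shell
printf 'sato,apple,2\nito,pear,1\nkato,apple,5\n' > orders.csv   # hypothetical sample CSV
cut -d',' -f1,3 orders.csv                    # sato,2 / ito,1 / kato,5 (the comma stays between fields)
cut -d',' -f2 orders.csv | sort | uniq -c     # 2 apple / 1 pear (orders per product)
```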

① Create a comma-separated sample file with printf 'sato,30,tokyo\nito,25,osaka\n' > people.csv.

② Use cut with the delimiter set to a comma to extract only field 1 (the names).

③ Then extract fields 1 and 3 together and check how multiple fields are specified.

Counting Lines, Words, and Bytes — wc

wc counts the lines, words, and bytes of its input. With no options it prints all three numbers; wc -l prints only the line count, wc -w only the word count, and wc -c only the byte count. The one you'll use most is wc -l — fed from a pipe, as in ls | wc -l, it answers questions like "how many files are there?" or "how many matching lines?"

Form	Meaning
`wc`	Print line, word, and byte counts together
`wc -l`	Print only the line count
`wc -w`	Print only the word count
`wc -c`	Print only the byte count

wc reports three counts

Same input — the option decides what gets counted.

printf 'one two three\nfour\n' > draft.txt   # create a 2-line, 4-word sample file
wc draft.txt                                  # line, word, and byte counts together
wc -l draft.txt                               # 2 (line count only)
ls | wc -l                                    # fed from a pipe: count the files
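
A few details that change what wc prints, sketched against the same draft.txt and assuming a POSIX shell (BSD wc pads the numbers with extra spaces): input redirection drops the filename from the output, -c counts every byte including newlines, and -l counts newline characters, so a final line without one is not counted. Paired with grep, wc -l counts matching lines.

```shell
printf 'one two three\nfour\n' > draft.txt   # same 2-line sample file as above
wc -l < draft.txt                # 2 (input redirection, so no filename is printed)
wc -c < draft.txt                # 19 (14 bytes for "one two three\n" plus 5 for "four\n")
grep 'three' draft.txt | wc -l   # 1 (only one line contains "three")
printf 'a\nb' | wc -l            # 1 (the last line has no trailing newline)
```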

① Create a two-line sample file with printf 'good morning\nhello\n' > memo.txt.

② Run wc memo.txt and check that three numbers appear: lines, words, and bytes.

③ Add the option that prints only the line count and check that it shows 2.

④ Then add the option that prints only the word count and check that it shows 3.

Translating Characters — tr

tr reads standard input and translates it character by character: tr 'a-z' 'A-Z' maps each lowercase letter to the matching uppercase letter. tr -s squeezes runs of the same character into one (collapsing repeated spaces, for example), and tr -d deletes the characters you specify. tr doesn't take a filename argument; it reads from a pipe or an input redirection (<).

Form	Meaning
`tr 'a-z' 'A-Z'`	Convert lowercase to uppercase
`tr -s ' '`	Squeeze runs of spaces into a single space
`tr -d 'x'`	Delete the specified characters

tr translates, squeezes, and deletes characters

From top to bottom: translation (lowercase to uppercase), squeezing runs (-s), and deletion (-d).

echo 'hello linux' | tr 'a-z' 'A-Z'     # HELLO LINUX
echo 'a   b   c' | tr -s ' '            # squeeze repeated spaces -> a b c
echo 'a-b-c' | tr -d '-'                # delete - -> abc
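
Because tr reads only standard input, a file is handed to it with input redirection rather than as an argument. The sketch below assumes a POSIX shell and a hypothetical greet.txt; it also chains two tr calls, squeezing the spaces first and then translating each remaining space into a comma.

```shell
printf 'hello linux\n' > greet.txt            # hypothetical sample file
tr 'a-z' 'A-Z' < greet.txt                    # HELLO LINUX (file supplied via <)
echo 'a   b   c' | tr -s ' ' | tr ' ' ','     # a,b,c (squeeze, then translate spaces to commas)
```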

① Pipe the output of echo 'desktech learn' into tr and convert the lowercase letters to uppercase.

② Next, pipe the output of echo 'x   y   z' into tr and use the option that squeezes the repeated spaces into one.

③ Pipe the output of echo 'a-b-c' into tr and use the delete option to turn it into abc.

Splitting Output to Screen and File — tee

tee takes what it receives from a pipe, shows it on screen, and writes it to a file at the same time. Use it when you want a log of intermediate results while still passing the data along to the next command. Chained as command | tee out.txt | next command, it records into out.txt while the pipeline keeps going. To append to an existing file instead of overwriting, use tee -a.

Form	Meaning
`tee out.txt`	Write to both screen and file (overwrite)
`tee -a out.txt`	Show on screen and append to the file

tee sends output to both screen and file

tee shows what it receives on screen and writes the same data to the file, which you can check later with cat.

ls | tee list.txt              # show the listing and save it to list.txt
cat list.txt                   # it's in the file too
echo 'extra' | tee -a list.txt # -a appends instead of overwriting
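
When tee sits in the middle of a pipeline (command | tee out.txt | next command), its copy goes down the pipe instead of to the screen, so only the last command's output is displayed while the intermediate result lands in the file. A sketch, assuming a POSIX shell and a hypothetical sorted.txt:

```shell
printf 'pear\nfig\npear\n' | sort | tee sorted.txt | uniq -c   # screen: 1 fig / 2 pear
cat sorted.txt                                                # fig pear pear (the sorted lines tee saved)
```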

① Pipe the output of echo 'one two' into tee to write it to note.txt, and check that the same content also appears on screen.

② Run cat note.txt to check that the same content is in the file as well.

③ Next, append the output of echo 'three' to note.txt using tee's append option.

④ Run cat note.txt again and check that the file has grown to two lines.
