Text Manipulation

CS155 Commands2

Text manipulation

Vocabulary	`script`, `wc`, `grep`, `sort`, `cut`, `uniq`
Punctuation	`;`
Grammar	command [option]... [argument]... [redirection]

Learning More with man and info

Many Unix commands have options that modify their behavior.
To learn more about a specific command, try:
man command – show the manual page for command
info command – show the info page for command
man and info can answer many of your questions. Consider them travel guides.

script: Recording a session

The script command records a terminal session to a file named typescript.
The output file can be changed by running script filename.
The scripting session is ended by typing exit.

wc: Counting Lines and Words

wc is used to get information about the contents of a file.

% cat my_file
user1, old_user
user3
user4
user5 user6
% wc my_file
4 6 40 my_file

That’s 4 lines, 6 words, and 40 bytes. A byte is the same as a character, for now.

Note that a byte here means a letter, a digit, a space, a dot, or a newline character: everything that takes up room.

wc : Some Options

Individual counts can be selected with options:

-c : print the byte count
-l : print the line count
-m : print the character count
-w : print the word count

If multiple files are listed on the command line, then multiple counts are computed, and a total is also given. wc is useful when combined with other commands using | (the pipe symbol).
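
As a small sketch of the options and of piping into wc, the block below recreates the my_file example in a scratch directory (the directory and the variable names are assumptions made for illustration). Reading the file from standard input keeps the file name out of the output, so only the number is captured.

```sh
# Recreate the my_file example in a scratch directory (location is arbitrary).
dir=$(mktemp -d)
printf 'user1, old_user\nuser3\nuser4\nuser5 user6\n' > "$dir/my_file"

# With input on stdin, wc prints only the count, not a file name.
lines=$(wc -l < "$dir/my_file")
words=$(wc -w < "$dir/my_file")
bytes=$(wc -c < "$dir/my_file")
echo "lines=$lines words=$words bytes=$bytes"

# Pipe another command into wc: count the entries that ls lists.
entries=$(ls "$dir" | wc -l)
echo "entries=$entries"

# Clean up the scratch directory.
rm -r "$dir"
```

Some versions of wc pad the number with leading spaces, so the echoed text may look slightly different from system to system.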

grep : looking for text in a file

grep is an extremely useful command (it’s also extremely complicated). The simplest use is to search for an exact piece of text in a list of files.

% grep "user1" *
my_file:user1, old_user

The output is filename:line of text. That way, you know which output came from which file.

The -n option causes the line number to be printed with the file name and line.
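
A minimal sketch of -n, assuming a scratch directory that holds my_file plus a second, made-up file called other_file. Each match is printed as filename:line number:line of text.

```sh
# Two small files in a scratch directory; other_file is hypothetical.
dir=$(mktemp -d)
printf 'user1, old_user\nuser3\nuser4\nuser5 user6\n' > "$dir/my_file"
printf 'root\nuser5\n' > "$dir/other_file"

# Search every file in the directory; -n adds the line number
# between the file name and the matching line.
matches=$(cd "$dir" && grep -n "user5" *)
echo "$matches"

# Clean up the scratch directory.
rm -r "$dir"
```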

grep: Simple Patterns

Some symbols can be used to search for inexact patterns. Alas, * and ? do not mean the same thing here as they do in shell wildcards.

Pattern	Meaning in `grep`	Pattern	Meaning
`.`	Any single character	`^`	start of line
`[aeiou]`	a single vowel	`$`	end of line
`[aeiou]*`	a bunch of vowels	`\?`	zero or one of what came before
`[a-z]`	a lowercase letter	`*`	zero or more of what came before

grep examples

Command	Matches
`grep 'e[as]t' my_file`	“eat” or “west” but not “east”
`grep 'b[oi]y' *html`	“boy” but not “BOY”
`grep 'windows* ' *html`	“window ” and “windows ”
`grep 'window.' *html`	“window-” and “window,” and “windows”
`grep '^Jack' foo`	“Jack” only at the start of the line
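
The patterns above can be tried on a short word list. This is only a sketch: the word list and the variable names are made up for illustration, and the input is piped in rather than read from a file.

```sh
# One word per line; the list itself is invented for this example.
words=$(printf 'eat\nwest\neast\nboy\nBOY\nbiy\n')

# [as] stands for exactly one character, so "east" does not match.
eat_west=$(printf '%s\n' "$words" | grep 'e[as]t')

# Matching is case-sensitive by default, so "BOY" does not match.
boys=$(printf '%s\n' "$words" | grep 'b[oi]y')

# ^ and $ tie the pattern to the whole line: starts with e, ends with t.
e_to_t=$(printf '%s\n' "$words" | grep '^e.*t$')

echo "$eat_west"
echo "$boys"
echo "$e_to_t"
```

Quoting the pattern in single quotes keeps the shell from treating [, * and $ as its own special characters before grep sees them.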

sort : reorder lines of text

sort reorders lines of text lexicographically. This means lines are compared character by character, from the first character to the last, in alphabetical order.

Usage: sort [OPTION]... [FILE]...

-n makes sort do a numeric sort
-r reverses the result of the comparisons
-u removes duplicates while sorting
-t'delimiter' -kpos1,pos2 can be used to sort by column

Uniqueness while sorting by column is only checked on the field(s) specified by -k.
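
A short sketch of sorting by column, assuming a made-up file named scores with name:score lines. The -n and -r options are given as separate global options here so that they apply to the key selected by -k.

```sh
# A hypothetical colon-separated file: name:score
dir=$(mktemp -d)
printf 'carol:7\nalice:10\nbob:7\ndave:100\n' > "$dir/scores"

# Sort on the second field only, comparing numerically.
by_score=$(sort -t':' -k2,2 -n "$dir/scores")
echo "$by_score"

# Same key, reversed: the line with the highest score comes first.
top_line=$(sort -t':' -k2,2 -n -r "$dir/scores" | head -n 1)
echo "$top_line"

# -u keeps one line per distinct key value (one line per score),
# then cut shows just the scores that survived.
unique_scores=$(sort -t':' -k2,2 -n -u "$dir/scores" | cut -d':' -f2)
echo "$unique_scores"

# Clean up the scratch directory.
rm -r "$dir"
```

Which of the two lines with score 7 survives -u is up to the sort implementation, which is why the sketch prints only the score column in that step.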

sort: An Example

% cat my_file2
1234
10
10000
5679
% sort my_file2
10
10000
1234
5679
% sort -n my_file2
10
1234
5679
10000
% sort -n <my_file2
(same as above)
% cat my_file2 | sort -n
(same as above)

By default, sort sorts alphabetically.
The -n option says to sort numerically.
sort does not change its input file. Instead, it writes a sorted version to the output stream. This could be your screen, or you could use > to save it somewhere.
Do not redirect the output back into the input file. The shell empties foo before sort gets a chance to read it, so this is bad:
sort foo >foo
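
Two safe ways to sort a file in place are sketched below. The -o option is not covered in these notes; it is a standard sort option that names the output file, and the output file is allowed to be the same as the input. The file names foo and bar and their contents are made up for illustration.

```sh
dir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$dir/foo"
printf 'pear\napple\nfig\n' > "$dir/bar"

# Option 1: let sort write the file itself with -o.
# sort reads all of its input before it writes the output file.
sort -o "$dir/foo" "$dir/foo"

# Option 2: write to a temporary file, then rename it over the original.
sort "$dir/bar" > "$dir/bar.tmp" && mv "$dir/bar.tmp" "$dir/bar"

foo_sorted=$(cat "$dir/foo")
bar_sorted=$(cat "$dir/bar")
echo "$foo_sorted"

# Clean up the scratch directory.
rm -r "$dir"
```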

cut: Selecting columns from a file

cut allows us to select columns from a file. Options:

-d "delimiter": Specify your own delimiter, which is between the columns.
-f field-list: A comma-separated list of columns or ranges.

Examples:

     cut -d";" -f3 filename
     cut -f2,3,5,7 -d"/" filename
     grep "x" filename | cut -d"," -f1,3,5-7,9-

Field-based cut examples

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega

% cut -d"," -f 3 data
Gamma
Iota
Rho
Omega
% cut -d"," -f 5 data
Epsilon
Lambda
Tau

% cut -d"," -f 2,4 data
Beta,Delta
Theta,Kappa
Pi,Sigma
Psi

More field-based cut examples

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega

% cut -d"," -f 2-4 data
Beta,Gamma,Delta
Theta,Iota,Kappa
Pi,Rho,Sigma
Psi,Omega
% cut -d"," -f 1-3,5-7 data
Alpha,Beta,Gamma,Epsilon,Zeta
Eta,Theta,Iota,Lambda,Mu,Nu
Omicron,Pi,Rho,Tau,Upsilon,Phi
Chi,Psi,Omega

Character-based cut

cut can also use -c character-list to obtain ranges of characters (not fields). No delimiter is needed.

% cat data
Alpha,Beta,Gamma,Delta,Epsilon,Zeta
Eta,Theta,Iota,Kappa,Lambda,Mu,Nu,Xi
Omicron,Pi,Rho,Sigma,Tau,Upsilon,Phi
Chi,Psi,Omega

% cut -c5-12 data
a,Beta,G
Theta,Io
ron,Pi,R
Psi,Omeg
% cut -c1-3,5,7-10 data
AlpaBeta
EtaTeta,
Omirn,Pi
ChiPi,Om
% cut -c10- data
a,Gamma,Delta,Epsilon,Zeta
,Iota,Kappa,Lambda,Mu,Nu,Xi
i,Rho,Sigma,Tau,Upsilon,Phi
mega

uniq: Selecting unique lines from a file

uniq removes repeated lines that are next to each other, so its input is normally sorted first.

uniq can also be used to print only lines that are unique to a file (with -u) or only those that are repeated (with -d).

The combination of sort, cut, and uniq is a powerful tool for text manipulation in Unix.
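
The sketch below shows why sorting comes first and what -u and -d select. The list of names is made up; the repeated name is deliberately not on adjacent lines.

```sh
# "bob" appears twice, but the two copies are not next to each other.
names=$(printf 'bob\nalice\nbob\ncarol\n')

# Without sorting, uniq finds no adjacent repeats and removes nothing.
unsorted=$(printf '%s\n' "$names" | uniq)

# After sorting, the repeats are adjacent and collapse to one line.
deduped=$(printf '%s\n' "$names" | sort | uniq)

# -u keeps only lines that occur once; -d keeps only repeated lines.
only_once=$(printf '%s\n' "$names" | sort | uniq -u)
repeated=$(printf '%s\n' "$names" | sort | uniq -d)

echo "$deduped"
echo "$only_once"
echo "$repeated"
```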

A More Complex Example

Consider the example file my_file3.

user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa

We want to find out how many unique ID numbers there are and also get a list of names and passwords sorted by user name.

Example

% cat my_file3
user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa
% cut -f2 -d" " my_file3
4125142
1415511
9999999
1415511
0011292
% cut -f2 -d" " my_file3 | sort -n
0011292
1415511
1415511
4125142
9999999

Example

% cut -f2 -d" " my_file3 | sort -n | uniq
0011292
1415511
4125142
9999999  
% sort my_file3
user1 4125142 passwd
user2 9999999 p_2ad(
user3 1415511 f#afk@
user4 1415511 m#@!ad
user5 0011292 lkdfaa 
% sort my_file3 | cut -f1,3 -d" "
user1 passwd
user2 p_2ad(
user3 f#afk@
user4 m#@!ad
user5 lkdfaa
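
To get the number of unique IDs as a single figure, one more stage can be added to the pipeline. This is a sketch that recreates my_file3 in a scratch directory (the directory and the variable name are assumptions) and pipes the unique IDs into wc -l.

```sh
# Recreate my_file3 in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/my_file3" <<'EOF'
user1 4125142 passwd
user3 1415511 f#afk@
user2 9999999 p_2ad(
user4 1415511 m#@!ad
user5 0011292 lkdfaa
EOF

# Take the ID column, sort it, drop repeats, and count the lines left.
unique_ids=$(cut -f2 -d" " "$dir/my_file3" | sort -n | uniq | wc -l)
echo "unique IDs: $unique_ids"

# Clean up the scratch directory.
rm -r "$dir"
```

The ID 1415511 appears twice in the file, so five lines yield four unique IDs.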

CS155: Introduction to Unix

Fall 2017

Commands 2
