Play with text in Linux: GREP, CUT, AWK, SED

Linux is a widely used open-source operating system that ships with a large number of text processing tools. In everyday work we need to search text, extract parts of it, modify it, and sort it, and the Linux shell has tools for each of these tasks. In this blog, we are going to learn some of the most important ones.

Linux Text Processing Tools

Here are the most important text processing tools that we will discuss in this blog:

  • Grep
  • Cut
  • Awk
  • Sed

GREP

GREP is a multi-purpose file search tool that uses regular expressions. The name grep stands for “global regular expression print”: it processes text line by line and prints every line that matches a specified pattern. By default, grep displays the matching lines. It is considered one of the most useful commands on Linux and Unix-like operating systems.

Syntax:
grep [options] pattern [files]

Most important Options:
-c: Count the number of lines that match a pattern.
-h: Display the matched lines, but do not display the filenames.
-i: Ignore case when matching.
-l: Display only the names of files that contain a match.
-n: Display the matched lines with line numbers.
-v: Print all the lines that do not match the pattern.
-e exp: Specify a pattern with this option. Can be used multiple times.
-f file: Take patterns from a file, one per line.
-E: Treat the pattern as an extended regular expression (ERE).
-w: Match whole words only.
-o: Print only the matched parts of a matching line.

Example:

Consider the below file as an input:
$ cat > grepExample.txt
GREP is a multi-purpose file search tool that uses Regular Expressions.
The grep command is used for searching the text from the file according to the regular expression.
grep is a powerful file pattern searcher in Linux.

1. Case insensitive search  
$ grep -i "GRep" grepExample.txt
output:
GREP is a multi-purpose file search tool that uses Regular Expressions.
The grep command is used for searching the text from the file according to the regular expression.
grep is a powerful file pattern searcher in Linux.
2. Displaying the count of matching lines
$ grep -c "grep" grepExample.txt
output: 2
3. Searching for whole words in a file
$ grep -w "grep" grepExample.txt
output:
The grep command is used for searching the text from the file according to the regular expression.
grep is a powerful file pattern searcher in Linux.
4. Displaying only the matched pattern 
$ grep -o "grep" grepExample.txt
output:
grep
grep
5. Show line number while displaying the output 
$ grep -n "grep" grepExample.txt
output:
2:The grep command is used for searching the text from the file according to the regular expression.
3:grep is a powerful file pattern searcher in Linux.
6. Inverting the pattern match
$ grep -v "grep" grepExample.txt
output:
GREP is a multi-purpose file search tool that uses Regular Expressions.
7. Specifying multiple expressions (a line is printed if it matches any of them)
$ grep -e "command" -e "powerful" grepExample.txt
output:
The grep command is used for searching the text from the file according to the regular expression.
grep is a powerful file pattern searcher in Linux.
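
The examples above do not exercise the -E, -l, and -f options. As a rough sketch of how they behave, the snippet below recreates grepExample.txt in a temporary directory. The second file (other.txt) and the pattern file (patterns.txt) are made up purely for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
  'GREP is a multi-purpose file search tool that uses Regular Expressions.' \
  'The grep command is used for searching the text from the file according to the regular expression.' \
  'grep is a powerful file pattern searcher in Linux.' > grepExample.txt

# Hypothetical second file that contains no match.
printf '%s\n' 'nothing to see here' > other.txt

# -E: extended regular expression; "|" means "or".
ere_matches=$(grep -E 'command|Linux' grepExample.txt)

# -l: print only the names of the files that contain a match.
files_with_match=$(grep -l 'grep' grepExample.txt other.txt)

# -f: read the patterns (one per line) from a file; -c counts matching lines.
printf '%s\n' 'powerful' 'multi-purpose' > patterns.txt
pattern_file_count=$(grep -c -f patterns.txt grepExample.txt)

echo "$ere_matches"
echo "$files_with_match"
echo "$pattern_file_count"
```

Here -E matches the second and third lines, -l reports only grepExample.txt, and the pattern file matches two lines.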

CUT

The CUT command is a command-line tool for cutting data from each line of files and writing the result to standard output. It can be used to cut parts of a line by byte position, character, and delimiter. It can also be used to cut data from file formats like CSV.

Basically, the cut command slices each line and extracts part of the text. You must specify one of the -b, -c, or -f options; otherwise cut reports an error.

Syntax:
cut OPTION... [FILE]...
Most important Options: 
-b, --bytes=LIST # select only these bytes
-c, --characters=LIST # select only these characters
-d, --delimiter=DELIM # use DELIM instead of TAB for field delimiter
-f, --fields=LIST # select only these fields; also print any line that contains no delimiter character, unless the -s option is specified
--complement # complement the set of selected bytes, characters or fields
-s, --only-delimited # do not print lines not containing delimiters
--output-delimiter=STRING # use STRING as the output delimiter

Example:

1. How to cut by byte position
$ echo 'knoldus' | cut -b 2
Output: n
2. How to cut by character
$ echo 'knoldus' | cut -c 1-3
output: kno
3. How to cut based on a delimiter
Suppose we have a file employee.txt that contains the following data:
Name Age State
Azmat 25 Delhi
Yatharth 20 Noida
Shubham 22 Delhi

$ cut -d ' ' -f 1 employee.txt
output:
Name
Azmat
Yatharth
Shubham
4. How to cut by complement pattern
$ echo 'knoldus' | cut --complement -c 1
output: noldus
5. How to modify the output delimiter
$ cut -d ' ' -f 1,2 --output-delimiter='-' employee.txt
output:
Name-Age
Azmat-25
Yatharth-20
Shubham-22
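
Since cut is often used on CSV-like data, here is a small sketch of cutting comma-separated fields and of what the -s option changes. The file contents are invented for illustration. Keep in mind that cut splits on every comma, so it is only suitable for simple CSV without quoted fields.

```shell
# Hypothetical comma-separated sample; the last line has no delimiter at all.
csv_file=$(mktemp)
printf '%s\n' \
  'name,age,state' \
  'Azmat,25,Delhi' \
  'Yatharth,20,Noida' \
  'this line has no delimiter' > "$csv_file"

# Fields 1 and 3, comma-delimited. Lines without a comma are printed unchanged.
with_undelimited=$(cut -d ',' -f 1,3 "$csv_file")

# -s suppresses the lines that contain no delimiter.
only_delimited=$(cut -s -d ',' -f 1,3 "$csv_file")

echo "$with_undelimited"
echo "---"
echo "$only_delimited"

rm -f "$csv_file"
```

Without -s the free-text line passes through as is, while with -s only the three delimited lines remain.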

AWK

AWK is a powerful interpreted programming language designed specifically for text processing. With it we can search, cut, and manipulate text.

It can be used as a field extractor (like the cut command), a basic calculator, and a pattern matcher (like the grep command). It also lets the user work with variables, numeric functions, string functions, and logical operators.

Syntax:
awk options 'selection_criteria {action}' input-file
Most important Options:
-f program-file: Read the AWK program source from the file program-file, instead of from the first command-line argument.
-F fs: Use fs as the input field separator (delimiter).
-v var=val: Assign the value val to the variable var before the program starts.
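
To make -F and -v a little more concrete, here is a minimal sketch. It assumes a colon-separated file and a threshold variable named min_salary; both are invented for this illustration.

```shell
# Hypothetical colon-separated data: name:role:department:salary
data_file=$(mktemp)
printf '%s\n' \
  'azmat:trainee:devops:12000' \
  'shubham:trainee:devops:25000' \
  'yatharth:manager:devops:50000' > "$data_file"

# -F ':' sets the input field separator.
# -v min=... passes a shell value into the awk program as the variable "min".
min_salary=20000
high_earners=$(awk -F ':' -v min="$min_salary" '$4 > min { print $1 }' "$data_file")

echo "$high_earners"

rm -f "$data_file"
```

Passing the threshold with -v keeps the awk program in single quotes, so the shell never has to splice values into the program text.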

Example:

Consider the following text file as the input file for all cases below (it replaces the employee.txt used in the CUT section).
$ cat > employee.txt
azmat trainee devops 12000
shubham trainee devops 25000
yatharth manager devops 50000
shubhrank manager account 47000
1. Print all the lines:
$ awk '{print}' employee.txt
output:
azmat trainee devops 12000
shubham trainee devops 25000
yatharth manager devops 50000
shubhrank manager account 47000
2. Search for pattern:
$ awk '/azmat/' employee.txt
output: azmat trainee devops 12000
3. Splitting a line into fields:
$ awk '/azmat/ {print $1,$4}' employee.txt
output: azmat 12000
4. To find the length of the longest line present in the file:
$ awk '{ if (length($0) > max) max = length($0) } END { print max }' employee.txt
output: 31
5. To count the lines in a file:
$ awk 'END { print NR }' employee.txt
output: 4
6. To check for a string in a specific column (here, the first column):
$ awk '{ if($1 == "azmat") print $0;}' employee.txt
output: azmat trainee devops 12000
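
AWK was described earlier as a basic calculator with string functions and logical operators. The following sketch illustrates that on the same employee data, recreated in a temporary file so the snippet stands on its own. The variable names are only for illustration.

```shell
employee_file=$(mktemp)
printf '%s\n' \
  'azmat trainee devops 12000' \
  'shubham trainee devops 25000' \
  'yatharth manager devops 50000' \
  'shubhrank manager account 47000' > "$employee_file"

# Calculator: add up field 4 on every line, then print the total and the average.
salary_summary=$(awk '{ total += $4 } END { print total, total / NR }' "$employee_file")

# Logical operator && combines two conditions; toupper() is a built-in string function.
devops_managers=$(awk '$2 == "manager" && $3 == "devops" { print toupper($1) }' "$employee_file")

echo "$salary_summary"
echo "$devops_managers"

rm -f "$employee_file"
```

For this data the first command prints 134000 33500 and the second prints YATHARTH.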

SED

Sed is a stream editor: it performs basic text transformations on an input stream (a file or input from a pipeline). It can perform many operations on a file, such as searching, find and replace, insertion, and deletion.

It can also be used to perform complex modifications to streams of data (usually text, though it can also modify binary data).

Syntax:
sed OPTIONS... [SCRIPT] [INPUTFILE...]
Most important Options: 
-n : suppress automatic printing of pattern space
-e : add the script to the commands to be executed
-f : add the contents of script-file to the commands to be executed
-i : edit the original file in place
-r : use extended regular expressions in the script
Operations:
s : substitute text
d : delete the line
p : print the pattern space (to the standard output). This command is usually only used in conjunction with the -n command-line option.
g : global replacement (used as a flag on the s command, e.g. s/old/new/g)
i : insert a line before the current line
a : append a line after the current line
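
The i and a operations are easiest to understand with a quick sketch. This one assumes GNU sed, since the one-line "i text" and "a text" form is a GNU extension. The header and footer text are made up for the example.

```shell
employee_file=$(mktemp)
printf '%s\n' \
  'azmat trainee devops 12000' \
  'shubham trainee devops 25000' > "$employee_file"

# 1i inserts a line before line 1; $a appends a line after the last line.
# Assumes GNU sed (one-line form of the i and a commands).
with_header_footer=$(sed -e '1i name role dept salary' -e '$a end of list' "$employee_file")

echo "$with_header_footer"

rm -f "$employee_file"
```

The file itself is not modified here; sed only writes the transformed stream to standard output.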

Example:

Consider the following text file as the input file for all cases below.
$ cat > employee.txt
azmat trainee devops 12000
shubham trainee devops 25000
yatharth manager devops 50000
shubhrank manager account 47000
1. Replacing or substituting a string:
$ sed 's/azmat/hasan/' employee.txt
output:
hasan trainee devops 12000
shubham trainee devops 25000
yatharth manager devops 50000
shubhrank manager account 47000
2. Replacing the nth occurrence of a pattern in a line:
$ sed 's/azmat/hasan/1' employee.txt
The number after the last slash selects which occurrence to replace. Here, 1 replaces only the first occurrence of 'azmat' with 'hasan' in each line.
3. Replacing all the occurrences of the pattern in a line:
$ sed 's/azmat/hasan/g' employee.txt
It replaces all the occurrences of 'azmat' with 'hasan' in each line.
4. Replacing a string on a specific line number:
$ sed '3 s/manager/devops/' employee.txt
output:
azmat trainee devops 12000
shubham trainee devops 25000
yatharth devops devops 50000
shubhrank manager account 47000
5. Printing only the replaced lines:
$ sed -n 's/azmat/hasan/p' employee.txt
output: hasan trainee devops 12000
6. Deleting a line from a file (here, line 4):
$ sed '4d' employee.txt
output:
azmat trainee devops 12000
shubham trainee devops 25000
yatharth manager devops 50000
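
None of the examples above change employee.txt itself, because sed writes to standard output by default. As a hedged sketch of the -i and -e options, the snippet below edits a temporary copy of the data in place and keeps a backup. The .bak suffix and the replacement word 'intern' are arbitrary choices for this illustration.

```shell
employee_file=$(mktemp)
printf '%s\n' \
  'azmat trainee devops 12000' \
  'shubham trainee devops 25000' \
  'yatharth manager devops 50000' \
  'shubhrank manager account 47000' > "$employee_file"

# -i.bak edits the file in place and saves the original as "<file>.bak".
# The two -e scripts are applied in order to every line.
sed -i.bak -e 's/trainee/intern/' -e '/account/d' "$employee_file"

edited=$(cat "$employee_file")
backup_lines=$(wc -l < "$employee_file.bak" | tr -d ' ')

echo "$edited"
echo "backup still has $backup_lines lines"

rm -f "$employee_file" "$employee_file.bak"
```

Keeping a backup suffix is a cheap safety net, since an in-place edit with a wrong pattern cannot otherwise be undone.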

That’s all for now on playing with text in Linux. I will follow it up with more on this topic next time.

Thank you for sticking to the end. If you like this blog, please show your appreciation with a thumbs-up, share it, and send me suggestions on how I can improve my future posts to suit your needs. Follow me to get updates on different technologies.

For more blogs reach us at blog.knoldus.com

Written by 

Azmat Hasan is a Software Consultant at Knoldus Software LLP. He completed his MCA at CDAC Noida in 2019. He has good knowledge of DevOps technologies such as Docker, Ansible, CI/CD (Jenkins, Bamboo), Kubernetes, monitoring (Prometheus, Grafana), and logging (ELK Stack). He is a self-motivated, enthusiastic person who believes in striving to achieve what we can sustain over a longer period of time, instead of working for short-term benefits. He believes in working together to create synergy.