Who wouldn't become an operations and maintenance expert after mastering these 12 Linux Shell text-processing skills?
From: Big CC
Link: http://www.cnblogs.com/me115/p/3427319.html
Below I introduce the most commonly used tools for processing text using Shell under Linux:
find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk;
The examples and parameters provided are the most commonly used and practical;
My principle for shell scripts is to write commands as one-liners and try not to exceed 2 lines;
If the task is more complex than that, consider Python;
1. Find file search
Find txt and pdf files
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Regex-based search for .txt and .pdf files:
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: like -regex, but case-insensitive
Negate a condition to find all files that are not .txt:
find . ! -name "*.txt" -print
Specifying the search depth
find . -maxdepth 1 -type f
Custom search
Search by type:
find . -type d -print  // list only directories
-type f matches regular files, -type l matches symbolic links
Search by time:
- -atime access time (in days; -amin is the same in minutes, and likewise for the options below)
- -mtime modification time (the file's content was modified)
- -ctime change time (metadata or permissions changed)
All files accessed within the last 7 days:
find . -atime -7 -type f -print
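The numeric argument of these time tests can take a prefix; a minimal sketch of the convention (the same applies to -mtime and -ctime):
find . -type f -atime -7   # accessed less than 7 days ago
find . -type f -atime 7    # accessed exactly 7 days ago
find . -type f -atime +7   # accessed more than 7 days ago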
Search by size:
Units: b (512-byte blocks), c (bytes), w (two-byte words), k (KB), M (MB), G (GB)
Find files larger than 2k:
find . -type f -size +2k
Search by permission:
find . -type f -perm 644 -print  // find all files whose permissions are exactly 644
find . -type f -user weber -print  // find all files owned by user weber
Delete:
Delete all swp files in the current directory:
find . -type f -name "*.swp" -delete
Execute a command on each match (-exec):
find . -type f -user root -exec chown weber {} \;  // change the owner of all root-owned files under the current directory to weber
Note: {} is a special string. For each matching file, {} will be replaced with the corresponding file name.
Eg: copy all found files to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
Combining multiple commands
Tip: if you need to run several commands on each match, write them into a script and invoke the script from -exec, as shown below:
-exec ./commands.sh {} \;
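A minimal sketch of such a script, assuming it receives one file name per call (the name commands.sh, the log file and the backup directory are made up for illustration):
#!/bin/bash
# commands.sh -- run several commands on a single file passed in by find -exec
file="$1"
echo "processing $file" >> /tmp/process.log   # hypothetical log file
cp "$file" /tmp/backup/                       # hypothetical backup directory
Invoked, for example, as:
find . -type f -name "*.txt" -exec ./commands.sh {} \;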
-print delimiter
By default -print uses '\n' as the delimiter between file names;
-print0 uses '\0' as the delimiter instead, which makes it safe to handle file names that contain spaces (pair it with xargs -0);
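For example, a minimal sketch pairing the two (the file names are assumed; the point is that names containing spaces survive the pipe):
find . -type f -name "*.mp3" -print0 | xargs -0 ls -l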
2. Grep text search
grep match_pattern file  // prints matching lines by default
Common parameters
- -o print only the matched part of the line vs -v print only lines that do not match
- -c count the number of matching lines:
grep -c "text" filename
- -n print the line number of each matching line
- -i ignore case when matching
- -l print only the names of files that contain a match
Recursive search for text in multiple directories (a favorite for programmers searching for code):
grep "class" . -R -n
Matching multiple patterns
grep -e "class" -e "vitural" file
Make grep output matching file names terminated by \0 (-Z), so they can be safely consumed by xargs -0:
grep "test" file* -lZ| xargs -0 rm
3. xargs command line parameter conversion
xargs can convert input data into command line parameters of a specific command; in this way, it can be used in combination with many commands, such as grep and find;
Convert multi-line output to single-line output
cat file.txt | xargs
'\n' is the delimiter between the lines of text.
Convert single line to multiple lines of output
cat single.txt | xargs -n 3
-n: specifies the maximum number of arguments placed on each output line
- -d define the input delimiter (the default delimiter is the space character; '\n' separates multiple lines)
- -n specify how many arguments to place on each output line
- -I {} specify a replacement string; {} is replaced by the argument when xargs builds the command. Use it when the command needs the argument somewhere other than the end, or needs several parameters:
cat file.txt | xargs -I {} ./command.sh -p {} -1
- -0 use '\0' as the input delimiter
Eg: Count the number of program lines
find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l
4. sort
Option descriptions:
- -n sort numerically vs -d sort in dictionary order
- -r sort in reverse order
- -k N sort by the Nth column
For example:
sort -nrk 1 data.txt
sort -bd data  // ignore leading blanks such as spaces, and sort in dictionary order
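A small sketch of -k in practice, on an assumed data.txt whose second column is numeric:
cat data.txt
apple 3
pear 10
plum 7
sort -nrk 2 data.txt
$>pear 10
$>plum 7
$>apple 3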
5. uniq eliminates duplicate lines
Eliminate duplicate rows
sort unsort.txt | uniq
Count how many times each line occurs:
sort unsort.txt | uniq -c
Show only the duplicated lines:
sort unsort.txt | uniq -d
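A common combination, sketched here on an assumed unsort.txt, is counting occurrences and then ranking them:
sort unsort.txt | uniq -c | sort -nr | head -3  # the 3 most frequent lines, with their counts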
6. Use tr for conversion
General Usage
echo 12345 | tr '0-9' '9876543210'  // simple encryption/decryption: replace each digit with its counterpart
cat text | tr '\t' ' '  // convert tabs to spaces
tr delete characters
cat file | tr -d '0-9'  // delete all digits
cat file | tr -d -c '0-9'  // delete everything that is not a digit (keep only the digits)
cat file | tr -d -c '0-9 \n'  // delete non-digit data (keep digits, spaces and newlines)
cat file | tr -s ' '  // squeeze repeated spaces into a single space
Character classes
Various character classes are available in tr:
- alnum: letters and digits
- alpha: letters
- digit: digits
- space: whitespace characters
- lower: lowercase letters
- upper: uppercase letters
- cntrl: control (non-printable) characters
- print: printable characters
Usage: tr [:class:] [:class:]
eg: tr '[:lower:]' '[:upper:]'
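For instance, a quick check of the class syntax (the input string is arbitrary):
echo "Hello World" | tr '[:lower:]' '[:upper:]'
$>HELLO WORLD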
7. cut splits text by column
Extract the 2nd and 4th columns of the file:
cut -f2,4 filename
Print all fields except the 3rd:
cut -f3 --complement filename
-d specifies the delimiter:
cut -f2 -d";" filename
cut ranges
- N-  from the Nth field to the end
- -M  from the first field up to the Mth field
- N-M from the Nth to the Mth field
Units used by cut
- -b in bytes
- -c in characters
- -f in fields (split on the delimiter)
For example:
cut -c1-5 file  // print characters 1 through 5
cut -c-2 file  // print the first 2 characters
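A sketch of a field range on delimited data (the sample file and its contents are assumed):
cat data.csv
1;colin;beijing
2;book;shanghai
cut -d";" -f2- data.csv
$>colin;beijing
$>book;shanghai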
8. paste concatenates text by column
Join two texts together by columns;
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
paste file1 file2 -d ","
1,colin
2,book
9. wc Tool for counting lines and characters
wc -l file // count lines
wc -w file // count words
wc -c file // count bytes (use -m for characters)
10. sed text replacement tool
First replacement
sed 's/text/replace_text/' file  // replace the first match of text on each line
Global Replacement
sed 's/text/replace_text/g' file
By default sed prints the substituted text; to modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
Remove blank lines:
sed '/^$/d' file
The matched string can be referenced with the & marker:
echo this is an example | sed 's/\w\+/[&]/g'
$>[this] [is] [an] [example]
Substring match marker: the content of the first matched bracket group is referenced with \1:
sed 's/hello\([0-9]\)/\1/'
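A quick demonstration on an arbitrary input string: the digit captured by \( \) is kept and the literal "hello" is dropped:
echo hello7world | sed 's/hello\([0-9]\)/\1/'
$>7world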
Double quote evaluation
sed is usually quoted with single quotes; inside single quotes the shell does not expand variables, so sed 's/$var/HELLO/' replaces the literal string $var.
When double quotes are used, the shell expands variables in both the sed pattern and the replacement string;
eg:
p=pattern
r=replaced
echo "line con a pattern" | sed "s/$p/$r/g"
$>line con a replaced
Other example: insert a '/' after the first 3 characters of each line:
sed 's/^.\{3\}/&\//g' file
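A quick check of its behavior on an arbitrary string:
echo PEKSHA | sed 's/^.\{3\}/&\//g'
$>PEK/SHA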
11. awk data stream processing tool
awk script structure
awk 'BEGIN{ statements } statements2 END{ statements } '
How it works
- Execute the statement block in BEGIN;
- Read a line from the file or stdin and execute statements2; repeat until the entire input has been read;
- Execute the statement block in END;
print prints the current line
When print is used without parameters, the current line is printed;
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'
When print arguments are separated by commas, they are output separated by spaces;
echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1, var2, var3;}'
$>v1 V2 v3
Joining fields with a custom string such as "-" (placing strings side by side in awk concatenates them):
echo | awk '{var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1"-"var2"-"var3;}'
$>v1-V2-v3
Special variables: NR NF $0 $1 $2
NR: indicates the number of records, which corresponds to the current line number during execution;
NF: indicates the number of fields, which corresponds to the number of fields in the current line during execution;
$0: this variable contains the text content of the current line during execution;
$1: the text content of the first field;
$2: the text content of the second field;
echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$1"-"$2}'
Print the second and third fields of each line:
awk '{print $2, $3}' file
Count the number of lines in a file:
awk ' END {print NR}' file
Add up the first field of each line:
echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{num = 0 ;
print "begin";} {sum += $1;} END {print "=="; print sum }'
Passing external variables
var=1000
echo | awk '{print vara}' vara=$var  # input comes from stdin
awk '{print vara}' vara=$var file  # input comes from a file
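An alternative worth knowing, sketched here, is awk's -v option, which sets the variable before the BEGIN block runs:
var=1000
awk -v vara=$var 'BEGIN{print vara}'
$>1000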
Filtering the lines awk processes with patterns
- awk 'NR < 5'  # lines whose line number is less than 5
- awk 'NR==1,NR==4 {print}' file  # print lines 1 through 4
- awk '/linux/'  # lines containing the text linux (regular expressions can be used, which is very powerful)
- awk '!/linux/'  # lines that do not contain the text linux
Set the delimiter
Use -F to set the delimiter (default is space)
awk -F: '{print $NF}' /etc/passwd
Read command output
Use getline to read the output of the external shell command into the variable cmdout;
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'
Loops in awk:
for(i=0; i<10; i++) {print $i;}
for(i in array) {print array[i];}
Eg:
Print lines in reverse order: (implementation of tac command)
seq 9| \
awk '{lifo[NR] = $0; lno=NR} \
END{ for(;lno>0;lno--){print lifo[lno];} }'
awk implements head and tail commands
head:
awk 'NR<=10{print}' filename
tail:
awk '{buffer[NR%10] = $0;} END{for(i=NR-9;i<=NR;i++){ \
print buffer[i%10]} } ' filename  // assumes the file has at least 10 lines
Print the specified columns
Implementation with awk:
ls -lrt | awk '{print $6}'
Implementation with cut (cut needs a single-character delimiter, so squeeze the repeated spaces first):
ls -lrt | tr -s ' ' | cut -d' ' -f6
Print the specified text area
Determine the line number
seq 100| awk 'NR==4,NR==6{print}'
Confirm text
Print the text between start_pattern and end_pattern;
awk '/start_pattern/, /end_pattern/' filename
For example:
seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
awk built-in string functions
- index(string, search_string): returns the position of search_string within string
- sub(regex, replacement_str, string): replaces the first match of the regular expression in string with replacement_str
- match(regex, string): checks whether the regular expression matches the string
- length(string): returns the length of the string
echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }'
printf formatted output, eg:
seq 10 | awk '{printf "->%4s\n", $1}'
12. Iterate over lines, words, and characters in a file
1. Iterate over each line: while loop method
while read line; do
echo $line;
done < file.txt
Changed into a sub-shell:
cat file.txt | (while read line; do echo $line; done)
awk method:
cat file.txt | awk '{print}'
2. Iterate over each word in a line
for word in $line; do echo $word; done
3. Iterate over each character
${string:start_pos:num_of_chars}: extract a substring from a string (bash text slicing)
${#word}: returns the length of the variable word
for ((i=0; i<${#word}; i++))
do
echo ${word:i:1};
done