
Master these 12 Linux Shell text-processing skills and you'll be well on your way to becoming an operations and maintenance expert


From: Big CC
Link: http://www.cnblogs.com/me115/p/3427319.html

Linux Shell is a basic skill. Because of its quirky syntax and poor readability, it is often replaced by Python and other scripting languages for larger jobs. But since it is a basic skill, it is still worth mastering: in the process of learning shell scripting you also learn a great deal about the Linux system itself.
Not everyone needs to become a shell scripting master, but everyone should be able to use simple shell commands to implement common basic functions.

Below I introduce the most commonly used tools for processing text using Shell under Linux:

find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk;

The examples and parameters provided are the most commonly used and practical;

My rule of thumb for shell scripts is to keep each command on a single line, and not to exceed two lines;

If you have more complex task requirements, consider Python;

1. Find file search

Find txt and pdf files

find . \( -name "*.txt" -o -name "*.pdf" \) -print
Search for .txt and .pdf files with a regular expression:
find . -regex ".*\(\.txt\|\.pdf\)$"
  • -iregex: like -regex, but case-insensitive

Negate the match to find all files that are not .txt:

find . ! -name "*.txt" -print

Specifying the search depth

Print out the files in the current directory (depth 1)
find . -maxdepth 1 -type f
  • Customized searches

Search by Type:

find . -type d -print  // list only directories

-type f matches regular files; -type l matches symbolic links.
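For instance, to list only the symbolic links under the current directory:

find . -type l -print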

Search by time:

  • -atime access time (in days; use -amin for minutes; the same applies to the options below)
  • -mtime modification time (content was modified)
  • -ctime change time (metadata or permission change)

All files accessed within the last 7 days:

find . -atime -7 -type f -print

Search by size:
Units: c (bytes), w (two-byte words), b (512-byte blocks), k, M, G

Find files larger than 2k

find . -type f -size +2k

Search by permission:

find . -type f -perm 644 -print // find all files with permission 644
Search by user:
find . -type f -user weber -print // find files owned by the user weber
Follow-up actions after finding
  • delete:

Delete all swp files in the current directory:

find . -type f -name "*.swp" -delete
Execute an action (the powerful -exec)
find . -type f -user root -exec chown weber {} \; // change the owner of every root-owned file under the current directory to weber

Note: {} is a special string. For each matching file, {} will be replaced with the corresponding file name.

Eg: copy all found files to another directory:

find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
  • Combining multiple commands

Tip: if you need to run several commands on each matched file, write them into a script and have -exec call that script (a sketch follows);
-exec ./commands.sh {} \;
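As a sketch, such a commands.sh might look like this (the script body below is purely illustrative, not part of the original post):

#!/bin/bash
# commands.sh - helper invoked by find -exec
# find substitutes the matched file name for {}, so it arrives here as $1
file="$1"
echo "processing: $file"   # first command
chmod 644 "$file"          # second command

It is then invoked exactly as above: find . -type f -name "*.txt" -exec ./commands.sh {} \;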

-print delimiter

By default -print separates file names with '\n';

-print0 uses '\0' as the separator instead, which makes it possible to handle file names that contain spaces;
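A minimal sketch of pairing the two (the file names matched here are hypothetical):

find . -type f -name "* *" -print0 | xargs -0 ls -l   # handle names containing spaces safely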

2. Grep text search

grep match_pattern file // prints matching lines by default

Common parameters

  • -o prints only the matched part of each line, vs -v which prints only the lines that do not match
  • -c counts the number of matching lines in a file


grep -c "text" filename
  • -n Print matching line numbers
  • -i Ignore case when searching
  • -l Print only the file name
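A quick illustration of these parameters (the file names are hypothetical):

grep -n "main" hello.c       # show matches with line numbers
grep -i "error" app.log      # case-insensitive search
grep -l "TODO" *.c           # print only the names of matching files
grep -o "[0-9]\+" data.txt   # print only the matched parts (here: the numbers)
grep -v "^#" config.txt      # print the lines that do NOT match (drop comments)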

Recursive search for text in multiple directories (a favorite for programmers searching for code):

grep "class" . -R -n

Matching multiple patterns

grep -e "class" -e "vitural" file

Have grep output matching file names terminated by \0 (-Z, used with -l), so they can be piped safely to xargs -0:

grep "test" file* -lZ| xargs -0 rm

3. xargs command line parameter conversion

xargs can convert input data into command line parameters of a specific command; in this way, it can be used in combination with many commands, such as grep and find;

Convert multi-line output to single-line output

cat file.txt | xargs
'\n' is the delimiter between the lines of the multi-line input.

Convert single line to multiple lines of output

cat single.txt | xargs -n 3

-n: Specifies the number of fields to display per line

xargs parameter description
  • -d Define the delimiter (the default is space and the delimiter for multiple lines is \n)
  • -n splits the output into multiple lines, n arguments per line
  • -I {} specifies the replacement string, which will be replaced when xargs is expanded. It is used when the command to be executed requires multiple parameters.
For example:
cat file.txt | xargs -I {} ./command.sh -p {} -1
  • -0: specifies \0 as the input delimiter

Eg: Count the number of program lines

find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l

4. sort

Field Description:

  • -n sorts numerically vs -d sorts lexicographically
  • -r sort in reverse order
  • -k N specifies sorting by column N

For example:

sort -nrk 1 data.txt
sort -bd data // -b ignores leading blank characters such as spaces
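For instance, sorting a hypothetical two-column file (name score) by its second column, numerically and in descending order:

cat scores.txt
tom 85
amy 92
bob 77
sort -nrk 2 scores.txt
amy 92
tom 85
bob 77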

5. uniq eliminates duplicate lines

Eliminate duplicate rows

sort unsort.txt | uniq
Count the number of times each line appears in a file
sort unsort.txt | uniq -c
Find duplicate rows
sort unsort.txt | uniq -d
You can choose which part of each line is compared when checking for duplicates: -s N skips the first N characters and -w N compares at most N characters (see the sketch below).
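A small sketch with made-up data: each line starts with a 3-character ID, and duplicates should be judged only by what follows it:

cat data.txt
01 apple
02 apple
03 banana
uniq -s 3 data.txt   # skip the first 3 characters when comparing (-w N would additionally cap the comparison at N characters)
01 apple
03 banana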

6. Use tr for conversion

General Usage

echo 12345 | tr '0-9' '9876543210'  // simple encode/decode: replace each digit with its counterpart
cat text | tr '\t' ' '  // convert tabs to spaces

tr delete characters

cat file | tr -d '0-9' // delete all digits
-c takes the complement of the character set
cat file | tr -c '0-9' '\n'  // keep only the digits in the file (every other character becomes a newline)
cat file | tr -d -c '0-9 \n'  // delete everything that is not a digit, space, or newline
tr squeeze characters
tr -s squeezes repeated characters in the input; it is most commonly used to squeeze runs of spaces into a single space:
cat file | tr -s ' '

Character Classes

  • Various character classes are available in tr:
  • alnum: letters and numbers
  • alpha: letter
  • digit: number
  • space: blank character
  • lower: lowercase
  • upper: uppercase
  • cntrl: control (non-printable) characters
  • print: printable characters

Usage: tr [:class:] [:class:]

eg: tr '[:lower:]' '[:upper:]'

7. cut splits text by column

Extract the 2nd and 4th columns of the file:

cut -f2,4 filename
Remove all columns except the 3rd column from the file:
cut -f3 --complement filename

-d specifies the delimiter:

cut -f2 -d";" filename

cut range

  • N-   from the Nth field to the end
  • -M   from the first field to the Mth

  • N-M  from the Nth field to the Mth

The unit of cut

  • -b In bytes

  • -c In characters

  • -f In field units (use delimiter)

For example:

cut -c1-5 file // print characters 1 through 5
cut -c-2 file  // print the first 2 characters
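A sketch of the field ranges together with an explicit delimiter (data.csv is a hypothetical comma-separated file):

cut -d',' -f2- data.csv   # from the 2nd field to the end
cut -d',' -f-3 data.csv   # the first 3 fields
cut -d',' -f2-4 data.csv  # fields 2 through 4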

8. paste concatenates text by column

Join two texts together by columns;

cat file1
1
2
cat file2
colin
book

paste file1 file2
1 colin
2 book

The default delimiter is a tab, and you can specify a delimiter with -d.
paste file1 file2 -d ","
1,colin
2,book

9. wc Tool for counting lines and characters

wc -l file // count lines
wc -w file // count words
wc -c file // count characters

10. sed text replacement tool

Replace the first match

sed 's/text/replace_text/' file   // replace the first occurrence of text on each line

Global Replacement

sed 's/text/replace_text/g' file

By default sed prints the substituted text to stdout; to modify the original file in place, use -i:

sed -i 's/text/replace_text/g' file

Remove blank lines:

sed '/^$/d' file

Matched-string marker: the matched string is referenced with &.

echo this is en example | sed 's/\w\+/[&]/g'
$> [this] [is] [en] [example]
Substring match marker
The content of the first matched group (the part inside \( \)) is referenced with \1:
sed 's/hello\([0-9]\)/\1/'
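For example:

echo hello7 world | sed 's/hello\([0-9]\)/\1/'
$> 7 world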

Double quote evaluation

sed expressions are usually quoted with single quotes; double quotes can also be used, in which case the shell expands variables inside the expression:
sed 's/$var/HLLOE/'   // with single quotes, $var is taken literally

When using double quotes, we can specify variables in both the sed style and replacement strings;

eg:
p=patten
r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
$> line con a replaced
Other Examples
String insertion character: Convert each line of text (PEKSHA) to PEK/SHA
sed 's/^.\{3\}/&\//g' file
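Applied to a single line:

echo PEKSHA | sed 's/^.\{3\}/&\//g'
$> PEK/SHA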

11. awk data stream processing tool

awk script structure
awk 'BEGIN{ statements } statements2 END{ statements } '

How it works

  1. Execute the BEGIN statement block;

  2. Read a line from the file or stdin and execute statements2; repeat until the entire input has been read;

  3. Execute the END statement block;

print prints the current line

When print is used without parameters, the current line is printed;

echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'

When the arguments of print are separated by commas, they are printed with a space as the delimiter;

echo | awk '{ var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1, var2, var3; }'
$> v1 V2 v3

Use explicit string concatenation to print with a custom separator (here "-"):

echo | awk '{ var1 = "v1"; var2 = "V2"; var3 = "v3"; print var1"-"var2"-"var3; }'
$> v1-V2-v3
Special variables: NR NF $0 $1 $2

NR: indicates the number of records, which corresponds to the current line number during execution;
NF: indicates the number of fields, which corresponds to the number of fields in the current line during execution;
$0: this variable contains the text content of the current line during execution;
$1: the text content of the first field;
$2: the text content of the second field;

echo -e "line1 f2 f3\n line2 \n line 3" | awk '{print NR":"$1"-"$2}'

Print the second and third fields of each line:

awk '{print $2, $3}' file

Count the number of lines in a file:

awk ' END {print NR}' file

Add up the first field of each line:

echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{num = 0 ;print "begin";} {sum += $1;} END {print "=="; print sum }'

Passing external variables

var=1000
echo | awk '{print vara}' vara=$var # input from stdin
awk '{print vara}' vara=$var file   # input from a file
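Most awk implementations also support -v, which sets the variable before the program starts (so it is already visible in BEGIN); a minimal sketch:

var=1000
awk -v vara="$var" 'BEGIN{print vara}'
$> 1000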
Filter the lines processed by awk with patterns

  • awk 'NR < 5' # lines whose line number is less than 5

  • awk 'NR==1,NR==4 {print}' file # print lines 1 through 4

  • awk '/linux/' #Lines containing linux text (can be specified using regular expressions, super powerful)

  • awk '!/linux/' #Lines that do not contain linux text

Set the delimiter

Use -F to set the delimiter (default is space)
awk -F: '{print $NF}' /etc/passwd

Read command output

Use getline to read the output of the external shell command into the variable cmdout;

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'
Using loops in awk

for(i=0;i<10;i++){print $i;}
for(i in array){print array[i];}

Eg:
Print lines in reverse order: (implementation of tac command)

seq 9 | awk '{lifo[NR] = $0; lno=NR} END{ for(; lno>-1; lno--){ print lifo[lno]; } }'

awk implements head and tail commands

head:

awk 'NR<=10{print}' filename

tail:

awk '{ buffer[NR % 10] = $0; } END{ for(i = NR - 9; i <= NR; i++){ if (i > 0) print buffer[i % 10]; } }' filename

Print the specified columns

Implementation with awk:

ls -lrt | awk '{print $6}'

Implementation by cut method

ls -lrt | cut -f6

Print the specified text area

Determine the line number

seq 100| awk 'NR==4,NR==6{print}'

Determine by text pattern

Print the text between start_pattern and end_pattern;

awk '/start_pattern/, /end_pattern/' filename

For example:

seq 100 | awk '/13/,/15/'
cat /etc/passwd | awk '/mai.*mail/,/news.*news/'
Commonly used built-in functions of awk
  • index(string,search_string): Returns the position where search_string appears in string
  • sub(regex,replacement_str,string): replace the first occurrence of the regular expression with replacement_str;
  • match(string, regex): checks whether the regular expression matches the string; on success it sets RSTART and RLENGTH;
  • length(string): Returns the length of the string

echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }'
printf formats output, similar to printf in C language, eg:
seq 10 | awk '{printf "->%4s\n", $1}'

12. Iterate over lines, words, and characters in a file

1. Iterate over each line in the file

While loop

while read line; do
    echo $line;
done < file.txt

Or run it in a sub-shell:

cat file.txt | (while read line; do echo $line; done)
awk method:
cat file.txt | awk '{print}'
2. Iterate over each word in a line
for word in $line;do echo $word;done

3. Iterate over each character

${string:start_pos:num_of_chars}: extract a substring from a string (bash string slicing)
${#word}: returns the length of the variable word

for ((i=0; i<${#word}; i++)); do
    echo ${word:i:1};
done

