当前位置：网站首页>Re regular expression

Re regular expression

2022-04-23 17:59:00 【name_ QGY】

re Regular expressions

A regular expression is a special sequence of characters , It can help users check whether a string matches a pattern , So as to quickly retrieve or replace a certain mode 、 The text of the rule . Equate to Word Find and replace functions in .

import re
text='178,168,123456,9537,123456'
print(re.findall('123456',text))
#Out:
['123456', '123456']

1 Recognize regular expressions

surface 1 Basic syntax rules for matching a certain type of characters

Regular characters	Sketch Statement
.	Matching elimination "\n" Any single character other than . The most extensive . To match includes ’\n’ Any character in , You can use ’[.\n]' Pattern
\d	Matches a numeric character , Equivalent to [0-9]
\D	Matches a non-numeric character , Equivalent to [^0-9]
\s	Match any white space character , Including Spaces 、 tabs 、 Form-feed character, etc , Equivalent to [\f\n\r\t\v]
\S	Match non white space characters , Equivalent to [^\f\n\r\t\v]
\w	Matches any word numeric character that includes an underscore , Equivalent to [A-Za-z0-9_]
\W	Match any non word 、 Numbers 、 Underscore character , Equivalent to [^A-Za-z0-9_]

surface 2 [ ] Specify a range or a specific number of characters

Regular characters	Sketch Statement
[Pp]ython	matching Python or python
rub[ye]	matching ruby or rube
[aeiou]	Match any letter in bracket
[0-9]	Match any number , Equivalent to [0123456789]
[a-z]	Match any lowercase letter
[A-Z]	Match any capital letter
[a-zA-Z0-9]	Match any letters and numbers
[^aeiou]	Matching elimination aeiou All characters except letters
[^0-9]	Match all characters except numbers

import re
text=' height ：180, weight ：130, Student number ：123456, password ：9537'
print(re.findall(r'\d',text))
print(re.findall(r'\S',text))
print(re.findall(r'\w',text))
print(re.findall(r'[1-5]',text))
print(re.findall(r'[ High weight ]',text))
#Out:
['1', '7', '8', '1', '6', '8', '1', '2', '3', '4', '5', '6', '9', '5', '3', '7']
[' body ', ' high ', '：', '1', '7', '8', ',', ' body ', ' heavy ', '：', '1', '6', '8', ',', ' learn ', ' Number ', '：', '1', '2', '3', '4', '5', '6', ',', ' The secret ', ' code ', '：', '9', '5', '3', '7']
[' body ', ' high ', '1', '7', '8', ' body ', ' heavy ', '1', '6', '8', ' learn ', ' Number ', '1', '2', '3', '4', '5', '6', ' The secret ', ' code ', '9', '5', '3', '7']
['1', '1', '1', '2', '3', '4', '5', '5', '3']
[' high ', ' heavy ']

surface 3 Number of character repetitions quantifiers

Regular characters	Sketch Statement
*	0 Or more
+	1 Or more
?	0 or 1 individual
{2}	2 individual
{2,5}	2-5 individual
{2,}	At least 2 individual
{,5}	at most 5 individual

text = "my telephone number is 15951817010,and my hometown's telephone \
number is 13863417300,my landline number is 0634-5608603."
print(re.findall(r'\d{4}-\d{7}(?# Find the landline number )', text))
#Out:
['0634-5608603']

surface 4 Portfolio model

Regular examples of combinations	Sketch Statement
\d{6}[a-z]{6}	The two sub modes are put together to form a large mode , matching 6 A number plus 6 Lowercase letters
\d{6}\|[a-z]{6}	Use a vertical bar to indicate that there is a match between the front and rear patterns , matching 6 A number or 6 Lowercase letters
(abc){3}	Use parentheses to indicate grouping , After grouping, quantifiers can be applied in groups , matching abcabcabc
\X	Matching first X A matching group
(?#…)	notes

###\X
text="aabbcc ddfgkk oaddww aaaaaa ababcc"
print(re.findall(r'(\w{2})(\1)',text))
print(re.findall(r'(\w{2})(\1)(\2)',text))
#Out:
[('aa', 'aa'), ('ab', 'ab')]
[('aa', 'aa', 'aa')]
###(?#...)
text = "my telephone number is 15951817010,and my hometown's telephone \
number is 13863417300,my landline number is 0634-5608603."
print(re.findall(r'\d{4}-\d{7}(?# Find the landline number )', text))
#Out:
['0634-5608603']

surface 5 Location

Regular characters	Sketch Statement
^	Match the beginning of the string
$	Match the end of the string
\A	Match string start
\Z	End of match string , If there is a newline , Just match the end string before the line break
\b	Match word boundaries , This is the position between the word and the space . for example ：er\b Can match never Medium er, But can't match verb Medium er
\B	Match non word boundaries . for example ：er\B Can match verb Medium er, But can't match never Medium er
(?=…)	The matching content is … Before
(?!..)	The matching content is not in … Before
(?<=…)	The matching content is … after
(?<!..)	The matching content is not in … after

###(?=...)
text="height:180,weight:63,student_num:2020802178,key:hello_world"
print(re.findall(r'\w+(?=:2020802178)',text))
#Out:
['student_num']
###(?<=...)
print(re.findall(r'(?<=key:)\w+',text))
#Out:
['hello_world']

The content of regular expressions is broad and profound , You can master the basic framework first , Know what things , Explore further if necessary , I think this is learning new knowledge 、 A more efficient way to learn new things . It's like learning a new subject , There is no need to recite the classic bibliography of the subject word by word , Just master the core content , Form a network in your mind , When you need to use a certain part of this net, study it carefully .

2 re modular

python Of re modular , Include 8 A way ：

re.search()： Find characters that match the pattern , Just return the first one , return Match object
re.match()： and search equally , But it must match from the beginning of the string , return Match object
re.findall()： Returns a list of all matching strings
re.finditer()： Returns an iterator , It contains all matches , That is to say Match object
re.sub()： Replace matching string , Returns the text after the replacement
re.subn()： Replace matching string , Returns the text that has been replaced and the number of times it has been replaced
re.split()： Use the string matching the expression as a separator to split the original string
re.compile()： Compile regular expressions into an object , Easy to use at the back

above 8 According to different functions, the two methods can be divided into 4 Column , They are to find , Replace , Split and compile .

2.1 lookup , Yes 4 A way ：search、match、findall、finditer

2.1.1 search - Only return 1 individual

import re
text = "abc,Abc,aBC,abc"
print(re.search(r'abc',text))
#Out:
<re.Match object; span=(0, 3), match='abc'>

search The method is to return a Match object , And only one value was returned ,span=(0,3) It means to match to the... Th 1-3 Characters , How to go from Match Get the matching value from the object , Need to use group Method .

import re
text = "abc,Abc,aBC,abc"
m = re.search(r'abc', text)
print(m.group())
#Out:
abc

group The return value of the method without parameters is the result of matching .

If you group matching values , You can go through group Method to pass in the parameters of the group , The output result is the result of the matching group .

text = "name:qgy,score:99;name:myt,score:98"
m = re.search(r'(name):(\w{3})',text)
print(m.group())
print(m.group(1))
print(m.group(2))
print(m.groups())
#Out:
name:qgy
name
qgy
('name', 'qgy')

Here I would like to add a knowledge point , The four methods found have three parameters , The previous part only uses the first two , So re.search() Methods as an example ,

re.search(pattern,string,flags=0)

The first parameter pattern It refers to the matching pattern , The second parameter string Is the string to match ,flags It's the sign bit , Used to control how regular expressions are matched , Such as ： Is it case sensitive 、 Multiline matching, etc .

text = "aBc,Abc,aBC,abc"
m = re.search(r'abc', text, flags=re.I)
print(m.group())
#Out:
aBc

In the upper regular , What I want to match is abc,text Does not exist in the abc, The result of the match is aBc.

in other words ,re.I Is case insensitive .

Want to find flags What are they? , You can hold down the ctrl Click... In your script re Position of appearance , It will pop up re.py file , Press and hold... Under the file ctrl+F, Pop up search box , Input flags You can see flags What are the parameters .

Insert picture description here

2.1.2 match - Also only return 1 individual , But match from scratch

text = "xaBc,Abc,aBC,abc"
m = re.match(r'abc', text, flags=re.I)
n = re.search(r'^abc',text,flags=re.I)
print(m)
print(n)
#Out:
None
None

In the top code , I will aBc Added before x, Use match Method , matching text The starting character , If the starting character does not conform to the regular expression , The match will fail , The result returned is None.

re.match(r’‘,text) Equivalent to re.search(r’^',text)

Tell the truth ,match No dice , Design this match It gives people the feeling of painting a snake and adding feet , use search That's enough

2.1.3 findall - Returns all matching strings

text = "name:qgy,score:99;name:myt,score:98"
m = re.findall(r'(name):(\w{3})',text)
print(m)
#Out:
[('name', 'qgy'), ('name', 'myt')]

findall Method to return a list , If there are groups when matching , Each value in the list will use （） Cover up , Use... Between different groups ‘,’ Division .

2.1.4 finditer - return Match iterator

text = "name:qgy,score:99;name:myt,score:98"
m = re.finditer(r'(name):(\w{3})',text)
print(m)
for i in m:
    print(i)
#Out:
<callable_iterator object at 0x00000224C3D5A460>
<re.Match object; span=(0, 8), match='name:qgy'>
<re.Match object; span=(18, 26), match='name:myt'>

finditer Is to return an iterator , I don't know what iterators are , I guess I don't have to , Just don't spend time and energy sorting out for the time being . This piece has been passed by the spring and autumn brushwork .

Find the function of 4 A method of ,search、match and finditer All return to one Match object , Can master the most basic findall and search, Just find it all findall, Find one to use search.

2.2 Replace , Yes 2 A way ：sub、subn

The two methods replaced are only different from the four methods found earlier , Multiple parameters , That is, you need to indicate what character you want to replace .

2.2.1 sub - It's English words substitute English abbreviations

text="abc,aBc,ABc,xyz,opq"
result=re.sub(r'abc','xyz',text,flags=re.I)
print(result)
#Out:
xyz,xyz,xyz,xyz,opq

2.2.2 subn - After the replacement, tell me that several characters have been replaced

text="abc,aBc,ABc,xyz,opq"
result=re.subn(r'abc','xyz',text,flags=re.I)
print(result)
#Out:
('xyz,xyz,xyz,xyz,opq', 3)

2.3 Division

split The method is to split a string into some small strings according to certain rules .

split What's the use ？ for instance , I have a sequence , I digested the sequence with restriction enzyme , Cut the original sequence into small fragments , Want to know the sequence information of small fragments after cutting , You can go through split Method to implement .

MobI = "GATC"
text = "ATCGATCGGTTTAAGATCCTTCG"
result = re.split(MobI, text, flags=re.I)
print(result)
#Out:
['ATC', 'GGTTTAA', 'CTTCG']

2.4 compile

compile The method is to take a regular expression as an object that can be conveyed , Pass the object to another method and you can use it , To do search、findall wait .

Why compile , The advantage of using compilation method is to improve efficiency , If a regular expression needs to be repeated thousands of times , It's not very troublesome to manually input each match , With the result of compilation , When it is reused, it directly calls .

re_telephone=re.compile(r'\d{4}-\d{3,8}')
text1="Xiao ming's telephone number is 0634-4854481"
text2="Xiao hone's telephone number is 0531-145488454"
text3="Xiao gang's telephone number is 0452-567188155"
print(re_telephone.search(text1).group())
print(re_telephone.search(text2).group())
print(re_telephone.search(text3).group())
#Out:
0634-4854481
0531-14548845
0452-56718815