
Regular Expressions

2022-04-23 15:15:00 You are my bug forever

1、 Experiencing the power of regular expressions

1.1、 Steps for using a regular expression

  1. First create a Pattern object. The Pattern object represents the compiled regular expression.
 Pattern pattern = Pattern.compile("[a-zA-Z]+");
  2. Create a Matcher object. The matcher scans the text in content for substrings that match pattern.
 Matcher matcher = pattern.matcher(content);
  3. Loop over the matches.
 while (matcher.find()) {
     // The matched text is available through matcher.group(0)
     System.out.println("Found: " + matcher.group(0));
 }

1.2、 Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Experience the power of regular expressions and the convenience they bring to text processing.
 * Regular expressions are designed specifically for working with text.
 */
public class A_regexp {
    

    public static void main(String[] args) {
    
        //  as follows   Text 
        String content = " Order software is generally used to display numbers and other items for quick reference and analysis ." +
                " in real life : Order software has form application software and form control ," +
                " A typical image Office word,excel The form is 1 Most commonly used order data 8 One way to manage ," +
                " The main 6 For input 、 Output 、 Show 、 Process and print data , It can make all kinds of complex forms and documents ," +
                " Can even help with 5 Users carry out complex 6 Statistical operations and charts 32 It's a show 1 etc. . Table controls also 4 It can be often used in database 1 Presentation and editing of data " +
                " Data entry interface design 、 Data exchange ( Such as and Excel Exchange data )、 Data report and distribution, etc . such as Spread ComponentOne Of FlexGrid." +
                "ip:172.16.20.20;192.16.255.185";

        // Goal: extract all English words from the text.
        // 1. The traditional way is to traverse the text by hand: a lot of code, and inefficient.
        // 2. Use a regular expression to match the words instead.

        // 1. Create a Pattern object, which represents the compiled regular expression
        // Pattern pattern = Pattern.compile("[a-zA-Z]+"); // match English words
        // Pattern pattern = Pattern.compile("[0-9]+"); // match numbers
        // Pattern pattern = Pattern.compile("([0-9]+)|([a-zA-Z]+)"); // match numbers and English words
        Pattern pattern = Pattern.compile("\\d+\\.\\d+\\.\\d+\\.\\d+"); // match IP addresses

        // 2. Create a Matcher that scans content for substrings matching pattern
        Matcher matcher = pattern.matcher(content);
        // 3. Loop over the matches
        while (matcher.find()) {
            // The matched text is available through matcher.group(0)
            System.out.println("Found: " + matcher.group(0));
        }


    }
}

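Sample result: with the IP-address pattern active as above, the program prints 172.16.20.20 and 192.16.255.185. Switching to the English-word pattern prints each English word in the text instead.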

2、 The underlying principle of regular expressions

What is regular expression grouping?
(\d\d)(\d\d)
Each pair of parentheses () forms a group: the first () is group 1 and the second () is group 2.

2.1、 Regular expressions are not grouped

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * How regular expressions work: the underlying implementation.
 * Demo: find every run of 4 consecutive digits in the text, such as 1998.
 */
public class B_RegTheory {
    
    public static void main(String[] args) {
    
        String content = "1998 year 12 month 8 Japan , Second generation Java Enterprise version of the platform J2EE Release .1999 year 6 month ," +
                "Sun The company released a second generation Java platform ( Referred to as Java2) Of 3 A version :J2ME(Java2 Micro Edition,Java2 A miniature version of the platform )," +
                " Apply to move 、 Wireless and limited resource environment ;J2SE(Java 2 Standard Edition,Java 2 The standard version of the platform )," +
                " For desktop environments ;J2EE(Java 2Enterprise Edition,Java 2 Enterprise version of the platform ), Apply to based on Java Application server .Java 2 Platform release ," +
                " yes Java The most important milestone in the development process , Mark the Java Application start 1991 Universal .\n" +
                "1999 year 4 month 27 Japan ,HotSpot Virtual machine Publishing .HotSpot The virtual machine sends 2992 Cloth is used as JDK 1.2 Additional procedures for 3993 Provided ," +
                " Then it became JDK 1.3 And 12345678 All versions after Sun JDK The default virtual machine for ";


        /**
         * Find every run of 4 consecutive digits in the text, such as 1998.
         * Note: \\d matches any single digit.
         */
        String regStr = "\\d\\d\\d\\d";
        // 1. Create a Pattern object, which represents the compiled regular expression
        Pattern pattern = Pattern.compile(regStr);

        // 2. Create a Matcher that scans content for substrings matching pattern
        Matcher matcher = pattern.matcher(content);

        // 3. Loop over the matches
        while (matcher.find()) {
            // The matched text is available through matcher.group(0)
            System.out.println("Found: " + matcher.group(0));
        }
    }
}

2.1.1、 What matcher.find() does

1、 It locates the next substring that satisfies the pattern (for example 1998)
2、 Once found, it records the start index of the substring in the matcher's int[] groups field
For 1998, the index of the character 1 is 0, so groups[0] = 0
The index of the last character 8, plus 1, is recorded as groups[1] = 4
3、 It also sets oldLast to the index of the last matched character plus 1; the next call to find() starts from there

Setting a breakpoint shows this: before the first match, every entry of groups has the default value -1
After stepping past find(), groups[0] = 0 and groups[1] = 4
At the same time oldLast = 4, so the next search starts at index 4
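
The groups array and oldLast are private fields of Matcher, so they are only visible in a debugger. As a rough sketch, the public start() and end() methods report the same indices for the whole match. The class name FindOffsets and the short sample text are illustrative assumptions, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindOffsets {
    // Returns {start, end} for every match of regex in text.
    // start is the index of the first matched character,
    // end is the index of the last matched character plus 1.
    static List<int[]> offsets(String regex, String text) {
        List<int[]> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(new int[] {matcher.start(), matcher.end()});
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] span : offsets("\\d\\d\\d\\d", "1998 and 1999")) {
            System.out.println("start=" + span[0] + ", end=" + span[1]);
        }
    }
}
```

Each find() call resumes where the previous match ended, which is why the second span starts after the first one rather than overlapping it.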

2.1.2、 What matcher.group(0) does

Source code (from the JDK's Matcher class):

  public String group(int group) {
    
        if (first < 0)
            throw new IllegalStateException("No match found");
        if (group < 0 || group > groupCount())
            throw new IndexOutOfBoundsException("No group " + group);
        if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
            return null;
        return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
    }

Using the positions recorded in groups[0] = 0 and groups[1] = 4, it extracts the substring from content and returns it.
In other words, it returns content.substring(0, 4), which is the text that satisfied the pattern.

2.2、 Regular expression grouping

String regStr = "(\\d\\d)(\\d\\d)";

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * How regular expressions work: the underlying implementation.
 * Demo: find every run of 4 consecutive digits in the text, such as 1998, using groups.
 */
public class C_RegTheory {
    
    public static void main(String[] args) {
    
        String content = "1998 year 12 month 8 Japan , Second generation Java Enterprise version of the platform J2EE Release .1999 year 6 month ," +
                "Sun The company released a second generation Java platform ( Referred to as Java2) Of 3 A version :J2ME(Java2 Micro Edition,Java2 A miniature version of the platform )," +
                " Apply to move 、 Wireless and limited resource environment ;J2SE(Java 2 Standard Edition,Java 2 The standard version of the platform )," +
                " For desktop environments ;J2EE(Java 2Enterprise Edition,Java 2 Enterprise version of the platform ), Apply to based on Java Application server .Java 2 Platform release ," +
                " yes Java The most important milestone in the development process , Mark the Java Application start 1991 Universal .\n" +
                "1999 year 4 month 27 Japan ,HotSpot Virtual machine Publishing .HotSpot The virtual machine sends 2992 Cloth is used as JDK 1.2 Additional procedures for 3993 Provided ," +
                " Then it became JDK 1.3 And 12345678 All versions after Sun JDK The default virtual machine for ";


        /**
         * Find every run of 4 consecutive digits in the text, such as 1998.
         * Notes:
         * 1. \\d matches any single digit.
         * 2. Each () in the regular expression is one group, so two () give two groups.
         */
        String regStr = "(\\d\\d)(\\d\\d)";
        // 1. Create a Pattern object, which represents the compiled regular expression
        Pattern pattern = Pattern.compile(regStr);

        // 2. Create a Matcher that scans content for substrings matching pattern
        Matcher matcher = pattern.matcher(content);


        // 3. Loop over the matches
        while (matcher.find()) {
            // The whole match is available through matcher.group(0)
            System.out.println("Found: " + matcher.group(0));
            System.out.println("Group 1: " + matcher.group(1));
            System.out.println("Group 2: " + matcher.group(2));
        }
    }
}

With groups, the main difference is in what matcher.find() records:
1、 It locates the next substring that satisfies the pattern (for example 1998)
2、 Once found, it records the indices in the matcher's int[] groups field
For 1998:
2.1、 Whole match: the index of 1 is 0, so groups[0] = 0; the index of the last character 8, plus 1, gives groups[1] = 4
2.2、 Group 1 matched 19: groups[2] = 0, and the index of the last character 9, plus 1, gives groups[3] = 2
2.3、 Group 2 matched 98: groups[4] = 2, and the index of the last character 8, plus 1, gives groups[5] = 4
And so on for any further groups
3、 It also sets oldLast to the index of the last matched character plus 1; the next call to find() starts from there
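
The same layout can be observed without a debugger. This is a minimal sketch, assuming the public start(int) and end(int) methods mirror groups[2n] and groups[2n + 1]; the class name GroupOffsets and the shortened sample text are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsets {
    // For the first match, returns {start(0), end(0), start(1), end(1), ...},
    // or null when the pattern does not match at all.
    static int[] firstMatchSpans(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return null;
        }
        int[] spans = new int[(matcher.groupCount() + 1) * 2];
        for (int g = 0; g <= matcher.groupCount(); g++) {
            spans[g * 2] = matcher.start(g);     // index of the group's first character
            spans[g * 2 + 1] = matcher.end(g);   // index of the group's last character + 1
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstMatchSpans("(\\d\\d)(\\d\\d)", "1998 year")));
    }
}
```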

At this point you can print the string matched by each group of the regular expression:

        // 3. Loop over the matches
        while (matcher.find()) {
            // The whole match is available through matcher.group(0)
            System.out.println("Found: " + matcher.group(0));
            System.out.println("Group 1: " + matcher.group(1));
            System.out.println("Group 2: " + matcher.group(2));
        }


3、 Metacharacters

3.1、 The escape character

Escape character: \
Tip: in a Java string literal, two backslashes \\ stand for the single \ used in other languages
Explanation: when a regular expression has to match certain special characters, they must be escaped with \\. Otherwise the pattern does not find them, or fails to compile with an error.

Example: match the $ in "abc$(abc(123("

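import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class D_Regexp {

    public static void main(String[] args) {
        String str = "abc$(abc(123(";
        // $ is a metacharacter (end of input), so it must be escaped to match a literal dollar sign
        String regStr = "\\$";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
    }

}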

Special characters are matched literally only when they are escaped with \\
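
A small sketch of both failure modes mentioned above: an unescaped ( does not compile, while an escaped one matches literally. Pattern.quote is a standard alternative that escapes a whole literal at once. The class and method names here (EscapeDemo, countMatches, compiles) are hypothetical helpers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeDemo {
    // Counts how many times regex matches inside text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Returns false when the regex is syntactically invalid.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "abc$(abc(123(";
        System.out.println(compiles("("));                              // unescaped ( opens a group that is never closed
        System.out.println(countMatches("\\(", text));                  // escaped ( matches the literal character
        System.out.println(countMatches(Pattern.quote("$("), text));    // quote() escapes the whole literal "$("
    }
}
```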

3.2、 Character matchers

| Symbol | Meaning | Example | What the example matches | Matching input |
|---|---|---|---|---|
| [] | List of characters to match | [efgh] | Any one of e, f, g, h | e |
| [^] | List of characters to exclude | [^edgh] | Any one character except e, d, g, h | a |
| - | Range connector | [a-z] | Any lowercase letter from a to z | a |
| . | Any character except \n | a..b | A string of length 4 that starts with a, ends with b, with any two characters in between | aaab |
| \\d | A single digit, equivalent to [0-9] | \\d{3}(\\d)? | Three or four consecutive digits | abc123 |
| \\D | A single non-digit, equivalent to [^0-9] | \\D(\\d)* | A single non-digit followed by any number of digits | a12354 |
| \\w | A single digit, letter or underscore, equivalent to [0-9a-zA-Z_] | \\d{3}\\w{4} | Three digits followed by four digits, letters or underscores | 123as1_ |
| \\W | A single character that is not a digit, letter or underscore, equivalent to [^0-9a-zA-Z_] | \\W+\\d{2} | At least one non-word character followed by two digits | #$23 |
| \\s | Any whitespace character (space, tab, etc.) | \\s | A single whitespace character | c v |
| \\S | Any non-whitespace character (the opposite of \\s) | | | |

Notes:

1、\\d{3} == \\d\\d\\d
2、(\\d)? == \\d may or may not be present
3、(\\d)* == any number of \\d, including none
4、\\W+ == at least one \\W
5、[^a-z]{2} == two consecutive characters that are not in a-z
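
The equivalences above can be checked with String.matches, which tests the whole input. This is only a sketch; CharClassDemo and whole are illustrative names. The last two calls show why a range should be written [a-zA-Z] rather than [a-zA-z]: the range A-z covers ASCII 65 to 122, which also includes the characters [ \ ] ^ _ and the backtick.

```java
class CharClassDemo {
    // True when the entire input matches the regex (String.matches semantics).
    static boolean whole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(whole("\\d{3}(\\d)?", "123"));    // three digits, optional fourth absent
        System.out.println(whole("\\d{3}(\\d)?", "1234"));   // three digits, optional fourth present
        System.out.println(whole("\\D(\\d)*", "a12354"));    // one non-digit, then any number of digits
        System.out.println(whole("\\d{3}\\w{4}", "123as1_")); // three digits, then four word characters
        System.out.println(whole("\\W+\\d{2}", "#$23"));     // non-word characters, then two digits
        System.out.println(whole("[^a-z]{2}", "AB"));        // two characters outside a-z
        System.out.println(whole("[a-zA-z]", "_"));          // the A-z range accidentally accepts underscore
        System.out.println(whole("[a-zA-Z]", "_"));          // the corrected range does not
    }
}
```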

Case-insensitive matching

There are two ways to ignore case:
1、 The inline flag (?i): everything after (?i) is matched case-insensitively
(?i)abc: a, b and c are all case-insensitive
a(?i)bc: only b and c are case-insensitive
a((?i)b)c: only b is case-insensitive
2、 Pattern compile = Pattern.compile(regStr, Pattern.CASE_INSENSITIVE);
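
As a quick sketch of the scoping rules listed above, the following compares where the (?i) flag is placed. CaseDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

class CaseDemo {
    // Whole-input match using any inline flags contained in the regex itself.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Whole-input match with the CASE_INSENSITIVE flag applied to the entire pattern.
    static boolean wholeIgnoreCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(whole("(?i)abc", "ABC"));      // flag covers a, b and c
        System.out.println(whole("a(?i)bc", "aBC"));      // flag covers b and c only
        System.out.println(whole("a(?i)bc", "ABC"));      // leading a is still case-sensitive
        System.out.println(whole("a((?i)b)c", "aBc"));    // flag is scoped to the group around b
        System.out.println(whole("a((?i)b)c", "aBC"));    // trailing c is still case-sensitive
        System.out.println(wholeIgnoreCase("abc", "AbC")); // flag passed to Pattern.compile
    }
}
```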

3.3、 Alternation

|
: matches the expression before or after the "|"
For example, ab|cd matches ab or cd

3.4、 Quantifiers

A quantifier specifies how many times the character or group before it must occur consecutively

| Symbol | Meaning | Example | What the example matches | Matching input |
|---|---|---|---|---|
| * | The preceding item occurs 0 or more times | (abc)* | A string made of any number of abc | abcabcabc |
| + | The preceding item occurs at least once | m+ | A string that starts with at least one m | mmss |
| ? | The preceding item occurs 0 times or once | cm? | c followed by zero or one m | cmab, c |
| {n} | The preceding item occurs exactly n times | [a-z]{3} | Exactly 3 characters from a-z | abc |
| {n,} | The preceding item occurs at least n times | [a-z]{3,} | At least 3 characters from a-z | asfv |
| {n,m} | The preceding item occurs at least n and at most m times | [a-z]{3,4} | At least 3 and at most 4 characters from a-z | abc |
| ? (after another quantifier) | Switches the quantifier to non-greedy mode | o+? | o+ means at least one o; with ? added it matches as few as possible, that is one o | aa0 |

Notes:

Java matching is greedy by default: a quantifier matches as many characters as it can.
What is greedy matching?
For example, with b{3,4} and the text "bbbbb" (five b characters), the regular expression matches four of them, bbbb, because it takes as many as possible.
Non-greedy matching: o+ means at least one o; adding ? (o+?) makes it non-greedy, so it matches only a single o.

1、ab{3,4}: a followed by three or four b
2、[ab]{3,4}: three or four consecutive characters, each being a or b
3、(ab){3,4}: ab repeated three or four times
4、cm?: c followed by zero or one m
Note that in cm? the ? applies only to the single character immediately before it (m)
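
A rough sketch of greedy versus non-greedy behavior, returning the first match found by find(). The helper firstMatch and the class name GreedyDemo are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("b{3,4}", "bbbbb"));       // greedy: takes as many b as allowed
        System.out.println(firstMatch("b{3,4}?", "bbbbb"));      // non-greedy: takes as few b as allowed
        System.out.println(firstMatch("o+", "ooo"));             // greedy
        System.out.println(firstMatch("o+?", "ooo"));            // non-greedy
        System.out.println(firstMatch("ab{3,4}", "abbbbb"));     // quantifier applies to b only
        System.out.println(firstMatch("(ab){3,4}", "abababab")); // quantifier applies to the group ab
    }
}
```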

3.5、 Anchors

Anchors specify where a match must occur, for example at the start or at the end

| Symbol | Meaning | Example | What the example matches | Matching input |
|---|---|---|---|---|
| ^ | Start of the input | ^a | A string that starts with a | abc |
| $ | End of the input | [a-z]+$ | A string that ends with at least one lowercase letter | 1-ss |
| \\b | A word boundary | han\\b | With the string split into words by spaces, a word that ends with han | aaa nnhan mmm |
| \\B | A position that is not a word boundary | han\\B | With the string split into words by spaces, a han that is not at the end of a word | aaa hannn mmm |
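
To make the table concrete, this sketch reports the index where each anchored pattern is first found, or -1 when it is not found. AnchorDemo and findStart are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Returns the start index of the first match, or -1 when there is none.
    static int findStart(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(findStart("^a", "abc"));               // a at the start of the input
        System.out.println(findStart("^a", "bac"));               // a exists, but not at the start
        System.out.println(findStart("[a-z]+$", "1-ss"));         // lowercase run at the end
        System.out.println(findStart("han\\b", "aaa nnhan mmm")); // han at the end of a word
        System.out.println(findStart("han\\b", "aaa hannn mmm")); // han is followed by more letters
        System.out.println(findStart("han\\B", "aaa hannn mmm")); // han inside a word
    }
}
```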

3.6、 Grouping

3.6.1、 Capturing groups

1、 Unnamed capture (pattern)
Captures the matching substring. Group 0 (group(0)) is the text matched by the whole regular expression;
the other groups are numbered from 1 in the order of their opening parentheses, from left to right (group(1), group(2)).

Example:

        String str = "asdvxs s7788 nn7785bb";
        /**
         * Unnamed groups: each () is one group
         * group(0) is the whole match
         * group(1) is the first group
         * group(2) is the second group
         */
        String reg = "(\\d\\d)(\\d\\d)";

        Pattern compile = Pattern.compile(reg);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
            // Unnamed groups are read by number
            System.out.println("Group 1: " + matcher.group(1));
            System.out.println("Group 2: " + matcher.group(2));
        }

2、 Named groups (?<name>pattern)
Named capture: the matching substring is captured into a group that has both a name and a number, so the content can be retrieved by either one.
In Java, name must start with a letter and may contain only letters and digits, with no punctuation. The single-quote form (?'name'pattern) used by some other regex flavors is not supported by java.util.regex. Example:

        String str = "asdvxs s7788 nn7785bb";
        // Named groups
        String reg = "(?<d1>\\d\\d)(?<d2>\\d\\d)";
        Pattern compile = Pattern.compile(reg);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));

            // Named groups are read by name
            System.out.println("Group d1: " + matcher.group("d1"));
            System.out.println("Group d2: " + matcher.group("d2"));
        }

In short, a group is given a name, and the matched content is retrieved by that name.

3.6.2、 Non-capturing groups

A non-capturing group is not stored, so its content cannot be retrieved with group().

1、(?:pattern)
Matches pattern but does not capture the matching subexpression; that is, it is a non-capturing group.
Effect: Li (?:Dan|Shuang) is equivalent to Li Dan|Li Shuang
Example:

        // (?:pattern)
        String str = "Li Dan Li Shuang";
        // String regStr = "Li Dan|Li Shuang";
        String regStr = "Li (?:Dan|Shuang)";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }

2、(?=pattern)
Positive lookahead, a non-capturing match. Effect: iPhone(?=5|6|8|13) matches the iPhone in strings where iPhone is followed by 5, 6, 8 or 13, but does not match the iPhone in iPhone7. The digits themselves are not part of the match.

        // (?=pattern)
        String str = "iPhone5,iPhone6,iPhone8,iPhone13,iPhone20";
        String regStr = "iPhone(?=5|6|8|13)";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }

3、(?!pattern)
Negative lookahead, a non-capturing match. Effect: iPhone(?!5|6|8|13) matches every iPhone except those followed by 5, 6, 8 or 13.

        // (?!pattern)
        String str = "iPhone5,iPhone6,iPhone8,iPhone13,iPhone20";
        String regStr = "iPhone(?!5|6|8|13)";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }
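
Both lookaheads are zero-width: they test what follows without consuming it. As a minimal sketch, the helper below (LookaheadDemo.starts, a made-up name) collects the start index of every match so the two patterns can be compared on the same input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns the start index of every match of regex in text.
    static List<Integer> starts(String regex, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "iPhone5,iPhone6,iPhone8,iPhone13,iPhone20";
        // Positive lookahead: iPhone followed by 5, 6, 8 or 13
        System.out.println(starts("iPhone(?=5|6|8|13)", text));
        // Negative lookahead: iPhone not followed by 5, 6, 8 or 13
        System.out.println(starts("iPhone(?!5|6|8|13)", text));
    }
}
```

In every case group(0) is just iPhone; the digits after it are checked but left in place for the next search.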


4、 Backreferences

A backreference builds on grouping and capturing

1、 Grouping: each () forms a group
2、 Capturing: the substring matched by a group is stored and can be retrieved with group(1), group(2), …
3、 Backreference
Once the content of a group has been captured, it can be referred to after its closing parenthesis, which allows more practical patterns to be written. This is called a backreference.
The reference can be used inside the regular expression, written as \\ followed by the group number, or outside it (for example in a replacement string), written as $ followed by the group number.
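
A short sketch of both forms: \\1 inside the pattern and $1, $2, $3 in a replacement string. BackrefDemo, reorderDate and hasDoubledChar are names invented for this example, and the date format is just a convenient input.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // External reference: $1, $2, $3 in the replacement refer to the captured groups.
    static String reorderDate(String isoDate) {
        return isoDate.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    // Internal reference: \\1 must match the same text that group 1 captured.
    static boolean hasDoubledChar(String s) {
        return Pattern.compile("(.)\\1").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(reorderDate("2022-04-23"));
        System.out.println(hasDoubledChar("letter"));
        System.out.println(hasDoubledChar("later"));
    }
}
```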

4.1、 Match two consecutive identical digits

        // Match two consecutive identical digits using an internal reference
        String str = "12312216543663789";
        String regStr = "(\\d)\\1";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }


4.2、 Match five consecutive identical digits

        // Match five consecutive identical digits using an internal reference
        String str = "12312216543666663789";
        String regStr = "(\\d)\\1{4}";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }


4.3、 Find numbers of the form 1221 or 3663

        // Find four consecutive digits where the first equals the fourth
        // and the second equals the third, such as 1221 or 3663
        String str = "12312216543663789";
        String regStr = "(\\d)(\\d)\\2\\1";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }


4.4、 Find strings of the form 12321-444999666

        // Find strings of the form 12321-444999666
        String str = "12312212321-44499966663789";
        String regStr = "(\\d)(\\d)(\\d)\\2\\1-(\\d)\\4{2}(\\d)\\5{2}(\\d)\\6{2}";
        Pattern compile = Pattern.compile(regStr);
        Matcher matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }


5、 Removing stuttered repetition

For example: IIII... waaant to... leeearn Jaaava == I want to learn Java

        String str = "IIII... waaant to... leeearn Jaaava";
        // 1. Remove the dots
        Pattern compile = Pattern.compile("\\.");
        Matcher matcher = compile.matcher(str);
        str = matcher.replaceAll("");
        System.out.println(str);

        /**
         * 2. Collapse the repeated characters
         * Idea: 1. Use a backreference inside the pattern to find each run of repeated characters: (.)\\1+
         *       2. Use the external reference $1 in the replacement to keep a single copy
         */
        compile = Pattern.compile("(.)\\1+"); // the text captured by group 1 is available as $1
        matcher = compile.matcher(str);
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
        }

        // Replace each run with $1, for example "IIII" becomes "I"
        str = matcher.replaceAll("$1");
        System.out.println(str);

6、 Using regular expressions with the String class

String.matches() performs a whole-string match: it returns true only when the entire string matches the pattern
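
The difference from Matcher.find() is easy to miss, so here is a minimal sketch comparing the two on the same input. The class and method names (MatchesVsFind, wholeMatch, containsMatch) are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // String.matches: the whole string must match the pattern.
    static boolean wholeMatch(String regex, String s) {
        return s.matches(regex);
    }

    // Matcher.find: it is enough for some substring to match the pattern.
    static boolean containsMatch(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("\\d+", "abc123"));    // letters prevent a whole-string match
        System.out.println(containsMatch("\\d+", "abc123")); // the substring 123 is enough
        System.out.println(wholeMatch("\\d+", "123"));       // the whole string is digits
    }
}
```

Because matches() already covers the whole input, a leading ^ or trailing $ in its pattern is harmless but redundant.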

    public static void main(String[] args) {
    

        String content = "1998 year 12 month 8 Japan , Second generation Java Enterprise version of the platform J2EE Release .JDK1.3  and  JDK1.4  and  J2SE1.3 One after another ";

        // Use a regular expression to replace JDK1.3 and JDK1.4 with JDK
        // (the dot is escaped so that it matches a literal "." rather than any character)
        content = content.replaceAll("JDK1\\.3|JDK1\\.4", "JDK");
        System.out.println(content);

        // Requirement: validate a mobile number, which must start with 138 or 139
        content = "13816966686";
        boolean matches = content.matches("^(138|139)\\d{8}");
        System.out.println(matches);

        // Requirement: split on #, -, ~ or a digit
        content = "nihao#mawoshi-nidie~shima18suila";
        String[] split = content.split("[\\#\\-\\~\\d]");
        for (String d : split) {
            System.out.println(d);
        }


    }

Exercises

1、 Verify that an email address is valid

        /**
         * 1. Verify that an email address is valid
         * Rules:
         * 1. There can be only one @
         * 2. The part before @ is the user name, which may contain a-z, A-Z, 0-9, - and _
         * 3. The part after @ is the domain name, which may contain only English letters, such as sohu.com
         */

        String str = "user@sohu.com"; // sample address (placeholder value)
        String regStr = "^[a-zA-Z0-9\\-\\_]+@([a-zA-Z]+\\.)+[a-zA-Z]+$";
        if (str.matches(regStr)){
    
            System.out.println("1ok");
        }

2、 Verify whether a string is an integer or a decimal

        /**
         * 2. Verify whether a string is an integer or a decimal
         * Both positive and negative numbers must be handled,
         * for example: 123 -123 34.89 -87.3 -0.01 0.45
         * A value with leading zeros such as 0089 is not valid
         */

        String str2 = "89.139909";
        // Optional sign, an integer part without leading zeros, then an optional fractional part
        String regStr2 = "^[\\+\\-]?([1-9]\\d*|0)(\\.\\d+)?$";
        if (str2.matches(regStr2)) {
            System.out.println("2ok");
        }
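
The rule can be checked against the sample values listed in the comment. This is a sketch that assumes the fractional part is grouped as (\\.\\d+)? so that a lone digit such as 5 is accepted and 0089 is rejected; NumberCheck and isNumber are illustrative names.

```java
import java.util.regex.Pattern;

class NumberCheck {
    // Optional sign, integer part without leading zeros, optional fractional part.
    private static final Pattern NUMBER = Pattern.compile("[+-]?([1-9]\\d*|0)(\\.\\d+)?");

    // True when the whole string is an integer or a decimal under the rule above.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"123", "-123", "34.89", "-87.3", "-0.01", "0.45", "0089", "5", "1."};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isNumber(sample));
        }
    }
}
```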

3、 Parsing a URL

        /**
         * Parse a URL such as http://www.baidu.com:8080/abc/index.htm
         * and extract:
         * protocol: http/https
         * domain name: www.baidu.com
         * port: 8080
         * file name: index.htm
         */

        String str3 = "http://www.baidu.com:8080/abc/index.htm";
        String regStr3 = "^([a-z]+)://([a-zA-Z.]+):(\\d+)[\\w-/]*/([\\w.]+)$";
        Pattern pattern = Pattern.compile(regStr3); // compile the URL pattern
        Matcher matcher = pattern.matcher(str3);
        // Loop over the matches
        while (matcher.find()) {
            // The whole match is available through matcher.group(0)
            System.out.println("Found: " + matcher.group(0));
            System.out.println("Protocol: " + matcher.group(1));
            System.out.println("Domain name: " + matcher.group(2));
            System.out.println("Port: " + matcher.group(3));
            System.out.println("File name: " + matcher.group(4));
        }

Copyright notice
This article was written by [You are my bug forever]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231406271956.html