当前位置:网站首页>How to match garbled characters regularly?

How to match garbled characters regularly?

2022-08-10 00:33:00 ailx10

When I was protecting the net, I had to write a regex, and the matching text contained garbled characters. At that time, I was nervous and didn't write it out. I felt very embarrassed afterward, but now I'm relieved, taking advantage of the weekend break, briefly researched, in case the same problem will be encountered in the future, the test samples are as follows, including: Chinese, English, Korean, Japanese, Chinese punctuation, English punctuation, and garbled characters.

(Chinese)(¥)(abc+-*/)(한국인)(小さなJapan)(��)(Chinese)(¥)(+-*/)(한국인) (小さなJapan)

The regular expression [ -~]+ can match all printable ASCII characters, but cannot match Chinese, Chinese punctuation and garbled characters.

The regular expression [^ -~]+ can match all Chinese, Chinese punctuation and garbled characters, as well as Japanese and Korean.

The regular expression [\u4e00-\u9fa5]+ can match all Chinese characters.

Regular Expression [\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+, can match all Chinese punctuation.

Regular Expression[\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+ , which matches all Chinese, Japanese, and Korean characters, but does not contain punctuation.

Therefore, the initial construction of a regular expression that matches garbled characters is as follows:

[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u30\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+

Tested as follows and works well:

a3b3b798f6b9ef9dbebbd57d6b6d1217.jpeg

If you want to match all text after garbled characters, you can use the following regular expression:

[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u3u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*

3637fb96591dcaf9fcd2f00c151f9911.jpeg

If you want to match the entire text containing garbled characters, you can use the following regular expression:

[ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u3011\u3\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]*[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*

e399f34f83a846e56510d55d4db83a23.jpeg

If there are no garbled characters in the text, the test fails

07c88964860e9904445f807f52520fec.jpeg

Network security is a long way to go, let's wash it and sleep~

d7e7447f60349cd67c7cad5d1968d313.gif

原网站

版权声明
本文为[ailx10]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/222/202208092205335600.html