How to match garbled characters regularly?
2022-08-10 00:33:00 【ailx10】
While I was on network-protection duty, I had to write a regex for text that contained garbled characters. I was nervous at the time and could not get it written, which felt embarrassing afterward. Now that I have calmed down, I used the weekend break to research it briefly in case the same problem comes up again. The test sample is as follows and includes Chinese, English, Korean, Japanese, Chinese punctuation, English punctuation, and garbled characters.
(Chinese)(¥)(abc+-*/)(한국인)(小さなJapan)(��)(Chinese)(¥)(+-*/)(한국인) (小さなJapan)
The regular expression [ -~]+ matches all printable ASCII characters, but it cannot match Chinese, Chinese punctuation, or garbled characters.
The regular expression [^ -~]+ matches everything outside printable ASCII: Chinese, Chinese punctuation, and garbled characters, as well as Japanese and Korean.
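As a minimal sketch of the two character classes above, the following uses Python's re module. The choice of Python and the sample string are assumptions; the sample only approximates the test text in this article, with U+FFFD standing in for the garbled characters.

```python
import re

# Assumed sample: Chinese, a fullwidth yen sign, ASCII, Korean, Japanese,
# and two U+FFFD replacement characters standing in for garbled text.
sample = "(中文)(\uffe5)(abc+-*/)(한국인)(小さな)(\ufffd\ufffd)"

# Space (0x20) through tilde (0x7E) covers every printable ASCII character.
PRINTABLE_ASCII = re.compile(r"[ -~]+")
# The negated class matches runs of anything else.
NOT_PRINTABLE_ASCII = re.compile(r"[^ -~]+")

ascii_runs = PRINTABLE_ASCII.findall(sample)
other_runs = NOT_PRINTABLE_ASCII.findall(sample)

print(ascii_runs)
print(other_runs)
```

The negated class alone cannot tell garbled characters apart from legitimate Chinese, Japanese, or Korean text, which is why the later expressions add more ranges to the exclusion list.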
The regular expression [\u4e00-\u9fa5]+ matches Chinese characters in the range U+4E00 to U+9FA5, which covers the commonly used ones.
The regular expression [\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+ matches Chinese punctuation.
The regular expression [\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+ matches Chinese, Japanese, and Korean characters, but not punctuation.
Therefore, a first version of a regular expression that matches garbled characters can be constructed as follows:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+
Testing against the sample above shows that it works well.
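The test can be reproduced with a short Python sketch. The names ASCII, CJK, PUNCT, and GARBLED are hypothetical and introduced only for readability, and the sample string is an approximation of the article's test text with U+FFFD as the garbled characters.

```python
import re

# Building blocks of the negated class (names are illustrative only).
ASCII = r" -~"
CJK = (
    r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5"
    r"\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
)
PUNCT = (
    r"\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5"
)

# Anything that is not ASCII, CJK, or Chinese punctuation counts as garbled.
GARBLED = re.compile("[^" + ASCII + CJK + PUNCT + "]+")

# Assumed sample, approximating the article's test text.
sample = "(中文)(\uffe5)(abc+-*/)(한국인)(小さな)(\ufffd\ufffd)"

garbled_runs = GARBLED.findall(sample)
accented_runs = GARBLED.findall("café")

print(garbled_runs)   # only the two U+FFFD characters
print(accented_runs)  # the accented letter is also reported
```

Because the expression is an allow-list turned inside out, any character outside the listed ranges is reported as garbled, including legitimate ones such as accented Latin letters. Text in other scripts would need its ranges added to the class.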

If you want to match all text after garbled characters, you can use the following regular expression:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*
If you want to match the entire text containing garbled characters, you can use the following regular expression:
[ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]*[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*
If the text contains no garbled characters, the match fails.
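The two variants above, and the failure case, can be sketched in Python as follows. The names ALLOWED, AFTER_GARBLED, and WHOLE_TEXT are hypothetical, and the input strings are made up for illustration.

```python
import re

# Characters treated as legitimate: printable ASCII, CJK, Chinese punctuation.
ALLOWED = (
    r" -~"
    r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5"
    r"\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
    r"\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5"
)

# Garbled run plus everything after it on the same line.
AFTER_GARBLED = re.compile("[^" + ALLOWED + "]+.*")
# Legitimate prefix, garbled run, then the rest of the line.
WHOLE_TEXT = re.compile("[" + ALLOWED + "]*[^" + ALLOWED + "]+.*")

dirty = "abc中文\ufffd\ufffddef"  # hypothetical input with garbled characters
clean = "abc中文def"              # hypothetical input without them

after = AFTER_GARBLED.search(dirty).group()
whole = WHOLE_TEXT.search(dirty).group()
no_match = WHOLE_TEXT.search(clean)

print(after)     # starts at the first garbled character
print(whole)     # the entire line
print(no_match)  # None, because nothing garbled was found
```

Note that . does not match a newline by default in Python, so both expressions stop at the end of the line; passing re.DOTALL would extend the match across lines.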

Network security is a long road; time to wash up and get some sleep~
