How to match garbled characters with a regular expression?
2022-08-10 00:33:00 【ailx10】
During a network defense exercise, I had to write a regex whose target text contained garbled characters. I was nervous in the moment and could not get it written, which felt embarrassing afterward. I have since let that go, and I used the weekend to look into the problem briefly in case it comes up again. The test sample is below; it includes Chinese, English, Korean, Japanese, Chinese punctuation, English punctuation, and garbled characters.
(Chinese)(¥)(abc+-*/)(한국인)(小さなJapan)(��)(Chinese)(¥)(+-*/)(한국인) (小さなJapan)
The regular expression [ -~]+ matches all printable ASCII characters, but does not match Chinese, Chinese punctuation, or garbled characters.
The regular expression [^ -~]+ matches everything outside printable ASCII: Chinese, Chinese punctuation, and garbled characters, as well as Japanese and Korean.
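As a quick check of the two character classes above, here is a minimal Python sketch. The sample string is modeled on the test text; the literal 中文 standing in for the Chinese portion and the variable names are assumptions, and the garbled characters are written as the escape \ufffd (the Unicode replacement character) so the code points are unambiguous.

```python
import re

# Sample modeled on the test text above (assumed stand-in, not the author's exact string).
sample = "(中文)(\uffe5)(abc+-*/)(한국인)(小さな)(\ufffd\ufffd)"

# Runs of printable ASCII (space through tilde).
ascii_runs = re.findall(r"[ -~]+", sample)

# Runs of everything else: CJK text, fullwidth punctuation, and garbled characters.
non_ascii_runs = re.findall(r"[^ -~]+", sample)

print(ascii_runs)
print(non_ascii_runs)
```

Note that [^ -~] also matches control characters such as tabs and newlines, since they fall outside the space-to-tilde range.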
The regular expression [\u4e00-\u9fa5]+ matches Chinese characters in the basic CJK Unified Ideographs range (U+4E00 to U+9FA5).
The regular expression [\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+ matches common Chinese punctuation.
The regular expression [\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+ matches Chinese, Japanese, and Korean characters, but not punctuation.
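The three classes above can be compared side by side. This is a small sketch in Python; the names HAN, PUNCT, and CJK and the short test string are assumptions made for illustration.

```python
import re

# Basic Chinese characters only.
HAN = re.compile(r"[\u4e00-\u9fa5]+")

# Common Chinese punctuation (same list as above).
PUNCT = re.compile(
    r"[\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+"
)

# Chinese, Japanese, and Korean characters, without punctuation.
CJK = re.compile(
    r"[\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
    r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+"
)

# Hypothetical text: Chinese, fullwidth comma, Korean, ideographic comma, Japanese.
text = "中文\uff0c한국인\u3001小さな"

han_hits = HAN.findall(text)
punct_hits = PUNCT.findall(text)
cjk_hits = CJK.findall(text)

print(han_hits, punct_hits, cjk_hits)
```

The Han-only class picks up the kanji 小 but stops at the hiragana, while the wider CJK class keeps each Japanese and Korean run whole.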
Putting these together, a first version of a regular expression that matches garbled characters excludes printable ASCII, CJK characters, and Chinese punctuation:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+
Tested against the sample text above, it works well.
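A test along these lines can be reproduced with a short Python sketch. The ALLOWED and GARBLED names, the way the pattern is assembled from pieces, and the sample strings are assumptions for illustration; the character ranges are the ones listed above.

```python
import re

# Everything considered "normal": printable ASCII, CJK characters, Chinese punctuation.
ALLOWED = (
    r" -~"
    r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
    r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
    r"\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5"
)

# Garbled characters are whatever falls outside the allowed set.
GARBLED = re.compile("[^" + ALLOWED + "]+")

# Assumed stand-in samples; \ufffd is the Unicode replacement character.
sample = "(中文)(\uffe5)(abc+-*/)(한국인)(小さな)(\ufffd\ufffd)"
clean = "(中文)(\uffe5)(abc+-*/)(한국인)(小さな)"

garbled_hits = GARBLED.findall(sample)
clean_hit = GARBLED.search(clean)

print(garbled_hits)
print(clean_hit)
```

Only the run of replacement characters is reported for the first string, and the search on the clean string finds nothing.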

If you want to match the garbled characters together with all the text after them, you can use the following regular expression:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*
If you want to match the entire text containing garbled characters, you can use the following regular expression:
[ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]*[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*
If the text contains no garbled characters, the match fails.
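The two variants, garbled characters plus the trailing text and the entire text containing garbled characters, can be sketched in Python as follows. The names ALLOWED, TAIL, and WHOLE and the sample strings are assumptions; the pattern is assembled from the same ranges as above rather than written as one long literal.

```python
import re

# Printable ASCII, CJK characters, and Chinese punctuation (ranges as listed above).
ALLOWED = (
    r" -~"
    r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
    r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
    r"\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5"
)

# Garbled characters and everything after them (".*" stops at a newline).
TAIL = re.compile("[^" + ALLOWED + "]+.*")

# Normal prefix, then at least one garbled character, then the rest of the line.
WHOLE = re.compile("[" + ALLOWED + "]*[^" + ALLOWED + "]+.*")

# Assumed stand-in samples; \ufffd is the Unicode replacement character.
sample = "(中文)(\uffe5)(abc+-*/)(한국인)(小さな)(\ufffd\ufffd)"
clean = "(中文)(abc+-*/)"

tail_text = TAIL.search(sample).group(0)
whole_text = WHOLE.match(sample).group(0)
clean_result = WHOLE.match(clean)

print(repr(tail_text))
print(whole_text == sample)
print(clean_result)
```

Because the leading class and the negated class are disjoint, the prefix cannot swallow a garbled character, so a text without any garbled character yields no match at all.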

Network security is a long road; time to wash up and go to bed~
