How to match garbled characters with a regular expression?
2022-08-10 00:33:00 【ailx10】
During a network-defense exercise I had to write a regex for text that contained garbled characters. I was nervous at the time and could not get it right, which was embarrassing afterward. I'm over it now, and I used the weekend to look into the problem briefly in case I run into it again. The test sample below contains Chinese, English, Korean, Japanese, Chinese punctuation, English punctuation, and garbled characters.
(Chinese)(¥)(abc+-*/)(한국인)(小さなJapan)(��)(Chinese)(¥)(+-*/)(한국인) (小さなJapan)
The regular expression [ -~]+ matches runs of printable ASCII characters (space through tilde, U+0020 to U+007E), but it does not match Chinese characters, Chinese punctuation, or garbled characters.
The regular expression [^ -~]+ matches everything outside printable ASCII, which includes Chinese characters, Chinese punctuation, and garbled characters, as well as Japanese and Korean.
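As a quick check of these two character classes, here is a minimal Python sketch. The sample string is made up for illustration (it is not the article's test sample), and U+FFFD, the Unicode replacement character, stands in for garbled text.

```python
import re

# Hypothetical sample: "abc+-*/", two Chinese characters, a full-width comma,
# two U+FFFD replacement characters (garbled text), then " xyz".
sample = "abc+-*/\u4e2d\u6587\uff0c\ufffd\ufffd xyz"

printable_ascii = re.compile(r"[ -~]+")  # space (0x20) through tilde (0x7E)
non_ascii = re.compile(r"[^ -~]+")       # everything outside that range

ascii_runs = printable_ascii.findall(sample)
other_runs = non_ascii.findall(sample)

print(ascii_runs)
print(other_runs)
```

Note that the negated class also matches control characters such as tabs and newlines, because they fall outside the space-to-tilde range.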
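The regular expression [\u4e00-\u9fa5]+ matches the commonly used Chinese characters, i.e. the CJK Unified Ideographs from U+4E00 to U+9FA5. Rarer characters in the extension blocks fall outside this range.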
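The limits of this range can be seen in a small Python sketch. The sample text is an assumption chosen for illustration; U+3400 belongs to CJK Extension A and so lies outside U+4E00 to U+9FA5.

```python
import re

han_basic = re.compile(r"[\u4e00-\u9fa5]+")

# Hypothetical sample: "Hello", then U+4E16 U+754C (inside the range),
# then U+3400 (CJK Extension A, outside the range), then "!".
text = "Hello\u4e16\u754c\u3400!"

han_runs = han_basic.findall(text)
print(han_runs)
```

Only the two characters inside the range are returned; the Extension A character is skipped.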
The following regular expression matches common Chinese punctuation:
[\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+
The following regular expression matches Chinese, Japanese, and Korean characters, but not punctuation:
[\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+
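A minimal Python sketch of the CJK class follows. The sample string is hypothetical: three Hangul syllables, an ASCII comma, one kanji with two hiragana, a full-width comma, and some ASCII letters.

```python
import re

# The CJK character class from the article, split across two lines for readability.
CJK = re.compile(
    r"[\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
    r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+"
)

# Hypothetical sample: Hangul, ASCII comma, kanji + hiragana,
# full-width comma (U+FF0C), then "abc".
text = "\ud55c\uad6d\uc778,\u5c0f\u3055\u306a\uff0cabc"

cjk_runs = CJK.findall(text)
print(cjk_runs)
```

Both commas split the runs: the ASCII comma and the full-width comma are punctuation, so neither is part of the class.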
A first regular expression for garbled characters can therefore be built by negating everything above, i.e. matching any character that is not printable ASCII, not a CJK character, and not Chinese punctuation:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+
Tested against the sample above, it works well.
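The combined pattern can be tried out with a short Python sketch. The constant names, the helper `find_garbled`, and the sample string are assumptions made for illustration; the character ranges are the ones listed in the article.

```python
import re

# Building blocks from the article, kept as character-class fragments.
ASCII = r" -~"
CJK = (
    r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
    r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
)
PUNCT = (
    r"\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5"
)

# Anything that is not ASCII, CJK, or Chinese punctuation counts as garbled.
GARBLED = re.compile("[^" + ASCII + CJK + PUNCT + "]+")


def find_garbled(text: str) -> list:
    """Return every run of characters the pattern treats as garbled."""
    return GARBLED.findall(text)


# Hypothetical sample: Chinese, a full-width yen sign, ASCII, Hangul, and U+FFFD.
sample = "(\u4e2d\u6587)(\uffe5)(abc+-*/)(\ud55c\uad6d\uc778)(\ufffd\ufffd)"
found = find_garbled(sample)
print(found)
```

Because this is an allow-list, anything not listed is reported, including newlines, tabs, emoji, and scripts such as Cyrillic or Arabic, so the ranges may need extending for other inputs.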
To match the garbled characters together with all the text that follows them, use the following regular expression:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*
To match the entire text that contains garbled characters, use the following regular expression:
[ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]*[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*
If the text contains no garbled characters, the match fails.
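The two variants above, and the failing case for clean text, can be sketched in Python as follows. The names `ALLOWED`, `AFTER_GARBLED`, and `WHOLE_TEXT` and both sample strings are hypothetical; the character ranges are the ones from the article.

```python
import re

# Allowed characters: printable ASCII, CJK, and Chinese punctuation.
ALLOWED = (
    r" -~"
    r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
    r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
    r"\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5"
)

# Garbled run plus everything after it on the same line.
AFTER_GARBLED = re.compile("[^" + ALLOWED + "]+.*")
# Clean prefix, then a garbled run, then the rest of the line.
WHOLE_TEXT = re.compile("[" + ALLOWED + "]*[^" + ALLOWED + "]+.*")

dirty = "abc\u4e2d\u6587\ufffd\ufffdtail"  # contains two U+FFFD characters
clean = "abc\u4e2d\u6587"                  # no garbled characters

tail_match = AFTER_GARBLED.search(dirty)
whole_match = WHOLE_TEXT.search(dirty)
clean_match = WHOLE_TEXT.search(clean)

print(tail_match.group())
print(whole_match.group() == dirty)
print(clean_match)
```

In Python, `.` does not match a newline unless `re.DOTALL` is passed, so on multi-line input the trailing `.*` stops at the end of the line containing the garbled run.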
Network security is a long road; time to wash up and go to bed~