How to match garbled characters with a regular expression?
2022-08-10 00:33:00 【ailx10】
During a network defense exercise, I had to write a regex for text that contained garbled characters. I was nervous at the time and could not get it written, which was embarrassing afterward. I have since gotten over it, and I used the weekend to look into the problem briefly in case it comes up again. The test sample is below; it includes Chinese, English, Korean, Japanese, Chinese punctuation, English punctuation, and garbled characters.
(Chinese)(¥)(abc+-*/)(한국인)(小さなJapan)(��)(Chinese)(¥)(+-*/)(한국인) (小さなJapan)
The regular expression [ -~]+
can match all printable ASCII characters, but cannot match Chinese, Chinese punctuation and garbled characters.
The regular expression [^ -~]+
matches everything that is not printable ASCII, which includes Chinese, Chinese punctuation, and garbled characters, as well as Japanese and Korean.
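The two ASCII patterns above can be tried side by side. The following is a minimal Python sketch; the sample string is a hypothetical one made up for illustration, with the garbled characters written as U+FFFD (the Unicode replacement character).

```python
import re

PRINTABLE_ASCII = re.compile(r"[ -~]+")       # U+0020 (space) through U+007E (~)
NON_PRINTABLE_ASCII = re.compile(r"[^ -~]+")  # everything outside that range

# Hypothetical sample: Chinese, ASCII, Korean, then two U+FFFD characters.
sample = "中文abc+-*/한국\ufffd\ufffd"

ascii_runs = PRINTABLE_ASCII.findall(sample)
other_runs = NON_PRINTABLE_ASCII.findall(sample)

print(ascii_runs)  # only the ASCII run
print(other_runs)  # Chinese run, then Korean plus the garbled characters
```

Note that the negated pattern cannot tell the Korean text from the garbled characters next to it, which is why the narrower character classes below are needed.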
The regular expression [\u4e00-\u9fa5]+
matches the commonly used Chinese characters (the CJK Unified Ideographs from U+4E00 to U+9FA5).
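As a quick check of that range, here is a small Python sketch. The sample string is hypothetical; the second test shows that an ideograph just past U+9FA5 falls outside the class.

```python
import re

# Same range as in the text: U+4E00 through U+9FA5.
HAN = re.compile(r"[\u4e00-\u9fa5]+")

# Hypothetical sample mixing kanji, hiragana, ASCII, and Chinese.
sample = "小さなJapan中文"
han_runs = HAN.findall(sample)
print(han_runs)  # ['小', '中文']

# U+9FA6 sits one code point past the end of the range, so it is not matched.
missed = HAN.search("\u9fa6") is None
print(missed)  # True
```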
The regular expression [\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+
matches the common Chinese punctuation marks.
The regular expression [\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+
matches Chinese, Japanese, and Korean characters, but not punctuation.
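A short Python sketch of the punctuation class, using a hypothetical sample sentence. The pattern is the one from the text, only split across lines for readability.

```python
import re

# The Chinese punctuation class from the text.
PUNCT = re.compile(
    r"[\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f"
    r"\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f"
    r"\uffe5\u00a5]+"
)

# Hypothetical sample: a Chinese sentence with full-width punctuation.
sample = "你好，世界！（测试）"
punct_runs = PUNCT.findall(sample)
print(punct_runs)  # ['，', '！（', '）']
```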
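The Chinese, Japanese, and Korean class can be checked the same way. This is a sketch with a hypothetical sample; it ends with a Chinese full stop to show that punctuation is left out.

```python
import re

# The CJK class from the text.
CJK = re.compile(
    r"[\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
    r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+"
)

# Hypothetical sample: Chinese, Korean, Japanese, ASCII, then U+3002.
sample = "中文한국인小さなJapan。"
cjk_runs = CJK.findall(sample)
print(cjk_runs)  # ['中文한국인小さな']
```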
Putting these together, a first version of a regular expression that matches garbled characters is:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+
In testing against the sample above, it works well.
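One way to reproduce such a test is to assemble the negated class from its three parts. The Python sketch below does that; the names ASCII, CJK, PUNCT, and GARBLED are made up for illustration, and the sample is a hypothetical string modeled on the test sample, with U+FFFD standing in for the garbled characters.

```python
import re

# Building blocks, each taken from the character classes in the text.
ASCII = r" -~"
CJK = (r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
       r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF")
PUNCT = (r"\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
         r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f"
         r"\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f"
         r"\uffe5\u00a5")

# Anything that is not ASCII, CJK, or Chinese punctuation counts as garbled.
GARBLED = re.compile("[^" + ASCII + CJK + PUNCT + "]+")

# Hypothetical sample; only the last group contains garbled characters.
sample = "(中文)(￥)(abc+-*/)(한국인)(小さな日本)(\ufffd\ufffd)"
garbled_runs = GARBLED.findall(sample)
print(garbled_runs)  # a single run of two U+FFFD characters
```

Because the class is a deny-by-default list, any legitimate script not listed (for example Cyrillic or emoji) would also be reported as garbled.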

To match the garbled characters together with all the text after them, you can use the following regular expression:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*
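A minimal Python sketch of this variant follows. ALLOWED and TAIL are hypothetical names, and the sample string is made up; the match starts at the first garbled character and runs to the end of the line, since `.` does not cross a newline by default.

```python
import re

# ASCII, CJK, and Chinese punctuation, as listed in the text.
ALLOWED = (r" -~"
           r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
           r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
           r"\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d"
           r"\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011"
           r"\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014"
           r"\uff5e\ufe4f\uffe5")

# One or more garbled characters, then the rest of the line.
TAIL = re.compile("[^" + ALLOWED + "]+.*")

sample = "abc中文\ufffd\ufffddef"  # hypothetical input
found = TAIL.search(sample)
tail_text = found.group() if found else None
print(repr(tail_text))
```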

To match the entire text when it contains garbled characters, you can use the following regular expression:
[ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]*[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*

If the text contains no garbled characters, the regular expression does not match.
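Both behaviors can be seen in a short Python sketch: a full-line match when garbled characters are present, and no match otherwise. ALLOWED and WHOLE are hypothetical names, and both input strings are made up for illustration.

```python
import re

# ASCII, CJK, and Chinese punctuation, as listed in the text.
ALLOWED = (r" -~"
           r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF"
           r"\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
           r"\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d"
           r"\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011"
           r"\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014"
           r"\uff5e\ufe4f\uffe5")

# Allowed prefix, at least one garbled character, then the rest of the line.
WHOLE = re.compile("[" + ALLOWED + "]*[^" + ALLOWED + "]+.*")

dirty = "abc中文\ufffddef"  # hypothetical line with one garbled character
clean = "abc中文。"          # hypothetical line with none

dirty_match = WHOLE.match(dirty)
clean_match = WHOLE.match(clean)
print(dirty_match.group() == dirty)  # True: the whole line is matched
print(clean_match)                   # None: nothing garbled, so no match
```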

Network security still has a long way to go. Time to wash up and get some sleep~
