How to match garbled characters with a regular expression?
2022-08-10 00:33:00 【ailx10】
While on network-protection duty, I had to write a regex for text that contained garbled characters. I was nervous at the time and could not get it written, which felt embarrassing afterward. Now that the pressure is off, I used the weekend to look into it briefly in case the same problem comes up again. The test samples are below; they include Chinese, English, Korean, Japanese, Chinese punctuation, English punctuation, and garbled characters.
(Chinese)(¥)(abc+-*/)(한국인)(小さなJapan)(��)(Chinese)(¥)(+-*/)(한국인) (小さなJapan)
The regular expression [ -~]+ matches all printable ASCII characters, but not Chinese, Chinese punctuation, or garbled characters.
The regular expression [^ -~]+ matches everything outside the printable ASCII range: Chinese, Chinese punctuation, garbled characters, Japanese, and Korean (it also matches control characters such as tabs and newlines).
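A minimal Python sketch of these two character classes. The sample string is an assumption made for illustration: the Chinese and Japanese words are placeholders, and the two U+FFFD replacement characters stand in for garbled bytes.

```python
import re

# Hypothetical sample: ASCII, Chinese, Korean, Japanese, and two U+FFFD
# replacement characters standing in for garbled bytes.
sample = "(abc+-*/)(中文)(한국인)(小さな)(\ufffd\ufffd)"

# Runs of printable ASCII (space through tilde).
ascii_runs = re.findall(r"[ -~]+", sample)

# Runs of anything outside printable ASCII.
non_ascii_runs = re.findall(r"[^ -~]+", sample)

print(ascii_runs)      # the parentheses and "abc+-*/"
print(non_ascii_runs)  # the Chinese, Korean, Japanese and U+FFFD runs
```

The negated class alone cannot tell garbled characters apart from legitimate CJK text, which is why the narrower classes are needed.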
The regular expression [\u4e00-\u9fa5]+ matches commonly used Chinese characters (the range U+4E00 to U+9FA5); rarer characters in the CJK extension blocks fall outside it.
The regular expression [\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+ matches common Chinese punctuation.
The regular expression [\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+ matches Chinese, Japanese, and Korean characters, but not punctuation.
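The three character classes above can be tried side by side in Python. This is only a sketch: the names HAN, ZH_PUNCT and CJK and the sample text are assumptions for illustration, not part of the original test.

```python
import re

# Commonly used Chinese characters.
HAN = re.compile(r"[\u4e00-\u9fa5]+")

# Chinese punctuation, copied from the list above.
ZH_PUNCT = re.compile(
    r"[\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5\u00a5]+"
)

# Chinese, Japanese and Korean characters, without punctuation.
CJK = re.compile(
    r"[\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5"
    r"\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF]+"
)

# Hypothetical text: Chinese, fullwidth comma, Chinese, fullwidth
# exclamation mark, Japanese hiragana, ideographic full stop, Korean.
text = "你好\uff0c世界\uff01こんにちは\u3002안녕하세요"

han_runs = HAN.findall(text)
punct_runs = ZH_PUNCT.findall(text)
cjk_runs = CJK.findall(text)

print(han_runs)
print(punct_runs)
print(cjk_runs)
```

Here the Chinese-only class skips the hiragana and Hangul runs, while the broader CJK class picks them up and still stops at each punctuation mark.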
A first version of a regular expression that matches garbled characters is therefore:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+
Tested against the samples above, it works well.
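A minimal Python sketch of such a test, assuming the pattern is assembled from the pieces listed earlier. The names CJK_RANGES, ZH_PUNCT and GARBLED are hypothetical, and the sample is a reconstruction with U+FFFD standing in for the garbled bytes.

```python
import re

# Ranges for Chinese, Japanese and Korean characters (from the text above).
CJK_RANGES = (
    r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5"
    r"\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
)

# Chinese punctuation (from the text above).
ZH_PUNCT = (
    r"\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5"
)

# Anything that is neither printable ASCII, CJK, nor Chinese punctuation.
GARBLED = re.compile("[^ -~" + CJK_RANGES + ZH_PUNCT + "]+")

# Hypothetical sample with one garbled run (two U+FFFD characters).
sample = "(中文)(\uffe5)(abc+-*/)(한국인)(小さな日本)(\ufffd\ufffd)(中文)"

garbled_runs = GARBLED.findall(sample)
clean_hit = GARBLED.search("hello 世界")
tab_runs = GARBLED.findall("a\tb")

print(garbled_runs)  # only the U+FFFD run
print(clean_hit)     # None: nothing garbled in clean text
print(tab_runs)      # the tab is reported too
```

Note that the class treats every character outside the allowed set as garbled, so tabs, newlines, and scripts not listed here are reported as well; the allowed set may need extending for other inputs.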

If you want to match all text after garbled characters, you can use the following regular expression:
[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*
If you want to match the entire text containing garbled characters, you can use the following regular expression:
[ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]*[^ -~\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]+.*
If the text contains no garbled characters, the match fails.
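The two longer patterns can be sketched in Python by reusing one allowed-character set. The names ALLOWED, TAIL and WHOLE and the two input strings are assumptions for illustration only.

```python
import re

# Everything considered legitimate: printable ASCII, CJK, Chinese punctuation.
ALLOWED = (
    r" -~"
    r"\u2E80-\u2FDF\u3040-\u318F\u31A0-\u31BF\u31F0-\u31FF\u3400-\u4DB5"
    r"\u4E00-\u9FFF\uA960-\uA97F\uAC00-\uD7FF"
    r"\u3002\u00a5\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019"
    r"\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d"
    r"\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5"
)

# From the first garbled run to the end of the line.
TAIL = re.compile("[^" + ALLOWED + "]+.*")

# The whole text, provided it contains at least one garbled run.
WHOLE = re.compile("[" + ALLOWED + "]*[^" + ALLOWED + "]+.*")

# Hypothetical inputs: one with two U+FFFD characters, one without.
dirty = "user=张三 token=\ufffd\ufffdabc"
clean = "user=张三 token=abc"

tail_text = TAIL.search(dirty).group()
whole_match = WHOLE.fullmatch(dirty)
clean_match = WHOLE.search(clean)

print(tail_text)            # starts at the first garbled character
print(whole_match.group())  # the entire dirty string
print(clean_match)          # None, since nothing is garbled
```

Building both patterns from a single ALLOWED string keeps the positive and negated classes in sync, which avoids the kind of copy-and-paste drift that is easy to introduce when the long class is typed twice.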

Network security has a long road ahead; time to wash up and go to bed~
