Detailed explanation of IK tokenizer
2022-08-06 15:01:00 【Life doesn't stop, fight doesn't stop】
What is IK tokenizer
Word segmentation means dividing a passage of Chinese (or other text) into keywords. When we search, our query is segmented, the data in the database or index is segmented, and the two sets of terms are then matched. The default Chinese segmentation treats every character as a separate word. For example, "I love Wang Yaotian" (five Chinese characters) is divided into five single-character tokens: "I", "love", "Wang", "Yao", "Tian". This obviously does not meet our needs, so we install the Chinese tokenizer ik to solve the problem.
ik provides two segmentation modes: ik_smart and ik_max_word. ik_smart produces the coarsest segmentation (the fewest tokens), while ik_max_word produces the most fine-grained segmentation. We will test both shortly.
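To see why a dictionary matters, here is a minimal toy sketch in Python. It is not IK's actual algorithm: it uses plain forward maximum matching, and the sample sentence and the tiny dictionary are made up for illustration. It only contrasts per-character splitting with dictionary-based splitting.

```python
def split_per_character(text):
    """Mimic the default behaviour: every character becomes its own token."""
    return list(text)


def forward_max_match(text, dictionary):
    """Toy dictionary segmentation (forward maximum matching).

    At each position, take the longest dictionary word that starts there;
    if no word matches, fall back to a single character.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical sample data, not taken from the article.
sample = "我爱北京天安门"
words = {"北京", "天安门"}

print(split_per_character(sample))       # ['我', '爱', '北', '京', '天', '安', '门']
print(forward_max_match(sample, words))  # ['我', '爱', '北京', '天安门']
```

A word that is missing from the dictionary falls back to single characters, which is the same symptom seen later with a personal name.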
Install
2. After downloading, unpack it into the plugins directory of our Elasticsearch installation.

3. Restart ES and watch the startup output; you can see that the ik tokenizer plugin is loaded.

4. The elasticsearch-plugin command can be used to list the loaded plugins.

5. Use Kibana to test.
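The plugin check from step 4 can also be scripted. The sketch below assumes the `elasticsearch-plugin list` subcommand prints one plugin name per line; the installation path in the usage comment is hypothetical, and on Windows the launcher is `elasticsearch-plugin.bat`.

```python
import subprocess
from pathlib import Path


def parse_plugin_listing(output):
    """Turn the command output into a list of non-empty plugin names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_plugins(es_home):
    """Run `elasticsearch-plugin list` from the given ES home directory."""
    command = [str(Path(es_home) / "bin" / "elasticsearch-plugin"), "list"]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_plugin_listing(result.stdout)


# Usage (hypothetical path; adjust to your own installation):
#   plugins = list_plugins("/opt/elasticsearch")
#   print(plugins)
```

The name shown for ik depends on how the plugin was installed, so look for it in the returned list rather than relying on one fixed name.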
View different word segmentation effects
ik_smart produces the coarsest segmentation.

ik_max_word produces the most fine-grained segmentation: it exhausts every combination that can be found in the dictionary.
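The same comparison can be run outside Kibana by calling the Elasticsearch `_analyze` API directly. This is a minimal sketch using only the Python standard library; the address `http://localhost:9200` is an assumption (an unsecured local node), and the helper names are made up for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def build_analyze_request(text, analyzer, base_url=ES_URL):
    """Build a POST request for the _analyze API with the given analyzer."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, analyzer, base_url=ES_URL):
    """Send the request and return only the token strings from the response."""
    request = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [item["token"] for item in payload["tokens"]]


# Usage (needs a running node with the ik plugin installed):
#   for name in ("ik_smart", "ik_max_word"):
#       print(name, analyze("your Chinese text here", name))
```

Running both analyzers on the same text makes the difference visible: ik_max_word normally returns at least as many tokens as ik_smart.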

We input the sentence "super like Wang Yaotian".

Problem found: the name Wang Yaotian was split apart.
Words like this, which we need kept whole, have to be added to the tokenizer's dictionary by ourselves.
Adding our own configuration to the ik tokenizer

Restart ES and check the startup output for details.

Test Wang Yaotian again to see the effect

From now on, whenever we need custom words, we can add them to the dic file we defined ourselves.
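As a rough sketch of that configuration step, the Python helpers below write a custom dictionary file (UTF-8, one word per line) and render an `IKAnalyzer.cfg.xml` whose `ext_dict` entry points at it. The file name `custom.dic` and the helper names are hypothetical; the config layout follows the file shipped with the ik plugin as I understand it, so compare it with the copy in your own installation before replacing anything.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# Assumed layout of IKAnalyzer.cfg.xml (Java properties XML format).
CONFIG_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict">{ext_dict}</entry>
</properties>
"""


def write_dictionary(path, words):
    """Write one word per line in UTF-8, dropping blanks and duplicates."""
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


def render_config(dict_files):
    """Render the config text; several dictionaries are joined with ';'."""
    return CONFIG_TEMPLATE.format(ext_dict=escape(";".join(dict_files)))


# Usage (hypothetical file name, placed next to IKAnalyzer.cfg.xml):
#   write_dictionary("custom.dic", ["your custom word"])
#   print(render_config(["custom.dic"]))
```

After the dictionary or the config file changes, ES has to be restarted as described above before the new words take effect.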