PP semantic retrieval system
2022-04-21 23:16:00 【AI Zeng Xiaojian】
1. Scenario overview
Retrieval systems sit behind many products we use every day, such as product search and academic literature search. This scheme provides a complete implementation of a retrieval system. The target scenario is a user entering a search term (Query) and quickly finding similar documents in a massive data set.
Semantic retrieval (also known as vector-based retrieval) means the retrieval system is no longer limited to the literal text of the user's Query. It aims to capture the real intent behind the Query and search on that, so the results returned match what the user wants more closely. A state-of-the-art semantic indexing model produces a vector representation of each text, the vectors are indexed in a high-dimensional vector space, and the similarity between the query vector and the indexed documents is measured. This addresses the shortcomings of keyword indexing.
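The idea of comparing a query vector against indexed document vectors can be illustrated with a minimal sketch. It assumes tiny hand-made 3-dimensional vectors and a brute-force cosine-similarity scan; the document ids, vector values, and the `search` helper are hypothetical, and a real system would use a trained model for the embeddings and an ANN engine instead of a full scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Brute-force scan: score every indexed vector, return the top_k best."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]

# Hypothetical embeddings; a semantic indexing model would produce these.
index = {
    "doc_front_plate": [0.9, 0.1, 0.0],
    "doc_rear_plate": [0.1, 0.9, 0.0],
    "doc_oil_change": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.0]  # hypothetical embedding of the user's Query

results = search(query_vec, index)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The ranking depends only on the direction of the vectors, which is why two texts with few words in common can still score as close neighbors if the model maps them to similar vectors.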
Consider the following two text pairs. If similarity is computed from keyword overlap alone, the two pairs score about the same, since they differ in a single word. Semantically, however, the first pair is more similar than the second.
Pair 1: "How to place the license plate on the front of the car" / "How to install the front license plate"
Pair 2: "How to place the license plate on the front of the car" / "How to install the rear license plate"
The key to a semantic retrieval system is recalling by semantics rather than by keywords, so that similar results are recalled both more accurately and more broadly.
2. Product features
Retrieval workloads usually involve large amounts of data, so the process is split into two stages: recall (indexing) and ranking. The recall stage filters relevant documents out of a candidate set of at least ten million documents, which greatly reduces the number of candidates. The later ranking stage can then afford more complex models for fine-grained or personalized ranking. Multiple recall strategies are generally used together (for example keyword recall, hot-item recall, and semantic recall). The results of the different recall channels are aggregated and scored in a unified way, and the best TopK results are returned.
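The recall-then-rank flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the channel names, document ids, `relevance` values, and the scoring rule are all hypothetical stand-ins, and a production ranking stage would use a trained model in place of `score_fn`.

```python
import heapq

def merge_recalls(recall_lists):
    """Union the candidates of all recall channels.

    Returns a dict mapping doc_id -> set of channels that recalled it.
    """
    merged = {}
    for channel, doc_ids in recall_lists.items():
        for doc_id in doc_ids:
            merged.setdefault(doc_id, set()).add(channel)
    return merged

def rank(candidates, score_fn, top_k):
    """Score every merged candidate with one function and keep the top_k."""
    scored = ((score_fn(doc_id, channels), doc_id)
              for doc_id, channels in candidates.items())
    return [doc_id for _, doc_id in heapq.nlargest(top_k, scored)]

# Hypothetical output of three recall channels.
recalls = {
    "keyword": ["d1", "d2", "d3"],
    "hot": ["d3", "d4"],
    "semantic": ["d2", "d3", "d5"],
}
# Hypothetical relevance scores standing in for a ranking model.
relevance = {"d1": 0.2, "d2": 0.9, "d3": 0.6, "d4": 0.1, "d5": 0.7}

def score_fn(doc_id, channels):
    # Toy unified score: model relevance plus a small bonus per recalling channel.
    return relevance[doc_id] + 0.1 * len(channels)

candidates = merge_recalls(recalls)
top = rank(candidates, score_fn, top_k=3)
print(top)
```

Deduplicating before scoring means each document is scored once, no matter how many channels recalled it, which keeps the expensive ranking step proportional to the number of distinct candidates.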
2.1 System features
- Low barrier to entry
  - Step-by-step guidance for building a retrieval system
  - A retrieval system can be built without labeled data
  - One-stop training, prediction, and ANN engine capabilities
- Good results
  - Dedicated solutions for different data scenarios
    - Unsupervised data only: SimCSE
    - Supervised data only: InBatchNegative
    - Both unsupervised and supervised data: fusion model
  - Further optimization: Domain-adaptive Pretraining
- Fast performance
  - Fast vector extraction based on Paddle Inference
  - Fast querying and high-performance index building based on Milvus
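To give a feel for the supervised option in the list above, here is a minimal sketch of an in-batch negatives loss in its common generic form. It is not necessarily the exact formulation used by this system. It assumes unit-normalized vectors, a hypothetical `scale` factor of 20, and toy 2-dimensional embeddings; within a batch, `query_vecs[i]` and `doc_vecs[i]` form a positive pair and every other document in the batch acts as a negative.

```python
import math

def dot(a, b):
    """Dot product; equals cosine similarity for unit-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def in_batch_negative_loss(query_vecs, doc_vecs, scale=20.0):
    """Mean softmax cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [scale * dot(q, d) for d in doc_vecs]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        total += log_sum - logits[i]  # -log softmax probability of the positive
    return total / len(query_vecs)

# Toy batch: each query points the same way as its own document.
queries = [[1.0, 0.0], [0.0, 1.0]]
docs_aligned = [[1.0, 0.0], [0.0, 1.0]]
docs_swapped = [[0.0, 1.0], [1.0, 0.0]]  # positives deliberately mismatched

aligned_loss = in_batch_negative_loss(queries, docs_aligned)
swapped_loss = in_batch_negative_loss(queries, docs_swapped)
print(aligned_loss, swapped_loss)
```

The loss is small when each query is closest to its own document and large when another document in the batch is closer, so no separately mined negatives are needed.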
2.2 Functional architecture
There are two types of indexing methods: literal keyword indexing and semantic indexing. A semantic index represents semantic information better and handles cases where texts are semantically similar but not literally similar. This system provides a semantic indexing scheme; in real business use it can be combined with other schemes. The architecture and functions of the whole scheme are described in detail below.
Copyright notice
This article was written by [AI Zeng Xiaojian]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204212314342457.html