The following links briefly explain the idea of semantic search and how retrieve-and-rerank search systems work.

Last update: Jan 28, 2022

Related tags

Text Data & NLP, information_retrieval

Overview

Main Idea

The following links briefly explain the idea of semantic search and how retrieve-and-rerank search systems work.

Setup

Download trained models

Two models trained for Spanish are provided: a bi-encoder and a cross-encoder. Together they form the retrieval system, following the retrieve-and-rerank approach:

make setup
pip install -r requirements.txt
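
To make the retrieve-and-rerank idea concrete, here is a minimal, dependency-free sketch. It does not use the repository's models: `embed` is a toy bag-of-words stand-in for a bi-encoder, and `pair_score` is a toy token-overlap stand-in for a cross-encoder. All function names here are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: encode a text on its own as a count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, top_k=3):
    """Stage 1: compare the query vector with every document vector, keep top_k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:top_k]


def pair_score(query, doc):
    """Toy stand-in for a cross-encoder: score the (query, document) pair jointly.

    Here the score is simply the Jaccard overlap of the two token sets.
    """
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def rerank(query, corpus, top_k=3):
    """Stage 2: re-order only the stage-1 candidates with the pairwise scorer."""
    candidates = retrieve(query, corpus, top_k)
    return sorted(candidates, key=lambda doc: pair_score(query, doc), reverse=True)


corpus = [
    "la vida es bella y corta",
    "la vida es bella",
    "el derecho a la vida",
    "sentencia sobre contratos mercantiles",
]
query = "la vida es bella"
candidates = retrieve(query, corpus, top_k=3)
results = rerank(query, corpus, top_k=3)
```

The design point is that the cheap first stage runs over the whole corpus, while the more expensive pairwise scorer only sees the few candidates that survive it.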

Basic usage

Set up an Elasticsearch index with semantic vectors. This step assumes that a set of JSON files is stored in a folder. Each JSON file may contain several optional fields, but it must contain an id field and a text field.

from information_retrieval import Prepare

data_folder = "data/"                  # folder containing the JSON files
text_field = "texto_parrafo"           # name of the text field in each JSON file
id_field = "id_parrafo"                # name of the id field in each JSON file
elastic_index_name = "sentencias_2.0"  # target Elasticsearch index

# Read the files, compute the embeddings and upload them to Elasticsearch
P = Prepare(data_folder, text_field, id_field, elastic_index_name)
P.prepare()
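
Before indexing, it can help to check that every file in the data folder has the two required fields. The sketch below assumes one JSON object per file, which the text above does not state explicitly. The helper `missing_fields` and the optional field `fuente` are hypothetical and are not part of `information_retrieval`.

```python
import json
import tempfile
from pathlib import Path


def missing_fields(folder, text_field, id_field):
    """Return {filename: [missing field names]} for files lacking a required field."""
    problems = {}
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            doc = json.load(fh)  # assumption: each file holds a single JSON object
        missing = [name for name in (id_field, text_field) if name not in doc]
        if missing:
            problems[path.name] = missing
    return problems


# Build a throwaway folder with one valid and one invalid document.
with tempfile.TemporaryDirectory() as tmp:
    good = {"id_parrafo": "p-001", "texto_parrafo": "la vida es bella", "fuente": "ejemplo"}
    bad = {"id_parrafo": "p-002"}  # no text field
    Path(tmp, "good.json").write_text(json.dumps(good, ensure_ascii=False), encoding="utf-8")
    Path(tmp, "bad.json").write_text(json.dumps(bad), encoding="utf-8")
    report = missing_fields(tmp, "texto_parrafo", "id_parrafo")
```

An empty report means every file carries both required fields.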

Query the index to retrieve documents:

from information_retrieval import SearchEngine

elastic_index_name = "sentencias_2.0"  # same index populated in the previous step
query = "la vida es bella"
S = SearchEngine(elastic_index_name)
S.retrieve(query)  # semantic search only

S.rerank(query)  # retrieve, then rerank
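
How `SearchEngine.retrieve` queries Elasticsearch is not shown here. One common way to run semantic search over stored vectors is a `script_score` query using Elasticsearch's built-in `cosineSimilarity` function on a `dense_vector` field. The sketch below only builds the request body. The field name `embedding` and the helper `build_semantic_query` are assumptions for illustration, not this package's actual schema or API.

```python
def build_semantic_query(query_vector, vector_field="embedding", size=10):
    """Build an Elasticsearch request body that ranks documents by cosine similarity.

    `vector_field` is a hypothetical dense_vector field name.
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                # Score every document; a filter could be placed here instead.
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps scores non-negative, as Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }


# The query vector would normally come from the bi-encoder; this one is made up.
body = build_semantic_query([0.1, 0.2, 0.3], size=5)
```

The resulting dictionary can be passed as the search body to an Elasticsearch client. The vector length must match the dimension declared in the index mapping.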


Owner

Sergio Arnaud Gomez
