Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Last update: Sep 29, 2022

Overview

Deep Text Search - AI Based Text Search & Recommendation System

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Creators

Nilesh Verma

Features

Faster Search.
High Accurate Text Recommendation and Search Output Result.
Best for Implementing on python based web application or APIs.
Best implementation for College students and freshers for project creation.
Applications are Text-based News, Social media post, E-commerce Product recommendation and other text-based platforms that want to implement text recommendation and search.

Installation

This library is compatible with both windows and Linux system you can just use PIP command to install this library on your system:

pip install DeepTextSearch

How To Use?

We have provided the Demo folder under the GitHub repository, you can find the example in both .py and .ipynb file. Following are the ideal flow of the code:

1. Importing the Important Classes

There are three important classes you need to load LoadData - for data loading, TextEmbedder - for embedding the text to data, TextSearch - For searching the text.

# Importing the proper classes
from DeepTextSearch import LoadData,TextEmbedder,TextSearch

2. Loading the Texts Data

For loading the Texts data we need to use the LoadData object, from there we can import text data as python list object from the CSV/Text file.

# Load data from CSV file
data = LoadData().from_csv("../your_file_name.csv")
# Load data from Text file
data = LoadData().from_text("../your_file_name.txt")

3. Embedding and Saving The File in Local Folder

For Embedding we are using state of the art multilingual Sentence Transformer Embedding, We also store the information of the Embedding for further use on the local path [embedding-data/] folder.

You can also use the load embedding() method in a TextEmbedder() class to load saved embedding data.

# To use Serching, we must first embed data. After that, we must save all of the data on the local path.
TextEmbedder().embed(corpus_list=data)

# Loading Embedding data
corpus_embedding = TextEmbedder().load_embedding()

3. Searching

We compare Cosian Similarity for searching and recommending, and then the corpus is sorted according to the similarity score:

# You must include the query text and the quantity of comparable texts you want to search for.
TextSearch().find_similar(query_text="What are the key features of Node.js?",top_n=10)

Complete Code

# Importing the proper classes
from DeepTextSearch import LoadData,TextEmbedder,TextSearch
# Load data from CSV file
data = LoadData().from_csv("../your_file_name.csv")
# To use Serching, we must first embed data. After that, we must save all of the data on the local path
TextEmbedder().embed(corpus_list=data)
# You must include the query text and the quantity of comparable texts you want to search for
TextSearch().find_similar(query_text="What are the key features of Node.js?",top_n=10)

License

MIT License

Copyright (c) 2021 Nilesh Verma

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Please do STAR the repository, if it helped you in anyway.

More cool features will be added in future. Feel free to give suggestions, report bugs and contribute.

On-device speech-to-index engine powered by deep learning.

30 Nov 24, 2022

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL 2021.

XL-Sum This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Lang

190 Jan 3, 2023

State of the Art Neural Networks for Deep Learning

pyradox This python library helps you with implementing various state of the art neural networks in a totally customizable fashion using Tensorflow 2

60 May 29, 2022

😇A pyTorch implementation of the DeepMoji model: state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc

------ Update September 2018 ------ It's been a year since TorchMoji and DeepMoji were released. We're trying to understand how it's being used such t

865 Dec 24, 2022

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Here is deepparse. Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning. Use deepparse to Use the pr

192 Dec 20, 2022

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models. Developers can reproduce these SOTA methods and build their own methods.

405 Jan 4, 2023

We evaluate our method on different datasets (including ShapeNet, CUB-200-2011, and Pascal3D+) and achieve state-of-the-art results, outperforming all the other supervised and unsupervised methods and 3D representations, all in terms of performance, accuracy, and training time.

An Effective Loss Function for Generating 3D Models from Single 2D Image without Rendering Papers with code | Paper Nikola Zubić Pietro Lio University

213 Dec 27, 2022

Recommendationsystem - Movie-recommendation - matrixfactorization colloborative filtering recommendation system user

recommendationsystem matrixfactorization colloborative filtering recommendation

1 Jan 1, 2022

Summary Explorer is a tool to visually explore the state-of-the-art in text summarization.

Summary Explorer Summary Explorer is a tool to visually inspect the summaries from several state-of-the-art neural summarization models across multipl

42 Aug 14, 2022

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Related tags

Overview

Deep Text Search - AI Based Text Search & Recommendation System

Creators

Nilesh Verma

Features

Installation

How To Use?

1. Importing the Important Classes

2. Loading the Texts Data

3. Embedding and Saving The File in Local Folder

3. Searching

Complete Code

License

Please do STAR the repository, if it helped you in anyway.

You might also like...

On-device speech-to-index engine powered by deep learning.

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL 2021.

State of the Art Neural Networks for Deep Learning

😇A pyTorch implementation of the DeepMoji model: state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models

We evaluate our method on different datasets (including ShapeNet, CUB-200-2011, and Pascal3D+) and achieve state-of-the-art results, outperforming all the other supervised and unsupervised methods and 3D representations, all in terms of performance, accuracy, and training time.

Recommendationsystem - Movie-recommendation - matrixfactorization colloborative filtering recommendation system user

Summary Explorer is a tool to visually explore the state-of-the-art in text summarization.

Releases(v_03)

v_03(May 31, 2021)

v_02(May 30, 2021)

v_01(May 30, 2021)

Owner

code for paper "Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning" by Zhongzheng Ren*, Raymond A. Yeh*, Alexander G. Schwing.

RodoSol-ALPR Dataset

Official PyTorch implementation of "Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets" (ICLR 2021)

Awesome Monocular 3D detection

Towards Interpretable Deep Metric Learning with Structural Matching

Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset (CVPR'19)

Implement A3C for Mujoco gym envs

[ICCV 2021] Deep Hough Voting for Robust Global Registration

Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model

This is the replication package for paper submission: Towards Training Reproducible Deep Learning Models.

PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

Deep Learning for Morphological Profiling

Imagededup - 😎 Finding duplicate images made easy

Meta Language-Specific Layers in Multilingual Language Models

Neural Cellular Automata + CLIP

A little Python application to auto tag your photos with the power of machine learning.

Implementation of CVPR'2022:Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".

Distinguishing Commercial from Editorial Content in News

This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

code for paper "Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning" by Zhongzheng Ren, Raymond A. Yeh, Alexander G. Schwing.