Neural search engine for AI papers

Overview

Papers search

build

Neural search engine for ML papers.

Demo

Usage is simple: input an abstract, get the matching papers. The following demo also showcases the finetuning functionality (notice how the paper marked as "irrelevant" is assigned a lower score after finetuning).

Search and finetuning demo

Dataset

We used a stripped-down version of the Kaggle arXiv Dataset in which only the following categories are retained: cs.AI, cs.CL, cs.CV, cs.LG, cs.MA, cs.NE

Setting up the environment

Clone the repository

git clone https://github.com/fissoreg/papers-search/
cd papers-search

For both the folders frontend and backend, run the following commands

cd folder_to_go_into/ # `folder_to_go_into` is either `frontend` or `backend`

python3 -m venv env
source venv/bin/activate

pip install --upgrade pip
pip install -r requirements.txt

Indexing

The app works by suggesting papers whose abstract is similar to the one you provided. The suggestions come from a database of published papers: you need to index all the suggestions for the system to be able to function. This is a lenghty operation, but it needs to be performed only once:

cd backend
python src/app.py --index

For testing, you can index a small number of papers providing the --n argument:

python src/app.py --index --n 10

Running the app

This can be run after indexing (section above).

Run the backend

cd backend
python3 src/app.py

In a new terminal, run the frontend

cd frontend
streamlit run app.py

Connect to http://localhost:8501/ (with your favourite browser).

Formatting, linting and testing

Refer to the Makefile for the specific commands

To format code following the black standard

$ make format

Code linting with flake8

$ make lint

Testing

$ make testdeps
$ make test

Testing with coverage analysis

$ make coverage

Format, test and coverage

$ make build

Contributing

This project is in its starting phase. If you are interested in contributing, don't hesitate to get in touch! (Or go straight to the Issues ;)).

Acknowledgements

Made possible by:

Owner
Giancarlo Fissore
PhD student in Statistical Physics and Machine Learning.
Giancarlo Fissore
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

Jacobo José Guijarro Villalba 75 Oct 21, 2022
Camelot: PDF Table Extraction for Humans

Camelot: PDF Table Extraction for Humans Camelot is a Python library that makes it easy for anyone to extract tables from PDF files! Note: You can als

Atlan Technologies Pvt Ltd 3.3k Dec 31, 2022
Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.

Total-Text-Dataset (Official site) Updated on April 29, 2020 (Detection leaderboard is updated - highlighted E2E methods. Thank you shine-lcy.) Update

Chee Seng Chan 671 Dec 27, 2022
A simple document layout analysis using Python-OpenCV

Run the application: python main.py *Note: For first time running the application, create a folder named "output". The application is a simple documen

Roinand Aguila 109 Dec 12, 2022
EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

EAST_ICPR2018: EAST for ICPR MTWI 2018 Challenge II (Text detection of network images) Introduction This is a repository forked from argman/EAST for t

QichaoWu 49 Dec 24, 2022
A python script based on opencv and paddleocr, which can automatically pick up tasks, make cookies, and receive rewards in the Destiny 2 Dawning Oven

A python script based on opencv and paddleocr, which can automatically pick up tasks, make cookies, and receive rewards in the Destiny 2 Dawning Oven

1 Dec 22, 2021
Create single line SVG illustrations from your pictures

Create single line SVG illustrations from your pictures

Javier Bórquez 686 Dec 26, 2022
SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

NVIDIA Research Projects 31 Nov 22, 2022
Document Layout Analysis

Eynollah Document Layout Analysis Introduction This tool performs document layout analysis (segmentation) from image data and returns the results as P

QURATOR-SPK 198 Dec 29, 2022
Single Shot Text Detector with Regional Attention

Single Shot Text Detector with Regional Attention Introduction SSTD is initially described in our ICCV 2017 spotlight paper. A third-party implementat

Pan He 215 Dec 07, 2022
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

Microsoft 235 Dec 22, 2022
Detect handwritten words in a text-line (classic image processing method).

Word segmentation Implementation of scale space technique for word segmentation as proposed by R. Manmatha and N. Srimal. Even though the paper is fro

Harald Scheidl 190 Jan 03, 2023
Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB

dio-live-textract2 Repositório de código para o live coding do dia 05/10/2021 sobre extração de dados estruturados e gravação em banco de dados a part

hugoportela 0 Jan 19, 2022
An official PyTorch implementation of the paper "Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences", ICCV 2021.

PyTorch implementation of Learning by Aligning (ICCV 2021) This is an official PyTorch implementation of the paper "Learning by Aligning: Visible-Infr

CV Lab @ Yonsei University 30 Nov 05, 2022
SemTorch

SemTorch This repository contains different deep learning architectures definitions that can be applied to image segmentation. All the architectures a

David Lacalle Castillo 154 Dec 07, 2022
Primary QPDF source code and documentation

QPDF QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption,

QPDF 2.2k Jan 04, 2023
Balabobapy - Using artificial intelligence algorithms to continue the text

Balabobapy - Using artificial intelligence algorithms to continue the text

qxtony 1 Feb 04, 2022
BD-ALL-DIGIT - This Is Bangladeshi All Sim Cloner Tools

BANGLADESHI ALL SIM CLONER TOOLS INSTALL TOOL ON TERMUX $ apt update $ apt upgra

MAHADI HASAN AFRIDI 2 Jan 19, 2022
kaldi-asr/kaldi is the official location of the Kaldi project.

Kaldi Speech Recognition Toolkit To build the toolkit: see ./INSTALL. These instructions are valid for UNIX systems including various flavors of Linux

Kaldi 12.3k Jan 05, 2023