A Python client for the Softcite software mention recognizer server

Overview

Softcite software mention recognizer client

Python client for using the Softcite software mention recognition service. It can be applied to

  • individual PDF files

  • recursively to a local directory, processing all the encountered PDF

  • to a collection of documents harvested by biblio-glutton-harvester and article-dataset-builder, with the benefit of re-using the collection manifest for injectng metadata and keeping track of progress. The collection can be stored locally or on a S3 storage.

Requirements

The client has been tested with Python 3.5-3.7.

The client requires a working Softcite software mention recognition service. Service host and port can be changed in the config.json file of the client.

Install

cd software_mention_client/

It is advised to setup first a virtual environment to avoid falling into one of these gloomy python dependency marshlands:

virtualenv --system-site-packages -p python3 env

source env/bin/activate

Install the dependencies, use:

pip3 install -r requirements.txt

Usage and options

usage: software_mention_client.py [-h] [--repo-in REPO_IN] [--file-in FILE_IN]
                                  [--file-out FILE_OUT]
                                  [--data-path DATA_PATH] [--config CONFIG]
                                  [--reprocess] [--reset] [--load]
                                  [--diagnostic] [--scorched-earth]

Softcite software mention recognizer client

optional arguments:
  -h, --help            show this help message and exit
  --repo-in REPO_IN     path to a directory of PDF files to be processed by
                        the Softcite software mention recognizer
  --file-in FILE_IN     a single PDF input file to be processed by the
                        Softcite software mention recognizer
  --file-out FILE_OUT   path to a single output the software mentions in JSON
                        format, extracted from the PDF file-in
  --data-path DATA_PATH
                        path to the resource files created/harvested by
                        biblio-glutton-harvester
  --config CONFIG       path to the config file, default is ./config.json
  --reprocess           reprocessed failed PDF
  --reset               ignore previous processing states and re-init the
                        annotation process from the beginning
  --load                load json files into the MongoDB instance, the --repo-
                        in parameter must indicate the path to the directory
                        of resulting json files to be loaded
  --diagnostic          perform a full count of annotations and diagnostic
                        using MongoDB regarding the harvesting and
                        transformation process
  --scorched-earth      remove a PDF file after its successful processing in
                        order to save storage space, careful with this!

The logs are written by default in a file ./client.log, but the location of the logs can be changed in the configuration file (default ./config.json).

Processing local PDF files

For processing a single file., the resulting json being written as file at the indicated output path:

python3 software_mention_client.py --file-in toto.pdf --file-out toto.json

For processing recursively a directory of PDF files, the results will be:

  • written to a mongodb server and database indicated in the config file

  • and in the directory of PDF files, as json files, together with each processed PDF

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/

The default config file is ./config.json, but could also be specified via the parameter --config:

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/ --config ./my_config.json

Processing a collection of PDF harvested by biblio-glutton-harvester

biblio-glutton-harvester and article-dataset-builder creates a collection manifest as a LMDB database to keep track of the harvesting of large collection of files. Storage of the resource can be located on a local file system or on a AWS S3 storage. The software-mention client will use the collection manifest to process these harvested documents.

  • locally:

python3 software_mention_client.py --data-path /mnt/data/biblio-glutton-harvester/data/

--data-path indicates the path to the repository of data harvested by biblio-glutton-harvester.

The resulting JSON files will be enriched by the metadata records of the processed PDF and will be stored together with each processed PDF in the data repository.

If the harvested collection is located on a S3 storage, the access information must be indicated in the configuration file of the client config.json. The extracted software mention will be written in a file with extension .software.json, for example:

-rw-rw-r-- 1 lopez lopez 1.1M Aug  8 03:26 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.pdf
-rw-rw-r-- 1 lopez lopez  485 Aug  8 03:41 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.software.json

If a MongoDB server access information is indicated in the configuration file config.json, the extracted information will additionally be written in MongoDB.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

Main author and contact: Patrice Lopez ([email protected])

Nonebot2 简易群管

简易群管 ✨ NoneBot2 简易群管 ✨ _ 踢 改 禁 欢迎issue pr 权限说明:permission=SUPERUSER 安装 💿 pip install nonebot-plugin-admin 导入 📲 在bot.py 导入,语句: nonebot.load_plugin("n

幼稚园园长 74 Dec 22, 2022
Tsar-Bot - Crypto auto trade bot that use sentiment analysis from twitter

Tsar Bot - Crypto Sentiment Bot Tsar Bot is a Twitter Crypto Sentiment Bot that

Hilmi Azizi 26 Dec 15, 2022
🎵 RythmReloaded 🎵 A bot that can play music on Telegram Group and Channel Voice Chats

🎵 RythmReloaded 🎵 A bot that can play music on Telegram Group and Channel Voice Chats POWERED BY MARSHALX TGCALLS Available on telegram as @OptimusP

0 Nov 03, 2021
Easy to use reaction role Discord bot written in Python.

Reaction Light - Discord Role Bot Light yet powerful reaction role bot coded in Python. Key Features Create multiple custom embedded messages with cus

eibex 109 Dec 20, 2022
A Powerful, Smart And Advance Group Manager ... Written with AioGram , Pyrogram and Telethon...

❤️ Shadow ❤️ A Powerful, Smart And Advance Group Manager ... Written with AioGram , Pyrogram and Telethon... ⭐️ Thanks to everyone who starred Shadow,

TeamShadow 17 Oct 21, 2022
Tesseract Open Source OCR Engine (main repository)

Tesseract OCR About This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM

48.3k Jan 05, 2023
A demo without 🚀 science, just simple UTXO spending logic.

Stuck TX Demo Docker container that runs 4 dogecoind to demonstrate "the stuck tx problem". Scenario A wallet sends out 3 transactions to a recipient

Patrick Lodder 2 Nov 16, 2021
A wrapper for aqquiring Choice Coin directly through a Python Terminal. Leverages the TinyMan Python-SDK.

CHOICE_TinyMan_Wrapper A wrapper that allows users to acquire Choice Coin directly through their Terminal using ALGO and various Algorand Standard Ass

Choice Coin 16 Sep 24, 2022
The Python client library for the Tuneup Technology App.

Tuneup Technology App Python Client Library The Python client library for the Tuneup Technology App. This library allows you to interact with the cust

Tuneup Technology 0 Jun 29, 2022
A bot to playing music in telegram vcg

Idzeroid Music|| Idzeroid Music adalah sebuah repository music bot telegram untuk memainkan suara di voice chat group anda. Fyi This repo im using for

idzeroid 1 Oct 26, 2021
Upload on Doodstream by Url, File and also by direct forward post from other channel...

Upload on Doodstream by Url, File and also by direct forward post from other channel...

Pʀᴇᴅᴀᴛᴏʀ 8 Aug 10, 2022
Based on nonebot, a common bot framework for maimai.

mai bot 使用指南 此 README 提供了最低程度的 mai bot 教程与支持。 Step 1. 安装 Python 请自行前往 https://www.python.org/ 下载 Python 3 版本( 3.7)并将其添加到环境变量(在安装过程中勾选 Add to system P

Diving-Fish 150 Jan 01, 2023
Whatsapp-APi Wrapper From rzawapi.my.id

Whatsapp-APi Wrapper From rzawapi.my.id

Rezza Priatna 2 Apr 19, 2022
Create light scenes , voice control, ifttt, fuzzywuzzy speech correction and much more with Tuya light bulbs.

LightBox Features: Auto discover tuya lights Set and create moods (aka: light profiles) Change moods via IFTTT List moods via IFTTT FuzzyWuzzy, speech

Robert Nagtegaal 1 Dec 20, 2021
A python tool to Automate Whatsapp through Whatsapp web

This python tool is used to Automate Whatsapp through Whatsapp web. We can add number of contacts whom we want to send text messages on perticular time

5 Jul 21, 2022
Contrastive Language-Audio Pretraining

CLAP Contrastive Language-Audio Pretraining In due time this repo will be full of lovely things, I hope. Feel free to check out the Issues if you're i

Charles Foster 83 Dec 01, 2022
A Telegram Message Manager Bot by @AbirHasan2005

Messages-Manager-Bot A Telegram Message Manager Bot by @AbirHasan2005. This Bot can delete specific type of messages from Group. I specially use for @

Abir Hasan 32 Nov 12, 2022
A Telegram Bot written in Python for mirroring files on the Internet to Google Drive

No support is going to be provided of any kind, only maintaining this for vps user on request. This is a Telegram Bot written in Python for mirroring

0 Dec 26, 2021
Orca is an extensive and extendable Python 3.x library for the Discord API.

Orca is an extensive and extendable Python 3.x library for the Discord API.

RPS 4 Apr 03, 2022
a discord bot for searching your movies, and bot return movie url for you :)

IMDb Discord Bot how to run this bot. the first step you must create prefixes.json file the second step you must create a virtualenv if you use window

Mehdi Radfar 6 Dec 20, 2022