The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Last update: Dec 25, 2022

Overview

tiara - The Internet Archive Research Assistant

by Kay Savetz, May 2021.

Searches Internet Archive, using its full text search, for new items matching the keywords you specify. Run this script once a day via crontab for daily updates about new items relevant to your ongoing research subjects. It keeps track of the items it has already found, so it only alerts you to new-to-you items. The script writes its findings to an HTML file, and optionally emails that file to you via SendGrid or your system mail (e.g. Sendmail or Postfix).
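The "only new-to-you items" behaviour can be pictured as a small seen-list kept on disk. The following is a minimal sketch of that idea, not the script's actual code: the file name seen.txt and the helper names load_seen, filter_new and record_seen are hypothetical, and the identifiers are assumed to come from whatever search step runs before it.

```python
from pathlib import Path


def load_seen(path):
    """Return the set of item identifiers recorded on earlier runs."""
    p = Path(path)
    if not p.exists():
        # First run: nothing has been seen yet.
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def filter_new(identifiers, seen):
    """Keep identifiers not seen before, preserving order and dropping repeats."""
    new = []
    for ident in identifiers:
        if ident not in seen and ident not in new:
            new.append(ident)
    return new


def record_seen(path, new_identifiers):
    """Append newly reported identifiers so later runs skip them."""
    with open(path, "a", encoding="utf-8") as f:
        for ident in new_identifiers:
            f.write(ident + "\n")
```

On each daily run, a script built this way would load the seen set, filter the day's search results through filter_new, report what remains, and then call record_seen. This is also one reason a script like this needs write access to its own directory.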

Put your keywords in searchlist.txt, one search term per line. Very general terms (like "dogs") provide too many daily hits to be useful. More specific phrases work better.
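As an illustration of the one-term-per-line format, here is a hedged sketch of how such a file could be parsed. The function name parse_search_terms is hypothetical, and skipping blank lines and duplicate terms is an assumption about sensible behaviour rather than something the script is documented to do.

```python
from pathlib import Path


def parse_search_terms(text):
    """Split file contents into search terms, one per line.

    Blank lines are ignored, surrounding whitespace is trimmed, and
    repeated terms are kept only once (assumed behaviour).
    """
    terms = []
    for line in text.splitlines():
        term = line.strip()
        if term and term not in terms:
            terms.append(term)
    return terms


if __name__ == "__main__":
    searchlist = Path("searchlist.txt")
    if searchlist.exists():
        for term in parse_search_terms(searchlist.read_text(encoding="utf-8")):
            print(term)
```

A file containing the two lines "Pliny the Elder" and "Atari 800" would yield those two terms in order.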

Dependency: Internet Archive command line tool (install with pip install internetarchive). The script also requires read-write access to the directory it lives in.

Issue: Internet Archive cannot generate thumbnails for all items. In these cases, you may see a broken image icon.

Issue: Internet Archive's full text search doesn't seem to allow exact phrase matching. So a search for "Pliny The Elder" may turn up items mentioning Pliny The Younger, or with "Pliny" on one page and "elder" on another.

If you find this tool useful, please donate to Internet Archive


Owner

Kay Savetz

Converts python code into c++ by using OpenAI CODEX.

Natural Language Processing Tasks and Examples.

A crowdsourced dataset of dialogues grounded in social contexts involving utilization of commonsense.

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

Search-Engine - 📖 AI based search engine

A full spaCy pipeline and models for scientific/biomedical documents.

Product-Review-Summarizer - Created a product review summarizer which clustered thousands of product reviews and summarized them into a maximum of 500 characters, saving precious time of customers and helping them make a wise buying decision.

Contains the code and data for our #ICSE2022 paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"

Intent parsing and slot filling in PyTorch with seq2seq + attention

auto_code_complete is an auto word-completion program which allows you to customize it to your needs

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

Implementation of the Hybrid Perception Block and Dual-Pruned Self-Attention block from the ITTR paper for Image to Image Translation using Transformers

🎐 a python library for doing approximate and phonetic matching of strings.

Chinese real time voice cloning (VC) and Chinese text to speech (TTS).

Russian words synonyms and antonyms

Create a machine learning model which will predict if the mortgage will be approved or not based on 5 variables

ACL'22: Structured Pruning Learns Compact and Accurate Models

A sentence aligner for comparable corpora

BiNE: Bipartite Network Embedding

Python SDK for working with Voicegain Speech-to-Text