Search for documents in a domain through Google and extract their metadata.

Last update: Dec 16, 2022

MetaFinder - Metadata search through Google

   _____               __             ___________ .__               .___                   
  /     \     ____   _/  |_  _____    \_   _____/ |__|   ____     __| _/   ____   _______  
 /  \ /  \  _/ __ \  \   __\ \__  \    |    __)   |  |  /    \   / __ |  _/ __ \  \_  __ \ 
/    Y    \ \  ___/   |  |    / __ \_  |     \    |  | |   |  \ / /_/ |  \  ___/   |  | \/ 
\____|__  /  \___  >  |__|   (____  /  \___  /    |__| |___|  / \____ |   \___  >  |__|    
        \/       \/               \/       \/               \/       \/       \/          
        
|_ Author: @JosueEncinar
|_ Description: Search for documents in a domain through Google. The objective is to extract metadata
|_ Usage: python3 metafinder.py -d domain.com -l 100 -o /tmp

Installation

> pip3 install metafinder

To upgrade an existing installation:

> pip3 install metafinder --upgrade

Usage

CLI

metafinder -d domain.com -l 20 -o folder [-t 10] [-v]

Parameters:

-d: Target domain.
-l: Maximum number of results to search.
-o: Path where the report is saved.
-t: Optional. Number of threads (4 by default).
-v: Optional. Also display the results on screen.
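
When MetaFinder is driven from a script rather than typed by hand, the options above can be assembled programmatically. The following is a minimal sketch, not part of MetaFinder itself: `build_metafinder_command` is a hypothetical helper, and it assumes the `metafinder` executable is on the PATH after installation. It uses only the flags listed above.

```python
import subprocess
from typing import List, Optional


def build_metafinder_command(domain: str, limit: int, output_dir: str,
                             threads: Optional[int] = None,
                             verbose: bool = False) -> List[str]:
    """Hypothetical helper: build the argument list for the metafinder CLI."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Required options: target domain, result limit, report path.
    command = ["metafinder", "-d", domain, "-l", str(limit), "-o", output_dir]
    # Optional options are only added when requested.
    if threads is not None:
        command += ["-t", str(threads)]
    if verbose:
        command.append("-v")
    return command


command = build_metafinder_command("domain.com", 20, "folder", threads=10, verbose=True)
print(" ".join(command))  # metafinder -d domain.com -l 20 -o folder -t 10 -v

# To actually run the search (assumes metafinder is installed and on the PATH):
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the output path contains spaces.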

In Code

import metafinder.extractor as metadata_extractor

# Search Google for documents hosted on the domain and extract their metadata.
documents_limit = 5
domain = "target_domain"
data = metadata_extractor.extract_metadata_from_google_search(domain, documents_limit)
for k, v in data.items():
    print(f"{k}:")
    print(f"|_ URL: {v['url']}")
    for metadata, value in v['metadata'].items():
        print(f"|__ {metadata}: {value}")

# Extract metadata from a single local document.
document_name = "test.pdf"
try:
    metadata_file = metadata_extractor.extract_metadata_from_document(document_name)
    for k, v in metadata_file.items():
        print(f"{k}: {v}")
except FileNotFoundError:
    print("File not found")
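
The dictionary returned by the search can also be post-processed instead of printed. The sketch below tallies the values of one metadata field across all documents, which can help spot recurring authors or software names. It assumes only the structure used in the loop above (each entry has a 'url' and a 'metadata' dictionary). The `count_metadata_values` helper, the sample data, and field names such as "Author" are illustrative assumptions, since the actual keys depend on the documents found.

```python
from collections import Counter
from typing import Any, Dict


def count_metadata_values(data: Dict[str, Dict[str, Any]], field: str) -> Counter:
    """Hypothetical helper: count how often each value of `field` appears."""
    counts: Counter = Counter()
    for document in data.values():
        value = document.get("metadata", {}).get(field)
        if value:  # skip documents that lack this field or have it empty
            counts[str(value)] += 1
    return counts


# Hand-written sample shaped like the search result; not real output.
sample = {
    "report.pdf": {"url": "https://domain.com/report.pdf",
                   "metadata": {"Author": "alice", "Producer": "ExampleWriter 1.0"}},
    "notes.docx": {"url": "https://domain.com/notes.docx",
                   "metadata": {"Author": "alice"}},
    "slides.pdf": {"url": "https://domain.com/slides.pdf",
                   "metadata": {"Author": "bob"}},
}

# Print the values from most to least frequent.
for author, total in count_metadata_values(sample, "Author").most_common():
    print(f"{author}: {total}")
```

With real results, the `sample` dictionary would be replaced by the `data` value returned from the search.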

Author

This project has been developed by:

Josué Encinar García -- @JosueEncinar

Contributors

Félix Brezo Fernández -- @febrezo

Disclaimer!

This software was developed for teaching purposes and for use only with the permission of the target. The author is not responsible for any illegitimate use.

