Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Last update: Nov 05, 2021

Overview

This repository provides two web crawlers: one labels domain names using the McAfee API (https://www.trustedsource.org/sources/index.pl), and the other labels IP reputation using the TALOS API (https://talosintelligence.com/).

Requirements

BeautifulSoup

Usage

The demonstration scripts are used as follows.

To label a set of domains, put the domain list in 'data/domain_list.txt' and run 'demo_domain_label.py'. For each domain, the program uses the McAfee API to obtain (1) the category (e.g., Malicious Sites- Parked Domain) and (2) the risk level (e.g., High Risk), and saves the results in 'res/domain_labels.txt'. If the program keeps printing '-Retry-', stop it and wait for a while before restarting. On restart, it automatically skips the domains that are already labeled and continues with the remaining ones.
To label the reputation of a set of IP addresses, put the IP list in 'data/IP_list.txt' and run 'demo_IP_label.py'. For each IP address, the program obtains (1) the email reputation and (2) the web reputation (with 3 levels: Poor, Neutral, and Good), and saves the results in 'res/IP_labels.txt'. If the program keeps printing 'None', stop it and wait for a while before restarting. On restart, it automatically skips the IPs that are already labeled and continues with the remaining ones.
An example domain name list (with 21,820 effective second-level domains) and an example IP list (with 67,751 IP addresses) are given in 'data/examples/example_domain_list.txt' and 'data/examples/example_IP_list.txt', respectively. The corresponding labeled results are saved in 'res/examples/example_domain_labels.txt' and 'res/examples/example_IP_labels.txt', respectively.
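
The stop-and-restart behaviour described above relies on skipping items that already have a saved label. The following is only a minimal sketch of how such resume logic could look; it is not taken from 'demo_domain_label.py' or 'demo_IP_label.py'. The function names (load_labeled, pending_items, append_label) are hypothetical, and the results file layout (one record per line, tab-separated, with the domain or IP in the first field) is an assumption, since the actual format is not described here.

```python
import os


def load_labeled(result_path):
    """Return the set of items (domains or IPs) already present in the results file."""
    labeled = set()
    if not os.path.exists(result_path):
        return labeled
    with open(result_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # Assumption: the item is the first tab-separated field of each record.
                labeled.add(line.split("\t")[0])
    return labeled


def pending_items(list_path, result_path):
    """Return the items from the input list that have no saved label yet, in input order."""
    labeled = load_labeled(result_path)
    pending = []
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            item = line.strip()
            if item and item not in labeled:
                pending.append(item)
    return pending


def append_label(result_path, item, *fields):
    """Append one labeled record immediately, so progress survives an interruption."""
    with open(result_path, "a", encoding="utf-8") as f:
        f.write("\t".join((item,) + fields) + "\n")
```

With this layout, a restarted run would call pending_items('data/domain_list.txt', 'res/domain_labels.txt') and query only the returned domains, writing each result with append_label as soon as it is obtained.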

If you have questions regarding this repository, you can contact the author via [[email protected]].
