A web crawler for recording posts in "sina weibo"

Last update: Aug 20, 2022

Overview

Web Crawler for "sina weibo"

Introduction

This project collects the attributes of posts on "sina weibo". Posts can be recorded from different lists (also called flows or collections), such as search results. The supported lists are given in the "Functions" section.

Functions

Scripts currently available:

Name: search.py
Description: Searches for a word within a given time interval and records every post in the search results.
Parameters (edit these at the head of the script):
search_string: The string to search for. All posts containing this string are recorded, up to 50 pages.
start_time: Only posts published after this time are recorded (accurate to the hour).
end_time: Only posts published before this time are recorded (accurate to the hour).
rest_time: The interval between two requests, in seconds.
Results are saved in Python pickle format at results/weibo-{search_string}-{start_time}-{end_time}.pkl. The start_time and end_time in the filename are formatted as Unix timestamps, in seconds.
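
The result filename pattern can be reproduced when you want to locate and read a saved file afterwards. The following is a minimal sketch, not code taken from search.py: the helper names `result_path` and `load_posts` are hypothetical, and it assumes that `start_time` and `end_time` are `datetime` objects. The structure of the pickled object is not documented here, so the sketch returns whatever was stored.

```python
import pickle
from datetime import datetime, timezone
from pathlib import Path


def result_path(search_string, start_time, end_time, root="results"):
    """Build results/weibo-{search_string}-{start_time}-{end_time}.pkl.

    Assumption: start_time and end_time are datetime objects. They are
    written into the filename as Unix timestamps in whole seconds.
    """
    start_ts = int(start_time.timestamp())
    end_ts = int(end_time.timestamp())
    return Path(root) / f"weibo-{search_string}-{start_ts}-{end_ts}.pkl"


def load_posts(path):
    """Unpickle a result file and return the stored object as-is."""
    # Only unpickle files you produced yourself; pickle can run code on load.
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    path = result_path(
        "python",
        datetime(2022, 8, 1, 0, tzinfo=timezone.utc),
        datetime(2022, 8, 2, 0, tzinfo=timezone.utc),
    )
    print(path)
    if path.exists():
        print(type(load_posts(path)))
```

Timezone-aware datetimes are used in the demo so that the timestamps do not depend on the local timezone of the machine; whether the script itself does the same is not stated in this README.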

Installation

Run pip install -r requirements.txt.
Find the script you need in the "Functions" section.
Edit the parameters at the head of that script.
Run the script with Python, for example python search.py.
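
As an illustration of editing the parameters, a header for search.py might look like the sketch below. The four parameter names are the ones documented in the "Functions" section; the value types (a `str`, two `datetime` objects, and a number of seconds) and the `check_parameters` helper are assumptions made for this example and are not part of the script.

```python
from datetime import datetime

# Hypothetical parameter values; adjust them to your own search.
search_string = "python"
start_time = datetime(2022, 8, 1, 0)  # accurate to the hour
end_time = datetime(2022, 8, 2, 0)    # accurate to the hour
rest_time = 5                         # seconds between two requests


def check_parameters(search_string, start_time, end_time, rest_time):
    """Return a list of problems found in the parameters (empty if none)."""
    problems = []
    if not search_string.strip():
        problems.append("search_string is empty")
    if start_time >= end_time:
        problems.append("start_time must be earlier than end_time")
    for name, value in (("start_time", start_time), ("end_time", end_time)):
        # The documented precision is one hour, so finer fields are flagged.
        if (value.minute, value.second, value.microsecond) != (0, 0, 0):
            problems.append(f"{name} is finer than hour level")
    if rest_time < 0:
        problems.append("rest_time must not be negative")
    return problems


if __name__ == "__main__":
    for problem in check_parameters(search_string, start_time, end_time, rest_time):
        print("Parameter problem:", problem)
```

A check like this catches a swapped time interval or a sub-hour timestamp before any request is sent.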
