webScrap

Web scraping: first steps.

Authors: Paulo, Claudio M.

First steps in web scraping: a project carried out as training in web scraping. Pages with known addresses are downloaded with the requests library, the information is located in the HTML parsed by BeautifulSoup with the 'lxml' parser, stored in a structured table (a pandas DataFrame), and finally exported in CSV format.

How can the search for related words in OLX ads be automated?
Can I use quartile analysis to find the best product at the best price?
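One possible reading of the quartile question can be sketched as follows. This is only an illustration, not the project's code: the column names `title` and `price`, the function name `low_price_candidates`, and the 1.5 × IQR fence used to discard suspiciously cheap ads are all assumptions.

```python
import pandas as pd


def low_price_candidates(df, price_col="price"):
    """Return ads priced at or below the first quartile, excluding low outliers.

    Assumption: an ad priced below Q1 - 1.5 * IQR is more likely a data error
    or a different product than a genuine bargain, so it is dropped.
    """
    q1 = df[price_col].quantile(0.25)
    q3 = df[price_col].quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    mask = (df[price_col] >= lower_fence) & (df[price_col] <= q1)
    return df[mask]


# Made-up prices for illustration only.
example_ads = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D", "E", "F", "G"],
        "price": [100.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0, 5000.0],
    }
)
candidates = low_price_candidates(example_ads)
```

Whether a low price really means "the best product" still has to be judged by reading the ad; the quartiles only narrow down the list.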

Our Plan

Select the list of related words.
Use requests to download the page.
Use BeautifulSoup to parse the downloaded page with the lxml parser.
Create a structured database with date and time of posting, ad title, product value, city and neighborhood where it is being advertised.
Filter the database by removing ads whose title does not contain the desired words.
Use percentile and mean metrics to find the average price of advertisements by city (in Brazilian states).
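The download, parse, and filter steps above could be sketched roughly as below. The HTML in `SAMPLE_HTML`, its tag and class names, and the helper names (`download`, `parse_price`, `parse_ads`, `filter_by_words`) are hypothetical; the real OLX markup is different and would need its own selectors.

```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; real OLX pages use other tags and classes.
SAMPLE_HTML = """
<ul>
  <li class="ad">
    <h2 class="title">Bicicleta aro 29</h2>
    <span class="price">R$ 1.200</span>
    <span class="location">Belo Horizonte, Centro</span>
    <span class="posted">2021-03-01 14:30</span>
  </li>
  <li class="ad">
    <h2 class="title">Sofá 3 lugares</h2>
    <span class="price">R$ 850</span>
    <span class="location">Contagem, Eldorado</span>
    <span class="posted">2021-03-02 09:15</span>
  </li>
  <li class="ad">
    <h2 class="title">BICICLETA infantil</h2>
    <span class="price">R$ 300,50</span>
    <span class="location">Betim, Centro</span>
    <span class="posted">2021-03-02 18:40</span>
  </li>
</ul>
"""

COLUMNS = ["posted", "title", "price", "city", "neighborhood"]


def download(url):
    """Download a page with requests and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_price(text):
    """Convert a Brazilian price string such as 'R$ 1.200' to a float."""
    cleaned = text.replace("R$", "").replace(".", "").replace(",", ".")
    return float(cleaned.strip())


def parse_ads(html, parser="lxml"):
    """Build a DataFrame with one row per ad found in the (hypothetical) markup."""
    soup = BeautifulSoup(html, parser)
    rows = []
    for ad in soup.find_all("li", class_="ad"):
        city, neighborhood = ad.find(class_="location").get_text().split(",", 1)
        rows.append(
            {
                "posted": ad.find(class_="posted").get_text(strip=True),
                "title": ad.find(class_="title").get_text(strip=True),
                "price": parse_price(ad.find(class_="price").get_text()),
                "city": city.strip(),
                "neighborhood": neighborhood.strip(),
            }
        )
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["posted"] = pd.to_datetime(df["posted"])
    return df


def filter_by_words(df, words):
    """Keep only ads whose title contains at least one of the desired words."""
    pattern = "|".join(re.escape(word) for word in words)
    mask = df["title"].str.contains(pattern, case=False, regex=True)
    return df[mask]
```

With a real address, the pieces would chain as `ads = parse_ads(download(url))`, then `filter_by_words(ads, ["bicicleta"]).to_csv("ads.csv", index=False)` for the CSV export; the file name here is only an example.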

Current progress

Data scraping was carried out and the database was created to analyze the average value by city.
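As a minimal sketch of that per-city analysis, assuming the database is a pandas DataFrame with columns named `city` and `price` (the column names, the function names, and the sample values are illustrative, not taken from the project):

```python
import pandas as pd


def first_quartile(prices):
    """25th percentile of a price series."""
    return prices.quantile(0.25)


def third_quartile(prices):
    """75th percentile of a price series."""
    return prices.quantile(0.75)


def price_summary_by_city(df):
    """Summarize ad prices per city: count, mean, median, and quartiles."""
    return df.groupby("city")["price"].agg(
        ad_count="count",
        mean_price="mean",
        median_price="median",
        p25=first_quartile,
        p75=third_quartile,
    )


# Made-up values for illustration only.
example_ads = pd.DataFrame(
    {
        "city": ["Betim", "Betim", "Betim", "Contagem"],
        "price": [100.0, 200.0, 300.0, 1000.0],
    }
)
summary = price_summary_by_city(example_ads)
```

Keeping the ad count next to the mean helps when reading the result: an average computed from one or two ads says little about a city.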

The database is built from information in advertisements on the OLX Brasil website.

The variables and comments in the code are in Portuguese, and the search for advertisements uses words in Portuguese.

Web Scraping OLX with Python and BeautifulSoup.


IGLS - Instagram Like Scraper CLI tool

Signature and CAPTCHA cracking techniques related to web crawlers

A database scraper created with mechanical soup and sqlite

A spider for Universal Online Judge(UOJ) system, converting problem pages to PDFs.

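A script for snapping up Moutai on JD.com: scheduled automatic triggering, automatic reservation, automatic stop

A Python script for snapping up flash-sale items on JD.com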

Libextract: extract data from websites

An automated, headless YouTube Watcher and Scraper

Extract embedded metadata from HTML markup

A Web Scraper built with beautiful soup, that fetches udemy course information. Get udemy course information and convert it to json, csv or xml file

Scraping weather data using Python to receive umbrella reminders

Transistor, a Python web scraping framework for intelligent use cases.

A Pixiv web crawler module

Newsscraper - A simple Python 3 module to get crypto or news articles and their content from various RSS feeds.

A webdriver-based script for reserving Tsinghua badminton courts.

An optimized version of the JD.com Moutai purchasing tool

Scrape and display grades onto the console

Incredibly fast crawler designed for OSINT.

A web crawler for recording posts in "sina weibo"

Makes downloading from GitHub with git 1000 times faster for users in China!