A Python package that scrapes Google News article data while remaining undetected by Google.

Overview

googlenewsscraper

Getting Started

Installation

pip install GoogleNewsScraper

Reference

Importing

from GoogleNewsScraper import GoogleNewsScraper

Instantiating Scraper

GoogleNewsScraper(driver)

Constructor Parameters

Name Type Required
driver web driver no

Possible values:

  • 'chrome': The driver will default to use this package's chrome driver
  • A path to some driver (FireFox, for instance) stored on the user's system

Methods

This method is both public and private, though it really should only be used by the class

locate_html_element(self, driver, element, selector, wait_seconds)
Name Type Required Description
driver web driver yes A web driver (Chrome, FireFox, etc)
element string yes Id or class selector of an HTML element
selector Module import yes see below
wait_seconds int no Waits a certain number of seconds in order to locate an HTML element

To configure the 'selector' param:

First install selenium

pip install selenium

Then import By

from selenium.webdriver.common.by import By

Possible values:

  • By.ID
  • By.CLASS_NAME
  • By.CSS_SELECTOR
  • By.LINK_TEXT
  • By.NAME
  • By.PARTIAL_LINK_TEXT
  • By.TAG_NAME
  • By.XPATH

GoogleNewsScraper(...args).search(search_text, date_range, pages, pagination_pause_per_page, cb) -> list or None
Name Type Required Description
search_text str yes A series of word(s) that will be inputted into the Google search engine
date_range str no Filters article by date. Possible values: Past hours, Past 24 hours, Past week, Past month, Past year, Archives
pages str or int no Number of pages that should be scraped (defaults to 'max')
pagination_pause_per_page int no Waits a certain amount of seconds before a new page is scraped (defaults to 2). Time may have to be increased if Google prevents you from scraping all pages.
cb function no Will return all article data on a single page for every page scraped (defaults to False)
  • Example using 'cb' paramater:
def handle_page_data(page_data: list):
  # Do something with page_data

GoogleNewsScraper(...args).search(...args, cb=handle_page_data)

NOTE:

  • If no argument is provided for 'cb,' the scrape method will return a two-dimensional list
  • Each list will contain an object of news article data for every news article on that page

Example of the data that every article-object will contain:

  • 'id': A unique id for every article data object
  • 'description': The preview description of the news article
  • 'title': The title of the news article
  • 'source': The source of news article (New York Times, for instance)
  • 'image_url': The url of the preview news article image
  • 'url': A link to the news article
  • 'date_time': A datetime string that represents the date of when the article was published

Owner
Geminid Systems, Inc
Geminid Systems, Inc
Geminid Systems, Inc
A powerful annex BUBT, BUBT Soft, and BUBT website scraping script.

Annex Bubt Scraping Script I think this is the first public repository that provides free annex-BUBT, BUBT-Soft, and BUBT website scraping API script

Md Imam Hossain 4 Dec 03, 2022
Scrape plants scientific name information from Agroforestry Species Switchboard 2.0.

Agroforestry Species Switchboard 2.0 Scraper Scrape plants scientific name information from Species Switchboard 2.0. Requirements python = 3.10 (you

Mgs. M. Rizqi Fadhlurrahman 2 Dec 23, 2021
Python script who crawl first shodan page and check DBLTEK vulnerability

🐛 MASS DBLTEK EXPLOIT CHECKER USING SHODAN 🕸 Python script who crawl first shodan page and check DBLTEK vulnerability

Divin 4 Jan 09, 2022
This tool crawls a list of websites and download all PDF and office documents

This tool crawls a list of websites and download all PDF and office documents. Then it analyses the PDF documents and tries to detect accessibility issues.

AccessibilityLU 7 Sep 30, 2022
A Python Oriented tool to Scrap WhatsApp Group Link using Google Dork it Scraps Whatsapp Group Links From Google Results And Gives Working Links.

WaGpScraper A Python Oriented tool to Scrap WhatsApp Group Link using Google Dork it Scraps Whatsapp Group Links From Google Results And Gives Working

Muhammed Rizad 27 Dec 18, 2022
A Python module to bypass Cloudflare's anti-bot page.

cloudflare-scrape A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Reque

3k Jan 04, 2023
Crawler in Python 3.7, 3.8. 3.9. Pypy3

Description Python Crawler written Python 3. (Supports major Python releases Python3.6, Python3.7 and Python 3.8) Installation and Use Setup VirtualEn

Vinit Kumar 2 Mar 12, 2022
A python script to extract answers to any question on Quora (Quora+ included)

quora-plus-bypass A python script to extract answers to any question on Quora (Quora+ included) Requirements Python 3.x

Nitin Narayanan 10 Aug 18, 2022
ChromiumJniGenerator - Jni Generator module extracted from Chromium project

ChromiumJniGenerator - Jni Generator module extracted from Chromium project

allenxuan 4 Jun 12, 2022
Facebook Group Scraping Using Beautiful Soup & Selenium

Extract Facebook group posts that are related to a specific topic and write them to a .json file.

Fatima Ghadieh 14 Aug 12, 2022
This repo has the source code for the crawler and data crawled from auto-data.net

This repo contains the source code for crawler and crawled data of cars specifications from autodata. The data has roughly 45k cars

Tô Đức Anh 5 Nov 22, 2022
Web scrapper para cotizar articulos

WebScrapper Este web scrapper esta desarrollado en python 3.10.0 para buscar en la pagina de cyber puerta articulos dentro del catalogo. El programa t

Jordan Gaona 1 Oct 27, 2021
TikTok Username Swapper/Claimer/etc

TikTok-Turbo TikTok Username Swapper/Claimer/etc I wanted to create it as fast as possible but i eventually gave up and recoded it many many many many

Kevin 12 Dec 19, 2022
Tool to scan for secret files on HTTP servers

snallygaster Finds file leaks and other security problems on HTTP servers. what? snallygaster is a tool that looks for files accessible on web servers

Hanno Böck 2k Dec 28, 2022
Google Developer Profile Badge Scraper

Google Developer Profile Badge Scraper GDev Profile Badge Scraper is a Google Developer Profile Web Scraper which scrapes for specific badges in a use

Siddhant Lad 7 Jan 10, 2022
A python tool to scrape NFT's off of OpenSea

Right Click Bot A script to download NFT PNG's from OpenSea. All the NFT's you could ever want, no blockchain, for free. Usage Must Use Python 3! Auto

15 Jul 16, 2022
Scrapegoat is a python library that can be used to scrape the websites from internet based on the relevance of the given topic irrespective of language using Natural Language Processing

Scrapegoat is a python library that can be used to scrape the websites from internet based on the relevance of the given topic irrespective of language using Natural Language Processing. It can be ma

10 Jul 06, 2022
A tool to easily scrape youtube data using the Google API

YouTube data scraper To easily scrape any data from the youtube homepage, a youtube channel/user, search results, playlists, and a single video itself

7 Dec 03, 2022
抢京东茅台脚本,定时自动触发,自动预约,自动停止

jd_maotai 抢京东茅台脚本,定时自动触发,自动预约,自动停止 小白信用 99.6,暂时还没抢到过,朋友 80 多抢到了一瓶,所以我感觉是跟信用分没啥关系,完全是看运气的。

Aruelius.L 117 Dec 22, 2022
Twitter Eye is a Twitter Information Gathering Tool With Twitter Eye

Twitter Eye is a Twitter Information Gathering Tool With Twitter Eye, you can search with various keywords and usernames on Twitter.

Jolanda de Koff 19 Dec 12, 2022