A web scraper for nomadlist.com, made to avoid website restrictions.

Gypsylist

gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions.

nomadlist.com is a website with a lot of information for digital nomads, helping them find the best places to live and work as location-independent remote workers. Unfortunately, most of this content is restricted unless you are a member of the website.

This script doesn't cover all of the information retrievable from the website; it's just an entry point for evaluating the site without signing up.

Installation

Before using gypsylist, install its requirements:

pip3 install -r requirements.txt

Additionally, because selenium is a dependency, you also have to set up the browser driver. To install it, see https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/.

Now you should be ready to run the script.
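If you are unsure whether the driver step worked, a quick check like the one below can help. It is only a sketch: it assumes the driver is expected on your PATH, and the executable names (chromedriver, geckodriver) are the common ones for Chrome and Firefox, not something confirmed by gypsylist itself. The helper name find_drivers is hypothetical.

```python
import shutil

# Assumed driver executable names; which one you need depends on your browser.
DRIVER_NAMES = ("chromedriver", "geckodriver")


def find_drivers(names=DRIVER_NAMES):
    """Return a dict mapping each driver found on PATH to its location."""
    found = {}
    for name in names:
        location = shutil.which(name)  # None when the executable is not on PATH
        if location is not None:
            found[name] = location
    return found


if __name__ == "__main__":
    drivers = find_drivers()
    if drivers:
        for name, location in drivers.items():
            print(f"{name}: {location}")
    else:
        print("No browser driver found on PATH; see the selenium documentation.")
```

An empty result does not necessarily mean selenium will fail, since the driver may be configured another way; treat it as a hint rather than a verdict.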

Usage

To use gypsylist, first browse the nomadlist.com website and apply the filters you need for your research. Then copy the URL path from your browser's address bar (the part after nomadlist.com/, including the query string) and pass it to gypsylist with --path:

./gypsylist.py --path "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc" --emoji

This is going to be the expected result:

#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5
💵 cost: 4/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

...

#440
🏙️  city: Zurich
🌎 country: Switzerland
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#441
🏙️  city: Leiden
🌎 country: Netherlands
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#442
🏙️  city: Honolulu, Hawaii
🌎 country: United States
⭐️ overall: 4/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

#443
🏙️  city: Lake Tahoe, CA
🌎 country: United States
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

(Always remember --emoji). Have fun!
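If you would rather not trim the address by hand, the --path value can be derived from the full URL. The snippet below is a minimal sketch using only the Python standard library; the helper name to_path_argument is hypothetical and not part of gypsylist, and it assumes the script expects the path without a leading slash, as in the example command above.

```python
from urllib.parse import urlsplit


def to_path_argument(url: str) -> str:
    """Return the path and query string of a URL, without the leading slash."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    # Keep the query string (filters and sorting) when the URL has one.
    return f"{path}?{parts.query}" if parts.query else path


if __name__ == "__main__":
    url = (
        "https://nomadlist.com/safe-places-for-remote-workers-to-live"
        "?sort=cost_for_nomad_in_usd&order=asc"
    )
    print(to_path_argument(url))
```

Remember to quote the resulting value on the command line, since characters such as & and ? are special to most shells.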

Known Issues

This is not what you would call well-written code (sorry, Gods of programming). As a result, there are several code smells and bugs that have not been reviewed, given the short time I dedicated to writing the script.

  • With the --headless / -H parameter, which runs the browser in headless mode, only the first page of results is retrieved from the website.
Owner
Alessio Greggi
Computer Scientist graduated at the University of Rome, Tor Vergata. Currently working as Linux Engineer. CTF Player during free time.