Python tutorial for integrating Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Oxylabs Residential Proxies.
If you don't have the aiohttp library yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool
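
To quickly confirm that aiohttp is installed, you can print its version from the command line (a simple sanity check, not part of the integration itself):

python -c "import aiohttp; print(aiohttp.__version__)"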

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials along with the proxy URL using aiohttp.BasicAuth:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"
 
async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

The second is to pass the authentication credentials in the proxy URL itself:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

To use your own proxies, replace the values of USER and PASSWORD with your Oxylabs account credentials.

Testing Proxies

To see if the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
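
Both fetch() coroutines above already request http://ip.oxylabs.io, so either one doubles as a quick test. A minimal way to run it (asyncio.run requires Python 3.7 or newer):

import asyncio

# Run the fetch() coroutine defined in the previous section.
asyncio.run(fetch())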

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once with a much lower risk of CAPTCHAs or IP blocks. This makes web scraping fast and efficient: you can extract data from thousands of products in a matter of seconds!

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # Different EventLoopPolicy must be loaded if you're using Windows OS.
    # This helps to avoid "Event Loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, simply download the requirements.txt file and run pip:

pip install -r requirements.txt
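
The exact contents of requirements.txt may vary, but judging from the imports in the script above, it should list at least the following packages:

aiohttp
beautifulsoup4
lxml
pandas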

If you're having trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
