Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

This tutorial shows how to integrate Oxylabs' Residential Proxies with aiohttp, Python's asynchronous HTTP client/server library.

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have aiohttp installed, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool
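
To confirm that aiohttp installed correctly and that your interpreter is recent enough, you can run a quick check (a minimal sketch; the printed versions depend on your environment):

import sys

import aiohttp

# The integration requires Python 3.6 or higher.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print(f"aiohttp {aiohttp.__version__}")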

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


if __name__ == "__main__":
    asyncio.run(fetch())

The second is to embed the credentials in the proxy URL itself:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


if __name__ == "__main__":
    asyncio.run(fetch())
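
Both approaches result in the same Proxy-Authorization header being sent to the proxy, so pick whichever fits your codebase; keeping credentials out of the URL can be preferable when request URLs end up in logs.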

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To check that the proxy is working, send a request to https://ip.oxylabs.io. If everything is set up correctly, the response will contain the IP address of the proxy you're currently using rather than your own.
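
As a quick sanity check, you can also send a couple of requests through the proxy and compare the returned addresses; with rotating residential proxies, consecutive requests will typically exit through different IPs. A minimal sketch, reusing the credentials from the examples above:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check_ip():
    async with aiohttp.ClientSession() as session:
        # Two consecutive requests should normally print two different exit IPs.
        for _ in range(2):
            async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
            ) as resp:
                print(await resp.text())


if __name__ == "__main__":
    asyncio.run(check_ip())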

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send many requests at once while greatly reducing the risk of CAPTCHAs and IP blocks, making the web scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Strip the relative "../.." prefix; the absolute URL is rebuilt later.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The rating element's second CSS class holds the star count, e.g. "Three".
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)  # Cap the script at four concurrent requests.
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows, Python 3.8+ defaults to the Proactor event loop, which can
    # raise an "Event loop is closed" error on exit; use the selector policy instead.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and install it with pip:

pip install -r requirements.txt
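
Judging by the script's imports, the requirements.txt file should contain roughly the following packages (this listing is our reconstruction; the original file may pin specific versions):

aiohttp
beautifulsoup4
lxml
pandas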

If you're having trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
