Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can get it with the following pip command:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials together with the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to pass the authentication credentials in the proxy URL itself:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the USER and PASSWORD values with your Oxylabs account credentials.

Testing Proxies

To check that the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, it will return the IP address of the proxy you're currently using.
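
For a quick check outside of Python, you can also send a curl request through the proxy. This is a minimal sketch that assumes the same placeholder credentials used throughout this guide:

curl -x "http://user:[email protected]:7777" https://ip.oxylabs.io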

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once with little risk of CAPTCHAs or IP blocks, making the scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
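    # Limit concurrency to four simultaneous requests.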
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows, Python 3.8+ defaults to the ProactorEventLoop, which can
    # raise an "Event loop is closed" error on exit; use the selector
    # policy instead.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install some additional packages. To do that, simply download the requirements.txt file and use the pip command:

pip install -r requirements.txt
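
If you don't have the requirements.txt file at hand, an equivalent list can be reconstructed from the script's imports (this is an assumption based on the code, not the original file):

aiohttp
pandas
beautifulsoup4
lxml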

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
