Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need Python 3.6 or higher, the aiohttp library, and Oxylabs Residential Proxies.
If you don't have the aiohttp library installed yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"
 
async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

The second is to embed the authentication credentials directly in the proxy URL:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io", 
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp: 
            print(await resp.text())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
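Note that both snippets only define the fetch coroutine, so nothing runs until it's scheduled on an event loop. A minimal way to execute either version (assuming Python 3.7 or newer, where asyncio.run is available):

import asyncio

# Runs the fetch coroutine defined above and prints the proxy's exit IP.
asyncio.run(fetch())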

Testing Proxies

To see if the proxy is working, try visiting https://ip.oxylabs.io. If everything is working correctly, it will return the IP address of the proxy that you're currently using.
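You can also run a quick check from the terminal instead of Python. For example, assuming curl is available on your machine, route a single request through the proxy endpoint and inspect the returned IP:

curl -x "http://user:[email protected]:7777" https://ip.oxylabs.io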

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send many requests at once without worrying about CAPTCHAs or getting blocked. This makes the web scraping process fast and efficient: the script below gathers data from 50 pages of product listings in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, a different event loop policy must be set.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info.minor >= 8:
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install some additional packages. To do that, simply download the requirements.txt file and run the following pip command:

pip install -r requirements.txt
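If you'd rather assemble the file yourself, a minimal requirements.txt inferred from the sample script's imports would look like this (version pins omitted; lxml is needed because BeautifulSoup is called with the "lxml" parser):

aiohttp
beautifulsoup4
lxml
pandas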

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
