Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

To follow this integration, you'll need Python 3.6 or higher, the aiohttp library, and Oxylabs Residential Proxies.
If you don't have the aiohttp library installed, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials alongside the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the USER and PASSWORD values with your Oxylabs account credentials.

Testing Proxies

To check that the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, it will return the IP address of the proxy you're currently using.
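You can also run the same check from Python. The sketch below (reusing the USER, PASSWORD, and END_POINT placeholders from the examples above) prints your direct IP address next to the one returned through the proxy, so it's easy to confirm the proxy is actually being used:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check():
    async with aiohttp.ClientSession() as session:
        # Without the proxy, ip.oxylabs.io returns your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Direct IP:", (await resp.text()).strip())
        # Through the proxy, it should return a different (proxy) IP.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxy IP:", (await resp.text()).strip())


asyncio.run(check())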

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send many requests at once with a much lower risk of hitting CAPTCHAs or getting blocked, which makes the scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
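    # Cap concurrency at four simultaneous requests to avoid flooding the site.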
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows, Python 3.8+ defaults to the ProactorEventLoop, which can
    # raise an "Event loop is closed" error with aiohttp, so we switch to
    # the selector-based policy instead.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and use pip:

pip install -r requirements.txt
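
If you don't have the requirements.txt file at hand, the script's imports suggest its contents are roughly the following (a reconstruction, not the actual file; standard-library modules such as asyncio, sys, os, and time need no entry):

aiohttp
beautifulsoup4
lxml
pandas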

If you're having trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
