a small library for extracting rich content from urls

Last update: Dec 27, 2022

Overview

A small library for extracting rich content from urls.

what does it do?

micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. It also provides functions for parsing blocks of text and HTML and replacing links to videos with rich embedded content.

examples

here is a quick example:

import micawber

# load up rules for some default providers, such as youtube and flickr
providers = micawber.bootstrap_basic()

providers.request('http://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following dictionary:
{
    'author_name': 'pascalbrax',
    'author_url': u'http://www.youtube.com/user/pascalbrax',
    'height': 344,
    'html': u'<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>',
    'provider_name': 'YouTube',
    'provider_url': 'http://www.youtube.com/',
    'title': 'Future Crew - Second Reality demo - HD',
    'type': u'video',
    'thumbnail_height': 360,
    'thumbnail_url': u'http://i2.ytimg.com/vi/54XHDUOHuzU/hqdefault.jpg',
    'thumbnail_width': 480,
    'url': 'http://www.youtube.com/watch?v=54XHDUOHuzU',
    'width': 459,
    'version': '1.0',
}

providers.parse_text('this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following string:
# this is a test:
# <iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>

providers.parse_html('<p>http://www.youtube.com/watch?v=54XHDUOHuzU</p>')

# returns the following html:
# <p><iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&amp;feature=oembed" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
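
The text-parsing example above can be illustrated without a network call. The following is a minimal stand-alone sketch of the idea, not micawber's actual implementation: `FAKE_RESPONSES`, `fake_request`, and `replace_standalone_links` are hypothetical names, and the canned metadata is a shortened stand-in for what a real provider would return. It assumes only the behavior shown above, where a URL on its own line is replaced by the provider's embed HTML.

```python
import re

# Hypothetical canned metadata; micawber itself would ask the provider
# for this information instead of reading it from a dictionary.
FAKE_RESPONSES = {
    'http://www.youtube.com/watch?v=54XHDUOHuzU': {
        'type': 'video',
        'html': '<iframe src="http://www.youtube.com/embed/54XHDUOHuzU"></iframe>',
    },
}

# Matches a string that consists of a single http(s) URL.
URL_RE = re.compile(r'https?://\S+')


def fake_request(url):
    """Return canned metadata for a known URL, or None if it is unknown."""
    return FAKE_RESPONSES.get(url)


def replace_standalone_links(text):
    """Replace each line that is solely a known URL with its embed HTML."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        data = fake_request(stripped) if URL_RE.fullmatch(stripped) else None
        # Lines without a known standalone URL are passed through unchanged.
        out.append(data['html'] if data else line)
    return '\n'.join(out)


result = replace_standalone_links(
    'this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU')
print(result)
```

In this sketch, unknown URLs and URLs embedded in a sentence are left untouched, which keeps the function safe to run over arbitrary text.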

Owner

Charles Leifer

This is a module that I created along with my friend. It's a basic web scraping module.

This tool crawls a list of websites and downloads all PDF and office documents

A simple flask application to scrape gogoanime website.

Unja is a fast & light tool for fetching known URLs from Wayback Machine

Visual scraping for Scrapy

Extract gene TSS sites from a gencode/ensembl/gencode database GTF file and export them as a BED format file.

A package designed to scrape data from Yahoo Finance.

This script is intended to crawl license information of repositories through the GitHub API.

Simple tool to scrape and download cross country ski timings and results from live.skidor.com

A Very simple free proxy list scraper.

Binance Smart Chain Contract Scraper + Contract Evaluator

Web-Scrapper using Python and Flask

Make git downloads from GitHub 1000 times faster for users in China!

Libextract: extract data from websites

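Latest optimized version of the JD.com Moutai purchase tool: JD.com flash sales, added time-offset adjustment, and an optimized Moutai purchase process queue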

Automated data scraper for Thailand COVID-19 data

Scrape and display grades onto the console

🐞 Douban Movie / Douban Book Scarpy

👨🏼‍⚖️ reddit bot that turns comment chains into ace attorney scenes

Get-web-images - Python code that gets images from any site