Python MapReduce library written in Cython.

Related tags

Miscellaneoushadoopy
Overview
Brandyn White 
  
Andrew Miller 
  
   

Source  
   https://github.com/bwhite/hadoopy/
Issues  
   https://github.com/bwhite/hadoopy/issues
Docs    
   http://bwhite.github.com/hadoopy/

IRC: #hadoopy @ freenode.net

Requirements
python development headers (python-dev), build tools (build-essential)

Optional
cython (>=.13) (without this it falls back to the pregenerated .c files)

Features
- oozie support
- Automated job parallelization 'auto-oozie' available in the hadoopy_flow project (maintained out of branch)
- typedbytes support (very fast)
- Local execution of unmodified MapReduce job with launch_local
- Read/write sequence files of TypedBytes directly to HDFS from python (readtb, writetb)
- Works on OS X
- Allows printing to stdout and stderr in Hadoop tasks without causing problems (uses the 'pipe hopping' technique, both are available in the task's stderr)
- critical path is in Cython
- works on clusters without any extra installation, Python, or any Python libraries (uses Pyinstaller that is included in this source tree)
- Simple HDFS access (readtb and ls) inside Python, even inside running jobs
- Unit test interface
- Reporting using status and counters (and print statements! no need to be scared of them in Hadoopy)
- Supports design patterns in the Lin/Dyer book (
   http://www.umiacs.umd.edu/~jimmylin/book.html)

Limitations
- Hadoop Local currently unsupported due to a bug in Hadoop's handling of the distributed cache in this mode.  Use psuedo-distributed instead for now.  (
   https://github.com/bwhite/hadoopy/issues/40)

Used in
- A Case for Query by Image and Text Content: Searching Computer Help using Screenshots and Keywords (to appear in WWW'11)
- Web-Scale Computer Vision using MapReduce for Multimedia Data Mining (at KDD'10)
- Vitrieve: Visual Search engine
- Picarus: Hadoop computer vision toolbox

Ubuntu Install (others are similar)
sudo apt-get install python-dev build-essential
sudo python setup.py install
  
 
Owner
Brandyn White
Brandyn White
PSP (Python Starter Package) is meant for those who want to start coding in python but are new to the coding scene.

Python Starter Package PSP (Python Starter Package) is meant for those who want to start coding in python, but are new to the coding scene. We include

Giter/ 1 Nov 20, 2021
The bidirectional mapping library for Python.

bidict The bidirectional mapping library for Python. Status bidict: has been used for many years by several teams at Google, Venmo, CERN, Bank of Amer

Joshua Bronson 1.2k Dec 31, 2022
Active Transport Analytics Model: A new strategic transport modelling and data visualization framework

{ATAM} Active Transport Analytics Model Active Transport Analytics Model (“ATAM”

ATAM Analytics 2 Dec 21, 2022
A Google sheet which keeps track of the locations that want to visit and a price cutoff

FlightDeals Here's how the program works. First, I have a Google sheet which keeps track of the locations that I want to visit and a price cutoff. It

Lynne Munini 5 Nov 21, 2022
A toolkit for developing and deploying serverless Python code in AWS Lambda.

Python-lambda is a toolset for developing and deploying serverless Python code in AWS Lambda. A call for contributors With python-lambda and pytube bo

Nick Ficano 1.4k Jan 03, 2023
Simple Crud Python vs MySQL

Simple Crud Python vs MySQL The idea came when I was studying MySQ... A desire to create a python program that can give access to a "localhost" databa

Lucas 1 Jan 21, 2022
API for SpeechAnalytics integration with FreePBX/Asterisk

freepbx_speechanalytics_api API for SpeechAnalytics integration with FreePBX/Asterisk Скопировать файл settings.py.sample в settings.py и отредактиров

Iqtek, LLC 3 Nov 03, 2022
Python MQTT v5.0 async client

gmqtt: Python async MQTT client implementation. Installation The latest stable version is available in the Python Package Index (PyPi) and can be inst

Gurtam 306 Jan 03, 2023
Blender addon - Breakdown in object mode

Breakdowner Breakdown in object mode Download latest Demo Youtube Description Same breakdown shortcut as in armature mode in object mode Currently onl

Samuel Bernou 4 Mar 30, 2022
Python script for diving image data to train test and val

dataset-division-to-train-val-test-python python script for dividing image data to train test and val If you have an image dataset in the following st

Muhammad Zeeshan 1 Nov 14, 2022
PyDy, short for Python Dynamics, is a tool kit written in the Python

PyDy, short for Python Dynamics, is a tool kit written in the Python programming language that utilizes an array of scientific programs to enable the study of multibody dynamics. The goal is to have

PyDy 307 Jan 01, 2023
3x - This Is 3x Friendlist Cloner Tools

3X FRIENDLIST CLONER TOOLS COMMAND $ apt update $ apt upgrade $ apt install pyth

MAHADI HASAN AFRIDI 2 Jan 17, 2022
《赛马娘》(ウマ娘: Pretty Derby)辅助 🐎🖥 基于 auto-derby 可视化操作/设置 启动器 一键包

ok-derby 《赛马娘》(ウマ娘: Pretty Derby)辅助 🐎 🖥 基于 auto-derby 可视化操作/设置 启动器 一键包 便捷,好用的 auto_derby 管理器! 功能 支持客户端 DMM (前台) 实验性 安卓 ADB 连接(后台)开发基于 1080x1920 分辨率

秋葉あんず 90 Jan 01, 2023
This is the DBMS Project done in 5th sem of B.E CS.

Student-Result-Management-System This is the DBMS Project done in 5th sem of B.E CS. You need to install SQlite DB Browser in your pc or laptop to ope

Vivek kulkarni 1 Jan 14, 2022
An OrpheusDL Tidal module

OrpheusDL - Tidal A Tidal module for the OrpheusDL modular archival music program Report Bug · Request Feature Table of content About OrpheusDL - Tida

Daniel 54 Dec 29, 2022
The purpose is to have a fairly simple python assignment that introduces the basic features and tools of python

This repository contains the code for the python introduction lab. The purpose is to have a fairly simple python assignment that introduces the basic

1 Jan 24, 2022
Svg-turtle - Use the Python turtle to write SVG files

SaVaGe Turtle Use the Python turtle to write SVG files If you're using the Pytho

Don Kirkby 7 Dec 21, 2022
Bookmarkarchiver - Python script that archives all of your bookmarks on the Internet Archive

bookmarkarchiver Python script that archives all of your bookmarks on the Internet Archive. Supports all major browsers. bookmarkarchiver uses the off

Anthony Chen 3 Oct 09, 2022
Analisador de strings feito em Python // String parser made in Python

Este é um analisador feito em Python, neste programa, estou estudando funções e a sua junção com "if's" e dados colocados pelo usuário. Neste código,

Dev Nasser 1 Nov 03, 2021
Translation patch for Hololive ERROR

Translation patch for Hololive ERROR How do I install the patch? Grab the Translation.zip file for the latest version from the releases page, and unzi

18 Jul 20, 2022