Intake is a lightweight package for finding, investigating, loading and disseminating data.

Last update: Jan 01, 2023

Overview

Intake: A general interface for loading data

Intake is a lightweight set of tools for loading and sharing data in data science projects. Intake helps you:

Load data from a variety of formats (see the current list of known plugins) into containers you already know, like Pandas dataframes, Python lists, NumPy arrays, and more.
Convert boilerplate data loading code into reusable Intake plugins.
Describe data sets in catalog files for easy reuse and sharing between projects and with others.
Share catalog information (and data sets) over the network with the Intake server.
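
As a rough illustration of the catalog workflow described above, here is a minimal sketch. It assumes the v1-style Intake API (intake.open_catalog and the csv driver, which also needs the [dataframe] extra) and may not apply unchanged to newer releases. The file names, the entry name demo_table and both helper functions are hypothetical and only serve the example.

```python
from pathlib import Path

# Catalog text in the v1 YAML layout. The entry name "demo_table" and the
# file "demo.csv" are made up for this example. {{ CATALOG_DIR }} is
# expanded by Intake to the directory that contains the catalog file.
CATALOG_TEMPLATE = """\
sources:
  demo_table:
    description: Three-row demo table (hypothetical example data)
    driver: csv
    args:
      urlpath: "{{ CATALOG_DIR }}/demo.csv"
"""


def write_demo_catalog(directory):
    """Write demo.csv and catalog.yml into `directory`; return the catalog path."""
    directory = Path(directory)
    (directory / "demo.csv").write_text("name,value\na,1\nb,2\nc,3\n")
    catalog_path = directory / "catalog.yml"
    catalog_path.write_text(CATALOG_TEMPLATE)
    return catalog_path


def load_demo_table(catalog_path):
    """Open the catalog with Intake and read the entry into a Pandas dataframe.

    Assumes Intake (v1-style API) and its dataframe dependencies are installed.
    """
    import intake  # imported lazily so write_demo_catalog works without Intake

    cat = intake.open_catalog(str(catalog_path))
    return cat.demo_table.read()
```

With Intake installed, calling load_demo_table(write_demo_catalog(some_directory)) should return the demo rows as a dataframe. The catalog file can then be shared with the data, so other projects can load the same entry by name instead of repeating the loading code.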

Documentation is available at Read the Docs.

The status of Intake and related packages is available at the Status Dashboard.

Weekly news about this repo and other related projects can be found on the wiki.

Install

Recommended method using conda:

conda install -c conda-forge intake

You can also install with pip, in which case you choose how many of the optional dependencies to install. The simplest option has the fewest requirements:

pip install intake

Optional extras are available as [server], [plot] and [dataframe]. To include everything:

pip install intake[complete]

Note that you may well need specific drivers and other plugins, which usually have additional dependencies of their own.

Development

Create a development Python environment with the required dependencies, ideally with conda. The requirements can be found in the yml files in the scripts/ci/ directory of this repo.
- For example: conda env create -f scripts/ci/environment-py38.yml, then conda activate test_env
Install Intake in editable mode: pip install -e .[complete]
Use pytest to run the tests.
Create a fork on GitHub to be able to submit PRs.
We respect, but do not enforce, PEP 8 standards; all new code should be covered by tests.

