Flenser

Have you ever been handed a dataset you've never seen before?

Flenser is a simple, minimal, automated exploratory data analysis tool. It runs a set of simple tests against each column within a dataset, and outputs an HTML file noting which tests trigger for each column, alongside relevant outputs.

Flenser is intended to be run at the earliest stages of data exploration, when you have no familiarity with the dataset. It will do its best to tell you what is actually going on in the dataset, regardless of what is supposed to be going on in the dataset.

Flenser is designed to be helpful, not 'helpful': it will not attempt to modify or make assumptions about your dataset. Instead, it applies each simple test to every column and shows you outputs that let your human brain decide what is actually going on.

Additional tests can be added by modifying the Test dataclass.
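
As a rough illustration of what a dataclass-based test and the "every test against every column" loop could look like, here is a minimal sketch. It is not Flenser's actual code: the names ColumnTest, triggers, TESTS, and run_tests, and the two example tests, are assumptions made up for this example. Check the real Test dataclass in flenser.py for its actual fields before adding a test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ColumnTest:
    """Hypothetical stand-in for Flenser's Test dataclass (fields are assumed)."""
    name: str
    triggers: Callable[[List[str]], bool]  # returns True when the test fires


def has_duplicates(values: List[str]) -> bool:
    # Fires when at least one value appears more than once in the column.
    return len(set(values)) < len(values)


def has_blank(values: List[str]) -> bool:
    # Fires when any cell is empty or whitespace only.
    return any(v.strip() == "" for v in values)


# Adding a test in this sketch means appending one more entry here.
TESTS = [
    ColumnTest("has duplicates", has_duplicates),
    ColumnTest("has blank values", has_blank),
]


def run_tests(columns: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Apply every test to every column; list the names of tests that trigger."""
    return {
        col: [t.name for t in TESTS if t.triggers(values)]
        for col, values in columns.items()
    }


report = run_tests({"id": ["1", "2", "3"], "city": ["Oslo", "", "Oslo"]})
print(report)
```

Note that the sketch only reads the column values and reports what it finds; nothing is modified or cleaned, in keeping with the approach described above.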

How to run

python3 flenser.py 'filename.csv'

When run, Flenser prints its default list of NaN values. To have additional values treated as NaN, pass one or more of them after the filename:

python3 flenser.py 'filename.csv' 'nan1' 'nan2' 'nan3' ...
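
To make the command line above concrete, the following sketch shows one way extra NaN markers could be merged with a default list and then used to count missing cells. Everything here is assumed for illustration: DEFAULT_NANS is a placeholder rather than Flenser's real default list, and parse_args and count_nans are hypothetical helpers, not functions from flenser.py.

```python
from typing import List, Set, Tuple

# Placeholder defaults (an assumption); Flenser prints its real list when run.
DEFAULT_NANS: Set[str] = {"", "nan", "NaN", "NA", "null"}


def parse_args(argv: List[str]) -> Tuple[str, Set[str]]:
    """Split argv into the CSV filename and the full set of NaN markers."""
    if len(argv) < 2:
        raise SystemExit("usage: python3 flenser.py 'filename.csv' ['nan1' ...]")
    filename = argv[1]
    # Every argument after the filename is treated as an extra NaN marker.
    nan_values = DEFAULT_NANS | set(argv[2:])
    return filename, nan_values


def count_nans(values: List[str], nan_values: Set[str]) -> int:
    """Count the cells in one column that match a NaN marker exactly."""
    return sum(1 for v in values if v in nan_values)


filename, nan_values = parse_args(["flenser.py", "data.csv", "-999", "unknown"])
missing = count_nans(["12", "-999", "unknown", "", "7"], nan_values)
print(filename, missing)
```

Sentinel values such as -999 or 'unknown' are the typical reason to pass extra markers: they look like ordinary data unless the tool is told otherwise.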

With thanks to

Recurse
Kelly F
Rebecca H
Azhad S
Shivam S
Christina M
Adam K
Edith V
Justin R

Owner

John McCambridge
