Exploratory data analysis

Last update: Nov 07, 2021

Related tags

Data Analysis EDA

Overview

Exploratory data analysis

An exploratory data analysis app

TAPIWA CHAMBOKO

🚀 About Me

I'm a full-stack developer experienced in deploying apps powered by artificial intelligence.

Authors

@Tapiwa chamboko

Acknowledgements

dataprofessor
Pandas Profiling in Data Science

Demo

Live demo

Click here for Live demo

Installation

Install required packages

  pip install streamlit
  pip install pycaret
  pip install scikit-learn==0.23.2
  pip install numpy
  pip install seaborn
  pip install pandas
  pip install matplotlib
  pip install plotly-express
  pip install streamlit-lottie
  pip install pandas-profiling
  pip install streamlit-pandas-profiling
  pip install openpyxl

Datasets

Drop your datasets into the app to get results.
You can also use the example data provided in the app.
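
For reference, the example data generated in the app is a 100-row, 5-column table of random values in [0, 1). Below is a minimal sketch of building an equivalent table outside the app. The helper name `make_example_data` and the fixed seed are assumptions for illustration and are not part of the app, which calls `np.random.rand` directly.

```python
import numpy as np
import pandas as pd


def make_example_data(rows=100, seed=0):
    """Build a table shaped like the app's example dataset.

    Hypothetical helper: the seed is an assumption added here so the
    values are reproducible; the app itself does not set one.
    """
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        rng.random((rows, 5)),
        columns=['a', 'b', 'c', 'd', 'e'],
    )


example_df = make_example_data()
print(example_df.shape)
print(example_df.head())
```

A table like this can be saved to an XLSX file and uploaded to try the app with known input.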

Code

import numpy as np
import pandas as pd
import streamlit as st
from pandas_profiling import ProfileReport
from streamlit_pandas_profiling import st_profile_report

def app():
    st.markdown('''
# **Exploratory data analysis App**
Please upload your XLSX file or click the button below to use the example dataset.
---
''')

    # Upload XLSX data from the sidebar
    with st.sidebar.header('Upload your XLSX data'):
        uploaded_file = st.sidebar.file_uploader("Upload your input XLSX file", type=["xlsx"])

    # Pandas Profiling Report
    if uploaded_file is not None:
        @st.cache
        def load_excel():
            # Read the uploaded workbook; openpyxl is needed for .xlsx files
            data = pd.read_excel(uploaded_file, engine='openpyxl')
            return data
        df = load_excel()
        pr = ProfileReport(df, explorative=True)
        st.header('**Input DataFrame**')
        st.write(df)
        st.write('---')
        st.header('**Exploratory data analysis Report**')
        st_profile_report(pr)
        
    else:
        st.info('Awaiting XLSX file upload.')
        
        if st.button('Press to use Example Dataset'):
            # Example data
            @st.cache
            def load_data():
                a = pd.DataFrame(
                    np.random.rand(100, 5),
                    columns=['a', 'b', 'c', 'd', 'e']
                )
                return a
            df = load_data()
            pr = ProfileReport(df, explorative=True)
            st.header('**Input DataFrame**')
            st.write(df)
            st.write('---')
            st.header('**Exploratory data analysis Report**')
            st_profile_report(pr)
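
The profile report includes, among other things, per-column types, missing values, and distinct counts. For a quick look at those basics without generating the full report, a minimal sketch in plain pandas could look like this. `quick_overview` and the sample data are hypothetical and are not part of the app.

```python
import pandas as pd


def quick_overview(df):
    """Return one row per column with its dtype, missing count, and unique count.

    Hypothetical helper for illustration; the app relies on ProfileReport.
    """
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN values are not counted as unique
    })


# Small made-up table with one missing value in column "x"
sample = pd.DataFrame({"x": [1, 2, 2, None], "y": ["a", "b", "b", "c"]})
overview = quick_overview(sample)
print(overview)
```

This covers only a small part of what the full report shows, but it can help confirm that a file was read as expected before profiling it.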

Deployment

This project is built and deployed as a Streamlit web app.

Start it with the command below:

  streamlit run app.py

Appendix

Happy Coding!!!!!!

Owner

tapiwa chamboko
