pandas, scikit-learn, xgboost and seaborn integration

Overview

pandas-ml

pandas, scikit-learn and xgboost integration.

Installation

$ pip install pandas_ml

Documentation

http://pandas-ml.readthedocs.org/en/stable/

Example

>>> import pandas_ml as pdml
>>> import sklearn.datasets as datasets

# create ModelFrame instance from sklearn.datasets
>>> df = pdml.ModelFrame(datasets.load_digits())
>>> type(df)
<class 'pandas_ml.core.frame.ModelFrame'>

# binarize data (features), not touching target
>>> df.data = df.data.preprocessing.binarize()
>>> df.head()
   .target  0  1  2  3  4  5  6  7  8 ...  54  55  56  57  58  59  60  61  62  63
0        0  0  0  1  1  1  1  0  0  0 ...   0   0   0   0   1   1   1   0   0   0
1        1  0  0  0  1  1  1  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
2        2  0  0  0  1  1  1  0  0  0 ...   1   0   0   0   0   1   1   1   1   0
3        3  0  0  1  1  1  1  0  0  0 ...   1   0   0   0   1   1   1   1   0   0
4        4  0  0  0  1  1  0  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
[5 rows x 65 columns]

# split into training and test data
>>> train_df, test_df = df.model_selection.train_test_split()

# create estimator (accessor is mapped to sklearn namespace)
>>> estimator = df.svm.LinearSVC()

# fit to training data
>>> train_df.fit(estimator)

# predict test data
>>> test_df.predict(estimator)
0     4
1     2
2     7
...
448    5
449    8
Length: 450, dtype: int64

# Evaluate the result
>>> test_df.metrics.confusion_matrix()
Predicted   0   1   2   3   4   5   6   7   8   9
Target
0          52   0   0   0   0   0   0   0   0   0
1           0  37   1   0   0   1   0   0   3   3
2           0   2  48   1   0   0   0   1   1   0
3           1   1   0  44   0   1   0   0   3   1
4           1   0   0   0  43   0   1   0   0   0
5           0   1   0   0   0  39   0   0   0   0
6           0   1   0   0   1   0  35   0   0   0
7           0   0   0   0   2   0   0  42   1   0
8           0   2   1   0   1   0   0   0  33   1
9           0   2   1   2   0   0   0   0   1  38
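The 450 predictions above are consistent with scikit-learn's default test split of 25% of the 1,797 digits samples, rounded up. As a rough, plain-Python illustration of what the split and confusion_matrix steps compute (not pandas-ml's implementation; the function names are hypothetical):

```python
import math
import random


def train_test_split_indices(n_samples, test_size=0.25, seed=0):
    """Shuffle row indices and return (train, test) index lists.

    The test part is rounded up, mirroring how scikit-learn sizes it
    when test_size is a float.
    """
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs.

    Rows are true labels ("Target") and columns are predictions
    ("Predicted"), matching the layout of the table above.
    """
    labels = sorted(set(y_true) | set(y_pred))
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[position[truth]][position[guess]] += 1
    return labels, matrix


if __name__ == "__main__":
    train_idx, test_idx = train_test_split_indices(1797)
    print(len(train_idx), len(test_idx))  # 1347 450
    print(confusion_matrix([0, 1, 2, 2], [0, 2, 2, 2]))
```

Each row of the matrix sums to the number of test samples with that true label, so the whole table sums to the size of the test set.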

Supported Packages

  • scikit-learn
  • patsy
  • xgboost

Comments
  • Fixed imports of deprecated modules which were removed in pandas 0.24.0

    Certain functions were deprecated in a previous version of pandas and moved to a different module (see #117). This PR fixes the imports of those functions.

    opened by kristofve 8
  • REL: v0.4.0

    • [x] Compat/test for sklearn 0.18.0 (#81)
      • [x] initial fix (#81)
      • [x] wrapper for cross validation classes (re-enable skipped tests) (#85)
      • [x] tests for multioutput (#86)
      • [x] Update doc
    • [x] Compat/test for pandas 0.19.0 (#83)
    • [x] Update release note (#88)
    opened by sinhrks 4
  • Importation error

    I tried to import pandas_ml, but it gave the error:

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    I'm running Python 3.8.1 and I installed pandas_ml via pip (version 20.0.2).

    I dug into the code; the error is on line 80 of series.py:

    @Appender(pd.core.generic.NDFrame.groupby.__doc__)

    Here pandas is imported at the top of the file with a classic import pandas as pd

    I guess there is a problem with the versions...

    Thanks in advance for any help

    opened by ierezell 2
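The traceback shows that the installed pandas no longer has a groupby attribute on NDFrame, and the decorator argument is evaluated while the class body runs, so the whole import fails. A minimal sketch of a more tolerant pattern, assuming it is acceptable to take the docstring from whichever class still defines the method (append_doc_from is a hypothetical helper, not part of pandas or pandas-ml):

```python
def append_doc_from(attr, *candidates):
    """Return a decorator that appends the docstring of ``attr`` taken from
    the first candidate class that still defines it.

    If no candidate defines it, the decorated function keeps its own
    docstring instead of the import failing with AttributeError.
    """
    doc = ""
    for owner in candidates:
        source = getattr(owner, attr, None)
        if source is not None and source.__doc__:
            doc = source.__doc__
            break

    def decorator(func):
        func.__doc__ = (func.__doc__ or "") + doc
        return func

    return decorator


class OldBase:  # stands in for a class that no longer defines groupby
    pass


class NewBase:
    def groupby(self):
        """Group rows by a key."""


class Wrapper(NewBase):
    @append_doc_from("groupby", OldBase, NewBase)
    def groupby(self):
        return super().groupby()
```

With pandas, the candidates could be (pd.core.generic.NDFrame, pd.Series), since Series defines groupby itself; this is an untested suggestion, not the project's own fix.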
  • Confusion Matrix not accessible

    Hi,

    I've been using confusion_matrix since it was an independent package. I've installed pandas_ml to continue using the package, but it seems that the setup.py script does not install the package.

    Could it be an issue with the find_packages function?

    opened by mmartinortiz 2
  • Seaborn Scatterplot matrix / pairplot integration

    import seaborn as sns
    sns.set()
    
    df = sns.load_dataset("iris")
    sns.pairplot(df, hue="species")
    

    displays

    [image: iris_scatter_matrix]

    but pairplot doesn't behave the same way with a ModelFrame:

    import pandas as pd
    pd.set_option('display.max_rows', 10)
    import sklearn.datasets as datasets
    import pandas_ml as pdml  # https://github.com/pandas-ml/pandas-ml
    import seaborn as sns
    import matplotlib.pyplot as plt
    df = pdml.ModelFrame(datasets.load_iris())
    sns.pairplot(df, hue=".target")
    

    [image: iris_modelframe]

    There are some useless subplots.

    opened by scls19fr 2
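One workaround worth trying, assuming the extra panels come from the .target column being drawn as a plot variable, is to pass seaborn's vars argument explicitly so that only the feature columns are plotted. The helper name feature_columns is hypothetical:

```python
def feature_columns(columns, target=".target"):
    """Return the column names to plot, leaving out the target column.

    ".target" is the target column name ModelFrame shows in the README
    example; adjust it if your frame uses a different name.
    """
    return [name for name in columns if name != target]
```

Usage with the ModelFrame from the report: sns.pairplot(df, hue=".target", vars=feature_columns(df.columns)).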
  • Error while running train.py from speech commands in tensorflow examples.

    Have the following error:

    File "train.py", line 27, in <module>
        from callbacks import ConfusionMatrixCallback
    File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
        from pandas_ml import ConfusionMatrix
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
        from pandas_ml.core import ModelFrame, ModelSeries  # noqa
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
        from pandas_ml.core.frame import ModelFrame  # noqa
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
        from pandas_ml.core.series import ModelSeries
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
        class ModelSeries(ModelTransformer, pd.Series):
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
        @Appender(pd.core.generic.NDFrame.groupby.__doc__)
    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    Happening with both version 5 and 6.1

    opened by ayush7 1
  • error for example https://pandas-ml.readthedocs.io/en/latest/xgboost.html

    Code from the example https://pandas-ml.readthedocs.io/en/latest/xgboost.html:

    import pandas_ml as pdml
    import sklearn.datasets as datasets
    df = pdml.ModelFrame(datasets.load_digits())
    train_df, test_df = df.cross_validation.train_test_split()
    estimator = df.xgboost.XGBClassifier()
    train_df.fit(estimator)
    predicted = test_df.predict(estimator)
    q=1
    test_df.metrics.confusion_matrix()
    train_df.xgboost.plot_importance()

    tuned_parameters = [{'max_depth': [3, 4]}]
    cv = df.grid_search.GridSearchCV(df.xgb.XGBClassifier(), tuned_parameters, cv=5)

    df.fit(cv)
    df.grid_search.describe(cv)
    q=1

    gives the error:

    File "E:\Pandas\my_code\S_pandas_ml_feb27.py", line 10, in <module>
        train_df.xgboost.plot_importance()
    File "C:\Users\sndr\Anaconda3\Lib\site-packages\pandas_ml\xgboost\base.py", line 61, in plot_importance
        return xgb.plot_importance(self._df.estimator.booster(),

    builtins.TypeError: 'str' object is not callable

    I use Windows and Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)].

    opened by Sandy4321 1
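In the xgboost scikit-learn wrapper, booster is a constructor parameter holding a string such as "gbtree", and the fitted model is returned by get_booster(); earlier xgboost releases exposed a booster() method instead, which is what pandas_ml/xgboost/base.py calls. A sketch of a version-tolerant lookup (get_booster here is a hypothetical helper wrapping whichever method exists):

```python
def get_booster(estimator):
    """Return the underlying booster of an xgboost sklearn-style estimator.

    Prefer get_booster(); fall back to a legacy booster() method only when
    ``booster`` is actually callable (in newer versions it is a string).
    """
    getter = getattr(estimator, "get_booster", None)
    if callable(getter):
        return getter()
    legacy = getattr(estimator, "booster", None)
    if callable(legacy):
        return legacy()
    raise AttributeError("estimator exposes neither get_booster() nor booster()")
```

The result can then be passed on, for example xgb.plot_importance(get_booster(estimator)).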
  • pandas 0.24.0 has deprecated pandas.util.decorators

    See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#deprecations

    This causes the import statement in https://github.com/pandas-ml/pandas-ml/blob/master/pandas_ml/core/frame.py to break.

    It looks like the import just needs to be changed to 'from pandas.utils'.

    opened by usul83 1
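A common way to cope with functions moving between modules is to try the new location first and fall back to the old one. A minimal sketch; for pandas, the Appender decorator lives in the private pandas.util._decorators module, so a call like import_first("pandas.util._decorators", "pandas.util.decorators") is one option, with the caveat that private modules can move again (import_first is a hypothetical helper):

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in ``module_names`` that exists."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # module missing in this version
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))
```

Trying the newest location first keeps the fallback path from triggering deprecation warnings on current versions.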
  • 'mean_absoloute_error

    from sklearn import metrics
    print('MAE:', metrics.mean_absoloute_error(y_test, y_pred))

    This raises "module 'sklearn.metrics' has no attribute 'mean_absoloute_error'". Is there any solution?

    opened by vikramk1507 0
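The AttributeError comes from the misspelling: the scikit-learn function is sklearn.metrics.mean_absolute_error, so the call should read metrics.mean_absolute_error(y_test, y_pred). For reference, the metric is the average absolute difference between targets and predictions; a plain-Python sketch:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_true[i] - y_pred[i]| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```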
  • AttributeError: type object 'NDFrame' has no attribute 'groupby'

    from pandas_ml import ConfusionMatrix
    cm = ConfusionMatrix(actu, pred)
    cm.print_stats()

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    ----> 1 from pandas_ml import confusion_matrix
          2
          3 cm = ConfusionMatrix(actu, pred)
          4 cm.print_stats()

    /usr/local/lib/python3.8/site-packages/pandas_ml/__init__.py in <module>
          1 #!/usr/bin/env python
          2
    ----> 3 from pandas_ml.core import ModelFrame, ModelSeries  # noqa
          4 from pandas_ml.tools import info  # noqa
          5 from pandas_ml.version import version as __version__  # noqa

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/__init__.py in <module>
          1 #!/usr/bin/env python
          2
    ----> 3 from pandas_ml.core.frame import ModelFrame  # noqa
          4 from pandas_ml.core.series import ModelSeries  # noqa

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/frame.py in <module>
         16 from pandas_ml.core.accessor import _AccessorMethods
         17 from pandas_ml.core.generic import ModelPredictor, _shared_docs
    ---> 18 from pandas_ml.core.series import ModelSeries
         19
         20

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in <module>
          9
         10
    ---> 11 class ModelSeries(ModelTransformer, pd.Series):
         12     """
         13     Wrapper for pandas.Series to support sklearn.preprocessing

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in ModelSeries()
         78         return df
         79
    ---> 80     @Appender(pd.core.generic.NDFrame.groupby.__doc__)
         81     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
         82                 group_keys=True, squeeze=False):

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    opened by gfranco008 5
  • AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score'

    I am using scikit-learn version 0.23.1 and I get the following error: AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' when calling the function ConfusionMatrix.

    opened by petraknovak 11
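jaccard_similarity_score was deprecated in scikit-learn 0.21 and removed in 0.23, which matches this report; jaccard_score replaced it. Its documentation noted that for binary and multiclass (1-D) labels it was equivalent to accuracy_score, whereas jaccard_score computes a per-class intersection over union, so swapping one name for the other changes the numbers. A sketch of the old 1-D behaviour, assuming plain label lists (the function name is hypothetical):

```python
def jaccard_similarity_score_1d(y_true, y_pred, normalize=True):
    """Reproduce the removed jaccard_similarity_score for 1-D labels.

    For binary/multiclass labels it reduced to accuracy: the fraction
    (or the count, if normalize=False) of exact matches.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true) if normalize else matches
```

For multilabel input the old function behaved differently, so this sketch covers only the 1-D case.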
  • Error while running train.py from speech commands in tensorflow examples. AttributeError: type object 'NDFrame' has no attribute 'groupby'

    Have the following error:

    File "train.py", line 27, in <module>
        from callbacks import ConfusionMatrixCallback
    File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
        from pandas_ml import ConfusionMatrix
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
        from pandas_ml.core import ModelFrame, ModelSeries  # noqa
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
        from pandas_ml.core.frame import ModelFrame  # noqa
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
        from pandas_ml.core.series import ModelSeries
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
        class ModelSeries(ModelTransformer, pd.Series):
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
        @Appender(pd.core.generic.NDFrame.groupby.__doc__)
    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    Happening with both version 5 and 6.1

    opened by ayush7 3
  • Pandas 1.0.0rc0/0.6.1 module 'sklearn.preprocessing' has no attribute 'Imputer'

    From the scikit-learn documentation:

    sklearn.preprocessing.Imputer (Warning: DEPRECATED)

    class sklearn.preprocessing.Imputer(*args, **kwargs): Imputation transformer for completing missing values.

    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-1-e0471065d85c> in <module>
          1 import pandas as pd
          2 import numpy as np
    ----> 3 import pandas_ml as pdml
          4 a1 = np.random.randint(0,2,size=(100,2))
          5 df = pd.DataFrame(a1,columns=['i1','i2'])
    
    C:\g\test\lib\pandas_ml\__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core import ModelFrame, ModelSeries       # noqa
          4 from pandas_ml.tools import info                         # noqa
          5 from pandas_ml.version import version as __version__     # noqa
    
    C:\g\test\lib\pandas_ml\core\__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core.frame import ModelFrame       # noqa
          4 from pandas_ml.core.series import ModelSeries     # noqa
    
    C:\g\test\lib\pandas_ml\core\frame.py in <module>
          8 
          9 import pandas_ml.imbaccessors as imbaccessors
    ---> 10 import pandas_ml.skaccessors as skaccessors
         11 import pandas_ml.smaccessors as smaccessors
         12 import pandas_ml.snsaccessors as snsaccessors
    
    C:\g\test\lib\pandas_ml\skaccessors\__init__.py in <module>
         17 from pandas_ml.skaccessors.neighbors import NeighborsMethods                      # noqa
         18 from pandas_ml.skaccessors.pipeline import PipelineMethods                        # noqa
    ---> 19 from pandas_ml.skaccessors.preprocessing import PreprocessingMethods              # noqa
         20 from pandas_ml.skaccessors.svm import SVMMethods                                  # noqa
    
    C:\g\test\lib\pandas_ml\skaccessors\preprocessing.py in <module>
         11     _keep_col_classes = [pp.Binarizer,
         12                          pp.FunctionTransformer,
    ---> 13                          pp.Imputer,
         14                          pp.KernelCenterer,
         15                          pp.LabelEncoder,
    
    AttributeError: module 'sklearn.preprocessing' has no attribute 'Imputer'
    
    opened by apiszcz 11
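sklearn.preprocessing.Imputer was deprecated in scikit-learn 0.20 and removed in 0.22; its replacement is sklearn.impute.SimpleImputer, which with the default strategy="mean" fills each missing entry with the mean of its column. A plain-Python sketch of that behaviour, using None as the missing marker (an assumption; SimpleImputer treats np.nan as missing by default) and a hypothetical function name:

```python
def impute_column_means(rows):
    """Replace None in each column with the mean of that column's known values."""
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        known = [row[col] for row in rows if row[col] is not None]
        if not known:
            # SimpleImputer drops such columns; this sketch refuses instead.
            raise ValueError(f"column {col} has no observed values")
        means.append(sum(known) / len(known))
    return [
        [means[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]
```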
Releases (v0.6.1)