A naive Bayes model for cancer classification using a set of documents

Last update: Nov 24, 2021

Related tags

Machine Learning naivebayes

Overview

Naive Bayes text classification model for cancer and non-cancer documents

Author: Alex King

Purpose
Requirements/files used
How to use

1. Purpose

The purpose of this program is to read CSV files containing two columns:

                    Document | classification
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer
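As a rough illustration of the layout above, here is a minimal sketch of loading such a file with pandas. The sample rows, the header names, the comma delimiter, and the `load_documents` helper are all assumptions made for this example; the real files may differ.

```python
import io

import pandas as pd

# Hypothetical file contents; the real files and their header names may differ.
SAMPLE_CSV = io.StringIO(
    "Document,classification\n"
    "tumor cells were found in the biopsy,cancer\n"
    "the patient reported a mild headache,nocancer\n"
)


def load_documents(source):
    """Return a list of (document_text, label) pairs from a two-column CSV."""
    frame = pd.read_csv(source)
    # Select columns by position so the sketch does not depend on header names.
    texts = frame.iloc[:, 0].astype(str)
    labels = frame.iloc[:, 1].astype(str).str.strip()
    return list(zip(texts, labels))


documents = load_documents(SAMPLE_CSV)
print(documents[0])
```

`load_documents` also accepts a file path, since `pd.read_csv` takes either a path or a file-like object.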

The program reads the data into classes that hold each document. One file is used as the training set and the other as the testing set. Both sets go through the same tokenization. The model is then trained on the training set and evaluated on the testing set.
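The train-then-test flow can be sketched as follows. This is not the code in naivesbayes.py; it is a minimal multinomial naive Bayes written for illustration, and it assumes add-one (Laplace) smoothing and that test words never seen in training are skipped. The function names `train` and `predict` and the toy documents are hypothetical.

```python
from collections import Counter

import numpy as np


def train(labeled_docs):
    """Build a model from a list of (tokens, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    for tokens, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
    vocab = {word for counts in word_counts.values() for word in counts}
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -np.inf
    for label, counts in model["word_counts"].items():
        # Log prior plus summed log likelihoods avoids floating-point underflow.
        score = np.log(model["doc_counts"][label] / total_docs)
        denominator = sum(counts.values()) + vocab_size
        for token in tokens:
            if token in model["vocab"]:  # assumption: ignore unseen words
                score += np.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented for this example.
training_set = [
    ("tumor biopsy malignant".split(), "cancer"),
    ("tumor growth chemotherapy".split(), "cancer"),
    ("headache rest fluids".split(), "nocancer"),
    ("mild cold rest".split(), "nocancer"),
]
model = train(training_set)
print(predict(model, "malignant tumor".split()))
```

Working in log space is the usual reason a naive Bayes implementation needs a logarithm function: multiplying many small probabilities directly tends to underflow to zero.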

2. Requirements/files used

* python3
* numpy library - for calculating logarithms
* pandas library - for reading in csv files
* main.py and naivesbayes.py
* stopwords.txt - list of stop words
* Scoring.docx - list of scores for precision, recall, and F-score
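For readers unfamiliar with the three scores reported in Scoring.docx, the sketch below shows how precision, recall, and F-score are commonly computed from predicted and true labels. Treating `cancer` as the positive class is an assumption, and the `score` helper and sample labels are hypothetical; the project may compute its scores differently.

```python
def score(predicted, actual, positive="cancer"):
    """Return (precision, recall, f_score) for the positive class."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p == positive and a == positive for p, a in pairs)
    false_pos = sum(p == positive and a != positive for p, a in pairs)
    false_neg = sum(p != positive and a == positive for p, a in pairs)

    # Guard each ratio against an empty denominator.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Invented labels: one true positive, one false positive, one false negative.
predicted = ["cancer", "cancer", "nocancer", "nocancer"]
actual = ["cancer", "nocancer", "cancer", "nocancer"]
print(score(predicted, actual))
```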

3. How to use

This program has 3 modes of operation for tokenizing your sets: standard, binary, and stopwords.

                $ python3 main.py -train 1 -test 1

This first command runs standard tokenization on training set 1 and test set 1. To use training set 2, change the 1 after -train to a 2:

                $ python3 main.py -train 2 -test 1

NOTE: Do not change the testing set number. Leave it as 1; the option was intended for multiple testing sets.

For binary (replace # with the training set number):

                $ python3 main.py -train # -test 1 -b

For stopwords:

                $ python3 main.py -train # -test 1 -s

For both stopwords and binary:

                $ python3 main.py -train # -test 1 -b -s
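The sketch below shows one way the flags above could map onto tokenization options. It is not taken from main.py. It assumes that `-b` means each word is counted at most once per document (binarized naive Bayes) and that `-s` filters out the words listed in stopwords.txt. The `tokenize` and `build_parser` helpers, the regular expression, and the tiny stopword set are hypothetical.

```python
import argparse
import re


def tokenize(text, binary=False, stopwords=None):
    """Lowercase the text, split it into word tokens, and apply the options."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if stopwords:
        tokens = [token for token in tokens if token not in stopwords]
    if binary:
        # Keep each word once per document, preserving first-seen order.
        tokens = list(dict.fromkeys(tokens))
    return tokens


def build_parser():
    """Mirror the command-line flags described in this section."""
    parser = argparse.ArgumentParser(description="Naive Bayes document classifier")
    parser.add_argument("-train", type=int, default=1, help="training set number")
    parser.add_argument("-test", type=int, default=1, help="testing set number")
    parser.add_argument("-b", action="store_true", help="binary tokenization")
    parser.add_argument("-s", action="store_true", help="filter stop words")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["-train", "2", "-test", "1", "-b", "-s"])
stop_words = {"the", "a", "of"}  # stand-in for the contents of stopwords.txt
tokens = tokenize(
    "The tumor and the biopsy of the tumor",
    binary=args.b,
    stopwords=stop_words if args.s else None,
)
print(tokens)
```

Filtering stop words before deduplicating keeps the two options independent, so either flag can be used alone or together.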

Owner

Alex W King
