Active Learning demo using two small datasets

Last update: Nov 10, 2021

Overview

ActiveLearningDemo

How to run

Step one

Add the dataset folder, then use the command below to split the dataset into the required structure:

run utils.py

Each dataset should include six .mat files: TrainingMatrix.mat, TrainingLabels.mat, TestingMatrix.mat, TestingLabels.mat, UnlabeledMatrix.mat and UnlabeledLabels.mat.
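Before training, it can help to confirm that the split produced all six files. The sketch below is not part of the repository: `missing_files` and `find_cases` are hypothetical helper names, and it assumes that a dataset with several subsets stores each subset as a sub-folder holding its own six files.

```python
from pathlib import Path

# The six file names listed above.
REQUIRED_FILES = (
    "TrainingMatrix.mat", "TrainingLabels.mat",
    "TestingMatrix.mat", "TestingLabels.mat",
    "UnlabeledMatrix.mat", "UnlabeledLabels.mat",
)


def missing_files(case_dir):
    """Return the required .mat file names that are absent from case_dir."""
    case_dir = Path(case_dir)
    return [name for name in REQUIRED_FILES if not (case_dir / name).is_file()]


def find_cases(dataset_dir):
    """Hypothetical helper: list the folders that hold a complete set of files.

    Assumption: a dataset is either a single case (the six files sit directly
    in dataset_dir) or a folder of sub-folders, one per subset.
    """
    dataset_dir = Path(dataset_dir)
    if not missing_files(dataset_dir):
        return [dataset_dir]
    return sorted(
        p for p in dataset_dir.iterdir()
        if p.is_dir() and not missing_files(p)
    )
```

Calling `missing_files("./datasets/MMI")` on a single case would return an empty list when the folder is complete, or the names that still need to be generated.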

Step two

Train the model. You can set the following arguments:

Active learning

optional arguments:
  -h, --help            show this help message and exit
  --src SRC             dataset path
  --dst DST             destination path
  --type TYPE           sample strategy:random, entropy, combine
  --solver SOLVER       model solver
  --max_iter MAX_ITER   max iteration of each training
  --k K                 samples added for each iteration
  --n N                 number of iterations
  --plot_type PLOT_TYPE
                        plot single for one case(single) or plot average for
                        entire database(average)
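For readers who want to see how such a help message might be produced, here is a minimal argparse sketch reconstructed from the listing above. It is an approximation, not the code in main.py: the integer types, the `choices` lists, and the absence of defaults for most options are assumptions. Only the "newton-cg" and "combine" defaults come from this README.

```python
import argparse


def build_parser():
    """Approximate reconstruction of the command-line interface shown above."""
    parser = argparse.ArgumentParser(description="Active learning")
    parser.add_argument("--src", help="dataset path")
    parser.add_argument("--dst", help="destination path")
    # Defaults for --type and --solver follow the README; choices are assumed.
    parser.add_argument("--type", default="combine", metavar="TYPE",
                        choices=["random", "entropy", "combine"],
                        help="sample strategy: random, entropy, combine")
    parser.add_argument("--solver", default="newton-cg", help="model solver")
    # Integer types below are assumptions; the help text does not state them.
    parser.add_argument("--max_iter", type=int,
                        help="max iteration of each training")
    parser.add_argument("--k", type=int,
                        help="samples added for each iteration")
    parser.add_argument("--n", type=int, help="number of iterations")
    parser.add_argument("--plot_type", metavar="PLOT_TYPE",
                        choices=["single", "average"],
                        help="plot single for one case (single) or plot "
                             "average for entire database (average)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```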

You can use either a dataset that contains multiple subsets or a single case of a dataset that holds only the six .mat files. By default, I use the "newton-cg" solver and the "combine" type, which trains the model with both sampling strategies (random and entropy) in one run. To get results on a dataset directly, use:

python main.py --src <your dataset path, e.g. ./datasets/MMI> --dst <output path, e.g. ./img>
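To make the two sampling strategies concrete, here is a small standalone sketch of the query step. It does not reproduce the repository's code. It assumes that "entropy" means picking the k unlabeled samples whose predicted class probabilities have the highest Shannon entropy, and that "random" means picking k samples uniformly. The function names are hypothetical.

```python
import math
import random


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_entropy(pool_probs, k):
    """Indices of the k pool samples with the highest predictive entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


def select_random(pool_size, k, seed=None):
    """Indices of k pool samples drawn uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), min(k, pool_size))


# Predicted probabilities for four unlabeled samples (made-up values).
pool = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3], [1.0, 0.0]]
print(select_entropy(pool, 2))  # the two most uncertain samples: [1, 2]
```

In a full loop, the selected samples would be moved from the unlabeled pool into the training set, the model retrained, and the test accuracy recorded, repeated for the number of iterations given by --n.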

Result

MMI dataset

Using the "lbfgs" solver: (plot not included)

Using the "newton-cg" solver: (plot not included)

MindReading dataset

Using the "lbfgs" solver: (plot not included)

Using the "newton-cg" solver: (plot not included)
