MiniSom is a minimalistic implementation of the Self Organizing Maps

Overview

MiniSom

Self Organizing Maps

MiniSom is a minimalistic and Numpy based implementation of the Self Organizing Maps (SOM). SOM is a type of Artificial Neural Network able to convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. Minisom is designed to allow researchers to easily build on top of it and to give students the ability to quickly grasp its details.

Updates about MiniSom are posted on Twitter.

Installation

Just use pip:

pip install minisom

or download MiniSom to a directory of your choice and use the setup script:

git clone https://github.com/JustGlowing/minisom.git
python setup.py install

How to use it

In order to use MiniSom you need your data organized as a Numpy matrix where each row corresponds to an observation or as list of lists like the following:

data = [[ 0.80,  0.55,  0.22,  0.03],
        [ 0.82,  0.50,  0.23,  0.03],
        [ 0.80,  0.54,  0.22,  0.03],
        [ 0.80,  0.53,  0.26,  0.03],
        [ 0.79,  0.56,  0.22,  0.03],
        [ 0.75,  0.60,  0.25,  0.03],
        [ 0.77,  0.59,  0.22,  0.03]]      

Then you can train MiniSom just as follows:

from minisom import MiniSom    
som = MiniSom(6, 6, 4, sigma=0.3, learning_rate=0.5) # initialization of 6x6 SOM
som.train(data, 100) # trains the SOM with 100 iterations

You can obtain the position of the winning neuron on the map for a given sample as follows:

som.winner(data[0])

For an overview of all the features implemented in minisom you can browse the following examples: https://github.com/JustGlowing/minisom/tree/master/examples

Export a SOM and load it again

A model can be saved using pickle as follows

import pickle
som = MiniSom(7, 7, 4)

# ...train the som here

# saving the som in the file som.p
with open('som.p', 'wb') as outfile:
    pickle.dump(som, outfile)

and can be loaded as follows

with open('som.p', 'rb') as infile:
    som = pickle.load(infile)

Note that if a lambda function is used to define the decay factor MiniSom will not be pickable anymore.

Explore parameters

You can use this dashboard to explore the effect of the parameters on a sample dataset: https://share.streamlit.io/justglowing/minisom/dashboard/dashboard.py

Examples

Here are some of the charts you'll see how to generate in the examples:

Seeds map Class assignment
Handwritteng digits mapping Hexagonal Topology som hexagonal toplogy
Color quantization Outliers detection

Other tutorials

How to cite MiniSom

@misc{vettigliminisom,
  title={MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map},
  author={Giuseppe Vettigli},
  year={2018},
  url={https://github.com/JustGlowing/minisom/},
}

Who uses Minisom?

Guidelines to contribute

  1. In the description of your Pull Request explain clearly what does it implements/fixes and your changes. Possibly give an example in the description of the PR. In cases that the PR is about a code speedup, report a reproducible example and quantify the speedup.
  2. Give your pull request a helpful title that summarises what your contribution does.
  3. Write unit tests for your code and make sure the existing tests are up to date. pytest can be used for this:
pytest minisom.py
  1. Make sure that there a no stylistic issues using pycodestyle:
pycodestyle minisom.py
  1. Make sure your code is properly commented and documented. Each public method needs to be documented as the existing ones.
Comments
  • Introducing possibility to train the SOM so that learning_rate and sigma are constant during one epoch.

    Introducing possibility to train the SOM so that learning_rate and sigma are constant during one epoch.

    This pull request introduces the possibility to train the SOM so that learning_rate and sigma are only being decreased after each epoch. During one epoch the SOM is updated once per given input vector (=len(data) times) with constant learning_rate and sigma. This should lead to a greater independence between the order of the input vectors and the resulting SOM.

    In order to use this feature, one only has to use train_epochs() instead of train().

    learning_rate and sigma could (should?) technically be updated only once every epoch but in order to change as little code as possible those parameters are still updated every time update() gets called (but with constant paramters during one epoch). This could be 'optimised' if desired.

    opened by jriege555 22
  • Fixed topographic_error() and quantization_error()

    Fixed topographic_error() and quantization_error()

    Problems:

    • The previous topographic_error() method is incorrect. bmu_1 and bmu_2 are not the coordinates of the best two matching units.
    • The previous topographic_error() and quantization_error() uses explicit for-loops, which is very slow.

    Fixes:

    • Fixed incorrect implementation of topographic_error() method.
    • Changed the topographic_error() and quantization_error() methods with vectorized implementation.
    opened by wei-zhang-thz 17
  • quantization error (theoretical question)

    quantization error (theoretical question)

    I have a question about the interpretability of the quantization error.

    How can we know that the SOM is reliable ? does the quantization error need to be lower than a certain value ?

    For exemple, in my case, i have a quantization errror of 7.0 which is quite high in comparison to the exemple given in the documentation. Does that mean my som is not reliable ?

    question 
    opened by lachhebo 13
  • Do you know why nodes change completely when I reran the same setup with varying number of iterations?

    Do you know why nodes change completely when I reran the same setup with varying number of iterations?

    Hey :-)

    First of all thank you for providing this tool, it seems very handy! I am using SOM with geopotential height anomalies over a given region as input variables to cluster meteorological circulation patterns (ca. 2000 observations). What is really strange is that the SOM nodes differ completely when I rerun the same setup with more iterations (e.g. doubling from 10000 to 20000). It produces nodes not only in a different order, but also such that have no analogue in the new SOM... Is there anything I am doing wrong?

    Thank you very much - below some details about the setup

    The example I am using most often is sigma=1 (Gaussian), lr=0.5, SOM sizes between 2x4 to 4x5. The problem occurs no matter the initialization (pca or random) and no matter the training (single, batch, random). My code is basically only:

    SOM

    som = MiniSom(som_m, som_n, ndims, sigma=sigma, learning_rate=lr, neighborhood_function='gaussian') som.pca_weights_init(somarr) som.train_batch(somarr,10000,verbose=True)

    ...

    plot

    for m in range(som_m): for n in range(som_n): ax... pltarr = som.get_weights()[m,n,:].reshape((nsomlat,nsomlon)) p = ax.contourf(somlons,somlats,pltarr,cmap='seismic', transform=ccrs.PlateCarree())

    question 
    opened by michel039 12
  • Vectorized the _activate function

    Vectorized the _activate function

    Great library, but I noticed that the training code for your SOMs is not vectorized. You use the fast_norm function a lot, which may be faster than linalg.norm for 1D arrays, but iterating over every spot in the SOM is a lot slower than just calling linalg.norm.

    This pull request replaces fast_norm with linalg.norm in 2 places where I saw iteration over the whole SOM. Some simple testing with a 100x100 SOM showed ~40x speedup on my laptop.

    After making the changes, the unit tests failed, which I believe is caused by incorrectly setting up the testing weights as a 2D array rather than a 3D array. So I changed that too, and now the unit tests pass. I also did a few rough tests of my own, and the results of self.winner(x) and the training seem to be the same as before.

    opened by AustinT 11
  • Time Series

    Time Series

    Hello! I am trying to use my time series data for the example uploaded, but I encounter this error when initializing pca. Also, the second image is the error that I encounter when I use random initialization.

    image image

    opened by jaybhiesantos 10
  • How to cluster images?

    How to cluster images?

    I would like to know how to cluster images instead of reading CSV I want to read all images from disk and cluster those images using SOM.

    Can you please share some examples?

    opened by balavenkatesh3322 10
  • Example: Hexagonal Topology bokeh

    Example: Hexagonal Topology bokeh

    Summary

    This branch actions on https://github.com/JustGlowing/minisom/issues/86 by adding to the existing examples/HexagonalTopology.ipynb notebook an interactive bokeh example of the equivalent matplotlib plot.

    The purpose of adding interactivity was so that further exploration could be conducted on the plot to see where the original data points are mapped to in the SOM space.

    Check

    • [x] This branch adds value to the main repository, so it is worthwhile to include.
    • [x] The bokeh plot is equivalent to the matplotlib plot.
    • [ ] The code is error free and works on your machine.
    • [x] The logic of showing data points in the hover tooltip is sound.

    Note

    This "closes #86".

    opened by avisionh 10
  • speed up in update method

    speed up in update method

    Hi! Thanks for sharing the library! I noticed that if you change the loop in the update method with an einsum operation you can speed up the training by some amount. Hope you find it useful. Christos

    opened by Sourmpis 10
  • Add topographic error calculation for hexagonal grid

    Add topographic error calculation for hexagonal grid

    This PR adds the functionality for Topographic Error calculation, computed by finding the first-best-matching and second-best-matching neurons in the hexagonal grid.

    Screenshot 2022-04-12 005139

    The topographic error calculation is based on the above equation, which considers if the first-best-matching and second-best-matching neurons are neighbors in the SOM grid.

    opened by TharindaDilshan 9
  • new visualizations

    new visualizations

    Hi, I have implemented a number of visualizations in the BasicUsage file. Addionally, I did some minor changes (mainly typos) in some other files. As this is my first use of github, I do not know how to separate both topics and make two pull requests... I hope this works out!

    opened by bijae 9
  • Topographic error wrong for hexagonal topography with rectangular grid

    Topographic error wrong for hexagonal topography with rectangular grid

    Hi,

    I am trying to get the topographic error from a SOM with 11x7 neurons, hexagonal topography.

    When I do, I get this error:

         21     return (-1, -1)
         22 y = som._weights.shape[1]
    ---> 23 coords = som.convert_map_to_euclidean((index % y, int(index/y)))
         24 return coords
    
    File ~/.local/lib/python3.8/site-packages/minisom.py:243, in MiniSom.convert_map_to_euclidean(self, xy)
        237 def convert_map_to_euclidean(self, xy):
        238     """Converts map coordinates into euclidean coordinates
        239     that reflects the chosen topology.
        240 
        241     Only useful if the topology chosen is not rectangular.
        242     """
    --> 243     return self._xx.T[xy], self._yy.T[xy]
    
    IndexError: index 8 is out of bounds for axis 1 with size 7
    

    I don't think this line of code makes sense:

    coords = som.convert_map_to_euclidean((index % y, int(index/y)))

    Shouldn't the parameters be inverted, e.g.:

    coords = som.convert_map_to_euclidean((int(index/y), index % y))

    Anyway, thanks for the amazing work!

    bug 
    opened by mbarison 6
  • Matching Matlab hyperparameters

    Matching Matlab hyperparameters

    Hi there!Thank you for this great work!

    I switched to using python from the Matlab, version of SOM However I found the result was quite different. Where I could have a perfect 100% in MatLab but somehow only get 19% in f1-score here.

    The only thing I changed from the default setting in Matlab is using a 10*10. som = MiniSom(10, 10, 4096, sigma=1.5, learning_rate=0.7,activation_distance='euclidean', neighborhood_function='gaussian', topology='hexagonal', random_seed=10) And this is what I had for my settings using minisom.

    Any suggestions so I could maybe recreate the result from Matlab?

    Thank you in advance!

    question 
    opened by AmousQiu 3
  • Is there a way to obtain a distance of each point to its BMU?

    Is there a way to obtain a distance of each point to its BMU?

    Hi, first and foremost thank you for your great work and allowing to implement SOM algorithm in such convienent way. I wanted to ask if there is a possibility to obtain a kind of list with the distances between each point and its Best Matching Unit (Node) on trained SOM grid? I have read the documentation and saw different attributes for the SOM object, however it appears to me that none of them allow to return the (euclidean) distance to BMU. Thanks in advance for support!

    question 
    opened by JMiklaszewski 1
  • Is there an option to obtain the BMU value directly?

    Is there an option to obtain the BMU value directly?

    Hi there,

    I am trying to use BMU values a metric to classify my data. Features are seismic attributes. Your function “distance_from_weights” was my first guess but it´s not exporting BMUS directly. We do have to manipulate it to remove the second BMU.

    np.argsort(distance_from_weights(data), axis=1)[:, :2] -----> np.argsort(distance_from_weights(data), axis=1)[:, :1]

    Do you mind to build that function?

    question 
    opened by akol67 1
  • Wrong value in topographic error function?

    Wrong value in topographic error function?

    So a topographic error occurs when the two bmu of a sample are not adjacent. Shouldn't then t = 1? If the bmu are two hops apart in a corner, their euclidean distance is sqrt(2) = 1.4142 . So with distance > 1.42 this doesn't count as an error. Or am I missing something?

    question 
    opened by SandroMartens 0
  • Example spatio-temporal climate data

    Example spatio-temporal climate data

    This pull request is to load a SOM example on climate data notebook, which is usually 2D (time, lat, lon).

    I've been looking a lot into SOM examples, and it's hard to find examples on climate data...so I hope this notebook can help future users (and also me, if you find something wrong on the use).

    For the example, I've used the tutorial dataset from Xarray.

    opened by carocamargo 2
Releases(2.3.0)
Owner
Giuseppe Vettigli
Data Scientist, teaching fellow, Python enthusiast, fearless visionarist, lateral thinker.
Giuseppe Vettigli
Code to train models from "Paraphrastic Representations at Scale".

Paraphrastic Representations at Scale Code to train models from "Paraphrastic Representations at Scale". The code is written in Python 3.7 and require

John Wieting 71 Dec 19, 2022
A Library for Modelling Probabilistic Hierarchical Graphical Models in PyTorch

A Library for Modelling Probabilistic Hierarchical Graphical Models in PyTorch

Korbinian Pöppel 47 Nov 28, 2022
a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

RNN-Playwrite a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LS

Arno Barton 1 Oct 29, 2021
Veri Setinizi Yolov5 Formatına Dönüştürün

Veri Setinizi Yolov5 Formatına Dönüştürün! Bu Repo da Neler Var? Xml Formatındaki Veri Setini .Txt Formatına Çevirme Xml Formatındaki Dosyaları Silme

Kadir Nar 4 Aug 22, 2022
New approach to benchmark VQA models

VQA Benchmarking This repository contains the web application & the python interface to evaluate VQA models. Documentation Please see the documentatio

4 Jul 25, 2022
Turn based roguelike in python

pyTB Turn based roguelike in python Documentation can be found here: http://mcgillij.github.io/pyTB/index.html Screenshot Dependencies Written in Pyth

Jason McGillivray 4 Sep 29, 2022
A simple image/video to Desmos graph converter run locally

Desmos Bezier Renderer A simple image/video to Desmos graph converter run locally Sample Result Setup Install dependencies apt update apt install git

Kevin JY Cui 339 Dec 23, 2022
A MatConvNet-based implementation of the Fully-Convolutional Networks for image segmentation

MatConvNet implementation of the FCN models for semantic segmentation This package contains an implementation of the FCN models (training and evaluati

VLFeat.org 175 Feb 18, 2022
Reimplementation of the paper `Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words? (ACL2020)`

Human Attention for Text Classification Re-implementation of the paper Human Attention Maps for Text Classification: Do Humans and Neural Networks Foc

Shunsuke KITADA 15 Dec 13, 2021
Script that attempts to force M1 macs into RGB mode when used with monitors that are defaulting to YPbPr.

fix_m1_rgb Script that attempts to force M1 macs into RGB mode when used with monitors that are defaulting to YPbPr. No warranty provided for using th

Kevin Gao 116 Jan 01, 2023
An educational tool to introduce AI planning concepts using mobile manipulator robots.

JEDAI Explains Decision-Making AI Virtual Machine Image The recommended way of using JEDAI is to use pre-configured Virtual Machine image that is avai

Autonomous Agents and Intelligent Robots 13 Nov 15, 2022
Official Datasets and Implementation from our Paper "Video Class Agnostic Segmentation in Autonomous Driving".

Video Class Agnostic Segmentation [Method Paper] [Benchmark Paper] [Project] [Demo] Official Datasets and Implementation from our Paper "Video Class A

Mennatullah Siam 26 Oct 24, 2022
Crosslingual Segmental Language Model

Crosslingual Segmental Language Model This repository contains the code from Multilingual unsupervised sequence segmentation transfers to extremely lo

C.M. Downey 1 Jun 13, 2022
MVGCN: a novel multi-view graph convolutional network (MVGCN) framework for link prediction in biomedical bipartite networks.

MVGCN MVGCN: a novel multi-view graph convolutional network (MVGCN) framework for link prediction in biomedical bipartite networks. Developer: Fu Hait

13 Dec 01, 2022
Experiments for Fake News explainability project

fake-news-explainability Experiments for fake news explainability project This repository only contains the notebooks used to train the models and eva

Lorenzo Flores (Lj) 1 Dec 03, 2022
Numerical-computing-is-fun - Learning numerical computing with notebooks for all ages.

As much as this series is to educate aspiring computer programmers and data scientists of all ages and all backgrounds, it is also a reminder to mysel

EKA foundation 758 Dec 25, 2022
MlTr: Multi-label Classification with Transformer

MlTr: Multi-label Classification with Transformer This is official implement of "MlTr: Multi-label Classification with Transformer". Abstract The task

程星 38 Nov 08, 2022
Investigating Attention Mechanism in 3D Point Cloud Object Detection (arXiv 2021)

Investigating Attention Mechanism in 3D Point Cloud Object Detection (arXiv 2021) This repository is for the following paper: "Investigating Attention

52 Nov 19, 2022
Here is the implementation of our paper S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations.

S2VC Here is the implementation of our paper S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations. In thi

81 Dec 15, 2022
Robot Reinforcement Learning on the Constraint Manifold

Implementation of "Robot Reinforcement Learning on the Constraint Manifold"

31 Dec 05, 2022