monolish: MONOlithic Liner equation Solvers for Highly-parallel architecture

Overview

monolish: MONOlithic LIner equation Solvers for Highly-parallel architecture

monolish is a linear equation solver library that monolithically fuses variable data type, matrix structures, matrix data format, vendor specific data transfer APIs, and vendor specific numerical algebra libraries.


monolish let developer forget about:

  • Performance tuning
  • Processor differences which execute library (Intel CPU, NVIDIA GPU, AMD CPU, ARM CPU, NEC SX-Aurora TSUBASA, etc.)
  • Vendor specific data transfer APIs (host RAM to Device RAM)
  • Finding bottlenecks and performance benchmarks
  • The argument data type of matrix/vector operations
  • Matrix structures and storage formats
  • Cumbersome package dependency

License

Copyright 2021 RICOS Co. Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • It seems that the `Dense(M, N, min, max)` constructor is not completely random.

    It seems that the `Dense(M, N, min, max)` constructor is not completely random.

    Running the following simple program

    #include <iostream>
    #include <monolish_blas.hpp>
    
    int main() {
      monolish::matrix::Dense<double>x(2, 3, 0.0, 10.0);
      x.print_all();
      return 0;
    }
    
    $ g++ -O3 main.cpp -o main.out -lmonolish_cpu
    

    will produce results like this.

    [email protected]:/$ ./main.out
    1 1 5.27196
    1 2 2.82358 <--
    1 3 2.13893 <--
    2 1 9.72054
    2 2 2.82358 <--
    2 3 2.13893 <--
    [email protected]:/$ ./main.out
    1 1 5.3061
    1 2 9.75236
    1 3 7.15652
    2 1 5.28961
    2 2 2.05967
    2 3 0.59838
    [email protected]:/$ ./main.out
    1 1 9.33149 <--
    1 2 4.75639 <--
    1 3 8.71093 <--
    2 1 9.33149 <--
    2 2 4.75639 <--
    2 3 8.71093 <--
    

    The arrows (<--) indicate that the number is repeating.

    This is probably due to that the pseudo-random number generator does not split well when it is parallelized by OpenMP.

    https://github.com/ricosjp/monolish/blob/1b89942e869b7d0acd2d82b4c47baeba2fbdf3e6/src/utils/dense_constructor.cpp#L120-L127

    This may happen not only with Dense, but also with random constructors of other data structures.

    I tested this on docker image ghcr.io/ricosjp/monolish/mkl:0.14.1.

    opened by lotz84 5
  • impl. transpose matvec, matmul

    impl. transpose matvec, matmul

    I want to give modern and intuitive transposition information. But I have no idea how to implement it easily.

    First, we create the following function as a prototype

    matmul(A,B,C) // C=AB
    matmul_TNN(A, B, C); // C=A^TB
    matvec(A,x,y); // y = Ax
    matvec_T(A, x, y); // y=A^Tx
    

    This interface is not beautiful. However, it has the following advantages

    • It does not affect other functions.
    • Easy to trace with logger
    • Simple to implement FFI in the future.
    • When beautiful ideas appear in the future, these functions can be implemented wrapping it.
    opened by t-hishinuma 2
  • try -fopenmp-cuda-mode flag

    try -fopenmp-cuda-mode flag

    memo:

    Clang supports two data-sharing models for Cuda devices: Generic and Cuda modes. The default mode is Generic. Cuda mode can give an additional performance and can be activated using the -fopenmp-cuda-mode flag. In Generic mode all local variables that can be shared in the parallel regions are stored in the global memory. In Cuda mode local variables are not shared between the threads and it is user responsibility to share the required data between the threads in the parallel regions.

    https://clang.llvm.org/docs/OpenMPSupport.html#basic-support-for-cuda-devices

    opened by t-hishinuma 2
  • Reserch the effect of the level information of the performance of cusparse ILU precondition

    Reserch the effect of the level information of the performance of cusparse ILU precondition

    The level information may not improve the performance but spend extra time doing analysis. For example, a tridiagonal matrix has no parallelism. In this case, CUSPARSE_SOLVE_POLICY_NO_LEVEL performs better than CUSPARSE_SOLVE_POLICY_USE_LEVEL. If the user has an iterative solver, the best approach is to do csrsv2_analysis() with CUSPARSE_SOLVE_POLICY_USE_LEVEL once. Then do csrsv2_solve() with CUSPARSE_SOLVE_POLICY_NO_LEVEL in the first run and with CUSPARSE_SOLVE_POLICY_USE_LEVEL in the second run, picking faster one to perform the remaining iterations.

    https://docs.nvidia.com/cuda/cusparse/index.html#csric02

    opened by t-hishinuma 2
  •  ignoring return value in test

    ignoring return value in test

    matrix_transpose.cpp:60:3: warning: ignoring return value of 'monolish::matrix::COO<Float>& monolish::matrix::COO<Float>::transpose() [with Float = double]', declared with attribute nodiscard [-Wunused-result]
       60 |   A.transpose();
    
    opened by t-hishinuma 2
  • Automatic deploy at release

    Automatic deploy at release

    impl. in github actions

    • [x] generate Doxyben (need to chenge version name)
    • [x] generate deb file
    • [x] generate monolish docker

    need to get version number...

    opened by t-hishinuma 2
  • write how to install nvidia-docker

    write how to install nvidia-docker

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee > /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt update -y
    sudo apt install -y nvidia-docker2
    sudo systemctl restart docker
    
    opened by t-hishinuma 1
  • Resolve conflict of libmonolish-cpu and libmonolish-nvidia-gpu deb package

    Resolve conflict of libmonolish-cpu and libmonolish-nvidia-gpu deb package

    What conflicts?

    libomp.so is contained in both package

    How to resolve?

    • (a) Use libomp5-12 distributed by ubuntu
    • (b) Create another package of libomp in allgebra stage (libomp-allgebra)
    opened by termoshtt 1
  • resolve curse of type name in src/

    resolve curse of type name in src/

    In src/, int and size_t are written. When change the class or function declarations in include/, I don't want to rewrite src/. Use auto or decltype() to remove them.

    opened by t-hishinuma 1
  • LLVM OpenMP Offloading can be installed by apt?

    LLVM OpenMP Offloading can be installed by apt?

    docker run -it --gpus all -v $PWD:/work nvidia/cuda:11.7.0-devel-ubuntu22.04
    ==
    apt update -y
    apt install -y git intel-mkl cmake ninja-build ccache clang clang-tools libomp-14-dev gcc gfortran
    git config --global --add safe.directory /work
    cd /work; make gpu
    

    pass??

    opened by t-hishinuma 0
  • cusparse IC / ILU functions is deprecated

    cusparse IC / ILU functions is deprecated

    but, sample code of cusparse is not updated

    https://docs.nvidia.com/cuda/cusparse/index.html#csric02

    I dont like trial and error, so wait for the sample code to be updated.

    opened by t-hishinuma 0
Releases(0.17.0)
Owner
RICOS Co. Ltd.
株式会社科学計算総合研究所 / Research Institute for Computational Science Co. Ltd.
RICOS Co. Ltd.
ml4ir: Machine Learning for Information Retrieval

ml4ir: Machine Learning for Information Retrieval | changelog Quickstart → ml4ir Read the Docs | ml4ir pypi | python ReadMe ml4ir is an open source li

Salesforce 77 Jan 06, 2023
A simple guide to MLOps through ZenML and its various integrations.

ZenBytes Join our Slack Community and become part of the ZenML family Give the main ZenML repo a GitHub star to show your love ZenBytes is a series of

ZenML 127 Dec 27, 2022
A modular active learning framework for Python

Modular Active Learning framework for Python3 Page contents Introduction Active learning from bird's-eye view modAL in action From zero to one in a fe

modAL 1.9k Dec 31, 2022
List of Data Science Cheatsheets to rule the world

Data Science Cheatsheets List of Data Science Cheatsheets to rule the world. Table of Contents Business Science Business Science Problem Framework Dat

Favio André Vázquez 11.7k Dec 30, 2022
Graphsignal is a machine learning model monitoring platform.

Graphsignal is a machine learning model monitoring platform. It helps ML engineers, MLOps teams and data scientists to quickly address issues with data and models as well as proactively analyze model

Graphsignal 143 Dec 05, 2022
Data from "Datamodels: Predicting Predictions with Training Data"

Data from "Datamodels: Predicting Predictions with Training Data" Here we provid

Madry Lab 51 Dec 09, 2022
Iterative stochastic gradient descent (SGD) linear regressor with regularization

SGD-Linear-Regressor Iterative stochastic gradient descent (SGD) linear regressor with regularization Dataset: Kaggle “Graduate Admission 2” https://w

Zechen Ma 1 Oct 29, 2021
AutoX是一个高效的自动化机器学习工具,它主要针对于表格类型的数据挖掘竞赛。 它的特点包括: 效果出色、简单易用、通用、自动化、灵活。

English | 简体中文 AutoX是什么? AutoX一个高效的自动化机器学习工具,它主要针对于表格类型的数据挖掘竞赛。 它的特点包括: 效果出色: AutoX在多个kaggle数据集上,效果显著优于其他解决方案(见效果对比)。 简单易用: AutoX的接口和sklearn类似,方便上手使用。

4Paradigm 431 Dec 28, 2022
Predict the output which should give a fair idea about the chances of admission for a student for a particular university

Predict the output which should give a fair idea about the chances of admission for a student for a particular university.

ArvindSandhu 1 Jan 11, 2022
Solve automatic numerical differentiation problems in one or more variables.

numdifftools The numdifftools library is a suite of tools written in _Python to solve automatic numerical differentiation problems in one or more vari

Per A. Brodtkorb 181 Dec 16, 2022
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster

[Due to the time taken @ uni, work + hell breaking loose in my life, since things have calmed down a bit, will continue commiting!!!] [By the way, I'm

Daniel Han-Chen 1.4k Jan 01, 2023
Tribuo - A Java machine learning library

Tribuo - A Java prediction library (v4.1) Tribuo is a machine learning library in Java that provides multi-class classification, regression, clusterin

Oracle 1.1k Dec 28, 2022
Simple, light-weight config handling through python data classes with to/from JSON serialization/deserialization.

Simple but maybe too simple config management through python data classes. We use it for machine learning.

Eren Gölge 67 Nov 29, 2022
Python Machine Learning Jupyter Notebooks (ML website)

Python Machine Learning Jupyter Notebooks (ML website) Dr. Tirthajyoti Sarkar, Fremont, California (Please feel free to connect on LinkedIn here) Also

Tirthajyoti Sarkar 2.6k Jan 03, 2023
LinearRegression2 Tvads and CarSales

LinearRegression2_Tvads_and_CarSales This project infers the insight that how the TV ads for cars and car Sales are being linked with each other. It i

Ashish Kumar Yadav 1 Dec 29, 2021
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se

alkaline-ml 1.3k Dec 22, 2022
Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

Fully Adversarial Mosaics (FAMOS) Pytorch implementation of the paper "Copy the Old or Paint Anew? An Adversarial Framework for (non-) Parametric Imag

Zalando Research 120 Dec 24, 2022
A python library for easy manipulation and forecasting of time series.

Time Series Made Easy in Python darts is a python library for easy manipulation and forecasting of time series. It contains a variety of models, from

Unit8 5.2k Jan 04, 2023
A pure-python implementation of the UpSet suite of visualisation methods by Lex, Gehlenborg et al.

pyUpSet A pure-python implementation of the UpSet suite of visualisation methods by Lex, Gehlenborg et al. Contents Purpose How to install How it work

288 Jan 04, 2023
Pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code

pandas-method-chaining pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code. It is a fork from pandas-v

Francis 5 May 14, 2022