monolish: MONOlithic Liner equation Solvers for Highly-parallel architecture

Overview

monolish: MONOlithic LIner equation Solvers for Highly-parallel architecture

monolish is a linear equation solver library that monolithically fuses variable data type, matrix structures, matrix data format, vendor specific data transfer APIs, and vendor specific numerical algebra libraries.


monolish let developer forget about:

  • Performance tuning
  • Processor differences which execute library (Intel CPU, NVIDIA GPU, AMD CPU, ARM CPU, NEC SX-Aurora TSUBASA, etc.)
  • Vendor specific data transfer APIs (host RAM to Device RAM)
  • Finding bottlenecks and performance benchmarks
  • The argument data type of matrix/vector operations
  • Matrix structures and storage formats
  • Cumbersome package dependency

License

Copyright 2021 RICOS Co. Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • It seems that the `Dense(M, N, min, max)` constructor is not completely random.

    It seems that the `Dense(M, N, min, max)` constructor is not completely random.

    Running the following simple program

    #include <iostream>
    #include <monolish_blas.hpp>
    
    int main() {
      monolish::matrix::Dense<double>x(2, 3, 0.0, 10.0);
      x.print_all();
      return 0;
    }
    
    $ g++ -O3 main.cpp -o main.out -lmonolish_cpu
    

    will produce results like this.

    [email protected]:/$ ./main.out
    1 1 5.27196
    1 2 2.82358 <--
    1 3 2.13893 <--
    2 1 9.72054
    2 2 2.82358 <--
    2 3 2.13893 <--
    [email protected]:/$ ./main.out
    1 1 5.3061
    1 2 9.75236
    1 3 7.15652
    2 1 5.28961
    2 2 2.05967
    2 3 0.59838
    [email protected]:/$ ./main.out
    1 1 9.33149 <--
    1 2 4.75639 <--
    1 3 8.71093 <--
    2 1 9.33149 <--
    2 2 4.75639 <--
    2 3 8.71093 <--
    

    The arrows (<--) indicate that the number is repeating.

    This is probably due to that the pseudo-random number generator does not split well when it is parallelized by OpenMP.

    https://github.com/ricosjp/monolish/blob/1b89942e869b7d0acd2d82b4c47baeba2fbdf3e6/src/utils/dense_constructor.cpp#L120-L127

    This may happen not only with Dense, but also with random constructors of other data structures.

    I tested this on docker image ghcr.io/ricosjp/monolish/mkl:0.14.1.

    opened by lotz84 5
  • impl. transpose matvec, matmul

    impl. transpose matvec, matmul

    I want to give modern and intuitive transposition information. But I have no idea how to implement it easily.

    First, we create the following function as a prototype

    matmul(A,B,C) // C=AB
    matmul_TNN(A, B, C); // C=A^TB
    matvec(A,x,y); // y = Ax
    matvec_T(A, x, y); // y=A^Tx
    

    This interface is not beautiful. However, it has the following advantages

    • It does not affect other functions.
    • Easy to trace with logger
    • Simple to implement FFI in the future.
    • When beautiful ideas appear in the future, these functions can be implemented wrapping it.
    opened by t-hishinuma 2
  • try -fopenmp-cuda-mode flag

    try -fopenmp-cuda-mode flag

    memo:

    Clang supports two data-sharing models for Cuda devices: Generic and Cuda modes. The default mode is Generic. Cuda mode can give an additional performance and can be activated using the -fopenmp-cuda-mode flag. In Generic mode all local variables that can be shared in the parallel regions are stored in the global memory. In Cuda mode local variables are not shared between the threads and it is user responsibility to share the required data between the threads in the parallel regions.

    https://clang.llvm.org/docs/OpenMPSupport.html#basic-support-for-cuda-devices

    opened by t-hishinuma 2
  • Reserch the effect of the level information of the performance of cusparse ILU precondition

    Reserch the effect of the level information of the performance of cusparse ILU precondition

    The level information may not improve the performance but spend extra time doing analysis. For example, a tridiagonal matrix has no parallelism. In this case, CUSPARSE_SOLVE_POLICY_NO_LEVEL performs better than CUSPARSE_SOLVE_POLICY_USE_LEVEL. If the user has an iterative solver, the best approach is to do csrsv2_analysis() with CUSPARSE_SOLVE_POLICY_USE_LEVEL once. Then do csrsv2_solve() with CUSPARSE_SOLVE_POLICY_NO_LEVEL in the first run and with CUSPARSE_SOLVE_POLICY_USE_LEVEL in the second run, picking faster one to perform the remaining iterations.

    https://docs.nvidia.com/cuda/cusparse/index.html#csric02

    opened by t-hishinuma 2
  •  ignoring return value in test

    ignoring return value in test

    matrix_transpose.cpp:60:3: warning: ignoring return value of 'monolish::matrix::COO<Float>& monolish::matrix::COO<Float>::transpose() [with Float = double]', declared with attribute nodiscard [-Wunused-result]
       60 |   A.transpose();
    
    opened by t-hishinuma 2
  • Automatic deploy at release

    Automatic deploy at release

    impl. in github actions

    • [x] generate Doxyben (need to chenge version name)
    • [x] generate deb file
    • [x] generate monolish docker

    need to get version number...

    opened by t-hishinuma 2
  • write how to install nvidia-docker

    write how to install nvidia-docker

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee > /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt update -y
    sudo apt install -y nvidia-docker2
    sudo systemctl restart docker
    
    opened by t-hishinuma 1
  • Resolve conflict of libmonolish-cpu and libmonolish-nvidia-gpu deb package

    Resolve conflict of libmonolish-cpu and libmonolish-nvidia-gpu deb package

    What conflicts?

    libomp.so is contained in both package

    How to resolve?

    • (a) Use libomp5-12 distributed by ubuntu
    • (b) Create another package of libomp in allgebra stage (libomp-allgebra)
    opened by termoshtt 1
  • resolve curse of type name in src/

    resolve curse of type name in src/

    In src/, int and size_t are written. When change the class or function declarations in include/, I don't want to rewrite src/. Use auto or decltype() to remove them.

    opened by t-hishinuma 1
  • LLVM OpenMP Offloading can be installed by apt?

    LLVM OpenMP Offloading can be installed by apt?

    docker run -it --gpus all -v $PWD:/work nvidia/cuda:11.7.0-devel-ubuntu22.04
    ==
    apt update -y
    apt install -y git intel-mkl cmake ninja-build ccache clang clang-tools libomp-14-dev gcc gfortran
    git config --global --add safe.directory /work
    cd /work; make gpu
    

    pass??

    opened by t-hishinuma 0
  • cusparse IC / ILU functions is deprecated

    cusparse IC / ILU functions is deprecated

    but, sample code of cusparse is not updated

    https://docs.nvidia.com/cuda/cusparse/index.html#csric02

    I dont like trial and error, so wait for the sample code to be updated.

    opened by t-hishinuma 0
Releases(0.17.0)
Owner
RICOS Co. Ltd.
株式会社科学計算総合研究所 / Research Institute for Computational Science Co. Ltd.
RICOS Co. Ltd.
Scikit-Garden or skgarden is a garden for Scikit-Learn compatible decision trees and forests.

Scikit-Garden or skgarden (pronounced as skarden) is a garden for Scikit-Learn compatible decision trees and forests.

260 Dec 21, 2022
EbookMLCB - ebook Machine Learning cơ bản

Mã nguồn cuốn ebook "Machine Learning cơ bản", Vũ Hữu Tiệp. ebook Machine Learning cơ bản pdf-black_white, pdf-color. Mọi hình thức sao chép, in ấn đề

943 Jan 02, 2023
LightGBM + Optuna: no brainer

AutoLGBM LightGBM + Optuna: no brainer auto train lightgbm directly from CSV files auto tune lightgbm using optuna auto serve best lightgbm model usin

Rishiraj Acharya 22 Dec 15, 2022
Lingtrain Alignment Studio is an ML based app for texts alignment on different languages.

Lingtrain Alignment Studio Intro Lingtrain Alignment Studio is the ML based app for accurate texts alignment on different languages. Extracts parallel

Sergei Averkiev 186 Jan 03, 2023
A Python Module That Uses ANN To Predict A Stocks Price And Also Provides Accurate Technical Analysis With Many High Potential Implementations!

Stox A Module to predict the "close price" for the next day and give "technical analysis". It uses a Neural Network and the LSTM algorithm to predict

Stox 31 Dec 16, 2022
neurodsp is a collection of approaches for applying digital signal processing to neural time series

neurodsp is a collection of approaches for applying digital signal processing to neural time series, including algorithms that have been proposed for the analysis of neural time series. It also inclu

NeuroDSP 224 Dec 02, 2022
Used Logistic Regression, Random Forest, and XGBoost to predict the outcome of Search & Destroy games from the Call of Duty World League for the 2018 and 2019 seasons.

Call of Duty World League: Search & Destroy Outcome Predictions Growing up as an avid Call of Duty player, I was always curious about what factors led

Brett Vogelsang 2 Jan 18, 2022
30 Days Of Machine Learning Using Pytorch

Objective of the repository is to learn and build machine learning models using Pytorch. 30DaysofML Using Pytorch

Mayur 119 Nov 24, 2022
A project based example of Data pipelines, ML workflow management, API endpoints and Monitoring.

MLOps template with examples for Data pipelines, ML workflow management, API development and Monitoring.

Utsav 33 Dec 03, 2022
PennyLane is a cross-platform Python library for differentiable programming of quantum computers

PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural ne

PennyLaneAI 1.6k Jan 01, 2023
ZenML 🙏: MLOps framework to create reproducible ML pipelines for production machine learning.

ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. It has a simple, flexible syntax, is cloud and tool agnostic, and has interfaces/abstraction

ZenML 2.6k Jan 08, 2023
Machine-care - A simple python script to take care of simple maintenance tasks

Machine care An simple python script to take care of simple maintenance tasks fo

2 Jul 10, 2022
customer churn prediction prevention in telecom industry using machine learning and survival analysis

Telco Customer Churn Prediction - Plotly Dash Application Description This dash application allows you to predict telco customer churn using machine l

Benaissa Mohamed Fayçal 3 Nov 20, 2021
A logistic regression model for health insurance purchasing prediction

Logistic_Regression_Model A logistic regression model for health insurance purchasing prediction This code is using these packages, so please make sur

ShawnWang 1 Nov 29, 2021
Scikit-learn compatible wrapper of the Random Bits Forest program written by (Wang et al., 2016)

sklearn-compatible Random Bits Forest Scikit-learn compatible wrapper of the Random Bits Forest program written by Wang et al., 2016, available as a b

Tamas Madl 8 Jul 24, 2021
Provide an input CSV and a target field to predict, generate a model + code to run it.

automl-gs Give an input CSV file and a target field you want to predict to automl-gs, and get a trained high-performing machine learning or deep learn

Max Woolf 1.8k Jan 04, 2023
🔬 A curated list of awesome machine learning strategies & tools in financial market.

🔬 A curated list of awesome machine learning strategies & tools in financial market.

GeorgeZou 1.6k Dec 30, 2022
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

Thoughtworks 318 Jan 02, 2023
A collection of neat and practical data science and machine learning projects

Data Science A collection of neat and practical data science and machine learning projects Explore the docs » Report Bug · Request Feature Table of Co

Will Fong 2 Dec 10, 2021
The MLOps is the process of continuous integration and continuous delivery of Machine Learning artifacts as a software product, keeping it inside a loop of Design, Model Development and Operations.

MLOps The MLOps is the process of continuous integration and continuous delivery of Machine Learning artifacts as a software product, keeping it insid

Maykon Schots 25 Nov 27, 2022