KSAI Lite is a deep learning inference framework of kingsoft, based on tensorflow lite

Overview

KSAI Lite

English | 简体中文

Documentation Status Release License

KSAI Lite是一个轻量级、灵活性强、高性能且易于扩展的深度学习推理框架,底层基于tensorflow lite,定位支持包括移动端、嵌入式以及服务器端在内的多硬件平台。

当前KSAI Lite已经应用在金山office内部业务中,并逐步支持金山企业的生产任务和众多外部用户。

快速入门

使用KSAI Lite,只需几个简单的步骤,就可以把模型部署到多种终端设备中,运行高性能的推理任务,使用流程如下所示:

一. 准备模型

KSAI Lite框架直接支持模型结构为tflite模型。 如果您手中的模型是由诸如Caffe、MXNet、PyTorch等框架产出的,那么您可以使用工具将模型转换为tflite格式。

二. 模型优化

KSAI Lite框架基于底层tensorflow lite的优化方法,拥有优秀的加速、优化策略及实现,包含量化、子图融合、Kernel优选等优化手段。优化后的模型更轻量级,耗费资源更少,并且执行速度也更快。

三. 下载或编译

KSAI Lite提供了多平台的官方Release预测库下载,我们优先推荐您直接下载 KSAI Lite预编译库,包括了Linux-X64, Linux-ARM, Linux-MIPS64以及Windows-X64索引库Windows-X64动态链接库。 您也可以根据目标平台选择对应的源码编译方法。KSAI Lite 提供了源码编译脚本,位于 tools/目录下,只需要按照docs/目录下的准备环境说明文档environment setup.md搭建好环境然后切到tools/目录调用编译脚本两个步骤即可一键编译得到目标平台的KSAI Lite预测库。

四. 预测示例

KSAI Lite提供了C++ API,并且提供了相应API的完整使用示例: 目录为tensorflow/lite/examples/reg_test/reg_test.cc 您可以参考示例快速了解使用方法,并集成到您自己的项目中去,也可以参考KSAI-Toolkits该项目。

主要特性

  • 多硬件支持
    • KSAI Lite架构已经验证和完整支持从 Mobile 到 Server 多种硬件平台,包括 intel X86、ARM、华为 Kunpeng 920、龙芯Loongson-3A R3、兆芯C4600、Phytium FT1500a等,且正在不断增加更多新硬件支持。
  • 轻量级部署
    • KSAI Lite在设计上对图优化模块和执行引擎实现了良好的解耦拆分,移动端可以直接部署执行阶段,无任何第三方依赖。
  • 高性能
    • 极致的 ARM及X86 CPU 性能优化:针对不同微架构特点实现kernel的定制,最大发挥计算性能,在主流模型上展现出领先的速度优势。
  • 多模型多算子
    • KSAI Lite和tensorflow训练框架的OP对齐,提供广泛的模型支持能力。
    • 目前已对视觉类模型做到了较为充分的支持,覆盖分类、检测和识别,包含了特色的OCR模型的支持,并在不断丰富中。
  • 强大的图分析和优化能力
    • 不同于常规的移动端预测引擎基于 Python 脚本工具转化模型, Lite 架构上有完整基于 C++ 开发的 IR 及相应 Pass 集合,以支持操作融合,计算剪枝,存储优化,量化计算等多类计算图优化。

持续集成

System X86 Linux ARM Linux MIPS64 Linux windows
CPU(32bit) Build Status - - Build Status
CPU(64bit) Build Status - - Build Status
高通骁龙845 - Build Status - -
华为kunpeng920 - Build Status - -
龙芯Loongson-3A - - Build Status -
兆芯C4600 - Build Status - -
Phytium FT1500a - Build Status - -

交流与反馈

版权和许可证

KSAI-Lite由Apache-2.0 license提供

As a part of the HAKE project, includes the reproduced SOTA models and the corresponding HAKE-enhanced versions (CVPR2020).

HAKE-Action HAKE-Action (TensorFlow) is a project to open the SOTA action understanding studies based on our Human Activity Knowledge Engine. It inclu

Yong-Lu Li 94 Nov 18, 2022
Context Axial Reverse Attention Network for Small Medical Objects Segmentation

CaraNet: Context Axial Reverse Attention Network for Small Medical Objects Segmentation This repository contains the implementation of a novel attenti

401 Dec 23, 2022
This repository contains the implementation of the paper Contrastive Instance Association for 4D Panoptic Segmentation using Sequences of 3D LiDAR Scans

Contrastive Instance Association for 4D Panoptic Segmentation using Sequences of 3D LiDAR Scans This repository contains the implementation of the pap

Photogrammetry & Robotics Bonn 40 Dec 01, 2022
Unofficial Implementation of MLP-Mixer in TensorFlow

mlp-mixer-tf Unofficial Implementation of MLP-Mixer [abs, pdf] in TensorFlow. Note: This project may have some bugs in it. I'm still learning how to i

Rishabh Anand 24 Mar 23, 2022
style mixing for animation face

An implementation of StyleGAN on Animation dataset. Install git clone https://github.com/MorvanZhou/anime-StyleGAN cd anime-StyleGAN pip install -r re

Morvan 46 Nov 30, 2022
Code for the tech report Toward Training at ImageNet Scale with Differential Privacy

Differentially private Imagenet training Code for the tech report Toward Training at ImageNet Scale with Differential Privacy by Alexey Kurakin, Steve

Google Research 29 Nov 03, 2022
This package is for running the semantic SLAM algorithm using extracted planar surfaces from the received detection

Semantic SLAM This package can perform optimization of pose estimated from VO/VIO methods which tend to drift over time. It uses planar surfaces extra

Hriday Bavle 125 Dec 02, 2022
learned_optimization: Training and evaluating learned optimizers in JAX

learned_optimization: Training and evaluating learned optimizers in JAX learned_optimization is a research codebase for training learned optimizers. I

Google 533 Dec 30, 2022
This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"

Subreddit Analysis This repo includes tools for Subreddit analysis, originally developed for our class project of PSYC 626 in USC, titled "Powered by

Georgios Chochlakis 1 Dec 17, 2021
S2s2net - Sentinel-2 Super-Resolution Segmentation Network

S2S2Net Sentinel-2 Super-Resolution Segmentation Network Getting started Install

Wei Ji 10 Nov 10, 2022
PAthological QUpath Obsession - QuPath and Python conversations

PAQUO: PAthological QUpath Obsession Welcome to paquo 👋 , a library for interacting with QuPath from Python. paquo's goal is to provide a pythonic in

Bayer AG 60 Dec 31, 2022
Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [Arxiv] [Paper] As acquiring pixel-wise an

Lukas Hoyer 305 Dec 29, 2022
SoGCN: Second-Order Graph Convolutional Networks

SoGCN: Second-Order Graph Convolutional Networks This is the authors' implementation of paper "SoGCN: Second-Order Graph Convolutional Networks" in Py

Yuehao 7 Aug 16, 2022
Multi Task RL Baselines

MTRL Multi Task RL Algorithms Contents Introduction Setup Usage Documentation Contributing to MTRL Community Acknowledgements Introduction M

Facebook Research 171 Jan 09, 2023
Mining-the-Social-Web-3rd-Edition - The official online compendium for Mining the Social Web, 3rd Edition (O'Reilly, 2018)

Mining the Social Web, 3rd Edition The official code repository for Mining the Social Web, 3rd Edition (O'Reilly, 2019). The book is available from Am

Mikhail Klassen 838 Jan 01, 2023
For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training.

LongScientificFormer For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training. Some code

Athar Sefid 6 Nov 02, 2022
The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021] Release Notes The offical PyTorch implementation of NeMo, p

Angtian Wang 76 Nov 23, 2022
PyTorch implementation of paper: AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer, ICCV 2021.

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer [Paper] [PyTorch Implementation] [Paddle Implementation] Overview This reposit

148 Dec 30, 2022
ICCV2021 - Mining Contextual Information Beyond Image for Semantic Segmentation

Introduction The official repository for "Mining Contextual Information Beyond Image for Semantic Segmentation". Our full code has been merged into ss

55 Nov 09, 2022
AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation

AtlasNet [Project Page] [Paper] [Talk] AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation Thibault Groueix, Matthew Fisher, Vladimir

577 Dec 17, 2022