On-device speech-to-index engine powered by deep learning.

Overview

Octopus

GitHub

PyPI Maven Central Cocoapods

Made in Vancouver, Canada by Picovoice

Twitter URL YouTube Channel Views

Octopus is Picovoice's Speech-to-Index engine. It directly indexes speech without relying on a text representation. This acoustic-only approach boosts accuracy by removing out-of-vocabulary limitation and eliminating the problem of competing hypothesis (e.g. homophones)

Table of Contents

Demos

Python Demos

Install the demo package:

sudo pip3 install pvoctopusdemo

Run the following in the terminal:

octopus_demo  --access_key {AccessKey} --audio_paths ${AUDIO_PATHS}

Replace ${AccessKey} with your AccessKey obtained from Picovoice Console and ${AUDIO_PATHS} with a space-separated list of audio files. Octopus starts processing the audio files and asks you for search phrases and shows results interactively.

For more information about the Python demos go to demo/python.

C Demos

Build the demo:

cmake -S demo/c/ -B demo/c/build && cmake --build demo/c/build

Index a given audio file:

./demo/c/build/octopus_index_demo ${LIBRARY_PATH} ${ACCESS_KEY} ${AUDIO_PATH} ${INDEX_PATH}

Then search the index for a given phrase:

./demo/c/build/octopus_search_demo ${LIBRARY_PATH} ${MODEL_PATH} ${ACCESS_KEY} ${INDEX_PATH} ${SEARCH_PHRASE}

Replace ${LIBRARY_PATH} with path to appropriate library available under lib, ${ACCESS_KEY} with AccessKey obtained from Picovoice Console, ${AUDIO_PATH} with the path to a given audio file and format, ${INDEX_PATH} with the path to cached index file and ${SEARCH_PHRASE} to a search phrase.

For more information about C demos go to demo/c.

Android Demos

Using Android Studio, open demo/android/OctopusDemo as an Android project.

Replace "${YOUR_ACCESS_KEY_HERE}" inside MainActivity.java with your AccessKey obtained from Picovoice Console. Then run the demo.

For more information about Android demos go to demo/android.

iOS Demos

From the demo/ios/OctopusDemo, run the following to install the Octopus CocoaPod:

pod install

Replace "{YOUR_ACCESS_KEY_HERE}" inside ViewModel.swift with your AccessKey obtained from Picovoice Console. Then, using Xcode, open the generated OctopusDemo.xcworkspace and run the application.

For more information about iOS demos go to demo/ios.

Web Demos

From demo/web run the following in the terminal:

yarn
yarn start

(or)

npm install
npm run start

Open http://localhost:5000 in your browser to try the demo.

SDKs

Python

Create an instance of the engine:

import pvoctopus
access_key = ""  # AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
handle = pvoctopus.create(access_key=access_key)

Index your raw audio data or file:

audio_data = [..]
metadata = handle.index(audio_data)
# or 
audio_file_path = "/path/to/my/audiofile.wav"
metadata = handle.index_file(audio_file_path)

Then search the metadata for phrases:

{match.end_sec} ({match.probablity})") ">
avocado_matches = matches['avocado']
for match in avocado_matches:
    print(f"Match for `avocado`: {match.start_sec} -> {match.end_sec} ({match.probablity})")

When done the handle resources have to be released explicitly:

handle.delete()

C

pv_octopus.h header file contains relevant information. Build an instance of the object:

    const char *model_path = "..."; // absolute path to the model file available at `lib/common/octopus_params.pv`
    const char *access_key = "..." // AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
    pv_octopus_t *handle = NULL;
    pv_status_t status = pv_octopus_init(access_key, model_path, &handle);
    if (status != PV_STATUS_SUCCESS) {
        // error handling logic
    }

Index audio data using constructed object:

const char *audio_path = "..."; // absolute path to the audio file to be indexed
void *indices = NULL;
int32_t num_indices_bytes = 0;
pv_status_t status = pv_octopus_index_file(handle, audio_path, &indices, &num_indices_bytes);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Search the indexed data:

const char *phrase = "...";
pv_octopus_match_t *matches = NULL;
int32_t num_matches = 0;
pv_status_t status = pv_octopus_search(handle, indices, num_indices_bytes, phrase, &matches, &num_matches);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

When done be sure to release the acquired resources:

pv_octopus_delete(handle);

Android

Create an instance of the engine:

import ai.picovoice.octopus.*;

final String accessKey = "..."; // AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
try {
    Octopus handle = new Octopus.Builder(accessKey).build(appContext);
} catch (OctopusException ex) { }

Index audio data using constructed object:

final String audioFilePath = "/path/to/my/audiofile.wav"
try {
    OctopusMetadata metadata = handle.indexAudioFile(audioFilePath);
} catch (OctopusException ex) { }

Search the indexed data:

HashMap <String, OctopusMatch[]> matches = handle.search(metadata, phrases);

for (Map.Entry<String, OctopusMatch[]> entry : map.entrySet()) {
    final String phrase = entry.getKey();
    for (OctopusMatch phraseMatch : entry.getValue()){
        final float startSec = phraseMatch.getStartSec();
        final float endSec = phraseMatch.getEndSec();
        final float probability = phraseMatch.getProbability();
    }
}

When done be sure to release the acquired resources:

metadata.delete();
handle.delete();

iOS

Create an instance of the engine:

import Octopus

let accessKey : String = // .. AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
do {
    let handle = try Octopus(accessKey: accessKey)
} catch { }

Index audio data using constructed object:

let audioFilePath = "/path/to/my/audiofile.wav"
do {
    let metadata = try handle.indexAudioFile(path: audioFilePath)
} catch { }

Search the indexed data:

let matches: Dictionary<String, [OctopusMatch]> = try octopus.search(metadata: metadata, phrases: phrases)
for (phrase, phraseMatches) in matches {
    for phraseMatch in phraseMatches {
        var startSec = phraseMatch.startSec;
        var endSec = phraseMatch.endSec;
        var probability = phraseMatch.probability;
    }
}

When done be sure to release the acquired resources:

handle.delete();

Web

Octopus is available on modern web browsers (i.e., not Internet Explorer) via WebAssembly. Octopus is provided pre-packaged as a Web Worker to allow it to perform processing off the main thread.

Vanilla JavaScript and HTML (CDN Script Tag)

">
>
<html lang="en">

<head>
  <script src="https://unpkg.com/@picovoice/octopus-web-en-worker/dist/iife/index.js">script>
  <script type="application/javascript">
    // The metadata object to save the result of indexing for later searches
    let octopusMetadata = undefined

    function octopusIndexCallback(metadata) {
      octopusMetadata = metadata
    }

    function octopusSearchCallback(matches) {
      console.log(`Search results (${matches.length}):`)
      console.log(`Start: ${match.startSec}s -> End: ${match.endSec}s (Probability: ${match.probability})`)
    }

    async function startOctopus() {
      // Create an Octopus Worker
      // Note: you receive a Worker object, _not_ an individual Octopus instance
      const accessKey = ... // AccessKey string provided by Picovoice Console (https://picovoice.ai/console/)
      const OctopusWorker = await OctopusWorkerFactory.create(
        accessKey,
        octopusIndexCallback,
        octopusSearchCallback
      )
    }

    document.addEventListener("DOMContentLoaded", function () {
      startOctopus();
      // Send Octopus the audio signal
      const audioSignal = new Int16Array(/* Provide data with correct format*/)
      OctopusWorker.postMessage({
        command: "index",
        input: audioSignal,
      });
    });

    const searchText = ...
    OctopusWorker.postMessage({
      command: "search",
      metadata: octopusMetadata,
      searchPhrase: searchText,
    });
  script>
head>

<body>body>

html>

Vanilla JavaScript and HTML (ES Modules)

yarn add @picovoice/octopus-web-en-worker

(or)

npm install @picovoice/octopus-web-en-worker
End: ${match.endSec}s (Probability: ${match.probability})`); } async function startOctopus() { // Create an Octopus Worker // Note: you receive a Worker object, _not_ an individual Octopus instance const accessKey = // .. AccessKey provided by Picovoice Console (https://picovoice.ai/console/) const OctopusWorker = await OctopusWorkerFactory.create( accessKey, octopusIndexCallback, octopusSearchCallback ); } startOctopus() ... // Send Octopus the audio signal const audioSignal = new Int16Array(/* Provide data with correct format*/) OctopusWorker.postMessage({ command: "index", input: audioSignal, }); ... const searchText = ...; OctopusWorker.postMessage({ command: "search", metadata: octopusMetadata, searchPhrase: searchText, }); ">
import { OctopusWebEnWorker } from "@picovoice/octopus-web-en-worker";

// The metadata object to save the result of indexing for later searches
let octopusMetadata = undefined;

function octopusIndexCallback(metadata) {
  octopusMetadata = metadata;
}

function octopusSearchCallback(matches) {
  console.log(`Search results (${matches.length}):`);
  console.log(`Start: ${match.startSec}s -> End: ${match.endSec}s (Probability: ${match.probability})`);
}


async function startOctopus() {
  // Create an Octopus Worker
  // Note: you receive a Worker object, _not_ an individual Octopus instance
  const accessKey = // .. AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
  const OctopusWorker = await OctopusWorkerFactory.create(
    accessKey,
    octopusIndexCallback,
    octopusSearchCallback
  );
}

startOctopus()

...

// Send Octopus the audio signal
const audioSignal = new Int16Array(/* Provide data with correct format*/)
OctopusWorker.postMessage({
  command: "index",
  input: audioSignal,
});

...

const searchText = ...;
OctopusWorker.postMessage({
  command: "search",
  metadata: octopusMetadata,
  searchPhrase: searchText,
});

Releases

v1.0.0 Oct 8th, 2021

  • Initial release.
You might also like...
A python script to lookup Passport Index Dataset

visa-cli A python script to lookup Passport Index Dataset Installation pip install visa-cli Usage usage: visa-cli [-h] [-d DESTINATION_COUNTRY] [-f]

This is a virtual picture dragging application. Users may virtually slide photos across the screen. The distance between the index and middle fingers determines the movement. Smaller distances indicate click and motion, whereas bigger distances indicate only hand movement.
A set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI.

Overview This is a set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI. Make TFRecords To run t

PPLNN is a Primitive Library for Neural Network is a high-performance deep-learning inference engine for efficient AI inferencing
PPLNN is a Primitive Library for Neural Network is a high-performance deep-learning inference engine for efficient AI inferencing

PPLNN is a Primitive Library for Neural Network is a high-performance deep-learning inference engine for efficient AI inferencing

Implementation of "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement" by pytorch

This repository is used to suspend the results of our paper "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement"

Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

StrengthNet Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis" https://arxiv.org/abs/2110

This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

CPC_DeepCluster This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEEC

A fast, dataset-agnostic, deep visual search engine for digital art history

imgs.ai imgs.ai is a fast, dataset-agnostic, deep visual search engine for digital art history based on neural network embeddings. It utilizes modern

Comments
  • Bump terser from 5.13.1 to 5.16.1 in /binding/web

    Bump terser from 5.13.1 to 5.16.1 in /binding/web

    Bumps terser from 5.13.1 to 5.16.1.

    Changelog

    Sourced from terser's changelog.

    v5.16.1

    • Properly handle references in destructurings (const { [reference]: val } = ...)
    • Allow parsing of .#privatefield in nested classes
    • Do not evaluate operations that return large strings if that would make the output code larger
    • Make collapse_vars handle block scope correctly
    • Internal improvements: Typos (#1311), more tests, small-scale refactoring

    v5.16.0

    • Disallow private fields in object bodies (#1011)
    • Parse #privatefield in object (#1279)
    • Compress #privatefield in object

    v5.15.1

    • Fixed missing parentheses around optional chains
    • Avoid bare let or const as the bodies of if statements (#1253)
    • Small internal fixes (#1271)
    • Avoid inlining a class twice and creating two equivalent but !== classes.

    v5.15.0

    • Basic support for ES2022 class static initializer blocks.
    • Add AudioWorkletNode constructor options to domprops list (#1230)
    • Make identity function inliner not inline id(...expandedArgs)

    v5.14.2

    • Security fix for RegExps that should not be evaluated (regexp DDOS)
    • Source maps improvements (#1211)
    • Performance improvements in long property access evaluation (#1213)

    v5.14.1

    • keep_numbers option added to TypeScript defs (#1208)
    • Fixed parsing of nested template strings (#1204)

    v5.14.0

    • Switched to @​jridgewell/source-map for sourcemap generation (#1190, #1181)
    • Fixed source maps with non-terminated segments (#1106)
    • Enabled typescript types to be imported from the package (#1194)
    • Extra DOM props have been added (#1191)
    • Delete the AST while generating code, as a means to save RAM
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Convert c owned memory into ctypes owned memory for metadata objects

    Convert c owned memory into ctypes owned memory for metadata objects

    ctypes doesn't free the c_void_p from c, but if we convert it into a block of memory created from ctypes (serialize -> deserialize), then it will garbage collect and free when appropriate.

    opened by ErisMik 1
Releases(v1.2)
  • v1.2(Aug 12, 2022)

Owner
Picovoice
Edge Voice AI Platform
Picovoice
PyTorch implementation of "Contrast to Divide: self-supervised pre-training for learning with noisy labels"

Contrast to Divide: self-supervised pre-training for learning with noisy labels This is an official implementation of "Contrast to Divide: self-superv

55 Nov 23, 2022
Using OpenAI's CLIP to upscale and enhance images

CLIP Upscaler and Enhancer Using OpenAI's CLIP to upscale and enhance images Based on nshepperd's JAX CLIP Guided Diffusion v2.4 Sample Results Viewpo

Tripp Lyons 5 Jun 14, 2022
A clear, concise, simple yet powerful and efficient API for deep learning.

The Gluon API Specification The Gluon API specification is an effort to improve speed, flexibility, and accessibility of deep learning technology for

Gluon API 2.3k Dec 17, 2022
QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

152 Jan 02, 2023
Perception-aware multi-sensor fusion for 3D LiDAR semantic segmentation (ICCV 2021)

Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation (ICCV 2021) [中文|EN] 概述 本工作主要探索一种高效的多传感器(激光雷达和摄像头)融合点云语义分割方法。现有的多传感器融合方法主要将点云投影

ICE 126 Dec 30, 2022
VGGFace2-HQ - A high resolution face dataset for face editing purpose

The first open source high resolution dataset for face swapping!!! A high resolution version of VGGFace2 for academic face editing purpose

Naiyuan Liu 232 Dec 29, 2022
PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Masked Autoencoders: A PyTorch Implementation This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners: @

Meta Research 4.8k Jan 04, 2023
Analysing poker data from home games with friends

Poker Game Analysis Analysing poker data from home games with friends. Not a lot of data is collected, so this project is primarily focussed on descri

Stavros Karmaniolos 1 Oct 15, 2022
Text completion with Hugging Face and TensorFlow.js running on Node.js

Katana ML Text Completion 🤗 Description Runs with with Hugging Face DistilBERT and TensorFlow.js on Node.js distilbert-model - converter from Hugging

Katana ML 2 Nov 04, 2022
Official repository for Fourier model that can generate periodic signals

Conditional Generation of Periodic Signals with Fourier-Based Decoder Jiyoung Lee, Wonjae Kim, Daehoon Gwak, Edward Choi This repository provides offi

8 May 25, 2022
Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection"

Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection". LRPDenseNet.py

Pedro Ricardo Ariel Salvador Bassi 2 Sep 21, 2022
Simple reimplemetation experiments about FcaNet

FcaNet-CIFAR An implementation of the paper FcaNet: Frequency Channel Attention Networks on CIFAR10/CIFAR100 dataset. how to run Code: python Cifar.py

76 Feb 04, 2021
[ICCV 2021] Released code for Causal Attention for Unbiased Visual Recognition

CaaM This repo contains the codes of training our CaaM on NICO/ImageNet9 dataset. Due to my recent limited bandwidth, this codebase is still messy, wh

Wang Tan 66 Dec 31, 2022
This is the official implementation of 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection, built on SECOND.

3D-CVF This is the official implementation of 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object

YecheolKim 97 Dec 20, 2022
Locally Most Powerful Bayesian Test for Out-of-Distribution Detection using Deep Generative Models

LMPBT Supplementary code for the Paper entitled ``Locally Most Powerful Bayesian Test for Out-of-Distribution Detection using Deep Generative Models"

1 Sep 29, 2022
这是一个利用facenet和retinaface实现人脸识别的库,可以进行在线的人脸识别。

Facenet+Retinaface:人脸识别模型在Pytorch当中的实现 目录 注意事项 Attention 所需环境 Environment 文件下载 Download 预测步骤 How2predict 参考资料 Reference 注意事项 该库中包含了两个网络,分别是retinaface和

Bubbliiiing 102 Dec 30, 2022
Repo for EchoVPR: Echo State Networks for Visual Place Recognition

EchoVPR Repo for EchoVPR: Echo State Networks for Visual Place Recognition Currently under development Dirs: data: pre-collected hidden representation

Anil Ozdemir 4 Oct 04, 2022
Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs.

Lunar Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs. About Lunar can be modified to work

Zeyad Mansour 276 Jan 07, 2023
Official PyTorch implementation of "IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos", CVPRW 2021

IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos Introduction This repo is official PyTorch implementatio

Gyeongsik Moon 29 Sep 24, 2022
Official PyTorch implementation of CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds

CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds Introduction This is the official PyTorch implementation of o

Yijia Weng 96 Dec 07, 2022