On-device speech-to-index engine powered by deep learning.

Overview

Octopus

GitHub

PyPI Maven Central Cocoapods

Made in Vancouver, Canada by Picovoice

Twitter URL YouTube Channel Views

Octopus is Picovoice's Speech-to-Index engine. It directly indexes speech without relying on a text representation. This acoustic-only approach boosts accuracy by removing out-of-vocabulary limitation and eliminating the problem of competing hypothesis (e.g. homophones)

Table of Contents

Demos

Python Demos

Install the demo package:

sudo pip3 install pvoctopusdemo

Run the following in the terminal:

octopus_demo  --access_key {AccessKey} --audio_paths ${AUDIO_PATHS}

Replace ${AccessKey} with your AccessKey obtained from Picovoice Console and ${AUDIO_PATHS} with a space-separated list of audio files. Octopus starts processing the audio files and asks you for search phrases and shows results interactively.

For more information about the Python demos go to demo/python.

C Demos

Build the demo:

cmake -S demo/c/ -B demo/c/build && cmake --build demo/c/build

Index a given audio file:

./demo/c/build/octopus_index_demo ${LIBRARY_PATH} ${ACCESS_KEY} ${AUDIO_PATH} ${INDEX_PATH}

Then search the index for a given phrase:

./demo/c/build/octopus_search_demo ${LIBRARY_PATH} ${MODEL_PATH} ${ACCESS_KEY} ${INDEX_PATH} ${SEARCH_PHRASE}

Replace ${LIBRARY_PATH} with path to appropriate library available under lib, ${ACCESS_KEY} with AccessKey obtained from Picovoice Console, ${AUDIO_PATH} with the path to a given audio file and format, ${INDEX_PATH} with the path to cached index file and ${SEARCH_PHRASE} to a search phrase.

For more information about C demos go to demo/c.

Android Demos

Using Android Studio, open demo/android/OctopusDemo as an Android project.

Replace "${YOUR_ACCESS_KEY_HERE}" inside MainActivity.java with your AccessKey obtained from Picovoice Console. Then run the demo.

For more information about Android demos go to demo/android.

iOS Demos

From the demo/ios/OctopusDemo, run the following to install the Octopus CocoaPod:

pod install

Replace "{YOUR_ACCESS_KEY_HERE}" inside ViewModel.swift with your AccessKey obtained from Picovoice Console. Then, using Xcode, open the generated OctopusDemo.xcworkspace and run the application.

For more information about iOS demos go to demo/ios.

Web Demos

From demo/web run the following in the terminal:

yarn
yarn start

(or)

npm install
npm run start

Open http://localhost:5000 in your browser to try the demo.

SDKs

Python

Create an instance of the engine:

import pvoctopus
access_key = ""  # AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
handle = pvoctopus.create(access_key=access_key)

Index your raw audio data or file:

audio_data = [..]
metadata = handle.index(audio_data)
# or 
audio_file_path = "/path/to/my/audiofile.wav"
metadata = handle.index_file(audio_file_path)

Then search the metadata for phrases:

{match.end_sec} ({match.probablity})") ">
avocado_matches = matches['avocado']
for match in avocado_matches:
    print(f"Match for `avocado`: {match.start_sec} -> {match.end_sec} ({match.probablity})")

When done the handle resources have to be released explicitly:

handle.delete()

C

pv_octopus.h header file contains relevant information. Build an instance of the object:

    const char *model_path = "..."; // absolute path to the model file available at `lib/common/octopus_params.pv`
    const char *access_key = "..." // AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
    pv_octopus_t *handle = NULL;
    pv_status_t status = pv_octopus_init(access_key, model_path, &handle);
    if (status != PV_STATUS_SUCCESS) {
        // error handling logic
    }

Index audio data using constructed object:

const char *audio_path = "..."; // absolute path to the audio file to be indexed
void *indices = NULL;
int32_t num_indices_bytes = 0;
pv_status_t status = pv_octopus_index_file(handle, audio_path, &indices, &num_indices_bytes);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Search the indexed data:

const char *phrase = "...";
pv_octopus_match_t *matches = NULL;
int32_t num_matches = 0;
pv_status_t status = pv_octopus_search(handle, indices, num_indices_bytes, phrase, &matches, &num_matches);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

When done be sure to release the acquired resources:

pv_octopus_delete(handle);

Android

Create an instance of the engine:

import ai.picovoice.octopus.*;

final String accessKey = "..."; // AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
try {
    Octopus handle = new Octopus.Builder(accessKey).build(appContext);
} catch (OctopusException ex) { }

Index audio data using constructed object:

final String audioFilePath = "/path/to/my/audiofile.wav"
try {
    OctopusMetadata metadata = handle.indexAudioFile(audioFilePath);
} catch (OctopusException ex) { }

Search the indexed data:

HashMap <String, OctopusMatch[]> matches = handle.search(metadata, phrases);

for (Map.Entry<String, OctopusMatch[]> entry : map.entrySet()) {
    final String phrase = entry.getKey();
    for (OctopusMatch phraseMatch : entry.getValue()){
        final float startSec = phraseMatch.getStartSec();
        final float endSec = phraseMatch.getEndSec();
        final float probability = phraseMatch.getProbability();
    }
}

When done be sure to release the acquired resources:

metadata.delete();
handle.delete();

iOS

Create an instance of the engine:

import Octopus

let accessKey : String = // .. AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
do {
    let handle = try Octopus(accessKey: accessKey)
} catch { }

Index audio data using constructed object:

let audioFilePath = "/path/to/my/audiofile.wav"
do {
    let metadata = try handle.indexAudioFile(path: audioFilePath)
} catch { }

Search the indexed data:

let matches: Dictionary<String, [OctopusMatch]> = try octopus.search(metadata: metadata, phrases: phrases)
for (phrase, phraseMatches) in matches {
    for phraseMatch in phraseMatches {
        var startSec = phraseMatch.startSec;
        var endSec = phraseMatch.endSec;
        var probability = phraseMatch.probability;
    }
}

When done be sure to release the acquired resources:

handle.delete();

Web

Octopus is available on modern web browsers (i.e., not Internet Explorer) via WebAssembly. Octopus is provided pre-packaged as a Web Worker to allow it to perform processing off the main thread.

Vanilla JavaScript and HTML (CDN Script Tag)

">
>
<html lang="en">

<head>
  <script src="https://unpkg.com/@picovoice/octopus-web-en-worker/dist/iife/index.js">script>
  <script type="application/javascript">
    // The metadata object to save the result of indexing for later searches
    let octopusMetadata = undefined

    function octopusIndexCallback(metadata) {
      octopusMetadata = metadata
    }

    function octopusSearchCallback(matches) {
      console.log(`Search results (${matches.length}):`)
      console.log(`Start: ${match.startSec}s -> End: ${match.endSec}s (Probability: ${match.probability})`)
    }

    async function startOctopus() {
      // Create an Octopus Worker
      // Note: you receive a Worker object, _not_ an individual Octopus instance
      const accessKey = ... // AccessKey string provided by Picovoice Console (https://picovoice.ai/console/)
      const OctopusWorker = await OctopusWorkerFactory.create(
        accessKey,
        octopusIndexCallback,
        octopusSearchCallback
      )
    }

    document.addEventListener("DOMContentLoaded", function () {
      startOctopus();
      // Send Octopus the audio signal
      const audioSignal = new Int16Array(/* Provide data with correct format*/)
      OctopusWorker.postMessage({
        command: "index",
        input: audioSignal,
      });
    });

    const searchText = ...
    OctopusWorker.postMessage({
      command: "search",
      metadata: octopusMetadata,
      searchPhrase: searchText,
    });
  script>
head>

<body>body>

html>

Vanilla JavaScript and HTML (ES Modules)

yarn add @picovoice/octopus-web-en-worker

(or)

npm install @picovoice/octopus-web-en-worker
End: ${match.endSec}s (Probability: ${match.probability})`); } async function startOctopus() { // Create an Octopus Worker // Note: you receive a Worker object, _not_ an individual Octopus instance const accessKey = // .. AccessKey provided by Picovoice Console (https://picovoice.ai/console/) const OctopusWorker = await OctopusWorkerFactory.create( accessKey, octopusIndexCallback, octopusSearchCallback ); } startOctopus() ... // Send Octopus the audio signal const audioSignal = new Int16Array(/* Provide data with correct format*/) OctopusWorker.postMessage({ command: "index", input: audioSignal, }); ... const searchText = ...; OctopusWorker.postMessage({ command: "search", metadata: octopusMetadata, searchPhrase: searchText, }); ">
import { OctopusWebEnWorker } from "@picovoice/octopus-web-en-worker";

// The metadata object to save the result of indexing for later searches
let octopusMetadata = undefined;

function octopusIndexCallback(metadata) {
  octopusMetadata = metadata;
}

function octopusSearchCallback(matches) {
  console.log(`Search results (${matches.length}):`);
  console.log(`Start: ${match.startSec}s -> End: ${match.endSec}s (Probability: ${match.probability})`);
}


async function startOctopus() {
  // Create an Octopus Worker
  // Note: you receive a Worker object, _not_ an individual Octopus instance
  const accessKey = // .. AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
  const OctopusWorker = await OctopusWorkerFactory.create(
    accessKey,
    octopusIndexCallback,
    octopusSearchCallback
  );
}

startOctopus()

...

// Send Octopus the audio signal
const audioSignal = new Int16Array(/* Provide data with correct format*/)
OctopusWorker.postMessage({
  command: "index",
  input: audioSignal,
});

...

const searchText = ...;
OctopusWorker.postMessage({
  command: "search",
  metadata: octopusMetadata,
  searchPhrase: searchText,
});

Releases

v1.0.0 Oct 8th, 2021

  • Initial release.
You might also like...
A python script to lookup Passport Index Dataset

visa-cli A python script to lookup Passport Index Dataset Installation pip install visa-cli Usage usage: visa-cli [-h] [-d DESTINATION_COUNTRY] [-f]

This is a virtual picture dragging application. Users may virtually slide photos across the screen. The distance between the index and middle fingers determines the movement. Smaller distances indicate click and motion, whereas bigger distances indicate only hand movement.
A set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI.

Overview This is a set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI. Make TFRecords To run t

PPLNN is a Primitive Library for Neural Network is a high-performance deep-learning inference engine for efficient AI inferencing
PPLNN is a Primitive Library for Neural Network is a high-performance deep-learning inference engine for efficient AI inferencing

PPLNN is a Primitive Library for Neural Network is a high-performance deep-learning inference engine for efficient AI inferencing

Implementation of "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement" by pytorch

This repository is used to suspend the results of our paper "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement"

Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

StrengthNet Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis" https://arxiv.org/abs/2110

This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

CPC_DeepCluster This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEEC

A fast, dataset-agnostic, deep visual search engine for digital art history

imgs.ai imgs.ai is a fast, dataset-agnostic, deep visual search engine for digital art history based on neural network embeddings. It utilizes modern

Comments
  • Bump terser from 5.13.1 to 5.16.1 in /binding/web

    Bump terser from 5.13.1 to 5.16.1 in /binding/web

    Bumps terser from 5.13.1 to 5.16.1.

    Changelog

    Sourced from terser's changelog.

    v5.16.1

    • Properly handle references in destructurings (const { [reference]: val } = ...)
    • Allow parsing of .#privatefield in nested classes
    • Do not evaluate operations that return large strings if that would make the output code larger
    • Make collapse_vars handle block scope correctly
    • Internal improvements: Typos (#1311), more tests, small-scale refactoring

    v5.16.0

    • Disallow private fields in object bodies (#1011)
    • Parse #privatefield in object (#1279)
    • Compress #privatefield in object

    v5.15.1

    • Fixed missing parentheses around optional chains
    • Avoid bare let or const as the bodies of if statements (#1253)
    • Small internal fixes (#1271)
    • Avoid inlining a class twice and creating two equivalent but !== classes.

    v5.15.0

    • Basic support for ES2022 class static initializer blocks.
    • Add AudioWorkletNode constructor options to domprops list (#1230)
    • Make identity function inliner not inline id(...expandedArgs)

    v5.14.2

    • Security fix for RegExps that should not be evaluated (regexp DDOS)
    • Source maps improvements (#1211)
    • Performance improvements in long property access evaluation (#1213)

    v5.14.1

    • keep_numbers option added to TypeScript defs (#1208)
    • Fixed parsing of nested template strings (#1204)

    v5.14.0

    • Switched to @​jridgewell/source-map for sourcemap generation (#1190, #1181)
    • Fixed source maps with non-terminated segments (#1106)
    • Enabled typescript types to be imported from the package (#1194)
    • Extra DOM props have been added (#1191)
    • Delete the AST while generating code, as a means to save RAM
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Convert c owned memory into ctypes owned memory for metadata objects

    Convert c owned memory into ctypes owned memory for metadata objects

    ctypes doesn't free the c_void_p from c, but if we convert it into a block of memory created from ctypes (serialize -> deserialize), then it will garbage collect and free when appropriate.

    opened by ErisMik 1
Releases(v1.2)
  • v1.2(Aug 12, 2022)

Owner
Picovoice
Edge Voice AI Platform
Picovoice
This repo provides function call to track multi-objects in videos

Custom Object Tracking Introduction This repo provides function call to track multi-objects in videos with a given trained object detection model and

Jeff Lo 51 Nov 22, 2022
A Transformer-Based Siamese Network for Change Detection

ChangeFormer: A Transformer-Based Siamese Network for Change Detection (Under review at IGARSS-2022) Wele Gedara Chaminda Bandara, Vishal M. Patel Her

Wele Gedara Chaminda Bandara 214 Dec 29, 2022
We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.

Multi-Modal Self-Supervision using GDT and StiCa This is an official pytorch implementation of papers: Multi-modal Self-Supervision from Generalized D

Facebook Research 42 Dec 09, 2022
Syed Waqas Zamir 906 Dec 30, 2022
Semantic Segmentation of images using PixelLib with help of Pascalvoc dataset trained with Deeplabv3+ framework.

CARscan- Approach 1 - Segmentation of images by detecting contours. It failed because in images with elements along with cars were also getting detect

Padmanabha Banerjee 5 Jul 29, 2021
AWS provides a Python SDK, "Boto3" ,which can be used to access the AWS-account from the local.

Boto3 - The AWS SDK for Python Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to wri

Shreyas Srivastava 1 Oct 25, 2021
A 10000+ hours dataset for Chinese speech recognition

WenetSpeech Official website | Paper A 10000+ Hours Multi-domain Chinese Corpus for Speech Recognition Download Please visit the official website, rea

310 Jan 03, 2023
The toolkit to generate auto labeled datasets

Ozeu Ozeu is the toolkit to autolabal dataset for instance segmentation. You can generate datasets labaled with segmentation mask and bounding box fro

Xiong Jie 28 Mar 28, 2022
A collection of Google research projects related to Federated Learning and Federated Analytics.

Federated Research Federated Research is a collection of research projects related to Federated Learning and Federated Analytics. Federated learning i

Google Research 483 Jan 05, 2023
Galactic and gravitational dynamics in Python

Gala is a Python package for Galactic and gravitational dynamics. Documentation The documentation for Gala is hosted on Read the docs. Installation an

Adrian Price-Whelan 101 Dec 22, 2022
How will electric vehicles affect traffic congestion and energy consumption: an integrated modelling approach

EV-charging-impact This repository contains the code that has been used for the Queue modelling for the paper "How will electric vehicles affect traff

7 Nov 30, 2022
Python scripts to detect faces in Python with the BlazeFace Tensorflow Lite models

Python scripts to detect faces using Python with the BlazeFace Tensorflow Lite models. Tested on Windows 10, Tensorflow 2.4.0 (Python 3.8).

Ibai Gorordo 46 Nov 17, 2022
Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide.

SARS-CoV-2 processing requests Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide. Prerequisites This autom

useGalaxy.eu 17 Aug 13, 2022
[NeurIPS 2021] "Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems"

Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems Introduction Multi-agent control i

VITA 6 May 05, 2022
Implementation of ECCV20 paper: the devil is in classification: a simple framework for long-tail object detection and instance segmentation

Implementation of our ECCV 2020 paper The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation This repo contains code o

twang 98 Sep 17, 2022
ReSSL: Relational Self-Supervised Learning with Weak Augmentation

ReSSL: Relational Self-Supervised Learning with Weak Augmentation This repository contains PyTorch evaluation code, training code and pretrained model

mingkai 45 Oct 25, 2022
Hydra: an Extensible Fuzzing Framework for Finding Semantic Bugs in File Systems

Hydra: An Extensible Fuzzing Framework for Finding Semantic Bugs in File Systems Paper Finding Semantic Bugs in File Systems with an Extensible Fuzzin

gts3.org (<a href=[email protected])"> 129 Dec 15, 2022
Use AI to generate a optimized stock portfolio

Use AI, Modern Portfolio Theory, and Monte Carlo simulation's to generate a optimized stock portfolio that minimizes risk while maximizing returns. Ho

Greg James 30 Dec 22, 2022
CLNTM - Contrastive Learning for Neural Topic Model

Contrastive Learning for Neural Topic Model This repository contains the impleme

Thong Thanh Nguyen 25 Nov 24, 2022
The official PyTorch code implementation of "Personalized Trajectory Prediction via Distribution Discrimination" in ICCV 2021.

Personalized Trajectory Prediction via Distribution Discrimination (DisDis) The official PyTorch code implementation of "Personalized Trajectory Predi

25 Dec 20, 2022