Memory efficient transducer loss computation

Overview

Introduction

This project implements the optimization techniques proposed in Improving RNN Transducer Modeling for End-to-End Speech Recognition to reduce the memory consumption for computing transducer loss.

How does it differ from the RNN-T loss from torchaudio

It produces same output as torchaudio for the same input, so optimized_transducer should be equivalent to torchaudio.functional.rnnt_loss().

This project is more memory efficient and potentially faster (TODO: This needs some benchmarks)

Also, torchaudio accepts only output from nn.Linear, but we also support output from log-softmax (You can set the option from_log_softmax to True in this case).

How does it differ from warp-transducer

It borrows the methods of computing alpha and beta from warp-transducer. Therefore, optimized_transducer produces the same alpha and beta as warp-transducer for the same input.

However, warp-transducer produces different gradients for CPU and CUDA when using the same input. See https://github.com/HawkAaron/warp-transducer/issues/93

This project produces consistent gradient on CPU and CUDA for the same input, just like what torchaudio is doing. (We borrow the gradient computation formula from torchaudio).

optimized_transducer uses less memory than that of warp-transducer and is potentially faster. (TODO: This needs some benchmarks).

Installation

You can install it via pip:

pip install optimized_transducer

To check that optimized_transducer was installed successfully, please run

python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"

which should print the version of the installed optimized_transducer, e.g., 1.2.

Installation FAQ

What operating systems are supported ?

It has been tested on Ubuntu 18.04. It should also work on macOS and other unixes systems. It may work on Windows, though it is not tested.

How to display installation log ?

Use

pip install --verbose optimized_transducer

How to reduce installation time ?

Use

export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -j to make.

Which version of PyTorch is supported ?

It has been tested on PyTorch >= 1.5.0. It may work on PyTorch < 1.5.0

How to install a CPU version of optimized_transducer ?

Use

export OT_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF"
export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF to cmake.

What Python versions are supported ?

Python >= 3.6 is known to work. It may work for Python 2.7, though it is not tested.

Where to get help if I have problems with the installation ?

Please file an issue at https://github.com/csukuangfj/optimized_transducer/issues and describe your problem there.

Usage

optimized_transducer expects that the output shape of the joint network is NOT (N, T, U, V), but is (sum_all_TU, V), which is a concatenation of 2-D tensors: (T_1 * U_1, V), (T_2 * U_2, V), ..., (T_N, U_N, V). Note: (T_1 * U_1, V) is just the reshape of a 3-D tensor (T_1, U_1, V).

Suppose your original joint network looks somewhat like the following:

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network

encoder_out = encoder_out.unsqueeze(2) # Now encoder out is (N, T, 1, D)
decoder_out = decoder_out.unsqueeze(1) # Now decoder out is (N, 1, U, D)

x = encoder_out + decoder_out # x is of shape (N, T, U, D)
activation = torch.tanh(x)

logits = linear(activation) # linear is an instance of `nn.Linear`.

loss = torchaudio.functional.rnnt_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
)

You need to change it to the following:

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network

encoder_out_list = [encoder_out[i, :logit_lengths[i], :] for i in range(N)]
decoder_out_list = [decoder_out[i, :target_lengths[i]+1, :] for i in range(N)]

x = [e.unsqueeze(1) + d.unsqueeze(0) for e, d in zip(encoder_out_list, decoder_out_list)]
x = [p.reshape(-1, D) for p in x]
x = torch.cat(x)

activation = torch.tanh(x)
logits = linear(activation) # linear is an instance of `nn.Linear`.

loss = optimized_transducer.transducer_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
    from_log_softmax=False,
)

Caution: We used from_log_softmax=False in the above example since logits is the output of nn.Linear.

Hint: If logits is the output of log-softmax, you should use from_log_softmax=True.

In most cases, you should pass the output of nn.Linear to compute the loss, i.e., use from_log_softmax=False, to save memory.

If you want to do some operations on the output of log-softmax before feeding it to optimized_transducer.transducer_loss(), from_log_softmax=True is helpful in this case. But be aware that this will increase the memory usage.

For more usages, please refer to

For developers

As a developer, you don't need to use pip install optimized_transducer. To make development easier, you can use

git clone https://github.com/csukuangfj/optimized_transducer.git
cd optimized_transducer
mkdir build
cd build
cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

I usually create a file path.sh inside the build direcotry, containing

export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

so what you need to do is

cd optimized_transducer/build
source path.sh

# Then you are ready to run Python tests
python3 optimized_transducer/python/tests/test_compute_transducer_loss.py

# You can also use "import optimized_transducer" in your Python projects

To run all Python tests, use

cd optimized_transducer/build
ctest --output-on-failure
Comments
  • Issue with optimized-transducer installation

    Issue with optimized-transducer installation

    I started installing K2, lhotse and Icefall. So far I was able to test K2 and it works perfectly, lhotse also works but when I tried to install icefall I got a weird issue about optimized-transducer. The log is below.

    Collecting kaldilm Using cached kaldilm-1.11-cp38-cp38-linux_x86_64.whl Collecting kaldialign Using cached kaldialign-0.2-cp38-cp38-linux_x86_64.whl Requirement already satisfied: sentencepiece>=0.1.96 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 3)) (0.1.96) Requirement already satisfied: tensorboard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (2.7.0) Requirement already satisfied: typeguard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (2.13.3) Collecting optimized_transducer Using cached optimized_transducer-1.3.tar.gz (47 kB) Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.8.1) Requirement already satisfied: werkzeug>=0.11.15 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.0.2) Requirement already satisfied: numpy>=1.12.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.21.2) Requirement already satisfied: protobuf>=3.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.19.3) Requirement already satisfied: wheel>=0.26 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.37.1) Requirement already satisfied: setuptools>=41.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (58.0.4) Requirement already satisfied: grpcio>=1.24.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.43.0) Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.4.6) Requirement already satisfied: absl-py>=0.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.0.0) Requirement already satisfied: google-auth<3,>=1.6.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.3.3) Requirement already satisfied: requests<3,>=2.21.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.27.1) Requirement already satisfied: markdown>=2.6.8 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.3.6) Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.6.1) Requirement already satisfied: six in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from absl-py>=0.4->tensorboard->-r requirements.txt (line 4)) (1.16.0) Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.2.4) Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.2.8) Requirement already satisfied: rsa<5,>=3.1.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.8) Requirement already satisfied: requests-oauthlib>=0.7.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (1.3.0) Requirement already satisfied: importlib-metadata>=4.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (4.10.1) Requirement already satisfied: zipp>=0.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (3.7.0) Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.4.8) Requirement already satisfied: charset-normalizer~=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2.0.10) Requirement already satisfied: certifi>=2017.4.17 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2021.10.8) Requirement already satisfied: idna<4,>=2.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (3.3) Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (1.26.8) Requirement already satisfied: oauthlib>=3.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (3.1.1) Building wheels for collected packages: optimized-transducer Building wheel for optimized-transducer (setup.py): started Building wheel for optimized-transducer (setup.py): finished with status 'error' ERROR: Command errored out with exit status 1: command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; file='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-qa004082 cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/ Complete output (153 lines): running bdist_wheel running build running build_py creating build creating build/lib.linux-x86_64-3.8 creating build/lib.linux-x86_64-3.8/optimized_transducer copying optimized_transducer/python/optimized_transducer/init.py -> build/lib.linux-x86_64-3.8/optimized_transducer copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer running build_ext For fast compilation, run: export OT_MAKE_ARGS="-j"; python setup.py install Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python build command is:

              cd build/temp.linux-x86_64-3.8
    
              cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
              make  _optimized_transducer
    

    -- Enabled languages: CXX;CUDA -- The CXX compiler identification is GNU 6.5.0 -- The CUDA compiler identification is NVIDIA 11.1.74 -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Automatic GPU detection failed. Building for common architectures. -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86 -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86 -- Adding arch 35 -- Adding arch 50 -- Adding arch 60 -- Adding arch 61 -- Adding arch 70 -- Adding arch 75 -- Adding arch 80 -- Adding arch 86 -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86 -- Downloading pybind11 -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src -- pybind11 v2.6.0 -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12") -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so -- Performing Test HAS_FLTO -- Performing Test HAS_FLTO - Success -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -- Looking for C++ include pthread.h -- Looking for C++ include pthread.h - found -- Looking for pthread_create -- Looking for pthread_create - not found -- Looking for pthread_create in pthreads -- Looking for pthread_create in pthreads - not found -- Looking for pthread_create in pthread -- Looking for pthread_create in pthread - found -- Found Threads: TRUE CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package): Policy CMP0074 is not set: find_package uses _ROOT variables. Run "cmake --help-policy CMP0074" for policy details. Use the cmake_policy command to set the policy and suppress this warning.

    Environment variable CUDA_ROOT is set to:
    
      /cm/shared/apps/cuda11.1/toolkit/11.1.0
    
    For compatibility, CMake is ignoring the variable.
    

    Call Stack (most recent call first): /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include) /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) cmake/torch.cmake:11 (find_package) CMakeLists.txt:130 (include) This warning is for project developers. Use -Wno-dev to suppress it.

    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1") -- Caffe2: CUDA detected: 11.1 -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0 -- Caffe2: Header version is: 11.1 -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH) CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message): Caffe2: Cannot find cuDNN library. Turning the option off Call Stack (most recent call first): /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include) /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) cmake/torch.cmake:11 (find_package) CMakeLists.txt:130 (include)

    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a -- Automatic GPU detection failed. Building for common architectures. -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86 CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message): Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN libraries. Please set the proper cuDNN prefixes and / or install cuDNN. Call Stack (most recent call first): /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) cmake/torch.cmake:11 (find_package) CMakeLists.txt:130 (include)

    -- Configuring incomplete, errors occurred! See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log". See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log". make: *** No rule to make target `_optimized_transducer'. Stop. Traceback (most recent call last): File "", line 1, in File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in setuptools.setup( File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/init.py", line 153, in setup return distutils.core.setup(**attrs) File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup dist.run_commands() File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands self.run_command(cmd) File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 299, in run self.run_command('build') File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run self.run_command(cmd_name) File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run _build_ext.run(self) File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run self.build_extensions() File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions self._build_extensions_serial() File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial self.build_extension(ext) File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension raise Exception( Exception: Build optimized_transducer failed. Please check the error message. You can ask for help by creating an issue on GitHub.

    Click: https://github.com/csukuangfj/optimized_transducer/issues/new


    ERROR: Failed building wheel for optimized-transducer Running setup.py clean for optimized-transducer Failed to build optimized-transducer Installing collected packages: optimized-transducer, kaldilm, kaldialign Running setup.py install for optimized-transducer: started Running setup.py install for optimized-transducer: finished with status 'error' ERROR: Command errored out with exit status 1: command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; file='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/ Complete output (155 lines): running install running build running build_py creating build creating build/lib.linux-x86_64-3.8 creating build/lib.linux-x86_64-3.8/optimized_transducer copying optimized_transducer/python/optimized_transducer/init.py -> build/lib.linux-x86_64-3.8/optimized_transducer copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer running build_ext For fast compilation, run: export OT_MAKE_ARGS="-j"; python setup.py install Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python build command is:

                cd build/temp.linux-x86_64-3.8
    
                cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
                make  _optimized_transducer
    
    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 6.5.0
    -- The CUDA compiler identification is NVIDIA 11.1.74
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
    -- Adding arch 35
    -- Adding arch 50
    -- Adding arch 60
    -- Adding arch 61
    -- Adding arch 70
    -- Adding arch 75
    -- Adding arch 80
    -- Adding arch 86
    -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
    -- Downloading pybind11
    -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12")
    -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
      Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
      Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
      command to set the policy and suppress this warning.
    
      Environment variable CUDA_ROOT is set to:
    
        /cm/shared/apps/cuda11.1/toolkit/11.1.0
    
      For compatibility, CMake is ignoring the variable.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1")
    -- Caffe2: CUDA detected: 11.1
    -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0
    -- Caffe2: Header version is: 11.1
    -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
    CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
      Caffe2: Cannot find cuDNN library.  Turning the option off
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
      Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
      libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log".
    make: *** No rule to make target `_optimized_transducer'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in <module>
        setuptools.setup(
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/install.py", line 545, in run
        self.run_command('build')
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension
        raise Exception(
    Exception:
    Build optimized_transducer failed. Please check the error message.
    You can ask for help by creating an issue on GitHub.
    
    Click:
    	https://github.com/csukuangfj/optimized_transducer/issues/new
    
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; file='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer Check the logs for full command output.

    opened by AmirHussein96 8
  • Warprnnt gradient for CPU

    Warprnnt gradient for CPU

    @csukuangfj Just wanted to note that the gradient is not incorrect for CPU vs GPU, the instructions clearly state that for CPU you need to provide log_softmax(joint-logits) whereas for the GPU you should only provide joint-logits since the cuda kernel will efficiently compute the log_softmax internally.

    Anyway yours is also an efficient implementation, also written in c++, could you benchmark the solutions if you have time ? Even a naive one would give some hint as to speed in relative terms. The memory efficient implementation of yours is very interesting too, which reduces speed but saves a lot of memory.

    opened by titu1994 2
  • "ModuleNotFoundError: No module named '_optimized_transducer'" when testing.

    I install the optimized_transducer as follows:

    git clone https://github.com/csukuangfj/optimized_transducer.git
    cd optimized_transducer
    mkdir build
    cd build
    cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
    export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH
    

    The cmake log as follows:

    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 7.5.0
    -- The CUDA compiler identification is NVIDIA 10.1.243
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Autodetected CUDA architecture(s):  7.0
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_70,code=sm_70
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75
    -- Skipping arch 35
    -- Skipping arch 50
    -- Skipping arch 60
    -- Skipping arch 61
    -- Adding arch 70
    -- Skipping arch 75
    -- OT_COMPUTE_ARCHS: 70
    -- Downloading pybind11
    -- pybind11 is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python (found version "3.8.11")
    -- Found PythonLibs: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Found CUDA: /usr/local/cuda (found version "10.1")
    -- Caffe2: CUDA detected: 10.1
    -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /usr/local/cuda
    -- Caffe2: Header version is: 10.1
    -- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
    -- Found cuDNN: v7.6.2  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
    -- Autodetected CUDA architecture(s):  7.0
    -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70
    -- Found Torch: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/lib/libtorch.so
    -- PyTorch version: 1.7.0+cu101
    -- PyTorch cuda version: 10.1
    -- Use FetchContent provided by k2
    -- Downloading googletest
    
    -- googletest is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-src
    -- googletest's binary dir is /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-build
    -- The C compiler identification is GNU 7.5.0
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Downloading moderngpu
    -- moderngpu is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/moderngpu-src
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /ceph-meixu/luomingshuang/optimized_transducer/build
    

    But when I use python optimized_transducer/python/tests/test_compute_transducer_loss.py for testing, there is an error as follows:

    /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
      warnings.warn(
    Traceback (most recent call last):
      File "optimized_transducer/python/tests/test_compute_transducer_loss.py", line 8, in <module>
        import optimized_transducer
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ModuleNotFoundError: No module named '_optimized_transducer'
    

    Hope to know how I can solve it. Thanks!

    opened by luomingshuang 2
  • Update transducer-loss.h

    Update transducer-loss.h

    I found that https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/csrc/transducer-loss.h#L17 and https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/python/tests/test_compute_transducer_loss.py#L61 were not Inconsistent. I think that the front was not correct. Here I fixed it. @csukuangfj , what do you think?

    opened by shanguanma 2
  • fix for CMakeLists.txt

    fix for CMakeLists.txt

    When I run make -j in the build dir, there is an error happens: error: #error C++14 or later compatible compiler is required to use ATen.. So I add the following two commands to CMakeLists.txt and the make -j process can run successfully.

    set(CMAKE_CXX_STANDARD 14)
    set(CMAKE_CXX_STANDARD_REQUIRED ON)
    

    I'm not sure if the above two commands are necesary for the CMakeLists.txt in all environments.

    opened by luomingshuang 1
  • Fix installation on macOS.

    Fix installation on macOS.

    To fix the following error when running

    python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"
    

    on macOS:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ImportError: dlopen(/Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so, 2): Symbol not found: _THPVariableClass
      Referenced from: /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
      Expected in: flat namespace
     in /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
    
    opened by csukuangfj 0
  • Disable warp level parallel reduction

    Disable warp level parallel reduction

    Somehow it produces incorrect alpha and beta for a large value of sum_all_TU using warps.

    We disable warp level parallel reduction for now and use the method from https://github.com/HawkAaron/warp-transducer to compute alpha and beta.

    Will revisit the issues about warps after gaining more experience with CUDA programming.

    opened by csukuangfj 0
  • transducer grad compute formular

    transducer grad compute formular

    The formular for gradient is below inwarprnnt_numba and warp_transducer cpu:

        T, U, _ = log_probs.shape
        grads = np.full(log_probs.shape, -float("inf"))
        log_like = betas[0, 0]  # == alphas[T - 1, U - 1] + betas[T - 1, U - 1]
    
        # // grad to last blank transition
        grads[T - 1, U - 1, blank] = alphas[T - 1, U - 1]
        grads[: T - 1, :, blank] = alphas[: T - 1, :] + betas[1:, :]
    
        # // grad to label transition
        for u, l in enumerate(labels):
            grads[:, u, l] = alphas[:, u] + betas[:, u + 1]
    
        grads = -np.exp(grads + log_probs - log_like)
    

    that is not same to torchaudio, optimized_transducer and ,warp_transducer gpu, but you said that warp_transducer cpu grad is same to optimized_transducer and torchaudio, how that is achieved?

    opened by zh794390558 9
  • install error

    install error

    1. CUDA_cublas_LIBRARY not found error when compiling ,my cuda version 10.2
    2. /usr/include/c++/7/bits/basic_string.tcc(1067): error: expression must have pointer type detected during: instantiation of "std::basic_string<_CharT, _Traits, _Alloc>::_Rep *std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc &) [with _CharT=char16_t, _Traits=std::char_traits<char16_t>, _Alloc=std::allocator<char16_t>]"

    To Fix the above two problems, I have to use root to modify some settings of the linux system. Is there any better solution?

    opened by zmqwer 0
  • loss value and decode library?

    loss value and decode library?

    thanks very much for your great project! I have two questions to ask: 1. how big is the the transducer loss for a well performed model? or the model is converged? 2. is there any fast decode solution? I found the decode module in many project implementing the beam search decode algorithm is extremely slow

    opened by xiongjun19 10
Releases(v1.4)
Owner
Fangjun Kuang
Was vorbei ist, ist vorbei.
Fangjun Kuang
Codebase for Image Classification Research, written in PyTorch.

pycls pycls is an image classification codebase, written in PyTorch. It was originally developed for the On Network Design Spaces for Visual Recogniti

Facebook Research 2k Jan 01, 2023
Using contrastive learning and OpenAI's CLIP to find good embeddings for images with lossy transformations

The official code for the paper "Inverse Problems Leveraging Pre-trained Contrastive Representations" (to appear in NeurIPS 2021).

Sriram Ravula 26 Dec 10, 2022
PyTorch code for our paper "Attention in Attention Network for Image Super-Resolution"

Under construction... Attention in Attention Network for Image Super-Resolution (A2N) This repository is an PyTorch implementation of the paper "Atten

Haoyu Chen 71 Dec 30, 2022
Kindle is an easy model build package for PyTorch.

Kindle is an easy model build package for PyTorch. Building a deep learning model became so simple that almost all model can be made by copy and paste from other existing model codes. So why code? wh

Jongkuk Lim 77 Nov 11, 2022
Official Implementation (PyTorch) of "Point Cloud Augmentation with Weighted Local Transformations", ICCV 2021

PointWOLF: Point Cloud Augmentation with Weighted Local Transformations This repository is the implementation of PointWOLF(To appear). Sihyeon Kim1*,

MLV Lab (Machine Learning and Vision Lab at Korea University) 16 Nov 03, 2022
Code for "My(o) Armband Leaks Passwords: An EMG and IMU Based Keylogging Side-Channel Attack" paper

Myo Keylogging This is the source code for our paper My(o) Armband Leaks Passwords: An EMG and IMU Based Keylogging Side-Channel Attack by Matthias Ga

Secure Mobile Networking Lab 7 Jan 03, 2023
Code for "Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans" CVPR 2021 best paper candidate

News 05/17/2021 To make the comparison on ZJU-MoCap easier, we save quantitative and qualitative results of other methods at here, including Neural Vo

ZJU3DV 748 Jan 07, 2023
Source code for Task-Aware Variational Adversarial Active Learning

Contrastive Coding for Active Learning under Class Distribution Mismatch Official PyTorch implementation of ["Contrastive Coding for Active Learning u

27 Nov 23, 2022
A3C LSTM Atari with Pytorch plus A3G design

NEWLY ADDED A3G A NEW GPU/CPU ARCHITECTURE OF A3C FOR SUBSTANTIALLY ACCELERATED TRAINING!! RL A3C Pytorch NEWLY ADDED A3G!! New implementation of A3C

David Griffis 532 Jan 02, 2023
Scalable, event-driven, deep-learning-friendly backtesting library

...Minimizing the mean square error on future experience. - Richard S. Sutton BTGym Scalable event-driven RL-friendly backtesting library. Build on

Andrew 922 Dec 27, 2022
UPSNet: A Unified Panoptic Segmentation Network

UPSNet: A Unified Panoptic Segmentation Network Introduction UPSNet is initially described in a CVPR 2019 oral paper. Disclaimer This repository is te

Uber Research 622 Dec 26, 2022
This is the codebase for Diffusion Models Beat GANS on Image Synthesis.

This is the codebase for Diffusion Models Beat GANS on Image Synthesis.

OpenAI 3k Dec 26, 2022
Pytorch implementation of "Geometrically Adaptive Dictionary Attack on Face Recognition" (WACV 2022)

Geometrically Adaptive Dictionary Attack on Face Recognition This is the Pytorch code of our paper "Geometrically Adaptive Dictionary Attack on Face R

6 Nov 21, 2022
Deep Q Learning with OpenAI Gym and Pokemon Showdown

pokemon-deep-learning An openAI gym project for pokemon involving deep q learning. Made by myself, Sam Little, and Layton Webber. This code captures g

2 Dec 22, 2021
pq is a jq-like Pickle file viewer

pq PQ is a jq-like viewer/processing tool for pickle files. howto # pq '' file.pkl {'other': 456, 'test': 123} # pq 'table' file.pkl |other|test| | 45

3 Mar 15, 2022
Implementation of the Swin Transformer in PyTorch.

Swin Transformer - PyTorch Implementation of the Swin Transformer architecture. This paper presents a new vision Transformer, called Swin Transformer,

597 Jan 03, 2023
Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

CLIORA This is the official codebase for ICLR oral paper: Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling. We introduce

Bo Wan 32 Dec 23, 2022
State of the art Semantic Sentence Embeddings

Contrastive Tension State of the art Semantic Sentence Embeddings Published Paper · Huggingface Models · Report Bug Overview This is the official code

Fredrik Carlsson 88 Dec 30, 2022
THIS IS THE **OLD** PYMC PROJECT. PLEASE USE PYMC3 INSTEAD:

Introduction Version: 2.3.8 Authors: Chris Fonnesbeck Anand Patil David Huard John Salvatier Web site: https://github.com/pymc-devs/pymc Documentation

PyMC 7.2k Jan 07, 2023
Real-Time-Student-Attendence-System - Real Time Student Attendence System

Real-Time-Student-Attendence-System The Student Attendance Management System Pro

Rounak Das 1 Feb 15, 2022