Memory efficient transducer loss computation

Overview

Introduction

This project implements the optimization techniques proposed in Improving RNN Transducer Modeling for End-to-End Speech Recognition to reduce the memory consumed when computing the transducer loss.

How does it differ from the RNN-T loss from torchaudio

It produces the same output as torchaudio for the same input, so optimized_transducer should be equivalent to torchaudio.functional.rnnt_loss().

This project is more memory efficient and potentially faster. (TODO: This needs some benchmarks.)

Also, torchaudio accepts only the output of nn.Linear, whereas optimized_transducer also accepts the output of log-softmax (set the option from_log_softmax to True in that case).

How does it differ from warp-transducer

It borrows the methods of computing alpha and beta from warp-transducer. Therefore, optimized_transducer produces the same alpha and beta as warp-transducer for the same input.

However, warp-transducer produces different gradients for CPU and CUDA when using the same input. See https://github.com/HawkAaron/warp-transducer/issues/93

This project produces consistent gradients on CPU and CUDA for the same input, just as torchaudio does. (We borrow the gradient computation formula from torchaudio.)

optimized_transducer uses less memory than warp-transducer and is potentially faster. (TODO: This needs some benchmarks.)

Installation

You can install it via pip:

pip install optimized_transducer

To check that optimized_transducer was installed successfully, please run

python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"

which should print the version of the installed optimized_transducer, e.g., 1.2.
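The same check can be done from inside a Python program. The following is only a minimal sketch: the helper name installed_version is hypothetical and not part of optimized_transducer; it relies solely on the __version__ attribute used in the command above.

```python
import importlib


def installed_version(module_name="optimized_transducer"):
    """Return the module's __version__ string, or None if the module
    cannot be imported or does not define __version__.

    Note: this helper is illustrative only; it is not provided by
    optimized_transducer itself.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Prints the installed version, or None if the package is missing.
print(installed_version())
```

A return value of None suggests that the installation did not succeed in the current Python environment.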

Installation FAQ

What operating systems are supported?

It has been tested on Ubuntu 18.04. It should also work on macOS and other Unix-like systems. It may work on Windows, though it is not tested.

How to display the installation log?

Use

pip install --verbose optimized_transducer

How to reduce the installation time?

Use

export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -j to make.

Which versions of PyTorch are supported?

It has been tested on PyTorch >= 1.5.0. It may work on PyTorch < 1.5.0, though that is not tested.

How to install a CPU version of optimized_transducer?

Use

export OT_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF"
export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF to cmake.

What Python versions are supported?

Python >= 3.6 is known to work. It may work with Python 2.7, though it is not tested.

Where to get help if I have problems with the installation?

Please file an issue at https://github.com/csukuangfj/optimized_transducer/issues and describe your problem there.

Usage

optimized_transducer expects the output of the joint network to have shape (sum_all_TU, V), NOT (N, T, U, V). The expected tensor is a concatenation of 2-D tensors: (T_1 * U_1, V), (T_2 * U_2, V), ..., (T_N * U_N, V). Note: (T_1 * U_1, V) is just the reshape of a 3-D tensor (T_1, U_1, V).
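To get a feel for why the concatenated layout saves memory, the sketch below counts tensor elements for both layouts. The batch sizes are made up for illustration, and it assumes the padded layout uses T = max(T_i) and U = max(U_i); it is plain arithmetic, not a benchmark of this project.

```python
def padded_numel(T_lens, U_lens, V):
    """Elements in a padded (N, T, U, V) tensor, assuming
    T = max(T_i) and U = max(U_i)."""
    N = len(T_lens)
    return N * max(T_lens) * max(U_lens) * V


def concatenated_numel(T_lens, U_lens, V):
    """Elements in the (sum_all_TU, V) tensor, i.e. sum_i(T_i * U_i) * V."""
    return sum(t * u for t, u in zip(T_lens, U_lens)) * V


# Hypothetical batch of 3 utterances.
# T_i: number of encoder frames; U_i: target length + 1.
T_lens = [100, 60, 20]
U_lens = [30, 10, 5]
V = 500  # vocabulary size

padded = padded_numel(T_lens, U_lens, V)
concatenated = concatenated_numel(T_lens, U_lens, V)
print(padded, concatenated)  # 4500000 1850000
```

The more the utterance lengths in a batch differ, the larger the gap between the two counts.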

Suppose your original joint network looks somewhat like the following:

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network

encoder_out = encoder_out.unsqueeze(2) # Now encoder out is (N, T, 1, D)
decoder_out = decoder_out.unsqueeze(1) # Now decoder out is (N, 1, U, D)

x = encoder_out + decoder_out # x is of shape (N, T, U, D)
activation = torch.tanh(x)

logits = linear(activation) # linear is an instance of `nn.Linear`.

loss = torchaudio.functional.rnnt_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
)

You need to change it to the following:

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network

encoder_out_list = [encoder_out[i, :logit_lengths[i], :] for i in range(N)]
decoder_out_list = [decoder_out[i, :target_lengths[i]+1, :] for i in range(N)]

x = [e.unsqueeze(1) + d.unsqueeze(0) for e, d in zip(encoder_out_list, decoder_out_list)]
x = [p.reshape(-1, D) for p in x]
x = torch.cat(x)

activation = torch.tanh(x)
logits = linear(activation) # linear is an instance of `nn.Linear`.

loss = optimized_transducer.transducer_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
    from_log_softmax=False,
)

Caution: We used from_log_softmax=False in the above example since logits is the output of nn.Linear.

Hint: If logits is the output of log-softmax, you should use from_log_softmax=True.

In most cases, you should pass the output of nn.Linear to compute the loss, i.e., use from_log_softmax=False, to save memory.

If you need to perform some operations on the output of log-softmax before feeding it to optimized_transducer.transducer_loss(), use from_log_softmax=True. Be aware that this increases memory usage.
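When debugging, it can help to know which rows of the concatenated (sum_all_TU, V) tensor belong to which utterance. The following is a small sketch under the layout described above, where utterance i occupies T_i * U_i consecutive rows with U_i = target_lengths[i] + 1; the helper row_ranges is hypothetical and not part of optimized_transducer.

```python
from itertools import accumulate


def row_ranges(logit_lengths, target_lengths):
    """Return a list of (start, end) row indices, one pair per utterance,
    into the concatenated (sum_all_TU, V) tensor.

    Utterance i occupies T_i * U_i rows, where T_i = logit_lengths[i]
    and U_i = target_lengths[i] + 1.
    """
    sizes = [t * (u + 1) for t, u in zip(logit_lengths, target_lengths)]
    ends = list(accumulate(sizes))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Two utterances: T = [4, 3], target lengths = [2, 1], so U = [3, 2].
ranges = row_ranges([4, 3], [2, 1])
print(ranges)  # [(0, 12), (12, 18)]
```

With these ranges, logits[start:end].reshape(T_i, U_i, V) should recover the 3-D block of utterance i, since each block was flattened with reshape(-1, D) before concatenation.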

For more usages, please refer to

For developers

As a developer, you don't need to use pip install optimized_transducer. To make development easier, you can use

git clone https://github.com/csukuangfj/optimized_transducer.git
cd optimized_transducer
mkdir build
cd build
cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
make -j
export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

I usually create a file path.sh inside the build directory, containing

export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

so what you need to do is

cd optimized_transducer/build
source path.sh

# Then you are ready to run Python tests (the path is relative to the build directory)
python3 ../optimized_transducer/python/tests/test_compute_transducer_loss.py

# You can also use "import optimized_transducer" in your Python projects

To run all Python tests, use

cd optimized_transducer/build
ctest --output-on-failure
Comments
  • Issue with optimized-transducer installation

    I started installing K2, lhotse and Icefall. So far I have been able to test K2, and it works perfectly. lhotse also works, but when I tried to install Icefall I got a strange error related to optimized-transducer. The log is below.

    Collecting kaldilm
    Using cached kaldilm-1.11-cp38-cp38-linux_x86_64.whl
    Collecting kaldialign
    Using cached kaldialign-0.2-cp38-cp38-linux_x86_64.whl
    Requirement already satisfied: sentencepiece>=0.1.96 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 3)) (0.1.96)
    Requirement already satisfied: tensorboard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (2.7.0)
    Requirement already satisfied: typeguard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (2.13.3)
    Collecting optimized_transducer
    Using cached optimized_transducer-1.3.tar.gz (47 kB)
    Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.8.1)
    Requirement already satisfied: werkzeug>=0.11.15 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.0.2)
    Requirement already satisfied: numpy>=1.12.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.21.2)
    Requirement already satisfied: protobuf>=3.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.19.3)
    Requirement already satisfied: wheel>=0.26 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.37.1)
    Requirement already satisfied: setuptools>=41.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (58.0.4)
    Requirement already satisfied: grpcio>=1.24.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.43.0)
    Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.4.6)
    Requirement already satisfied: absl-py>=0.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.0.0)
    Requirement already satisfied: google-auth<3,>=1.6.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.3.3)
    Requirement already satisfied: requests<3,>=2.21.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.27.1)
    Requirement already satisfied: markdown>=2.6.8 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.3.6)
    Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.6.1)
    Requirement already satisfied: six in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from absl-py>=0.4->tensorboard->-r requirements.txt (line 4)) (1.16.0)
    Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.2.4)
    Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.2.8)
    Requirement already satisfied: rsa<5,>=3.1.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.8)
    Requirement already satisfied: requests-oauthlib>=0.7.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (1.3.0)
    Requirement already satisfied: importlib-metadata>=4.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (4.10.1)
    Requirement already satisfied: zipp>=0.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (3.7.0)
    Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.4.8)
    Requirement already satisfied: charset-normalizer~=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2.0.10)
    Requirement already satisfied: certifi>=2017.4.17 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2021.10.8)
    Requirement already satisfied: idna<4,>=2.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (3.3)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (1.26.8)
    Requirement already satisfied: oauthlib>=3.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (3.1.1)
    Building wheels for collected packages: optimized-transducer
    Building wheel for optimized-transducer (setup.py): started
    Building wheel for optimized-transducer (setup.py): finished with status 'error'
    ERROR: Command errored out with exit status 1:
    command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-qa004082
    cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/
    Complete output (153 lines):
    running bdist_wheel
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/optimized_transducer
    copying optimized_transducer/python/optimized_transducer/__init__.py -> build/lib.linux-x86_64-3.8/optimized_transducer
    copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer
    running build_ext
    For fast compilation, run: export OT_MAKE_ARGS="-j"; python setup.py install
    Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    build command is:

              cd build/temp.linux-x86_64-3.8
    
              cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
              make  _optimized_transducer
    

    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 6.5.0
    -- The CUDA compiler identification is NVIDIA 11.1.74
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
    -- Adding arch 35
    -- Adding arch 50
    -- Adding arch 60
    -- Adding arch 61
    -- Adding arch 70
    -- Adding arch 75
    -- Adding arch 80
    -- Adding arch 86
    -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
    -- Downloading pybind11
    -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12")
    -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
      Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
      Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
      command to set the policy and suppress this warning.

    Environment variable CUDA_ROOT is set to:
    
      /cm/shared/apps/cuda11.1/toolkit/11.1.0
    
    For compatibility, CMake is ignoring the variable.
    

    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.

    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1")
    -- Caffe2: CUDA detected: 11.1
    -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0
    -- Caffe2: Header version is: 11.1
    -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
    CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
      Caffe2: Cannot find cuDNN library.  Turning the option off
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)

    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
      Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
      libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)

    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log".
    make: *** No rule to make target `_optimized_transducer'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in <module>
        setuptools.setup(
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 299, in run
        self.run_command('build')
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension
        raise Exception(
    Exception:
    Build optimized_transducer failed. Please check the error message.
    You can ask for help by creating an issue on GitHub.

    Click: https://github.com/csukuangfj/optimized_transducer/issues/new


    ERROR: Failed building wheel for optimized-transducer
    Running setup.py clean for optimized-transducer
    Failed to build optimized-transducer
    Installing collected packages: optimized-transducer, kaldilm, kaldialign
    Running setup.py install for optimized-transducer: started
    Running setup.py install for optimized-transducer: finished with status 'error'
    ERROR: Command errored out with exit status 1:
    command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer
    cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/
    Complete output (155 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/optimized_transducer
    copying optimized_transducer/python/optimized_transducer/__init__.py -> build/lib.linux-x86_64-3.8/optimized_transducer
    copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer
    running build_ext
    For fast compilation, run: export OT_MAKE_ARGS="-j"; python setup.py install
    Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    build command is:

                cd build/temp.linux-x86_64-3.8
    
                cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
                make  _optimized_transducer
    
    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 6.5.0
    -- The CUDA compiler identification is NVIDIA 11.1.74
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
    -- Adding arch 35
    -- Adding arch 50
    -- Adding arch 60
    -- Adding arch 61
    -- Adding arch 70
    -- Adding arch 75
    -- Adding arch 80
    -- Adding arch 86
    -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
    -- Downloading pybind11
    -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12")
    -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
      Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
      Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
      command to set the policy and suppress this warning.
    
      Environment variable CUDA_ROOT is set to:
    
        /cm/shared/apps/cuda11.1/toolkit/11.1.0
    
      For compatibility, CMake is ignoring the variable.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1")
    -- Caffe2: CUDA detected: 11.1
    -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0
    -- Caffe2: Header version is: 11.1
    -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
    CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
      Caffe2: Cannot find cuDNN library.  Turning the option off
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
      Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
      libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log".
    make: *** No rule to make target `_optimized_transducer'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in <module>
        setuptools.setup(
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/install.py", line 545, in run
        self.run_command('build')
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension
        raise Exception(
    Exception:
    Build optimized_transducer failed. Please check the error message.
    You can ask for help by creating an issue on GitHub.
    
    Click:
    	https://github.com/csukuangfj/optimized_transducer/issues/new
    
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; file='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer Check the logs for full command output.

    opened by AmirHussein96 8
  • Warprnnt gradient for CPU

    @csukuangfj Just wanted to note that the gradient is not incorrect for CPU vs GPU. The instructions clearly state that for CPU you need to provide log_softmax(joint-logits), whereas for GPU you should provide only joint-logits, since the CUDA kernel efficiently computes the log_softmax internally.

    Anyway, yours is also an efficient implementation, also written in C++. Could you benchmark the solutions if you have time? Even a naive benchmark would give some hint as to relative speed. Your memory-efficient implementation is very interesting too; it reduces speed but saves a lot of memory.
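To illustrate the point about input conventions, here is a minimal sketch in plain Python. It is not code from warp-transducer or optimized_transducer, and the helper name log_softmax and the sample values are ours. It shows that log-softmax is idempotent: values that are already log-probabilities pass through unchanged, whereas raw logits are not normalized, so a code path that skips the normalization sees different numbers from one that applies it.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]


logits = [2.0, -1.0, 0.5]        # hypothetical joint-network outputs
log_probs = log_softmax(logits)  # input for a path that does not normalize

# After normalization the probabilities sum to one ...
total = sum(math.exp(v) for v in log_probs)

# ... and normalizing a second time changes nothing (up to rounding).
twice = log_softmax(log_probs)
max_diff = max(abs(a - b) for a, b in zip(log_probs, twice))

print(f"sum of probabilities: {total:.6f}")
print(f"max change after a second log_softmax: {max_diff:.2e}")
```

Under this reading, passing log-probabilities to a kernel that normalizes internally is harmless, while passing raw logits to a path that expects log-probabilities is not.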

    opened by titu1994 2
  • "ModuleNotFoundError: No module named '_optimized_transducer'" when testing.

    I installed optimized_transducer as follows:

    git clone https://github.com/csukuangfj/optimized_transducer.git
    cd optimized_transducer
    mkdir build
    cd build
    cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
    export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH
    

    The cmake log is as follows:

    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 7.5.0
    -- The CUDA compiler identification is NVIDIA 10.1.243
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Autodetected CUDA architecture(s):  7.0
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_70,code=sm_70
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75
    -- Skipping arch 35
    -- Skipping arch 50
    -- Skipping arch 60
    -- Skipping arch 61
    -- Adding arch 70
    -- Skipping arch 75
    -- OT_COMPUTE_ARCHS: 70
    -- Downloading pybind11
    -- pybind11 is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python (found version "3.8.11")
    -- Found PythonLibs: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Found CUDA: /usr/local/cuda (found version "10.1")
    -- Caffe2: CUDA detected: 10.1
    -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /usr/local/cuda
    -- Caffe2: Header version is: 10.1
    -- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
    -- Found cuDNN: v7.6.2  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
    -- Autodetected CUDA architecture(s):  7.0
    -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70
    -- Found Torch: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/lib/libtorch.so
    -- PyTorch version: 1.7.0+cu101
    -- PyTorch cuda version: 10.1
    -- Use FetchContent provided by k2
    -- Downloading googletest
    
    -- googletest is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-src
    -- googletest's binary dir is /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-build
    -- The C compiler identification is GNU 7.5.0
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Downloading moderngpu
    -- moderngpu is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/moderngpu-src
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /ceph-meixu/luomingshuang/optimized_transducer/build
    

    But when I run python optimized_transducer/python/tests/test_compute_transducer_loss.py for testing, I get the following error:

    /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
      warnings.warn(
    Traceback (most recent call last):
      File "optimized_transducer/python/tests/test_compute_transducer_loss.py", line 8, in <module>
        import optimized_transducer
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ModuleNotFoundError: No module named '_optimized_transducer'
    

    Hope to know how I can solve it. Thanks!
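One way to narrow down an error like this is to ask Python where, if anywhere, it can find the compiled extension. The sketch below uses only the standard library; the helper name locate_extension is hypothetical. Note that the commands quoted in the report stop after the cmake configure step, so a possible cause (an assumption, not confirmed in the report) is that the extension was never built, for example by running make in the build directory, before PYTHONPATH was pointed at build/lib.

```python
import importlib.util
import sys


def locate_extension(name="_optimized_transducer"):
    """Return the file that `import name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    origin = locate_extension()
    if origin is None:
        # Nothing on sys.path provides the module; list what was searched.
        print("_optimized_transducer is not importable. Searched:")
        for entry in sys.path:
            print("  ", entry)
    else:
        print("_optimized_transducer would be loaded from", origin)
```

If the module is reported as not importable, checking whether build/lib actually contains a file whose name starts with _optimized_transducer is a reasonable next step.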

    opened by luomingshuang 2
  • Update transducer-loss.h

    I found that https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/csrc/transducer-loss.h#L17 and https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/python/tests/test_compute_transducer_loss.py#L61 are inconsistent. I think the former is incorrect, so I fixed it here. @csukuangfj, what do you think?

    opened by shanguanma 2
  • fix for CMakeLists.txt

    When I run make -j in the build dir, the following error occurs: error: #error C++14 or later compatible compiler is required to use ATen. So I added the following two commands to CMakeLists.txt, and make -j now runs successfully.

    set(CMAKE_CXX_STANDARD 14)
    set(CMAKE_CXX_STANDARD_REQUIRED ON)
    

    I'm not sure whether the above two commands are necessary in CMakeLists.txt for all environments.

    opened by luomingshuang 1
  • Fix installation on macOS.

    To fix the following error when running

    python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"
    

    on macOS:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ImportError: dlopen(/Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so, 2): Symbol not found: _THPVariableClass
      Referenced from: /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
      Expected in: flat namespace
     in /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
    
    opened by csukuangfj 0
  • Disable warp level parallel reduction

    Somehow, using warps produces incorrect alpha and beta for large values of sum_all_TU.

    We disable warp level parallel reduction for now and use the method from https://github.com/HawkAaron/warp-transducer to compute alpha and beta.

    Will revisit the issues about warps after gaining more experience with CUDA programming.

    opened by csukuangfj 0
  • transducer grad compute formular

    The formula for the gradient in warprnnt_numba and warp_transducer (CPU) is below:

        T, U, _ = log_probs.shape
        grads = np.full(log_probs.shape, -float("inf"))
        log_like = betas[0, 0]  # == alphas[T - 1, U - 1] + betas[T - 1, U - 1]
    
        # gradients for the blank transitions (the final one first)
        grads[T - 1, U - 1, blank] = alphas[T - 1, U - 1]
        grads[: T - 1, :, blank] = alphas[: T - 1, :] + betas[1:, :]
    
        # gradients for the label transitions
        for u, l in enumerate(labels):
            grads[:, u, l] = alphas[:, u] + betas[:, u + 1]
    
        grads = -np.exp(grads + log_probs - log_like)
    

    That is not the same as torchaudio, optimized_transducer, and warp_transducer (GPU), but you said that the warp_transducer CPU gradient is the same as that of optimized_transducer and torchaudio. How is that achieved?
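One possible way to read the quoted formula (our interpretation, not a statement from the maintainers) is as the gradient of the loss with respect to the log-probabilities, which would fit the earlier remark that the CPU path of warp-transducer expects log_softmax output. The NumPy sketch below is a self-contained check under that assumption: the helpers log_softmax, alphas_betas, loss_and_grad and numeric_grad, and the toy sizes, are ours. It compares the formula against finite differences, then applies the log-softmax chain rule to obtain a gradient with respect to the logits.

```python
import numpy as np


def log_softmax(x):
    """Log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def alphas_betas(log_probs, labels, blank=0):
    """Forward and backward variables (log space) for one utterance."""
    T, U, _ = log_probs.shape
    alphas = np.full((T, U), -np.inf)
    betas = np.full((T, U), -np.inf)
    alphas[0, 0] = 0.0
    for t in range(T):
        for u in range(U):
            if t > 0:  # reached by a blank emitted at (t - 1, u)
                alphas[t, u] = np.logaddexp(
                    alphas[t, u], alphas[t - 1, u] + log_probs[t - 1, u, blank]
                )
            if u > 0:  # reached by labels[u - 1] emitted at (t, u - 1)
                alphas[t, u] = np.logaddexp(
                    alphas[t, u],
                    alphas[t, u - 1] + log_probs[t, u - 1, labels[u - 1]],
                )
    betas[T - 1, U - 1] = log_probs[T - 1, U - 1, blank]
    for t in reversed(range(T)):
        for u in reversed(range(U)):
            if t < T - 1:
                betas[t, u] = np.logaddexp(
                    betas[t, u], betas[t + 1, u] + log_probs[t, u, blank]
                )
            if u < U - 1:
                betas[t, u] = np.logaddexp(
                    betas[t, u], betas[t, u + 1] + log_probs[t, u, labels[u]]
                )
    return alphas, betas


def loss_and_grad(log_probs, labels, blank=0):
    """Loss -log P(labels) and the gradient given by the quoted formula."""
    T, U, _ = log_probs.shape
    alphas, betas = alphas_betas(log_probs, labels, blank)
    log_like = betas[0, 0]
    grads = np.full(log_probs.shape, -np.inf)
    grads[T - 1, U - 1, blank] = alphas[T - 1, U - 1]
    grads[: T - 1, :, blank] = alphas[: T - 1, :] + betas[1:, :]
    for u, l in enumerate(labels):
        grads[:, u, l] = alphas[:, u] + betas[:, u + 1]
    return -log_like, -np.exp(grads + log_probs - log_like)


def numeric_grad(f, x, eps=1e-6):
    """Central finite differences of scalar f with respect to array x."""
    g = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        hi, lo = x.copy(), x.copy()
        hi[idx] += eps
        lo[idx] -= eps
        g[idx] = (f(hi) - f(lo)) / (2 * eps)
    return g


rng = np.random.default_rng(0)
labels = [1, 2]                                     # toy target, blank = 0
logits = rng.normal(size=(3, len(labels) + 1, 4))   # shape (T, U, V)
log_probs = log_softmax(logits)

_, g_lp = loss_and_grad(log_probs, labels)
num_lp = numeric_grad(lambda lp: loss_and_grad(lp, labels)[0], log_probs)

# Chain rule through log-softmax gives the gradient w.r.t. the logits.
g_logits = g_lp - np.exp(log_probs) * g_lp.sum(axis=-1, keepdims=True)
num_logits = numeric_grad(
    lambda z: loss_and_grad(log_softmax(z), labels)[0], logits
)

print("matches d loss / d log_probs:", np.allclose(g_lp, num_lp, atol=1e-5))
print("matches d loss / d logits   :", np.allclose(g_logits, num_logits, atol=1e-5))
```

If both checks pass, the two gradient conventions differ only by the log-softmax backward step, which would explain why implementations that take log-probabilities and implementations that take logits report different-looking gradients for the same model output.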

    opened by zh794390558 9
  • install error

    1. CUDA_cublas_LIBRARY not found error when compiling; my CUDA version is 10.2.
    2. /usr/include/c++/7/bits/basic_string.tcc(1067): error: expression must have pointer type detected during: instantiation of "std::basic_string<_CharT, _Traits, _Alloc>::_Rep *std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc &) [with _CharT=char16_t, _Traits=std::char_traits<char16_t>, _Alloc=std::allocator<char16_t>]"

    To fix the above two problems, I have to use root privileges to modify some settings of the Linux system. Is there a better solution?

    opened by zmqwer 0
  • loss value and decode library?

    Thanks very much for your great project! I have two questions: 1. How large is the transducer loss for a well-performing model, or once the model has converged? 2. Is there a fast decoding solution? I found that the decoding modules in many projects that implement beam search are extremely slow.

    opened by xiongjun19 10
Releases(v1.4)
Owner
Fangjun Kuang