An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Overview

The implementation of the paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval.

CLIP4Clip is a video-text retrieval model based on CLIP (ViT-B/32). In this work, we investigate three similarity calculation approaches: parameter-free type, sequential type, and tight type. The model achieves SOTA results on MSR-VTT, MSVD, and LSMDC.


Requirements

# From CLIP 
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm
pip install opencv-python boto3 requests pandas

Data Preparation

For MSRVTT

The official data and video links can be found at this link.

For convenience, you can also download the splits and captions with:

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip

For MSVD

Raw videos can be downloaded from this link.

The splits and raw_captions can be found in the excellent collaborative-experts repository. For convenience, you can also download them with:

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msvd_data.zip

For LSMDC

You must obtain permission from MPII to download and use the data. The download link is here. The data for the 1000 test clips is at this link. Read our paper and the dataloader for more information.

How to Run

--features_path is the video root path.

--linear_patch can be set to 2d or 3d.

--sim_header can be set to meanP, seqLSTM, seqTransf, or tightTransf.

Read our paper for more details on --linear_patch and --sim_header. Try more hyperparameters for better performance.
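
For intuition, here is a minimal sketch of what the parameter-free meanP header computes (not the repository's exact code; the shapes and mask argument are assumptions): frame embeddings are mean-pooled into one video vector, and similarity is the cosine product with the text embedding.

import torch
import torch.nn.functional as F

def meanp_similarity(text_emb, frame_embs, video_mask):
    # text_emb: [batch_t, dim]; frame_embs: [batch_v, frames, dim]
    # video_mask: [batch_v, frames], 1 for real frames, 0 for padding
    mask = video_mask.unsqueeze(-1).float()
    video_emb = (frame_embs * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
    text_emb = F.normalize(text_emb, dim=-1)      # cosine similarity needs
    video_emb = F.normalize(video_emb, dim=-1)    # unit-length embeddings
    return text_emb @ video_emb.t()               # [batch_t, batch_v] scores

The seqLSTM and seqTransf headers insert an LSTM or Transformer over the frame sequence before pooling, and tightTransf feeds text and frames jointly into a cross Transformer; see the paper for details.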

Download the CLIP (ViT-B/32) weights:

 wget -P ./modules https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt

Then, run

MSRVTT

DATA_PATH=[Your MSRVTT data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=0 \
--epochs=5 --batch_size=128 --n_display=50 \
--train_csv ${DATA_PATH}/MSRVTT_train.9k.csv \
--val_csv ${DATA_PATH}/MSRVTT_JSFUSION_test.csv \
--data_path ${DATA_PATH}/MSRVTT_data.json \
--features_path ${DATA_PATH}/MSRVTT_Videos \
--output_dir ckpts/ckpt_msrvtt_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype msrvtt --expand_msrvtt_sentences  \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0  --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

MSVD

DATA_PATH=[Your MSVD data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=2 \
--epochs=5 --batch_size=128 --n_display=50 \
--data_path ${DATA_PATH} \
--features_path ${DATA_PATH}/MSVD_Videos \
--output_dir ckpts/ckpt_msvd_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype msvd \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0 --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

LSMDC

DATA_PATH=[Your LSMDC data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=2 \
--epochs=5 --batch_size=128 --n_display=50 \
--data_path ${DATA_PATH} \
--features_path ${DATA_PATH}/LSMDC_Videos \
--output_dir ckpts/ckpt_lsmdc_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype lsmdc --feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0  --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

Citation

If you find CLIP4Clip useful in your work, please cite the following paper:

@Article{Luo2021CLIP4Clip,
  author  = {Huaishao Luo and Lei Ji and Ming Zhong and Yang Chen and Wen Lei and Nan Duan and Tianrui Li},
  title   = {CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval},
  journal = {arXiv preprint arXiv:2104.08860},
  year    = {2021},
}

Acknowledgments

Our code is based on CLIP (ViT-B/32) and UniVL.

Comments
  • Poor performance when reproducing CLIP4Clip (meanP) on ActivityNet

    @ArrowLuo Hi, I directly trained CLIP4Clip (meanP) on ActivityNet and got R@1 = 37.9, which is much worse than the 40.5 reported in Table 4.

    I extracted frames from the original videos at FPS=1 and trained CLIP4Clip (meanP) on 8 RTX 3090s. Due to the GPU memory constraint, I set gradient_accumulation_steps=2.

    The caption is downloaded from https://cs.stanford.edu/people/ranjaykrishna/densevid/.

    opened by jianghaojun 14
  • How to download "cross_pytorch_model.bin" as the pretrained weights used in the project? & problem in DDP

    1. With the parameters provided in the README, training does not converge, and I see the following in the log:
    2. 05/20/2022 00:18:31 - INFO - Weight doesn't exsits. xxx/modules/cross-base/cross_pytorch_model.bin
    3. 05/20/2022 00:18:42 - INFO - Weights from pretrained model not used in : clip.input_resolution clip.context_length clip.vocab_size

    Given the above, how can I download the modules/cross-base/cross_pytorch_model.bin file? Thanks.

    Also, I am running distributed training on 2 machines with 8 GPUs each. My understanding is that there should be 8 processes per machine, 16 in total (matching the number of GPUs). But when running this project, each machine first launches 8 processes and then each spawns 8 child processes, so 16 processes exist per machine, 8 of which are sleeping. I have not been able to resolve this; has anyone else run into the same problem?

    opened by AAUfoa 10
  • Some questions about the results on MSRVTT with `sim_header seqTransf`.

    When I use the following configuration to train the model on MSRVTT Training-9K, the best result I get is:

    07/27/2021 13:11:01 - INFO - sim matrix size: 1000, 1000
    07/27/2021 13:11:01 - INFO - Length-T: 1000, Length-V: 1000
    07/27/2021 13:11:01 - INFO - Text-to-Video:
    07/27/2021 13:11:01 - INFO - >>> R@1: 43.2 - R@5: 71.0 - R@10: 79.4 - Median R: 2.0 - Mean R: 15.4
    07/27/2021 13:11:01 - INFO - Video-to-Text:
    07/27/2021 13:11:01 - INFO - >>> V2T$R@1: 43.1 - V2T$R@5: 71.2 - V2T$R@10: 80.7 - V2T$Median R: 2.0 - V2T$Mean R: 11.9

    This is worse than the R@1: 44.5 listed in the paper. Did I miss some details? Here is the configuration:

    CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_addr=127.0.0.2 --master_port 29552 \
    main_task_retrieval.py --num_thread_reader=4 --epochs=5 --batch_size=128 --n_display=20 \
    --train_csv /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_train.9k.csv \
    --val_csv /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_JSFUSION_test.csv \
    --data_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_data.json \
    --features_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/MSRVTT_Videos \
    --output_dir /home/hadoop-vacv/cephfs/data/caoshuqiang/code/vicab/newexp/hope/clip_raw \
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 12 --datatype msrvtt --expand_msrvtt_sentences \
    --feature_framerate 1 --coef_lr 1e-3 --freeze_layer_num 0 --slice_framepos 2 \
    --loose_type --linear_patch 2d --sim_header seqTransf --do_train

    opened by sqiangcao99 10
  • subprocess.CalledProcessError

    Hi, I want to train CLIP4Clip on MSRVTT, but I got this issue. Can you help me?

    subprocess.CalledProcessError: Command '['/home/nccu/anaconda3/envs/Clip4Video-alvin/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', './MSRVTT/videos/MSRVTT_train.9k.csv', '--val_csv', './MSRVTT/videos/MSRVTT_JSFUSION_test.csv', '--data_path', './MSRVTT/videos/MSRVTT_data.json', '--features_path', './MSRVTT/videos/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

    I would appreciate your help with this situation. Thank you in advance.

    opened by alvinlin1271320 9
  • The reproduction results of `meanP` and `seqTransf` on the `DiDeMo` dataset are much worse than those in the paper

    I trained CLIP4Clip on the DiDeMo dataset, and the text-to-video R@1 is much worse than that reported in the paper.

    The R@1 reported in the paper on DiDeMo is 43.4 with the meanP similarity calculator and 42.8 with the seqTransf head.

    But in my reproduction, the best t2v R@1 reaches only 40.5 with meanP and only 40.2 with seqTransf.

    All settings remain the same as in the paper.

    opened by HanielF 8
  • Query search

    If I understood correctly, after training and evaluation the videos are ranked by similarity score. Is there any script available to issue a text search query and get back the video(s), with some freedom of search based on a confidence or similarity threshold to the closest-ranked videos?
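
    Not an official script, but a minimal sketch of such a query, assuming you have dumped L2-normalized video embeddings (for example from the evaluation loop) together with their ids; query_videos is a hypothetical helper:

    import torch

    def query_videos(query_emb, video_embs, video_ids, top_k=5):
        # query_emb: [dim]; video_embs: [num_videos, dim]; both L2-normalized
        scores = video_embs @ query_emb              # cosine similarity per video
        vals, idx = scores.topk(min(top_k, len(video_ids)))
        return [(video_ids[i], s) for i, s in zip(idx.tolist(), vals.tolist())]

    A threshold on the returned scores would give the "freedom of search" asked about here.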

    opened by Tortoise17 7
  • What is the meaning of --num_thread_reader=0 in the MSR-VTT training configuration?

    Can you explain the num_thread_reader value in the training configuration? Can I adjust this value to decrease my training time? (Why is --num_thread_reader=0 used for MSR-VTT while --num_thread_reader=2 is used for the other datasets?) Thank you so much!
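
    For what it's worth, num_thread_reader appears to be passed straight to the PyTorch DataLoader as num_workers (worth verifying against the dataloaders in this repository), so 0 means videos are decoded in the main process and larger values spawn that many worker processes. A minimal sketch of that wiring, with build_loader and args as hypothetical stand-ins:

    from torch.utils.data import DataLoader

    def build_loader(dataset, args):
        return DataLoader(
            dataset,
            batch_size=args.batch_size,
            num_workers=args.num_thread_reader,  # 0 = decode in the main process
            shuffle=True,
            drop_last=True,
        )

    Raising it can shorten training time if video decoding is the bottleneck; 0 is sometimes chosen to avoid multiprocess decoding issues rather than for speed.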

    opened by thinh276 6
  • loss becomes nan

    Hi, I ran the code on the MSRVTT dataset with 2 A100s, and the loss becomes nan after some iterations, like this issue. However, I found that RawVideoExtractorCV2 succeeds in reading a video when testing a single video input (directly calling the video_to_tensor function in RawVideoExtractorCV2), but fails to read videos with multiple num_workers when running the given scripts (the print log at line 63 was added by myself, but nothing is printed at line 211). Is there something wrong with the multiprocessing setting?
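
    Not a fix, but one way to narrow down the multi-worker suspicion is a small probe like the following (the video paths are placeholders): if it prints failures with num_workers > 0 but successes with num_workers = 0, decoding breaks only inside worker processes.

    import cv2
    import torch
    from torch.utils.data import DataLoader, Dataset

    class FrameProbe(Dataset):
        def __init__(self, paths):
            self.paths = paths
        def __len__(self):
            return len(self.paths)
        def __getitem__(self, i):
            cap = cv2.VideoCapture(self.paths[i])     # decode inside the worker
            ok, _ = cap.read()
            cap.release()
            return torch.tensor(1 if ok else 0)

    paths = ["video1.mp4", "video2.mp4"]              # hypothetical video files
    for nw in (0, 2):
        flags = [b.item() for b in DataLoader(FrameProbe(paths), num_workers=nw)]
        print(f"num_workers={nw}: {flags}")           # 1 = decoded, 0 = failed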

    opened by fake-warrior8 5
  • Data split issue

    Hi @ArrowLuo, I have a question about the data split during the fine-tuning stage. The paper states that you follow the data split of the ECCV'20 paper "Multi-modal Transformer for Video Retrieval". In the ECCV'20 paper's code, they sample a subset from the training data for cross-validation and model selection, whereas your implementation validates directly on the 1k test set. Is it OK to use the test set to select the best model?

    opened by zhixinma 5
  • torch.distributed.init_process_group(backend="nccl") error and some other errors

    Dear author,

    Thank you for helping me about 4 months ago.

    I tried to reply to your helpful comments, but my issue had already been closed. I'm really sorry about that.

    Now I have another issue when running the framework with the MSVD dataset.

    I had succeeded in running your framework several times, but these days I hit a new problem in my environment.

    If you don't mind, could you lend me a hand one more time?

    The following is my error log; I manage two GPU servers, and each separated block is the error log raised on the corresponding server.

    If you need anything else from me, leave a comment.

    Sincerely,

    Server 16

    (CLIP4Clip) [email protected]:~/CLIP4Clip$ main_task_retrieval.py DATA_PATH=/home/key2317/CLIP4Clip/msvd_data/ VISIBLE_DEVICES=3,4,0,5 python -m torch.distributed.launch --nproc_per_node=4 \
    main_task_retrieval.py --do_train --num_thread_reader=2 \
    --epochs=5 --batch_size=128 --n_display=50 \
    --data_path ${DATA_PATH} \
    --features_path ${DATA_PATH}/MSVD_Videos \
    --output_dir ckpts/ckpt_msvd_retrieval_looseType \
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
    --datatype msvd \
    --feature_framerate 1 --coef_lr 1e-3 \
    --freeze_layer_num 0 --slice_framepos 2 \
    --loose_type --linear_patch 2d --sim_header meanP \
    --pretrained_clip_name ViT-B/32
    main_task_retrieval.py: command not found

    (CLIP4Clip) [email protected]:~/CLIP4Clip$ CUDA_VISIBLE_DEVICES=3,4,0,5 python -m torch.distributed.launch --nproc_per_node=4 \
    main_task_retrieval.py --do_train --num_thread_reader=2 \
    ... (same arguments as above) ... \
    --pretrained_clip_name ViT-B/32

    Traceback (most recent call last):
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 185, in _run_module_as_main
        mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 111, in _get_module_details
        __import__(pkg_name)
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/__init__.py", line 189, in <module>
        _load_global_deps()
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/__init__.py", line 142, in _load_global_deps
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/ctypes/__init__.py", line 373, in __init__
        self._handle = _dlopen(self._name, mode)
    OSError: /home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11: undefined symbol: free_gemm_select, version libcublasLt.so.11

    Server 19

    (CLIP4Clip) [email protected]:~/video-multimodal/CLIP4Clip$ python -m torch.distributed.launch --nproc_per_node=4 \
    main_task_retrieval.py --do_train --num_thread_reader=2 \
    ... (same arguments as above) ... \
    --pretrained_clip_name ViT-B/32

    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

    Traceback (most recent call last):
      File "main_task_retrieval.py", line 29, in <module>
        torch.distributed.init_process_group(backend="nccl")
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 436, in init_process_group
        store, rank, world_size = next(rendezvous_iterator)
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 179, in _env_rendezvous_handler
        store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
    RuntimeError: Address already in use

    Traceback (most recent call last):
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
        main()
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
        raise subprocess.CalledProcessError(returncode=process.returncode,
    subprocess.CalledProcessError: Command '['/home/key2317/anaconda3/envs/CLIP4Clip/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=2', '--epochs=5', '--batch_size=128', '--n_display=50', '--data_path', '--features_path', '/MSVD_Videos', '--output_dir', 'ckpts/ckpt_msvd_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msvd', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

    Traceback (most recent call last):
      File "main_task_retrieval.py", line 29, in <module>
        torch.distributed.init_process_group(backend="nccl")
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
        barrier()
      File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
        work = _default_pg.barrier()
    RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8

    (The same NCCL traceback is printed twice more, once per remaining rank.)

    opened by celestialxevermore 5
  • Some errors: subprocess.CalledProcessError

    Dear author, thank you for releasing this open-source code.

    I'm a novice without much experience running frameworks like this, and today I've been struggling with this error:

    subprocess.CalledProcessError: Command '['/home/key2317/anaconda3/envs/CLIP4Clip/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_train.9k.csv', '--val_csv', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_JSFUSION_test.csv', '--data_path', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_data.json', '--features_path', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

    After entering my start command, it seemed to run well for about 5 seconds, printing internal state like vision_layers: 12, vision_width: 768, and so on. But unfortunately, it all ended with the aforementioned message.

    One of my colleagues guessed it might be a mismatch in our GPU environment, in short a GPU problem, but I'm not sure how to diagnose it.

    I am really interested in your paper and code, but I can't get past this initial step. Would you please help me?

    Thanks.

    opened by celestialxevermore 5
  • About mean_pooling on text sequence

    Dear author, I hope you are enjoying the new year.

    I have a question: why don't you use the function _mean_pooling_for_similarity_sequence? Is there any special reason for that?

    I also looked at your previous model, UniVL, but I could not find the reason there either.

    I hope you reply soon. Thanks.

    Sincerely,
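
    For context, a sketch of what a helper with that name would presumably compute, i.e. masked mean pooling over token embeddings. CLIP itself represents a sentence by the embedding at the end-of-text (EOT) position, which may be why a sequence-level mean pool is never needed (a reading, not the authors' confirmation):

    import torch

    def mean_pooling_for_similarity_sequence(seq_out, attention_mask):
        # seq_out: [batch, seq_len, dim]; attention_mask: [batch, seq_len]
        mask = attention_mask.unsqueeze(-1).float()   # [batch, seq_len, 1]
        summed = (seq_out * mask).sum(dim=1)          # skip padded positions
        counts = mask.sum(dim=1).clamp(min=1e-6)      # avoid divide-by-zero
        return summed / counts                        # [batch, dim]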

    opened by celestialxevermore 1
  • run simple inference

    1. Is there a simple example to run an inference, e.g. python infer.py --model --input 'red car', returning the matching video(s) or URL(s) or something like that?
    2. Is it possible to get the embeddings?
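
    On question 2, a minimal sketch using the plain openai/CLIP package this repository builds on (the frame files are placeholders, and the CLIP4Clip wrapper exposes similar outputs under its own method names, so check the code):

    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    with torch.no_grad():
        text_emb = model.encode_text(clip.tokenize(["red car"]).to(device))
        frames = torch.stack([preprocess(Image.open(f)) for f in ["f0.jpg", "f1.jpg"]])
        frame_embs = model.encode_image(frames.to(device))
        video_emb = frame_embs.mean(dim=0, keepdim=True)  # meanP over frames

    Ranking text_emb against many such video_emb vectors answers question 1.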

    opened by jdso1988 1
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.
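
    For readers who want the essence of such a patch without the pull request, a sketch (a paraphrase of the advisory, not the exact Trellix code): validate every member path before extracting so nothing escapes the target directory.

    import os
    import tarfile

    def safe_extractall(tar: tarfile.TarFile, path: str = ".") -> None:
        base = os.path.realpath(path)
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(path, member.name))
            if os.path.commonpath([base, target]) != base:
                raise RuntimeError(f"Blocked path traversal in tar member: {member.name}")
        tar.extractall(path)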

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • train on DiDeMo

    Thanks for sharing your wonderful work!

    When I train on the DiDeMo dataset, the following information appears in the terminal more than once: [screenshot]

    But training was not interrupted; the model could be trained and tested normally. I used the DiDeMo training parameters you provided, only changing the batch size. Have you encountered the same situation, and will it affect the results? I searched the Internet for solutions but could hardly find any.

    opened by qjyyyy 1
  • MSVD Weights

    Hello, thanks for providing the code!

    Regarding the models used to generate the results in the paper, are they uploaded anywhere they can be shared? I'm interested in running inference without training from scratch.

    opened by ntseng450 1