3D ResNets for Action Recognition (CVPR 2018)

Overview

3D ResNets for Action Recognition

Update (2020/4/13)

We published a paper on arXiv.

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

We uploaded the pretrained models described in this paper, including a ResNet-50 pretrained on the combination of Kinetics-700 and Moments in Time.

Update (2020/4/10)

We significantly updated our scripts. If you want to use older versions to reproduce our CVPR2018 paper, you should use the scripts in the CVPR2018 branch.

This update includes the following:

  • Refactoring the whole project
  • Supporting newer PyTorch versions
  • Supporting distributed training
  • Supporting training and testing on the Moments in Time dataset
  • Adding R(2+1)D models
  • Uploading 3D ResNet models trained on the Kinetics-700, Moments in Time, and STAIR-Actions datasets

Summary

This is the PyTorch code for the following papers:

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Towards Good Practice for Action Recognition with Spatiotemporal 3D Convolutions",
Proceedings of the International Conference on Pattern Recognition, pp. 2516-2521, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?",
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546-6555, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition",
Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

This code includes training, fine-tuning and testing on Kinetics, Moments in Time, ActivityNet, UCF-101, and HMDB-51.

Citation

If you use this code or pre-trained models, please cite the following:

@inproceedings{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={6546--6555},
  year={2018},
}

Pre-trained models

Pre-trained models are available here.
All models are trained on Kinetics-700 (K), Moments in Time (M), STAIR-Actions (S), or merged datasets of them (KM, KS, MS, KMS).
If you want to fine-tune the models on your own dataset, specify the following options for each model (an example fine-tuning command follows the list).

r3d18_K_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 700
r3d18_KM_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 1039
r3d34_K_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 700
r3d34_KM_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 1039
r3d50_K_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 700
r3d50_KM_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1039
r3d50_KMS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1139
r3d50_KS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 800
r3d50_M_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 339
r3d50_MS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 439
r3d50_S_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 100
r3d101_K_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 700
r3d101_KM_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 1039
r3d152_K_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 700
r3d152_KM_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 1039
r3d200_K_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 700
r3d200_KM_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 1039
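
The number of pretraining classes corresponds to the merged label sets (K: 700, M: 339, S: 100; e.g. KM: 700 + 339 = 1039). As a minimal example (paths are placeholders following the directory layout in "Running the code" below), fine-tuning r3d50_K_200ep.pth on UCF-101 would look like:

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 700 \
--pretrain_path models/r3d50_K_200ep.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5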

Old pretrained models are still available here.
However, some modifications are required to use the old pretrained models in the current scripts.

Requirements

  • PyTorch (with torchvision)

conda install pytorch torchvision cudatoolkit=10.1 -c soumith

  • FFmpeg, FFprobe

  • Python 3
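
As a quick sanity check of the environment, the following commands should print the PyTorch version, CUDA availability, and the FFmpeg/FFprobe versions:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
ffmpeg -version
ffprobe -version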

Preparation

ActivityNet

  • Download videos using the official crawler.
  • Convert the videos from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path activitynet
  • Add fps information to the json file using util_scripts/add_fps_into_activitynet_json.py
python -m util_scripts.add_fps_into_activitynet_json mp4_video_dir_path json_file_path

Kinetics

  • Download videos using the official crawler.
    • Locate test set in video_directory/test.
  • Convert the videos from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path kinetics
  • Generate annotation file in json format similar to ActivityNet using util_scripts/kinetics_json.py
    • The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.
python -m util_scripts.kinetics_json csv_dir_path 700 jpg_video_dir_path jpg dst_json_path

UCF-101

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path ucf101
  • Generate annotation file in json format similar to ActivityNet using util_scripts/ucf101_json.py
    • annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt
python -m util_scripts.ucf101_json annotation_dir_path jpg_video_dir_path dst_json_path

HMDB-51

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path hmdb51
  • Generate annotation file in json format similar to ActivityNet using util_scripts/hmdb51_json.py
    • annotation_dir_path includes brush_hair_test_split1.txt, ...
python -m util_scripts.hmdb51_json annotation_dir_path jpg_video_dir_path dst_json_path

Running the code

Assume the structure of data directories is the following:

~/
  data/
    kinetics_videos/
      jpg/
        .../ (directories of class names)
          .../ (directories of video names)
            ... (jpg files)
    results/
      save_100.pth
    kinetics.json

Confirm all options.

python main.py -h

Train ResNet-50 on the Kinetics-700 dataset (700 classes) with 4 CPU threads (for data loading).
The batch size is 128.
Models are saved every 5 epochs. All GPUs are used for training. If you want to use only a subset of the GPUs, set CUDA_VISIBLE_DEVICES=....

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Continue training from epoch 101 (~/data/results/save_100.pth is loaded).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_100.pth \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Calculate the top-5 class probabilities of each video using a trained model (~/data/results/save_200.pth).
Note that inference_batch_size should be small, because the actual batch size is inference_batch_size * (n_video_frames / inference_stride).
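For example, with inference_batch_size 1, inference_stride 16, and a 160-frame video, one batch holds 1 * (160 / 16) = 10 clips (illustrative numbers; longer videos enlarge the batch proportionally).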

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_200.pth \
--model_depth 50 --n_classes 700 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1

Evaluate top-1 video accuracy of a recognition result (~/data/results/val.json).

python -m util_scripts.eval_accuracy ~/data/kinetics.json ~/data/results/val.json --subset val -k 1 --ignore

Fine-tune fc layers of a pretrained model (~/data/models/resnet-50-kinetics.pth) on UCF-101.

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 700 \
--pretrain_path models/resnet-50-kinetics.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5
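
After fine-tuning, inference and evaluation work the same way as in the Kinetics examples above. A sketch for the fine-tuned UCF-101 model (assuming training ran for the default 200 epochs, so the checkpoint is results/save_200.pth):

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --resume_path results/save_200.pth \
--model resnet --model_depth 50 --n_classes 101 --n_threads 4 \
--no_train --no_val --inference --output_topk 5 --inference_batch_size 1

The resulting ~/data/results/val.json can then be scored with util_scripts.eval_accuracy as shown above.
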
Comments
  • question about the 'Temporal duration of inputs'

    [email protected], in opts.py, can I change the temporal duration of inputs in parser.add_argument('--sample_duration', default=16, type=int, help='Temporal duration of inputs'), e.g. to 32 or 64 frames? Have you run similar experiments? I would really appreciate your reply. Thanks.

    opened by sophiazy 25
  • Performance of pretrained weights on UCF101

    Hi, nice work! I have a question about your results on UCF101 split 1. I evaluated your pretrained weights "resnext-101-kinetics-ucf101_split1.pth" on UCF101 split 1 and got an accuracy of ~85.99. I'm wondering whether that is the expected accuracy. Would you please provide the accuracies of the pretrained models?

    opened by MohsenFayyaz89 21
  • Train from scratch on UCF101 using ResNet18 and get 10% gain without doing anything

    I trained and evaluated with this code and got a 10% gain without changing anything. Here is my process:

    1. Parse the video data using the code from the README.
    2. Train the model using python3 main.py --root_path ./datasets/ --video_path UCF101/jpg --annotation_path ucf101_01.json --result_path results --dataset ucf101 --model resnet --model_depth 18 --n_classes 101 --batch_size 16 and get the resulting model datasets/results/save_200.pth.
    3. Test using python3 main.py --root_path ./datasets/ --video_path UCF101/jpg --annotation_path ucf101_01.json --result_path results --dataset ucf101 --resume_path results/save_200.pth --model resnet --model_depth 18 --n_classes 101 --batch_size 16 --no_train --test and get the result in val.json; the console reports a clip accuracy of 0.346.
    4. Get the accuracy using eval_ucf101.py to compare ucf101_01.json with val.json; the top-1 video accuracy is 52.66%, which is about 10% over the 42.4% reported in the paper.

    I only use UCF101 split_01, so there are no overlapping videos between the train and test data. It is a little strange; is it because the training procedure in the paper did not last for 200 epochs? My platform is PyTorch 0.4, and I only modified one place to avoid an error, which is reported in another issue.

    opened by BestJuly 16
  • RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

    Hi, I need help. On running main.py, everything goes well up to dataset loading, as shown below:

    model generated
    dataset loading [0/9537]
    dataset loading [1000/9537]
    dataset loading [2000/9537]
    dataset loading [3000/9537]
    dataset loading [4000/9537]
    dataset loading [5000/9537]
    dataset loading [6000/9537]
    dataset loading [7000/9537]
    dataset loading [8000/9537]
    dataset loading [9000/9537]
    dataset loading [0/3783]
    dataset loading [1000/3783]
    dataset loading [2000/3783]
    dataset loading [3000/3783]
    run

    The error occurred here:

    train at epoch 1
    Traceback (most recent call last):
      File "/media/psrana/New Volume/chandni/HAR_3D_TU/main.py", line 139, in <module>
        train_logger, train_batch_logger)
      File "/media/psrana/New Volume/chandni/HAR_3D_TU/train.py", line 22, in train_epoch
        for i, (inputs, targets) in enumerate(data_loader):
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 417, in __iter__
        return DataLoaderIter(self)
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 242, in __init__
        self._put_indices()
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 290, in _put_indices
        indices = next(self.sample_iter, None)
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 119, in __iter__
        for idx in self.sampler:
      File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
        return iter(torch.randperm(len(self.data_source)).long())
    RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

    What could be the reason?

    opened by chandnikathuria1992 12
  • Very Slow Training

    I am training ResNet with depth 34 on the Kinetics dataset, but the training procedure is not improving anything. How long does it take until the model starts improving? I have attached a screenshot; currently I am at epoch 34, but the loss is still 5.99 and not decreasing, and the accuracy is very volatile.

    (screenshot: selection_121)

    opened by cryptedp 11
  • RuntimeError: expected a non-empty list of Tensors

    Traceback (most recent call last):
      File "main.py", line 129, in <module>
        train_logger, train_batch_logger)
      File "/home/hareesh/Downloads/3D-ResNets-PyTorch-master/train.py", line 22, in train_epoch
        for i, (inputs, targets) in enumerate(data_loader):
      File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
        return self._process_next_batch(batch)
      File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    RuntimeError: Traceback (most recent call last):
      File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/hareesh/Downloads/3D-ResNets-PyTorch-master/datasets/ucf101.py", line 193, in __getitem__
        clip = torch.stack(clip, 0).permute(1, 0, 2, 3)
    RuntimeError: expected a non-empty list of Tensors

    Please let me know the cause of this error.

    opened by hareeshdevarakonda 10
  • Input of Densenet

    Thank you for your wonderful work. I just read the paper, and it is noted that each clip contains 16 frames. I read two other papers in which the authors claim that a 32-frame input would be better; have you tried 32-frame inputs? If you trained such models, could you please release the pretrained models?

    opened by Tord-Zhang 10
  • Asking about using 3D ResNet on video sequences

    Hello,

    I'm new to this kind of 3D convolution, so I'm trying to understand how it works. My dataset (UNBC McMaster) includes videos that contain sequences of frames. For each frame, we have one pain intensity level. Now, I want to use 3D ResNet to predict the pain level as a regression problem. So, let's say we have a sequence of 32 frames, which means I have 32 labels for this sequence. Normally, with CNN + LSTM, I would use the CNN to extract features, put them through the LSTM, and take the output and the label of the last frame to compute the loss. So, for 3D ResNet, should I take the output of the model and the label of the last frame to calculate the loss?

    opened by glmanhtu 9
  • I have some problems doing a test

    I would like to test the network on UCF101 after fine-tuning it on UCF101 from the pretrained Kinetics model you provided.

    I use resnet18. I want to get video-level accuracy, not clip-level accuracy.

    I use the command below:

    python main.py --root_path UCF101 --video_path jpg --annotation_path ucf101_01.json --result_path test --dataset ucf101 --model resnet --model_depth 18 --n_classes 101 --batch_size 64 --n_threads 4 --pretrain_path 18result1s/save_200.pth --no_train --no_val --test --test_subset val --n_finetune_classes 101

    jpg has the same structure as your data directory. 18result1s/save_200.pth is the network pretrained on Kinetics and fine-tuned on UCF101.

    It raises an error:

    run
    dataset loading [0/3783]
    dataset loading [1000/3783]
    dataset loading [2000/3783]
    dataset loading [3000/3783]
    test
    test.py:42: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
      inputs = Variable(inputs, volatile=True)
    test.py:45: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
      outputs = F.softmax(outputs)
    Traceback (most recent call last):
      File "main.py", line 162, in <module>
        test.test(test_loader, model, opt, test_data.class_names)
      File "test.py", line 50, in test
        test_results, class_names)
      File "test.py", line 20, in calculate_video_results
        'label': class_names[locs[i]],
    KeyError: tensor(12)

    KeyError: tensor(12). When I change my command, the number in parentheses changes.

    Can you help me?

    opened by lee2h 9
  • Performance of fine-tuning on UCF101

    I downloaded the network ResNet-101 pretrained on Kinetics, and fine-tuned on UCF101 following the example script. However, I can only get 82.5 by averaging the three splits. In the paper, the authors reported 88.9. Any suggestion?

    opened by zhihuilics 9
  • Pretrained models cannot be downloaded

    Hello, I am making a demo with 3D convolutional networks. After reading your CVPR paper, I was happy to use your pretrained models. However, when I try to open your link, I get "The folder has been put in the recycle bin.". These pretrained models are extremely important for my project, because I don't have enough GPUs to train the network from scratch. Please give me a chance to use the 3D ResNet models... @kenshohara

    opened by KeCh96 8
  • main.py: error: unrecognized arguments: hmdb51_3.json

    I'm trying to train a ResNet on hmdb51, so I have 3 json files. But whichever one I pass with the --annotation_path argument, it keeps giving this error. Please help!

    opened by soumyadbanik 2
  • ABOUT ucf101_json

    I used the python script to generate the three json files for UCF101: ucf101_01.json, ucf101_02.json, and ucf101_03.json. I ran the main function to train a resnet: python main.py --root_path ./data --video_path ucf101_jpg/ --annotation_path ucf101_json/ucf101_01.json \
    --result_path results --dataset ucf101 --n_classes 101 --model resnet --model_depth 50 --batch_size 64 \
    --n_threads 4 --checkpoint 5

    I want to know when ucf101_02.json and ucf101_03.json should be used. Thank you very much!

    opened by theones-g 0
  • Image Resolution 112*112

    I wanted to be able to input larger image resolutions. However, when I use an input size of 480*480, it takes almost 10 minutes to process a tiny 10-second clip.

    It seems that when I increase the image size, the model's inference run-time becomes exponentially greater.

    Crucial motion information is lost when I downscale my images to 112*112, and it is affecting the precision of the model on my test sets.

    Is there any alternative model or method that will allow me to proceed with larger image resolutions using the 3D-ResNet model?

    Is it practical to use 3D-CNN with input sizes of 480*480 images for video classification tasks?

    opened by darshvirbelandis 1
  • Why is opt.n_val_samples 3

    I found that when fine-tuning on UCF101 with the split1 partition for validation, the number of samples in the dataset was 11349 instead of 3783. Also, why is the batch size opt.batch_size // opt.n_val_samples?

    opened by YTHmamba 0
  • AssertionError when I run inference

    I used r2p1d18_K_200ep.pth and fine-tuned it on the hmdb51 dataset, and when I want to run inference with it there is an AssertionError. `CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --root_path /home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data --video_path hmdb51-videos/jpg --annotation_path hmdb51_1.json \

    --result_path results --dataset hmdb51 --resume_path results/save_200.pth
    --model_depth 18 --n_classes 51 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1 Namespace(accimage=False, annotation_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/hmdb51_1.json'), arch='resnet-18', batch_size=128, batchnorm_sync=False, begin_epoch=1, checkpoint=10, colorjitter=False, conv1_t_size=7, conv1_t_stride=1, dampening=0.0, dataset='hmdb51', dist_url='tcp://127.0.0.1:23456', distributed=False, file_type='jpg', ft_begin_module='', inference=True, inference_batch_size=1, inference_crop='center', inference_no_average=False, inference_stride=16, inference_subset='val', input_type='rgb', learning_rate=0.1, lr_scheduler='multistep', manual_seed=1, mean=[0.4345, 0.4051, 0.3775], mean_dataset='kinetics', model='resnet', model_depth=18, momentum=0.9, multistep_milestones=[50, 100, 150], n_classes=51, n_epochs=200, n_input_channels=3, n_pretrain_classes=0, n_threads=4, n_val_samples=3, nesterov=False, no_cuda=False, no_hflip=False, no_max_pool=False, no_mean_norm=False, no_std_norm=False, no_train=True, no_val=True, optimizer='sgd', output_topk=5, overwrite_milestones=False, plateau_patience=10, pretrain_path=None, resnet_shortcut='B', resnet_widen_factor=1.0, resnext_cardinality=32, result_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/results'), resume_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/results/save_200.pth'), root_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data'), sample_duration=16, sample_size=112, sample_t_stride=1, std=[0.2768, 0.2713, 0.2737], tensorboard=False, train_crop='random', train_crop_min_ratio=0.75, train_crop_min_scale=0.25, train_t_crop='random', value_scale=1, video_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/hmdb51-videos/jpg'), weight_decay=0.001, wide_resnet_k=2, world_size=-1) loading checkpoint /home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/results/save_200.pth model Traceback (most recent call last): File "main.py", line 428, in main_worker(-1, opt) File "main.py", line 345, in main_worker model = resume_model(opt.resume_path, opt.arch, model) File "main.py", line 89, in resume_model assert arch == checkpoint['arch'] AssertionError `

    opened by z369437558 0