Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

Overview

NÜWA - Pytorch (wip)

Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch. This repository will be populated in the case that Microsoft does not open source the code by end of December. It may also contain an extension into video and audio, using a dual decoder approach.

DeepReader

Citations

@misc{wu2021nuwa,
    title   = {N\"UWA: Visual Synthesis Pre-training for Neural visUal World creAtion}, 
    author  = {Chenfei Wu and Jian Liang and Lei Ji and Fan Yang and Yuejian Fang and Daxin Jiang and Nan Duan},
    year    = {2021},
    eprint  = {2111.12417},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
Comments
  • Question about generated videos?

    Question about generated videos?

    There are a lot of negative numbers and very small decimals (like 5e-1). But the loss degrades normally when training. Is that a normal situation? How can I make the result visible?

    opened by Fitzwong 0
  • Why the video does not pass through the encoder?

    Why the video does not pass through the encoder?

    Hi! lucidrains. Thanks for providing a great repo which is convenient to understand the NUWA paper.
    I have a question as follows: In the NUWA paper, we can see that the inputs of the Encoder are caption tokens (caption condition) and the video tokens (3DNA condition). So, in my eye, the video tokens sequence should fully self-attend in the Encoder, right? And then, the outputs condition the Decoder. The Decoder provided by you is as following. 截屏2022-05-12 上午11 07 12. It has causal self-attention and text-condition as we expected. But from the definition in paper, the condition contains the text-condition and 3DNA condition, and these two condition the Decoder. Is my opinion right? I am just curious about the condition in the NUWA paper. The Encoder in your repo is only the Text-Encoder, but the video does not pass through the encoder to condition the Encoder.

    Looking forward to your reply! Thanks!

    opened by Wang-Xiaodong1899 0
  • Questions about function forward() in NUWA please.

    Questions about function forward() in NUWA please.

    I'm confused me that, in function forward() of class NUWA, the ground-truth video is fed to transformer and calculate the output video, which is different from function generate().

    frame_embeddings = self.video_transformer(
                frame_embeddings,  # calculated from ground-truth video
                context = text_embeds,
                context_mask = text_mask
            )
    

    So when training NUWA, the loss comes from logits. But the logits are not only from text, but ground-truth video (only one transformer layer, different from the auto-regressive model in generate function). Is that some kind of cheating when training? Or should I generate logits in the same way as in generate(), and then calculate loss to train?

    opened by Fitzwong 1
  • Type of dataset for training VQ-GAN

    Type of dataset for training VQ-GAN

    Hi,

    First, thanks a lot for the amazing work! I have one question regarding the training of the VQ-GAN, do you recommend training it on a dataset similar to the dataset the nuwa model will be trained? What I mean is, if I want to train nuwa to generate sport videos based on text, do I need to also train the VQ-GAN on a sport dataset?

    Thanks a lot

    opened by antonibigata 0
  • Pseudocode for 3DNA?

    Pseudocode for 3DNA?

    me no comprendai le complex einops 😢

    Can someone give the 3DNA pseudocode to illustrate what's going on 🤗

    (Also how did lucidrains bang out thousands of lines of code in a few weeks - is he confirmed to be human? 🤔)

    opened by neel04 4
Releases(0.7.7a)
Owner
Phil Wang
Working with Attention. It's all we need
Phil Wang
OpenDILab Multi-Agent Environment

Go-Bigger: Multi-Agent Decision Intelligence Environment GoBigger Doc (中文版) Ongoing 2021.11.13 We are holding a competition —— Go-Bigger: Multi-Agent

OpenDILab 441 Jan 05, 2023
A curated (most recent) list of resources for Learning with Noisy Labels

A curated (most recent) list of resources for Learning with Noisy Labels

Jiaheng Wei 321 Jan 09, 2023
Official PyTorch implementation of MAAD: A Model and Dataset for Attended Awareness

MAAD: A Model for Attended Awareness in Driving Install // Datasets // Training // Experiments // Analysis // License Official PyTorch implementation

7 Oct 16, 2022
TextBPN Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

TextBPN Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection; Accepted by ICCV2021. Note: The complete code (including training and t

S.X.Zhang 84 Dec 13, 2022
Python script for performing depth completion from sparse depth and rgb images using the msg_chn_wacv20. model in ONNX

ONNX msg_chn_wacv20 depth completion Python script for performing depth completion from sparse depth and rgb images using the msg_chn_wacv20 model in

Ibai Gorordo 19 Oct 22, 2022
This is a code repository for the paper "Graph Auto-Encoders for Financial Clustering".

Repository for the paper "Graph Auto-Encoders for Financial Clustering" Requirements Python 3.6 torch torch_geometric Instructions This is a simple c

Edward Turner 1 Dec 02, 2021
RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering

RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering Authors: Xi Ye, Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou and

Salesforce 72 Dec 05, 2022
Classification of ecg datas for disease detection

ecg_classification Classification of ecg datas for disease detection

Atacan ÖZKAN 5 Sep 09, 2022
Addition of pseudotorsion caclulation eta, theta, eta', and theta' to barnaba package

Addition to Original Barnaba Code: This is modified version of Barnaba package to calculate RNA pseudotorsion angles eta, theta, eta', and theta'. Ple

Mandar Kulkarni 1 Jan 11, 2022
Generating synthetic mobility data for a realistic population with RNNs to improve utility and privacy

lbs-data Motivation Location data is collected from the public by private firms via mobile devices. Can this data also be used to serve the public goo

Alex 11 Sep 22, 2022
Luminaire is a python package that provides ML driven solutions for monitoring time series data.

A hands-off Anomaly Detection Library Table of contents What is Luminaire Quick Start Time Series Outlier Detection Workflow Anomaly Detection for Hig

Zillow 670 Jan 02, 2023
Autonomous Ground Vehicle Navigation and Control Simulation Examples in Python

Autonomous Ground Vehicle Navigation and Control Simulation Examples in Python THIS PROJECT IS CURRENTLY A WORK IN PROGRESS AND THUS THIS REPOSITORY I

Joshua Marshall 14 Dec 31, 2022
Unsupervised Domain Adaptation for Nighttime Aerial Tracking (CVPR2022)

Unsupervised Domain Adaptation for Nighttime Aerial Tracking (CVPR2022) Junjie Ye, Changhong Fu, Guangze Zheng, Danda Pani Paudel, and Guang Chen. Uns

Intelligent Vision for Robotics in Complex Environment 91 Dec 30, 2022
Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

PyTorch RL Minimal Implementations There are implementations of some reinforcement learning algorithms, whose characteristics are as follow: Less pack

Gemini Light 4 Dec 31, 2022
Visual odometry package based on hardware-accelerated NVIDIA Elbrus library with world class quality and performance.

Isaac ROS Visual Odometry This repository provides a ROS2 package that estimates stereo visual inertial odometry using the Isaac Elbrus GPU-accelerate

NVIDIA Isaac ROS 343 Jan 03, 2023
Official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer"

[AAAI2022] UCTransNet This repo is the official implementation of "UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspectiv

Haonan Wang 199 Jan 03, 2023
The pytorch implementation of the paper "text-guided neural image inpainting" at MM'2020

TDANet: Text-Guided Neural Image Inpainting, MM'2020 (Oral) MM | ArXiv This repository implements the paper "Text-Guided Neural Image Inpainting" by L

LisaiZhang 75 Dec 22, 2022
classify fashion-mnist dataset with pytorch

Fashion-Mnist Classifier with PyTorch Inference 1- clone this repository: git clone https://github.com/Jhamed7/Fashion-Mnist-Classifier.git 2- Instal

1 Jan 14, 2022
Find the Heart simple Python Game

This is a simple Python game for finding a heart emoji. There is a 3 x 3 matrix in which a heart emoji resides. The location of the heart is randomized and is not revealed. The player must guess the

p.katekomol 1 Jan 24, 2022
Official PyTorch Implementation of Rank & Sort Loss [ICCV2021]

Rank & Sort Loss for Object Detection and Instance Segmentation The official implementation of Rank & Sort Loss. Our implementation is based on mmdete

Kemal Oksuz 229 Dec 20, 2022