Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

Last update: Dec 05, 2022

Overview

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv

MMCoref_cleaned

Code for the MMCoref task of the SIMMC 2.0 dataset.
Pretrained vision-language models adapted from Transformers-VQA.
Zero-shot visual feature extraction using CLIP and BUTD.
Zero-shot extraction of non-visual prefab features (flattened into strings) using BERT and SBERT.
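As an illustration of what "flattened into strings" can mean, the sketch below turns a prefab metadata dictionary into a single `key: value` string that a text encoder such as BERT or SBERT could consume. The function name `flatten_prefab`, the field names, and the separator format are hypothetical; the repository's own scripts may flatten the metadata differently.

```python
def flatten_prefab(meta: dict) -> str:
    """Flatten prefab metadata into one string (hypothetical format).

    Keys are sorted so the same prefab always yields the same string;
    list values are joined with commas.
    """
    parts = []
    for key in sorted(meta):
        value = meta[key]
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        parts.append(f"{key}: {value}")
    return "; ".join(parts)


if __name__ == "__main__":
    # Hypothetical prefab entry; real SIMMC 2.0 metadata fields may differ.
    example = {"type": "jacket", "color": ["black", "white"], "price": 49.99}
    print(flatten_prefab(example))
```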

Dependencies

Install the packages listed in requirements.txt.

Download the data and pretrained/trained model checkpoints

Data: Put the data in ./data. Unpack all images into ./data/all_images and all scene JSON files (including the teststd split) into ./data/simmc2_scene_jsons_dstc10_public/public.
Pretrained models: Checkpoints go in ./pretrained and ./model/Transformers-VQA-master/models/pretrained. Download links are in the placeholder.txt file in each of these folders.
Trained models: Checkpoints go in ./trained. Download links are in ./trained/placeholder.txt.
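Before preprocessing, it can help to confirm that the directories listed above exist. The helper below is a hypothetical convenience check, not part of the repository; it only tests for the paths named in this README, relative to the repository root.

```python
from pathlib import Path

# Paths named in this README; the checker itself is a hypothetical helper.
REQUIRED_DIRS = [
    "data/all_images",
    "data/simmc2_scene_jsons_dstc10_public/public",
    "pretrained",
    "model/Transformers-VQA-master/models/pretrained",
    "trained",
]


def missing_dirs(root: str = ".") -> list:
    """Return the required directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in REQUIRED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected directories are present.")
```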

Preprocess

Convert the JSON files with ./scripts/converter.py. *This step currently does not work because the latest version of converter.py was lost; download the processed data instead.
Get BERT/SBERT embeddings of non-visual prefab features using ./scripts/{get_KB_embedding, get_KB_embedding_SBERT, get_KB_embedding_no_duplicate}.py
Get CLIP/BUTD embeddings for images using ./scripts/get-visual-features-{CLIP, RCNN}.ipynb
Alternatively, download all processed files using the links in ./processed/placeholder.txt
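The get_KB_embedding_no_duplicate variant suggests that identical flattened strings are embedded only once. A minimal sketch of that idea, assuming this is what the script does: deduplicate the strings, encode each distinct one, and return a lookup table. `embed_unique` is a hypothetical name, and `embed` stands in for any batch encoder (for example a BERT pooling function or SBERT's `encode`).

```python
def embed_unique(texts, embed):
    """Encode each distinct string once; return a {string: vector} table.

    `embed` is any callable mapping a list of strings to a list of vectors
    (a stand-in for a BERT/SBERT encoder; assumption, not the repo's API).
    """
    unique = list(dict.fromkeys(texts))  # drops repeats, keeps first-seen order
    vectors = embed(unique)  # one batch call over the distinct strings only
    return dict(zip(unique, vectors))


if __name__ == "__main__":
    # Toy stand-in encoder: a 1-d "embedding" equal to the string length.
    def toy_encoder(batch):
        print(f"encoding {len(batch)} strings")
        return [[float(len(t))] for t in batch]

    table = embed_unique(["red shirt", "blue jeans", "red shirt"], toy_encoder)
    print(table)
```

Prefab strings repeat across many scenes, so encoding only the distinct ones avoids redundant forward passes.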

Train

Training scripts are under ./sh/train. See each script's arguments for the inputs it uses.

Inference and evaluate

Scripts are under ./sh/infer_eval (devtest split) and ./sh/infer_eval_dev (dev split).
Outputs are written to ./output, in the same format as the original dialogue JSON.
Logits are saved in ./output/logit with the structure {dialogue_idx: {round_idx: [[logit, label], ...]}}.
Run ./scripts/output_filter_error.py to select and reformat error cases.
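Given the logit file structure above, object-level precision, recall, and F1 can be computed by thresholding each logit, and the misclassified candidates are the error cases. The sketch below assumes that label 1 marks a referenced object and that a logit above 0 counts as a positive prediction; `evaluate_logits` is a hypothetical helper, not the official SIMMC evaluation script nor the logic of output_filter_error.py.

```python
import json


def evaluate_logits(logits: dict, threshold: float = 0.0):
    """Micro-averaged precision/recall/F1 plus a list of error cases.

    `logits` follows {dialogue_idx: {round_idx: [[logit, label], ...]}}.
    Assumed decision rule: logit > threshold means "referenced".
    """
    tp = fp = fn = 0
    errors = []  # (dialogue_idx, round_idx, candidate_position, logit, label)
    for dial_idx, rounds in logits.items():
        for round_idx, pairs in rounds.items():
            for pos, (logit, label) in enumerate(pairs):
                pred = 1 if logit > threshold else 0
                label = int(label)
                if pred and label:
                    tp += 1
                elif pred:
                    fp += 1
                elif label:
                    fn += 1
                if pred != label:
                    errors.append((dial_idx, round_idx, pos, logit, label))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}, errors


if __name__ == "__main__":
    # Toy data in the nested layout described above (JSON keys are strings).
    toy = json.loads('{"0": {"0": [[2.3, 1], [-1.0, 0], [0.4, 0]], "1": [[-0.5, 1]]}}')
    scores, errors = evaluate_logits(toy)
    print(scores)
    print(errors)
```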

Ensemble

cd script
python ensemble --method optuna

The output is saved to output/logit/blended_devtest.json.
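The ensemble step blends the logits of several models and tunes the blending weights with Optuna. The sketch below shows the same idea using only the standard library: a weighted sum of per-candidate logits, scored by F1 at threshold 0, with plain random search swapped in for Optuna's sampler. It assumes every model's logits are flattened in the same candidate order; the function names are hypothetical.

```python
import random


def blend(model_logits, weights):
    """Weighted sum of logits; each inner list covers the same candidates."""
    return [sum(w * col[i] for w, col in zip(weights, model_logits))
            for i in range(len(model_logits[0]))]


def f1_score(logits, labels, threshold=0.0):
    """F1 = 2TP / (2TP + FP + FN), treating logit > threshold as positive."""
    tp = sum(1 for z, y in zip(logits, labels) if z > threshold and y == 1)
    fp = sum(1 for z, y in zip(logits, labels) if z > threshold and y == 0)
    fn = sum(1 for z, y in zip(logits, labels) if z <= threshold and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def search_weights(model_logits, labels, trials=200, seed=0):
    """Random search over non-negative weights (stand-in for Optuna)."""
    rng = random.Random(seed)
    uniform = [1.0] * len(model_logits)  # evaluated first as a baseline
    candidates = [uniform] + [[rng.random() for _ in model_logits]
                              for _ in range(trials)]
    best_w, best_f1 = uniform, -1.0
    for w in candidates:
        score = f1_score(blend(model_logits, w), labels)
        if score > best_f1:  # keep the earliest weights reaching the best F1
            best_w, best_f1 = w, score
    return best_w, best_f1


if __name__ == "__main__":
    model_a = [2.0, -1.0, 0.5, -0.2]
    model_b = [-0.5, -2.0, -1.0, 1.0]
    labels = [1, 0, 0, 1]
    print(search_weights([model_a, model_b], labels))
```

Because the threshold is 0 and the weights are non-negative, only their relative sizes affect the predictions. Optuna's default TPE sampler would replace the uniform random draws with guided sampling over the same space.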


Owner

Yichen (William) Huang