CLIP (Contrastive Language–Image Pre-training) trained on Indonesian data

Last update: Mar 10, 2022

Overview

CLIP-Indonesian

CLIP (Radford et al., 2021) is a multimodal model that connects images and text. It trains a vision encoder and a text encoder jointly so that an image and its corresponding text are projected into the same embedding space. The expected outcome is that the embedding of an image and the embedding of its matching text lie close to each other.
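The joint training objective can be illustrated with a minimal sketch of a symmetric contrastive loss, written in plain Python for readability. This is not the training code from this repository: the function names, the toy two-dimensional embeddings, and the fixed temperature of 0.07 are assumptions made for illustration (in CLIP the temperature is a learned parameter).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(image_embeds, text_embeds, temperature=0.07):
    """Entry [i][j] is the temperature-scaled cosine similarity of image i and text j."""
    images = [l2_normalize(v) for v in image_embeds]
    texts = [l2_normalize(v) for v in text_embeds]
    return [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    """Average of the image-to-text and text-to-image cross-entropy losses."""
    logits = similarity_matrix(image_embeds, text_embeds, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy batch (hypothetical values): pair i of images and texts belongs together.
image_embeds = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[0.9, 0.1], [0.1, 0.9]]
shuffled_texts = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = contrastive_loss(image_embeds, aligned_texts)
loss_shuffled = contrastive_loss(image_embeds, shuffled_texts)
print(loss_aligned < loss_shuffled)
```

The loss is small when each image is most similar to its own caption and large when the captions are mismatched, which is what pulls matching pairs together in the shared space during training.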

This repository hosts the code for CLIP-Indonesian, which is a CLIP multimodal model trained on Indonesian data.

For the image encoder, we use ViT, more specifically openai/clip-vit-base-patch32. For the text encoder, we experimented with two models: IndoBERT Base (indobenchmark/indobert-base-p2) and Indonesian RoBERTa Base (flax-community/indonesian-roberta-base).
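Because the image encoder and the text encoder are separate models, their outputs must be mapped into one shared space before they can be compared. The sketch below shows that idea only. It assumes one linear projection per encoder, uses random untrained weights, and picks toy dimensions; all names and sizes are hypothetical and are not taken from this repository.

```python
import math
import random


def make_projection(in_dim, out_dim, rng):
    """Random (untrained) weight matrix with out_dim rows and in_dim columns."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, features):
    """Apply the linear projection, then L2-normalize the result."""
    vec = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


rng = random.Random(0)

# Toy sizes (hypothetical): the two encoders may emit vectors of different lengths.
VISION_DIM, TEXT_DIM, SHARED_DIM = 6, 4, 3

vision_projection = make_projection(VISION_DIM, SHARED_DIM, rng)
text_projection = make_projection(TEXT_DIM, SHARED_DIM, rng)

# Stand-ins for the pooled outputs of the image encoder and the text encoder.
image_features = [rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)]
text_features = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]

image_embed = project(vision_projection, image_features)
text_embed = project(text_projection, text_features)

# Both embeddings now have SHARED_DIM entries, so their cosine similarity is defined.
similarity = sum(a * b for a, b in zip(image_embed, text_embed))
print(len(image_embed), len(text_embed))
```

With untrained weights the similarity value is meaningless; the point is that swapping the text encoder only changes TEXT_DIM and the text projection, while the shared space and the image side stay the same.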

Most of the CLIP script is based on HybridCLIP and clip-italian.

This is still a work in progress, so it may not give the best results (yet) :)

CLIP-Indonesian was presented at PyCon ID 2021. You can view the slide deck here.

Dataset

More details about the dataset used can be found here.

Results

The results of the training can be accessed here.

Demo

References

Bianchi, F., Attanasio, G., Pisoni, R., Terragni, S., Sarti, G., & Lakshmi, S. (2021). Contrastive Language-Image Pre-training for the Italian Language. arXiv preprint arXiv:2108.08688.

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. ICML.

Wilie, B., Vincentio, K., Winata, G. I., Cahyawijaya, S., Li, X., Lim, Z. Y., ... & Purwarianti, A. (2020). IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding. arXiv preprint arXiv:2009.05387.

Hybrid CLIP by the HuggingFace team

Indonesian Roberta Base by Wilson Wongso, Steven Limcorn, Samsul Rahmadani, and Chew Kok Wah

Indonesian Translated Datasets by Samsul Rahmadani

Acknowledgment

All training was done on a TPUv3-8 VM sponsored by TPU Research Cloud.

Owner

Galuh
