A transformer that randomly augments VOC-format datasets (both images and bboxes) online.

Last update: Mar 05, 2022


Overview

VocAug

It is difficult to find a script that can augment a VOC-format dataset, especially the bboxes. The scripts that do exist often have complex requirements that make them hard to use, or they work offline rather than online, so they need a very large amount of disk space.

Here is a simple transformer that randomly augments a VOC-format dataset online! It needs only the numpy and cv2 packages!

The highlights are:

It augments both the image and the b-boxes!!!
It uses only cv2 & numpy, so it can be used without any other heavy packages!!!
It is an online transformer, so no augmented copies have to be stored on disk!!!
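Augmenting the b-boxes means that every geometric change applied to the pixels must also be applied to the box coordinates. As a rough illustration (not VocAug's own code), here is a numpy-only sketch of a horizontal flip. It assumes boxes are `(x_min, y_min, x_max, y_max)` tuples in continuous pixel-edge coordinates, so a column `x` maps to `w - x`; the name `hflip` is hypothetical.

```python
import numpy as np

def hflip(image, bboxes):
    """Flip an HxWxC image left-right and mirror the boxes to match.

    Assumes (x_min, y_min, x_max, y_max) boxes in pixel-edge coordinates,
    where mirroring maps x to (w - x), so x_min and x_max swap roles.
    """
    w = image.shape[1]
    flipped = image[:, ::-1].copy()  # reverse the column order
    new_boxes = [(w - x_max, y_min, w - x_min, y_max)
                 for (x_min, y_min, x_max, y_max) in bboxes]
    return flipped, new_boxes

# Example: a 200-pixel-wide image; the box keeps its width and its y-range.
img = np.zeros((100, 200, 3), dtype=np.uint8)
out, boxes = hflip(img, [(10, 20, 50, 80)])
print(boxes)  # [(150, 20, 190, 80)]
```

Flipping twice returns the original image and boxes, which is a cheap sanity check for any box transform.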

It contains the following methods:

Random HSV augmentation
Random Cropping augmentation
Random Flipping augmentation
Random Noise augmentation
Random rotation or translation augmentation

Each method has many adjustable arguments in the constructor of the voc_aug class (module VocAug.voc_aug).
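Rotation and translation move boxes in a less obvious way, because a rotated rectangle is no longer axis-aligned. One common approach (an assumption here, not necessarily what voc_aug does) is to push the four box corners through the same 2x3 affine matrix used to warp the image and take their min/max. In OpenCV, `cv2.getRotationMatrix2D(center, angle, scale)` returns such a matrix and `cv2.warpAffine` applies it to the image; the sketch below only handles the box side, in numpy. The name `affine_bbox` is hypothetical.

```python
import numpy as np

def affine_bbox(bbox, M, width, height):
    """Map an axis-aligned box through a 2x3 affine matrix M.

    Returns the axis-aligned box enclosing the four transformed corners,
    clipped to a width x height output image, or None if nothing is left.
    """
    x_min, y_min, x_max, y_max = bbox
    corners = np.array([[x_min, y_min, 1.0],
                        [x_max, y_min, 1.0],
                        [x_max, y_max, 1.0],
                        [x_min, y_max, 1.0]])
    moved = corners @ np.asarray(M, dtype=float).T  # shape (4, 2)
    nx_min, ny_min = moved.min(axis=0)
    nx_max, ny_max = moved.max(axis=0)
    # Clip to the output image; drop the box if it was moved off-image.
    nx_min, nx_max = np.clip([nx_min, nx_max], 0, width)
    ny_min, ny_max = np.clip([ny_min, ny_max], 0, height)
    if nx_max <= nx_min or ny_max <= ny_min:
        return None
    return (float(nx_min), float(ny_min), float(nx_max), float(ny_max))

# Translation by (+30, -10) on a 100x100 image.
print(affine_bbox((10, 20, 50, 80), [[1, 0, 30], [0, 1, -10]], 100, 100))
# (40.0, 10.0, 80.0, 70.0)
```

For non-right-angle rotations the enclosing box is larger than the object, which is a known trade-off of this corner-based method.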

Here are some visualized examples:

(Figures: example #1, example #2)

This script was created while I was writing a YOLOv1 object detection algorithm for learning and fun. See more details at https://github.com/BestAnHongjun/YOLOv1-pytorch

Quick Start

1. Download this repo.

git clone https://github.com/BestAnHongjun/VOC-Augmentation.git

Or you can download the zip file directly.

2. Enter the project directory

cd VOC-Augmentation

3. Install the requirements

pip install -r requirements.txt

On some machines with mixed Python environments, you need to use pip3 instead of pip.

Alternatively, you can install the requirements by hand. The default versions are fine.

pip install numpy
pip install opencv-python
pip install opencv-contrib-python
pip install matplotlib

4. Create your own project directory

Create your own project directory, then copy the VocAug directory into it. Alternatively, you can work in this repository's directory directly.

5. Create your own demo.py file

Or you can use my demo.py directly.

You should now have a project directory with a structure like this:

Project_Dir
  |- VocAug (dir)
  |- demo.py

Open your demo.py.

First, import the required packages.

import os
import matplotlib.pyplot as plt

Second, import the VocAug modules from your project directory.

from VocAug.voc_aug import voc_aug
from VocAug.transform.voc2vdict import voc2vdict
from VocAug.utils.viz_bbox import viz_vdict

Third, create two transformers.

voc2vdict_transformer = voc2vdict()
augmentation_transformer = voc_aug()

When you call a voc2vdict instance with xml_file_path and image_file_path as arguments, it reads the XML file and the image file and converts them into a VOC-format dict, called a vdict.

What is a vdict? It is a Python dict with a structure like this:

vdict = {
    "image": numpy.array([[[....]]]),   # Cv2 image Mat. (Shape:[h, w, 3], RGB format)
    "filename": "000048",               # filename without suffix (a string)
    "objects": [{                       # A list of dicts representing b-boxes
        "class_name": "house",
        "class_id": 2,                  # index of self.class_list
        "bbox": (x_min, y_min, x_max, y_max)
    }, {
        ...
    }]
}
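voc2vdict's own code is not shown in this README. As a rough illustration of how the "objects" list above can be built from a PASCAL VOC annotation, here is a sketch using the standard library's `xml.etree.ElementTree`. The function name `parse_voc_objects` and the `class_list` argument are hypothetical; the element names (`object`, `name`, `bndbox`, `xmin`, ...) follow the standard VOC annotation layout. Reading the image itself is left out.

```python
import xml.etree.ElementTree as ET

def parse_voc_objects(xml_text, class_list):
    """Build a vdict-like dict (without "image") from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        name = obj.findtext("name").strip()
        box = obj.find("bndbox")
        # Coordinates are stored as text; some files write them as floats.
        coords = tuple(int(float(box.findtext(tag)))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({
            "class_name": name,
            "class_id": class_list.index(name),  # ValueError for unknown classes
            "bbox": coords,
        })
    # "000007.jpg" -> "000007": the vdict keeps the filename without suffix.
    filename = root.findtext("filename", default="").rsplit(".", 1)[0]
    return {"filename": filename, "objects": objects}

sample = """<annotation><filename>000001.jpg</filename>
<object><name>dog</name>
<bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
</object></annotation>"""
print(parse_voc_objects(sample, ["person", "dog"]))
```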

When you call a voc_aug instance with a vdict as its argument, it augments both the image and the bboxes of the vdict and returns the augmented vdict.

It randomly applies augmentation methods including:

Random HSV augmentation
Random Cropping augmentation
Random Flipping augmentation
Random Noise augmentation
Random rotation or translation augmentation
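Cropping is the method where boxes can be partly or fully cut away. A minimal numpy sketch, assuming `(x_min, y_min, x_max, y_max)` boxes and a crop window covering at least `min_scale` of each side (both the name `random_crop` and the `min_scale` parameter are hypothetical, not voc_aug's arguments):

```python
import numpy as np

def random_crop(image, bboxes, min_scale=0.6, rng=None):
    """Crop a random window and shift/clip the boxes into it.

    Returns (cropped_image, new_boxes, kept), where kept lists the indices
    of the input boxes that survived, so class labels can be filtered too.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Crop size between min_scale and 100% of each side (at least 1 px).
    cw = int(rng.integers(max(1, int(w * min_scale)), w + 1))
    ch = int(rng.integers(max(1, int(h * min_scale)), h + 1))
    # Top-left corner chosen so the window stays inside the image.
    x0 = int(rng.integers(0, w - cw + 1))
    y0 = int(rng.integers(0, h - ch + 1))
    cropped = image[y0:y0 + ch, x0:x0 + cw].copy()
    new_boxes, kept = [], []
    for i, (x_min, y_min, x_max, y_max) in enumerate(bboxes):
        # Shift into crop coordinates, then clip to the crop window.
        nx_min, nx_max = max(x_min - x0, 0), min(x_max - x0, cw)
        ny_min, ny_max = max(y_min - y0, 0), min(y_max - y0, ch)
        if nx_max > nx_min and ny_max > ny_min:  # drop boxes cut away entirely
            new_boxes.append((nx_min, ny_min, nx_max, ny_max))
            kept.append(i)
    return cropped, new_boxes, kept
```

This sketch keeps any non-empty remainder of a box; real pipelines often also drop boxes whose visible area falls below some fraction of the original.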

Then, let's augment the vdict.

# prepare the xml-file-path and the image-file-path
filename = "000007"
file_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "dataset")
xml_file_path = os.path.join(file_dir, "Annotations", "{}.xml".format(filename))
image_file_path = os.path.join(file_dir, "JPEGImages", "{}.jpg".format(filename))

# First convert the VOC-format XML and image paths to a VOC dict (vdict), then augment it.
src_vdict = voc2vdict_transformer(xml_file_path, image_file_path)
image_aug_vdict = augmentation_transformer(src_vdict)

000007.xml and 000007.jpg are in the dataset directory, under Annotations and JPEGImages respectively.
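Because the augmentation is online, nothing is written to disk: calling the transformer again on the same source vdict draws new random parameters, so each epoch can see a different variant. A minimal sketch of that pattern follows. To keep it self-contained it uses a hypothetical stand-in augmentation (`random_noise`, Gaussian pixel noise) instead of voc_aug; in the demo, `augment` could be a small wrapper that ignores `rng` and calls `augmentation_transformer(vdict)`.

```python
import numpy as np

def random_noise(vdict, rng, sigma=10.0):
    """Hypothetical stand-in augmentation: add Gaussian pixel noise.
    Noise does not move anything, so the objects/b-boxes are reused as-is."""
    noisy = vdict["image"].astype(np.float32) + rng.normal(0.0, sigma, vdict["image"].shape)
    return dict(vdict, image=np.clip(noisy, 0, 255).astype(np.uint8))

def online_samples(samples, augment, epochs, rng):
    """Yield a freshly augmented copy of every sample in every epoch.
    The source vdicts are never modified and nothing is saved to disk."""
    for _ in range(epochs):
        for vdict in samples:
            yield augment(vdict, rng)

rng = np.random.default_rng(0)
src = {"image": np.full((4, 4, 3), 128, np.uint8), "filename": "000007",
       "objects": [{"class_name": "car", "class_id": 1, "bbox": (0, 0, 2, 2)}]}
for out in online_samples([src], random_noise, epochs=2, rng=rng):
    print(out["image"][0, 0])  # one pixel of each epoch's augmented image
```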

Then you can visualize the vdict. I have prepared a tool for this: the viz_vdict function in the VocAug.utils.viz_bbox module. It returns a cv2 image when you pass it a vdict.

You can use it like:

image_src = src_vdict.get("image")
image_src_with_bbox = viz_vdict(src_vdict)

image_aug = image_aug_vdict.get("image")
image_aug_with_bbox = viz_vdict(image_aug_vdict)
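viz_vdict's implementation is not shown here. If you only want a quick, dependency-free check that boxes line up after augmentation, a hypothetical numpy-only helper that paints rectangle outlines could look like this (in practice `cv2.rectangle` does the same job). It assumes an RGB `uint8` image, as described for the vdict, so the default color is red.

```python
import numpy as np

def draw_boxes(image, bboxes, color=(255, 0, 0), thickness=2):
    """Return a copy of an HxWx3 uint8 image with each
    (x_min, y_min, x_max, y_max) box outlined in `color`.
    Boxes are clipped to the image; empty boxes are skipped."""
    out = image.copy()
    h, w = out.shape[:2]
    t = thickness
    for x_min, y_min, x_max, y_max in bboxes:
        x0, x1 = max(int(x_min), 0), min(int(x_max), w)
        y0, y1 = max(int(y_min), 0), min(int(y_max), h)
        if x1 <= x0 or y1 <= y0:
            continue
        out[y0:min(y0 + t, y1), x0:x1] = color  # top edge
        out[max(y1 - t, y0):y1, x0:x1] = color  # bottom edge
        out[y0:y1, x0:min(x0 + t, x1)] = color  # left edge
        out[y0:y1, max(x1 - t, x0):x1] = color  # right edge
    return out

canvas = draw_boxes(np.zeros((10, 10, 3), np.uint8), [(2, 2, 8, 8)], thickness=1)
```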

Visualize them with matplotlib.

plt.figure(figsize=(15, 10))
plt.subplot(2, 2, 1)
plt.title("src")
plt.imshow(image_src)
plt.subplot(2, 2, 3)
plt.title("src_bbox")
plt.imshow(image_src_with_bbox)
plt.subplot(2, 2, 2)
plt.title("aug")
plt.imshow(image_aug)
plt.subplot(2, 2, 4)
plt.title("aug_bbox")
plt.imshow(image_aug_with_bbox)
plt.show()

Then you will get a random result like this.

For more details, see demo.py.

Details of the Algorithm

I am writing this part...


Owner

Coder.AN
