《Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching》(CVPR 2020)

Last update: Oct 27, 2022

Overview

Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching

This contains the codes for cross-view geo-localization method described in: Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching, CVPR2020.

Abstract

Cross-view geo-localization is the problem of estimating the position and orientation (latitude, longitude and azimuth angle) of a camera at ground level given a large-scale database of geo-tagged aerial (\eg, satellite) images. Existing approaches treat the task as a pure location estimation problem by learning discriminative feature descriptors, but neglect orientation alignment. It is well-recognized that knowing the orientation between ground and aerial images can significantly reduce matching ambiguity between these two views, especially when the ground-level images have a limited Field of View (FoV) instead of a full field-of-view panorama. Therefore, we design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization. In particular, we address the cross-view domain gap by applying a polar transform to the aerial images to approximately align the images up to an unknown azimuth angle. Then, a two-stream convolutional network is used to learn deep features from the ground and polar-transformed aerial images. Finally, we obtain the orientation by computing the correlation between cross-view features, which also provides a more accurate measure of feature similarity, improving location recall. Experiments on standard datasets demonstrate that our method significantly improves state-of-the-art performance. Remarkably, we improve the top-1 location recall rate on the CVUSA dataset by a factor of $1.5\times$ for panoramas with known orientation, by a factor of $3.3\times$ for panoramas with unknown orientation, and by a factor of $6\times$ for $180^{\circ}$-FoV images with unknown orientation.

Experiment Dataset

We use two existing dataset to do the experiments

CVUSA dataset: a dataset in America, with pairs of ground-level images and satellite images. All ground-level images are panoramic images.
The dataset can be accessed from https://github.com/viibridges/crossnet
CVACT dataset: a dataset in Australia, with pairs of ground-level images and satellite images. All ground-level images are panoramic images.
The dataset can be accessed from https://github.com/Liumouliu/OriCNN

Dataset Preparation: Polar transform

Please Download the two datasets from above links, and then put them under the director "Data/". The structure of the director "Data/" should be: "Data/CVUSA/ Data/ANU_data_small/"
Please run "data_preparation.py" to get polar transformed aerial images of the two datasets and pre-crop-and-resize the street-view images in CVACT dataset to accelerate the training speed.

Codes

Codes for training and testing on unknown orientation (train_grd_noise=360) and different FoV.

Training: CVUSA: python train_cvusa_fov.py --polar 1 --train_grd_noise 360 --train_grd_FOV $YOUR_FOV --test_grd_FOV $YOUR_FOV CVACT: python train_cvact_fov.py --polar 1 --train_grd_noise 360 --train_grd_FOV $YOUR_FOV --test_grd_FOV $YOUR_FOV
Evaluation: CVUSA: python test_cvusa_fov.py --polar 1 --train_grd_noise 360 --train_grd_FOV $YOUR_FOV --test_grd_FOV $YOUR_FOV CVACT: python test_cvact_fov.py --polar 1 --train_grd_noise 360 --train_grd_FOV $YOUR_FOV --test_grd_FOV $YOUR_FOV

Note that the test set construction operations are inside the data preparation script, polar_input_data_orien_FOV_3.py for CVUSA and ./OriNet_CVACT/input_data_act_polar_3.py for CVACT. We use "np.random.rand(2019)" in test_cvusa_fov.py and test_cvact_fov.py to make sure the constructed test set is the same one whenever they are used for performance evaluation for different models.

In case readers are interested to see the query images of newly constructed test sets where the ground images are with unkown orientation and small FoV, we provide the following two python scripts to save the images and their ground truth orientations at the local disk:

CVUSA datset: python generate_test_data_cvusa.py
CVACT dataset: python generate_test_data_cvact.py

Readers are encouraged to visit "https://github.com/Liumouliu/OriCNN" to access codes for evaluation on the fine-grained geo-localization CVACT_test set.

Models:

Our trained models for CVUSA and CVACT are available in here.

There is also an "Initialize" model for your own training step. The VGG16 part in the "Initialize" model is initialised by the online model and other parts are initialised randomly.

Please put them under the director of "Model/" and then you can use them for training or evaluation.

Publications

This work is published in CVPR 2020.
[Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching]

If you are interested in our work and use our code, we are pleased that you can cite the following publication:
Yujiao Shi, Xin Yu, Dylan Campbell, Hongdong Li. Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching.

@inproceedings{shi2020where, title={Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching}, author={Shi, Yujiao and Yu, Xin and Campbell, Dylan and Li, Hongdong}, booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, year={2020} }

《Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching》(CVPR 2020)

Related tags

Overview

Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching

Abstract

Experiment Dataset

Dataset Preparation: Polar transform

Codes

Models:

Publications

Owner

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

code for `Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation`

This is the official PyTorch implementation for "Mesa: A Memory-saving Training Framework for Transformers".

This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

This repository is the official implementation of Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning (NeurIPS21).

A synthetic texture-invariant dataset for object detection of UAVs

Deep metric learning methods implemented in Chainer

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

IMBENS: class-imbalanced ensemble learning in Python.

Pytorch modules for paralel models with same architecture. Ideal for multi agent-based systems

🔊 Audio and fastai v2

[AAAI2021] The source code for our paper 《Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion》.

Code for 1st place solution in Sleep AI Challenge SNU Hospital

Deep Surface Reconstruction from Point Clouds with Visibility Information

Principled Detection of Out-of-Distribution Examples in Neural Networks

The source codes for TME-BNA: Temporal Motif-Preserving Network Embedding with Bicomponent Neighbor Aggregation.

Sync2Gen Code for ICCV 2021 paper: Scene Synthesis via Uncertainty-Driven Attribute Synchronization

Offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation

a grammar based feedback fuzzer

Kaggle DSTL Satellite Imagery Feature Detection