Image2GPS

This is a course project for CIS 5190 Applied Machine Learning, taught by Prof. Dinesh Jayaraman. [report]

Predicting GPS coordinates from images is useful in GPS-denied or GPS-degraded scenarios, such as indoor navigation or dense urban environments where satellite signals are unreliable. In this project, we explore two practical approaches to image-based GPS localization:

  • Regression-based method: an EfficientNet-B0 backbone with a linear regression head that directly predicts latitude and longitude from an input image.
  • Retrieval-based method: a more interpretable approach that performs k-nearest-neighbor (k-NN) image retrieval using learned visual feature embeddings.
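The README does not specify the similarity measure or how the neighbors' coordinates are combined. Below is a minimal sketch of k-NN retrieval, assuming cosine similarity over precomputed embeddings and an unweighted mean of the neighbors' coordinates. The function names and toy data are hypothetical; in practice the embeddings would come from the trained backbone.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict_gps(
    query: Sequence[float],
    db_embeddings: List[Sequence[float]],
    db_coords: List[Coord],
    k: int = 3,
) -> Tuple[Coord, List[int]]:
    """Predict (lat, lon) as the mean location of the k most similar reference images.

    Also returns the indices of the retrieved neighbors, so the prediction can be
    explained by showing those reference images.
    """
    # Rank reference images by similarity to the query, most similar first.
    ranked = sorted(
        range(len(db_embeddings)),
        key=lambda i: cosine_similarity(query, db_embeddings[i]),
        reverse=True,
    )
    neighbors = ranked[:k]
    # Plain averaging of lat/lon is a reasonable approximation over a small area
    # such as a campus; it would break down near the antimeridian or the poles.
    lat = sum(db_coords[i][0] for i in neighbors) / len(neighbors)
    lon = sum(db_coords[i][1] for i in neighbors) / len(neighbors)
    return (lat, lon), neighbors


# Toy example with 2-D "embeddings" and made-up coordinates (hypothetical data).
db_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
db_coords = [(39.950, -75.190), (39.951, -75.191), (39.960, -75.200)]
(pred_lat, pred_lon), neighbors = knn_predict_gps([1.0, 0.0], db_embeddings, db_coords, k=2)
print(neighbors, pred_lat, pred_lon)
```

Returning the neighbor indices alongside the coordinates is what makes this approach interpretable: the retrieved reference images can be displayed next to the prediction.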

Both approaches achieve comparable performance. The retrieval-based method adopts a two-stage training strategy that incorporates a contrastive loss for representation learning, explicitly aligning the learned feature space with geographic distance. This design improves interpretability, since each prediction can be explained by the reference images it retrieves.
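The README does not give the exact form of the contrastive loss or the details of the two training stages. As a hedged illustration of how a contrastive objective can tie embedding distance to geographic distance, the sketch below uses the classic margin-based contrastive loss (Hadsell, Chopra, and LeCun, 2006) and labels a pair as positive when the two photos were taken within a distance threshold. The 50 m radius, the margin, and all function names are hypothetical; a real implementation would operate on batches of tensors in a deep-learning framework.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(p: Coord, q: Coord) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geo_contrastive_loss(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    gps_a: Coord,
    gps_b: Coord,
    pos_radius_m: float = 50.0,  # hypothetical: photos within 50 m form a positive pair
    margin: float = 1.0,         # hypothetical margin for negative pairs
) -> float:
    """Margin-based contrastive loss whose pair labels come from GPS distance."""
    d = euclidean(emb_a, emb_b)
    if haversine_m(gps_a, gps_b) <= pos_radius_m:
        # Positive pair (geographically close): pull the embeddings together.
        return d ** 2
    # Negative pair (geographically far): push the embeddings at least `margin` apart.
    return max(0.0, margin - d) ** 2
```

Minimizing this loss makes nearby locations cluster in feature space, which is exactly the property a k-NN retriever relies on.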

We train and evaluate our methods on a self-collected dataset containing over 3k images captured within a designated campus area. Notably, our approaches outperform both a ResNet-based baseline and a naive mean-coordinate predictor by up to 45%. We release our dataset on Hugging Face at rwxyzgao/IMG2GPS_ALL, and our code is available at Image2GPS.
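The README does not state which error metric the 45% figure refers to. A common choice for GPS regression is the mean great-circle (haversine) error in meters. The sketch below assumes that metric and shows the naive mean-coordinate predictor and how a relative improvement over a baseline is computed. All names are hypothetical, and this is not the project's evaluation code.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(p: Coord, q: Coord) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_coordinate_baseline(train_coords: List[Coord]) -> Coord:
    """Naive predictor: the average training location, returned for every image."""
    n = len(train_coords)
    return (sum(c[0] for c in train_coords) / n, sum(c[1] for c in train_coords) / n)


def mean_error_m(preds: List[Coord], targets: List[Coord]) -> float:
    """Mean great-circle error in meters between predictions and ground truth."""
    return sum(haversine_m(p, t) for p, t in zip(preds, targets)) / len(targets)


def relative_improvement(baseline_err: float, model_err: float) -> float:
    """Fractional error reduction of a model relative to a baseline."""
    return (baseline_err - model_err) / baseline_err
```

For example, with hypothetical mean errors of 100 m for a baseline and 55 m for a model, `relative_improvement(100.0, 55.0)` returns 0.45, i.e. a 45% reduction in error.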

Heatmap and distribution of GPS points in our self-collected dataset.
Training and validation loss curves of the EfficientNet models.
k-NN regression performance with different backbones and values of k.
Comparison of k-NN results before and after feature-space shaping.