Image2GPS
This is a course project for CIS 5190 Applied Machine Learning, taught by Prof. Dinesh Jayaraman. [report]
Predicting GPS coordinates from images is useful in GPS-denied scenarios such as indoor navigation or urban environments. In this project, we explore two practical approaches for image-based GPS localization:
- Regression-based method: an EfficientNet-B0 backbone with a linear regression head that directly predicts latitude and longitude from an input image.
- Retrieval-based method: a more interpretable approach that performs k-nearest-neighbor (k-NN) image retrieval using learned visual feature embeddings.
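As a rough illustration of the retrieval-based idea, the sketch below predicts a query image's coordinates from its k most similar reference embeddings. It makes several assumptions not stated in this README: embeddings are plain lists of floats with nonzero norm, similarity is cosine similarity, and the neighbors' coordinates are combined by an unweighted mean. The names `cosine_similarity` and `knn_predict` are hypothetical and are not taken from the project code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two nonzero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, ref_embeddings, ref_coords, k=3):
    """Predict (lat, lon) for a query embedding from its k nearest references.

    Assumption: the prediction is the unweighted mean of the neighbors'
    coordinates. The actual project may combine neighbors differently.
    """
    # Rank reference indices from most to least similar to the query.
    ranked = sorted(
        range(len(ref_embeddings)),
        key=lambda i: cosine_similarity(query, ref_embeddings[i]),
        reverse=True,
    )
    top = ranked[:k]
    lat = sum(ref_coords[i][0] for i in top) / len(top)
    lon = sum(ref_coords[i][1] for i in top) / len(top)
    # Returning the neighbor indices lets a caller display the retrieved images.
    return (lat, lon), top
```

Returning the neighbor indices alongside the coordinates is what makes the prediction inspectable: the retrieved reference images can be shown next to the query.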
Both approaches achieve comparable performance. The retrieval-based method uses a two-stage training strategy that incorporates a contrastive loss for representation learning, explicitly aligning distances in the learned feature space with geographic distance. This design improves interpretability, since each prediction can be explained through the reference images it retrieved.
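One way to picture a contrastive loss tied to geographic distance is the classic pairwise form sketched below. This is an assumption for illustration, not the project's actual loss: a pair of images counts as positive when the two were taken within `pos_radius_m` meters of each other, and negatives are pushed apart up to a `margin`. The function names, the radius, and the margin are all hypothetical, and a real training loop would express this with tensor operations so gradients can flow.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embeddings given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_pair_loss(emb_a, emb_b, geo_dist_m, pos_radius_m=20.0, margin=1.0):
    """Pairwise contrastive loss with positives defined by geographic proximity.

    Assumptions (not from the README): pairs closer than pos_radius_m meters
    are positives; all other pairs are negatives with the given margin.
    """
    d = euclidean(emb_a, emb_b)
    if geo_dist_m <= pos_radius_m:
        # Positive pair: pull embeddings together.
        return d ** 2
    # Negative pair: penalize only if embeddings are closer than the margin.
    return max(0.0, margin - d) ** 2
```

Under this form, images taken near each other are drawn together in feature space, while images taken far apart incur a penalty only when their embeddings fall inside the margin.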
We train and evaluate our methods on a self-collected dataset of over 3,000 images captured within a designated campus area. Our approaches outperform both a ResNet-based baseline and a naive mean-coordinate predictor by up to 45%. The dataset is released on Hugging Face at rwxyzgao/IMG2GPS_ALL, and the code is available at Image2GPS.
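For readers who want to reproduce a comparison against the mean-coordinate predictor, here is a minimal sketch. It assumes the error metric is the mean great-circle (haversine) distance in meters and that the improvement is measured as a relative reduction in that error. The README does not state either, so treat both as assumptions. All function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_coordinate(train_coords):
    """Naive baseline: always predict the mean training latitude and longitude."""
    n = len(train_coords)
    return (sum(c[0] for c in train_coords) / n, sum(c[1] for c in train_coords) / n)


def mean_error_m(preds, targets):
    """Mean haversine error in meters over paired predictions and targets."""
    total = sum(haversine_m(p[0], p[1], t[0], t[1]) for p, t in zip(preds, targets))
    return total / len(targets)


def relative_improvement(baseline_err, model_err):
    """Fractional error reduction relative to a baseline (0.45 means 45%)."""
    return (baseline_err - model_err) / baseline_err
```

To score the baseline, compute `mean_coordinate` on the training split, repeat that single prediction for every test image, and pass the result to `mean_error_m`.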