Google’s overhauled Photos app debuted last year with some great tricks — it can organize your photos by location and by recognizing faces, animals, venues and landmarks. Usually, if you want to determine where a photo was taken, you look at its metadata, which often includes GPS coordinates; that’s how Google Photos does it. Last week, Google published an academic paper showing that it can train a deep-learning model to find a photo’s location without using that metadata.
The model, called PlaNet, was trained on the location metadata of 126 million photos pulled from the Internet, according to Sophos’ Naked Security blog. The team divided the earth into roughly 26,000 cells, each cell’s size depending on how many photos were taken there, so photo-dense areas get small cells and sparse ones get large cells. PlaNet used 91 million of those photos to learn which grid cell each image belongs to; it was then tested on how well it could place the remaining 34 million.
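The paper doesn’t ship code, but the gridding idea is simple enough to sketch. Below is a minimal, hypothetical Python illustration of that adaptive subdivision: a toy quadtree over latitude and longitude that keeps splitting wherever photos are dense, so cities end up with small cells and oceans with huge ones. PlaNet actually uses Google’s S2 geometry cells, and the thresholds and sample data here are invented for illustration.

```python
import random

MAX_PHOTOS = 10_000   # hypothetical split threshold: subdivide denser cells
MIN_PHOTOS = 50       # hypothetical floor: drop cells with too few examples

def build_cells(photos, box=(-90.0, 90.0, -180.0, 180.0)):
    """photos: list of (lat, lon) pairs; returns leaf cells as (box, photos)."""
    lat0, lat1, lon0, lon1 = box
    if len(photos) <= MAX_PHOTOS:
        # Leaf cell: keep it only if it has enough training examples.
        return [(box, photos)] if len(photos) >= MIN_PHOTOS else []
    # Too dense: split the box into four quadrants and recurse.
    mid_lat = (lat0 + lat1) / 2
    mid_lon = (lon0 + lon1) / 2
    children = [
        (lat0, mid_lat, lon0, mid_lon), (lat0, mid_lat, mid_lon, lon1),
        (mid_lat, lat1, lon0, mid_lon), (mid_lat, lat1, mid_lon, lon1),
    ]
    cells = []
    for la0, la1, lo0, lo1 in children:
        subset = [(la, lo) for la, lo in photos
                  if la0 <= la < la1 and lo0 <= lo < lo1]
        cells.extend(build_cells(subset, (la0, la1, lo0, lo1)))
    return cells

# Fake geotags: a dense cluster (a "city") plus scattered worldwide points.
random.seed(0)
photos = [(random.gauss(40.7, 2.0), random.gauss(-74.0, 2.0)) for _ in range(30_000)]
photos += [(random.uniform(-90, 90), random.uniform(-180, 180)) for _ in range(5_000)]

cells = build_cells(photos)
print(f"{len(cells)} cells; occupancy ranges from "
      f"{min(len(p) for _, p in cells)} to {max(len(p) for _, p in cells)} photos")
```

Once the cells are fixed, geolocation reduces to ordinary image classification: each cell becomes one class label, and the network learns to map an image’s pixels to a probability distribution over the roughly 26,000 cells.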
Of course, it’s much easier for the deep-learning model to determine a photo’s location if a significant landmark appears in it (there’s only one Statue of Liberty, after all). But PlaNet was able to tackle harder-to-place locations as well, using cues like the plants, architecture and animals of a particular grid cell.
So how did the machine do at placing photos when it faced off against well-traveled humans? Just as you’d expect: humanity lost. Both PlaNet and 10 well-traveled humans played GeoGuessr, which shows you a street panorama, then asks you to pinpoint where in the world you believe the photo was taken. The humans placed only 11 locations within the correct country, while PlaNet placed 17. The Google researchers attribute PlaNet’s higher accuracy to the fact that a machine can “visit” an essentially unlimited number of places and use that data to recognize a new photo’s location; it’s highly unlikely that any human could visit every place in the world and retain perfect recall of all of them.
The model that does all this occupies just 377 MB of memory, small enough to fit on a smartphone. Now that’s power in the palm of your hand.
Google could use this technology to tie Street View to Photos so categorization becomes more automated; it could also use it to enhance searches.
It’s Google’s PlaNet; we’re just living in it.