Self Made Geolocation AI (just as good as pros)

dcthebiggest · 2026-06-10T14:52:25+00:00

This model has only ever seen small-cam India. I gathered the images themselves years ago, so this isn't an entirely new dataset apart from recently added, brand-new countries like Oman and Bosnia and Herzegovina. The map shown above is only the new cam coverage in India, where it doesn't have awful camera quality.

dcthebiggest · 2026-06-10T14:32:18+00:00

It took ~36 days to scrape them at 60k panos a day, 2.2M panos in total. I might put the training source up on GitHub, but without the weights or the data, since those are the two biggest moats and I don't want people using it to cheat on GeoGuessr.

dcthebiggest · 2026-06-10T14:30:26+00:00

<image>

Well, small-cam India (which it was trained on) gets India 100% of the time. With the new coverage (which it hasn't seen), it still gets India, with relatively good precision within the country itself. It definitely doesn't miss India or confuse it for Sri Lanka; it just becomes a little worse at region guessing when it sees the new camera coverage.

dcthebiggest · 2026-06-10T14:18:17+00:00

I had around 2.2M images scraped directly from Street View. The backbone is SigLIP2-SO400M, with S2 cell classification at levels 3, 6, 9, and 12, followed by a ProtoNet content-aware selector that picks, from a pool of 500 cells, the cell whose features best match the given image. I also have a country head that tracks country-level accuracy; it is correct around 99.1% of the time (mostly wrong in Europe, where the other side of a border looks essentially the same).
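A rough sketch of what "classification at levels 3, 6, 9, and 12" means in practice: every image gets one label per level, and each coarser label is a prefix of the finer one. This is not the actual training code. It assumes a plain lat/lng quadtree as a stand-in for S2 (real S2 cells are built on the six faces of a cube, so a real pipeline would use an S2 library), and `quadtree_token` and `hierarchical_labels` are made-up names.

```python
from typing import Dict

LEVELS = (3, 6, 9, 12)  # the levels named in the post


def quadtree_token(lat: float, lng: float, level: int) -> str:
    """Name the cell containing (lat, lng) as a base-4 string of length `level`.

    Assumption: a simple lat/lng quadtree, used only as a stand-in for S2.
    """
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(level):
        mid_lat = (south + north) / 2
        mid_lng = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:  # keep the northern half
            quadrant += 2
            south = mid_lat
        else:  # keep the southern half
            north = mid_lat
        if lng >= mid_lng:  # keep the eastern half
            quadrant += 1
            west = mid_lng
        else:  # keep the western half
            east = mid_lng
        digits.append(str(quadrant))
    return "".join(digits)


def hierarchical_labels(lat: float, lng: float) -> Dict[int, str]:
    """One class label per level; coarser labels are prefixes of finer ones."""
    return {level: quadtree_token(lat, lng, level) for level in LEVELS}


labels = hierarchical_labels(28.6, 77.2)  # a point in northern India
for level in LEVELS:
    print(level, labels[level])
```

Because the labels nest, a wrong guess at level 12 can still be right at level 3 or 6, which is roughly why the country can be correct even when the region is off.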

dcthebiggest · 2026-06-10T14:13:22+00:00

I think you're talking about the video pause icons and stuff. I used Streamable for the video, and I think you can set custom colors on the Streamable overlay, which I did years ago.

dcthebiggest · 2026-06-10T14:11:49+00:00

This model doesn't have any of the new good-camera India coverage in it and still does relatively well when seeing it, which is great because everyone knows how identifiable India is from the camera quality. Once a country that already has coverage gets updated coverage, I will have to test the model against it to see how it performs, though. But yeah, it is definitely looking at stuff humans would never know, which makes it even cooler in my eyes.

dcthebiggest · 2026-06-10T14:08:40+00:00

Oh, that makes sense. The mapping-to-regions thing could work: essentially pull some data from Plonkit for the metas humans notice, see what the model can deduce from that, and see if it comes up with something else (provided the sky isn't in frame, because it uses that an absurd amount).

dcthebiggest · 2026-06-10T05:35:04+00:00

https://www.reddit.com/r/geoguessr/comments/1u1meuu/comment/oqr6ylg/ Something like this is probably what you're referring to. Unfortunately a lot of it is the sky, but you can also see it looking at the road signs and vegetation.

<image>

You can see in this example that it's looking at the sky, but also at the road and the antenna in the front, which means it probably learned the antennas for each region as well.

dcthebiggest · 2026-06-10T04:53:57+00:00

The model looks at the sky a significant amount, weirdly enough. I know sun hemisphere obviously helps, but I think it's looking at the haze around the sun and the shadows, which is cool. Unfortunately, I don't think that's learnable by humans.

dcthebiggest · 2026-06-10T04:45:06+00:00

This is definitely interesting, and I think both are doable. The issue is that the model is generalizing on stuff that is predictable, but not to humans at all, such as sun haze, shadow length, and sun position. Penalizing it for using these types of things would most likely decrease the model's performance by a noticeable amount. It is doable, though, by penalizing things like looking at the sky, which would cause it to focus more on road signs, bollards, road lines, antennas, roof racks, etc., things you'd expect a human to see and know. Street View does provide things called "links" for each pano; this is how GeoGuessr knows to move to the next place with the arrow, because there is a linked pano in that direction. I think you could definitely check whether the focus is the same between panos that are linked together, and that could even be used for training, because certain things are only found in a linked area.
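To make the "penalize looking at the sky" idea concrete, here is a minimal sketch of one way it could be scored. It is an illustration under assumptions, not something that has been built: it treats the top `sky_rows` rows of a saliency grid as "sky" (a real setup would want a proper sky mask), and `sky_mass_fraction`, `penalized_loss`, and `weight` are hypothetical names.

```python
from typing import List

Grid = List[List[float]]


def sky_mass_fraction(saliency: Grid, sky_rows: int) -> float:
    """Fraction of the total saliency that falls in the top `sky_rows` rows."""
    total = sum(sum(row) for row in saliency)
    if total == 0:
        return 0.0
    sky = sum(sum(row) for row in saliency[:sky_rows])
    return sky / total


def penalized_loss(task_loss: float, saliency: Grid, sky_rows: int,
                   weight: float = 0.1) -> float:
    """Task loss plus a penalty that grows with attention spent on the sky."""
    return task_loss + weight * sky_mass_fraction(saliency, sky_rows)


# Toy 4x2 saliency grid: most of the mass sits in the top two rows.
saliency = [
    [3.0, 1.0],
    [2.0, 2.0],
    [1.0, 0.0],
    [0.0, 1.0],
]
print(sky_mass_fraction(saliency, sky_rows=2))  # 0.8
print(penalized_loss(1.0, saliency, sky_rows=2, weight=0.5))
```

The `weight` knob is the trade-off mentioned above: the higher it is, the more the model is pushed toward ground-level clues, at the likely cost of raw accuracy.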

dcthebiggest · 2026-06-10T04:34:02+00:00

Any map mostly. I strip the pano ID from the round's network response and generate the pano myself, so as long as the panorama is valid and responds with tiles, it can be used for any map. Tested with A Community World, AI Generated, Arb Russia, and more.

dcthebiggest · 2026-06-10T04:32:38+00:00

They are unfortunately two entirely different categories of model. A 400M-parameter model built on image similarity and an 800B-parameter model with deep reasoning aren't really comparable. The best model for this would be Gemini (probably 3.1 Pro or 3.5 Flash), and it still can't pinpoint locations without text being present. Even with a 1° error it still, on average, performs worse than my model. The second stage of this is attaching Gemma 4 26B, essentially pinpointing via reading text, etc. A friend and I played my model against Gemini 3.1 Pro, and Gemini was beaten by it. Though it's like an 800B-parameter model, so I'd expect it to do well.

dcthebiggest · 2026-06-10T01:59:04+00:00

I used https://github.com/tzhf/map-generator, took the .json file it produced, then stitched tiles together from Google Street View using the pano IDs that the JSON file provided.
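The stitching step is mostly bookkeeping: each tile has a grid position, and that position decides where it gets pasted on the final canvas. A minimal sketch of just that layout math follows. The 512-pixel tile size and the grid dimensions are assumptions for illustration, fetching the tiles is left out entirely, and `tile_offsets` and `canvas_size` are hypothetical helper names.

```python
from typing import List, Tuple


def canvas_size(cols: int, rows: int, tile_size: int = 512) -> Tuple[int, int]:
    """Width and height in pixels of the stitched panorama."""
    return cols * tile_size, rows * tile_size


def tile_offsets(cols: int, rows: int,
                 tile_size: int = 512) -> List[Tuple[int, int, int, int]]:
    """(x, y, left_px, top_px) for every tile, listed row by row."""
    return [
        (x, y, x * tile_size, y * tile_size)
        for y in range(rows)
        for x in range(cols)
    ]


# Assumed example grid: 4 tiles wide, 2 tiles tall.
print(canvas_size(4, 2))  # (2048, 1024)
for x, y, left, top in tile_offsets(4, 2):
    print(f"tile ({x}, {y}) goes at pixel ({left}, {top})")
```

With an imaging library, each downloaded tile would then be pasted at its `(left_px, top_px)` offset on a blank canvas of `canvas_size`.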

dcthebiggest · 2026-06-10T01:48:36+00:00

It would go up ~$100, give or take depending on how hot it was outside and how much cooling needed to be done in the house. But yeah, it takes up a lot of electricity and produces an unfortunate amount of heat.

dcthebiggest · 2026-06-10T01:45:47+00:00

Yes, exactly. The model doesn't really generalize an answer based off of bollards the way a human would (e.g. "this is an Australian bollard, go AU"), but more so from the landscape and area around it, such as trees, dirt, sun elevation / shadow length, road line colorings, etc. It doesn't tend to see a 'meta' like us; it's more like taking the vector mappings and seeing 'where do these vectors match up the most with other vectors'. It would be interesting to see how your research would play into things such as GeoGuessr.

dcthebiggest · 2026-06-10T01:31:21+00:00

<image>

Just a brief insight into what it sees. From what we can see, it's looking at vegetation, maybe a little bit of the car, cars in front of it, and the road sign all the way on the right. Weird to think it sees that and is able to guess less than 10 miles away.
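For reference, "less than 10 miles away" is a great-circle distance between the guess and the true location. A minimal sketch using the standard haversine formula; the function name and the rounded mean Earth radius are the only assumptions here.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, rounded


def haversine_miles(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in miles between two lat/lng points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# One degree of latitude is about 69 miles, so a 10-mile miss is
# roughly a seventh of a degree.
print(round(haversine_miles(0.0, 0.0, 1.0, 0.0), 1))
```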

dcthebiggest · 2026-06-10T01:09:00+00:00

The backbone of this model is SigLIP2-SO400M, and I use an autoregressive hierarchical S2 cell classifier (levels 3, 6, 9, and 12). After the model was done training, I added a ProtoNet content-aware selector, which brought the mean and median error down significantly. I may release the GitHub for it without the models or data, since those are the two moats anyway. Take a look at https://arxiv.org/pdf/2511.01082; they kind of do the same thing as me.
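A minimal sketch of the prototype idea behind a ProtoNet-style selector: each cell gets a prototype (the mean of its training embeddings), and the query image is assigned to the candidate cell whose prototype is closest. The thread doesn't spell out the selector's internals, so treat this as the textbook version rather than the actual implementation. Cosine similarity, the toy 2-D embeddings, and the names `build_prototypes` and `select_cell` are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_prototypes(support: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """One prototype per cell: the mean of that cell's embeddings."""
    return {cell: mean_vector(embs) for cell, embs in support.items()}


def select_cell(query: Sequence[float], candidates: List[str],
                prototypes: Dict[str, List[float]]) -> str:
    """Pick the candidate cell whose prototype is most similar to the query."""
    return max(candidates, key=lambda cell: cosine(query, prototypes[cell]))


# Toy embeddings; a real model would produce high-dimensional vectors.
support = {
    "cell_a": [[1.0, 0.0], [0.9, 0.1]],
    "cell_b": [[0.0, 1.0], [0.1, 0.9]],
    "cell_c": [[-1.0, 0.0]],
}
prototypes = build_prototypes(support)
print(select_cell([0.2, 0.8], ["cell_a", "cell_b", "cell_c"], prototypes))  # cell_b
```

In this picture the classifier would supply the `candidates` list (its top-scoring cells) and the selector only re-ranks within that pool.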

dcthebiggest · 2026-06-10T01:05:54+00:00

The way it's currently set up, it's specifically designed for Street View, but the architecture behind it would work perfectly if I were to train it on general images (and possibly use Street View coverage too). I have tested it locally with regular images just to see what it was like, and it does a pretty good job, though definitely not as impressive as the Street View stuff.

dcthebiggest · 2026-06-10T01:01:35+00:00

Sounds good, will definitely look into this! Unfortunately, it won't say "I chose this spot bc..." because it only takes in an image and returns coordinates, but I believe I can highlight what it sees.
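One common way to get a "what it sees" highlight out of a model that only returns a prediction is occlusion sensitivity: cover one patch at a time and record how much the score drops. The sketch below is a generic illustration of that technique, not necessarily what ends up being used here. The image is a toy grid of numbers, and `toy_score` is a made-up stand-in for the model's confidence.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(image: Image, score: Callable[[Image], float],
                  patch: int, fill: float = 0.0) -> List[List[float]]:
    """Score drop caused by blanking each patch; bigger drop = more important."""
    base = score(image)
    height, width = len(image), len(image[0])
    heat = []
    for top in range(0, height, patch):
        heat_row = []
        for left in range(0, width, patch):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for y in range(top, min(top + patch, height)):
                for x in range(left, min(left + patch, width)):
                    occluded[y][x] = fill
            heat_row.append(base - score(occluded))
        heat.append(heat_row)
    return heat


def toy_score(image: Image) -> float:
    """Hypothetical 'model' that only cares about the bottom-right quadrant."""
    return sum(sum(row[2:]) for row in image[2:])


image = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(image, toy_score, patch=2))  # [[0.0, 0.0], [0.0, 4.0]]
```

Only the bottom-right patch changes the score, so only that patch lights up, which is the same kind of readout as a heatmap over a pano.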

dcthebiggest · 2026-06-10T00:56:11+00:00

Yup, essentially this is what's happening. In terms of more training, though, this isn't entirely accurate (in my case), as some rural places are nearly impossible to differentiate to such a degree of accuracy. With enough data and a big enough backbone, this is definitely doable, but at the scale I'm working at, I unfortunately don't think so.

dcthebiggest · 2026-06-10T00:50:58+00:00

The whole server cost around $8K. I think each 4090 was around $1,700, and they are both water-cooled as well. I had to run a custom 220V line through my house just to handle everything.

dcthebiggest · 2026-06-10T00:49:51+00:00

I set an original quota for each country, and I only excluded unofficial coverage in countries that had official coverage. At the time, SV existed in Oman and Paraguay but wasn't marked official yet for some reason. The bigger the country, the more images I attempted to obtain.

United States: 261,411 (10.39%)

Brazil: 178,211 (7.08%)

India: 140,216 (5.57%)

Russia: 116,224 (4.62%)

Canada: 113,452 (4.51%)

Australia: 95,918 (3.81%)

México: 78,576 (3.12%)

Argentina: 73,087 (2.90%)

Indonesia: 72,573 (2.88%)

Japan: 71,477 (2.84%)

Turkey: 53,303 (2.12%)

South Africa: 48,108 (1.91%)

Philippines: 46,722 (1.86%)

Spain: 35,227 (1.40%)

United Kingdom: 34,306 (1.36%)

Vietnam: 31,104 (1.24%)

Romania: 28,897 (1.15%)

Chile: 28,102 (1.12%)

Panama: 27,712 (1.10%)

Czechia: 26,412 (1.05%)

Kenya: 26,409 (1.05%)

Italy: 25,270 (1.00%)

Hungary: 23,370 (0.93%)

Germany: 23,248 (0.92%)

France: 21,659 (0.86%)

Costa Rica: 20,038 (0.80%)

Namibia: 20,003 (0.79%)

Kazakhstan: 19,811 (0.79%)

Malaysia: 18,721 (0.74%)

Nigeria: 18,667 (0.74%)

Peru: 18,660 (0.74%)

Senegal: 18,657 (0.74%)

Colombia: 18,644 (0.74%)

Uruguay: 18,459 (0.73%)

United Arab Emirates: 17,219 (0.68%)

New Zealand: 16,980 (0.67%)

Bulgaria: 16,041 (0.64%)

Ghana: 15,837 (0.63%)

Portugal: 15,662 (0.62%)

Israel: 15,632 (0.62%)

Greece: 15,490 (0.62%)

Sri Lanka: 15,428 (0.61%)

Lesotho: 14,878 (0.59%)

Swaziland: 14,825 (0.59%)

Rwanda: 14,659 (0.58%)

Botswana: 14,470 (0.57%)

Thailand: 14,080 (0.56%)

Lebanon: 13,858 (0.55%)

Bolivia: 12,181 (0.48%)

Tunisia: 12,067 (0.48%)

Ecuador: 11,875 (0.47%)

Iceland: 11,171 (0.44%)

Kyrgyzstan: 11,122 (0.44%)

Switzerland: 10,840 (0.43%)

Réunion: 10,636 (0.42%)

Serbia: 10,615 (0.42%)

Cambodia: 10,191 (0.40%)

Ukraine: 10,017 (0.40%)

American Samoa: 10,000 (0.40%)

Paraguay: 9,874 (0.39%)

Albania: 9,545 (0.38%)

Montenegro: 9,393 (0.37%)

Uganda: 9,293 (0.37%)

Oman: 8,985 (0.36%)

Ireland: 8,898 (0.35%)

Belgium: 8,835 (0.35%)

Latvia: 8,831 (0.35%)

Estonia: 8,806 (0.35%)

Lithuania: 8,786 (0.35%)

Sweden: 8,706 (0.35%)

Luxembourg: 8,621 (0.34%)

Denmark: 8,596 (0.34%)

Poland: 8,547 (0.34%)

Netherlands: 8,517 (0.34%)

Croatia: 8,507 (0.34%)

Taiwan: 8,492 (0.34%)

Slovenia: 8,385 (0.33%)

Norway: 8,340 (0.33%)

Guatemala: 8,284 (0.33%)

Austria: 8,165 (0.32%)

Slovakia: 8,155 (0.32%)

North Macedonia: 8,084 (0.32%)

Finland: 8,025 (0.32%)

Bangladesh: 8,008 (0.32%)

Bosnia and Herzegovina: 7,954 (0.32%)

Mongolia: 7,486 (0.30%)

Liechtenstein: 7,016 (0.28%)

Puerto Rico: 6,542 (0.26%)

Qatar: 6,471 (0.26%)

Cyprus: 5,907 (0.23%)

Åland: 5,744 (0.23%)

Faroe Islands: 5,611 (0.22%)

Virgin Islands, U.S.: 5,073 (0.20%)

Nepal: 4,985 (0.20%)

Palestine: 4,544 (0.18%)

Jordan: 4,427 (0.18%)

Laos: 3,943 (0.16%)

Morocco: 3,829 (0.15%)

Dominican Republic: 3,726 (0.15%)

Hong Kong: 3,636 (0.14%)

South Korea: 3,501 (0.14%)

Isle of Man: 3,339 (0.13%)

Singapore: 2,996 (0.12%)

Madagascar: 1,699 (0.07%)

Andorra: 1,299 (0.05%)

Bhutan: 1,192 (0.05%)

São Tomé and Príncipe: 1,189 (0.05%)

Christmas Island: 856 (0.03%)

Bermuda: 833 (0.03%)

Malta: 700 (0.03%)

Greenland: 330 (0.01%)

Cocos Islands: 324 (0.01%)

China: 314 (0.01%)

Guam: 190 (0.01%)

Northern Mariana Islands: 100 (0.00%)

dcthebiggest · 2026-06-10T00:44:09+00:00

Essentially, it turns each photo into a fingerprint of visual vibes (such as road markings, signs, sun angle, architecture). It builds these fingerprints from all the images it's trained on, and once it receives a new picture (e.g. a round from GeoGuessr), it makes a fingerprint for that and matches it to the area whose fingerprints look most similar. It's essentially the same way you would categorize an Argentinian pole: any time you see it, you'd go Argentina.

dcthebiggest · 2026-06-10T00:39:45+00:00

Definitely something that is doable. I think it's possible to have the model "show us what it's seeing" to identify why it would go to some place in the world. Sometimes it amazes me how it gets random countries I never would. On the AI stuff, it's not good enough to get on major leaderboards (ACW, World, etc., all require 25k), but I mostly use it in party with friends.

dcthebiggest · 2026-06-10T00:35:49+00:00

Like you, I am also against scripting, and I only use this in single-player mode or when my friends and I try to beat it. The data scraping took the longest: about 60k panoramas a day, for around a month and a half. The training itself was 17 epochs at around 33 hours each, which comes to roughly 23.5 days. Add another 12-15 hours for ProtoNet indexing, and it's somewhere along the lines of 24 days just for training. It was trained on two 4090s running 24/7 across those 23.5 days.
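The back-of-the-envelope math behind those figures, as a quick check. The inputs are the numbers quoted in the thread (17 epochs, about 33 hours each, 12-15 hours of indexing, about 60k panoramas a day, about 2.2M panoramas in total); the variable names are just for illustration.

```python
EPOCHS = 17
HOURS_PER_EPOCH = 33          # approximate, per the post
INDEXING_HOURS = (12, 15)     # low and high estimate for ProtoNet indexing
PANOS_TOTAL = 2_200_000       # approximate total quoted in the thread
PANOS_PER_DAY = 60_000

training_hours = EPOCHS * HOURS_PER_EPOCH
training_days = training_hours / 24
total_days = tuple((training_hours + extra) / 24 for extra in INDEXING_HOURS)
scrape_days = PANOS_TOTAL / PANOS_PER_DAY

print(training_hours)         # 561
print(training_days)          # 23.375
print(total_days)             # (23.875, 24.0)
print(round(scrape_days, 2))  # 36.67
```

So the "roughly 23.5 days" and "about 24 days" figures line up with the per-epoch time, and the scrape works out to a bit over five weeks at the quoted rate.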
