I Built a GeoGuessr Assistant that reaches 74.8% accuracy

#ai #python #computervision #react

Hello,

Github Repo: https://github.com/yacine204/geoGuessr_Assistant

So I recently finished my final-year project in computer science (3rd year). The theme was a GeoGuessr Assistant, which basically gives you candidate coordinates hinting at where you are in the game.

The Idea

Instead of going full AI, I went hybrid and focused on the strongest human-like clues: road signs and any type of text.

How It Works

The pipeline consists of 5 stages:

1/Road Sign Convention Detection:
I fine-tuned a YOLOv8m model to detect these classes:
- MUTCD
- VIENNA
- AMBIGUOUS
2/Text Extraction

I used EasyOCR for this step, but I'm still thinking of fine-tuning it so it can detect any Asian language without having to manually configure the default language.

import easyocr

# Default reader: English only, running on CPU
_easyocr_reader = easyocr.Reader(['en'], gpu=False)
# If I get Slavic (Cyrillic) text, for example, I have to rebuild the reader like this
_easyocr_reader = easyocr.Reader(['ru'], gpu=False)
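
Until the reader is fine-tuned, one possible workaround (not what the repo does, just a sketch) is to run a few readers and keep the language whose detections are the most confident. The helper name `pick_best_language` and the sample detections below are made up for illustration; the only thing taken from EasyOCR is the shape of `readtext()` results, a list of `(bbox, text, confidence)` tuples.

```python
from statistics import mean


def pick_best_language(results_by_lang):
    """Return (language, mean_confidence) for the best-scoring reader.

    results_by_lang maps a language code to a list of
    (bbox, text, confidence) tuples, the shape that
    easyocr.Reader.readtext() returns.
    """
    best_lang, best_score = None, 0.0
    for lang, detections in results_by_lang.items():
        if not detections:
            continue  # this reader found no text at all
        score = mean(conf for _bbox, _text, conf in detections)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang, best_score


# Hypothetical detections, standing in for reader.readtext(image) per language
fake_results = {
    "en": [(None, "MOCKBA", 0.31)],
    "ru": [(None, "МОСКВА", 0.93), (None, "СТОП", 0.88)],
}
print(pick_best_language(fake_results))
```

The obvious cost is one OCR pass per candidate language, so this only makes sense for a short list of scripts.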

3/Country Filtering
This stage uses the results of stages 1 and 2, and it has its own sub-steps:
- Country Probability Init:
  - Each country's probability is initialized from its distribution found in this repo: https://github.com/Crrrrrrr/geoguessr-statistics
- Country Elimination:
  - Before eliminating anything we have to do some math, which just tells us how sure we are about taking that decision:

bias = (total_vienna_confidence - total_mutcd_confidence) / (total_vienna_confidence + total_mutcd_confidence)

if bias > 0 -> we lean towards vienna
elif bias < 0 -> we lean towards mutcd
else -> mixed/ambiguous, we do nothing
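
As a rough sketch, the bias above could be computed from the stage 1 detections like this. The `(class_name, confidence)` input shape and the function name `convention_bias` are assumptions for the example, and so is ignoring AMBIGUOUS detections. The zero-denominator guard is also my addition, since the formula is undefined when no VIENNA or MUTCD sign is detected.

```python
def convention_bias(detections):
    """Compute the Vienna-vs-MUTCD bias in the range [-1, 1].

    detections is an iterable of (class_name, confidence) pairs
    (an assumed shape, e.g. flattened from the YOLO results).
    """
    detections = list(detections)  # allow generators, we iterate twice
    total_vienna = sum(conf for name, conf in detections if name == "VIENNA")
    total_mutcd = sum(conf for name, conf in detections if name == "MUTCD")
    denominator = total_vienna + total_mutcd
    if denominator == 0:
        return 0.0  # no usable sign at all: treat as mixed/ambiguous
    return (total_vienna - total_mutcd) / denominator


# Made-up detections for one image
sample = [("VIENNA", 0.9), ("VIENNA", 0.6), ("MUTCD", 0.5), ("AMBIGUOUS", 0.7)]
print(convention_bias(sample))
```

A positive result leans Vienna and a negative one leans MUTCD, as described above.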

To be safe, we set a threshold that depends on bias_confidence:
    if bias_confidence>0.7 -> threshold = 0.2
    elif bias_confidence>0.4 -> threshold = 0.3
    else -> threshold = 0.5

finally:
    if bias>threshold -> vienna
    elif bias<-threshold -> mutcd
    else -> hybrid
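
Put together, the rules above could look roughly like the sketch below. `bias_confidence` is taken as an input because the post does not say how it is computed. The elimination step is my guess at what "elimination" means (drop countries that use the other convention, then renormalize), and the country table and probabilities are invented for the example.

```python
def pick_threshold(bias_confidence):
    """The more confident we are, the smaller the bias we accept."""
    if bias_confidence > 0.7:
        return 0.2
    if bias_confidence > 0.4:
        return 0.3
    return 0.5


def classify_convention(bias, bias_confidence):
    """Return 'vienna', 'mutcd' or 'hybrid' following the rules above."""
    threshold = pick_threshold(bias_confidence)
    if bias > threshold:
        return "vienna"
    if bias < -threshold:
        return "mutcd"
    return "hybrid"


def eliminate_countries(priors, country_convention, decision):
    """Assumed elimination rule: keep matching countries, renormalize."""
    if decision == "hybrid":
        return dict(priors)  # not sure enough, eliminate nothing
    kept = {c: p for c, p in priors.items()
            if country_convention.get(c) == decision}
    total = sum(kept.values())
    if total == 0:
        return dict(priors)  # nothing matched, fall back to the priors
    return {c: p / total for c, p in kept.items()}


# Hypothetical numbers, NOT taken from the geoguessr-statistics repo
priors = {"US": 0.5, "FR": 0.3, "DE": 0.2}
conventions = {"US": "mutcd", "FR": "vienna", "DE": "vienna"}

decision = classify_convention(bias=0.25, bias_confidence=0.8)
print(decision, eliminate_countries(priors, conventions, decision))
```

Note how the same bias of 0.25 becomes "hybrid" as soon as `bias_confidence` drops to 0.5, because the threshold rises to 0.3.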

4/Nominatim
This is just to prepare nodes for the last stage: basically we pass in the extracted text and get back the types of the matching entities and their POIs.
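
For reference, a minimal sketch of what that lookup could look like against the public Nominatim search endpoint. The helper names and the sample hit are hypothetical, and the repo may call Nominatim differently. The sketch only builds the URL and trims a response, so it runs without network access.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_nominatim_url(text, limit=5):
    """Build a free-text search URL for a piece of OCR output."""
    params = {"q": text, "format": "jsonv2", "limit": limit}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def summarize_hits(hits):
    """Keep only the fields the next stage needs from the JSON response."""
    return [
        {
            "name": hit.get("display_name"),
            "type": hit.get("type"),
            # Nominatim returns coordinates as strings
            "lat": float(hit["lat"]),
            "lon": float(hit["lon"]),
        }
        for hit in hits
    ]


print(build_nominatim_url("Boulangerie Dupont"))

# Made-up hit in the shape of a Nominatim result
fake_hits = [{"display_name": "Boulangerie Dupont, Lyon, France",
              "type": "bakery", "lat": "45.7640", "lon": "4.8357"}]
print(summarize_hits(fake_hits))
```

When sending the real request, the public instance expects an identifying User-Agent header and a low request rate.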
5/Overpass API
This is the final step, which gets us the coordinates. We cluster them within 500 km to reduce error, and that distance can be changed if we don't trust the OCR that much.
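
The clustering idea can be sketched with a haversine distance and a simple greedy pass. This is one possible way to do it, not necessarily the algorithm used in the repo. The function names are hypothetical and the sample points are just well-known city coordinates.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def cluster_points(points, radius_km=500.0):
    """Greedy clustering: join the first cluster whose seed is within radius."""
    clusters = []
    for point in points:
        for cluster in clusters:
            if haversine_km(point, cluster[0]) <= radius_km:
                cluster.append(point)
                break
        else:
            clusters.append([point])  # no cluster close enough, start a new one
    return clusters


def centroid(cluster):
    """Plain average, fine for small clusters away from the antimeridian."""
    lats = [lat for lat, _lon in cluster]
    lons = [lon for _lat, lon in cluster]
    return sum(lats) / len(lats), sum(lons) / len(lons)


# Paris, Brussels, Tokyo
points = [(48.8566, 2.3522), (50.8503, 4.3517), (35.6762, 139.6503)]
clusters = cluster_points(points)
best = max(clusters, key=len)  # assume the biggest cluster is the best guess
print(len(clusters), centroid(best))
```

Raising `radius_km` merges more candidates into one guess, and lowering it keeps them apart.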

I would love to keep working on the project by adding more fine-tuned weights that detect vegetation, building architecture, cars, etc.

PS: the GitHub repo README is better documented than this xD

If you read all this, thank you for your time and attention. I would love your feedback!