<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nastaran Moghadasi</title>
    <description>The latest articles on DEV Community by Nastaran Moghadasi (@nastaranai).</description>
    <link>https://dev.to/nastaranai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2224641%2F4060d080-8747-4ec0-8065-4f8688543e61.png</url>
      <title>DEV Community: Nastaran Moghadasi</title>
      <link>https://dev.to/nastaranai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nastaranai"/>
    <language>en</language>
    <item>
      <title>Why Google Maps Won't Let Gemini Take the Wheel</title>
      <dc:creator>Nastaran Moghadasi</dc:creator>
      <pubDate>Wed, 24 Dec 2025 21:03:39 +0000</pubDate>
      <link>https://dev.to/nastaranai/why-google-maps-wont-let-gemini-take-the-wheel-3mfd</link>
      <guid>https://dev.to/nastaranai/why-google-maps-wont-let-gemini-take-the-wheel-3mfd</guid>
      <description>&lt;p&gt;A few weeks ago, I watched someone try to hammer a screw into a wall! They succeeded, eventually. But the result was ugly, unstable, and took three times longer than it should have. This is roughly what happens when organizations deploy generative AI for problems that don’t need it.&lt;/p&gt;

&lt;p&gt;The current discourse around AI has a selection bias problem. We hear constantly about what generative models can do, but we rarely hear about when they shouldn’t be used. This matters because choosing the wrong model architecture isn’t just inefficient. It can be dangerous.&lt;/p&gt;

&lt;p&gt;Let me explain through a product you probably use every day.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Navigation Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Consider GPS navigation apps like Google Maps or Waze. These systems solve several distinct problems: classifying traffic patterns, recognizing road signs from imagery, predicting travel times, and computing optimal routes.&lt;/p&gt;

&lt;p&gt;Here is a question worth asking. Which of these tasks actually benefits from a generative model?&lt;/p&gt;

&lt;p&gt;To answer this, we need to be precise about what generative and discriminative actually mean.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;&lt;a href="https://papers.nips.cc/paper/2001/hash/7b7a53e239400a13bd6be6c91c4f6c4e-Abstract.html" rel="noopener noreferrer"&gt;discriminative model&lt;/a&gt;&lt;/strong&gt; learns the boundary between categories. Given input &lt;code&gt;x&lt;/code&gt;, it models &lt;code&gt;P(y∣x)&lt;/code&gt; directly, the probability of output &lt;code&gt;y&lt;/code&gt; conditioned on input. It asks: &lt;em&gt;given this road segment and current traffic, is the delay 5 minutes or 20 minutes?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;&lt;a href="https://papers.nips.cc/paper/2001/hash/7b7a53e239400a13bd6be6c91c4f6c4e-Abstract.html" rel="noopener noreferrer"&gt;generative model&lt;/a&gt;&lt;/strong&gt; learns the joint distribution &lt;code&gt;P(x,y)&lt;/code&gt; or, in the case of modern large language models, models the probability of the next token given the context. It can generate new samples that resemble the training distribution. It asks: &lt;em&gt;what would a plausible route description look like?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;P(x_{t+1} ∣ x_1, ..., x_t)&lt;/code&gt;&lt;/p&gt;


&lt;p&gt;The distinction matters. These models have fundamentally different failure modes.&lt;/p&gt;
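&lt;p&gt;A toy sketch makes the contrast concrete. Everything below is invented for illustration (hand-set weights, a four-word bigram table), not how any production system works:&lt;/p&gt;

```python
import math
import random

# --- Discriminative: model P(y | x) directly ---
# A hand-set logistic scorer for P(heavy delay | features).
# The weights are illustrative, not learned from real traffic data.
def p_heavy_delay(speed_kmh, incident_reported):
    z = -0.08 * speed_kmh + 2.0 * incident_reported + 2.5
    return 1 / (1 + math.exp(-z))

# --- Generative: model what plausible data looks like ---
# A toy bigram sampler over route descriptions: it produces
# plausible-sounding text, with no guarantee of correctness.
BIGRAMS = {
    "turn": ["left", "right"],
    "left": ["onto"],
    "right": ["onto"],
    "onto": ["Main", "Park"],
}

def sample_description(start="turn", steps=3, seed=0):
    rng = random.Random(seed)
    words = [start]
    for _ in range(steps):
        options = BIGRAMS.get(words[-1])
        if not options:
            break
        words.append(rng.choice(options))
    return " ".join(words)
```

&lt;p&gt;&lt;code&gt;p_heavy_delay(20, 1)&lt;/code&gt; comes out near 1 and &lt;code&gt;p_heavy_delay(100, 0)&lt;/code&gt; near 0: the classifier only draws a boundary. The sampler, by contrast, will happily describe a turn onto a street that may not connect.&lt;/p&gt;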

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Discriminative Models Win for Pathfinding&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When you ask Google Maps for directions to the airport, the blue line on your screen is computed by algorithms that have nothing to do with generative AI. The underlying computation is typically some variant of &lt;a href="https://ieeexplore.ieee.org/document/4082128" rel="noopener noreferrer"&gt;A* search&lt;/a&gt; or &lt;a href="https://eudml.org/doc/131436" rel="noopener noreferrer"&gt;Dijkstra’s algorithm&lt;/a&gt;, potentially augmented by &lt;a href="https://arxiv.org/abs/2108.11482" rel="noopener noreferrer"&gt;graph neural networks for traffic prediction&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is precisely the kind of problem where we &lt;em&gt;don’t&lt;/em&gt; want creativity. We want the global optimum.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the correct architectural choice, and here is why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ground truth constraints are non-negotiable.&lt;/strong&gt; When you are driving at 100 km/h, you need the system to discriminate between a drivable road and a pedestrian path. The model must be bound by the physical reality of the road network. A generative model might produce a “plausible-looking” route that happens to include a road segment that was closed for construction because it fits the statistical pattern of routes it has seen, not because it is actually traversable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem has a well-defined optimum.&lt;/strong&gt; Route planning is a graph optimization problem. Given edge weights (travel times, distances, fuel costs), we want the path that minimizes total cost. This is precisely the kind of problem where we don’t want creativity. We want the global optimum. We have principled algorithms that can find it efficiently.&lt;/p&gt;
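&lt;p&gt;The optimization core fits in a few lines. Here is a minimal Dijkstra over a toy road graph; the node names and edge weights are invented, and production routers layer heavy precomputation on top of this idea:&lt;/p&gt;

```python
import heapq

def dijkstra(graph, start, goal):
    """Minimum-cost path in a weighted graph.

    graph: dict mapping node -> list of (neighbor, cost) edges.
    Returns (total_cost, path), or (inf, []) if goal is unreachable.
    """
    pq = [(0, start, [start])]  # (cost so far, node, path)
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph.get(node, []):
            if nbr not in seen:
                heapq.heappush(pq, (cost + w, nbr, path + [nbr]))
    return float("inf"), []

# Toy road network; edge weights are travel times in minutes.
roads = {
    "home":    [("highway", 5), ("park_rd", 2)],
    "highway": [("airport", 15)],
    "park_rd": [("airport", 30)],  # scenic but slow
}

cost, path = dijkstra(roads, "home", "airport")
# The optimizer returns the global optimum, not a "plausible" route.
```

&lt;p&gt;The road graph enters as a hard constraint: a segment that is not an edge simply cannot appear in the output, which is exactly the guarantee a generative model cannot give.&lt;/p&gt;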

&lt;p&gt;&lt;strong&gt;Reliability dominates over delight.&lt;/strong&gt; Generative AI excels when surprising outputs are valuable. In creative writing, an unexpected metaphor might be brilliant. In navigation, an unexpected route means you are lost in an unfamiliar neighborhood or late for your flight.&lt;/p&gt;

&lt;p&gt;Imagine a generative navigation system. You ask for directions to the airport, and it suggests a route through what it imagines would be a scenic park road. It does this because in its training data, parks and pleasant drives co-occur frequently. The fact that this particular park has no through-road is lost in the statistical averaging.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model has optimized for &lt;em&gt;plausibility&lt;/em&gt;, not &lt;em&gt;correctness&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Wait, Didn’t Google Just Add Gemini to Maps?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Yes. And this is exactly where the nuance lives.&lt;/p&gt;

&lt;p&gt;In late 2024, &lt;a href="https://blog.google/products/maps/gemini-navigation-features-landmark-lens/" rel="noopener noreferrer"&gt;Google integrated Gemini into Google Maps&lt;/a&gt;. The key question is: what is Gemini actually doing?&lt;/p&gt;

&lt;p&gt;The answer reveals how thoughtful architecture works in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What Gemini Does&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Gemini powers the natural language interface and contextual descriptions. When the app tells you “turn right after the blue Thai restaurant,” that instruction was generated by a &lt;a href="https://www.ibm.com/think/topics/multimodal-llm" rel="noopener noreferrer"&gt;multimodal model&lt;/a&gt;. It analyzed Street View imagery and synthesized a human-friendly landmark description.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkuw0uf2ckgxdimxd7pjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkuw0uf2ckgxdimxd7pjg.png" alt="An Abstraction, showing generative and discriminative models converge"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is genuinely generative work. It involves creating natural language from raw data (GPS coordinates, business names, visual features). The model is &lt;em&gt;generating&lt;/em&gt; descriptions that don’t exist in any database. It is reasoning about what would be most salient to a human driver scanning the street.&lt;/p&gt;

&lt;p&gt;Gemini also powers semantic search. When you type “cozy cafe with parking near me,” you are querying with natural language that needs to be interpreted, not just matched against keywords. This is where the world knowledge of a generative model becomes valuable.&lt;/p&gt;

&lt;p&gt;Crucially, this generation is tethered to reality through a process called &lt;strong&gt;grounding&lt;/strong&gt;. As the documentation for &lt;a href="https://ai.google.dev/gemini-api/docs/maps-grounding" rel="noopener noreferrer"&gt;Grounding with Google Maps&lt;/a&gt; explains, when a user’s query contains geographical context, the Gemini model invokes the Maps API as a source of truth. The model then generates responses grounded in actual Google Maps data for that location, rather than relying solely on its training weights.&lt;/p&gt;
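&lt;p&gt;The control flow of grounding can be caricatured in a few lines. Every function below is hypothetical (there is no public &lt;code&gt;maps_lookup&lt;/code&gt; API, and the real Gemini/Maps integration is internal to Google); the point is only that facts come from a source of truth and the generative model merely phrases them:&lt;/p&gt;

```python
# Hypothetical sketch of the grounding pattern. None of these
# functions are real Google APIs; the control flow is the point:
# facts come from a source of truth, the model only phrases them.

def maps_lookup(place_query):
    # Stand-in for a Maps API call returning verified place data.
    fake_places = {
        "cozy cafe": {"name": "Cafe Luna", "parking": True},
    }
    return fake_places.get(place_query)

def grounded_answer(user_query):
    # A real system would use the LLM to extract intent; here we
    # caricature that step with a keyword check.
    place = "cozy cafe" if "cafe" in user_query else None
    facts = maps_lookup(place) if place else None
    if facts is None:
        return "Sorry, no verified match found."
    # A real system would hand the facts to the LLM as context;
    # we template the response to keep the sketch runnable.
    parking = "with parking" if facts["parking"] else "without parking"
    return f"{facts['name']} is nearby, {parking}."
```

&lt;p&gt;Note the division of labor: the lookup decides &lt;em&gt;what is true&lt;/em&gt;; the language layer only decides &lt;em&gt;how to say it&lt;/em&gt;.&lt;/p&gt;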

&lt;h3&gt;
  
  
  &lt;strong&gt;What Gemini Doesn’t Do&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Gemini does not compute your route.&lt;/p&gt;

&lt;p&gt;The actual pathfinding (the blue line, the ETA, the turn-by-turn sequence) is still handled by optimization algorithms and &lt;a href="https://arxiv.org/abs/2108.11482" rel="noopener noreferrer"&gt;graph neural networks&lt;/a&gt;. These systems take the road network as a constraint rather than a suggestion. They find the minimum-cost path through a graph, with edge weights updated by discriminative models that predict traffic delays.&lt;/p&gt;

&lt;p&gt;The architecture looks something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw2kdbme0x64sjwyn7vr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw2kdbme0x64sjwyn7vr.png" alt="Decision Matrix showing which area in the Google Map handles with Generative, and which part handles using Discriminative models"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a hybrid architecture. It is hybrid for a reason.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Safety Principle&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here is the key insight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Google uses Gemini for the &lt;em&gt;description&lt;/em&gt; but not for the &lt;em&gt;direction&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If Gemini were allowed to be creative with actual routing decisions, it might hallucinate that a pedestrian bridge is a valid vehicle crossing. Visually, bridges that cars use and bridges that only pedestrians use look similar. The model has learned associations, not physical constraints.&lt;/p&gt;

&lt;p&gt;By constraining generative models to the interface layer (natural language input/output, landmark descriptions, semantic search) while keeping pathfinding in the realm of constrained optimization, the system gets the benefits of both paradigms.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The generative model makes the experience feel natural and human.&lt;/li&gt;
&lt;li&gt;The discriminative and optimization models keep you on actual roads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn’t a limitation of generative AI. It is appropriate scoping.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Broader Principle&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Navigation is just one example. The same logic applies across domains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medical diagnosis:&lt;/strong&gt; You probably want a discriminative model that estimates the probability of each condition given the patient’s symptoms, rather than a generative model that produces plausible-sounding diagnoses. The latter might generate confident text about a condition the patient doesn’t have.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fraud detection:&lt;/strong&gt; The goal is to discriminate between legitimate and fraudulent transactions. A generative model might be useful for creating synthetic training data, but the production classifier should be discriminative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structural engineering:&lt;/strong&gt; When computing whether a bridge can support a given load, you want physics simulations and finite element analysis. You do not want a model that generates realistic-looking stress distributions.&lt;/p&gt;

&lt;p&gt;The pattern is clear. When the problem has a ground truth that must be respected, when there is a well-defined optimum, and when creative outputs would be failures rather than features, discriminative models and traditional optimization often outperform generative approaches.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;When Generative AI &lt;em&gt;Is&lt;/em&gt; the Right Choice&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To be clear, generative AI is genuinely transformative for many problems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Open-ended content creation:&lt;/strong&gt; writing, art, music, code generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural language interfaces:&lt;/strong&gt; making complex systems accessible through conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic understanding:&lt;/strong&gt; interpreting intent, summarizing documents, answering questions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthetic data generation:&lt;/strong&gt; creating training examples for rare cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploration and ideation:&lt;/strong&gt; when you want novelty and surprise.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question isn’t whether generative AI is powerful. It demonstrably is. The question is whether it is appropriate for the specific problem you are solving.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Meta-Point&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We are in a moment where generative AI is being treated as a universal solution. The commercial pressure to add AI to every product is immense. But good engineering has always been about choosing the right tool for the job.&lt;/p&gt;

&lt;p&gt;The most sophisticated AI systems today are hybrids. They use generative models where creativity and natural language matter, discriminative models where classification accuracy matters, and traditional algorithms where mathematical guarantees matter. The skill is in knowing which is which.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The next time someone proposes adding a large language model to a system, it is worth asking: what problem are we actually solving, and is a generative model the right tool for it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sometimes the answer is yes. And sometimes, you just need a good classifier and a well-tuned &lt;a href="https://eudml.org/doc/131436" rel="noopener noreferrer"&gt;Dijkstra implementation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;References&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Derrow-Pinion, A., She, J., Wong, D., Lange, O., Hester, T., Perez, L., Nunkesser, M., Lee, S., Guo, X., Wiltshire, B., Battaglia, P. W., Gupta, V., Li, A., Xu, Z., Sanchez-Gonzalez, A., and Li, Y. 2021. ETA Prediction with Graph Neural Networks in Google Maps. arXiv preprint arXiv:2108.11482. &lt;a href="https://arxiv.org/abs/2108.11482" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2108.11482&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dijkstra, E. W. 1959. A Note on Two Problems in Connexion with Graphs. Numerische Mathematik, 1(1), 269–271. &lt;a href="https://eudml.org/doc/131436" rel="noopener noreferrer"&gt;https://eudml.org/doc/131436&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google. 2024. Grounding with Google Maps. Google AI for Developers. &lt;a href="https://ai.google.dev/gemini-api/docs/maps-grounding" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/maps-grounding&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google. 2024. New ways to get around with Google Maps, powered by AI. The Keyword (Google Blog). &lt;a href="https://blog.google/products/maps/gemini-navigation-features-landmark-lens/" rel="noopener noreferrer"&gt;https://blog.google/products/maps/gemini-navigation-features-landmark-lens/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hart, P. E., Nilsson, N. J., and Raphael, B. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100–107. &lt;a href="https://ieeexplore.ieee.org/document/4082128" rel="noopener noreferrer"&gt;https://ieeexplore.ieee.org/document/4082128&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ng, A. Y. and Jordan, M. I. 2002. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems 14. &lt;a href="https://papers.nips.cc/paper/2001/hash/7b7a53e239400a13bd6be6c91c4f6c4e-Abstract.html" rel="noopener noreferrer"&gt;https://papers.nips.cc/paper/2001/hash/7b7a53e239400a13bd6be6c91c4f6c4e-Abstract.html&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>gemini</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>How AI Knows a Cat Is Like a Dog: An Intuitive Guide to Word Embeddings</title>
      <dc:creator>Nastaran Moghadasi</dc:creator>
      <pubDate>Mon, 22 Dec 2025 08:53:51 +0000</pubDate>
      <link>https://dev.to/nastaranai/how-ai-knows-a-cat-is-like-a-dog-an-intuitive-guide-to-word-embeddings-4ong</link>
      <guid>https://dev.to/nastaranai/how-ai-knows-a-cat-is-like-a-dog-an-intuitive-guide-to-word-embeddings-4ong</guid>
      <description>&lt;p&gt;Have you ever wondered how a computer knows that a cat is more like a dog than a car?&lt;/p&gt;

&lt;p&gt;To a machine, words are just strings of characters or arbitrary ID numbers. But in the world of Natural Language Processing, we’ve found a way to give words a home in a multi-dimensional space. In this space, the neighbors are their semantic relatives.&lt;/p&gt;

&lt;p&gt;In this post, we’ll explore the fascinating world of &lt;strong&gt;word embeddings&lt;/strong&gt;. We’ll start with the intuition (no deep technical dives) and build up a clear understanding of what word embeddings really are (along with code), and how they enable AI systems to capture meaning and relationships in human language.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbndf2st7avw2rb3mhba.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbndf2st7avw2rb3mhba.png" alt="Word embeddings in NLP, comparing cat, dog, and car in space." width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Magic of Word Math: Static Embeddings
&lt;/h2&gt;

&lt;p&gt;Imagine if you could do math with ideas. The &lt;a href="https://spotintelligence.com/2023/11/27/glove-embedding/" rel="noopener noreferrer"&gt;classic example&lt;/a&gt; in the world of embeddings is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;King - Man + Woman ≈ Queen&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn’t just a clever trick. This is the power of &lt;strong&gt;Static Embeddings&lt;/strong&gt; like &lt;strong&gt;&lt;a href="https://nlp.stanford.edu/projects/glove/" rel="noopener noreferrer"&gt;GloVe&lt;/a&gt;&lt;/strong&gt; (Global Vectors for Word Representation).&lt;/p&gt;

&lt;p&gt;GloVe works by looking at massive amounts of text to see how often words appear near each other. It then assigns each word a fixed numerical vector. Because these vectors represent the “&lt;em&gt;meaning&lt;/em&gt;”, words that are semantically similar end up close together.&lt;/p&gt;
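&lt;p&gt;You can reproduce the flavor of this arithmetic with a few toy vectors. The four-dimensional numbers below are invented for illustration; real GloVe vectors have 50 to 300 learned dimensions:&lt;/p&gt;

```python
import math

# Invented 4-d vectors; imagine the dimensions roughly as
# (royalty, masculinity, femininity, humanness). Real GloVe
# vectors are learned from co-occurrence counts, not hand-set.
vecs = {
    "king":  [0.9, 0.8, 0.1, 0.7],
    "queen": [0.9, 0.1, 0.8, 0.7],
    "man":   [0.1, 0.8, 0.1, 0.8],
    "woman": [0.1, 0.1, 0.8, 0.8],
}

def cosine(a, b):
    # Cosine similarity: angle-based closeness of two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# king - man + woman, computed component-wise
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max(vecs, key=lambda word: cosine(vecs[word], target))
# best == "queen": the arithmetic lands closest to the queen vector
```

&lt;p&gt;Subtracting &lt;em&gt;man&lt;/em&gt; removes the masculine component, adding &lt;em&gt;woman&lt;/em&gt; restores a feminine one, and the royalty component carries through untouched.&lt;/p&gt;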

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnv7f7dxsw1mrids9889v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnv7f7dxsw1mrids9889v.png" alt="King is closer to queen than man or woman." width="800" height="744"&gt;&lt;/a&gt;&lt;br&gt;
King is closer to queen than man or woman.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Bank Problem: When One Vector Isn’t Enough
&lt;/h2&gt;

&lt;p&gt;As powerful as static models like GloVe are, they have a blind spot called &lt;strong&gt;polysemy&lt;/strong&gt;: words with multiple meanings.&lt;/p&gt;

&lt;p&gt;Think about the word &lt;strong&gt;“bank”&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I need to go to the &lt;strong&gt;bank&lt;/strong&gt; to deposit some money. (A financial institution).&lt;/li&gt;
&lt;li&gt;We sat on the &lt;strong&gt;bank&lt;/strong&gt; of the river. (The edge of a river).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe27vr80azsaexcda2y7x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe27vr80azsaexcda2y7x.png" alt="bank vs river bank meaning" width="800" height="466"&gt;&lt;/a&gt;&lt;br&gt;
Bank vs river bank (two different meanings of one word).&lt;/p&gt;

&lt;p&gt;In a static model like GloVe, &lt;em&gt;bank&lt;/em&gt; has one single, fixed vector. That vector is an average across all the contexts the model saw during training. This means the model can’t truly distinguish between a place where you keep your savings and the grassy side of a river.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Solution: Contextual Embeddings with BERT
&lt;/h2&gt;

&lt;p&gt;This is where &lt;strong&gt;Dynamic or Contextual Embeddings&lt;/strong&gt;, like &lt;strong&gt;&lt;a href="https://arxiv.org/abs/1810.04805" rel="noopener noreferrer"&gt;BERT&lt;/a&gt;&lt;/strong&gt; (Bidirectional Encoder Representations from Transformers), have changed the game. Unlike GloVe, BERT doesn’t just look up a word in a fixed dictionary. It looks at the &lt;strong&gt;entire sentence&lt;/strong&gt; to generate a unique vector for a word every single time it appears.&lt;/p&gt;

&lt;p&gt;When BERT processes our two bank sentences, it recognizes the surrounding words (like “river” or “deposit”) and generates two completely different vectors. It understands that the context changes the core identity of the word.&lt;/p&gt;

&lt;p&gt;Here is a simple example of using BERT with PyTorch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BertTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BertModel&lt;/span&gt;

&lt;span class="c1"&gt;# Load tokenizer and model
&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BertTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bert-base-uncased&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model_bert&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BertModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bert-base-uncased&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Prevent training
&lt;/span&gt;&lt;span class="n"&gt;model_bert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;print_token_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Tokenizes a sentence, runs it through BERT,
    and prints the first 5 values of each token&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s embedding.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Tokenize input
&lt;/span&gt;    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Forward pass
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model_bert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Contextual embeddings for each token
&lt;/span&gt;    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_hidden_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert token IDs back to readable tokens
&lt;/span&gt;    &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert_ids_to_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Print token + partial embedding
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Example sentences
&lt;/span&gt;&lt;span class="n"&gt;sentence1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I went to the bank to deposit money.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;sentence2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;We sat on the bank of the river.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Compare contextual embeddings
&lt;/span&gt;&lt;span class="nf"&gt;print_token_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sentence 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print_token_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sentence 2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Output
&lt;/h3&gt;

&lt;p&gt;The output shows that BERT assigns different vectors to the word &lt;em&gt;bank&lt;/em&gt; based on its surrounding context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ztci4kbaa388cwx2ts2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ztci4kbaa388cwx2ts2.png" alt="Sentence 1: I went to the bank to deposit money." width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sentence 1: I went to the bank to deposit money.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51ojgtv01skpfejakszv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F51ojgtv01skpfejakszv.png" alt="Sentence 2: We sat on the bank of the river." width="800" height="284"&gt;&lt;/a&gt;&lt;br&gt;
Sentence 2: We sat on the bank of the river.&lt;/p&gt;
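&lt;p&gt;To put a number on this difference, you can compare the two &lt;em&gt;bank&lt;/em&gt; vectors with cosine similarity. Here is a minimal sketch with made-up toy vectors; in practice you would plug in the full 768-dimensional vectors BERT produced above:&lt;/p&gt;

```python
import numpy as np

# Toy stand-ins for the two contextual "bank" vectors printed above.
# (Illustrative numbers only; real BERT-base vectors have 768 dimensions.)
bank_financial = np.array([0.42, -0.18, 0.91, 0.05, -0.33])
bank_river = np.array([-0.12, 0.77, 0.10, -0.56, 0.28])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(bank_financial, bank_river))
```

&lt;p&gt;With real BERT outputs, the two &lt;em&gt;bank&lt;/em&gt; vectors typically score noticeably lower similarity to each other than two occurrences of &lt;em&gt;bank&lt;/em&gt; used in the same sense would.&lt;/p&gt;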

&lt;h2&gt;
  
  
  Which Model Should You Use?
&lt;/h2&gt;

&lt;p&gt;Choosing the right embedding depends entirely on your specific task and your available computational resources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Static Embeddings (like GloVe)&lt;/strong&gt; are the best choice when you need a fast, computationally lightweight solution with a small memory footprint. They are perfect for straightforward tasks like document classification, where the broader meaning of words is usually sufficient.&lt;/p&gt;
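&lt;p&gt;To see why a static table cannot disambiguate, consider this toy sketch: each word maps to exactly one fixed vector, so &lt;em&gt;bank&lt;/em&gt; comes back identical in every sentence. The vocabulary and numbers here are made up; a real table would be loaded from a GloVe file such as &lt;code&gt;glove.6B.100d.txt&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np

# Toy static embedding table: one fixed vector per word, regardless of context.
static_embeddings = {
    "bank":  np.array([0.61, -0.20, 0.44]),
    "river": np.array([-0.35, 0.80, 0.12]),
    "money": np.array([0.58, -0.15, 0.40]),
}

def embed(sentence):
    """Look up each known word; context plays no role in the lookup."""
    return [static_embeddings[w] for w in sentence.lower().split() if w in static_embeddings]

v1 = static_embeddings["bank"]  # "bank" in a financial sentence
v2 = static_embeddings["bank"]  # "bank" in a river sentence
print(np.array_equal(v1, v2))   # True: same vector in both contexts
```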

&lt;p&gt;On the other hand, &lt;strong&gt;Contextual Embeddings (such as BERT)&lt;/strong&gt; are necessary when your task requires a deep understanding of language and ambiguity, such as question answering or advanced chatbots. They excel at handling words with multiple meanings, which is often the key to an application’s success. However, keep in mind that they require more computational power and a larger memory footprint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Embeddings are the foundation of how AI systems read and process human language. Whether you are using a pre-trained model like BERT or building a simple embedding model from scratch with PyTorch’s &lt;a href="https://docs.pytorch.org/docs/stable/generated/torch.nn.Embedding.html" rel="noopener noreferrer"&gt;&lt;code&gt;nn.Embedding&lt;/code&gt;&lt;/a&gt; layer, you are essentially building a bridge between human thought and machine calculation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What do you think?&lt;/strong&gt; If you were training a model from scratch today, what specific vocabulary or niche topic would you want it to learn first? Let me know in the comments 👇.&lt;/p&gt;

&lt;p&gt;Note: All illustrations in this post were generated using &lt;a href="https://openai.com/index/dall-e-3/" rel="noopener noreferrer"&gt;DALL·E&lt;/a&gt; 3.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Quiz
&lt;/h2&gt;

&lt;p&gt;Let’s test your understanding. Share your answer in the comments 👇.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does text data differ from image data in machine learning?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Devlin, J. et al. (2018). &lt;em&gt;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Stanford NLP Group. &lt;em&gt;GloVe: Global Vectors for Word Representation&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Spot Intelligence. &lt;em&gt;GloVe Embeddings Explained&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>nlp</category>
      <category>pytorch</category>
      <category>bert</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why Your Model is Failing (Hint: It’s Not the Architecture)</title>
      <dc:creator>Nastaran Moghadasi</dc:creator>
      <pubDate>Thu, 18 Dec 2025 20:51:34 +0000</pubDate>
      <link>https://dev.to/nastaranai/why-your-model-is-failing-hint-its-not-the-architecture-27cl</link>
      <guid>https://dev.to/nastaranai/why-your-model-is-failing-hint-its-not-the-architecture-27cl</guid>
      <description>&lt;p&gt;We’ve all been there. You spend days tuning hyperparameters and tweaking your architecture, but the loss curve just won’t cooperate. In my experience, the difference between a successful project and a failure is rarely the model architecture. It’s almost always the data pipeline.&lt;/p&gt;

&lt;p&gt;I recently built a robust data pipeline for a private work project. While I can’t share that data for confidentiality reasons, the challenges I faced are universal: messy file structures, proprietary label formats, and corrupted images.&lt;/p&gt;

&lt;p&gt;To show you exactly how I solved them, I’ve recreated the solution using the Oxford 102 Flowers dataset. It is the perfect playground for this because it mimics real-world messiness: over 8,000 generically named images with labels hidden inside a proprietary MATLAB (.mat) file rather than nice, clean category folders.&lt;/p&gt;

&lt;p&gt;Here is the step-by-step guide to building a bugproof PyTorch data pipeline that handles the mess so your model doesn’t have to.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Strategy: Lazy Loading &amp;amp; The Off-by-One Trap
&lt;/h2&gt;

&lt;p&gt;If you can’t reliably load your data, nothing else matters.&lt;/p&gt;

&lt;p&gt;For this pipeline, I built a custom PyTorch Dataset class focused on lazy loading. Instead of loading all 8,000+ images into RAM at once, we store only the file paths during setup (&lt;code&gt;__init__&lt;/code&gt;) and load the actual image data on demand (&lt;code&gt;__getitem__&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;A critical lesson learned: Watch out for indexing errors. The Oxford dataset uses 1-based indexing for its labels, but PyTorch expects 0-based indexing. Catching this off-by-one error early saves you from training a perpetually confused model.&lt;/p&gt;
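&lt;p&gt;A cheap guard against this trap is to shift the labels once and sanity-check the result before training starts. A minimal sketch with toy label values:&lt;/p&gt;

```python
import numpy as np

# Labels as they arrive from the .mat file: 1-based (1..102).
raw_labels = np.array([1, 5, 102, 37])

# Shift to the 0-based indexing PyTorch losses expect.
labels = raw_labels - 1

# Every label must fall in [0, num_classes - 1]; otherwise
# CrossEntropyLoss raises an out-of-range error mid-run.
num_classes = 102
assert labels.min() >= 0 and labels.max() == num_classes - 1
```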

&lt;h3&gt;
  The Dataset Skeleton
&lt;/h3&gt;

&lt;p&gt;Here is the core structure we need to implement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torch.utils.data&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dataset&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FlowerDataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;img_paths&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;img_paths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img_paths&lt;/span&gt;

        &lt;span class="c1"&gt;# Adjust for 0-based indexing if your source is 1-based
&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; 
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__len__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;img_paths&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__getitem__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Lazy loading happens here
&lt;/span&gt;
        &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;img_paths&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;RGB&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Consistency: The Pre-processing Pipeline
&lt;/h2&gt;

&lt;p&gt;Real-world data is rarely consistent. In the Flowers dataset, images have wildly different dimensions (e.g., 670x500 vs 500x694). PyTorch batches require identical dimensions, so we need a rigorous transform pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkghmc3edldaad54ug5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkghmc3edldaad54ug5x.png" alt=" " width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I strictly avoid naive resizing to a square, which distorts the image. Instead, I use &lt;code&gt;Resize&lt;/code&gt; on the shorter edge to preserve the aspect ratio, followed by a &lt;code&gt;CenterCrop&lt;/code&gt; to get our uniform square. Finally, &lt;code&gt;ToTensor&lt;/code&gt; scales pixel intensities from 0-255 down to 0-1, and &lt;code&gt;Normalize&lt;/code&gt; standardizes each channel with the ImageNet mean and standard deviation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torchvision&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;transforms&lt;/span&gt;

&lt;span class="c1"&gt;# Standard normalization stats
&lt;/span&gt;
&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.485&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.456&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.406&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;std&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.229&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.225&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;base_transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Compose&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Resize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CenterCrop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ToTensor&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;transforms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is the output for a sample image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qdfex129rj397r11n3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qdfex129rj397r11n3e.png" alt=" " width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Augmentation: Endless Variation, Zero Extra Storage
&lt;/h2&gt;

&lt;p&gt;One of the biggest advantages of PyTorch’s on-the-fly augmentation is that it provides endless variation without taking up extra storage.&lt;/p&gt;

&lt;p&gt;By applying random transformations (flips, rotations, and color jitters) only when the image is loaded during training, the model sees a slightly different version of the image every epoch. This forces the model to learn essential features like shape and color rather than memorizing pixels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfactgpnm7vyrkslmm1c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfactgpnm7vyrkslmm1c.png" alt=" " width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note: Always disable augmentation for validation and testing to ensure your metrics reflect actual performance improvements.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The Bugproof Pipeline: Handling Corrupted Data
&lt;/h2&gt;

&lt;p&gt;This is the part that usually gets overlooked in tutorials but is vital in production. A single corrupted image can crash a training run hours after it starts.&lt;/p&gt;

&lt;p&gt;To fix this, we update the &lt;code&gt;__getitem__&lt;/code&gt; method to be resilient. If it encounters a bad file (corrupted bytes, empty file, etc.), it shouldn’t crash. Instead, it should log the error and recursively call itself to fetch the next valid image, wrapping around at the end of the dataset.&lt;/p&gt;

&lt;p&gt;Here is the pattern I use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__getitem__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;img_paths&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;RGB&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;access_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="c1"&gt;# Recursively skip to the next valid sample
&lt;/span&gt;
            &lt;span class="n"&gt;new_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__getitem__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_idx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Telemetry: Know Your Data
&lt;/h2&gt;

&lt;p&gt;Finally, I added basic telemetry to the pipeline. By tracking load times and access counts, you can identify if specific images are dragging down your training throughput (e.g., massive high-res files) or if your random sampler is neglecting certain files.&lt;/p&gt;

&lt;p&gt;In my implementation, if an image takes longer than 1 second to load, the system warns me. After training, I print a summary like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total images: 8,189&lt;/li&gt;
&lt;li&gt;Errors encountered: 2&lt;/li&gt;
&lt;li&gt;Average load time: 7.8 ms&lt;/li&gt;
&lt;/ul&gt;
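&lt;p&gt;A minimal telemetry helper along these lines might look like the sketch below. This is a hypothetical class, not my production code; the &lt;code&gt;access_counts&lt;/code&gt; and &lt;code&gt;log_error&lt;/code&gt; names mirror the earlier snippets:&lt;/p&gt;

```python
import time

class LoadTelemetry:
    """Tracks per-item load times, access counts, and errors for a dataset."""

    def __init__(self, n_items, slow_threshold_s=1.0):
        self.access_counts = [0] * n_items
        self.load_times = []
        self.errors = []
        self.slow_threshold_s = slow_threshold_s

    def record(self, idx, started_at):
        # Call with a time.perf_counter() timestamp taken before Image.open.
        elapsed = time.perf_counter() - started_at
        self.load_times.append(elapsed)
        self.access_counts[idx] += 1
        if elapsed > self.slow_threshold_s:
            print(f"WARNING: item {idx} took {elapsed:.2f}s to load")

    def log_error(self, idx, message):
        self.errors.append((idx, message))

    def summary(self):
        avg_ms = 1000 * sum(self.load_times) / max(len(self.load_times), 1)
        return {
            "total_items": len(self.access_counts),
            "errors": len(self.errors),
            "avg_load_ms": round(avg_ms, 1),
        }

# Inside __getitem__: t0 = time.perf_counter(); ...load and transform...;
# then telemetry.record(idx, t0) before returning.
telemetry = LoadTelemetry(n_items=3)
t0 = time.perf_counter()
telemetry.record(0, t0)
print(telemetry.summary())
```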

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;If you are shipping models to production, you need to invest as much time in your data pipeline as you do in your model architecture.&lt;/p&gt;

&lt;p&gt;By implementing lazy loading, consistent transforms, on-the-fly augmentation, and robust error handling, you ensure that your sophisticated neural network isn’t being sabotaged by a broken data strategy.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>pytorch</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>How to Read a Scientific Paper Efficiently: A 5-Step Guide</title>
      <dc:creator>Nastaran Moghadasi</dc:creator>
      <pubDate>Wed, 17 Dec 2025 02:05:37 +0000</pubDate>
      <link>https://dev.to/nastaranai/how-to-read-a-scientific-paper-efficiently-a-5-step-guide-1nek</link>
      <guid>https://dev.to/nastaranai/how-to-read-a-scientific-paper-efficiently-a-5-step-guide-1nek</guid>
      <description>&lt;p&gt;&lt;em&gt;I’ve found that one of the most challenging yet rewarding skills to develop is the ability to effectively read and understand scientific research papers. It’s a journey of deconstructing complex ideas, and I’m excited to share what I’ve learned with you. This blog post is a guide for anyone who wants to dive into the world of academic research, inspired by the insights from Somdip Dey’s “A Beginner’s Guide to Computer Science Research.” [1]&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64ts4yms7q7wmtlkh3go.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64ts4yms7q7wmtlkh3go.png" alt="How to Read a Scientific Paper Efficiently: A 5-Step Guide" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Art of Reading a Scientific Paper
&lt;/h2&gt;

&lt;p&gt;Reading a research paper is not like reading a novel or a news article. It’s an active process that requires a systematic approach. The goal is not just to read the words on the page but to engage with the ideas, critically evaluate the research, and connect it to your own knowledge and interests. Here’s a breakdown of a five-step process that can help you navigate the dense world of academic literature.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Find Your Focus
&lt;/h3&gt;

&lt;p&gt;The first and most crucial step is to choose a subject area that genuinely interests you. Research requires a significant investment of time and effort, and your passion for the topic will be the fuel that keeps you going. Instead of blindly following trends or suggestions, take the time to explore different areas and find what truly excites you. This personal connection to the subject matter will make the entire process more enjoyable and sustainable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Master the Art of the Search
&lt;/h3&gt;

&lt;p&gt;Once you have a topic in mind, the next step is to find relevant research papers. While a simple Google search might be your first instinct, it’s essential to use scholarly databases and search engines to ensure the credibility of the sources. Here are some of the most valuable resources for computer science research:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Search Engine/Database&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Google Scholar&lt;/td&gt;&lt;td&gt;A freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Microsoft Academic Search&lt;/td&gt;&lt;td&gt;A free public web search engine for academic publications and literature, developed by Microsoft Research.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ACM Digital Library&lt;/td&gt;&lt;td&gt;A research, discovery and networking platform containing the full text of every article ever published by ACM.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;IEEE Xplore&lt;/td&gt;&lt;td&gt;A digital library providing access to scientific and technical content published by the Institute of Electrical and Electronics Engineers (IEEE) and its publishing partners.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;DBLP&lt;/td&gt;&lt;td&gt;A computer science bibliography website that provides open bibliographic information on major computer science journals and proceedings.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Scopus&lt;/td&gt;&lt;td&gt;A multidisciplinary bibliographic database containing abstracts and citations for academic journal articles.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ScienceDirect&lt;/td&gt;&lt;td&gt;A leading full-text scientific database offering journal articles and book chapters from more than 2,500 peer-reviewed journals and more than 11,000 books.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;When searching, use a combination of broad and specific keywords related to your topic. Keep a running list of these keywords, as they will be invaluable for future searches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Organize and Categorize
&lt;/h3&gt;

&lt;p&gt;As you start collecting papers, you’ll quickly realize that not all research is the same. Dey suggests categorizing papers into two main types: &lt;strong&gt;argumentative&lt;/strong&gt; and &lt;strong&gt;analytical&lt;/strong&gt;. Argumentative papers present a new idea and provide evidence to support it, while analytical papers offer a new perspective or analysis of an existing topic. Understanding the type of paper you’re reading will help you better grasp the author’s intent.&lt;/p&gt;

&lt;p&gt;To manage your growing library of papers, consider using reference management software like EndNote or BibDesk. These tools can help you organize your references, take notes, and even visualize the connections between different papers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: The Three-Pass Approach
&lt;/h3&gt;

&lt;p&gt;Reading a paper effectively is a multi-step process. A popular method, also referenced by Dey, is the “three-pass approach” proposed by S. Keshav. [2] This method involves reading the paper three times, with each pass having a different goal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The First Pass (5-10 minutes):&lt;/strong&gt; This is a quick scan to get a general idea of the paper. Read the title, abstract, and introduction. Glance at the section and sub-section headings. Read the conclusions. This will give you a high-level overview of the paper’s contribution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Second Pass (up to an hour):&lt;/strong&gt; In this pass, you’ll read the paper more carefully, but you can still ignore the finer details like proofs. Pay attention to the figures, diagrams, and illustrations. This will help you understand the context of the work and the evidence presented.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Third Pass (4-5 hours for beginners):&lt;/strong&gt; This is the most detailed pass. The goal here is to understand the paper in its entirety. You should be able to mentally re-implement the paper’s ideas and identify its strengths and weaknesses.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 5: The Art of Critical Thinking
&lt;/h3&gt;

&lt;p&gt;Reading a paper is not a passive activity. The final and most important step is to critically engage with the content. As you read, ask yourself the following questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the core problem the paper is trying to solve?&lt;/li&gt;
&lt;li&gt;What is the proposed solution?&lt;/li&gt;
&lt;li&gt;How is the solution evaluated? Are the benchmarks and evaluations fair?&lt;/li&gt;
&lt;li&gt;What are the underlying assumptions made by the authors?&lt;/li&gt;
&lt;li&gt;What are the limitations of the research?&lt;/li&gt;
&lt;li&gt;Can the work be improved? What are the potential avenues for future research?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Answering these questions will not only deepen your understanding of the paper but also help you generate your own ideas and contribute to the conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Journey Continues
&lt;/h2&gt;

&lt;p&gt;Reading research papers is a skill that takes time and practice to develop. Don’t be discouraged if you find it challenging at first. By following a systematic approach and actively engaging with the material, you’ll gradually build the confidence and expertise to navigate the world of academic research. This journey of a thousand papers begins with a single, well-read one.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1] Dey, S. (2014). A Beginner’s Guide to Computer Science Research. &lt;em&gt;XRDS: Crossroads, The ACM Magazine for Students&lt;/em&gt;, 20(4), 14-15.&lt;/p&gt;

&lt;p&gt;[2] Keshav, S. (2007). How to Read a Paper. &lt;em&gt;ACM SIGCOMM Computer Communication Review&lt;/em&gt;, 37(3), 83-84.&lt;/p&gt;

</description>
      <category>computerscience</category>
      <category>tutorial</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
