Nicholas

Posted on Jun 29

Geospatial Prompt Engineering: A Technical Guide for Google Earth Engine Developers

Let's Start

Prompt engineering is the technique of designing inputs that guide Large Language Models (LLMs) toward accurate, useful, and structured outputs. At its core, an LLM is a prediction engine — it takes sequential text as input and predicts the most probable next token, iterating this process until a response is complete. The quality of that prediction is directly shaped by the clarity, structure, and context of your prompt.

For developers working in the geospatial domain — particularly with platforms like Google Earth Engine (GEE) — prompt engineering is a fundamental skill. Geospatial tasks involve complex multi-step pipelines, structured data formats (GeoJSON, FeatureCollections, ImageCollections), domain-specific APIs, and outputs that often need to feed directly into downstream code or analytical systems.

In this article, we use Google AI Studio with Gemini 3 Flash as our playground: its prompt area lets you prototype ideas quickly, and its configuration panel exposes the sampling controls covered below. Whether you are writing your first GEE script or building a production geospatial pipeline, the techniques here will help you work faster, produce cleaner code, and extract more reliable insights from AI assistants. So head over to Google AI Studio.

LLM Output System Configuration

Before writing a single prompt, you need to understand how to configure the model itself. The same prompt can produce dramatically different results depending on your sampling settings. There are four key parameters to control.

Temperature governs randomness in token selection. At temperature 0 (greedy decoding), the model always selects the highest-probability token, so output is focused and, in practice, nearly deterministic (ties between tokens and backend numerics can still cause small run-to-run differences). At higher temperatures (approaching 1.0 and beyond), the model explores lower-probability tokens, producing more creative or diverse responses. For geospatial tasks that require precise code or factual answers, such as writing an Earth Engine reducer or parsing a FeatureCollection, temperature 0 is almost always the right choice. For brainstorming variable names, writing a methodology narrative, or generating diverse test cases, a temperature of 0.7–0.9 gives you more variety.

Top-K restricts token selection to the K most probable candidates at each step. A low top-K (e.g., 10) keeps outputs conservative and factual. A high top-K opens up more vocabulary, which is useful for creative or explanatory writing.

Top-P (nucleus sampling) selects the smallest set of tokens whose cumulative probability exceeds P. A value of 0.95 is a good general starting point — it allows some creativity without going off the rails.
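To make the interaction between these three settings concrete, here is a minimal sketch of how a sampler could narrow down the candidate tokens. The token scores are invented for illustration, and real inference stacks may apply the filters in a different order or with different tie-breaking, so treat this as a mental model rather than how Gemini is implemented.

```python
import math

def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return the renormalized probabilities a sampler would draw from."""
    if temperature == 0:
        # Greedy decoding: all probability mass goes to the top-scoring token.
        best = max(logits, key=logits.get)
        return {best: 1.0}

    # Temperature-scaled softmax (subtracting the max keeps exp() stable).
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda pair: pair[1], reverse=True)

    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the K most probable tokens

    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    kept_total = sum(prob for _, prob in ranked)
    return {tok: prob / kept_total for tok, prob in ranked}

# Made-up scores for the token that follows "ee.Reducer."
toy_logits = {"mean": 4.0, "median": 2.5, "sum": 1.0, "banana": -2.0}

greedy = next_token_distribution(toy_logits, temperature=0)
top2 = next_token_distribution(toy_logits, temperature=1.0, top_k=2)
nucleus = next_token_distribution(toy_logits, temperature=1.0, top_p=0.95)
```

With these toy scores, both the top-K and top-P filters discard the implausible "banana" token, while temperature 0 collapses the choice to "mean" alone.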

Output length (max tokens) limits how many tokens the model generates. This matters especially in geospatial workflows where you might be prompting for structured JSON output: if the token limit is too low, the JSON gets truncated and becomes unparseable. Always set your token limit generously when requesting structured outputs, and consider using a json-repair library in production applications to handle truncation gracefully.

Recommended system configurations for geospatial tasks:
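One way to keep such configurations at hand is a small lookup of presets. The sketch below is assembled from the guidance above (temperature 0 for code and structured output, 0.7–0.9 for brainstorming, top-P 0.95, a low top-K of 10 for conservative output). The task names, the top-K of 40, and the token limits are assumptions to tune for your own workloads, and the key names simply follow common SDK naming.

```python
# Illustrative presets only. Temperature, top-P, and the low top-K come from
# the guidance above; the other numbers are placeholders to tune yourself.
PRESETS = {
    "code_generation": {"temperature": 0.0, "top_p": 0.95, "top_k": 10, "max_output_tokens": 4096},
    "structured_json": {"temperature": 0.0, "top_p": 0.95, "top_k": 10, "max_output_tokens": 8192},
    "brainstorming": {"temperature": 0.8, "top_p": 0.95, "top_k": 40, "max_output_tokens": 2048},
}

def preset_for(task_type: str) -> dict:
    """Return a copy of the preset so callers can tweak it safely."""
    if task_type not in PRESETS:
        raise ValueError(f"unknown task type {task_type!r}; choose from {sorted(PRESETS)}")
    return dict(PRESETS[task_type])

config = preset_for("structured_json")
```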

Core Prompting Techniques

Zero-Shot Prompting

A zero-shot prompt gives the model a task description with no examples. It relies entirely on the model's training data to infer the desired output format and content.

Zero-shot works well for straightforward Earth Engine tasks that are well-represented in the model's training corpus:

Write a Google Earth Engine JavaScript function that filters an ImageCollection by a date range and a bounding box geometry, computes the mean image, and clips it to the geometry.

This is clean, specific, and gives the model enough context to produce working code. Notice the use of an action verb ("Write") and concrete specifications (JavaScript, date range, bounding box, mean, clip) — all signals that help the model predict the right sequence of tokens.

One-Shot and Few-Shot Prompting

Few-shot prompting provides the model with example input-output pairs before presenting the actual task (one-shot prompting is the special case of a single example). It is one of the most effective techniques available to a prompt engineer. By showing the model what you want rather than just describing it, you anchor its predictions to a concrete pattern.

For geospatial work, few-shot prompting is especially valuable when you need the model to produce structured outputs like GeoJSON, Earth Engine feature dictionaries, or standardized API call formats.

Here is a few-shot prompt that parses a natural-language region description into Earth Engine geometry JSON:

Parse a natural language region description into a valid Earth Engine
geometry configuration JSON.

EXAMPLE 1:
Input: "A rectangle covering Nairobi, Kenya"
Output:
{
  "type": "Rectangle",
  "coordinates": [36.65, -1.5, 37.1, -1.1],
  "projection": "EPSG:4326",
  "region_name": "Nairobi"
}

EXAMPLE 2:
Input: "A point at the center of Lake Victoria"
Output:
{
  "type": "Point",
  "coordinates": [33.0, -1.0],
  "projection": "EPSG:4326",
  "region_name": "Lake Victoria Center"
}

Now parse my task:
Input: "A bounding box covering the Mount Kenya forest reserve"
Output:

The model now understands both the desired JSON schema and the coordinate conventions you are using. Without the examples, it might produce valid JSON, but with an inconsistent structure that breaks your downstream parser.
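If you build such prompts in code, keeping the examples as data makes them easy to extend and lets you check replies against the same structure. The following is a rough sketch: the function names are hypothetical, and the key check is only a lightweight guard, not a full schema validation.

```python
import json

# The two examples from the prompt above, stored as (input, output) pairs.
EXAMPLES = [
    ("A rectangle covering Nairobi, Kenya",
     {"type": "Rectangle", "coordinates": [36.65, -1.5, 37.1, -1.1],
      "projection": "EPSG:4326", "region_name": "Nairobi"}),
    ("A point at the center of Lake Victoria",
     {"type": "Point", "coordinates": [33.0, -1.0],
      "projection": "EPSG:4326", "region_name": "Lake Victoria Center"}),
]

def build_few_shot_prompt(examples, query):
    """Render the instruction, the numbered examples, and the open query."""
    parts = ["Parse a natural language region description into a valid "
             "Earth Engine geometry configuration JSON.", ""]
    for i, (text, output) in enumerate(examples, start=1):
        parts += [f"EXAMPLE {i}:", f'Input: "{text}"', "Output:",
                  json.dumps(output, indent=2), ""]
    parts += ["Now parse my task:", f'Input: "{query}"', "Output:"]
    return "\n".join(parts)

def check_reply(reply_text, examples):
    """Parse the reply and confirm it has the same keys as the examples."""
    parsed = json.loads(reply_text)
    expected_keys = set(examples[0][1])
    missing = expected_keys - set(parsed)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return parsed

prompt = build_few_shot_prompt(
    EXAMPLES, "A bounding box covering the Mount Kenya forest reserve")
```

Ending the prompt on "Output:" nudges the model to continue with the JSON object directly, and `check_reply` raises early if the structure drifts.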

Key rules for few-shot examples in geospatial contexts:

- Use 3–6 examples as a starting point; complex spatial logic may require more
- Include edge cases (e.g., multi-polygon geometries, cross-dateline bounding boxes)
- Keep examples diverse: don't show only point geometries if your system will also receive polygons
- For classification tasks (e.g., land cover labeling), mix up the classes across examples to prevent the model from memorizing a sequence rather than learning the pattern

System, Role, and Contextual Prompting

These three techniques work together to define the "frame" of any prompt. They are distinct but complementary:

System prompting sets the overarching task and output requirements. It tells the model what it should produce. In geospatial applications, system prompts are most useful for enforcing output formats and domain constraints:

You are a geospatial data extraction assistant. Always return output as valid JSON conforming to the Earth Engine FeatureCollection schema. Never include explanatory text outside the JSON block.

Role prompting assigns the model a persona that shapes its tone, vocabulary, and depth of expertise. This is particularly powerful when you need domain-specific reasoning:

I want you to act as a senior remote sensing scientist with expertise in Google Earth Engine. I will describe a geospatial analysis task, and you will suggest the most appropriate Earth Engine ImageCollection, bands and reducers to use.
My task: "Map monthly surface water extent changes across the Tana River basin in Kenya for 2023."

By assigning the role of a remote sensing scientist, you are telling the model to draw on satellite-specific knowledge (sensor selection, band characteristics, cloud masking approaches) rather than giving a generic coding answer.

Contextual prompting provides task-specific background that the model needs to reason correctly. This is essential when working with non-standard datasets, proprietary schemas, or domain-specific workflows:

Context: You are helping develop a Kenya county-level energy demand forecasting system. The system uses Google Earth Engine to extract daily 2m temperature forecasts from the WeatherNext 2.0 model (projects/gcp-public-data-weathernext/assets/weathernext_2_0_0), filters by ensemble_member '8' and forecast_hour 6, and spatially aggregates over county boundaries using ee.Reducer.mean() at 1km scale.

Task: Write the Earth Engine Python API code to perform this extraction for April 2025 and return the results as a Pandas DataFrame with columns: date, county_name, mean_temp_kelvin.

Without the contextual prompt, the model would need to guess which WeatherNext asset path to use, which ensemble member, and what spatial scale to apply. With it, the output is precise and ready to integrate into a production pipeline.

Combining all three — a production-grade geospatial prompt:

SYSTEM: You are a geospatial data engineer. Return all outputs as valid Python code using the Earth Engine Python API. Include inline comments explaining each step. Do not include any explanatory prose outside the code block.

ROLE: Act as an expert in satellite-based climate data extraction with deep knowledge of the Earth Engine data catalog.

CONTEXT: We are building a county-level temperature monitoring system for Kenya. County boundaries are stored in a FeatureCollection at 'projects/my-project/assets/kenya_counties'. The target dataset is ERA5-Land Daily Aggregates (ECMWF/ERA5_LAND/DAILY_AGGR).

TASK: Write a function that accepts a year and a month as parameters, extracts the mean daily 2m air temperature for each county, and returns a FeatureCollection where each feature contains the county name, date, and mean temperature in Celsius.

Advanced Reasoning Techniques

Chain of Thought (CoT)

Chain of Thought prompting instructs the model to generate intermediate reasoning steps before arriving at a final answer. Adding a phrase like "Let's think step by step" or "Explain your reasoning before writing the code" often improves accuracy on complex tasks, particularly those involving spatial logic, coordinate transformations, or multi-step analysis design.

CoT is especially valuable in geospatial contexts because many tasks involve non-obvious dependencies: you need to cloud-mask before compositing, reproject before reducing, and clip after computing — not before. The model is more likely to get the order right if it reasons through the steps explicitly.

Without CoT:
Prompt: Write Earth Engine code to compute the annual mean NDVI for Kenya using Landsat 8 SR for 2023.

Output: [Code that composites without cloud masking, producing artifacts from cloud-contaminated pixels]

With CoT:
Prompt: Write Earth Engine code to compute the annual mean NDVI for Kenya using Landsat 8 SR for 2023. Let's think step by step about what preprocessing is needed before computing the composite.

Output:
Step 1: Load the USGS Landsat 8 Level 2 SR Collection 2 ImageCollection
Step 2: Filter by the Kenya bounding box and 2023 date range
Step 3: Apply the QA_PIXEL cloud masking function to remove clouds and cloud shadows
Step 4: Scale the SR bands using the Landsat Collection 2 scale factors (multiply by 0.0000275, add -0.2)
Step 5: Compute NDVI = (NIR - Red) / (NIR + Red) using SR_B5 (NIR) and SR_B4 (Red)
Step 6: Reduce the masked, scaled collection to a mean composite
Step 7: Clip to the Kenya boundary

[Correct, well-structured code follows]
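Step 4 is easy to overlook, so here is a small worked example in plain Python of why it matters. The scale factor and offset are the Collection 2 values quoted in the steps above; the two digital numbers are hypothetical values for a single vegetated pixel, not real data.

```python
SCALE = 0.0000275   # Landsat Collection 2 Level-2 SR multiplicative factor
OFFSET = -0.2       # Landsat Collection 2 Level-2 SR additive offset

def to_reflectance(dn):
    """Convert a stored digital number to surface reflectance."""
    return dn * SCALE + OFFSET

def ndvi(nir, red):
    """Normalized difference of the NIR and Red values."""
    return (nir - red) / (nir + red)

# Hypothetical digital numbers for one vegetated pixel (not real data).
nir_dn, red_dn = 20000, 9000

nir = to_reflectance(nir_dn)      # about 0.35
red = to_reflectance(red_dn)      # about 0.0475
scaled_ndvi = ndvi(nir, red)      # about 0.76
raw_ndvi = ndvi(nir_dn, red_dn)   # about 0.38, from unscaled digital numbers
```

Because the conversion includes an additive offset, NDVI computed on raw digital numbers is not just a rescaled version of the correct value: here it roughly halves the index, which is the kind of silent error explicit reasoning steps help catch.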

Important CoT configuration rule: Set temperature to 0 when using Chain of Thought. CoT relies on greedy, deterministic reasoning — you want the model to follow the single most logical path, not explore random alternatives.

For even stronger results, combine CoT with few-shot examples. Show the model one complete worked example with intermediate steps before presenting your actual task:

# Earth Engine CoT few-shot example structure

Q: Compute monthly mean LST for a given county. 
   Show your reasoning steps.
A: 
  Step 1: Select MODIS LST collection (MOD11A1) - 1km daily
  Step 2: Filter to county geometry using filterBounds()
  Step 3: Filter to target month using filterDate()
  Step 4: Apply the 0.02 scale factor to get Kelvin, then subtract 273.15 for Celsius
  Step 5: Reduce to monthly mean using .mean()
  Step 6: Extract spatial mean over county using reduceRegion()
  Result: [Correct code]

Q: Now compute annual mean precipitation for the same county 
   using CHIRPS daily data.
A: [Model now reasons through the steps correctly]

Step-Back Prompting

Step-back prompting asks the model to first answer a general, higher-level question related to your task, then uses that answer as context for the specific request. This "step back" activates broader background knowledge before the model tackles the specific problem — producing more accurate and grounded outputs.

In geospatial work, step-back prompting is useful when your specific task depends on domain principles the model might not surface automatically.

Step 1, the step-back prompt (general principles):
Based on best practices in remote sensing, what are the key preprocessing steps and quality considerations when using MODIS surface reflectance data (MOD09GA) for vegetation analysis in semi-arid regions of East Africa?

The model responds with: cloud masking via the state_1km band, solar zenith angle corrections, handling of high aerosol flags, seasonal compositing to reduce noise, and the specific challenges of mixed savanna/agricultural pixels.

Step 2, using the step-back answer as context for your specific task:
Context: Best practices for MODIS MOD09GA vegetation analysis in East Africa include: [paste step-back answer here]

Using these considerations, write an Earth Engine Python API script that computes a seasonal NDVI composite for the Kenyan highlands (April-June long rains season) for 2020-2024, applying appropriate quality masking.

The resulting code will be more robust — it will include the quality flag masking, handle the aerosol flags, and use appropriate compositing logic — because the model was primed with the domain principles before generating code.
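The two-stage flow can also be chained in code instead of pasting the answer by hand. This is a minimal sketch under the assumption that you have some function taking a prompt string and returning the model's text; `fake_ask` below is an offline stand-in for that function, and its canned replies are invented.

```python
from typing import Callable

def step_back(ask: Callable[[str], str], general_question: str, task: str) -> str:
    """Ask the general question first, then feed its answer in as context."""
    principles = ask(general_question)
    final_prompt = (
        f"Context: Best practices to follow: {principles}\n\n"
        f"Using these considerations, {task}"
    )
    return ask(final_prompt)

# Offline stand-in for an LLM: records each prompt and returns canned text.
seen_prompts = []

def fake_ask(prompt: str) -> str:
    seen_prompts.append(prompt)
    if len(seen_prompts) == 1:
        return "Mask clouds with the state_1km band; composite seasonally."
    return "# generated script would appear here"

result = step_back(
    fake_ask,
    "What are the key preprocessing steps for MOD09GA vegetation analysis?",
    "write an Earth Engine Python API script that computes a seasonal NDVI composite.",
)
```

Keeping the general answer in a variable also lets you log or review it before it is used as context for code generation.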

Self-Consistency
Self-consistency runs the same prompt multiple times at a higher temperature and selects the most common answer across runs. It is particularly useful for classification tasks or analysis decisions where you want a more reliable answer than a single run provides.

In a geospatial context, self-consistency is valuable when you are asking the model to recommend an analysis approach, classify a land cover type, or diagnose a bug — tasks where the model might give different answers on different runs.

Practical approach for Earth Engine development:

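# Self-consistency workflow. call_llm is a placeholder: swap in your own client function.
import re
from collections import Counter

prompt = """
I have an Earth Engine ImageCollection with 120 Sentinel-2 images
from 2023 over a wetland area. I need to detect seasonal flooding extent.
Classify this task and recommend the single best spectral index to use.
Return only: INDEX_NAME: [name], REASON: [one sentence]
"""

# Run the same prompt 5 times at temperature 0.7
responses = [call_llm(prompt, temperature=0.7) for _ in range(5)]

# Pull the index name out of each reply, tolerating optional brackets
matches = [re.search(r"INDEX_NAME:\s*\[?(\w+)", r) for r in responses]
votes = Counter(m.group(1).upper() for m in matches if m)

# Pick the most common recommendation and its share of the votes.
# If 4/5 runs recommend MNDWI and 1 recommends NDWI, this yields "MNDWI" and 0.8
best_index, count = votes.most_common(1)[0]
agreement = count / sum(votes.values())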

This approach gives you an agreement rate across runs, which is a rough confidence signal rather than a single potentially unreliable answer. The cost is higher (5× the API calls), but for critical design decisions in a production pipeline, the reliability gain is often worth it.

Tree of Thoughts (ToT)

Tree of Thoughts generalizes Chain of Thought by allowing the model to explore multiple reasoning branches simultaneously rather than following a single linear chain. Each intermediate "thought" node can branch into several possible continuations, and the model evaluates and prunes these branches to find the best path.

ToT is suited to complex geospatial design problems where multiple valid approaches exist, and you want the model to evaluate trade-offs rather than commit to the first path it finds.

Example application: choosing a spatial interpolation approach

I need to interpolate sparse weather station temperature data across Kenya (47 stations) to a 1km grid for input into an energy demand model.

Using a Tree of Thoughts approach, explore three different interpolation strategies. For each strategy, consider: (1) data requirements, (2) computational complexity in Earth Engine, (3) accuracy for East African topography, and (4) limitations. Then recommend the best approach with justification.

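A typical response branches into kriging, IDW, and co-kriging with elevation as a covariate, evaluates each along the four dimensions, and converges on a recommendation. Strictly speaking, ToT is a search procedure run across multiple model calls, so a single prompt like this only approximates it, but the result is still usually more useful than a direct prompt asking "what interpolation method should I use?"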

ReAct (Reason and Act)

ReAct combines reasoning with external tool use in a continuous thought-action-observation loop. The model reasons about what information it needs, takes an action (querying a tool, running code, searching the web), observes the result, and updates its reasoning. This loop continues until the task is complete.

In geospatial applications, ReAct is the foundation for building Earth Engine agent workflows — systems where an LLM can autonomously query datasets, execute analysis scripts, interpret outputs, and iterate.

A conceptual ReAct loop for geospatial data discovery:

Task: Find the most appropriate Earth Engine dataset for mapping monthly soil moisture anomalies in the Horn of Africa at 10km resolution.

Thought: I need to identify satellite-derived soil moisture products available in the Earth Engine catalog. I should search for both active radar and passive microwave options.
Action: Search Earth Engine Data Catalog for "soil moisture"
Observation: Results include SMAP (L3, L4), ESA CCI, and Sentinel-1 SAR-based products

Thought: SMAP L4 provides 9km resolution 3-hourly data. ESA CCI is a multi-sensor merged product at ~25km. Sentinel-1 requires processing. For 10km monthly at Horn of Africa scale, SMAP/SPL4SMGP is the best fit.

Action: Retrieve SMAP/SPL4SMGP asset metadata from Earth Engine
Observation: Available from 2015-03-31, 3-hourly, 9km, sm_surface and sm_rootzone bands

Final Answer: Use SMAP/SPL4SMGP (SMAP L4 Global 3-hourly 9km Surface and Rootzone Soil Moisture). Filter to monthly composites using .mean(), select 'sm_surface' band, and aggregate to anomalies by subtracting a multi-year monthly baseline.

Implementing ReAct in code is usually done with an agent framework such as LangChain, plus tools that wrap Earth Engine calls. Even without automation, applying the ReAct reasoning pattern manually in your prompts tends to produce more systematic and accurate outputs than direct questioning.
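To show the shape of the loop without any framework, here is a bare-bones sketch. Everything in it is a stand-in: the two tools return canned strings taken from the trace above instead of querying Earth Engine, the `Action: tool[argument]` syntax is an assumed convention, and scripted replies play the role of the LLM so the loop can run offline.

```python
import re

# Hypothetical tools: canned lookups standing in for real catalog queries.
def search_catalog(query):
    return "Results include SMAP (L3, L4), ESA CCI, and Sentinel-1 SAR-based products"

def get_metadata(asset):
    return "Available from 2015-03-31, 3-hourly, 9km, sm_surface and sm_rootzone bands"

TOOLS = {"search_catalog": search_catalog, "get_metadata": get_metadata}
ACTION_PATTERN = re.compile(r"Action:\s*(\w+)\[(.*?)\]")

def react(model, task, max_steps=5):
    """Alternate model replies and tool observations until a final answer."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip(), transcript
        match = ACTION_PATTERN.search(reply)
        if match is None:
            break  # no action and no answer: stop rather than loop forever
        tool_name, argument = match.groups()
        tool = TOOLS.get(tool_name)
        observation = tool(argument) if tool else f"unknown tool {tool_name!r}"
        transcript += f"Observation: {observation}\n"
    return None, transcript

# Scripted replies play the role of the LLM so the loop can run offline.
scripted = iter([
    "Thought: I need soil moisture products.\nAction: search_catalog[soil moisture]",
    "Thought: SMAP L4 looks closest to 10km.\nAction: get_metadata[SMAP/SPL4SMGP]",
    "Thought: The metadata fits.\nFinal Answer: Use SMAP/SPL4SMGP, band sm_surface.",
])
answer, transcript = react(lambda _transcript: next(scripted),
                           "Find a soil moisture dataset")
```

The `max_steps` cap matters in a real agent: it bounds cost when the model never produces a final answer.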

Automatic Prompt Engineering (APE)

Automatic Prompt Engineering (APE) uses an LLM to generate and evaluate multiple prompt variants for a given task, effectively automating the prompt iteration process. You prompt the model to generate, say, 10 different phrasings of a task instruction, evaluate them against a metric (such as whether the generated code runs without errors), and select the best performer.

For Earth Engine developers, this is useful when building reusable prompt templates for applications that will handle many different spatial queries:

We are building a natural language interface for Earth Engine. Users will ask questions about satellite data. Generate 10 different ways a user might ask to "extract monthly mean temperature for a specific county in Kenya from the ERA5-Land dataset."

Keep the semantics identical but vary phrasing, formality, and specificity.

The model generates variants ranging from technical ("Extract spatially aggregated monthly mean 2m air temperature from ECMWF/ERA5_LAND/MONTHLY_AGGR for Nairobi County") to conversational ("What was the average temperature in Nairobi last month?"). You can then evaluate which phrasing most reliably triggers correct Earth Engine code generation and use that as your canonical prompt template.
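The evaluation half of this loop can be sketched in a few lines. The version below uses a deliberately cheap proxy metric, whether the generated text parses as Python, rather than actually running code against Earth Engine. The `fake_generate` function is a made-up stand-in for an LLM call, rigged so that one variant succeeds and the other fails.

```python
def compiles(source: str) -> bool:
    """Cheap proxy metric: does the generated text parse as Python?"""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def rank_prompts(variants, generate, trials=3):
    """Score each prompt variant by the fraction of generations that compile."""
    scores = {}
    for variant in variants:
        passed = sum(compiles(generate(variant)) for _ in range(trials))
        scores[variant] = passed / trials
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def fake_generate(prompt):
    # Stand-in for an LLM call; a real one would return model-written code.
    if "ERA5_LAND" in prompt:
        return "def monthly_mean(county):\n    return county\n"
    return "def monthly_mean(county:\n    return county\n"  # unclosed bracket

ranking = rank_prompts(
    ["Extract monthly mean 2m temperature from ECMWF/ERA5_LAND/MONTHLY_AGGR for Nairobi County",
     "What was the average temperature in Nairobi last month?"],
    fake_generate,
)
best_prompt, best_score = ranking[0]
```

A syntax check only catches the crudest failures; a stricter metric would execute the generated code in a sandbox and compare its output with a known result.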

Code Prompting for Earth Engine
LLMs are highly effective at every stage of the geospatial developer workflow. Here is how to prompt effectively for each task type.

Writing Earth Engine Code
Be specific about the language (Python API vs. JavaScript Code Editor), the dataset asset path, the spatial and temporal parameters, and the desired output type. Action verbs and concrete specifications are your best tools:

Write a Google Earth Engine Python API function that:
- Accepts a county name and a year as parameters
- Loads the CHIRPS Daily precipitation dataset (UCSB-CHG/CHIRPS/DAILY)
- Filters to the specified year and the county's geometry
- Computes daily precipitation totals in mm
- Returns a Pandas DataFrame with columns: date, county, precip_mm
- Includes error handling for counties not found in the FeatureCollection

Explaining Earth Engine Code
Paste the code and ask for a step-by-step explanation. This is particularly useful when inheriting legacy scripts or working through complex reducer chains:

Explain the following Earth Engine Python code step by step, 
focusing on what each spatial operation does and why the 
operations are ordered in this sequence:

[paste code here]

Translating Between JavaScript and Python API
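The Earth Engine JavaScript Code Editor and Python API have different syntax conventions. LLMs generally handle this translation well, though Code Editor features such as Map.addLayer, print, and ui widgets have no direct equivalent in the Python API and need manual review: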

Translate the following Earth Engine JavaScript code to the 
Python API. Maintain all variable names and add type hints 
where appropriate:

[paste JavaScript code here]

Debugging Earth Engine Code
Always include the full error traceback along with the code. Ask the model to both fix the error and explain any other improvements it identifies:

The following Earth Engine Python code raises this error:

EEException: Image.reduceRegion: The geometry is unbounded 
(contains the full globe). Please specify a geometry 
with a bounded extent.

Traceback: [paste traceback]

Code: [paste code]

Debug the error and explain any other issues you notice 
in the code structure.

Structured Output and JSON in Geospatial Workflows

Structured output is one of the most important techniques for production geospatial systems. When you instruct an LLM to return JSON, you force it to organize its output into a schema, which tends to reduce hallucinations, makes the output programmatically parseable, and integrates cleanly with downstream Earth Engine code.

System prompt for structured geospatial feature extraction:

Classify the following satellite scene description and return 
valid JSON conforming to this schema:

SCHEMA:
{
  "scene_id": "string",
  "satellite": "string",
  "acquisition_date": "YYYY-MM-DD",
  "cloud_cover_pct": number,
  "usable_for_analysis": boolean,
  "recommended_bands": ["string"],
  "notes": "string"
}

Scene description: "Sentinel-2 L2A image acquired over Mombasa 
on 2024-03-15, approximately 35% cloud cover concentrated over 
the ocean, inland areas clear, bands B2-B8A available."

JSON Response:

The model returns JSON like the following:

{
  "scene_id": "S2_L2A_Mombasa_20240315",
  "satellite": "Sentinel-2",
  "acquisition_date": "2024-03-15",
  "cloud_cover_pct": 35,
  "usable_for_analysis": true,
  "recommended_bands": ["B2", "B3", "B4", "B8", "B8A"],
  "notes": "Cloud cover concentrated over ocean; inland areas suitable for land analysis"
}
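Before passing a reply like this downstream, it is worth checking it against the schema you asked for. The sketch below is a hand-rolled check using only the standard library; the function name and the error messages are illustrative, and a dedicated JSON Schema validator would be a reasonable alternative.

```python
import json
from datetime import datetime

# Expected Python type for each field in the prompt's schema.
FIELD_TYPES = {
    "scene_id": str,
    "satellite": str,
    "acquisition_date": str,
    "cloud_cover_pct": (int, float),
    "usable_for_analysis": bool,
    "recommended_bands": list,
    "notes": str,
}

def validate_scene(raw_json: str):
    """Return (record, problems); an empty problems list means the reply fits."""
    record = json.loads(raw_json)
    if not isinstance(record, dict):
        return record, ["top-level value is not a JSON object"]
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    if isinstance(record.get("acquisition_date"), str):
        try:
            datetime.strptime(record["acquisition_date"], "%Y-%m-%d")
        except ValueError:
            problems.append("acquisition_date is not YYYY-MM-DD")
    return record, problems

reply = """{
  "scene_id": "S2_L2A_Mombasa_20240315",
  "satellite": "Sentinel-2",
  "acquisition_date": "2024-03-15",
  "cloud_cover_pct": 35,
  "usable_for_analysis": true,
  "recommended_bands": ["B2", "B3", "B4", "B8"],
  "notes": "Cloud cover concentrated over ocean"
}"""
record, problems = validate_scene(reply)
```

Collecting every problem instead of failing on the first one makes it easier to send a single corrective follow-up prompt to the model.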

Using JSON Schemas for Earth Engine asset inputs:

When working with large volumes of spatial data, provide a JSON Schema as a blueprint so the model knows exactly what structure to expect:

{
  "type": "object",
  "properties": {
    "asset_path": { "type": "string", "description": "Earth Engine asset path" },
    "geometry_type": { "type": "string", "enum": ["Point", "Rectangle", "Polygon"] },
    "coordinates": { "type": "array", "description": "Coordinate array in WGS84" },
    "date_range": {
      "type": "object",
      "properties": {
        "start": { "type": "string", "format": "date" },
        "end": { "type": "string", "format": "date" }
      }
    },
    "target_bands": { "type": "array", "items": { "type": "string" } }
  }
}

Important production note: JSON outputs from LLMs can be truncated if the response hits the token limit, resulting in missing closing braces that make the JSON invalid. Always use a json-repair library in production applications to handle this gracefully:

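import json
from json_repair import repair_json  # third-party: pip install json-repair

# Example of a response cut off at the token limit (closing brace missing)
raw_output = '{"scene_id": "S2_L2A_Mombasa_20240315", "cloud_cover_pct": 35'
try:
    parsed = json.loads(raw_output)  # fast path: the output was already valid
except json.JSONDecodeError:
    parsed = json.loads(repair_json(raw_output))  # repair, then parse
# Repair restores syntax only; fields lost to truncation stay missing,
# so validate `parsed` against your schema before using it.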

Best Practices for Geospatial Prompt Engineering

Use Instructions Over Constraints

Tell the model what to do rather than what not to do. Positive instructions are clearer and produce better results.

Weak (constraint-based):
Write Earth Engine code to compute NDVI. Do not use JavaScript. Do not use deprecated APIs. Do not include print statements.

Strong (instruction-based):
Write Earth Engine Python API code (using the ee library) to compute
NDVI. Use modern Collection 2 Landsat SR bands. Return the result as
a clipped ee.Image object.

Use Variables to Make Prompts Reusable
In production Earth Engine applications, parameterize your prompts with variables rather than hardcoding specific values. This turns a one-off prompt into a reusable template:

VARIABLES:
{dataset} = "ECMWF/ERA5_LAND/HOURLY"
{variable} = "temperature_2m"
{region} = "Nairobi County"
{start_date} = "2024-01-01"
{end_date} = "2024-12-31"

PROMPT:
Write an Earth Engine Python function to extract daily mean {variable} 
from {dataset} for {region} between {start_date} and {end_date}. 
Return results as a Pandas DataFrame.
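In Python, the same template maps directly onto `str.format`. This is a small sketch; the function name is hypothetical, and the values are the ones listed above.

```python
PROMPT_TEMPLATE = (
    "Write an Earth Engine Python function to extract daily mean {variable} "
    "from {dataset} for {region} between {start_date} and {end_date}. "
    "Return results as a Pandas DataFrame."
)

def render_prompt(**values):
    """Fill the template; a missing variable raises KeyError instead of
    silently sending a half-filled prompt to the model."""
    return PROMPT_TEMPLATE.format(**values)

prompt = render_prompt(
    dataset="ECMWF/ERA5_LAND/HOURLY",
    variable="temperature_2m",
    region="Nairobi County",
    start_date="2024-01-01",
    end_date="2024-12-31",
)
```

Failing loudly on a missing variable is a deliberate choice here: a prompt with an unfilled placeholder tends to produce plausible-looking but wrong code.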

Be Specific About Output Format and Length
Specify exactly what you need: the programming language, the return type, whether you want inline comments, and the approximate length of the output.

Generate a 3-function Python module for Earth Engine that:
- Authenticates and initializes the EE session
- Extracts temperature data for a given geometry and date range
- Exports results to Google Drive as a CSV
Each function should have a docstring. Target 50-80 lines total.

CoT Best Practices for Geospatial Reasoning

When applying Chain of Thought to Earth Engine tasks, always place the reasoning before the final code output. The model's intermediate reasoning changes the token context that shapes its final prediction, so the order matters.

For CoT, set the temperature in Google AI Studio to 0. Spatial reasoning tasks generally have one correct answer (the preprocessing steps for Landsat SR are not a matter of opinion), and you want the model to follow the most logical deterministic path.

For self-consistency combined with CoT, build your prompt so the final answer is clearly delimited from the reasoning — this makes it straightforward to extract and compare answers across multiple runs.

In summary, prompt engineering for geospatial applications is both a science and a craft. The underlying mechanics are consistent — LLMs are token prediction engines, and well-structured prompts produce better predictions — but the application of these mechanics to satellite data, spatial APIs, and structured geospatial formats requires deliberate technique.

The progression is straightforward: start with zero-shot for simple, well-defined tasks. Add examples (few-shot) when you need a consistent output structure. Use system, role, and contextual prompting to frame the model's expertise and enforce output contracts. Apply Chain of Thought when the task involves multi-step reasoning or ordered operations. Use ReAct patterns when building agent workflows that need to interact with Earth Engine APIs or external data sources. And always enforce JSON schemas when your outputs need to integrate with programmatic systems.

Earth Engine's server-side computation model, its rich data catalog, and its Python API make it one of the most powerful geospatial platforms available. Pairing it with disciplined prompt engineering — structured inputs, verified outputs, documented iterations — makes it possible to build geospatial AI systems that are not just powerful, but reproducible, maintainable, and production-ready.

Acknowledgement
This article was developed by blending core methodologies from the Prompt Engineering Guide by Lee Boonstra. The framework has been adapted and structured specifically for production-level cloud geospatial workflows in Earth Engine by Nicholas Musau, a Google Developer Expert (GDE) for Earth Engine and Lead of the Google Earth Engine Developer Community Nairobi.

By applying these tailored runtime configurations and prompt structures within Google AI Studio, geospatial analysts can seamlessly bridge the gap between traditional remote sensing workflows and next-generation Agentic GIS solutions.

DEV Community