
Hey DEV community! 👋
We're the team behind Google AI Studio and the Gemini API at Google DeepMind.
We'll be answering your questions live on Aug...
Curious if there are upcoming releases for Gemini CLI. In my tests it's excellent at whole-repo analysis and strategy, but it often stumbles in execution (tools break and it loops).
Are any major releases planned? What kind, and on what timeline?
And will there be multi-agent support?
Hey there! Am glad to hear that you've been using and loving the Gemini CLI (us, too!).
This update is via the Google Cloud folks who are building out the CLI:
Please, how do I get to work for Google?
Vector Databases and VR Question:
Do you foresee AI/vector databases being gamified in such a way that we can throw on a VR headset and 'swim' through the vector database, so to speak, sort of as a fun way to explore and retrieve data?
I'd like to try that, sounds fun.
Thanks.
I love the idea of being "immersed" in your data, and to use 3D space as a path to spot unexpected relationships in datasets! In addition to the recommendations from other folks on this thread, you might also be interested in checking out the Embeddings Projector, as a fun way to view and manipulate data in 3D space.
That's awesome!
Are these mostly used for demo or are they useful for practitioners?
Wow
Excellent idea!
But the main challenge is: how do you display 512+ dimensions of embeddings in 3D VR space?
Perhaps through interactive projections or using additional channels (color, sound, vibration).
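A minimal sketch of that kind of projection, assuming scikit-learn and random stand-in vectors rather than a real vector database, just to show how 512 dimensions could become (x, y, z) positions for a VR scene:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for real 512-dimensional embeddings pulled from a vector database.
embeddings = np.random.rand(1_000, 512)

# Project down to 3 components so each vector gets an (x, y, z) position.
coords = PCA(n_components=3).fit_transform(embeddings)

# coords[:, 0], coords[:, 1], coords[:, 2] could drive object positions in a
# Unity / three.js / VR scene, with per-point metadata attached for the
# "select a molecule to see its details" interaction described above.
```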
Hi Prema, thanks for your response.
I'm assuming it would be approached by taking the overall (x, y, z) of each individual vector and assigning it some set volume in space, with some padding, so a user could navigate through the 'cracks'.
It would essentially be like swimming through a gas, but the molecules are ginormous so they are visible to the user... big enough that a user could select each one to see the details.
But small enough to sneak by each one as they gently nudge out of the way and then return to their normal position.
I think this could be done several ways. In my experience, tools that come to mind right away are Blender and three.js! Haha.
Could even have a temperature map overlay, so a user could 'jump in' and explore search results based on their custom query and see how closely they are related. Or perhaps a pattern overlay, to be accommodating for more users?
You know what, this would be really awesome for music exploration.
Former game dev here. Blender is 3D modeling software, not really ideal for your use case. I just wanted to say that if you have a big idea like this, you're often better off trying to make it yourself.
There are a couple of game engines that are free to use such as Unreal and Unity that provide VR support as well as plenty of online resources.
I would recommend Unity for this due to a combination of community support regarding tutorials, and it using C# as its primary coding language. Most AI is pretty good at writing C# scripts (as long as you keep them modular), so you don't need to be a master programmer.
You might even enjoy learning how to use the game engine. In regards to visuals, you would also want to learn Blender for the 3D assets.
I don't foresee Google making anything like this as it's very niche and they prefer broad strokes, not to mention they had a pretty massive failure in the game industry and likely aren't looking to try again (Stadia).
Thanks for your wide-lensed feedback. I have used Unreal Engine a bit, but not to any major extent. Any reason you would use Unity over Unreal for something like this? Based on your answer, it sounds like my original question is, at the very least, possible.
I can sort of picture your concept in my head. Unreal is a lot more complex, for me at least, in regards to setting up a system like that, because your options are their visual Blueprints or C++, and the engine itself is pretty heavy on resources. Unity is lighter, and I think scripting a system like that would be much easier in C# as long as you can optimize it.
You could probably just instantiate new nodes as you're going along and cull anything that's out of view. Since it's VR it's going to be a bit heftier to run, so the smaller engine would likely be more stable for the average person :)
My discord is on my profile if you want to discuss it more over there. I don't really have any other socials lol
When will it be possible to vibe code with Google Apps Script? Thanks
Thanks for the question!
You can already use the Gemini APIs and Gemini in AI Studio to generate Apps Script code, which you can then pull into Google Workspace products (like Sheets). The Google Cloud team also has a few codelabs showing how to use the Gemini APIs with Apps Script (example).
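As a rough illustration of that workflow (a sketch with the google-genai Python SDK; the prompt and model id here are just examples, not an official recipe):

```python
from google import genai

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

prompt = (
    "Write a Google Apps Script function that reads column A of the active "
    "Sheet and writes the number of non-empty rows into cell B1."
)

# Ask Gemini to draft the Apps Script code...
response = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)

# ...then paste the generated code into the Apps Script editor attached to your Sheet.
print(response.text)
```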
Can we have a virtual hackathon solely focused on building AI apps in ai.studio?
I love that! We're planning to run more hackathons later this year and I'll make sure to forward that idea!
Yes!
On DEV!
I wish to work for Google as a C++ developer: scraplinkecomarket.netlify.app
My work is with HTML, CSS, and JS.
Stay tuned here - may or may not have something coming soon!
I really have a keen interest in drug development and personalized medicine using AI. My master's thesis was on finding suitable drug candidates for PSP using graph neural networks and other AI techniques. I also used DeepMind's AlphaFold2 in it. I learnt everything for it by myself through online resources. But I feel overwhelmed with the vast number of online resources, and they are not that helpful for making a proper plan with tangible results to get better in the domain. So, if I want to one day work at DeepMind and be part of novel drug discovery, what are the steps I need to take?
It's great to hear that you're interested in AI for drug discovery! Google DeepMind, Isomorphic Labs, and our colleagues in Google Research are all investing very heavily in AI for health and the medical domain.
The skill sets that you would need would depend on the role that you would be interested in taking - for example, engineering, product, research, marketing, and more are all role profiles that we hire for in our AI for health orgs. For each of those focus areas, I would recommend that you continue building your expertise in AI and in the medical / life sciences, and make sure to share your work visibly - either via GitHub for open-source and software projects, or by publishing the research that you've been pursuing.
I'd also recommend building on or evaluating some of the open models that Google has released in the healthcare space, like TxGemma and MedGemma. Good luck, and am looking forward to seeing what you build!
I wish to work for Google as a C++ Developer.
I am an AI engineer by profession. So are there any specific guidelines I can follow to attain a position at DeepMind in the drug research group? How do I get an interview call, and what should I prepare?
What would it take to intern as a devrel for DeepMind?
We regularly have engineering and product internship roles available at Google and at Google DeepMind! I recommend checking out our careers pages, and searching for "internship".
If youโre interested in a career as a developer relations engineer, I would recommend building in the open - contributing to open-source projects, sharing your work publicly (on social media, and on GitHub) and investing in supporting your local and online developer communities. Many DevRel folks start their careers as software engineers, and then gradually move to a more community-facing role.
On this subject, how do you think the idea of internships will evolve in the future? There's so much written about how AI is particularly affecting entry-level jobs. What do you think needs to change for employers to be able to best support this kind of work?
How does 'Search-grounded' mode work under the hood? Are citations confidence-weighted and deduplicated? Can we constrain freshness windows, force certain domains, or provide our own corpus for grounding?
The secret sauce is the same as Google Search because the tool relies on the Google Search Index. Currently, the groundingMetadata does not expose a direct confidence score for each citation. The presence of a citation indicates the model found that source relevant for generating a specific part of the response. In terms of deduping, the system generally attempts to provide unique and relevant sources. While you might see citations from different pages on the same domain if they each contribute distinct information, the goal is to provide a concise set of the most useful sources rather than a long list of redundant links.
For bring-your-own-search scenarios, try using function calling with RAG flows.
In terms of working under the hood, the first thing the tool will do is analyze your query. For example, a prompt like "Who won the F1 race last weekend?" will trigger a search, while "Write a poem about the ocean" likely won't. The model then formulates one or more search queries based on your prompt to find the most relevant information from the Google Search Index. The most relevant snippets and information from the search results are fed into the model's context window along with your prompt. The model uses this retrieved information as its source of truth to generate a "grounded" response. The API returns the response along with groundingMetadata. This metadata includes the source URLs for the information used, to build citation links back to the original content for verification.
We are working on a filter to constrain to date ranges. You cannot force certain domains (use URL Context for that), but you can exclude some domains from search. The "Bring your own search" option is available through Vertex.
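For reference, enabling the Google Search tool looks roughly like this with the google-genai Python SDK (a sketch; check the grounding docs for the authoritative shape of groundingMetadata):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Who won the F1 race last weekend?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    ),
)

print(response.text)
# Citation sources used to ground the answer:
print(response.candidates[0].grounding_metadata)
```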
How influenced are you by the work done at other companies (e.g., OpenAI releasing GPT-5 recently, etc.)?
It's always inspiring to see the recent surge in AI development, both in the modeling and the product space!
At Google, we ensure many different closed (ex: Anthropic) and open models are available to our customers on Google Cloud via the Vertex AI Model Garden. We also support many of the research labs via both our open machine learning frameworks (JAX) and hardware (TPUs and GPUs) for training on GCP, and have been excited to see many startups and enterprises adopt the Gemini and Gemma models.
Our DevX team has also been hard at work adding or improving support for the Gemini APIs and Gemma models into developer tools (like Roo Code, Cline, Cursor, Windsurf, etc.) and frameworks (LangGraph, n8n, Unsloth, etc.). More to come, we all go further when we're working together as one community.
What advice do you have for someone who is considering signing up to a CS Bootcamp vs. going all-in on building with AI tools?
Great question, and I know a lot of folks have this top-of-mind.
For programs like a CS Bootcamp or attending a university, I'd say the biggest value that you're really getting is the in-person community. Many educational structures are still catching up to the state of the art in AI and in building product-grade software systems, so the coursework you'd be completing might not be aligned with the latest model and product releases - and those features / models are changing at least weekly, if not daily, which makes it a challenge for educators to keep their curriculum up-to-the-minute.
To build up expertise and the skill set for working with AI systems, I would strongly suggest that you just start building: find a problem that really bugs you, use AI to automate it, and then share your work visibly externally -- via GitHub and social media. This is a really useful way to get product feedback, and to get inspired! There are also frequently AI hackathons happening, either in-person or online (ex: the Major League Hacking events list and DevPost are great places to look).
You can also check out DEV Challenges!
Hi. I can't turn my Gemini-powered voice assistant into a real product because the input price for audio on Gemini 2.5 Flash is $1.00. Any plans to make audio input pricing more like text input? (My assistant also pays for Google Cloud TTS.) Who should I contact for this and to request higher usage limits?
While we don't have any immediate plans to reduce the audio input price, we're always looking for ways to make our models more accessible.
Feedback like yours directly influences our future plans, so thank you for raising this.
For higher usage limits, the best next step is to fill out the request form here: docs.google.com/forms/d/e/1FAIpQLS...
What was it like at Google when Chatgpt launched?
Are there first-class APIs for plan-and-execute (task graphs, subgoals, retries) and multi-agent coordination? How do you sandbox tool scopes per agent and persist/restore agent state across sessions?
For complex AI agents, Google's Agent Development Kit (ADK) offers first-class APIs for advanced features.
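A minimal sketch of what that looks like, following the ADK Python quickstart pattern (the tool, names, and instruction below are made up for illustration):

```python
from google.adk.agents import Agent

def get_ticket_status(ticket_id: str) -> dict:
    """Hypothetical tool: look up a support ticket (replace with a real backend call)."""
    return {"ticket_id": ticket_id, "status": "open"}

# The agent decides when to call the tool, and agents like this can be
# composed with sub-agents for multi-agent coordination.
root_agent = Agent(
    name="support_agent",
    model="gemini-2.5-flash",
    description="Answers questions about support tickets.",
    instruction="Use get_ticket_status when the user asks about a ticket.",
    tools=[get_ticket_status],
)
```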
Is Imagen legacy at this point?
We do not consider the Imagen model family to be legacy. Imagen is still a specialized model that is lower latency, has different pricing options and is recommended for photorealistic images, sharper clarity, improved spelling and typography. You can use Google AI Studio to play with both Imagen and Gemini 2.5 Flash Image models and compare the results for your specific use case.
I've prepped for the GenAI Leaders exam, taken the classes, studied the docs. I want to believe Gemini has the answers. But so far, I'm just not seeing it.
I've been an early adopter of Google products forever, and until Gemini, I'd never turned off a Google beta. This one? I hated it. When the model showed up in Copilot, it was barely functional for weeks. Fine, growing pains. But just this past weekend I gave it another shot and spent nearly two hours training and testing a Gemini Gem. It was underwhelming at best.
Here's where I'm stuck: GitHub has the developers and Microsoft has the business. OpenAI has the public (and plenty of devs). Claude has its own cult following because it's that trustworthy. Even Kiro, out of nowhere, has the latest in planning and developer flow.
I've read all the promises about making Gemini accessible to millions of developers. Accessibility is great, sure, but my toaster is accessible, and that doesn't make me crave toast.
So my questions are:
Thank you for the feedback!
The Gemini App is a consumer product; that team is working hard on things like personalization, integrations with Google Workspace and Google Maps, Gems for workflow automations, and more. They have also been investing in deep integrations with Google Search, like AI Mode - and built-in citations for its responses, grounded in up-to-date results. Subscribers to Gemini Pro and Gemini Ultra also get premium access to some of Google Labs' exciting new products, like Jules, Flow, and Whisk.
These use cases for the Gemini App are more focused on consumers, while the Gemini APIs and AI Studio (where the folks on this AMA sit!) are more focused on developer and information worker use cases.
Thanks for the detailed breakdown! Sorry I couldn't catch y'all live.
To clarify, I wasn't asking about Gemini the app; I meant Gemini the model. Do you think it can establish itself as a recognizable brand the way other companies have with their solutions? If you're still game to answer, we can definitely frame it around developers as the main user group, since that seems most relevant to your day-to-day.
I have a Bachelor's degree in Computer Science; it's been almost 5 years since I graduated, at the start of the COVID pandemic. I haven't had a "real" job working at a tech company YET (I hope). I've just been doing freelance web development work since then. My concern and question is: do you think I still have some hope of getting into the tech industry? It's been a bit scary, especially now with all the AI surge and hearing of job losses... I could use some motivation, though I won't give up! hehe. Many thanks!
(this took some courage for me to open up about)
Hi there, thank you so much for your courage in sharing this. It's a completely understandable feeling, especially with how chaotic the market feels right now. But I do think that with AI-generated code, the demand for writing software is only increasing, and we will always need engineers who can solve real problems, architect systems, and guide those AI tools. Companies definitely want to see that you can leverage AI to ship fast now, so you're in a great position to learn and showcase that. I would fully embrace your freelance journey as your unique strength: keep building, share your projects in public, and keep applying :)
Thank you so much for taking the time to reply and to give me that hope! I will remember you, Google, and the DEV co. when I achieve my goals! :D Good things are coming your way.
With 2M-token contexts, what are the semantics of memory garbage-collection (pinning, TTLs, eviction heuristics)? Can callers attach per-segment priorities and get visibility into which chunks were actually attended to?
I have been using the Gemini API for all my personal projects. Thanks to your generous free tier, it helped me get from idea to app.
So what are the next plans for Google AI Studio? Recently I have seen so many changes: the IDX editor became Firebase Studio, and now AI Studio. Are there any plans to create a platform to tweak the model itself, something like that, to make it more specific to our use case?
Thanks for testing out AI Studio and the Gemini APIs - especially the Build feature in AIS!
As you've probably seen first hand, AI Studio's Build feature gives you the ability to create and deploy apps, quickly and securely, via deep Google Cloud platform product integrations like Cloud Run and Logging. More production-grade features are soon to come, please keep sending your requests!
If you'd like to fine-tune / customize the Gemini models, or smaller open models (like our Gemma 3 model family), please make sure to check out the fine-tuning capabilities in the Vertex AI Model Garden. For most Gemini 2.5 use cases, we've seen folks have success with composing the APIs with things like vector databases for retrieval and prompt engineering - no fine-tuning required.
OpenAI also released open models, while large models from China seem to be performing better on benchmarks.
Where do you see the main focus going on your side: more small open models, or still large closed models?
I also tested Gemma 3 270M fine-tuning and production usage; it's really useful for lots of use cases.
Am so glad to hear that you've been enjoying the Gemma open models, please keep the feedback coming! :)
Google DeepMind is significantly investing in both our Gemini models (available as APIs) and our Gemma open model family, the latest releases being Gemma 3 and Gemma 3n. We also have open models in other domains, such as music generation (ex: Magenta). Stay tuned for more releases, both closed and open-source.
Does the Gemini API support schema-constrained decoding (e.g., JSON Schema) with hard guarantees that the stream is always valid JSON? If the model deviates mid-stream, can we enable automatic repair/retry, and can we surface token-level error locations for debugging?
Yes, our structured output feature is designed for exactly this. It guarantees that the model's output will be valid JSON that conforms to the schema you provide.
You can check out the documentation for it here: ai.google.dev/gemini-api/docs/stru...
Regarding automatic repair and surfacing token-level errors, those specific capabilities aren't available just yet.
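For example, with the google-genai Python SDK and a Pydantic model as the schema (a sketch; see the structured output docs linked above for the full feature set):

```python
from pydantic import BaseModel
from google import genai
from google.genai import types

class Recipe(BaseModel):
    name: str
    ingredients: list[str]

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me a simple cookie recipe.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Recipe,  # output is constrained to this schema
    ),
)

# The response text is valid JSON matching the schema, so parsing is safe.
recipe = Recipe.model_validate_json(response.text)
print(recipe.name, recipe.ingredients)
```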
Are you able to say the direction you're going with working on those capabilities?
Do you think we are reaching the scaling limits for AI, and that AGI is not possible with the current setup?
I'd like to add to this.
If scaling really is seemingly endless (per Zuck's take), what's DeepMind's contingency for when raw parameter growth stops giving useful returns? Do you already have a "post-transformer" plan?
How are non-text-based models like video/etc. similar and different from what I understand to be a text-based LLM?
That's a great question! It highlights a fascinating area of AI research. LLMs are focused on understanding and generating human-like text; other non-text-based models have distinct differences in how they process and represent information.
Hello, and very interesting question. All the models are based on neural network foundations and learn from being exposed to vast amounts of data, with the key goal of learning meaningful representations of the input data. For an LLM this might be a numerical vector that captures the meaning of a specific word or sentence. For an image model it might be a vector that describes objects, textures, or scenes in the image. This allows them to successfully generate new content. Just like LLMs, these models are also generative - whether they generate images or videos or compose music. The concepts of pre-training and fine-tuning for a specific use case are also similar with these models.
The differences are in data structure and modality: text for LLMs, and 2D images or frames for image and video models. LLMs process tokenized words, while image or video models process pixels or frames. GenMedia models are more computationally costly. LLMs lean into semantic understanding, while genmedia leans more into perceptual understanding of frames, patterns, etc. The future is MULTIMODAL.
What type of developers besides academic AI specialists have roles in an AI product team?
Good q! Beyond academic AI specialists and researchers, a successful AI product team is built by a variety of developers who operationalize the core model: data engineers, MLEs, backend/frontend/full-stack engineers, MLOps, site reliability engineers, and also customer-facing engineers / developer experience engineers.
If you go to our careers page and filter for engineering & research, you'll get a good idea :) deepmind.google/about/careers/?cat...
Do you find Google has explicitly changed many of its hiring practices with the advent of AI? Seems like a lot of the role titles I would expect are there, but are the expectations, pace, etc. different from when you were earlier in your career?
Who's the funniest person on the team?
I consider myself to be pretty funny, but do not know if that sentiment is shared across the team.

+1, am voting for Alisa! Bonus points for the Nano-Banana reference 🤣
This is great
bahaha
Ha
How do you see Gemini being used in robotics?
Absolutely love this question, and think that robotics is one of the most exciting use cases for our Gemini Live models, as well as fine-tuned versions of Gemma. The TL;DR is that we're using the Gemini APIs for things like live conversations (real-time dialogue with robots); triggering tools and on-device models as function calls; and planning robotic actions.
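To make the "tools as function calls" part concrete, here's a hedged sketch using automatic function calling in the google-genai Python SDK (the robot tool below is purely hypothetical):

```python
from google import genai
from google.genai import types

def move_gripper(x: float, y: float, z: float) -> dict:
    """Hypothetical robot tool: move the gripper to a target position in meters."""
    return {"status": "ok", "position": [x, y, z]}

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Passing a plain Python function as a tool lets the SDK handle the
# function-calling round trip automatically.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Move the gripper 10 cm above the block at (0.2, 0.1, 0.0).",
    config=types.GenerateContentConfig(tools=[move_gripper]),
)
print(response.text)
```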
You can learn more about the embodied intelligence work that the DeepMind robotics team is doing with the Gemini APIs and fine-tuned open models (like Gemma) at the links below:
When I think of DeepMind, I think of AlphaGo and other more research-y areas.
How does the organization balance "research" vs. "available for production" offerings?
In addition to driving truly groundbreaking research (ex: AlphaEvolve, AlphaStar, AlphaGo, etc.), DeepMind has expanded its scope in the last ~year to also own many product experiences - the Gemini APIs, AI Studio, and the Gemini App (just to name a few).
We're also increasingly seeing the line between "research" and "product" blur: many new model capabilities inspire products and features, and users' explorations can inspire new frontier model capabilities and improvements. As a person who has worked both in a pure-research org and in pure-product orgs, this blend of the two makes a ton of sense, and leads to much better product experiences for all of our customers.
Regarding a Veo API, and I apologise, I haven't followed this project much, but are you going to make it possible for people to be able to download and run this locally or integrate it with offline stacks at all?
Also, do you have any plans to make any sort of open-source text-to-speech models? I have yet to find something viable for my use case that doesn't sound like a cursed Speak & Spell or cost a fortune in tokens through something like ElevenLabs.
Hi! Great questions! Re: running Veo on-prem: we don't have any concrete plans for that right now. Our priority is delivering a robust and scalable experience through the cloud-based Gemini API. That said, we are in contact with our on-prem teams to better understand the demand for this kind of solution - we know they are working hard to bring Flash and Pro to Google Distributed Cloud, for example.
Re TTS: We definitely know that high-quality open-source TTS models are in demand, and we do understand the need for them. I recommend keeping an eye on what the Gemma model team has been releasing, though we do not have concrete plans we can share at this time.
Thanks for the response! I did end up finding an open-source TTS that works offline, which fits my needs for now. That said, I completely understand the business need for subscriptions. The challenge is that when everyone goes monthly, users hit fatigue: most of us end up picking which tools to keep and which ones to cut when finances get tight.
Personally, I'd happily pay a one-time license fee (say ~$200 CAD) or a fair upgrade price down the road. It's a model a lot of us genuinely miss, and I think it's a way to capture a user base that avoids ongoing subscriptions entirely. In my view, leaving that option off the table risks missing a meaningful segment of developers.
Will the Veo API expose keyframes, camera paths (dolly/orbit), shot length, and temporal consistency controls? Is video-in/video-out editing (masking, inpainting, style transfer) on the roadmap for programmatic workflows?
Yes, we are planning on additional controls for Veo 3, and camera controls have been most in demand. What are the controls important for your use cases?
For streaming vs non-streaming requests, what p50/p95 latencies should we plan for across Flash vs Pro-class models? Any difference between HTTP/2 vs gRPC endpoints, and do you recommend persistent connections for best tail latency?
Thank you for the question!
Google is constantly investing in improving the latency of our Gemini models, so the response time for your inference calls is expected to change. As a way to plan, I would recommend experimenting with different models and taking a look at the model cards associated with each of our models - for example, Gemini 2.5 Flash (with Thinking turned off) is much faster than Gemini 2.5 Flash (with Thinking) or Gemini 2.5 Pro, while still showing strong performance on model evals.
For especially latency-sensitive use cases, I would recommend testing out Gemini 2.5 Flash Lite, as well as some of our smaller, open on-device models (like Gemma 3 270M).
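As a concrete example of the "Thinking turned off" configuration mentioned above, here's a sketch with the google-genai Python SDK (a thinking budget of 0 disables thinking on Gemini 2.5 Flash; exact support varies by model, so treat this as illustrative):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this support ticket in one sentence: ...",
    config=types.GenerateContentConfig(
        # thinking_budget=0 turns off thinking for latency-sensitive calls.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```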
Will the Gemini CLI support a local emulator for tool/function schemas so we can unit-test prompts offline (goldens, snapshot diffs), and then replay the same transcripts against the cloud for parity checks?
This is an awesome idea! While we don't support this at the moment, we're happy to consider it! Gemini CLI is fully open-source, so if you file a feature request with a detailed explanation of how you would expect it to work, we can then prioritize it. If you are super keen, you can even contribute the feature yourself :)
Is there server-side prompt+tool caching or response memoization we can opt into? If so, what's the cache key (model/version, tools, system prompt, attachments, etc.), TTL, and invalidation behavior after model updates?
With extended context enabled, what chunking/windowing strategies do you recommend to keep retrieval precise (e.g., hierarchical summaries, segment embeddings)? Any guidance on time-to-first-token and throughput impacts at 2M tokens?
Any plans for unified, modality-agnostic pricing (normalized 'token equivalents') and hard monthly spend caps with circuit breakers? For audio/video, can we pre-quote cost from media duration/shape before we run?
We definitely recognize the desire for a simpler, unified pricing metric. It's a key area of exploration for us. The current model prices each modality (text, image, audio) by its most direct computational unit, but we are actively investigating how to best abstract this into a more streamlined model for the future.
Monthly spend caps: While the Gemini API itself doesn't have a built-in spend cap, you can implement an effective circuit breaker today using the broader Google Cloud platform and setting up your billing budget.
Regarding pre-quoting costs, the pricing for audio and video is based on duration (e.g., per second or per minute). This means you can easily pre-calculate the cost. Before you make the API call, simply get the duration of your media file and multiply it by the rate listed on our pricing page. This allows you to build cost estimates directly into your application's workflow for full transparency. We are working on improving our billing dashboards to make things easier in the future though!
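A back-of-the-envelope sketch of that pre-quote (the rate below is a placeholder, not a real price; always pull the current numbers from the official pricing page):

```python
# Placeholder rate in USD per second of audio input -- substitute the value
# from the Gemini API pricing page for the model you plan to call.
AUDIO_INPUT_RATE_PER_SECOND = 0.0000275

def estimate_audio_cost(duration_seconds: float) -> float:
    """Pre-quote the audio input cost before making the API call."""
    return duration_seconds * AUDIO_INPUT_RATE_PER_SECOND

# Example: a 95-second clip.
print(f"Estimated cost: ${estimate_audio_cost(95):.6f}")
```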
Can you talk about the role of AI Benchmarks in model development? Are companies intentionally (or unintentionally) designing the models to excel at these benchmarks even if that sacrifices overall usefulness in a broader context?
Our top priority is to design models that work well for users' real-life use cases. We track that "usefulness" through a variety of signals, one of which is AI benchmarks (both private and public), so we track them to the extent that they are useful in providing us with that signal and in communicating it externally.
Can't really speculate on what other companies are optimizing for.
Thank you!
Thank you
Is there a low-latency 'RT' API that supports audio-in/audio-out with barge-in, partial tool calls, and server events/WebRTC? What is the expected end-to-end latency budget for full duplex speech interactions?
The Live API is exactly what you're looking for. It's our bidirectional streaming API designed for real-time, conversational AI and is powered by our latest native audio models.
This means it handles audio-in and audio-out directly, and can even process real-time video.
As for latency, we're targeting an end-to-end budget of 700ms. We're aiming for this to keep the interactions feeling natural, much like the normal pauses in a human conversation. We'd love for you to give it a try!
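A minimal connection sketch with the google-genai Python SDK (the model id and config here are illustrative; see the Live API docs for current model names and the full audio setup):

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

async def main():
    # TEXT keeps the sketch simple; switch to AUDIO for speech-in/speech-out.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001",  # illustrative Live API model id
        config=config,
    ) as session:
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text="Hello!")]),
            turn_complete=True,
        )
        # Stream the model's reply as it arrives.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```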
Do you publish first-party eval suites and regression dashboards with per-release deltas for coding, reasoning, safety, and multimodal? Can we pin a model+patch level and receive advance deprecation notices with auto-evals on our private test sets?
Can developers tune safety thresholds per category (e.g., self-harm, medical, IP) and attach allow/deny lists or regex/DSL guardrails that run before/after model output? Any 'safe sampling' modes that reduce refusal rates without policy violations?
The content safety filters are available for developers and are turned off by default. On the Gemini API, we do not provide an ability to tune any other safety thresholds at the moment because they are directly tied to our usage policies.
For larger enterprise use cases, Vertex AI does offer an option to turn off some filters for trusted and vetted customers that we know will continue to adhere to safety policies. Customers can usually apply using a form and there is a committee that reviews all requests and a team that manages allowlisting.
By and large, how does your team think about AI slop, dead internet theory, misinformation, etc.?
Can we enforce zero-retention and region-locked processing (e.g., EU-only) per project? What compliance envelopes (SOC 2, ISO, HIPAA-adjacent) are supported, and how do we audit that our traffic stayed in-region?
The Gemini API does not offer this yet, but for region-locking and zero-retention you can use Vertex. If you build with our GenAI SDKs you can easily migrate.
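As a sketch of that migration with the google-genai SDK (the project ID and region below are placeholders), the change is mostly in how the client is constructed:

```python
from google import genai

# Gemini Developer API client:
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Same SDK, pointed at Vertex AI with a pinned EU region instead:
vertex_client = genai.Client(
    vertexai=True,
    project="your-gcp-project",  # placeholder project ID
    location="europe-west4",     # placeholder EU region
)

# The generate_content calls themselves stay the same for both clients.
```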
Some helpful links:
Imagine a wildfire response team with spotty connectivity using body-cams and drones. They need offline scene understanding (smoke/plume segmentation), on-device speech translation for evacuees, and syncing to cloud when a satellite link appears. What would a Gemini-based architecture look like (model choices, on-device vs cloud split, failover), and what reliability/validation metrics would you commit to for life-critical use?
Do you offer multilingual embeddings with consistent cross-lingual alignment? What dimensions/similarity metrics are recommended, and how stable are vector spaces across model updates (backward compatibility guarantees)?
Do you offer multilingual embeddings with consistent cross-lingual alignment?
Yes. You may find some public evals in the tech report and the blog post:
What dimensions/similarity metrics are recommended?
We report evals and different dimensions in our tech report (see link above), and we use dot product.
How stable are vector spaces across model updates (backward compatibility guarantees)?
Vector spaces are not backward compatible.
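As a small illustration of the dot-product scoring mentioned above (a sketch with the google-genai Python SDK; the embedding model id is an assumption, so check the docs for the current name):

```python
import numpy as np
from google import genai

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

result = client.models.embed_content(
    model="gemini-embedding-001",  # check the docs for the current model id
    contents=["The weather is lovely today", "Il fait très beau aujourd'hui"],
)

vectors = [np.array(e.values) for e in result.embeddings]
# Cross-lingual similarity scored with a dot product, as recommended above.
print(float(np.dot(vectors[0], vectors[1])))
```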
Can we set a seed that yields deterministic outputs across regions and hardware targets? Are logprobs/logits available for all models, and do you publish drift/change logs so we can reproduce results after minor model revisions?
The Gemini API supports a seed and logprobs; you'll find more info in the API reference docs:
ai.google.dev/api/generate-content
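A quick sketch of both knobs with the google-genai Python SDK (field names per the API reference above; logprobs support can vary by model, so treat this as illustrative):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Name three prime numbers.",
    config=types.GenerateContentConfig(
        seed=42,                 # best-effort reproducibility across runs
        response_logprobs=True,  # return log probabilities for chosen tokens
        logprobs=3,              # also return top-3 candidate tokens per step
    ),
)

print(response.text)
print(response.candidates[0].logprobs_result)
```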
For more advanced model skew and drift monitoring you can switch to Vertex and use Vertex model monitoring:
cloud.google.com/vertex-ai/docs/mo...
What are the semantics for parallel tool calls (fan-out/fan-in), backpressure, and partial tool result streaming? Can we cap concurrency per request and get cost/latency attribution per tool invocation?
What fine-tuning paths exist today (SFT, LoRA/PEFT, preference tuning like DPO)? Is there a built-in eval harness in AI Studio to A/B base vs tuned models with statistical significance and dataset versioning?
Hey there, thank you for the question! The options you have for fine-tuning depend on both model (ex: Gemini vs. Gemma) and platform (ex: AI Studio vs. the Vertex AI platform). We recently dropped fine-tuning support for the Gemini APIs in AI Studio, and we do not have a built-in eval harness via AIS; however, there is a built-in eval framework in Vertex AI that might be useful for your use cases.
Related: the Gemini APIs team is partnering with open-source evals providers, like Promptfoo, to have more robust support for the Gemini APIs and all of its features; and with open-source fine-tuning frameworks (like Unsloth and Hugging Face) to offer fine-tuning support for our Gemma open model family.
I am using the 'Gemini 2.5 Flash' model in a translation app project, and everything works well. However, when I switch to the 'Gemini 2.5 Flash-Lite' model to save on costs, the latter model tends to repeat phrases (it translates the same phrase more than once, creating duplications) and is not as accurate as the other model. I hope you can fix this in the next version.
Can you help breakdown the general terms of "AGI" vs. "Super AGI" vs. "Super Intelligence" vs. "Recursive Self Improvement" vs. "Singularity" ?
I feel like there's a lot of crossover and ambiguity.
Also, any predictions on what the future looks like in this regard?
Thanks. The Veo AI tool for videos is perhaps the best right now. They've done a great job.
Semi-random: I have friends in my personal life who criticize my use of AI for environmental reasons (water usage, etc.). My sense is that most of those statistics are overblown, especially in context of other regular activities.
But my question is: do you have folks in your personal life who are in any way critical of your line of work? How do you deal with that?
Do you support mask-based edits, reference-style adapters, or LoRA-style image adapters? Will outputs ship with C2PA provenance by default, and can developers opt in/out at the request level?
We do not support mask-based edits, reference-style adapters, or LoRAs - but I recommend giving Gemini 2.5 Flash Image a try in AI Studio to see if the model is capable of addressing your use cases.
Our models are compliant with provenance requirements, and we use SynthID as a digital watermark.
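For reference, generating an image with Gemini 2.5 Flash Image via the google-genai Python SDK looks roughly like this (the model id is an assumption on my part; check AI Studio for the current one):

```python
from google import genai

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",  # illustrative model id
    contents="A photorealistic close-up of a nano banana on a picnic table.",
)

# Image parts come back as inline data alongside any text parts.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("banana.png", "wb") as f:
            f.write(part.inline_data.data)
```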
Are you allowed to use non-Google models in your day-to-day work at Google? Like can you code with Anthropic LLMs or is that frowned upon?
Other models like Anthropic's are accessible via Vertex, and exceptions may be made for specific tools if you request them, but we now primarily use Gemini 2.5 Pro for coding.
Thanks, makes sense. Do y'all get access to early-release models for in-house usage?
Yes, that's one benefit of being on our team.
What's your org structure like, and how has it changed over time?
Google AI Studio started as a Google Labs project, but our API was so successful with developers that we moved to Google Cloud before finally landing in Google DeepMind. I love that we are able to be closer to research and this move really sped up our ability to get the newest models to developers.
We are a small, scrappy team that has a tight-knit relationship between PM, model teams, eng, DevRel, and GTM teams. Because there are so few of us, we have to work as a single unit to bring new models to the world quickly, but also safely and successfully. I love the pace we work at, though I'm definitely taking a little vacation after #nanobanana
We want to do a partnership with them. Would you like to be our partner? But we will not pay.
How can we use the Gemini APIs in our websites, and what is the limit of the free Gemini API?
Can Lyria export multitrack stems, tempo maps, and/or MIDI for downstream DAWs? What are the licensing/indemnity terms for commercial use, and are there guardrails around named-artist style emulation?
The current experimental version is built for real-time streaming, so it doesn't support exporting multitrack stems or MIDI for DAWs just yet. However, we are hard at work on our next generation of music models and APIs, so definitely stay tuned!
Regarding artist styles, we do have guardrails in place. To avoid generating pieces that are too similar to copyrighted content, the model is designed to filter out prompts that explicitly mention artists by name.
Do you offer a compatibility shim for common OpenAI patterns (JSON mode, tool schema, vision/video inputs) and a guide on behavioral differences to avoid edge-case regressions during migration?
Thanks for the question! Yes, we do have an OpenAI compatibility layer that supports most of those patterns; see here: ai.google.dev/gemini-api/docs/openai.
While we actively extend its features, it may still have limitations. We recommend checking the docs and joining the Gemini developer forum for more questions around that.
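The gist of the compatibility layer is pointing the OpenAI SDK at the Gemini endpoint (a sketch; see the docs link above for which features are supported):

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Explain JSON mode in one sentence."}],
)
print(response.choices[0].message.content)
```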
How much work does AI do in your day-to-day and how has that changed over time?
Can we have GeminiApp in Google Apps Script soon?
Thanks for the question!
You can already use the Gemini APIs and Gemini in AI Studio to generate Apps Script code, which you can then pull into Google Workspace products (like Sheets). The Google Cloud team also has a few codelabs showing how to use the Gemini APIs with Apps Script (example).
How do you handle temporal reasoning (e.g., stale facts, embargoed knowledge, future-dated content)? Is there a notion of time-aware decoding or decay that prevents confident answers about outdated sources?
When models are hot-patched, will you publish reproducible โmodel-diffโ artifacts (weight deltas, eval deltas) and support shadow deployments so teams can canary traffic with automatic rollback on regression?
What's your roadmap for robustness against multimodal adversarial attacks (audio perturbations, image patches, typographic illusions, UI overlays) and how can developers fuzz these systematically pre-launch?
Great question. Think of our approach as a four-part defense plan. First, we ensure our training data is clean and secure. Second, we design the model's core architecture to be inherently skeptical, especially where text, images, and audio mix. Third, we constantly put the model through a "sparring gym" with adversarial training, teaching it to recognize and ignore attacks. Finally, our own teams are always trying to break it (red teaming) before it ever gets to you. To help with reactive safety, we are constantly shipping new classifiers to help us catch any bad actors across different areas.
For developers, the best strategy is to think like an attacker. Before you launch, try to fool your own app. Systematically fuzz it by adding subtle noise to audio commands, sticking weird patches on images, hiding tiny text prompts in pictures, or creating fake UI overlays. Automating these kinds of tests is the most effective way to find and patch up these multimodal weak spots before they cause real trouble.
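As one concrete example of that kind of fuzzing, here's a tiny sketch that perturbs an audio command with white noise at a target SNR before replaying it against your app (numpy only; the SNR values and pass/fail thresholds are up to you):

```python
import numpy as np

def add_white_noise(audio: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Return a copy of a mono audio signal perturbed with white noise at snr_db."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

# Replay noisy variants of each voice command through your pipeline and check
# that transcriptions / tool calls stay within expected bounds.
clean_command = np.random.randn(16_000)  # stand-in for 1 s of 16 kHz audio
for snr in (30.0, 20.0, 10.0):
    fuzzed = add_white_noise(clean_command, snr)
```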
What guidance (and product features) do you have for education that embraces AI: authentic-assessment patterns, provenance/watermarking for student work, and guardrails that promote learning over shortcutting?
For enterprise privacy, can we bring our own KMS/HSM with envelope encryption, receive zero-knowledge audit proofs for no-retention processing, and run privacy red-teaming that yields machine-readable findings?
You can implement most of these enterprise privacy controls with Google's AI and cloud services:
Will AI Studio support a true third-party ecosystem (tools, datasets, inspectors) with versioning, revenue share, code signing, and policy reviewโso developers can monetize capabilities safely inside other teams' apps?
Seemingly every time I open my feed on X, Reddit, DEV, etc., I'm hit with a flurry of posts that make me feel like I'm being "left behind" by the latest and greatest in AI Land.
How do you deal with the dizzying rate of change?
You're not alone here; even we can't keep up with everything. I recommend following a few selected newsletters.
For me, news.smol.ai/ and X help me keep up to date at a high level, and then I dive deeper into topics that interest me and/or are connected to my work.
When did you decide you wanted to work in this field? (For anyone on the team)
My "aha" moment really started back when I was a video game producer, seeing how we could build complex, interactive systems to help players be more effective and creative. My core passion has always been making people more productive, and I realized Gen AI is the ultimate tool to supercharge that for everyone.
When I worked in EdTech, I saw AI's potential to democratize education and make learning deeply personal and engaging, which always leads to better outcomes.
Getting to work on this team and focus on safety has been the most rewarding part, as it feels like I'm helping make this powerful technology a positive and safe force for the world.
Will you integrate formal methods (Z3/SMT, Coq/Lean proof hints) so codegen can emit verifiable contracts and proofs, not just tests, and surface failed proof obligations back to the prompt?
Can Gemini expose a calibrated abstention mode that explicitly says โI don't know,โ returns confidence intervals, and emits pointers for how a caller could reduce uncertainty?
That sounds like a Reddit AMA (Ask Me Anything) announcement. It means the Google DeepMind team is inviting people to ask them questions about their projects, like Gemini (Google's advanced AI model), Google AI Studio (a platform for building with Gemini), and other research or tools they're developing.
How are accessibility features (screen readers, captions, image alt-text generation, haptics) designed and tested for developers and end users with diverse disabilitiesโand can we programmatically assert accessibility budgets in AI Studio projects?
Could you ship end-to-end "fact-chain" explanations (input → retrieval → intermediate steps → output) for non-trivial answers across text, code, image, and audio, with a public spec for how those traces are generated and verified?
Hello Google DeepMind team!
As AI applications scale from MVP to serving thousands of users, performance optimization becomes critical. What strategies does your team suggest for optimizing latency and throughput when deploying Gemini API in production environments with high concurrency?
Are there specific caching strategies, request batching techniques, or infrastructure patterns that work particularly well with your API? Also curious about monitoring and debugging best practices for production Gemini integrations.
Hey team! Really impressed with Gemini's multi-modal capabilities.
How does Google DeepMind approach balancing model complexity and performance in Gemini when supporting multi-modal inputs like video, images, and text simultaneously? Are there specific optimization techniques or architectural decisions that help maintain response speed while processing diverse data types?
Curious about this from both a technical implementation perspective and developer experience standpoint.
Hi Google DeepMind team! 👋
As developers increasingly adopt structured approaches to AI integration (like JSON prompting for better consistency), I'm curious about your perspective on API design patterns that maximize reliability.
What are the best practices your team recommends for integrating Gemini API into existing developer workflows to ensure scalability and reliability, especially for multi-modal AI applications? Are there specific patterns or architectural approaches you've seen work particularly well in production environments?
Also excited to try the extended context features in AI Studio - thanks for making these tools accessible to the developer community!
Is AI the way to the future, or is it dangerous for us, like Universal Paperclips? Maybe one day...
A dev from Burgundy, France
Why do most metrics and surveys say that AI is making us only 20% more productive? Even if we don't have physical AI and agents are trapped in software, the fact that they can generate and critique code at the speed of light, coupled with their humongous knowledge base and context window, should mean orders-of-magnitude greater productivity gains. Besides, why is the AI age not showing up in economic data, i.e. GDP? What do you think are the bottlenecks to unlocking the economic and productivity benefits of AI? Why isn't it living up to expectations?
AI studio
I think liking comments is not enough @jess
Now you are not liking comments, and why is no one replying to my comments? @jess @vivjair @kate_olszewska @alisa_fortin @dynamicwebpaige
Would the Gemma 3 270M model be sufficient for training a DevOps-specific LLM?
What's one capability Gemini absolutely can't do today but should by 2026?
Please, how do I work for you?
What do you think: is frontend a good way to make money?