
A day-by-day roadmap for testers ready to become AI engineers.
By Himanshu Agarwal · HimanshuAI · Testing · Engineering · Education
🌐 himanshuai.com · 💼 LinkedIn · 🎯 1:1 on Topmate · ✦ Free Substack · 📚 Playbook Store
Meet your guide
HimanshuAI helps software testers move into GenAI through practical playbooks, a daily newsletter, and 1:1 sessions — built around three things: Testing · Engineering · Education.
This roadmap is the free map. The playbooks and bundles are the guided route. Everything is one link away.
📩 Daily, free: one email a day on the SDET → GenAI shift — a tool, an eval trick, a small challenge. Subscribe at himanshuai.substack.com.
Why now: you already think like an AI engineer
Every good SDET lives in edge cases, reproducibility, and "prove it works." That's exactly what GenAI teams lack — because language models are non-deterministic, and someone has to make them trustworthy.
| 1 · Head start | Python, CI, debugging, and test design already transfer. You're not starting at zero. |
| 2 · The gap | LLMs fail quietly. Evaluation and guardrails are the missing discipline — your home turf. |
| 3 · The move | Reframe testing as "LLM evaluation" and you become the person every AI team needs. |
💡 The one-line thesis: GenAI doesn't need more people who can call an API. It needs people who can tell when the output is wrong — and you've done that your whole career.
📗 Recommended playbook: GenAI SDET Career Pack — Complete 4-Book Collection
How the 30 days work
One focused hour a day. Each week ends with something working — by day 30 you have a deployed project and an evaluation suite that proves it.
| Week | Days | Theme | Focus |
|---|---|---|---|
| Week 1 | 1–7 | Foundations | Python refresh, ML intuition, how LLMs work |
| Week 2 | 8–15 | Core GenAI | Prompting, model APIs, RAG on your own data |
| Week 3 | 16–23 | Advanced & Agents | Vector DBs, agents, Model Context Protocol |
| Week 4 | 24–30 | Evaluate & Ship | LLM testing, governance, deploy a real project |
Week 1 · Foundations (Days 1–7)
Get fluent before you get fancy. These three skills make everything after them feel easy.
1. Python, done properly — Days 1–2
Functions, typing, async, virtual envs, and clean structure. Write it like you'd want to test it.
Tools: Python 3.12 · uv · pytest · VS Code
2. AI / ML intuition — Days 3–4
Features, training vs inference, overfitting, and what embeddings really are.
Tools: NumPy · pandas · scikit-learn · Kaggle
3. LLM fundamentals — Days 5–7
Tokens, context windows, temperature, and the transformer idea at a working level.
Tools: Hugging Face · tiktoken · 3Blue1Brown · Karpathy
✅ Smallest next step: tokenise a sentence with `tiktoken` and watch the same words become different token counts. That's your first "aha".
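Temperature from item 3 is easier to feel than to read about. The sketch below is the textbook softmax-with-temperature formula, not any provider's actual sampler, and the three logits are made-up numbers for illustration. Low temperature piles probability onto the top token; high temperature spreads it out.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw next-token scores into probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical scores for three candidate next tokens (assumed values).
logits = [2.0, 1.0, 0.1]
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Run it and compare the first probability in each row: that shift is why the same prompt can give different answers from run to run.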
📘 Recommended playbook: GenAI SDET Career Pack — Complete 4-Book Collection
Week 2 · Core GenAI skills (Days 8–15)
Now you build. Talk to models with intent, wire up the real APIs, and feed them your own knowledge.
4. Prompt engineering — Days 8–9 · ⭐ SDET advantage
Treat prompts like test specs — clear inputs, expected outputs, edge cases, versioning.
Tools: Anthropic Console · OpenAI Playground · LangSmith
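Here is one way "prompts as test specs" could look in code. Everything in this sketch is hypothetical: `PromptSpec`, `PromptCase`, and `fake_model` are names invented for illustration, and `fake_model` is a stand-in you would swap for a real API call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptCase:
    name: str
    variables: dict[str, str]
    check: Callable[[str], bool]  # True when the model output is acceptable


@dataclass(frozen=True)
class PromptSpec:
    version: str
    template: str
    cases: tuple[PromptCase, ...] = ()

    def render(self, variables: dict[str, str]) -> str:
        return self.template.format(**variables)


def run_spec(spec: PromptSpec, model: Callable[[str], str]) -> dict[str, bool]:
    """Run every case through the model and record pass/fail by case name."""
    return {c.name: c.check(model(spec.render(c.variables))) for c in spec.cases}


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call (assumption, for illustration only).
    return "severity: high" if "crash" in prompt.lower() else "severity: unknown"


spec = PromptSpec(
    version="v1",
    template=(
        "Classify the severity of this bug report as low, medium, or high.\n"
        "Report: {report}\n"
        "Answer exactly as 'severity: <level>'."
    ),
    cases=(
        PromptCase("typical_crash", {"report": "App crashes on login"},
                   lambda out: out.strip().lower() == "severity: high"),
        # Edge case: an empty report must still follow the output format.
        PromptCase("empty_report", {"report": ""},
                   lambda out: out.lower().startswith("severity:")),
    ),
)

results = run_spec(spec, fake_model)
print(spec.version, results)
```

The `version` field is the point: when you edit the template, bump it, rerun the cases, and you have a diffable history of what changed and what broke.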
5. LLM APIs in code — Days 10–11
Call models, stream, retry, and use tool/function calling with real reliability instincts.
Tools: Claude · OpenAI · Gemini · Groq
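"Real reliability instincts" mostly means retrying the right failures and giving up on the rest. The sketch below is provider-agnostic and uses only the standard library. `TransientError` and `flaky_call` are hypothetical names; with a real SDK you would catch that SDK's own rate-limit and timeout exceptions instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limits, timeouts)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # budget spent: surface the failure to the caller
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))  # jitter avoids retry storms
    raise AssertionError("unreachable")


# Simulated model call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated rate limit")
    return "ok"


# Inject a no-op sleep so the demo finishes instantly.
result = call_with_retry(flaky_call, sleep=lambda seconds: None)
print(result, attempts["count"])
```

Passing `sleep` as a parameter is a testability choice: your unit tests can verify the retry logic without actually waiting.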
6. RAG — grounding models — Days 12–15
Chunk, embed, retrieve, answer from your own docs. The pattern behind most useful AI products.
Tools: LangChain · LlamaIndex · Unstructured
✅ Smallest next step: build a 40-line RAG over one PDF. Ask it a question only that PDF can answer. When it cites the right page, you've built your first AI feature.
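Before reaching for a framework, it helps to see chunk, embed, retrieve in plain Python. This is a toy under loud assumptions: the "embedding" is a bag-of-words count rather than a real embedding model, the three pages are invented text, and the final prompt is only printed, not sent to a model.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, index: list[tuple[int, str]], k: int = 1) -> list[tuple[int, str]]:
    """Return the k (page, chunk) pairs most similar to the question."""
    query = embed(question)
    return sorted(index, key=lambda item: cosine(query, embed(item[1])), reverse=True)[:k]


# Invented manual pages, keyed by page number so answers can cite a source.
pages = {
    1: "To reset the router hold the reset button for ten seconds until the light blinks.",
    2: "The warranty covers hardware defects for two years from the purchase date.",
    3: "Firmware updates are installed from the admin panel under the maintenance tab.",
}
index = [(page, piece) for page, text in pages.items() for piece in chunk(text)]

question = "What does the warranty cover and for how many years?"
top = retrieve(question, index)
context = "\n".join(f"[page {page}] {piece}" for page, piece in top)
prompt = f"Answer using only the context and cite the page.\n{context}\nQuestion: {question}"
print(prompt)
```

Keeping the page number next to every chunk is what makes citations possible later, so carry it through from the first line of your real build.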
📙 Recommended playbook: RAG for SDETs Pack — 7 Books
Week 3 · Advanced & agents (Days 16–23)
Move from single calls to systems — searchable memory, tool-using agents, and MCP.
7. Vector databases — Days 16–17
Store and search embeddings at scale — similarity search, metadata filters, when a file beats a DB.
Tools: Chroma · Pinecone · Qdrant · pgvector
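A vector database does two jobs: rank by similarity and narrow by metadata. A brute-force sketch of both fits in a few lines, which is also roughly the case where a file beats a DB. `Record`, `search`, and the three-dimensional vectors below are hypothetical, made up for illustration; this is not the API of Chroma, Pinecone, Qdrant, or pgvector.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    id: str
    vector: list[float]
    metadata: dict[str, str]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    records: list[Record],
    query: list[float],
    k: int = 3,
    where: Optional[dict[str, str]] = None,
) -> list[str]:
    """Filter by exact-match metadata first, then rank survivors by cosine similarity."""
    candidates = [
        r for r in records
        if not where or all(r.metadata.get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine(query, r.vector), reverse=True)
    return [r.id for r in ranked[:k]]


# Tiny hand-written vectors (assumed values); real embeddings have hundreds of dimensions.
records = [
    Record("login-guide", [0.9, 0.1, 0.0], {"product": "web"}),
    Record("login-faq", [0.8, 0.2, 0.1], {"product": "mobile"}),
    Record("billing-guide", [0.1, 0.9, 0.2], {"product": "web"}),
]
query = [1.0, 0.0, 0.0]

print(search(records, query, k=2))
print(search(records, query, k=2, where={"product": "mobile"}))
```

This scans every record on every query, which is fine for a few thousand chunks. The tools listed above earn their place when that scan gets too slow or the data no longer fits in memory.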
8. AI agents & frameworks — Days 18–20
Let models plan, call tools, and loop — and learn how to keep them on rails.
Tools: LangGraph · CrewAI · AutoGen
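Strip away the framework and an agent is a loop: ask the model what to do, run the tool, feed the result back, repeat. "On rails" means a tool allowlist and a step budget. In the sketch below, `run_agent`, `scripted_policy`, and `lookup_bug` are hypothetical names, and the scripted policy stands in for a real model deciding the next action.

```python
from typing import Callable

Action = tuple[str, str, str]  # ("call", tool_name, argument) or ("finish", answer, "")
History = list[tuple[str, str]]  # (tool_name, observation) pairs


def run_agent(
    policy: Callable[[str, History], Action],
    tools: dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 5,
) -> str:
    """Plan, call tools, and loop until the policy finishes or the budget runs out."""
    history: History = []
    for _ in range(max_steps):
        kind, name, argument = policy(goal, history)
        if kind == "finish":
            return name
        if name not in tools:
            # Rail 1: only allowlisted tools run; the refusal is fed back as an observation.
            history.append((name, f"error: tool '{name}' is not allowed"))
            continue
        history.append((name, tools[name](argument)))
    # Rail 2: a hard step budget stops runaway loops.
    return "stopped: step budget exhausted"


def scripted_policy(goal: str, history: History) -> Action:
    # Stand-in for a model call (assumption): look the bug up once, then summarise.
    if not history:
        return ("call", "lookup_bug", "BUG-42")
    return ("finish", f"Summary: {history[-1][1]}", "")


tools = {"lookup_bug": lambda bug_id: f"{bug_id} is a login crash on Android"}
answer = run_agent(scripted_policy, tools, "Summarise BUG-42")
print(answer)
```

Both rails are testable without a model, which is exactly the kind of check an SDET would write first.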
9. MCP — Model Context Protocol — Days 21–23
The open standard for connecting models to tools and data. Build one MCP server.
Tools: MCP SDK · Claude · Tool servers
📕 Recommended playbook: MCP Mastery Pack
Week 4 · Evaluate & ship (Days 24–30)
The finish line. Prove your system behaves, keep it safe, and put it online.
10. LLM testing & evaluation — Days 24–27 · ⭐ Your superpower
Build eval sets, score outputs, catch regressions, red-team for safety. The most in-demand GenAI skill.
Tools: Ragas · DeepEval · promptfoo · Giskard
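A first eval suite can be tiny: a golden set, a scorer, and a gate. The sketch below uses a crude keyword rubric instead of the metrics that Ragas or DeepEval provide, and every question, answer, and function name in it is invented for illustration.

```python
from typing import Callable

# Hypothetical golden set: each question lists phrases a good answer must contain.
GOLDEN_SET = [
    {"question": "What is the refund window?", "must_include": ["30 days"]},
    {"question": "Which plans include SSO?", "must_include": ["enterprise"]},
    {"question": "Is there a free tier?", "must_include": ["yes", "free"]},
]


def score_case(output: str, must_include: list[str]) -> float:
    """Fraction of required phrases found in the output (case-insensitive)."""
    text = output.lower()
    return sum(phrase.lower() in text for phrase in must_include) / len(must_include)


def evaluate(system: Callable[[str], str], golden_set: list[dict]) -> float:
    """Average score of a system across the whole golden set."""
    scores = [score_case(system(case["question"]), case["must_include"]) for case in golden_set]
    return sum(scores) / len(scores)


def regression_gate(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """Pass only if the candidate is no worse than the baseline, within tolerance."""
    return candidate >= baseline - tolerance


# Canned answers standing in for two prompt versions (assumed data).
BASELINE_ANSWERS = {
    "What is the refund window?": "Refunds are accepted within 30 days.",
    "Which plans include SSO?": "SSO is part of the Enterprise plan.",
    "Is there a free tier?": "Yes, there is a free tier.",
}
CANDIDATE_ANSWERS = {**BASELINE_ANSWERS, "Is there a free tier?": "We offer several plans."}

baseline_score = evaluate(BASELINE_ANSWERS.__getitem__, GOLDEN_SET)
candidate_score = evaluate(CANDIDATE_ANSWERS.__getitem__, GOLDEN_SET)
gate_passed = regression_gate(baseline_score, candidate_score)
print(baseline_score, round(candidate_score, 3), gate_passed)
```

In CI you would exit non-zero when the gate fails, so a prompt change that drops the score blocks the merge like any other failing test.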
Deploy your project — Days 28–30
Wrap it in an API, add a UI, containerise, and ship a public link. A live demo beats ten certificates.
Tools: FastAPI · Streamlit · Docker · HF Spaces
📚 Recommended bundles:
- The Complete AI Testing & GenAI Engineering Master Bundle (18 Books)
- AI Governance & Compliance Pack — All 4 Playbooks
The SDET superpower: why evaluation matters most
Everyone can prompt a model. Almost no one can prove it behaves. That proof is LLM evaluation — and it maps one-to-one to what you already do.
![The SDET → GenAI skill bridge]
| You already know | It becomes |
|---|---|
| Test case design (inputs, outputs, boundaries) | Eval dataset design (golden questions, rubrics) |
| Regression testing | Prompt / model regression scoring |
| Flaky test triage (non-determinism) | LLM variance handling (temperature, pass-rate stats) |
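The last row of the table deserves a worked example. With a non-deterministic system, one green run proves little, so you run the same check many times and reason about the pass rate. The sketch below pairs a raw pass rate with a Wilson score lower bound at roughly 95% confidence. The flaky check is simulated with a seeded random generator, and the 0.8 release threshold is an arbitrary assumption.

```python
import math
import random
from typing import Callable


def count_passes(trial: Callable[[], bool], runs: int) -> int:
    """Run the same check `runs` times and count how often it passes."""
    return sum(1 for _ in range(runs) if trial())


def wilson_lower_bound(passes: int, runs: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a pass rate (z=1.96 is about 95%)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = passes / runs
    centre = p + z * z / (2 * runs)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * runs)) / runs)
    return (centre - margin) / (1 + z * z / runs)


# Simulated LLM check that passes about 90% of the time (assumption).
rng = random.Random(7)


def flaky_check() -> bool:
    return rng.random() < 0.9


runs = 50
passes = count_passes(flaky_check, runs)
rate = passes / runs
floor = wilson_lower_bound(passes, runs)
release_ok = floor >= 0.8  # hypothetical threshold; pick one your team can defend
print(f"pass rate {rate:.2f}, lower bound {floor:.2f}, release_ok={release_ok}")
```

Gating on the lower bound rather than the raw rate keeps a lucky streak on a small sample from shipping a regression.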
🎯 Position yourself as an AI Quality Engineer or LLM Evaluation Engineer: titles that barely existed two years ago and now show up on serious AI teams' hiring plans.
Your 30-day calendar
One hour a day. By day 30 you ship.
![The 30-day calendar]
Week 1 — Foundations: Day 1 Python setup · 2 Python depth · 3 ML basics · 4 scikit-learn · 5 Tokens · 6 Transformers · 7 HF intro
Week 2 — Core GenAI: Day 8 Prompting · 9 Prompt specs · 10 First API · 11 Tool calling · 12 Embeddings · 13 Build RAG · 14 RAG eval · 15 Ship RAG
Week 3 — Advanced: Day 16 Vector DB · 17 Retrieval · 18 Agent 101 · 19 LangGraph · 20 Guardrails · 21 MCP intro · 22 MCP server · 23 Integrate
Week 4 — Evaluate & Ship: Day 24 Eval sets · 25 Ragas · 26 Regression · 27 Red-team · 28 API + UI · 29 Dockerize · 30 Deploy 🚀
📌 Rule of thumb: one hour a day for 30 days beats a heroic weekend that never repeats. Consistency is the whole strategy.
Toolbox & free resources
Foundation: Python · uv · pytest · Jupyter
Models & APIs: Claude · OpenAI · Gemini · Ollama
Orchestration: LangChain · LlamaIndex · LangGraph · MCP
Vector & retrieval: Chroma · Pinecone · Qdrant · pgvector
Evaluation ⭐: Ragas · DeepEval · promptfoo · Giskard
Ship it: FastAPI · Streamlit · Docker · HF Spaces
Free learning resources
- Neural Networks: Zero to Hero — Andrej Karpathy's free video course
- 3Blue1Brown — the neural-network & transformer visual series
- Hugging Face LLM Course & docs — hands-on, free, up to date
- "Attention Is All You Need" (Vaswani et al., 2017) — the transformer paper
- Provider docs — Anthropic, OpenAI, and Google model documentation
- Eval docs — Ragas, DeepEval, and promptfoo getting-started guides
Three builds that get you hired
Each fits inside the 30 days and shows off the exact skills — especially evaluation — that AI teams screen for.
- Docs Q&A assistant (RAG) — chat over a codebase or manual, with citations and a small accuracy eval that proves it works.
- Prompt regression harness (Eval ⭐) — a CLI that scores prompt versions against a golden set and fails the build on regression. Pure SDET energy.
- Automated triage agent — reads bug reports, tags severity, drafts a reproduction — with guardrails you designed.
Then write it up: one clear post per build — the problem, the approach, how you measured success. Ship the link.
💬 Interview tip: lead with build #2. "I built an evaluation harness for LLM outputs" makes AI hiring managers lean in — it's the problem they're living with right now.
Recommended: The Enterprise LLM Engineering Vault
When you're ready to go from "I finished the roadmap" to "I build production LLM systems," this is the flagship playbook set — architecture, deployment, evaluation, and governance for enterprise-grade GenAI.
![The Enterprise LLM Engineering Vault]
The H·AI Playbook Store
Every bundle maps to a part of this roadmap.
| Bundle | Link |
|---|---|
| The Complete AI Testing & GenAI Engineering Master Bundle (18 Books) | Get it → |
| The AI Test Engineering Vault — 9-in-1 SDET→GenAI Bundle (2026) | Get it → |
| RAG for SDETs Pack — 7 Books | Get it → |
| AI Test Automation Pack — 4-Book Complete Bundle | Get it → |
| MCP Mastery Pack | Get it → |
| AI Governance & Compliance Pack — All 4 Playbooks | Get it → |
| GenAI SDET Career Pack — Complete 4-Book Collection | Get it → |
| The Enterprise LLM Engineering Vault | Get it → |
Let's connect
The gap between a tester and an AI engineer isn't talent. It's starting day one — and not stopping.
You have the 30-day map, the tools, and the one skill the whole field wants. Open your editor and begin.
- 🌐 Website: himanshuai.com
- 💼 LinkedIn: linkedin.com/in/himanshuai
- 🎯 1:1 consulting (Topmate): topmate.io/himanshuai
- ✦ Free newsletter (Substack): himanshuai.substack.com
- 📚 Playbook Store (Gumroad): himanshuai.gumroad.com
© HimanshuAI · Testing · Engineering · Education · himanshuai.com



