I went through 50+ AI resources so you don't have to. Here are the 18 that actually matter, organized by level, graded by impact, with a clear list of what to skip.
84% of developers are using or planning to use AI coding tools. O'Reilly reports prompt engineering interest surged 456%. And their latest analysis paints an even starker picture: up to 90% of software engineers now use AI in their coding workflow.
But most engineers are winging it. You install Copilot, try a few prompts, maybe watch a YouTube tutorial, and suddenly you're not sure if you're behind or ahead. There is no clear curriculum—just a firehose of noise and tools competing for your attention. Here is the reality: The gap between "uses Copilot sometimes" and "architects AI solutions strategically" is exactly where careers will diverge.
This post is the map I wish I had. I spent the last few weeks grinding through 50+ resources (courses, books, docs, and papers) and filtered them down to the 18 that actually move the needle. Whether you're a junior developer getting started or a senior engineer figuring out what to learn next, this is your definitive starting line.
What We're Covering
Who is this for: Any software professional (backend, frontend, infra, data, QA, security, leadership) at any experience level.
What you'll walk away with: A graded learning path across 4 foundational AI domains, with 18 curated resources ranked by priority.
Time investment: ~20 min read | 30-50 hours to work through everything
The 4 Domains Every Engineer Needs
Before specific tools or frameworks, there are four foundational areas that every engineer needs some competence in. Think of these as the load-bearing walls of a house. The role-specific stuff ("AI for DevOps" or "AI for security") gets built on top.
Domain 1: Prompt Engineering
In plain terms: Prompt engineering is the skill of giving clear, structured instructions to AI models so they give you useful, reliable results. It's not just "chatting with ChatGPT." It's a real discipline.
Think of it like writing a really good ticket for a contractor. The more specific and structured your instructions, the better the output. Vague input = vague output.
Why you need it: Every AI tool you use (Copilot, Cursor, Claude, ChatGPT) runs on prompts under the hood. The quality of your prompts directly determines the quality of what you get back. This is table-stakes for every role.
Key concepts to learn:
- Tokenization — how models break text into chunks (this explains a lot of weird behavior)
- Context windows — how much information a model can "see" at once
- System prompts — persistent instructions that shape how a model behaves
- Chain-of-thought — asking the model to show its reasoning step by step
- Few-shot examples — giving the model examples of what you want before asking it to perform
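These concepts compose. Here's a minimal, hedged sketch of how a system prompt, few-shot examples, and a chain-of-thought instruction combine into the message-list format most chat APIs (OpenAI, Anthropic) accept — the function name and example content are illustrative, not from any particular SDK:

```python
# Sketch: assembling a structured prompt as a message list.
# All names here are illustrative; adapt to your provider's SDK.

def build_prompt(system, examples, question):
    """System prompt + few-shot examples + the real question."""
    messages = [{"role": "system", "content": system}]
    for user_msg, assistant_msg in examples:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    # Chain-of-thought: explicitly ask for step-by-step reasoning.
    messages.append({
        "role": "user",
        "content": question + "\n\nThink step by step before answering.",
    })
    return messages

msgs = build_prompt(
    system="You are a precise code reviewer. Answer in one sentence.",
    examples=[("Is `eval(user_input)` safe?",
               "No: it executes arbitrary code.")],
    question="Is string concatenation safe for SQL queries?",
)
```

The structure is the point: persistent behavior in the system message, desired output shape shown by example, reasoning requested explicitly.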
| Maturity | What's Available |
|---|---|
| Production-ready | Anthropic prompt guide, OpenAI prompt guide, structured outputs |
| Emerging | Automated prompt optimization (DSPy), prompt testing frameworks |
| Experimental | Self-refining prompts, model-specific meta-prompting |
Domain 2: LLM Capabilities and Limitations
In plain terms: Understanding what AI models can and can't do. Where they shine, where they hallucinate (make things up), and why they sometimes give you completely wrong answers with total confidence.
The analogy: LLMs are like a well-read intern who's memorized thousands of books but has never written production code. They explain concepts well, but they'll also confidently suggest a library that doesn't exist. Knowing when to trust them and when to verify is the skill.
Why you need it: Without this, you'll either over-trust AI (shipping bugs) or under-trust it (missing real productivity gains). A rigorous study from METR found that experienced devs were actually 19% slower with AI tools on certain tasks, while believing they were faster. That gap is a capabilities-and-limitations problem.
| Maturity | What's Available |
|---|---|
| Production-ready | GPT-5.4, Claude Opus 4.6/4, Gemini 3.1 — strong for code, writing, analysis |
| Emerging | Long-context models (1M+ tokens), multimodal reasoning |
| Experimental | Reliable agentic reasoning, self-correction |
Domain 3: AI-Assisted Development Tooling
In plain terms: The AI-powered tools that plug into your actual coding workflow: code completion, generation, refactoring, debugging, documentation.
The landscape has exploded beyond just GitHub Copilot. There are now fundamentally different categories:
- Inline completion — Copilot: predicts your next lines as you type
- Chat-first IDE — Cursor: you describe what you want in natural language, it edits your code
- Terminal-native agents — Claude Code, Codex CLI: you describe a task, they execute multi-step workflows
- Hybrid — Windsurf: blends chat and inline editing
Each suits different workflows. It's worth trying a couple to see what clicks for how you work.
| Maturity | What's Available |
|---|---|
| Production-ready | GitHub Copilot (20M+ users), Cursor (1M+ daily), Claude Code, Windsurf |
| Emerging | OpenAI Codex CLI, multi-file agentic editing, automated PR review |
| Experimental | Fully autonomous coding agents |
Domain 4: Agent Architecture
In plain terms: The design patterns for building AI systems that take actions on their own. Not just answer questions, but actually do things in the world.
A regular AI chatbot is like texting a friend for advice. An AI agent is like hiring an assistant who reads your email, figures out what needs to happen, does it, checks if it worked, and reports back.
The agent loop goes: perceive (read the environment) → reason (figure out what to do) → act (do it) → observe (check the result) → repeat.
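That loop can be sketched in a few lines. This is a hedged toy, not a real framework: `reason` would be an LLM call in practice, and the tool and file name below are invented for illustration:

```python
# Sketch of the perceive → reason → act → observe loop.

def run_agent(goal, tools, reason, max_steps=5):
    observations = []                                # perceived state so far
    for _ in range(max_steps):
        action = reason(goal, observations)          # reason: pick next step
        if action["tool"] == "finish":
            return action["result"]
        result = tools[action["tool"]](**action["args"])  # act
        observations.append(result)                  # observe, then repeat
    return None  # gave up — long-horizon reliability is still the hard part

# Toy run: a "reasoner" that reads one (hypothetical) file, then finishes.
tools = {"read_file": lambda path: f"contents of {path}"}

def toy_reason(goal, obs):
    if not obs:
        return {"tool": "read_file", "args": {"path": "config.yaml"}}
    return {"tool": "finish", "result": obs[-1]}

run_agent("summarize config", tools, toy_reason)  # → "contents of config.yaml"
```

Every agent framework you'll meet — LangGraph, the OpenAI Agents SDK, Claude Code internally — is an elaboration of this loop with better state management, retries, and guardrails.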
Why you need it: The Model Context Protocol (MCP), essentially USB-C for AI integrations, is now standardizing how agents connect to tools. Whether you build agents or just use agent-powered tools (like Cursor or Claude Code), understanding this loop changes how you evaluate and debug AI systems.
| Maturity | What's Available |
|---|---|
| Production-ready | MCP, function calling, structured tool use |
| Emerging | LangGraph, OpenAI Agents SDK, multi-agent orchestration |
| Experimental | Self-improving agents, long-horizon autonomous tasks |
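"Function calling" in the production-ready row comes down to describing your tools as JSON schemas the model can choose to invoke. A hedged sketch in the OpenAI-style shape (field names vary slightly by provider; the tool itself is invented for illustration):

```python
# Sketch: a tool definition in the JSON-schema style used by
# function-calling APIs. This follows the OpenAI shape; Anthropic's
# differs slightly (e.g. "input_schema" instead of "parameters").
create_ticket_tool = {
    "type": "function",
    "function": {
        "name": "create_ticket",
        "description": "Create a bug ticket in the issue tracker.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string",
                          "description": "One-line summary of the bug"},
                "severity": {"type": "string",
                             "enum": ["low", "medium", "high"]},
            },
            "required": ["title", "severity"],
        },
    },
}
```

The model never executes anything itself: it emits a name plus arguments matching this schema, and your code does the actual work. That separation is what makes tool use auditable.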
The 18 Curated Resources (Graded)
Here's the full list, organized by experience level and graded so you know where to focus.
How the Grades Work
- Grade A (Essential): Skip this and you'll have a real gap.
- Grade B (Highly Recommended): Adds meaningful depth. Worth it if you're serious.
- Grade C (Useful for Depth): Only if you're going deep in a specific area.
Beginner Resources
| Resource | Type | Grade | Time | Why It Matters |
|---|---|---|---|---|
| Anthropic Prompt Engineering Guide | Guide | A | 2-3 hrs | The most thorough prompt guide available. Start here. |
| Anthropic Interactive Tutorial | Repo | A | 3-4 hrs | Hands-on Jupyter notebooks where you actually run and iterate on prompts. |
| DeepLearning.AI: ChatGPT Prompt Engineering for Developers | Course | A | 1.5 hrs | Andrew Ng's 90-minute crash course. Best time-to-insight ratio available. |
| OpenAI Prompt Engineering Guide | Guide | B | 1-2 hrs | Good second perspective, especially if you're using OpenAI's models. |
| Andrej Karpathy: "Deep Dive into LLMs" | Talk | A | 3.5 hrs | The single best explanation of how LLMs work. Not optional. |
| MCP Official Documentation | Docs | B | 1-2 hrs | How agents connect to tools. You don't need to memorize it — just understand the architecture. |
| GitHub Copilot Documentation | Docs | B | 1-2 hrs | Covers features most Copilot users never discover. |
| Cursor Quickstart | Docs | B | 1-2 hrs | Minimal docs, but Cursor reveals its value through use. Try it on a real project. |
**anthropics/prompt-eng-interactive-tutorial** — Anthropic's Interactive Prompt Engineering Tutorial

From the course introduction: "This course is intended to provide you with a comprehensive step-by-step understanding of how to engineer optimal prompts within Claude." After completing it, you will be able to master the basic structure of a good prompt, recognize common failure modes and the "80/20" techniques to address them, understand Claude's strengths and weaknesses, and build strong prompts from scratch for common use cases. The course is broken into 9 chapters with accompanying exercises, plus an appendix of more advanced methods, and each lesson ends with an "Example Playground" where you can experiment with the examples.
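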
The interactive tutorial above is one of the best hands-on resources for building prompt engineering intuition. Clone it, run the notebooks, and actually experiment.
Intermediate Resources
| Resource | Type | Grade | Time | Why It Matters |
|---|---|---|---|---|
| Chip Huyen: AI Engineering | Book | A | 15+ hrs | The comprehensive reference. If Designing Data-Intensive Applications was your systems bible, this is the AI equivalent. |
| LangGraph Documentation | Docs | A | 4-6 hrs | The most battle-tested agent framework. The patterns transfer even if you don't adopt LangGraph itself. |
| DeepLearning.AI: Agentic AI | Course | A | 5-8 hrs | Covers agent design end-to-end: tool use, planning, memory, multi-agent coordination. Production-focused. |
| OpenAI Agents SDK | Docs | B | 3-4 hrs | Handoffs between agents, guardrails, and tracing as first-class concepts. Good second framework. |
| Claude Code Documentation | Docs | B | 1-2 hrs | Terminal-native AI for multi-step coding tasks. The value is in building muscle memory. |
| OpenAI GPT-5 Prompting Guide | Guide | B | 1-2 hrs | What changed from GPT-4 and where older patterns break. |
| MCP Specification (Full) | Spec | B | 2-3 hrs | Read this when you're ready to build MCP servers, not before. |
Advanced Resources
| Resource | Type | Grade | Time | Why It Matters |
|---|---|---|---|---|
| Lilian Weng: Prompt Engineering | Blog | B | 1.5 hrs | Academic survey with paper citations. Dense but thorough. |
| Lilian Weng: LLM Powered Autonomous Agents | Blog | B | 2 hrs | The foundational post on agent architecture: planning, memory, and tool use. |
| MMLU-Redux Paper | Paper | C | 1 hr | Shows ~6% of benchmark questions contain errors. Matters if you're comparing models by benchmarks. |
| Stanford HAI AI Index 2025 | Report | B | 2-3 hrs | The data source behind most AI trend claims. Great for calibrating hype vs. reality. |
What I Learned the Hard Way
The resource list above is the "what." This section is the "what they won't tell you": the things I learned by actually working through these resources and applying them.
Cursor and Claude Code Are Complements, Not Competitors
Cursor is great when you're editing a file and want a tight feedback loop: change something, see the result, iterate. Claude Code is great when you want to say "refactor this module" and let the AI figure out the steps.
I use both every day. The mistake I see engineers make is picking one tool and forcing every task through it. Match the tool to the task, not to brand loyalty.
Tutorial Prompts ≠ Production Prompts
The DeepLearning.AI course teaches you clean, self-contained examples. That's the right starting point. But production prompts look nothing like that. They're 2,000-token system messages with edge case handling, output format constraints, and error recovery instructions.
Plan to spend 5-10x longer adapting what you learn to production than you spend learning it. The Anthropic interactive tutorial gets closest to bridging this gap because it makes you actually iterate.
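To make the gap concrete, here's a hedged, illustrative skeleton of what a production-style system prompt tends to look like — invented for this post, not lifted from any real codebase. Note the sections tutorials skip: output format constraints, edge cases, and error recovery:

```python
# Illustrative skeleton of a production system prompt (invented example).
SYSTEM_PROMPT = """\
You are a release-notes generator for internal tooling.

## Output format
Return ONLY valid JSON: {"summary": str, "breaking_changes": [str]}.
No markdown, no commentary outside the JSON.

## Edge cases
- Empty diff: return {"summary": "No changes.", "breaking_changes": []}.
- Diff over 10k lines: summarize per-directory instead of per-file.

## Error recovery
If you cannot parse the input, return
{"summary": "PARSE_ERROR", "breaking_changes": []} rather than guessing.
"""
```

A real version of this grows to thousands of tokens as edge cases accumulate — which is exactly why production prompts get versioned and tested like code.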
"AI Makes Developers 19% Slower": It's More Nuanced Than That
The METR study is real. But the headline misses context. The task was contributing to unfamiliar codebases, where deep project knowledge matters more than code generation speed. Participants also thought they were 20% faster.
The real lesson: on greenfield code and well-scoped tasks, AI tools genuinely help. On complex debugging in large codebases, the gains evaporate. Measure your own results instead of going by vibes.
Start with the API, Not the Framework
LangGraph has solid patterns and good docs. But I've watched engineers spend three days wrapping a task in a framework that would've taken four hours with direct API calls and a while loop.
Learn the patterns from frameworks. Then decide whether you actually need the framework. For a single-agent, single-task workflow, raw API calls with structured output will get you to production faster.
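What "raw API calls with structured output" means in practice is a hedged sketch like this — `call_model` is a stand-in for your provider's SDK call; everything else is plain Python:

```python
# Sketch: the "API call + while loop" alternative to a framework.
# `call_model` stands in for a real provider SDK call.
import json

def get_structured(call_model, prompt, retries=3):
    """Ask for JSON, validate it, and retry with the error fed back in."""
    for _ in range(retries):
        raw = call_model(prompt + "\n\nRespond with valid JSON only.")
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            prompt += f"\n\nYour last reply was invalid JSON ({e}). Try again."
    raise RuntimeError("model never produced valid JSON")

# Toy model that fails once, then succeeds:
replies = iter(['not json', '{"status": "ok"}'])
get_structured(lambda p: next(replies), "Summarize the deploy")
# → {'status': 'ok'}
```

Fifteen lines, no dependencies beyond the SDK, and you can read the whole control flow. When you genuinely need multi-agent handoffs or persistent graph state, reach for a framework; until then, this is easier to debug.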
Most People Learn in the Wrong Order
The instinct is: install Copilot → try some prompts → maybe watch a video. That's like learning React before learning JavaScript. You'll get output, but you won't know why it breaks.
Better order: Karpathy's talk (understand the engine) → prompt engineering (learn the interface) → tooling (apply it). The mental model makes every tool more effective.
Karpathy's "Deep Dive into LLMs" is 3.5 hours. Watch at 1.25x if you need to, but don't skip it. It will reshape how you think about every AI tool you use.
Your Learning Path — Where to Start
This Week (~5 hours)
- Watch Karpathy's talk (3.5 hrs at 1x, ~2.8 hrs at 1.25x). Take notes on the tokenization and RLHF sections. They explain more quirky AI behavior than any debugging guide.
- Install Cursor and spend 90 minutes on a real project. Not a tutorial, your actual codebase. Use `Cmd+K` for inline edits and `Cmd+L` for chat. Notice where it nails it and where it produces garbage. That calibration is the starting point.
This Month (~20 hours)
- Work through the Anthropic Prompt Engineering Guide and the Interactive Tutorial. Do the exercises, don't just read.
- Take DeepLearning.AI's Agentic AI course.
- Build one real agent. Not a tutorial agent, something useful for your actual workflow. An agent that creates Jira tickets from Slack messages, reviews PRs against your style guide, or monitors a dashboard. The gap between "I get agents conceptually" and "I've built and debugged one" is enormous.
This Quarter (~50 hours)
- Read Chip Huyen's AI Engineering — one chapter at a time, applied immediately.
- Pick one AI coding tool and go deep for 30 days. Track where it saves time and where it costs time. Real numbers, not vibes.
- Skim the Stanford HAI AI Index 2025 executive summary — it's the best single source for separating AI reality from AI hype.
What to Skip (Seriously, Skip These)
Not everything popular is worth your time. Here's what I'd steer clear of as a starting point:
Generic Udemy mega-courses. ~23% completion rates, recycled content between releases. The free resources above are more current and more focused.
Expensive university AI certificates ($1,600-$3,000+). The Anthropic guide, DeepLearning.AI courses, and Karpathy's talk are free. Unless your employer pays or you need the cohort structure, this is poor ROI.
Fine-tuning tutorials as a first step. Prompt engineering + RAG (retrieval-augmented generation — where you feed the model your own data at query time) solves 90%+ of use cases people reach for fine-tuning to address. Learn those first.
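If RAG sounds heavyweight, it isn't. Here's a hedged, dependency-free sketch of the core idea — retrieve the most relevant document, then prepend it to the prompt. Real systems use embedding vectors; the word-overlap scoring and example docs below are just enough to show the shape:

```python
# Sketch of RAG: retrieve, then augment the prompt. Word overlap stands
# in for real embedding similarity; the docs are invented examples.
def retrieve(query, docs):
    """Return the doc sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def rag_prompt(query, docs):
    context = retrieve(query, docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our deploys run through ArgoCD with a manual approval gate.",
    "Expense reports are filed in Workday by the 5th.",
]
rag_prompt("How do deploys work?", docs)
# picks the ArgoCD doc, not the Workday one
```

Swap the retrieval function for an embedding model and a vector store and you have the architecture most "chat with your docs" products ship — no fine-tuning required.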
"Build GPT from scratch" tutorials. Fascinating for understanding transformer internals. Not the right starting point for application engineers. Karpathy's talk gives you the mental model; come back to these later.
Andrew Ng's Deep Learning Specialization as step one. It's excellent — but it teaches neural network fundamentals (backpropagation, CNNs) that are foundational for ML engineers, not the prompt-to-agents path most software engineers need first. Do it later if you want to go deep on model internals.
Over to You
This guide reflects what worked for me as a security infrastructure engineer at a fintech platform. Your context will differ. That's the point.
Three questions for the community:
What's been your experience with AI coding tools so far? Have they genuinely sped you up, or have you noticed the "feels faster but isn't" effect that the METR study describes?
Prompt engineering vs. just using better tools — where do you invest your time? Some engineers swear by becoming expert prompt engineers; others say just pick the best tool and let it handle the prompts. Where do you land?
What's the next AI skill you're planning to learn? Agent architecture? RAG pipelines? Something else entirely? I'm curious what the community is gravitating toward.
This is Part 1 of the AI Role Upgrade Roadmap series. Each post maps the AI landscape for a specific software role — what matters, what doesn't, and where to invest your time.
Series: Pillar | **Foundation** | DevOps | Security | Developer | Product | App Eng | Platform | Data | QA | Leaders