Course Overview

AI Career Bootcamp

A practical 8-week program to move from AI curious to AI native — grounded in real job postings and the 7 skills employers actually hire for.

Why This Course Exists

The AI job market is K-shaped. Traditional roles are flat or falling. AI-native roles are growing so fast that for every qualified candidate there are 3.2 jobs. Average time to fill: 142 days.

Companies are desperate for people who can build, operate, and evaluate AI systems — not just use them. This course teaches the 7 skills that appear in hundreds of real job postings.

Course Structure

8 weeks, 4–6 hours per week. Self-paced. Each week builds on the last.

Week | Focus       | Core Skill
-----|-------------|-------------------------------
1    | Foundations | Specification Precision
2    | Quality     | Evaluation & Quality Judgment
3    | Systems     | Multi-Agent Decomposition
4    | Reliability | Failure Pattern Recognition
5    | Safety      | Trust & Security Design
6    | Scale       | Context Architecture
7    | Economics   | Cost & Token Economics
8    | Integration | Capstone + Portfolio
Prerequisites
Basic AI tool usage (ChatGPT, Claude, or similar). Everything else is built from scratch here.
✋
Before You Start — Self-Check

Answer these honestly. They'll help you track your growth.

  • Have you shipped an AI feature to users (even a simple chatbot)?
  • Can you name 3 ways AI fails differently than humans fail?
  • Have you estimated the cost of an AI-powered feature before building it?
  • Do you know what a "context window" is and why it matters?

No prep needed if you're new — that's exactly what weeks 1–4 are for.

The Reality

The K-Shaped Job Market

Two markets moving in opposite directions. Understanding this split is the first step to being on the right side of it.

The Split

Market 1: Traditional knowledge work — standard PMs, conventional SWEs, business analysts, general administrators. Job openings: flat or falling.

Market 2: AI-native roles — people who design, build, operate, and manage AI systems. Growing fast. Extremely in demand.

"There are essentially infinite AI jobs right now. Not growing demand. Not a hot sector. Functionally infinite. And they cannot find qualified people."
— Nate Jones, source research

The Numbers
Metric                  | Value    | What It Means
------------------------|----------|------------------------------------------
Jobs-to-Candidate Ratio | 3.2 : 1  | Three jobs for every qualified person
Time to Fill AI Role    | 142 days | Nearly five months per role
AI Jobs (est.)          | 1.6M     | ManpowerGroup estimate (likely low)
Qualified Applicants    | ~500K    | If you're here, you write your own ticket
Why "I Applied to 500 Jobs and Got Nothing"
You're probably applying to Market 1. The commodity basket is crowded because everyone can do it. Market 2 has the opposite problem — not enough qualified people.
The Other Problem: Bad Actors

Not all job postings are real. Nate Jones found:

  • Resume farming: Companies post AI roles they don't intend to fill, using applications as free labor to learn what candidates know
  • Whitewashed roles: "AI PM" but actually just regular PM with AI tools
  • Overstated skills: Candidates claiming AI expertise they don't have

The skill framework in this course is specifically designed to cut through this noise — these are learnable skills tied to how AI actually works.

The Good News
All 7 skills are learnable. You don't need a CS degree. You don't need to be a genius. You need specificity and practice. That's it.
The 7 Skills

What Employers Actually Want

Derived from hundreds of real job postings, reverse-engineered into the sub-skills employers are actually screening for. These are tied to how AI works — not hype cycles.

Skill 1
Specification Precision
Write exact specs agents can execute without inference
Skill 2
Evaluation & Quality
Detect AI errors before they reach production
Skill 3
Multi-Agent Decomposition
Break complex projects into agent-sized chunks
Skill 4
Failure Pattern Recognition
Diagnose why agentic systems break — and fix them
Skill 5
Trust & Security Design
Draw the line between human and agent authority
Skill 6
Context Architecture
Build the information infrastructure agents run on
Skill 7
Cost & Token Economics
Mathematically justify AI investments before building
Test Projects
Apply all 7 skills in real scenarios
The Skill That's #1 on Every Posting

Evaluation & quality judgment — checking whether AI output is actually correct vs. just sounding correct. AI is confidently wrong in ways humans don't instinctively catch.

Skill 1

Specification Precision

Not "prompting." Writing exact, unambiguous instructions that agents execute without inferring intent. The 2026 standard for working with AI.

Week 1
The Fill-in-the-Blank Problem

Humans read between the lines. We infer intent from context, body language, past conversations. Agents don't. They take what you give them literally and fill in the rest with their best guess.

The result: Vague prompt → plausible-sounding output that misses your actual goal → you assume the AI is smart enough to figure it out → it wasn't.

Why this matters for hiring
In 2026, "good at AI" means "can specify exactly what I want." This is why technical writers, lawyers, and QA engineers have a head start — they've trained in exact documentation.
The 2026 Standard

Here's the difference between what most people call "prompting" and what employers mean by specification precision:

āŒ Vague:
"Help with customer support"

✅ Precise:
"Build a tier-1 ticket agent that:
- Handles password resets (account verification required)
- Handles order status inquiries (read-only, no changes)
- Handles return initiations (orders < 30 days, original packaging)
- Escalates to human when: sentiment score < 0.3 OR
  ticket involves billing disputes > $200 OR
  customer uses keyword 'lawyer' or 'attorney'
- Logs every escalation with reason_code and customer_sentiment_score
- Never: issue refunds, change shipping addresses, share internal pricing"

The vague version takes 5 seconds to write. The precise version takes 10 minutes. The precise version is what gets hired.

Who Has a Head Start
Profession        | Why They Transfer
------------------|------------------------------------------------------
Technical Writers | Trained to write for audiences who can't infer
Lawyers           | Precision is liability — they already think this way
QA Engineers      | Writing testable specs is the job
Editors           | Already spotting ambiguity and imprecision
Accountants       | Exact definitions, no room for interpretation
āœļø
Exercise: Specification Audit
30 minutes

Find a vague task you've given an AI in the past week. Rewrite it as a precise specification.

  • Define exact inputs — what data does the agent receive?
  • Define exact outputs — what does success look like?
  • Define boundaries — what does it not handle?
  • Define escalation — when does it flag for human review?
  • Define success metrics — how do you measure correctness?

Test: Give your old vague prompt and your new precise spec to the same AI. Compare outputs. Document the difference.

🎯
Exercise: Decompose These Tasks
45 minutes

Write precise specifications for each:

  • An agent that triages inbound sales leads
    Hint: What qualifies a lead? What disqualifies? Who escalates?
  • An agent that summarizes legal contracts
    Hint: What sections matter? What risk flags? What can't it do?
  • An agent that drafts code review summaries
    Hint: What context matters? What's the output format? What triggers flags?
How to Build Specification Muscle

The key insight: specification is not about the AI. It's about knowing what you want.

  • Before every AI interaction, write down what you expect the output to look like
  • If you can't describe what you want in writing, the AI can't produce it
  • Test: could a new hire execute this from your prompt alone? If not, it's too vague
  • Get feedback: show your specs to someone in your field and ask what's missing
Skill 2

Evaluation & Quality Judgment

The single most-cited skill in AI job postings. The ability to detect when AI output is actually wrong — not just confident.

Week 2
The Confidence Problem

Humans stumble when they're wrong. We hesitate, qualify, backtrack. AI doesn't. AI generates text that looks exactly the same whether it's right or wrong. The confident tone implies correctness — and it's a lie.

The fluency trap

When AI output looks polished, well-structured, and confident, humans instinctively trust it. This is the failure mode that causes real harm — wrong code shipped to production, incorrect legal summaries filed, bad data fed into decision systems.

The skill: Resisting the temptation to read fluency as competence. Building internal barometers for quality that don't depend on how confident the AI sounds.

Semantic vs. Functional Correctness

Semantic: "The AI said the right things" — the output sounds correct, uses the right terminology, follows the right structure.

Functional: "The AI did the right thing" — the output achieves the actual goal, the data is accurate, the recommendation is valid.

Example: An AI recommends a credit card. It explains its reasoning perfectly. Semantically correct. But the card it recommends doesn't exist in the system. Functionally wrong.

This gap — between "sounds right" and "is right" — is where evaluation lives.
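A functional check closes that gap by validating output against the system of record, not against how the output reads. A minimal sketch, where the catalog set and `product_id` field are hypothetical stand-ins for your real product database:

```python
# Hypothetical sketch: a functional-correctness gate for a recommendation
# agent. The CATALOG set stands in for a real system-of-record lookup.
CATALOG = {"CARD-PLAT-01", "CARD-CASH-02"}  # assumed product IDs

def functionally_valid(recommendation: dict) -> bool:
    """A semantic check asks 'does the reasoning read well?'
    This asks 'does the recommended product actually exist?'"""
    return recommendation["product_id"] in CATALOG

rec = {"product_id": "CARD-GOLD-99", "rationale": "Great travel rewards..."}
print(functionally_valid(rec))  # → False: fluent reasoning, nonexistent card
```

The rationale text never enters the check at all, which is the point: fluency is not evidence.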

Building Eval Frameworks

An eval framework is a systematic quality barometer for AI output.

For any AI task, define:

  • What correct looks like — 3 to 5 concrete criteria
  • What borderline looks like — acceptable but not ideal
  • What failure looks like — detectable, specific failure modes
  • What edge cases look like — the 10% of situations that break the general case
Example: Code Review Agent Eval

Criteria:
✓ All security vulnerabilities caught (OWASP Top 10)
✓ Performance issues flagged (> O(n²) without justification)
✓ Style deviations from team guidelines noted
✓ Every "LGTM" has a specific reason, not rubber-stamp approval

Edge cases that should fail:
✗ Silent approval of code with known CVEs
✗ Missing error handling in async code
✗ Approving code that contradicts PR description
🔍
Exercise: The Audit Test
45 minutes

Take AI output on a topic you know deeply — your area of expertise. Act as if you're the editor or auditor responsible for its accuracy.

  • Find 1 factual error the AI made
  • Find 1 edge case it missed
  • Find 1 place where it "sounded right" but wasn't

Document these. This is your eval muscle forming. Most people find errors they would have missed if they hadn't been looking deliberately.

🛠️
Exercise: Build an Eval Harness
60 minutes

For one AI task you do repeatedly:

  • Define 5 concrete criteria for "correct" output
  • Write a 5-question checklist someone could use to evaluate the output
  • Identify 3 edge cases that should trigger a "fail" rating

Congratulations — you just built an eval harness. This is what employers mean when they say "build evaluation frameworks."
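In code, the same harness is just a set of named pass/fail criteria applied to every output. A minimal sketch, with illustrative placeholder criteria you would swap for your own:

```python
# Minimal eval-harness sketch: each criterion is a predicate over the output.
# The three criteria here are illustrative placeholders, not a standard.
CRITERIA = {
    "non_empty": lambda out: len(out.strip()) > 0,
    "cites_source": lambda out: "source:" in out.lower(),
    "under_length_cap": lambda out: len(out.split()) <= 200,
}

def evaluate(output: str) -> dict:
    """Run every criterion; return per-criterion results plus a verdict."""
    results = {name: check(output) for name, check in CRITERIA.items()}
    results["overall_pass"] = all(results.values())
    return results

print(evaluate("Summary of Q3 numbers. Source: finance dashboard."))
```

The structure matters more than the specific checks: named criteria, binary results, one overall verdict you can track over time.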

Skill 3

Multi-Agent Task Decomposition

Breaking complex projects into agent-sized work units and orchestrating planner/sub-agent architectures. The skill that separates single-use AI from scalable AI.

Week 3
Why Single Agents Hit Walls

A single agent has hard limits: context window size, task complexity it can hold in memory, and number of steps it can execute before losing the thread. Complex projects — "build our entire customer onboarding flow" — can't be done by one agent in one shot.

The solution: Decompose the project into discrete tasks, each handled by a specialized agent, coordinated by a planner that maintains state across the full run.

The Key Distinction from Regular PM

Human managers: "Figure out the details as you go. Use your judgment." Agents can't do this.

Human PM decomposition:
"Go handle the product launch. 
You know what needs to happen. 
Loop in marketing when you need them."

Agent decomposition:
"Planner Agent:
1. Coordinate sub-agents for: market research, 
   competitor analysis, pricing strategy, 
   content calendar, launch checklist, 
   post-mortem template
2. Each sub-agent receives exact task specs
3. Each sub-agent returns output to planner
4. Planner verifies output quality before 
   proceeding to next task
5. If any sub-agent fails twice, escalate to human"

The decomposition is the product spec for the multi-agent system. Bad decomposition = system that fails in predictable ways.
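The planner loop described above can be sketched in a few lines. This is a skeleton under stated assumptions: `run_agent` and `verify` are placeholders for real model calls and quality gates, and "two tries then escalate" mirrors step 5 of the decomposition:

```python
# Sketch of the planner/sub-agent loop above. run_agent and verify are
# stand-ins for real model calls; the escalation rule mirrors the spec.
def run_agent(task: str) -> str:
    return f"output for {task}"         # placeholder for a sub-agent call

def verify(task: str, output: str) -> bool:
    return output.startswith("output")  # placeholder quality gate

def planner(tasks: list[str]) -> dict[str, str]:
    results = {}
    for task in tasks:
        for _attempt in range(2):       # two tries, then escalate to a human
            output = run_agent(task)
            if verify(task, output):
                results[task] = output
                break
        else:
            raise RuntimeError(f"escalate to human: {task}")
    return results

print(planner(["market research", "pricing strategy"]))
```

Even this toy version encodes the two decisions that matter: output is verified before the next task proceeds, and failure has a defined exit rather than silent improvisation.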

The Sizing Question

Every decomposition has a hidden question: "Is this task correctly sized for the agentic harness I have?"

Harness Type                 | Task Size It Can Handle
-----------------------------|------------------------------------------------------------
Single-threaded agent        | Single task, ~10–15 steps max, fits in context
Multi-agent (planner + subs) | Large project, multiple workstreams, long-horizon goals
Hierarchical agent swarm     | Enterprise-scale, many teams, cross-functional coordination

Give a too-large task to a single agent → it loses track, starts improvising, produces confident nonsense.

🔬
Exercise: Decompose a Project
45 minutes

Take this project: "Research competitor X and produce a 10-page market analysis."

  • Break it into 7–10 discrete agent-sized tasks
  • What are the logical chunks?
  • What's the execution order?
  • Where are the handoff points?
  • Which tasks depend on which others?

Then ask: could each task be completed by a single agent in one session? If not, decompose further.

šŸ—ļø
Exercise: Architecture Diagram
60 minutes

Design a multi-agent architecture (can be ASCII, hand-drawn, or a tool like Miro).

Use the project: "Build a content marketing system"

  • How many agents do you need?
  • What does each agent do (be specific)?
  • What does the planner agent coordinate?
  • How do you verify each agent's output before the next step?
  • Where do correction loops go?
Skill 4

Failure Pattern Recognition

The six ways agentic systems break — and how to diagnose, fix, and prevent them. This is what separates hobbyists from professionals.

Week 4
The Six Failure Modes
Failure                  | What's Happening                                                 | How to Spot It
-------------------------|------------------------------------------------------------------|----------------------------------------------------------
Context Degradation      | Quality drops as session gets long — context window polluted     | Output quality correlates inversely with session length
Specification Drift      | Agent forgets goals over long tasks                              | Mid-task output diverges from original intent
Sycophantic Confirmation | Agent validates bad input, builds entire wrong system around it  | Wrong data → confident wrong output chain follows
Tool Selection Errors    | Agent picks wrong tool from harness                              | Task done but wrong approach — usually prompt framing problem
Cascading Failure        | One agent's error propagates through the chain                   | Multiple failures trace back to single root cause
Silent Failure           | Output looks correct but is functionally wrong                   | Requires deep audit — most dangerous failure mode
Silent Failure — The Hardest One

This one deserves extra attention. It's the one that ships to production and causes problems for weeks before anyone notices.

Real Example

AI recommends "brown leather boots" to a customer. The recommendation looks correct in the chat log. The customer receives blue leather boots. Investigation reveals: the warehouse had a mixup. The AI recommended the right product from the catalog — but the catalog image didn't match the actual inventory. The AI never saw the warehouse mixup. The output looked identical to correct output.

The fix: Functional correctness checks, not just semantic ones. Does this recommendation actually work in the real world?

🔬
Exercise: Failure Mode Roulette
30 minutes

For each scenario, identify the failure mode and how you'd fix it:

  • Scenario 1: A code agent spent 2 hours writing a Python scraper. Output looks perfect — all imports, clean syntax, complete functions. But it scraped the wrong website entirely.
    What failure mode? Why? How do you fix it?
  • Scenario 2: An agent started a 50-step data pipeline. Steps 1–10 were great. Steps 30–50 got increasingly creative — inventing data that wasn't in the source.
    What failure mode? Why? How do you fix it?
  • Scenario 3: An AI recommended a credit card. It explained its reasoning perfectly. The card doesn't exist in the company's product database.
    What failure mode? Why? How do you fix it?
📓
Exercise: Build a Failure Log
Ongoing

Start documenting failures you encounter in your own AI work. After 10 entries, you'll have a personal failure mode handbook.

Failure Log Entry Template:
Date: 
Task: 
What Happened: 
Failure Mode: 
How Detected: 
How Fixed: 
Prevention: 
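The template above maps directly to a structured record, which keeps entries consistent and lets you filter by failure mode later. A sketch, with an illustrative example entry (the dates and details are invented for demonstration):

```python
# The failure-log template above as a structured record. Field names mirror
# the template; the example entry below is invented for illustration.
from dataclasses import dataclass, asdict

@dataclass
class FailureLogEntry:
    date: str
    task: str
    what_happened: str
    failure_mode: str   # one of the six modes
    how_detected: str
    how_fixed: str
    prevention: str

entry = FailureLogEntry(
    date="2026-01-15",
    task="50-step data pipeline",
    what_happened="Late steps invented data that wasn't in the source",
    failure_mode="Context Degradation",
    how_detected="Spot-checked late-stage rows against the source",
    how_fixed="Split the pipeline into shorter sessions",
    prevention="Cap runs at 15 steps; verify output between chunks",
)
print(asdict(entry)["failure_mode"])  # → Context Degradation
```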
Skill 5

Trust & Security Design

Deciding where humans stay in the loop, how agents are authorized, and how to verify guardrail compliance. The skill that makes AI safe to ship.

Week 5
The Core Question

Where does the blast radius of a failure meet acceptable risk? Every AI action needs an answer to this before it goes live.

The problem: Telling an agent "be good" in a system prompt doesn't work. These are probabilistic systems. Guardrails have to be structural, not aspirational.

The Four Sub-Skills
Sub-Skill     | What It Means                                | Example
--------------|----------------------------------------------|------------------------------------------------
Cost of Error | What's the blast radius if this goes wrong?  | Misspelled email draft vs. wrong drug dose
Reversibility | Can this mistake be undone?                  | Email draft = yes. Wire transfer = no
Frequency     | How often does this action run?              | 10K/day vs. 2/day — same error, different risk
Verifiability | Can you prove it was correct after the fact? | Semantic vs. functional correctness audit
Guardrail Construction Patterns
Pattern 1: Human-in-the-loop at boundaries
------------------------------------------
Agent recommends → Human approves → Action executes
Used for: High-cost, irreversible, or high-frequency actions

Pattern 2: Pre-flight verification
------------------------------------------
Agent prepares output → Verification agent checks →
  Pass: proceed | Fail: return for revision
Used for: Outputs that go to external customers

Pattern 3: Output constraints in system prompt
------------------------------------------
"[CONSTRAINTS]
- Never mention internal pricing
- Escalate legal questions to human
- Confirm dollar amounts with user before proceeding
- Never store PII in logs
[/CONSTRAINTS]"
Used for: Behavior that must always apply

Pattern 4: Rollback-capable transactions
------------------------------------------
Action → Log for audit → Verify → Commit | Revert
Used for: Database writes, external API calls
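Pattern 1 is structural in the literal sense: the approval step sits in the code path, not in the prompt. A minimal sketch, where `approve` is a stand-in for whatever approval queue or UI you actually use, and the risk fields are assumed for illustration:

```python
# Sketch of Pattern 1 (human-in-the-loop at boundaries). approve() stands in
# for a real approval queue; the action fields are illustrative assumptions.
def approve(action: dict) -> bool:
    # Assumption: auto-approve only low-risk actions; queue the rest.
    return action["risk"] == "low"

def execute_with_guardrail(action: dict) -> str:
    if action["reversible"] and action["risk"] == "low":
        return f"executed: {action['name']}"       # safe to run directly
    if approve(action):
        return f"executed after approval: {action['name']}"
    return f"held for human review: {action['name']}"

print(execute_with_guardrail({"name": "send_wire", "risk": "high",
                              "reversible": False}))
# → held for human review: send_wire
```

The key property: there is no prompt the agent could write that bypasses the hold. The guardrail is code, not aspiration.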
🗺️
Exercise: Risk Map Your AI Work
45 minutes

For each AI action you take or plan to take, answer:

  • Blast radius: What's the worst-case outcome?
  • Reversible? Yes / No / Partially
  • Frequency: How many times per day/week?
  • Verifiable? How would you prove it was correct?

Classify each as: Low / Medium / High / Critical risk

Skill 6

Context Architecture

Building the information infrastructure that lets agents find and use the right data at the right time. The skill that turns one agent into dozens.

Week 6
The Dewey Decimal System for AI

In 2024, "using AI at work" meant pasting the right documents into the prompt. In 2026, it means building systems where the right information is always available to agents — structured, clean, and traversable.

Context architecture is the discipline of designing that information layer so agents can self-serve what they need — without human hand-holding.

Why this is worth $300K+
Get context architecture right → you can deploy dozens of agents on the same data infrastructure. Get it wrong → every agent needs its own human curator. The difference between a platform and a toy.
Persistent vs. Per-Session Context

Persistent context: Always available to the agent — company policies, product knowledge base, team roster, past interaction history. Loaded once, used forever.

Per-session context: Loaded for a specific run — the user's current request, session-specific data, task-relevant documents. Refreshed each session.

Persistent Context (always available):
├── Company policies (HR, legal, security)
├── Product documentation
├── Team directory + responsibilities
├── Escalation paths
└── Historical decisions + rationale

Per-Session Context (loaded per task):
├── Current user request
├── Relevant documents for this task
├── Session-specific variables
└── Handoff data from previous agents
The Contamination Problem

Dirty data in context = confused agents = confident wrong output. If your product database has outdated prices, your agent will recommend outdated prices — confidently.

Context architecture includes:

  • Data freshness: When was this data last updated?
  • Source of truth: Which system is authoritative?
  • Confidence signals: How certain should the agent be about this data?
  • Escalation triggers: When should the agent flag data as unreliable?
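The freshness check above can be made concrete with a per-source freshness budget. A sketch under stated assumptions: the source names and day limits in `FRESHNESS_POLICY` are invented for illustration, and you would tune them to how fast each dataset actually changes:

```python
# Sketch of a data-freshness gate. Source names and day budgets in
# FRESHNESS_POLICY are illustrative assumptions, not a standard.
from datetime import date, timedelta

FRESHNESS_POLICY = {"pricing": 1, "product_docs": 30, "policies": 90}  # days

def usable(source: str, last_updated: date, today: date) -> bool:
    """Flag data as unreliable once it exceeds its freshness budget."""
    max_age = FRESHNESS_POLICY.get(source, 7)  # default budget: 7 days
    return today - last_updated <= timedelta(days=max_age)

today = date(2026, 3, 1)
print(usable("pricing", date(2026, 2, 27), today))  # 2 days old, budget 1 → False
```

When `usable` returns False, the agent should escalate or refuse rather than answer confidently from stale data.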
🗂️
Exercise: Context Design for One Agent
60 minutes

Design the context architecture for: "A sales agent that answers customer questions using your company's product knowledge base."

  • What is always in context (persistent)?
  • What is loaded per session?
  • How does the agent find the right information?
  • What would contaminate this system?
  • How do you verify the agent found the right context?
⚡
Exercise: Scale Test
30 minutes

Take your single-agent context design. Now make it work for 20 agents simultaneously — different teams, different tasks, same data infrastructure.

  • What breaks?
  • What needs to change?
  • How do agents avoid stepping on each other?

This is the question that separates a $150K AI specialist from a $300K+ AI architect.

Skill 7

Cost & Token Economics

Mathematically justifying AI investments before building them. The skill that turns "AI is expensive" from a complaint into a decision framework.

Week 7
The Core Calculation

Before building any AI feature, you need to answer: Is it worth it?

Cost per task = (tokens_used × model_price) + overhead

ROI = value_of_task / cost_per_task

Break-even: cost_per_task < value_of_task

The model selection problem: Frontier models (Claude Opus, GPT-4.5) give the best quality but cost more. Cheap models (Llama, Haiku) cost less but may be wrong more often. The skill is matching model to task correctly.

When to Use Which Model Tier
Task Type                       | Recommended Tier | Why
--------------------------------|------------------|-------------------------------------------
Simple classification, routing  | Cheap / Fast     | Doesn't need frontier reasoning
Drafting, summarization         | Mid-tier         | Good enough quality, cost-conscious
Complex reasoning, architecture | Frontier         | Quality failures are expensive
Code generation, technical docs | Frontier         | Subtle errors cause production bugs
Multi-step agentic pipelines    | Blended          | Cheap for routing, frontier for execution
Building a Token Cost Calculator

The practical skill: build a tool (spreadsheet, script, or dashboard) where you can change variables and see blended cost across models instantly.

Token Cost Calculator Template:

Task: [describe task]
Estimated tokens: [your estimate]

Model          | $/1M tokens | Your cost
------------------------------------
GPT-4.5        | $2.50       | [calc]
Claude Haiku   | $0.25       | [calc]
Claude Sonnet  | $3.00       | [calc]
Llama 4        | $0.10       | [calc]

Volume: [tasks/day] × [days/month] = [monthly_tasks]
Monthly cost at each tier: [calc]

Break-even value per task: [value] / [monthly_tasks]
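The template above takes a few lines as a function. The prices here are the illustrative $/1M-token figures from the table, not live rates, and the 22-day month is an assumed working-days default:

```python
# The cost-calculator template above as a function. Prices are the
# illustrative figures from the table, not live rates.
PRICE_PER_MTOK = {"gpt-4.5": 2.50, "claude-haiku": 0.25,
                  "claude-sonnet": 3.00, "llama-4": 0.10}

def monthly_cost(model: str, tokens_per_task: int,
                 tasks_per_day: int, days_per_month: int = 22) -> float:
    """Blended monthly cost for one task type on one model."""
    per_task = tokens_per_task / 1_000_000 * PRICE_PER_MTOK[model]
    return per_task * tasks_per_day * days_per_month

# Example: 5K tokens per task, 200 tasks per day
for model in PRICE_PER_MTOK:
    print(f"{model:>13}: ${monthly_cost(model, 5_000, 200):.2f}/month")
```

Changing one variable (tokens, volume, model) and re-running is exactly the "see blended cost instantly" skill the section describes.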
🧮
Exercise: Model Selection Audit
45 minutes

Look at your last 10 AI tasks. For each:

  • Which model did you actually use?
  • Was that the right model for the job?
  • Could a cheaper model have done it?
  • Could a frontier model have justified its cost?

Identify your 3 biggest model selection inefficiencies. This is where you're burning budget without gaining quality.

Apply What You Learned

Test Projects

Three projects that test all 7 skills in realistic scenarios. Each includes brief, rubrics, and what to submit.

Beginner
Customer Support AI Agent
Design and spec a tier-1 customer support agent that handles common requests, knows when to escalate, and produces audit logs.
The Brief

A mid-size e-commerce company wants to deploy an AI agent to handle their top 20 support ticket types. You're brought in to design the agent system, write the specs, and build a simple prototype.

The agent should: handle password resets, order status checks, return initiations, refund requests under $100, and product information queries. It must escalate anything involving billing disputes over $200, legal keywords, or customer sentiment below 0.3.

Skills This Tests

Skill 1 (Specification) — exact boundaries, escalation criteria, success metrics | Skill 2 (Evaluation) — how you measure quality | Skill 4 (Failure Modes) — what breaks and how to catch it | Skill 5 (Trust & Security) — blast radius, human checkpoints

Deliverables
  • Complete specification document for the agent
  • Eval framework: 5-question checklist for output quality
  • Failure mode analysis: 3 most likely failures + fixes
  • Guardrail design: where humans stay in the loop
  • Functional prototype using Claude/GPT (even a single conversation counts)
Rubric
Pass
Spec covers all 5 ticket types with clear boundaries
Good
+ eval framework with specific quality criteria
Exceptional
+ failure mode analysis + functional prototype
Intermediate
Market Research Pipeline
Build a multi-agent pipeline that researches a company, produces a competitive analysis, and generates actionable recommendations.
The Brief

A VC firm wants an AI system that, given a target company, automatically produces: company overview, competitive positioning, market size estimate, risk assessment, and investment recommendation. The system should scale to handle 10 companies per week.

Skills This Tests

Skill 3 (Multi-Agent Decomposition) — how you break this into agent-sized chunks | Skill 6 (Context Architecture) — how information flows between agents | Skill 7 (Cost Economics) — model selection per task + ROI justification | Skill 2 (Evaluation) — quality gates between pipeline stages

Deliverables
  • Multi-agent architecture diagram with all agents and their roles
  • Planner agent specification: how it coordinates sub-agents
  • Context design: what data persists, what's loaded per run
  • Quality gates: where eval happens between pipeline stages
  • Cost model: estimated monthly token cost at scale (10 companies/week)
  • Failure mode analysis: what breaks the pipeline and how you catch it
Rubric
Pass
Multi-agent decomposition with clear agent roles
Good
+ context architecture + cost model
Exceptional
+ working prototype on 1 company + failure analysis
Advanced
Enterprise AI Reliability System
Design the full AI system for a healthcare-adjacent startup that must meet compliance standards, handle PHI data, and operate with verifiable audit trails.
The Brief

A health-tech startup is building an AI system that helps care coordinators manage patient scheduling, insurance verification, and pre-visit prep. Every action must be auditable. The system must pass a compliance audit (HIPAA-equivalent). The team is 5 people.

You need to design the full system — not just the AI, but the human-AI workflow, the guardrails, the context architecture, the eval systems, and the failure recovery procedures.

Skills This Tests

All 7 skills at once. This is a capstone project. The spec for Skill 5 (Trust & Security) should be especially thorough — PHI data, blast radius analysis, compliance requirements change everything about guardrail design.

Deliverables
  • Full specification for all AI agents in the system
  • Multi-agent architecture with decomposition rationale
  • Context architecture: what data, how structured, compliance handling
  • Eval framework: quality standards for each agent
  • Guardrail system: blast radius map, human checkpoints, compliance controls
  • Failure mode handbook: 6 failure types applied to this system
  • Cost model: break-even analysis for the full system
  • Compliance section: how audit trails work, what happens in a breach scenario
Rubric
Pass
Full system spec with all 7 skill areas addressed
Good
+ compliance/audit section + failure handbook
Exceptional
+ working prototype + cost model + real blast radius analysis
Track Your Progress

Self-Assessment Checklist

Rate yourself honestly on each skill. These are the questions employers ask in AI-native interviews.

1 Specification Precision
  • I can write exact specs that agents execute without clarification
  • I test my prompts as if a new hire will read them
  • I've documented prompting standards for my team
2 Evaluation & Quality Judgment
  • I catch AI errors before they reach production
  • I build eval frameworks for AI tasks
  • I understand the difference between semantic and functional correctness
3 Multi-Agent Decomposition
  • I can break complex projects into agent-sized chunks
  • I understand planner/sub-agent architectures
  • I've built at least one working multi-agent system
4 Failure Pattern Recognition
  • I can identify which of the 6 failure modes is occurring
  • I build correction loops into my agentic systems
  • I've diagnosed a silent failure in production
5 Trust & Security Design
  • I map blast radius for every agent action
  • I know where humans stay in the loop for my systems
  • I've built guardrails that hold under adversarial input
6 Context Architecture
  • I can design context systems for scalable agent deployments
  • I understand persistent vs. per-session context
  • I think like a librarian when structuring company data for AI
7 Cost & Token Economics
  • I can estimate token costs before building agents
  • I select models based on task requirements, not just "best available"
  • I've built tools to calculate blended AI costs
Your Score

Count your checkmarks: there are 3 per skill, 21 in total. If you can confidently check at least 2 in every skill, you're ready for AI-native roles. Focus your study on the skills where you checked fewer than 2.

Further Learning

Resources

Curated resources for each skill area. Everything here is free or low-cost.

Skill 1 — Specification
  • Anthropic Prompt Engineering Guide — anthropic.com
  • OpenAI API Best Practices — platform.openai.com
  • Google's ML Product Guidelines — machine-learning-principles
Skill 2 — Evaluation
  • Anthropic Engineering Blog — especially the eval writing posts
  • Braintrust / Helicone — eval tooling for AI
  • OpenAI Evals — open source eval library
Skill 3 — Multi-Agent
  • CrewAI documentation — crewai.com
  • LangGraph examples — langchain.com/langgraph
  • Nate Jones' agent architecture videos (his YouTube channel)
Skill 4 — Failure Modes
  • Search "Claude loops" on Twitter/X — real failure examples
  • Claude documentation on agentic patterns
  • LangChain troubleshooting guides
Skill 5 — Trust & Security
  • OWASP Top 10 for LLMs — owasp.org
  • Anthropic's AI safety guidelines
  • OpenAI's use-case-specific safety guidelines
Skill 6 — Context Architecture
  • RAG tutorials — Retrieval Augmented Generation explainers
  • Pinecone / Weaviate / Chroma — vector DB tutorials
  • "What is a vector database" explainers (Hacker News)
Skill 7 — Cost Economics
  • OpenRouter model pricing page — openrouter.ai/models
  • Anthropic pricing — console.anthropic.com/pricing
  • OpenAI pricing — platform.openai.com/pricing
  • Tiktoken — token counting library
Certifications
  • Claude Certified Architect — growing fast, Accenture-backed, likely becomes the "AWS cert" of AI roles
  • AWS Machine Learning Specialty — enterprise credibility
  • Google Cloud Professional ML Engineer — if you're GCP-native
Communities
  • AI Builders Slack — search "AI Builders Slack invite"
  • CrewAI / LangChain Discord servers
  • Nate Jones' hiring board (when it launches — check his Substack)