Generative AI for Business — Week 5

Agentic AI in Action (II)

Multi-Agent Systems, MCP & Reliability

Week 5

JHU Carey Business School | 2026

Today's agenda

Time Topic
0:00–0:35 Agent orchestration patterns
0:35–1:05 Model Context Protocol (MCP)
1:05–1:35 Agent reliability & failure modes
1:35–1:50 Break
1:50–2:25 Hands-on: Multi-agent workflow
2:25–2:55 Hands-on: MCP in action
2:55–3:00 Wrap-up + Assignment 4

Why multi-agent?

    SINGLE AGENT                      MULTI-AGENT
    ┌────────────────────┐      ┌────────────────────────────┐
    │                    │      │                            │
    │  One LLM does      │      │  Agent 1: Research         │
    │  EVERYTHING:       │      │      │                     │
    │                    │      │      ▼                     │
    │  • Research        │      │  Agent 2: Analyze          │
    │  • Analyze         │      │      │                     │
    │  • Write           │      │      ▼                     │
    │  • Review          │      │  Agent 3: Write            │
    │                    │      │      │                     │
    │  Context gets      │      │      ▼                     │
    │  overloaded.       │      │  Agent 4: Review           │
    │  Quality drops.    │      │                            │
    │                    │      │  Each agent: focused,      │
    │                    │      │  specialized, manageable   │
    └────────────────────┘      └────────────────────────────┘

Same principle as human teams: specialization works

JHU Carey Business School | 2026
Generative AI for Business — Week 5

Orchestration patterns

    SEQUENTIAL                    PARALLEL
    ┌────────┐                ┌────────┐  ┌────────┐
    │Agent A │                │Agent A │  │Agent B │
    │Research│                │Finance │  │Legal   │
    └───┬────┘                └───┬────┘  └───┬────┘
        │                         │           │
        ▼                         └─────┬─────┘
    ┌────────┐                          │
    │Agent B │                          ▼
    │Analyze │                     ┌────────┐
    └───┬────┘                     │Combine │
        │                          │& Report│
        ▼                          └────────┘
    ┌────────┐
    │Agent C │
    │Write   │
    └────────┘


    HIERARCHICAL                  DEBATE
    ┌──────────┐              ┌────────┐   ┌────────┐
    │ Manager  │              │Agent A │   │Agent B │
    │ Agent    │              │(Pro)   │◄─►│(Con)   │
    └┬───┬───┬─┘              └───┬────┘   └───┬────┘
     │   │   │                    │            │
     ▼   ▼   ▼                    └─────┬──────┘
    ┌──┐┌──┐┌──┐                        ▼
    │A ││B ││C │                   ┌────────┐
    └──┘└──┘└──┘                   │ Judge  │
                                   └────────┘

Agent communication: the handoff

    ┌────────────────────────────────────────────────────────┐
    │                  AGENT HANDOFF                          │
    │                                                         │
    │  Agent A (Researcher)                                   │
    │  ┌─────────────────────────────────────────────────┐   │
    │  │ System: "You are a research agent..."            │   │
    │  │ Task: "Find Q3 earnings for top 5 tech companies"│   │
    │  │ Output: structured JSON with findings            │   │
    │  └──────────────────────┬──────────────────────────┘   │
    │                         │                               │
    │                    HANDOFF DATA                         │
    │                    (structured,                         │
    │                     validated,                          │
    │                     schema-defined)                     │
    │                         │                               │
    │                         ▼                               │
    │  Agent B (Analyst)                                      │
    │  ┌─────────────────────────────────────────────────┐   │
    │  │ System: "You are a financial analyst..."         │   │
    │  │ Context: [research results from Agent A]         │   │
    │  │ Task: "Compare growth rates, identify trends"    │   │
    │  │ Output: analysis report                          │   │
    │  └─────────────────────────────────────────────────┘   │
    └────────────────────────────────────────────────────────┘

Memory in agents

    ┌─────────────────────────────────────────────────────┐
    │                    AGENT MEMORY                     │
    │                                                     │
    │  SHORT-TERM                  LONG-TERM              │
    │  (context window)            (persistent storage)   │
    │  ┌──────────────────┐        ┌──────────────────┐   │
    │  │ Current          │        │ User preferences │   │
    │  │ conversation     │        │ Past interactions│   │
    │  │ Tool results     │        │ Learned patterns │   │
    │  │ Working state    │        │ Knowledge base   │   │
    │  │                  │        │                  │   │
    │  │ Volatile:        │        │ Persistent:      │   │
    │  │ Gone when        │        │ Survives across  │   │
    │  │ context resets   │        │ sessions         │   │
    │  └──────────────────┘        └──────────────────┘   │
    │                                                     │
    │  EPISODIC                                           │
    │  (experience memory)                                │
    │  ┌──────────────────────────────────────────┐       │
    │  │ "Last time I tried approach X, it failed │       │
    │  │  because of Y. Next time, try Z instead."│       │
    │  └──────────────────────────────────────────┘       │
    └─────────────────────────────────────────────────────┘

Model Context Protocol (MCP)


The integration problem

    WITHOUT MCP                            WITH MCP
    ┌─────────────────────────┐    ┌────────────────────────┐
    │                         │    │                        │
    │  App A ──custom──► DB   │    │  App A ──┐             │
    │  App A ──custom──► CRM  │    │          │             │
    │  App A ──custom──► Slack│    │  App B ──┼──MCP──► DB  │
    │                         │    │          │        CRM  │
    │  App B ──custom──► DB   │    │  App C ──┘        Slack│
    │  App B ──custom──► CRM  │    │                        │
    │  App B ──custom──► Slack│    │  Write the connector   │
    │                         │    │  ONCE, use everywhere  │
    │  N apps × M tools       │    │                        │
    │  = N×M integrations     │    │  N apps + M tools      │
    │                         │    │  = N+M integrations    │
    └─────────────────────────┘    └────────────────────────┘

MCP is USB-C for AI — one standard connector


MCP architecture

    ┌─────────────────────────────────────────────────────┐
    │                                                     │
    │  ┌────────────────┐         ┌──────────────────┐    │
    │  │   MCP HOST     │         │   MCP SERVER     │    │
    │  │   (AI app)     │         │  (tool provider) │    │
    │  │                │         │                  │    │
    │  │  ┌──────────┐  │  JSON   │  ┌─────────────┐ │    │
    │  │  │  MCP     │◄─┼─────────┼─►│  Resources  │ │    │
    │  │  │  Client  │  │  RPC    │  │  (data)     │ │    │
    │  │  └──────────┘  │         │  ├─────────────┤ │    │
    │  │                │         │  │  Tools      │ │    │
    │  │  Claude Code   │         │  │  (actions)  │ │    │
    │  │  Cursor        │         │  ├─────────────┤ │    │
    │  │  Your app      │         │  │  Prompts    │ │    │
    │  │                │         │  │  (templates)│ │    │
    │  └────────────────┘         │  └─────────────┘ │    │
    │                             └──────────────────┘    │
    └─────────────────────────────────────────────────────┘

MCP in practice: what servers expose

    RESOURCES (read-only data)          TOOLS (actions)
    ┌──────────────────────┐      ┌──────────────────────┐
    │ • Database schemas   │      │ • query_database()   │
    │ • File contents      │      │ • send_email()       │
    │ • API documentation  │      │ • create_ticket()    │
    │ • Configuration      │      │ • run_report()       │
    │                      │      │ • update_crm()       │
    │ Model can browse     │      │ Model can execute    │
    │ without side effects │      │ with side effects    │
    └──────────────────────┘      └──────────────────────┘

    PROMPTS (templates)
    ┌──────────────────────────────────────────────────┐
    │ • "Analyze this customer's account"              │
    │ • "Generate a weekly sales report"               │
    │ • "Draft a response to this support ticket"      │
    │                                                  │
    │ Pre-built, tested, reusable prompt templates     │
    └──────────────────────────────────────────────────┘

The MCP ecosystem

    ┌───────────────────────────────────────────────────────┐
    │                  MCP SERVERS                          │
    │                                                       │
    │  DATA                     PRODUCTIVITY                │
    │  ┌──────────────┐         ┌─────────────┐             │
    │  │ PostgreSQL   │         │ Slack       │             │
    │  │ MongoDB      │         │ Google Drive│             │
    │  │ Redis        │         │ Notion      │             │
    │  │ Elasticsearch│         │ Linear      │             │
    │  └──────────────┘         └─────────────┘             │
    │                                                       │
    │  DEVELOPER                ENTERPRISE                  │
    │  ┌──────────────┐         ┌─────────────┐             │
    │  │ GitHub       │         │ Salesforce  │             │
    │  │ Jira         │         │ SAP         │             │
    │  │ AWS          │         │ ServiceNow  │             │
    │  │ Sentry       │         │ Workday     │             │
    │  └──────────────┘         └─────────────┘             │
    │                                                       │
    │  + hundreds of community-built servers                │
    └───────────────────────────────────────────────────────┘

Agent Reliability

Why agents fail and how to fix them


Why agents fail

    ┌──────────────────────────────────────────────────────┐
    │              AGENT FAILURE CASCADE                   │
    │                                                      │
    │  Step 1: Research     ✓ Correct                      │
    │       │                                              │
    │       ▼                                              │
    │  Step 2: Analyze      ✗ Wrong tool selected          │
    │       │                 (used calculator instead     │
    │       │                  of database query)          │
    │       ▼                                              │
    │  Step 3: Compute      ✗ Based on wrong data          │
    │       │                 (cascading error)            │
    │       ▼                                              │
    │  Step 4: Report       ✗ Confidently wrong            │
    │                         (no error detection)         │
    │                                                      │
    │  ONE mistake compounds through every subsequent step │
    └──────────────────────────────────────────────────────┘

AgentRx: diagnosing agent failures

    ┌──────────────────────────────────────────────────────┐
    │              AGENTRX FAILURE TAXONOMY                │
    │                                                      │
    │  PERCEPTION                 PLANNING                 │
    │  ┌─────────────────┐       ┌─────────────────┐       │
    │  │ Misunderstood   │       │ Wrong strategy  │       │
    │  │ the task        │       │ chosen          │       │
    │  │ Missed context  │       │ Skipped steps   │       │
    │  │ Wrong assumption│       │ Wrong ordering  │       │
    │  └─────────────────┘       └─────────────────┘       │
    │                                                      │
    │  ACTION                     OBSERVATION              │
    │  ┌─────────────────┐       ┌─────────────────┐       │
    │  │ Wrong tool      │       │ Ignored errors  │       │
    │  │ Wrong parameters│       │ Misinterpreted  │       │
    │  │ Tool errored    │       │ results         │       │
    │  │ Hallucinated    │       │ Didn't verify   │       │
    │  │ tool call       │       │                 │       │
    │  └─────────────────┘       └─────────────────┘       │
    └──────────────────────────────────────────────────────┘

The autonomy-reliability trade-off

                     HIGH
                      │
    Reliability       │    ●  Human does it
                      │
                      │         ●  Human-in-the-loop
                      │              (agent proposes,
                      │               human approves)
                      │
                      │                   ●  Agent with
                      │                      guardrails
                      │
                      │                        ●  Full
                      │                           autonomy
                      │
                     LOW ────────────────────────────────
                          LOW                        HIGH
                                  Autonomy

The goal is to move UP and RIGHT: more autonomy while maintaining reliability


Building reliable agents

    ┌──────────────────────────────────────────────────────┐
    │              RELIABILITY PATTERNS                    │
    │                                                      │
    │  1. GUARDRAILS                                       │
    │     Input validation → Action limits → Output checks │
    │                                                      │
    │  2. HUMAN-IN-THE-LOOP                                │
    │     Agent proposes → Human reviews → Agent executes  │
    │                                                      │
    │  3. TRAJECTORY LOGGING                               │
    │     Log every: thought, tool call, result, decision  │
    │                                                      │
    │  4. SELF-VERIFICATION                                │
    │     Agent checks its own work before returning       │
    │                                                      │
    │  5. GRACEFUL DEGRADATION                             │
    │     If unsure → ask for help, don't guess            │
    │     If tool fails → retry or use alternative         │
    │     If task too complex → break it down or escalate  │
    └──────────────────────────────────────────────────────┘

Break

15 minutes


Hands-on

Multi-Agent Workflow + MCP


Exercise 1: Multi-agent workflow

cd scripts/week5
claude "Read multi_agent.py, explain the workflow, then help me modify it"

The pipeline:

    ┌────────────┐     ┌────────────┐     ┌────────────┐
    │  RESEARCH  │ ──→ │  ANALYSIS  │ ──→ │  WRITING   │
    │  Agent     │     │  Agent     │     │  Agent     │
    │            │     │            │     │            │
    │ "Find info │     │ "Identify  │     │ "Write a   │
    │  about..." │     │  patterns, │     │  briefing  │
    │            │     │  insights" │     │  document" │
    └────────────┘     └────────────┘     └────────────┘

Tasks:

  • Run the default pipeline
  • Add an error-handling agent between steps
  • MS: implement custom orchestration logic
  • MBA: adapt the pipeline for a business scenario

Exercise 2: MCP in action

claude "Read mcp_server.py, explain how it works"

Build an MCP server with business tools:

    YOUR MCP SERVER
    ┌─────────────────────────────────┐
    │                                 │
    │  Tools:                         │
    │  ├── inventory_lookup(sku)      │
    │  ├── customer_search(name)      │
    │  └── generate_report(type)      │
    │                                 │
    │  Resources:                     │
    │  ├── product_catalog            │
    │  └── pricing_table              │
    │                                 │
    └─────────────────────────────────┘
         ▲
         │ MCP protocol
         ▼
    ┌─────────────────────────────────┐
    │  Claude Code (MCP client)       │
    └─────────────────────────────────┘

Assignment 4 (due next week)

MS section:

  • Agent555 case: design a multi-agent dispatch system
  • Submit: architecture diagram + key agent prompts

MBA section:

  • "What Roles Could GenAI Play on Your Team?" case
  • 1-page analysis mapping the framework to a real organization

Graded on completion.


Next week preview

Week 6: GenAI and Agent Governance

  • Risk taxonomy: hallucination, bias, security, privacy
  • Evaluation and verification frameworks
  • Regulatory landscape (EU AI Act, NIST)
  • Red-teaming workshop

Reading:

  • NIST AI Risk Management Framework
  • EU AI Act summary

Questions?


Last week we built single agents with tools and RAG. This week we go further: multiple agents working together, the Model Context Protocol standard that's reshaping how agents connect to the world, and — critically — why agents fail and how to fix them. The AgentRx paper is your key reading for this week.

Why not just use one agent? Same reason you don't have one person do everything at a company. Context overload: a single agent trying to research, analyze, write, and review fills up its context window and quality degrades. Specialization: each agent gets its own focused system prompt. Debuggability: when something goes wrong, you can identify which agent failed. The multi-agent approach mirrors how human teams work: division of labor, handoffs, quality checks.

Four patterns. Sequential: agents pass work in a chain, like an assembly line. Parallel: multiple agents work simultaneously on different aspects, results combined. Hierarchical: a manager agent delegates to specialist agents and synthesizes results. Debate: two agents argue opposing positions, a judge decides. Sequential is simplest and most common. Hierarchical works well for complex tasks with clear subtasks. Debate is surprisingly effective for decisions that benefit from multiple perspectives.
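The sequential and parallel patterns can be sketched in a few lines of Python, with plain functions standing in for LLM-backed agents (all agent names and outputs here are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "agents": in a real system each would be an LLM call with its
# own focused system prompt. Plain functions keep the sketch runnable.
def research(topic):   return f"findings about {topic}"
def analyze(findings): return f"trends in ({findings})"
def write(analysis):   return f"briefing: {analysis}"

def finance_view(topic): return f"finance view of {topic}"
def legal_view(topic):   return f"legal view of {topic}"

def sequential(topic):
    # Assembly line: each agent's output becomes the next agent's input.
    return write(analyze(research(topic)))

def parallel(topic):
    # Independent specialists run simultaneously; a combiner merges results.
    with ThreadPoolExecutor() as pool:
        views = list(pool.map(lambda agent: agent(topic),
                              [finance_view, legal_view]))
    return " | ".join(views)
```

A hierarchical orchestrator would be a manager function deciding which of these to invoke; a debate loop alternates two agents and hands the transcript to a judge.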

The handoff between agents is critical. Agent A produces structured output that becomes Agent B's input. This is where things often break: if the output format is ambiguous, if data is lost in translation, if the schema changes. Best practices: define a clear schema for handoffs (JSON with required fields), validate outputs before passing them, include metadata (confidence scores, sources, timestamps). Think of it like a well-defined API contract between services.
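A minimal sketch of a validated handoff using only the standard library; the field names (`task_id`, `findings`, `sources`, `confidence`) are hypothetical, not from any particular framework:

```python
import json

# Hypothetical handoff contract between a researcher and an analyst.
REQUIRED_FIELDS = {"task_id", "findings", "sources", "confidence"}

def validate_handoff(payload: str) -> dict:
    """Parse Agent A's output and fail fast if the contract is broken,
    instead of letting a malformed handoff corrupt Agent B's work."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return data

raw = json.dumps({"task_id": "q3-earnings", "findings": ["AAPL +8% YoY"],
                  "sources": ["10-Q filing"], "confidence": 0.9})
handoff = validate_handoff(raw)  # passes; a malformed payload would raise
```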

Agents have three types of memory. Short-term: the context window, everything the agent can see right now. This is volatile — it resets every conversation. Long-term: persistent storage of user preferences, past interactions, learned knowledge. This survives across sessions. Episodic: memories of specific experiences, including what worked and what didn't. Claude Code uses all three: your conversation is short-term, CLAUDE.md files are long-term, and its auto-memory feature is episodic. Building effective agent memory is one of the hardest problems in agentic AI.
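One way to picture the three memory types in code. This is a toy illustration only: production agents use databases or vector stores, and the file name below is made up:

```python
import json, pathlib

class AgentMemory:
    def __init__(self, store="agent_memory.json"):
        self.short_term = []                      # context window: volatile
        self.store = pathlib.Path(store)          # long-term: survives restarts
        self.long_term = (json.loads(self.store.read_text())
                          if self.store.exists() else {"episodes": []})

    def observe(self, event):
        # Short-term: exists only for this session, gone on reset.
        self.short_term.append(event)

    def record_episode(self, attempt, outcome, lesson):
        # Episodic: what was tried, what happened, what to do next time.
        self.long_term["episodes"].append(
            {"attempt": attempt, "outcome": outcome, "lesson": lesson})

    def persist(self):
        self.store.write_text(json.dumps(self.long_term))

mem = AgentMemory()
mem.observe("user asked for a Q3 report")
mem.record_episode("scraped the site", "blocked", "use the API instead")
mem.persist()  # a new session can load this and avoid the same mistake
```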

Before MCP, every AI application needed custom integrations for every tool. Want to connect Claude to your database? Custom code. Want to connect GPT-4 to the same database? Different custom code. MCP solves this by creating a standard protocol. Build one MCP server for your database, and any MCP-compatible AI application can connect to it. The analogy is USB-C: one cable works with everything. This reduces the integration matrix from N×M to N+M.
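The integration arithmetic, made concrete with invented numbers:

```python
apps, tools = 5, 8   # e.g. 5 AI apps, 8 internal systems (illustrative)
print(apps * tools)  # without MCP: 40 custom point-to-point integrations
print(apps + tools)  # with MCP: 13 (one client per app + one server per tool)
```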

MCP has two sides. The host (like Claude Code) contains an MCP client that sends requests. The server (your tool) exposes three types of capabilities: Resources (data the model can read), Tools (actions the model can take), and Prompts (pre-built templates). Communication happens over JSON-RPC — a simple, standard protocol. The server can run locally or remotely. Claude Code natively supports MCP, which is why you can connect it to databases, APIs, and other tools.
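What the wire traffic looks like. The JSON-RPC 2.0 envelope and the `tools/call` method name follow the MCP specification; the tool, its arguments, and the toy dispatcher below are invented, and a real server would use an official MCP SDK rather than hand-rolled routing:

```python
import json

# A client asking a server to invoke a tool, as a JSON-RPC 2.0 message.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "query_database",
               "arguments": {"sql": "SELECT revenue FROM q3_earnings"}},
}

def handle(message: str) -> str:
    """Toy server-side dispatch: route the method, return a response."""
    req = json.loads(message)
    if req["method"] == "tools/call" and req["params"]["name"] == "query_database":
        result = {"rows": [["$94.9B"]]}  # stubbed data, not a real query
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                       "error": {"code": -32601, "message": "method not found"}})

response = handle(json.dumps(request))
```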

Three capabilities. Resources are read-only: the model can see database schemas, file contents, documentation. This is safe — no side effects. Tools have side effects: the model can take actions like querying a database, sending an email, creating a ticket. These need guardrails. Prompts are pre-built templates for common tasks — tested and reusable. When building an MCP server, think carefully about what you expose. Resources are safe to be generous with. Tools should be carefully scoped.

The MCP ecosystem is growing rapidly. There are official servers for major databases, productivity tools, developer tools, and enterprise systems. Plus hundreds of community-built servers on GitHub. This means you can connect Claude Code to your Postgres database, Slack workspace, GitHub repo, and CRM system — all through standard MCP connections. For your final projects, MCP servers can give your agents real-world capabilities without writing custom integrations.

Agents fail differently than chatbots. A chatbot gives one wrong answer. An agent makes one wrong decision, then builds on it through multiple steps, each amplifying the error. By the final output, the result can be confidently, spectacularly wrong. This cascading failure is the biggest risk of agentic systems. The AgentRx paper from your reading systematically categorizes these failure modes and proposes diagnostic approaches.
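The compounding effect is just multiplication: if each step succeeds independently with probability p, an n-step agent succeeds with probability p**n (the numbers below are illustrative, not measured):

```python
p = 0.95                # assumed per-step success rate
for steps in (1, 2, 4, 8):
    print(steps, round(p ** steps, 3))
# A 95%-reliable step still leaves a 4-step agent right only ~81% of the time.
```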

The AgentRx framework maps failures to the four stages of the agent loop. Perception failures: the agent misunderstood the task or missed important context. Planning failures: it chose the wrong strategy or skipped steps. Action failures: wrong tool, wrong parameters, or hallucinated a tool that doesn't exist. Observation failures: it ignored errors or misinterpreted results. When debugging an agent, trace through the trajectory and identify which stage failed first — that's your root cause.
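That debugging recipe can be expressed directly. This is an illustrative sketch, not code from the AgentRx paper, and the trajectory format is made up:

```python
STAGES = ("perception", "planning", "action", "observation")

def first_failure(trajectory):
    """trajectory: list of (stage, ok) pairs in execution order.
    Returns the first failing stage (the likely root cause), or None."""
    for stage, ok in trajectory:
        if not ok:
            return stage
    return None

run = [("perception", True), ("planning", True),
       ("action", False),        # wrong tool selected: the real error
       ("observation", False)]   # downstream symptom, not the cause
print(first_failure(run))        # → action
```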

This is the fundamental trade-off in agentic AI. More autonomy = more value but less reliability. The sweet spot today for most enterprise use cases is human-in-the-loop: the agent does the work, but a human approves critical actions. Over time, as we build better guardrails and evaluation, we'll push the frontier toward full autonomy with high reliability. For your final projects, be thoughtful about where you place your system on this curve.

Five patterns for building reliable agents. Guardrails: validate inputs, limit what actions the agent can take, check outputs. Human-in-the-loop: have humans approve high-stakes actions. Trajectory logging: log everything so you can debug failures. Self-verification: have the agent check its own work. Graceful degradation: when uncertain, ask for help rather than guessing. Claude Code implements all five of these, which is why it asks for permission before running commands or editing files.
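A sketch combining several of these patterns in one wrapper. This is illustrative only, not how Claude Code implements them; the tool, the allowlist, and the approve callback are all made up:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def reliable_call(tool, args, *, allowed, approve, retries=2):
    """Guardrails + human-in-the-loop + trajectory logging + degradation."""
    if tool.__name__ not in allowed:              # 1. guardrail: action limits
        raise PermissionError(f"{tool.__name__} is not an allowed tool")
    if not approve(tool.__name__, args):          # 2. human-in-the-loop
        return {"status": "rejected"}
    for attempt in range(1 + retries):
        try:
            result = tool(**args)
            log.info("call=%s args=%s result=%s",  # 3. trajectory logging
                     tool.__name__, args, result)
            return {"status": "ok", "result": result}
        except Exception as exc:
            log.warning("attempt %d failed: %s", attempt + 1, exc)
    return {"status": "escalate"}                 # 5. degrade: ask a human

def send_email(to, body):                         # hypothetical tool
    return f"sent to {to}"

out = reliable_call(send_email, {"to": "cfo@example.com", "body": "Q3 draft"},
                    allowed={"send_email"}, approve=lambda name, args: True)
```

Self-verification (pattern 4) would sit between the tool result and the return, for example a second model call that checks the output before it is accepted.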

You'll work with a three-agent pipeline: researcher, analyst, writer. Each agent has its own system prompt and tools. Your job is to understand the handoffs, then modify the pipeline. Try adding a fourth agent — maybe a fact-checker or editor. Notice how the quality of the handoff data (structured JSON between agents) affects the final output quality. If the researcher returns vague results, the analyst can't do much.

You'll build a simple MCP server that exposes business tools. Claude Code can then connect to your server and use those tools in conversation. This is the same pattern used in production: your internal systems expose MCP servers, and AI assistants connect to them. Try building tools for a domain you care about — inventory, CRM, reporting. The key learning is how MCP standardizes the tool interface so any AI app can use your tools.