Generative AI for Business — Week 4

Agentic AI in Action (I)

RAG, Tool Use & Agent Foundations

Week 4

JHU Carey Business School | 2026

Today's agenda

Time Topic
0:00–0:30 What is agentic AI?
0:30–1:10 RAG: Retrieval-Augmented Generation
1:10–1:35 Tool use in practice
1:35–1:50 Break
1:50–2:30 Hands-on: Build a RAG pipeline
2:30–2:55 Hands-on: Tool-using agent
2:55–3:00 Wrap-up + Assignment 3

The spectrum of autonomy

    CHATBOT          COPILOT            AGENT
    ┌─────┐         ┌─────┐           ┌─────┐
    │     │         │     │           │     │
    │ 💬  │         │ 🤝  │           │ 🤖  │
    │     │         │     │           │     │
    └─────┘         └─────┘           └─────┘
      │               │                 │
      ▼               ▼                 ▼
    User asks       User works        System acts
    AI answers      AI assists        autonomously

    ─────────────────────────────────────────────►
    LOW autonomy                    HIGH autonomy
    LOW risk                        HIGH risk
    SIMPLE                          COMPLEX

The agent loop

    ┌─────────────────────────────────────────────────────┐
    │                                                      │
    │         ┌──────────┐                                │
    │         │ PERCEIVE │ ◄── User query, environment    │
    │         └────┬─────┘                                │
    │              │                                      │
    │              ▼                                      │
    │         ┌──────────┐                                │
    │         │   PLAN   │ ◄── What steps to take?        │
    │         └────┬─────┘                                │
    │              │                                      │
    │              ▼                                      │
    │         ┌──────────┐                                │
    │         │   ACT    │ ──► Call tools, retrieve docs   │
    │         └────┬─────┘                                │
    │              │                                      │
    │              ▼                                      │
    │         ┌──────────┐                                │
    │         │ OBSERVE  │ ◄── Tool results, feedback     │
    │         └────┬─────┘                                │
    │              │                                      │
    │              └──────► Done? ──► YES ──► Response    │
    │                         │                           │
    │                         NO                          │
    │                         │                           │
    │                    back to PLAN                      │
    │                                                      │
    └─────────────────────────────────────────────────────┘

Agent architectures

    ReAct                    Plan-and-Execute          Reflexion
    ┌──────────────┐        ┌──────────────┐        ┌──────────────┐
    │ Think → Act  │        │ Plan all     │        │ Try → Fail   │
    │ → Observe    │        │ steps first  │        │ → Reflect    │
    │ → Think →    │        │ then execute │        │ → Try again  │
    │   Act → ...  │        │ one by one   │        │              │
    │              │        │              │        │              │
    │ Interleaved  │        │ Structured   │        │ Self-healing │
    │ thinking     │        │ upfront      │        │ from errors  │
    │ and acting   │        │ planning     │        │              │
    └──────────────┘        └──────────────┘        └──────────────┘

    Most common.             Good for complex        Good when
    Claude Code              multi-step tasks.       mistakes are
    uses this.               Requires known steps.   likely.

RAG

Retrieval-Augmented Generation


Why RAG?

    WITHOUT RAG                         WITH RAG
    ┌──────────────────────┐      ┌──────────────────────┐
    │                      │      │                      │
    │  "What is our Q3     │      │  "What is our Q3     │
    │   revenue?"          │      │   revenue?"          │
    │                      │      │        │             │
    │        │             │      │        ▼             │
    │        ▼             │      │   ┌──────────┐      │
    │   ┌──────────┐      │      │   │ RETRIEVE  │      │
    │   │   LLM    │      │      │   │ Q3 report │      │
    │   └──────────┘      │      │   └────┬─────┘      │
    │        │             │      │        │             │
    │        ▼             │      │        ▼             │
    │   "I don't have     │      │   ┌──────────┐      │
    │    access to your   │      │   │ LLM +    │      │
    │    financial data"  │      │   │ context  │      │
    │                      │      │   └──────────┘      │
    │   or worse:          │      │        │             │
    │   HALLUCINATION      │      │        ▼             │
    │                      │      │   "$47.3M, up 12%   │
    │                      │      │    from Q2" ✓       │
    └──────────────────────┘      └──────────────────────┘

The RAG pipeline

    INDEXING (offline, once)
    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
    │ Documents│ ─→ │  CHUNK   │ ─→ │  EMBED   │ ─→ │  STORE   │
    │ (PDFs,   │    │ Split    │    │ Convert  │    │ Vector   │
    │  docs,   │    │ into     │    │ to       │    │ database │
    │  web)    │    │ passages │    │ vectors  │    │          │
    └──────────┘    └──────────┘    └──────────┘    └──────────┘


    RETRIEVAL (online, per query)
    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
    │  User    │ ─→ │  EMBED   │ ─→ │ SEARCH   │ ─→ │ GENERATE │
    │  query   │    │  query   │    │ similar  │    │ answer + │
    │          │    │          │    │ chunks   │    │ sources  │
    └──────────┘    └──────────┘    └──────────┘    └──────────┘

Chunking: the make-or-break step

    TOO SMALL                  JUST RIGHT                TOO LARGE
    ┌──────────┐          ┌──────────────────┐        ┌──────────────┐
    │ "Revenue │          │ "Q3 revenue was  │        │ [Entire 50-  │
    │  was"    │          │  $47.3M, up 12%  │        │  page annual │
    │          │          │  from Q2, driven │        │  report      │
    │ Lost     │          │  by enterprise   │        │  crammed     │
    │ context  │          │  sales growth    │        │  into one    │
    │          │          │  in APAC region."│        │  chunk]      │
    └──────────┘          └──────────────────┘        └──────────────┘
         ✗                        ✓                         ✗

    STRATEGIES:
    ┌─────────────────────────────────────────────────────────┐
    │ • Fixed-size: 500-1000 tokens with 100-200 overlap     │
    │ • Semantic: split at paragraph/section boundaries       │
    │ • Hierarchical: summary chunks + detail chunks          │
    │ • Document-aware: respect headers, tables, lists        │
    └─────────────────────────────────────────────────────────┘
Embeddings: similar meaning, nearby vectors

    EMBEDDING SPACE (simplified to 2D)

                    "Q3 revenue growth"
                          ●  ← Query
                        / |
                      /   |
                    /     |
         "Q3 sales   "Annual revenue
          increased"   summary"
              ●           ●   ← Close (relevant)




         "Office            "CEO
          supplies"          biography"
              ●                 ●   ← Far (irrelevant)

Similar meaning → similar vectors → found by search


RAG evaluation

    ┌─────────────────────────────────────────────────────┐
    │                RAG EVALUATION                        │
    │                                                      │
    │   RETRIEVAL QUALITY          GENERATION QUALITY      │
    │   ┌─────────────────┐       ┌─────────────────┐     │
    │   │ • Precision:    │       │ • Faithfulness:  │     │
    │   │   % retrieved   │       │   Does answer    │     │
    │   │   docs that are │       │   match the      │     │
    │   │   relevant      │       │   retrieved docs?│     │
    │   │                 │       │                  │     │
    │   │ • Recall:       │       │ • Completeness:  │     │
    │   │   % relevant    │       │   Did it use all │     │
    │   │   docs that     │       │   relevant info? │     │
    │   │   were retrieved│       │                  │     │
    │   │                 │       │ • Hallucination: │     │
    │   │ • MRR:          │       │   Did it make    │     │
    │   │   Rank of first │       │   stuff up?      │     │
    │   │   relevant doc  │       │                  │     │
    │   └─────────────────┘       └─────────────────┘     │
    └─────────────────────────────────────────────────────┘

RAG vs. fine-tuning vs. long context

    ┌──────────────┬────────────────┬──────────────────────┐
    │              │                │                      │
    │     RAG      │  FINE-TUNING   │  LONG CONTEXT        │
    │              │                │                      │
    │ Best when:   │ Best when:     │ Best when:           │
    │ • Many docs  │ • Specific     │ • Few docs           │
    │ • Docs change│   style/format │ • Fit in window      │
    │ • Need       │ • Consistent   │ • Need full          │
    │   citations  │   behavior     │   document context   │
    │ • Scalable   │ • Specialized  │ • Simple setup       │
    │              │   domain       │                      │
    │              │                │                      │
    │ Harder to    │ Need training  │ Expensive per        │
    │ build        │ data + compute │ query                │
    │              │                │                      │
    └──────────────┴────────────────┴──────────────────────┘

These are complementary, not competing


Tool use: how LLMs interact with the world

    ┌─────────────────────────────────────────────────────────┐
    │                                                          │
    │  User: "What's the current stock price of AAPL?"         │
    │                                                          │
    │  LLM thinks: I need real-time data. Let me use a tool.  │
    │                                                          │
    │  ┌──────────────────────────────────────────────────┐   │
    │  │  Tool call: get_stock_price(ticker="AAPL")       │   │
    │  └──────────────────┬───────────────────────────────┘   │
    │                     │                                    │
    │                     ▼                                    │
    │  ┌──────────────────────────────────────────────────┐   │
    │  │  Tool result: {"price": 237.42, "change": +1.2%} │   │
    │  └──────────────────┬───────────────────────────────┘   │
    │                     │                                    │
    │                     ▼                                    │
    │  LLM: "Apple (AAPL) is currently trading at $237.42,    │
    │   up 1.2% today."                                        │
    │                                                          │
    └─────────────────────────────────────────────────────────┘

Anatomy of a tool definition

    ┌─────────────────────────────────────────────────────┐
    │  TOOL DEFINITION (you provide to the model)          │
    │                                                      │
    │  name: "get_stock_price"                             │
    │                                                      │
    │  description: "Get the current stock price           │
    │   and daily change for a given ticker symbol.        │
    │   Use this when the user asks about stock            │
    │   prices or market data."                            │
    │                                                      │
    │  parameters:                                         │
    │    ticker:                                           │
    │      type: string                                    │
    │      description: "Stock ticker (e.g., AAPL, MSFT)" │
    │      required: true                                  │
    │                                                      │
    │  returns: { price: number, change: string }          │
    └─────────────────────────────────────────────────────┘

    Good description = model knows WHEN and HOW to use it

Tool orchestration patterns

    SEQUENTIAL                 PARALLEL               CONDITIONAL
    ┌────────┐            ┌────────┐  ┌────────┐     ┌────────┐
    │ Tool A │            │ Tool A │  │ Tool B │     │ Tool A │
    └───┬────┘            └───┬────┘  └───┬────┘     └───┬────┘
        │                     │           │               │
        ▼                     └─────┬─────┘          ┌────┴────┐
    ┌────────┐                      │                │ Check   │
    │ Tool B │                      ▼                │ result  │
    └───┬────┘                 ┌────────┐            └────┬────┘
        │                      │ Combine │                │
        ▼                      └────────┘           ┌─────┴─────┐
    ┌────────┐                                      ▼           ▼
    │ Tool C │                               ┌────────┐  ┌────────┐
    └────────┘                               │ Tool B │  │ Tool C │
                                             └────────┘  └────────┘
    Each step uses                Multiple tools     Choose next tool
    previous result               at once            based on result

Claude Code: an agent in action

    ┌─────────────────────────────────────────────────────┐
    │  YOU: "Fix the bug in auth.py"                       │
    │                                                      │
    │  Claude Code:                                        │
    │    1. PERCEIVE → Read auth.py (tool: Read)           │
    │    2. PLAN    → Identify the issue                   │
    │    3. ACT     → Edit auth.py (tool: Edit)            │
    │    4. OBSERVE → Read modified file                    │
    │    5. ACT     → Run tests (tool: Bash)               │
    │    6. OBSERVE → Tests pass? ✓                        │
    │    7. RESPOND → "Fixed the bug by..."                │
    │                                                      │
    │  Tools used: Read, Edit, Bash, Glob, Grep            │
    │  Pattern: ReAct (reasoning interleaved with action)  │
    └─────────────────────────────────────────────────────┘

You've been using an agent all semester


Break

15 minutes


Hands-on

RAG Pipeline + Tool-Using Agent


Exercise 1: Build a RAG pipeline

cd scripts/week4
claude "Read rag_pipeline.py, explain it, then help me build a RAG system"

Steps:

    1. Load documents ──→ 2. Chunk them ──→ 3. Embed & store
                                                    │
    6. Evaluate      ◄── 5. Generate   ◄── 4. Query & retrieve

Try:

  • Index a set of sample documents
  • Query the system and check answer quality
  • Experiment with chunk sizes (200, 500, 1000 tokens)
  • MS: implement custom similarity scoring
  • MBA: focus on document selection and query design

Exercise 2: Tool-using agent

claude "Read tool_agent.py, explain it, then run it"
    ┌──────────────────────────────────────────────────┐
    │  Available tools:                                 │
    │                                                   │
    │  🔢 calculator(expression)     Evaluate math      │
    │  🔍 web_search(query)          Search the web     │
    │  📄 read_file(path)            Read a file        │
    │                                                   │
    │  Task: "Find AAPL's revenue, calculate YoY        │
    │   growth, and write a brief investment memo"       │
    └──────────────────────────────────────────────────┘

Your job: Add a new tool and test on multi-step tasks


Assignment 3 (due next week)

MS section:

  • HireMe BlueJay Coach: design doc for an agentic prototype
  • Submit: architecture diagram + initial implementation

MBA section:

  • HubSpot and Motion AI case
  • 1-page recommendation: which deployment model should HubSpot choose?

Graded on completion. Project plan due.


Next week preview

Week 5: Agentic AI in Action (II)

  • Multi-agent orchestration
  • MCP (Model Context Protocol)
  • Agent reliability and failure modes
  • AgentRx framework

Reading:

  • Anthropic MCP documentation
  • Barke et al., "AgentRx: Diagnosing AI Agent Failures" (2025)

Questions?


Today we cross the threshold from LLMs as text generators to LLMs as agents that take actions in the world. This is the biggest conceptual shift in the course. By the end of today, you'll have built an AI agent that retrieves documents and uses tools to answer questions.

Think of this as a spectrum, not categories. A chatbot waits for questions and answers them. A copilot works alongside you, suggesting and completing. An agent acts on its own — it can search, compute, write files, call APIs. Higher autonomy means higher value but also higher risk. Most enterprise deployments today are chatbots evolving into copilots. Agents are the frontier, and that's what we're building today.

Every agent, no matter how complex, follows this loop. Perceive the current state — what's the user asking, what information do I have? Plan — what steps should I take? Act — execute a tool call, search, or computation. Observe — what did I get back? Then loop: am I done, or do I need more steps? Claude Code itself follows this pattern. When you ask it to fix a bug, it reads files (perceive), decides what to change (plan), edits code (act), and checks if it worked (observe).
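The loop above fits in a few lines of Python. This is an illustrative sketch, not a real framework: the `plan` function here is hard-coded, standing in for the LLM call a real agent would make at the PLAN step.

```python
# Minimal perceive-plan-act-observe loop (illustrative sketch).
# A real agent would call a model inside plan(); here it is canned.

def run_agent(query, tools, max_steps=5):
    observations = []                                  # PERCEIVE: accumulated state
    for _ in range(max_steps):
        step = plan(query, observations)               # PLAN: decide the next action
        if step["action"] == "respond":                # Done? -> final answer
            return step["text"]
        result = tools[step["action"]](**step["args"]) # ACT: execute the tool call
        observations.append(result)                    # OBSERVE: record the result
    return "Step limit reached."

def plan(query, observations):
    # Stand-in planner: look something up once, then answer.
    if not observations:
        return {"action": "lookup", "args": {"term": query}}
    return {"action": "respond", "text": f"Answer based on: {observations[-1]}"}

tools = {"lookup": lambda term: f"notes about {term}"}
print(run_agent("Q3 revenue", tools))  # Answer based on: notes about Q3 revenue
```

The `max_steps` cap matters in practice: it is the simplest guard against an agent looping forever.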

Three main architectures. ReAct (Reasoning + Acting) interleaves thinking and tool use — it's the most common and what Claude Code uses. Plan-and-Execute creates a full plan upfront, then executes step by step — better for well-defined multi-step tasks. Reflexion adds a self-critique step: try, fail, reflect on why, try again — useful when the task is hard and first attempts often fail. In practice, most agents use ReAct or a hybrid.
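Reflexion's retry-with-critique structure is easy to see in code. In this toy sketch, a stub `attempt` and `check` stand in for the model and its evaluator:

```python
# Reflexion in miniature: try, check, reflect, retry.
def reflexion(attempt, check, max_tries=3):
    reflections = []
    result = None
    for _ in range(max_tries):
        result = attempt(reflections)        # try, informed by past critiques
        ok, feedback = check(result)
        if ok:
            return result
        reflections.append(feedback)         # reflect, then try again
    return result

# Toy task: each critique nudges the next attempt upward.
attempt = lambda refs: 3 + len(refs)
check = lambda x: (x == 5, "too low")
reflexion(attempt, check)   # attempts 3 -> 4 -> 5, succeeds on the third try
```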

RAG solves three fundamental LLM limitations. First: LLMs don't know your private data — your financials, policies, internal docs. Second: LLMs' knowledge has a cutoff date — they don't know what happened last week. Third: without sources, LLMs hallucinate confidently. RAG fixes all three by retrieving relevant documents and putting them in context before the model generates a response. The model can now cite sources, stay grounded in facts, and access up-to-date information.

RAG has two phases. Indexing happens offline, once: you take your documents, split them into chunks, convert each chunk to a vector embedding, and store those vectors in a database. Retrieval happens online, per query: you embed the user's question, search for the most similar chunks, and feed those chunks to the LLM along with the question. The LLM generates an answer grounded in the retrieved documents.
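Both phases can be sketched end-to-end in stdlib Python. The embedding below is a toy bag-of-words counter so the flow runs without a model API; in practice you would swap `embed()` for a real embedding-model call and the list for a vector database:

```python
# End-to-end RAG skeleton (stdlib only; toy embedding for illustration).
from collections import Counter
import math
import re

def embed(text):
    # Toy "embedding": word counts stand in for a dense vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# INDEXING (offline, once): chunk -> embed -> store
docs = ["Q3 revenue was $47.3M, up 12% from Q2.",
        "The new Singapore office opened in May."]
index = [(doc, embed(doc)) for doc in docs]

# RETRIEVAL (online, per query): embed query -> search -> generate
def retrieve(query, k=1):
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

context = retrieve("What was Q3 revenue?")[0]
prompt = f"Answer using only this context:\n{context}\n\nQ: What was Q3 revenue?"
```

The final `prompt` is what gets sent to the LLM: the retrieved chunk is injected into context so the answer is grounded.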

Chunking is where most RAG pipelines fail or succeed. Too small: you lose context, the chunk doesn't contain enough information to answer the question. Too large: you waste context window space, the relevant info is diluted. The sweet spot is usually 500-1000 tokens with some overlap between chunks. But the best approach is document-aware chunking that respects the natural structure: split at section headers, keep tables together, respect paragraph boundaries.
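Fixed-size chunking with overlap takes only a few lines. This sketch approximates tokens with whitespace-separated words; real pipelines count model tokens:

```python
# Fixed-size chunking with overlap (words approximate tokens here).
def chunk(text, size=500, overlap=100):
    words = text.split()
    step = size - overlap            # each chunk starts `step` words later
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break                    # last chunk reached the end
    return chunks

# 1200 "words": starts at 0, 400, 800 -> three chunks,
# with a 100-word overlap between consecutive chunks.
doc = " ".join(f"w{i}" for i in range(1200))
parts = chunk(doc, size=500, overlap=100)
```

The overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.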

Embeddings convert text to points in high-dimensional space. Similar meanings land near each other. When you search, you embed your query and find the closest document chunks. This is similarity search, and it's what makes RAG work. The quality of your embeddings directly determines the quality of your retrieval. Modern embedding models are remarkably good at capturing semantic similarity — "revenue growth" matches "sales increased" even though they share no words.
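Cosine similarity is the usual closeness measure in embedding space. The 3-D vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

query     = [0.9, 0.8, 0.1]   # "Q3 revenue growth"  (made-up vectors)
relevant  = [0.8, 0.9, 0.2]   # "Q3 sales increased"
unrelated = [0.1, 0.1, 0.9]   # "office supplies"

cosine(query, relevant)    # near 1.0 -> close in embedding space
cosine(query, unrelated)   # much lower -> far away
```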

RAG evaluation has two independent dimensions. Retrieval quality: are you finding the right documents? Precision, recall, and mean reciprocal rank are standard metrics. Generation quality: given the right documents, does the model answer correctly? Faithfulness (does it stick to the sources?), completeness (does it use all relevant info?), and hallucination rate (does it make stuff up?). A RAG system can fail at either stage — bad retrieval or bad generation — so you need to evaluate both.
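The retrieval-side metrics are simple to compute once you have labels, i.e. the set of documents that should have been retrieved for a query:

```python
# Precision, recall, and reciprocal rank for one labeled query.
# (MRR is the mean of the reciprocal rank across many queries.)
def retrieval_metrics(retrieved, relevant):
    hits = [d for d in retrieved if d in relevant]
    precision = len(hits) / len(retrieved)   # % of retrieved that are relevant
    recall = len(hits) / len(relevant)       # % of relevant that were retrieved
    rr = 0.0
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            rr = 1 / rank                    # reciprocal rank of first hit
            break
    return precision, recall, rr

# 4 retrieved, 2 of them relevant; first relevant doc appears at rank 2.
p, r, rr = retrieval_metrics(["d7", "d2", "d9", "d4"], {"d2", "d4", "d5"})
# p = 0.5, r = 2/3, rr = 0.5
```

Generation-side metrics (faithfulness, completeness, hallucination) usually require an LLM-as-judge or human grading rather than set arithmetic.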

Students always ask: "Should I use RAG or fine-tuning?" The answer is usually both — they solve different problems. RAG gives the model access to knowledge. Fine-tuning changes the model's behavior and style. Long context (just paste everything in) is simplest but most expensive and only works for small document sets. In practice: fine-tune for behavior, RAG for knowledge, long context for quick prototypes.

Tool use is what turns an LLM from a text generator into an agent. The model doesn't actually call the API — it generates a structured request saying "I want to call this function with these arguments." Your code executes the function and returns the result. The model then uses that result to formulate its response. This is the same pattern behind Claude Code: when it edits a file, it's generating a tool call that the CLI executes.
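The round-trip is easy to see in code. In this runnable sketch, `fake_model` stands in for the LLM API call and the stock data is canned:

```python
# The tool-use loop: the model emits a structured request,
# YOUR code executes it and feeds the result back.
import json

def fake_model(messages):
    # Stand-in for an LLM call: first asks for a tool, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "get_stock_price",
                "args": {"ticker": "AAPL"}}
    price = json.loads(messages[-1]["content"])["price"]
    return {"type": "text", "text": f"AAPL is trading at ${price}."}

def get_stock_price(ticker):
    return json.dumps({"price": 237.42, "change": "+1.2%"})  # canned data

TOOLS = {"get_stock_price": get_stock_price}

messages = [{"role": "user", "content": "What's AAPL trading at?"}]
while True:
    reply = fake_model(messages)
    if reply["type"] == "text":
        break
    result = TOOLS[reply["name"]](**reply["args"])   # your code runs the tool
    messages.append({"role": "tool", "content": result})
print(reply["text"])  # AAPL is trading at $237.42.
```

Note that the model never touches the network or filesystem itself; the surrounding code decides what actually executes, which is where safety controls live.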

The tool definition is crucial. The model decides whether to use a tool based on the description — so write it like you're explaining it to a new coworker. Include when to use it, what it does, and what the parameters mean. A bad description leads to the model either never using the tool or using it at the wrong time. Think of tool definitions as part of your context engineering.
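As a Python dict, the slide's definition looks roughly like this, loosely following the JSON-Schema shape most tool-calling APIs expect (exact field names vary by provider):

```python
# Tool definition as data: the model reads the description to decide
# WHEN to call the tool, and the schema to know HOW.
get_stock_price_tool = {
    "name": "get_stock_price",
    "description": (
        "Get the current stock price and daily change for a given "
        "ticker symbol. Use this when the user asks about stock "
        "prices or market data."
    ),
    "input_schema": {                 # JSON Schema for the arguments
        "type": "object",
        "properties": {
            "ticker": {
                "type": "string",
                "description": "Stock ticker (e.g., AAPL, MSFT)",
            },
        },
        "required": ["ticker"],
    },
}
```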

Tools can be orchestrated in different patterns. Sequential: each tool uses the output of the previous one — search, then analyze, then summarize. Parallel: run multiple tools at once for speed — get stock price and news simultaneously. Conditional: the model decides which tool to use next based on what it learned — if the stock is down, search for news about why. Most real agents use a mix of these patterns. The model itself decides the orchestration based on the task.
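Sequential and parallel orchestration map directly onto ordinary code, and conditional is just branching on a tool's result. A sketch with stub tools and an artificial delay:

```python
# Orchestration patterns with stub tools (the sleep simulates latency).
from concurrent.futures import ThreadPoolExecutor
import time

def get_price(ticker):  time.sleep(0.1); return 237.42
def get_news(ticker):   time.sleep(0.1); return ["earnings beat"]

# SEQUENTIAL: each call waits for the previous one (~0.2s total).
price = get_price("AAPL")
news = get_news("AAPL")

# PARALLEL: independent calls run at once (~0.1s total).
with ThreadPoolExecutor() as pool:
    f_price = pool.submit(get_price, "AAPL")
    f_news = pool.submit(get_news, "AAPL")
    price, news = f_price.result(), f_news.result()

# CONDITIONAL: choose the next step based on what came back.
summary = f"Down day: {news[0]}" if price < 200 else f"${price}, steady"
```

In a real agent, the model chooses these patterns itself; modern APIs let it request several tool calls in one turn, which is what enables the parallel case.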

Here's the reveal: Claude Code, the tool you've been using all semester, is itself an agent. It follows the exact perceive-plan-act-observe loop we just discussed. When you ask it to fix a bug, it reads files, plans changes, edits code, runs tests, and iterates. It uses tools: Read, Edit, Bash, Glob, Grep. It uses ReAct-style interleaved reasoning and action. Understanding how Claude Code works gives you a template for building your own agents.

This exercise walks you through building a complete RAG pipeline from scratch. You'll load documents, chunk them, embed them, store the vectors, retrieve relevant chunks for a query, and generate an answer. The key learning: how chunking strategy affects answer quality. Try different chunk sizes and see how the answers change. If your chunks are too small, the model lacks context. Too large, and retrieval precision drops.

Now you'll build a tool-using agent. The starter code gives you three tools: calculator, web search, and file reader. The model decides which tools to use and when. Your task is to add a new tool — maybe a database query, a date calculator, or a document summarizer — and test the agent on multi-step tasks that require multiple tool calls. Notice how the model chains tools together: search for data, calculate something, then write a summary.
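Adding a tool usually means registering a function plus a description the model can read. The registry below is a hypothetical sketch of that pattern, not `tool_agent.py`'s actual API:

```python
# Hypothetical tool registry: a decorator records each function and
# the description the model will use to decide when to call it.
import datetime

TOOLS = {}

def tool(description):
    def register(fn):
        TOOLS[fn.__name__] = {"fn": fn, "description": description}
        return fn
    return register

@tool("Return the number of days between two ISO dates, e.g. 2026-01-01.")
def days_between(start, end):
    d1 = datetime.date.fromisoformat(start)
    d2 = datetime.date.fromisoformat(end)
    return abs((d2 - d1).days)

# The agent sees the name + description and can now call it:
TOOLS["days_between"]["fn"]("2026-01-01", "2026-03-01")  # 59
```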