Generative AI for Business — Week 3

Generative AI in Action (II)

Reasoning & Context Engineering

Week 3

JHU Carey Business School | 2026

Today's agenda

Time Topic
0:00–0:40 Reasoning in LLMs
0:40–1:20 Context engineering
1:20–1:35 Practical patterns & anti-patterns
1:35–1:50 Break
1:50–2:15 Hands-on: Reasoning comparison
2:15–2:55 Hands-on: Context engineering workshop
2:55–3:00 Wrap-up + Assignment 2

What do we mean by "reasoning"?

    PATTERN MATCHING                    REASONING
    ┌─────────────────────┐      ┌──────────────────────┐
    │                     │      │                      │
    │  "The capital of    │      │  "If a train leaves  │
    │   France is Paris"  │      │   at 3pm going 60mph │
    │                     │      │   and another at 4pm │
    │  Retrieval from     │      │   going 80mph..."    │
    │  training data      │      │                      │
    │                     │      │  Multi-step logic    │
    │  Most LLM tasks     │      │  Harder for LLMs     │
    └─────────────────────┘      └──────────────────────┘

LLMs are very good at pattern matching.
Reasoning is where things get interesting — and tricky.


Chain-of-thought: the breakthrough

    STANDARD PROMPTING                 CHAIN-OF-THOUGHT
    ┌──────────────────────┐     ┌──────────────────────────┐
    │ Q: Roger has 5 balls.│     │ Q: Roger has 5 balls.    │
    │ He buys 2 cans of 3. │     │ He buys 2 cans of 3.     │
    │ How many balls?      │     │ How many balls?          │
    │                      │     │                          │
    │ A: 11                │     │ A: Roger started with 5. │
    │    ✗ WRONG           │     │ He bought 2 cans of 3    │
    │                      │     │ = 2 × 3 = 6 new balls.   │
    │                      │     │ Total = 5 + 6 = 11.      │
    │                      │     │    ✓ CORRECT             │
    └──────────────────────┘     └──────────────────────────┘

"Let's think step by step" — the most impactful prompt hack ever


When CoT helps (and when it doesn't)

    HELPS A LOT                          DOESN'T HELP MUCH
    ┌──────────────────────┐       ┌──────────────────────┐
    │                      │       │                      │
    │  ✓ Multi-step math   │       │  ✗ Simple facts      │
    │  ✓ Logic puzzles     │       │  ✗ Translation       │
    │  ✓ Code debugging    │       │  ✗ Summarization     │
    │  ✓ Business analysis │       │  ✗ Creative writing  │
    │  ✓ Planning          │       │  ✗ Classification    │
    │  ✓ Causal reasoning  │       │                      │
    │                      │       │  (these are pattern  │
    │  (these need steps)  │       │   matching tasks)    │
    └──────────────────────┘       └──────────────────────┘

Rule of thumb: If a human would need scratch paper, use CoT


The reasoning model landscape

    ┌───────────────────────────────────────────────────────┐
    │                  REASONING APPROACHES                 │
    ├───────────────────┬───────────────────────────────────┤
    │                   │                                   │
    │  PROMPT-TIME      │  TRAIN-TIME                       │
    │  (you add CoT)    │  (model trained to reason)        │
    │                   │                                   │
    │  • "Think step    │  • OpenAI o1/o3                   │
    │    by step"       │  • DeepSeek-R1                    │
    │  • Works with     │  • Claude extended thinking       │
    │    any model      │                                   │
    │  • You control it │  • Model decides when/how         │
    │  • Free           │    to reason                      │
    │                   │  • Costs more tokens              │
    │                   │  • Often much better              │
    └───────────────────┴───────────────────────────────────┘

How reasoning models work

    STANDARD MODEL
    ┌────────┐     ┌──────────┐
    │ Prompt │ ──→ │ Response │
    └────────┘     └──────────┘
         Fast, cheap, good for most tasks

    REASONING MODEL
    ┌────────┐     ┌──────────────────────────┐     ┌──────────┐
    │ Prompt │ ──→ │   Internal thinking...   │ ──→ │ Response │
    │        │     │   Step 1: Consider...    │     │          │
    │        │     │   Step 2: But wait...    │     │ (higher  │
    │        │     │   Step 3: Actually...    │     │  quality)│
    │        │     │   Step 4: Therefore...   │     │          │
    └────────┘     └──────────────────────────┘     └──────────┘
                    Slower, more expensive, but
                    much better on hard problems
                    (you may or may not see this)

When to use reasoning models

    ┌──────────────────────────────────────────────────────┐
    │                   TASK DIFFICULTY                    │
    │                                                      │
    │  HARD  │  Reasoning model    │  Reasoning model      │
    │        │  (worth the cost)   │  (essential)          │
    │        │                     │                       │
    │  ──────┼─────────────────────┼─────────────────────  │
    │        │                     │                       │
    │  EASY  │  Standard model     │  Standard model       │
    │        │  (don't overthink)  │  + good prompt        │
    │        │                     │                       │
    │        └─────────────────────┴─────────────────────┘ │
    │          LOW                    HIGH                  │
    │               ACCURACY REQUIREMENT                   │
    └──────────────────────────────────────────────────────┘

Don't use a reasoning model to summarize an email


Context Engineering

The art and science of what you put in the prompt


Context engineering: why it matters

    SAME MODEL, DIFFERENT CONTEXT → DIFFERENT RESULTS

    ┌──────────────────┐          ┌──────────────────┐
    │ "Write a report  │          │ "You are a senior│
    │  about Q3 sales" │          │  analyst at [Co].│
    │                  │          │  The CEO needs a │
    │  → Generic,      │          │  Q3 sales report │
    │    vague,        │          │  for the board.  │
    │    unhelpful     │          │  Focus on: ...   │
    │                  │          │  Format: ...     │
    │                  │          │  Tone: ...       │
    │                  │          │  Here's the data:│
    │                  │          │  [attached]      │
    │                  │          │                  │
    │                  │          │  → Specific,     │
    │                  │          │    actionable,   │
    │                  │          │    useful        │
    └──────────────────┘          └──────────────────┘

The context window

    ┌──────────────────────────────────────────────────────┐
    │                 CONTEXT WINDOW                        │
    │              (what the model "sees")                  │
    │                                                       │
    │  ┌────────────┐ ┌──────────┐ ┌─────────────────────┐│
    │  │  System    │ │ Few-shot │ │   User message      ││
    │  │  prompt    │ │ examples │ │   + retrieved docs  ││
    │  │            │ │          │ │   + conversation    ││
    │  │  ~500      │ │  ~1000   │ │    history          ││
    │  │  tokens    │ │  tokens  │ │   ~variable         ││
    │  └────────────┘ └──────────┘ └─────────────────────┘│
    │                                                       │
    │  ◄─────────── 200K tokens (Claude) ──────────────►   │
    │                                                       │
    │  Every token costs money. Fill wisely.                │
    └──────────────────────────────────────────────────────┘

The context engineering stack

    ┌─────────────────────────────────────────────────┐
    │  5. TOOL RESULTS                                │
    │     Calculator outputs, API responses, search   │
    ├─────────────────────────────────────────────────┤
    │  4. CONVERSATION HISTORY                        │
    │     Prior turns, memory, state                  │
    ├─────────────────────────────────────────────────┤
    │  3. RETRIEVED CONTEXT (RAG)                     │
    │     Relevant docs, data, knowledge base         │
    ├─────────────────────────────────────────────────┤
    │  2. FEW-SHOT EXAMPLES                           │
    │     Input/output pairs showing desired behavior │
    ├─────────────────────────────────────────────────┤
    │  1. SYSTEM PROMPT                               │
    │     Role, instructions, constraints, format     │
    └─────────────────────────────────────────────────┘

    Each layer adds more context and more control

System prompts: the foundation

    WEAK SYSTEM PROMPT              STRONG SYSTEM PROMPT
    ┌──────────────────┐      ┌──────────────────────────┐
    │                  │      │ ROLE                      │
    │ "You are a       │      │ "You are a senior credit │
    │  helpful         │      │  analyst at JPMorgan..."  │
    │  assistant."     │      │                           │
    │                  │      │ TASK                      │
    │                  │      │ "Analyze loan apps using  │
    │                  │      │  the 5 C's framework..."  │
    │                  │      │                           │
    │                  │      │ CONSTRAINTS               │
    │                  │      │ "Never approve without    │
    │                  │      │  collateral docs..."      │
    │                  │      │                           │
    │                  │      │ FORMAT                    │
    │                  │      │ "Output as JSON with      │
    │                  │      │  decision, rationale..."  │
    └──────────────────┘      └──────────────────────────┘

Few-shot examples: show, don't tell

    ZERO-SHOT                       FEW-SHOT
    ┌───────────────────┐     ┌───────────────────────────┐
    │ "Classify this    │     │ Example 1:                │
    │  email as urgent  │     │ Input: "Server down!!"    │
    │  or not urgent"   │     │ Output: URGENT            │
    │                   │     │                           │
    │ Model guesses     │     │ Example 2:                │
    │ what you mean     │     │ Input: "Q3 report ready"  │
    │                   │     │ Output: NOT URGENT        │
    │                   │     │                           │
    │                   │     │ Now classify:             │
    │                   │     │ Input: "Client unhappy,   │
    │                   │     │  threatening to leave"    │
    │                   │     │                           │
    │                   │     │ Model follows the pattern │
    └───────────────────┘     └───────────────────────────┘

3-5 examples is usually the sweet spot


Prompt patterns that work

    ┌─────────────────┬──────────────────────────────────────┐
    │ PATTERN         │ HOW IT WORKS                         │
    ├─────────────────┼──────────────────────────────────────┤
    │                 │                                      │
    │ Persona         │ "You are a [specific expert]..."     │
    │                 │ Activates domain-relevant knowledge  │
    │                 │                                      │
    │ Chain-of-       │ "Think step by step before           │
    │ thought         │  answering..."                       │
    │                 │                                      │
    │ Self-critique   │ "After answering, check your         │
    │                 │  work for errors..."                 │
    │                 │                                      │
    │ Output format   │ "Respond as JSON / markdown table    │
    │                 │  / bullet points..."                 │
    │                 │                                      │
    │ Constraints     │ "If unsure, say 'I don't know'       │
    │                 │  rather than guessing..."            │
    └─────────────────┴──────────────────────────────────────┘

Anti-patterns to avoid

    ✗ VAGUE                         ✓ SPECIFIC
    "Make it better"                "Improve clarity by using
                                     shorter sentences and
                                     active voice"

    ✗ CONTRADICTORY                 ✓ CONSISTENT
    "Be creative but follow        "Prioritize accuracy. Within
     this template exactly"         the template, vary word
                                     choice and examples"

    ✗ OVERLOADED                    ✓ FOCUSED
    "Also do X, and Y, and Z,      "Do X. (Separate prompts
     and by the way also W..."      for Y and Z)"

    ✗ NEGATIVE ONLY                 ✓ POSITIVE GUIDANCE
    "Don't be boring, don't        "Write in a conversational
     be too formal, don't           tone with concrete examples
     use jargon"                    and clear language"

Context overflow: the hidden failure mode

    TOKEN 1          ◄── HIGH ATTENTION ──►         TOKEN N
    ┌────────────────────────────────────────────────────┐
    │████████████                              ██████████│
    │████████████         ░░░░░░░░░░           ██████████│
    │████████████         ░░░░░░░░░░           ██████████│
    │████████████         ░░░░░░░░░░           ██████████│
    └────────────────────────────────────────────────────┘
     ▲ Beginning                                  End ▲
     │ of context          "Lost in the           │
     │ (system prompt)      middle"               │
     │                                             │
     Most attended                            Most attended

Models attend most to the beginning and end of context


Testing prompts systematically

    ┌──────────────────────────────────────────────────────┐
    │               PROMPT TESTING LOOP                     │
    │                                                       │
    │   Draft prompt                                        │
    │       │                                               │
    │       ▼                                               │
    │   Run on 5-10 test cases                              │
    │       │                                               │
    │       ▼                                               │
    │   Evaluate outputs ◄─── Does it pass all cases?       │
    │       │                     │                         │
    │       │ NO                  │ YES                     │
    │       ▼                     ▼                         │
    │   Diagnose failure     Run on 20+ cases               │
    │       │                     │                         │
    │       ▼                     ▼                         │
    │   Modify prompt        Ship it                        │
    │       │                                               │
    │       └──── back to test ────┘                        │
    └──────────────────────────────────────────────────────┘

Break

15 minutes


Hands-on

Reasoning & Context Engineering


Exercise 1: Reasoning comparison

cd scripts/week3
claude "Read reasoning_test.py and explain it, then run it"

The experiment:

    Problem ──→ Standard prompt ──→ Answer + score
           ──→ "Think step by step" ──→ Answer + score
           ──→ Extended thinking ──→ Answer + score
  • Compare accuracy, token count, and cost
  • Try: logic puzzles, business cases, multi-step math

Discussion: When is the extra cost of reasoning worth it?


Exercise 2: Context engineering workshop

claude "Read context_engineer.py, explain it, then help me build a prompt"

Your task: design a context-engineered prompt

    ┌──────────────────────────────────────────────────┐
    │  1. Define the persona and role                  │
    │  2. Write clear task instructions                │
    │  3. Add 3-5 few-shot examples                    │
    │  4. Add constraints and guardrails               │
    │  5. Test on 5+ edge cases                        │
    │  6. Iterate based on failures                    │
    └──────────────────────────────────────────────────┘

MBA: Financial analyst assistant for earnings reports
MS: Code review assistant for security vulnerabilities


Assignment 2 (due next week)

MS section:

  • AInxious case: advanced prompting exercise
  • Submit 4-page report with prompt documentation

MBA section:

  • AInxious case (same deliverable)
  • Plus LuxeCuts: use LLM reasoning to solve the scheduling problem
  • 1-page analysis

Rubric-graded (not just completion). Team formation due.


Next week preview

Week 4: Agentic AI in Action (I)

  • From chatbot to agent: the spectrum of autonomy
  • RAG: Retrieval-Augmented Generation
  • Tool use and function calling
  • Build your first agent

Reading:

  • Lewis et al., "RAG for Knowledge-Intensive NLP Tasks" (2020)
  • Anthropic tool use documentation

Questions?


Today we go deep on two things that separate amateur from expert use of LLMs: reasoning models and context engineering. If Week 1 was "how do these work" and Week 2 was "which one do I pick," this week is "how do I get the most out of them."

Let's be precise about what we mean. Most of what LLMs do is sophisticated pattern matching — recalling and recombining information from training. True reasoning — multi-step logic, planning, novel problem-solving — is much harder. The debate about whether LLMs truly "reason" or just simulate reasoning is ongoing, but practically speaking, we care about whether they get the right answer on complex tasks. That's what chain-of-thought and reasoning models improve.

This is from Wei et al. 2022, your reading for this week. The key finding: simply asking the model to show its work dramatically improves accuracy on reasoning tasks. On the GSM8K math benchmark, PaLM's accuracy jumped from roughly 18% to 57% when prompts included worked-out reasoning steps (few-shot chain-of-thought); Kojima et al. (2022) later showed that the zero-shot trigger "Let's think step by step" delivers a similar boost with no examples at all. Why does this work? By generating intermediate steps, the model can use its own output as working memory. Each step is a new pattern-matching opportunity. It's like giving yourself scratch paper on an exam.
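Mechanically, zero-shot CoT is just string construction around the question. A minimal sketch — the helper name and the "Final answer" convention below are my own, not from the paper:

```python
def make_cot_prompt(question: str) -> str:
    """Wrap a question with the zero-shot chain-of-thought trigger."""
    return (
        f"Q: {question}\n"
        "A: Let's think step by step. Show your reasoning, "
        "then end with a line 'Final answer: <answer>'."
    )

prompt = make_cot_prompt("Roger has 5 balls. He buys 2 cans of 3 balls. How many balls?")
print(prompt)
```

Asking for a marked final-answer line makes the response easy to parse and score automatically, which matters once you start testing prompts at scale.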

CoT isn't universally helpful — it adds latency and cost. Use it when the task requires multiple steps of reasoning. Don't use it for simple retrieval or classification tasks. The scratch paper rule is a good heuristic: if a smart human would need to write things down to get the answer right, the model probably benefits from CoT too.

There are two approaches to reasoning. Prompt-time CoT: you tell the model to think step by step. Works with any model, free to use, but limited. Train-time reasoning: models like o1, DeepSeek-R1, and Claude's extended thinking are specifically trained to reason. They generate a chain of thought internally before producing an answer. These models are significantly better on hard problems — math, coding, complex analysis — but they cost more because they generate many more tokens internally.

Reasoning models insert a thinking phase between your prompt and the response. The model generates potentially thousands of tokens of internal deliberation — exploring approaches, catching its own mistakes, reconsidering. With Claude's extended thinking, you can actually see this reasoning. With o1, it's hidden. The cost implication: reasoning models can use 10-100x more tokens than standard models for the same query, but the quality improvement on hard tasks is dramatic.

Reasoning models are expensive — both in cost and latency. Use them judiciously. Hard task + high accuracy requirement = reasoning model. Everything else = standard model with good prompting. The most common mistake is using a reasoning model for tasks that don't need it. Summarization, translation, simple Q&A — these are pattern matching tasks where standard models excel.
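The routing rule in the 2×2 above can be written down directly. This tiny helper is a sketch of that decision, with the return labels as illustrative assumptions:

```python
def pick_model(task_is_hard: bool, accuracy_critical: bool) -> str:
    """Route hard tasks to a reasoning model; keep everything else cheap."""
    if task_is_hard:
        # Worth the cost in the low-stakes case, essential in the high-stakes one.
        return "reasoning model"
    if accuracy_critical:
        return "standard model + carefully engineered prompt"
    return "standard model"

print(pick_model(task_is_hard=False, accuracy_critical=False))  # → standard model
```

In a real pipeline the two booleans would come from a task classifier or a human triage step, but the shape of the decision is the same.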

Context engineering is the single most important skill for using LLMs effectively. The model is the same in both cases. The difference is entirely in what context you provide. This is why "prompt engineering" as a discipline exists — but I prefer "context engineering" because it's broader. It's not just the prompt text, it's everything you put in the context window: system prompt, examples, retrieved documents, conversation history, tool results.

The context window is the model's entire working memory. Everything the model knows about your task must fit in this window. Claude's context window is 200K tokens — about 150,000 words or 500 pages. That sounds huge, but in practice you need to be strategic about what goes in. Every token you add costs money and can dilute the signal. The skill is fitting maximum signal into minimum tokens.
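A back-of-envelope budget check makes this concrete. The 4-characters-per-token ratio is a rough heuristic for English prose, not a real tokenizer, and the output reserve is an assumption:

```python
CONTEXT_LIMIT = 200_000  # Claude-class window, in tokens

def rough_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_window(system: str, examples: str, user: str,
                   reserve_for_output: int = 4_000) -> bool:
    """Check that all context layers plus room for the reply fit the window."""
    used = rough_tokens(system) + rough_tokens(examples) + rough_tokens(user)
    return used + reserve_for_output <= CONTEXT_LIMIT

print(fits_in_window("You are a credit analyst.", "Example: ...", "Analyze this file."))  # → True
```

For production use you would swap `rough_tokens` for the provider's actual token-counting endpoint, but the budgeting discipline is the point.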

Think of context engineering as a stack. At the bottom: the system prompt defines who the model is and how it should behave. Above that: few-shot examples show (not tell) the desired behavior. Then retrieved context from a knowledge base. Then conversation history for multi-turn interactions. And tool results for real-time data. Each layer gives you more control. A well-engineered context stack can make a standard model outperform a reasoning model with a bad prompt.
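Assembled in code, the stack is careful list-building. The chat-message shape below mirrors common LLM APIs, but the helper and field layout are illustrative:

```python
def build_context(system_prompt, few_shot, retrieved_docs, history, user_msg):
    """Assemble the five layers into a chat-style request dict."""
    messages = []
    for example_in, example_out in few_shot:          # layer 2: show the pattern
        messages.append({"role": "user", "content": example_in})
        messages.append({"role": "assistant", "content": example_out})
    messages.extend(history)                          # layer 4: prior turns
    docs = "\n\n".join(retrieved_docs)                # layer 3: RAG context
    messages.append({"role": "user",
                     "content": f"Reference material:\n{docs}\n\nTask: {user_msg}"})
    return {"system": system_prompt, "messages": messages}  # layer 1 rides separately

request = build_context(
    system_prompt="You are a senior credit analyst.",
    few_shot=[("Classify: 'Server down!!'", "URGENT")],
    retrieved_docs=["Policy doc excerpt..."],
    history=[],
    user_msg="Classify: 'Q3 report ready'",
)
```

Tool results (layer 5) would be appended as additional messages after each tool call; they are omitted here to keep the sketch short.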

The system prompt is your most important lever. It should include four things: Role (who the model is), Task (what it's doing), Constraints (what it must NOT do), and Format (how to structure output). The difference between "helpful assistant" and a well-crafted system prompt is the difference between a general intern and a domain expert. Invest time here — it's the highest-ROI activity in any GenAI project.
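The four sections can be templated so no part gets forgotten. The labels and layout here are a sketch, not a required format:

```python
def build_system_prompt(role: str, task: str, constraints: list[str], fmt: str) -> str:
    """Compose a system prompt from the four parts: role, task, constraints, format."""
    lines = [f"ROLE: {role}", f"TASK: {task}", "CONSTRAINTS:"]
    lines += [f"- {c}" for c in constraints]
    lines.append(f"FORMAT: {fmt}")
    return "\n".join(lines)

print(build_system_prompt(
    role="Senior credit analyst at a large bank",
    task="Analyze loan applications using the 5 C's framework",
    constraints=["Never approve without collateral documentation",
                 "If information is missing, say so rather than guessing"],
    fmt="JSON with keys: decision, rationale, missing_info",
))
```

Keeping the prompt in code like this also makes it versionable and testable, which pays off in the iteration loop later.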

Few-shot examples are the most reliable way to control model behavior. Instead of describing what you want, you show it. The model picks up on the pattern — format, tone, reasoning style, edge cases. Three to five examples usually suffice. Include at least one edge case or tricky example. The examples don't just teach the model what to output — they implicitly communicate your standards and expectations.
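Mechanically, few-shot prompting is consistent formatting of labeled pairs followed by the real query in the same shape. The `Example N:` layout below is one common convention, not a requirement:

```python
def few_shot_block(examples: list[tuple[str, str]], query: str) -> str:
    """Render labeled examples, then the query in the same shape, output left blank."""
    parts = [f"Example {i}:\nInput: {inp}\nOutput: {out}"
             for i, (inp, out) in enumerate(examples, 1)]
    parts.append(f"Now classify:\nInput: {query}\nOutput:")
    return "\n\n".join(parts)

print(few_shot_block(
    [("Server down!!", "URGENT"), ("Q3 report ready", "NOT URGENT")],
    "Client unhappy, threatening to leave",
))
```

Ending the prompt with a dangling `Output:` nudges the model to complete the pattern rather than chat about it.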

These five patterns are your core toolkit. Persona primes the model with relevant knowledge. CoT improves reasoning. Self-critique catches errors. Output format ensures usable responses. Constraints prevent common failure modes like hallucination. You can combine them — a strong prompt often uses all five.

Common mistakes. Vague instructions get vague outputs. Contradictory instructions confuse the model — it'll randomly prioritize one over the other. Overloaded prompts lead to partial completion. Negative-only instructions ("don't do X") are less effective than positive instructions ("do Y instead"). When in doubt: be specific, be consistent, and break complex tasks into separate prompts.

This is the "lost in the middle" phenomenon. When you stuff a lot of text into the context window, the model pays most attention to the beginning (system prompt) and end (most recent message). Information in the middle gets less attention. This has practical implications: put your most important instructions at the beginning and end. If you're doing RAG, put the most relevant documents near the end, close to the query. Don't rely on the model noticing a crucial instruction buried in page 50 of your context.
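One cheap mitigation when doing RAG: sort retrieved chunks so the highest-relevance ones land at the end of the context, next to the query. The scores here are assumed to come from your retriever (higher means more relevant):

```python
def order_for_context(docs_with_scores):
    """Sort ascending by score so the best documents sit at the end of context."""
    return [doc for doc, score in sorted(docs_with_scores, key=lambda d: d[1])]

docs = [("Policy A", 0.91), ("Policy B", 0.40), ("Policy C", 0.77)]
print(order_for_context(docs))  # → ['Policy B', 'Policy C', 'Policy A']
```

The least relevant material absorbs the attention dip in the middle, while the strongest evidence sits where the model attends most.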

Prompt engineering is not a one-shot activity. It's iterative, just like software development. Draft a prompt, test it on diverse cases, evaluate the outputs, fix failures, repeat. The most common mistake is testing on one or two examples and declaring victory. You need at least 10-20 test cases covering normal inputs, edge cases, and adversarial inputs. This is exactly what we'll practice in the hands-on exercise.
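The loop is easy to scaffold. In this sketch the model call is stubbed with a toy keyword classifier so the harness runs standalone; in practice `model` would wrap your prompt plus an LLM API call:

```python
def evaluate(model, test_cases):
    """Run every case; return pass rate and the failing (input, expected, got) triples."""
    failures = []
    for x, expected in test_cases:
        got = model(x)
        if got != expected:
            failures.append((x, expected, got))
    pass_rate = 1 - len(failures) / len(test_cases)
    return pass_rate, failures

# Stub standing in for "prompt + model call" — deliberately imperfect.
def stub_model(text: str) -> str:
    return "URGENT" if "!" in text or "down" in text.lower() else "NOT URGENT"

cases = [("Server down!!", "URGENT"), ("Q3 report ready", "NOT URGENT"),
         ("Client unhappy, threatening to leave", "URGENT")]
rate, fails = evaluate(stub_model, cases)
print(f"pass rate: {rate:.0%}, failures: {len(fails)}")  # → pass rate: 67%, failures: 1
```

The failing third case is exactly the kind of output you diagnose before modifying the prompt — here the stub misses urgency expressed without alarm keywords, the same way a weak prompt misses implicit signals.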

You'll run the same hard problem through three approaches: direct prompting, CoT prompting, and extended thinking. Compare the answers, the number of tokens used, and the cost. The goal is to build intuition for when reasoning models earn their keep. Try at least three different problem types — you'll see that the benefit varies dramatically by task type.

This is the core exercise. You'll use the scaffold in context_engineer.py to systematically build and test a prompt. Start with the persona, add instructions, then examples, then constraints. Test it, find where it fails, fix it. Use Claude Code to help you — it's meta: using an AI to help you write better prompts for AI. Spend at least 20 minutes iterating. The first version of your prompt will not be good enough.