Context Management & Reliability — Claude Certification N...

Roughly 15% of the exam.

Prompt caching

One invariant explains everything: caching is a prefix match. The cache keys on the exact rendered bytes up to each breakpoint; a single changed byte at position N invalidates every breakpoint at or after N.

Render order is tools → system → messages — a breakpoint on the last system block caches tools and system together.
Maximum 4 cache_control breakpoints per request.
Economics (5-minute default TTL): writes ≈ 1.25× base input price (2× for the 1-hour TTL), reads ≈ 0.1×. Break-even after about two uses.
Verify with usage.cache_read_input_tokens. Zero across repeated identical requests means a silent invalidator: a timestamp or UUID interpolated into the system prompt, non-deterministic JSON key order, or a tool set that varies per request.
Tool definition changes and model switches invalidate everything (they render first / caches are model-scoped). Keep the system prompt frozen; inject dynamic context later in the messages.

Tokens & limits

Count tokens with the count_tokens endpoint using the same model you will run inference on. Never tiktoken — it's another vendor's tokenizer and undercounts for Claude.
stop_reason: "max_tokens" = your per-response output cap hit (raise it or stream). model_context_window_exceeded = the whole conversation no longer fits (compact or trim). Different problems, different fixes.
Large outputs require streaming in practice — buffered requests at high max_tokens hit HTTP timeouts.

Batch processing

The Message Batches API processes requests asynchronously at 50% of standard token prices — the default answer for any high-volume, latency-tolerant workload (nightly classification, bulk extraction). Most batches finish within an hour (24-hour max), all Messages API features work inside them, and results stay retrievable for 29 days.

Errors & retries

Retryable: 429 (rate limit — honor retry-after), 500 (internal), 529 (overloaded). Exponential backoff; official SDKs retry these automatically.
Not retryable as-is: 400 (malformed request), 401 (auth), 403 (permission), 404 (bad model ID or endpoint), 413 (too large). Fix the request instead.
Use the SDK's typed exception classes rather than string-matching error messages.

Compaction

Server-side compaction summarizes earlier history when a long conversation approaches the context limit. The trap the exam checks: you must append the full response.content to history every turn — compaction blocks are state the API needs back; extracting only the text silently loses them.

→ Drill this domain in practice mode

Independent community study resource — not affiliated with or endorsed by Anthropic. Claude is a trademark of Anthropic, PBC. All study material and practice questions here are original, written from Anthropic's public documentation. Everything runs in your browser; nothing you answer is stored or transmitted.