# Diff: Code Review Context Your LLM Actually Needs
Pasting a raw git diff into your LLM is asking for a review with half the files missing. cxpak diff adds callers, callees, and type signatures — within a token budget.
You paste a diff into your LLM. It spots a renamed parameter. It suggests updating the three callers it can see. You apply the fix. Your build breaks — because there were five callers, and the model never saw the other two.
The problem isn’t the model. It’s what you fed it.
## The context gap
A raw git diff tells you what changed. It doesn’t tell you what depends on what changed. The model sees the modified function but not the callers. It sees the new type but not the module that imports it. It’s reviewing code through a keyhole.
The instinct is to widen the keyhole — paste in more files. But you’re guessing which files matter, and every irrelevant file burns tokens that could carry actual context. A mid-sized PR touching six files might implicate thirty more through import chains. You’re not going to find them all by hand, and the model’s context window can’t hold them all raw.
What you actually need is the diff, plus the signatures of everything that touches the changed code, within a token budget.
## What `cxpak diff` does
cxpak diff --tokens 50k .
That command does three things.
Extracts git changes. Using libgit2, it diffs your current state — working tree and staged changes — against HEAD. If you pass --git-ref, it diffs the named ref’s committed tree against HEAD’s committed tree instead, ignoring the working tree entirely. Either way, the output captures the added, removed, and context lines from each changed file.
Walks the dependency graph. For every changed file, cxpak identifies the direct dependencies (files it imports) and direct dependents (files that import it). This is the same graph walk that powers trace: cxpak parses the codebase with tree-sitter, builds an import graph from the result, and walks one hop in both directions from the changed files.
Budgets the output. The diff gets first priority on the token budget. The remaining budget goes to context: public function and type signatures from every file in the dependency neighbourhood. If the budget runs tight, context signatures get truncated before diff content does. The model sees what changed first; it sees as much surrounding context as the budget allows.
The result is a single document — diff plus dependency context — that fits in one prompt.
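The priority order in the budgeting step can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not cxpak's actual implementation: `estimate_tokens`, `pack`, and `Packed` are hypothetical names, and the four-characters-per-token estimate is a rough stand-in for a real tokenizer.

```rust
/// Hypothetical token estimate: roughly four characters per token.
/// A real tool would use a proper tokenizer; this is an assumption.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Result of packing: the diff plus whichever signatures fit.
struct Packed {
    diff: String,
    signatures: Vec<String>,
    dropped: usize, // signatures that did not fit in the budget
}

/// Give the diff first claim on the budget, then add context
/// signatures in order until the remaining budget runs out.
fn pack(diff: &str, signatures: &[&str], budget: usize) -> Packed {
    // If the diff alone exceeds the budget, nothing is left for context.
    let mut remaining = budget.saturating_sub(estimate_tokens(diff));
    let mut kept = Vec::new();
    for &sig in signatures {
        let cost = estimate_tokens(sig);
        if cost > remaining {
            break; // context is truncated; the diff itself is never cut here
        }
        remaining -= cost;
        kept.push(sig.to_string());
    }
    Packed {
        diff: diff.to_string(),
        dropped: signatures.len() - kept.len(),
        signatures: kept,
    }
}
```

In this sketch, a 10-token diff under a 14-token budget leaves room for one short signature and drops the rest, while a budget smaller than the diff keeps the diff and drops every signature.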
## Reviewing against a branch
The default compares your current state — staged and unstaged changes — against HEAD. Useful for checking work in progress. For PR review, compare against the base branch:
cxpak diff --tokens 50k --git-ref main .
This diffs main’s committed tree against HEAD’s committed tree — equivalent to git diff main HEAD. You get every difference between the two branches, plus the dependency context for all of it. Hand that to your model and ask it to review.
For highly interconnected codebases where one-hop misses important transitive dependencies, --all does a full breadth-first traversal from the changed files:
cxpak diff --tokens 50k --all .
Start with the default. Widen if the model asks about code it can’t see.
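To make the difference between the one-hop default and `--all` concrete, here is a small Rust sketch of a hop-limited breadth-first walk over an import graph. It assumes a simple `file -> imports` map; the `Graph` alias and the `reverse` and `neighbourhood` functions are hypothetical names for illustration, not cxpak internals.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Hypothetical import graph: file -> files it imports.
type Graph = HashMap<String, Vec<String>>;

/// Build reverse edges so the walk can also go from a file to its dependents.
fn reverse(graph: &Graph) -> Graph {
    let mut rev = Graph::new();
    for (file, imports) in graph {
        for dep in imports {
            rev.entry(dep.clone()).or_default().push(file.clone());
        }
    }
    rev
}

/// Breadth-first walk from the changed files, following imports in both
/// directions. `Some(1)` models the default; `None` models `--all`.
fn neighbourhood(
    graph: &Graph,
    changed: &[&str],
    max_hops: Option<usize>,
) -> BTreeSet<String> {
    let rev = reverse(graph);
    let mut seen: BTreeSet<String> = changed.iter().map(|f| f.to_string()).collect();
    let mut queue: VecDeque<(String, usize)> =
        seen.iter().cloned().map(|f| (f, 0)).collect();
    while let Some((file, hops)) = queue.pop_front() {
        // Stop expanding once the hop limit is reached.
        if max_hops.map_or(false, |limit| hops >= limit) {
            continue;
        }
        let next = graph.get(&file).into_iter().chain(rev.get(&file)).flatten();
        for n in next {
            if seen.insert(n.clone()) {
                queue.push_back((n.clone(), hops + 1));
            }
        }
    }
    // Context files are everything reached that was not itself changed.
    seen.into_iter()
        .filter(|f| !changed.contains(&f.as_str()))
        .collect()
}
```

With a chain where `a` imports `b`, `b` imports `c`, `d` imports `a`, and `e` imports `d`, a change to `a` yields `b` and `d` at one hop, and all of `b`, `c`, `d`, `e` with no limit. That growth is why the unbounded walk is worth reserving for cases where one hop falls short.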
## What the model sees
What the model receives looks like this:
## Project Metadata
- **Ref:** `main`
- **Changed files:** 3
- **Context files:** 8
## Key Files
### src/parser/mod.rs
```diff
- fn parse_file(path: &str) -> ParseResult {
+ fn parse_file(path: &str, lang: Language) -> ParseResult {
```
## Function / Type Signatures
### src/commands/overview.rs
pub fn run_overview(args: &OverviewArgs) -> Result<()>
### src/commands/trace.rs
pub fn run_trace(args: &TraceArgs) -> Result<()>
The model can see the change, see who calls the changed function, and check whether the callers handle the new parameter. No guessing. No keyhole.
## The toolkit
This is the third command in a pattern. [`overview`](/blog/spending-cpu-cycles-so-you-dont-spend-tokens) gives your model a map of the entire codebase — use it to start a conversation. [`trace`](/blog/trace-finding-code-that-matters) gives it a scalpel — use it to debug a specific function or error. `diff` gives it a review brief — use it when code has changed and you need the model to understand the impact.
```bash
# Onboarding: the lay of the land
cxpak overview --tokens 80k .

# Debugging: targeted dependency context
cxpak trace --tokens 50k "handle_request" .

# Review: what changed and what it touches
cxpak diff --tokens 50k --git-ref main .
```
Three commands. Map, scalpel, review brief. Each spends CPU cycles so you don’t spend tokens on orientation.
## Trade-offs
The dependency graph is built from static imports. Dynamic dispatch, reflection, and plugin loading won’t appear — same limitation as trace, same honest reason: better to miss a runtime-only dependency than to hallucinate one.
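A toy example shows why runtime-only dependencies stay invisible. The sketch below uses a deliberately naive line scan for `use crate::` statements as a stand-in for a real tree-sitter parse; `static_imports` is a hypothetical helper, and the scanned source (including the `Handler` trait) is invented for illustration.

```rust
/// Collect module paths from `use crate::...;` lines.
/// A naive line scan standing in for a real parser (an assumption).
fn static_imports(source: &str) -> Vec<String> {
    source
        .lines()
        .map(str::trim)
        .filter_map(|line| line.strip_prefix("use crate::"))
        .filter_map(|rest| rest.strip_suffix(';'))
        .map(|path| path.to_string())
        .collect()
}

/// Invented source file: one static import, one dynamically dispatched call.
const EXAMPLE: &str = "\
use crate::parser::parse_file;

fn run(handler: Box<dyn Handler>) {
    handler.handle();
}
";
```

Scanning `EXAMPLE` finds the edge to `parser::parse_file` and nothing else. Whichever module implements `Handler` is chosen at runtime, so no import statement exists for a static graph to pick up.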
One-hop context covers most cases. Full BFS with --all can pull in half the codebase on a tightly coupled project — which defeats the purpose of budgeting. Start narrow.
Large diffs can squeeze out context signatures. If your diff alone exceeds the token budget, you won’t get dependency context at all. The fix is either a larger budget or a smaller PR — and the latter is better advice regardless.
Give your model the diff and the dependency context. It’ll stop guessing about the callers it can’t see.