Spending CPU Cycles So You Don't Spend Tokens: Building cxpak
Why I built a Rust CLI that packages codebases into token-budgeted briefings for LLMs — and what I learned about the real cost of context.
Every time an LLM explores your codebase, it burns 50–100k tokens just finding its bearings. Reading files, navigating directories, building a mental model. Navigation, not insight. By the time it understands your project, most of the budget is spent — and it still doesn’t have a global view. It made greedy local decisions about what to read next, and those decisions are different every time.
I got tired of paying for orientation. So I built cxpak — a Rust CLI that does the exploration offline, packages the result, and hands the LLM a briefing instead of a flashlight.
The economics
Tokens are expensive. CPU is cheap. Storage is practically free.
That’s the entire insight. If you can shift the exploration work from token-time to CPU-time, you save money and get a better result. The LLM stops groping through your codebase and starts with a structured overview of the whole thing — deterministic, reusable, and exactly within your token budget.
cxpak parses the entire codebase with tree-sitter, counts tokens with tiktoken (close enough to what the models use), and allocates a budget across seven sections by priority:
- Signatures (30%) — the public API surface. Irreplaceable. You can regenerate an implementation; you can’t guess a contract.
- Module map (20%) — every file, every public symbol. The table of contents.
- Key files (20%) — README, Cargo.toml, package.json. Verbatim. Can’t be summarised.
- Dependency graph (15%) — who imports whom. How modules talk to each other.
- Git context (10%) — recent commits, file churn, contributors. Nice-to-have, first to cut.
- Directory tree (5%) — structural, compresses well, low priority.
- Metadata (fixed ~500 tokens) — file count, language breakdown, token estimate. Always included.
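The split above can be sketched in a few lines. This is a hypothetical illustration of the priority scheme, not cxpak's actual internals — the `allocate` helper and section names are mine:

```rust
// Hypothetical sketch of the priority-based budget split described above.
// Metadata takes its fixed slice first; the rest is divided by percentage.
fn allocate(total: usize, fixed_metadata: usize, sections: &[(&str, usize)]) -> Vec<(String, usize)> {
    let remaining = total.saturating_sub(fixed_metadata);
    sections
        .iter()
        .map(|(name, pct)| (name.to_string(), remaining * pct / 100))
        .collect()
}

fn main() {
    let sections = [
        ("signatures", 30),
        ("module_map", 20),
        ("key_files", 20),
        ("dep_graph", 15),
        ("git_context", 10),
        ("directory_tree", 5),
    ];
    // With a 20k budget: 500 tokens reserved, 19,500 split by priority.
    for (name, tokens) in allocate(20_000, 500, &sections) {
        println!("{name}: {tokens} tokens");
    }
}
```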
When a section exceeds its budget, cxpak truncates line by line — not by removing entire functions, which would leave orphaned references, but by cutting at line boundaries and inserting a marker that tells the LLM exactly what’s missing and how to get more.
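A minimal sketch of that truncation strategy, with one big caveat: the word-count "tokeniser" below is a crude stand-in for the real tiktoken counts, and the function name and marker wording are my own:

```rust
// Sketch: keep whole lines until the budget runs out, then cut everything
// after that boundary and append an omission marker. Whitespace-split word
// counts stand in for real token counts here.
fn truncate_lines(text: &str, budget: usize) -> String {
    let mut used = 0;
    let mut dropped = 0;
    let mut out = String::new();
    for line in text.lines() {
        let cost = line.split_whitespace().count().max(1);
        if dropped == 0 && used + cost <= budget {
            used += cost;
            out.push_str(line);
            out.push('\n');
        } else {
            // Once one line fails to fit, everything after it is cut too,
            // so the kept prefix stays contiguous.
            dropped += 1;
        }
    }
    if dropped > 0 {
        out.push_str(&format!(
            "<!-- {dropped} lines omitted: increase the token budget to include them -->\n"
        ));
    }
    out
}

fn main() {
    let section = "fn a()\nfn b()\nfn c()\nfn d()";
    print!("{}", truncate_lines(section, 4));
}
```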
Pack mode
This is where it gets interesting. A small project might fit entirely within a 20k-token budget — one file, no truncation, done. But a real codebase? The one that prompted me to build cxpak had 498 files and roughly 646k tokens. That doesn't fit in any single prompt.

The naive approach: truncate everything heavily and hope for the best. The cxpak approach: pack mode.
Pack mode writes two things. First, an overview — always within your token budget, always one file, always a complete picture of what exists. Second, a .cxpak/ directory with full, untruncated detail files for each section. The overview points to them:
```markdown
## Function / Type Signatures
<!-- full content: .cxpak/signatures.md (~39.4k tokens) -->
```
The LLM reads the overview in one prompt. It now knows the shape of the entire codebase — every module, every public function, every dependency. When it needs detail, it reads a specific .cxpak/ file. No blind exploration. No wasted tokens on files that turn out to be irrelevant.
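Those two outputs can be sketched like this. The `.cxpak/` layout and marker wording mirror the example above; the `write_pack` function and its signature are assumptions, not cxpak's actual code:

```rust
use std::fs;
use std::path::Path;

// Sketch of pack mode's two outputs: an overview carrying pointer markers,
// plus a .cxpak/ directory of full, untruncated detail files.
fn write_pack(root: &Path, sections: &[(&str, &str)]) -> std::io::Result<()> {
    let detail_dir = root.join(".cxpak");
    fs::create_dir_all(&detail_dir)?;
    let mut overview = String::new();
    for (name, full_content) in sections {
        // Full detail goes to .cxpak/<section>.md ...
        fs::write(detail_dir.join(format!("{name}.md")), full_content)?;
        // ... and the overview gets only a pointer marker.
        overview.push_str(&format!(
            "## {name}\n<!-- full content: .cxpak/{name}.md -->\n\n"
        ));
    }
    fs::write(root.join("overview.md"), overview)
}

fn main() -> std::io::Result<()> {
    let root = std::env::temp_dir().join("cxpak_pack_demo");
    write_pack(&root, &[("signatures", "pub fn authenticate(user: &str) -> bool")])?;
    print!("{}", fs::read_to_string(root.join("overview.md"))?);
    Ok(())
}
```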
Briefing packet, not a flashlight in a dark room.
Why tree-sitter
I could have used regex. It would have been faster to write and wrong in ways that matter.
Tree-sitter parses actual syntax trees. It knows the difference between a function called authenticate and a comment that mentions authentication. It extracts signatures without bodies, captures visibility modifiers, and handles the quirks of each supported language — from Rust's trait bounds to Python's decorators to C++'s templates.
Each language implements a common trait. The Rust parser alone is 611 lines; the simplest (JavaScript) is 311. At launch, that was 3,300 lines of parser code across eight languages — it’s grown since. Not trivial — but the alternative is regex that breaks on edge cases and produces context the LLM can’t trust.
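To give a feel for that shared trait, here's a hypothetical sketch of its shape. The trait name, methods, and the line-scanning stand-in are all my own — the real implementations walk tree-sitter syntax trees, which is precisely why they aren't fooled by comments or strings that merely mention a name:

```rust
// Hypothetical shape of the per-language parser trait.
trait LanguageParser {
    fn extensions(&self) -> &[&'static str];
    fn extract_signatures(&self, source: &str) -> Vec<String>;
}

struct RustParser;

impl LanguageParser for RustParser {
    fn extensions(&self) -> &[&'static str] {
        &["rs"]
    }

    fn extract_signatures(&self, source: &str) -> Vec<String> {
        // Line-scanning stand-in for the real tree-sitter query:
        // keep public fn headers, drop their bodies.
        source
            .lines()
            .map(str::trim)
            .filter(|l| l.starts_with("pub fn"))
            .map(|l| l.trim_end_matches('{').trim_end().to_string())
            .collect()
    }
}

fn main() {
    let src = "/// docs\npub fn authenticate(user: &str) -> bool {\n    user == \"admin\"\n}";
    for sig in RustParser.extract_signatures(src) {
        println!("{sig}");
    }
}
```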
The parsers are compiled in via Cargo feature flags. If you only work with Python and TypeScript, you can slim the binary down. Same codebase, different capabilities. No runtime dependencies.
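A Cargo features layout for this pattern might look like the fragment below. The feature names and dependency names here are illustrative guesses, not cxpak's actual manifest:

```toml
# Hypothetical [features] section — names are assumptions, not cxpak's real ones.
[features]
default = ["rust", "python", "typescript", "javascript", "go", "java", "c", "cpp"]
python = ["dep:tree-sitter-python"]
typescript = ["dep:tree-sitter-typescript"]
```

A slimmer build would then be something like `cargo build --no-default-features --features python,typescript`, pulling in only the grammars you asked for.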
Design decisions
A few choices that shaped the tool:
Real tokeniser counts. cxpak uses tiktoken-rs with the cl100k_base encoding — a widely adopted LLM tokeniser. Different models tokenise slightly differently, so counts are close approximations rather than exact matches, but they're accurate enough to budget against reliably. No heuristics, no "roughly 4 characters per token."
Line-by-line truncation over semantic truncation. Removing entire functions to save tokens sounds clever until you realise it leaves dangling references and confuses the model about what exists. Line-by-line is predictable: you see the start of a list, then a marker saying where it was cut. The LLM can work with that.
Three-layer ignore rules. .gitignore is for git, not for LLM context — different goals. cxpak ships with 52 built-in patterns (node_modules, `__pycache__`, target/, media files), respects .gitignore on top, and supports an optional .cxpakignore for project-specific overrides. Zero configuration for the common case.
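The layering can be sketched as a simple union of the three sources. The function name is mine, and the matching below is a naive substring check — real ignore handling supports full glob patterns, anchoring, and negation:

```rust
// Sketch of the three-layer lookup: built-ins, then .gitignore patterns,
// then .cxpakignore. Naive substring matching stands in for real globs.
fn is_ignored(path: &str, builtins: &[&str], gitignore: &[&str], cxpakignore: &[&str]) -> bool {
    builtins
        .iter()
        .chain(gitignore)
        .chain(cxpakignore)
        .any(|pattern| path.contains(pattern))
}

fn main() {
    let builtins = ["node_modules", "__pycache__", "target/"];
    // Built-in layer catches this with zero configuration.
    println!("{}", is_ignored("web/node_modules/react/index.js", &builtins, &[], &[]));
    // Nothing in any layer matches a normal source file.
    println!("{}", is_ignored("src/lib.rs", &builtins, &["logs/"], &["docs/"]));
}
```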
Omission markers that help. When content is truncated, the marker tells you what’s missing. In single-file mode, it suggests a larger budget: <!-- signatures omitted: ~39.4k tokens. Use --tokens 54k+ to include -->. In pack mode, it points straight to the detail file: <!-- full content: .cxpak/signatures.md (~39.4k tokens) -->. Either way, the LLM — or the human — knows exactly what to do.
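The two marker styles are easy to reproduce. This sketch copies the wording from the examples above; the helper names are assumptions:

```rust
// Sketch of the two omission-marker styles shown in the article.
fn fmt_tokens(n: usize) -> String {
    if n >= 1_000 {
        format!("~{:.1}k", n as f64 / 1000.0)
    } else {
        format!("~{n}")
    }
}

// Single-file mode: suggest a larger budget.
fn single_file_marker(section: &str, omitted: usize, suggested: usize) -> String {
    format!(
        "<!-- {section} omitted: {} tokens. Use --tokens {}k+ to include -->",
        fmt_tokens(omitted),
        suggested / 1_000
    )
}

// Pack mode: point straight at the detail file.
fn pack_marker(section: &str, omitted: usize) -> String {
    format!("<!-- full content: .cxpak/{section}.md ({} tokens) -->", fmt_tokens(omitted))
}

fn main() {
    println!("{}", single_file_marker("signatures", 39_400, 54_000));
    println!("{}", pack_marker("signatures", 39_400));
}
```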
The numbers
At first release: 5,400 lines of Rust across 31 files. Eight languages. 36 commits from first scaffold to working tool. About a week from idea to something I use daily.
I built it because I needed it. The multi-agent workflow I described in the previous post was burning through tokens on orientation — every agent started from scratch, exploring the same codebase, making the same greedy navigation decisions. cxpak gives each agent a briefing before it starts. The agents were powerful but blind. Now they have a map.