I benchmarked two approaches to code indexing for Kodit (which powers Helix Code Intelligence). The smarter one lost.
Read the full post on the Helix blog: https://helix.ml/blog/chunking-beats-slicing
The "smarter" approach was program slicing: using a syntax tree to extract self-contained, structurally coherent code snippets rather than just cutting text into chunks. The theory was solid. Slices capture real code structure, preserve function boundaries, and include relevant dependencies, whereas a basic RAG chunk might split straight through a critical function definition.

I ran both against SWE-Bench Verified using mini-SWE-agent, under three conditions: a clean baseline (no Kodit), Kodit with slicing, and Kodit with chunking.
----------------------------------------------------------------------
Metric                 Baseline   Kodit pre-1.0    Kodit post-1.0
                                  (slicing)        (chunking)
----------------------------------------------------------------------
Instances evaluated    25         25               25
Resolved (passed)      12         11               15
Resolve rate           48%        44%              60%
----------------------------------------------------------------------

Chunking won by 16 points. Slicing came in *below* the baseline: it wasn't just failing to help, it was actively getting in the way.

Why? It comes down to how LLMs are actually trained. They're optimised to read files and write files. Program slices aren't files; they're synthetic constructs that don't map onto how the model processes information. Handing an LLM a syntax tree is like handing someone a book's index and expecting a book report.

There's more to it than that, including caveats on the sample size, what this means for Kodit's architecture going forward, and what the full 500-instance SWE-Bench run might show.
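To make the contrast concrete, here is a minimal sketch of the two indexing strategies. This is not Kodit's actual implementation; the function names are hypothetical, and the "slicer" here just extracts whole top-level functions via Python's standard `ast` module as a stand-in for real program slicing. The point it illustrates: a fixed-size chunker happily cuts through a function body, while a syntax-aware extractor keeps each function intact.

```python
import ast

def chunk_text(source: str, max_lines: int = 2) -> list[str]:
    """Naive chunking: cut the file every max_lines lines,
    with no regard for syntactic boundaries."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]

def slice_functions(source: str) -> list[str]:
    """Slicing-style extraction (simplified): walk the syntax tree
    and pull out each top-level function as a self-contained snippet."""
    tree = ast.parse(source)
    return [ast.get_source_segment(source, node)
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]

SAMPLE = '''\
def add(a, b):
    total = a + b
    return total

def sub(a, b):
    return a - b
'''

# A 2-line chunk splits `add` mid-body; the slicer keeps it whole.
chunks = chunk_text(SAMPLE, max_lines=2)
slices = slice_functions(SAMPLE)
```

The irony the benchmark surfaced is that the structurally "correct" output on the right-hand side is not what the model wants: it reads and writes files, so file-shaped chunks, even ugly ones, sit closer to its training distribution.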

