Kodit 0.3: 10x Faster Indexing and Enterprise-Grade New Features
Information about Kodit's latest 0.3 release
Kodit, the MCP server that keeps your most important codebases searchable, has just reached its biggest milestone yet. Thanks to community feedback I’ve dramatically improved indexing throughput and delivered a raft of enterprise-focused enhancements.
10× faster indexing: smarter batching + streaming generators
Private Azure DevOps support: zero-config, secrets scrubbed
Pre-filter searches: by language, author, timestamp or repo
Auto-indexing: via environment variables (AI GitOps!)
Slick CLI progress bars: for instant feedback
Read on for the details or skip to the Quick Start and give it a spin.
Improving Performance
This version delivers a major throughput improvement to the indexing process. I started with a GitHub issue that rightly pointed out that the indexing UX was poor. So I began by converting all heavy I/O loops to generators, streaming results to their consumers instead of accumulating them in RAM.
On the way I found a massive issue with the way I was batching data for embedding. Batching is required because most embedding APIs (local and remote) accept batches of inputs out of the box, but only up to a limit. OpenAI, for example, only accepts batches of up to 8192 tokens; beyond that you get an HTTP 400 error. That means you need to a) count the tokens in your data, and b) only batch items up to the point where they still fit.
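The packing logic can be sketched like this. This is a minimal illustration, not Kodit's actual code, and `count_tokens` is a whitespace stand-in for a real tokeniser (e.g. tiktoken for OpenAI models):

```python
# Pack snippets into batches that each fit under a token budget.
# count_tokens is a stand-in; a real system would use the embedding
# provider's own tokeniser (e.g. tiktoken for OpenAI models).
MAX_BATCH_TOKENS = 8192


def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokeniser


def batch_by_tokens(snippets, max_tokens=MAX_BATCH_TOKENS):
    batch, batch_tokens = [], 0
    for snippet in snippets:
        n = count_tokens(snippet)
        # Flush the current batch if adding this snippet would overflow it.
        if batch and batch_tokens + n > max_tokens:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(snippet)
        batch_tokens += n
    if batch:
        yield batch
```

Because this is a generator, it also plays nicely with the streaming approach above: batches are produced lazily as the consumer asks for them.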
What’s worse, people sometimes write massive functions, so I have seen single snippets longer than 8192 tokens; in that case you need to truncate. But because tokens != words, you need the tokeniser to figure out where to truncate.
I found, however, that I had a while loop that iteratively shaved off characters and re-tokenised the whole snippet on every pass. Stupid, I know. I replaced it with a version that tokenises once and truncates the raw token array in one go. This alone provided a 10× improvement.
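The one-pass version looks roughly like this. It's a sketch under the assumption that the tokeniser exposes `encode`/`decode` functions (as tiktoken does); the stand-ins in the test are not a real tokeniser:

```python
# Tokenise once, then slice the token array directly - no repeated
# re-tokenisation while shrinking the string character by character.
MAX_TOKENS = 8192


def truncate_to_tokens(text, encode, decode, max_tokens=MAX_TOKENS):
    tokens = encode(text)               # tokenise exactly once
    if len(tokens) <= max_tokens:
        return text                     # already fits, return unchanged
    return decode(tokens[:max_tokens])  # cut at a token boundary
```

The key insight is that the token array already tells you exactly where the boundary falls, so there is never a reason to guess a character count and re-tokenise.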
After that, and a brief quest to make the codebase more domain-driven, I implemented an observer pattern: the indexing code fires callbacks that the CLI uses to display nice progress bars for all operations. A UX win for everyone!
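The shape of that observer wiring is roughly the following. The class and method names here are illustrative, not Kodit's real API:

```python
# Observer pattern: the indexing layer emits progress events and knows
# nothing about the CLI; the CLI subscribes to render a progress bar.
from typing import Callable

# (operation name, items done, total items)
ProgressCallback = Callable[[str, int, int], None]


class Indexer:
    def __init__(self) -> None:
        self._observers: list[ProgressCallback] = []

    def subscribe(self, callback: ProgressCallback) -> None:
        self._observers.append(callback)

    def _notify(self, operation: str, done: int, total: int) -> None:
        for callback in self._observers:
            callback(operation, done, total)

    def index(self, snippets: list[str]) -> None:
        total = len(snippets)
        for i, _snippet in enumerate(snippets, start=1):
            # ... embed and store the snippet here ...
            self._notify("embedding", i, total)
```

The CLI then subscribes a callback that updates a progress bar, while other front ends (or tests) can subscribe their own without the indexing code changing at all.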
Indexing will still crawl if you try to index large repositories on your laptop using local models. Use an external AI provider like OpenAI or Helix.ML to make it really snappy!
Indexing Private Repositories
I had an important enterprise request to be able to index private Azure DevOps repositories. Thankfully it turned out that the Git URI scheme happily accepts personal access tokens, so Azure DevOps repositories work out of the box. The only thing I needed to do was sanitise the URI so that secrets didn’t end up in the database or the logs.
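The general technique is to strip the userinfo portion of the URI before it is stored or logged. A minimal sketch (not Kodit's exact implementation; the Azure DevOps URI in the test is illustrative):

```python
# Scrub credentials (a PAT or username:password) from a Git URI before
# persisting or logging it.
from urllib.parse import urlsplit, urlunsplit


def sanitise_git_uri(uri: str) -> str:
    parts = urlsplit(uri)
    if parts.username is None and parts.password is None:
        return uri  # nothing secret to scrub
    # Rebuild the netloc without the userinfo.
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

The original, credentialed URI is still used for the actual `git clone`; only the scrubbed form ever reaches persistent storage.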
Check out the documentation for more details.
Filtering Searches By X
Another enterprise feature that is also useful to power users is the ability to pre-filter search results in the MCP or CLI interfaces. Previously, if you had a large number of repositories, it was hard for the agent to find canonical results. There’s a variety of reasons for this, but the main one is that much of the index isn’t relevant to the user’s current workspace. For example, it’s quite likely that the user doesn’t need Java snippets when they are writing a Python application.
So Kodit 0.3 introduces filters that let you restrict a search by source, language, author, or timestamp. Of course, in most usage it’s the AI agent that makes this decision, but you can influence which filters it applies with good prompting.
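Conceptually, pre-filtering just narrows the candidate set before the semantic search runs. A sketch of the idea, where the `Snippet` fields are illustrative rather than Kodit's real metadata schema:

```python
# Pre-filter candidate snippets on metadata before any semantic search.
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Snippet:
    source: str        # repository the snippet came from
    language: str
    author: str
    created_at: datetime


def pre_filter(snippets, source=None, language=None, author=None, since=None):
    return [
        s for s in snippets
        if (source is None or s.source == source)
        and (language is None or s.language == language)
        and (author is None or s.author == author)
        and (since is None or s.created_at >= since)
    ]
```

Filtering on cheap metadata first means the expensive embedding comparison only runs over snippets that could plausibly be relevant to the current workspace.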
Auto-Indexing
Aside from improving the deployment documentation, we also had an enterprise request to make it possible to index via configuration; AI GitOps, if you will. I achieved this by exposing some new environment variables that allow you to specify what gets indexed at configuration time. I call this “auto-indexing.”
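The mechanics amount to reading a list of sources from the environment at startup and indexing each one. A sketch only: the variable name `KODIT_AUTO_INDEX_SOURCES` is hypothetical, so check the Kodit documentation for the real setting names:

```python
# Read auto-index sources from an environment variable at startup.
# KODIT_AUTO_INDEX_SOURCES is a hypothetical name for illustration.
import os


def sources_from_env(env=None):
    env = os.environ if env is None else env
    raw = env.get("KODIT_AUTO_INDEX_SOURCES", "")
    return [uri.strip() for uri in raw.split(",") if uri.strip()]
```

Because the list lives in the deployment configuration, the set of indexed repositories can be version-controlled and rolled out like any other infrastructure change, which is what makes the "AI GitOps" framing apt.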
In the future I envisage that I might get requests for the ability to specify configuration options per index or even provide an external API to update the index remotely. If you’re interested in any of this, please raise a feature request.
What’s Next?
I have lots more planned for the next milestone, but I’d also love to hear your thoughts. If you have a great idea, don’t keep it to yourself. Let me know! I’d love to include it in a future milestone. The next milestone will include the following major features:
better CLI tools to manage indexes
ability to keep indexes synchronised with their source
full MCP protocol coverage to make it easier to use and install (especially streaming HTTP, to get the OAuth support)
a Helix hosted SaaS version of Kodit to make it even easier to get started and open the door to federated indexing
Try Kodit Now
If you haven’t tried Kodit yet, now’s your chance. I think it’s fast becoming the best way to ensure your AI coding assistant has the context it needs to work with obscure libraries, private enterprise repositories, or a sprawling microservices architecture!
Try it now and let me know how it goes!