Yesterday I released Kodit 0.5. As these things often do, this release started off fairly benign. The original intention was to implement features that allowed Kodit to scale to index greater numbers of repositories. However, after attempting to tackle incremental indexing, I quickly realised that we should be mimicking the Git domain more than we currently were.
In version 0.4 and earlier, everything was based upon a directory and the files within that directory. But once I decided that I wanted to index different versions of a repository (like a different tag or a different branch), that quickly became unsustainable. So I took the decision to migrate everything to a Git-based domain model and take advantage of the structure.
Breaking Changes Ahoy
Because I’ve changed the domain model, the old database schema no longer matches. I made the decision to restructure the database, which means that any old data you have in there will get deleted.
I also took the opportunity to remove the auto-indexing command. That was introduced as a stopgap before we had API-based indexing. Since we have API indexing now, this was no longer used, so I removed it.
New Features
With that out of the way, we can now talk about some exciting new features:
First is the change to the Git domain model. Kodit now has an internal representation of commits, tags, files, and everything else. Not only does this help with incremental indexing, which means that you won’t have to reprocess commits, it also means that new commits where nothing much has changed will hardly require any processing at all. This also unlocks the next round of enhancements we have planned.
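To make the incremental indexing idea above a little more concrete, here is a minimal sketch of how a content-addressed model lets unchanged work be skipped. This is not Kodit's actual implementation: the `Commit` and `Index` classes, their fields, and the SHA values are all hypothetical names invented for illustration. The only assumption borrowed from Git is that a file's content can be identified by a blob SHA.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical commit record: a commit SHA plus a path -> blob SHA mapping."""

    sha: str
    files: dict[str, str]


@dataclass
class Index:
    """Hypothetical in-memory index that remembers what it has already seen."""

    seen_commits: set[str] = field(default_factory=set)
    seen_blobs: set[str] = field(default_factory=set)

    def index_commit(self, commit: Commit) -> list[str]:
        """Return the paths whose content actually needed processing."""
        # A commit that was indexed before needs no work at all.
        if commit.sha in self.seen_commits:
            return []

        processed = []
        for path, blob_sha in sorted(commit.files.items()):
            # Identical content has an identical blob SHA, so skip it.
            if blob_sha not in self.seen_blobs:
                self.seen_blobs.add(blob_sha)
                processed.append(path)  # real enrichment/embedding would go here

        self.seen_commits.add(commit.sha)
        return processed


index = Index()
first = Commit("c1", {"a.py": "blob-1", "b.py": "blob-2"})
second = Commit("c2", {"a.py": "blob-1", "b.py": "blob-3"})

print(index.index_commit(first))   # ['a.py', 'b.py']
print(index.index_commit(second))  # ['b.py'] because only b.py changed
print(index.index_commit(second))  # [] because the commit was already indexed
```

In this sketch a new commit that touches one file only costs one file's worth of processing, which is the behaviour described above.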
Next on the list is LiteLLM integration. The reason for this was that I wanted to incorporate different providers for enrichment and embedding. The simplest way to do that was to use LiteLLM, which supports more than a hundred external embedding providers. I’ve tested it with Helix, Ollama, vLLM, Azure, and OpenAI, but it should work with any provider.
To handle increased demand, I’ve completely refactored the indexing pipeline. We now have a queue-based system with status endpoints, so you can review the status of an indexing operation without having to dig through the logs.
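As a rough illustration of the queue-plus-status pattern, the sketch below tracks a status for each queued indexing task so that it can be looked up without reading logs. It is an assumption-laden toy, not Kodit's pipeline: the `IndexingQueue` class, the `Status` values, and the example repository URL are hypothetical, and a real system would expose `status()` through an HTTP endpoint and run workers in the background.

```python
import queue
from collections.abc import Callable
from enum import Enum


class Status(str, Enum):
    """Hypothetical task states; the real status names may differ."""

    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


class IndexingQueue:
    """Hypothetical queue that records a status for every task it accepts."""

    def __init__(self) -> None:
        self._tasks: queue.Queue[str] = queue.Queue()
        self._status: dict[str, Status] = {}

    def enqueue(self, repo_url: str) -> None:
        self._status[repo_url] = Status.PENDING
        self._tasks.put(repo_url)

    def status(self, repo_url: str) -> Status:
        # This is what a status endpoint would return for a given task.
        return self._status[repo_url]

    def run_next(self, handler: Callable[[str], None]) -> None:
        """Process one queued task; raises queue.Empty if nothing is queued."""
        repo_url = self._tasks.get_nowait()
        self._status[repo_url] = Status.IN_PROGRESS
        try:
            handler(repo_url)
        except Exception:
            self._status[repo_url] = Status.FAILED
        else:
            self._status[repo_url] = Status.COMPLETED
        finally:
            self._tasks.task_done()


q = IndexingQueue()
q.enqueue("https://example.com/repo.git")  # placeholder URL
print(q.status("https://example.com/repo.git").value)  # pending
q.run_next(lambda url: None)  # stand-in for the real indexing work
print(q.status("https://example.com/repo.git").value)  # completed
```

Keeping the status next to the queue means a failed task stays visible after the worker has moved on, rather than being buried in log output.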
And finally, there’s been a bit of refactoring and improvement for the database reads and writes. I found that once we had large numbers of commits, database read performance was quite slow because of inefficiencies in the way that things were structured. This release has improved things, but there’s still more to do here.
What’s Next?
Now that we’ve got a good domain model, we have big plans for our next steps. First on the list is a wide range of new enrichments. These new enrichments are based around three key repository use cases: using, developing and reading.
Users of a repository need to know things like the public API and the examples that they can copy from. Developers of a repository need to know the system architecture, the database schema, the layers, and the ways of working. Readers of a repository, on the other hand, want to know the history, the status, and a 10km view of the repository as a whole. I’m not entirely sure how this will be exposed to the MCP at this point in time, but I know that it is useful information.
The next step after that is to build a user interface to allow users to view all of this information in a pretty, user-friendly way. People shouldn’t have to browse the API docs to get access to this information.
And finally, it’s still on my mind that I want to index more things. I want to index documentation, I want to index API documentation, I want to index all the things. At the moment, Kodit still only indexes code. And I’m confident that there is more to do in the front-end world as well.
That’s all for now, but of course if you have any ideas or any requests for new features, then please visit the repository. https://github.com/helixml/kodit