Helix Kodit: Open Source MCP Server to Index External Repositories
Early adopter edition of Helix Kodit - get the best out of your AI coding assistant
AI coding assistants have emerged as one of the strongest use cases for generative AI. VCs and companies are investing billions in this space because the value proposition is clear: they help you develop faster.
The Problem With AI Coding Assistants
If you’re anything like me, you’ve been using AI coding assistants to speed up your development. You’ve probably had a lot of success, but there are still many areas for improvement.
The problem that hinders me most is that AI models have a large data blind spot. Some of the causes are quite obvious.
Foundation models are trained on data up to a certain point in time. Whenever you use a new model, the data cutoff was, at best, roughly a year earlier. This means the model cannot accurately generate code for newer versions of a language or library.
The next problem is that a model’s capability is directly related to how much relevant data was included in its training data. Esoteric libraries with few public examples, even long-established ones, may be so under-sampled that the model can’t infer what the code should look like.
The third, and possibly most important, problem is private codebases. Access to them is restricted, so it’s unlikely (though not impossible!) that this data has made it into the model’s training set. Again, the model cannot generate code that reflects your private, enterprise code.
There are more situations where models perform poorly due to a lack of awareness, but these are the main three. So what’s the best way of overcoming this?
Can RAG Help?
One pattern that has proven itself is retrieval. Retrieval augmented generation (RAG) incorporates extra context like examples, documentation, data, and anything else related to the problem at hand. This provides the model with extra information with which to make a prediction. The results are often much better.
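The retrieval step can be sketched in a few lines. This is an illustrative toy, not Kodit’s implementation: `retrieve` and `build_prompt` are hypothetical names, and the keyword-overlap ranking stands in for a real search backend.

```python
# Minimal RAG sketch: retrieve relevant snippets by keyword overlap,
# then prepend them to the prompt sent to the model.
# All names here are illustrative, not Kodit's actual API.

def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Rank corpus entries by how many query terms they contain."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, snippets: list[str]) -> str:
    """Augment the user's request with the retrieved context."""
    context = "\n---\n".join(snippets)
    return f"Context:\n{context}\n\nTask: {query}"

corpus = [
    "def connect(url): ...  # client for the internal billing API",
    "def parse_csv(path): ...  # CSV helper",
    "def connect_retry(url, attempts): ...  # billing API with retries",
]
query = "connect to the billing API"
prompt = build_prompt(query, retrieve(query, corpus))
```

The model then completes the task with the retrieved snippets in view, instead of guessing from its training data alone.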
This led me to the idea of building a tool that lets you “include” codebases and related information to overcome the three issues above. You can include codebases for new libraries, codebases with accepted enterprise patterns, and, in the future, much more.
You ingest codebases and pass relevant snippets to the AI assistant to help it write better code.
Introducing Kodit - Early Adopter Edition
I’m pleased to announce the early adopter edition of Kodit. Kodit is an MCP server that indexes codebases and offers relevant snippets of code to your coding assistant.
I chose to expose Kodit as an MCP server because the vast majority of coding assistants can now integrate with tools via MCP. So all you need to do is index your codebases, connect it to your coding assistant, and let your assistant query Kodit for relevant examples.
In this early adopter version, you can index local and remote codebases, search using keyword and semantic search, and scale by using an external database and AI providers.
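To give a feel for how keyword and semantic search complement each other, here is a toy hybrid-ranking sketch. It is not Kodit’s implementation: the bag-of-words “embedding” stands in for a real embedding model, and `hybrid_score` and its `alpha` blend weight are hypothetical names.

```python
import math

def embed(text: str) -> dict[str, float]:
    """Toy 'embedding': a term-frequency vector.
    A real system would use a neural embedding model."""
    vec: dict[str, float] = {}
    for term in text.lower().split():
        vec[term] = vec.get(term, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    """Blend exact keyword overlap with vector similarity."""
    q_terms, d_terms = set(query.lower().split()), set(doc.lower().split())
    keyword = len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0
    semantic = cosine(embed(query), embed(doc))
    return alpha * keyword + (1 - alpha) * semantic

docs = ["retry logic for http requests", "parse yaml configuration files"]
best = max(docs, key=lambda d: hybrid_score("http retry handling", d))
```

Keyword search rewards exact identifier matches (important for code), while semantic search catches paraphrases; blending the two tends to rank relevant snippets higher than either alone.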
I’ve focused on providing a strong local experience, so out of the box it uses a local database and local models. Performance won’t be great, but you won’t need to add any API keys. Advanced and enterprise users can run Kodit as a container, use specialised search-optimised databases, and connect external (or on-premises!) AI providers. Learn how to do this in the reference documentation.
In my experience, Kodit has given much better results across a variety of tasks. But I’m launching this early adopter edition to gather feedback on your experience with it.
This early feedback will help ensure that the roadmap represents ideas that really help you. So please help by trying Kodit, giving feedback on what does and doesn’t work, and what you’d like to see moving forward.
More information:
Provide feedback, good or bad!