<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[HelixML]]></title><description><![CDATA[Helix brings the best of open source AI to your business]]></description><link>https://blog.helix.ml</link><image><url>https://substackcdn.com/image/fetch/$s_!uVK-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6ac6823-53fa-4485-b35d-65c2770f5cb8_1280x1280.png</url><title>HelixML</title><link>https://blog.helix.ml</link></image><generator>Substack</generator><lastBuildDate>Tue, 14 Apr 2026 19:59:31 GMT</lastBuildDate><atom:link href="https://blog.helix.ml/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Luke Marsden]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[helixml@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[helixml@substack.com]]></itunes:email><itunes:name><![CDATA[Luke Marsden]]></itunes:name></itunes:owner><itunes:author><![CDATA[Luke Marsden]]></itunes:author><googleplay:owner><![CDATA[helixml@substack.com]]></googleplay:owner><googleplay:email><![CDATA[helixml@substack.com]]></googleplay:email><googleplay:author><![CDATA[Luke Marsden]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[I benchmarked two approaches to code indexing for Kodit (which powers Helix Code Intelligence). 
The smarter one lost.]]></title><description><![CDATA[Read the full post on the Helix blog: https://helix.ml/blog/chunking-beats-slicing]]></description><link>https://blog.helix.ml/p/i-benchmarked-two-approaches-to-code</link><guid isPermaLink="false">https://blog.helix.ml/p/i-benchmarked-two-approaches-to-code</guid><dc:creator><![CDATA[Phil Winder]]></dc:creator><pubDate>Tue, 17 Mar 2026 17:26:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uVK-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6ac6823-53fa-4485-b35d-65c2770f5cb8_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Read the full post on the Helix blog: <a href="https://helix.ml/blog/chunking-beats-slicing">https://helix.ml/blog/chunking-beats-slicing</a></p><p>I benchmarked two approaches to code indexing for Kodit (which powers Helix Code Intelligence). The smarter one lost. The "smarter" approach was program slicing &#8212; using a syntax tree to extract self-contained, structurally coherent code snippets rather than just cutting text into chunks. The theory was solid: slices capture real code structure, preserve function boundaries, include relevant dependencies. A basic RAG chunk might split straight through a critical function definition. I ran both against SWE-Bench Verified using mini-SWE-agent. Three conditions: a clean baseline (no Kodit), Kodit with slicing, Kodit with chunking.</p><pre><code>----------------------------------------------------------------------
Metric                 Baseline  Kodit (slicing)  Kodit (chunking)
----------------------------------------------------------------------
Instances evaluated          25               25                25
Resolved (passed)            12               11                15
Resolve rate                 48%              46%               60% </code></pre><p>Chunking won by 14 points. Slicing came in <em>below</em> the baseline &#8212; it wasn't just not helping, it was actively getting in the way. Why? It comes down to how LLMs are actually trained. They're optimised to read files and write files. Program slices aren't files &#8212; they're synthetic constructs that don't map onto how the model processes information. Handing an LLM a syntax tree is like handing someone a book's index and expecting a book report. There's more to it than that, including caveats on sample size, what this means for Kodit's architecture going forward, and what the full 500-instance SWE-Bench run might show. <br>Full post on the Helix blog: <a href="https://helix.ml/blog/chunking-beats-slicing">https://helix.ml/blog/chunking-beats-slicing</a></p>]]></content:encoded></item><item><title><![CDATA[Why Benchmarking AI Code Tools Is Harder Than You Think]]></title><description><![CDATA[Standard AI benchmarks are not fit for purpose. 
Here's what you need to know.]]></description><link>https://blog.helix.ml/p/why-benchmarking-ai-code-tools-is</link><guid isPermaLink="false">https://blog.helix.ml/p/why-benchmarking-ai-code-tools-is</guid><dc:creator><![CDATA[Phil Winder]]></dc:creator><pubDate>Thu, 05 Mar 2026 14:02:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7ecm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI coding assistants are everywhere. Claude Code, Codex, Cursor, etc. Everyone wants to know which is &#8220;best&#8221;. You&#8217;ll find an infinite array of opinions and a thousand AI-generated &#8220;hot takes&#8221; that are neither hot nor takes; they only take (the piss).</p><p>The natural instinct is to look at leaderboards. Some poor soul, somewhere, has taken the time to attempt to robustly benchmark these tools. I sincerely thank them for the effort because I appreciate how hard it is to do this well.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7ecm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7ecm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png 424w, https://substackcdn.com/image/fetch/$s_!7ecm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png 848w, https://substackcdn.com/image/fetch/$s_!7ecm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png 1272w, https://substackcdn.com/image/fetch/$s_!7ecm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7ecm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png" 
width="1456" height="825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:362673,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189260290?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7ecm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png 424w, https://substackcdn.com/image/fetch/$s_!7ecm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png 848w, https://substackcdn.com/image/fetch/$s_!7ecm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png 1272w, https://substackcdn.com/image/fetch/$s_!7ecm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61104392-847e-41fc-9cd0-226da06d1d66_2872x1628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2><strong>The Problem With Traditional Benchmarks</strong></h2><p>In general, the benchmarks are created under two high-level remits. The first is an academic exercise to find the state of the art. Academics then use these results to guide their research. This is a good thing and they should continue to do that. But these benchmarks are not representative of real-world scenarios. The second is a task-specific exercise where model or algorithm developers attempt to produce directional metrics that correlate with downstream performance. Again, as a long-term ML and AI practitioner, I appreciate the need to simplify problems to metrics that we can directly optimise for. 
But, again, these benchmarks are not representative of real-world scenarios.</p><p>All RAG benchmarks are based upon one-shot retrieval tasks, evaluated in isolation for retrieval accuracy. Nearly all coding benchmarks are based upon one-shot patch generation tasks, evaluated in isolation for patch correctness.</p><p>This isn&#8217;t how AI coding agents actually work.</p><p>Coding assistants work through trial and error. Much like in reinforcement learning, they explore their environment and anticipate the goals of the developer. They often make mistakes. These are sometimes caught by automated analysis (e.g. linting, tests, etc.). Sometimes they are not and need to be manually corrected. We can include base knowledge (e.g. CLAUDE.md) or external knowledge (e.g. a web search) to help the agent. All of these permutations aren&#8217;t tested, all of the time, by any of the benchmarks. This is a problem.</p><h2><strong>Modern Coding Benchmarks</strong></h2><p>HumanEval and SWE-bench are the two most popular coding benchmarks that are touted by every vendor.</p><p><a href="https://github.com/openai/human-eval">HumanEval</a> is probably the worst. Created by OpenAI in 2021, it consists of a function signature and a docstring describing what the function should do. It also contains a hidden set of unit tests that evaluate the correctness of the function. Ignoring the fact that these examples are now in every model&#8217;s training data, the main issue is that it&#8217;s a one-shot generation test. It&#8217;s the same style of single-response evaluation as machine translation&#8217;s BLEU metric, conceived <a href="https://aclanthology.org/P02-1040.pdf">way back in 2002</a>.</p><p>Aside: this led to the best-named metric on the market, <a href="https://github.com/mjpost/sacrebleu">SacreBLEU</a>, which is independent of tokenisation.</p><p>In 2023, Princeton researchers released <a href="https://github.com/swe-bench/SWE-bench">SWE-bench</a> (OpenAI subsequently produced the human-validated SWE-bench Verified subset). 
It represented an important step up from HumanEval by drawing real-life examples from real pull requests. Each instance is codified as the commit just prior to the fix of the issue. The agent is given the issue description and access to the repository at that point in time. Test cases are again used to check the correctness of the patch. For reference, initial basic one-shot RAG approaches achieved just 2% success. (Granted, this was Claude 2 and BM25 at the time...)</p><p>You&#8217;d think that would be the end of the story, because this almost represents what agents are doing in real life. But no.</p><p>The first problem is that OpenAI found that a whopping <a href="https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/">59% of tests in a sample have &#8220;flawed&#8221; test cases</a> that reject functionally correct patches. They also note that more recent models have (in)advertently learned to overfit the benchmarks, predicting the correct patch irrespective of the prompt, akin to <a href="https://www.bbc.com/news/business-34324772">Volkswagen changing emission profiles when it detected it was being tested</a>.</p><blockquote><p>Given a short snippet from the task description, GPT&#8209;5.2 outputs the exact gold patch. In particular, it knows the exact class and method name, and the new early return condition <code>if username is None or password is None</code> that is introduced.</p></blockquote><p>The second problem, and this is less relevant to model developers like OpenAI, is that Kodit allows the coding assistant to search for relevant work from other external resources and codebases. Kodit is not restricted to <em>only</em> searching the codebase under test. It can learn from others. This is a critical advancement in the enterprise domain where developers are often working across multiple codebases at the same time. 
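</p><p>To make that concrete, here is a deliberately toy Python sketch of searching one index built across several repositories. The repo names, snippets, and keyword-overlap scoring are all invented for illustration; Kodit&#8217;s real retrieval pipeline is much richer than this.</p><pre><code># Toy cross-repo search: one index spans snippets from several repositories,
# so a query from any project can surface code written in another.
# Everything here (names, scoring) is illustrative, not Kodit's implementation.
index = [
    {"repo": "billing-service", "snippet": "def authenticate(user, token): ..."},
    {"repo": "admin-portal", "snippet": "def render_dashboard(ctx): ..."},
]

def search(query, entries, top_k=1):
    terms = set(query.lower().split())
    def score(entry):
        words = set(entry["snippet"].lower().replace("(", " ").replace(")", " ").split())
        return len(terms.intersection(words))
    return sorted(entries, key=score, reverse=True)[:top_k]

# An agent working in admin-portal still finds billing-service's auth code.
hits = search("authenticate token", index)
print(hits[0]["repo"])  # billing-service
</code></pre><p>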
An authentication implementation in one repo is likely very useful for another.</p><p>A final problem I have with nearly all benchmarks is that they are self-contained. In my experience, most coding tasks involve another library, framework, or system. None of these benchmarks ever say &#8220;add a new table to my SQLAlchemy application&#8221;, or &#8220;update the frontend to show the information in the new API&#8221;. They&#8217;re always leet-code style &#8220;implement quicksort&#8221; tasks: self-contained, using the base language only. And they&#8217;re often only in Python!</p><h2><strong>Why Kodit is Hard to Benchmark</strong></h2><p>Kodit is a multi-turn, multi-tool, multi-context assistant to a coding assistant. It&#8217;s hard enough to say, let alone benchmark! Kodit indexes external codebases to provide relevant context to any coding task. Exposed as an MCP server, it can be used by any coding assistant that supports the MCP protocol. In addition, it generates enrichments meant more for human consumption to help explain the inner workings of a codebase.</p><p>Given this flexibility, traditional information retrieval metrics don&#8217;t capture whether the context actually improved the solution. Success is measured downstream of Kodit, at the end of the coding task. So the question isn&#8217;t &#8220;did it find the right snippet&#8221; but instead should be &#8220;did this snippet lead to better code?&#8221; This means you need an end-to-end evaluation, more like SWE-bench, but with global context.</p><p>The next problem is one I&#8217;ve observed. I have seen situations where I know a quick Kodit lookup would help the assistant, but the coding assistant decided not to use it. It chose to search the web instead. Or worse, it just started writing code. In most cases I have to hack around this by telling the agent, in no uncertain terms, to use Kodit. Threats work well. But it&#8217;s tedious. 
Equally, I&#8217;ve seen coding assistants search for the wrong thing and go down a wasteful path.</p><p>So in the end, the &#8220;performance&#8221; of Kodit is often less about what it is able to do, but more about how well the agent can use it.</p><p>This realisation has led me to an important conclusion that I need to make Kodit simpler, more focussed, less smart. I am now actively working on simplifying the MCP interface and the internal search implementation.</p><h2><strong>What Does a Good Benchmark Look Like?</strong></h2><p>I am using SWE-bench verified to test and evaluate Kodit. Using the canonical SWE-bench coding agent, <a href="https://github.com/SWE-agent/mini-swe-agent">mini-swe-agent</a>, I created a wrapper that adds Kodit as an attached MCP server and compared it against an agent without Kodit. And a script that indexes the commit under test (so the agent can&#8217;t just search for the correct answer in a subsequent commit). And it works; I&#8217;ll leave the actual metrics for another day. But it&#8217;s more like an end-to-end test than an evaluation. The agent can&#8217;t take advantage of Kodit&#8217;s key selling point: leveraging information from other codebases.</p><p>If anyone fancies a bit of light torture and wants to implement a benchmark themselves, then a good one would look like this:</p><ul><li><p>End-to-end measurement of final code quality. Both functionally and non-functionally.</p></li><li><p>Multi-turn aware. Captures and evaluates the full agent trajectory, not just the final patch.</p></li><li><p>Able to compare with and without external context augmentation.</p></li><li><p>Accounts for cost or the number of tokens used.</p></li><li><p>Has realistic challenges. Not just bug fixes, but new features, framework and language migrations, version upgrades, integration with external systems, usage of popular external libraries, etc.</p></li><li><p>All the languages, not just Python!</p></li><li><p>Resistant to contamination. 
Uses private or freshly-created repos the model hasn&#8217;t seen.</p></li></ul><h2><strong>Why Now</strong></h2><p>AI coding tools have now moved on from auto-complete. We seem to have skipped merrily through auto-assist and are already smack bang in the middle of auto-management. But we have no way to know how well these tools perform.</p><p>For Kodit, it&#8217;s hard for me to explain to my users by how much Kodit improves the coding assistant. Through experience I know it&#8217;s positive. Via demos I can see it working where it failed before. But it&#8217;s still incredibly hard to quantify.</p><p>But I&#8217;m actively working on this. Future posts will share more concrete results and learnings. For now, the main point is: be wary of the leaderboards and the opinions.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[What a control room for AI coding agents actually looks like]]></title><description><![CDATA[Most teams run one AI coding agent at a time on a developer&#8217;s laptop. 
Helix gives each agent its own GPU-accelerated desktop, then lets you orchestrate dozens of them in parallel]]></description><link>https://blog.helix.ml/p/what-a-control-room-for-ai-coding</link><guid isPermaLink="false">https://blog.helix.ml/p/what-a-control-room-for-ai-coding</guid><dc:creator><![CDATA[Priya Samuel]]></dc:creator><pubDate>Tue, 03 Mar 2026 14:07:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Fdv2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fdv2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fdv2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png 424w, https://substackcdn.com/image/fetch/$s_!Fdv2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png 848w, https://substackcdn.com/image/fetch/$s_!Fdv2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Fdv2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fdv2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/beb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9334928,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189240244?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fdv2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png 424w, https://substackcdn.com/image/fetch/$s_!Fdv2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png 848w, 
https://substackcdn.com/image/fetch/$s_!Fdv2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png 1272w, https://substackcdn.com/image/fetch/$s_!Fdv2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbeb019f4-7184-4d92-af61-7952e3699c06_3000x2000.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div class="subscription-widget-wrap-editor" 
data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>What a control room for AI coding agents actually looks like</h2><p>Picture ten items on your engineering backlog. A new feature. A framework migration. Four security patches. A batch of logging improvements across a dozen repos. You know the shape of every one of them. You could write specs for all of them this afternoon.</p><p>You can&#8217;t build them all this afternoon. Not with one developer. Not even with one very good AI agent.</p><p>Helix changes that equation. Not by making one agent faster, but by giving you a fleet of them, each working in its own GPU-accelerated desktop, coordinated through a Kanban board you can watch in real time.</p><h3>Each agent gets its own computer</h3><p>We covered this architecture in an earlier post, but it&#8217;s worth repeating here because it&#8217;s the foundation everything else builds on.</p><p>Every agent in Helix gets its own isolated desktop environment. Not a container with a language runtime. A full GPU-accelerated Linux desktop running the Zed code editor, a terminal, a browser, and its own filesystem. When you spin up five agents to work on five tasks, they&#8217;re running on five separate desktops. 
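</p><p>As a rough sketch of what &#8220;one isolated desktop per agent&#8221; means operationally, here is an illustrative Python snippet that only builds the container invocations. The image name and volume layout are made up for this example; Helix&#8217;s actual launcher, with its Docker-in-Docker GPU setup, is considerably more involved.</p><pre><code># Illustrative only: construct one isolated "docker run" invocation per task.
# "helix/agent-desktop" is a hypothetical image name, not a real artifact.
def desktop_command(task_id):
    return [
        "docker", "run", "--detach",
        "--gpus", "all",                       # share the host GPU
        "--name", f"agent-desktop-{task_id}",  # one container per task
        "--volume", f"workspace-{task_id}:/home/agent",  # private filesystem
        "helix/agent-desktop",                 # hypothetical desktop image
    ]

# Five tasks, five separate desktops, no shared state between them.
commands = [desktop_command(t) for t in range(5)]
print(len(commands))  # 5
</code></pre><p>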
They can&#8217;t interfere with each other.</p><p>Each desktop appears as a separate machine, but underneath it&#8217;s a high-density Docker-in-Docker (or Docker-in-Kubernetes) setup sharing GPU resources. We did a lot of work on GPU virtualization with virtio-gpu and Vulkan passthrough to make multi-tenant desktops viable on a single physical machine.</p><p>The result is that you can watch your agents work. Literally watch them. You see the code editor, the terminal output, the browser window. When an agent opens Chrome to test the app it just built, you see Chrome open. When it reads an error and goes back to fix the code, you see that too.</p><h3>The Kanban board</h3><p>The orchestration layer is a Kanban board. Columns for backlog, planning, implementation, review, and done. Each card is a task. Each task gets an agent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pa0X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pa0X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png 424w, https://substackcdn.com/image/fetch/$s_!Pa0X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png 848w, https://substackcdn.com/image/fetch/$s_!Pa0X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Pa0X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pa0X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png" width="1456" height="715" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:715,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:299724,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189240244?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pa0X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png 424w, https://substackcdn.com/image/fetch/$s_!Pa0X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png 848w, https://substackcdn.com/image/fetch/$s_!Pa0X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png 
1272w, https://substackcdn.com/image/fetch/$s_!Pa0X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07d5eb0b-e037-4980-aa1a-5445869dac4c_2009x986.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>Move a card into the planning column and the agent spins up a desktop and starts writing a spec. As with Spec Driven development: requirements first, then technical design, followed by an implementation plan (spec). 
The agent writes these documents in <a href="https://zed.dev/">Zed</a>, and you can review them with inline comments, Google Docs style. Leave a comment saying &#8220;what about edge cases for deleted users?&#8221; and the agent responds to your comment and updates the design.</p><p>This is the workflow that&#8217;s changed how we build software internally. You batch up your thinking early. Leave comments on specs across five different tasks. The agents respond and iterate on the designs while you move on to the next review. When a design looks right, you approve it, and the agent shifts into implementation mode. It writes code, runs tests, and opens a pull request.</p><p>The approval of the implementation plan isn&#8217;t ceremonial. When you approve a spec, the agent receives a structured prompt telling it: &#8220;Your design has been approved. You&#8217;re now in the implementation phase.&#8221; File diffs show up in real time. The agent commits code, runs the app, and tests it. You&#8217;re reviewing finished pull requests, not babysitting the work.</p><h3>Agents don&#8217;t talk to each other (on purpose)</h3><p>The obvious question with multiple agents: if one miscommunicates something to another, how do you debug that?</p><p>Our answer is that they don&#8217;t communicate with each other. At all.</p><p>For coding tasks, where an agent needs to hold a coherent plan from spec to implementation, the overhead of multi-agent communication buys you very little and introduces failure modes that are genuinely hard to debug.</p><p>So our agents are intentionally isolated. They coordinate the same way human developers do: through git. When an agent finishes its work and opens a pull request, it merges from main first. If there&#8217;s a conflict, it resolves it. That&#8217;s the coordination mechanism. It&#8217;s boring. It works.</p><p>Maybe one day it&#8217;ll make sense to have two agents pair-programming on the same desktop. 
But right now, isolated agents working in parallel on separate tasks, coordinating through version control, give you the throughput gains without the chaos.</p><h3>Do the work once, apply it everywhere</h3><p>Some of the most valuable engineering work is also the most tedious: applying the same change across dozens of repositories.</p><p>Think about an organisation with 100 repos that share the same Python framework. Same patterns, same structure. A security patch or logging change needs to go into 30 or 50 of them. That work goes on the backlog. And it sits there. For weeks. Sometimes months.</p><p>Here&#8217;s what we built. You do the work once, in one repo, with one agent. During that process, the agent learns things you didn&#8217;t know at the beginning. The spec gets refined through actually doing the work. Then you clone that refined spec across the other 49 repos. The agents spin up in parallel, each working in its own desktop, each applying the same pattern to a different codebase.</p><p>Do one in an hour. Do 49 in ten minutes.</p><p>Not all of them land perfectly. You review a group view that shows progress across all the cloned tasks: which ones are done, which ones need attention, which ones have already been merged. But the ratio of human effort to output changes dramatically. Instead of a new hire spending a week getting through three of them, you&#8217;re reviewing pull requests across all 49 by lunchtime.</p><h3>The acceleration curve</h3><p>We&#8217;ve adopted the <a href="https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04">AI acceleration curve</a> from Steve Yegge&#8217;s January 2026 post. 
It&#8217;s eight steps, from basic model inference all the way up to orchestrated agent fleets.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JqLA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JqLA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png 424w, https://substackcdn.com/image/fetch/$s_!JqLA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png 848w, https://substackcdn.com/image/fetch/$s_!JqLA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png 1272w, https://substackcdn.com/image/fetch/$s_!JqLA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JqLA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:152740,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189240244?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JqLA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png 424w, https://substackcdn.com/image/fetch/$s_!JqLA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png 848w, https://substackcdn.com/image/fetch/$s_!JqLA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png 1272w, https://substackcdn.com/image/fetch/$s_!JqLA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03064fdf-8d5c-4721-8c85-2d7bdd84ff33_1774x986.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The instinct is to skip straight to step eight. Everybody wants the fleet.</p><p>But the reality is that a team running its first inference endpoints this quarter is the team that&#8217;ll be ready for agent fleets next year. Each step builds the organisational muscle, the infrastructure, the trust, that makes the next step possible.</p><p>Helix Coding agents are built for step eight. But Helix works at every step along the way. You can start with self-hosted inference and RAG. Add single-agent coding sessions when your team is comfortable. Move to multi-agent orchestration when you&#8217;ve seen enough to trust the workflow.</p><p>That&#8217;s not a compromise. It&#8217;s how platforms actually grow. You meet teams where they are. You solve the problem they have right now. 
And when they&#8217;re ready for the next level, the infrastructure is already there.</p><h3>Try it</h3><p>If you want to see where your team falls on the curve, or you just want to watch five AI agents build five different apps at the same time, we&#8217;d love to <a href="https://helix.ml/contact">talk</a>.</p>]]></content:encoded></item><item><title><![CDATA[Porting a Code RAG system from Python to Go: What the AI got wrong]]></title><description><![CDATA[Why we rewrote Kodit from Python to Go, what broke along the way, and what the new version means for users and integrators.]]></description><link>https://blog.helix.ml/p/porting-a-code-rag-system-from-python</link><guid isPermaLink="false">https://blog.helix.ml/p/porting-a-code-rag-system-from-python</guid><dc:creator><![CDATA[Phil Winder]]></dc:creator><pubDate>Thu, 26 Feb 2026 14:58:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bvDT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kodit started as a Python project. 
An <strong><a href="http://localhost:1313/early-adopter-release-kodit-mcp-external-repositories/">MCP server and CLI</a></strong> for indexing code repositories, combining BM25 keyword search with vector embeddings and reciprocal rank fusion to give AI coding assistants the context they need. Python served well for prototyping: FastAPI, SQLAlchemy, Pydantic, and a rich ecosystem of ML libraries made it straightforward to build and iterate.</p><p>But Python added friction. Deploying Kodit into the <strong><a href="https://github.com/helixml/helix">Helix</a></strong> ecosystem meant shipping a Python runtime, managing pip dependencies, and accepting the performance overhead of an interpreted language on a search-heavy workload. Since Helix is a Go project, it was obvious that Kodit should be in Go too. The goal was feature parity with the Python version, plus something new: a clean Go client API so that Helix and other projects could import Kodit as a library, not just call it as a server.</p><p>This article is the story of that migration. What changed architecturally, what broke along the way, and what the new Go version means for users. 
For the generic methodology behind AI-assisted cross-language migrations, see the <strong><a href="https://winder.ai/python-to-go-migration-with-claude-code/">companion article on Winder.AI</a></strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bvDT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bvDT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png 424w, https://substackcdn.com/image/fetch/$s_!bvDT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png 848w, https://substackcdn.com/image/fetch/$s_!bvDT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png 1272w, https://substackcdn.com/image/fetch/$s_!bvDT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bvDT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png" width="1334" height="743" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:743,&quot;width&quot;:1334,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:117729,&quot;alt&quot;:&quot;An image of Kodit being used inside Helix as Code Intelligence.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/188300352?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="An image of Kodit being used inside Helix as Code Intelligence." title="An image of Kodit being used inside Helix as Code Intelligence." 
srcset="https://substackcdn.com/image/fetch/$s_!bvDT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png 424w, https://substackcdn.com/image/fetch/$s_!bvDT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png 848w, https://substackcdn.com/image/fetch/$s_!bvDT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png 1272w, https://substackcdn.com/image/fetch/$s_!bvDT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8703d130-7d63-4fb0-b7ae-8754afec450c_1334x743.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>The Migration Approach</strong></h2><p>The full methodology is covered in the <strong><a href="https://winder.ai/python-to-go-migration-with-claude-code/">Winder.AI article</a></strong>, but the short version is this: I set up a monorepo with the Python source and Go target side by side, wrote two design documents (CLAUDE.md for domain context and coding standards, MIGRATION.md for an ordered task checklist), and used <strong><a href="https://docs.anthropic.com/en/docs/claude-code/overview">Claude Code</a></strong> to generate the Go implementation in an automated loop.</p><p>What was specific to Kodit was the domain modelling.</p><p><strong>Bounded contexts.</strong> Kodit has four distinct areas: repositories (the code sources being indexed), enrichments and snippets (the indexed content and its metadata), search (the query pipeline), and configuration. Each maps to a directory in the Go codebase with its own domain, application, and infrastructure layers.</p><p><strong>Ubiquitous language.</strong> Terms like <em>enrichment</em>, <em>association</em>, <em>snippet</em>, and <em>embedding model</em> have precise meanings in the Kodit domain. These were documented in a glossary in CLAUDE.md so the AI would use them consistently rather than inventing its own terminology. 
Getting this right matters: when the AI starts calling an enrichment a &#8220;document&#8221; or a snippet a &#8220;chunk&#8221;, the generated code drifts from the existing schema and APIs.</p><p><strong>Layered architecture.</strong> The Go codebase follows a DDD-inspired structure: domain types have no external dependencies, application services orchestrate use cases, and infrastructure implementations handle persistence and external APIs. Layer rules are enforced by Go&#8217;s package system. The domain package never imports infrastructure.</p><p>This structural discipline paid off during the automated generation phase. With clear boundaries, the AI could generate code for one context without accidentally coupling it to another.</p><h2><strong>Architectural Decisions</strong></h2><p>Several important design decisions were made during the migration. Some were intentional. Some were discovered by accident.</p><h3><strong>Public API vs Internal</strong></h3><p>The AI defaulted to placing everything in Go&#8217;s <code>internal/</code> directory. This is idiomatic Go: <code>internal/</code> prevents external projects from importing your packages. But the whole point of this migration was to make Kodit consumable as a Go library. I needed Helix to be able to <code>import</code> Kodit&#8217;s search client, repository types, and configuration directly.</p><p>I discovered this problem halfway through the migration. Everything compiled. Tests passed. But nothing was importable from outside the module. The refactor to extract a proper public API surface was substantial. It required deciding which types and interfaces belonged in the public package, which stayed internal, and how the public client would wrap the internal application services.</p><p>The result is a clean Go client that any project can import:</p><pre><code><code>import "github.com/helixml/kodit/client"

c, err := client.New(client.Config{
    BaseURL: "http://localhost:8080",
})

results, err := c.Search(ctx, client.SearchQuery{
    Query:      "authentication middleware",
    Repository: "myorg/myrepo",
    Limit:      10,
})
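// Note (sketch, not a complete program): check both returned err values in
// real usage, e.g. if err != nil { return err }. The exact type of `results`
// is defined by the client package, so consult the Kodit repository for the
// current field names rather than assuming them.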
</code></code></pre><p>The lesson: define your public API surface before generating any code. If I had specified this in CLAUDE.md from the start, the AI would have structured the code around the public interface rather than burying everything in <code>internal/</code>.</p><h3><strong>The Snippets Resurrection</strong></h3><p>This was the standout domain failure of the migration.</p><p>In the early Python version, Kodit stored snippets in their own database table. Later, I consolidated the design: snippets became a type of unified enrichment, stored in the enrichments table with associations linking them to repositories and other enrichments. This simplified the schema to essentially two core tables: enrichments and associations. All content, whether a code snippet, a description, an embedding, or a repository reference, was an enrichment linked by associations.</p><p>But remnants of the old design remained in the Python codebase. Type hints referencing a <code>Snippet</code> model. Comments mentioning the snippets table. Variable names like <code>snippet_results</code>. The AI saw these, recognised that &#8220;snippet&#8221; was a core domain concept (it was in the ubiquitous language glossary, after all), and rebuilt the entire deprecated table and data access layer.</p><p>I only discovered the problem when I ran a migration test: importing real data from a running Python instance into the new Go version. The data migrated successfully (enrichments landed in the enrichments table), but searches returned zero results. The Go search pipeline was querying the snippets table, which was empty.</p><p>The fix was another refactor. &#8220;Snippet&#8221; touched nearly every layer of the codebase: domain types, repository interfaces, application services, API handlers, database queries. Every reference had to be redirected to the enrichments table and its association-based data model.</p><p>The lesson is twofold. First, clean up dead references before migration. 
If deprecated code exists anywhere in the source, the AI will find it and use it. Second, migration tests are essential. Smoke tests with fresh data are not sufficient. You need to test with real data from the previous version to catch schema-level regressions.</p><h3><strong>Configuration Scattering</strong></h3><p>The AI scattered configuration defaults and overrides across multiple files. A default embedding model in one package. An overridden batch size in another. Environment variable reads in a third. The Go version had no single place where you could see what the system&#8217;s configuration was, what the defaults were, or where values were being mutated.</p><p>The principle I enforced during refactoring: configuration should be set, defaulted, logged, validated, and mutated in exactly one place. In the Go version, this is the <code>config</code> package. Application services receive their configuration at construction time and never read environment variables or apply defaults themselves.</p><h3><strong>In-Memory Pagination</strong></h3><p>The AI initially created list endpoints that loaded all records from the database and paginated in memory. An obvious and stupid error.</p><p>I caught this during code review and required proper <code>LIMIT</code>/<code>OFFSET</code> queries flowing from the API layer through the application service into the database query. The pagination parameters are defined at the API boundary and propagated down to the DB.</p><p>The broader pattern here is that AI-generated code tends to take the path of least resistance. Loading everything and slicing in Go is simpler to write because the infrastructure is already there. Doing it the right way, threading pagination parameters through three layers, touches a lot of code. 
If you care about performance at scale, you need to specify these constraints in the design.</p><h2><strong>Testing and Validation</strong></h2><p>Building confidence in the new version required multiple layers of testing. No single strategy was sufficient on its own.</p><h3><strong>Unit Tests</strong></h3><p>These tests are fast and catch regressions in individual components, but they say nothing about whether the system works end-to-end. In the first version I focussed more on representative, real-life end-to-end and smoke tests.</p><h3><strong>Smoke Tests</strong></h3><p>I created a pair of smoke test suites: one targeting the Python version, one targeting the Go version, both executing the same sequence of operations. Index a repository. Create enrichments. Run a search. Compare results.</p><p>These smoke tests caught wiring issues that unit tests could not: missing middleware, incorrect route registrations, serialisation differences between FastAPI and Go&#8217;s HTTP handlers.</p><p>After creating a Python-era postgres dump, I wrote a new smoke test to ingest it and exercise other end-to-end workflows.</p><h3><strong>API Parity via OpenAPI</strong></h3><p>I also wrote a test that compares the OpenAPI specification generated by the Go version against the one generated by the Python version. This caught missing endpoints, wrong parameter types, incorrect response schemas, and structural differences that would break existing clients.</p><p>If you are migrating a web API, this test is essential. It provides a machine-readable contract between the old and new implementations.</p><h3><strong>Ranking Comparison</strong></h3><p>The most revealing test was a direct side-by-side comparison of search results. I ran the same queries against both versions and compared the ranked output.</p><p>The results were initially wrong. Completely wrong. 
The investigation uncovered multiple issues:</p><ul><li><p><strong>Truncation error.</strong> When converting embeddings to VectorChord&#8217;s database format, the Go version was incorrectly truncating the float arrays. Dimensions were being lost.</p></li><li><p><strong>RRF indexing error.</strong> The reciprocal rank fusion implementation had an off-by-one error when combining BM25 and semantic rankings.</p></li><li><p><strong>Wrong embedding read.</strong> The AI had added unrequested functionality to read multiple embedding formats from disk. This caused it to load the wrong embedding for a given snippet, producing nonsensical similarity scores.</p></li></ul><p>Each of these passed unit tests in isolation. Only the end-to-end ranking comparison revealed the compounding effect.</p><p>There was a silver lining. During this debugging, Claude noticed that the codebase was using L2 (Euclidean) distance rather than cosine distance for vector similarity. This was likely degrading results in the Python version too. A genuine improvement discovered by accident.</p><h3><strong>Migration Test</strong></h3><p>Testing with real data migrated from the old Python database to the new Go schema. This is what caught the snippets table regression described above. If you are rewriting a system that has existing production data, migration tests are non-negotiable. They test the one thing smoke tests cannot: whether the new system correctly handles legacy data.</p><h2><strong>What the AI Got Wrong</strong></h2><p>To be specific about where the AI failed on this project:</p><p><strong>Resurrecting deprecated features.</strong> The snippets table rebuild was the most expensive failure. The AI saw domain references, inferred importance, and recreated dead functionality. The fix touched dozens of files.</p><p><strong>Dead code accumulation.</strong> After refactoring from <code>internal/</code> to a public API, orphaned packages remained. 
They appeared used because other orphaned packages imported them. Identifying dead code required understanding the full dependency graph, which the AI could not do unprompted.</p><p><strong>Excessive functionality.</strong> The AI added features not present in the Python version: multiple embedding format readers, alternative search strategies, extra configuration options. Each addition introduced potential bugs with zero user value.</p><p><strong>Missing end-to-end wiring.</strong> Individual components worked. The application as a whole did not start correctly the first time. The AI generated each piece but never ran the server. Wiring errors (missing dependency injection, incorrect initialisation order) only appeared when the full system was assembled.</p><h2><strong>The New Kodit</strong></h2><p>What users and integrators get from the Go version:</p><p><strong>Go client library.</strong> Import <code>github.com/helixml/kodit/client</code> and use Kodit programmatically. Search, index repositories, manage enrichments, all through typed Go functions. This is the foundation for the Helix integration.</p><p><strong>Same interfaces.</strong> The MCP server and CLI behave identically to the Python version. Existing users should see no difference in their workflow.</p><p><strong>Database compatibility.</strong> SQLite for local-first usage. VectorChord/PostgreSQL for enterprise scale. The Go version supports both, matching the Python version&#8217;s flexibility.</p><p><strong>Performance.</strong> The Go version benefits from compiled execution and Go&#8217;s concurrency model for parallel indexing and search. Formal benchmarks are forthcoming, but my initial testing showed roughly a 5x performance improvement, and indexing large repositories was noticeably faster.</p><h2><strong>What&#8217;s Next</strong></h2><p>The migration itself inspired new functionality. 
The dead code and orphaned package problems I encountered manually are exactly the kind of issues Kodit should detect automatically. Dead code detection and duplication analysis are on the roadmap. I also want to get back to benchmarking and indexing improvements.</p><p>The Helix integration is underway, with Kodit&#8217;s Go client providing native code search within the Helix platform. Community contributions are welcome, particularly around new enrichment strategies and search pipeline improvements.</p><p>The <strong><a href="https://github.com/helixml/kodit/">Kodit repository</a></strong> is open source. Issues, discussions, and pull requests are the best way to get involved.</p><h2><strong>Conclusion</strong></h2><p>The rewrite was worth it. The Go version is cleaner, faster to deploy, and designed for library consumption from the start. The AI-assisted approach compressed what would have been months of manual translation into about three weeks, but it required constant human oversight of architecture and domain correctness.</p><p>The biggest lesson is this: AI coding assistants are powerful translators but poor architects. They will faithfully convert Python patterns to Go patterns, function by function, file by file. But they cannot see the system as a whole. They cannot question whether a deprecated table should be rebuilt. They cannot decide which packages should be public. 
They cannot judge whether in-memory pagination is acceptable at scale.</p>]]></content:encoded></item><item><title><![CDATA[How We Forked Zed and Added Remote Control for Agent Fleet Orchestration]]></title><description><![CDATA[Zed is a fast, GPU-accelerated code editor written in Rust.]]></description><link>https://blog.helix.ml/p/how-we-forked-zed-to-run-a-fleet</link><guid isPermaLink="false">https://blog.helix.ml/p/how-we-forked-zed-to-run-a-fleet</guid><dc:creator><![CDATA[Chris Sterry]]></dc:creator><pubDate>Wed, 25 Feb 2026 18:25:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NBHu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b8af75c-12d1-496b-9106-a783b6c188ee_1397x782.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Zed is a fast, GPU-accelerated code editor written in Rust. It has excellent LSP support, a growing agent panel, and a clean architecture. It also has no concept of external orchestration &#8212; and that&#8217;s where our problem started.</p><p>Helix runs fleets of coding agents. Each agent is a headless Zed instance running inside a Docker container, connected to an LLM via the Agent Control Protocol (ACP). A central API dispatches tasks, monitors progress, manages thread lifecycles, and streams results back to users in real time. 
None of that is possible with stock Zed &#8212; so we forked it and added a WebSocket control plane.</p><p>This post covers what we built, the bugs that nearly broke us, and how we got streaming performance from O(N&#178;) down to O(delta).</p><h2>What We Needed From the Fork</h2><p>Three capabilities required forking:</p><ol><li><p><strong>Remote command injection</strong> &#8212; the API must be able to send chat messages, simulate user input, and query UI state in a running Zed instance, with no human at the keyboard.</p></li><li><p><strong>Event exfiltration</strong> &#8212; Zed must report back when a thread is created, when messages stream in, when the agent finishes, and when errors occur.</p></li><li><p><strong>Multi-thread lifecycle management</strong> &#8212; when a thread exhausts its context window, Helix starts a new one on the same WebSocket connection. Zed must handle multiple concurrent ACP threads per connection.</p></li></ol><h2>The WebSocket Sync Protocol</h2><p>The control plane is a single bidirectional WebSocket between the Helix API and each Zed instance. 
The API side lives in <code>websocket_external_agent_sync.go</code>; the Zed side in <code>crates/external_websocket_sync/</code>.</p><p><strong>Server &#8594; Zed (commands):</strong></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!NBHu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b8af75c-12d1-496b-9106-a783b6c188ee_1397x782.png" width="1397" height="782" alt=""></figure></div><p><strong>Zed &#8594; Server (events):</strong></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Ljf4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80d79bd4-1e9d-4b86-8e11-f904bf751734_1392x434.png" width="1392" height="434" alt=""></figure></div><p>Every message that touches a thread carries <code>acp_thread_id</code> for correlation. The <code>request_id</code> field ties a command to its eventual <code>thread_created</code> and <code>message_completed</code> events, so the API can track which user request produced which response.</p><h2>Architecture</h2><pre><code><code>Helix Frontend
      |
      | HTTP POST /api/v1/sessions/chat
      v
Helix API  ----WebSocket----&gt; Zed (headless, in container) ---ACP---&gt; LLM
      |                              |
      | pubsub (session_update,      | thread events
      | interaction_update)          | (message_added, etc.)
      v                              |
Helix Frontend &lt;----WebSocket--------+</code></code></pre><p>The API maintains a map of <code>acp_thread_id</code> to Helix session IDs. When a user sends a message, the API creates an Interaction record with the user&#8217;s prompt, then dispatches a <code>chat_message</code> command over the WebSocket. Zed creates or reuses an ACP thread, the LLM streams its response, and Zed relays each chunk back as <code>message_added</code> events. The API accumulates these into the Interaction&#8217;s response and publishes real-time updates to the frontend.</p><p>When the context window is exhausted, Helix sends a new <code>chat_message</code> without an <code>acp_thread_id</code>, prompting Zed to create a fresh thread. The new <code>thread_created</code> event maps it back to the same Helix session. One WebSocket connection manages the full lifecycle.</p><h2>Bug 1: The Multi-Message Accumulation Problem</h2><p>Zed&#8217;s agent panel produces multiple distinct entries per response turn: an assistant message, one or more tool calls, and a follow-up message. Each entry has its own <code>message_id</code>. Within a single entry, Zed streams <em>cumulative</em> content updates &#8212; the full content so far for that entry, not deltas.</p><p>The original code stored the response as a single string and overwrote it on each <code>message_added</code> event:</p><pre><code><code>// The bug
interaction.ResponseMessage = content</code></code></pre><p>Fine when there&#8217;s one <code>message_id</code>. With multiple entries:</p><ol><li><p><code>message_added(id="msg-1", content="I'll help you with that.")</code> &#8594; response = <code>"I'll help you with that."</code></p></li><li><p><code>message_added(id="msg-2", content="```tool\nedit")</code> &#8594; response = <code>"```tool\nedit"</code> (msg-1 gone)</p></li><li><p><code>message_added(id="msg-2", content="```tool\nedit file.py\n```")</code> &#8594; correct overwrite of msg-2, but msg-1 is still gone</p></li></ol><p>The fix tracks the byte offset where each <code>message_id</code>&#8217;s content begins. Same ID &#8594; replace from offset. New ID &#8594; append with separator, record new offset:</p><pre><code><code>type MessageAccumulator struct {
    Content       string
    LastMessageID string
    Offset        int // byte offset where current message_id starts
}

func (a *MessageAccumulator) AddMessage(messageID, content string) {
    if a.LastMessageID == "" {
        a.Content = content
        a.Offset = 0
        a.LastMessageID = messageID
        return
    }

    if a.LastMessageID == messageID {
        // Same message streaming -- replace from offset, keep prefix
        a.Content = a.Content[:a.Offset] + content
        return
    }

    // New distinct message -- record offset, append with separator
    a.Offset = len(a.Content) + 2 // account for "\n\n"
    a.Content = a.Content + "\n\n" + content
    a.LastMessageID = messageID
}</code></code></pre><p>Zed sends cumulative content per <code>message_id</code> (overwrite semantics), but the overall response is an append-only sequence of distinct message IDs. The accumulator handles both with a single offset tracker.</p><h2>Bug 2: The Completion Hang</h2><p>Users reported that responses would stream correctly but never show as complete &#8212; the loading spinner hung indefinitely.</p><p>The handler for <code>message_completed</code> published <code>session_update</code> events to the frontend. The frontend&#8217;s <code>session_update</code> handler has rejection logic: it checks whether the incoming session has the expected number of interactions and drops events that fail validation. A safeguard against stale data from out-of-order WebSocket messages &#8212; but it meant completion events were intermittently discarded.</p><p>The fix was to publish through both channels:</p><pre><code><code>// 1. interaction_update -- same channel used during streaming
//    ensures useLiveInteraction sees state=complete
err = apiServer.publishInteractionUpdateToFrontend(
    helixSessionID, helixSession.Owner, targetInteraction, messageRequestID)

// 2. session_update -- full session for React Query cache consistency
err = apiServer.publishSessionUpdateToFrontend(
reloadedSession, targetInteraction, messageRequestID)</code></code></pre><p>The <code>interaction_update</code> path targets a specific interaction rather than the full session, bypassing the rejection logic entirely. That&#8217;s the reliable path for completion signals.</p><h2>Shared Protocol Code: Eliminating Test Drift</h2><p>The original end-to-end tests used a Python mock WebSocket server that reimplemented the sync protocol. The accumulation bug above didn&#8217;t appear in tests because the Python mock had its own (simpler) message handling. Tests passed. Production broke.</p><p>The solution: extract a shared <code>wsprotocol</code> Go package that both the production Helix server and the Go test server import. Same parsing, same accumulation logic, same event dispatch. If the accumulator has a bug, the test catches it because it runs the same code path.</p><p>The package has three core components, plus the event structs in <code>types.go</code>. <strong>MessageAccumulator</strong> &#8212; the append/overwrite logic above. <strong>Protocol</strong> &#8212; manages the WebSocket lifecycle, reads and parses messages, dispatches to handlers. <strong>EventHandler interface</strong> &#8212; the seam between shared protocol code and environment-specific behavior:</p><pre><code><code>type EventHandler interface {
    OnAgentReady(conn *Conn, sessionID string) error
    OnThreadCreated(conn *Conn, sessionID string, evt *ThreadCreatedEvent) error
    OnMessageAdded(conn *Conn, sessionID string, evt *MessageAddedEvent, accumulated string) error
    OnMessageCompleted(conn *Conn, sessionID string, evt *MessageCompletedEvent) error
    OnUIStateResponse(conn *Conn, sessionID string, evt *UIStateResponseEvent) error
    OnThreadLoadError(conn *Conn, sessionID string, evt *ThreadLoadErrorEvent) error
    OnRawEvent(conn *Conn, sessionID string, msg *SyncMessage) error
}</code></code></pre><p>Production implements this with database writes and pubsub. Tests use in-memory tracking and assertions. The <code>OnRawEvent</code> escape hatch handles Helix-specific events without bloating the shared interface.</p><p>Adding a new event type: (1) add a struct to <code>types.go</code>, (2) add a case to <code>dispatch</code>, (3) add a method to <code>EventHandler</code>. Both production and test code get the change, or neither does. No more protocol drift.</p><h2>Streaming Performance: O(N&#178;) to O(delta)</h2><p>This was the most significant engineering challenge.</p><p>Streaming from Zed isn&#8217;t like streaming raw LLM output. An LLM token stream is purely append-only. Zed&#8217;s agent panel isn&#8217;t &#8212; a single response turn contains an assistant message, tool calls with status indicators, and follow-up messages, all interleaved. Those status indicators mutate in place mid-stream: <code>**Status: Running**</code> becomes <code>**Status: Completed**</code>. Content can change anywhere, not just at the end.</p><p>The naive approach &#8212; send the full accumulated response on every update &#8212; worked, but scaled badly. On every <code>message_added</code> event (dozens per second during fast token streaming), the API would:</p><ol><li><p>Query the database for the session</p></li><li><p>Query the database for the interaction</p></li><li><p>Write the updated interaction back</p></li><li><p>Serialize the entire interaction as JSON and publish it to the frontend</p></li></ol><p>For a 100KB response, this meant pushing 100KB over the WebSocket on every token. By the end of a long response, the browser was doing megabytes of string copying per second and the UI would visibly lag.</p><p><strong>Caching and throttling (Go side):</strong> A <code>streamingContext</code> struct caches the session and interaction for the lifetime of a streaming response, eliminating two database round-trips per token. 
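</p><p>The throttling described next is a simple time-gate pattern; here is a minimal, illustrative sketch (<code>flushGate</code> is invented for this post, not Helix&#8217;s actual <code>streamingContext</code> code):</p><pre><code><code>package main

import (
	"fmt"
	"time"
)

// flushGate reports whether at least interval has elapsed since *last,
// advancing *last when it fires. Illustrative only.
func flushGate(last *time.Time, interval time.Duration, now time.Time) bool {
	if now.Sub(*last) &lt; interval {
		return false
	}
	*last = now
	return true
}

func main() {
	var last time.Time
	start := time.Now()
	flushes := 0
	// simulate 100 token events arriving 10ms apart
	for i := 0; i &lt; 100; i++ {
		now := start.Add(time.Duration(i) * 10 * time.Millisecond)
		if flushGate(&amp;last, 200*time.Millisecond, now) {
			flushes++ // only these events trigger a database write
		}
	}
	fmt.Println(flushes) // 5 writes instead of 100
}</code></code></pre><p>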
Database writes are throttled to one every 200ms &#8212; the in-memory state always has the latest content, but we only flush to Postgres periodically. <code>message_completed</code> always writes the final state, so at most 200ms of content is lost on a crash. Frontend publishes are throttled to one every 50ms, since the frontend batches to <code>requestAnimationFrame</code> (~16ms) anyway.</p><p><strong>Patch-based deltas:</strong> Instead of sending the full interaction JSON on every update, the API computes a patch &#8212; the byte offset of the first change and the new content from that point forward. In the common case (pure append), the fast path fires: check that the new content starts with the previous content, return the offset and the suffix. One string prefix comparison.</p><p>For backwards edits (tool call status changing), the slow path finds the first differing rune.</p><p>The frontend receives <code>interaction_patch</code> events and applies them directly to a ref, bypassing React state during streaming. Multiple patches between animation frames are coalesced. The React Query cache isn&#8217;t touched until completion.</p><p>Wire traffic: O(N) per update &#8594; O(delta). For a 100KB response where each token adds ~20 bytes, that&#8217;s roughly a 5000x reduction per update.</p><h2>Bug 3: The UTF-16 Offset</h2><p>The first deployment of the patch protocol produced garbled text. Users saw <code>"de Statussktop"</code> where <code>"desktop"</code> should have appeared. Content in the database was correct &#8212; corruption was purely in rendering.</p><p>The root cause: <code>computePatch</code> returned byte offsets (Go&#8217;s <code>len()</code> counts bytes), but JavaScript <code>string.slice()</code> operates on UTF-16 code units. The streaming content contained 147 instances of <code>&#8250;</code> (U+203A, RIGHT SINGLE ANGLE QUOTATION MARK &#8212; Zed uses this as a breadcrumb separator in tool call output). 
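</p><p>The divergence is easy to demonstrate; a short illustrative snippet (the breadcrumb string is invented, not real Zed output):</p><pre><code><code>package main

import (
	"fmt"
	"unicode/utf16"
)

func main() {
	s := "Files \u203a src \u203a main.go" // two U+203A breadcrumb separators
	fmt.Println(len(s))                       // 25: UTF-8 bytes, what Go's len() counts
	fmt.Println(len(utf16.Encode([]rune(s)))) // 21: UTF-16 code units, what JS string.slice counts
}</code></code></pre><p>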
Each <code>&#8250;</code> is 3 bytes in UTF-8 but 1 UTF-16 code unit, creating a cumulative offset divergence of 294 bytes. When a backwards edit occurred &#8212; a tool call status change &#8212; the patch was spliced into the wrong position.</p><p>The fix iterates by rune and tracks UTF-16 code unit position:</p><pre><code><code>func utf16RuneLen(r rune) int {
    if r &gt;= 0x10000 {
        return 2 // surrogate pair
    }
    return 1
}</code></code></pre><p>The slow path decodes runes from both strings in lockstep, accumulating <code>utf16Off</code> alongside <code>byteOff</code>. Supplementary plane characters (emoji like &#128228;) count as 2 UTF-16 code units.</p><p><strong>Zed-side throttling:</strong> Zed fires an <code>EntryUpdated</code> event on every LLM token. At high token rates, that&#8217;s hundreds of <code>message_added</code> messages per second, most of them redundant since the Go side only publishes every 50ms anyway. A 100ms throttle in Zed&#8217;s <code>thread_service.rs</code> buffers intermediate updates and flushes before every <code>message_completed</code>. Nothing is dropped; wire traffic drops by ~90%.</p><div><hr></div><p>The overall shape of the work: fork a fast editor, add a protocol layer, find three distinct bugs each caused by a different mismatch between assumptions (overwrite vs. append semantics, session-level vs. interaction-level events, byte offsets vs. UTF-16 code units), then fix the performance problem that only appears at scale.
Standard distributed systems work, with a Rust/Go language boundary making everything a bit more interesting.</p><p>Code is available at <a href="https://github.com/helixml/helix">github.com/helixml/helix</a>.</p>]]></content:encoded></item><item><title><![CDATA[How We Made Docker Builds 193x Faster: From 45 Minutes to 14 Seconds]]></title><description><![CDATA[The Problem]]></description><link>https://blog.helix.ml/p/how-we-made-docker-builds-193x-faster</link><guid isPermaLink="false">https://blog.helix.ml/p/how-we-made-docker-builds-193x-faster</guid><dc:creator><![CDATA[Chris Sterry]]></dc:creator><pubDate>Tue, 24 Feb 2026 14:40:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kqI_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><h2><strong>The Problem</strong></h2><p>Helix runs AI coding agents inside isolated desktop containers &#8212; each agent gets its own GNOME desktop with a full IDE, Docker daemon, and development environment. When an agent needs to build a project, it runs <code>docker build</code> inside its container.</p><p>The problem: <strong>every new agent session started with a cold Docker build cache</strong>. The containers are ephemeral &#8212; when a session ends, the container is destroyed along with its Docker state. For a project like Helix itself (which compiles a Rust IDE, Go APIs, Python services, and a Node.js frontend), a cold build takes <strong>43 minutes</strong>. That&#8217;s 43 minutes of an agent sitting there waiting for builds before it can start working.</p><p>This matters because multiple agents regularly clone the exact same source code. Ten agents working on ten different tasks in the same repo all need to build the same base images. 
Without shared caching, that&#8217;s 10 * 43 minutes = 7 hours of redundant compilation.</p><h2><strong>The Architecture</strong></h2><p>The container nesting looks like this:</p><pre><code><code>Host Machine
&#9492;&#9472;&#9472; sandbox-nvidia (Docker-in-Docker host)
    &#9500;&#9472;&#9472; helix-buildkit (shared BuildKit instance)
    &#9474;   &#9492;&#9472;&#9472; buildkit_state volume (persistent cache)
    &#9500;&#9472;&#9472; helix-registry (shared Docker registry)
    &#9474;   &#9492;&#9472;&#9472; registry_data volume (layer-level transfer cache)
    &#9500;&#9472;&#9472; agent-session-A (desktop container)
    &#9474;   &#9492;&#9472;&#9472; local dockerd &#8594; builds route to shared BuildKit
    &#9500;&#9472;&#9472; agent-session-B (desktop container)
    &#9474;   &#9492;&#9472;&#9472; local dockerd &#8594; builds route to shared BuildKit
    &#9492;&#9472;&#9472; agent-session-C ...
</code></code></pre><p>Each desktop container runs its own Docker daemon (for isolation), but all builds route to a <strong>shared BuildKit instance</strong> at the sandbox level. The BuildKit cache is stored on a persistent Docker volume that survives container restarts.</p><p>The key insight: when Agent B builds the same Dockerfile that Agent A already built, BuildKit says &#8220;I already have all these layers cached&#8221; and the build completes instantly. The cache is content-addressed &#8212; identical inputs produce identical cache keys regardless of which container initiated the build.</p><h2><strong>The </strong><code>--load</code><strong> Bottleneck</strong></h2><p>Shared BuildKit got us halfway there. Builds were fast (~0.5 seconds for fully cached images), but there was a catch: <strong>the image still needed to be loaded into the local Docker daemon</strong>.</p><p>When using a remote BuildKit builder, <code>docker buildx build --load</code> exports the built image as a tarball, streams it over gRPC to the client, and imports it into the local daemon. 
This happens even when every layer is cached and the image hasn&#8217;t changed at all.</p><p>For a 7.73GB image (our desktop base image with GNOME, IDE, and dev tools):</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!5Hb7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15281ce8-dc89-4842-82b1-c2e85ad17607_622x185.png" width="622" height="185" alt=""></figure></div><p>That&#8217;s 10 seconds to transfer an image that didn&#8217;t change.
The <code>--load</code> flag serializes the entire image into a Docker-format tarball, streams it over gRPC, and the receiving daemon deserializes and imports every layer &#8212; even layers it already has. There&#8217;s no layer-level deduplication in the tarball transfer path.</p><p>This adds up: building Helix involves 6+ images. Even with a hot BuildKit cache, the <code>--load</code> overhead per image turns a sub-second build into a 10-second wait, and the full stack build takes ~23 seconds of mostly <code>--load</code> transfers.</p><h2><strong>Smart </strong><code>--load</code></h2><p>The first optimization: <strong>don&#8217;t load the image if it hasn&#8217;t changed</strong>.</p><pre><code><code>docker build -t myapp:latest .
  &#9492;&#9472;&#9472; wrapper intercepts
      1. Build with --output type=image --provenance=false --iidfile /tmp/iid
         &#8594; BuildKit resolves all layers (cached: ~0.5s)
         &#8594; Writes image config digest to iidfile
         &#8594; No tarball transfer (--output type=image stores in BuildKit only)
      2. Compare iidfile digest with local daemon's image ID
         &#8594; docker images --no-trunc -q myapp:latest
      3. Match? &#8594; Skip --load. "Image unchanged, skipping load"
         Differ? &#8594; Use registry push/pull for layer-level transfer
</code></code></pre><p>A transparent wrapper at <code>/usr/local/bin/docker</code> intercepts both <code>docker build</code> and <code>docker buildx build</code>, applying this logic automatically. No code changes needed in build scripts, Makefiles, or CI pipelines.</p><h3><strong>Three Critical Details</strong></h3><p><strong>1. </strong><code>--iidfile</code><strong> is empty without an output mode on remote builders.</strong></p><p><code>docker buildx build --iidfile /tmp/iid -t foo .</code> with a remote builder produces an <strong>empty iidfile</strong>. BuildKit doesn&#8217;t compute the image config digest unless it actually exports something. The fix: <code>--output type=image</code> tells BuildKit to create the manifest in its internal store (instant for cached builds, no data transfer) and populates the iidfile.</p><p><strong>2. </strong><code>--provenance=false</code><strong> is required.</strong></p><p>With default provenance, BuildKit wraps the image manifest in a <strong>manifest list</strong> that includes an attestation document with build timestamps. The iidfile gets the manifest list digest, which changes every build (because the timestamp changes). With <code>--provenance=false</code>, the iidfile contains the bare image config digest &#8212; deterministic and matching what <code>docker images --no-trunc -q</code> returns.</p><p><strong>3. The wrapper must handle both </strong><code>docker build</code><strong> and </strong><code>docker buildx build</code><strong>.</strong></p><p>Docker 29.x&#8217;s <code>docker build</code> ignores the default buildx builder entirely &#8212; it always uses the local daemon&#8217;s built-in BuildKit. Only <code>docker buildx build</code> honors the configured builder. 
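</p><p>Putting the three details together, the digest check at the heart of smart <code>--load</code> reduces to a small shell function. This is an illustrative simplification of the wrapper logic, not its exact code:</p><pre><code>image_unchanged() {
  # Compare the digest BuildKit wrote via --iidfile (built with
  # --output type=image --provenance=false) against the image ID
  # the local daemon already has for this tag.
  local tag="$1" iidfile="$2"
  local built_id local_id
  built_id=$(cat "$iidfile" 2>/dev/null) || return 1
  local_id=$(docker images --no-trunc -q "$tag" 2>/dev/null)
  [ -n "$built_id" ] || return 1
  [ "$built_id" = "$local_id" ]
}
</code></pre><p>If the function succeeds, the wrapper skips <code>--load</code> entirely; otherwise it falls through to the transfer path.</p><p>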
The wrapper rewrites <code>docker build</code> to <code>docker buildx build</code> (to use the shared cache) and applies smart <code>--load</code> (to avoid the tarball transfer).</p><h2><strong>Registry-Accelerated Loading</strong></h2><p>Smart <code>--load</code> eliminates the transfer when nothing changed. But when code <em>does</em> change, even a one-line change in the top layer of a 7.73GB image still triggers a full tarball <code>--load</code> (~10s). The tarball format doesn&#8217;t support layer-level deduplication &#8212; it&#8217;s all or nothing.</p><p>We solved this with a <strong>shared Docker registry</strong> running alongside BuildKit on the sandbox network. When the wrapper detects an image has changed, instead of <code>--load</code>:</p><ol><li><p><strong>Push</strong> to the registry &#8212; BuildKit pushes only the changed layers (~0.1s)</p></li><li><p><strong>Pull</strong> from the registry &#8212; the local daemon checks which layers it already has, downloads only the new ones (~0.5s)</p></li></ol><p>The Docker registry protocol does layer-level dedup natively.
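</p><p>A sketch of that changed-image path, with an illustrative registry address and simplified flags (the real wrapper&#8217;s invocation may differ):</p><pre><code>registry_load() {
  # The push resolves against the registry and sends only missing layers;
  # the pull then fetches only the layers the local daemon lacks.
  local tag="$1" registry="$2"
  docker buildx build --output "type=registry,name=${registry}/${tag}" . || return 1
  docker pull "${registry}/${tag}" || return 1
  docker tag "${registry}/${tag}" "$tag"
}
</code></pre><p>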
For a 7.73GB image with 95 base layers and 1 changed layer, the pull shows 95 &#8220;Already exists&#8221; and downloads only the single new layer.</p><h3><strong>Benchmarks: 1-line change in top layer of 7.73GB image</strong></h3><p>Measured E2E inside a real desktop container, 3 runs each:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-Hf7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-Hf7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png 424w, https://substackcdn.com/image/fetch/$s_!-Hf7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png 848w, https://substackcdn.com/image/fetch/$s_!-Hf7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png 1272w, https://substackcdn.com/image/fetch/$s_!-Hf7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-Hf7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png" width="690" height="233" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:233,&quot;width&quot;:690,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37717,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189022558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-Hf7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png 424w, https://substackcdn.com/image/fetch/$s_!-Hf7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png 848w, https://substackcdn.com/image/fetch/$s_!-Hf7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png 1272w, https://substackcdn.com/image/fetch/$s_!-Hf7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547fe0d3-2152-4024-a38d-b68f52548d43_690x233.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p>The three paths compose naturally:</p><ol><li><p><strong>Image unchanged</strong> &#8594; skip load entirely (314ms)</p></li><li><p><strong>Image changed, registry available</strong> &#8594; 
push/pull via registry (871ms)</p></li><li><p><strong>Image changed, no registry</strong> &#8594; fall back to tarball <code>--load</code> (10s)</p></li></ol><h2><strong>Results</strong></h2><p>There are two cases that matter: cold start (first agent to build a project) and warm start (subsequent agents building the same source).</p><h3><strong>Cold start: ~10 minutes (down from 45 minutes)</strong></h3><p>A fresh agent session starts with an empty Docker daemon &#8212; no images, no layers. Even though every build is a cache hit in shared BuildKit (the compilation is instant), the images still need to be transferred into the local daemon. For Helix-in-Helix, this is a deeply nested pipeline:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kqI_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kqI_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png 424w, https://substackcdn.com/image/fetch/$s_!kqI_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png 848w, https://substackcdn.com/image/fetch/$s_!kqI_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png 1272w, 
https://substackcdn.com/image/fetch/$s_!kqI_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kqI_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png" width="961" height="466" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48b4a148-7926-4025-8315-fc295fd44768_961x466.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:466,&quot;width&quot;:961,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76707,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189022558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kqI_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png 424w, https://substackcdn.com/image/fetch/$s_!kqI_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png 848w, https://substackcdn.com/image/fetch/$s_!kqI_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png 1272w, 
https://substackcdn.com/image/fetch/$s_!kqI_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b4a148-7926-4025-8315-fc295fd44768_961x466.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p>The cold start is dominated by <strong>image transfer, not compilation</strong>. BuildKit resolves all layers instantly (cached), but loading 7+ GB images into each nesting level takes time.
The bottleneck is the <code>--load</code> tarball path: it serializes the entire image regardless of what the receiving daemon already has.</p><p>The nesting makes this worse: Helix-in-Helix has the desktop container (L2) building an inner sandbox (L3), which needs the same 7.24GB desktop image transferred again to a fresh daemon one level deeper.</p><h3><strong>Warm start: 23 seconds (124x faster)</strong></h3><p>Once images exist in the local daemon, subsequent builds are near-instant:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7bRU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7bRU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png 424w, https://substackcdn.com/image/fetch/$s_!7bRU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png 848w, https://substackcdn.com/image/fetch/$s_!7bRU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png 1272w, https://substackcdn.com/image/fetch/$s_!7bRU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7bRU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png" width="787" height="290" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:290,&quot;width&quot;:787,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46854,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189022558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7bRU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png 424w, https://substackcdn.com/image/fetch/$s_!7bRU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png 848w, https://substackcdn.com/image/fetch/$s_!7bRU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png 1272w, https://substackcdn.com/image/fetch/$s_!7bRU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0784f7fb-1064-4d93-bafc-f829abd5518d_787x290.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a></figure></div><p></p><p>Smart --load checks the image digest against the local daemon (~0.3s) and skips the transfer when nothing changed.
This is the common case: agents working on the same codebase where the base images haven&#8217;t been modified.</p><h3><strong>Incremental changes: ~1 second per image</strong></h3><p>When code actually changes, the registry-accelerated load transfers only the changed layers:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZaHt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZaHt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png 424w, https://substackcdn.com/image/fetch/$s_!ZaHt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png 848w, https://substackcdn.com/image/fetch/$s_!ZaHt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png 1272w, https://substackcdn.com/image/fetch/$s_!ZaHt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZaHt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png" width="579" height="228" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:228,&quot;width&quot;:579,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31293,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189022558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZaHt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png 424w, https://substackcdn.com/image/fetch/$s_!ZaHt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png 848w, https://substackcdn.com/image/fetch/$s_!ZaHt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png 1272w, https://substackcdn.com/image/fetch/$s_!ZaHt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30d41b22-d535-4fb4-aa01-cd0d97860a17_579x228.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p>A one-line Go change rebuilds only the final compilation layer (~30s) and transfers only that layer via the registry (~1s) instead of the entire 43-minute pipeline.</p><h2><strong>Compose Build 
Interception</strong></h2><p>There was a gap in the smart <code>--load</code> optimization: <code>docker compose build</code> bypassed it entirely.</p><p>Docker Compose invokes BuildKit through its own Go API, not through the CLI. Our wrapper intercepts <code>docker build</code> and <code>docker buildx build</code>, but compose calls <code>buildx bake</code> internally &#8212; so smart <code>--load</code> never fires. Every compose build did a full tarball <code>--load</code>, even for unchanged images.</p><p>The fix: the wrapper now intercepts <code>docker compose ... build</code>, parses the compose config to extract each service&#8217;s build definition, and builds them individually through the existing smart <code>--load</code> path:</p><pre><code><code>docker compose -f docker-compose.dev.yaml build
  &#9492;&#9472;&#9472; wrapper intercepts (compose + build detected)
      1. $REAL_DOCKER compose config --format json
         &#8594; extract services, image names, build contexts, Dockerfiles, args
      2. For each service with a build section:
         &#8594; docker buildx build -t $IMAGE -f $DOCKERFILE $CONTEXT
         &#8594; smart --load: skip if unchanged, registry push/pull if changed
      3. Compose up finds the images locally.
</code></code></pre><p>Results:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!va1i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!va1i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png 424w, https://substackcdn.com/image/fetch/$s_!va1i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png 848w, https://substackcdn.com/image/fetch/$s_!va1i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png 1272w, https://substackcdn.com/image/fetch/$s_!va1i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!va1i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png" width="900" height="171" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df4f8971-3a12-4126-b191-9cc24b89032e_900x171.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:171,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32243,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189022558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!va1i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png 424w, https://substackcdn.com/image/fetch/$s_!va1i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png 848w, https://substackcdn.com/image/fetch/$s_!va1i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png 1272w, https://substackcdn.com/image/fetch/$s_!va1i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf4f8971-3a12-4126-b191-9cc24b89032e_900x171.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p>Not as dramatic as the other optimizations, but 6 seconds saved on every warm build adds up across thousands of agent sessions.</p><h2><strong>The Golden Docker Cache: Eliminating Cold Start 
Entirely</strong></h2><p>Smart <code>--load</code>, registry-accelerated transfers, and compose interception transformed warm starts from 45 minutes to 23 seconds. But the cold start &#8212; the first agent session for a project &#8212; still took <strong>10 minutes</strong>. Every image had to be transferred into an empty Docker daemon, even though BuildKit compiled nothing.</p><p>We wanted cold start to feel like warm start. Zero penalty for being the first session.</p><h3><strong>The idea</strong></h3><p>When code merges to main, automatically spin up a desktop container, run the project&#8217;s startup script (which builds all the Docker images), then snapshot the entire <code>/var/lib/docker</code> directory. When a new session starts, copy that snapshot &#8212; the &#8220;golden cache&#8221; &#8212; into the session&#8217;s Docker data directory. The local daemon starts with all images pre-populated. No builds, no transfers, no waiting.</p><h3><strong>Why it captures everything</strong></h3><p>Docker&#8217;s data directory contains everything the daemon needs:</p><ul><li><p><strong>Image layers</strong> (<code>overlay2/</code>) &#8212; all built images, all layers</p></li><li><p><strong>Docker volumes</strong> (<code>volumes/</code>) &#8212; inner registries, BuildKit state, nested Docker data</p></li><li><p><strong>Container metadata</strong> &#8212; not useful (containers don&#8217;t survive restart), but harmless</p></li></ul><p>For a project like Helix-in-Helix, the golden cache even includes the inner sandbox&#8217;s Docker data (stored as a Docker volume within the session&#8217;s daemon). The inner sandbox starts with its images pre-populated too &#8212; no transfer through the inner registry needed.</p><h3><strong>The build is just a startup script run</strong></h3><p>Golden builds are beautifully simple: they&#8217;re regular desktop containers with one special environment variable (<code>HELIX_GOLDEN_BUILD=true</code>). 
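</p><p>In sketch form, the golden branch of the entrypoint looks roughly like this (the script names and paths are illustrative assumptions, not the real Helix code):</p><pre><code>run_workspace() {
  if [ "$HELIX_GOLDEN_BUILD" = "true" ]; then
    # Golden mode: build everything once via the startup script, no IDE.
    git clone "$REPO_URL" /tmp/golden-workspace || return 1
    git -C /tmp/golden-workspace checkout main || return 1
    sh /tmp/golden-workspace/startup.sh   # golden build result
    return $?
  fi
  echo "launching IDE"   # normal interactive session path
}
</code></pre><p>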
The container clones the repo, checks out main, runs the startup script, then exits. The workspace setup script detects golden mode and skips launching the IDE; it just runs the startup script in the foreground and exits with its return code.</p><p>No new build system. No image manifest parsing. No layer-level copying. The startup script already knows how to build the project. We just run it once and keep the result.</p><h3><strong>Per-project, automatic, incremental</strong></h3><p>Each project gets its own golden cache, scoped by project ID:</p><pre><code><code>/container-docker/
&#9500;&#9472;&#9472; golden/
&#9474;   &#9500;&#9472;&#9472; prj_abc123/docker/    &#8592; Project A's golden (8.7 GB)
&#9474;   &#9492;&#9472;&#9472; prj_def456/docker/    &#8592; Project B's golden (3.2 GB)
&#9492;&#9472;&#9472; sessions/
    &#9492;&#9472;&#9472; docker-data-ses_xyz/docker/  &#8592; copied from golden at session start
</code></code></pre><p>Golden builds trigger automatically when code merges to main (via PR merge or internal approve-implementation). They&#8217;re debounced per-project &#8212; if a build is already running, additional merges are skipped. And critically, they&#8217;re <strong>incremental</strong>: each golden build starts from the previous golden cache, so only changed images need rebuilding. A typical incremental golden build takes 30 seconds to 2 minutes, not 10 minutes.</p><h3><strong>The overlayfs false start</strong></h3><p>Our first approach was elegant on paper: use overlayfs with the golden as the read-only lower directory and a per-session upper directory for copy-on-write. O(1) mount time, true COW semantics, minimal disk usage.</p><p>It didn&#8217;t work. Docker&#8217;s overlay2 storage driver creates its own overlayfs mounts inside <code>/var/lib/docker/overlay2/</code>. Nested overlayfs requires the upper directory to be on a non-overlayfs filesystem &#8212; our merged directory was itself overlayfs, so Docker failed with <code>invalid argument</code>. This is a kernel-level restriction, not a configuration issue.</p><h3><strong>The copy approach that actually works</strong></h3><p>We switched to <code>cp -a</code>: copy the entire golden directory to the session&#8217;s Docker data directory at session start. 
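</p><p>A sketch of the session-start hydration, using the directory layout above (the function name and argument order are illustrative):</p><pre><code>hydrate_session() {
  # Copy the project's golden Docker data directory into a fresh session
  # directory so the daemon starts with all images pre-populated.
  local project="$1" session="$2" root="${3:-/container-docker}"
  local golden="${root}/golden/${project}/docker"
  local target="${root}/sessions/docker-data-${session}/docker"
  [ -d "$golden" ] || return 0   # no golden cache yet: plain cold start
  mkdir -p "${root}/sessions/docker-data-${session}"
  cp -a "$golden" "$target"
}
</code></pre><p>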
Less elegant than overlayfs, but it works reliably and performs well enough:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1R8r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1R8r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png 424w, https://substackcdn.com/image/fetch/$s_!1R8r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png 848w, https://substackcdn.com/image/fetch/$s_!1R8r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png 1272w, https://substackcdn.com/image/fetch/$s_!1R8r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1R8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png" width="959" height="356" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:959,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:63453,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189022558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1R8r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png 424w, https://substackcdn.com/image/fetch/$s_!1R8r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png 848w, https://substackcdn.com/image/fetch/$s_!1R8r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png 1272w, https://substackcdn.com/image/fetch/$s_!1R8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50d74180-5f8c-48af-99cb-ad1be5693a50_959x356.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>13.8 seconds to go from empty daemon to 8.7 GB of pre-built images. Compare that to 10 minutes of building and transferring through nested daemons.</p><h3><strong>Staleness is handled gracefully</strong></h3><p>What if code changes after the golden was built? The session starts with slightly stale images, but the smart <code>--load</code> optimization handles it transparently. When the startup script runs <code>docker build</code>, the wrapper checks the image digest against BuildKit &#8212; if it&#8217;s changed, the registry push/pull transfers only the changed layers (~1 second). 
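</p><p>The decision logic is just a digest comparison. A toy model of that check (the function names are ours, not the real wrapper's, and the digests are made up; <code>pull</code> is a stub counter rather than a real registry pull):</p>

```shell
# Toy model of "only pull when the digest changed"; not the real wrapper.
pulls=0
pull() { pulls=$((pulls + 1)); }

maybe_pull() {
  # $1 = digest reported by BuildKit, $2 = digest in the session daemon
  [ "$1" = "$2" ] || pull
}

maybe_pull sha256:aaa sha256:aaa   # golden still current: no transfer
maybe_pull sha256:bbb sha256:aaa   # code changed: fetch only the delta
```

<p>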
The golden provides a warm baseline; the wrapper handles the delta.</p><p>The golden rebuilds on the next merge to main, so staleness is bounded by the development cycle.</p><h2><strong>The Full Picture</strong></h2><p>Here&#8217;s where we ended up, starting from 45 minutes:</p><h3><strong>Cold start: 14 seconds (from 10 minutes, from 45 minutes)</strong></h3><table><thead><tr><th>Phase</th><th>Original</th><th>Smart <code>--load</code></th><th>Golden cache</th></tr></thead><tbody><tr><td>API + frontend (compose)</td><td>200s</td><td>41s</td><td><strong>0s</strong> (pre-built)</td></tr><tr><td>Zed IDE + desktop image</td><td>459s</td><td>132s</td><td><strong>0s</strong> (pre-built)</td></tr><tr><td>Inner sandbox setup</td><td>2,075s</td><td>380s</td><td><strong>0s</strong> (pre-built)</td></tr><tr><td>Golden copy</td><td>&#8212;</td><td>&#8212;</td><td><strong>14s</strong></td></tr><tr><td><strong>Total</strong></td><td>45 min</td><td>10 min</td><td><strong>14s</strong></td></tr><tr><td><strong>Speedup</strong></td><td>baseline</td><td>4.5x</td><td><strong>193x</strong></td></tr></tbody></table><h3><strong>Warm start: 23 seconds (unchanged)</strong></h3><p>The warm start didn&#8217;t change &#8212; it was already fast from smart <code>--load</code>. The golden cache&#8217;s value is making cold start match warm start.</p><h3><strong>Incremental golden builds: 30s&#8211;2 min</strong></h3><p>Golden builds start from the previous golden, so they only rebuild what changed:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bRQ4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bRQ4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png 424w, https://substackcdn.com/image/fetch/$s_!bRQ4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png 848w,
https://substackcdn.com/image/fetch/$s_!bRQ4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png 1272w, https://substackcdn.com/image/fetch/$s_!bRQ4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bRQ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png" width="601" height="243" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:243,&quot;width&quot;:601,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31481,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189022558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bRQ4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png 424w, https://substackcdn.com/image/fetch/$s_!bRQ4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png 848w, 
https://substackcdn.com/image/fetch/$s_!bRQ4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png 1272w, https://substackcdn.com/image/fetch/$s_!bRQ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd92d3bba-f3c8-4cb5-9735-87c2dbf50d5d_601x243.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><strong>Implementation</strong></h2><p>The system has four components working together:</p><ol><li><p><strong>Docker wrapper</strong>
&#8212; installed at <code>/usr/local/bin/docker</code> in each desktop container. Intercepts <code>docker build</code>, <code>docker buildx build</code>, and <code>docker compose build</code>. Routes builds through shared BuildKit, applies smart <code>--load</code> with registry acceleration, decomposes compose builds into individual smart builds. Falls back to tarball <code>--load</code> if the registry is unavailable.</p></li><li><p><strong>Shared BuildKit + Registry</strong> (<code>api/pkg/hydra/manager.go</code>) &#8212; Hydra starts a <code>helix-buildkit</code> container (shared build cache) and a <code>helix-registry</code> container (layer-level transfer) at the sandbox level. Both are on the same Docker network as desktop containers. BuildKit is configured to trust the insecure registry for push operations.</p></li><li><p><strong>Init script</strong> (<code>desktop/shared/17-start-dockerd.sh</code>) &#8212; configures the desktop container&#8217;s dockerd to trust the insecure registry and exports <code>HELIX_REGISTRY</code> and <code>BUILDX_BUILDER</code> globally so the wrapper knows where to push/pull and which builder to use.</p></li><li><p><strong>Golden build service</strong> (<code>api/pkg/services/golden_build_service.go</code>, <code>api/pkg/hydra/golden.go</code>) &#8212; manages golden cache lifecycle. The API-side service triggers builds on merge-to-main, tracks build status in project metadata, and debounces concurrent builds. The Hydra-side code handles golden directory management, session-to-golden promotion, and the <code>cp -a</code> copy on session startup.</p></li></ol><p>The wrapper is generic &#8212; it works for any <code>docker build</code> workload, not just Helix. It auto-detects whether the active builder is remote, and only applies smart --load when it is. 
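</p><p>The interception pattern itself is ordinary shell dispatch on the first subcommand. A stripped-down sketch of that idea (stub functions stand in for the real BuildKit routing and the real docker binary; this is not the actual wrapper):</p>

```shell
# Stubbed sketch of a PATH-shim docker wrapper. smart_build and
# real_docker are placeholders for the real routing and binary.
smart_build() { echo "smart-build $*"; }
real_docker() { echo "passthrough $*"; }

docker_shim() {
  case "$1" in
    build)
      shift
      smart_build "$@" ;;                 # route through shared BuildKit
    buildx|compose)
      # `docker buildx build` / `docker compose build` also get routed
      if [ "$2" = "build" ]; then
        sub=$1; shift 2
        smart_build "--via=$sub" "$@"
      else
        real_docker "$@"                  # e.g. `docker compose up`
      fi ;;
    *)
      real_docker "$@" ;;                 # everything else passes through
  esac
}

out=$(docker_shim build -t app .)         # -> smart-build -t app .
pass=$(docker_shim ps -a)                 # -> passthrough ps -a
```

<p>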
On a standard local Docker setup, it&#8217;s a transparent passthrough.</p><h2><strong>What We Built</strong></h2><p>We started with a simple problem &#8212; Docker builds are slow when every agent starts cold &#8212; and ended up building something genuinely interesting: a multi-layered caching system that operates transparently across nested Docker daemons, shared build caches, and per-project golden snapshots.</p><p>The numbers tell the story:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K-en!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K-en!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png 424w, https://substackcdn.com/image/fetch/$s_!K-en!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png 848w, https://substackcdn.com/image/fetch/$s_!K-en!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png 1272w, https://substackcdn.com/image/fetch/$s_!K-en!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!K-en!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png" width="948" height="241" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:241,&quot;width&quot;:948,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43415,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/189022558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K-en!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png 424w, https://substackcdn.com/image/fetch/$s_!K-en!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png 848w, https://substackcdn.com/image/fetch/$s_!K-en!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png 1272w, https://substackcdn.com/image/fetch/$s_!K-en!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0418331-ecc3-4f1e-b9f9-8ab009136a95_948x241.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>An agent can now start working on a project in under 30 seconds, regardless of whether it&#8217;s the first session or the hundredth. The difference between 45 minutes and 14 seconds isn&#8217;t incremental &#8212; it changes what&#8217;s practical. Agents can spin up, do focused work, and tear down without the overhead dominating the task. Short-lived sessions become viable. Parallel agents become economical.</p><p>And the best part: it&#8217;s all transparent. Build scripts, Makefiles, docker-compose files &#8212; none of them changed.
The wrapper intercepts standard Docker commands and applies the optimizations automatically. Projects opt into golden cache warming with a single toggle, and the system handles the rest.</p>]]></content:encoded></item><item><title><![CDATA[GPU Virtualization Architecture for Multi-Desktop Containers]]></title><description><![CDATA[I thought we'd have this working by now...]]></description><link>https://blog.helix.ml/p/gpu-virtualization-architecture-for</link><guid isPermaLink="false">https://blog.helix.ml/p/gpu-virtualization-architecture-for</guid><dc:creator><![CDATA[Luke Marsden]]></dc:creator><pubDate>Mon, 16 Feb 2026 10:38:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!yXa1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yXa1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yXa1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png 424w, https://substackcdn.com/image/fetch/$s_!yXa1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png 848w, 
https://substackcdn.com/image/fetch/$s_!yXa1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png 1272w, https://substackcdn.com/image/fetch/$s_!yXa1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yXa1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png" width="1456" height="774" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:774,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2534754,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/188124846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yXa1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png 424w, 
https://substackcdn.com/image/fetch/$s_!yXa1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png 848w, https://substackcdn.com/image/fetch/$s_!yXa1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png 1272w, https://substackcdn.com/image/fetch/$s_!yXa1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a5d818-b060-45ab-840b-3cac6aa75098_4064x2160.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2><strong>Overview</strong></h2><p>Helix Desktop runs multiple isolated Linux desktop environments (each with its own GNOME Shell, IDE, and browser) inside a single QEMU virtual machine on Apple Silicon Macs. Each desktop gets its own virtual GPU output, H.264 video stream, and DRM lease &#8212; all sharing one physical GPU through virtio-gpu with Vulkan passthrough via Venus/virglrenderer.</p><p>This document describes the full architecture from silicon to pixel, and the deadlock bugs we found and fixed when scaling from 1-2 desktops to 4+.</p><h2><strong>Why This Matters</strong></h2><p>AI agents are getting good enough to write real code, but they still need somewhere to run it. Not just a terminal &#8212; a full desktop environment with a browser for testing, an IDE for human pair programmers, and GPU acceleration for anything graphical (and hardware video encoding for low latency when a user wants to pair with them over the network). And when you have a team of agents working on different tasks, each one needs its own isolated sandbox so they don&#8217;t step on each other&#8217;s files, processes, or state.</p><p>Helix Desktop gives every agent its own full Linux desktop &#8212; running in an isolated container with GPU acceleration. Humans can watch what their agents are doing in real time via H.264 video streams, jump in to collaborate through the same desktop interface, and manage their flock of agents from a mobile phone while on the go.
Think of it as giving each agent their own workstation in a virtual office, where you can glance at any screen and tap on it to intervene.</p><p>This architecture also enables new human-computer interaction patterns: commentable spec-driven development where a human writes requirements in a Google Docs-style document, agents immediately update their design docs in response to comments, and the human reviews and redirects &#8212; all happening concurrently across multiple agent desktops. The agents work in parallel, each in their own sandbox, while the human herds the flock.</p><p>The hard technical problem: running 4+ GPU-accelerated desktops simultaneously inside a single QEMU virtual machine on Apple Silicon, sharing one physical GPU, without them deadlocking each other. That&#8217;s what this document is about.</p><h2><strong>The Stack</strong></h2><pre><code><code>Browser (WebSocket H.264 client)
    |
Helix Frame Export (VideoToolbox H.264, per-scanout)
    |
QEMU virtio-gpu device model (fence_poll, process_cmdq, scanout management)
    |
virglrenderer (Venus proxy &#8212; Vulkan API translation, runs as separate process)
    |
Apple Metal / ParavirtualizedGraphics (actual GPU execution)
    |
Apple M-series GPU silicon
</code></code></pre><p>On the guest side:</p><pre><code><code>Container (gnome-shell + Zed IDE + browser)
    |
DRM lease FD (connector + CRTC + planes)
    |
virtio-gpu kernel driver (DMA fences, GEM objects, atomic modesetting)
    |
virtio control queue (1024-entry ring buffer to QEMU)
</code></code></pre><h2><strong>Layer 1: The Virtio Control Queue</strong></h2><p>The guest Linux kernel&#8217;s <code>virtio_gpu</code> driver communicates with QEMU through a virtio virtqueue &#8212; a shared-memory ring buffer. The guest writes command descriptors (create resource, submit 3D command batch, map blob, set scanout, etc.) and kicks the queue. QEMU receives the kick as a vmexit on Apple&#8217;s Hypervisor.framework, pops commands from the ring, and processes them.</p><p>There are two queues: <strong>control</strong> (all GPU commands) and <strong>cursor</strong> (cursor image updates). The control queue is the bottleneck.</p><p><strong>Sizing matters.</strong> The default queue size is 256 entries for 2D mode, which we increased to 1024 (the virtio maximum) for 3D/GL mode. With 4 gnome-shells each submitting GPU commands continuously, 256 entries fills up. When the ring is full, guest threads block in <code>virtio_gpu_queue_ctrl_sgs</code> &#8212; a kernel spinwait that shows up as permanent D-state processes. 1024 entries gives enough headroom.</p><h3><strong>Command Response Flow</strong></h3><pre><code><code>Guest kernel                    QEMU (main thread)
============                    ==================
write cmd to ring
virtqueue_kick() &#9472;&#9472;vmexit&#9472;&#9472;&gt;    virtio_gpu_handle_ctrl_cb()
                                  qemu_bh_schedule(ctrl_bh)
                                    ...main loop iteration...
                                  virtio_gpu_gl_handle_ctrl()
                                    virtqueue_pop() -- dequeue all pending
                                    QTAILQ_INSERT_TAIL(&amp;cmdq)
                                    virtio_gpu_process_cmdq()
                                      for each cmd in cmdq:
                                        process_cmd(cmd)  -- dispatch
                                        if fenced: move to fenceq
                                        if finished: send response
                                    virtio_gpu_virgl_fence_poll()
                                      virgl_renderer_poll()  -- check GPU
                                      process_cmdq() again
                                      re-arm timer

&lt;&#9472;&#9472;interrupt&#9472;&#9472;                  virtio_notify()
dma_fence_signal()                (response written to reply ring)
</code></code></pre><p>The critical thing: <strong>every command gets exactly one response</strong>. The guest thread that submitted it blocks in the kernel until that response arrives as a virtio interrupt. If QEMU never processes the command, the guest thread blocks forever.</p><h2><strong>Layer 2: QEMU&#8217;s Command Processing Pipeline</strong></h2><p>QEMU maintains two queues:</p><ul><li><p><code>cmdq</code>: Commands popped from the virtio ring, waiting to be dispatched to virglrenderer</p></li><li><p><code>fenceq</code>: Commands that have been dispatched but are waiting for GPU completion (async)</p></li></ul><p>And one critical counter:</p><ul><li><p><code>renderer_blocked</code>: A global semaphore. When &gt;0, <code>process_cmdq()</code> refuses to process ANY command from ANY context.</p></li></ul><h3><strong>The </strong><code>renderer_blocked</code><strong> Problem</strong></h3><p><code>renderer_blocked</code> was designed for SPICE&#8217;s GL display path. When SPICE blits a frame to the client, it calls <code>graphic_hw_gl_block(true)</code> to pause GPU command processing until the client acknowledges the frame (<code>gl_draw_done</code>). This makes sense for a single display &#8212; you don&#8217;t want the GPU racing ahead while the display catches up.</p><p>But <code>renderer_blocked</code> is <strong>global across all scanouts</strong>. With 4 gnome-shells, if scanout 1&#8217;s SPICE client is slow to acknowledge, ALL four desktops freeze. Worse, blob resource unmaps (Venus uses these heavily for Vulkan memory management) were also incrementing <code>renderer_blocked</code> during their async RCU cleanup phase. With 4 contexts doing overlapping blob unmaps, the counter stayed &gt;0 perpetually.</p><p><strong>Fix</strong>: We removed <code>renderer_blocked</code> from the blob unmap path entirely. 
The suspended-command mechanism (<code>cmd_suspended</code> flag + <code>continue</code> in the FOREACH loop) already prevents the specific unmap command from re-executing before RCU completes, without blocking commands from other contexts. We also skip <code>dpy_gl_update</code> entirely on Apple builds (Helix frame export handles frame capture directly, bypassing SPICE).</p><h3><strong>The </strong><code>process_cmdq</code><strong> FIFO Blocking Problem</strong></h3><p>The original <code>process_cmdq</code> used <code>QTAILQ_FIRST</code> + <code>break</code> when it encountered a suspended command:</p><pre><code>// OLD (broken with 4+ contexts):
while (!QTAILQ_EMPTY(&amp;cmdq)) {
    cmd = QTAILQ_FIRST(&amp;cmdq);
    process_cmd(cmd);
    if (cmd_suspended) break;  // STOPS ALL PROCESSING
    ...
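    /* FIX (sketch; identifiers assumed from the snippet above): iterate with
     * QTAILQ_FOREACH_SAFE and continue past a suspended command, so it stays
     * queued without stalling the commands behind it:
     *
     *     QTAILQ_FOREACH_SAFE(cmd, &amp;cmdq, next, tmp) {
     *         process_cmd(cmd);
     *         if (cmd_suspended) continue;  // skip it, keep draining
     *         ...
     *     }
     */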
}</code></pre><p>A single suspended blob unmap from context 1 would block commands from contexts 2, 3, and 4 that are sitting later in the queue.</p><p><strong>Fix</strong>: Changed to <code>QTAILQ_FOREACH_SAFE</code> with <code>continue</code> &#8212; suspended commands stay in the queue but later commands are processed normally.</p><h2><strong>Layer 3: Fences and the Poll Timer</strong></h2><p>When a guest submits a GPU command with <code>VIRTIO_GPU_FLAG_FENCE</code>, QEMU dispatches it to virglrenderer and moves it to <code>fenceq</code>. The command stays there until virglrenderer reports that the GPU finished the work.</p><p>virglrenderer reports fence completion via a callback (<code>virgl_write_fence</code>), but this callback only fires when QEMU calls <code>virgl_renderer_poll()</code>. And <code>virgl_renderer_poll()</code> only gets called from two places:</p><ol><li><p><code>handle_ctrl</code> &#8212; when the guest kicks the virtqueue (submits new commands)</p></li><li><p><code>fence_poll</code> &#8212; a periodic timer callback</p></li></ol><p>The <code>fence_poll</code> timer is supposed to fire every 10ms (100 Hz). Each invocation:</p><ol><li><p>Calls <code>virgl_renderer_poll()</code> &#8212; asks virglrenderer &#8220;any fences done?&#8221;</p></li><li><p>Calls <code>process_cmdq()</code> &#8212; processes any queued commands</p></li><li><p>Re-arms itself for 10ms later</p></li></ol><h3><strong>Why the Timer Matters</strong></h3><p>Without <code>fence_poll</code>, fence completions only get checked when the guest submits new commands (via <code>handle_ctrl</code>). But if the guest is <em>waiting</em> for a fence to complete before submitting the next command, there&#8217;s a circular dependency:</p><pre><code><code>Guest: "I'll submit my next command after fence 42 completes"
QEMU:  "I'll check if fence 42 completed when I get the next command"
</code></code></pre><p>The timer breaks this cycle by polling independently.</p><h3><strong>The Virtual Clock Problem</strong></h3><p>The original code used <code>QEMU_CLOCK_VIRTUAL</code> for the timer. This clock tracks virtual CPU time &#8212; it <strong>stops advancing when all vCPUs are halted</strong> (executing WFI/wait-for-interrupt). When all guest threads are blocked on GPU fences, all vCPUs eventually enter WFI, the virtual clock stops, and <code>fence_poll</code> never fires. The fences never complete, the vCPUs never wake up &#8212; permanent deadlock.</p><p><strong>Fix</strong>: Switch to <code>QEMU_CLOCK_REALTIME</code> which always advances regardless of vCPU state. Also make the timer unconditionally re-arm (the original code only re-armed when there was work to do, but there was a race window between &#8220;work arrives&#8221; and &#8220;timer checks&#8221;).</p><h3><strong>The Mystery: REALTIME Timer Still Doesn&#8217;t Fire</strong></h3><p>After switching to <code>QEMU_CLOCK_REALTIME</code>, <code>fence_poll</code> still shows zero hits in 1-second process samples (782 samples at 1ms intervals). Meanwhile, <code>gui_update</code> &#8212; also a REALTIME timer &#8212; fires 3-4 times per second from the exact same <code>timerlist_run_timers</code> call path. Both timers are created with <code>timer_new_ms(QEMU_CLOCK_REALTIME, ...)</code> so they should be on the same timerlist. We confirmed via QEMU logs that <code>virtio_gpu_virgl_init</code> runs (twice, due to a guest driver reset/re-init cycle) and reaches the <code>timer_new_ms</code> + <code>timer_mod</code> calls.</p><p>The QEMU main loop thread spends 768/782 samples idle in <code>g_poll</code> &#8594; <code>__select</code>. During the 14 active samples, 3 go through <code>qemu_clock_run_all_timers</code> &#8594; <code>timerlist_run_timers</code> &#8594; <code>gui_update</code>. Zero go through <code>fence_poll</code>. 
All 14 vCPU threads show heavy BQL contention (25-60% of samples in <code>bql_lock_impl</code>).</p><p>The QEMU logs show <code>Blocked re-entrant IO on MemoryRegion: virtio-pci-notify-virtio-gpu</code> which means <code>virtio_notify()</code> &#8212; called from <code>process_cmdq()</code> &#8594; <code>virtio_gpu_ctrl_response()</code> when completing a command &#8212; is hitting QEMU&#8217;s memory region re-entrancy guard. The guard silently returns <code>MEMTX_ACCESS_ERROR</code>, dropping the guest notification. This could cascade: if the dropped notification means a guest interrupt never fires, the guest thread stays blocked, the vCPU stays in WFI, and the circular dependency persists.</p><p>However, this doesn&#8217;t explain why the timer itself doesn&#8217;t fire. The re-entrancy affects notifications inside <code>process_cmdq</code>, not the timer scheduling. The timer should fire regardless of what happens inside its callback &#8212; the callback runs, re-arms via <code>timer_mod</code>, and the main loop picks it up next iteration.</p><p>Root cause remains unknown. The difference between <code>gui_update</code> (fires) and <code>fence_poll</code> (doesn&#8217;t fire) may be related to when the timer is created: <code>gui_update</code> is created during display initialization before the main loop starts, while <code>fence_poll</code> is created lazily during <code>handle_ctrl</code> (first virtqueue kick) after the main loop is already running. There may be a timer registration race in QEMU&#8217;s GLib integration.</p><h3><strong>The Workaround: Thread-Based Fence Polling</strong></h3><p>Rather than continuing to debug QEMU&#8217;s timer internals, we bypass the timer system entirely with a dedicated thread:</p><pre><code>/* Thread function &#8212; runs independently of QEMU&#8217;s main loop */
static void *fence_poll_thread_fn(void *opaque)
{
    VirtIOGPU *g = opaque;
    VirtIOGPUGL *gl = VIRTIO_GPU_GL(g);

    while (gl-&gt;fence_poll_thread_running) {
        g_usleep(10000); /* 10ms = 100 Hz */
        qemu_bh_schedule(gl-&gt;fence_poll_bh);
    }
    return NULL;
}
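
/* Setup (sketch; the caller and the QemuThread field are assumptions):
 * create the BH on the main loop, then spawn the polling thread. */
static void fence_poll_thread_start(VirtIOGPU *g)
{
    VirtIOGPUGL *gl = VIRTIO_GPU_GL(g);

    gl-&gt;fence_poll_bh = qemu_bh_new(fence_poll_bh_cb, g);
    gl-&gt;fence_poll_thread_running = true;
    qemu_thread_create(&amp;gl-&gt;fence_poll_thread, "fence-poll",
                       fence_poll_thread_fn, g, QEMU_THREAD_JOINABLE);
}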

/* BH callback &#8212; runs on main loop thread with BQL held */
static void fence_poll_bh_cb(void *opaque)
{
    VirtIOGPU *g = opaque;
    virgl_renderer_poll();
    virtio_gpu_process_cmdq(g);
}</code></pre><p>The thread does nothing except sleep 10ms and schedule a bottom-half (BH) on QEMU&#8217;s main loop. <code>qemu_bh_schedule()</code> is documented as thread-safe &#8212; it writes to an eventfd that wakes the main loop from its <code>g_poll</code>. The BH dispatches on the main thread via <code>aio_ctx_dispatch</code> with BQL held, which is the correct context for <code>virgl_renderer_poll()</code> and <code>process_cmdq()</code>.</p><p>This is robust because:</p><ul><li><p><code>g_usleep</code> always works (no dependency on QEMU&#8217;s timer system)</p></li><li><p><code>qemu_bh_schedule</code> always works (we see BH dispatch in the process samples)</p></li><li><p>BH dispatch is the same mechanism used for virtio command processing</p></li><li><p>The original QEMU timer is kept as a secondary fallback &#8212; if it ever fires, extra <code>virgl_renderer_poll</code> calls are harmless</p></li></ul><h2><strong>Layer 4: virglrenderer and Venus</strong></h2><p>virglrenderer translates Vulkan API calls from the guest into native Metal API calls on the host. It runs as a <strong>separate process</strong> (proxy mode) communicating with QEMU over a Unix socket. Each guest GPU context (one per gnome-shell) gets its own virglrenderer thread.</p><p>The flow:</p><ol><li><p>Guest Mesa driver makes Vulkan calls</p></li><li><p>Venus (Vulkan-on-virtio-gpu protocol) serializes them into virtio-gpu <code>SUBMIT_CMD</code> batches</p></li><li><p>QEMU dispatches batches to virglrenderer via <code>virgl_renderer_submit_cmd()</code></p></li><li><p>virglrenderer deserializes and calls Metal/MoltenVK equivalents</p></li><li><p>When GPU work completes, virglrenderer reports via <code>virgl_write_fence()</code> callback</p></li></ol><p>Venus heavily uses <strong>blob resources</strong> &#8212; guest-visible GPU memory objects. Creating and destroying these involves <code>RESOURCE_CREATE_BLOB</code> and <code>RESOURCE_UNMAP_BLOB</code> commands. 
The unmap path is particularly tricky because it requires RCU (read-copy-update) synchronization to safely remove memory regions, which is what led to the suspended-command mechanism.</p><h2><strong>Layer 5: DRM Leases</strong></h2><p>Each agent&#8217;s container needs exclusive access to a virtual GPU output &#8212; its own screen, essentially. When a human starts a new agent session, the system needs to dynamically provision a virtual display, hand it to the agent&#8217;s container, and start streaming video from it. When the agent&#8217;s session ends (or crashes), the display is reclaimed and recycled. This has to work for 15+ concurrent agents on a single machine.</p><p>Linux DRM leases provide the isolation primitive: the DRM master can carve off subsets of its resources and hand them to clients as independent DRM file descriptors.</p><p>The <strong>helix-drm-manager</strong> runs as a systemd service on the guest VM:</p><ol><li><p>Opens <code>/dev/dri/card0</code> as DRM master</p></li><li><p>Enumerates connectors and CRTCs (virtio-gpu creates 16 virtual outputs)</p></li><li><p>Listens on a Unix socket for lease requests from containers</p></li><li><p>For each request:</p><ul><li><p>Allocates a scanout index (1-15; 0 is the VM console)</p></li><li><p>Tells QEMU to enable that scanout (TCP message to frame export server)</p></li><li><p>Creates a DRM lease (connector + CRTC + primary plane + cursor plane)</p></li><li><p>Sends the lease FD to the container via <code>SCM_RIGHTS</code></p></li></ul></li><li><p>Monitors the connection &#8212; when the container dies, automatically revokes the lease and disables the scanout</p></li></ol><h3><strong>The mode_config.mutex Deadlock</strong></h3><p>Two operations in the DRM manager acquired the kernel&#8217;s <code>mode_config.mutex</code>:</p><ol><li><p><code>activateCrtc</code> &#8212; <code>DRM_IOCTL_MODE_SETCRTC</code> on the master FD to pre-initialize the CRTC before handing the lease to 
mutter</p></li><li><p><code>reprobeConnector</code> &#8212; writing to <code>/sys/class/drm/card0-Virtual-N/status</code> to trigger connector detection</p></li></ol><p>Running gnome-shells also hold <code>mode_config.mutex</code> during atomic page flips (<code>drm_atomic_commit</code>). If a gnome-shell is mid-commit waiting for a GPU fence (which may be stalled due to the fence_poll issue), it holds the mutex indefinitely. The DRM manager trying to set up a new lease blocks on the same mutex, and all other gnome-shells&#8217; page flips cascade-block behind it.</p><p><strong>Fix</strong>: Removed both <code>activateCrtc</code> and <code>reprobeConnector</code>. QEMU&#8217;s <code>enableScanout</code> already triggers the guest hotplug event via <code>dpy_set_ui_info</code>, so the connector appears without explicit reprobe. Mutter can do its own initial modeset through the lease FD now that <code>DRM_CLIENT_CAP_UNIVERSAL_PLANES</code> is set on the master.</p><h2><strong>Layer 6: Frame Export and Video Streaming</strong></h2><p>The Helix frame export system (<code>helix-frame-export.m</code>) captures GPU frames directly from QEMU and encodes them as H.264 video:</p><ol><li><p><strong>Capture</strong>: When virglrenderer flushes a scanout, QEMU&#8217;s <code>virgl_cmd_resource_flush</code> calls into helix frame export. The frame&#8217;s Metal texture handle is extracted directly from virglrenderer&#8217;s native handle &#8212; zero CPU copies.</p></li><li><p><strong>Blit</strong>: The Metal texture is blitted to an IOSurface via EGL/GL. Triple buffering (3 IOSurface slots per scanout) allows VideoToolbox to encode asynchronously without blocking the GPU.</p></li><li><p><strong>Encode</strong>: Apple&#8217;s VideoToolbox hardware H.264 encoder compresses each IOSurface. 
The encode callback fires on a VT thread, which schedules a BH (bottom-half) on QEMU&#8217;s main thread to send the encoded frame.</p></li><li><p><strong>Send</strong>: Encoded NAL units are sent to subscribed clients over TCP sockets. Each client subscribes to a specific scanout. Frames are dropped (not queued) if the client&#8217;s send buffer is full &#8212; this prevents one slow client from affecting others.</p></li></ol><p>The frame export explicitly avoids <code>renderer_blocked</code> / <code>gl_block</code>. The old SPICE GL path used <code>renderer_blocked</code> for backpressure (pause GPU until client acknowledges frame), which is global and causes cross-scanout stalls. Instead, the frame export uses per-slot busy flags &#8212; if all 3 IOSurface slots for a scanout are busy with VT encoding, that scanout&#8217;s frames are dropped, but other scanouts continue normally.</p><h2><strong>Why Scaling Matters</strong></h2><p>A single desktop works fine. Two work fine. The deadlocks only appear at 4+ concurrent desktops &#8212; which is exactly the regime we need for production use. A developer working with a team of agents will routinely have 4-8 agents running simultaneously: one refactoring the backend, one writing frontend tests, one investigating a bug, one updating documentation. Each needs a responsive GPU-accelerated desktop. If starting the fourth agent freezes the other three, the product doesn&#8217;t work.</p><p>Every fix described above was discovered by starting 4 desktops in quick succession and tracing kernel stacks, QEMU process samples, and <code>/proc/interrupts</code> to find exactly where the system seized up. 
The bugs are all variations of the same theme: mechanisms designed for a single GPU context becoming global bottlenecks when shared across many.</p><h2><strong>Summary of Fixes</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LFP2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LFP2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png 424w, https://substackcdn.com/image/fetch/$s_!LFP2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png 848w, https://substackcdn.com/image/fetch/$s_!LFP2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png 1272w, https://substackcdn.com/image/fetch/$s_!LFP2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LFP2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png" width="1456" height="613" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:613,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:303900,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/188124846?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LFP2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png 424w, https://substackcdn.com/image/fetch/$s_!LFP2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png 848w, https://substackcdn.com/image/fetch/$s_!LFP2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png 1272w, https://substackcdn.com/image/fetch/$s_!LFP2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62664a35-e1cd-4dd1-a371-cdffda47204a_2062x868.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Will we fix it? Stay tuned to find out :-D</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.gg/VJftd844GE&quot;,&quot;text&quot;:&quot;Join the Beta when we get it working!&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://discord.gg/VJftd844GE"><span>Join the Beta when we get it working!</span></a></p><h2><strong>The Debugging Method</strong></h2><p>Every fix was discovered the same way: start 4 desktops in quick succession, then trace the freeze:</p><ol><li><p><code>/proc/interrupts</code> &#8212; check if GPU interrupt count (<code>virtio1-control</code>) is advancing. 
If frozen (same count 5 seconds apart), QEMU isn&#8217;t sending fence completions to the guest.</p></li><li><p><code>cat /proc/*/stack</code> &#8212; find D-state processes. gnome-shells stuck in <code>drm_modeset_lock</code> &#8594; <code>dma_fence_default_wait</code> means they&#8217;re waiting for GPU fences while holding <code>mode_config.mutex</code>. Anything stuck in <code>virtio_gpu_vram_mmap</code> means a synchronous MAP_BLOB is waiting for QEMU to process it.</p></li><li><p><code>sample &lt;pid&gt; 1</code> (macOS) &#8212; 1-second process sample of QEMU at 1ms intervals. Shows where every thread spends its time. The main loop thread should show <code>fence_poll</code> or <code>process_cmdq</code> hits; if it&#8217;s 100% in <code>g_poll</code>, nothing is processing GPU commands.</p></li><li><p><strong>Kernel hung task messages</strong> (serial console) &#8212; <code>task X:PID is blocked on a mutex likely owned by task Y:PID</code> directly identifies which process holds the contended lock.</p></li><li><p><strong>QEMU warnings</strong> &#8212; <code>Blocked re-entrant IO on MemoryRegion</code> means a <code>virtio_notify</code> was silently dropped, which means a guest never received a response for a command it&#8217;s waiting on.</p></li></ol>]]></content:encoded></item><item><title><![CDATA[Bringing AI to Where Your Enterprise Lives: Helix + Microsoft 365]]></title><description><![CDATA[How Helix integrates with SharePoint and Microsoft Teams to bring private GenAI directly into your existing workflows]]></description><link>https://blog.helix.ml/p/bringing-ai-to-where-your-enterprise</link><guid isPermaLink="false">https://blog.helix.ml/p/bringing-ai-to-where-your-enterprise</guid><dc:creator><![CDATA[Priya Samuel]]></dc:creator><pubDate>Tue, 30 Dec 2025 17:23:06 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!LdW8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most enterprise AI tools miss the obvious: your knowledge is already in SharePoint, and your conversations happen in Teams. Why build another interface?</p><p>Helix now integrates natively with both SharePoint and Microsoft Teams&#8212;letting you build AI agents that understand your company&#8217;s documents and respond directly in the chat tools your teams already use.</p><h2>The Problem with Enterprise AI Silos</h2><p>The friction of opening another tab, another tool, another interface kills adoption. 
Meanwhile, your SharePoint libraries hold thousands of documents that could answer those questions instantly&#8212;if only the AI could access them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LdW8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LdW8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LdW8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LdW8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LdW8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LdW8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg" width="270" height="405" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1536,&quot;width&quot;:1024,&quot;resizeWidth&quot;:270,&quot;bytes&quot;:358354,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/181668742?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LdW8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LdW8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LdW8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LdW8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1ed8ff-ac97-467a-becc-ccabfa958000_1024x1536.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Helix supports SharePoint as a RAG Knowledge Source. It connects your document libraries directly to your AI agents&#8217; knowledge bases using Microsoft Graph API. No manual file uploads, no sync scripts, no stale knowledge. 
The Teams integration closes the loop with human users: webhooks deliver their messages to the agents&#8217; streaming API endpoints, which run RAG over the SharePoint data.</p><h2>How It Works</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q7K-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q7K-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png 424w, https://substackcdn.com/image/fetch/$s_!q7K-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png 848w, https://substackcdn.com/image/fetch/$s_!q7K-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png 1272w, https://substackcdn.com/image/fetch/$s_!q7K-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q7K-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png" width="1456" height="453" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:453,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:55758,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/181668742?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!q7K-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png 424w, https://substackcdn.com/image/fetch/$s_!q7K-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png 848w, https://substackcdn.com/image/fetch/$s_!q7K-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png 1272w, https://substackcdn.com/image/fetch/$s_!q7K-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F657970f0-9012-42d4-86f4-24621ec12a25_1800x560.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The SharePoint client handles the realities of enterprise deployments:</p><p>- Pagination: Automatically handles large document libraries with Microsoft Graph&#8217;s <code>@odata.nextLink</code></p><p>- Recursive traversal: Walks subfolder trees when you need deep document scanning</p><p>- Extension filtering: Only index what matters&#8212;skip those 50MB PowerPoint template decks</p><p>- TLS flexibility: Optional certificate verification bypass for environments with internal Certificate Authorities (yes, we know about your proxy!)</p><h2>Microsoft Teams: AI Where the Conversations Happen</h2><p>The Teams integration takes a different approach&#8212;instead of pulling data <em>from</em> Microsoft, it pushes AI responses <em>into</em> Teams conversations.</p><p>When a user @mentions your bot:</p><p>1. 
Teams sends the message to Microsoft&#8217;s Bot Framework Service</p><p>2. Bot Framework POSTs to your Helix deployment&#8217;s webhook endpoint</p><p>3. Helix processes the message through your configured agent</p><p>4. Response routes back through Bot Framework to the Teams client</p><h3>Conversation Threading</h3><p>Helix maintains conversation context across message threads. When a user asks a follow-up question, Helix retrieves the existing session and continues the conversation&#8212;no &#8220;I don&#8217;t have context from before&#8221; nonsense.</p><h2>Multi-Tenant Deployments</h2><p>Here&#8217;s where it gets interesting for enterprises: the Azure Bot and Teams app can live in different tenants.</p><p>Your Azure Bot registration lives in your IT tenant with all its security controls. But you can deploy the Teams app manifest to a customer or partner tenant&#8212;Microsoft&#8217;s Bot Framework routes messages regardless of tenant boundaries.</p><p>This means:</p><ul><li><p>Managed Service Providers (MSPs) can build AI agents that serve multiple customer tenants</p></li><li><p>Enterprises with multiple O365 tenants can centralise their AI infrastructure</p></li></ul><h2>Putting It Together: The Full Pattern</h2><p>The real power comes from combining both integrations. Consider this scenario:</p><h3>HR Policy Bot</h3><ol><li><p>SharePoint knowledge source indexes your HR policy documents from <code>https://corporate.sharepoint.com/sites/HR/Policies</code></p></li><li><p>Teams bot installed in company-wide Teams</p></li><li><p>Employee asks in #general: &#8220;@PolicyBot what&#8217;s the parental leave policy for adoptions?&#8221;</p></li><li><p>Helix RAG retrieves relevant chunks from the indexed SharePoint documents</p></li><li><p>AI generates response with citations back to the source documents</p></li><li><p>Response appears in Teams thread&#8212;complete with links to the original SharePoint files</p></li></ol><p>No portal. No context switching. 
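As a taste of the indexing side, the pagination behaviour described above (keep following Microsoft Graph's <code>@odata.nextLink</code> until the server stops returning one) boils down to a small loop. A sketch in TypeScript &#8212; the function names and the injected <code>fetchPage</code> are illustrative, not Helix's actual API:

```typescript
// Sketch of Graph-style pagination: follow @odata.nextLink until it
// disappears. fetchPage is injected so the loop can be exercised
// against a mock instead of a live SharePoint tenant.
type Page = { value: string[]; "@odata.nextLink"?: string };

async function listAllItems(
  firstUrl: string,
  fetchPage: (url: string) => Promise<Page>,
): Promise<string[]> {
  const items: string[] = [];
  let url: string | undefined = firstUrl;
  while (url) {
    const page = await fetchPage(url);
    items.push(...page.value);       // accumulate this page's items
    url = page["@odata.nextLink"];   // undefined on the last page
  }
  return items;
}
```

The same loop works for any Graph collection endpoint; only the page-fetching function changes.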
Just answers where people already ask questions.</p><h3>Security Considerations</h3><p>Both integrations use OAuth-based authentication:</p><ul><li><p>SharePoint: Uses Microsoft Graph API with <code>Sites.Read.All</code> and <code>Files.Read.All</code> scopes</p></li><li><p>Teams: Bot Framework handles JWT validation on incoming webhooks</p></li><li><p>Credentials: App secrets stored encrypted in Helix&#8217;s database</p></li><li><p>Tenant restriction: Optional tenant ID filtering to lock down bot access</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;56c8d1c3-1ee9-4191-ae54-4572969bfe35&quot;,&quot;duration&quot;:null}"></div></li></ul><h1>What&#8217;s Next</h1><p>Enterprise AI shouldn&#8217;t require employees to learn new tools. It should meet them where they already work. With Helix&#8217;s Microsoft integrations, your AI agents live inside SharePoint and Teams. The integration runs in the background; employees just get faster answers.</p><p>Full setup guides are available in our documentation.</p><h3>Psst - try Helix Code</h3><p>If you're interested in AI-powered development tools, check out <a href="http://helix.ml/code">Helix Code</a> - our upcoming platform for AI-assisted coding.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[We Mass-Deployed 15-Year-Old Screen Sharing Technology and It's Actually Better]]></title><description><![CDATA[Or: How JPEG Screenshots Defeated Our Beautiful H.264 WebCodecs Pipeline]]></description><link>https://blog.helix.ml/p/we-mass-deployed-15-year-old-screen</link><guid isPermaLink="false">https://blog.helix.ml/p/we-mass-deployed-15-year-old-screen</guid><dc:creator><![CDATA[Luke Marsden]]></dc:creator><pubDate>Thu, 18 Dec 2025 17:13:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WSwQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Part 2 of our video streaming saga. <a href="https://blog.helix.ml/p/we-killed-webrtc-and-nobody-noticed">Read Part 1: How we replaced WebRTC with WebSockets &#8594;</a></em></p><h2>The Year is 2025 and We&#8217;re Sending JPEGs</h2><p>Let me tell you about the time we spent three months building a gorgeous, hardware-accelerated, WebCodecs-powered, 60fps H.264 streaming pipeline over WebSockets...</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>...and then replaced it with <code>grim | curl</code> when the WiFi got a bit sketchy.</p><p>I wish I was joking.</p><div><hr></div><h2>Act I: Hubris (Also Known As &#8220;Enterprise Networking Exists&#8221;)</h2><p>We&#8217;re building <a href="https://github.com/helixml/helix">Helix</a>, an AI platform where autonomous coding agents work in cloud sandboxes. Users need to watch their AI assistants work. Think &#8220;screen share, but the thing being shared is a robot writing code.&#8221;</p><p>Last week, we explained how we replaced WebRTC with a custom WebSocket streaming pipeline. This week: why that wasn&#8217;t enough.</p><p><strong>The constraint that ruined everything:</strong> It has to work on enterprise networks.</p><p>You know what enterprise networks love? HTTP. HTTPS. Port 443. That&#8217;s it. That&#8217;s the list.</p><p>You know what enterprise networks hate?</p><ul><li><p><strong>UDP</strong> &#8212; Blocked. Deprioritized. Dropped. &#8220;Security risk.&#8221;</p></li><li><p><strong>WebRTC</strong> &#8212; Requires TURN servers, which requires UDP, which is blocked</p></li><li><p><strong>Custom ports</strong> &#8212; Firewall says no</p></li><li><p><strong>STUN/ICE</strong> &#8212; NAT traversal? In <em>my</em> corporate network? Absolutely not</p></li><li><p><strong>Literally anything fun</strong> &#8212; Denied by policy</p></li></ul><p>We tried WebRTC first. Worked great in dev. Worked great in our cloud. Deployed to an enterprise customer.</p><p>&#8220;The video doesn&#8217;t connect.&#8221;</p><p><em>checks network</em> &#8212; Outbound UDP blocked. 
TURN server unreachable. ICE negotiation failing.</p><p>We could fight this. Set up TURN servers. Configure enterprise proxies. Work with IT departments.</p><p>Or we could accept reality: <strong>Everything must go through HTTPS on port 443.</strong></p><p>So we built a <strong>pure WebSocket video pipeline</strong>:</p><ul><li><p>H.264 encoding via GStreamer + VA-API (hardware acceleration, baby)</p></li><li><p>Binary frames over WebSocket (L7 only, works through any proxy)</p></li><li><p>WebCodecs API for hardware decoding in the browser</p></li><li><p>60fps at 40Mbps with sub-100ms latency</p></li></ul><p>We were so proud. We wrote Rust. We wrote TypeScript. We implemented our own binary protocol. We measured things in microseconds.</p><p><strong>Then someone tried to use it from a coffee shop.</strong></p><div><hr></div><h2>Act II: Denial</h2><p>&#8220;The video is frozen.&#8221;</p><p>&#8220;Your WiFi is bad.&#8221;</p><p>&#8220;No, the video is definitely frozen. And now my keyboard isn&#8217;t working.&#8221;</p><p><em>checks the video</em></p><p>It&#8217;s showing what the AI was doing 30 seconds ago. And the delay is growing.</p><p>Turns out, 40Mbps video streams don&#8217;t appreciate 200ms+ network latency. Who knew.</p><p>When the network gets congested:</p><ol><li><p>Frames buffer up in the TCP/WebSocket layer</p></li><li><p>They arrive in-order (thanks TCP!) but increasingly delayed</p></li><li><p>Video falls further and further behind real-time</p></li><li><p>You&#8217;re watching the AI type code from 45 seconds ago</p></li><li><p>By the time you see a bug, the AI has already committed it to main</p></li><li><p>Everything is terrible forever</p></li></ol><p>&#8220;Just lower the bitrate,&#8221; you say. Great idea. 
Now it&#8217;s 10Mbps of blocky garbage that&#8217;s <em>still</em> 30 seconds behind.</p><div><hr></div><h2>Act III: Bargaining</h2><p>We tried everything:</p><p><strong>&#8220;What if we only send keyframes?&#8221;</strong></p><p>This was our big brain moment. H.264 keyframes (IDR frames) are self-contained. No dependencies on previous frames. Just drop all the P-frames on the server side, send only keyframes, get ~1fps of corruption-free video. Perfect for low-bandwidth fallback!</p><p>We added a <code>keyframes_only</code> flag. We modified the video decoder to check <code>FrameType::Idr</code>. We set GOP to 60 (one keyframe per second at 60fps). We tested.</p><p>We got exactly ONE frame.</p><p>One single, beautiful, 1080p IDR frame. Then silence. Forever.</p><pre><code><code>[WebSocket] Keyframe received (frame 121), sending
[WebSocket] ...
[WebSocket] ...
[WebSocket] It's been 14 seconds why is nothing else coming
[WebSocket] Failed to send audio frame: Closed</code></code></pre><p><em>checks Wolf logs</em> &#8212; encoder still running</p><p><em>checks GStreamer pipeline</em> &#8212; frames being produced</p><p><em>checks Moonlight protocol layer</em> &#8212; <strong>nothing coming through</strong></p><p>We&#8217;re using <a href="https://games-on-whales.github.io/wolf/stable/">Wolf</a>, an excellent open-source game streaming server (seriously, the documentation is great). But our WebSocket streaming layer sits on top of the Moonlight protocol, which is reverse-engineered from NVIDIA GameStream. Somewhere in that protocol stack, <em>something</em> decides that if you&#8217;re not consuming P-frames, you&#8217;re not ready for more frames. Period.</p><p>We poked around for an hour or two, but without diving deep into the Moonlight protocol internals, we weren&#8217;t going to fix this. The protocol wanted all its frames, or no frames at all.</p><p><strong>&#8220;What if we implement proper congestion control?&#8221;</strong></p><p><em>looks at TCP congestion control literature</em></p><p><em>closes tab</em></p><p><strong>&#8220;What if we just... don&#8217;t have bad WiFi?&#8221;</strong></p><p><em>stares at enterprise firewall that&#8217;s throttling everything</em></p><div><hr></div><h2>Act IV: Depression</h2><p>One late night, while debugging why the stream was frozen again, I opened our screenshot debugging endpoint in a browser tab:</p><pre><code><code>GET /api/v1/external-agents/abc123/screenshot?format=jpeg&amp;quality=70</code></code></pre><p>The image loaded instantly.</p><p>A pristine, 150KB JPEG of the remote desktop. Crystal clear. No artifacts. No waiting for keyframes. No decoder state. Just... pixels.</p><p>I refreshed. Another instant image.</p><p>I mashed F5 like a degenerate. 5 FPS of perfect screenshots.</p><p>I looked at my beautiful WebCodecs pipeline. I looked at the JPEGs. 
I looked at the WebCodecs pipeline again.</p><p>No.</p><p>No, we are not doing this.</p><p>We are professionals. We implement proper video codecs. We don&#8217;t spam HTTP requests for individual frames like it&#8217;s 2009.</p><div><hr></div><h2>Act V: Acceptance</h2><pre><code><code>// Poll screenshots as fast as possible (capped at 10 FPS max)
const fetchScreenshot = async () =&gt; {
  const response = await fetch(`/api/v1/external-agents/${sessionId}/screenshot`)
  const blob = await response.blob()
  const previousUrl = screenshotImg.src
  screenshotImg.src = URL.createObjectURL(blob)
  if (previousUrl.startsWith('blob:')) URL.revokeObjectURL(previousUrl) // don't leak one blob per frame
  setTimeout(fetchScreenshot, 100) // yolo
}</code></code></pre><p>We did it. We&#8217;re sending JPEGs.</p><p>And you know what? <strong>It works perfectly.</strong></p><div><hr></div><h2>Why JPEGs Actually Slap</h2><p>Here&#8217;s the thing about our fancy H.264 pipeline:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WSwQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WSwQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png 424w, https://substackcdn.com/image/fetch/$s_!WSwQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png 848w, https://substackcdn.com/image/fetch/$s_!WSwQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png 1272w, https://substackcdn.com/image/fetch/$s_!WSwQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WSwQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png" width="1456" height="457" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:457,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:122027,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/182005849?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WSwQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png 424w, https://substackcdn.com/image/fetch/$s_!WSwQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png 848w, https://substackcdn.com/image/fetch/$s_!WSwQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png 1272w, https://substackcdn.com/image/fetch/$s_!WSwQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93791d6f-fef0-4838-b397-a708f2edd0b1_1956x614.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A JPEG screenshot is <strong>self-contained</strong>. It either arrives complete, or it doesn&#8217;t. There&#8217;s no &#8220;partial decode.&#8221; There&#8217;s no &#8220;waiting for the next keyframe.&#8221; There&#8217;s no &#8220;decoder state corruption.&#8221;</p><p>When the network is bad, you get... fewer JPEGs. That&#8217;s it. The ones that arrive are perfect.</p><p>And the size! A 70% quality JPEG of a 1080p desktop is like <strong>100-150KB</strong>. A single H.264 keyframe is 200-500KB. We&#8217;re sending LESS data per frame AND getting better reliability.</p><div><hr></div><h2>The Hybrid: Have Your Cake and Eat It Too</h2><p>We didn&#8217;t throw away the H.264 pipeline. 
We&#8217;re not <em>complete</em> animals.</p><p>Instead, we built adaptive switching:</p><ol><li><p><strong>Good connection</strong> (RTT &lt; 150ms): Full 60fps H.264, hardware decoded, buttery smooth</p></li><li><p><strong>Bad connection detected</strong>: Pause video, switch to screenshot polling</p></li><li><p><strong>Connection recovers</strong>: User clicks to retry video</p></li></ol><p>The key insight: <strong>we still need the WebSocket for input</strong>.</p><p>Keyboard and mouse events are tiny. Like, 10 bytes each. The WebSocket handles those perfectly even on a garbage connection. We just needed to stop sending the massive video frames.</p><p>So we added one control message:</p><pre><code><code>{"set_video_enabled": false}</code></code></pre><p>Server receives this, stops sending video frames. Client polls screenshots instead. Input keeps flowing. Everyone&#8217;s happy.</p><p>15 lines of Rust. I am not joking.</p><pre><code><code>if !video_enabled.load(Ordering::Relaxed) {
    continue; // skip frame, it's screenshot time baby
}</code></code></pre><div><hr></div><h2>The Oscillation Problem (Lol)</h2><p>We almost shipped a hilarious bug.</p><p>When you stop sending video frames, the WebSocket becomes basically empty. Just tiny input events and occasional pings.</p><p><strong>The latency drops dramatically.</strong></p><p>Our adaptive mode sees low latency and thinks: &#8220;Oh nice! Connection recovered! Let&#8217;s switch back to video!&#8221;</p><p>Video resumes. 40Mbps floods the connection. Latency spikes. Mode switches to screenshots.</p><p>Latency drops. Mode switches to video.</p><p>Latency spikes. Mode switches to screenshots.</p><p><strong>Forever. Every 2 seconds.</strong></p><p>The fix was embarrassingly simple: once you fall back to screenshots, <strong>stay there until the user explicitly clicks to retry</strong>.</p><pre><code><code>setAdaptiveLockedToScreenshots(true) // no oscillation for you</code></code></pre><p>We show an amber icon and a message: &#8220;Video paused to save bandwidth. Click to retry.&#8221;</p><p>Problem solved. User is in control. No infinite loops.</p><h2>Ubuntu Doesn&#8217;t Ship JPEG Support in grim Because Of Course It Doesn&#8217;t</h2><pre><code><code>$ grim -t jpeg screenshot.jpg
error: jpeg support disabled</code></code></pre><p>Oh, you thought we were done? Cute.</p><p><code>grim</code> is a Wayland screenshot tool. Perfect for our needs. Supports JPEG output for smaller files.</p><p>Except Ubuntu compiles it without libjpeg.</p><p><em>incredible</em></p><p>So now our Dockerfile has a build stage that compiles grim from source:</p><pre><code><code>FROM ubuntu:25.04 AS grim-build
RUN apt-get update &amp;&amp; apt-get install -y meson ninja-build libjpeg-turbo8-dev ...
RUN git clone https://git.sr.ht/~emersion/grim &amp;&amp; \
    cd grim &amp;&amp; \
    meson setup build -Djpeg=enabled &amp;&amp; \
    ninja -C build</code></code></pre><p>We&#8217;re building a screenshot tool from source so we can send JPEGs in 2025. This is fine.</p><h2>The Final Architecture</h2><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;                     User's Browser                          &#9474;
&#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
&#9474;  WebSocket (always connected)                               &#9474;
&#9474;  &#9500;&#9472;&#9472; Video frames (H.264) &#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472; when RTT &lt; 150ms     &#9474;
&#9474;  &#9500;&#9472;&#9472; Input events (keyboard/mouse) &#9472;&#9472; always                &#9474;
&#9474;  &#9492;&#9472;&#9472; Control messages &#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472; {"set_video_enabled"} &#9474;
&#9474;                                                             &#9474;
&#9474;  HTTP (screenshot polling) &#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472; when RTT &gt; 150ms    &#9474;
&#9474;  &#9492;&#9472;&#9472; GET /screenshot?quality=70                             &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>Good connection:</strong> 60fps H.264, hardware accelerated, beautiful</p><p><strong>Bad connection:</strong> 2-10fps JPEGs, perfectly reliable, works everywhere</p><p>The screenshot quality adapts too:</p><ul><li><p>Frame took &gt;500ms? Drop quality by 10%</p></li><li><p>Frame took &lt;300ms? Increase quality by 5%</p></li><li><p>Target: minimum 2 FPS, always</p></li></ul><div><hr></div><h2>Lessons Learned</h2><ol><li><p><strong>Simple solutions often beat complex ones.</strong> Three months of H.264 pipeline work. One 2am hacking session the night before production deployment: &#8220;what if we just... screenshots?&#8221;</p></li><li><p><strong>Graceful degradation is a feature.</strong> Users don&#8217;t care about your codec. They care about seeing their screen and typing.</p></li><li><p><strong>WebSockets are for input, not necessarily video.</strong> The input path staying responsive is more important than video frames.</p></li><li><p><strong>Ubuntu packages are missing random features.</strong> Always check. Or just build from source like it&#8217;s 2005.</p></li><li><p><strong>Measure before optimizing.</strong> We assumed video streaming was the only option. 
It wasn&#8217;t.</p></li></ol><div><hr></div><h2>Try It Yourself</h2><p>Helix is source-available: <a href="https://github.com/helixml/helix">github.com/helixml/helix</a></p><p>The shameful-but-effective screenshot code:</p><ul><li><p><code>api/cmd/screenshot-server/main.go</code> &#8212; 200 lines of Go that changed everything</p></li><li><p><code>MoonlightStreamViewer.tsx</code> &#8212; React component with adaptive logic</p></li><li><p><code>websocket-stream.ts</code> &#8212; WebSocket client with <code>setVideoEnabled()</code></p></li></ul><p>The beautiful H.264 pipeline we&#8217;re still proud of:</p><ul><li><p><code>moonlight-web-stream/</code> &#8212; Rust WebSocket server</p></li><li><p>Still used when your WiFi doesn&#8217;t suck</p></li></ul><div><hr></div><p><em>We&#8217;re building Helix, open-source AI infrastructure that works in the real world &#8212; even on terrible WiFi. We started by <a href="https://blog.helix.ml/p/we-killed-webrtc-and-nobody-noticed">killing WebRTC</a>, then we killed our replacement. Sometimes the 15-year-old solution is the right one.</em></p><p>Want to experience the joy of interacting with an agent desktop at 6 JPEGs a second yourself? 
Join us for the private beta on Discord:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.gg/VJftd844GE&quot;,&quot;text&quot;:&quot;Join the Private Beta&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://discord.gg/VJftd844GE"><span>Join the Private Beta</span></a></p><p><em>Star us on GitHub: <a href="https://github.com/helixml/helix">github.com/helixml/helix</a></em></p>]]></content:encoded></item><item><title><![CDATA[We Killed WebRTC (And Nobody Noticed)]]></title><description><![CDATA[We replaced WebRTC with plain WebSockets for real-time GPU streaming and got lower latency, simpler infrastructure, and better reliability everywhere.]]></description><link>https://blog.helix.ml/p/we-killed-webrtc-and-nobody-noticed</link><guid isPermaLink="false">https://blog.helix.ml/p/we-killed-webrtc-and-nobody-noticed</guid><dc:creator><![CDATA[Luke Marsden]]></dc:creator><pubDate>Thu, 11 Dec 2025 23:09:26 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!uVK-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6ac6823-53fa-4485-b35d-65c2770f5cb8_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At <a href="https://helix.ml">Helix</a>, we run AI coding agents in GPU-accelerated containers. Users watch these agents work through a live video stream&#8212;think remote desktop, but for AI. The standard solution for browser-based real-time video is WebRTC.</p><p>After months of TURN server hell, we threw it out and replaced it with plain WebSockets.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The result? Lower latency, simpler infrastructure, and it works everywhere.</p><p><em>(Spoiler: This solution worked so well that we eventually threw it away too. But that&#8217;s a story for next week.)</em></p><h2>The Problem With WebRTC</h2><p>WebRTC is designed for peer-to-peer video calls. It handles NAT traversal, codec negotiation, adaptive bitrate, and packet loss recovery. It&#8217;s an impressive piece of engineering.</p><p>But we don&#8217;t need peer-to-peer. 
Our architecture is strictly client-server:</p><pre><code><code>Browser &#8594; Proxy &#8594; moonlight-web &#8594; Wolf (GPU encoder)</code></code></pre><p>For this use case, WebRTC&#8217;s complexity becomes pure liability.</p><h3>TURN Server Hell</h3><p>Enterprise customers don&#8217;t allow random UDP ports. They have L7 load balancers that only speak HTTP/HTTPS on port 443. WebRTC requires:</p><ul><li><p>UDP 3478 (STUN)</p></li><li><p>TCP 3478 (TURN)</p></li><li><p>UDP 49152-65535 (media relay)</p></li></ul><p>Getting these through a corporate firewall? Good luck. We spent weeks debugging TURN configurations. coturn, Twilio, custom deployments&#8212;each had its own failure modes. The &#8220;TCP fallback&#8221; that TURN promises? In practice, it&#8217;s unreliable and adds 50-100ms of latency.</p><p>We had a customer whose WebRTC connections worked 80% of the time. The other 20%? Black screen. No error message. WebRTC&#8217;s ICE negotiation would silently fail after 30 seconds of &#8220;connecting...&#8221;</p><h3>The Insight</h3><p>Here&#8217;s the thing: <strong>WebSockets work everywhere</strong>. They&#8217;re just HTTP upgrade. Every L7 proxy handles them. CloudFlare, Akamai, nginx, Kubernetes ingress&#8212;all work out of the box.</p><p>And for real-time video, WebSockets might actually be <em>faster</em> than WebRTC in our architecture:</p><ol><li><p><strong>No jitter buffer</strong> - We can render frames immediately</p></li><li><p><strong>No TURN relay</strong> - Direct connection through existing proxy</p></li><li><p><strong>No ICE negotiation</strong> - Connection established in one round-trip</p></li></ol><p>The trade-off is TCP&#8217;s head-of-line blocking. But on modern networks with low packet loss? Barely matters.</p><h2>The Implementation</h2><p>We stream using the <a href="https://github.com/moonlight-stream">Moonlight protocol</a>&#8212;the same tech that powers NVIDIA GameStream. 
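The framing we landed on (detailed in the Binary Protocol section below) is about as simple as it gets: one type byte, then the raw payload. A sketch of the encode/decode pair &#8212; the helper names are ours, not the actual implementation:

```typescript
// Sketch of 1-byte-type message framing: [type][payload...].
// Type values match the table later in this post; helpers are illustrative.
const VIDEO_FRAME = 0x01;

function encodeMessage(type: number, payload: Uint8Array): Uint8Array {
  const buf = new Uint8Array(1 + payload.length);
  buf[0] = type;       // 1-byte message type
  buf.set(payload, 1); // raw payload (e.g. H.264 NAL units)
  return buf;
}

function decodeMessage(buf: Uint8Array): { type: number; payload: Uint8Array } {
  return { type: buf[0], payload: buf.subarray(1) };
}
```

Each WebSocket binary message is one such frame, so there is no packetization layer to reassemble on either side.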
Wolf (our server) encodes video with NVIDIA&#8217;s hardware encoder. The browser decodes and displays it.</p><p>Previously, our architecture looked like this:</p><pre><code><code>Wolf &#8594; Moonlight &#8594; [RTP packets] &#8594; WebRTC &#8594; Browser
                                    &#8593;
                                 TURN server</code></code></pre><p>Now it&#8217;s:</p><pre><code><code>Wolf &#8594; Moonlight &#8594; [NAL units] &#8594; WebSocket &#8594; Browser
                                    &#8593;
                              Your existing HTTPS</code></code></pre><h3>Binary Protocol</h3><p>We defined a minimal binary protocol:</p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474; Type (1B)  &#9474; Payload (variable)                 &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;

Message Types:
  0x01 - Video Frame
  0x02 - Audio Frame
  0x10 - Keyboard Input
  0x11 - Mouse Click
  0x12 - Mouse Position
  0x13 - Mouse Movement</code></code></pre><p>Video frames are raw H264 NAL units&#8212;no RTP packetization. Audio is Opus frames. Input goes the other direction.</p><h3>WebCodecs for Decoding</h3><p>The browser-side uses the <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebCodecs_API">WebCodecs API</a>, which landed in Chrome 94 and recently in Firefox 130:</p><pre><code><code>const decoder = new VideoDecoder({
  output: (frame) =&gt; {
    ctx.drawImage(frame, 0, 0)
    frame.close()
  },
  error: console.error,
})

decoder.configure({
  codec: 'avc1.4d0032',  // H264 Main Profile
  hardwareAcceleration: 'prefer-hardware',
  avc: { format: 'annexb' },  // NAL unit format
})
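```typescript
// Sketch only -- these helpers are assumptions, not from the original post:
// the real header layout is defined by our binary protocol. Assumed here:
// 1 byte message type followed by an 8-byte big-endian PTS in microseconds.
const HEADER_SIZE = 9

function parsePTS(data: Uint8Array): number {
  const view = new DataView(data.buffer, data.byteOffset + 1, 8)
  return Number(view.getBigUint64(0))  // DataView reads big-endian by default
}
```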

ws.onmessage = (event) =&gt; {
  const data = new Uint8Array(event.data)
  if (data[0] === 0x01) {  // Video frame
    decoder.decode(new EncodedVideoChunk({
      type: isKeyframe ? 'key' : 'delta',  // isKeyframe is derived from the frame header (elided)
      timestamp: parsePTS(data),
      data: data.slice(HEADER_SIZE),
    }))
  }
}</code></code></pre><p>Hardware-accelerated H264 decoding, straight to canvas. No MediaSource buffering. No jitter buffer. Frame arrives, frame renders.</p><h3>Audio Sync</h3><p>Audio uses the same approach with <code>AudioDecoder</code> and <code>AudioContext</code>. We schedule playback based on presentation timestamps:</p><pre><code><code>const scheduledTime = audioStartTime + (framePTS - basePTS) / 1_000_000
source.start(Math.max(scheduledTime, audioContext.currentTime))</code></code></pre><p>First audio frame establishes the baseline. Subsequent frames are scheduled relative to it. If a frame arrives too late (&gt;100ms behind), we drop it rather than accumulating latency.</p><h3>Input Forwarding</h3><p>Input goes the other direction&#8212;same WebSocket, same binary format. We reuse the existing Moonlight input protocol:</p><pre><code><code>sendMouseButton(isDown: boolean, button: number) {
  const buf = new Uint8Array([0x02, isDown ? 1 : 0, button])
  ws.send(new Uint8Array([0x11, ...buf]))  // 0x11 = MouseClick
}</code></code></pre><p>Server parses and forwards to the Moonlight stream, which injects into the Linux input subsystem. Click in browser &#8594; click in remote desktop.</p><h2>What We Lost</h2><p>Nothing is free. Here&#8217;s what WebRTC gave us that we had to handle ourselves:</p><h3>1. Adaptive Bitrate</h3><p>WebRTC monitors network conditions and adjusts bitrate automatically. We don&#8217;t. Our bitrate is fixed at connection time. For enterprise deployments on stable networks, this is fine. For variable mobile connections, it might be a problem.</p><h3>2. Packet Loss Recovery</h3><p>WebRTC uses NACK and PLI to request retransmission of lost packets. With TCP, we get reliable delivery but head-of-line blocking. A lost packet stalls the stream until retransmitted.</p><p>In practice? On datacenter-quality networks, packet loss is rare. When it happens, TCP recovers fast enough that users don&#8217;t notice.</p><h3>3. Browser Fallbacks</h3><p>WebCodecs requires Chrome 94+, Safari 16.4+, or Firefox 130+. Older browsers get nothing. We could add MSE-based fallback, but haven&#8217;t needed it&#8212;our users are on modern browsers.</p><h2>What We Gained</h2><h3>Works Everywhere</h3><p>Literally everywhere. No firewall configuration. No TURN servers. No debugging ICE negotiation. The WebSocket connection just... works.</p><h3>Simpler Infrastructure</h3><p>Before:</p><ul><li><p>coturn TURN server (or Twilio, $$$)</p></li><li><p>STUN server</p></li><li><p>ICE configuration management</p></li><li><p>Certificate management for TURN-over-TLS</p></li><li><p>UDP port ranges</p></li></ul><p>After:</p><ul><li><p>Your existing HTTPS proxy</p></li></ul><h3>Lower Latency</h3><p>Without the jitter buffer and TURN relay, we measured 20-30ms lower end-to-end latency. WebRTC&#8217;s adaptive bitrate sometimes caused quality drops that took seconds to recover. Our fixed bitrate is... fixed.</p><h3>Debuggability</h3><p>WebRTC failures are famously opaque. 
&#8220;ICE connection failed&#8221; tells you nothing. WebSocket failures? You get HTTP status codes, error messages, stack traces. When something breaks, you know why.</p><h2>Should You Do This?</h2><p>Probably not, unless:</p><ol><li><p><strong>Your architecture is client-server</strong> - Peer-to-peer genuinely needs WebRTC</p></li><li><p><strong>Your users are behind restrictive firewalls</strong> - If TURN works for you, keep using it</p></li><li><p><strong>You control the encoder</strong> - We use Moonlight/Wolf which gives us raw NAL units</p></li><li><p><strong>Your target browsers support WebCodecs</strong> - No IE11 here</p></li></ol><p>But if you&#8217;re building real-time video streaming to browsers, and WebRTC&#8217;s complexity is killing you, know that there&#8217;s another way.</p><h2>The Code</h2><p>Both repos are open source:</p><ul><li><p><strong><a href="https://github.com/helixml/helix">helix</a></strong> - The frontend + API (TypeScript/React/Go)</p></li><li><p><strong><a href="https://github.com/helixml/moonlight-web-stream">moonlight-web-stream</a></strong> - The streaming server (Rust)</p></li></ul><p>The WebSocket streaming code is on the <code>feature/websocket-only-streaming</code> branch. Look for <code>WebSocketStream</code> in the TypeScript and <code>run_websocket_only_mode</code> in the Rust.</p><div><hr></div><p><em>We&#8217;re building AI coding agents that work in GPU-accelerated containers. If you&#8217;re interested in remote development environments, AI pair programming, or just want to see this streaming tech in action, check out <a href="https://helix.ml">helix.ml</a>.</em></p><p><em>Next week: Why we threw all of this away.</em></p><p><em>&#8212;Luke Marsden, CEO @ Helix </em></p><h2>Discussion Questions </h2><ol><li><p>Has anyone else replaced WebRTC with WebSockets for real-time video? What was your experience?</p></li><li><p>We&#8217;re considering adding WebTransport as an alternative to WebSockets. 
Anyone have experience with it in production?</p></li><li><p>The WebCodecs API is relatively new. Are there edge cases we should watch out for?</p></li></ol><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Why I've Joined HelixML]]></title><description><![CDATA[Introducing our new Head of Engineering, Priya Samuel]]></description><link>https://blog.helix.ml/p/why-ive-joined-helixml</link><guid isPermaLink="false">https://blog.helix.ml/p/why-ive-joined-helixml</guid><dc:creator><![CDATA[Priya Samuel]]></dc:creator><pubDate>Thu, 13 Nov 2025 18:23:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!CEy1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve joined HelixML as <strong>Head of Engineering</strong>. 
I&#8217;m genuinely pleased to be working alongside Luke, Chris, and Phil - and a team that values software craftsmanship and building great products just as much as cultivating a thoughtful, positive company culture.</p><p>Sometimes the start of a new chapter comes from noticing a pattern you can&#8217;t ignore. Over the past year, I kept meeting teams who were excited about what AI could do but were quietly overwhelmed by what it actually took to run it well &#8212; the infrastructure, the privacy concerns, and the messy real-world constraints. I found myself increasingly drawn to those conversations, to the space between possibility and practicality, and to the people trying to bridge it with clarity instead of hype.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CEy1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CEy1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png 424w, https://substackcdn.com/image/fetch/$s_!CEy1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png 848w, https://substackcdn.com/image/fetch/$s_!CEy1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png 1272w, https://substackcdn.com/image/fetch/$s_!CEy1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CEy1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png" 
width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7301832,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/177917486?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CEy1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png 424w, https://substackcdn.com/image/fetch/$s_!CEy1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png 848w, https://substackcdn.com/image/fetch/$s_!CEy1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png 1272w, https://substackcdn.com/image/fetch/$s_!CEy1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7651c7f-e45b-4390-96d6-6c559a9769b2_3000x2250.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Kubecon 2025 - A room packed with enthusiasm!</figcaption></figure></div><p></p><p><strong>My Background</strong></p><p> My background is in MLOps and Identity &amp; Access Management &#8212; building trustworthy, scalable AI systems at the intersection of infrastructure, identity, and machine learning. Over the years, I&#8217;ve led engineering teams at companies like <strong>Elsevier, Dotscience, </strong>and<strong> ThoughtWorks</strong>, helping these organisations bring structure and discipline to complex applications and data platforms. 
What&#8217;s always driven me is creating systems that are both technically sound and human-friendly: clear, open, and built to last.</p><p>Helix is <a href="https://blog.helix.ml/p/building-a-generative-ai-platform">building in the open</a> &#8212; iterating in public, listening to customers, and treating transparency as a strength. </p><p><strong>Why Private GenAI Matters</strong></p><p>One thing my career has taught me is that AI itself is rarely the <em>only</em> hard part. The hard part is everything wrapped around it: secure identities, reliable infrastructure, clear deployment patterns, safety checks, evaluation loops, and the long tail of operational detail that turns a clever model into a dependable system. Helix gets that. We&#8217;re not just building AI; we&#8217;re building the layers that make AI usable, safe, and sustainable inside the boundaries where real organisations operate.</p><p>Helix is tackling a problem every enterprise now faces: how to adopt generative AI without giving up control of data, compliance, or infrastructure. With the release of Helix 2.0, teams can deploy production-ready AI agents on their own infrastructure &#8212; with real CI/CD, testing, versioning, and observability.</p><p>As organisations mature in their AI adoption, private GenAI platforms offer something essential: a reliable, accountable, and transparent path forward.</p><p><strong>Looking Ahead</strong></p><p>As Head of Engineering, my priorities are simple: build sustainably, deliver tangible outcomes for customers, and share our learnings openly as we go.</p><p>Helix represents the kind of company I believe in &#8212; transparent, values-driven, and focused on solving real problems with care and craft. I&#8217;m excited to help shape its next chapter.</p><p>Here&#8217;s to building something meaningful &#8212; and doing it the right way. 
</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Dynamically presenting MCP clients as tools to LLMs in Go]]></title><description><![CDATA[Learn how to transform any MCP server into OpenAI tools that your agent can call without hardcoding any details]]></description><link>https://blog.helix.ml/p/dynamically-presenting-mcp-clients</link><guid isPermaLink="false">https://blog.helix.ml/p/dynamically-presenting-mcp-clients</guid><pubDate>Thu, 06 Nov 2025 13:16:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QChQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The most powerful thing (and the weakest part too :)) is how dynamic the agents are. Depending on the scenario, the applications that encapsulate the LLMs must either relax or tighten the grip to get good results.</p><p>Today we will take a look at a &#8220;relaxed&#8221; plumbing that will enable maximum dynamic behavior. 
We will take in an arbitrary number of remote MCP servers, extract available actions and on the fly convert them to OpenAI tools for our agent to use.</p><p>In this article we will look into main components but for the full code you can check <a href="https://github.com/helixml/helix">our repo</a>:</p><ul><li><p>Store MCP servers</p></li><li><p>Discover MCP capabilities (fetch tools)</p></li><li><p>Convert individual MCP actions into tools (MCP schema &gt; OpenAI tool schema)</p></li></ul><h2>MCP and OpenAI Tools</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3OnA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3OnA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3OnA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3OnA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3OnA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3OnA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg" width="646" height="363.32958380202473" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:500,&quot;width&quot;:889,&quot;resizeWidth&quot;:646,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3OnA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3OnA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3OnA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3OnA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3416e44-a176-4cb3-a837-838682f63739_889x500.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I assume you already know about <a href="https://modelcontextprotocol.io/docs/getting-started/intro">MCPs</a> and <a href="https://platform.openai.com/docs/guides/tools">OpenAI Tools</a> already but I will just quickly brief you about them regardless. </p><p>Tools are presented as name + description + parameters to the LLMs and they are trained/whipped into calling them with non-malformed payloads. They more or less work well, the trick is to keep the schemas relatively simple and provide a few examples. </p><div class="pullquote"><p>&#9888;&#65039; Large schemas (with lots of parameters) can quickly <br>deteriorate even smartest models. 
&#9888;&#65039;</p></div><p>An example tool presentation to the LLM:</p><pre><code>import OpenAI from &#8220;openai&#8221;;
const client = new OpenAI();

const response = await client.responses.create({
    model: "gpt-5",
    tools: [
        { type: "web_search" },
    ],
    input: "What was a positive news story from today?",
});

console.log(response.output_text);</code></pre><p>An example MCP presentation from the <a href="https://github.com/modelcontextprotocol/typescript-sdk">docs</a>:</p><pre><code>...
// List resources
const resources = await client.listResources();

// Read a resource
const resource = await client.readResource({
    uri: 'file:///example.txt'
});

// Call a tool
const result = await client.callTool({
    name: 'example-tool',
    arguments: {
        arg1: 'value'
    }
});
...</code></pre><p>However, this is <strong>not even close</strong> to actual usage; it gets quite verbose, as you end up using a bunch of SDKs and potentially need to marry a third-party framework like LangChain to prep the MCP for use within your application.</p><h2>Keeping track of configured MCP servers</h2><p>To make things work well, my minimal Go struct for MCP configuration ended up as:</p><pre><code>type ToolMCPClientConfig struct {
&#9;Name          string            
&#9;Description   string            
&#9;Enabled       bool              
&#9;URL           string            
&#9;Headers       map[string]string
&#9;OAuthProvider string            
&#9;Tools         []mcp.Tool
}</code></pre><div class="pullquote"><p>This ends up stored in Postgres on a slightly larger struct that contains ID, user ID and agent info, but that&#8217;s not important for this example.</p></div><p>I chose to support only the SSE/streaming HTTP options, as socket-based servers are probably on the way out due to how limited they are, and it&#8217;s also just too painful to get them running on the server side. What each field is:</p><ul><li><p><em><strong>Name</strong></em> - this will be presented as the top-level tool name to the LLM</p></li><li><p><em><strong>Description</strong></em> - when to use this tool (e.g. if it&#8217;s a Postgres MCP); the user adding this MCP server should try to be descriptive here</p></li><li><p><em><strong>URL</strong></em> - where to find the server</p></li><li><p><em><strong>Headers</strong></em> <strong>and OAuthProvider</strong> are optional, but we can use them for authentication later on</p></li><li><p><em><strong>Tools</strong></em> - read-only; the user is not supposed to enter them, we grab the actions from the server during the initial handshake. Let&#8217;s touch on this in the next section :) </p></li></ul><h2>Initial MCP handshake</h2><p>When agents are in their agent cycle mode you don&#8217;t want to call the MCP server all the time, as you would be adding significant latency. </p><p>First we <a href="https://github.com/helixml/helix/blob/main/api/pkg/agent/skill/mcp/mcp_client.go#L37">construct the client</a>:</p><pre><code>...
// Initialize the MCP session
initRequest := mcp.InitializeRequest{
  Params: mcp.InitializeParams{
    ProtocolVersion: mcp.LATEST_PROTOCOL_VERSION,
    Capabilities:    mcp.ClientCapabilities{},
    ClientInfo: mcp.Implementation{
      Name:    "helix-http-client",
      Version: data.GetHelixVersion(),
    },
  },
}

_, err = mcpClient.Initialize(ctx, initRequest)
if err != nil {
  return nil, err
}
return mcpClient, nil</code></pre><p>Then connect, authenticate and list the tools:</p><pre><code>import (
  ...
  "github.com/mark3labs/mcp-go/mcp"
)

func InitializeMCPClientSkill(ctx context.Context, clientGetter ClientGetter, meta agent.Meta, oauthManager *oauth.Manager, cfg *types.AssistantMCP) (*types.ToolMCPClientConfig, error) {
  mcpClient, err := clientGetter.NewClient(ctx, meta, oauthManager, cfg)
  if err != nil {
    return nil, err
  }

  // List tools, server description
  toolsResp, err := mcpClient.ListTools(ctx, mcp.ListToolsRequest{})
  if err != nil { 
    return nil, err
  } 
  return &amp;types.ToolMCPClientConfig{
    Name:        cfg.Name,
    Description: cfg.Description,
    Tools:       toolsResp.Tools, 
  }, nil
}</code></pre><p>Here the retrieved tools will act as a cache for all the iterations the agent goes through. </p><h2>Presenting MCP Tools as OpenAI Tools</h2><p>The first thing that needs to happen for the LLM to be able to use a tool is seeing its name, description and parameters. </p><p>For each OpenAI tool within Helix we define an interface where one of the methods is:</p><pre><code>func (t *MCPClientTool) OpenAI() []openai.Tool {
  return []openai.Tool{
    {
      Type: openai.ToolTypeFunction,
      Function: &amp;openai.FunctionDefinition{
        Name:        "mcp_" + t.mcpTool.Name, // prefix to namespace tools from duplicate MCPs
        Description: t.mcpTool.Description,
        Parameters:  buildParameters(t.mcpTool.InputSchema),
      },
    },
  }
}</code></pre><p>Most of the useful work happens in buildParameters, which <a href="https://github.com/helixml/helix/blob/main/api/pkg/agent/skill/mcp/mcp_skill.go#L138-L265">can be found here</a>. To get an idea, we have to recursively convert a <em>map[string]any</em> into a <em>jsonschema.Definition</em>:</p><pre><code>func convertMapToDefinition(data map[string]any) jsonschema.Definition {
  def := jsonschema.Definition{}

  // Handle type - ensure we always have a valid type
  if typeVal, ok := data["type"].(string); ok &amp;&amp; typeVal != "" {
    switch typeVal {
    case "string":
      def.Type = jsonschema.String
    case "integer":
      def.Type = jsonschema.Integer
    ...
    ...
  // Handle properties (recursive)
  if props, ok := data["properties"].(map[string]any); ok {
    properties := make(map[string]jsonschema.Definition)
    for key, prop := range props {
      if propMap, ok := prop.(map[string]any); ok {
        properties[key] = convertMapToDefinition(propMap)
      }
    }
    if len(properties) &gt; 0 {
      def.Properties = properties
      def.Type = jsonschema.Object
    }
  ...
  // Handle items (for arrays)
  if items, ok := data["items"].(map[string]any); ok {
    itemsDef := convertMapToDefinition(items)
    def.Items = &amp;itemsDef
  }
  return def
}</code></pre><p>The goal here is to take whatever comes from the MCP side and convert it to the OpenAI tools format. Both are JSON in the end; however, the formats differ slightly.</p><blockquote><p>I have noticed that certain MCP tools work with some models but not with others. A good example is the HubSpot MCP not working with Google Gemini models but working fine with OpenAI ones.</p></blockquote><h2>Calling the MCP tools from your agent</h2><p>Thankfully, the MCP tool parameter conversion is the only hard part; once that&#8217;s done, the actual execution is pretty easy:</p><pre><code>import (
  ...
  "github.com/mark3labs/mcp-go/mcp"
)

func (t *MCPClientTool) Execute(ctx context.Context, meta agent.Meta, args map[string]any) (string, error) {
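  // Each tool execution builds a fresh MCP client; the clientGetter takes
  // care of OAuth tokens and custom headers for the target server.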
  client, err := t.clientGetter.NewClient(ctx, meta, t.oauthManager, &amp;types.AssistantMCP{
    URL:           t.cfg.URL,
    Headers:       t.cfg.Headers, 
    OAuthProvider: t.cfg.OAuthProvider,
    OAuthScopes:   t.cfg.OAuthScopes,
  })
  if err != nil {
    return "", err
  }

  req := mcp.CallToolRequest{}
  req.Params.Name = t.mcpTool.Name 
  req.Params.Arguments = args
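
  // For example (hypothetical tool), a weather skill would send
  //   Name: "get_weather", Arguments: map[string]any{"city": "London"}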

  res, err := client.CallTool(ctx, req)

  if err != nil {...}
  
  var results []string
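  // Collect only the text content parts from the response; MCP content
  // can also carry other types (images, resources).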
  for _, content := range res.Content {
    switch content := content.(type) {
    case mcp.TextContent:
      results = append(results, content.Text)
   ...</code></pre><p>That&#8217;s it, your agent can now talk to any MCP server directly. </p><h2>Ok, you are ready to face the world</h2><p>First thing to do is of course connecting your agent to the production database! </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QChQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QChQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QChQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QChQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QChQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QChQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg" width="1280" height="720" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Monkey With AK-47 - Coub&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Monkey With AK-47 - Coub" title="Monkey With AK-47 - Coub" srcset="https://substackcdn.com/image/fetch/$s_!QChQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QChQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QChQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QChQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68f41701-b300-4d96-8dcd-38c524381367_1280x720.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Good luck! 
</p>]]></content:encoded></item><item><title><![CDATA[Technical Deep Dive on Streaming AI Agent Desktop Sandboxes: When Gaming Protocols Meet Multi-User Access]]></title><description><![CDATA[When we started building sandboxes for AI agents at Helix, we wanted to give each agent their own desktop environments that we could stream interactively to users&#8217; browsers.]]></description><link>https://blog.helix.ml/p/technical-deep-dive-on-streaming</link><guid isPermaLink="false">https://blog.helix.ml/p/technical-deep-dive-on-streaming</guid><dc:creator><![CDATA[Luke Marsden]]></dc:creator><pubDate>Thu, 30 Oct 2025 17:47:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XcJA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When we started building sandboxes for AI agents at Helix, we wanted to give each agent their own desktop environments that we could stream interactively to users&#8217; browsers. Not just static screenshots - full interactive desktops where agents could browse the web, write code, and use tools, in collaboration with their human colleagues. We looked at VNC, RDP, and various browser-based solutions, but kept coming back to Moonlight.</p><p>Moonlight is a game streaming protocol, originally designed to stream PC games to your couch. It&#8217;s fast, efficient, and works beautifully over sketchy network connections. There was just one problem: it was built for single-player gaming, and we needed multi-user agent access.</p><p>This is the story of how we bent a gaming protocol to our will, and why we&#8217;re still working through the consequences.</p><h2>Why Stream Desktops for AI Agents?</h2><p>Most AI coding assistants live in your IDE or terminal. But what if your agent needs to actually see the screen, click buttons, navigate UIs? 
What if you want to watch your agent work in real-time, or collaborate with it in a proper IDE? And what if you want your agent to be able to run on the server while it does all of this, so it can benefit from a good network connection while you open and close your laptop in cafes, on the train, even on the beach?</p><p>That&#8217;s what we&#8217;re building with Helix Code. We run full Linux desktop environments in containers, each with a GPU attached. Inside each desktop runs an AI agent with access to development tools - Claude, code editors, browsers, terminals. Users connect to watch and interact with these agents as they work. And also get a 30,000-foot view of their fleet of agents, because we&#8217;re all going to become managers of coding agents whether we like it or not.</p><p>The challenge: how do you efficiently stream these GPU-accelerated desktops to browsers and native clients, with low latency, across variable network conditions?</p><h2>Enter Moonlight (and Wolf)</h2><p>The Moonlight protocol was originally created by NVIDIA for their GameStream technology. It&#8217;s designed to stream high-framerate, low-latency video from a gaming PC to another device. Think playing Cyberpunk on your iPad from your gaming rig upstairs.</p><p>We use <a href="https://github.com/games-on-whales/wolf">Wolf</a>, a C++ implementation of the Moonlight server that runs in containers. Wolf exposes the Moonlight protocol, and clients can connect using Moonlight-web in the browser or native Moonlight clients on Mac, Windows, Linux, Android, iOS.</p><p>The setup is elegant: Wolf manages Docker containers with GPU attachment, Moonlight handles the video streaming, and we get hardware-accelerated desktop streaming working smoothly over 4G.</p><p>There&#8217;s just one catch.</p><h2>The Protocol Mismatch</h2><p>Moonlight was designed around a simple mental model: one user, streaming one game at a time. You connect, you launch Steam, you play. 
You can disconnect and reconnect, and your game is still running. But each client gets its own instance.</p><p>Here&#8217;s where it breaks for us:</p><p><strong>Moonlight expects</strong>: Each client connects to start their own private game session<br><strong>We need</strong>: Multiple users connecting to the same shared agent session</p><p>In Moonlight&#8217;s world, if two clients try to start Steam, they each get separate Steam instances. That&#8217;s great for gaming - you wouldn&#8217;t want your roommate&#8217;s controller inputs affecting your game.</p><p>But for us, if two people connect to the same AI agent, we don&#8217;t want two separate agent instances. We want them both watching and potentially interacting with the same agent doing the same work. The agent has identity and state - it&#8217;s logged into services, it has files open, it&#8217;s in the middle of tasks.</p><p>The semantics just don&#8217;t match.</p><h2>Apps Mode: Our First Workaround</h2><p>In &#8220;apps mode&#8221; (standard Moonlight protocol), Wolf creates containers on-demand when the first client connects. This presents another problem: when does the agent actually start?</p><p>We want agents to start automatically when users drag tasks onto a Kanban board, or when the system kicks off autonomous work. We can&#8217;t wait for someone to connect with a browser before the agent starts running.</p><p>Our solution was a bit of a hack: the Helix API pretends to be a Moonlight client.</p><p>When Helix starts a new agent session, it makes a WebSocket connection to Moonlight-web, pretending to be a browser. It initiates a &#8220;kickoff session&#8221; that starts the container and establishes fixed video parameters (4K, 60fps). Then it immediately disconnects.</p><p>Now the agent is running, the desktop is up, and real users can connect to it.</p><p>But we still have the multi-client problem. 
If someone connects with an external Moonlight client and starts an agent, they get a completely separate container from the one running in the browser. You end up with multiple &#8220;Zed&#8221; IDE instances, all thinking they&#8217;re the same agent, all trying to stream back, treading on each other&#8217;s toes.</p><p>Apps mode is stable, but it&#8217;s fundamentally single-user.</p><h2>Lobbies Mode: The Real Solution</h2><p>Wolf recently added &#8220;lobbies mode&#8221; - a feature explicitly designed for multiplayer gaming scenarios. Split-screen gaming, multiple controllers, shared screens.</p><p>This is exactly what we need.</p><p>In lobbies mode:</p><ul><li><p>You start a lobby through Wolf&#8217;s API</p></li><li><p>The container starts immediately (no need for our kickoff hack)</p></li><li><p>Multiple clients can connect to the same lobby</p></li><li><p>Everyone sees the same screen</p></li><li><p>Screen resolution is pre-configured, not determined by the first connecting client</p></li></ul><p>We&#8217;re currently migrating to lobbies mode. It solves our fundamental architecture problems:</p><ul><li><p>Multiple users can connect to the same agent</p></li><li><p>Agents start without any client connection needed</p></li><li><p>Browser and native clients can connect to the same session</p></li><li><p>We can delete all the kickoff session complexity</p></li></ul><h2>The Current Reality (And Remaining Bugs)</h2><p>Lobbies mode is still being stabilized. A few weeks ago it had memory leaks and stability issues. The Wolf maintainer has done heroic work making it production-ready, but we&#8217;re still ironing out bugs:</p><p><strong>Input scaling is broken</strong>: When you connect with a different screen resolution than the lobby was configured for, Wolf rescales the video correctly, but mouse coordinates scale wrong. 
Click where you see a button, hit somewhere else entirely.</p><p><strong>Video corruption on some clients</strong>: Connecting from Mac sometimes results in corrupted video streams. Still debugging.</p><p><strong>Resolution flexibility</strong>: In apps mode, each client could negotiate its own optimal resolution. In lobbies mode, we pre-configure the resolution when creating the agent. We let users choose (including &#8220;iPhone 15 vertical&#8221; because streaming to phones would be cool), but it&#8217;s less dynamic.</p><p>We&#8217;re running apps mode for development right now because it&#8217;s stable, even with its limitations. But lobbies mode is the future.</p><h2>What This Looks Like In Practice</h2><p>Here&#8217;s the architecture:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XcJA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XcJA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png 424w, https://substackcdn.com/image/fetch/$s_!XcJA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png 848w, https://substackcdn.com/image/fetch/$s_!XcJA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png 1272w, 
https://substackcdn.com/image/fetch/$s_!XcJA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XcJA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png" width="1456" height="866" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:866,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:576725,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/177309822?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XcJA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png 424w, https://substackcdn.com/image/fetch/$s_!XcJA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png 848w, 
https://substackcdn.com/image/fetch/$s_!XcJA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png 1272w, https://substackcdn.com/image/fetch/$s_!XcJA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2be9ee1-3b89-4c45-919e-a5bfd3a8013a_2710x1612.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Helix API</strong>: Manages agent sessions, talks to Wolf to create/destroy containers<br><strong>Moonlight-web</strong>: 
WebRTC adapter that bridges browser clients to Moonlight protocol<br><strong>Wolf</strong>: Moonlight server running in Kubernetes, managing GPU-attached containers<br><strong>Desktop containers</strong>: Sway (Wayland compositor) running on gst-wayland-src, with full desktop environment<br><strong>External clients</strong>: Native Moonlight clients on Mac/Windows/Linux/iOS/Android</p><p>The video stream uses WebRTC from browser to Moonlight-web, then Moonlight protocol from there to Wolf. Control signals (connection &amp; encryption setup) flow through websockets. Wolf handles GPU attachment and video encoding. The desktop runs real GUI applications in GPU-accelerated Wayland, not VNC or RDP forwarding.</p><p>You can watch an AI agent browse the web, write code in a real IDE, run commands in a real terminal, all streamed to your browser with gaming-grade latency.</p><h2>Why This Matters</h2><p>Streaming protocols matter a lot when you&#8217;re building visual AI agents. The latency, video quality, and network resilience all affect the user experience. Moonlight gives us:</p><ul><li><p><strong>Low latency</strong>: 50-100ms typically, works over 4G</p></li><li><p><strong>Hardware encoding</strong>: GPU-accelerated H.264/H.265</p></li><li><p><strong>Network resilience</strong>: Designed for unreliable wireless</p></li><li><p><strong>Multi-platform</strong>: Works everywhere without custom apps</p></li><li><p><strong>Mature protocol</strong>: Battle-tested by millions of gamers</p></li></ul><p>But we had to work within constraints designed for different semantics. Gaming protocols assume private, single-user sessions. AI agents need shared, multi-user sessions. The impedance mismatch creates real engineering challenges.</p><h2>What We Learned</h2><p><strong>Protocol assumptions run deep</strong>: Even when a protocol is technically capable of what you need, the assumptions baked into the design can bite you. 
Moonlight&#8217;s one-app-per-client model is fundamental.</p><p><strong>Workarounds compound complexity</strong>: Our kickoff session hack worked, but added a whole layer of complexity. Sometimes you need to wait for the right feature (lobbies) rather than building around limitations.</p><p><strong>Multiplayer gaming has solved this</strong>: The gaming community has already solved shared-screen streaming. We just needed to find the right mode and wait for it to stabilize.</p><p><strong>Open source saves the day</strong>: Wolf&#8217;s maintainer added lobbies mode based on real user needs (ours included). Being able to work directly with the developer and contribute back is why we love open source infrastructure.</p><h2>What&#8217;s Next</h2><p>We&#8217;re actively migrating to lobbies mode. Once we fix the input scaling and video corruption bugs, we&#8217;ll have proper multi-user agent support. At that point, you&#8217;ll be able to:</p><ul><li><p>Connect with native Moonlight clients to watch agents work</p></li><li><p>Have multiple people viewing the same agent session</p></li><li><p>Remove all the kickoff session complexity from our codebase</p></li><li><p>Support mobile clients properly with pre-configured resolutions</p></li></ul><p>If you&#8217;re building anything involving desktop streaming, especially for non-gaming use cases, check out <a href="https://github.com/games-on-whales/wolf">Wolf</a>. 
And if you&#8217;re curious about Helix Code or want to try streaming AI agent desktops, join our private beta via our Discord.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.gg/VJftd844GE&quot;,&quot;text&quot;:&quot;Join the Private Beta&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://discord.gg/VJftd844GE"><span>Join the Private Beta</span></a></p><p>Oh, and here&#8217;s proof it can stream 4K video way nicer than RDP or VNC!</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;cfbc708b-89b5-4437-9c0d-4770dbae147d&quot;,&quot;duration&quot;:null}"></div><p></p>]]></content:encoded></item><item><title><![CDATA[Is MCP authentication that complicated?]]></title><description><![CDATA[Let's take a closer look!]]></description><link>https://blog.helix.ml/p/is-mcp-authentication-that-complicated</link><guid isPermaLink="false">https://blog.helix.ml/p/is-mcp-authentication-that-complicated</guid><dc:creator><![CDATA[Chris Sterry]]></dc:creator><pubDate>Sat, 18 Oct 2025 12:57:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HsL9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>MCP started with a basic implementation using local socket to communicate. Nobody really liked it but then when the time came to add authentication lots of vibe coders started to imagine that it&#8217;s very complicated. 
We even saw these kind of presentations:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HsL9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HsL9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HsL9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HsL9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HsL9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HsL9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg" width="1456" height="1129" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1129,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1879148,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/176078384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HsL9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HsL9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HsL9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HsL9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc23126f-a964-4cd0-88b5-683d7c1adf21_3745x2905.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>but we have been doing this forever. 
We have been implementing auth flows with GitHub, Google, etc. for ages and it&#8217;s exactly the same flow. You authenticate the user and this time instead of just using the token to get person&#8217;s Google avatar or their GitHub repos we store the token so we can use it for MCP calls.</p><p>Time to try MCP auth for ourselves.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YTlQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YTlQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YTlQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YTlQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YTlQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YTlQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg" width="675" height="499" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:499,&quot;width&quot;:675,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YTlQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YTlQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YTlQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YTlQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333d1b76-b5d8-47d9-a48c-039c20371a16_675x499.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 
7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>How Helix OAuth works</h2><p>In Helix we first introduced OAuth support to be able to authenticate to third party APIs on behalf of the user. This enables users to automate various tasks that can be done by the agent. 
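</p><p>Under the hood there is no extra magic: the stored OAuth access token is simply replayed as a Bearer credential on the MCP calls made for the user. A minimal sketch of the idea (the helper names and framing are illustrative, not Helix source code):</p><pre><code>import json

def mcp_headers(access_token: str) -> dict:
    # The OAuth access token stored at connection time becomes
    # a standard Bearer credential on every MCP request.
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }

def mcp_request(method: str, params: dict, request_id: int = 1) -> str:
    # HTTP-based MCP servers speak JSON-RPC 2.0; this builds one request body.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })</code></pre><p>POSTing something like <code>mcp_request("tools/list", {})</code> with those headers to the server&#8217;s URL is essentially all an authenticated MCP call is.</p><p>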
It&#8217;s a two step process:</p><ol><li><p>Enabling the supported provider in the admin dashboard</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kGBf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kGBf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png 424w, https://substackcdn.com/image/fetch/$s_!kGBf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png 848w, https://substackcdn.com/image/fetch/$s_!kGBf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!kGBf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kGBf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png" width="1456" height="952" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:952,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:164917,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/176078384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kGBf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png 424w, https://substackcdn.com/image/fetch/$s_!kGBf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png 848w, https://substackcdn.com/image/fetch/$s_!kGBf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!kGBf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a9a199-0b0f-443d-9ec1-3f8a4b263705_1646x1076.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol start="2"><li><p>Connecting it in the user&#8217;s <a href="https://app.helix.ml/oauth-connections">OAuth connections page</a>:</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1WNY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1WNY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png 424w, 
https://substackcdn.com/image/fetch/$s_!1WNY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png 848w, https://substackcdn.com/image/fetch/$s_!1WNY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png 1272w, https://substackcdn.com/image/fetch/$s_!1WNY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1WNY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png" width="1456" height="931" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:931,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:178307,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/176078384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!1WNY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png 424w, https://substackcdn.com/image/fetch/$s_!1WNY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png 848w, https://substackcdn.com/image/fetch/$s_!1WNY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png 1272w, https://substackcdn.com/image/fetch/$s_!1WNY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5aec42a-ad5b-4342-942e-2d123c0a4f27_1869x1195.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Using HubSpot MCP</h2><p>One fun thing that I thought of was trying out <a href="https://developers.hubspot.com/mcp">HubSpot&#8217;s MCP</a>. It allows LLMs to query HubSpot&#8217;s database so you can get information about various deals. You can create a new agent by visiting <a href="https://app.helix.ml/new-agent">https://app.helix.ml/new-agent</a>.</p><p>You can refine the agent&#8217;s system prompt endlessly, but something as simple as this will do the trick:</p><blockquote><p>You are a helpful AI assistant called Helix. Today is {{ .LocalDate }}, local time is {{ .LocalTime }}. You can access HubSpot CRM data through the MCP tools that are provided to you.</p></blockquote><p>Select the gpt-4o-mini model.</p><p>Then, open the Skills tab and add the HubSpot configuration under the MCP tab.
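</p><p>As a rough declarative sketch, the skill boils down to three fields (the YAML below is illustrative shorthand, not the exact Helix agent schema):</p><pre><code>skills:
  mcp:
    - name: hubspot
      url: https://mcp.hubspot.com/
      oauth_provider: hubspot  # must match the provider enabled in the admin dashboard</code></pre><p>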
Details:</p><ul><li><p>name: hubspot</p></li><li><p>MCP server URL: https://mcp.hubspot.com/</p></li><li><p>OAuth Configuration: select the HubSpot provider</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QOCY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QOCY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png 424w, https://substackcdn.com/image/fetch/$s_!QOCY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png 848w, https://substackcdn.com/image/fetch/$s_!QOCY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png 1272w, https://substackcdn.com/image/fetch/$s_!QOCY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QOCY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png" width="1456" height="848"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:848,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184414,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/176078384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QOCY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png 424w, https://substackcdn.com/image/fetch/$s_!QOCY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png 848w, https://substackcdn.com/image/fetch/$s_!QOCY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png 1272w, https://substackcdn.com/image/fetch/$s_!QOCY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F850f3663-d146-4ccc-8446-9642ad664399_1955x1138.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Trying it out</h2><p>We can go to any other tab that has a preview side panel and try it out:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LuuE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LuuE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png 424w, 
https://substackcdn.com/image/fetch/$s_!LuuE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png 848w, https://substackcdn.com/image/fetch/$s_!LuuE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!LuuE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LuuE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png" width="1456" height="939" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:939,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:484970,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/176078384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!LuuE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png 424w, https://substackcdn.com/image/fetch/$s_!LuuE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png 848w, https://substackcdn.com/image/fetch/$s_!LuuE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png 1272w, https://substackcdn.com/image/fetch/$s_!LuuE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea705d72-97ee-474e-91b1-b3923dc4efcd_1678x1082.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>You can also visit &#8220;Usage&#8221; tab to view how the agent approached the task:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z24u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z24u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png 424w, https://substackcdn.com/image/fetch/$s_!Z24u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png 848w, https://substackcdn.com/image/fetch/$s_!Z24u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png 1272w, https://substackcdn.com/image/fetch/$s_!Z24u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Z24u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png" width="1456" height="848" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:848,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:149275,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/176078384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z24u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png 424w, https://substackcdn.com/image/fetch/$s_!Z24u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png 848w, https://substackcdn.com/image/fetch/$s_!Z24u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png 1272w, https://substackcdn.com/image/fetch/$s_!Z24u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa7cd21-f4c5-4ecb-9d76-6cdc8b7740ee_1464x853.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This tab is instrumental in building reliable agents. 
You can view all requests, responses, how long they took and how many tokens were consumed.</p><h2>Next steps:</h2><p>You can combine this with Helix Tasks that can run agents on a schedule:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zYPu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zYPu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png 424w, https://substackcdn.com/image/fetch/$s_!zYPu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png 848w, https://substackcdn.com/image/fetch/$s_!zYPu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png 1272w, https://substackcdn.com/image/fetch/$s_!zYPu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zYPu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png" width="1178" height="957" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:957,&quot;width&quot;:1178,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90361,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/176078384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zYPu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png 424w, https://substackcdn.com/image/fetch/$s_!zYPu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png 848w, https://substackcdn.com/image/fetch/$s_!zYPu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png 1272w, https://substackcdn.com/image/fetch/$s_!zYPu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18e1afb-5b5c-4f0f-96e7-521240b0dd24_1178x957.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Also, feel free to iterate on the system prompt to improve the report structure. </p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[GPU-Accelerated AI Agent Sandboxes: Rethinking How We Interact with Coding Agents]]></title><description><![CDATA[I got this working in a coffee shop a few hours ago, and I&#8217;m genuinely excited about it.]]></description><link>https://blog.helix.ml/p/gpu-accelerated-ai-agent-sandboxes</link><guid isPermaLink="false">https://blog.helix.ml/p/gpu-accelerated-ai-agent-sandboxes</guid><dc:creator><![CDATA[Chris Sterry]]></dc:creator><pubDate>Wed, 15 Oct 2025 16:02:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/wDFeCGwD_R0" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I got this working in a coffee shop a few hours ago, and I&#8217;m genuinely excited about it. Not because it&#8217;s fancy new tech for the sake of it, but because it solves some real pain points I&#8217;ve been hitting with AI coding agents.</p><p>Let me show you what I mean.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2><strong>The Problem: Agents Need Better Infrastructure</strong></h2><p>Here&#8217;s where we are with AI coding in 2025: The LLMs themselves are plateauing. We&#8217;re not getting exponential intelligence gains anymore - we&#8217;re on more of an S-curve where things went up fast and now they&#8217;re leveling off. GPT-5 was... fine. Claude 4.5 is quite good. But they&#8217;re not going to magically solve all our problems.</p><p>This matters because current coding agents still make plenty of mistakes. And when you combine that with how most agents are architected - typically as JavaScript/TypeScript applications running on your laptop - you hit some fundamental limitations:</p><p><strong>Performance issues:</strong> My main dev machine at home is a 16-core CPU from 2018. It was state of the art back then. Cursor is basically unusable on it. Even Claude Code starts grinding to a halt when you have lots of threads or messages. And I&#8217;m not running some ancient potato - this is a machine with plenty of cores.</p><p><strong>Limited workflows:</strong> Background agents exist, but they&#8217;re either clunky separate UIs (looking at you, Cursor&#8217;s rushed implementation) or they require your laptop to stay open and connected.</p><p><strong>No fleet management:</strong> What if you want to manage 5 agents working on different tasks simultaneously? What if you want a 30,000-foot dashboard view of what your agents are doing?</p><p>The core insight here is that <strong>agents should run on servers, not laptops</strong>. 
When your agent is a long-running server process, you can close your laptop, get on a train with dodgy internet, and your agent keeps working. You can kick off background tasks from Slack. You can manage fleets of agents.</p><p>But how do you make that feel as smooth as a local IDE?</p><h2><strong>Enter: GPU-Accelerated Agent Sandboxes</strong></h2><p>Here&#8217;s what we built: Each agent gets its own dedicated desktop environment running on a GPU. Not a VNC session that feels like molasses. An actual GPU-accelerated Linux desktop that runs at 120fps and responds instantly to keystrokes.</p><p>The architecture looks like this:</p><ol><li><p><strong>Helix manages the control plane</strong> - You interact with agents through the Helix UI, which handles orchestration, knowledge sources, and conversation history</p></li><li><p><strong>Each agent spins up a containerized desktop</strong> - When you start a coding task, we launch a dedicated environment with Zed (the Rust-based IDE) and your choice of agent (Claude Code, Gemini CLI, or Qwen Code)</p></li><li><p><strong>Moonlight protocol for streaming</strong> - We expose the desktop via Moonlight, which the gaming community built for streaming games from home rigs to phones over 5G. Turns out it works great for streaming IDEs too.</p></li></ol><p>The result? You can work with your agent in the browser, getting full GPU-accelerated rendering. Or you can use the Moonlight client on your phone, tablet, or laptop and get the same smooth experience. The agent keeps running on the server whether you&#8217;re connected or not.</p><h2><strong>Why This Architecture Matters</strong></h2><p><strong>1. It works with any agent, any LLM</strong></p><p>The Zed team created this protocol called ACP (Agent Client Protocol) that standardizes how agents talk to IDEs.
This means we can plug in:</p><ul><li><p>Claude Code (running Anthropic&#8217;s models)</p></li><li><p>Gemini CLI (running Google&#8217;s models)</p></li><li><p>Qwen Code (fully open source, runs entirely on your infrastructure)</p></li></ul><p>We&#8217;re not betting on one agent framework or trying to build our own. We&#8217;re adopting the best tools the community builds and making them work together.</p><p><strong>2. Full context for agents</strong></p><p>When you configure knowledge sources, upload PDFs, integrate with Confluence or Jira, or add MCP servers - all of that gets mirrored into the agent&#8217;s environment. Your agent has the same context you would, but it&#8217;s running in a sandbox.</p><p><strong>3. RAG over your entire team&#8217;s work</strong></p><p>Here&#8217;s where it gets interesting: All conversation history from every agent flows back through Helix. That means you can RAG over your team&#8217;s coding sessions. Every time someone&#8217;s agent solves a problem, that solution becomes searchable for everyone else. It&#8217;s like having your whole team&#8217;s problem-solving experience in a searchable database.</p><p><strong>4. Spec coding by default</strong></p><p>I&#8217;m a big believer in spec coding as the antidote to &#8220;vibe coding.&#8221; The idea is simple: Instead of giving your agent vague instructions like &#8220;add OAuth support,&#8221; you:</p><ul><li><p>Have the agent analyze your codebase and generate a design document</p></li><li><p>Review the spec as a human (catch the stupid ideas before any code is written)</p></li><li><p>Only then implement the spec</p></li></ul><p>We&#8217;re building spec workflows directly into the infrastructure, including a Kanban board for managing agent tasks. Not for teams of humans - for fleets of agents.</p><h2><strong>The Technical Details (For Those Who Care)</strong></h2><p>The gaming community already solved most of the hard problems here. 
There&#8217;s this project called Games on Whales (whales = Docker containers) that lets you run GPU-accelerated gaming in containers using Wayland.</p><p>We&#8217;re building on top of that foundation:</p><ul><li><p><strong>Wayland desktop</strong>: Only uses a few MB of GPU memory, so you can run dozens of these on a single GPU</p></li><li><p><strong>Moonlight streaming</strong>: Battle-tested by gamers streaming over 5G networks</p></li><li><p><strong>Container isolation</strong>: Each agent gets its own filesystem, preventing agents from stepping on each other&#8217;s toes</p></li><li><p><strong>Zed for the IDE</strong>: Written entirely in Rust with a custom UI library that renders directly to the GPU. It&#8217;s fast. Like, actually fast - not &#8220;fast for an Electron app.&#8221;</p></li></ul><p>The beauty is that these don&#8217;t need fancy GPUs like LLMs do. You can run this on an old laptop with Intel integrated graphics and it works fine. For a production deployment, you can fit ~100 of these instances on a single 16GB GPU.</p><h2><strong>What This Enables</strong></h2><p>When agents only need your attention twice an hour instead of constantly, you can have a fundamentally different interaction mode:</p><ul><li><p><strong>Ambient computing</strong>: Get a WhatsApp message from your agent when it needs input, respond with a voice note</p></li><li><p><strong>Fleet management</strong>: See all your active agents working on different tasks, with visual thumbnails of what they&#8217;re doing</p></li><li><p><strong>Long-running personal environments</strong>: Not just task-based agents, but your daily driver development environment that happens to run in the cloud with GPU acceleration</p></li></ul><p>And here&#8217;s the part that gets me most excited: We can use this ourselves to make Helix better. The snake eating its own tail. 
Our development team using the product we&#8217;re building, using it to make itself better, faster and faster.</p><h2><strong>The Demo</strong></h2><div id="youtube2-wDFeCGwD_R0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;wDFeCGwD_R0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/wDFeCGwD_R0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>In the video above, you can see:</p><ul><li><p>Spinning up an agent with dedicated desktop environment</p></li><li><p>The Moonlight connection (complete with PIN for security)</p></li><li><p>Claude Code 4.5 building a to-do list app in real-time</p></li><li><p>Updating the branding mid-stream</p></li><li><p>Smooth, GPU-accelerated UI throughout</p></li></ul><p>The agent has access to a full browser (Firefox), can run commands, and gets all the knowledge sources we configured in Helix.</p><h2><strong>Now Open for Private Beta</strong></h2><p>If you want early access:</p><ol><li><p>Join our Discord community and request an invite to be among the first to experience the future of software development.<br></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.gg/VJftd844GE&quot;,&quot;text&quot;:&quot;Join the Private Beta&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://discord.gg/VJftd844GE"><span>Join the Private Beta</span></a></p></li><li><p><strong>Connect with me on LinkedIn</strong> - <a href="https://linkedin.com/in/luke-marsden-71b3789/">linkedin.com/in/luke-marsden-71b3789</a></p></li><li><p><strong>Try Helix</strong> - Even without the agent sandboxes, Helix is a complete private GenAI stack you 
can run on your infrastructure. Check it out at<a href="https://helix.ml"> helix.ml</a></p></li></ol><p>We&#8217;re especially interested in feedback from teams that:</p><ul><li><p>Run their own GPU infrastructure</p></li><li><p>Need to keep code and data on-prem</p></li><li><p>Want to manage fleets of agents working on multiple tasks</p></li><li><p>Are frustrated with current agent performance</p></li></ul><p>The gaming community figured out how to stream Call of Duty to a phone over 5G. Turns out the same tech makes coding agents feel smooth and responsive. Who knew?</p><div><hr></div><p><em>P.S. - If you&#8217;re wondering about the project name: My co-founder Phil called this a &#8220;massively abstracted distraction&#8221; when I first pitched it, hence MAD. We started by calling it the Helix Agentic Development Environment System - HADES. The god of the underworld is also the god of creating wealth from the earth, which feels appropriate for a bootstrapped company building infrastructure. But every time I tell people about it they say &#8220;isn&#8217;t that hell?&#8221; and I have to explain no, everyone goes to the underworld, but I feel like if you&#8217;re having that conversation then you&#8217;ve already lost, so we decided to be boring and call it Helix Code ;-)</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Kodit 0.5: All Things Git]]></title><description><![CDATA[Kodit has a new domain model which unlocks future enrichments]]></description><link>https://blog.helix.ml/p/kodit-05-all-things-git</link><guid isPermaLink="false">https://blog.helix.ml/p/kodit-05-all-things-git</guid><dc:creator><![CDATA[Phil Winder]]></dc:creator><pubDate>Fri, 26 Sep 2025 11:42:20 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/02fe0156-713a-436c-8f7d-5c6055e41577_320x213.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Yesterday I released <a href="https://github.com/helixml/kodit/releases/tag/0.5.0">Kodit 0.5</a>. As these things often do, this release started off fairly benign. The original intention was to implement features that allowed Kodit to scale to index greater numbers of repositories. However, after attempting to tackle incremental indexing, I quickly realised that we should be mimicking the Git domain more than we currently were.</p><p>In code at 0.4 and before, everything was based upon a directory and files within that directory. But after considering that I wanted to index different versions of a repository (like a different tag or a different branch), that quickly became unsustainable.
So I took the decision to migrate everything to a git-based domain model and take advantage of the structure.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2><strong>Breaking Changes Ahoy</strong></h2><p>Because I&#8217;ve changed the domain model, the existing database schema no longer matches it. I made the decision to restructure the database, which means that any old data you have in there will get deleted.</p><p>I also took the opportunity to remove the auto-indexing command. That was introduced as a stopgap before we had API-based indexing. Since we have API indexing now, this was no longer used, so I removed it.</p><h2>New Features</h2><p>With that out of the way, we can now talk about some exciting new features:</p><ol><li><p>The change to the Git domain model. This means that Kodit now has an internal representation of commits, tags, files, and everything else. This not only helps with incremental indexing, meaning you won&#8217;t have to reprocess commits; it also means that new commits where nothing much has changed will hardly require any processing at all. This also unlocks the next round of future enhancements we have planned.</p></li><li><p>Next on the list is LiteLLM integration.
The reason for this was that I wanted to incorporate different providers for enrichment and embedding. The simplest way to do that was to use LiteLLM, which supports more than a hundred external embedding providers. I&#8217;ve tested it with Helix, Ollama, vLLM, Azure, and OpenAI, but it should work with any provider. </p></li><li><p>In order to handle increased demand, I&#8217;ve completely refactored the indexing pipeline. Now, we have a queue-based system that also has status endpoints so that you can review the status of an indexing operation without having to look at the logs so much.</p></li><li><p>And finally, there&#8217;s been a bit of refactoring and improvement to the database reads and writes. I found that once we had large numbers of commits, the database read performance was quite slow because of the inefficiency in the way that things were structured. This has improved things, but there&#8217;s still more to do. </p></li></ol><h2>What&#8217;s Next?</h2><p>Now that we&#8217;ve got a good domain model, we have big plans for our next steps. First on the list is a wide range of new enrichments. These new enrichments are based around three key repository use cases: using, developing and reading.</p><p>Users of a repository need to know things like the public API and the examples that they can copy from. Developers of a repository need to know the system architecture, the database schema, the layers, and the ways of working. But the readers of a repository want to know the history, the status, a 10km view of the repository as a whole. I&#8217;m not entirely sure how this will be exposed to the MCP at this point in time, but I know that it is useful information. </p><p>The next step after that is to build a user interface to allow users to view all of this information in a pretty, user-friendly way.
People shouldn&#8217;t have to browse the API docs to get access to this information.</p><p>And finally, it&#8217;s still on my mind that I want to index more things. I want to index documentation, I want to index API documentation, I want to index all the things. At the moment, Kodit still only indexes code. And I&#8217;m confident that there is more to do in the front-end world as well.</p><p>That&#8217;s all for now, but of course if you have any ideas or any requests for new features, then please visit the <a href="https://github.com/helixml/kodit">Kodit repository</a>.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Kodit 0.4: Hosting a SaaS, Smarter APIs, and Scaling the Future]]></title><description><![CDATA[Check out the public SaaS instance!
It's the future!]]></description><link>https://blog.helix.ml/p/kodit-04-hosting-a-saas-smarter-apis</link><guid isPermaLink="false">https://blog.helix.ml/p/kodit-04-hosting-a-saas-smarter-apis</guid><dc:creator><![CDATA[Phil Winder]]></dc:creator><pubDate>Sat, 09 Aug 2025 11:55:42 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9e7a17c3-7c34-4cce-bd7d-b12abe15b474_1000x774.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>What started as a side-note turned into one of the biggest leaps forward yet.</p><p>My vision for <a href="https://docs.helix.ml/kodit/">Kodit</a> was to help <strong>AI coding assistants to search for and provide relevant</strong> <strong>context</strong> from private repositories.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>But while marvelling at the traction <a href="https://context7.com">Context7</a> has received on Reddit, I realised that there&#8217;s much more value up for grabs by indexing <em>public</em> repositories as well. 
So I planned a small feature in 0.3 to launch a hosted Kodit instance that <a href="https://docs.helixml.tech/kodit/reference/hosted-kodit/">users can connect to</a> <strong>without installing any MCP servers</strong>.</p><p>It turned out that the act of launching a public service highlighted a variety of scalability challenges in the current implementation. This is fantastic in that it helped me harden Kodit, but it meant that it&#8217;s been nearly a month since the last release!</p><p>But this does mean that 0.4 is chock-full of juicy features, so let me dive in&#8230;</p><h2>Highlights</h2><p>Let&#8217;s cut to the chase. Here&#8217;s an at-a-glance view of what you should take notice of in Kodit 0.4:</p><ul><li><p><a href="https://docs.helixml.tech/kodit/reference/hosted-kodit/">Kodit SaaS</a> - Pull in context from public repositories without installing anything</p></li><li><p>Incremental Indexing - Only changed files are reindexed</p></li><li><p><a href="https://docs.helixml.tech/kodit/reference/api/">Management API</a> - Full REST control over a Kodit server</p></li><li><p><a href="https://docs.helixml.tech/kodit/reference/mcp/">Streaming HTTP Support</a> - SSE has been deprecated by MCP</p></li><li><p>Program Slicing - Slightly more sophisticated way of indexing codebases</p></li><li><p><a href="https://docs.helixml.tech/kodit/reference/deployment/">Cron-based sync schedule &amp; CLI API integration</a></p></li></ul><h2><strong>Getting Started with Kodit SaaS &amp; HTTP Streaming</strong></h2><p>If you want to see Kodit 0.4 in action, just try it. The <a href="https://docs.helixml.tech/kodit/reference/hosted-kodit/">hosted version</a> makes it so simple to try that you almost don&#8217;t need any instructions to do it. But just in case, here&#8217;s a quick demo.</p><p>Browse to the <a href="https://kodit.helix.ml/docs">API docs</a> and try using the <a href="https://kodit.helix.ml/docs#/search/search_snippets_api_v1_search_post">/search API</a>.
Click on the &#8220;Try it out&#8221; button and paste something like this:</p><pre><code>{
  "data": {
    "type": "search",
    "attributes": {
      "text": "a mapper that maps an index domain object to a database object"
    }
  }
}</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m5ds!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m5ds!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png 424w, https://substackcdn.com/image/fetch/$s_!m5ds!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png 848w, https://substackcdn.com/image/fetch/$s_!m5ds!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png 1272w, https://substackcdn.com/image/fetch/$s_!m5ds!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m5ds!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png" width="1456" height="836" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:836,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:245517,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/170520624?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!m5ds!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png 424w, https://substackcdn.com/image/fetch/$s_!m5ds!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png 848w, https://substackcdn.com/image/fetch/$s_!m5ds!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png 1272w, https://substackcdn.com/image/fetch/$s_!m5ds!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13000a7d-66ff-4b04-a82a-fbfe5d88b3e7_1473x846.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The results will list the content of the snippet, the relevancy score, a summary of the snippet (which is what you just searched) and some metadata related to where the file can be found. Take a look at the <a href="https://docs.helixml.tech/kodit/reference/api/">API docs</a> to learn more about how you can use the rest of the API.</p><p>Or you can <strong>add Kodit to your favourite AI coding assistant</strong> by connecting to the public MCP server: <code>https://kodit.helix.ml/mcp</code></p><p>For example, in Claude Code you can execute:</p><pre><code>claude mcp add --transport http kodit https://kodit.helix.ml/mcp</code></pre><p>Or in Cline, add the following settings:</p><pre><code>{
  "mcpServers": {
    "kodit": {
      "autoApprove": [],
      "disabled": false,
      "timeout": 60,
      "type": "streamableHttp",
      "url": "https://kodit.helix.ml/mcp"
    }
  }
}</code></pre><p><a href="https://docs.helixml.tech/kodit/reference/mcp/">Instructions for other AI coding assistants</a> are available in the documentation.</p><h2>Management API and Enterprise Features</h2><p>Kodit was initially designed to <strong>index the private repositories that exist throughout larger organisations</strong>. And our design partners suggested a variety of new features that would make it easier to operate Kodit at scale.</p><p>The new <a href="https://docs.helixml.tech/kodit/reference/api/">REST API</a> allows you to remotely manage a Kodit server from afar. Simple key-based authentication adds a rudimentary access control mechanism.</p><p>Plugging the CLI into the API allows users to continue to have the same <a href="https://docs.helixml.tech/kodit/reference/deployment/#remote-cli-access">CLI experience even when working with a remote instance.</a></p><p>And a new <a href="https://docs.helixml.tech/kodit/reference/sync/">cron-based scheduler </a>allows Kodit servers to keep indexes up-to-date.</p><h2>Core Features</h2><p>Slightly less exciting, but fundamental to the value of Kodit is the algorithm used to index repositories. Previously, Kodit used a simple query-based selection algorithm that basically just pulled out all methods.</p><p>The new program slicer takes this a step further and attempts to identify all dependencies of a method. In results you will see relevant imports, dependent functions and even examples of usage. 
It&#8217;s not perfect and quality might differ between languages because of different language implementations, but it&#8217;s a lot better than before.</p><p>Talking of languages, Kodit now officially supports the following:</p><ul><li><p>python</p></li><li><p>java</p></li><li><p>c</p></li><li><p>c++</p></li><li><p>rust</p></li><li><p>go</p></li><li><p>javascript</p></li><li><p>c#</p></li><li><p>html</p></li><li><p>css</p></li></ul><p>html and css are particularly interesting because they are obviously markup and design languages, not procedural ones. Defining exactly what constitutes a snippet in these languages is hard and I didn&#8217;t spend too much time on it. So if you have any suggestions I&#8217;d <a href="https://github.com/helixml/kodit/discussions">love to hear from you</a>.</p><h2>Initial Helix Integration</h2><p>The deployment of the <strong>new Kodit SaaS</strong> takes one step towards becoming a part of the Helix family. Since you&#8217;re reading this on the Helix blog, you probably already know about Helix.</p><p>The eventual goal is to have much tighter integration with Helix, but the first and most obvious integration point is to leverage Helix&#8217;s on-premise private architecture to provide embeddings and enrichment.</p><p>So you&#8217;ll be glad to hear that everything that exists within the Kodit database is powered by Helix. <strong>No information is shared with or delegated to third-party AI services. It&#8217;s all running on our own A100s.</strong></p><p>I did, however, start with Kodit&#8217;s parallelism set too high and temporarily both saturated the Helix SaaS and locked myself out due to violating rate limits.
To fix this I implemented a new dedicated, socket-based API to communicate directly with Helix and added greater configuration over the parallelism to give standard Helix SaaS users room to breathe.</p><h2><strong>Closing</strong></h2><p>Kodit 0.4 is the strongest yet, but it still hasn&#8217;t reached a scale that I&#8217;m happy with. To be truly valuable to public users, Kodit must index at least the top 1000 repositories on Github. This is at least 2 orders of magnitude greater than what exists today and I have no doubt there will be challenges achieving that scale. <strong>Kodit 0.5 will concentrate on enabling Github-scale.</strong></p><p>Together with Helix&#8217;s design partners, we&#8217;re also thinking towards Kodit 0.6, where <strong>we want to expose important information about fixes and features by indexing issues and pull requests</strong>. I also want to <strong>index documentation</strong> too, to unlock the indexing of private enterprise documentation. This is more challenging due to the different systems involved (Github, Gitlab, Azure DevOps, Jira, and so on). But I feel like it&#8217;s achievable.</p><h3>Help Me Help You</h3><p>Any open source project lives and dies through support. I&#8217;d really appreciate it if you <strong>give Kodit a try</strong> and let me know about your experience. Kodit&#8217;s not quite ready for prime-time public adoption yet, purely because of the lack of scale, but it will come soon.
In the meantime, now is the right time to address any issues.</p><p>Also, if you have any <strong>burning AI coding needs</strong> that are blocking you from doing what you want to do, that&#8217;s just the kind of feedback that would help shape Kodit.</p><p>You can reach out to me at <a href="mailto:phil@helix.ml">phil@helix.ml</a> or <a href="https://github.com/helixml/kodit/discussions">start a discussion</a>.</p><h2>Further Reading</h2><ul><li><p><a href="https://docs.helixml.tech/kodit/">Kodit Docs</a></p></li><li><p><a href="https://github.com/helixml/kodit">Kodit Repository</a></p></li><li><p><a href="https://helix.ml/">Helix 2.0</a></p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Bootstrapped Private GenAI Startup Hits $1M Annual Revenue, Launches Helix 2.0]]></title><description><![CDATA[The people behind the story, how agentic AI is changing and why we don't want a sales call with you]]></description><link>https://blog.helix.ml/p/bootstrapped-private-genai-startup</link><guid isPermaLink="false">https://blog.helix.ml/p/bootstrapped-private-genai-startup</guid><dc:creator><![CDATA[Luke Marsden]]></dc:creator><pubDate>Thu, 31 Jul 2025 13:40:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RxcO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RxcO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RxcO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic 424w, 
https://substackcdn.com/image/fetch/$s_!RxcO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic 848w, https://substackcdn.com/image/fetch/$s_!RxcO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic 1272w, https://substackcdn.com/image/fetch/$s_!RxcO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RxcO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic" width="375" height="281.25" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6b8cf4b-c89e-4236-81d0-aa34a318382a.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:375,&quot;bytes&quot;:1577671,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/169743593?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RxcO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic 424w, 
https://substackcdn.com/image/fetch/$s_!RxcO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic 848w, https://substackcdn.com/image/fetch/$s_!RxcO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic 1272w, https://substackcdn.com/image/fetch/$s_!RxcO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b8cf4b-c89e-4236-81d0-aa34a318382a.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" 
y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Luke, Phil and Matt (advisor) at a conference in London</figcaption></figure></div><p>It was late 2023 when we decided to do startup #3. This time, having experienced first-hand the impact that the &#8220;ChatGPT moment&#8221; had on my consulting clients, I saw the opportunity for a private GenAI stack that you could run on your own infrastructure.</p><p>I was fortunate to have saved enough from consulting so that I was able to stop earning for a year and dive headfirst into building product again. Phil and Chris did the same, and we committed to bootstrapping this thing. (If you haven&#8217;t read &#8220;<a href="https://medium.com/signal-v-noise/reconsider-41adf356857f">Reconsider</a>&#8221;, I recommend it.)</p><p>Since then, we&#8217;ve had our share of ups and downs &#8211;&nbsp;but having intentionally not taken any external funding, I&#8217;m pleased &#8211;&nbsp;and significantly relieved &#8211; to be able to announce that we&#8217;ve just hit the milestone of $1M in annual enterprise revenue.</p><p>I can&#8217;t speak publicly about who the customers are yet, but suffice to say we are lucky to have some of the most innovative hedge funds, investment managers, and service providers in the world working with us.</p><p>Why are they investing in <em><strong>AI Agents on a Private GenAI Stack</strong></em>?</p><h1>Because AI Agents will (actually) change everything</h1><p>In the early days of Helix, I had a healthy degree of skepticism that AI might be more hype than substance. But it has been this year, 2025, that has convinced me that we&#8217;re on a trajectory for AI Agents to be a true new industrial revolution. What convinced me? It was using the agents in my own work. If you look at Cursor with Claude 4, it&#8217;s made me 30-40x more productive than I used to be armed merely with <code>vim</code>. 
If you look at the hours&#8217; worth of research you can do with Perplexity in minutes, I&#8217;m able to make informed business and technical decisions in a fraction of the time it took before.</p><p>So is that it? We all hand our data over to these AI SaaS companies, and they take the whole pie? Wait up &#8211;&nbsp;there&#8217;s a problem lurking in big businesses. With increasing geopolitical strife and threats from security breaches, enterprises are ever more sensitive to where they send their data, and how they protect their core IP. Turns out, lots of companies are not comfortable sending a lot of their data to OpenAI, or even Microsoft. Private cloud is back. So where&#8217;s the private cloud GenAI stack?</p><h1>Enter Helix 2.0: The Fastest Path to AI Agents on a Private GenAI Stack</h1><p>You can read more about it on our <a href="https://helix.ml">shiny new website</a>, but the goal here is to be the MacBook Pro of GenAI stacks.</p><p>Ever tried using Linux on the desktop? Spent time recompiling your kernel so you can get your webcam working? Wonder why people buy Apple products even though they&#8217;re expensive? It&#8217;s because they <em>just work</em>, from soup to nuts. 
That&#8217;s the goal with our GenAI stack for running on your own infrastructure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w10G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w10G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png 424w, https://substackcdn.com/image/fetch/$s_!w10G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png 848w, https://substackcdn.com/image/fetch/$s_!w10G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png 1272w, https://substackcdn.com/image/fetch/$s_!w10G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w10G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png" width="491" height="447.5966850828729" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1320,&quot;width&quot;:1448,&quot;resizeWidth&quot;:491,&quot;bytes&quot;:1373984,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/169743593?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!w10G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png 424w, https://substackcdn.com/image/fetch/$s_!w10G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png 848w, https://substackcdn.com/image/fetch/$s_!w10G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png 1272w, https://substackcdn.com/image/fetch/$s_!w10G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb710de85-c10c-41f8-8473-c27db11f7ccc_1448x1320.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We give you everything you need to run AI Agents, connected to your data and business systems, either on open source models like Qwen3 running on your own GPUs or proprietary LLMs that you can provision in your VPC, like Claude 4.</p><p>Here&#8217;s a demo of our latest stuff, check it out:</p><div id="youtube2-N9Fcas3w4xw" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;N9Fcas3w4xw&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/N9Fcas3w4xw?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h1>Wait, you don&#8217;t want a sales call with 
me?</h1><p>So sometime last year I was talking with my friend <a href="https://www.linkedin.com/in/merrells/">John Merrells</a> and he made the excellent point that most people under 40 don&#8217;t want a sales call to be able to buy something. So we put tons of effort into making Helix easy to self-serve, with transparent pricing.</p><p>It&#8217;s not that we don&#8217;t want to talk to you &#8211;&nbsp;we talk to our customers all the time, pair with them, fly to their offices to run workshops and co-develop agents with them &#8211; but instead of being forced to sit through a slide deck, you can <a href="https://www.helix.ml/docs">deploy Helix yourself</a> in minutes, in Docker, Kubernetes or any major cloud. If you have a question then just hit the chat box in the bottom right of the website (it will connect you to the real team, not an AI). You can evaluate Helix yourself and provision a license through our new <a href="https://helix.ml/home">self-service Launchpad system</a> where you can deploy:</p><ul><li><p><strong>Agents onto Helix Cloud</strong>, our SaaS demo environment - where you get a regular user account</p></li><li><p><strong>Trial VMs</strong> - which give you root access to a VM and full admin access to Helix, although they are configured to talk to external inference providers so are not fully private. 
We spin these up for you instantly by keeping a warm pool of them ready to go</p></li><li><p><strong>GPU Instances</strong> - we offer 2x A100 80GB GPU nodes at just $5/hour through our partner <a href="https://www.civo.com/newsroom/helixml-launches-helix-2-0-with-civo-as-gpu-provider">Civo</a>, who have been awesome to work with</p></li></ul><p>Everything you can do with Helix, from multi-turn agents integrated with business apps to vision RAG over complex document layouts, can run on a single A100 GPU, fully private.</p><h1>Cheers to $1M revenue!</h1><p>So cheers, here&#8217;s to the team who made this happen (including our secret co-founder), folks joining and helping out, the customers who put their trust in us, and all of you crazy AI labs out there building SOTA LLMs &#8211;&nbsp;thank you!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GND5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GND5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic 424w, https://substackcdn.com/image/fetch/$s_!GND5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic 848w, https://substackcdn.com/image/fetch/$s_!GND5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!GND5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GND5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic" width="376" height="282" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:376,&quot;bytes&quot;:1351236,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/169743593?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GND5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic 424w, https://substackcdn.com/image/fetch/$s_!GND5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic 848w, 
https://substackcdn.com/image/fetch/$s_!GND5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic 1272w, https://substackcdn.com/image/fetch/$s_!GND5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F227b81d8-9cc5-438e-b320-72d3c0affaf8_4032x3024.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Chris and Luke having a well-deserved whisky en route from swampUP in Austin to San 
Francisco</figcaption></figure></div><div><hr></div><p><em>Check out <a href="https://helix.ml">helix.ml</a> to build AI agents on Helix, deploy to your own infrastructure, and eliminate tedious work in your business.</em></p><p><em>For a slightly more formal take on the 2.0 release, and more information, check out the press release: <a href="https://www.businesswire.com/news/home/20250731430866/en/Helix-2.0-Gives-Global-Enterprises-the-Fastest-Path-to-AI-Agents-on-a-Private-GenAI-Stack">Helix 2.0 Gives Global Enterprises the Fastest Path to AI Agents on a Private GenAI Stack</a></em></p>]]></content:encoded></item><item><title><![CDATA[Kodit 0.3: 10x Faster Indexing and Enterprise-Grade New Features]]></title><description><![CDATA[Information about Kodit's latest 0.3 release]]></description><link>https://blog.helix.ml/p/kodit-03-10x-faster-indexing-and</link><guid isPermaLink="false">https://blog.helix.ml/p/kodit-03-10x-faster-indexing-and</guid><dc:creator><![CDATA[Phil Winder]]></dc:creator><pubDate>Fri, 27 Jun 2025 14:12:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uVK-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6ac6823-53fa-4485-b35d-65c2770f5cb8_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/helixml/kodit">Kodit</a>, the MCP server that keeps your most important codebases searchable, has just reached its biggest milestone yet. 
Thanks to community feedback I&#8217;ve dramatically improved indexing throughput and delivered a raft of enterprise-focused enhancements.</p><ul><li><p><strong>10&#215; faster indexing</strong>: smarter batching + streaming generators</p></li><li><p><strong>Private Azure DevOps support</strong>: zero-config, secrets scrubbed</p></li><li><p><strong>Pre-filter searches</strong>: by language, author, timestamp or repo</p></li><li><p><strong>Auto-indexing</strong>: via environment variables (AI GitOps!)</p></li><li><p><strong>Slick CLI progress bars</strong>: for instant feedback</p></li></ul><p>Read on for the details or <a href="https://docs.helixml.tech/kodit/getting-started/">skip to the Quick Start</a> and give it a spin.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support our work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Improving Performance</h2><p>This version delivers a major throughput improvement to the indexing process. I started with a <a href="https://github.com/helixml/kodit/issues/118">GitHub issue</a> that rightly suggested that the indexing UX was poor. So I began by converting all heavy i/o loops to generators to reduce RAM usage by streaming results back to whatever needed it.</p><p>On the way I found a massive issue with the way I was batching data for embedding. 
Batching is required because most embedding APIs (local and remote) support sending batches of inputs to an endpoint out of the box. But they only support this up to a point. OpenAI, for example, <a href="https://platform.openai.com/docs/api-reference/embeddings">only supports batches of up to 8192 tokens</a>, otherwise you get an HTTP 400 error. That means you need to a) calculate the number of tokens in your data, and b) only batch them up to the point where they fit.</p><p>What&#8217;s worse, sometimes people like to write massive functions, which means that I have seen snippets longer than 8192 tokens, in which case you need to truncate. But because tokens != words, you need to use the tokeniser to figure out where to truncate.</p><p>I found, however, that I had a <a href="https://github.com/helixml/kodit/pull/141/files#diff-d85f95a5d5f5c9608ae5af58ca1008a025a2c8d8ce09a7a235b067ca9922ddf9L49">while loop that iteratively reduced the character count and recalculated the number of tokens for </a><em><a href="https://github.com/helixml/kodit/pull/141/files#diff-d85f95a5d5f5c9608ae5af58ca1008a025a2c8d8ce09a7a235b067ca9922ddf9L49">every</a></em><a href="https://github.com/helixml/kodit/pull/141/files#diff-d85f95a5d5f5c9608ae5af58ca1008a025a2c8d8ce09a7a235b067ca9922ddf9L49"> character</a>. Stupid, I know. I replaced this with a version that uses the raw token array to truncate the data in one go. This alone provided a 10x improvement.</p><p>After that, and after a brief quest to make the codebase more domain-driven, I implemented an observer pattern with callbacks to the CLI code to display nice progress bars for all operations. UX win for everyone!</p><blockquote><p><em>Indexing will still crawl if you try to index large repositories on your laptop using local models. 
Use <a href="https://docs.helixml.tech/kodit/reference/configuration/#default-indexing-provider">an external AI provider</a> like OpenAI or <a href="https://Helix.ML">Helix.ML</a> to make it really snappy!</em></p></blockquote><h2>Indexing Private Repositories</h2><p>I had an important enterprise request to be able to index private Azure DevOps repositories. Thankfully it turned out that the Git URI schema happily accepts personal access tokens, including for Azure DevOps repositories. The only thing I needed to do was <a href="https://github.com/helixml/kodit/pull/152">sanitise the URI</a> so that secrets didn&#8217;t end up in the database or the logs.</p><p><a href="https://docs.helixml.tech/kodit/reference/indexing/#indexing-a-private-azure-devops-repository">Check out the documentation for more details</a>.</p><h2>Filtering Searches By X</h2><p>Another enterprise feature that is also useful to power users is the ability to pre-filter search results in the MCP or CLI interfaces. Previously, if you had a large number of repositories, it was hard for the agent to find canonical results. There are a variety of reasons for this, but the main one is that much of the index isn&#8217;t relevant to the user&#8217;s current workspace. For example, it&#8217;s quite likely that the user doesn&#8217;t need Java snippets when they are writing a Python application.</p><p>So Kodit 0.3 introduces filters that allow you to restrict the search by source, language, author, or timestamp. Of course, in most usage it&#8217;s the AI agent that makes this decision, but you can influence which filters it predicts <a href="https://docs.helixml.tech/kodit/reference/mcp/#filtering-capabilities">with good prompting</a>.</p><h2>Auto-Indexing</h2><p>Aside from <a href="https://docs.helixml.tech/kodit/reference/deployment/">improving the deployment documentation</a>, we also had an enterprise request to make it possible to index via configuration; AI GitOps, if you will. 
I achieved this by <a href="https://docs.helixml.tech/kodit/reference/indexing/#auto-indexing">exposing some new environment variables</a> that allow you to specify what gets indexed at configuration time. I call this &#8220;auto-indexing.&#8221;</p><p>In the future I envisage requests for the ability to specify configuration options per index, or even an external API to update the index remotely. If you&#8217;re interested in any of this, please <a href="https://github.com/helixml/kodit/discussions">raise a feature request</a>.</p><h2>What&#8217;s Next?</h2><p>I have lots more planned for <a href="https://github.com/helixml/kodit/milestone/4">the next milestone</a>, though I&#8217;d love to hear your thoughts. If you have a great idea, don&#8217;t keep it to yourself. <a href="https://github.com/helixml/kodit/discussions">Let me know!</a> I&#8217;d love to include it in a future milestone. The next one will include the following major features:</p><ul><li><p><strong>better CLI tools</strong> to manage indexes</p></li><li><p>ability to <strong>keep indexes synchronised</strong> with their source</p></li><li><p><strong>full MCP protocol coverage</strong> to make it easier to use and install (especially streaming HTTP, to get OAuth support)</p></li><li><p>a <strong>Helix-hosted SaaS</strong> version of Kodit to make it even easier to get started and open the door to federated indexing</p></li></ul><h2>Try Kodit Now</h2><p>Now&#8217;s your chance to try Kodit if you haven&#8217;t already. 
I think it&#8217;s fast becoming <em>the</em> way to ensure your AI coding assistant has the context it needs to work with obscure libraries, <a href="https://docs.helixml.tech/kodit/demos/knock-knock-auth/">private enterprise repositories</a>, or even code within <a href="https://docs.helixml.tech/kodit/demos/go-simple-microservice/">a microservices architecture</a>!</p><p>Try it now and <a href="https://github.com/helixml/kodit/discussions">let me know how it goes</a>!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.helix.ml/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading HelixML! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Helix Kodit: Open Source MCP Server to Index External Repositories]]></title><description><![CDATA[Early adopter edition of Helix Kodit - get the best out of your AI coding assistant]]></description><link>https://blog.helix.ml/p/helix-kodit-open-source-mcp-server</link><guid isPermaLink="false">https://blog.helix.ml/p/helix-kodit-open-source-mcp-server</guid><dc:creator><![CDATA[Phil Winder]]></dc:creator><pubDate>Mon, 09 Jun 2025 14:16:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6bAW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>AI coding assistants have emerged as one of the best use cases for generative AI. VCs and companies are <a href="https://tracxn.com/d/trending-business-models/startups-in-ai-coding-assistants/__3QagxTE--TJ84LXFHTH9fsuMWPV16HtCvsvtOGaaNBs">investing billions</a> in developing this use case because the value proposition is clear: they help you develop faster.</p><h2>The Problem With AI Coding Assistants</h2><p>If you&#8217;re anything like me, you&#8217;ve been using AI coding assistants to speed up your development. You&#8217;ve probably had a lot of success, but there are still many areas for improvement.</p><p>The key problem that hinders me most is that AI models have large data blind spots. Some of these are quite obvious.</p><p>Foundation models have been trained on data up to a certain point in time. Whenever you use a new model, it&#8217;s likely that the data cutoff was, at best, approximately one year before. This means that the model is incapable of accurately predicting code for new versions of a language or library.</p><p>The next problem is that the capability of a model is directly related to how much relevant data was included in the training data.
Esoteric libraries with few public examples, even long-established ones, might be so under-sampled that, again, the model can&#8217;t infer what the code should look like.</p><p>The third, and possibly most important, example is when codebases are private. Private codebases are restricted, and it&#8217;s unlikely (though not impossible!) that this data has made it into the model&#8217;s training data. This means that the model is again incapable of generating code directly related to your private, enterprise code.</p><p>There are more situations where models perform poorly due to lack of awareness, but these are the main three. So what&#8217;s the best way of overcoming this?</p><h2>Can RAG Help?</h2><p>One pattern that has <a href="https://winder.ai/llm-architecture-rag-implementation-design-patterns/">proven itself is retrieval</a>. Retrieval-augmented generation (RAG) incorporates extra context like examples, documentation, data, and <a href="https://winder.ai/practical-use-cases-for-retrieval-augmented-generation-rag/">anything else related to the problem at hand</a>. This gives the model extra information with which to make a prediction, and the results are often much better.</p><p>This led me to an idea: build a tool that allows you to &#8220;include&#8221; codebases and related information to help overcome the previous three issues.
You can include codebases for new libraries, codebases with accepted enterprise patterns, and in the future, much much more.</p><p>You ingest codebases and pass relevant information to the AI assistant to help it write better code.</p><h2>Introducing Kodit - Early Adopter Edition</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6bAW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6bAW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png 424w, https://substackcdn.com/image/fetch/$s_!6bAW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png 848w, https://substackcdn.com/image/fetch/$s_!6bAW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png 1272w, https://substackcdn.com/image/fetch/$s_!6bAW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6bAW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png" width="124" height="95.976" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:774,&quot;width&quot;:1000,&quot;resizeWidth&quot;:124,&quot;bytes&quot;:865912,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.helix.ml/i/165539615?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6bAW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png 424w, https://substackcdn.com/image/fetch/$s_!6bAW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png 848w, https://substackcdn.com/image/fetch/$s_!6bAW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png 1272w, https://substackcdn.com/image/fetch/$s_!6bAW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90997a05-425b-453b-b02a-daaaf7401a62_1000x774.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>I&#8217;m pleased to announce the early adopter edition of Kodit. 
Kodit is an MCP server that indexes codebases and offers relevant snippets of code to your coding assistant.</p><p>I chose to expose Kodit as an MCP server because the vast majority of coding assistants can now integrate with tools via MCP. So all you need to do is index your codebases, connect Kodit to your coding assistant, and let your assistant query Kodit for relevant examples.</p><p>In this early adopter version, you can index local and remote codebases, search using keyword and semantic search, and scale by using an external database and AI providers.</p><p>I&#8217;ve focused on providing a strong local experience, so out of the box it will use a local database and local models. Performance won&#8217;t be great, but you won&#8217;t need to add any API keys. For more advanced, daily enterprise users, you can start Kodit as a container, use specialised search-optimised databases, and connect to external (or on-premises!) AI providers. Learn how to do this in the <a href="https://docs.helixml.tech/kodit/reference/">reference documentation</a>.</p><p>In my experience, I&#8217;ve had much better results on a variety of tasks when using Kodit. But I&#8217;m launching this early adopter edition to <a href="https://github.com/helixml/kodit/discussions/new/choose">gather feedback</a> from your experience with it.</p><p>This early feedback will help ensure that the roadmap represents ideas that really help you.
So please help by trying Kodit, giving feedback on what does and doesn&#8217;t work, and sharing what you&#8217;d like to see moving forward.</p><p>More information:</p><ul><li><p><a href="https://github.com/helixml/kodit/">Repository</a></p></li><li><p><a href="https://docs.helix.ml/kodit/">Documentation</a> and <a href="https://docs.helixml.tech/kodit/getting-started/">Getting Started Guide</a></p></li><li><p><a href="https://github.com/helixml/kodit/discussions/new/choose">Provide feedback</a>, good or bad!</p></li></ul>]]></content:encoded></item></channel></rss>