GPU-Accelerated AI Agent Sandboxes: Rethinking How We Interact with Coding Agents
I got this working in a coffee shop a few hours ago, and I’m genuinely excited about it. Not because it’s fancy new tech for the sake of it, but because it solves some real pain points I’ve been hitting with AI coding agents.
Let me show you what I mean.
The Problem: Agents Need Better Infrastructure
Here’s where we are with AI coding in 2025: The LLMs themselves are plateauing. We’re not getting exponential intelligence gains anymore - we’re on more of an S-curve where things went up fast and now they’re leveling off. GPT-5 was... fine. Claude 4.5 is quite good. But they’re not going to magically solve all our problems.
This matters because current coding agents still make plenty of mistakes. And when you combine that with how most agents are architected - typically as JavaScript/TypeScript applications running on your laptop - you hit some fundamental limitations:
Performance issues: My main dev machine at home has a 16-core CPU from 2018. It was state of the art back then. Cursor is basically unusable on it. Even Claude Code starts grinding to a halt when you have lots of threads or messages. And I’m not running some ancient potato - this is a machine with plenty of cores.
Limited workflows: Background agents exist, but they’re either clunky separate UIs (looking at you, Cursor’s rushed implementation) or they require your laptop to stay open and connected.
No fleet management: What if you want to manage 5 agents working on different tasks simultaneously? What if you want a 30,000-foot dashboard view of what your agents are doing?
The core insight here is that agents should run on servers, not laptops. When your agent is a long-running server process, you can close your laptop, get on a train with dodgy internet, and your agent keeps working. You can kick off background tasks from Slack. You can manage fleets of agents.
But how do you make that feel as smooth as a local IDE?
Enter: GPU-Accelerated Agent Sandboxes
Here’s what we built: Each agent gets its own dedicated desktop environment running on a GPU. Not a VNC session that feels like molasses. An actual GPU-accelerated Linux desktop that runs at 120fps and responds instantly to keystrokes.
The architecture looks like this:
Helix manages the control plane - You interact with agents through the Helix UI, which handles orchestration, knowledge sources, and conversation history
Each agent spins up a containerized desktop - When you start a coding task, we launch a dedicated environment with Zed (the Rust-based IDE) and your choice of agent (Claude Code, Gemini CLI, or Qwen Code)
Moonlight streaming protocol - We expose the desktop via Moonlight, the open-source game-streaming stack the gaming community built for streaming games from home rigs to phones over 5G. Turns out it works great for streaming IDEs too.
The result? You can work with your agent in the browser, getting full GPU-accelerated rendering. Or you can use the Moonlight client on your phone, tablet, or laptop and get the same smooth experience. The agent keeps running on the server whether you’re connected or not.
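To make step two concrete, here's a rough sketch of what "spinning up a containerized desktop" amounts to: one container per agent session, GPU passed through, workspace mounted in. The image name, volume paths, and port are illustrative assumptions, not Helix's actual configuration, and the snippet only assembles the command rather than running it - the real orchestration lives in the Helix control plane.

```python
import shlex

def sandbox_command(agent: str, session_id: str) -> list[str]:
    """Assemble a docker run invocation for one agent sandbox.

    Image name, mount paths, and ports are hypothetical - they
    illustrate the shape of the setup, not the real config.
    """
    return [
        "docker", "run", "--detach",
        "--name", f"sandbox-{session_id}",
        "--gpus", "all",                          # GPU for the Wayland desktop
        "-v", f"/workspaces/{session_id}:/home/dev/project",
        "-e", f"AGENT={agent}",                   # claude-code | gemini-cli | qwen-code
        "-p", "47989:47989",                      # streaming/pairing port (assumed)
        "helix/agent-desktop:latest",             # hypothetical image
    ]

cmd = sandbox_command("claude-code", "sess-42")
print(shlex.join(cmd))
```

Because the sandbox is just a container with a stable name, tearing it down or listing a fleet of them is ordinary container tooling rather than anything bespoke.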
Why This Architecture Matters
1. It works with any agent, any LLM
The Zed team created a protocol called ACP (the Agent Client Protocol) that standardizes how agents talk to IDEs. This means we can plug in:
Claude Code (running Anthropic’s models)
Gemini CLI (running Google’s models)
Qwen Code (fully open source, runs entirely on your infrastructure)
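The reason this plug-and-play works is that ACP is just JSON-RPC between the IDE and an agent subprocess. Here's an illustrative sketch of the message framing - the method names and parameter shapes are simplified assumptions, not a transcription of the ACP spec:

```python
import json

def make_request(req_id: int, method: str, params: dict) -> str:
    """Frame a JSON-RPC 2.0 request as a single line of JSON."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params})

# 1. The IDE initializes the agent and advertises what it can do
init = make_request(1, "initialize", {"clientCapabilities": {"fs": True}})

# 2. The IDE forwards the user's prompt into a session
prompt = make_request(2, "session/prompt", {
    "sessionId": "sess-1",
    "prompt": [{"type": "text", "text": "Add OAuth support"}],
})

# Any agent binary that speaks this protocol can sit on the other
# end of the pipe - that's what makes the agents interchangeable.
for msg in (init, prompt):
    print(json.loads(msg)["method"])
```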
We’re not betting on one agent framework or trying to build our own. We’re adopting the best tools the community builds and making them work together.
2. Full context for agents
When you configure knowledge sources, upload PDFs, integrate with Confluence or Jira, or add MCP servers - all of that gets mirrored into the agent’s environment. Your agent has the same context you would, but it’s running in a sandbox.
3. RAG over your entire team’s work
Here’s where it gets interesting: All conversation history from every agent flows back through Helix. That means you can RAG over your team’s coding sessions. Every time someone’s agent solves a problem, that solution becomes searchable for everyone else. It’s like having your whole team’s problem-solving experience in a searchable database.
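To show the shape of that idea, here's a toy stand-in: a real deployment would use embeddings and a vector store, but even naive keyword overlap demonstrates how every session transcript becomes a searchable document. The session names and contents are invented for illustration.

```python
from collections import Counter

# Invented example data: one entry per agent coding session.
sessions = {
    "alice/fix-oauth": "Fixed the OAuth redirect loop by setting the callback URL",
    "bob/postgres-upgrade": "Upgraded postgres 14 to 16, had to rebuild indexes",
    "carol/oauth-scopes": "Added granular OAuth scopes to the token endpoint",
}

def search(query: str, top_k: int = 2) -> list[str]:
    """Rank sessions by keyword overlap with the query (toy retrieval)."""
    q = Counter(query.lower().split())
    scored = sorted(
        sessions,
        key=lambda sid: sum((q & Counter(sessions[sid].lower().split())).values()),
        reverse=True,
    )
    return scored[:top_k]

print(search("oauth callback"))   # → ['alice/fix-oauth', 'carol/oauth-scopes']
```

Swap the scoring function for embedding similarity and this is the skeleton of searching your whole team's problem-solving history.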
4. Spec coding by default
I’m a big believer in spec coding as the antidote to “vibe coding.” The idea is simple: Instead of giving your agent vague instructions like “add OAuth support,” you:
Have the agent analyze your codebase and generate a design document
Review the spec as a human (catch the stupid ideas before any code is written)
Only then implement the spec
We’re building spec workflows directly into the infrastructure, including a Kanban board for managing agent tasks. Not for teams of humans - for fleets of agents.
The Technical Details (For Those Who Care)
The gaming community already solved most of the hard problems here. There’s this project called Games on Whales (whales = Docker containers) that lets you run GPU-accelerated gaming in containers using Wayland.
We’re building on top of that foundation:
Wayland desktop: Only uses a few MB of GPU memory, so you can run dozens of these on a single GPU
Moonlight streaming: Battle-tested by gamers streaming over 5G networks
Container isolation: Each agent gets its own filesystem, preventing agents from stepping on each other’s toes
Zed for the IDE: Written entirely in Rust with a custom UI library that renders directly to the GPU. It’s fast. Like, actually fast - not “fast for an Electron app.”
The beauty is that these don’t need fancy GPUs like LLMs do. You can run this on an old laptop with Intel integrated graphics and it works fine. For a production deployment, you can fit ~100 of these instances on a single 16GB GPU.
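As a sanity check on that density figure, here's the back-of-envelope budget. The per-instance number is an assumption (compositor plus IDE render surfaces plus headroom), not a measurement:

```python
# Rough capacity estimate for sandboxes per GPU.
gpu_vram_mb = 16 * 1024          # a 16 GB card
per_sandbox_mb = 150             # assumed: Wayland compositor + Zed + headroom

capacity = gpu_vram_mb // per_sandbox_mb
print(capacity)                  # → 109, roughly the ~100 quoted above
```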
What This Enables
When agents only need your attention twice an hour instead of constantly, you can have a fundamentally different interaction mode:
Ambient computing: Get a WhatsApp message from your agent when it needs input, respond with a voice note
Fleet management: See all your active agents working on different tasks, with visual thumbnails of what they’re doing
Long-running personal environments: Not just task-based agents, but your daily driver development environment that happens to run in the cloud with GPU acceleration
And here’s the part that gets me most excited: we can use this ourselves to make Helix better. The snake eating its own tail - our development team using the product we’re building to improve the product itself, faster and faster.
The Demo
In the video above, you can see:
Spinning up an agent with dedicated desktop environment
The Moonlight connection (complete with PIN for security)
Claude Code (running Claude 4.5) building a to-do list app in real time
Updating the branding mid-stream
Smooth, GPU-accelerated UI throughout
The agent has access to a full browser (Firefox), can run commands, and gets all the knowledge sources we configured in Helix.
Now Open for Private Beta
If you want early access:
Join our Discord community and request an invite to be among the first to experience the future of software development.
Connect with me on LinkedIn - linkedin.com/in/luke-marsden-71b3789
Try Helix - Even without the agent sandboxes, Helix is a complete private GenAI stack you can run on your infrastructure. Check it out at helix.ml
We’re especially interested in feedback from teams that:
Run their own GPU infrastructure
Need to keep code and data on-prem
Want to manage fleets of agents working on multiple tasks
Are frustrated with current agent performance
The gaming community figured out how to stream Call of Duty to a phone over 5G. Turns out the same tech makes coding agents feel smooth and responsive. Who knew?
P.S. - If you’re wondering about the project name: My co-founder Phil called this a “massively abstracted distraction” when I first pitched it, hence MAD. We started by calling it the Helix Agentic Development Environment System - HADES. The god of the underworld is also the god of creating wealth from the earth, which feels appropriate for a bootstrapped company building infrastructure. But every time I tell people about it they say “isn’t that hell?” and I have to explain no, everyone goes to the underworld, but I feel like if you’re having that conversation then you’ve already lost, so we decided to be boring and call it Helix Code ;-)