`helix apply -f` lets you deploy yaml-defined GenAI apps with RAG and APIs on local open models
In this podcast recording, we give a sneak peek of Helix 1.0, including the first public demo of `helix apply -f` - iykyk (k8s nerds)
Thanks to Viktor for hosting me on his podcast! The demo starts about 50 minutes into the video. This was a bit terrifying to record, because at 2am the previous night everything was totally broken after a major refactor (done so that we could support external LLMs as well as local GPUs). But pressure can be a useful force :-D
We start with the stack deployed on my laptop, which has no GPU, pointed at together.ai so we can easily run open source LLMs without needing access to a GPU. We show simple inference through the ChatGPT-like web interface (with users, sessions etc.) and then simple drag'n'drop RAG.
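Under the hood that's just configuration on the control plane: inference gets routed to together.ai instead of a local GPU runner. As a rough sketch (the service and variable names below are my shorthand, not necessarily the exact keys in the Helix install docs):

```yaml
# docker-compose.yaml excerpt - illustrative only; the variable names here are
# assumptions, check the Helix getting-started docs for the real configuration keys.
services:
  api:
    environment:
      - INFERENCE_PROVIDER=togetherai          # route inference to together.ai rather than a local runner
      - TOGETHER_API_KEY=${TOGETHER_API_KEY}   # API key for together.ai, supplied via the environment
```

Swapping this out for a real GPU runner later doesn't change anything about the apps themselves.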
Then we show some Helix apps defined as yaml: Marvin the Paranoid Android (just a system prompt on top of llama3:8b), an HR app that interacts with an API, and a surprise integration I'd built that morning against the podcast host's own OpenAPI spec for their app Screenly.
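To give a flavour of what these look like, here's a sketch of the Marvin app as yaml. Treat the schema (apiVersion, kind and field names) as illustrative assumptions rather than the exact format - the real examples live in the Helix docs and repo:

```yaml
# marvin.yaml - a sketch of a Helix app definition; exact field names are illustrative.
apiVersion: app.aispec.org/v1alpha1
kind: AIApp
metadata:
  name: marvin
spec:
  assistants:
    - name: Marvin
      model: llama3:8b                # the open model the system prompt sits on top of
      system_prompt: |
        You are Marvin the Paranoid Android. Answer every question
        accurately, but with a weary sense of existential gloom.
```

Deploying it is then a single `helix apply -f marvin.yaml`, and the HR and Screenly apps follow the same pattern, with an API integration (driven by an OpenAPI spec) layered on top instead of just a system prompt.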
Finally, we deploy it for real: a DigitalOcean droplet for the control plane - see https://docs.helix.ml/helix/getting-started/architecture/ - and a $0.35/h A40 GPU on runpod.io. Armed with a real GPU, we can do image inference and fine-tuning as well as everything described above!
We’ll be launching Helix 1.0 properly on September 4, and I’ll post about it when it happens. If you want to kick the tires on the new features described here ahead of that, they’re all already in the 0.10.8 release on GitHub.
As ever, if you have any questions or comments, come and find us in Discord :-)
Cheers,
Luke