The Open Source AI Revolution: How Regulation Is Reshaping Enterprise GenAI
Navigating the tension between regulatory risk and the existential threat from failing to adopt new technology
Local LLMs and fine-tuning?
When we launched Helix v0.1 in December 2023, we went into the market to test two hypotheses:
1. People will care about running LLMs locally
2. People will care about fine-tuning for knowledge
We launched on Dec 21, and within days the market strongly validated hypothesis #1.
A German company, AWA Network, showed up on our Discord, and by Jan 1, had already deployed Helix locally with our Docker Compose stack and connected a GPU. Weeks later, we signed a support and licensing contract with them to build out their open source GenAI stack. Why?
Regulatory compliance: we’re on to something!
Because (and we didn’t know this beforehand) there’s European regulation that makes it “complicated” to stay compliant if you use services provided by US companies. As Marten Schönherr, CEO of AWA Network, said in our interview:
“For many European companies, OpenAI, Microsoft... it's not compliant. Right. It's US/American companies, even if they host in Europe. I mean you hand over your API credentials to those guys and that actually makes you pretty naked in the meaning of European regulations.”
So we have these two quite interesting opposing forces:
On one hand, every major company on the planet is looking at GenAI and the promised efficiency gains, and saying at the CEO level that if they don’t adopt this technology they will have their lunch eaten by the competition (they’re right, by the way).
On the other hand, if their employees start copy-pasting private client data into ChatGPT, they’re in a world of pain from a regulatory, security, IP, PII and GDPR perspective. And even with enterprise versions, APIs, Azure OpenAI and the like, folks still want more control, especially for use cases over sensitive data.
So, we found ourselves tapping into this growing pent-up demand from large European and global corporations who desperately want to benefit from GenAI while simultaneously having banned chatgpt.com at the firewall. What’s the solution?
Open Source AI, locally hosted
Back in December, we made a bet that open source LLMs would catch up with GPT-4 in 2024. We were right, thanks in large part to Zuck: we got there in April with Llama3-70B. Cheers Mark, we couldn’t have done it without you! With Llama3 and newer models, we now have highly capable, GPT-4-level models you can run locally on consumer hardware.
So is that it? Enterprises just need to download the weights, grab a couple of 4090s on eBay, run llama.cpp and off they go?
Fortunately for us :-) it’s not that simple. It turns out the gap between just doing inference with these models on a single GPU and having a full production-ready, highly available (HA) deployment that’s integrated into your data sources and business systems is non-trivial.
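To be fair, the single-GPU part really is nearly that easy. Here’s a minimal sketch of the weekend-project version: a Docker Compose file handing one GPU to a llama.cpp server. The image tag and model filename are illustrative assumptions, not a supported Helix config; check the llama.cpp docs for the currently published images.

# Minimal single-GPU llama.cpp server. Illustrative sketch only:
# the image tag and model filename are assumptions.
services:
  llm:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    command: ["-m", "/models/llama3-70b-q4.gguf", "--host", "0.0.0.0", "--port", "8080"]
    volumes:
      - ./models:/models          # GGUF weights, downloaded separately
    ports:
      - "8080:8080"               # HTTP chat endpoint
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia      # hand one GPU to the container
              count: 1
              capabilities: [gpu]

That gets you a model answering prompts. It gets you none of the authentication, high availability, observability, RAG, or business-system integration that production demands.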
And that’s where Helix comes in: with Secure Local GenAI.
What about fine-tuning?
Fine-tuning was our focus for a while. We pushed really hard and even made it not suck. But we kept hearing over and over again: “Why are you doing fine-tuning? We just want RAG. Everyone is doing RAG.” So, OK, fine, I’m like: let’s add RAG to the product with pgvector and a bit of LlamaIndex code.
Well, RAG is like 10,000x faster than fine-tuning :-) But to make it work well you also have to do complicated stuff with chunking and re-ranking. So now we support both.
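To give a feel for the knobs that “complicated stuff with chunking and re-ranking” involves, here’s a hypothetical sketch of RAG pipeline settings. The field names are invented for illustration; this is not Helix’s actual configuration schema.

# Hypothetical RAG settings. Field names are invented for
# illustration and are not Helix's actual configuration schema.
rag:
  chunking:
    strategy: sentence        # split on sentence boundaries, not raw bytes
    chunk_size: 512           # tokens per chunk
    chunk_overlap: 64         # overlap so answers spanning a boundary survive
  embeddings:
    model: all-MiniLM-L6-v2   # embedding model; vectors stored in pgvector
  retrieval:
    top_k: 20                 # over-fetch candidates by vector similarity...
  rerank:
    model: bge-reranker-base  # ...then re-rank them with a cross-encoder
    top_n: 5                  # and keep only the best few for the prompt

Get the chunking wrong and relevant passages are split in half; skip the re-ranking and similar-but-irrelevant chunks crowd the good ones out of the context window.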
I believe fine-tuning will make a comeback when people get further down the line and want to optimize specific systems that use large general purpose LLMs (e.g 70B) and “distill” a fine-tuned LLM in 3B so they can scale to production traffic without much cost. We still support fine-tuning in the product, and we’ll get there…
Use cases! Use cases! APIs, oh my
But what we also heard from our growing customer base was the need to integrate with business systems. We started to see a clear trend: putting a natural language interface over an API is compelling. Point one at a product catalog, for example, and you can build an automated sales assistant that sells your stuff over SMS in 70 different languages.
So we added API Tools (now part of Helix Apps), so that you can integrate your apps with your APIs and business systems.
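The shape of an API tool, again as a hypothetical sketch: the field names are invented for illustration, and the real Helix Apps format lives in our docs.

# Hypothetical API tool definition. Field names are invented for
# illustration; the real Helix Apps format lives in the docs.
assistant:
  name: sales-assistant
  model: llama3:70b
  tools:
    - name: product_catalog
      type: api
      openapi_spec: ./catalog-openapi.yaml   # the model learns the endpoints from this
      base_url: https://catalog.internal.example.com
      auth:
        type: bearer
        token_env: CATALOG_API_TOKEN         # secret injected from the environment

The model decides when to call the tool, the platform makes the HTTP call, and the response is folded back into the conversation, in whatever language the customer is speaking.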
Getting this working well with open models is where we have a significant advantage, and it’s working really well now. Last night I was talking to an automated sales assistant we’re developing that speaks Japanese and makes API calls to a product catalog in German. It’s super powerful, and it’s going to make the customer more money.
Gap between prototype and production quality
As Marten said so astutely in our recent interview, it’s pretty quick to get something running, but it’s really hard to know how long it will take to make it good enough to show to the customer.
Which leads to the question… how do you iterate on an LLM application?
Well, for starters, you need to version control it. You can’t iterate on an application, especially in an environment where multiple developers are working on it at once, without version controlling it! So, you need a format to capture all the prompts that go into it…
Then, well, how do you change any software? You want to know if a code change is going to make your software better or worse. You need tests!
You don’t ship software to production without tests. And you shouldn’t ship LLM applications to production without evals.
LLMOps? GitOps!? Evals!
OK so this is a theme that I keep coming back to in my career (last time around it was MLOps and data versioning), and something I’m really passionate about. You’ll find me waving my hands around on podcasts ranting about it: you gotta version and track all the things!
Open source models are getting better really fast, catching up now with OpenAI’s capabilities.
With platforms like Helix you can now deploy those models yourself, locally on your own infrastructure: integrate them with your APIs, plug them into RAG, do image generation, and so on.
We’re pushing a yaml format called AISpec (in discussion to become a Linux Foundation project). The idea is that, as a Kubernetes/DevOps person, you can give anyone in the business the ability to prototype an app by pointing and clicking, dragging documents into a RAG store, and so on…
But under the hood, the applications people build should be version controlled yaml in git. That is just the right way to do it: LLMOps should be GitOps-powered. It allows the DevOps people in the organization to (a) deploy the stack to begin with, and (b) productionize an application once it’s been prototyped by people in the business.
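To make that concrete, here’s a hypothetical sketch of an app defined as version controlled yaml, in the spirit of AISpec. The field names are invented for illustration and are not the actual AISpec schema.

# helix-app.yaml -- a hypothetical AISpec-style app definition.
# Field names are invented for illustration, not the actual schema.
apiVersion: aispec.org/v1alpha1
kind: AIApp
metadata:
  name: support-assistant
spec:
  model: llama3:70b
  system_prompt: |
    You are a support assistant for ACME GmbH.
    Answer only from the provided context. If you are unsure, say so.
  rag:
    source: ./docs                     # documents indexed into the RAG store
  tests:
    - prompt: "What is the warranty period?"
      assert_contains: "24 months"     # a first, crude eval

Every prompt change is now a diff, a commit, and a reviewable Pull Request. And note the tests block: that’s where evals come in.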
It should also allow you to create the eval loops that ensure the quality of your LLM applications.
You can build eval loops on top of Helix. Because everything is version controlled, with every version of the prompts and the system committed, you can compare the quality of one version against another. You can open a Pull Request that says “change prompting to fix edge case X”, run the evals against the PR just like you’d run tests on incoming code, and get a result that shows whether your change helped or regressed something else.
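A minimal sketch of that loop as CI, using GitHub Actions and a hypothetical “helix evals” subcommand, a stand-in for whatever eval runner you wire up; the pattern, not the command, is the point.

# .github/workflows/evals.yaml -- run evals on every PR, like tests.
# The "helix evals" subcommand is a hypothetical stand-in, and this
# assumes the CLI is preinstalled on the runner.
name: evals
on:
  pull_request:
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run evals against the app defined in this branch
        run: helix evals run --app ./helix-app.yaml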
And now you can apply software best practices to deploying and managing fully internally hosted LLM applications.
And that’s what Helix is all about!
2025 will be the year that GenAI gets boring
Yes, there’s a trough of disillusionment coming for GenAI. But through the trough of disillusionment comes the plateau of productivity. GenAI is just mathematical models, and models don’t generalize beyond their training data. This stuff is not going to take over the world. The capabilities will plateau. Nevertheless, super-human scale knowledge processing capabilities will change the business world for good. Yet many VC-backed GenAI companies will fail to hit the growth they need to appease their investors: just as dial-up internet wasn’t sufficient to enable a lot of use cases in the .com bubble, today’s models won’t be sufficient for many of theirs.
But we plan to be cockroaches: to survive the downturn by connecting with real business use cases where we can make customers more money. As a bootstrapped business, we only need a few more customers to be able to continue indefinitely. So the strategy is: build good relationships with customers, outlast the competition, create value and build trust, and be brilliantly positioned for the sunlit uplands of the plateau of productivity. Like the shift from dial-up to broadband, we’ll stop talking about GenAI and it will simply become “just how it’s done”. And because you care about security, which means you want to run it locally or in your VPC, you’ll do it with Helix.
Try it today
Read the product blog and take a look at the website
Book a demo with Chris, who is awesome, by emailing founders@helix.ml
Kick the tires on the SaaS, then install it yourself on any Linux/macOS/WSL2 machine with:
curl -sL -O https://get.helix.ml/install.sh && bash install.sh
If you are in San Francisco on Sep 12, join us at Making Open Source and Local LLMs Work in Practice x MLOps Community!
Cheers,
Luke