From ClickOps to GitOps: The Evolution of AI App Development
How we bridge the gap between no-code AI prototypes and production-ready applications
Last week's post about democratizing AI engineering sparked some interesting discussion on Hacker News. Also, some people didn't like the AI images on my other blog posts, so for this one I used a real photograph :P
Today, I want to dive deeper into a critical aspect of this transformation: the bridge between rapid prototyping and production-ready AI applications. This post is based on the latest conversation on the MLOps Community Podcast: Become an AI Engineer with Open Source.
The ChatGPT GPTs Revelation
When OpenAI launched GPTs, many of us (myself included) were skeptical. Yet another attempt at ChatGPT plugins, we thought. But something interesting started happening: businesses began using GPTs to solve real problems. At a recent conference, I met a film industry risk assessment team that had built a chain of GPTs to automate complex safety evaluations. They weren't AI engineers – they were domain experts who found a way to encode their knowledge into a useful AI tool.
This is where it gets interesting.
The ClickOps-to-GitOps Bridge
The problem with tools like ChatGPT's GPTs is that they're trapped in a web interface. Remember Jenkins? The DevOps community collectively shuddered at configuration through click-ops. We learned that lesson: production systems need to be declarative, version-controlled, and reproducible.
But here's the key insight: we don't have to choose between accessibility and production-readiness. We can have both.
The Three Layers of AI App Development
The Prototyper (Business/Product Layer)
Uses a web interface
Configures knowledge bases
Sets up API integrations
Tests basic functionality
The Bridge (YAML Export)
Exports the entire configuration as version-controlled YAML
Includes system prompts, knowledge configurations, and API specs
Preserves all the functionality of the prototype
The Production Engineer (DevOps Layer)
Adds automated tests (evals)
Sets up CI/CD pipelines
Manages deployments
Monitors performance
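To make the bridge layer concrete, here is a minimal sketch of a CI gate over an exported app spec: parse the YAML (e.g. with PyYAML), enumerate each assistant's declared tests, and fail the pipeline early if a test has no steps. The field names follow the exchange-rates example later in this post; the runner itself is a hypothetical illustration, not the actual tooling.

```python
# Sketch of a CI-side static check over an exported AIApp spec.
# Assumes the spec has already been parsed into a dict (e.g. via
# yaml.safe_load); field names mirror the aispec.org example below.

def collect_tests(spec: dict) -> list:
    """Return (model, test) pairs for every test declared in the spec."""
    pairs = []
    for assistant in spec["spec"]["assistants"]:
        for test in assistant.get("tests", []):
            pairs.append((assistant["model"], test))
    return pairs

def validate(spec: dict) -> None:
    """Cheap pre-flight check: every declared test must have at least one step."""
    for model, test in collect_tests(spec):
        if not test.get("steps"):
            raise ValueError(f"test '{test['name']}' for {model} has no steps")
```

A check like this runs in milliseconds, so it can gate every commit before the (slower, LLM-backed) evals run.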
Real-World Example: The JIRA Integration
Let me share a recent experience building a JIRA integration. The initial requirement seemed simple: enable natural language queries for JIRA issues. The reality was more complex, involving:
API Chain Architecture
Classifier to determine if JIRA API is needed
Request builder to construct proper JQL queries
Response summarizer to present results naturally
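The three-stage chain above can be sketched as plain functions. Everything here is illustrative: `llm()` is a stand-in for a real inference call, and the prompts and helper names are assumptions, not the actual implementation.

```python
# Illustrative sketch of the three-stage JIRA chain: classify, build
# the JQL request, summarize the response. The llm() stub stands in
# for a real model call; prompts here are hypothetical.

def llm(prompt: str) -> str:
    """Stand-in for a real LLM call; swap in your inference client."""
    raise NotImplementedError

def needs_jira(query: str, llm=llm) -> bool:
    """Stage 1: classifier — does this query require the JIRA API?"""
    answer = llm("Does this question require querying JIRA? "
                 f"Answer yes or no.\nQuestion: {query}")
    return answer.strip().lower().startswith("yes")

def build_jql(query: str, llm=llm) -> str:
    """Stage 2: request builder — translate the query into JQL."""
    return llm(f"Write a JQL query for: {query}").strip()

def summarize(query: str, api_response: str, llm=llm) -> str:
    """Stage 3: response summarizer — present results conversationally."""
    return llm(f"Summarize these JIRA results for the question "
               f"'{query}':\n{api_response}")
```

Keeping each stage a separate function with an injectable `llm` makes every step independently testable with a fake model, which matters for the next point.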
Test-Driven Development for AI
Writing test cases in natural language
Using LLMs as judges for response quality
Iterating on prompts while maintaining test coverage
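The LLM-as-judge step can be as simple as asking a grader model for a PASS/FAIL verdict against a natural-language expectation. This is a hedged sketch of the pattern; the prompt wording and verdict parsing are my own illustration, not the exact prompts used in the project.

```python
# Sketch of the "LLM as judge" eval pattern: a grader model compares
# an answer to a natural-language expectation and replies PASS or FAIL.
# The prompt text and parsing below are illustrative assumptions.

def judge(question: str, answer: str, expected: str, llm) -> bool:
    """Return True iff the judge model passes the answer."""
    verdict = llm(
        "You are grading an AI assistant's answer.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Expectation: {expected}\n"
        "Reply with exactly PASS or FAIL."
    )
    return verdict.strip().upper().startswith("PASS")
```

Because the expectation is prose rather than a string match, you can iterate on the assistant's prompts freely while the judge keeps enforcing the behavior the test actually cares about.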
Here’s an example of an app that allows users to interact with a currency exchange rate API in natural language:
apiVersion: app.aispec.org/v1alpha1
kind: AIApp
metadata:
  name: exchangerates
spec:
  assistants:
  - model: meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
    type: text
    # Tests for this assistant
    tests:
    - name: check usd to gbp rate
      steps:
      - prompt: what is the usd to gbp exchange rate?
        expected_output: the usd to gbp exchange rate. it's ok if it includes additional information such as the rate being based on the latest data.
    - name: usdgbp
      steps:
      - prompt: usdgbp
        expected_output: the usd to gbp exchange rate only. it should specifically mention both usd and gbp and not other currencies. if it mentions other currencies, FAIL the test
    apis:
    - name: Exchange Rates API
      description: Get latest currency exchange rates
      url: https://open.er-api.com/v6
      schema: |-
        openapi: 3.0.0
        info:
          title: Exchange Rates API
        [...]
The full example is here: https://github.com/helixml/genai-cicd-ref, and there's a complete walkthrough of deploying it to Kubernetes with Flux for GitOps in the accompanying video.
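Under the hood, the assistant's request builder just needs to construct the right endpoint URL and pull the relevant rate out of the JSON response. A small sketch, assuming a `/latest/{base}` path and a `rates` field in the response body (the full OpenAPI schema is elided above, so check the reference repo for the actual shape):

```python
# Sketch of the request the assistant would make against the exchange
# rates API. The /latest/{base} path and "rates" response field are
# assumptions; the authoritative schema lives in the app spec.
BASE_URL = "https://open.er-api.com/v6"

def rate_url(base_currency: str) -> str:
    """Construct the latest-rates endpoint URL for a base currency."""
    return f"{BASE_URL}/latest/{base_currency.upper()}"

def usd_to_gbp(payload: dict) -> float:
    """Extract the GBP rate from a latest-rates response body."""
    return payload["rates"]["GBP"]
```

The LLM's job in this app is only the translation at either end (natural language in, natural language out); the deterministic middle stays ordinary, testable code.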
The Open Source Advantage
This approach becomes particularly powerful with open source models. Organizations can:
Keep sensitive data on-premises
Customize models for specific use cases
Avoid vendor lock-in
Meet regulatory requirements
Beyond Productivity Tools
While many early AI applications focus on internal productivity (like our JIRA or exchange rates example), this pattern works equally well for customer-facing features. One of our customers uses this exact architecture to provide natural language interfaces for heavy machinery rentals – turning complex equipment specifications into conversational interactions.
What's Next?
The AI engineering landscape is evolving rapidly, but some patterns are emerging:
Start with rapid prototyping in user-friendly interfaces
Export to version-controlled specifications
Apply traditional DevOps practices
Iterate based on automated testing and deployment
Ready to Build?
The tools and practices we've refined over decades of software engineering aren't obsolete in the AI era – they're more relevant than ever. Check out aispec.org to dive deeper into these patterns, or join us for a hands-on workshop:
Workshop: Testing & CI for GenAI
Monday, December 2 @ 12 PM ET / 9 AM PT
You'll learn rapid prototyping, testing strategies, and CI/CD integration for GenAI applications. Bring an API you'd like to integrate with or documents you want to use as an LLM knowledge base.
Space is limited. By registering, you consent to sharing your data with HelixML.
Thanks to Demetrios Brinkmann and the MLOps Community for the conversation that inspired this post. Join us at mlops.community to continue the discussion.