From ClickOps to GitOps: The Evolution of AI App Development
How we bridge the gap between no-code AI prototypes and production-ready applications
Last week's post about democratizing AI engineering sparked some interesting discussion on Hacker News. Also, some people didn't like the AI images on my other blog posts, so for this one I used a real photograph :P
Today, I want to dive deeper into a critical aspect of this transformation: the bridge between rapid prototyping and production-ready AI applications. This post is based on the latest conversation on the MLOps Community Podcast: Become an AI Engineer with Open Source.
The ChatGPT GPTs Revelation
When OpenAI launched GPTs, many of us (myself included) were skeptical. Yet another attempt at ChatGPT plugins, we thought. But something interesting started happening: businesses began using GPTs to solve real problems. At a recent conference, I met a film industry risk assessment team that had built a chain of GPTs to automate complex safety evaluations. They weren't AI engineers – they were domain experts who found a way to encode their knowledge into a useful AI tool.
This is where it gets interesting.
The ClickOps-to-GitOps Bridge
The problem with tools like ChatGPT's GPTs is that they're trapped in a web interface. Remember Jenkins? The DevOps community collectively shuddered at configuration through click-ops. We learned that lesson: production systems need to be declarative, version-controlled, and reproducible.
But here's the key insight: we don't have to choose between accessibility and production-readiness. We can have both.
The Three Layers of AI App Development
The Prototyper (Business/Product Layer)
Uses a web interface
Configures knowledge bases
Sets up API integrations
Tests basic functionality
The Bridge (YAML Export)
Exports the entire configuration as version-controlled YAML
Includes system prompts, knowledge configurations, and API specs
Preserves all the functionality of the prototype
The Production Engineer (DevOps Layer)
Adds automated tests (evals)
Sets up CI/CD pipelines
Manages deployments
Monitors performance
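To make the bridge layer concrete, here is a minimal sketch of a CI gate over an exported app spec: parse the YAML (e.g. with PyYAML), enumerate each assistant's declared tests, and fail the pipeline early if a test has no steps. The field names follow the exchange-rates example later in this post; the runner itself is a hypothetical illustration, not the actual tooling.

```python
# Sketch of a CI-side static check over an exported AIApp spec.
# Assumes the spec has already been parsed into a dict (e.g. via
# yaml.safe_load); field names mirror the aispec.org example below.

def collect_tests(spec: dict) -> list:
    """Return (model, test) pairs for every test declared in the spec."""
    pairs = []
    for assistant in spec["spec"]["assistants"]:
        for test in assistant.get("tests", []):
            pairs.append((assistant["model"], test))
    return pairs

def validate(spec: dict) -> None:
    """Cheap pre-flight check: every declared test must have at least one step."""
    for model, test in collect_tests(spec):
        if not test.get("steps"):
            raise ValueError(f"test '{test['name']}' for {model} has no steps")
```

A check like this runs in milliseconds, so it can gate every commit before the (slower, LLM-backed) evals run.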
Real-World Example: The JIRA Integration
Let me share a recent experience building a JIRA integration. The initial requirement seemed simple: enable natural language queries for JIRA issues. The reality was more complex, involving:
API Chain Architecture
Classifier to determine if JIRA API is needed
Request builder to construct proper JQL queries
Response summarizer to present results naturally
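The three-stage chain above can be sketched as plain functions. Everything here is illustrative: `llm()` is a stand-in for a real inference call, and the prompts and helper names are assumptions, not the actual implementation.

```python
# Illustrative sketch of the three-stage JIRA chain: classify, build
# the JQL request, summarize the response. The llm() stub stands in
# for a real model call; prompts here are hypothetical.

def llm(prompt: str) -> str:
    """Stand-in for a real LLM call; swap in your inference client."""
    raise NotImplementedError

def needs_jira(query: str, llm=llm) -> bool:
    """Stage 1: classifier — does this query require the JIRA API?"""
    answer = llm("Does this question require querying JIRA? "
                 f"Answer yes or no.\nQuestion: {query}")
    return answer.strip().lower().startswith("yes")

def build_jql(query: str, llm=llm) -> str:
    """Stage 2: request builder — translate the query into JQL."""
    return llm(f"Write a JQL query for: {query}").strip()

def summarize(query: str, api_response: str, llm=llm) -> str:
    """Stage 3: response summarizer — present results conversationally."""
    return llm(f"Summarize these JIRA results for the question "
               f"'{query}':\n{api_response}")
```

Keeping each stage a separate function with an injectable `llm` makes every step independently testable with a fake model, which matters for the next point.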
Test-Driven Development for AI
Writing test cases in natural language
Using LLMs as judges for response quality
Iterating on prompts while maintaining test coverage
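The LLM-as-judge step can be as simple as asking a grader model for a PASS/FAIL verdict against a natural-language expectation. This is a hedged sketch of the pattern; the prompt wording and verdict parsing are my own illustration, not the exact prompts used in the project.

```python
# Sketch of the "LLM as judge" eval pattern: a grader model compares
# an answer to a natural-language expectation and replies PASS or FAIL.
# The prompt text and parsing below are illustrative assumptions.

def judge(question: str, answer: str, expected: str, llm) -> bool:
    """Return True iff the judge model passes the answer."""
    verdict = llm(
        "You are grading an AI assistant's answer.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Expectation: {expected}\n"
        "Reply with exactly PASS or FAIL."
    )
    return verdict.strip().upper().startswith("PASS")
```

Because the expectation is prose rather than a string match, you can iterate on the assistant's prompts freely while the judge keeps enforcing the behavior the test actually cares about.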
Here’s an example of an app that allows users to interact with a currency exchange rate API in natural language:
apiVersion: app.aispec.org/v1alpha1
kind: AIApp
metadata:
  name: exchangerates
spec:
  assistants:
  - model: meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
    type: text
    # Tests for this assistant
    tests:
    - name: check usd to gbp rate
      steps:
      - prompt: what is the usd to gbp exchange rate?
        expected_output: the usd to gbp exchange rate. it's ok if it includes additional information such as the rate being based on the latest data.
    - name: usdgbp
      steps:
      - prompt: usdgbp
        expected_output: the usd to gbp exchange rate only. it should specifically mention both usd and gbp and not other currencies. if it mentions other currencies, FAIL the test
    apis:
    - name: Exchange Rates API
      description: Get latest currency exchange rates
      url: https://open.er-api.com/v6
      schema: |-
        openapi: 3.0.0
        info:
          title: Exchange Rates API
        [...]
The full example is here: https://github.com/helixml/genai-cicd-ref, and there's a complete walkthrough of deploying it to Kubernetes with Flux for GitOps in the accompanying video.
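Under the hood, the assistant's request builder just needs to construct the right endpoint URL and pull the relevant rate out of the JSON response. A small sketch, assuming a `/latest/{base}` path and a `rates` field in the response body (the full OpenAPI schema is elided above, so check the reference repo for the actual shape):

```python
# Sketch of the request the assistant would make against the exchange
# rates API. The /latest/{base} path and "rates" response field are
# assumptions; the authoritative schema lives in the app spec.
BASE_URL = "https://open.er-api.com/v6"

def rate_url(base_currency: str) -> str:
    """Construct the latest-rates endpoint URL for a base currency."""
    return f"{BASE_URL}/latest/{base_currency.upper()}"

def usd_to_gbp(payload: dict) -> float:
    """Extract the GBP rate from a latest-rates response body."""
    return payload["rates"]["GBP"]
```

The LLM's job in this app is only the translation at either end (natural language in, natural language out); the deterministic middle stays ordinary, testable code.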
The Open Source Advantage
This approach becomes particularly powerful with open source models. Organizations can:
Keep sensitive data on-premises
Customize models for specific use cases
Avoid vendor lock-in
Meet regulatory requirements
Beyond Productivity Tools
While many early AI applications focus on internal productivity (like our JIRA or exchange rates example), this pattern works equally well for customer-facing features. One of our customers uses this exact architecture to provide natural language interfaces for heavy machinery rentals – turning complex equipment specifications into conversational interactions.
What's Next?
The AI engineering landscape is evolving rapidly, but some patterns are emerging:
Start with rapid prototyping in user-friendly interfaces
Export to version-controlled specifications
Apply traditional DevOps practices
Iterate based on automated testing and deployment
Ready to Build?
The tools and practices we've refined over decades of software engineering aren't obsolete in the AI era – they're more relevant than ever. Check out aispec.org to dive deeper into these patterns, or join us for a hands-on workshop:
Workshop: Testing & CI for GenAI
Monday, December 2 @ 12 PM ET / 9 AM PT
You'll learn rapid prototyping, testing strategies, and CI/CD integration for GenAI applications. Bring an API you'd like to integrate with or documents you want to use as an LLM knowledge base.
Space is limited. By registering, you consent to sharing your data with HelixML.
Thanks to Demetrios Brinkmann and the MLOps Community for the conversation that inspired this post. Join us at mlops.community to continue the discussion.