From Development to Production: Dockerizing a RAG Application with Zero Downtime
Last week, I wrapped up an intense development session focused on taking MiniRAG—a Retrieval-Augmented Generation application—from a local development setup to production-ready infrastructure. What started as "let's just containerize this thing" turned into a comprehensive exploration of modern deployment patterns, complete with some valuable lessons learned along the way.
The Challenge: Bridging the Dev-Prod Gap
Like many developers, I had a perfectly functional RAG application running locally with PostgreSQL, Qdrant vector database, Redis for caching, and a FastAPI backend. The classic "works on my machine" scenario. But getting from uvicorn --reload to a robust, scalable production deployment? That's where things get interesting.
The goal was ambitious but clear:
- Zero-downtime deployments with Docker Compose
- Automatic HTTPS with Caddy reverse proxy
- Automated CI/CD via GitHub Actions
- Production hardening with proper resource limits and security headers
- Backup strategy for both SQL and vector data
The Technical Journey
Containerization Strategy
The first major decision was choosing the right containerization approach. Rather than going full Kubernetes (overkill for a single-server deployment), I opted for Docker Compose with production overrides—a sweet spot that provides container benefits without operational complexity.
```yaml
# docker-compose.prod.yml (excerpt)
services:
  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config
  web:
    ports: !override []  # Remove dev port exposure
    deploy:
      resources:
        limits:
          memory: 512M
```
The !override YAML tag became a crucial discovery—it allows production configurations to completely replace development settings rather than merging them. Without it, your production containers might still expose development ports, creating potential security vulnerabilities.
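A quick way to confirm what actually survives the merge is to render the effective configuration before deploying (the file names here match the excerpt above; adjust them to your layout):

```shell
# Render the merged configuration that `up` would actually use,
# so you can verify the dev ports are really gone before deploying.
docker compose -f docker-compose.yml -f docker-compose.prod.yml config
```

Grepping that output for unexpected `ports:` entries is a cheap pre-deploy sanity check.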
Reverse Proxy with Caddy
Choosing Caddy over nginx was a game-changer for this deployment. The automatic HTTPS certificate management alone saved hours of Let's Encrypt configuration:
```
{$DOMAIN:localhost} {
    reverse_proxy web:8000
    header {
        Strict-Transport-Security max-age=31536000
        X-Content-Type-Options nosniff
        X-Frame-Options DENY
    }
}
```
This simple configuration handles TLS certificates and security headers; reverse_proxy would also load-balance if additional upstreams were listed. When DOMAIN is set to a real domain name, Caddy automatically provisions certificates. When unset, it falls back to HTTP for local testing.
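For completeness, the DOMAIN placeholder in the Caddyfile is just an environment variable on the Caddy container. A sketch of wiring it through Compose (passing it via the server's `.env` file is an assumption about my setup, not something Caddy requires):

```yaml
# docker-compose.prod.yml (excerpt, sketch)
services:
  caddy:
    environment:
      # Falls back to localhost (plain HTTP) when DOMAIN is unset
      - DOMAIN=${DOMAIN:-localhost}
```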
CI/CD Pipeline Design
The GitHub Actions workflow focuses on efficiency and safety:
```yaml
# .github/workflows/deploy.yml (simplified)
on:
  push:
    branches: [main]
    paths:
      - 'app/**'
      - 'dashboard/**'
      - 'Dockerfile'
      - 'docker-compose*.yml'

jobs:
  deploy:
    steps:
      - name: Build and push images
        # Build web and worker containers
      - name: Deploy to VPS
        uses: appleboy/ssh-action@v1.0.3
        with:
          script: |
            cd /opt/minirag
            docker compose pull
            docker compose up -d --remove-orphans
```
The path filtering ensures deployments only trigger when actual application code changes, not on documentation updates or configuration tweaks.
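One refinement worth considering: GitHub Actions can serialize runs so that two pushes in quick succession never deploy concurrently. This `concurrency` block is a suggested addition, not part of the workflow shown above:

```yaml
# Ensure only one deploy runs at a time; a later push waits its turn
# rather than racing the in-flight deployment.
concurrency:
  group: deploy-production
  cancel-in-progress: false
```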
Lessons Learned: The Real-World Friction
The Docker Compose Override Gotcha
The biggest technical hurdle was understanding Docker Compose's merge behavior. By default, when you use multiple compose files, arrays (like ports and volumes) get merged together. This means your production deployment might accidentally expose development ports:
```yaml
# Without !override: arrays from both files merge
ports:
  - "5432:5432"  # Dev exposure
  - "80:80"      # Prod exposure (both active!)

# With !override: the prod list replaces the dev list
ports: !override
  - "80:80"      # Only prod exposure
```
This seemingly minor detail has major security implications for production deployments.
Environment Variable Security
During the development session, I made a critical mistake: I accidentally pasted real API keys into the session logs. That instantly became the top security priority, requiring key rotation for both the OpenAI and Anthropic services.
The lesson: Never use production secrets in development sessions. Always use placeholder values and maintain separate credential management for different environments.
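In practice, that means committing only a placeholder file and keeping real values out of the repository entirely. The file name and variable names here are illustrative:

```shell
# .env.example — committed to the repo with placeholders only.
# Copy to .env on the server and fill in real values there.
OPENAI_API_KEY=changeme
ANTHROPIC_API_KEY=changeme
POSTGRES_PASSWORD=changeme
```

Adding `.env` to `.gitignore` closes the most common leak path.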
Resource Limiting Strategy
Setting appropriate memory limits required some educated guessing based on component roles:
- PostgreSQL: 512MB (handles structured data, indexes)
- Qdrant: 1GB (vector operations are memory-intensive)
- Redis: 128MB (lightweight caching layer)
- FastAPI services: 512MB each (moderate Python memory footprint)
These limits prevent any single service from consuming all available server resources while providing room for normal operation spikes.
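Expressed in Compose terms, those numbers look roughly like this (the service names are assumptions about how the stack is laid out):

```yaml
# docker-compose.prod.yml (excerpt, sketch)
services:
  db:            # PostgreSQL: structured data, indexes
    deploy:
      resources:
        limits:
          memory: 512M
  qdrant:        # Vector operations are memory-intensive
    deploy:
      resources:
        limits:
          memory: 1G
  redis:         # Lightweight caching layer
    deploy:
      resources:
        limits:
          memory: 128M
```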
The Deployment Architecture
The final architecture elegantly separates concerns:
```
Internet → Caddy (80/443) → FastAPI (8000) → PostgreSQL / Qdrant / Redis
               ↓
   Automatic HTTPS + Security Headers
```
Supporting infrastructure includes:
- Automated backups via cron (`pg_dump` + Qdrant snapshots)
- Log rotation with Docker's json-file driver
- UFW firewall restricting access to SSH, HTTP, HTTPS
- Dedicated system user for security isolation
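The backup half of that list can be sketched as a small script that cron runs daily. The paths, database name, and the assumption that the Qdrant container can reach its own snapshot API with curl are all guesses about the setup, not verbatim project code:

```shell
#!/usr/bin/env bash
# /opt/minirag/backup.sh (sketch) — run daily from cron as the service user
set -euo pipefail
cd /opt/minirag

# Structured data: logical dump of the app database, compressed and dated
docker compose exec -T db pg_dump -U postgres minirag \
  | gzip > /var/backups/minirag/db-$(date +%F).sql.gz

# Vector data: ask Qdrant to create a full storage snapshot via its HTTP API
docker compose exec -T qdrant curl -s -X POST http://localhost:6333/snapshots
```

Shipping the resulting files off-host (object storage, rsync target) is the obvious next step; on-server backups alone don't survive a lost VPS.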
What's Next: Production Readiness
The infrastructure is code-complete with 77 passing tests, but several operational steps remain:
- Server provisioning on Hetzner VPS (Ubuntu 24.04, CX22 instance)
- DNS configuration pointing domain to server IP
- Initial deployment and SSL certificate provisioning
- CI/CD activation with GitHub secrets configuration
- Backup automation via cron scheduling
Key Takeaways for Your Next Deployment
- Docker Compose overrides are powerful but require understanding merge vs. replace behavior
- Caddy's automatic HTTPS eliminates certificate management complexity
- Resource limits are essential for multi-service deployments
- Security scanning of configuration files prevents credential leaks
- Path-filtered CI/CD reduces unnecessary deployments and server load
The journey from development to production doesn't have to be overwhelming. With the right tooling choices and careful attention to security details, you can build deployment infrastructure that's both robust and maintainable.
Have you tackled similar containerization challenges? I'd love to hear about your deployment strategies and lessons learned in the comments below.
The complete infrastructure code and deployment scripts are available in the MiniRAG repository. All sensitive credentials mentioned in this post have been rotated and are no longer valid.