From Development to Production: Dockerizing a RAG Application with Zero Downtime
Last week, I wrapped up an intense development session focused on taking MiniRAG—a Retrieval-Augmented Generation application—from a local development setup to production-ready infrastructure. What started as "let's just containerize this thing" turned into a comprehensive exploration of modern deployment patterns, complete with some valuable lessons learned along the way.
The Challenge: Bridging the Dev-Prod Gap
Like many developers, I had a perfectly functional RAG application running locally with PostgreSQL, Qdrant vector database, Redis for caching, and a FastAPI backend. The classic "works on my machine" scenario. But getting from uvicorn --reload to a robust, scalable production deployment? That's where things get interesting.
The goal was ambitious but clear:
- Zero-downtime deployments with Docker Compose
- Automatic HTTPS with Caddy reverse proxy
- Automated CI/CD via GitHub Actions
- Production hardening with proper resource limits and security headers
- Backup strategy for both SQL and vector data
The Technical Journey
Containerization Strategy
The first major decision was choosing the right containerization approach. Rather than going full Kubernetes (overkill for a single-server deployment), I opted for Docker Compose with production overrides—a sweet spot that provides container benefits without operational complexity.
```yaml
# docker-compose.prod.yml (excerpt)
services:
  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config
  web:
    ports: !override []  # Remove dev port exposure
    deploy:
      resources:
        limits:
          memory: 512M
```
The !override YAML tag became a crucial discovery—it allows production configurations to completely replace development settings rather than merging them. Without it, your production containers might still expose development ports, creating potential security vulnerabilities.
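A quick way to confirm what actually survives the merge is to render the effective configuration before deploying (the file names here match the excerpt above; adjust them to your layout):

```shell
# Render the merged configuration that `up` would actually use,
# so you can verify the dev ports are really gone before deploying.
docker compose -f docker-compose.yml -f docker-compose.prod.yml config
```

Grepping that output for unexpected `ports:` entries is a cheap pre-deploy sanity check.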
Reverse Proxy with Caddy
Choosing Caddy over nginx was a game-changer for this deployment. The automatic HTTPS certificate management alone saved hours of Let's Encrypt configuration:
```
{$DOMAIN:localhost} {
    reverse_proxy web:8000
    header {
        Strict-Transport-Security max-age=31536000
        X-Content-Type-Options nosniff
        X-Frame-Options DENY
    }
}
```
This simple configuration handles TLS certificates and security headers; reverse_proxy would also load-balance if additional upstreams were listed. When DOMAIN is set to a real domain name, Caddy automatically provisions certificates. When unset, it falls back to HTTP for local testing.
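For completeness, the DOMAIN placeholder in the Caddyfile is just an environment variable on the Caddy container. A sketch of wiring it through Compose (passing it via the server's `.env` file is an assumption about my setup, not something Caddy requires):

```yaml
# docker-compose.prod.yml (excerpt, sketch)
services:
  caddy:
    environment:
      # Falls back to localhost (plain HTTP) when DOMAIN is unset
      - DOMAIN=${DOMAIN:-localhost}
```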
CI/CD Pipeline Design
The GitHub Actions workflow focuses on efficiency and safety:
```yaml
# .github/workflows/deploy.yml (simplified)
on:
  push:
    branches: [main]
    paths:
      - 'app/**'
      - 'dashboard/**'
      - 'Dockerfile'
      - 'docker-compose*.yml'

jobs:
  deploy:
    steps:
      - name: Build and push images
        # Build web and worker containers
      - name: Deploy to VPS
        uses: appleboy/ssh-action@v1.0.3
        with:
          script: |
            cd /opt/minirag
            docker compose pull
            docker compose up -d --remove-orphans
```
The path filtering ensures deployments only trigger when actual application code changes, not on documentation updates or configuration tweaks.
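One refinement worth considering: GitHub Actions can serialize runs so that two pushes in quick succession never deploy concurrently. This `concurrency` block is a suggested addition, not part of the workflow shown above:

```yaml
# Ensure only one deploy runs at a time; a later push waits its turn
# rather than racing the in-flight deployment.
concurrency:
  group: deploy-production
  cancel-in-progress: false
```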
Lessons Learned: The Real-World Friction
The Docker Compose Override Gotcha
The biggest technical hurdle was understanding Docker Compose's merge behavior. By default, when you use multiple compose files, arrays (like ports and volumes) get merged together. This means your production deployment might accidentally expose development ports:
```yaml
# Without !override: arrays from both files merge
ports:
  - "5432:5432"  # Dev exposure
  - "80:80"      # Prod exposure (both active!)

# With !override: the prod list replaces the dev list
ports: !override
  - "80:80"      # Only prod exposure
```
This seemingly minor detail has major security implications for production deployments.
Environment Variable Security
During the development session, I made a critical mistake: I accidentally pasted real API keys into the session logs. That instantly became the top security priority, requiring key rotation for both the OpenAI and Anthropic services.
The lesson: Never use production secrets in development sessions. Always use placeholder values and maintain separate credential management for different environments.
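In practice, that means committing only a placeholder file and keeping real values out of the repository entirely. The file name and variable names here are illustrative:

```shell
# .env.example — committed to the repo with placeholders only.
# Copy to .env on the server and fill in real values there.
OPENAI_API_KEY=changeme
ANTHROPIC_API_KEY=changeme
POSTGRES_PASSWORD=changeme
```

Adding `.env` to `.gitignore` closes the most common leak path.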
Resource Limiting Strategy
Setting appropriate memory limits required some educated guessing based on component roles:
- PostgreSQL: 512MB (handles structured data, indexes)
- Qdrant: 1GB (vector operations are memory-intensive)
- Redis: 128MB (lightweight caching layer)
- FastAPI services: 512MB each (moderate Python memory footprint)
These limits prevent any single service from consuming all available server resources while providing room for normal operation spikes.
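Expressed in Compose terms, those numbers look roughly like this (the service names are assumptions about how the stack is laid out):

```yaml
# docker-compose.prod.yml (excerpt, sketch)
services:
  db:            # PostgreSQL: structured data, indexes
    deploy:
      resources:
        limits:
          memory: 512M
  qdrant:        # Vector operations are memory-intensive
    deploy:
      resources:
        limits:
          memory: 1G
  redis:         # Lightweight caching layer
    deploy:
      resources:
        limits:
          memory: 128M
```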
The Deployment Architecture
The final architecture elegantly separates concerns:
```
Internet → Caddy (80/443) → FastAPI (8000) → PostgreSQL / Qdrant / Redis
               ↓
   Automatic HTTPS + Security Headers
```
Supporting infrastructure includes:
- Automated backups via cron (`pg_dump` + Qdrant snapshots)
- Log rotation with Docker's json-file driver
- UFW firewall restricting access to SSH, HTTP, HTTPS
- Dedicated system user for security isolation
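The backup half of that list can be sketched as a small script that cron runs daily. The paths, database name, and the assumption that the Qdrant container can reach its own snapshot API with curl are all guesses about the setup, not verbatim project code:

```shell
#!/usr/bin/env bash
# /opt/minirag/backup.sh (sketch) — run daily from cron as the service user
set -euo pipefail
cd /opt/minirag

# Structured data: logical dump of the app database, compressed and dated
docker compose exec -T db pg_dump -U postgres minirag \
  | gzip > /var/backups/minirag/db-$(date +%F).sql.gz

# Vector data: ask Qdrant to create a full storage snapshot via its HTTP API
docker compose exec -T qdrant curl -s -X POST http://localhost:6333/snapshots
```

Shipping the resulting files off-host (object storage, rsync target) is the obvious next step; on-server backups alone don't survive a lost VPS.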
What's Next: Production Readiness
The infrastructure is code-complete with 77 passing tests, but several operational steps remain:
- Server provisioning on Hetzner VPS (Ubuntu 24.04, CX22 instance)
- DNS configuration pointing domain to server IP
- Initial deployment and SSL certificate provisioning
- CI/CD activation with GitHub secrets configuration
- Backup automation via cron scheduling
Key Takeaways for Your Next Deployment
- Docker Compose overrides are powerful but require understanding merge vs. replace behavior
- Caddy's automatic HTTPS eliminates certificate management complexity
- Resource limits are essential for multi-service deployments
- Security scanning of configuration files prevents credential leaks
- Path-filtered CI/CD reduces unnecessary deployments and server load
The journey from development to production doesn't have to be overwhelming. With the right tooling choices and careful attention to security details, you can build deployment infrastructure that's both robust and maintainable.
Have you tackled similar containerization challenges? I'd love to hear about your deployment strategies and lessons learned in the comments below.
The complete infrastructure code and deployment scripts are available in the MiniRAG repository. All sensitive credentials mentioned in this post have been rotated and are no longer valid.