2026-02-10

deployment · docker · devops · rag · ci-cd · production · hetzner · caddy

Deploying MiniRAG to Production: A Journey Through Docker, DNS, and Deployment Gotchas

Oli


Last night, I successfully deployed MiniRAG—a Retrieval-Augmented Generation (RAG) application—to production on a Hetzner VPS. What started as a straightforward deployment quickly became a masterclass in the subtle differences between development and production environments. Here's the story of how https://mini-rag.de came to life, complete with the bumps, bruises, and lessons learned along the way.

The Goal: Full Production Infrastructure

The mission was ambitious but clear: deploy a complete RAG application stack with:

  • Docker Compose orchestration for 6 services (web app, background worker, PostgreSQL, Redis, Qdrant vector database, and Caddy reverse proxy)
  • Automated HTTPS with Let's Encrypt certificates
  • CI/CD pipeline for seamless deployments
  • Custom domain with proper DNS configuration
  • Production-ready configuration with proper secrets and networking

By the end of the evening, all systems were operational at https://mini-rag.de with a 17-second deployment pipeline. But getting there required solving several production-specific puzzles.

Lessons Learned: When Development Meets Reality

1. Docker Health Checks in Minimal Images

The Problem: Qdrant containers kept showing as "unhealthy" despite running perfectly.

My initial approach seemed reasonable:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:6333/health"]

Then I tried wget. Both failed with "command not found." The issue? The official Qdrant Docker image is intentionally minimal—no curl, no wget, just the essentials.

The Solution: Bash to the rescue with built-in TCP connectivity testing:

healthcheck:
  test: ["CMD", "bash", "-c", "</dev/tcp/localhost/6333"]

This elegant solution uses bash's built-in /dev/tcp pseudo-device to test TCP connectivity without requiring external tools. A reminder that sometimes the simplest solutions are the most robust.
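The same trick works from an interactive shell on the host, which makes for a quick smoke test before wiring it into Compose. A small sketch (the host and port match the setup above; adjust to taste):

```shell
#!/usr/bin/env bash
# Probe a TCP port using bash's /dev/tcp pseudo-device -- no curl or wget needed.
# The bare redirection opens a connection and immediately closes it; the exit
# status is 0 if the connection succeeded, non-zero otherwise.
probe() {
  timeout 2 bash -c "</dev/tcp/$1/$2" 2>/dev/null
}

if probe localhost 6333; then
  echo "qdrant: reachable"
else
  echo "qdrant: unreachable"
fi
```

Note that /dev/tcp is a bash feature, not a real device file, so the healthcheck must invoke `bash -c` explicitly; `sh -c` will not work in images where /bin/sh is dash or busybox.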

2. Container Networking: Localhost vs. Service Names

The Problem: ConnectionRefusedError when services tried to communicate.

My development .env file worked perfectly locally:

DATABASE_URL=postgresql://user:pass@localhost:5432/minirag
REDIS_URL=redis://localhost:6379
QDRANT_URL=http://localhost:6333

In production Docker Compose, these connections failed completely.

The Solution: Docker's internal networking requires service names, not localhost:

DATABASE_URL=postgresql://user:pass@postgres:5432/minirag
REDIS_URL=redis://redis:6379
QDRANT_URL=http://qdrant:6333

This is Docker networking 101, but it's easy to forget when your development environment uses port forwarding to localhost.
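One low-tech way to keep the two environments in sync is to derive the production file from the dev one rather than maintaining both by hand. A sketch, assuming the service names `postgres`, `redis`, and `qdrant` from docker-compose.yml (the file names here are illustrative):

```shell
#!/usr/bin/env bash
# Derive production connection strings from the dev .env by swapping
# localhost for the Docker Compose service names.
cat > .env.dev <<'EOF'
DATABASE_URL=postgresql://user:pass@localhost:5432/minirag
REDIS_URL=redis://localhost:6379
QDRANT_URL=http://localhost:6333
EOF

sed -e 's|@localhost:5432|@postgres:5432|' \
    -e 's|redis://localhost:6379|redis://redis:6379|' \
    -e 's|http://localhost:6333|http://qdrant:6333|' \
    .env.dev > .env.production

cat .env.production
```

The mechanical rewrite also doubles as documentation: every `localhost` left in `.env.production` is a bug waiting to happen.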

3. DNS and Certificate Mysteries

The Problem: HTTPS requests were failing with certificate mismatches.

After setting up the domain and waiting for DNS propagation, https://mini-rag.de was serving a *.your-server.de certificate instead of my Let's Encrypt certificate. The culprit? Stale DNS records and aggressive caching.

The Solution: A multi-step DNS cleanup:

  1. Remove stale AAAA records pointing to old IPv6 addresses
  2. Clear Chrome's DNS cache via chrome://net-internals/#dns
  3. Flush system DNS with sudo dscacheutil -flushcache
  4. Wait for DNS propagation (the hardest part!)

DNS issues are notorious for making you question everything else about your setup. When HTTPS doesn't work, always check DNS first.

4. API Field Name Mismatches

The Problem: Tenant creation was failing with cryptic 422 errors.

I was sending what seemed like the right data:

{
  "name": "Production Tenant",
  "slug": "minirag-prod",
  "admin_email": "admin@example.com",
  "admin_password": "secure_password"
}

The Solution: Check the actual API contract. The FastAPI backend expected:

{
  "tenant_name": "Production Tenant",
  "tenant_slug": "minirag-prod", 
  "owner_email": "admin@example.com",
  "owner_password": "secure_password"
}

Field names matter! Always verify API contracts when moving between environments, especially when error messages aren't descriptive.
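When a 422 gives you nothing to go on, validating the payload's keys locally rules out one variable before you blame the network. A sketch (the endpoint path in the comment is an assumption — check your FastAPI route table or the auto-generated /docs page):

```shell
#!/usr/bin/env bash
# Build the payload with the field names the backend actually expects,
# then sanity-check the keys locally before sending it.
payload='{
  "tenant_name": "Production Tenant",
  "tenant_slug": "minirag-prod",
  "owner_email": "admin@example.com",
  "owner_password": "secure_password"
}'

echo "$payload" | python3 -c '
import json, sys
d = json.load(sys.stdin)
required = {"tenant_name", "tenant_slug", "owner_email", "owner_password"}
missing = required - d.keys()
print("payload ok" if not missing else f"missing: {missing}")
'

# Then send it (endpoint path is an assumption):
# curl -sS -X POST https://mini-rag.de/api/tenants \
#      -H "Content-Type: application/json" -d "$payload"
```

FastAPI's 422 responses do include a `detail` array naming the offending fields, so it is also worth printing the response body instead of just the status code.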

The Winning Architecture

The final production setup includes:

Docker Compose Stack

services:
  caddy:      # Reverse proxy + automatic HTTPS
  web:        # FastAPI application
  worker:     # Background task processor
  postgres:   # Primary database
  redis:      # Cache and task queue
  qdrant:     # Vector database for embeddings
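Filling in the sketch a little: the health checks from lesson 1 also gate startup order, since `depends_on` with `condition: service_healthy` keeps the web app from racing its databases. A fragment along these lines (image names and timings are illustrative, not the actual file):

```yaml
services:
  web:
    build: .
    env_file: .env.production
    depends_on:
      postgres:
        condition: service_healthy
      qdrant:
        condition: service_healthy
  qdrant:
    image: qdrant/qdrant
    healthcheck:
      test: ["CMD", "bash", "-c", "</dev/tcp/localhost/6333"]
      interval: 10s
      timeout: 3s
      retries: 5
```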

CI/CD Pipeline

The GitHub Actions workflow is beautifully simple:

  1. Trigger: Push to main branch
  2. Deploy: SSH to VPS, pull latest code, rebuild containers
  3. Duration: ~17 seconds from push to live
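A minimal workflow along those lines, assuming SSH-key secrets and the appleboy/ssh-action wrapper (the deploy path and secret names are placeholders):

```yaml
name: deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.VPS_HOST }}
          username: ${{ secrets.VPS_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            cd /opt/minirag
            git pull origin main
            docker compose up -d --build
```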

Custom Domain with Auto-HTTPS

Caddy handles the complexity of Let's Encrypt certificates and renewals:

mini-rag.de {
    redir / https://landingpage-mini-rag.vercel.app/ permanent
    reverse_proxy web:8000
}

Caddy rejects two site blocks for the same address, so the root-path redirect to the marketing landing page and the app proxy share one block: `redir /` matches only the exact root path, and every other request falls through to the reverse proxy.

What's Next?

With the core infrastructure humming, the immediate roadmap includes:

  • Adding LLM API keys to enable the actual RAG functionality
  • Setting up automated backups with cron jobs
  • Implementing monitoring with UptimeRobot
  • Connecting the dashboard to the marketing landing page
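The backup item is the most mechanical of those; a nightly crontab entry along these lines would cover Postgres (the paths, database user, and database name are assumptions — note the escaped `%`, which cron otherwise treats as a newline):

```
# m h dom mon dow  command
0 3 * * * cd /opt/minirag && docker compose exec -T postgres \
  pg_dump -U user minirag | gzip > /var/backups/minirag-$(date +\%F).sql.gz
```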

Key Takeaways

  1. Minimal Docker images require creative health check solutions
  2. Container networking behaves differently than local development
  3. DNS issues can masquerade as application problems
  4. API contracts must be verified when moving between environments
  5. Production deployment is 10% copying files and 90% debugging environment differences

The journey from localhost to production is never just about moving code—it's about understanding how systems behave in the real world. Every failed connection and certificate error teaches you something new about the infrastructure stack.

MiniRAG is now live at https://mini-rag.de, ready to help users build their own RAG applications. The production environment is stable, the deployment pipeline is smooth, and the lessons learned will make the next deployment even better.

Have you faced similar deployment challenges? What production gotchas have caught you off guard? Share your stories in the comments below.