Building MiniRAG: A Multi-Tenant RAG Platform from Scratch - Part 1

Building a production-ready RAG (Retrieval-Augmented Generation) platform is no small feat. After seeing many projects struggle with vendor lock-in and scalability issues, I decided to build MiniRAG, a modular, provider-agnostic RAG platform designed for multi-tenancy from day one.

This is the first post in a 5-part series documenting the entire build process. Today, we're laying the foundation.

The Vision: Why Another RAG Platform?

The RAG landscape is fragmented. Most solutions lock you into specific providers (OpenAI, Anthropic, etc.) or vector databases. MiniRAG aims to solve this by providing:

Provider agnostic: Swap LLM providers without changing your code
Multi-tenant: One deployment, multiple isolated customers
Modular architecture: Pick and choose components
Production ready: Built with scalability and security in mind
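
To make "provider agnostic" concrete, here is a minimal sketch of what such a boundary could look like. The names LLMProvider, EchoProvider, and answer are hypothetical and not taken from the MiniRAG codebase; the point is only that calling code depends on an interface rather than on a vendor SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical interface: any object with a matching `complete` method qualifies."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Stand-in provider for local testing; a real one would call a vendor API."""

    prefix: str = "echo: "

    def complete(self, prompt: str) -> str:
        return self.prefix + prompt


def answer(question: str, context: list[str], provider: LLMProvider) -> str:
    """Build a prompt from retrieved context and delegate to whichever provider is passed in."""
    prompt = "\n".join(context) + "\n\nQuestion: " + question
    return provider.complete(prompt)
```

Swapping providers then means passing a different object to answer; the function itself never changes.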

Step 1: Building the Foundation

Project Structure That Scales

Rather than throwing everything in a single file, I opted for a structure that can grow:

app/
├── api/v1/          # API routes (versioned)
├── core/            # Configuration, database, security
├── models/          # SQLAlchemy models + Pydantic schemas
├── services/        # Business logic layer
└── workers/         # Background task processing

This separation keeps the boundaries between concerns clear and leaves room to run pieces such as the background workers as separate processes later.

The Tech Stack Decision

For the infrastructure backbone, I chose:

FastAPI: For its excellent async support and automatic OpenAPI generation
PostgreSQL 16: Reliable, ACID-compliant data storage
Qdrant: Open-source vector database for embeddings
Redis: Caching and task queue management
Docker: Containerized deployment with health checks

Here's the Docker Compose setup that brings it all together:

services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: minirag
      POSTGRES_USER: minirag
      POSTGRES_PASSWORD: minirag_dev
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U minirag"]
      interval: 10s
      timeout: 5s
      retries: 5

  qdrant:
    image: qdrant/qdrant:v1.13.0
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5

# Named volumes must be declared at the top level
volumes:
  qdrant_data:
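
On the application side, the services in this Compose file need to be reachable through configuration. The following is a sketch only: the Settings class, the load_settings helper, and the environment variable names (DATABASE_URL, QDRANT_URL, REDIS_URL) are assumptions for illustration. The defaults use the Compose service names as hostnames, the credentials from the file above, and the standard PostgreSQL and Redis ports.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Connection settings for the three backing services (hypothetical names)."""

    database_url: str
    qdrant_url: str
    redis_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, falling back to development defaults."""
    env = os.environ if env is None else env
    return Settings(
        # Assumed defaults: service names from the Compose file act as hostnames.
        database_url=env.get(
            "DATABASE_URL", "postgresql://minirag:minirag_dev@db:5432/minirag"
        ),
        qdrant_url=env.get("QDRANT_URL", "http://qdrant:6333"),
        redis_url=env.get("REDIS_URL", "redis://redis:6379/0"),
    )
```

Reading everything from the environment keeps the development password out of the code path used in production, where the variables would be set explicitly.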

Security-First Database Design

Multi-tenancy requires strict data isolation. Every model extends a TimestampMixin, and every tenant-owned model carries a non-nullable tenant_id foreign key:

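from uuid import UUID

from sqlalchemy import ForeignKey, String
from sqlalchemy.orm import Mapped, mapped_column, relationship

# Base, TimestampMixin, and new_uuid are project helpers defined elsewhere (not shown).

class Tenant(TimestampMixin, Base):
    __tablename__ = "tenants"

    id: Mapped[UUID] = mapped_column(primary_key=True, default=new_uuid)
    name: Mapped[str] = mapped_column(String(255), nullable=False)
    # Additional tenant metadata...

class User(TimestampMixin, Base):
    __tablename__ = "users"

    id: Mapped[UUID] = mapped_column(primary_key=True, default=new_uuid)
    # Every user belongs to exactly one tenant
    tenant_id: Mapped[UUID] = mapped_column(ForeignKey("tenants.id"), nullable=False)
    # unique=True is global: an email can exist in only one tenant
    email: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
    # Lets us navigate from a user to its tenant via user.tenant
    tenant: Mapped["Tenant"] = relationship("Tenant")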
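
A foreign key alone does not isolate data; isolation comes from every read being filtered by tenant_id. The sketch below illustrates that rule with the standard library's sqlite3 instead of the SQLAlchemy models above, purely to stay self-contained. The table layout mirrors Tenant and User, and the helper names are hypothetical.

```python
import sqlite3
import uuid


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with tables mirroring Tenant and User."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tenants (id TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE users (
            id TEXT PRIMARY KEY,
            tenant_id TEXT NOT NULL REFERENCES tenants(id),
            email TEXT NOT NULL UNIQUE
        );
        """
    )
    return conn


def add_tenant(conn: sqlite3.Connection, name: str) -> str:
    """Insert a tenant and return its generated id."""
    tenant_id = str(uuid.uuid4())
    conn.execute("INSERT INTO tenants (id, name) VALUES (?, ?)", (tenant_id, name))
    return tenant_id


def add_user(conn: sqlite3.Connection, tenant_id: str, email: str) -> str:
    """Insert a user owned by the given tenant and return its generated id."""
    user_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO users (id, tenant_id, email) VALUES (?, ?, ?)",
        (user_id, tenant_id, email),
    )
    return user_id


def list_user_emails(conn: sqlite3.Connection, tenant_id: str) -> list[str]:
    """Tenant-scoped read: the tenant_id filter is a required argument, never optional."""
    rows = conn.execute(
        "SELECT email FROM users WHERE tenant_id = ? ORDER BY email", (tenant_id,)
    )
    return [row[0] for row in rows]
```

Making tenant_id a required parameter of every query helper means a caller cannot forget the filter by accident, which is the property the service layer needs regardless of which ORM sits underneath.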

API Token Security Strategy

Instead of storing raw API tokens, we store only their SHA-256 hash, plus a short prefix so a token can be recognized in listings:

class ApiToken(TimestampMixin, Base):
    __tablename__ = "api_tokens"
    
    id: Mapped[UUID] = mapped_column(primary_key=True, default=new_uuid)
    tenant_id: Mapped[UUID] = mapped_column(ForeignKey("tenants.id"), nullable=False)
    name: Mapped[str] = mapped_column(String(255), nullable=False)
    token_hash: Mapped[str] = mapped_column(String(64), unique=True, nullable=False)  # SHA-256 hex digest is 64 chars
    prefix: Mapped[str] = mapped_column(String(8), nullable=False)  # For display only

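This approach means that even if someone gains read access to the database, they get only hashes, which cannot be presented as working API tokens. A fast hash like SHA-256 is a reasonable choice here as long as the tokens are long random strings rather than user-chosen secrets.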
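
The scheme can be sketched with the standard library alone. The function names and the 32-byte token length below are assumptions for illustration; the 64-character hash and 8-character prefix match the column sizes on ApiToken.

```python
import hashlib
import hmac
import secrets

PREFIX_LENGTH = 8  # matches String(8) on ApiToken.prefix


def hash_token(raw: str) -> str:
    """Return the SHA-256 hex digest (64 characters) of a raw token."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def generate_token() -> tuple[str, str, str]:
    """Create a new token; returns (raw, token_hash, prefix).

    The raw value is shown to the user once and never stored.
    """
    raw = secrets.token_urlsafe(32)  # assumed length: 32 random bytes
    return raw, hash_token(raw), raw[:PREFIX_LENGTH]


def verify_token(raw: str, stored_hash: str) -> bool:
    """Hash the presented token and compare it to the stored hash in constant time."""
    return hmac.compare_digest(hash_token(raw), stored_hash)
```

At creation time only token_hash and prefix would be written to the api_tokens table; at request time the presented token is hashed and looked up by that hash.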

Lessons Learned: The Packaging Gotcha

Not everything went smoothly. I hit a frustrating issue with Python packaging that cost me an hour of debugging:

$ pip install -e ".[dev]"
ValueError: Unable to determine which files to ship inside the wheel

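The problem? Hatchling (the build backend) tries to auto-detect what to ship by looking for a package whose directory matches the project name. Our code lives in app/, which does not match, so the detection finds nothing and the build aborts. The fix was adding this to pyproject.toml: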

[tool.hatch.build.targets.wheel]
packages = ["app"]

This explicitly tells Hatchling which directory contains our Python package. It's a small detail, but knowing it up front would have saved me an hour of debugging.

Current Status and What's Next

We now have a solid foundation:

✅ Complete project structure with proper separation of concerns
✅ Multi-service Docker environment with health checks
✅ Security infrastructure with hashed tokens and encrypted fields
✅ Database models designed for multi-tenancy
✅ Working FastAPI application with basic health endpoint

In Part 2, we'll tackle authentication middleware and tenant management. We'll build:

Bearer token validation dependencies
Middleware that resolves API tokens to tenant context
Tenant registration and API token creation endpoints
End-to-end authentication testing

The goal is to have a fully working multi-tenant authentication system that automatically isolates data by tenant.

Following Along

You can follow this build series as I document each step. We're building something production-ready, not a toy demo, so expect real-world challenges and solutions.

The complete code will be available once we reach a stable milestone. Until then, these posts serve as both documentation and a learning resource for anyone building similar systems.

This is Part 1 of a 5-part series on building MiniRAG. Next up: implementing robust multi-tenant authentication and middleware.