Building MiniRAG: A Journey Through Provider-Agnostic RAG Architecture

Building a Retrieval-Augmented Generation (RAG) platform from scratch is like assembling a complex puzzle where each piece needs to fit perfectly with the others. Over the past few weeks, I've been developing MiniRAG, a modular, provider-agnostic RAG platform designed with multi-tenancy at its core. Today, I want to share the journey so far—the wins, the challenges, and the architectural decisions that shaped this project.

The Vision: Why Another RAG Platform?

The RAG landscape is crowded with solutions, but most are either too opinionated about providers or lack proper multi-tenant support. MiniRAG aims to solve this by being:

Provider-agnostic: Switch between OpenAI, Anthropic, local models, or any LiteLLM-supported provider
Multi-tenant by design: Complete isolation between tenants from day one
Modular architecture: Each component can be developed, tested, and deployed independently
Production-ready: Built with proper authentication, error handling, and observability

The Foundation: Getting the Basics Right

Step 1: Project Architecture

Starting with a solid foundation is crucial. The project structure reflects a clear separation of concerns:

app/
├── core/           # Configuration, database, security
├── models/         # SQLModel database models
├── api/v1/         # FastAPI route handlers
├── services/       # Business logic layer
└── workers/        # Background job processing

The infrastructure stack runs on Docker Compose with:

PostgreSQL 16 for relational data
Qdrant 1.13 for vector storage
Redis 7 for job queues and caching
Separate containers for web and worker processes

One early decision was to use SQLModel rather than maintaining separate SQLAlchemy and Pydantic models. A SQLModel class is both a SQLAlchemy table definition and a Pydantic model, so the same typed class serves the database layer and the API layer, which cuts a good deal of boilerplate.

Step 2: Authentication That Actually Works

Authentication in multi-tenant applications is notoriously tricky. I implemented a dual-dispatch dependency that accepts either API tokens or JWTs:

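from dataclasses import dataclass

from fastapi import Depends
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

security = HTTPBearer()

@dataclass
class AuthContext:
    tenant_id: str
    user_id: str | None = None   # set for JWT-authenticated users
    token_id: str | None = None  # set for API-token requests

async def get_auth_context(
    authorization: HTTPAuthorizationCredentials = Depends(security),
) -> AuthContext:
    # Dispatch between API token (looked up by its SHA-256 hash) and JWT.
    # Always return tenant context for downstream handlers.
    ...  # dispatch logic omitted in this excerpt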

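The benefit of this approach is that every endpoint depending on get_auth_context receives the tenant context without further configuration. As long as every query is filtered by that tenant_id, cross-tenant access is ruled out by construction rather than by per-endpoint checks.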
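As a rough illustration of the dispatch step, the sketch below routes a bearer credential to either an API-token lookup or a JWT decoder. It is not MiniRAG's actual implementation: the `mrag_` prefix, the in-memory `api_tokens` mapping, and the injected `decode_jwt` callable are all hypothetical stand-ins, and only the standard library is used.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class AuthContext:
    tenant_id: str
    user_id: Optional[str] = None
    token_id: Optional[str] = None


class AuthError(Exception):
    """Raised when a credential cannot be resolved to a tenant."""


API_TOKEN_PREFIX = "mrag_"  # hypothetical marker separating API tokens from JWTs


def hash_api_token(raw_token: str) -> str:
    # Only the SHA-256 digest is stored, never the raw token.
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def resolve_auth(
    credential: str,
    api_tokens: Mapping[str, dict],     # digest -> {"id": ..., "tenant_id": ...}
    decode_jwt: Callable[[str], dict],  # assumed to verify the signature and return claims
) -> AuthContext:
    if credential.startswith(API_TOKEN_PREFIX):
        record = api_tokens.get(hash_api_token(credential))
        if record is None:
            raise AuthError("unknown API token")
        return AuthContext(tenant_id=record["tenant_id"], token_id=record["id"])

    claims = decode_jwt(credential)
    if "tenant_id" not in claims:
        raise AuthError("JWT carries no tenant")
    return AuthContext(tenant_id=claims["tenant_id"], user_id=claims.get("sub"))
```

Both branches end in the same AuthContext, so downstream handlers never need to know which kind of credential was presented.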

Step 3: Core Domain Models

Building the core entities required careful consideration of how different components would interact:

BotProfile: Represents an AI assistant configuration with encrypted provider credentials. Using Fernet encryption ensures sensitive API keys are never stored in plaintext.

Source: Represents data sources (documents, URLs, databases) with flexible JSON configuration. The polymorphic design allows for different source types without schema migrations.

from uuid import uuid4

from sqlmodel import Field, SQLModel


class BotProfile(SQLModel, table=True):
    id: str = Field(default_factory=lambda: str(uuid4()), primary_key=True)
    tenant_id: str = Field(foreign_key="tenant.id")
    name: str
    # Fernet-encrypted provider credentials; None until credentials are set
    encrypted_credentials: str | None = None

    @property
    def has_credentials(self) -> bool:
        return self.encrypted_credentials is not None
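
To make the flexible JSON configuration described for Source more concrete, here is a small standard-library sketch that validates a per-type config dict before it would be stored in a JSON column. The source type names and required keys are invented for illustration; the real Source model may structure this differently.

```python
import json

# Hypothetical source types and the config keys each one requires.
REQUIRED_KEYS = {
    "document": {"filename"},
    "url": {"url"},
    "database": {"dsn", "query"},
}


def validate_source_config(source_type: str, config: dict) -> str:
    """Check a source config and return it serialized for a JSON column."""
    if source_type not in REQUIRED_KEYS:
        raise ValueError(f"unsupported source type: {source_type!r}")
    missing = REQUIRED_KEYS[source_type] - set(config)
    if missing:
        raise ValueError(f"{source_type} config is missing: {sorted(missing)}")
    # sort_keys gives a stable representation, which helps with diffs and tests.
    return json.dumps(config, sort_keys=True)
```

Under this scheme, supporting a new source type means adding one entry to the mapping; the table schema stays untouched, which is the point of keeping the config in JSON.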

Lessons Learned: The Pain Points

Every project has its "gotcha" moments. Here are the key ones that might save you time:

The Hatch Build Configuration Mystery

The Problem: Getting pip install -e ".[dev]" to work with hatchling as the build backend.

ValueError: Unable to determine which files to ship inside the wheel

The Solution: Explicitly configure the wheel target in pyproject.toml:

[tool.hatch.build.targets.wheel]
packages = ["app"]

This seems obvious in retrospect, but the error message wasn't particularly helpful for debugging.

FastAPI Authentication Status Codes

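The Problem: FastAPI's HTTPBearer does not answer a missing Authorization header with the same status code in every release. Recent releases return 401 Unauthorized, while older releases, and the tutorials written against them, return 403 Forbidden.

The Lesson: Always test your assumptions about framework behavior, especially when following older documentation. The fix I settled on was to accept either status in the tests, at the cost of a looser assertion: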

# More flexible assertion for tests
assert resp.status_code in (401, 403)

The Async SQLAlchemy Greenlet Requirement

The Problem: Using async SQLAlchemy with aiosqlite without the greenlet library causes cryptic errors.

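The Solution: Declare both dependencies explicitly. In a Hatch environment, dependencies are a list of requirement strings rather than a key/value table:

[tool.hatch.envs.default]
dependencies = [
    "greenlet>=3",
    "aiosqlite>=0.20",
]

If you install with pip install -e ".[dev]" instead of a Hatch environment, the same two requirement strings belong in the dev extra under [project.optional-dependencies].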

This one cost me an hour of debugging—the error message mentioned greenlet, but I initially thought it was optional for SQLite.

Testing Strategy: Building Confidence

With 15 passing tests covering authentication flows, CRUD operations, and tenant isolation, the test suite serves as both documentation and regression protection. Key testing principles:

In-memory SQLite for fast test execution
Tenant isolation verification in every multi-tenant test
Cross-tenant access prevention explicitly tested

def test_cross_tenant_source_access_forbidden(client, tenant_a_headers, tenant_b_source):
    """Ensure Tenant A cannot access Tenant B's sources"""
    resp = client.get(f"/v1/sources/{tenant_b_source.id}", headers=tenant_a_headers)
    assert resp.status_code == 404  # 404 rather than 403, so the response does not confirm the id exists
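
The 404 in that test falls out naturally when every lookup filters on both the record id and the caller's tenant. The sketch below shows the idea with the standard-library sqlite3 module; the table layout and the `get_source_for_tenant` helper are hypothetical, not MiniRAG's actual query code.

```python
import sqlite3
from typing import Optional


def get_source_for_tenant(
    conn: sqlite3.Connection, source_id: str, tenant_id: str
) -> Optional[dict]:
    """Return the source only if it belongs to the given tenant, else None."""
    row = conn.execute(
        "SELECT id, tenant_id, name FROM source WHERE id = ? AND tenant_id = ?",
        (source_id, tenant_id),
    ).fetchone()
    if row is None:
        # A missing row and another tenant's row look identical to the caller,
        # so the handler can answer 404 without revealing that the id exists.
        return None
    return {"id": row[0], "tenant_id": row[1], "name": row[2]}


# Tiny in-memory fixture: one source owned by tenant-b.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id TEXT PRIMARY KEY, tenant_id TEXT, name TEXT)")
conn.execute("INSERT INTO source VALUES ('src-1', 'tenant-b', 'Handbook')")
```

Calling `get_source_for_tenant(conn, "src-1", "tenant-a")` returns None here, exactly as it would for an id that does not exist at all.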

What's Next: The Ingestion Pipeline

The next phase focuses on building the document ingestion pipeline—the heart of any RAG system:

Document and Chunk models for storing processed content
Text normalization and chunking with configurable strategies
Embedding generation using LiteLLM for provider flexibility
Vector storage integration with Qdrant
Background processing using ARQ workers

The modular design means each component can be developed and tested independently, then composed into a complete ingestion pipeline.
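
Since the pipeline is still ahead, the following is only a sketch of what text normalization and chunking with a configurable strategy could look like: whitespace normalization followed by fixed-size character windows with overlap. The `ChunkConfig` name and its defaults are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkConfig:
    size: int = 200    # maximum characters per chunk
    overlap: int = 40  # characters shared between neighbouring chunks

    def __post_init__(self) -> None:
        if self.size <= 0 or not 0 <= self.overlap < self.size:
            raise ValueError("need size > 0 and 0 <= overlap < size")


def normalize(text: str) -> str:
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def chunk_text(text: str, config: ChunkConfig = ChunkConfig()) -> List[str]:
    cleaned = normalize(text)
    step = config.size - config.overlap
    chunks = []
    for start in range(0, len(cleaned), step):
        chunks.append(cleaned[start : start + config.size])
        if start + config.size >= len(cleaned):
            break  # this window already reached the end of the text
    return chunks
```

A sentence-aware or token-based strategy could sit behind the same `chunk_text` signature, which is what would keep the strategy configurable.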

Architectural Insights

Several patterns emerged during development that might be useful for similar projects:

Provider Abstraction: Rather than coupling to specific LLM or embedding providers, MiniRAG uses LiteLLM as an abstraction layer, so switching providers is a matter of configuration rather than code changes.
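
One way to keep the rest of the codebase unaware of the provider is to depend on a small interface and inject the concrete backend. The sketch below uses typing.Protocol with a deterministic fake backend for tests. `EmbeddingBackend`, `FakeBackend`, and `embed_chunks` are hypothetical names, and a real backend would delegate to LiteLLM rather than hashing text.

```python
import hashlib
from typing import List, Protocol, Sequence


class EmbeddingBackend(Protocol):
    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class FakeBackend:
    """Deterministic stand-in for tests; derives vectors from a SHA-256 digest."""

    def __init__(self, dim: int = 4) -> None:
        if not 1 <= dim <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dim` bytes into the range [0, 1].
            vectors.append([b / 255 for b in digest[: self.dim]])
        return vectors


def embed_chunks(chunks: Sequence[str], backend: EmbeddingBackend) -> List[List[float]]:
    # Callers see only the Protocol, so swapping providers is a wiring change.
    return backend.embed(list(chunks))
```

The fake backend also keeps embedding-dependent tests fast and offline, in the same spirit as the in-memory SQLite setup used for the test suite.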

Tenant-First Design: Making tenant isolation a core architectural principle (not an afterthought) simplifies security and data modeling decisions throughout the codebase.

Configuration as Code: Using Pydantic Settings for configuration provides type safety, environment variable injection, and clear documentation of required settings.

Closing Thoughts

Building MiniRAG has reinforced several important principles:

Start with strong foundations: The time invested in proper project structure, authentication, and testing pays dividends as the codebase grows.
Embrace constraints: Multi-tenancy and provider-agnostic design add complexity but force cleaner abstractions.
Test everything: Especially in multi-tenant systems, explicit tests for isolation prevent costly security bugs.

The journey is far from over—the ingestion pipeline will bring new challenges around scalability, error handling, and data consistency. But with solid foundations in place, I'm confident the remaining pieces will fall into place smoothly.

Stay tuned for the next installment where we'll dive into vector embeddings, chunk optimization strategies, and building resilient background processing pipelines.

Want to follow along with the MiniRAG development? The code and detailed progress updates are available on the project repository.