Building MiniRAG: A Journey Through Provider-Agnostic RAG Architecture
Building a Retrieval-Augmented Generation (RAG) platform from scratch is like assembling a complex puzzle where each piece needs to fit perfectly with the others. Over the past few weeks, I've been developing MiniRAG, a modular, provider-agnostic RAG platform designed with multi-tenancy at its core. Today, I want to share the journey so far—the wins, the challenges, and the architectural decisions that shaped this project.
The Vision: Why Another RAG Platform?
The RAG landscape is crowded with solutions, but most are either too opinionated about providers or lack proper multi-tenant support. MiniRAG aims to solve this by being:
- Provider-agnostic: Switch between OpenAI, Anthropic, local models, or any LiteLLM-supported provider
- Multi-tenant by design: Complete isolation between tenants from day one
- Modular architecture: Each component can be developed, tested, and deployed independently
- Production-ready: Built with proper authentication, error handling, and observability
The Foundation: Getting the Basics Right
Step 1: Project Architecture
Starting with a solid foundation is crucial. The project structure reflects a clear separation of concerns:
app/
├── core/ # Configuration, database, security
├── models/ # SQLModel database models
├── api/v1/ # FastAPI route handlers
├── services/ # Business logic layer
└── workers/ # Background job processing
The infrastructure stack runs on Docker Compose with:
- PostgreSQL 16 for relational data
- Qdrant 1.13 for vector storage
- Redis 7 for job queues and caching
- Separate containers for web and worker processes
One early decision was using SQLModel instead of pure SQLAlchemy or Pydantic models. This choice pays dividends by providing type safety across the entire stack while reducing boilerplate code.
Step 2: Authentication That Actually Works
Authentication in multi-tenant applications is notoriously tricky. I implemented a dual-dispatch system that supports both API tokens and JWT tokens:
class AuthContext:
    tenant_id: str
    user_id: str | None
    token_id: str | None

async def get_auth_context(
    authorization: HTTPAuthorizationCredentials = Depends(security),
) -> AuthContext:
    # Dispatch between API token (SHA-256 lookup) and JWT (signature check),
    # and always return tenant context for downstream handlers
    ...
The beauty of this approach is that every API endpoint automatically receives tenant context without extra wiring, and cross-tenant access is blocked by construction rather than by per-endpoint checks.
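The dispatch can be sketched in a framework-free form. This is a simplified illustration, not MiniRAG's actual code: the token-shape heuristic and function names here are my own stand-ins.

```python
import hashlib

def classify_credential(token: str) -> str:
    """Crude dispatch heuristic: a JWT has three dot-separated
    segments; anything else is treated as an opaque API token."""
    return "jwt" if token.count(".") == 2 else "api_token"

def hash_api_token(token: str) -> str:
    # API tokens are stored as SHA-256 digests rather than plaintext,
    # so lookup compares hashes and a leaked DB leaks no usable keys.
    return hashlib.sha256(token.encode()).hexdigest()
```

A real handler would then load the matching token row (or verify the JWT signature) and build the AuthContext from the tenant it belongs to.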
Step 3: Core Domain Models
Building the core entities required careful consideration of how different components would interact:
BotProfile: Represents an AI assistant configuration with encrypted provider credentials. Using Fernet encryption ensures sensitive API keys are never stored in plaintext.
Source: Represents data sources (documents, URLs, databases) with flexible JSON configuration. The polymorphic design allows for different source types without schema migrations.
class BotProfile(SQLModel, table=True):
    id: str = Field(default_factory=lambda: str(uuid4()), primary_key=True)
    tenant_id: str = Field(foreign_key="tenant.id")
    name: str
    encrypted_credentials: str | None = None  # Fernet encrypted

    @property
    def has_credentials(self) -> bool:
        return self.encrypted_credentials is not None
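The credential handling follows the standard Fernet pattern from the `cryptography` package. A minimal sketch, with illustrative helper names; in practice the key comes from configuration, not from a per-call `generate_key()`:

```python
from cryptography.fernet import Fernet

def encrypt_credentials(plaintext: str, key: bytes) -> str:
    # Fernet provides authenticated symmetric encryption; the result is
    # a urlsafe base64 token that is safe to store in a text column.
    return Fernet(key).encrypt(plaintext.encode()).decode()

def decrypt_credentials(token: str, key: bytes) -> str:
    # Decryption also verifies the token's HMAC, so tampered
    # ciphertext raises instead of silently decoding to garbage.
    return Fernet(key).decrypt(token.encode()).decode()
```

The encrypted string round-trips cleanly, which is what lets `encrypted_credentials` live in an ordinary nullable text column.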
Lessons Learned: The Pain Points
Every project has its "gotcha" moments. Here are the key ones that might save you time:
The Hatch Build Configuration Mystery
The Problem: Getting pip install -e ".[dev]" to work with hatchling as the build backend.
ValueError: Unable to determine which files to ship inside the wheel
The Solution: Explicitly configure the wheel target in pyproject.toml:
[tool.hatch.build.targets.wheel]
packages = ["app"]
This seems obvious in retrospect, but the error message wasn't particularly helpful for debugging.
FastAPI Authentication Status Codes
The Problem: Modern FastAPI's HTTPBearer returns 401 Unauthorized for missing auth headers, not 403 Forbidden as some older tutorials suggest.
The Lesson: Always test your assumptions about framework behavior, especially when following older documentation. The fix was simple:
# More flexible assertion for tests
assert resp.status_code in (401, 403)
The Async SQLAlchemy Greenlet Requirement
The Problem: Using async SQLAlchemy with aiosqlite without the greenlet library causes cryptic errors.
The Solution: Add explicit dependencies:
[tool.hatch.envs.default.dependencies]
greenlet = ">=3"
aiosqlite = ">=0.20"
This one cost me an hour of debugging—the error message mentioned greenlet, but I initially thought it was optional for SQLite.
Testing Strategy: Building Confidence
With 15 passing tests covering authentication flows, CRUD operations, and tenant isolation, the test suite serves as both documentation and regression protection. Key testing principles:
- In-memory SQLite for fast test execution
- Tenant isolation verification in every multi-tenant test
- Cross-tenant access prevention explicitly tested
def test_cross_tenant_source_access_forbidden(client, tenant_a_headers, tenant_b_source):
    """Ensure Tenant A cannot access Tenant B's sources."""
    resp = client.get(f"/v1/sources/{tenant_b_source.id}", headers=tenant_a_headers)
    assert resp.status_code == 404  # 404, not 403: never confirm the resource exists
What's Next: The Ingestion Pipeline
The next phase focuses on building the document ingestion pipeline—the heart of any RAG system:
- Document and Chunk models for storing processed content
- Text normalization and chunking with configurable strategies
- Embedding generation using LiteLLM for provider flexibility
- Vector storage integration with Qdrant
- Background processing using ARQ workers
The modular design means each component can be developed and tested independently, then composed into a complete ingestion pipeline.
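As a preview of what "configurable strategies" means, fixed-size chunking with overlap is the simplest baseline. A sketch, not the pipeline's final implementation:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters, each sharing
    `overlap` characters with its predecessor, so a sentence that
    straddles a boundary still appears intact in one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]
```

Swapping in sentence-aware or token-based chunking later only means replacing this one function behind the same signature.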
Architectural Insights
Several patterns emerged during development that might be useful for similar projects:
Provider Abstraction: Rather than coupling to specific LLM or embedding providers, using LiteLLM as an abstraction layer provides flexibility without complexity.
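That abstraction is a thin seam the rest of the code programs against. A sketch using a `Protocol`; the `FakeEmbedder` here is a test double standing in for a LiteLLM-backed implementation:

```python
from typing import Protocol

class Embedder(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class FakeEmbedder:
    """Deterministic stand-in for tests; a real implementation would
    delegate to LiteLLM with whatever model the tenant configured."""
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t)), 0.0] for t in texts]

def index_documents(embedder: Embedder, docs: list[str]) -> int:
    # The indexing code only knows the Embedder seam, never the provider
    vectors = embedder.embed(docs)
    return len(vectors)
```

Because `index_documents` depends only on the protocol, swapping OpenAI for a local model is a configuration change, not a code change.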
Tenant-First Design: Making tenant isolation a core architectural principle (not an afterthought) simplifies security and data modeling decisions throughout the codebase.
Configuration as Code: Using Pydantic Settings for configuration provides type safety, environment variable injection, and clear documentation of required settings.
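A minimal example of that pattern, assuming `pydantic-settings` is installed; the field names and `MINIRAG_` prefix are illustrative, not the project's actual settings:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Reads MINIRAG_DATABASE_URL, MINIRAG_JWT_SECRET, etc. from the env
    model_config = SettingsConfigDict(env_prefix="MINIRAG_")

    database_url: str = "postgresql+asyncpg://localhost/minirag"
    qdrant_url: str = "http://localhost:6333"
    redis_url: str = "redis://localhost:6379/0"
    jwt_secret: str  # no default: the app fails fast at startup if unset
```

The class itself documents which settings exist, which are required, and what types they must coerce to.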
Closing Thoughts
Building MiniRAG has reinforced several important principles:
- Start with strong foundations: The time invested in proper project structure, authentication, and testing pays dividends as the codebase grows.
- Embrace constraints: Multi-tenancy and provider-agnostic design add complexity but force cleaner abstractions.
- Test everything: Especially in multi-tenant systems, explicit tests for isolation prevent costly security bugs.
The journey is far from over—the ingestion pipeline will bring new challenges around scalability, error handling, and data consistency. But with solid foundations in place, I'm confident the remaining pieces will fall into place smoothly.
Stay tuned for the next installment where we'll dive into vector embeddings, chunk optimization strategies, and building resilient background processing pipelines.
Want to follow along with the MiniRAG development? The code and detailed progress updates are available on the project repository.