Your AI.
Your Data.
Your Infrastructure.

The open-source RAG platform that puts you in control. Multi-tenant architecture. Provider-agnostic LLMs. Real-time streaming. Deploy in 5 minutes.

129 Tests Passing · Multi-Tenant · Provider-Agnostic · Real-Time Streaming
terminal
$ docker compose up -d
✓ postgres ready
✓ qdrant ready
✓ redis ready
✓ minirag-api ready on :8000

$ curl -X POST localhost:8000/v1/tenants \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"my-org"}'
{"id":"t_9k3...","name":"my-org","status":"active"}

$ curl localhost:8000/v1/chat \
  -H "Authorization: Bearer $BOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message":"How does ingestion work?"}'
{"answer":"Documents are chunked, embedded, and...","sources":[...]}
129 Tests · Multi-Tenant · 5-Minute Setup · MIT Licensed

Everything you need for RAG

A complete platform for building, deploying, and managing retrieval-augmented generation chatbots.

Multi-Tenant Isolation

Complete data separation. tenant_id on every query, dedicated API tokens, and role-based access control.
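The isolation pattern is simple: every read and write is filtered by tenant_id. Here is a minimal sketch using SQLite in place of PostgreSQL (the schema and table names are illustrative, not MiniRAG's actual tables):

```python
import sqlite3

# In-memory database standing in for PostgreSQL; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, tenant_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO sources (tenant_id, name) VALUES (?, ?)",
    [("t_alpha", "handbook.pdf"), ("t_beta", "faq.md"), ("t_alpha", "pricing.md")],
)

def list_sources(tenant_id: str) -> list[str]:
    """Every query is scoped by tenant_id; rows from other tenants are invisible."""
    rows = conn.execute(
        "SELECT name FROM sources WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()
    return [name for (name,) in rows]

print(list_sources("t_alpha"))  # ['handbook.pdf', 'pricing.md']
```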

RAG Pipeline

Ingest text, URLs, PDFs, and DOCX files. Auto-chunk, embed with any model, and vector-store in Qdrant.

Provider-Agnostic LLMs

OpenAI, Anthropic, Google, Ollama — switch providers per bot profile. Powered by LiteLLM.

Real-Time Streaming

Server-Sent Events with a structured protocol: sources → content deltas → completion. Sub-second first token.

Embeddable Widget

One script tag. Shadow DOM isolation. CSS custom properties for theming. Any website, 30 seconds.

Admin Dashboard

Manage bot profiles, sources, chat history, and analytics. Built-in chat testing with streaming preview.

Webhooks & Events

Real-time notifications for source.ingested, source.failed, chat.message. HMAC-SHA256 signed payloads.
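Consumers should verify the signature before trusting a payload. A sketch of HMAC-SHA256 verification, assuming the signature arrives hex-encoded (the secret name and encoding here are assumptions; check the webhook docs for the exact scheme):

```python
import hashlib
import hmac

def verify_webhook(secret: str, payload: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare in constant time."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = "whsec_example"  # hypothetical webhook secret
body = b'{"event":"source.ingested","source_id":"s_123"}'
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_webhook(secret, body, sig))        # True
print(verify_webhook(secret, body, "00" * 32))  # False
```

Always compute the HMAC over the raw request body, not a re-serialized JSON object, or the digests will not match.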

Auto-Refresh

Scheduled URL re-ingestion — hourly, daily, or weekly. Keep your knowledge base current automatically via ARQ cron.

Usage Analytics

Cost tracking per model, token usage breakdown, user feedback analytics, and CSV export.

How the pipeline works

Every query flows through a battle-tested retrieval-augmented generation pipeline. Embeddings, vector search, and LLM completion in one seamless request.

User → FastAPI API Gateway → Embed (LLM provider) → Qdrant Vector Store → LLM Completion → SSE Response
Backed by PostgreSQL and Redis.
User queries hit the FastAPI gateway, which embeds the question, performs a similarity search against Qdrant, retrieves the top-k chunks, and streams an LLM completion back via Server-Sent Events. PostgreSQL stores metadata; Redis handles rate limiting and caching.
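Condensed into code, the flow looks roughly like this. The providers are stubbed and the function names are illustrative, not MiniRAG's internal API:

```python
def embed(text: str) -> list[float]:
    # Stub: a real deployment calls the configured embedding provider.
    return [float(len(w)) for w in text.split()][:3]

def search(vector: list[float], top_k: int = 3) -> list[dict]:
    # Stub: a real deployment runs a similarity search against Qdrant.
    corpus = [
        {"text": "Documents are chunked and embedded on ingest.", "score": 0.91},
        {"text": "Vectors are stored per tenant in Qdrant.", "score": 0.87},
    ]
    return corpus[:top_k]

def complete(question: str, chunks: list[dict]):
    # Stub: a real deployment streams tokens from the configured LLM,
    # with the retrieved chunks injected as context.
    yield from f"Based on {len(chunks)} sources: ...".split()

def answer(question: str) -> list[str]:
    """embed -> vector search -> LLM completion, streamed token by token."""
    chunks = search(embed(question))
    return list(complete(question, chunks))

print(answer("How does ingestion work?"))  # ['Based', 'on', '2', 'sources:', '...']
```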

Three steps to production

From zero to a fully functional RAG chatbot in minutes, not weeks.

01

Deploy

One command. Five minutes. You’re live. Clone, configure, docker compose up. PostgreSQL, Qdrant, Redis, and FastAPI all orchestrated. Bootstrap your first tenant with a single API call.

docker compose up -d
02

Ingest

Feed your knowledge. Text, URLs, PDFs — auto-chunked and embedded. Upload documents through the API or admin dashboard. MiniRAG chunks content intelligently, generates embeddings, and stores vectors in Qdrant. Set up auto-refresh for URLs that change.

POST /v1/sources
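The auto-chunking step can be sketched as a sliding window with overlap. The sizes here are illustrative; the real chunker may be token-aware rather than character-based:

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows so context spanning a chunk
    boundary still appears whole in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "MiniRAG ingests text, URLs, and PDFs, then embeds each chunk."
pieces = chunk(doc, size=30, overlap=8)
print(len(pieces))  # 3
```

Each chunk is then embedded and written to Qdrant, keyed by tenant and source.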
03

Chat

Ask anything. Get grounded answers with source citations. Your users ask questions. RAG retrieves relevant context. The LLM generates answers with source citations, streamed in real-time via SSE.

POST /v1/chat
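On the wire, the documented event order (sources, then content deltas, then completion) might look like the hypothetical stream below. The exact event names are assumptions, but parsing SSE blocks works the same either way:

```python
import json

# Hypothetical SSE stream matching the documented order:
# sources first, then content deltas, then a completion event.
raw_stream = (
    'event: sources\ndata: {"sources": ["help-center.md"]}\n\n'
    'event: delta\ndata: {"text": "Documents are "}\n\n'
    'event: delta\ndata: {"text": "chunked and embedded."}\n\n'
    'event: done\ndata: {}\n\n'
)

def parse_sse(stream: str) -> list[tuple[str, dict]]:
    """Split a Server-Sent Events stream into (event, payload) pairs."""
    events = []
    for block in stream.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data = json.loads(line[len("data: "):])
        events.append((event, data))
    return events

events = parse_sse(raw_stream)
answer = "".join(d["text"] for e, d in events if e == "delta")
print(answer)  # Documents are chunked and embedded.
```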

Everything you need in one dashboard

Manage your entire RAG platform from a single interface. No CLI required.

Bot Profiles

Configure LLM provider, model, system prompt, and temperature per bot. Test conversations in real-time.

Source Management

Upload text, URLs, PDFs, DOCX. Monitor ingestion status, chunk counts, and embedding progress.

Chat History

Browse all conversations. View source citations, token usage, and user feedback per message.

Webhook Configuration

Set delivery URLs, select events, view delivery logs. HMAC-SHA256 signature verification built in.

Usage Analytics

Track costs per model, token consumption over time, and export detailed reports as CSV.

User & Role Management

Invite users, assign roles (admin/member), manage API tokens per tenant.

Embed anywhere in seconds

Drop one script tag into any website. Shadow DOM keeps your styles clean.

Example widget conversation on your-website.com:
User: How do I reset my password?
MiniRAG Assistant: Navigate to Settings → Security → Reset Password. You'll receive a confirmation email.
Source: help-center.md (score 0.92)
HTML
<script
  src="https://your-host/dashboard/widget/minirag-widget.js"
  data-bot-id="YOUR_BOT_ID"
  data-api-url="https://your-host"
  data-api-token="YOUR_TOKEN">
</script>

Shadow DOM isolation means no CSS conflicts. Style it with CSS custom properties to match your brand.

Security at every layer

MiniRAG doesn't compromise on security. Four layers of protection for your data and credentials.

Argon2id

Password Hashing

Memory-hard algorithm that resists GPU and ASIC attacks. Industry-recommended for credential storage.

Fernet (AES-128-CBC)

Encryption at Rest

LLM API keys and sensitive credentials encrypted before database storage. Keys never stored in plaintext.

HMAC-SHA256

Webhook Signatures

Every webhook delivery is signed. Verify payload integrity and authenticity before processing events.

JWT (HS256)

Session Tokens

Stateless authentication with signed tokens. No server-side session storage required.

From zero to production in 5 minutes

Choose your deployment method. Every path gets you a fully functional RAG platform with multi-tenant isolation, vector search, and streaming chat.

git clone https://github.com/mrwind-up-bird/mini-chat-rag.git
cd mini-chat-rag
cp .env.example .env
docker compose up -d

Frequently Asked Questions

Everything you need to know about MiniRAG.

Who owns my data and code?

You own everything. MiniRAG runs on your infrastructure — your data never leaves your servers. No vendor lock-in, no per-query pricing, no usage limits. Full source code under MIT.

Is MiniRAG production-ready?

Yes. MiniRAG is battle-tested with 129 automated tests (pytest + Newman), async FastAPI for high concurrency, connection pooling, and proper error handling. It runs on PostgreSQL, Qdrant, and Redis — all production-grade infrastructure.

Which LLM providers are supported?

Any provider compatible with the OpenAI API format via LiteLLM: OpenAI, Anthropic, Google Gemini, Ollama (local models), Azure OpenAI, and more. Switch providers per bot profile without code changes.

How do I embed the chat widget?

Add one <script> tag to any website. The widget loads in a Shadow DOM for complete style isolation — no CSS conflicts. Customize colors, position, and behavior with CSS custom properties and data attributes.

What does the admin dashboard include?

Bot profile management, document source ingestion, chat history with feedback tracking, webhook configuration, usage analytics with cost breakdowns, and user role management. All behind a glassmorphism UI with built-in chat testing.

How is MiniRAG secured?

Four layers: Argon2id for password hashing, Fernet (AES-128-CBC) for encrypting LLM API keys at rest, HMAC-SHA256 for signed webhook deliveries, and JWT (HS256) for stateless session tokens. Multi-tenant isolation ensures complete data separation.

What do I need to run it?

Docker and Docker Compose. The stack includes PostgreSQL (structured data), Qdrant (vector storage), Redis (caching and task queues), and the FastAPI application. Minimum 2GB RAM recommended. All services are containerized.

Can I develop without running everything in Docker?

Yes. Use the manual setup: create a Python virtualenv, install dependencies, run the supporting services with Docker Compose, and start the FastAPI server with hot reload. Full development docs are in the README.

Deploy your RAG chatbot today

Open-source. MIT licensed. Production-ready. Join developers building intelligent chatbots with MiniRAG.