Your AI.
Your Data.
Your Infrastructure.
The open-source RAG platform that puts you in control. Multi-tenant architecture. Provider-agnostic LLMs. Real-time streaming. Deploy in 5 minutes.
$ docker compose up -d
✓ postgres ready
✓ qdrant ready
✓ redis ready
✓ minirag-api ready on :8000
$ curl -X POST localhost:8000/v1/tenants \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name":"my-org"}'
{"id":"t_9k3...","name":"my-org","status":"active"}
$ curl localhost:8000/v1/chat \
-H "Authorization: Bearer $BOT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"message":"How does ingestion work?"}'
{"answer":"Documents are chunked, embedded, and...","sources":[...]}
Everything you need for RAG
A complete platform for building, deploying, and managing retrieval-augmented generation chatbots.
Multi-Tenant Isolation
Complete data separation. tenant_id on every query, dedicated API tokens, and role-based access control.
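Tenant scoping is simple to sketch: every row carries a tenant_id, and every query binds the tenant resolved from the caller's API token. The table and column names below are illustrative, not MiniRAG's actual schema.

```python
import sqlite3

# Illustrative multi-tenant schema: every row carries a tenant_id,
# and every query filters on it. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (id TEXT, tenant_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO sources VALUES (?, ?, ?)",
    [("s1", "t_acme", "handbook.pdf"), ("s2", "t_other", "faq.md")],
)

def list_sources(tenant_id: str) -> list[str]:
    # tenant_id comes from the caller's token, never from user input,
    # so one tenant can never read another tenant's rows.
    rows = conn.execute(
        "SELECT name FROM sources WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()
    return [name for (name,) in rows]
```

Because the filter is applied server-side on every query, a leaked document ID from another tenant still returns nothing.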
RAG Pipeline
Ingest text, URLs, PDFs, and DOCX files. Auto-chunk, embed with any model, and vector-store in Qdrant.
Provider-Agnostic LLMs
OpenAI, Anthropic, Google, Ollama — switch providers per bot profile. Powered by LiteLLM.
Real-Time Streaming
Server-Sent Events with structured protocol: sources → content deltas → completion. Sub-second first token.
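A client consuming that stream just splits on blank lines and dispatches on the event name. The event names and payload shapes below ("sources", "delta", "done") are assumptions for illustration; check the API reference for the exact protocol.

```python
import json

# Hypothetical sample of the SSE stream: sources first, then
# content deltas, then a completion event.
raw_stream = (
    'event: sources\ndata: {"sources": ["doc-1"]}\n\n'
    'event: delta\ndata: {"text": "Documents are "}\n\n'
    'event: delta\ndata: {"text": "chunked..."}\n\n'
    'event: done\ndata: {}\n\n'
)

def parse_sse(stream: str):
    """Yield (event, payload) pairs from a raw SSE stream."""
    for block in stream.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data = json.loads(line[len("data: "):])
        yield event, data

# Concatenate the deltas to rebuild the streamed answer.
answer = "".join(
    payload["text"] for event, payload in parse_sse(raw_stream) if event == "delta"
)
```

Rendering sources before the first delta is what lets a UI show citations immediately, before the answer finishes streaming.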
Embeddable Widget
One script tag. Shadow DOM isolation. CSS custom properties for theming. Any website, 30 seconds.
Admin Dashboard
Manage bot profiles, sources, chat history, and analytics. Built-in chat testing with streaming preview.
Webhooks & Events
Real-time notifications for source.ingested, source.failed, chat.message. HMAC-SHA256 signed payloads.
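Verifying a signed delivery is a few lines with the standard library. The header name and hex encoding below are assumptions; adapt them to the actual delivery format.

```python
import hashlib
import hmac

def sign(secret: bytes, payload: bytes) -> str:
    # HMAC-SHA256 over the raw request body, hex-encoded (assumed format)
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(secret: bytes, payload: bytes, signature: str) -> bool:
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(sign(secret, payload), signature)

secret = b"whsec_example"  # hypothetical shared webhook secret
payload = b'{"event": "source.ingested", "id": "src_123"}'
sig = sign(secret, payload)
```

Always verify against the raw bytes of the request body; re-serializing parsed JSON can reorder keys and break the signature.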
Auto-Refresh
Scheduled URL re-ingestion — hourly, daily, or weekly. Keep your knowledge base current automatically via ARQ cron.
Usage Analytics
Cost tracking per model, token usage breakdown, user feedback analytics, and CSV export.
How the pipeline works
Every query flows through a battle-tested retrieval-augmented generation pipeline. Embeddings, vector search, and LLM completion in one seamless request.
User queries hit the FastAPI gateway, which embeds the question, performs a similarity search against Qdrant, retrieves the top-k chunks, and streams an LLM completion back via Server-Sent Events. PostgreSQL stores metadata; Redis handles rate limiting and caching.
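The retrieval step above can be sketched end to end with a toy embedder. Real embeddings come from a model and the search runs in Qdrant; this bag-of-words stand-in only illustrates the data flow from query to top-k context.

```python
import math

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words "embedding" standing in for a real model
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "documents are chunked and embedded during ingestion",
    "webhooks notify you when ingestion finishes",
    "the widget embeds the chat on any website",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query, keep top-k
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

top = retrieve("how does ingestion work")
# the retrieved chunks are then packed into the LLM prompt as context
```

Everything after this step is prompt assembly: the top-k chunks become the context the LLM grounds its streamed answer in.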
Three steps to production
From zero to a fully functional RAG chatbot in minutes, not weeks.
Deploy
One command. Five minutes. You’re live. Clone, configure, docker compose up. PostgreSQL, Qdrant, Redis, and FastAPI all orchestrated. Bootstrap your first tenant with a single API call.
docker compose up -d
Ingest
Feed your knowledge. Text, URLs, PDFs — auto-chunked and embedded. Upload documents through the API or admin dashboard. MiniRAG chunks content intelligently, generates embeddings, and stores vectors in Qdrant. Set up auto-refresh for URLs that change.
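The chunking step is the usual fixed-size-with-overlap baseline. The sizes below are illustrative, and production chunkers typically split on sentence or token boundaries rather than raw characters.

```python
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    # Each chunk starts `step` characters after the previous one,
    # so consecutive chunks share `overlap` characters.
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = (
    "MiniRAG splits long documents into overlapping chunks so that "
    "context near chunk boundaries is not lost during retrieval."
)
chunks = chunk_text(doc)
```

The overlap is what keeps a sentence straddling a boundary retrievable from at least one chunk.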
POST /v1/sources
Chat
Ask anything. Get grounded answers with source citations. Your users ask questions. RAG retrieves relevant context. The LLM generates answers with source citations, streamed in real-time via SSE.
POST /v1/chat
Everything you need in one dashboard
Manage your entire RAG platform from a single interface. No CLI required.
Bot Profiles
Configure LLM provider, model, system prompt, and temperature per bot. Test conversations in real-time.
Source Management
Upload text, URLs, PDFs, DOCX. Monitor ingestion status, chunk counts, and embedding progress.
Chat History
Browse all conversations. View source citations, token usage, and user feedback per message.
Webhook Configuration
Set delivery URLs, select events, view delivery logs. HMAC-SHA256 signature verification built in.
Usage Analytics
Track costs per model, token consumption over time, and export detailed reports as CSV.
User & Role Management
Invite users, assign roles (admin/member), manage API tokens per tenant.
Embed anywhere in seconds
Drop one script tag into any website. Shadow DOM keeps your styles clean.
<script
src="https://your-host/dashboard/widget/minirag-widget.js"
data-bot-id="YOUR_BOT_ID"
data-api-url="https://your-host"
data-api-token="YOUR_TOKEN">
</script>
Shadow DOM isolation means no CSS conflicts. Style it with CSS custom properties to match your brand.
Security at every layer
MiniRAG doesn't compromise on security. Four layers of protection for your data and credentials.
Password Hashing
Memory-hard algorithm that resists GPU and ASIC attacks. Industry-recommended for credential storage.
Encryption at Rest
LLM API keys and sensitive credentials encrypted before database storage. Keys never stored in plaintext.
Webhook Signatures
Every webhook delivery is signed. Verify payload integrity and authenticity before processing events.
Session Tokens
Stateless authentication with signed tokens. No server-side session storage required.
From zero to production in 5 minutes
Choose your deployment method. Every path gets you a fully functional RAG platform with multi-tenant isolation, vector search, and streaming chat.
git clone https://github.com/mrwind-up-bird/mini-chat-rag.git
cd mini-chat-rag
cp .env.example .env
docker compose up -d
Frequently Asked Questions
Everything you need to know about MiniRAG.
Who owns my data and the code?
You own everything. MiniRAG runs on your infrastructure — your data never leaves your servers. No vendor lock-in, no per-query pricing, no usage limits. Full source code under MIT.
Is MiniRAG production-ready?
Yes. MiniRAG is battle-tested with 129 automated tests (pytest + Newman), async FastAPI for high concurrency, connection pooling, and proper error handling. It runs PostgreSQL, Qdrant, and Redis — all production-grade infrastructure.
Which LLM providers are supported?
Any provider compatible with the OpenAI API format via LiteLLM: OpenAI, Anthropic, Google Gemini, Ollama (local models), Azure OpenAI, and more. Switch providers per bot profile without code changes.
How do I embed the chat widget?
Add one <script> tag to any website. The widget loads in a Shadow DOM for complete style isolation — no CSS conflicts. Customize colors, position, and behavior with CSS custom properties and data attributes.
What can I manage from the admin dashboard?
Bot profile management, document source ingestion, chat history with feedback tracking, webhook configuration, usage analytics with cost breakdowns, and user role management. All behind a glassmorphism UI with built-in chat testing.
How is MiniRAG secured?
Four layers: Argon2id for password hashing, Fernet (AES-128-CBC) for encrypting LLM API keys at rest, HMAC-SHA256 for signed webhook deliveries, and JWT (HS256) for stateless session tokens. Multi-tenant isolation ensures complete data separation.
What do I need to run it?
Docker and Docker Compose. The stack includes PostgreSQL (structured data), Qdrant (vector storage), Redis (caching and task queues), and the FastAPI application. Minimum 2GB RAM recommended. All services are containerized.
Can I develop locally without running everything in Docker?
Yes. Use the manual setup: create a Python virtualenv, install dependencies, run the supporting services with Docker Compose, and start the FastAPI server with hot-reload. Full development docs in the README.
Deploy your RAG chatbot today
Open-source. MIT licensed. Production-ready. Join developers building intelligent chatbots with MiniRAG.