Building Cost Analytics for a RAG Dashboard: From Feature Request to Production
Last week, I tackled an interesting challenge: extending our RAG (Retrieval-Augmented Generation) system's dashboard with comprehensive cost analytics and embed functionality. What started as a simple "can we track spending by model?" request turned into a full-featured analytics suite with real-time cost tracking, projections, and embeddable widgets.
Here's the story of how it came together, the roadblocks I hit, and what I learned along the way.
The Mission: Making AI Costs Visible
Our dashboard was already tracking basic usage metrics, but users were flying blind when it came to costs. With 14 different models from OpenAI, Anthropic, and Google, each with different pricing structures, it was impossible to understand spending patterns or predict monthly bills.
The requirements were clear:
- Cost breakdown by model and bot
- Monthly projections based on recent usage
- Visual analytics with charts and summary cards
- Embeddable widgets for external integrations
Architecture Decisions: API-First Design
I decided to build this as a proper API-first feature with three new endpoints:
# Usage grouped by bot with cost calculations
GET /v1/stats/usage/by-bot
# Usage grouped by model with pricing info
GET /v1/stats/usage/by-model
# Cost estimates and projections
GET /v1/stats/cost-estimate?days=30
The key architectural choice was creating a centralized pricing map in the API layer:
MODEL_PRICING = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.001, "output": 0.002},
    "claude-3-opus": {"input": 0.015, "output": 0.075},
    # ... 11 more models
}
This pricing data drives both the backend cost calculations and frontend projections, ensuring consistency across the entire analytics pipeline.
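Concretely, the per-request cost calculation reduces to a lookup against that map. Here's a minimal sketch of the idea (the estimate_cost helper name is mine; prices are assumed to be per 1K tokens, which is what the exact-math test later in this post implies):

```python
# Prices are dollars per 1K tokens (a subset of the full 14-model map).
MODEL_PRICING = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-3.5-turbo": {"input": 0.001, "output": 0.002},
    "claude-3-opus": {"input": 0.015, "output": 0.075},
}

def estimate_cost(model: str, tokens_input: int, tokens_output: int) -> float:
    """Return the dollar cost for one request; unknown models cost nothing."""
    pricing = MODEL_PRICING.get(model)
    if pricing is None:
        return 0.0
    return (tokens_input * pricing["input"]
            + tokens_output * pricing["output"]) / 1000
```

Keeping this as one pure function makes the "every penny must be accurate" testing strategy below trivial to apply.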
Frontend: Progressive Enhancement
Rather than rebuilding the entire usage page, I enhanced it progressively:
- Four summary cards showing total costs, daily averages, and projections
- Interactive doughnut chart breaking down costs by model
- Tabbed interface with three views: By Model, By Bot, and Daily usage
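The projection card is just arithmetic over recent usage: average the daily costs in the lookback window and scale to a month. A minimal sketch under my own assumptions (function name and 30-day calendar are illustrative, not the actual dashboard code):

```python
def project_monthly_cost(daily_costs: list[float], days_in_month: int = 30) -> float:
    """Project a monthly bill from a window of recent daily costs."""
    if not daily_costs:
        return 0.0
    daily_average = sum(daily_costs) / len(daily_costs)
    return daily_average * days_in_month
```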
The frontend loads all three API endpoints in parallel and renders everything client-side:
// Parallel API loading for better performance
const [usageByBot, usageByModel, costEstimate] = await Promise.all([
    api.getUsageByBot(),
    api.getUsageByModel(),
    api.getCostEstimate(30)
]);
The Embed Feature: Developer-Friendly Integration
Alongside the analytics, I added an embed system that generates three types of integration snippets:
- Script Tag: traditional <script> inclusion
- Custom Element: Web Components approach with a <minirag-bot> element
- Styling: CSS customization options
Each bot card now has an "Embed" button that opens a tabbed interface with copy-paste ready code snippets.
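Under the hood, generating those snippets is string templating keyed by bot ID. A sketch of the shape (the URLs, attribute names, and the build_embed_snippets helper are illustrative assumptions, not our actual API):

```python
def build_embed_snippets(bot_id: str, base_url: str) -> dict[str, str]:
    """Return copy-paste-ready embed snippets for one bot."""
    return {
        # Traditional script-tag inclusion
        "script": (f'<script src="{base_url}/embed.js" '
                   f'data-bot-id="{bot_id}" async></script>'),
        # Web Components approach with a custom element
        "custom_element": (
            f'<script type="module" src="{base_url}/widget.js"></script>\n'
            f'<minirag-bot bot-id="{bot_id}"></minirag-bot>'),
        # CSS customization hook
        "styling": 'minirag-bot { --accent: #2563eb; max-width: 420px; }',
    }
```

Because the snippets are plain strings, the frontend only needs a tab per key and a clipboard button.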
Lessons Learned: API Response Shapes Matter
The biggest time sink wasn't the complex cost calculations—it was dealing with inconsistent API response structures in the test suite.
Challenge 1: Nested Response Data
I kept hitting KeyError exceptions in tests because I assumed user and tenant IDs were at the root level of API responses. Turns out, they were nested:
// Wrong assumption
const userId = response.id;
// Actual structure
const userId = response.user.id;
Challenge 2: Missing Required Fields
Creating test data seemed straightforward until I hit database constraints:
# This failed with NOT NULL constraint
chat = Chat(bot_profile_id=bot.id, message="test")
# Needed the user_id from the auth endpoint
user_id = api_client.get("/v1/auth/me").json()["user"]["id"]
chat = Chat(bot_profile_id=bot.id, user_id=user_id, message="test")
The lesson: Always check your API documentation (or write it first). Inconsistent response shapes between endpoints create friction that compounds over time.
Testing Strategy: Exact Math Validation
For cost analytics, I couldn't rely on fuzzy testing. Every penny needs to be accurate. My test strategy focused on exact mathematical validation:
def test_cost_calculation_precision():
    # Create usage with known token counts
    create_usage(tokens_input=1000, tokens_output=500, model="gpt-4")
    # Verify exact cost: (1000 * 0.03 + 500 * 0.06) / 1000 = $0.06
    response = client.get("/v1/stats/cost-estimate")
    assert response.json()["total_cost"] == 0.06
This approach caught several rounding errors and pricing lookup bugs that would have been expensive to discover in production.
Production Deployment: Seamless Rollout
The entire feature shipped in three incremental deployments:
- 8dbfd61 - Basic embed button
- d35d782 - Full embed tab interface
- 92cddc6 - Complete cost analytics suite
Each deploy was backwards compatible, allowing for safe rollouts with immediate rollback capability if needed.
What's Next: Scaling the Analytics
With 85 passing tests and the feature live in production, there are already requests for enhancements:
- Date range filtering (last 7/30/90 days)
- Cost alerts and budgeting
- Moving pricing data to database for easier updates
The foundation is solid, and extending it will be much easier than this initial build.
Key Takeaways
- API-first design made both development and testing cleaner
- Parallel data loading significantly improved dashboard performance
- Exact mathematical testing is crucial for financial features
- Incremental deployment reduces risk and improves team confidence
- Response shape consistency across endpoints saves debugging time
Building cost analytics taught me that the hardest part of financial features isn't the math—it's the data pipeline integrity and user trust. When users see dollar amounts, every calculation needs to be bulletproof.
The dashboard now gives our users complete visibility into their AI spending, and the embed feature is already being used by several customers to integrate our bots into their own applications. Sometimes the best features are the ones that make the invisible visible.