Building Hierarchical Data Sources: From Flat Lists to Parent-Child Relationships
Ever found yourself staring at a long, flat list of data sources and thinking "there has to be a better way to organize this"? That was exactly the problem we faced when users started creating dozens of related URLs and file uploads that cluttered our dashboard. Today, I'll walk you through how we transformed our flat source management system into a clean, hierarchical structure with parent-child relationships.
The Challenge: When Flat Lists Become Overwhelming
Our original system worked great for individual sources—each URL or file stood alone in a simple table. But as users began working with related content (think: multiple pages from the same website, or a collection of related documents), our dashboard became unwieldy. Users had no way to group related sources, and finding specific items in a list of 50+ sources became a real usability nightmare.
The solution? Implement a parent-child hierarchy where multi-URL ingestions and multi-file uploads could be grouped under a parent source, complete with expand/collapse functionality in the dashboard.
The Technical Journey
Step 1: Database Schema Evolution
The foundation of any hierarchical system is the self-referencing relationship. We added a simple but powerful parent_id field to our existing Source model:
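import uuid

from sqlalchemy import UUID, ForeignKey
from sqlalchemy.orm import Mapped, mapped_column

class Source(Base):
    # ... existing fields ...
    # Self-referencing foreign key: NULL for top-level sources.
    parent_id: Mapped[uuid.UUID | None] = mapped_column(
        UUID(as_uuid=True),
        ForeignKey("sources.id"),
        nullable=True,
        index=True,
    )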
This classic pattern creates a tree structure where each source can optionally reference another source as its parent. The nullable foreign key gives us the flexibility we need—top-level sources have parent_id = NULL, while child sources reference their parent.
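To make the pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of SQLAlchemy. The table layout, the name column, the index name, and the add_source helper are assumptions for illustration, not the production code:

```python
import sqlite3
import uuid


def create_schema(conn):
    """Create a self-referencing sources table (illustrative layout)."""
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE sources (
            id TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            parent_id TEXT REFERENCES sources(id)  -- NULL for top-level rows
        )
        """
    )
    conn.execute("CREATE INDEX ix_sources_parent_id ON sources (parent_id)")


def add_source(conn, name, parent_id=None):
    """Insert one source and return its generated id (hypothetical helper)."""
    source_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO sources (id, name, parent_id) VALUES (?, ?, ?)",
        (source_id, name, parent_id),
    )
    return source_id


conn = sqlite3.connect(":memory:")
create_schema(conn)

site = add_source(conn, "docs.example.com")
add_source(conn, "docs.example.com/intro", parent_id=site)
add_source(conn, "docs.example.com/api", parent_id=site)
add_source(conn, "standalone.pdf")

# Top-level sources have parent_id = NULL.
top_level = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sources WHERE parent_id IS NULL ORDER BY name"
    )
]

# Children reference their parent's id.
children = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sources WHERE parent_id = ? ORDER BY name", (site,)
    )
]
```

Because the foreign key points back at the same table, a child that references a nonexistent parent is rejected by the database itself.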
Step 2: API Schema Design
The real magic happens in how we present this data. We extended our API schemas to support the new hierarchy:
class SourceRead(BaseModel):
    # ... existing fields ...
    parent_id: UUID | None = None
    children_count: int = 0  # Computed field for parents
That children_count field is key—it lets the frontend know whether to show an expand/collapse chevron without making additional API calls.
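One way to compute children_count without a query per parent is a self-join with GROUP BY. The sketch below uses sqlite3 and integer ids for brevity; the query and the list_top_level_with_counts helper are assumptions about how such a count could be produced, not a description of the actual implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "parent_id INTEGER REFERENCES sources(id))"
)
conn.executemany(
    "INSERT INTO sources VALUES (?, ?, ?)",
    [
        (1, "site", None),
        (2, "site/a", 1),
        (3, "site/b", 1),
        (4, "report.pdf", None),
    ],
)


def list_top_level_with_counts(conn):
    """Return top-level sources with their child counts in a single query."""
    rows = conn.execute(
        """
        SELECT p.id, p.name, COUNT(c.id) AS children_count
        FROM sources AS p
        LEFT JOIN sources AS c ON c.parent_id = p.id
        WHERE p.parent_id IS NULL
        GROUP BY p.id, p.name
        ORDER BY p.id
        """
    )
    return [
        {"id": row[0], "name": row[1], "children_count": row[2]} for row in rows
    ]


listing = list_top_level_with_counts(conn)
```

The LEFT JOIN matters here: a standalone source with no children still appears in the result, with a count of 0, so the frontend can simply hide the chevron for it.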
Step 3: Smart API Endpoints
We introduced three new endpoints that make working with hierarchical sources intuitive:
- POST /v1/sources/batch: Create multiple related sources under a parent in one request
- GET /v1/sources/{id}/children: Fetch child sources for a given parent
- POST /v1/sources/{id}/ingest-children: Trigger processing for all children of a parent
The batch endpoint was particularly satisfying to implement—instead of forcing users to make N+1 API calls (create parent, then create each child), they can now create an entire hierarchy in a single request.
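The core of a batch endpoint like this is creating the parent and all of its children in one transaction, so a failure partway through leaves nothing half-created. Here is a rough sketch of that idea with sqlite3; the create_batch function, its arguments, and the table layout are hypothetical, and the real request and response shapes are not shown in this post:

```python
import sqlite3
import uuid


def create_batch(conn, parent_name, child_names):
    """Create one parent and its children atomically; return the parent id."""
    parent_id = str(uuid.uuid4())
    # The connection context manager commits on success and rolls back
    # if any statement inside the block raises.
    with conn:
        conn.execute(
            "INSERT INTO sources (id, name, parent_id) VALUES (?, ?, NULL)",
            (parent_id, parent_name),
        )
        conn.executemany(
            "INSERT INTO sources (id, name, parent_id) VALUES (?, ?, ?)",
            [(str(uuid.uuid4()), name, parent_id) for name in child_names],
        )
    return parent_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources ("
    "id TEXT PRIMARY KEY, name TEXT NOT NULL, "
    "parent_id TEXT REFERENCES sources(id))"
)

batch_id = create_batch(
    conn, "Q3 reports", ["july.pdf", "august.pdf", "september.pdf"]
)
child_count = conn.execute(
    "SELECT COUNT(*) FROM sources WHERE parent_id = ?", (batch_id,)
).fetchone()[0]
```

If one of the children is invalid (for example, a missing name violating NOT NULL), the whole batch is rolled back, including the parent row.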
Step 4: Frontend UX Magic
The dashboard transformation was where this feature really came to life. Parent rows now display:
- An expand/collapse chevron
- A badge showing "(N items)"
- An "Ingest All" button for processing all children at once
- Aggregated status information (if any child is processing, the parent shows as processing)
Child rows render with visual indentation when their parent is expanded, creating a clear visual hierarchy.
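The aggregated status can be expressed as a small precedence function. Only the first rule below (any processing child makes the parent show as processing) comes from the description above; the status names and the remaining precedence rules are assumptions made for the sake of a complete example:

```python
def aggregate_status(child_statuses):
    """Derive a parent's display status from its children's statuses."""
    statuses = list(child_statuses)
    if not statuses:
        return "pending"  # assumption: an empty parent shows as pending
    if "processing" in statuses:
        return "processing"  # rule described in the post
    if "failed" in statuses:
        return "failed"  # assumption: surface failures once nothing is running
    if all(status == "completed" for status in statuses):
        return "completed"
    return "pending"  # assumption: anything else is still waiting


mixed = aggregate_status(["completed", "processing", "failed"])
done = aggregate_status(["completed", "completed"])
```

Keeping this logic in one function means the list view and the expanded view cannot disagree about what a parent's status is.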
Smart Behavioral Changes
One of the most important decisions was how to handle the default list view. Rather than overwhelming users with every source (parents AND children), we made the default API behavior return only top-level sources (parent_id IS NULL). Users can then expand parents to see children, or use query parameters to customize the view.
This seemingly small change dramatically improved the user experience—a dashboard that previously showed 30 individual items now shows perhaps 8 parent groups, each expandable to reveal its children.
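The default-view behavior boils down to a filter on parent_id. A minimal in-memory sketch follows; the Source dataclass here is a stand-in for the real model, and the parameter names parent_id and include_children are hypothetical, since the actual query parameters are not listed in this post:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    id: int
    name: str
    parent_id: Optional[int] = None


def list_sources(sources, parent_id=None, include_children=False):
    """Return top-level sources by default; widen the view only on request."""
    if parent_id is not None:
        # Explicit request for one parent's children.
        return [s for s in sources if s.parent_id == parent_id]
    if include_children:
        # Opt-in flat view: every source, parents and children alike.
        return list(sources)
    # Default: equivalent to WHERE parent_id IS NULL.
    return [s for s in sources if s.parent_id is None]


sources = [
    Source(1, "site"),
    Source(2, "site/a", parent_id=1),
    Source(3, "site/b", parent_id=1),
    Source(4, "report.pdf"),
]

default_view = [s.name for s in list_sources(sources)]
expanded = [s.name for s in list_sources(sources, parent_id=1)]
```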
Lessons Learned
The Power of Backward Compatibility
Throughout this implementation, we maintained strict backward compatibility. Existing sources without a parent_id continue to work exactly as before—they simply appear as top-level items in the new hierarchy. This meant we could roll out the feature without any migration headaches for existing users.
Database Design Decisions Matter
Adding the parent_id index was crucial for performance. While it might seem obvious, hierarchical queries can become expensive quickly, especially when aggregating status information across parent-child relationships. The index ensures that queries like "find all children of parent X" remain fast even as the dataset grows.
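A quick way to check that a children lookup actually uses the index is to ask the database for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN; the index name and the query_plan helper are assumptions, and other databases expose the same information through their own EXPLAIN syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "parent_id INTEGER REFERENCES sources(id))"
)
conn.execute("CREATE INDEX ix_sources_parent_id ON sources (parent_id)")


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    # The last column of each plan row holds the description text.
    return " ".join(row[-1] for row in rows)


plan = query_plan(
    conn, "SELECT id, name FROM sources WHERE parent_id = ?", (1,)
)
print(plan)
```

If the plan mentions ix_sources_parent_id, the lookup is an index search rather than a full table scan.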
Subtle Behavior Changes
One interesting side effect of the refactor was how we handle soft deletes. The new code filters is_active == True by default, which wasn't explicitly part of the hierarchy feature but emerged as a natural improvement. It's a reminder that feature development often surfaces opportunities for broader improvements.
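In practice, a default is_active filter means soft-deleted rows stay in storage but drop out of normal listings. The following sketch illustrates that idea in memory; the helper names, the include_inactive opt-out, and the choice to deactivate a parent's direct children along with it are all assumptions, not behavior confirmed above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    id: int
    name: str
    parent_id: Optional[int] = None
    is_active: bool = True


def soft_delete(sources, source_id):
    """Mark a source and its direct children inactive; nothing is removed."""
    for source in sources:
        if source.id == source_id or source.parent_id == source_id:
            source.is_active = False


def visible(sources, include_inactive=False):
    """Apply the default is_active filter unless the caller opts out."""
    return [s for s in sources if include_inactive or s.is_active]


sources = [
    Source(1, "site"),
    Source(2, "site/a", parent_id=1),
    Source(3, "site/b", parent_id=1),
    Source(4, "report.pdf"),
]

soft_delete(sources, 1)
remaining = [s.name for s in visible(sources)]
everything = visible(sources, include_inactive=True)
```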
The Results
After implementing this feature, our test suite grew from 65 to 77 tests, all passing. But more importantly, the user experience transformed from cluttered to clean. Users can now:
- Upload multiple files and have them automatically grouped
- Create multi-URL ingestion jobs that appear as a single expandable item
- Delete entire groups with cascade confirmation
- Process all related sources with a single "Ingest All" click
Looking Forward
The foundation is now in place for even richer organizational features. We're considering adding visual indicators for processing status that auto-refresh, and the hierarchical structure opens the door for features like bulk operations, source templates, and advanced filtering.
Sometimes the most impactful features are the ones that take something complex and make it feel simple. Converting a flat list into a thoughtful hierarchy is exactly that kind of feature—technically straightforward, but transformative for the user experience.
Have you implemented hierarchical data structures in your applications? I'd love to hear about your approach and any challenges you encountered. The self-referencing foreign key pattern is powerful, but the devil is always in the UX details.