Building Hierarchical Data Sources: From Flat Lists to Parent-Child Relationships

Ever found yourself staring at a long, flat list of data sources and thinking "there has to be a better way to organize this"? That was exactly the problem we faced when users started creating dozens of related URLs and file uploads that cluttered our dashboard. Today, I'll walk you through how we transformed our flat source management system into a clean, hierarchical structure with parent-child relationships.

The Challenge: When Flat Lists Become Overwhelming

Our original system worked great for individual sources—each URL or file stood alone in a simple table. But as users began working with related content (think: multiple pages from the same website, or a collection of related documents), our dashboard became unwieldy. Users had no way to group related sources, and finding specific items in a list of 50+ sources became a real usability nightmare.

The solution? Implement a parent-child hierarchy where multi-URL ingestions and multi-file uploads could be grouped under a parent source, complete with expand/collapse functionality in the dashboard.

The Technical Journey

Step 1: Database Schema Evolution

The foundation of any hierarchical system is the self-referencing relationship. We added a simple but powerful parent_id field to our existing Source model:

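import uuid

from sqlalchemy import ForeignKey
from sqlalchemy.dialects.postgresql import UUID as PG_UUID
from sqlalchemy.orm import Mapped, mapped_column

class Source(Base):
    # ... existing fields ...
    # Self-referencing FK: NULL for top-level sources, the parent's id for children
    parent_id: Mapped[uuid.UUID | None] = mapped_column(
        PG_UUID(as_uuid=True),
        ForeignKey("sources.id"),
        nullable=True,
        index=True,
    )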

This is the classic adjacency-list pattern: each source can optionally reference another source as its parent, which yields a tree. Because the foreign key is nullable, top-level sources have parent_id = NULL, while child sources store their parent's id.
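If you want to see the self-referencing pattern behave before wiring it into an ORM, the sketch below reproduces it with Python's built-in sqlite3 module. It is illustrative only: SQLite stands in for the production database, ids are UUID strings stored as TEXT, and the name column and the ix_sources_parent_id index name are assumptions rather than our real schema.

```python
import sqlite3
import uuid

# Simplified, hypothetical schema; SQLite stands in for the real database
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Self-referencing table: parent_id points back at sources.id
conn.execute(
    """
    CREATE TABLE sources (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id TEXT REFERENCES sources(id)
    )
    """
)
conn.execute("CREATE INDEX ix_sources_parent_id ON sources (parent_id)")

# One parent with two children
parent_id = str(uuid.uuid4())
conn.execute("INSERT INTO sources VALUES (?, ?, NULL)", (parent_id, "docs site"))
for page in ("intro", "install"):
    conn.execute(
        "INSERT INTO sources VALUES (?, ?, ?)",
        (str(uuid.uuid4()), page, parent_id),
    )
# A standalone source: no parent, so it is top-level
conn.execute(
    "INSERT INTO sources VALUES (?, ?, NULL)", (str(uuid.uuid4()), "report.pdf")
)

top_level = [row[0] for row in conn.execute(
    "SELECT name FROM sources WHERE parent_id IS NULL ORDER BY name")]
children = [row[0] for row in conn.execute(
    "SELECT name FROM sources WHERE parent_id = ? ORDER BY name", (parent_id,))]
print(top_level)  # ['docs site', 'report.pdf']
print(children)   # ['install', 'intro']
```

The two queries map directly onto the two views a hierarchy needs: "what sits at the top" and "what sits under this parent".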

Step 2: API Schema Design

The real magic happens in how we present this data. We extended our API schemas to support the new hierarchy:

from uuid import UUID

from pydantic import BaseModel

class SourceRead(BaseModel):
    # ... existing fields ...
    parent_id: UUID | None = None
    children_count: int = 0  # Filled in by the server for parents; 0 for leaf sources

That children_count field is key—it lets the frontend know whether to show an expand/collapse chevron without making additional API calls.
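A natural way to fill children_count for a whole page of results, without one COUNT query per row, is a self-join with GROUP BY. Here is a minimal sqlite3 sketch under a simplified, hypothetical schema (the name column and sample rows are made up for illustration); an ORM would express the same query as an outer join plus a count aggregate.

```python
import sqlite3

# Hypothetical sample data: one parent with two children, one standalone source
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sources (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id TEXT REFERENCES sources(id)
    );
    INSERT INTO sources VALUES ('p1', 'docs site', NULL);
    INSERT INTO sources VALUES ('c1', 'intro', 'p1');
    INSERT INTO sources VALUES ('c2', 'install', 'p1');
    INSERT INTO sources VALUES ('s1', 'report.pdf', NULL);
    """
)

# One round trip: each top-level row plus the number of rows pointing at it.
# COUNT(child.id) ignores the NULLs produced by the LEFT JOIN, so leaves get 0.
rows = conn.execute(
    """
    SELECT parent.id, parent.name, COUNT(child.id) AS children_count
    FROM sources AS parent
    LEFT JOIN sources AS child ON child.parent_id = parent.id
    WHERE parent.parent_id IS NULL
    GROUP BY parent.id, parent.name
    ORDER BY parent.name
    """
).fetchall()

for source_id, name, children_count in rows:
    print(source_id, name, children_count)
# p1 docs site 2
# s1 report.pdf 0
```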

Step 3: Smart API Endpoints

We introduced three new endpoints that make working with hierarchical sources intuitive:

POST /v1/sources/batch - Create multiple related sources under a parent in one request
GET /v1/sources/{id}/children - Fetch child sources for a given parent
POST /v1/sources/{id}/ingest-children - Trigger processing for all children of a parent
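The ingest-children endpoint mostly reduces to one set-based update over a parent's children. The sketch below assumes a hypothetical status column whose values ('pending', 'queued', 'failed', 'completed') are illustrative, and a hypothetical ingest_children helper that re-queues only pending or failed children; dispatching the actual background jobs is out of scope here.

```python
import sqlite3

# Hypothetical schema: the status column and its values are assumptions
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sources (
        id TEXT PRIMARY KEY,
        parent_id TEXT REFERENCES sources(id),
        status TEXT NOT NULL DEFAULT 'pending'
    );
    INSERT INTO sources VALUES ('p1', NULL, 'pending');
    INSERT INTO sources VALUES ('c1', 'p1', 'pending');
    INSERT INTO sources VALUES ('c2', 'p1', 'completed');
    INSERT INTO sources VALUES ('c3', 'p1', 'failed');
    """
)


def ingest_children(conn: sqlite3.Connection, parent_id: str) -> list[str]:
    """Mark pending or failed children of parent_id as queued; return their ids."""
    with conn:  # commits on success, rolls back on error
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM sources "
                "WHERE parent_id = ? AND status IN ('pending', 'failed') ORDER BY id",
                (parent_id,),
            )
        ]
        conn.executemany(
            "UPDATE sources SET status = 'queued' WHERE id = ?", [(i,) for i in ids]
        )
    return ids


queued = ingest_children(conn, "p1")
print(queued)  # ['c1', 'c3']
```

Completed children are left alone, so calling the endpoint twice does not re-process work that already finished.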

The batch endpoint was particularly satisfying to implement—instead of forcing users to make N+1 API calls (create parent, then create each child), they can now create an entire hierarchy in a single request.
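The key property of a batch endpoint is atomicity: the parent and its children should appear together or not at all. A minimal sketch, assuming a hypothetical create_batch helper and a simplified table with a url column, runs every insert inside one sqlite3 transaction:

```python
import sqlite3
import uuid

# Simplified, hypothetical schema for illustration
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """
    CREATE TABLE sources (
        id TEXT PRIMARY KEY,
        url TEXT,
        parent_id TEXT REFERENCES sources(id)
    )
    """
)


def create_batch(conn: sqlite3.Connection, urls: list[str]) -> tuple[str, list[str]]:
    """Create one parent plus one child per URL in a single transaction."""
    if not urls:
        raise ValueError("a batch needs at least one URL")
    parent_id = str(uuid.uuid4())
    child_ids = [str(uuid.uuid4()) for _ in urls]
    with conn:  # all rows commit together, or the whole batch rolls back
        conn.execute(
            "INSERT INTO sources (id, url, parent_id) VALUES (?, NULL, NULL)",
            (parent_id,),
        )
        conn.executemany(
            "INSERT INTO sources (id, url, parent_id) VALUES (?, ?, ?)",
            [(cid, url, parent_id) for cid, url in zip(child_ids, urls)],
        )
    return parent_id, child_ids


parent, kids = create_batch(conn, ["https://example.com/a", "https://example.com/b"])
print(len(kids))  # 2
```

Generating the ids up front lets the response return the whole hierarchy without a second read.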

Step 4: Frontend UX Magic

The dashboard transformation was where this feature really came to life. Parent rows now display:

An expand/collapse chevron
A badge showing "(N items)"
An "Ingest All" button for processing all children at once
Aggregated status information (if any child is processing, the parent shows as processing)
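The roll-up rule for a parent's status can be a small pure function. Only the "any child processing means the parent is processing" rule comes from our dashboard; the remaining precedence (failed, then pending, then completed) and the status strings themselves are assumptions made for this sketch.

```python
from collections.abc import Iterable

# Assumed precedence: only "processing wins" comes from the dashboard rule
_PRECEDENCE = ("processing", "failed", "pending", "completed")


def aggregate_status(child_statuses: Iterable[str]) -> str:
    """Roll child statuses up into one status for the parent row."""
    seen = set(child_statuses)
    if not seen:
        return "pending"  # assumption: an empty group has nothing done yet
    for status in _PRECEDENCE:
        if status in seen:
            return status
    # None of the children had a recognized status
    raise ValueError(f"unknown statuses: {sorted(seen)}")


print(aggregate_status(["completed", "processing", "failed"]))  # processing
print(aggregate_status(["completed", "failed"]))                # failed
print(aggregate_status(["completed", "completed"]))             # completed
```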

Child rows render with visual indentation when their parent is expanded, creating a clear visual hierarchy.
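Deciding which rows to render, and at what indentation, is a flatten-with-expansion pass over the two-level tree. Our frontend code isn't shown in this post, so this is a language-neutral sketch written in Python; the Row type and the visible_rows function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical minimal row shape for illustration."""
    id: str
    name: str
    parent_id: str | None = None


def visible_rows(rows: list[Row], expanded: set[str]) -> list[tuple[int, str]]:
    """Return (indent_level, name) pairs: parents always, children only when expanded."""
    children: dict[str, list[Row]] = {}
    for row in rows:
        if row.parent_id is not None:
            children.setdefault(row.parent_id, []).append(row)

    out: list[tuple[int, str]] = []
    for row in rows:
        if row.parent_id is None:
            out.append((0, row.name))
            if row.id in expanded:
                out.extend((1, child.name) for child in children.get(row.id, []))
    return out


rows = [
    Row("p1", "docs site"),
    Row("c1", "intro", "p1"),
    Row("c2", "install", "p1"),
    Row("s1", "report.pdf"),
]
for depth, name in visible_rows(rows, expanded={"p1"}):
    print("    " * depth + name)
```

Toggling a chevron then only means adding or removing a parent id from the expanded set and re-running the pass.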

Smart Behavioral Changes

One of the most important decisions was how to handle the default list view. Rather than overwhelming users with every source (parents AND children), we made the default API behavior return only top-level sources (parent_id IS NULL). Users can then expand parents to see children, or use query parameters to customize the view.
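The default-to-top-level behavior boils down to adding parent_id IS NULL unless the caller asks for something else. In this sqlite3 sketch, the parent_id and include_children parameters are hypothetical stand-ins for whatever query parameters the real endpoint exposes, and the is_active filter mirrors the soft-delete default described later in this post.

```python
from __future__ import annotations

import sqlite3

# Simplified, hypothetical schema and data
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sources (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id TEXT REFERENCES sources(id),
        is_active INTEGER NOT NULL DEFAULT 1
    );
    INSERT INTO sources (id, name, parent_id) VALUES ('p1', 'docs site', NULL);
    INSERT INTO sources (id, name, parent_id) VALUES ('c1', 'intro', 'p1');
    INSERT INTO sources (id, name, parent_id) VALUES ('s1', 'report.pdf', NULL);
    """
)


def list_sources(
    conn: sqlite3.Connection,
    parent_id: str | None = None,
    include_children: bool = False,
) -> list[str]:
    """Top-level sources by default; one parent's children if parent_id is given;
    every source if include_children is True. Soft-deleted rows are always hidden."""
    clauses: list[str] = ["is_active = 1"]
    params: list[str] = []
    if parent_id is not None:
        clauses.append("parent_id = ?")
        params.append(parent_id)
    elif not include_children:
        clauses.append("parent_id IS NULL")
    # Only fixed clause strings are interpolated; user values go through params
    sql = f"SELECT name FROM sources WHERE {' AND '.join(clauses)} ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


print(list_sources(conn))                         # ['docs site', 'report.pdf']
print(list_sources(conn, parent_id="p1"))         # ['intro']
print(list_sources(conn, include_children=True))  # ['docs site', 'intro', 'report.pdf']
```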

This seemingly small change dramatically improved the user experience: a dashboard that previously showed 30 individual items might now show 8 parent groups, each expandable to reveal its children.

Lessons Learned

The Power of Backward Compatibility

Throughout this implementation, we maintained strict backward compatibility. Because parent_id is nullable, existing sources simply have no parent: they appear as top-level items in the new hierarchy and behave exactly as before. The schema change was purely additive (one nullable, indexed column) with no data backfill, so we could roll out the feature without migration headaches for existing users.
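The reason existing rows need no special handling is that adding a nullable column leaves every existing row with NULL, which is exactly the "top-level" value. This sqlite3 sketch replays such a migration on a toy table; in production the equivalent ALTER TABLE would run through your migration tool of choice, and the table contents here are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Before": a hypothetical flat table with existing rows and no parent_id column
conn.executescript(
    """
    CREATE TABLE sources (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO sources VALUES ('s1', 'report.pdf');
    INSERT INTO sources VALUES ('s2', 'https://example.com');
    """
)

# The migration: add a nullable self-referencing column plus its index.
# No backfill is needed because existing rows default to NULL (top-level).
conn.executescript(
    """
    ALTER TABLE sources ADD COLUMN parent_id TEXT REFERENCES sources(id);
    CREATE INDEX ix_sources_parent_id ON sources (parent_id);
    """
)

legacy = [row[0] for row in conn.execute(
    "SELECT id FROM sources WHERE parent_id IS NULL ORDER BY id")]
print(legacy)  # ['s1', 's2']
```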

Database Design Decisions Matter

Adding the parent_id index was crucial for performance. It might seem obvious, but some databases (PostgreSQL among them) do not index foreign-key columns automatically, and hierarchical queries can become expensive quickly, especially when aggregating status information across parent-child relationships. The index keeps queries like "find all children of parent X" fast even as the dataset grows.
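You can confirm that a database actually uses such an index by inspecting the query plan. This small sqlite3 sketch compares the plan for the children lookup before and after creating a hypothetical ix_sources_parent_id index. The exact wording of SQLite's plan output varies between versions, and other databases expose the same information through their own EXPLAIN commands.

```python
import sqlite3

# The "children of parent X" lookup from the paragraph above
QUERY = "SELECT * FROM sources WHERE parent_id = ?"


def plan_details(conn: sqlite3.Connection) -> str:
    """Join the human-readable detail column of EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("p1",)).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources (id TEXT PRIMARY KEY, name TEXT, "
    "parent_id TEXT REFERENCES sources(id))"
)

before = plan_details(conn)  # no usable index: SQLite reports a table scan
conn.execute("CREATE INDEX ix_sources_parent_id ON sources (parent_id)")
after = plan_details(conn)   # now a search using ix_sources_parent_id
print(before)
print(after)
```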

Subtle Behavior Changes

One interesting side effect of the refactor was how we handle soft deletes. The new list queries filter on is_active == True by default, so soft-deleted sources stay hidden. That filter wasn't explicitly part of the hierarchy feature; it emerged as a natural improvement along the way. It's a reminder that feature development often surfaces opportunities for broader improvements.

The Results

After implementing this feature, our test suite grew from 65 to 77 tests, all passing. But more importantly, the user experience transformed from cluttered to clean. Users can now:

Upload multiple files and have them automatically grouped
Create multi-URL ingestion jobs that appear as a single expandable item
Delete entire groups with cascade confirmation
Process all related sources with a single "Ingest All" click
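Group deletion pairs a confirmation count with a cascade. The sketch below assumes deletes are soft deletes via the is_active flag mentioned earlier and uses hypothetical active_child_count and soft_delete_group helpers: the count feeds the confirmation prompt, and a single UPDATE deactivates the parent and its children together.

```python
import sqlite3

# Simplified, hypothetical schema; soft delete via is_active is an assumption
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sources (
        id TEXT PRIMARY KEY,
        parent_id TEXT REFERENCES sources(id),
        is_active INTEGER NOT NULL DEFAULT 1
    );
    INSERT INTO sources (id, parent_id) VALUES ('p1', NULL);
    INSERT INTO sources (id, parent_id) VALUES ('c1', 'p1');
    INSERT INTO sources (id, parent_id) VALUES ('c2', 'p1');
    INSERT INTO sources (id, parent_id) VALUES ('s1', NULL);
    """
)


def active_child_count(conn: sqlite3.Connection, parent_id: str) -> int:
    """How many active children the confirmation dialog should warn about."""
    return conn.execute(
        "SELECT COUNT(*) FROM sources WHERE parent_id = ? AND is_active = 1",
        (parent_id,),
    ).fetchone()[0]


def soft_delete_group(conn: sqlite3.Connection, parent_id: str) -> int:
    """Deactivate the parent and all of its children atomically; return rows changed."""
    with conn:
        cur = conn.execute(
            "UPDATE sources SET is_active = 0 "
            "WHERE (id = ? OR parent_id = ?) AND is_active = 1",
            (parent_id, parent_id),
        )
    return cur.rowcount


warn_count = active_child_count(conn, "p1")
deleted = soft_delete_group(conn, "p1")
print(warn_count, deleted)  # 2 3
```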

Looking Forward

The foundation is now in place for even richer organizational features. We're considering auto-refreshing processing-status indicators, and the hierarchical structure opens the door to features like bulk operations, source templates, and advanced filtering.

Sometimes the most impactful features are the ones that take something complex and make it feel simple. Converting a flat list into a thoughtful hierarchy is exactly that kind of feature—technically straightforward, but transformative for the user experience.

Have you implemented hierarchical data structures in your applications? I'd love to hear about your approach and any challenges you encountered. The self-referencing foreign key pattern is powerful, but the devil is always in the UX details.