Last week, we launched a significant new version of Data Liberator, packed with powerful new tools to access and capitalize on data from multiple sources. We're breaking the release down into a four-part series; Part Three is the technical deep dive for those who want a look under the hood.
Part One covers how Data Liberator connects scattered, siloed data into a single, easy-to-access source — without moving any of it.
Part Two demonstrates how Data Liberator works with Claude for conversations, not just queries: it makes it easy for non-technical users to explore data and self-serve analytics, and it opens the door to rapid prototyping for testing hypotheses.
So let's dive in...
Cross-Dataset Joins and Intelligent Caching
Imagine you have 100 datasets scattered across multiple systems, and a question whose answer could live in any of them. The traditional approach? Query all 100 datasets. Hours of computing. Thousands of dollars in cloud costs.
When you have hundreds of datasets across multiple systems, finding the right data is half the battle. Liberator makes this simple through rich metadata and descriptions.
Every dataset in Liberator includes rich metadata: a name, a description, and details about its columns. This metadata is immediately available both to humans exploring data and to AI agents like Claude trying to understand what's available.
Need customer purchase data? Search across all your datasets for keywords like "customer," "purchase," "transaction." Liberator searches through dataset names, descriptions, column names, and metadata to surface relevant data.
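As a rough sketch of how such a keyword search behaves, here is a minimal in-memory version in Python. The catalog structure, field names, and `search_datasets` function are illustrative stand-ins, not Liberator's actual API:

```python
# Hypothetical in-memory catalog; Liberator's real metadata store is richer.
datasets = [
    {"name": "crm_customers", "description": "Customer master records",
     "columns": ["customer_id", "name", "region"]},
    {"name": "pos_sales", "description": "In-store purchase transactions",
     "columns": ["transaction_id", "customer_id", "amount"]},
    {"name": "sensor_readings", "description": "IoT sensor data",
     "columns": ["sensor_id", "timestamp", "value"]},
]

def search_datasets(catalog, keywords):
    """Return names of datasets whose name, description, or columns
    match any of the given keywords (case-insensitive substring match)."""
    hits = []
    for ds in catalog:
        haystack = " ".join([ds["name"], ds["description"], *ds["columns"]]).lower()
        if any(kw.lower() in haystack for kw in keywords):
            hits.append(ds["name"])
    return hits
```

A search for "customer" or "purchase" surfaces the CRM and point-of-sale datasets but not the sensor data, because the match runs over names, descriptions, and column names alike.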
Now the real magic: Liberator can join data across datasets without creating intermediate tables or running ETL.
Join three datasets in a single query. The traditional warehouse approach: extract each dataset, load everything into one system, reshape the schemas to match, and only then join. The Liberator approach: write one query against the sources as they are, and Liberator federates the join.
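To make the contrast concrete, here is a hedged sketch of the single federated query, wrapped in Python. The dataset names, schemas, and the `liberator.query` client call are hypothetical, used only to show the shape of a cross-system join:

```python
# One query joins customer, order, and shipment data that live in three
# different systems (names are illustrative). No staging tables, no ETL:
# the virtualization layer resolves each schema-qualified name to its source.
query = """
SELECT c.customer_id, c.region, o.order_id, s.ship_date
FROM crm.customers  AS c
JOIN erp.orders     AS o ON o.customer_id = c.customer_id
JOIN wms.shipments  AS s ON s.order_id    = o.order_id
WHERE o.order_date >= DATE '2025-01-01'
"""

# result = liberator.query(query)  # hypothetical client call
```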
The data never moves. The schemas never change. The query just works.
How does Liberator handle billions of records efficiently?
Liberator maintains query result caches so repeated queries don't hammer source systems. The cache is transparent—you don't configure it, you don't tune it, it just works.
Large datasets are processed in chunks (typically 2 million rows). Each chunk gets a unique identifier for cache management. When data changes, only affected chunks are invalidated.
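One way to picture chunk-level caching is the sketch below. The chunk size comes from the text, but the identifier scheme and the `ChunkCache` class are assumptions for illustration, not Liberator's internals:

```python
import hashlib

CHUNK_ROWS = 2_000_000  # typical chunk size mentioned above

def chunk_index(row_number):
    """Map a row position to the chunk that contains it."""
    return row_number // CHUNK_ROWS

def chunk_id(dataset, index, version):
    """Derive a stable identifier for one chunk of a dataset at a given version."""
    key = f"{dataset}:{index}:{version}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

class ChunkCache:
    """Cache keyed by chunk identifier, so a change in the source
    invalidates only the chunks it touches."""

    def __init__(self):
        self.store = {}  # chunk id -> cached rows

    def put(self, cid, rows):
        self.store[cid] = rows

    def get(self, cid):
        return self.store.get(cid)  # None signals a cache miss

    def invalidate(self, dataset, changed_indexes, version):
        """Drop only the chunks whose underlying rows changed."""
        for idx in changed_indexes:
            self.store.pop(chunk_id(dataset, idx, version), None)
```

When rows 0 through 1,999,999 of a dataset change, only chunk 0 is evicted; results cached from other chunks stay warm.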
Liberator intelligently optimizes queries by understanding data structure and access patterns. Repeated queries benefit from cached results, while smart filtering pushes computation to source systems where possible.
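The "smart filtering" in that last sentence is usually called predicate pushdown: ship the filter to the source instead of shipping the data to the filter. A minimal sketch, assuming a hypothetical planner and a per-source capability flag:

```python
def plan_scan(sources, table, predicate_sql):
    """Decide where a filter runs. If the source can evaluate predicates,
    push the WHERE clause down so only matching rows cross the network;
    otherwise fetch everything and filter in the virtualization layer."""
    src = sources[table]
    if src.get("supports_filters"):
        return f"SELECT * FROM {table} WHERE {predicate_sql}", "pushed-down"
    return f"SELECT * FROM {table}", "local-filter"

sources = {
    "sales":      {"supports_filters": True},   # e.g. a SQL database
    "legacy_csv": {"supports_filters": False},  # e.g. flat files
}
```

Against the SQL source the predicate travels with the query; against the flat files the layer falls back to local filtering, which is exactly the case where caching earns its keep.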
"Show me all production runs where quality dropped below threshold, joined with maintenance records and sensor data for those time periods." Liberator finds the relevant datasets, joins them across systems, and you get your answer—without moving gigabytes of IoT data.
"Find patients with diagnosis X who had lab results Y and Z within 30 days." Join clinical data, lab results, and claims data—all in different systems with different schemas—in a single query while maintaining HIPAA compliance.
"What products are frequently bought together across online, in-store, and mobile channels?" Liberator joins the transaction data from multiple systems, and you discover cross-channel patterns without consolidating point-of-sale data.
"Correlate outage events with weather data, grid load, and maintenance schedules." Join operational data, weather APIs, and maintenance databases without consolidating sensitive infrastructure data.
Liberator runs on Kubernetes and deploys anywhere a Kubernetes cluster can run.
We're working on Liberator Cloud—a hybrid architecture where the UI and management plane run in our cloud while data processing stays local to your environment. Think "bring your own compute" for data virtualization.
This solves a common tension: you want easy deployment and management, but your data can't leave your network for security, compliance, or performance reasons. Liberator Cloud gives you both.
If your team is paying cloud costs to query data that could stay exactly where it is, we'd like to show you what that looks like in your environment. Contact us to schedule a technical deep dive.