Last week, we launched a significant new version of Data Liberator, packed with powerful new tools to access and capitalize on data from multiple sources. We're breaking the release down into a four-part series; Part Three is the technical deep dive for those who want a look under the hood.
Part One covers how Data Liberator connects scattered, siloed data into a single, easy-to-access source — without moving any of it.
Part Two demonstrates how Data Liberator works with Claude for conversations, not just queries: it makes it easy for non-technical users to explore data and self-serve analytics, and it opens the door to rapid prototyping for testing hypotheses.
So let's dive in...
Cross-Dataset Joins and Intelligent Caching
Imagine you have 100 datasets scattered across multiple systems, and a question whose answer could live in any of them. The traditional approach? Query all 100 datasets. Hours of computing. Thousands of dollars in cloud costs.
When you have hundreds of datasets across multiple systems, finding the right data is half the battle. Liberator makes this simple through rich metadata and descriptions.
Every dataset in Liberator includes rich metadata: a name, a description, and details about its columns. This metadata is immediately available both to humans exploring data and to AI agents like Claude trying to understand what's available.
Need customer purchase data? Search across all your datasets for keywords like "customer," "purchase," "transaction." Liberator searches through dataset names, descriptions, column names, and metadata to surface relevant data.
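As a rough sketch of how such a keyword search behaves, here is a minimal in-memory version in Python. The catalog structure, field names, and `search_datasets` function are illustrative stand-ins, not Liberator's actual API:

```python
# Hypothetical in-memory catalog; Liberator's real metadata store is richer.
datasets = [
    {"name": "crm_customers", "description": "Customer master records",
     "columns": ["customer_id", "name", "region"]},
    {"name": "pos_sales", "description": "In-store purchase transactions",
     "columns": ["transaction_id", "customer_id", "amount"]},
    {"name": "sensor_readings", "description": "IoT sensor data",
     "columns": ["sensor_id", "timestamp", "value"]},
]

def search_datasets(catalog, keywords):
    """Return names of datasets whose name, description, or columns
    match any of the given keywords (case-insensitive substring match)."""
    hits = []
    for ds in catalog:
        haystack = " ".join([ds["name"], ds["description"], *ds["columns"]]).lower()
        if any(kw.lower() in haystack for kw in keywords):
            hits.append(ds["name"])
    return hits
```

A search for "customer" or "purchase" surfaces the CRM and point-of-sale datasets but not the sensor data, because the match runs over names, descriptions, and column names alike.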
Now the real magic: Liberator can join data across datasets without creating intermediate tables or running ETL.
Join three datasets in a single query. The traditional warehouse approach: extract each dataset, load everything into one system, reshape the schemas to match, and only then join. The Liberator approach: write one query against the sources as they are, and Liberator federates the join.
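To make the contrast concrete, here is a hedged sketch of the single federated query, wrapped in Python. The dataset names, schemas, and the `liberator.query` client call are hypothetical, used only to show the shape of a cross-system join:

```python
# One query joins customer, order, and shipment data that live in three
# different systems (names are illustrative). No staging tables, no ETL:
# the virtualization layer resolves each schema-qualified name to its source.
query = """
SELECT c.customer_id, c.region, o.order_id, s.ship_date
FROM crm.customers  AS c
JOIN erp.orders     AS o ON o.customer_id = c.customer_id
JOIN wms.shipments  AS s ON s.order_id    = o.order_id
WHERE o.order_date >= DATE '2025-01-01'
"""

# result = liberator.query(query)  # hypothetical client call
```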
The data never moves. The schemas never change. The query just works.
How does Liberator handle billions of records efficiently?
Liberator maintains query result caches so repeated queries don't hammer source systems. The cache is transparent—you don't configure it, you don't tune it, it just works.
Large datasets are processed in chunks (typically 2 million rows). Each chunk gets a unique identifier for cache management. When data changes, only affected chunks are invalidated.
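One way to picture chunk-level caching is the sketch below. The chunk size comes from the text, but the identifier scheme and the `ChunkCache` class are assumptions for illustration, not Liberator's internals:

```python
import hashlib

CHUNK_ROWS = 2_000_000  # typical chunk size mentioned above

def chunk_index(row_number):
    """Map a row position to the chunk that contains it."""
    return row_number // CHUNK_ROWS

def chunk_id(dataset, index, version):
    """Derive a stable identifier for one chunk of a dataset at a given version."""
    key = f"{dataset}:{index}:{version}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

class ChunkCache:
    """Cache keyed by chunk identifier, so a change in the source
    invalidates only the chunks it touches."""

    def __init__(self):
        self.store = {}  # chunk id -> cached rows

    def put(self, cid, rows):
        self.store[cid] = rows

    def get(self, cid):
        return self.store.get(cid)  # None signals a cache miss

    def invalidate(self, dataset, changed_indexes, version):
        """Drop only the chunks whose underlying rows changed."""
        for idx in changed_indexes:
            self.store.pop(chunk_id(dataset, idx, version), None)
```

When rows 0 through 1,999,999 of a dataset change, only chunk 0 is evicted; results cached from other chunks stay warm.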
Liberator intelligently optimizes queries by understanding data structure and access patterns. Repeated queries benefit from cached results, while smart filtering pushes computation to source systems where possible.
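The "smart filtering" in that last sentence is usually called predicate pushdown: ship the filter to the source instead of shipping the data to the filter. A minimal sketch, assuming a hypothetical planner and a per-source capability flag:

```python
def plan_scan(sources, table, predicate_sql):
    """Decide where a filter runs. If the source can evaluate predicates,
    push the WHERE clause down so only matching rows cross the network;
    otherwise fetch everything and filter in the virtualization layer."""
    src = sources[table]
    if src.get("supports_filters"):
        return f"SELECT * FROM {table} WHERE {predicate_sql}", "pushed-down"
    return f"SELECT * FROM {table}", "local-filter"

sources = {
    "sales":      {"supports_filters": True},   # e.g. a SQL database
    "legacy_csv": {"supports_filters": False},  # e.g. flat files
}
```

Against the SQL source the predicate travels with the query; against the flat files the layer falls back to local filtering, which is exactly the case where caching earns its keep.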
"Show me all production runs where quality dropped below threshold, joined with maintenance records and sensor data for those time periods." Liberator finds the relevant datasets, joins them across systems, and you get your answer—without moving gigabytes of IoT data.
"Find patients with diagnosis X who had lab results Y and Z within 30 days." Join clinical data, lab results, and claims data—all in different systems with different schemas—in a single query while maintaining HIPAA compliance.
"What products are frequently bought together across online, in-store, and mobile channels?" Liberator joins the transaction data from multiple systems, and you discover cross-channel patterns without consolidating point-of-sale data.
"Correlate outage events with weather data, grid load, and maintenance schedules." Join operational data, weather APIs, and maintenance databases without consolidating sensitive infrastructure data.
Liberator runs on Kubernetes and deploys anywhere a Kubernetes cluster can run.
We're working on Liberator Cloud—a hybrid architecture where the UI and management plane run in our cloud while data processing stays local to your environment. Think "bring your own compute" for data virtualization.
This solves a common tension: you want easy deployment and management, but your data can't leave your network for security, compliance, or performance reasons. Liberator Cloud gives you both.
If your team is paying cloud costs to query data that could stay exactly where it is, we'd like to show you what that looks like in your environment. Contact us to schedule a technical deep dive.