Query Processing

LLM-Powered Proactive Data Systems

Sepanta Zeighami, Yiming Lin, Shreya Shankar, Aditya Parameswaran

TWIX: Automatically Reconstructing Structured Data from Templatized Documents

Yiming Lin, Mawil Hasan, Rohan Kosalge, Alvin Cheung, Aditya G. Parameswaran

Towards Accurate and Efficient Document Analytics with Large Language Models

Yiming Lin, Madelon Hulsebos, Ruiying Ma, Shreya Shankar, Sepanta Zeighami, Aditya G. Parameswaran, Eugene Wu

PLAQUE: Automated Predicate Learning at Query Time

Yiming Lin, Sharad Mehrotra

ZIP: Lazy Imputation during Query Processing

Yiming Lin, Sharad Mehrotra

EnrichDB is a system designed to support just-in-time data enrichment during query processing. It is motivated by applications that consume potentially large volumes of raw data that must first be interpreted using expensive machine learning or signal processing functions before the data can be queried or used in analysis. Executing such enrichment at ingestion time (to support real-time analytics) is difficult to scale, especially when datasets are very large or when data arrives at high velocity. EnrichDB addresses this challenge by supporting enrichment at all phases of data processing, including interleaving enrichment with query processing. It exploits query context to steer enrichment so that query results can be computed progressively. EnrichDB is implemented as a layer on top of PostgreSQL, though it can easily be layered on other databases.
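To make the idea of query-driven, progressive enrichment concrete, here is a minimal Python sketch. It is not EnrichDB's API or implementation: the `Tweet` record, the `toy_sentiment` function (a stand-in for an expensive ML model), the `progressive_query` helper, and the `epoch_size` parameter are all hypothetical names chosen for illustration. The sketch only shows the general pattern: use the cheap part of the query to decide which tuples are worth enriching, enrich those lazily in small batches, and emit a refined answer after each batch.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Tweet:
    id: int
    city: str                        # fixed attribute, available at ingestion
    text: str                        # raw data
    sentiment: Optional[str] = None  # derived attribute, filled in lazily


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for an expensive enrichment function."""
    return "positive" if "great" in text.lower() else "negative"


def progressive_query(
    rows: List[Tweet],
    city: str,
    sentiment: str,
    enrich: Callable[[str], str],
    epoch_size: int = 2,
) -> Iterator[List[int]]:
    """Yield a growing list of matching ids, one snapshot per epoch."""
    # Query context: the cheap predicate on `city` prunes tuples that
    # can never appear in the answer, so they are never enriched.
    candidates = [r for r in rows if r.city == city]
    answer: List[int] = []
    for start in range(0, len(candidates), epoch_size):
        for row in candidates[start:start + epoch_size]:
            if row.sentiment is None:      # enrich only on demand
                row.sentiment = enrich(row.text)
            if row.sentiment == sentiment:
                answer.append(row.id)
        yield list(answer)                 # progressive result


rows = [
    Tweet(1, "Irvine", "Great coffee downtown"),
    Tweet(2, "Berkeley", "Great view from the hills"),
    Tweet(3, "Irvine", "Traffic is bad today"),
    Tweet(4, "Irvine", "great weather this week"),
    Tweet(5, "Berkeley", "Rain again"),
]

snapshots = list(progressive_query(rows, "Irvine", "positive", toy_sentiment))
```

In this toy run only the three Irvine tweets are ever passed to the enrichment function, and the caller sees a partial answer after the first epoch rather than waiting for all enrichment to finish. A real system would also have to decide which tuples and which enrichment functions to run first; the sketch simply processes candidates in order.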

EnrichDB