Publications

(2025). Bolt-on, Verifiable Provenance for LLM-Powered Data Processing. Under review at VLDB 2026.

(2025). LLM-Powered Proactive Data Systems. In IEEE Data Engineering Bulletin, March 2025.

(2025). TWIX: Automatically Reconstructing Structured Data from Templatized Documents. In SIGMOD 2026.

(2024). Towards Accurate and Efficient Document Analytics with Large Language Models. In ICDE, 2025.

(2024). SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines. In VLDB 2024 (Industry).

(2024). PLAQUE: Automated Predicate Learning at Query Time. In SIGMOD, 2024.

(2023). ZIP: Lazy Imputation during Query Processing. In PVLDB, 2024.

(2023). Robust Occupancy Computation Based on WiFi Connectivity Events. In ASTRIDE@ICDE 2023. Winner of first place in the ASTRIDE workshop competition.

(2021). T-cove: An Exposure Tracing System Based on Cleaning Wi-Fi Events on Organizational Premises. In PVLDB 2021 (demo).

(2020). Locater: Cleaning Wifi Connectivity Datasets for Semantic Localization. In PVLDB 2021.

(2020). Efficient Entity Resolution on Heterogeneous Records. In ICDE, 2020.

(2019). Data Source Selection for Information Integration in Big Data Era. In Information Sciences, 2019.

(2016). Efficient Quality-Driven Source Selection from Massive Data Sources. In Journal of Systems and Software, 2016.
