Projects

BLIP
Large Language Models (LLMs) are powerful tools for processing data. However, LLMs are also complex black boxes: they return answers to queries on data without any indication of where the answer came from or whether it is trustworthy. We introduce the notion of provenance for data processing with LLMs. While existing heuristics (such as embedding similarity or directly asking an LLM) can provide some hints about where the answer was derived from, they offer no guarantee that the answer can actually be derived from the identified provenance, and indeed they are often incorrect. Instead, we propose the notion of verifiable provenance, wherein we identify a subset of the input text that reproduces the same (or an equivalent) answer as the complete text, and introduce the notion of minimality, where the verifiable provenance is as small as possible. To identify such a provenance, a naive solution would require checking all possible subsets of the source data with the LLM, which is prohibitively expensive. We present BLIP, a bolt-on framework for efficiently inferring a small verifiable provenance for any LLM-powered data processing task, with any LLM. As part of BLIP, we introduce eight strategies, each guaranteed to find a minimal verifiable provenance, as well as an adaptive strategy that combines their strengths to reduce cost further. We further extend BLIP to produce multiple minimal verifiable provenances. Experiments on five datasets show that the provenance generated by BLIP is guaranteed to reproduce the answer, achieving over 30% higher accuracy than the best-performing baseline at a comparable provenance size. Moreover, BLIP incurs a low cost, comparable to that of the original query on the original data.
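To make the idea of a minimal verifiable provenance concrete, here is a minimal sketch of one simple way to search for it: drop one sentence at a time and keep the deletion whenever the answer is unchanged. This is not one of BLIP's eight strategies, which are not described here. The names `minimal_provenance`, `ask`, and `toy_ask` are hypothetical, `toy_ask` is a keyword-matching stand-in for a real LLM call, and "minimal" is taken to mean that no single remaining sentence can be removed without changing the answer.

```python
from typing import Callable, List, Sequence


def minimal_provenance(
    sentences: Sequence[str],
    ask: Callable[[List[str]], str],
) -> List[str]:
    """Shrink `sentences` to a subset that still yields the original answer.

    `ask` is a hypothetical stand-in for "run the query over this text with
    the LLM". Answers are compared with plain equality here; an equivalence
    check could be substituted.
    """
    target = ask(list(sentences))  # answer on the complete input
    kept = list(sentences)
    changed = True
    while changed:
        changed = False
        for i in range(len(kept)):
            candidate = kept[:i] + kept[i + 1:]
            if ask(candidate) == target:
                # Sentence i is not needed: drop it and rescan from the start.
                kept = candidate
                changed = True
                break
    # The last scan removed nothing, so no single sentence can be dropped.
    return kept


def toy_ask(sentences: List[str]) -> str:
    """Toy oracle (assumption): answers only if the key fact is present."""
    if any("capital of France is Paris" in s for s in sentences):
        return "Paris"
    return "unknown"


if __name__ == "__main__":
    document = [
        "The Seine flows through Paris.",
        "The capital of France is Paris.",
        "France is in Europe.",
    ]
    print(minimal_provenance(document, toy_ask))
```

This sketch can issue a number of LLM calls that grows quadratically with the number of sentences, which illustrates why cheaper search strategies matter in practice.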
T-COVE
T-COVE is an exposure tracing and occupancy tracking system based on cleaning Wi-Fi events collected on organizational premises. Its first application is real-time occupancy tracking, which displays the current occupancy, i.e., the number of users, of locations at different granularities, such as building, floor, or region. T-COVE has been deployed in over 30 buildings on the UCI and BSU campuses and has been running since 2020; installations at several other campuses and companies are planned. A second application is a passive exposure tracing system with potentially 100% adoption across campus, which can be used to track exposures as part of UCI's COVID-19 protection policies. T-COVE is passive and works off the shelf: it requires no new hardware or software to be installed, while achieving a usable accuracy of around 90%.
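The occupancy view at multiple granularities can be sketched as a simple aggregation. The example below is an illustration only and does not reflect T-COVE's actual pipeline: it skips the event-cleaning step entirely and assumes each Wi-Fi event has already been mapped to a building, floor, and region. The `WifiEvent` type and the `occupancy` function are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class WifiEvent(NamedTuple):
    """Hypothetical, already-cleaned Wi-Fi connectivity event."""
    timestamp: float
    device_id: str
    building: str
    floor: str
    region: str


def occupancy(events: Iterable[WifiEvent], level: str = "building") -> Dict[Tuple[str, ...], int]:
    """Count devices per location at the requested granularity.

    Assumption: a device is located wherever its most recent event places it.
    """
    depth = {"building": 1, "floor": 2, "region": 3}[level]

    # Keep only the latest event seen for each device.
    latest: Dict[str, WifiEvent] = {}
    for event in events:
        previous = latest.get(event.device_id)
        if previous is None or event.timestamp >= previous.timestamp:
            latest[event.device_id] = event

    # Truncate the location hierarchy to the requested depth and count.
    counts: Dict[Tuple[str, ...], int] = defaultdict(int)
    for event in latest.values():
        key = (event.building, event.floor, event.region)[:depth]
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        WifiEvent(1.0, "dev-a", "B1", "2", "east"),
        WifiEvent(2.0, "dev-b", "B1", "2", "west"),
        WifiEvent(3.0, "dev-a", "B1", "3", "east"),  # dev-a moved up a floor
        WifiEvent(4.0, "dev-c", "B2", "1", "lobby"),
    ]
    print(occupancy(sample, "building"))
    print(occupancy(sample, "floor"))
```

Counting devices rather than people is a simplification; a deployed system would also need to handle users carrying several devices and devices that never connect.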