Yiming has been a PhD student in the ISG group at the University of California, Irvine since 2017, under the supervision of Prof. Sharad Mehrotra. Before that, he earned master's and bachelor's degrees in computer science at Harbin Institute of Technology. His research focuses mainly on data management, specifically data cleaning, data preparation, query processing, and query optimization.

Download my resumé.

  • Data Cleaning
  • Data Preparation
  • Query Processing
  • Query Optimization


Research Intern at Microsoft Research
Jun 2022 – Sep 2022 Seattle
I worked with Yeye He during my internship at MSR. We developed an Auto Business Intelligence (BI) system that helps end users by accurately predicting BI models from a set of input tables, i.e., by accurately discovering join columns. We formulated a principled graph-based optimization problem that considers both local join prediction and global schema-graph structure; the resulting approach achieves over 90% F1-score on real-world and TPC benchmarks.
Applied Scientist Intern
Jun 2021 – Sep 2021 Seattle
I worked with Dmitri Kalashnikov and Vidit Bansal during my internship at Amazon, on a data cleaning project. Specifically, this work resolves very dirty clusters produced by entity resolution (ER) algorithms, which contain multiple kinds of errors: incorrect, missing, incomplete, or copied values. Our proposed algorithm, SCC, improves on the method previously used at Amazon by around 61 percentage points in precision (from 34.1% to 95.5%) and by around 52 percentage points in F1-score (from 42.4% to 94.7%).
Research Assistant
University of California, Irvine
Sep 2017 – Present California
I have worked on several projects focused on data cleaning, query processing, and building efficient online data processing systems.


T-COVE is an exposure tracing and occupancy system based on cleaning Wi-Fi events on organizational premises. It supports a real-time occupancy tracking application that displays the current occupancy, i.e., the number of users, of locations at different granularities, such as building, floor, or region. T-COVE has been deployed in over 30 buildings on the UCI and BSU campuses and has been running since 2020. There are plans to install T-COVE on several other campuses and at companies in the future. The second application supported by T-COVE is a passive exposure tracing system with potentially 100% adoption on campus, which can be used to track exposures as part of UCI's COVID-19 protection policies. T-COVE is passive and works with off-the-shelf infrastructure, with no need to install any new hardware or software, while achieving a usable accuracy of around 90%.


(2022). ZIP: Lazy Imputation during Query Processing. Under review at PVLDB 2023.

(2021). T-cove: an exposure tracing system based on cleaning wi-fi events on organizational premises. In PVLDB 2021.

(2020). Locater: Cleaning Wifi Connectivity Datasets for Semantic Localization. In PVLDB 2021.

(2020). Efficient entity resolution on heterogeneous records. (Extended Abstract). In ICDE, 2020.

(2019). Demo Abstract: SemIoTic: Bridging the Semantic Gap in IoT Spaces. In BuildSys 2019.

(2019). Efficient quality-driven source selection from massive data sources. In TKDE, 2019.

(2019). Data source selection for information integration in big data era. In Information Sciences 2019.

(2016). Efficient quality-driven source selection from massive data sources. In Journal of Systems and Software, 2016.


Hi From California!