DSC/e Lecture Angela Bonifati: Big Data Integration: from scalability to ac...
Mon, June 12, 2017, 12:30 PM – 1:30 PM CEST
DSC/e Lecture Series
Prof. Angela Bonifati, currently at University Claude Bernard Lyon 1.
Big Data Integration: from scalability to accessibility
Data integration is a long-standing research challenge in the database and information systems community. The problem arises when data collections created in separate contexts must be brought together for value and knowledge creation. As a simple example, performing analytics over an online professional social network and local municipal databases to identify important skilled professionals in a given city requires integrating these separate private and public data sources. Such bridging of data collections lies at the heart of many data science applications, which makes data integration an extremely common and fundamental challenge in data science.
Currently available integration solutions, however, are usable only by highly trained experts and cannot be readily applied to voluminous data. A key process in data integration is data exchange: the problem of translating data structured under a source schema into data structured under a target schema, according to a set of source-to-target constraints known as schema mappings. In this talk, we will present our recent contributions to two problems: (i) scalable and controllable execution of schema mappings; (ii) interactive schema mapping specification for non-expert users. These contributions led us (i) to investigate scalable data exchange in the presence of target functional dependencies, by allowing a piecemeal and efficient execution of the underlying chase algorithm; and (ii) to design a way of specifying schema mappings that is tailored to ordinary users (such as domain data scientists) and relies on simple user feedback throughout the process. We believe that both directions need to be pursued in order to make the integration of massive data efficient, controllable and accessible to ordinary users. We conclude by pinpointing a number of interesting open problems in big data integration, which are both recurrent and impactful in data science.
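To make the notions of schema mapping and target functional dependency more concrete, here is a minimal Python sketch of a toy data exchange. It is not the speaker's algorithm. The relations Emp, Dept, Works and Mgr, the two mappings, and the function names are all hypothetical and chosen only for illustration. The sketch copies source tuples into the target, invents a labeled null wherever the source gives no value, and then chases the target with the functional dependency dept -> manager.

```python
from itertools import count


class Null:
    """A labeled null: a placeholder for a value the source does not provide."""

    _ids = count(1)

    def __init__(self):
        self.label = f"N{next(Null._ids)}"

    def __repr__(self):
        return self.label


def apply_mappings(emp_rows, dept_rows):
    """Apply two hypothetical source-to-target mappings.

    Emp(name, dept)     -> Works(name, dept) and Mgr(dept, M) for some unknown M
    Dept(dept, manager) -> Mgr(dept, manager)
    """
    works, mgr = [], []
    for name, dept in emp_rows:
        works.append((name, dept))
        mgr.append((dept, Null()))  # the manager is unknown, so invent a null
    for dept, manager in dept_rows:
        mgr.append((dept, manager))
    return works, mgr


def enforce_fd(mgr_rows):
    """Chase Mgr with the functional dependency dept -> manager.

    Simplifying assumption: every null occurs exactly once and only in the
    manager column, so merging values per department is enough. A general
    chase would have to substitute merged nulls across the whole target.
    """
    chosen = {}  # dept -> the single manager value kept for that dept
    for dept, manager in mgr_rows:
        current = chosen.get(dept)
        if current is None or isinstance(current, Null):
            # Nothing stored yet, or only a null: the new value takes over.
            chosen[dept] = manager
        elif not isinstance(manager, Null) and manager != current:
            # Two distinct constants for one dept: the chase fails.
            raise ValueError(f"no solution: {dept} has managers {current} and {manager}")
    return chosen


emp = [("ann", "db"), ("bob", "db"), ("eve", "ml")]
dept = [("db", "carol")]

works, mgr = apply_mappings(emp, dept)
solution = enforce_fd(mgr)
print(works)
print(solution)
```

In this toy run the two nulls invented for department db are merged with the known manager carol, while department ml keeps a null because no source states its manager. Handling the dependencies piece by piece, as the abstract describes, is considerably more involved than this single pass over one relation.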
Angela Bonifati has been a Full Professor in Computer Science since 2011 and is currently at University Claude Bernard Lyon 1. She received a Ph.D. degree in Computer Science from Politecnico di Milano in 2002. After graduating she worked at the INRIA research institute in Paris and returned to Italy in 2003, joining the Italian National Research Council as a research scientist. Her research focuses on advanced database applications such as data integration, structured and semi-structured information, and web databases, covering both theoretical and practical aspects. Angela served as Area PC Vice Chair of ICDE 2011 (Semi-structured Data Track) and ICDE 2018 (Information Extraction and Data Cleaning and Curation Track), and as Program Chair of international workshops such as WebDB 2013, the VLDB PhD workshop 2013, WIDM 2005 and WIDM 2006. She is an associate editor of the journal Distributed and Parallel Databases, published by Springer. She has co-authored more than 80 articles in international journals and conferences, and recently contributed to a book on ‘Schema Matching and Mapping’ with Z. Bellahsene and E. Rahm.
Date and time: Monday, June 12, 12:30 – 13:30
Location: TU/e, Filmzaal de Zwarte Doos
12:30 – 13:30 Lecture by Prof. Angela Bonifati