A specific schema matching algorithm and an industrial report

For this class we'll be reading two papers:

A. Doan, P. Domingos, and A. Halevy: Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach. Proceedings of the ACM SIGMOD Conf. on Management of Data (SIGMOD), 2001. We're reading this paper to get a thorough understanding of one particular schema matching algorithm and of the metrics for evaluating it.
Philip A. Bernstein, Sergey Melnik, Michalis Petropoulos, and Christoph Quix: Industrial-Strength Schema Matching. SIGMOD Record, 33(4), 2004. This paper is an interesting read because it is about what happens when you actually try to use one of these algorithms, rather than about the algorithm itself.

Both papers should be fairly straightforward to read. Don't get too hung up on all of the equations in the LSD paper, but do make sure you understand the general ideas of what the equations are trying to get at. We are not reading the COMA paper (though it is available if you want to look at it). The key thing to know in order to understand the Bernstein et al. paper is that COMA, like LSD, combines information from a number of sources, and to do so it uses a matrix to hold the values.
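To make the "matrix of values" idea concrete, here is a minimal sketch, assuming each matcher (source of evidence) produces a similarity matrix whose rows are source-schema elements and whose columns are target-schema elements. It combines the matrices with a simple weighted average and then keeps, for each source element, its best target if the combined score clears a threshold. This is a stand-in, not the actual procedure from either paper: LSD learns how to weight its base learners with a meta-learner, and COMA supports its own choices of aggregation and selection. The function names (`combine`, `select_matches`), the schema element names, the scores, the weights, and the threshold are all hypothetical.

```python
from typing import Dict, List

# A similarity matrix: rows are source-schema elements, columns are
# target-schema elements, entries are scores in [0, 1].
Matrix = List[List[float]]


def combine(matrices: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted average of several same-shaped similarity matrices."""
    total = sum(weights)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for matrix, weight in zip(matrices, weights):
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += weight * matrix[i][j] / total
    return combined


def select_matches(combined: Matrix, source: List[str], target: List[str],
                   threshold: float) -> Dict[str, str]:
    """Map each source element to its best-scoring target element,
    keeping the pair only if the combined score reaches the threshold."""
    matches: Dict[str, str] = {}
    for i, src in enumerate(source):
        best = max(range(len(target)), key=lambda j: combined[i][j])
        if combined[i][best] >= threshold:
            matches[src] = target[best]
    return matches


# Hypothetical schemas and matcher outputs, for illustration only.
source = ["name", "phone", "addr", "notes"]
target = ["agent-name", "office-phone", "address"]

name_matcher = [      # hypothetical scores from comparing element names
    [0.9, 0.1, 0.2],
    [0.1, 0.8, 0.1],
    [0.3, 0.1, 0.6],
    [0.2, 0.1, 0.3],
]
instance_matcher = [  # hypothetical scores from comparing data values
    [0.7, 0.2, 0.1],
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.8],
    [0.1, 0.2, 0.2],
]

combined = combine([name_matcher, instance_matcher], weights=[0.5, 0.5])
matches = select_matches(combined, source, target, threshold=0.5)
print(matches)
# {'name': 'agent-name', 'phone': 'office-phone', 'addr': 'address'}
```

Here `notes` is left unmatched because its best combined score (0.25) falls below the threshold. Changing the weights, or swapping the average for another aggregation such as taking the maximum, can change which pairs survive, which is one reason systems like LSD try to learn how much to trust each source of evidence.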

Rachel Pottinger
E-mail Address: rap [at] cs [dot] ubc [dot] ca

Office Location: CICSR 393
Phone: (604) 822-0436
Fax: (604) 822-5485
Postal/Courier address:
The Department of Computer Science
University of British Columbia
201-2366 Main Mall
Vancouver, B.C. V6T 1Z4
Canada