Here is the schedule of talks.
Brief Course Description
Much information of interest to society is online today, with experts predicting that most, if not all, of the world's information will be available online in the future. While traditional database technology has been extremely successful in providing efficient and effective solutions for storing, retrieving, and managing well-structured information, the proportion of online information that can be attributed solely to databases is relatively small! Think about how much information is locked in non-database information repositories and applications: file systems, spreadsheets, plain text, LDAP-style network directories, HTML pages, etc. Most of the questions that were asked (and conclusively answered) concerning the management of traditional structured data can (and should) be profitably asked of information that is partly or poorly structured and is locked away in applications and tools such as those above.
How can we harness this information? How can we interact across applications? How can we maintain the consistency of information so stored? The advent of semistructured data models in the research arena and of XML in the technological arena holds much promise for addressing these questions and for developing technologies for integrating information across diverse data stores and applications.
The theme of this year's offering of 534B is thus Web Data Integration and Management. The relational data model, invented for traditional business data processing applications, can surprisingly still serve us in our "hour of need" by offering useful abstractions. Therefore, we begin this course with a brief review of the relational model. We then work our way through issues arising in interoperability across heterogeneous database systems, which serve as a natural transition point to the study of semistructured data and XML, and then of unstructured data, including plain text.
Privacy and security of data play an increasingly important role today. Peer-to-peer data management systems are finding increasing applications and popularity. While traditionally we take the accuracy and correctness of the information in a database for granted, increasingly we are having to cope with inherent uncertainty in the data. These are but some of the many challenges in harnessing the underlying information in the huge amounts of data contained in diverse data stores.
Marking Scheme
Projects
Check out the project suggestions and related reference/background material.
The first meeting will be on Monday, September 11, 2005, 9:30-11:00 am, in CICSR/ICICS 104. Regular schedule: MW 9:30-11:00 am, ICICS 104.
Here is a tentative Course Outline.
Here is a link to course notes.
Course Resources:
There is no single text that adequately covers the desired material. The material will instead be drawn extensively from recent research literature. Here are some books that cover the basics.
Attention MSS students:
This might be a new experience for you: this course emphasizes research, innovation, and creativity much more than traditional courses that you may be used to. Make sure you understand the material discussed in class. Do participate in classroom discussions. For projects, I recommend that you include at least one MCS (or CS-PhD) student in your team and work with them closely.