UBC Database Group Website

DbTalks: Fall 2004

Contact: Ganesh Ramesh, Elaine Chang

Sep 17, 2004 at 2:00pm in CICSR 304, Report on SIGMOD/PODS, VLDB 2004

Sep 24, 2004 at 2:00pm in CICSR 304, Data Model Versioning and Database Evolution

Oct 1, 2004 at 2:00pm in CICSR 304, Evaluating Reference Genes used in Microarray Analysis with SAGE data

Oct 8, 2004 at 2:00pm in CICSR 304, Lab Meeting but No Talk

Oct 15, 2004 at 2:00pm in CICSR 304, Lab Meeting and Research Discussion

Oct 22, 2004 at 2:00pm in CICSR 304, Schema Mapping

Nov 5, 2004 at 2:00pm in CICSR 304, Mediated Schema Creation

Nov 12, 2004 at 2:00pm in CICSR 304, Master's Thesis Presentation: Shaofeng Bu

Nov 19, 2004 at 1:00pm in Student Recreation Center, DB Group Volleyball

Nov 26, 2004 at 2:00pm in CICSR 304, Support Vector Machine Classification of Microarray Gene Expression Data

Dec 03, 2004 at 2:00pm in CICSR 304, Grid Computing

Dec 10, 2004 at 2:00pm in CICSR 304, XSEarch - A Semantic Search Engine for XML

Dec 20, 2004 at 2:00pm in CICSR 104, Designing and Using Views to Improve Performance of Aggregate Queries

Details:

Friday, Sep 17, 2004 at 2:00pm in CICSR 304
Report Presentation: Report on SIGMOD/PODS 2004 and VLDB 2004
Presenters: SIGMOD/PODS - Laks Lakshmanan, Raymond Ng
Presenters: VLDB - Ed Knorr, Rachel Pottinger, Elaine Chang, Ganesh Ramesh

Friday, Sep 24, 2004 at 2:00pm in CICSR 304
Title: Data Model Versioning and Database Evolution
Presenter: Hassina Bounif, EPFL Switzerland

ABSTRACT: In computer science, we currently face a major problem in designing models that evolve over time. This holds in particular for databases: their data models need to evolve, but their evolution is difficult. User requirements now change much faster than before for several reasons, among them the changing perception of the real world and the development of new technologies. Databases show little flexibility in supporting changes to the organization of their schemas and data. Database evolution approaches preserve the currently populated data and the functionality of software applications when the database schema changes. Data model versioning is one of the approaches used to handle the evolution of conventional and non-conventional databases. This talk provides some background on database evolution and versioning techniques. It also presents the unresolved issues.

Friday, Oct 1, 2004 at 2:00pm in CICSR 304
Title: Evaluating Reference Genes used in Microarray Analysis with SAGE data
Presenter: Timothy Chan, UBC

ABSTRACT: Normalizing to housekeeping genes is a common method in microarray analysis, based on the assumption that housekeeping genes are not differentially regulated between homeostatic tissue and unregulated growth. The housekeeping genes usually chosen for this normalization are selected according to our current understanding of their biological function. From a statistical standpoint, we hypothesized that, ideally, a stable housekeeping gene should be highly expressed and should not vary much between cancerous and normal tissues. To our knowledge, no one has ever analyzed the stability of these chosen housekeeping genes in SAGE data with respect to these criteria. Since SAGE has been shown in many studies to be more accurate and reproducible than microarrays, we propose using lung SAGE data to evaluate chosen housekeeping genes for their suitability for microarray analysis. To discover ideal housekeeping genes using SAGE data, we applied a non-parametric class of statistical tests called the permutation test to 11 cancerous lung libraries and 17 normal lung libraries and scored every gene. Low permutation scores correspond to stable expression across normal and cancerous libraries (that is, their distributions between the two groups are similar). We selected all the genes with a permutation score less than 0.15, a permutation test standard deviation of < 10, and an average raw tag count > 25 across all libraries. Next, we examined all the housekeeping genes used to normalize the microarray data published by Bhattacharjee et al. (Proc. Natl. Acad. Sci. USA 98, 13790) that had a UID and mapped them to these permutation results. Our results show that most of the ideal SAGE candidates were not part of the set of housekeeping genes used to normalize the microarray data. In addition, the housekeeping genes traditionally chosen for microarray analysis do not score well against our criteria for an ideal housekeeping gene.
We are evaluating whether these new SAGE-derived reference genes would be good housekeeping genes for normalizing microarray experiments. So far, we have renormalized the microarray data of Bhattacharjee et al. [1] using the new reference genes in order to examine expression differences with the permutation test. In addition, each of these newly found candidate reference genes needs to be investigated further to see whether it is a valid housekeeping gene from a biological standpoint.

[1] Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A, 98(24):13790-5
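
For readers unfamiliar with permutation tests, the following is a minimal generic sketch in Python. It is not the scoring procedure used in the talk: the abstract does not define its permutation score, so the test statistic (absolute difference of mean tag counts), the function name, and the returned p-value are all assumptions made for illustration. Note that with this p-value, a high value rather than a low one suggests that the two groups are similarly distributed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Hypothetical helper: the statistic and the p-value convention are
    illustrative assumptions, not the score described in the abstract.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

In a setting like the one described, `group_a` and `group_b` would hold one gene's tag counts in the cancerous and normal libraries, and the function would be called once per gene.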

Friday, Oct 8, 2004 - Lab Meeting but No Talk

Friday, Oct 15, 2004 - Lab Meeting and Research Discussion

Friday, Oct 22, 2004 at 2:00pm in CICSR 304
Title: Schema Mapping
Presenter: Rong David Dai

ABSTRACT: Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which is costly in time and resources. Research in this area has proposed many techniques to achieve partial automation of schema matching for specific application domains. Today's talk will present two papers on schema matching.
E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal, 10:334-350, 2001. This paper provides a taxonomy that covers many of the existing approaches to schema matching, and describes the approaches in some detail. In particular, the paper distinguishes between schema- and instance-level, element- and structure-level, and language- and constraint-based matchers. Based on this classification, the paper reviews some previous match implementations, thereby indicating which part of the solution space they cover.
J. Madhavan, P. A. Bernstein and E. Rahm. Generic schema matching with Cupid. In Proc. of the Int. Conference on Very Large Data Bases (VLDB 2001), 49-58, Rome, Italy, September 2001. Both authors of the previous paper also co-authored this paper, which proposes a new schema matching algorithm, Cupid, that discovers mappings between schema elements based on their names, data types, constraints, and schema structure, using a broader set of techniques than past approaches. Some of Cupid's innovations include integrated use of linguistic and structural matching, context-dependent matching of shared types, and a bias towards leaf structure, where much of the schema resides. The paper also presents experimental results that compare Cupid to two other schema matching systems.
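
To make the idea of an element-level, name-based matcher concrete, here is a small Python sketch. It is not Cupid and is not taken from either paper: it only compares normalized element names using the standard library's `difflib.SequenceMatcher`, and the function names and the 0.6 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and drop separators such as '_' or spaces."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_similarity(a, b):
    """String similarity in [0, 1] between two normalized element names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_schemas(source, target, threshold=0.6):
    """For each source element, propose the most similar target element.

    Hypothetical helper: the threshold is an arbitrary illustrative value.
    Returns a list of (source_name, target_name, score) tuples.
    """
    matches = []
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            matches.append((s, best, round(score, 2)))
    return matches
```

For example, `match_schemas(["cust_name"], ["CustName", "OrderDate"])` pairs `cust_name` with `CustName`. A matcher of this kind ignores data types, constraints, and structure, which are exactly the additional signals the abstract attributes to Cupid.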

Friday, Nov 5, 2004 at 2:00pm in CICSR 304
Title: Mediated Schema Creation
Presenter: Dr. Rachel Pottinger, CS Dept., UBC

ABSTRACT: Data integration is a mechanism for allowing multiple databases to be queried simultaneously. The databases and their schemas remain separate; queries are asked over a mediated schema that describes concepts from all of the source schemas. In this talk I will describe my current work on how to create a mediated schema given the sources and a mapping consisting of a set of conjunctive queries describing how the sources are related. We analyze the mapping's formal semantics, show how to derive a mediated schema based on such mappings, and show how to translate user queries over the mediated schema into queries over local schemas.

Friday, Nov 12, 2004 at 2:00pm in CICSR 304
Title: Master's Thesis Presentation
Presenter: Shaofeng Bu, CS Dept., UBC

ABSTRACT: In many OLAP and data warehouse applications, users need to query data of interest, such as a set of data that satisfies specific properties. A normal answer to such a query simply enumerates all the interesting cells. This is the most accurate but not the most informative method. Summarization is needed in order to return more concise descriptions of these interesting cells to the users. The MDL approach has been applied to hierarchical data to obtain concise descriptions. However, in many cases the descriptions are not concise enough for the users. Another method, GMDL, can generate much shorter descriptions, but GMDL descriptions are not truly pure. The motivation of our research is to overcome the disadvantages of these methods. In this talk, we present a methodology that focuses on summarizing hierarchical data with exceptions. We extend the MDL approach to include some exceptions in the description. The exceptions are uninteresting cells. The resulting description with exceptions is pure, which means that it covers only "interesting cells". We call this new approach MDLE, i.e., MDL with exceptions. It aims to find the shortest description with exceptions that covers all "interesting cells". First, we study two simple cases that can be solved in polynomial time and give the algorithms. Second, we prove that MDL with exceptions is NP-hard in the general case and propose three heuristics. Finally, we present experiments comparing MDLE with MDL and GMDL. The experimental results show that MDLE generates more concise descriptions than MDL, and that MDLE also produces shorter descriptions than GMDL when the white-ratio is low or there are some red cells.
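
The notion of a description with exceptions can be illustrated with a simplified Python sketch. It makes several assumptions that are not in the abstract: it uses a flat two-dimensional grid with rectangular regions instead of hierarchical data, it measures description length as the number of regions plus the number of exceptions, and all function names are hypothetical. It only checks a given description; it does not search for the shortest one.

```python
from itertools import product


def rectangle_cells(rect):
    """Cells of a rectangle given as ((row_lo, row_hi), (col_lo, col_hi)), inclusive."""
    (r0, r1), (c0, c1) = rect
    return set(product(range(r0, r1 + 1), range(c0, c1 + 1)))


def covered_cells(rectangles, exceptions):
    """Cells covered by the union of rectangles, minus the listed exceptions."""
    covered = set()
    for rect in rectangles:
        covered |= rectangle_cells(rect)
    return covered - set(exceptions)


def is_valid_description(rectangles, exceptions, interesting):
    """True if the description covers all interesting cells and nothing else."""
    return covered_cells(rectangles, exceptions) == set(interesting)


def description_length(rectangles, exceptions):
    """Assumed cost model: one unit per region and one per exception."""
    return len(rectangles) + len(exceptions)
```

For a 3x3 grid in which every cell except the centre is interesting, the single rectangle `((0, 2), (0, 2))` with the exception `(1, 1)` is a valid description of length 2 under this cost model, whereas covering the same cells with rectangles alone requires more regions.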

Friday, Nov 26, 2004 at 2:00pm in CICSR 304
Title: Support Vector Machine Classification of Microarray Gene Expression Data
Presenter: Mingyue Tan, CS Dept., UBC

Details: The paper to be presented in this talk is M. P. S. Brown et al., "Support Vector Machine Classification of Microarray Gene Expression Data", University of California, Santa Cruz, Tech. Report UCSC-CRL-99-09. The abstract of the paper can be found at this url. The full paper can be downloaded by using this link.

Friday, Dec 03, 2004 at 2:00pm in CICSR 304
Title: Grid Computing
Presenter: Dr. Alan Wagner, CS Dept., UBC

ABSTRACT: Grid computing is one of the most promising new technologies of the decade and is set to dramatically change the way we do science and may spawn completely new forms of network-based businesses. The key word in the last sentence is "promising". Indeed, grid computing "promises" a lot, attempting to seamlessly allow the access, sharing, metering, and use of heterogeneous collections of resources.
The talk will be introductory in nature and will cover the following. What is grid computing and what are the major forces driving the technology? I will give some examples of where grids are being used and the types of problems being tackled. We will look at its layered architecture and give examples of the types of components that exist at each of the layers. I will discuss a few problems related to databases and grids. Finally, there will be an open discussion on what role computer science can or should play in the development of grid technology.

Friday, Dec 10, 2004 at 2:00pm in CICSR 304
Title: XSEarch - A Semantic Search Engine for XML
Presenter: Terence Ho, CS Dept., UBC

ABSTRACT: XSEarch, a semantic search engine for XML, is presented. XSEarch has a simple query language, suitable for a naive user. It returns semantically related document fragments that satisfy the user's query. Query answers are ranked using extended information-retrieval techniques and are generated in an order similar to the ranking. Advanced indexing techniques were developed to facilitate efficient implementation of XSEarch. The performance of the different techniques as well as the recall and the precision were measured experimentally. These experiments indicate that XSEarch is efficient, scalable and ranks quality results highly.

Monday, Dec 20, 2004 at 2:00pm in CICSR 104
Title: Designing and Using Views to Improve Performance of Aggregate Queries
Presenter: Dr. Rada Chirkova, Dept. of Computer Science, North Carolina State University

ABSTRACT: Data-intensive systems routinely use derived data, such as indexes or materialized views, to improve query-evaluation performance. In this context, the problem of designing derived data is as follows: Given a set of queries and a database, return definitions of derived data that, when materialized in the database, would reduce the evaluation costs of the queries. Designing materialized views and indexes is an important part of automated query-performance tuning in data-management systems that experience changes over time, where a system addresses the performance requirements of current frequent and important queries by periodically reconsidering and rematerializing the stored derived data.
In this talk we present an extensible system architecture for Query-Performance Enhancement by Tuning (QPET). QPET combines design and use of derived data in an end-to-end approach to automated query-performance tuning, and selects appropriate data-design algorithms depending on the characteristics of the prevalent queries. Our focus in automated query-performance tuning is on a tradeoff between the amount of system resources spent on designing derived data and the degree of the resulting improvement in query performance. We present algorithms and experimental results in designing and using materialized views for practically important classes of aggregate queries, including range-aggregate queries on star-schema data warehouses.
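
As a rough illustration of how a materialized aggregate view can answer a range-aggregate query, consider the Python sketch below. It is not QPET or any algorithm from the talk: the fact-table layout `(day, product, amount)`, the choice of a per-day total as the view, and all function names are assumptions made for this example.

```python
from collections import defaultdict


def materialize_daily_totals(sales):
    """Precompute total sales per day from (day, product, amount) rows."""
    view = defaultdict(int)
    for day, _product, amount in sales:
        view[day] += amount
    return dict(view)


def range_sum_from_base(sales, first_day, last_day):
    """Answer the range-aggregate query by scanning every base row."""
    return sum(
        amount for day, _product, amount in sales
        if first_day <= day <= last_day
    )


def range_sum_from_view(view, first_day, last_day):
    """Answer the same query from the (usually much smaller) view."""
    return sum(
        total for day, total in view.items()
        if first_day <= day <= last_day
    )
```

Both functions return the same result for any day range, but the view has one entry per day rather than one per sale, so the second scans less data. The cost is that the view must be stored and rematerialized when the base data changes, which is the kind of tradeoff the abstract refers to.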

Last Update: Oct. 15, 2003