DbTalks: Fall 2004
Contact: Ganesh Ramesh, Elaine Chang
- Sep 17, 2004 at 2:00pm in CICSR 304, Report on SIGMOD/PODS 2004 and VLDB 2004
- Sep 24, 2004 at 2:00pm in CICSR 304, Data Model Versioning and Database Evolution
- Oct 1, 2004 at 2:00pm in CICSR 304, Evaluating Reference Genes used in Microarray Analysis with SAGE data
- Oct 8, 2004 at 2:00pm in CICSR 304, Lab Meeting but No Talk
- Oct 15, 2004 at 2:00pm in CICSR 304, Lab Meeting and Research Discussion
- Oct 22, 2004 at 2:00pm in CICSR 304, Schema Mapping
- Nov 5, 2004 at 2:00pm in CICSR 304, Mediated Schema Creation
- Nov 12, 2004 at 2:00pm in CICSR 304, Master's Thesis Presentation: Shaofeng Bu
- Nov 19, 2004 at 1:00pm in Student Recreation Center, DB Group Volleyball
- Nov 26, 2004 at 2:00pm in CICSR 304, Support Vector Machine Classification of Microarray Gene Expression Data
- Dec 3, 2004 at 2:00pm in CICSR 304, Grid Computing
- Dec 10, 2004 at 2:00pm in CICSR 304, XSEarch - A Semantic Search Engine for XML
- Dec 20, 2004 at 2:00pm in CICSR 104, Designing and Using Views to Improve Performance of Aggregate Queries
Details:
- Friday, Sep 17, 2004 at 2:00pm in CICSR 304
Title: Report on SIGMOD/PODS 2004 and VLDB 2004
Presenters: SIGMOD/PODS - Laks Lakshmanan, Raymond Ng
Presenters: VLDB - Ed Knorr, Rachel Pottinger, Elaine Chang, Ganesh Ramesh
- Friday, Sep 24, 2004 at 2:00pm in CICSR 304
Title: Data Model Versioning and Database Evolution
Presenter: Hassina Bounif, EPFL Switzerland
ABSTRACT:
In computer science, we currently face a major problem: designing
models that evolve over time. This holds in particular for databases:
their data models need to evolve, but their evolution is difficult.
User requirements now change much faster than before for several
reasons, among them the changing perception of the real world and the
development of new technologies. Databases show little flexibility in
supporting changes to the organization of their schemas and data.
Database evolution approaches preserve existing populated data and
application functionality when the database schema changes. Data model
versioning is one such approach, used to handle the evolution of both
conventional and non-conventional databases. This talk provides some
background on the fields of database evolution and versioning
techniques, and also presents their unresolved issues.
- Friday, Oct 1, 2004 at 2:00pm in CICSR 304
Title: Evaluating Reference Genes used in Microarray Analysis with SAGE data
Presenter: Timothy Chan, UBC
ABSTRACT:
Normalizing to housekeeping genes is a common method in microarray analysis, based on the assumption that housekeeping
genes are not differentially regulated between homeostatic tissue and unregulated growth. The housekeeping genes used for
this normalization are usually chosen according to our current understanding of their biological function.
From a statistical standpoint, we hypothesized that an ideal, stable housekeeping gene should be highly expressed and
should not vary much between cancerous and normal tissues. To our knowledge, no one has analyzed the stability of
these commonly chosen housekeeping genes in SAGE data with respect to these criteria.
Since SAGE has been shown in many studies to be more accurate and reproducible than microarrays, we propose using lung
SAGE data to evaluate whether the chosen housekeeping genes are suitable for microarray analysis. To discover ideal
housekeeping genes in SAGE data, we applied the permutation test, a non-parametric statistical test, to 11 cancerous
lung libraries and 17 normal lung libraries and scored every gene. Low permutation scores correspond to stable
expression across normal and cancerous libraries (that is, the gene's distributions in the two groups are similar). We
selected all genes with a permutation score below 0.15, a permutation test standard deviation below 10, and an average
raw tag count above 25 across all libraries. Next, we took all the housekeeping genes that had a UID and that were used
to normalize the microarray data published by Bhattacharjee et al. (Proc. Natl. Acad. Sci. USA 98, 13790), and mapped
them to these permutation results.
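The scoring and filtering step above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: the abstract does not define the permutation score, so we assume the statistic is the absolute difference of group means and the score is the fraction of label permutations whose statistic falls below the observed one (so a low score means the observed difference is unremarkable). We also assume that "permutation test standard deviation" means the standard deviation of the permuted statistics. The function names and toy counts are hypothetical; only the thresholds (0.15, 10, 25) and the 11/17 library split come from the abstract.

```python
import random
import statistics


def permutation_score(cancer, normal, n_perm=1000, seed=0):
    """Return (score, sd) for one gene under our ASSUMED definitions.

    Statistic: absolute difference between the two group means.
    score: fraction of label permutations whose statistic is smaller than the
           observed one, so a low score means expression looks stable.
    sd:    standard deviation of the permuted statistics (our guess at what
           "permutation test standard deviation" refers to).
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(cancer) - statistics.mean(normal))
    pooled = list(cancer) + list(normal)
    k = len(cancer)
    perm_stats = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the library labels
        perm_stats.append(abs(statistics.mean(pooled[:k]) - statistics.mean(pooled[k:])))
    score = sum(s < observed for s in perm_stats) / n_perm
    return score, statistics.stdev(perm_stats)


def select_reference_genes(counts, n_cancer, max_score=0.15, max_sd=10, min_mean=25):
    """Apply the abstract's three thresholds to every gene.

    counts: gene -> raw tag counts, cancerous libraries first (hypothetical layout).
    """
    selected = []
    for gene, row in counts.items():
        score, sd = permutation_score(row[:n_cancer], row[n_cancer:])
        if score < max_score and sd < max_sd and statistics.mean(row) > min_mean:
            selected.append(gene)
    return selected


# Toy counts for 11 cancerous + 17 normal libraries (invented for illustration).
counts = {
    "stable_high": [40] * 28,
    "divergent": [100] * 11 + [10] * 17,
    "stable_low": [5] * 28,
}
print(select_reference_genes(counts, n_cancer=11))  # ['stable_high']
```

The "stable_low" gene is stable but fails the tag-count filter, which is why the abstract combines a stability score with an expression-level threshold.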
Our results show that most of the ideal SAGE candidates were not among the housekeeping genes used to normalize the
microarray data. In addition, the housekeeping genes traditionally chosen for microarrays do not score well against our
criteria for an ideal housekeeping gene. We are evaluating whether these new SAGE-derived reference genes would be good
housekeeping genes for normalizing microarray experiments. So far, we have renormalized the microarray data of
Bhattacharjee et al. [1] using the new reference genes in order to examine expression differences with the permutation
test. In addition, each newly found candidate reference gene needs further investigation to determine whether it is a
valid housekeeping gene from a biological standpoint.
[1] Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M,
Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M. (2001) Classification of
human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A,
98(24):13790-5.
- Friday, Oct 8, 2004 - Lab Meeting but No Talk
- Friday, Oct 15, 2004 - Lab Meeting and Research Discussion
- Friday, Oct 22, 2004 at 2:00pm in CICSR 304
Title: Schema Mapping
Presenter: Rong David Dai
ABSTRACT:
Schema matching is a basic problem in many database application
domains, such as data integration, E-business, data warehousing, and
semantic query processing. In current implementations, schema matching
is typically performed manually, which is costly in both time and
resources. Research in this area has proposed many techniques to
partially automate schema matching for specific application domains.
Today's talk will present two papers on schema matching:
E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema
matching. The VLDB Journal, 10:334-350, 2001.
This paper provides a taxonomy that covers many of the existing
approaches to schema matching and describes them in some detail.
In particular, the paper distinguishes between schema- and
instance-level, element- and structure-level, and language- and
constraint-based matchers. Based on this classification, the paper
reviews several previous match implementations, indicating which
part of the solution space each one covers.
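To make two of the taxonomy's categories concrete, here is a minimal Python sketch of a hybrid, schema-level, element-level matcher that combines a language-based matcher (overlap of name tokens) with a constraint-based matcher (data-type compatibility). It is an illustration of the categories, not a matcher from the paper: the abbreviation table, type families, weight `w_name`, and `threshold` are all hypothetical choices.

```python
import re

# Hypothetical abbreviation dictionary (a stand-in for the thesauri real matchers use).
ABBREVIATIONS = {"cust": "customer", "no": "number", "num": "number", "qty": "quantity"}
# Hypothetical grouping of SQL-style type names into compatible families.
TYPE_FAMILIES = {"int": "numeric", "integer": "numeric", "bigint": "numeric",
                 "decimal": "numeric", "varchar": "string", "char": "string",
                 "text": "string", "date": "date"}


def name_tokens(name):
    """Split camelCase / snake_case names into normalized lowercase tokens."""
    raw = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return {ABBREVIATIONS.get(t.lower(), t.lower()) for t in raw}


def name_sim(a, b):
    """Language-based matcher: Jaccard overlap of name tokens."""
    ta, tb = name_tokens(a), name_tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def type_sim(t1, t2):
    """Constraint-based matcher: 1.0 if both types fall in the same family."""
    f1, f2 = TYPE_FAMILIES.get(t1.lower()), TYPE_FAMILIES.get(t2.lower())
    return 1.0 if f1 is not None and f1 == f2 else 0.0


def match(schema1, schema2, w_name=0.7, threshold=0.6):
    """For each element of schema1, keep its best candidate in schema2
    if the combined score reaches `threshold`."""
    result = {}
    for e1, t1 in schema1.items():
        best, best_score = None, -1.0
        for e2, t2 in schema2.items():
            score = w_name * name_sim(e1, e2) + (1 - w_name) * type_sim(t1, t2)
            if score > best_score:
                best, best_score = e2, score
        if best is not None and best_score >= threshold:
            result[e1] = best
    return result


orders = {"custName": "varchar", "custNo": "int", "orderDate": "date"}
customers = {"customer_name": "text", "customer_number": "integer", "ship_date": "date"}
print(match(orders, customers))  # {'custName': 'customer_name', 'custNo': 'customer_number'}
```

Note that `orderDate` stays unmatched: its best candidate, `ship_date`, shares a type and one token but falls below the threshold. Tuning such weights and thresholds is exactly the kind of manual effort the surveyed approaches try to reduce.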
J. Madhavan, P. A. Bernstein and E. Rahm. Generic schema matching with Cupid.
In Proc. of the Int. Conference on Very Large Data Bases (VLDB 2001),
49-58, Rome, Italy, September 2001.
Both authors of the previous paper also co-authored this paper, which
proposes a new schema matching algorithm, Cupid. Cupid discovers
mappings between schema elements based on their names, data types,
constraints, and schema structure, using a broader set of techniques
than past approaches. Cupid's innovations include the integrated use
of linguistic and structural matching, context-dependent matching of
shared types, and a bias toward leaf structure, where much of the
schema content resides. The paper also presents experimental results
that compare Cupid to two other schema matching systems.
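The integration of linguistic and structural matching with a bias toward leaves can be illustrated with a small Python sketch, loosely following our reading of Cupid: the structural similarity of two schema elements is the fraction of leaves in both subtrees that have a strong link to some leaf on the other side, and it is blended with name similarity through a structural weight. This is a simplification, not the paper's algorithm: leaf similarity here is just the hypothetical token-overlap function `lsim`, and `w_struct` and `th_accept` are illustrative values.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the leaf nodes of this subtree (a leaf returns itself)."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def lsim(a, b):
    """Hypothetical linguistic similarity: Jaccard overlap of '_'-separated tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def ssim(s, t, th_accept=0.5):
    """Structural similarity: fraction of leaves in both subtrees that have a
    strong link (leaf similarity >= th_accept) to some leaf in the other subtree."""
    ls, lt = s.leaves(), t.leaves()
    strong_s = sum(any(lsim(x.name, y.name) >= th_accept for y in lt) for x in ls)
    strong_t = sum(any(lsim(x.name, y.name) >= th_accept for x in ls) for y in lt)
    return (strong_s + strong_t) / (len(ls) + len(lt))


def wsim(s, t, w_struct=0.5, th_accept=0.5):
    """Weighted similarity: structural and linguistic evidence combined."""
    return w_struct * ssim(s, t, th_accept) + (1 - w_struct) * lsim(s.name, t.name)


po = Node("purchase_order", [Node("order_date"), Node("customer_city"), Node("customer_street")])
order_a = Node("order", [Node("date"), Node("city"), Node("street")])
order_b = Node("order", [Node("price"), Node("quantity")])
print(wsim(po, order_a), wsim(po, order_b))  # 0.75 0.25
```

Both target elements are named `order`, so name similarity alone cannot separate them; the leaves do, which is the intuition behind biasing the match toward leaf structure.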
- Friday, Nov 5, 2004 at 2:00pm in CICSR 304
Title: Mediated Schema Creation
Presenter: Dr. Rachel Pottinger, CS Dept., UBC
ABSTRACT:
Data integration is a mechanism for allowing multiple databases to be
queried simultaneously. The databases and their schemas remain
separate; queries are asked over a mediated schema that describes
concepts from all of the source schemas. In this talk I will describe
my current work on how to create a mediated schema given the sources
and a mapping consisting of a set of conjunctive queries that describe
how the sources are related. We analyze the mapping's formal semantics,
show how to derive a mediated schema based on such mappings, and show
how to translate user queries over the mediated schema into queries
over the local schemas.
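The last step, translating a query over the mediated schema into queries over the sources, can be illustrated in its simplest form with view unfolding. The Python sketch below assumes a global-as-view (GAV) setting in which each mediated relation is defined by a conjunctive query over source relations; this assumption, the mapping format, and the relation names `Person`, `emp`, and `dept` are all hypothetical, and the talk's mapping semantics may well differ.

```python
from itertools import count

# An atom is (predicate, args). Arguments are plain strings; arguments that
# appear in a view body but not in its head are treated as existential variables.
# Constants may appear in the query but (in this sketch) not in view bodies.


def unfold(query_body, views):
    """Rewrite a conjunctive query over the mediated schema into one over the
    sources by replacing each mediated atom with its view definition.

    views maps a mediated predicate to (head_vars, body_atoms), a hypothetical
    GAV mapping format.
    """
    fresh = count()
    rewritten = []
    for pred, args in query_body:
        head_vars, body = views[pred]
        n = next(fresh)
        # Bind the definition's head variables to the query's arguments.
        subst = dict(zip(head_vars, args))
        for body_pred, body_args in body:
            new_args = []
            for var in body_args:
                # Existential variables get a name that is fresh for this use
                # (assumes the query itself does not use names like "D_0").
                new_args.append(subst.setdefault(var, f"{var}_{n}"))
            rewritten.append((body_pred, tuple(new_args)))
    return rewritten


# Mediated relation Person(N, C) defined as a join of two source relations.
views = {"Person": (("N", "C"), [("emp", ("N", "D")), ("dept", ("D", "C"))])}
query = [("Person", ("X", "vancouver"))]
print(unfold(query, views))  # [('emp', ('X', 'D_0')), ('dept', ('D_0', 'vancouver'))]
```

Renaming existential variables per use matters when the query mentions the same mediated relation twice: each occurrence must get its own join variable.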
- Friday, Nov 12, 2004 at 2:00pm in CICSR 304
Title: Master's Thesis Presentation
Presenter: Shaofeng Bu, CS Dept., UBC
ABSTRACT:
In many OLAP and data warehouse applications, users need to query data of interest, such as a set of data that satisfies
specific properties. A normal answer to such a query simply enumerates all the interesting cells. This is the most
accurate method, but not the most informative one. Summarization is needed to return more concise descriptions of these
interesting cells to users. The MDL (Minimum Description Length) approach has been applied to hierarchical data to obtain
concise descriptions. However, in many cases these descriptions are still not concise enough for users. Another method,
GMDL, can generate much shorter descriptions, but GMDL descriptions are not truly pure: they may also cover cells that
are not interesting. Our research aims to overcome the disadvantages of both methods.
In this talk, we present a methodology for generating summaries of hierarchical data with exceptions. We extend the MDL
approach to allow exceptions in the description, where the exceptions are uninteresting cells. The resulting description
with exceptions is pure, meaning that it covers only "interesting cells". We call this new approach MDLE, i.e. MDL with
exceptions.
Our new approach aims to find the shortest description with exceptions that covers all "interesting cells". First, we
study two simple cases that can be solved in polynomial time and give algorithms for them. Second, we prove that MDL
with exceptions is NP-hard in the general case, and we propose three heuristics. Finally, we present experiments
comparing MDLE with MDL and GMDL. The results show that MDLE generates more concise descriptions than MDL, and that MDLE
also produces shorter descriptions than GMDL when the white-ratio is low or there are some red cells.
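To show why exceptions can shorten a pure description, here is a minimal Python sketch of one very simple setting that we chose for illustration: a single hierarchy (a tree whose leaves are cells), description length counted as the number of entries, and exceptions restricted to uninteresting leaves. Under these assumptions a bottom-up dynamic program finds the optimum in polynomial time. This is not claimed to be one of the two polynomial cases from the thesis, and the tree and cell names are hypothetical.

```python
INF = float("inf")


def description_cost(tree, node, interesting, covered=False, allow_exceptions=True):
    """Smallest number of description entries that cover exactly the
    interesting leaves below `node`.

    tree: parent -> list of children; leaves are the cells.
    covered: an ancestor is already in the description, so every leaf below it
             is included unless listed as an exception.
    allow_exceptions=False gives a plain, exception-free MDL-style cost.
    """
    children = tree.get(node, [])
    if not children:  # a leaf cell
        if covered:
            if node in interesting:
                return 0
            # An uninteresting covered leaf must be listed as an exception.
            return 1 if allow_exceptions else INF
        return 1 if node in interesting else 0
    covered_cost = sum(description_cost(tree, c, interesting, True, allow_exceptions)
                       for c in children)
    if covered:
        return covered_cost
    # Either put this node in the description, or leave it out and recurse.
    take = 1 + covered_cost
    skip = sum(description_cost(tree, c, interesting, False, allow_exceptions)
               for c in children)
    return min(take, skip)


tree = {"all": ["east", "west"], "east": ["e1", "e2", "e3"], "west": ["w1", "w2", "w3"]}
interesting = {"e1", "e2", "e3", "w1", "w3"}
print(description_cost(tree, "all", interesting))                          # 2
print(description_cost(tree, "all", interesting, allow_exceptions=False))  # 3
```

With exceptions the description is {all, except w2} (2 entries); without them it is {east, w1, w3} (3 entries). Both are pure, but the first is shorter. The sketch returns only the cost; recording which branch wins at each node would recover the description itself.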
- Friday, Nov 26, 2004 at 2:00pm in CICSR 304
Title: Support Vector Machine Classification of Microarray Gene Expression Data
Presenter: Mingyue Tan, CS Dept., UBC
Details:
The talk will present the following paper:
M. P. S. Brown et al., "Support Vector Machine Classification of Microarray Gene Expression Data",
Tech. Report UCSC-CRL-99-09, University of California, Santa Cruz.
- Friday, Dec 3, 2004 at 2:00pm in CICSR 304
Title: Grid Computing
Presenter: Dr. Alan Wagner, CS Dept., UBC
ABSTRACT:
Grid computing is one of the most promising new technologies of the
decade and is set to dramatically change the way we do science; it may
also spawn completely new forms of network-based business. The key word
in the last sentence is "promising". Indeed, grid computing "promises" a
lot, attempting to allow seamless access to, and sharing, metering, and
use of, heterogeneous collections of resources.
The talk will be introductory in nature and will cover the following.
What is grid computing, and what are the major forces driving the
technology? I will give some examples of where grids are being used and
the types of problems being tackled. We will look at the layered
architecture of grids and give examples of the types of components that
exist at each layer. I will discuss a few problems related to databases
and grids. Finally, there will be an open discussion on what role
computer science can or should play in the development of grid
technology.
- Friday, Dec 10, 2004 at 2:00pm in CICSR 304
Title: XSEarch - A Semantic Search Engine for XML
Presenter: Terence Ho, CS Dept., UBC
ABSTRACT:
XSEarch, a semantic search engine for XML, is presented. XSEarch has a
simple query language, suitable for a naive user. It returns semantically
related document fragments that satisfy the user's query. Query answers
are ranked using extended information-retrieval techniques and are
generated in an order similar to the ranking. Advanced indexing techniques
were developed to facilitate efficient implementation of XSEarch. The
performance of the different techniques as well as the recall and the
precision were measured experimentally. These experiments indicate that
XSEarch is efficient, scalable and ranks quality results highly.
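As an illustration of ranking with an information-retrieval technique adapted to XML, the Python sketch below applies plain tf-idf weighting with XML leaf elements, rather than whole documents, as the unit of retrieval. This is our assumption for illustration only: XSEarch's actual ranking formula and its notion of semantically related fragments are not described in the abstract, and here each "fragment" is reduced to a single leaf. The sample document is invented.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def rank_leaves(xml_text, keywords):
    """Rank XML leaf elements against query keywords with tf-idf, treating
    each leaf element (not each document) as the unit of retrieval."""
    keywords = [k.lower() for k in keywords]
    root = ET.fromstring(xml_text)
    leaves = [(el.tag, (el.text or "").lower().split()) for el in root.iter() if len(el) == 0]
    n = len(leaves)
    # Number of leaves containing each word (the "document frequency" analogue).
    leaf_freq = Counter(word for _, words in leaves for word in set(words))
    results = []
    for tag, words in leaves:
        tf = Counter(words)
        score = sum(tf[k] * math.log(n / leaf_freq[k]) for k in keywords if leaf_freq[k])
        if score > 0:
            results.append((score, tag, " ".join(words)))
    results.sort(key=lambda r: r[0], reverse=True)  # highest score first
    return results


doc = """<bib>
  <book><title>XML search engines</title><author>alice</author></book>
  <book><title>database systems</title><author>bob</author></book>
  <article><title>semantic search</title><author>carol</author></article>
</bib>"""
for score, tag, text in rank_leaves(doc, ["xml", "search"]):
    print(f"{score:.3f} {tag}: {text}")
```

The first title ranks higher because it matches both keywords and "xml" is the rarer word across leaves. Weighting terms by their rarity among leaves, instead of among documents, is one simple way to adapt classic IR scoring to fine-grained XML answers.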
- Monday, Dec 20, 2004 at 2:00pm in CICSR 104
Title: Designing and Using Views to Improve Performance of Aggregate Queries
Presenter: Dr. Rada Chirkova, Dept. of Computer Science, North Carolina State University
ABSTRACT:
Data-intensive systems routinely use derived data, such as indexes or materialized views, to improve query-evaluation performance. In this context, the problem of designing derived data is as follows: Given a set of queries and a database, return definitions of derived data that, when materialized in the database, would reduce the evaluation costs of the queries. Designing materialized views and indexes is an important part of automated query-performance tuning in data-management systems that experience changes over time, where a system addresses the performance requirements of current frequent and important queries by periodically reconsidering and rematerializing the stored derived data.
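The design problem stated above can be made concrete with a small Python sketch: given queries with frequencies and base costs, and candidate views with sizes and the cost of answering each query from them, greedily materialize the view with the best frequency-weighted benefit per unit of space until a space budget is exhausted. This is a generic greedy heuristic of the kind commonly used for view selection, not QPET's algorithm; the linear cost model, view names, and numbers are hypothetical.

```python
def greedy_select(queries, views, budget):
    """Greedily pick views to materialize under a space budget.

    queries: name -> (frequency, cost without any view)
    views:   name -> (size, {query name: cost when answered from this view})
    At each step, choose the view with the largest benefit per unit of space,
    where benefit is the frequency-weighted drop in query cost.
    """
    current = {q: base for q, (_, base) in queries.items()}
    chosen, used = [], 0
    while True:
        best, best_ratio = None, 0.0
        for v, (size, answers) in views.items():
            if v in chosen or used + size > budget:
                continue
            benefit = sum(queries[q][0] * max(0, current[q] - c) for q, c in answers.items())
            if benefit / size > best_ratio:
                best, best_ratio = v, benefit / size
        if best is None:  # nothing fits, or nothing helps any more
            return chosen
        chosen.append(best)
        used += views[best][0]
        for q, c in views[best][1].items():
            current[q] = min(current[q], c)


queries = {"q_month_total": (10, 1000), "q_year_total": (5, 1000), "q_region_total": (1, 1000)}
views = {
    "v_by_month": (100, {"q_month_total": 100, "q_year_total": 100}),
    "v_by_day_region": (400, {"q_month_total": 400, "q_year_total": 400, "q_region_total": 400}),
    "v_by_region": (50, {"q_region_total": 50}),
}
print(greedy_select(queries, views, budget=500))  # ['v_by_month', 'v_by_region']
```

The large view could answer every query, but once the smaller views are chosen its remaining benefit drops to zero. The budget parameter plays the role of the resource/benefit tradeoff discussed in the next paragraph.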
In this talk we present an extensible system architecture for Query-Performance Enhancement by Tuning (QPET). QPET combines design and use of derived data in an end-to-end approach to automated query-performance tuning, and selects appropriate data-design algorithms depending on the characteristics of the prevalent queries. Our focus in automated query-performance tuning is on a tradeoff between the amount of system resources spent on designing derived data and the degree of the resulting improvement in query performance. We present algorithms and experimental results in designing and using materialized views for practically important classes of aggregate queries, including range-aggregate queries on star-schema data warehouses.
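For a sense of how derived data can speed up range-aggregate queries, the Python sketch below precomputes a two-dimensional prefix-sum array over a dense measure grid (for example, sales by month and store in a star schema), so that any rectangular range sum needs only four lookups. This is a classic precomputed structure shown for illustration; it is not presented as one of the algorithms in the talk, and the grid values are invented.

```python
def build_prefix(cube):
    """p[i][j] holds the sum of cube[0..i-1][0..j-1] (one extra row and column of zeros)."""
    rows, cols = len(cube), len(cube[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = cube[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def range_sum(p, r1, r2, c1, c2):
    """Sum of cube[r1..r2][c1..c2] (inclusive bounds) in O(1) time."""
    return p[r2 + 1][c2 + 1] - p[r1][c2 + 1] - p[r2 + 1][c1] + p[r1][c1]


# sales[month][store], invented numbers.
sales = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
prefix = build_prefix(sales)
print(range_sum(prefix, 1, 2, 0, 1))  # months 1-2, stores 0-1: 4 + 5 + 7 + 8 = 24
```

The prefix array costs as much space as the grid itself and must be rebuilt or updated when the data changes, which is one instance of the space and maintenance tradeoff that automated tuning has to weigh.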