Project Information
One of the major components of this class is the project. The point
of this project is to delve further into some aspect that we have
been studying. You may do your project either alone or in groups of
two to three. The amount of work expected from the project is
commensurate with the number of people working on it (i.e., you
personally are expected to put in the same amount of work on a project
regardless of whether you're working alone or in a group). Keep in
mind that I do not require that this project be an
implementation. A literature survey is a perfectly fine project.
This project should not eat your life.
Schedule
- January 29: 1-page proposal due. This should include:
- What problem(s) you want to solve,
- What is going to be new and challenging about it,
- How you will try to solve the problem(s)
- What problems you don't consider to be part of the project (i.e.,
non-goals)
- What resources you need that you don't already have
- Who is on your team if you are working in a team. Teams are
strongly encouraged
- Week of February 5: feedback on proposal returned to you
- February 28: 4-page midterm status report due; this should
describe what you have done, what you have left to do, roadblocks
you've encountered, interesting or unexpected questions or issues that
you've uncovered, etc. Included in this report should be 1-2 pages of
literature search for related work; this should include both a written
component comparing your project to related work, as well as a
bibliography. Note that this checkpoint is largely a chance for
you to get the feedback that you need. While it is not graded,
students who have not made a good effort on this checkpoint often wind
up not making a good effort on the project overall and thus not doing well.
- Week of March 4: feedback on status report returned to you
- April 3-10: Project presentations. Precise schedule TBD
(it's first come first served via requests that are e-mailed to me ---
requests only accepted after proposals are turned in),
but everyone should be prepared for April 3. Your presentation
should be ~15 minutes long - with at least 3 minutes of that time
reserved for questions and answers. Here is what I expect out of the
presentation (not necessarily in this order):
- A good description of what the project is (inputs, outputs, etc.)
- The motivation for why this project is interesting - why did
you choose to do it, and why should we care about the problem
- A discussion of what makes this project non-trivial (especially
for the more research-oriented projects)
- A description of how the project fits into the context of the class
- A presentation of the results thus far
- A discussion of what results you expect to get by the project
deadline
- A discussion of the difficulties or surprises that you had when
working on the project
- If your project is in a group, everyone must speak.
- Tuesday, April 16 - 5:00pm:
Final report due, along with a
group evaluation for those working in groups (see below).
The final report must be a full-length conference-style paper
discussing your project. (It should be roughly equivalent to 10-14
single column pages. Note that this is a rough guideline. It
is okay to go a bit over this, particularly if you're working in a
large group - this is just meant to help you decide if you're in the
right ball park.) You should model your paper on some of the papers
we've read this term. Either PDF by e-mail or a hard copy in my box
is fine. If you give a hard copy, I'd appreciate an e-mail letting me
know that you've turned it in. Note that I don't care about the
formatting (i.e., what the page length is - I do care about the
structure of the document, though); I only specify the length in
single column pages because otherwise people ask if I mean single or
double column pages.
The goal of saying "conference style paper" is that I want
you to do things like:
- Motivate the problem that you're working on
- Provide an example of a scenario where you'd use your solution
- Tell me about the solution that you've created; this includes telling
me what makes the problem interesting and hard. If you'd like, you can
interpret this as telling me what problems you ran into.
- Relate it to related work
- Tell me about potential future work - even if you have no
intention of ever doing it. Just like in a real conference paper, the
goal is for you to show that you know what some of the flaws are with
your system, even if you have no intention of solving them. ;)
Note that some of you won't actually create a solution, but just
explore the literature, which is fine. In this case, your job is to
explore the strengths and weaknesses of the approaches, and, if you
feel like there's an obvious choice, say what you would do if you were
going to implement a solution to the problem. Note that I do not care
about the layout.
In addition to the report, I want each person who is working
on a project in a group to SEPARATELY turn in a report on how they
felt that all of the group members (yourself included) contributed to
the project. Useful information is: what parts of the project you did
(e.g., if you divided the work by sections, who did what section), how
many hours you estimated that you worked, and how well you feel like
you and the other people in the project did.
Project Ideas
Here are some ideas that would be appropriate for the course project.
The best project ideas are likely to come from you; however, here are
some that you can use as is or use to think of new ones. The projects
can run the gamut from all theory to having a heavy implementation
component. I'll add more project ideas as I come up with them.
- Most database research topics that you would like to pursue. Keep
in mind that I do mean research topics; implementing a database
application does not qualify. Feel free to send me mail or come by to
talk about what qualifies as a good project.
- Helping users to create an ontology or schema is a well understood
process. Explore the best methodologies for doing so, especially
focusing on open source software.
- Open data
Open data is an interesting challenge because there is substantial
pressure to make the open data available, but not a huge amount of
incentive to make it easy to use. One of the challenges that my
group is working on is how to make open data easier to use. Here are
some topics within that that may make good course projects:
- The city of Surrey has a large catalog of open data. Look at
the open data and propose/try to complete interesting projects
that would combine the data from multiple sources. The idea would
be that you would simultaneously do something with open data,
which would be an interesting project in and of itself, but would
also likely mean that you would run into problems, which would be
good potential research directions.
- Part of the challenge of working on open data is coming up
with a dataset (or datasets) to use. Look at the literature on
open data and come up with a repository of the open datasets that are
used in the papers, including what characteristics the datasets
have and what problems were looked into in the paper that cites them.
- Conduct a related work search on Open Data Discovery. A good
place to start is Renée J. Miller: Open Data Integration. Proc. VLDB
Endow. 11(12): 2130-2139 (2018).
- End users often have mental models of how their data is laid out
that differ from its actual layout. As long as the data that they want is
accessible through the applications that they are using, this is
fine. However, when that is *not* the case, then it is hard for
people to find what they need.
A current research direction for me is to make it so that we can help
users to write down their mental model and then map it to the actual
data sources as automatically as possible. Then users can use data
integration and data exchange techniques to query the representation
of their mental model and get answers from the source(s) in which
their data is stored.
There are several directions that I am interested in going for this
project:
- Some students and colleagues in Civil Engineering and I have
previously done a manual version of this by creating an ontology
of building data corresponding to a class of users' mental
models. We then manually mapped this to the Building Information
Model standard Industry Foundation Classes XML Schema. This
project was done ~10 years ago. Dig out the old pieces, see what's
there, and if possible, see what needs to be updated.
- Survey current work on how to help users with little database
knowledge create mental models of their own work.
- Survey current mapping algorithms (i.e., algorithms that
help to create mappings between schemas)
- A digital twin is a version of a real world object that is
represented digitally so that people can look at the digital
representation and find things about it more easily than they can by
investigating the real world object. For example, the digital twin
of a building would allow users to keep track of the temperature in
each room by looking at an application rather than by having to go
to each room in the building.
A group of civil engineering
researchers and I are investigating what the various needs of
users of digital twins are and how that aligns with existing digital
twin technology. One project would be to investigate existing
digital twin technology and see what is needed for the data
management aspects.
- Throughout this course, we'll talk about how the concepts
that we study relate to your data. Choose some part of your
data that is difficult to manage using current data management
techniques/software. Describe what would need to change in order for
your data to be managed effectively. Relate to readings both in class
and out of class.
A word on plagiarism
Your project, as with all of your work, is to be your own work. If
you take ideas from anywhere else, you have to cite them, and
if you take words from somewhere else, they have to be
quoted and cited (taking names of things is okay without quotes as
long as they are well cited, but if you're taking more than that, you
need to have it in quotes). Copying other people's text or figures
and claiming it as your own is not okay; it is plagiarizing.
What does this mean precisely? Let's say that this webpage is your
source [1]. If you were writing something about the first paragraph,
it might look something like the following:
504 includes a class project which can be done either individually or
in groups [1]. Overall, it shouldn't be too bad, in particular, "it
should not eat your life"[1].
Note that the first sentence is paraphrased, so it has just been
cited. The second sentence contains a direct quote, so it has been
put in quotation marks along with having a citation.
To make sure that you don't plagiarize, always add in citations where
appropriate as you are working on your paper. Never cut
and paste text into your work without putting it in quotation marks. Do
not rely on coming back later to change the wording.
I note that using ChatGPT or a similar system on any work that you
turn in also constitutes plagiarism.
If you find yourself thinking "there's no point in my writing this
differently, the source that I'm looking at has written it better than
I could", I offer you the following words of wisdom: (1) I don't care
if they wrote it better, you can't plagiarize; and (2) in each case
where I have detected
plagiarism, the plagiarized sections are the WORST part of the paper,
since they are generally just cut and pasted from other sources
without regard to the context that the project is supposed to be about.
So do us both a favour, save us both a lot of grief, and don't do it.
You'll learn more and turn in a better result.
Resources
If you are looking for relevant papers, here are some suggestions:
- DBLP is a fantastic bibliography and link to papers for database and
logic programming.
- Google Scholar also has a search engine that can be quite helpful
since it indexes more than just the metadata about the paper.
For any source, you want to make sure that you're reading the best
papers. One way that will often, though not always, lead you in the
right direction, is to look at the highly rated venues. In data
management, some of those are:
Conferences:
- SIGMOD
- PODS (theory)
- VLDB
- EDBT
- ICDE
Journals
Rachel Pottinger
Office Location: ICCS 345
Phone: (604)822-0436
Fax:(604)822-5485
Postal/Courier address:
The Department of Computer Science
University of British Columbia
201-2366 Main Mall
Vancouver, B.C. V6T 1Z4
Canada
Traditional, Ancestral & Unceded Musqueam Territory