Individual Project by: J. Karen Parker, parker AT cs.ubc.ca
Quick Links:
Domain, Task & Dataset
Personal Expertise
Proposed Solution
Scenario of Use
Implementation Approach
Milestones
References
DOMAIN, TASK & DATASET
Domain
In the field of Human-Computer Interaction (HCI), data analysis can be a complex problem. Researchers often use a combination of logging and qualitative methods to record data, resulting in a daunting amount of information which must be sifted through in order to do a thorough analysis.
This data overload is a common problem, as evidenced by a recent workshop at CHI 2005 which aimed to tackle the problems associated with combining data logging and qualitative methods [1]. The workshop organizers identified several key tasks which are difficult to achieve with existing data analysis solutions: detecting patterns in behavior, comparing patterns between users, and combining patterns in behavior with qualitative data [1].
Hilbert and Redmiles suggest, among other things, “[visualizing] the results of transformations and analyses of event streams so they can be explored with more ease” [2]. So, some form of information visualization may be able to aid HCI researchers in their data analysis problems.
Previous work in visualizing complex systems may provide some insight into the HCI log data problem. Bosch et al. state that “because of the complexity of computer systems, the analysis process is a highly unpredictable and iterative one: an initial look at the data often ends up raising more questions than it answers” [3]. This is also true for HCI log data.
Task: High Level Description
In trying to create a solution that is good for everybody, one may end up with a solution that suits nobody terribly well. In an attempt to avoid this pitfall we will take a somewhat narrow approach to visualizing HCI log data, focussing solely on the visualization of web browsing behaviour.
Most previous InfoVis research into web browsing has been concerned primarily with representing users’ navigation through websites. The main aims of this type of visualization include improving website usability [4], and characterizing how users navigate complex information spaces [5].
Recent research into web browsing has begun to examine user behaviour at a much deeper level than simple navigation. In particular, several researchers at Dalhousie University are examining web browsing behaviour by logging low-level browser events in order to gain information about users’ web browsing habits [6][7]. By combining browser events with user-provided data (e.g. the task a user was trying to accomplish when they were at a particular page), these researchers hope to reveal interesting trends and patterns in web browsing behaviour.
Dataset
The first dataset we will support is from a research study in which participants were asked to rate the “privacy level” of each page they visited over a week-long period [6]. The following information was logged for EVERY individual page a user visited during that week:
- browser window ID
- date
- time
- page title
- url
- primary content category (approx. 40 categories)
- secondary content category (approx. 50 categories)
- privacy level (4 categories)
- location (home/work/school)
- computer type (consistent for each participant)
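As a rough sketch of how one logged page visit could be represented in Java, the record below mirrors the ten fields listed above. The field names, the tab-separated layout, and the ISO date and time formats are all assumptions made for illustration; the actual log format of the study is not specified here.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** One logged page visit. Field names are illustrative, not taken from the study's log format. */
record PageVisit(
        String windowId,
        LocalDateTime timestamp,      // date and time fields combined
        String pageTitle,
        String url,
        String primaryCategory,
        String secondaryCategory,
        String privacyLevel,
        String location,              // home / work / school
        String computerType) {

    /**
     * Parses one tab-separated line. Hypothetical layout: the ten logged fields in the
     * order listed above, with date as yyyy-MM-dd and time as HH:mm:ss.
     */
    static PageVisit parse(String line) {
        String[] f = line.split("\t", -1);
        if (f.length != 10) {
            throw new IllegalArgumentException("Expected 10 fields, got " + f.length);
        }
        LocalDateTime when = LocalDateTime.of(LocalDate.parse(f[1]), LocalTime.parse(f[2]));
        return new PageVisit(f[0], when, f[3], f[4], f[5], f[6], f[7], f[8], f[9]);
    }
}
```

Merging date and time into a single timestamp keeps later sorting and timeline placement simple; the separate fields can always be recovered from it.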
Task: Low Level Description
Some basic tasks that the owner of this data would like to be able to accomplish include:
- examine privacy level changes within rapid bursts of browsing
- find temporal patterns
- see how browsing is partitioned between windows
- see how transitions between privacy levels relate to the content
- filter on specific variables
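Two of these tasks, filtering on a variable and spotting privacy level changes within rapid bursts, can be sketched as simple queries over the log. This is only an illustration: the `Visit` record is a cut-down stand-in for a log entry, the visits are assumed to be sorted by time, and defining a "rapid burst" as consecutive visits at most `maxGapSeconds` apart is our own assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class BrowsingQueries {
    /** Minimal stand-in for a log entry, holding only the fields these queries need. */
    record Visit(String windowId, long epochSeconds, String privacyLevel, String location) {}

    /** A change of privacy level between two consecutive visits. */
    record Transition(Visit from, Visit to) {}

    /** Keeps only the visits matching an arbitrary condition (e.g. a location or window). */
    static List<Visit> filter(List<Visit> visits, Predicate<Visit> keep) {
        return visits.stream().filter(keep).toList();
    }

    /**
     * Finds consecutive visits whose privacy level differs and whose time gap is at most
     * maxGapSeconds. Assumes the input list is sorted by time.
     */
    static List<Transition> rapidPrivacyChanges(List<Visit> visits, long maxGapSeconds) {
        List<Transition> out = new ArrayList<>();
        for (int i = 1; i < visits.size(); i++) {
            Visit prev = visits.get(i - 1);
            Visit cur = visits.get(i);
            boolean levelChanged = !prev.privacyLevel().equals(cur.privacyLevel());
            boolean rapid = cur.epochSeconds() - prev.epochSeconds() <= maxGapSeconds;
            if (levelChanged && rapid) {
                out.add(new Transition(prev, cur));
            }
        }
        return out;
    }
}
```

For example, `BrowsingQueries.filter(visits, v -> v.location().equals("work"))` would keep only the pages visited at work.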
PERSONAL EXPERTISE
While web browsing is not my particular area of interest, I am an HCI researcher and am keenly aware of the need for better analysis tools in our field. Furthermore, the research this project aims to support is being conducted by two friends from Dalhousie University. (We were all members of the same research lab when I was doing my Masters there.) I’m thrilled that something I do in a course project might actually help them in their doctoral data analyses!
PROPOSED INFOVIS SOLUTION
Event ordering is a very important component of our target dataset, so a time-series display is a natural fit. At the simplest level, we want to represent each currently open browser window as a row on a timeline. Then, using visual attributes such as colour and markings on each “window”, we can indicate the various events and attributes of a given window at any given time. The resulting visualization is a sort of web browsing Gantt chart. While Gantt charts show “a graphical representation of the duration of tasks against the progression of time” [8], our solution will show a graphical representation of web browser windows against the progression of time.
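To make the Gantt-style layout concrete, here is a minimal sketch of how logged visits might be turned into rows and segments. It rests on assumptions of ours: the log records only when a page was visited, so a page is taken to last until the next visit in the same window, and the last page in each window is simply extended to a caller-supplied `logEndSeconds`. The names `GanttLayout`, `Visit`, and `Segment` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GanttLayout {
    /** Minimal stand-in for a log entry. */
    record Visit(String windowId, long startSeconds, String privacyLevel) {}

    /** One coloured block on a window's row, covering [startSeconds, endSeconds). */
    record Segment(int row, long startSeconds, long endSeconds, String privacyLevel) {}

    /**
     * Assigns each browser window its own row (in order of first appearance) and turns
     * each visit into a segment ending where the next visit in that window begins.
     * Assumes visits are sorted by time.
     */
    static List<Segment> layout(List<Visit> visits, long logEndSeconds) {
        Map<String, List<Visit>> byWindow = new LinkedHashMap<>();
        for (Visit v : visits) {
            byWindow.computeIfAbsent(v.windowId(), k -> new ArrayList<>()).add(v);
        }
        List<Segment> segments = new ArrayList<>();
        int row = 0;
        for (List<Visit> pages : byWindow.values()) {
            for (int i = 0; i < pages.size(); i++) {
                Visit page = pages.get(i);
                long end = (i + 1 < pages.size())
                        ? pages.get(i + 1).startSeconds()
                        : logEndSeconds; // assumption: last page stays open until the log ends
                segments.add(new Segment(row, page.startSeconds(), end, page.privacyLevel()));
            }
            row++;
        }
        return segments;
    }
}
```

Each `Segment` then maps directly to one rectangle: its row gives the vertical position, its start and end give the horizontal extent, and its privacy level gives the fill colour.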
The dataset we are using already has an established colour scheme which corresponds to user-assigned privacy level (blue for “don’t save”, green for “public”, yellow for “semi-public”, and red for “private”). We will maintain this colour scheme in our solution, colouring each page in a window to show its privacy level. Further markings on each window will indicate important attributes such as category and location. These markings may be turned on/off by the viewer. Due to the large number of pages in the dataset, screen real estate for each individual page will be quite limited, so meta information such as URL and page title will be available only as a popup on mouseover.
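The colour scheme described above could be captured in a small enum, sketched below. The four levels and their colour names come from the dataset; the exact RGB values and the label strings used for lookup are assumptions for illustration.

```java
/** Privacy levels with the dataset's established colours. RGB shades are assumed. */
enum PrivacyLevel {
    DONT_SAVE("don't save", 0x0000FF),     // blue
    PUBLIC("public", 0x00FF00),            // green
    SEMI_PUBLIC("semi-public", 0xFFFF00),  // yellow
    PRIVATE("private", 0xFF0000);          // red

    private final String label;
    private final int rgb;

    PrivacyLevel(String label, int rgb) {
        this.label = label;
        this.rgb = rgb;
    }

    String label() {
        return label;
    }

    /** Packed 0xRRGGBB value; can be wrapped with new java.awt.Color(rgb) for drawing. */
    int rgb() {
        return rgb;
    }

    /** Looks up a level by its label, ignoring case and surrounding whitespace. */
    static PrivacyLevel fromLabel(String label) {
        for (PrivacyLevel p : values()) {
            if (p.label.equalsIgnoreCase(label.trim())) {
                return p;
            }
        }
        throw new IllegalArgumentException("Unknown privacy level: " + label);
    }
}
```

Keeping the mapping in one place means a dataset with different attributes would only need a different enum, not changes to the drawing code.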
The viewer will also be able to control the width of the timeline. In some cases, they may want to view the data in real time, with each page drawn in proportion to its duration, while in other cases they may only be interested in sequence information. In addition to being able to switch between a “real timeline” and a “sequential timeline”, a focus+context technique will be provided, allowing the viewer to stretch a particular section (or sections) of data along the horizontal axis. This technique will help viewers get a closer look at a particular area of the dataset, or compare two distant areas side-by-side.
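The two timeline modes and the stretching behaviour can be sketched as plain coordinate mappings. This is one possible formulation, not a settled design: the piecewise-linear stretch below is a simple stand-in for a focus+context technique, it handles a single focus region only, and all names and parameters are hypothetical.

```java
final class TimelineScale {
    /** Real timeline: x is proportional to elapsed time. Assumes endSeconds > startSeconds. */
    static double realX(long t, long startSeconds, long endSeconds, double widthPx) {
        return (double) (t - startSeconds) / (endSeconds - startSeconds) * widthPx;
    }

    /** Sequential timeline: every page gets an equal-width slot, whatever its duration. */
    static double sequentialX(int pageIndex, int pageCount, double widthPx) {
        return (double) pageIndex / pageCount * widthPx;
    }

    /**
     * Piecewise-linear stretch of a normalised position u in [0, 1]. The focus region
     * [focusStart, focusEnd] is expanded to occupy focusShare of the output width and
     * the remaining context is compressed evenly on both sides.
     */
    static double stretch(double u, double focusStart, double focusEnd, double focusShare) {
        double span = focusEnd - focusStart;
        if (focusStart < 0 || focusEnd > 1 || span <= 0 || span >= 1
                || focusShare <= 0 || focusShare >= 1) {
            throw new IllegalArgumentException("Invalid focus region or share");
        }
        double contextScale = (1 - focusShare) / (1 - span);
        double focusScale = focusShare / span;
        if (u <= focusStart) {
            return u * contextScale;
        }
        double focusLeft = focusStart * contextScale; // where the focus region begins
        if (u <= focusEnd) {
            return focusLeft + (u - focusStart) * focusScale;
        }
        return focusLeft + focusShare + (u - focusEnd) * contextScale;
    }
}
```

A position would first be normalised by dividing `realX` or `sequentialX` by the width, passed through `stretch`, then multiplied by the width again. Supporting several stretched sections at once would need a more general mapping than this sketch provides.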
Note that while we are explicitly identifying attributes of the privacy study data to which we will assign various visualizations in this project, it is expected that the same visualization techniques could be used for other studies of web browsing behaviour with different attributes.
SCENARIO OF USE
1. Kirstie wants to see an overview of the data for one of her users, so she loads the data file into our InfoVis tool. She is presented with a temporal representation of all the windows that user browsed, and the colour-coded privacy levels for each page in each window.
2. She sees some interesting privacy patterns in the data from the second half of the day on 01/01/05, so she zooms in on that time period.
3. She sees an anomalous red (private) area in the middle of a window that is otherwise completely yellow (semi-public), so she turns on the “page change” markers to see when the user changes pages.
4. She mouses over the area (which she is now able to distinguish as a single red page) to get further details about it.
IMPLEMENTATION APPROACH
We’ll be using Java as our development language. We chose Java because it is multi-platform and we would like to be able to support data analysis tasks on Windows/Mac/Unix. We own a Mac and our target end user owns a Windows box, so at the very least our software has to work on both of those platforms.
At the moment, we do not plan to use a toolkit (e.g. Prefuse or the InfoVis Toolkit). However, we’d like to do a bit more research on what these two toolkits offer before completely ruling them out.
MILESTONES
Week of November 6th: Do some more research on toolkits and such. Create paper prototype. Meet with end user to test prototype and collect more info on desired tasks.
Week of November 13th: Start playing around in Eclipse and programming bits and pieces of the interface.
Week of November 20th: Code code code.
Week of November 27th: Code code code, hopefully have a mostly final version by the end of this week.
Week of December 4th: Demo software for end user and get feedback. Mark CPSC 344 exams.
Week of December 11th: Refine software based on end user feedback. Start writing report.
Week of December 18th: Finish report, present project in class, and hand in report.
REFERENCES
[1] Kort, J. and de Poot, H. Usage Analysis: Combining Logging and Qualitative Methods. CHI 2005 Workshops, April 2-7, 2005, Portland, Oregon, USA.
[2] D. Hilbert, and D.F. Redmiles, “Extracting Usability Information from User Interface Events,” ACM Computing Surveys, Dec. 2000, pp. 384-421.
[3] Robert Bosch, Chris Stolte, Diane Tang, John Gerth, Mendel Rosenblum, and Pat Hanrahan, “Rivet: A Flexible Environment for Computer Systems Visualization.” Computer Graphics, February 2000.
[4] Jason I. Hong, and James A. Landay, “WebQuilt: A Framework for Capturing and Visualizing the Web Experience.” In Proceedings of The Tenth International World Wide Web Conference (WWW10), Hong Kong, May 2001, pp. 717-724.
[5] Berendt, B. & Brenstein, E. (2001). Visualizing Individual Differences in Web Navigation: STRATDYN, a Tool for Analyzing Navigation Patterns. Behavior Research Methods, Instruments, & Computers, 33, 243-257.
[6] Hawkey, K. and Inkpen, K.M. (2005) Privacy gradients: exploring ways to manage incidental information during co-located collaboration. (Late Breaking Results: Short Papers) in Extended Abstracts of the Conference on Human Factors in Computing Systems (CHI 2005). Portland, OR, USA. pp. 1431 - 1434.
[7] Kellar, M. & Watters, C. (2005). Studying User Behaviour on the Web: Methods and Challenges. CHI 2005 Workshop on Usage Analysis: Combining Logging and Qualitative Methods, Portland, OR.
[8] “Gantt Charts.” Website. http://www.ganttcharts.com. Accessed Friday, November 4th, 2005.