NOTE: Only CPSC 445 students are expected to hand in this assignment; for them, it is part of their exam preparation.
Consider the following set of points in two-dimensional Euclidean space, which you may imagine to be a (small) subset of results from a gene expression analysis experiment.
P1 = (3, 2)
P2 = (2, 4)
P3 = (4, 7)
P4 = (7, 1)
P5 = (7, 8)
P6 = (9, 10)
(a) Complete the following table of pairwise Euclidean distances between the points, where all distances are rounded to one decimal place (i.e., show 2.2 instead of 2.2360679...). [1 marks]
P1 | P2 | P3 | P4 | P5 | P6 | |
P1 | 0 | 2.2 | 5 | 4.1 | 7.2 | ? |
P2 | ? | 3.6 | 5.8 | 6.4 | 9.2 | |
P3 | 0 | 6.7 | 3.1 | 5.8 | ||
P4 | 0 | ? | 9.2 | |||
P5 | 0 | 2.8 | ||||
P6 | 0 |
(b) Perform agglomerative hierarchical single linkage clustering on the six data points and show the resulting cluster tree. (Show all steps.) [3 marks]
(c) Perform agglomerative hierarchical complete linkage clustering on the six data points and show the resulting cluster tree. (Show all steps.) [3 marks]
(d) Perform k-means clustering with k=3 on the six data points, starting with the following clustering: {{P1,P2,P5}, {P4}, {P3,P6}}. (Show all steps.) [3 marks]
1: a b -> e_out
2: c a -> c a_out
3: c' b -> c' b_out
Consider an initial state in which membrane 2 contains 3 copies of a and one copy of c, and membrane 3 contains 2 copies of b and 2 copies of c' each.
(a) Show all steps of this system until the computation stops, i.e., no rules can be applied in any of the membranes. [4 marks]
(b) What changes if you start the computation with only one instead of two copies of c' in membrane 3? [2 marks]