Base case (leaf): R_u contains the single character at leaf node u.
General case: node u with children v, w:
R_u = | R_v intersection R_w, if this intersection is non-empty, and |
  | R_v union R_w, otherwise. |
(a) Prove that for all nodes u in the tree T, if T_u is the subtree of T rooted at u, the bases in the set R_u are exactly those that could label u in an optimal (most parsimonious) labeling of tree T_u. (You can use induction on the depth of the subtree rooted at u.)
(b) This question was raised by Daniel in class. Is it the case that, for all input trees T, there is an optimal labeled tree in which every node along some path from the root to a leaf has the same character? (Justify your answer.)
(c) Bonus points: In class, Mirela pointed out that it is possible to construct optimal labelings of trees in which the character at a node u of the tree is NOT in R_u. Can you design an efficient (polynomial time) algorithm that, given as input a tree T with labels just at the leaves, calculates, for each node u of the tree the set of ALL bases that could label that node in an optimal labeling of the tree?
(a) Use ClustalW to align these sequences. You can use the web-based interface for ClustalW at http://www.cbr.nrc.ca/newdocs/services/clustalw_form.html. Choose Phylip format for the output.
(b) Next, use Phylip to form a tree from these sequences. You can use the web-based interface for Phylip at http://www.cbr.nrc.ca/cgi-bin/WebPhylip/index.html. What tree do you get? (You may use only part of the aligned sequences if needed.)
----------------------------------------------------------------------- >AF2282 MMPEIEILEEKDFKIKFILKNASPALANSFRRAMKAEVPAMAVDYVDIYLNSSYFYDEVIAHRLAMLPIK TYLDRFNMQSECSCGGEGCPNCQISFRLNVEGPKVVYSGDFISDDPDVVFAIDNIPVLELFEGQQLMLEA VARLGTGREHAKFQPVSVCVYKIIPEIVVNENCNGCGDCIEACPRNVFEKDGDKVRVKNVMACSMCGECV EVCEMNAISVNETNNFLFTVEGTGALPVREVMKKALEILRSKAEEMNKIIEEIQ >Ta1030 MDVKIFKLSDKYIRFEIDGITPSQANALRRTLINDIPKLAIENVTFHHGEIRDSEGNVYDSSLPLFDEMV AHRLGLIPLKTDLTMNFRDQCSCGGKGCSLCTVTYSINKLGPSTVFSSDLQAVSHPDLIPVDGEIPIVKL GPKQAILITAEAILGTAKEHAKWQVTSGVSYKYHREFHVSKKDFEDWQKLKGACPKSVMSETDTEIVFTD DFGCNDLNVLFESDGVNMIEDDSRFIFQFETDGSLTAKETLLYALNRLKDRWDILVESLSE >TVN0565 MEVKIFKLSDKYMSFEIDGITPSQANALRRTLINDIPKLAIENVTFHHGEIRDAEGNVYDSSLPLFDEMV AHRLGLIPLKTDLSLNFRDQCSCGGKGCSLCTVTYSINKIGPASVMSGDIQAISHPDLVPVDPDIPIVKL GAKQAILITAEAILGTAKEHAKWQVTSGVAYKYHREFEVNKKLFEDWAKIKERCPKSVLSEDENTIVFTD DYGCNDLSILFESDGVQIKEDDSRFIFHFETDGSLTAEETLSYALNRLMDRWGILVESLSE >Rv3457c MLISQRPTLSEDVLTDNRSQFVIEPLEPGFGYTLGNSLRRTLLSSIPGAAVTSIRIDGVLHEFTTVPGVK EDVTEIILNLKSLVVSSEEDEPVTMYLRKQGPGEVTAGDIVPPAGVTVHNPGMHIATLNDKGKLEVELVV ERGRGYVPAVQNRASGAEIGRIPVDSIYSPVLKVTYKVDATRVEQRTDFDKLILDVETKNSISPRDALAS AGKTLVELFGLARELNVEAEGIEIGPSPAEADHIASFALPIDDLDLTVRSYNCLKREGVHTVGELVARTE SDLLDIRNFGQKSIDEVKIKLHQLGLSLKDSPPSFDPSEVAGYDVATGTWSTEGAYDEQDYAETEQL >ML1957 MLISQRPTLSEDILTDNRSQFVIEPLEPGFGYTLGNSLRRTLLSSIPGAAVTSIRIDGVLHEFTTVPGVK EDVTEIILNLKGLVVSSEEDEPVTMYLRKQGPGEVTAGDIVPPAGVTLHNPGMRIATLNDKGKIEAELVV ERGRGYVPAVQNRALGAEIGRIPVDSIYSPVLKVTYKVDATRVEQRTDFDKLILDVETKSSITPRDALAS AGKTLVELFGLARELNVEAEGIEIGPSPAEADHIASFALPIDDLDLTVRSYNCLKREGVHTVGELVSRTE SDLLDIRNFGQKSIDEVKVKLHQLGLSLKDSPDSFDPSEVAGYDVTTGTWSTDGAYDSQDYAETEQL >HI0802 MQGSVTEFLKPRLVDIEQISSTHAKVILEPLERGFGHTLGNALRRILLSSMPGCAVTEVEIDGVLHEYSS KEGVQEDILEVLLNLKGLAVKVQNKDDVILTLNKSGIGPVVAADITYDGDVEIVNPDHVICHLTDENASI SMRIRVQRGRGYVPASSRTHTQEERPIGRLLVDACYSPVERIAYNVEAARVEQRTDLDKLVIELETNGAL EPEEAIRRAATILAEQLDAFVDLRDVRQPEIKEEKPEFXPILLRPVDDLELTVRSANCLKAETIHYIGDL VQRTEVELLKTPNLGKKSLTEIKDVLASRGLSLGMRLENWPPASIAED