Limericks, sonnets, haiku, and other forms of poetry each follow prescribed patterns that give the number of lines, the number of syllables on each line, and a rhyme scheme. For example, limericks are five lines long; the first, second, and fifth lines each have eight syllables and rhyme with each other; and the third and fourth lines each have five syllables and rhyme with each other. (There are additional rules about the location and number of stressed vs. unstressed syllables, but we'll ignore those rules for this assignment; we will be counting syllables, but not paying attention to whether they are stressed or unstressed.)
Here is a stupendous work of limerick art:
I wish I had thought of a rhyme
Before I ran all out of time!
I'll sit here instead,
A cloud on my head
That rains 'til I'm covered with slime.
We're sure that you've all kept yourselves awake wondering if there was a way to have a computer program check whether a poem is a limerick or if it follows some other poetry pattern. Here's your chance to resolve the question!
The Carnegie Mellon University Pronouncing Dictionary describes how to pronounce words. Head there now and look up a couple of words; try searching for words like "Daniel", "is", and "goofy", and see if you can interpret the results. Do contractions like "I'll" (short for "I will") and "we'll" (short for "we will") work? Try clicking the "Show Lexical Stress" checkbox too, and see how that changes the result.
Here is the output for "Daniel" (with "Show Lexical Stress" turned on): D AE1 N Y AH0 L. The separate pieces are called phonemes, and each phoneme describes a sound. The sounds are either vowel sounds or consonant sounds. We will refer to phonemes that describe vowel sounds as vowel phonemes, and similarly for consonants. The phonemes that are used were defined in a project called Arpabet that was created by the Advanced Research Projects Agency (ARPA) back in the 1970s.
In the CMU Pronouncing Dictionary, all vowel phonemes end in a 0, 1, or 2, with the digit indicating a level of stress. Consonant phonemes do not end in a digit. The number of syllables in a word is the same as the number of vowel sounds in the word, so you can determine the number of syllables in a word by counting the number of phonemes that end in a digit.
As an example, in the word "secondary" (S EH1 K AH0 N D EH2 R IY0), there are four vowel phonemes, and therefore four syllables. The vowel phonemes are EH1, AH0, EH2, and IY0.
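The counting rule above can be sketched in a few lines of Python. This is only an illustration, not the required implementation: the helper names is_vowel_phoneme and count_syllables are hypothetical, and the phoneme lists are the ones quoted in this handout.

```python
def is_vowel_phoneme(phoneme):
    """Return True if phoneme ends in a stress digit (0, 1, or 2)."""
    # Slicing with [-1:] keeps this safe even for an empty string.
    return phoneme[-1:] in ('0', '1', '2')


def count_syllables(phonemes):
    """Return the number of vowel phonemes in a list of phonemes."""
    total = 0
    for phoneme in phonemes:
        if is_vowel_phoneme(phoneme):
            total = total + 1
    return total


secondary = 'S EH1 K AH0 N D EH2 R IY0'.split()
daniel = 'D AE1 N Y AH0 L'.split()
secondary_syllables = count_syllables(secondary)  # 4, matching the text
daniel_syllables = count_syllables(daniel)        # 2 (AE1 and AH0)
```

The stress digit is only used to recognize a vowel phoneme; its value (0, 1, or 2) does not affect the count.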
In case you're curious, 0 means unstressed, 1 means primary stress, and 2 means secondary stress — try saying "secondary" out loud to hear for yourself which syllables have stress and which do not. In this assignment, your program will not need to distinguish between the levels of syllabic stress (although we cannot guarantee a completely stress-free experience while you work on this project).
Your program will read the file dictionary.txt, which is our version of the Pronouncing Dictionary. You must use this file, not any files from the CMU website. Our version differs from the CMU version: we have removed alternate pronunciations for words and words that do not start and end with alphanumeric characters (like #HASH-MARK, #POUND-SIGN, and #SHARP-SIGN). Take a look at our dictionary.txt file to see the format; notice that any line beginning with ;;; is a comment and not part of the dictionary.
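As a rough sketch of how such a file could be processed, the code below skips comment lines and maps each word to its list of phonemes. It assumes that each non-comment line holds a word followed by whitespace-separated phonemes (check dictionary.txt to confirm the layout). The name parse_pronunciation_lines is hypothetical, and io.StringIO stands in for a file that has already been opened for reading.

```python
import io


def parse_pronunciation_lines(lines):
    """Return a dict mapping each word to its list of phonemes.

    Assumed layout: WORD, then whitespace, then space-separated phonemes.
    Lines beginning with ;;; are comments; blank lines are ignored.
    """
    pronunciations = {}
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(';;;'):
            parts = stripped.split()
            pronunciations[parts[0]] = parts[1:]
    return pronunciations


# A tiny stand-in for an open dictionary file.
sample_file = io.StringIO(
    ';;; this line is a comment\n'
    'DANIEL  D AE1 N Y AH0 L\n'
    'IS  IH1 Z\n'
    'GOOFY  G UW1 F IY0\n'
)
tiny_dictionary = parse_pronunciation_lines(sample_file)
```

Because the function receives something it can loop over line by line, it never needs to call open itself.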
The words in dictionary.txt are all uppercase and do not contain surrounding punctuation. When your program looks up a word, use the uppercase form, with no leading or trailing punctuation. Function clean_up in the starter code file poetry_functions.py will be helpful here.
For each type of poetry form (limerick, haiku, etc.), we will write its rules as a poetry form description. For example, at the beginning of this handout, we gave the rules for what it means to be a limerick. Here's our poetry form description for the limerick poetry form:
8 A
8 A
5 B
5 B
8 A
On each line, the first piece of information is a number that indicates the number of syllables required on that line of the poem. The second piece of information on each line is a letter that indicates the rhyme scheme. Here, lines 1, 2, and 5 must rhyme with each other because they're all marked with the same letter (A), and lines 3 and 4 must rhyme with each other because they're both marked with the same letter (B). (Note that the choice to use the letters A and B was arbitrary. Other letters could have been used to describe this rhyme scheme.) We say that two lines rhyme with each other when the final vowel phonemes and all subsequent consonant phoneme(s) after the final vowel phonemes match (i.e., are the same and are in the same order).
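To make the rhyme rule concrete, here is a minimal sketch that extracts the part of a pronunciation that matters for rhyming and compares two such endings. The names rhyme_ending and endings_rhyme are hypothetical, and the phoneme lists are the ones quoted elsewhere in this handout. Under the rule as stated, "goofy" and "secondary" count as rhyming because both end in the vowel phoneme IY0 with nothing after it.

```python
def rhyme_ending(phonemes):
    """Return the last vowel phoneme and everything after it.

    Return the empty list if phonemes contains no vowel phoneme.
    """
    last_vowel_index = -1
    for index in range(len(phonemes)):
        if phonemes[index][-1:] in ('0', '1', '2'):
            # Keep overwriting so that the final vowel phoneme wins.
            last_vowel_index = index
    if last_vowel_index == -1:
        return []
    return phonemes[last_vowel_index:]


def endings_rhyme(first, second):
    """Return True if the two phoneme lists have the same rhyme ending."""
    return rhyme_ending(first) == rhyme_ending(second)


before = ['B', 'IH0', 'F', 'AO1', 'R']
gap = ['G', 'AE1', 'P']
goofy = ['G', 'UW1', 'F', 'IY0']
secondary = ['S', 'EH1', 'K', 'AH0', 'N', 'D', 'EH2', 'R', 'IY0']

before_ending = rhyme_ending(before)             # ['AO1', 'R']
goofy_secondary = endings_rhyme(goofy, secondary)  # True
before_gap = endings_rhyme(before, gap)            # False
```

Comparing whole lines then comes down to comparing the endings of the last word on each line.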
Some poetry forms don't require lines that rhyme. For example, a haiku has 5 syllables in the first line, 7 in the second line, and 5 in the third line, but there are no rhyme requirements. Here is an example:
Dan's hands are quiet.
Soft peace surrounds him gently:
No thought moves the air.
And another one:
Jen sits quietly,
Thinking of assignment three.
All ideas bad.
We'll indicate the lack of a rhyme requirement by using the symbol *. Here is our poetry form description for the haiku poetry form:
5 *
7 *
5 *
Some poetry forms have rhyme requirements but don't have a specified number of syllables per line. Quintain (English) is one such example; these are 5-line poems with an ABABB rhyme scheme, but with no syllable requirements. Here is our poetry form description for the Quintain (English) poetry form (notice that the number 0 is used to indicate that there is no requirement on the number of syllables in the line):
0 A
0 B
0 A
0 B
0 B
Here's an example of a Quintain (English) from Percy Bysshe Shelley's To a Skylark:
Teach us, Sprite or Bird,
What sweet thoughts are thine:
I have never heard
Praise of love or wine
That panted forth a flood of rapture so divine.
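The two special markers in a poetry form description (0 for "no syllable requirement" and * for "no rhyme requirement") can be handled as in the following sketch. The function names syllable_count_ok and rhyme_groups are hypothetical and are not part of the starter code.

```python
def syllable_count_ok(required, actual):
    """Return True if actual satisfies the required syllable count.

    A required count of 0 means the line may have any number of syllables.
    """
    return required == 0 or required == actual


def rhyme_groups(rhyme_letters):
    """Return a dict mapping each rhyme letter to its line indices.

    Lines marked with * have no rhyme requirement and are left out.
    """
    groups = {}
    for index in range(len(rhyme_letters)):
        letter = rhyme_letters[index]
        if letter != '*':
            if letter not in groups:
                groups[letter] = []
            groups[letter].append(index)
    return groups


quintain_groups = rhyme_groups(['A', 'B', 'A', 'B', 'B'])
# {'A': [0, 2], 'B': [1, 3, 4]}
haiku_groups = rhyme_groups(['*', '*', '*'])
# {} because no line of a haiku has to rhyme
```

Indices here count from 0, so the quintain's A lines are the first and third lines of the poem.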
Your program will read a poetry form description file containing poetry form names and descriptions. For each poetry form in the file, the first line gives the name of the poetry form, and subsequent lines contain the number of syllables and rhyme scheme as described in this section. Each poetry form is separated from the next by a blank line. We have provided poetry_forms.txt as an example poetry form description file. We will test your code with other poetry form descriptions as well. You should assume that the poetry form names given in a poetry form description file are all different.
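A poetry pattern is a tuple of two items: the number of syllables required on each line (list of int) and the rhyme scheme letter for each line (list of str). For example, here is the poetry pattern for a limerick: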
([8, 8, 5, 5, 8], ['A', 'A', 'B', 'B', 'A'])
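The sketch below shows one way that poetry form description text could be turned into poetry patterns like the one above. It assumes the layout described earlier: a name line, then one "count letter" line per line of the poem, with a blank line between forms. The name parse_form_descriptions is hypothetical, and the function takes a string rather than an open file purely to keep the example self-contained.

```python
def parse_form_descriptions(text):
    """Return a dict mapping each form name to its poetry pattern."""
    forms = {}
    for block in text.strip().split('\n\n'):
        lines = []
        for line in block.splitlines():
            if line.strip():
                lines.append(line.strip())
        if lines:  # ignore empty blocks caused by extra blank lines
            syllables = []
            rhymes = []
            for line in lines[1:]:
                count, letter = line.split()
                syllables.append(int(count))
                rhymes.append(letter)
            forms[lines[0]] = (syllables, rhymes)
    return forms


sample_text = '''Limerick
8 A
8 A
5 B
5 B
8 A

Haiku
5 *
7 *
5 *
'''
forms = parse_form_descriptions(sample_text)
limerick_pattern = forms['Limerick']
# ([8, 8, 5, 5, 8], ['A', 'A', 'B', 'B', 'A'])
```

Splitting each description line with split() and converting the first piece with int keeps the two lists in a pattern the same length, one entry per line of the poem.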
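A pronunciation dictionary is a dict of {str: list of str}, where each key is a word (str) and each value is the list of phonemes for that word (list of str). For example, here is a (very tiny) pronunciation dictionary: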
{'DANIEL': ['D', 'AE1', 'N', 'Y', 'AH0', 'L'], 'IS': ['IH1', 'Z'], 'GOOFY': ['G', 'UW1', 'F', 'IY0']}
For all poetry samples used in this assignment, you should assume that all words in the poems will appear as keys in the pronunciation dictionary. We will test with other pronunciation dictionaries, but we will always follow this rule.
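Putting the pieces together, the following sketch counts the syllables in one line of a poem by looking up each word in the tiny pronunciation dictionary shown above. The name line_syllables is hypothetical, and stripping string.punctuation is only a stand-in for the starter code's clean_up function, whose exact behaviour is not shown in this handout.

```python
import string

TINY_DICTIONARY = {
    'DANIEL': ['D', 'AE1', 'N', 'Y', 'AH0', 'L'],
    'IS': ['IH1', 'Z'],
    'GOOFY': ['G', 'UW1', 'F', 'IY0'],
}


def line_syllables(line, pronunciations):
    """Return the number of vowel phonemes in the words of line."""
    total = 0
    for token in line.split():
        # Stand-in for clean_up: uppercase, no surrounding punctuation.
        word = token.strip(string.punctuation).upper()
        if word:  # skip tokens that were nothing but punctuation
            for phoneme in pronunciations[word]:
                if phoneme[-1:] in ('0', '1', '2'):
                    total = total + 1
    return total


syllables = line_syllables('Daniel is goofy!', TINY_DICTIONARY)
# 2 + 1 + 2 = 5
```

The lookup pronunciations[word] relies on the guarantee above that every word in a poem appears as a key in the pronunciation dictionary.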
In the starter code file poetry_functions.py, complete the following function definitions. In addition, you must add some helper functions to aid with the implementation of these required functions.
Function name: (Parameter types) -> Return type | Full Description (paraphrase to get a proper docstring description) |
---|---|
get_poem_lines: (str) -> list of str | The parameter represents a poem. Return a list of non-blank, non-empty lines from the poem with whitespace removed from the beginning and end of each line. |
count_vowel_phonemes: (list of list of str) -> int | A vowel phoneme is a phoneme whose last character is 0, 1, or 2. As examples, the word BEFORE (B IH0 F AO1 R) contains two vowel phonemes and the word GAP (G AE1 P) has one. The parameter represents a list of lists of phonemes. The function is to return the total number of vowel phonemes found in the list of lists of phonemes. |
last_phonemes: (list of str) -> list of str | A vowel phoneme is a phoneme whose last character is 0, 1, or 2. As examples, the word BEFORE (B IH0 F AO1 R) contains two vowel phonemes and the word GAP (G AE1 P) has one. The parameter represents a list of phonemes. The function is to return a list that contains the last vowel phoneme and any subsequent consonant phoneme(s) in the given list of phonemes. The ordering must be the same as in the given list. The empty list is to be returned if the list of phonemes does not contain a vowel phoneme. |
check_syllable_counts: (list of str, poetry pattern, pronunciation dictionary) -> | The first parameter represents a poem as a list of lines (as produced by get_poem_lines), the second represents a poetry pattern, and the third represents a pronunciation dictionary. Return the list of the lines from the poem that do not have the right number of syllables for the poetry pattern. The lines should appear in the list in the same order as they appear in the poem. If all lines have the right number of syllables, return the empty list. (The number of syllables in a line is the same as the number of vowel phonemes in the line.) |
check_rhyme_scheme: (list of str, poetry pattern, pronunciation dictionary) -> | A vowel phoneme is a phoneme whose last character is 0, 1, or 2. We say that two lines rhyme if and only if their final vowel phonemes and all subsequent consonant phoneme(s) after the final vowel phonemes match (i.e., are the same and are in the same order). The first parameter represents a poem as a list of lines (as produced by get_poem_lines), the second represents a poetry pattern, and the third represents a pronunciation dictionary. Return a list of lists of lines in the poem that should rhyme with each other (according to the poetry pattern) but don't. If all lines rhyme as they should, return the empty list. |
In the starter code file poetry_reader.py, complete the following function definitions.
Function name: (Parameter types) -> Return type | Full Description (paraphrase to get a proper docstring description) |
---|---|
read_pronunciation: (file open for reading) -> | The parameter represents an open file in the format of the CMU Pronouncing Dictionary. Return the pronunciation dictionary based on the given file. |
read_poetry_form_descriptions: (file open for reading) -> | The parameter represents a poetry form description file that has been opened for reading. Return a dictionary where each key is a poetry form name and each value is the poetry pattern for that form based on the given file. |
Once you have correctly implemented the functions in poetry_functions.py and poetry_reader.py, execution of the main program (poetry_program.py) will read the pronunciation dictionary (dictionary.txt) and a poetry form description file such as poetry_forms.txt, among its other steps.
unittest
Write (and submit) a set of unittests for functions count_vowel_phonemes and check_syllable_counts. Name these two files test_count_vowel_phonemes.py and test_check_syllable_counts.py. For each test method, include a brief docstring description specifying what is being tested. For unittest methods, the docstring description should not include a type contract or example calls.
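As a hedged illustration of the shape such a test file might take, here is a small unittest class for count_vowel_phonemes. The stand-in function defined at the top exists only so that the sketch runs on its own; a real test file would import poetry_functions and call poetry_functions.count_vowel_phonemes instead. The class name, method names, and choice of cases are hypothetical.

```python
import unittest


# Stand-in so this sketch is self-contained; a real test file would
# import poetry_functions and test poetry_functions.count_vowel_phonemes.
def count_vowel_phonemes(phonemes):
    total = 0
    for word in phonemes:
        for phoneme in word:
            if phoneme[-1:] in ('0', '1', '2'):
                total = total + 1
    return total


class TestCountVowelPhonemes(unittest.TestCase):

    def test_empty_list(self):
        """Test an empty list of phoneme lists."""
        self.assertEqual(count_vowel_phonemes([]), 0)

    def test_one_word_two_vowels(self):
        """Test one word that contains two vowel phonemes."""
        phonemes = [['B', 'IH0', 'F', 'AO1', 'R']]
        self.assertEqual(count_vowel_phonemes(phonemes), 2)

    def test_two_words(self):
        """Test two words whose vowel phonemes must be added together."""
        phonemes = [['B', 'IH0', 'F', 'AO1', 'R'], ['G', 'AE1', 'P']]
        self.assertEqual(count_vowel_phonemes(phonemes), 3)
```

Each docstring says what is being tested and nothing more, as required above. A complete test file would normally end with a call to unittest.main() inside an if __name__ == '__main__': guard so that the tests run when the file is executed.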
All of the files that you need to download for the assignment are listed in this section. These files must all be placed in the same directory (folder).
We are providing a type-check module that can be used to test whether your functions in poetry_functions.py have the correct parameter and return types. To use the type checker, place a3_type_checker.py in the same folder (directory) as your poetry_functions.py and run it.
If the type checks pass: the output will tell you that the typechecker passed (and what it means for the typechecker to pass!). If the typechecker passes, then the parameters and return types match the assignment specification for each of the functions.
If any type checks fail: Look carefully at the message provided. One or more of your parameter or return types does not match the assignment specification. Fix your code and re-run the tests. Make sure the tests pass before submitting.
doctest
Each function in poetry_functions.py has a doctest test in its docstring description. Be sure to run the doctests and check that they pass. Compared to functions from previous assignments, there are many more possible cases to test (and cases where your code could go wrong). If you want to get a great mark on the correctness of your functions, do a great job of testing your functions under all possible conditions. Then we won't be able to find any errors that you haven't already fixed!
You may have noticed that there is an r before some of the opening docstring triple-quotes in the poetry_functions.py starter code. The r denotes a raw string and means that special characters like \ will not be treated as special. By using raw strings for doctests that include \n and other escape sequences, the docstrings will be runnable as doctests.
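The effect of the r prefix is easiest to see on a small example. In the sketch below, the hypothetical function nonblank_lines has a doctest whose argument contains \n. Because the docstring is a raw string, doctest sees the two characters \ and n exactly as typed and evaluates them as an escape sequence inside the example call. Without the r, the \n would become a real line break inside the docstring and the example would no longer be a valid single-line call.

```python
def nonblank_lines(text):
    r"""Return the non-blank lines of text with surrounding whitespace removed.

    >>> nonblank_lines('Jen sits quietly,\n\n  Thinking of assignment three.\n')
    ['Jen sits quietly,', 'Thinking of assignment three.']
    """
    result = []
    for line in text.split('\n'):
        stripped = line.strip()
        if stripped:
            result.append(stripped)
    return result


haiku_start = nonblank_lines('Jen sits quietly,\n\n  Thinking of assignment three.\n')
```

Running the doctest module on a file containing this function would execute the example in the docstring and compare the result with the expected list shown beneath it.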
- Do not call print, input, or open in your functions. (Notice that the required functions in poetry_reader.py take an open file, not a filename string.)
- Do not use break or continue statements. Any functions that do will receive a mark of zero. We are imposing this restriction (and we have not even taught you these statements) because they are very easy to "abuse," resulting in terrible code.
- unittest and poetry_functions.
- Because you must submit unittests for count_vowel_phonemes and check_syllable_counts, write those tests before or as you write the functions themselves. That way you can execute the unittests to test the code you are writing.
Here is a good order in which to solve the pieces of this assignment.
1. Read the poetry_functions.py starter code to get an overview of what you will be writing.
2. Implement and test the required functions in poetry_functions.py, along with helper functions. Now is also a good time to write the unittest test files test_count_vowel_phonemes.py and test_check_syllable_counts.py.
3. Look at the functions in poetry_reader.py, and implement and test those functions.
4. Open poetry_program.py and run it. If there are any problems with the results, try to identify which of your functions has an issue, and go back to testing that function.
These are the aspects of your work that we will focus on in the marking:
The unittests that you submit will be marked based on whether the tests are implemented properly and on the quality of the test case choices. Ideally, your tests should cover all relevant cases without redundant (unnecessary) tests.
You must hand in your work electronically, using the MarkUs online system. Instructions for doing so are posted on the Assignments page of the course website.
The very last thing you do before submitting should be to run a3_type_checker.py one last time and ensure that the type checks pass. This will prevent your code from receiving a correctness grade of zero due to a small error that was made during your final changes before submission.
For this assignment, hand in four files:
poetry_reader.py
poetry_functions.py
test_count_vowel_phonemes.py
test_check_syllable_counts.py
Once you have submitted, be sure to check that you have submitted the correct version; new or missing files will not be accepted after the due date. Remember that the correct spelling of filenames, including case, is necessary. If your files are not named exactly as above, your code will receive zero for correctness.