CSC 111: Intro to Computer Science through Programming

Lab 5

Due: Sunday, Mar. 5 at 11:59pm on Moodle

The main goal of this lab is to practice reading and writing files. We will also be practicing writing and using functions, as well as working with strings and lists.

For this lab, first find your randomly assigned partner and introduce yourselves. The person whose first name comes last alphabetically should begin as the "driver", and the other partner as the "navigator". The driver will have the code open, and the navigator will have these instructions open.

At the end of the lab, email the code (finished or not) and transcript to the person who started as the navigator. If you do not finish during lab you have two options:

(1) Arrange to meet before Sunday and finish the lab together.

(2) Continue the code separately and denote the part you did on your own with a comment.

Note: it is not an option for one person to complete the code on their own and then send the finished code to their partner to submit. Any code that you submit should be either written by you, or written by you and your partner while you were pair programming. Both partners should submit their code on Moodle.

Part A: Reading files

In this part of the lab we will investigate the frequencies of letters in the English language.

First create a new file called lab5A.py. In this file create a main function. Now download the file letter_freq.txt (right-click and select "Save Link As...") and place this file in the same folder as lab5A.py. In the next steps you will answer several questions about this file.

  1. Inside your main function, open the letter frequency file and loop over its lines using one of the methods discussed in class. Inside the loop, print each line. After the loop, close the file. Test this out; you should see output like the following:

    [Figure: expected output, one letter and its frequency per line]

    (Note that these values are percentages.)
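    A minimal sketch of step 1, assuming each line of the file holds a letter and a frequency separated by whitespace. To keep it self-contained, it first writes a stand-in file, sample_freq.txt, with three made-up values (not the real frequencies); the helper print_lines is a hypothetical name used only for illustration.

```python
# Write a tiny stand-in for letter_freq.txt (made-up values, for illustration only).
sample = open("sample_freq.txt", "w")
sample.write("a 8.5\nb 1.5\nc 2.0\n")
sample.close()

def print_lines(filename):
    """Open filename, print each line, close the file, and return how many lines were read."""
    infile = open(filename, "r")
    count = 0
    for line in infile:          # iterating over a file object yields one line at a time
        print(line, end="")      # each line already ends with "\n"
        count += 1
    infile.close()
    return count

num_lines = print_lines("sample_freq.txt")
```

    Passing end="" to print avoids blank lines between rows, because each line read from the file already ends with a newline.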

  2. The next task is to create two lists from this file as shown below (one list of the letters and one list of the frequencies):

    [Figure: the resulting list of letters and list of frequencies]

    First create two empty lists to store the letters and the frequencies. Then, inside the for loop you wrote to print each line, split each line with the string method split(). This returns a list with two elements (the letter and the frequency). Using the list method append(), add the letter to your list of letters and the frequency to your list of frequencies. Make sure to convert the frequency from a string to a float.
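    One way to build the two parallel lists, sketched with a made-up stand-in file rather than the real letter_freq.txt:

```python
# Stand-in data file (made-up values), so the sketch is self-contained.
sample = open("sample_freq.txt", "w")
sample.write("a 8.5\nb 1.5\nc 2.0\n")
sample.close()

letters = []      # will hold the letter strings
freqs = []        # will hold the frequencies as floats

infile = open("sample_freq.txt", "r")
for line in infile:
    parts = line.split()           # "a 8.5\n" -> ["a", "8.5"]; split() also drops the newline
    letters.append(parts[0])
    freqs.append(float(parts[1]))  # convert the frequency string to a float
infile.close()

print(letters)  # ['a', 'b', 'c']
print(freqs)    # [8.5, 1.5, 2.0]
```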

  3. Sum function

    So far everything you've done has been in main. Next, write a function called my_sum(lst) that adds up the elements of any list of numbers using a for loop and returns the result. (Note: sum is also a built-in function, but here we will write it from scratch.) When you are finished, call your function inside main to add up the list of frequencies. What should they add up to?
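    A sketch of my_sum using an accumulator variable; the example list holds made-up values:

```python
def my_sum(lst):
    """Return the sum of a list of numbers, computed with an explicit for loop."""
    total = 0               # accumulator starts at zero
    for value in lst:
        total += value      # add each element to the running total
    return total

print(my_sum([8.5, 1.5, 2.0]))  # 12.0
```

    Starting the total at 0 also makes my_sum return 0 for an empty list, which matches the built-in sum.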


    Switch Driver and Navigator here!

  4. Minimum and maximum functions

    Next, find the minimum and maximum values in your list of frequencies. For this part, copy over the minimum and maximum functions we have been developing in class. Then, inside main, call both functions on your list of frequencies and print the results.
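    The functions from class are not reproduced in this handout, so here is a hypothetical stand-in pair, my_min and my_max, that scan the list once while tracking the best value seen so far. They assume the list is non-empty.

```python
def my_min(lst):
    """Return the smallest element of a non-empty list (hypothetical stand-in for the class version)."""
    smallest = lst[0]              # start with the first element
    for value in lst[1:]:
        if value < smallest:
            smallest = value
    return smallest

def my_max(lst):
    """Return the largest element of a non-empty list (hypothetical stand-in for the class version)."""
    largest = lst[0]
    for value in lst[1:]:
        if value > largest:
            largest = value
    return largest

freqs = [8.5, 1.5, 2.0]            # made-up frequencies
print(my_min(freqs), my_max(freqs))  # 1.5 8.5
```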


  5. Most common and least common letters

    On their own, the minimum and maximum frequency values do not tell the whole story; we would also like to know which letters they belong to. To find out, we can use the list method index(), which returns the index of the first occurrence of a given element in a list. Try the code below in the shell to see how index() works:

    [Figure: shell session demonstrating the list method index()]

    Discuss these results and how to use them to obtain the letter that is associated with the maximum and minimum frequencies. You can do this part in main, right after obtaining the minimum and maximum.
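    Because the letters and frequencies are stored in parallel lists, the position of a frequency is also the position of its letter. A sketch with made-up values (the built-in min and max are used here only to keep it short):

```python
letters = ["a", "b", "c"]
freqs = [8.5, 1.5, 2.0]          # made-up values

high = max(freqs)
low = min(freqs)
# index() gives the position of the value; the same position in letters is its letter
most_common = letters[freqs.index(high)]
least_common = letters[freqs.index(low)]
print(most_common, high)   # a 8.5
print(least_common, low)   # b 1.5
```

    Note that index() returns the first matching position if a value appears more than once, and raises a ValueError if the value is not in the list.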


  6. Average function

    Finally, create a function mean(lst) that returns the average of the values in any list of numbers. Think about how to make use of one of the functions you've already created.
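    A sketch of mean that reuses a loop-based my_sum (repeated here so the block runs on its own). It assumes a non-empty list, since dividing by len([]) would raise ZeroDivisionError.

```python
def my_sum(lst):
    """Return the sum of a list of numbers using a for loop."""
    total = 0
    for value in lst:
        total += value
    return total

def mean(lst):
    """Return the average of a non-empty list of numbers by reusing my_sum."""
    return my_sum(lst) / len(lst)

print(mean([8.5, 1.5, 2.0]))  # 4.0
```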


Make sure to save what you've done in the shell (this should show your program working and printing all the correct output). Also make sure that each of your four helper functions returns its result rather than only printing it.

Switch Driver and Navigator here!

Part B: Writing files

Create a new file called lab5B.py. In this file create a main function. Now download the file tempest.txt (right-click and select "Save Link As...") and place this file in the same folder as lab5B.py. This file contains the entire text of Shakespeare's "The Tempest". In this second part you will compare the letter frequencies in "The Tempest" to the average letter frequencies in the English language.

  1. First, set up a list of the letters a, b, c, ..., z as strings (Hint: use the list of letters in your shell output from Part A). You can set up this list at the top of your file (similar to the lists of names in Homework 4). Now, inside your main function, open the tempest.txt file and read the entire file into one string (it is small enough to do this). Then close the file.
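    A sketch of step 1. Writing list("abcdefghijklmnopqrstuvwxyz") is a shortcut that builds the same list as typing each letter out. Since tempest.txt may not be at hand, the sketch reads a short stand-in file, sample_text.txt, instead:

```python
# The alphabet as a list of one-character strings: ['a', 'b', ..., 'z']
LETTERS = list("abcdefghijklmnopqrstuvwxyz")

# Stand-in for tempest.txt so the sketch runs anywhere.
sample = open("sample_text.txt", "w")
sample.write("Full fathom five thy father lies;\nOf his bones are coral made.\n")
sample.close()

infile = open("sample_text.txt", "r")
text = infile.read()      # read() returns the whole file as a single string
infile.close()

print(LETTERS[:3])   # ['a', 'b', 'c']
```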

  2. Next, use the built-in string method count to count the number of times "a" occurs in this string of all the text. Print this out (you should get 4681). Now count the number of "b"s in the text. Eventually we want to do this for every letter, so set up a for loop to do this.
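    A sketch of the counting loop on a short stand-in string, for the first five letters only. Keep in mind that str.count is case-sensitive, so "A" is not counted as "a".

```python
text = "Full fathom five thy father lies;\nOf his bones are coral made.\n"  # stand-in text
letters = ["a", "b", "c", "d", "e"]     # first few letters only, to keep the output short

for letter in letters:
    # str.count returns how many non-overlapping times the substring occurs
    print(letter, text.count(letter))
```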

  3. After printing out all the counts in a loop, move this loop into a separate function called compute_counts(string). This function should not print anything; instead it should return the counts for all the letters as a list. Call this function from main and make sure you can print the resulting list inside main.
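    A sketch of compute_counts. The parameter is named text here (the handout's name, string, works just as well); the input is a short stand-in string rather than the play:

```python
LETTERS = list("abcdefghijklmnopqrstuvwxyz")

def compute_counts(text):
    """Return a list of 26 counts, one per letter of LETTERS, in the same order. Prints nothing."""
    counts = []
    for letter in LETTERS:
        counts.append(text.count(letter))
    return counts

counts = compute_counts("Full fathom five thy father lies;\nOf his bones are coral made.\n")
print(counts[:5])   # counts for a, b, c, d, e: [5, 1, 1, 1, 6]
```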

    Switch Driver and Navigator here!

  4. To be able to compare these counts to the frequencies we had in Part A, we need to normalize them so that they sum to 100 (percent). Create another function called normalize_counts(lst) that does not return anything, but instead modifies each value of the list in place so that the resulting sum is 100. For this function, feel free to use the built-in sum function. An example is shown below:

    [Figure: example of normalizing a list of counts]

    Also round each result to three decimal places. Call this function from main, passing in the list of counts that was returned from compute_counts. Print out this list to make sure it is working. The first value in the list should be 7.215.
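    A sketch of normalize_counts. The important detail is assigning through an index, lst[i] = ...; a loop like for value in lst: value = ... would only rebind the loop variable and leave the list unchanged. The example list is made up:

```python
def normalize_counts(lst):
    """Scale the numbers in lst in place so they sum to 100, rounded to 3 places. Returns None."""
    total = sum(lst)     # built-in sum, computed once before any value changes
    for i in range(len(lst)):
        lst[i] = round(lst[i] / total * 100, 3)   # percentage, rounded to 3 decimal places

counts = [1, 2, 1]
normalize_counts(counts)
print(counts)   # [25.0, 50.0, 25.0]
```

    Because each value is rounded, a normalized list may sum to slightly more or less than exactly 100.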

  5. Finally, write these frequencies to a file. Open a new file called tempest_freq.txt for writing. Then use a for loop to write the letters and frequencies to the file, in exactly the same format as letter_freq.txt. The first line of this file should look like:

        a 7.215

    Make sure to close the file after you finish writing it. Open the resulting file in a text editor to make sure it is working.
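    A sketch of writing the file, assuming one letter and one frequency per line separated by a single space (as in the example line above). It writes to a stand-in file name with made-up values, then reads the file back to check it:

```python
letters = ["a", "b", "c"]
freqs = [25.0, 50.0, 25.0]      # made-up normalized values

outfile = open("sample_out_freq.txt", "w")   # "w" creates the file, or overwrites an existing one
for i in range(len(letters)):
    # write() does not add a newline, so add "\n" explicitly
    outfile.write(letters[i] + " " + str(freqs[i]) + "\n")
outfile.close()

# Read the file back to confirm the format.
infile = open("sample_out_freq.txt", "r")
contents = infile.read()
infile.close()
print(contents)
```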
Continue to save your Part B shell work, although the main output is the tempest_freq.txt file that you will submit with your lab.

Transcript and Submit

From the shell, highlight all the testing you have done and all the output (since the beginning of lab) and copy it into a plain text file (.txt extension). On Windows you can use Notepad (under Accessories), and on a Mac you can use TextEdit (under Applications). On a Mac, if TextEdit does not offer an option to save as .txt, choose "Format" -> "Make Plain Text", then save again. Save this file as lab5_transcript.txt.

Make sure that both you and your partner have a copy of all the code written during the lab period, and the transcript. Both partners should submit the following files on Moodle:

  • lab5A.py

  • lab5B.py

  • lab5_transcript.txt

  • tempest_freq.txt

If you did not finish, you can either meet to finish or finish separately. All labs must be submitted by Sunday night. If you finish early, I encourage you to start on Homework 5, but you are also welcome to leave.