CS260 Lab 1: Computing and Plotting in Python

Due: Tuesday, September 7 at 11:59pm


Overview

Goals:

Credit: Based on materials created by Allison Gong. Some of the numpy examples are from Numpy’s Quickstart tutorial.


Python style notes

The following are a few notes on Python style, but see the Official Python Style Guide for more complete documentation. Some of the notes below are good style of any programming language. Points will be awarded for style on every lab.

  1. Include a header (as a block comment in triple quotes) at the top of each file. Your header should include:

Example:

'''
Author      : Allison Gong
Class       : CS260
Date        : 7/30/21
Description : This is the main driver program for the Introduction to Python
portion of Lab 1. This intro consists of a series of coding exercises that cover
some of the core syntax, data structures, and functionality of Python.
'''
  1. Include concise comments that summarize major sections of your code. I typically follow the pattern of a comment on its own line, followed by a block of code (4-10 lines), followed by a line break. Very short comments can be inline.

Example:

# add up 10 random numbers
total = 0
for i in range(10):
    total += random.randrange(6) # includes 0, excludes 6
print("sum:", total)
  1. Use concise but meaningful names for variables, functions, classes, and methods. Variables, functions, and methods should use “snake case”, with words separated by underscores. For example, linear_search NOT linearSearch. However, classes should use “camel case” and the first word should also be capitalized. For example ElectionCandidate NOT election_candidate.

  2. Function headers (triple quotes) should be below the function declaration and include:

Example:

def fib(n):
  '''
  Compute and return the nth Fibonacci number.
  n: non-negative integer
  return: nth Fibonacci number
  '''
  # code here
  1. Lines should be 80 characters or fewer.

  2. Code should be well organized, with no code (besides imports and occasional global variables) outside of functions/classes/etc. Use line breaks and white space to improve code readability. Avoid repeated or redundant code!

  3. Organize imports below the header in alphabetical order. Separate Python libraries from your own code imports. Use of from XXX import * is almost never appropriate! Then functions within this library cannot be identified later on.

Example:

import math
import numpy as np
import random

GitHub Classroom

This semester we will be using GitHub Classroom. We will also be using git on the command line. For all grading, we will be running your code on the command line, so it will be necessary to test your code outside of an IDE. To access the command line on different platforms:

Next, make sure you have a GitHub account. You can create one here.

If you’re new to git or haven’t set up ssh keys (not required but it makes pulling/pushing easier), see the following links:

For each lab assignment, after you accept the lab and access your repo, clone using ssh or https. Make sure to do this in a folder that is not already a git repo (i.e. set up a folder for all your labs, and each one will be in a separate folder). Here is an example (note your url will be different):

$ git clone git@github.com:saramathieson/cs260-lab1.git
$ cd cs260-lab1

Now say you make some changes to Lab 1 and are ready to push. The common workflow is add, commit, push. I also use git status frequently to make sure everything worked:

git add numpy_tutorial.py
git commit -m "worked on numpy"
git push

Part 1: Introduction to Python

Accept your Lab 1 repository on Github classroom. You should have the following folders/files:

First we will work with python_intro.py. Python may be completely new for you, or it may already be very familiar. Either way make sure to complete and push all the exercises, as it will be good practice for the style of Python used in this course.

  1. Price of movie tickets: (Topics: I/O and conditionals)

Assume that movie tickets have the following prices based on age:

Write a function movie_ticket_cost that calculates the ticket price based on the user’s age. The method should prompt the user to enter their age, then print out the result.

Use the Python input function input() documentation here.

  1. Shuffling lists: (Topics: Python lists, random library, in-place vs. out-of-place)

Write a pair of functions that shuffle a given list. One shuffles the list in-place (i.e. changes the order of the original list and does not return anything), while the other shuffles it out-of-place (i.e. leaves the original list intact and returns a new shuffled version of the list).

You are not allowed to use any built-in list shuffling methods, nor any built-in methods that swap list elements. However, you may use the Python random library documentation here. Include details of your shuffling methods in the function headers.

Python lists are stored in arrays in the backend and their functionality is somewhat similar to ArrayLists in Java. They can be iterated via index, elements can be added using append() and removed using remove() (but these operations can be costly). Lists have many built in functions. Documentation here

  1. Fibonacci numbers: (Topics: basic recursion)

Write a recursive function that calculates the nth Fibonacci number, given a non-negative integer n. You may assume that the 0th and 1th Fibonacci number are both 1.

  1. Binary search: (Topics: recursion, binary search, list indexing/slicing)

Implement two binary search functions, one that uses recursion and list slicing, and one that is not recursive. Details below.

  1. Non-recursive algorithm: The inputs to this method are a query (what we would like to find) and a sorted list lst. We would like to return the index of query, or -1 if it is not in the list. One non-recursive (i.e. iterative) algorithm follows these steps: - Compute the “low” index (usually begins at 0) and the “high” index (usually begins at the length of the list - 1) - Compute the index of the middle element (using “low” and “high”) and check it against the query - If the middle element is greater than the query, move the “high” down and repeat the process using a loop

  2. Recursive algorithm: The inputs and goal are the same, but now we will use recursion to complete the algorithm. The idea is as follows:

Take a middle element and check it against the target. - If they are equal, return the index of the middle element. - If the target is less than the middle element, call binary search on the first half of the list (up through but not including the middle element). - If the target is greater than the middle element, call binary search on the second half of the list (starting after the middle element and going all the way to the end).

Hints: think about what your base case should be, and what you should return in that case. You are welcome to use a helpful function for this version (either the main function or helper function can be recursive).

List slicing is a way to return a portion of the list. The syntax looks like this:

lst[start index (inclusive) : end index (exclusive) : index stepsize]

For example:

>>> words = ["hello", "welcome", "to", "cs", "260"]
>>> words[:2]
['hello', 'welcome']
>>> words[2:]
['to', 'cs', '260']
>>> words[2:4]
['to', 'cs']

Further examples can be found in this article.


Part 2: Introduction to numpy

numpy is a computational Python library that you will often use in higher level computer science courses and beyond. numpy is primarily used to work with arrays and matrices, as well as perform other high-level mathematical computations. This portion of Lab 1 will be a quick introduction to numpy. Note: All portions of this section must be completed, including the exercises done by hand. Refer to the starter code for more detailed instructions, but overall there are four “TODO” sections listed below.

  1. Array initialization (complete the TODO section and answer questions in your README.md)

  2. Array slicing (complete the TODO section in your README.md, then uncomment the code)

  3. Concatenation and basic operations. Go through this code and the output and make sure everything makes sense to you. We’ll be practicing these operations more later.


Part 3: Introduction to matplotlib

In this part of the tutorial you’ll create and push two different plots - one using real data and the other using mathematical functions.

  1. Reading in Data from a File, Graphing a Scatter Plot and the Line of Best Fit

In this part, you will be working with the dataset data/facebook_users.csv. This data was pulled from here. The first column holds the year and the second column the number of monthly active Facebook users worldwide in the 4th Quarter (Q4) (in millions).

There are a few different ways of reading this type of data. This first way opens the file and reads it in as a sequence of lines (which are represented as strings). Try out this way (in matplotlib_tutorial.py) and make sure it makes sense. This file reading method is useful when the file is too big to hold in memory, but you still want to iterate over it and pull out useful information.

years = []
users = []
fb_file = open('data/facebook_users.csv','r')
for line in fb_file:
    tokens = line.split(",") # split data based on commas
    years.append(int(tokens[0]))
    users.append(int(tokens[1]))
print(years)
print(users)

A simpler way is to use numpy, which works if your data is in a very regular format (i.e. array-like). This will read in the data as a numpy array (of floats by default). Try out this way and make sure it makes sense.

user_data = np.loadtxt('data/facebook_users.csv', delimiter=',')
print(user_data)

Now we will plot this data as a scatter plot. Use plt.scatter() to do so; the first parameter is a list/array of x values and the second for the y values. There many more parameters that you can read about in the scatter plot documentation.

Example:

plt.scatter(x_values, y_values, color="black")

Note: if you used the second method of file reading, you’ll need to use numpy slicing to obtain the x and y values.

You can use plt.show() to visualize your current progress. Now we will add axis labels and a title (which you should have on all graphs!) All these functions take in strings:

plt.xlabel(<str>)
plt.ylabel(<str>)
plt.title(<str>)

For more information, see the documentation below:

These additions significantly improve the graph. Often we have multiple datasets we would like to plot. Now we will overlay a linear model on top of the scatter plot (in the coming weeks we’ll see how to come up with such a model).

The linear model we will use is

y = -432342.27 + 215.39 * x

For each x-value (i.e. year in this case), compute the corresponding prediction of the number of users based on this model. This should give you a list of x-values (years) and y-values (predictions for the number of users). Graph these two lists using plt.plot(). See the plot documentation for more information.

plt.plot(x_values, y_values, color="blue")

After you are happy with the figure, save your plot into a new figures directory using plt.savefig(). Use the exact code below so we know where to find your figures for this lab.

plt.savefig("figures/facebook_users.pdf", format='pdf')

Right after this line, you may want to use plt.clf() to clear all this information before starting the next graph.

  1. Graphing Multiple Lines

In the same file, we will now create another plot of mathematical equations. Below are the coefficients of two polynomials:

quad_coefs = [0, 4, -1]       # y = 4x - x^2
cubic_coefs = [2, 0, -2, 1]   # y = 2 - 2x^2 + x^3

We will graph both these polynomials. However, we cannot graph them in an analytic way in matplotlib, so we’ll need to plot them at a series of x-values and then connect them to create the impression of a smooth curve.

One way to do this is to use np.linspace(). This function generates an array of points based on a starting point (first parameter), an ending point (second parameter), and the number of points total (third parameter). This linspace array will become our x-values. See here for more information about np.linspace().

np.linspace(0, 5, 10) # start at 0, end at 5, with 10 points total

Compute a series of y-values (based on these x-values) for each equation. Finally use plot like before to plot both curves. Don’t forget to add labels and a title (they can be something generic since you don’t have information on what these functions represent).

Now show your graph with both lines. If it looks strange, adjust the number of points and observe how that affects the graph. Write down your observations in the README.md.

Since this graph has two curves, it would be good to label them. To do this, when plotting each function, add in a label keyword argument. After doing this for both functions, use plt.legend() to create the legend.

plt.plot(x_values, y_values, label="my label")
plt.legend()

More information about legends can be found here and here.

Use plt.show() throughout to check your progress. When you are happy with the figure, save it as shown below:

plt.savefig("figures/quad_cubic.pdf", format='pdf')

Note: you can use ylim (or xlim) to set the axis limits for a better picture. For example:

plt.ylim(min_y_value, max_y_value)

Part 4: Classes and objects in Python

In Student.py, write a Student class that takes the full name of the student (string), their graduating year (int), and their TriCo college (string). Each student should also have a list of courses they are taking for the current semester (that starts out empty). First set up the constructor, which should take the form:

def __init__(self, <arg 1>, <arg 2>, etc):

NOTES:

Following the starter code, create getters for each of the attributes (also called “fields” or “instance variables”).

Next, write an add course method and a drop course method. add_course() will first check that the course list does already have that course, if it does not, it will print out an appropriate message indicating that the course was added. If it does already have that course, it will print out a message that the student is already taking that course. drop_course() similarly check that the course is in the course list before removing that course and printing out an appropriate message. If the course is not in list, it will print out a message stating so.

Then, create a __str__ method that returns a string representation of the instance (clearly showing all fields).

Don’t forget to comment your class and write tests in the main function.

Dictionaries

Finally, we will practice dictionaries in Python by creating a dictionary of Students.

The built-in Python data structure dictionary is a hash table, so it has the methods you would expect and more: dictionary documentation.

In the starter code, there is a list where each element represents one student. Create a dictionary that takes a student ID number (int) as a key and a Student as the value (will need to use the Student constructor). You can make up the ID numbers or choose them at random.

After filling the dictionary, try adding a student to the dictionary with the same key as a prior one. What happens? Print out the dictionary to verify. Now try adding a new entry with a new key but a repeated value/Student. What happens? Write both your responses in the README.md.

Finally, try getting (returning) a Student based on a specific key and then try removing a Student based on this key.

Final checks: