CSC 390: Topics in Artificial Intelligence

Lab 4: Markov Chains


In this lab we will actually be creating the data, using a generative model. We will practice the concepts of Markov chains and discrete probability distributions.

For this lab, please work with your randomly assigned partner, with code on one computer. Make sure to email each other the code afterwards, since Homework 6 will also be about graphical models.

Step 1: Set up the probabilities

In this lab we will consider the Markov process of sleep, assuming that the amount of sleep on any given night only depends on the amount of sleep the previous night. To discretize the distribution of sleep, we will use "low" (L) and "high" (H). First, create an array called states that contains the chars 'L' and 'H'. These are the two states our model can be in at any given time. Then, using our probabilities from class, let the initial probabilities for each state to be P(L)=0.2 and P(H)=0.8 (i.e. more likely to start out with high sleep). Create another array to hold these values.

Now for the transition probabilities, create a 2D array to hold the transition matrix. Use P(L|L)=0.5, P(H|L)=0.5, P(L|H)=0.9, P(H|H)=0.1.

Finally, create a variable called num_days and set it equal to 1000. Our goal is to simulate a sequence of sleep values for 1000 days.

Step 2: Set the initial state

We need to start our Markov chain somewhere, so we will "flip a coin" with weights 0.2 on L and 0.8 on H. To do this, we can use a numpy built-in function that draws from a discrete probability distribution:

elements = ['red', 'green', 'blue']
probabilities = [0.4, 0.1, 0.5]
sequence = np.random.choice(elements, 20, p=probabilities) 

In this example, we will choose a sequence of 20 colors, according to their given probabilities. Use this same concept to draw 1 choice for the initial state, which we can store as the current state. Note that this random.choice method returns a list, but we just want one state.

Step 3: Create the sequence

Now we can create a loop over num_days (technically num_days-1 since we have already chosen the value for the first day). First, based on the current state, find the appropriate row of the transition matrix, then use that row as the probabilities when drawing the next state. Then update the current state based on this outcome. Finally, store the sequence of values in a list (often called Z).

Step 4: Stationary distribution

Lastly, print Z, the sequence of states (you can start with num_days to something smaller than 1000). Then print the fraction of 'L' values vs. the fraction of 'H' values. Does this agree with our stationary distribution calculation in class?

Step 5: Likelihood (optional)

Optionally, compute the likelihood of this sequence of states, using the initial and transition probabilities. What problems might you run into with a very long sequence?