CSC 390: Topics in Artificial Intelligence

"Unsupervised Machine Learning"

Fall 2016

Course Staff

  • Sara Mathieson (Instructor)

Prerequisites

  • CSC 111, Intro to Computer Science

  • MTH 111, Calculus I

  • MTH 220 or another intro statistics course

  • A 200-level computer science course is recommended, but not required. Discuss with the instructor if you're unsure about your background for this course.

Course Description and Goals

This course will begin with a brief introduction to artificial intelligence (AI) and how the material in this course fits into the overall field of AI. We will discuss the difference between supervised and unsupervised learning, starting with a few key methods in supervised learning. Then we will move on to the main focus of the course. Unsupervised learning seeks to uncover underlying structure in a dataset or system, without the use of labeled data. We will explore unsupervised learning methods from a variety of angles, including theory, implementation, application, existing software, and recent literature. Throughout the course we will investigate a variety of datasets, with an emphasis on learning from "big data" (e.g. natural language and biological datasets).
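To make the idea of uncovering structure without labels concrete, here is a minimal sketch of K-means clustering (one of the methods in the topic list) written with numpy. It is an illustration only, not course-provided code: the function name `kmeans`, the synthetic two-blob dataset, and all parameter values are assumptions chosen for this example.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Basic K-means (Lloyd's algorithm) for data X of shape (n, d).

    Returns (centers, labels). Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (a center with no points stays where it is).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments can no longer change
        centers = new_centers
    return centers, labels


# Synthetic, unlabeled data (an assumption for this sketch):
# two well-separated blobs in 2D, 50 points each.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    data_rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])

centers, labels = kmeans(X, k=2)
print("cluster centers:\n", centers)
print("points per cluster:", np.bincount(labels))
```

No labels are given to the algorithm; the two groups are recovered from the geometry of the data alone. In practice a library implementation such as the one in sklearn would normally be used instead of hand-written code.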

Class meetings will be a combination of interactive lecture, mini-labs, oral presentations by the students, and discussion of research papers. Homeworks will be a mix of programming assignments, readings, and pencil-and-paper exercises. There will be a mid-semester oral presentation (15-20 minutes) and a mid-semester written literature review in an area of the student's interest. Building on this work, during the last third of the course, students will explore a topic of their choice, which will include a final oral presentation and a written report. In all aspects of the course, there will be a focus on effective communication of ideas and questions in a multidisciplinary context.

Assignment Notes

The programming aspects of assignments will generally be in Python, but any language is welcome for the final project. Homeworks will be submitted online through Moodle.

Textbook

PDF and associated datasets available online:

"The Elements of Statistical Learning: Data Mining, Inference, and Prediction"
by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

Software Links

We will be using Python, including the packages numpy, scipy, matplotlib, and sklearn. There are two main options to get Python and all these packages at once:

Online Discussion

We will be using Piazza for online class discussion, homework help, announcements, clarifications, etc. Our class page is:

https://piazza.com/smith/fall2016/csc390/home

Topics

  • Overview of AI
  • Supervised vs. unsupervised learning
  • Key methods in supervised learning
  • Clustering (K-means, hierarchical)
  • Principal components analysis (PCA)
  • Non-negative matrix factorization
  • Autoencoders and back-propagation
  • Latent variables and graphical models
  • Method of moments
  • Topic modeling
  • Latent Dirichlet allocation (LDA)
  • Application: natural language processing
  • Expectation-maximization (EM)
  • EM for hidden Markov models (HMM)
  • Combining unsupervised and supervised learning
  • Neural networks and deep learning
  • Application: image classification

Course Policies

  1. Email

    Questions about course content that apply to the whole class should be posted (non-privately) on Piazza. Individual questions about projects or presentations are fine over email.

  2. Sending me code

    Do not email me or post long blocks of code on Piazza. If you can distill the problem to 1-2 lines of code and an error message, post on Piazza.

  3. Late work

    Each student may take a 3-day extension on one assignment throughout the semester (except for presentations). No other late work will be accepted. The only exceptions to this policy are:

    • An accommodations letter from the ODS
    • A note or email from a Class Dean
    • A note or email from Health Services

  4. Electronic devices

    Electronic devices may be used in class as long as they are used for course material (taking notes, in-class labs, etc.).

  5. Attendance

    Two class meetings may be missed without affecting your participation grade.

Collaboration and the Honor Code

Collaboration is encouraged in this course, especially because different backgrounds and skill sets are necessary for making progress in a research setting. Additionally, for this capstone course, one of the goals is to learn where to look for information and how to use available resources. However, code and written work should be produced and understood by each individual student. For each assignment, please cite your classmate collaborators, books, and online resources, as per the Smith College honor code:

"Smith College expects all students to be honest and committed to the principles of academic and intellectual integrity in their preparation and submission of course work and examinations. All submitted work of any kind must be the original work of the student who must cite all the sources used in its preparation."

Grading

  • Homeworks: 40%
  • Midterm assignment: 20%
  • Final project presentation and writeup: 30%
  • Participation (including in-class discussion, labs, and Piazza): 10%

Additional Resources