CSC 212: Programming with Data Structures

Homework 6: Sorting 2

Due: Wednesday, Mar. 23, 11:59pm

Credit for this assignment: Nick Howe


Your task on this assignment is to put together what you know about linked lists with what we have learned about sorting algorithms. As we discussed in class, there are pros and cons to using arrays vs. lists for different algorithms. Here, we'll be using linked lists for both sorting algorithms. The advantage of this is that we can create and extend linked lists of generics, but not arrays. In this assignment we'll be sorting cards, to visualize what is going on inside these sorting algorithms.

For this assignment, you will implement two sorting algorithms using linked lists: selection sort and merge sort. In the final part of this assignment, you will test your implementations on lists of varying lengths, to see how the sorting time changes.

This assignment is individual and everyone should submit their own original code, but you are welcome to collaborate on the theory and ideas.

First download the following files and put them in the default package of a hw6 project:
  1. CardSortDemo.java

  2. SortRecorder.java

  3. CardPile.java

  4. Card.java

  5. TimerDemo.java


Program Specifications

Your programs should follow the outline given in CardSortDemo: initialize a deck of cards, shuffle them, and then sort them. As it sorts, the program should create a record of what it does using the SortRecorder class, as shown in the demo. You should create one snapshot per outer loop iteration in selection sort, and one snapshot per merge operation in merge sort.

Write separate programs for each sorting algorithm you implement. They should be called CardSelSort.java and CardMergeSort.java.


Sorting with Linked Lists

The CardPile class is based upon LinkedList<Card>. Note that the demo program begins by creating a full deck and then shuffling it randomly. Your program should perform all processing on the linked list structure, using iterators where necessary to keep track of position in the list. For top marks, avoid using an index to refer to an element of the list -- indices are efficient for arrays, but not for lists. Similarly, you should avoid calling methods that trigger a hidden traversal of the list, such as get.

Processing a list differs in style from processing an array. Insertion and deletion of elements is easy on a list, but hard in an array (because a full array includes no room for insertion, and deletion leaves an empty hole in an otherwise full array). The algorithms we developed for sorting on arrays involved lots of swapping, because the only way to make room for an element was to move another one out of the way. With lists, instead of swapping two elements, you will tend instead to work with two lists, excising elements from the unsorted list and inserting them into the sorted list without disturbing any other nodes. If you find yourself swapping values, you're probably using the lists too much like arrays, and failing to take advantage of their strengths. Below are summaries of selection sort and merge sort as implemented on linked lists.

Selection Sort

Until unsorted is empty, scan it for the smallest remaining element. Remove that element from unsorted and add it to the tail of sorted. (One way to do this: Loop through all the nodes, keeping the index of the smallest element seen so far as you continue to scan through the list, then remove that element by index. Another way: avoid using an index by actually pulling out the smallest element seen so far, and then swapping it back in if and when you encounter a smaller one. Yet another: use two iterators, one to traverse the loop element by element and the other to hold the place of the smallest element seen so far, so you get a stable sort.)

Merge Sort

Your other program will implement merge sort. It is much easier to implement using lists, rather than arrays. A high-level description of the algorithm is given below, although you are welcome to use a recursive implementation.

Input:
A list list of cards, of length n.

Output:
A new list of the same cards in increasing order.

Algorithm:
Begin by placing each element of list into a new singleton list. You may store all the singleton lists in a list of lists (or superlist).

While more than one list remains, take the first two lists from the superlist and merge them, preserving the sorted order. Put the result back at the end of the superlist.

To merge two sorted lists into a single sorted list, look at the first element in each list. Take the smaller of the two off the front of its old list and put it at the end of the new (merged) list. Repeat this until both one of the old lists is empty, at which point you can append the remainder of the other original list to the new list. If the original lists were sorted, and you always take the smallest element available, then the resulting list will also be sorted. (You might want to convince yourself of this fact before continuing.)

Note that the key operation here is the merging of two sorted lists. Probably you will want to develop a method for this and test it thoroughly before tackling the full program.


Experimentation

You now have an opportunity to do some empirical investigation of differences in their running time. Once your programs are written, and thoroughly debugged, you can make a variant version that doesn't contain any reference to SortRecorder, following this demo: TimerDemo.java. (Eliminating the recording is necessary because recording takes up both time and memory, which gets in the way of measuring the time required for the sorting itself. Make sure you disable any debugging printouts as well.) Call the variant versions SelSortTimer and MergeSortTimer. You will use these stripped-down versions to sort large CardPiles of different sizes, as described below, so that you can see for yourself how they compare for speed. The conclusions you draw should be written up in your readme.txt file.

You can time a program on unix systems by preceding the call to run it with time. For example:


time java MergeSortTimer 10000

This will print out a rather cryptic result, with timing numbers in different orders depending on the system. On aurora, the first number gives the time spent running the program by the CPU, which is the number you are most interested in. The second number gives the amount of time spent in system calls, and the third number gives the actual time elapsed. On some operating systems, the information provided may be somewhat different, but it you should still be able to figure out which number is the CPU time.

Please run a sort using each of your programs on inputs that double progressively in size: 10000 cards, 20000 cards, 40000 cards, etc. Continue until you see a clear difference in speed, or until one method is unable to finish in a reasonable time (say a few minutes). Then write up a short summary of your findings to include in the readme.txt. Do your experiments match the theoretical results developed in class? If they don't, it may be a sign that one of your programs has a hidden inefficiency.


To Submit

  • CardSelSort.java
  • CardMergeSort.java
  • Any other classes you modified (excluding timer classes).
  • typescript
  • readme.txt