N-Dolphin: Swim Through Symbolic Execution Of Code As If Evolved Over Millenia To Do So

Note: This page describes the instructional purpose of the N-Dolphin tool; the user's manual is a available as a single large document, or in structured form in which you can click on just the section you want.

Motivation: Deductive Reasoning as a Programming Activity

Deductive reasoning is important for practical programming activities, not just formal proofs in theoretical CS classes, as careful programmers must think about universal properties of code, i.e., things that are true for any input, not just "what happened during the test cases I thought of". For example, we may want to know that a given loop can't run forever in any use of our software, not just that it didn't happen to run forever in a given set of tests.

As educators, we are concerned that the reliance on mathematical proof to teach deduction may amplify inequities that arise from disparate pre-college mathematics experiences. Thus, we wish to explore deduction in an environment that centers code and code-centric activities such as refactoring, rather than traditional mathematical notation and structures. Note that classic refactoring steps, such as in-lining a function or variable definition, can be combined with steps so simple they're not usually even called "refactoring" (e.g., turning 1+1 into 2, or reducing a statement that starts "if False" to just the code in the "else" clause), to express the execution of code (i.e., serve as a notional machine). This fact has been known for some time, but it is not usually brought to bear on the question of introducing deduction in a course on practical programming.

Overview

The N-Dolphin tool, and it's cousin ORC2A (when configured for the same logic as N-Dolphin), are visualizers that combine several important features not found together in other tools:

they demonstrate substitution-based execution of pure code (like the Dr. Racket stepper);
they allow control of the level of abstraction, e.g., selecting step-over or step-into (like most debuggers, but not the Dr. Racket stepper);
they allow control of execution order and, thus, symbolic execution via delaying evaluation of variables (unlike any other student-facing visualizer we know);
they have special rules to allow symbolic simplifications corresponding to proof-by-cases and inductive proof.

(We have not attempted to integrate indirect proof into N-Dolphin, as we envision combining the use of N-Dolphin in CS1/CS2 with a separate "Discrete Mathematics" course that grounds this material in traditional mathematical structures.)

N-Dolphin currently renders code in Python, though other languages could be added, as long as they are sufficiently expressive of pure-functional code.

Specific examples of the pedagogy are given, along with full defintions of rules, in Section 2 of the "Bridge" paper (using ORC2A); and, with less verbiage and detail, below.

Examples

Substitution

In its most basic use, N-Dolphin can be used to illustrate the process of code-execution as a series of substitutions, some of which are challenging to students and not well-illustrated in standard debugging tools. For example, we often find ourselves needing to say things like "the value returned by the function is substituted into the expression where the call was", and resort to hand-drawn diagrams to help students "see" that process.

Other things are either simple/familiar ("Python follows the usual rules for arithmetic, so 22+5*4 is 42, not 27*4") or illustrated clearly in a standard debugger ("the arguments given in a function call are used to initialize the parameters"). But the process of returning a result seems to be neither, for beginning students. N-Dolphin can illustrate both simple substitutions (3+4 becoming 7) and subtle ones (the result of a function call appearing where the call expression had been) via repeated clicks of the "Step in" button.

An N-Dolphin session starts with standard Python code, typically some defintions followed by an expression, as shown on the left below. The command-line interpreter would respond to this code by printing the value of the expression; N-Dolphin shows us how the value can be produced by refactoring/substitution rules. Clicking on "Step in" performs only the selected addition step (i.e., the red 3+4), inserting the result in the appropriate place, as shown in the top-right image.

An image of python code including (3+4)*20, with the 3+4 highlighted

An image of the same python code but with 7*20 where we'd had (3+4)*20

An image of the same python code but now highlighting a 7 within square_0(7)

An image of the same python code but now highlighting a 49 where we had previously seen square_0(7)

An image of the final result of the python code, 42

The replacement of (3+4) with 7 is unually completely unsurprising for students; The second image on the right-hand side (above) shows the result of several more clicks, with square(7) being the next step needed in the calculation.

Additional clicks of "Step in" bring us to the next image, in which square(7) has been replaced with 49 (the result of the function), illustrating a process that we've found to be less intuitive for students: the result returned by the function takes the place of the call expression. This process can also be illustrated more concisely via the "Step over" button, as we will discuss below.

The final result of repeated clicks of "Step in" is shown at the bottom of the figure: the result that would have been printed by the python command-line interpreter,

Since N-Dolphin only allows code in which variables and objects don't change (i.e., pure-functional code), there's no need to worry about how substitution might interact with a variable changing value at some point.

Direct Proof (and order-independence)

Since N-Dolphin only works for code in which variables and objects don't change after initialization/creation, we can get the same (correct) answer regardless of the order in which we run separate elements of an expression. Note that this freedom does not extend to changing the structure to combine elements that should have been separate; i.e., in the expression ((3+4)*20)-(2*square_0(3+4)), we could execute the (3+4)*20 before, or after, the 2*square_0(3+4), but we can't re-combine it to do the 20-2 first.

N-Dolphin lets the user select valid components, but only as long as they are valid, so we could choose either of the 3+4's, but not 4+20 or 20-2, in the full expression above. Choosing the left or right side of that substraction is not very interesting; somewhat more interesting is the use of this flexibility to delay the execution of a function parameter, i.e., for square_0(3+4), we could put the 3+4 into the return expression before adding to produce 7, getting (3+4)*(3+4) as an intermediate step ... of course, this still produces 49, as we would hope. But, so far, we've only used N-Dolphin as a rather ecclectic visualizer of program execution. This is about to change: Flexibility of execution order opens the door to direct proof techniques.

For our first proof, we'll use the direct-proof technique, simply relying on the "substitution of equals for equals" principle, with specific rules about what things must be equal. We'll do an extremely simple proof, demonstrating via substition, that power(any_b, 2) is just any_b*any_b, which should come as no surprise. A standard debugger can be used to confirm this fact for any specific value of any_b, but can't show us what must be true for all executions.

In N-Dolphin, we simply guide the sequence of substitutions to express the same things we'd express in a traditional mathematical proof. Depending on the desired pedagogy, we can describe this to the students as refactoring, symbolic execution, or as a proof in which we decide what axiom to apply where, and the computer handles the mechanics.

We begin with what's known: the definition of our power function, expressed recursively to stay within the pure-functional approach, and the starting point of our proof, "power(any_b, 2)", as shown below on the left. Clicking "step in" has no effect, as the first thing to be evaluated is any_b, and we've not defined it. But, if we select the full "power(any_b, 2)", we gain access to the "function-name to body" rule (visible at the bottom of the image), which can be applied even without specific values for parameters, producing the top image on the right. Note that N-Dolphin has expressed the if-else as a Python if-expression, rather than a statement, to ensure the code is legal Python.

An image of python code including a recursive function power_0 and, highlighted, a call power_0(any_b, 2)

Clicking the "function name to body" button replaces the
expression power_0(any_b, 2) with the function's body (as an if-expression).

The recursive function power_0, but now, where the call to power_0 had been, a copy of the body with any_b as the base, and 2 as the exponent

A few clicks of "Step in" select 2==1 and convert it to False, shown below

The recursive function power_0 and copy of the body, but now 2==1 has been changed to False

We can now use "Step in", or the "if False" axiom, the expression containing "if False", to replace the entire if expression with the expression from the "else" branch, shown in red below.

The recursive function power_0 and copy of the body, after simplification by 'if-false' rule

We can now use "Step in", or the "if False" axiom, the expression containing "if False"

The recursive function power_0 and the expression any_b*any_b

We have, thus, transformed power(any_b, 2) to any_b*any_b. More interesting direct proofs are possible, and N-Dolphin also has features to support proof-by-cases, and proof-by-induction.

Proof-by-cases: the "if-irrelevant" rule

(details coming soon-ish ... email davew if you want them now, or see the "Bridge" paper)

Abstraction, in calls to library code or student-written functions

Many standard debuggers provide a "Step over" feature; we have found this to be a valuable tool in teaching abstraction and recursion, as it emphasises the purpose and overall effect of the function.

N-Dolphin provides a similar feature; in the previous example, we could have clicked "Step over" instead of repeatedly clicking "Step in", to turn square_0(7) directly into 49.

Additionally, (details coming soon-ish ... email davew if you want them now, or see the "Bridge" paper).

Inductive Proof: the "substitute-spec-simpler" rule

(details coming soon-ish ... email davew if you want them now, or see the "Bridge" paper).

Note that we still consider inductive proof to be a form of deductive reasoning, in the sense that it "[starts] from a set of general premises and then [draws] a specific conclusion that contains no more information than the premises themselves", rather than generalizing from premises based on experience/experiment, i.e., testing (an important software-development technique in its own right, but different from refactoring in terms of reasoning structure).

Availability

In Fall 2024, we plan to upload resources and archive data sets from our studies of the teaching and testing of abstraction-centric recursion, including a link to a git repo. If you need something promptly, please email davew@cs.haverford.edu.

Errata

There are some known bugs with the logic in the Bridge paper, having to do with making incorrect code seem to execute correctly. We are in the process of correcting those in N-Dolphin, a list of known problems is here.