Grading Scripts

Purpose

To grade a homework assignment, you write a grading script: a file with the suffix .gs that unpacks, builds, and executes the student code, then evaluates its output.

Example

To test this simple C++ program, hello.cc:

#include <iostream>

int main() {
    int useless;  // will cause a problem
    std::cout << "Hello, world!\n";
}

This script, hello.gs, might be used:

#! /s/parsons/d/fac/applin/bin/grader

# Rubric (5-point assignment):
#
# • 1.0 compiles
# • 1.0 no warnings
# • 2.0 produces correct output
# • 0.5 produces no standard error output
# • 0.5 no global variables

src=${1:?first argument must be source file}

setting MaxScore 5          # Value of this assignment

rm -f a.out                 # In case left over from previous run
export LANG=C               # Vanilla compiler output, please.
run g++ -Wall $src
test 1.0 "compiles" [[ -x a.out ]]

if ((score<=0)) return      # Give up if we couldn’t even build.

test 1.0 "no warnings" ! grep -qi warning stdout stderr

run ./a.out

test 2.0 "correct output" exact 'Hello, world!\n' stdout
test 0.5 "stderr is empty" empty stderr
globals 0.5 a.out

When executed like this:

    ./hello.gs hello.cc

it would produce the report shown below, which contains three sections:

The overall score and who did the grading (so that students know who to complain to).
The high-level summary, one line per test.
The details of individual tests.

Most students will read only the first line. Some students will read the summary section (one line per test). A few students will actually look up the failing tests from the summary section in the details section.

Score: 4.00/5.00 points
Graded by Jack Applin <applin@CS.ColoState.Edu>

Summary of all tests:
  Value  Result  Test  Description
   1.00  pass       1  compiles
   1.00  FAIL       2  no warnings
   2.00  pass       3  correct output
   0.50  pass       4  stderr is empty
   0.50  pass       5  globals
   4.00                Total
Passed 4 tests, failed 1 test.

Details of individual tests:

Executing: g++ -Wall hello.cc
Exit code: 0
Standard output is empty
Standard error (4 lines):
        hello.cc: In function 'int main()':
        hello.cc:4:9: warning: unused variable 'useless' [-Wunused-variable]
            4 |     int useless;
              |         ^~~~~~~

Test 1: compiles
Status: pass
Condition: '[[' -x a.out ']]'
Value: 1.00

Test 2: no warnings
Status: FAIL
Condition: ! grep -qi warning stdout stderr
Value: 1.00

Executing: ./a.out
Exit code: 0
Standard output (1 line):
        Hello, world!
Standard error is empty

Test 3: correct output
Status: pass
Condition: exact 'Hello, world!\n' stdout
Value: 2.00

Test 4: stderr is empty
Status: pass
Condition: empty stderr
Value: 0.50

Test 5: globals
Status: pass
Condition: No globals used
Value: 0.50

Overview

The most important functions of the grading script are run and test.

run: This doesn’t do any testing; it just runs a command, perhaps with options or redirected input, and saves the output.
test: Take the results of the previous run and assert some condition about them. If the condition succeeds, great! Otherwise, the test has failed.
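
The division of labor between run and test can be pictured with a small sketch in plain shell. This is not the grader's implementation: run_sketch and test_sketch are hypothetical stand-ins, the scratch directory is an assumption, and scoring is left out entirely.

```shell
work=$(mktemp -d)                 # scratch directory for the captured output
trap 'rm -rf "$work"' EXIT        # clean up when the script finishes

# Hypothetical stand-in for run: execute a command and save both streams.
# It judges nothing; "|| true" keeps a failing command from aborting the script.
run_sketch() {
    "$@" > "$work/stdout" 2> "$work/stderr" || true
}

# Hypothetical stand-in for test: evaluate a condition, report pass or FAIL.
test_sketch() {
    title=$1; shift
    if "$@"; then echo "pass  $title"; else echo "FAIL  $title"; fi
}

# A command that writes to both streams, then two assertions about the result.
run_sketch sh -c 'echo "Hello, world!"; echo "oops" >&2'
test_sketch "output contains Hello" grep -q Hello "$work/stdout"
test_sketch "stderr is empty"       test ! -s "$work/stderr"
```

The first assertion passes and the second fails, because the command wrote a line to standard error.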

Up/Down

A grading script can either count up or down.

Counting up: The score is initially zero. A passing test adds its value to the score. A failing test does not change the current score. Start a script that counts up like this:

        setting MaxScore 10	# ten-point assignment, count up

Counting down: The score is initially a non-zero value, the number of points for the assignment. A failing test subtracts its value from the score. A passing test does not change the current score. Start a script that counts down like this:

        let score=5.0	    # five-point assignment
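
The arithmetic of the two modes can be compared with a worked example in plain shell. This is only a sketch: tally is a hypothetical helper, not a grader function, and awk stands in for the floating-point arithmetic. It is fed the five results from the hello.gs report above, where only "no warnings" fails.

```shell
# Hypothetical helper: tally MODE START VALUE:RESULT ...
# Prints the final score after applying each test result in order.
tally() (
    mode=$1 score=$2; shift 2
    for t in "$@"; do
        value=${t%%:*}            # text before the colon
        result=${t##*:}           # text after the colon
        case "$mode:$result" in
            up:pass)   score=$(awk -v s="$score" -v v="$value" 'BEGIN { printf "%.2f", s + v }') ;;
            down:FAIL) score=$(awk -v s="$score" -v v="$value" 'BEGIN { printf "%.2f", s - v }') ;;
        esac                      # every other combination leaves the score alone
    done
    awk -v s="$score" 'BEGIN { printf "%.2f\n", s }'
)

tally up   0 1.0:pass 1.0:FAIL 2.0:pass 0.5:pass 0.5:pass    # start at zero, add passes
tally down 5 1.0:pass 1.0:FAIL 2.0:pass 0.5:pass 0.5:pass    # start at five, subtract failures
```

Both calls print 4.00. The two modes agree here because the test values add up to the value of the assignment; if they did not, the results would differ.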

The Grading Script

The grading script is a Bash script, with additional functions defined. (Really, it’s a zsh script, but treat it like a Bash script.) Here are the functions:

setting name value

Set a parameter. These are the default settings:

`setting MinScore 0.0`	minimum possible score
`setting MaxScore 0.0`	maximum possible score
`setting Visible true`	make control chars visible in report
`setting ExpandTabs true`	expand tabs in stdout & stderr
`setting TrimCR true`	remove DOS carriage returns
`setting TrimWhitespace true`	remove trailing horizontal whitespace
`setting TrimTrailingBlankLines true`	remove trailing blank lines
`setting Merge false`	merge stderr into stdout on output
`setting Flatten true`	flatten unpacked archives
`setting StdinTermNull true`	route terminal stdin from /dev/null
`setting ShowLines 10`	show this many lines in report
`setting MaxFileSize 1000000`	maximum file size in bytes (1MB)
`setting TimeLimit 30`	runtime CPU limit (30 seconds)
`setting Header ''`	show this at the start
`setting Footer ''`	show this at the end

unpack [ -C directory ] archive

Unpack the tar or zip archive into the optional directory (or the current directory if -C not specified). If the setting Flatten true is in effect, which it is by default, then any prefix directories are removed when unpacking. Examples:

        unpack -C Build hw4.tar
        unpack foo.zip

test value title condition

Take the results of the previous run and assert some condition about them. If the test succeeds (and we are counting up), add the value to the current score. If the test fails (and we are counting down), subtract the value from the current score.

The condition can be any of:

[[ condition ]]
(( arithmetic-condition ))
any command, e.g., grep. A zero exit code indicates success. The command should produce no output—just an exit code. You can use >&/dev/null to suppress output.

In bash & zsh, a leading ! negates the exit code of a command. It reverses the sense of the test. grep -q "foo" stdout succeeds if foo is in the output. ! grep -q "foo" stdout succeeds if foo is not in the output.
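
The effect of the leading ! can be checked outside the grader with ordinary shell commands. In this sketch the scratch directory and the file contents are made up for illustration; only grep and the shell's own negation are involved.

```shell
dir=$(mktemp -d)                  # scratch directory; nothing existing is touched
printf 'foo bar\n' > "$dir/stdout"

grep -q "foo" "$dir/stdout";   found=$?     # 0: foo is present
! grep -q "foo" "$dir/stdout"; absent=$?    # 1: negated, so this condition fails
! grep -q "baz" "$dir/stdout"; no_baz=$?    # 0: baz is absent, so this condition passes

echo "found=$found absent=$absent no_baz=$no_baz"
rm -r "$dir"
```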

The title names the test, and is shown in the summary & details sections of the report. Provide a positive title—describe the good condition (e.g., "prompt found") as opposed to the bad condition (e.g., "no prompt found").

Examples:

        test 1.0 "foo is executable" [[ -x foo ]]
        test 1.0 "output contains foo" grep -q foo stdout
        test 0.5 "no warnings" ! grep -qi warning stdout stderr

run [ -C directory ] command arg arg …: In the optional directory (or the current directory if -C not specified), execute command arg arg …, sending standard output to the file stdout, and standard error to the file stderr. If setting Merge true is in effect (false by default), then send both streams to stdout.
The exit code, and contents of stdout and stderr, will be displayed in the detailed results section of the report.
empty file …: Assert that all the given files are empty. Note that stdout and stderr may have been filtered by the settings TrimWhitespace and TrimTrailingBlankLines. Since these are enabled by default, spaces at the end of lines and empty lines at the end of the output have already been removed. Since such trailing whitespace is largely invisible, these settings save a lot of arguments.
Examples:

        test 0.5 "stderr is empty" empty stderr
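
To see why output that consists only of whitespace still counts as empty, here is a sketch of the two filters in plain shell. The function trim_output is hypothetical; it imitates what TrimWhitespace and TrimTrailingBlankLines are described as doing, and is not taken from the grader.

```shell
# Hypothetical stand-in for the TrimWhitespace and TrimTrailingBlankLines filters.
trim_output() {
    sed 's/[[:blank:]]*$//' |     # drop trailing spaces and tabs on every line
    awk '
        /^$/ { blank++; next }    # hold back blank lines for now
        {
            while (blank > 0) {   # text follows, so the held lines were not trailing
                print ""
                blank--
            }
            print
        }'
}

dir=$(mktemp -d)
printf '  \n\t\n\n' | trim_output > "$dir/stderr"      # nothing but whitespace
printf 'Hello  \n\n\n' | trim_output > "$dir/stdout"   # text, then trailing whitespace

[ ! -s "$dir/stderr" ] && echo "stderr is empty after filtering"
printf 'Hello\n' | cmp -s - "$dir/stdout" && echo "stdout is exactly Hello plus a newline"
rm -r "$dir"
```

Blank lines in the middle of the output survive, since only the blank lines at the very end are never followed by text.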

exact string file: Assert that a file contains exactly these contents. Backslash escapes (\n, \t, \0octal, \xhex, \Uunicode, …) are recognized, but no pattern matching. This is not a regular expression. Examples:

        test 1.0 "correct greeting" exact "Hello, world!\n" stdout
        test 1.2 "correct sum" exact "2+2\n4\n" outfile

If you want pattern matching, use grep with -q to suppress the output:

        test 1.0 "foo precedes bar" grep -q "foo.*bar" stderr
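
The difference between an exact comparison and a pattern match can be tried in plain shell. In this sketch exact_sketch is a hypothetical stand-in, not the grader's exact: it expands escapes with printf's %b and compares bytes with cmp, which covers simple escapes such as \n and \t.

```shell
# Hypothetical stand-in for exact: expand backslash escapes, compare byte for byte.
exact_sketch() {
    printf '%b' "$1" | cmp -s - "$2"
}

dir=$(mktemp -d)
printf 'Hello, world!\n' > "$dir/stdout"

# Identical contents, so the comparison succeeds.
exact_sketch 'Hello, world!\n' "$dir/stdout" && whole=same    || whole=differ
# The ".*" is taken literally, so the comparison fails.
exact_sketch 'Hello, w.*\n'    "$dir/stdout" && pattern=same  || pattern=differ
# grep treats ".*" as a pattern, so it matches.
grep -q 'Hello, w.*' "$dir/stdout"           && regex=match   || regex=nomatch

echo "whole=$whole pattern=$pattern regex=$regex"
rm -r "$dir"
```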

badsyms value executable title sym1 sym2 …: Forbid certain symbols (functions). If the given symbols are used in the executable, then report the offending symbols. Example:

        badsyms 1.0 a.out "No C I/O" printf fprintf scanf fscanf

globals value executable exceptions …: Forbid global variables, with exceptions. Examples:

        globals 0.10 Build/a.out	 # no global variables permitted
        globals 0.25 hello program_name	 # allow program_name, but no others

pity value title: If the current score is less than value, set the current score to value. Do this after all the tests, at the end of the grading script.

        pity 0.2 "Minimum score for compilation"
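
The arithmetic behind pity amounts to taking the larger of two numbers. A sketch in plain shell, with pity_sketch as a hypothetical helper and awk standing in for the floating-point comparison:

```shell
# Hypothetical helper: print the score, raised to the minimum if it is lower.
pity_sketch() {
    awk -v score="$1" -v minimum="$2" \
        'BEGIN { printf "%.2f\n", (score < minimum) ? minimum : score }'
}

pity_sketch 0.00 0.2    # below the minimum, so it is raised
pity_sketch 4.00 0.2    # already above the minimum, so it is unchanged
```

The first call prints 0.20 and the second prints 4.00.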

score: It’s a variable, not a function. It contains the current score, as a floating-point number. Most commonly used like this:

        # Unpack the archive
        # Compile the code
        if ((score<=0)) return	    # give up if unpack/compile failed