Notes on the test modules

  Quick Start
  Introduction
  Test modules and other files
  Running tests
  Interpreting test output


Quick Start

To run all the tests, change to the test directory, then type this
command:

  cpath

Then type this command:

  all_test
  ...
  ... test output ...
  ...

On Unix-like systems, you may have to type:

  source cpath
  source all_test

You may see some messages about differences; they might even say
'Fail', like this:

  Comparing files equal_equiv_test.log and equal_equiv_test.ref
  37c37
  < Fail: b = c does not match s1 = t1 with { s1:b, t1:a }
  ---
  > Fail: b = c does not match s1 = t1 with { t1:a, s1:b }

Don't worry: these differences are benign (see the explanation of test
output in the last section of this document). The 'Fail' here is not a
test failure; it indicates that the checker rejected an invalid proof,
as intended.


Introduction

This document discusses the test modules in the Flip logical
framework. The test modules contain many proofs in each of the Flip
logics. Together they demonstrate every formula class and inference
rule, so they are useful as examples in addition to testing for
errors.

Before reading this document, it would be helpful to look over the
Flip web pages.


Test modules and other files

The test directory contains all the test modules and the other files
that support testing:

  *_test.py          - Test modules; each contains test cases coded in Python
  command_wrapper.py - Helper imported by *_save_test.py, back*_test.py,
                       and *_apply_test.py
  ex*ref.py          - Helpers imported by *save_test.py and back*_test.py
  *_test.log         - Output from running the corresponding test module.
                       Not included in the Flip distribution; generated by
                       running plogdiff (see below)
  *_test.ref         - Like .log, but renamed .ref to serve as the
                       reference output for plogdiff
  cpath              - Shell script that puts the Flip code directory on
                       PYTHONPATH, so test modules can be run from the
                       test directory
  plogdiff           - Shell script that runs a test module, saves its
                       output in a .log file, and compares that to the
                       .ref file
  *_all_test         - Shell scripts that run plogdiff on several test
                       modules
  all_test           - Shell script that runs all the *_all_test scripts

The shell scripts are for Unix-like systems, including Linux and Mac
OS X. The batch command files cpath.bat, plogdiff.bat, *_all_test.bat,
and all_test.bat are like the shell scripts, but for Windows systems.


Running tests

To prepare to run tests, change to the test directory and run the
command that puts the Flip code directory on the path:

  > cpath

You only need to do this once in a terminal session.

To run a single test module and show its output, run Python with the
test module file name as its argument, including the .py suffix:

  > python subproof_q_test.py
  ...
  ... test output ...
  ...

To run a single test module and compare its output to the saved
output in the .ref file, run plogdiff with the test module name as
its argument, without the .py suffix:

  > plogdiff subproof_q_test
  Comparing files subproof_q_test.log and SUBPROOF_Q_TEST.REF
  FC: no differences encountered

To run several tests, or all the tests, just issue the appropriate
command without arguments:

  > fol_all_test
  ...

  > all_test
  ...

The same commands work on both Unix-like and Windows systems; the
system command interpreter selects the shell script or the batch
command file, as appropriate.
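The plogdiff scripts are described above only by their behavior: run
a test module, capture its output in a .log file, and compare that to
the .ref file. For readers who want that behavior spelled out, here is
a minimal Python sketch of it. It is illustrative only; it is not the
shell script or batch file shipped with Flip (those use the system
diff and fc commands), and the file and function names here are
invented for the sketch.

  # plogdiff_sketch.py - illustrative Python rendering of what the
  # plogdiff script does, as described above. Not part of Flip.
  import difflib
  import subprocess
  import sys

  def plogdiff(module):
      # Run the test module (e.g. 'subproof_q_test') with the current
      # Python interpreter; capture everything it prints in the .log file.
      log_name = module + '.log'
      ref_name = module + '.ref'
      with open(log_name, 'w') as log:
          subprocess.call([sys.executable, module + '.py'],
                          stdout=log, stderr=subprocess.STDOUT)
      # Compare the fresh .log file to the saved .ref reference output.
      with open(log_name) as log, open(ref_name) as ref:
          diff = list(difflib.unified_diff(log.readlines(), ref.readlines(),
                                           log_name, ref_name))
      if diff:
          sys.stdout.writelines(diff)   # show differences, as diff/fc would
      else:
          print('no differences encountered')

  if __name__ == '__main__':
      plogdiff(sys.argv[1])   # e.g. python plogdiff_sketch.py subproof_q_test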
Interpreting test output

A test module produces a lot of output. Each module contains several
proofs, and each proof is a test case. Running the module checks and
prints each proof. After each valid proof, the module prints True.
After the first invalid step in a proof, the module prints an error
message that begins 'Fail: ...', then prints False, then goes on to
the next proof.

Many test cases are invalid proofs. This is intended. Error messages
and False in the test output do not mean that the checker has failed;
they show that the checker has detected the invalid proofs.

The checker indicates most errors by printing an error message, but
it indicates errors in the number or types of formula constructor
arguments by raising an exception. Test modules are coded to catch
these exceptions, then print a message:

  ... caught exception ...

After catching an exception, a test module can continue executing
test cases.

Test modules themselves should not raise exceptions. If a test module
prints a traceback with a message about an uncaught exception, that
test case has failed. A test module exits after the first uncaught
exception, but a script or batch file that invokes several test
modules may continue executing after an exception.

Errors are detected by comparing the test output to the reference
output in a .ref file. The contents of each .ref file are the
expected (correct) output from its test module. The plogdiff command
performs the comparison; it is invoked repeatedly by the *_all_test
commands. A successful test is usually indicated by plogdiff
reporting no differences.

Differences usually indicate errors. However, some kinds of
differences are expected; they do not indicate errors.

The order of key/value pairs in printed dictionaries may differ
between runs. This is expected and does not indicate an error:

  Comparing files subproof_q_test.log and SUBPROOF_Q_TEST.REF
  ***** subproof_q_test.log
  ||~q  (6) E-Elimination (2) (3) (5)
  Fail: ~q does not match Q1 with { v1:x, P1:P(x), Q1:F, v2:a }
  False
  ***** SUBPROOF_Q_TEST.REF
  ||~q  (6) E-Elimination (2) (3) (5)
  Fail: ~q does not match Q1 with { v1:x, P1:P(x), v2:a, Q1:F }
  False
  *****

On Unix-like systems the format of these messages is a bit different:

  Comparing files equal_equiv_test.log and equal_equiv_test.ref
  37c37
  < Fail: b = c does not match s1 = t1 with { s1:b, t1:a }
  ---
  > Fail: b = c does not match s1 = t1 with { t1:a, s1:b }

Memory addresses may differ between runs. This is also expected and
does not indicate an error:

  Comparing files fol_apply_test.log and FOL_APPLY_TEST.REF
  ***** fol_apply_test.log
  f(x,g(x,y)) = z  (2) Given
  Apply()  (3) E-Introduction (1), with {: 1}
  Fail: apply command requires argument: {term:variable}
  ***** FOL_APPLY_TEST.REF
  f(x,g(x,y)) = z  (2) Given
  Apply()  (3) E-Introduction (1), with {: 1}
  Fail: apply command requires argument: {term:variable}
  *****
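The exception handling described above follows a simple pattern. Here
is a minimal sketch of it; the names run_case, make_proof, and check
are hypothetical stand-ins invented for this sketch, not Flip's actual
API. Only the try/except shape is the point:

  # Sketch of the exception-handling pattern described above. The
  # callables passed in are hypothetical stand-ins, not Flip's API.
  def run_case(make_proof, check):
      try:
          # Building the proof may raise an exception if a formula
          # constructor gets the wrong number or types of arguments.
          proof = make_proof()
      except Exception as exc:
          # Expected for some deliberately invalid test cases: report
          # the exception and go on to the next test case.
          print('... caught exception ...', exc)
          return
      # Checking a valid proof prints True; the first invalid step
      # prints a message beginning 'Fail: ...' and then False.
      print(check(proof))

This is why 'caught exception' messages in the output, like 'Fail'
messages, do not by themselves indicate test failures.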