Topic: automated testing

Subtopic: automatic testing

Quote: code can automatically generate highly complex test cases by running on constrained, symbolic input; EXE does not generate false positive testcases [cadaC10_2006]
Quote: automated testing of concurrency libraries should locate bugs quickly, step through the schedule that produced the bug, and have coverage guarantees [coonEE1_2010]
Quote: AutoTest reports faults in either implementation or contract; no false alarms; displays a witness for each failure with the first offending instruction [meyeB9_2009]
Quote: AutoTest generates class instances and reuses old instances; tests routines with contracts and arguments; on failure, generates a reduced version [meyeB9_2009]

Subtopic: testable specifications

Quote: QuickCheck runs testable specifications on a large number of cases; the specification language consists of Haskell classes, as does the test data generation language; tests can satisfy complex invariants [claeK9_2000]
Quote: wrote a unification language along with a QuickCheck testable specification; it was a lot of work; a false sense of security when it passes a large number of trivial tests [claeK9_2000]
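
The idea of running a testable specification on many generated cases can be sketched in a few lines. This is a minimal, hand-rolled illustration in Python, not QuickCheck itself; check_property, gen_int_list, and both properties are hypothetical names chosen for the example.

```python
import random

def check_property(prop, gen, runs=200, seed=0):
    """Run prop on `runs` generated inputs; return the first counterexample or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = gen(rng)
        if not prop(value):
            return value
    return None

def gen_int_list(rng):
    """Hypothetical test data generator: lists of 0 to 10 small integers."""
    return [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]

def prop_reverse_twice(xs):
    # Specification: reversing a list twice yields the original list.
    return list(reversed(list(reversed(xs)))) == xs

def prop_already_sorted(xs):
    # Deliberately false specification, to show a counterexample being reported.
    return sorted(xs) == xs

holds = check_property(prop_reverse_twice, gen_int_list)
counterexample = check_property(prop_already_sorted, gen_int_list)
```

Many of the generated lists here are empty or very short, which is the kind of trivial case the second quote warns about; a real generator would need to control that distribution.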

Subtopic: automated error reporting

Quote: Windows Error Reporting (WER) for a billion computers; collects and classifies error data automatically; progressive data collection to reduce overheads; automatically identifies available fixes [kinsK7_2011]
Quote: Windows Error Reporting attempts to define one bucket per bug; collects name, minidump, system configuration, full memory dump, program dump, files, or queried data [kinsK7_2011]
Quote: Windows Error Reporting uses expanding heuristics for multiple buckets, and condensing heuristics for fewer buckets; e.g., hang_wait_chain of synchronization objects up to user-input thread [kinsK7_2011]
Quote: Windows Error Reporting identified 5000 bugs in Vista beta testing [kinsK7_2011]

Subtopic: automated performance testing

Quote: Toddler is an automatic oracle for performance bugs; reports repetitive code loops and memory-access patterns; identified old and new performance bugs [nistA5_2013]

Subtopic: static and dynamic veritesting

Quote: veritesting alternates between static and dynamic symbolic execution using an SMT solver; static testing of dynamically generated control flow graph; found 11,000 bugs in 4,000 programs [avgeT6_2016]

Subtopic: regression test

Quote: PL.8 optimizer bugs produce catastrophic results since optimizer applies to all code; regression test of 150 self-checking programs [auslM6_1982]
Quote: need systematic regression testing on both distributing and receiving sites; can automate acceptance test; more reliable under change [nowiDA8_1978]
Quote: a third of the performance bugs due to workload shifts after release or unrelated code changes [jinG6_2012]

Subtopic: code and dataflow

Quote: FindBugs identifies 50 bug patterns using the inheritance hierarchy, linear code scan, control flow graph, and dataflow analysis; implemented with BCEL; most tests are short [hoveD12_2004]

Subtopic: bug finding

Quote: comparison of Java bug-finding tools; wide range in the kinds of bugs found by different tools; should use multiple tools [rutaN11_2004]
Quote: Java bug-finding tools produce a large volume of warnings; no correlation of warning counts per bug class per pairs of tools; lines of code not strongly correlated to warning counts [rutaN11_2004]
Quote: bug-finding GUIs were invaluable when learning to use the tools; they group classes of bugs together and hyperlink warnings in source code; GUIs conflict with a meta-tool that extracts analysis results [rutaN11_2004]

Subtopic: network simulator

Quote: a network simulator was the most effective bug-finding tool for Megastore; deterministically explore message orderings and delays; diagnose bugs via logs; add problematic event sequences to the test suite [bakeJ1_2011]

Subtopic: usage rules

Quote: find bugs by automatic extraction of usage rules; as templates (beliefs) with few contradictions [englD10_2001]
Quote: use finite state model to check proper usage of uid-setting system calls; build a finite state model of the program; check for privileged regions [chenH8_2002]

Subtopic: not stop at first error

Quote: automated tools should continue despite crashes and other faults; e.g., a test harness [meyeB9_2009]
Quote: automated testing must trigger as many failures as possible; not stop at the first [meyeB8_2008]

Subtopic: crash diagnosis

Quote: an execution index identifies a unique execution point across runs; determine the execution index from a multicore dump file [weerD3_2010]
Quote: align a good, deterministic run with the execution index of a failed, multicore dump; compare differences between shared variables; quickly generate a failure inducing schedule from this point; 2% overhead [weerD3_2010]
Quote: automatic diagnosis of Windows XP failures; 83% third-party failures, 12% hardware, and 5% Microsoft bugs [murpB11_2004]

Subtopic: crash testing

Quote: whitebox fuzzing identifies most memory corruptions in the first few days of testing, but new crashes continue to occur throughout the 23 day test session; identify a crash via its stack hash [bounE5_2015]
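
Identifying a crash by its stack hash can be sketched as follows. This is an illustration only: it assumes a frame is identified by its function name, and the parser functions and inputs are hypothetical. Crashes with the same call stack land in the same bucket.

```python
import hashlib
import traceback

def stack_hash(exc, depth=5):
    """Hash the function names of the innermost `depth` frames of an exception."""
    frames = traceback.extract_tb(exc.__traceback__)[-depth:]
    key = "|".join(frame.name for frame in frames)
    return hashlib.sha1(key.encode()).hexdigest()[:12]

def parse_header(data):
    return data[0]          # IndexError on empty input

def parse_length(data):
    return 10 // data[1]    # ZeroDivisionError when the second byte is 0

def parse(data):
    parse_header(data)
    return parse_length(data)

def bucket_crashes(inputs):
    """Group crashing inputs by stack hash; non-crashing inputs are ignored."""
    buckets = {}
    for data in inputs:
        try:
            parse(data)
        except Exception as exc:
            buckets.setdefault(stack_hash(exc), []).append(data)
    return buckets

buckets = bucket_crashes([b"", b"\x01\x00", b"\x01\x02", b"\x07\x00"])
```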
Quote: with debugging checks and asserts, crash testing a compiler is surprisingly effective [sherF11_2007]
Quote: most system crashes occur within 10 seconds of injecting 10 hardware and software faults; e.g., a memory bit flip or memory allocation error; about 40% of the time, no crash within 15 minutes [ngWT4_2001]

Subtopic: weak memory

Quote: use stressing and fuzzing to reveal errors in GPU applications due to weak memory effects; identify root cause and suggest GPU fences [soreT6_2016]

Subtopic: fuzzing

Quote: QuickFuzz is an automatic random fuzzer for common file formats; random mutations of QuickCheck generated files with Haskell file-format-handling libraries [grieG9_2016]
Quote: large scale, constraint-based white-box fuzz testing across hundreds of large Windows applications over seven years; generate test inputs by symbolic execution on binary traces and constraint solving [bounE5_2015]
Quote: white-box fuzz testing found one-third of all file fuzzing bugs in Windows 7; largest usage ever for a SMT solver; bugs missed by everything else, including static program analysis [bounE5_2015]
Quote: SAGE found a third of all Windows 7 bugs discovered by file fuzzing; largest usage of SMT solver [godeP3_2012]
Quote: whitebox fuzzing -- explore input and program behavior by gathering constraints on inputs from conditional branches, then solving for their negation [godeP3_2012]
Quote: SAGE security tester for binary code; what you fuzz is what you ship [godeP3_2012]
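
The loop of gathering branch constraints, negating one, and solving for a new input can be sketched on a toy program. This is not SAGE: the program, the branch recorder, and the solver are hypothetical, and a brute-force search over a small integer domain stands in for an SMT solver.

```python
from itertools import product

def program(x, y, trace):
    """Toy program under test; each branch records (predicate, outcome taken)."""
    def branch(pred):
        taken = pred(x, y)
        trace.append((pred, taken))
        return taken
    if branch(lambda x, y: x > 10):
        if branch(lambda x, y: x + y == 42):
            raise RuntimeError("bug reached")
    return "ok"

def solve(constraints, domain=range(-64, 65)):
    """Stand-in for a constraint solver: exhaustive search over a small domain."""
    for x, y in product(domain, repeat=2):
        if all(pred(x, y) == want for pred, want in constraints):
            return (x, y)
    return None

def whitebox_fuzz(seed, max_runs=20):
    worklist, seen, crashes = [seed], set(), []
    while worklist and len(seen) < max_runs:
        inp = worklist.pop(0)
        if inp in seen:
            continue
        seen.add(inp)
        trace = []
        try:
            program(*inp, trace)
        except RuntimeError:
            crashes.append(inp)
        # Negate each branch in turn, keeping the path prefix before it.
        for i, (pred, taken) in enumerate(trace):
            new = solve(trace[:i] + [(pred, not taken)])
            if new is not None and new not in seen:
                worklist.append(new)
    return crashes

crashes = whitebox_fuzz((0, 0))
```

Starting from the seed (0, 0), the first run records only the outer branch; negating it yields an input with x > 10, and negating the inner branch on that run yields an input that reaches the bug.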

Subtopic: automated test case generation by forks of symbolic execution

Quote: EXE automatically finds and generates test cases for failures and crashes; EXE tracks the input constraints for each input-derived memory location; EXE forks at conditional checks of constrained inputs [cadaC10_2006]

Subtopic: automated test generation

Quote: QuickCheck runs testable specifications on a large number of cases; the specification language consists of Haskell classes, as does the test data generation language; tests can satisfy complex invariants [claeK9_2000]
Quote: test synthesis for deadlock detection from single threaded executions; detect cycles in the lock dependency relation and search backwards for data dependencies; found 61 deadlocks including 45 true positives [samaM10_2014]
Quote: meaningful, randomized test programs by interleaving static analysis with code generation; no undefined behaviors [yangX6_2011]
Quote: most compiler bugs found with 81 KByte C programs as generated by Csmith [yangX6_2011]
Quote: test case generation is tedious; AutoTest generates tests, test values, and test objects [meyeB9_2009]
Quote: adaptive random testing generates test cases by object distance; no more effective than random, but finds different faults [meyeB9_2009]
Quote: the argument-less boolean queries of a class partition the object state space; e.g., an account is overdraft or not; improves test coverage and finds more faults [meyeB9_2009]
Quote: generate SQL by random parse trees and delayed decisions; propagate state information downwards and decisions upwards [slutD8_1998 OK]
Quote: must automate the generation of SQL tests; manually written tests cover a tiny fraction of the SQL input domain [slutD8_1998 OK]
Quote: Cleanroom estimates the user input distribution to generate test cases [lingRC10_1988]
Quote: generate test cases by selecting operations randomly according to the operational profile and input states randomly within its domain [musaJD3_1993]
Quote: TOPD's checker generates test cases from the model and determines if the procedure's result matches the model's result [hendP9_1975]
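
Generating test cases from an operational profile can be sketched as weighted random selection. The profile, the operation names, and the amount range below are all hypothetical; only the selection scheme (operations by profile probability, input states uniformly within their domain) follows the quotes above.

```python
import random

# Hypothetical operational profile: estimated probability of each user operation.
PROFILE = {"deposit": 0.5, "withdraw": 0.3, "balance": 0.15, "close": 0.05}

def generate_test_case(rng, length=8):
    """Pick operations by profile weight and arguments uniformly within their domain."""
    ops = rng.choices(list(PROFILE), weights=list(PROFILE.values()), k=length)
    case = []
    for op in ops:
        # Assumed input domain: amounts 1..1000 for operations that take an amount.
        amount = rng.randint(1, 1000) if op in ("deposit", "withdraw") else None
        case.append((op, amount))
    return case

case = generate_test_case(random.Random(1))
```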

Subtopic: limit constraint generation time

Quote: for whitebox fuzzing almost all tasks take less than 200 seconds for symbolic execution and 200 seconds for constraint solving; most outliers are unsatisfiable; enforce limits on constraint generation time and number of constraints per run [bounE5_2015]

Subtopic: automated debugging

Quote: automatic fault localization by identifying likely program invariants from similar passing runs [sahoSK3_2013]

Subtopic: test oracle

Quote: pass/fail by checksum of the program's non-pointer global variables; all compilers and compiler options must produce the same checksum [yangX6_2011]
Quote: a test oracle determines if a test passed; AutoTest relies on contracts already present in the code [meyeB9_2009]
Quote: test by a pseudo-oracle that independently implements a program and compares results; use a very high level language [daviMD_1981]
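
A pseudo-oracle can be as simple as a slow, obviously correct second implementation. In this sketch, isqrt_fast plays the program under test and isqrt_oracle the independent implementation; both functions and the input range are hypothetical choices for illustration.

```python
import random

def isqrt_fast(n):
    """Implementation under test: integer square root by Newton's method."""
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x

def isqrt_oracle(n):
    """Pseudo-oracle: slow linear search that is easy to check by inspection."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r

def disagreements(runs=500, seed=0):
    """Return the random inputs on which the two implementations differ."""
    rng = random.Random(seed)
    cases = [rng.randint(0, 10_000) for _ in range(runs)]
    return [n for n in cases if isqrt_fast(n) != isqrt_oracle(n)]

mismatches = disagreements()
```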

Subtopic: falsifier

Quote: our falsifier detects 80 generic and project-specific errors; difficulty 0 errors concern one statement, e.g., lint; difficulty 1 errors violate a finite state property, e.g., initialization error; difficulty 2 errors are failed assertions [branD10_2000]
Quote: project-specific symptoms are assertions; a falsifier requires some theorem proving; e.g., index out of range [branD10_2000]

Subtopic: compare multiple implementations

Quote: many performance bugs caught by comparison to other software [jinG6_2012]
Quote: automated, stochastic testing of SQL across multiple database managers; randomly generate SQL statements [slutD8_1998 OK]
Quote: compare SQL queries across multiple vendors; avoid sorting by comparing row counts and column checksums [slutD8_1998 OK]
Quote: example of comparing random SQL across multiple database systems; 0.07% had possible errors [slutD8_1998 OK]
Quote: SQL comparison across database systems not effective for testing NULL, strings, and numeric type coercion; needed for portability [slutD8_1998 OK]
Quote: compiler testing by comparing output with output from other compilers; found several hundred bugs in the compiler and pre-existing code; test cases easily created [sherF11_2007]
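
Comparing query results by row count and column checksums, without sorting, can be sketched as follows. The row data and the use of CRC-32 over repr are assumptions made for the example; the source does not say which checksum was used.

```python
import zlib

def result_signature(rows):
    """Order-independent summary: row count plus one additive checksum per column."""
    if not rows:
        return (0, ())
    sums = [0] * len(rows[0])
    for row in rows:
        for i, value in enumerate(row):
            sums[i] = (sums[i] + zlib.crc32(repr(value).encode())) % 2**32
    return (len(rows), tuple(sums))

# Hypothetical results of the same query from three database systems.
vendor_a = [("ann", 30), ("bob", 41), ("cy", 25)]
vendor_b = [("cy", 25), ("ann", 30), ("bob", 41)]   # same rows, different order
vendor_c = [("ann", 31), ("bob", 41), ("cy", 25)]   # one value differs

same_ab = result_signature(vendor_a) == result_signature(vendor_b)
same_ac = result_signature(vendor_a) == result_signature(vendor_c)
```

Because each column is summed independently, this check cannot detect values that are swapped between rows within one column; it trades that precision for not having to sort large results.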

Subtopic: test bus

Quote: over a century ago, telephone companies added a test bus to their phone switches; the built-in test access allowed automatic, nightly testing of every phone line; problems found and fixed before subscribers noticed [martRC7_2005]
Quote: a software test bus is an API for unit and acceptance tests, using the same APIs as the user interface [martRC7_2005]

Subtopic: test case reduction

Quote: minimization generates a simpler test that exhibits the same fault [meyeB9_2009]
Quote: test case minimization retains only instructions that involve the target and its arguments; if successful, much smaller test case [meyeB9_2009]
Quote: automated testing should include automated test case simplification [zellA2_2002]
Quote: use Delta Debugging to help identify bugs tripped by a test; e.g., reduced an HTML page to a single line [zellA11_2001]
Quote: automatic simplification of erroneous SQL; helped debug problems [slutD8_1998 OK]
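
Test case reduction can be sketched with a simplified variant of delta debugging that only tries removing chunks, with progressively finer granularity. The function name ddmin follows common usage; the failure predicate and input below are hypothetical stand-ins for "the program still crashes on this input".

```python
def ddmin(data, fails):
    """Shrink `data` while fails(data) stays True, until no single item can be removed."""
    assert fails(data)
    n = 2  # number of chunks to split the input into
    while len(data) >= 2:
        chunk = max(1, len(data) // n)
        reduced = False
        for start in range(0, len(data), chunk):
            candidate = data[:start] + data[start + chunk:]  # drop one chunk
            if fails(candidate):
                data, n, reduced = candidate, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(data):
                break                    # already at single-item granularity
            n = min(len(data), n * 2)    # retry with smaller chunks
    return data

page = '<td align=left><SELECT NAME="priority" MULTIPLE SIZE=7>'
reduced = ddmin(page, lambda s: "<SELECT" in s)
```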

Subtopic: database testing

Quote: use small, canned, in-memory databases for testing; create database just before test execution; faster, known content, repeatable, self-contained [martRC7_2005]
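
A small, canned, in-memory database created just before the test might look like this sketch, which uses Python's standard sqlite3 module. The table, its rows, and the function under test are hypothetical.

```python
import sqlite3

def make_test_db():
    """Create a small in-memory database with known content."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")
    db.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(1, "ann", 100), (2, "bob", -20), (3, "cy", 0)],
    )
    return db

def overdrawn_owners(db):
    """Hypothetical code under test: owners with a negative balance."""
    rows = db.execute("SELECT owner FROM account WHERE balance < 0 ORDER BY owner")
    return [owner for (owner,) in rows]

def test_overdrawn():
    db = make_test_db()  # fresh database for each test, so runs are repeatable
    try:
        assert overdrawn_owners(db) == ["bob"]
    finally:
        db.close()

test_overdrawn()
```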

Subtopic: randomized testing

Quote: Csmith found 25 P1 GCC bugs and 300 more bugs in mainstream C compilers; fixed test suites are inadequate [yangX6_2011]
Quote: wynot minimized the random input that crashed a program; crash identification by stack trace [zellA2_2002]
Quote: race-directed random testing postpones potential race conditions until another thread conflicts; no false warnings; RaceFuzzer [senK6_2008]

Subtopic: inapplicable test

Quote: check class invariant and precondition prior to executing a test; avoids running inapplicable tests [meyeB9_2009]
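
Checking the invariant and precondition before running a generated test can be sketched as below. The Account class, its invariant, and the assumed precondition of withdraw are hypothetical; in AutoTest these would come from contracts already in the code.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def invariant(self):
        return self.balance >= 0

    def withdraw(self, amount):
        self.balance -= amount

def run_case(account, amount):
    """Run one generated test of withdraw, skipping it when it does not apply."""
    # Assumed precondition of withdraw: 0 < amount <= balance.
    if not account.invariant() or not (0 < amount <= account.balance):
        return "inapplicable"
    old = account.balance
    account.withdraw(amount)
    ok = account.invariant() and account.balance == old - amount
    return "pass" if ok else "fail"

results = [
    run_case(Account(100), 30),   # precondition holds
    run_case(Account(100), 500),  # amount exceeds balance
    run_case(Account(-5), 1),     # object already violates the invariant
]
```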

Subtopic: exceptions

Quote: automated test of exception-safety; ThisCanThrow() throws an exception when a global counter becomes zero; rerun with increasing values of the counter; operations must be exception-neutral [abraD4_1998]
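
The counter-driven exception test can be sketched in Python as follows. The source describes a C++ technique; here this_can_throw mirrors ThisCanThrow(), while the Injected exception, the operation append_pair, and the checked guarantee (state unchanged on failure) are hypothetical choices for the example.

```python
class Injected(Exception):
    """Exception raised by the fault-injection hook."""

countdown = 0

def this_can_throw():
    """Raise when the global counter reaches zero."""
    global countdown
    countdown -= 1
    if countdown == 0:
        raise Injected()

def append_pair(stack, a, b):
    """Operation under test: push both items or, on an exception, neither."""
    this_can_throw()
    new = stack + [a]
    this_can_throw()
    new = new + [b]
    this_can_throw()
    stack[:] = new  # commit only after every step that can throw

def check_exception_safety():
    """Rerun with increasing counter values until the operation completes."""
    global countdown
    n = 1
    while True:
        stack = [0]
        countdown = n
        try:
            append_pair(stack, 1, 2)
        except Injected:
            assert stack == [0], "state changed despite the exception"
            n += 1
            continue
        assert stack == [0, 1, 2]
        return n - 1  # number of injection points exercised

injection_points = check_exception_safety()
```

The operation lets Injected propagate to the caller unchanged, which is what exception-neutral means in the quote above.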

Subtopic: sampling

Quote: use sampling to identify predicates that are always true for a bug; called deterministic bugs [liblB6_2003]
Quote: use adaptive profiling to identify memory leaks in long running programs; sample code segments inversely to execution frequency; a leak is a stale object that is not accessed; SWAT has a low false positive rate [chilTM10_2004]

Subtopic: manual testing

Quote: 44% of Apollo bugs found manually, despite spending half the budget on simulation [hamiM12_2008]
Quote: 60% of verification and validation errors existed in previous Apollo flights; subtle errors that did not cause problems [hamiM12_2008]

Subtopic: problems with automated testing

Quote: wrote a unification language along with a QuickCheck testable specification; it was a lot of work; a false sense of security when it passes a large number of trivial tests [claeK9_2000]
Quote: test frameworks do not help with the labor-intensive tasks of preparing test cases, interpreting test results, and minimizing test cases [meyeB9_2009]
Quote: automatic testing of software through its UI is slow, opaque, and dangerous; even tiny changes can cause many tests to fail or become inoperable [martRC7_2005]

Related

Group: debugging
Topic: automated tests of specifications and designs
Topic: data-driven design
Topic: debugging by usage rules
Topic: debugging techniques
Topic: model checker
Topic: simulation
Topic: software configuration
Topic: statistical testing
Topic: test hardware
Topic: test data selection
Topic: test scripts
Topic: testing by data mutation
Topic: testing by program mutation
Topic: testing by voting or N-version
Topic: testing multithreaded code

Subtopics

automated debugging
automated error reporting
automated performance testing
automated test case generation by forks of symbolic execution
automated test generation
automatic testing
bug finding
code and dataflow
compare multiple implementations
crash diagnosis
crash testing
database testing
inapplicable test
limit constraint generation time
manual testing
network simulator
not stop at first error
problems with automated testing
randomized testing
regression test
static and dynamic veritesting
test bus
test case reduction
test oracle
testable specifications
usage rules
weak memory

Updated barberCB 7/05
Copyright © 2002-2023 by C.B. Barber
Thesa, Avev, and thid-... are trademarks of C.B. Barber