Software Testing
“There are only 2 hard problems in Computer Science.
Naming things, cache invalidation and off-by-one errors.”
—Phil Haack
“Program testing can be used to show
the presence of bugs, but never to
show their absence!”
—Edsger Dijkstra
Humans are fallible;
software is written by humans;
expect software to have defects.
Testing is the most common way of removing
defects in software and improving the quality
of software.
Foundations; Motivations; Terminology
Principles and Concepts
Levels of Testing
Test Process
Deciding when to stop
Defects are Bad
• At a minimum, defects in software annoy users.
• Glitchy software reflects poorly on the
company issuing the software.
• If defects aren’t controlled during a
software project, they increase the cost and
duration of the project.
• For safety critical systems the consequences
can be even more severe.
Spectacular Failures
Ariane 5, 1996
Rocket + Cargo = $500M
Patriot Missile, 1991
Failed to destroy an Iraqi Scud
missile which hit a barracks.
Therac-25
Software defects between
1985 and 1987 led to 6
accidents. Three patients died
as a direct consequence.
Controlling defects in software
• There are two ways of dealing with the potential for defects
in software:
– The most obvious is: work to identify and remove defects that make
it into the software.
– Another approach that often goes unnoticed is: stop making errors
in the first place. In other words, take action to prevent defects from
ever being injected in the first place. This second approach is called
Defect Prevention.
• Testing is one method of uncovering defects in software.
(Inspection is another.)
• Testing might not be the most efficient method of
uncovering defects but for many companies it is their
primary means of ensuring quality.
What is testing?
• Testing is the dynamic execution of the software
for the purpose of uncovering defects.
• Testing is one technique for improving product
quality. Don’t confuse testing with other, distinct
techniques for improving product quality:
– Inspections and reviews (sometimes called static testing)
– Debugging
– Defect prevention
– Quality assurance
– Quality control
Testing and its relationship
to other related activities
Benefits of Testing
• Testing improves product quality (at least when the
defects that are revealed are fixed),
• The rate and number of defects found during
testing gives an indication of overall product
quality. A high rate of defect detection suggests that
product quality is low. Finding few errors after
rigorous testing increases confidence in overall
product quality. Such information can be used to
decide the release date. Or, it could mean the
testing itself wasn’t rigorous enough…
• Defect data from testing may suggest opportunities for
process improvement preventing certain type of
defects from being introduced into future systems.
Errors, Faults and Failures! Oh my!
• Error or Mistake – human action or inaction that
produces an incorrect result
• Fault or Defect – the manifestation of an error in
code or documentation
• Failure – a deviation of observed behavior from expected
behavior; the incorrect result produced when a fault is executed.
Software Bugs
1947 log book entry for the Harvard Mark II, recording a moth
found trapped in a relay.
Verification and Validation
• Verification and validation are two complementary testing activities.
• Verification – Comparing program outcomes against a
specification. “Are we building the product right?”
• Validation – Comparing program outcomes against user
expectations. “Are we building the right product?”
• Verification and validation is accomplished using both
dynamic testing and static evaluation (peer review)
Principles of Testing
• “Program testing can be used to show the presence
of bugs, but never to show their absence!” [Edsger
Dijkstra] He is speaking of course about nontrivial programs
• Mindset is important. The goal of testing is to
demonstrate that the system doesn’t work
correctly not that the software meets its
specification. You are trying to break it. If you
approach testing with the attitude of trying to
show that the software works correctly, you might
unconsciously avoid difficult tests that threaten
your assumption.
• Should programmers test their own code?
Who should do the testing?
• Developers shouldn’t system test their own code.
• There is no problem with developers unit testing
their own code—they are probably the most
qualified to do so—but experience shows
programmers are too close to their own code to
do a good job of system testing it.
• Independent testers are more effective.
• Levels of independence: independent testers on a
team; testers independent of the team; testers
independent of the organization.
The cost of finding and fixing a defect increases with
the length of time the defect remains in the product
Phase Containment
Cost to correct late-stage defects
• For large projects, a requirements or design error
is often 100 times more expensive to find and fix
after the software is released than during the phase
the error was injected.
Correspondence between Development and different
opportunities for Verification and Validation
Two dimensions to testing
Levels of testing
• Unit – testing individual cohesive units (modules). Usually
white-box testing done by the programmer.
• Integration – verifying the interaction between software
components. Integration testing is done on a regular basis
during development (possibly once a day/week/month
depending on the circumstances of the project).
Architecture and design defects typically show up during
integration testing.
• System – testing the behavior of the system as a whole.
Testing against the requirements (system objectives and
expected behavior). Also a good environment for testing
non-functional software requirements such as usability,
security, performance, etc.
• Acceptance – used to determine if the system meets its
acceptance criteria and is ready for release.
Other types of testing
• Regression testing
• Alpha and Beta testing – limited release of a product
to a few select customers for evaluation before the
general release. The primary purpose of a beta test
isn’t to find defects, but rather, assess how well the
software works in the real-world under a variety of
conditions that are hard to simulate in the lab.
Customers’ impressions start to form during beta
testing, so the product should have release-like quality.
• Stress testing, load testing, etc. – evaluating how the
system behaves under heavy or beyond-normal load.
• Smoke test – a very brief test to determine whether or
not there are obvious problems that would make more
extensive testing futile.
Regression Testing
• Imagine adding a 24-inch lift kit and monster truck tires to
your sensible sedan:
• After making the changes you would of course test the new
and modified components, but is that all that should be
tested? Not by a mile!
Regression Testing [Cont]
• When making changes to a complex system there is no reliable way of
predicting which components might be affected. Therefore, it is
imperative that at least a subset of tests be run on all components.
• In this analogy, that means testing: the heater, air conditioner, radio,
cup holders, speedometer…hmm, that’s interesting, there seems to be a
problem with the speedometer. It significantly understates the speed of
the car.
• On closer inspection you discover the speedometer has a dependency
on wheel size. Who could have predicted it? The implementation for
the speedometer makes an assumption about the wheel size and how
far the car will move for each rotation of the tires. Larger wheels mean
the car is going a greater distance for each revolution.
• Who would have predicted that? Good thing we performed regression
testing!
Regression Testing [Cont]
• Making sure new code doesn’t break old code.
• Regression testing is selective retesting. You want to
ensure that changes or enhancements don’t impair existing
functionality.
• During regression testing you rerun a subset of all test
cases on old code to make sure new code hasn’t caused
old code to regress or stop working properly.
• It’s not uncommon for a change in one area of code to
cause a problem in another area. Designs based on loose
coupling can mitigate this tendency but regression testing
is still needed in order to increase the assurance there were
no unintended consequences of a program change.
Testing Objectives
• Conformance testing (aka correctness or functional testing)
– does the observed behavior of the software conform to its
specification (SRS)?
• Non-functional requirements testing – have non-functional
requirements such as usability, performance and reliability
been met?
• Regression testing – does an addition or change break
existing functionality?
• Stress testing – how well does the software hold up under
heavy load and extreme circumstances?
• Installation testing – can the system be installed and
configured with reasonable effort?
• Alpha/Beta testing – how well does the software work
under the myriad of real-world conditions?
• Acceptance testing – how well does the software work in
the user’s environment?
Integration Strategies
• What doesn’t work?
– All-at-once or Big Bang – waiting until all of the
components are ready before attempting to build the
system for the first time. Not recommended.
• What does work?
– Top-Down – high-level components are integrated and
tested before low level components are complete.
Example high-level components: life-cycle methods of
component framework, screen flow of web application.
– Bottom-Up – low-level components are integrated and
tested before top-level components. Example low-level
components: abstract interface onto database,
component to display animated image.
– Incremental features
Advantages of Incremental/
Continuous Integration
• Easier to find problems. If there is a
problem during integration testing it is most
likely related to the last component
integrated—knowing this usually reduces
the amount of code that has to be examined
in order to find the source of the problem.
• Testing can begin sooner. Big bang testing
postpones testing until the whole system is available.
Top-Down Integration
• Stubs and mock objects are substituted for as yet
unavailable lower-level components.
• Stubs – A stub is a unit of code that simulates the activity
of a missing component. A stub has the same interface as
the low-level component it emulates, but is missing some
or all of its full implementation. Stubs return minimal
values to allow the functioning of top-level components.
• Mock Objects – mock objects are stubs that simulate the
behavior of real objects. The term mock object typically
implies a bit more functionality than a stub. A stub may
return pre-arranged responses. A mock object has more
intelligence. It might simulate the behavior of the real
object or make assertions of its own.
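The stub/mock distinction above can be sketched in code. This is a minimal illustration, not from any mocking framework; all names (PaymentGateway, OrderService, etc.) are hypothetical:

```java
// Hypothetical scenario: integrating a high-level OrderService before the
// real payment component exists.
interface PaymentGateway {
    boolean charge(String account, int cents);
}

// Stub: same interface, canned minimal behavior -- just enough to let
// top-level components run.
class PaymentGatewayStub implements PaymentGateway {
    public boolean charge(String account, int cents) {
        return true;  // always "succeeds"
    }
}

// Mock: like a stub, but also records how it was used so a test can make
// assertions about the interaction.
class PaymentGatewayMock implements PaymentGateway {
    int chargeCalls = 0;
    int lastAmount = -1;
    public boolean charge(String account, int cents) {
        chargeCalls++;
        lastAmount = cents;
        return true;
    }
}

class OrderService {
    private final PaymentGateway gateway;
    OrderService(PaymentGateway gateway) { this.gateway = gateway; }
    boolean placeOrder(String account, int cents) {
        return gateway.charge(account, cents);
    }
}
```

A test would pass the mock to OrderService and then assert both the return value and that charge() was called exactly once with the expected amount.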
Bottom-Up Integration
• Scaffolding code or drivers are used in
place of high-level code.
• One advantage of bottom-up integration is
that it can begin before the system
architecture is in place.
• One disadvantage of bottom-up integration
is it postpones testing of system
architecture. This is risky because
architecture is a critical aspect of a software
system that needs to be verified early.
Continuous Integration
• Top-down and bottom-up is how you are
going to integrate.
• Continuous integration is when or how
often you are going to integrate.
• Continuous integration = frequent
integration where frequent = daily, maybe
hourly, but not longer than weekly.
• You can’t find integration problems early
unless you integrate frequently.
Test Process
Test planning
Test case generation
Test environment preparation
Test results evaluation
Problem reporting
Defect tracking
Testing artifacts/products
• Test plan – who is doing what when.
• Test case specification – specification of actual
test cases including preconditions, inputs and
expected results.
• Test procedure specification – how to run test cases.
• Test log – results of testing
• Test incident report – record and track errors.
Test Plan
• “A document describing the scope,
approach, resources, and schedule of
intended test activities. It identifies test
items, the features to be tested, the testing
tasks, who will do each task, and any risks
requiring contingency planning.” [IEEE std]
Test Case
• “A test case consists of a set of input values,
execution preconditions, expected results
and execution post-conditions, developed to
cover certain test conditions”
• When you run a test there has to be some
way of determining if the test failed.
• For every test there needs to be an oracle
that compares expected output to actual
output in order to determine if the test passed or failed.
• For tests that are executed manually, the
tester is the oracle. For automated unit tests,
actual and expected results are compared
with code.
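A coded oracle can be as simple as comparing the actual result against a hard-wired expected value. A minimal sketch (class and method names are illustrative, not from any test framework):

```java
// The unit under test.
class MathUtil {
    static int add(int a, int b) {
        return a + b;
    }
}

// An automated test: the comparison against the expected value IS the oracle.
class AddTest {
    static boolean testAdd() {
        int actual = MathUtil.add(2, 3);
        int expected = 5;  // known correct answer supplied by the test author
        return actual == expected;
    }
}
```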
Test Procedure
• “Detailed instructions for the setup,
execution, and evaluation of results for a
given test case.”
Incident Reporting
• What you track depends on what you need to
understand, control and estimate.
• Example incident report:
Testing Strategies
• Two very broad testing strategies are:
– White-Box (Transparent) – Test cases are
derived from knowledge of the design and/or
implementation (source code).
– Black-Box (Opaque) – Test cases are derived
from external software specifications.
Testing Strategies
Black-Box Techniques
• Equivalence Partitioning – Tests are divided into
groups according to the criterion that two test cases
are in the same group if both test cases are likely
to find the same error. Classes can be formed
based on inputs or outputs.
• Boundary value analysis – create test cases with
values that are on the edge of equivalence classes.
Equivalence Partitioning
What test cases would you use to test the
following routine?
// This routine returns true if score is >= 50% of
// possiblePoints, else it returns false.
// This routine throws an exception if either input
// is negative or score is > possiblePoints.
boolean isPassing(int score, int possiblePoints);
Equivalence Classes            Expected Result
Score/Possible Pts >= 50%      true
Score/Possible Pts < 50%       false
Score > Possible Pts           exception
Score < 0                      exception
Possible Pts < 0               exception

Test Cases
Test Case # | Test Case Data | Expected Outcome | Classes Covered
• Write test cases covering all valid equivalence classes.
Cover as many valid equivalence classes as you can with
each test case. (Note, there are no overlapping equivalence
classes in this example.)
• Write one and only one test case for each invalid
equivalence class. When testing a value from an
equivalence class that is expected to return an invalid result
all other values should be valid. You want to isolate tests
of invalid equivalence classes.
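These guidelines can be made concrete with a hypothetical implementation of isPassing that matches the specification, plus one test per equivalence class (each invalid class tested in isolation, with all other inputs valid):

```java
// Hypothetical implementation matching the isPassing specification:
// true when score is at least 50% of possiblePoints; throws on invalid input.
class Grading {
    static boolean isPassing(int score, int possiblePoints) {
        if (score < 0 || possiblePoints < 0 || score > possiblePoints) {
            throw new IllegalArgumentException("invalid input");
        }
        return score * 2 >= possiblePoints;  // score >= 50% of possiblePoints
    }

    // Test helper for the invalid classes: true if the call threw.
    static boolean throwsFor(int score, int possiblePoints) {
        try {
            isPassing(score, possiblePoints);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

Five test cases then cover the five classes: (50, 100) → true, (49, 100) → false, and one isolated invalid value each for score > possiblePoints, score < 0, and possiblePoints < 0.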
Boundary Value Analysis
• Rather than select any element within an
equivalence class, select values at the edge
of the equivalence class.
• For example, given the class 1 <= input <= 12,
you would select values: 0, 1, 12, 13.
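As a sketch, a hypothetical validator for the class 1 <= input <= 12 (say, a month number), exercised with values just inside and just outside each boundary:

```java
// Hypothetical validator for the equivalence class 1 <= month <= 12.
class MonthValidator {
    static boolean isValid(int month) {
        return month >= 1 && month <= 12;
    }
}
```

Boundary value analysis picks 1 and 12 (the edges) plus 0 and 13 (just outside the edges), rather than arbitrary interior values like 6.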
Experience-Based Techniques
• Error guessing – “testers anticipate defects
based on experience”
Testing Effectiveness Metrics
• Defect density
• Defect removal effectiveness (efficiency)
• Code coverage
Defect Density
• Software engineers often need to quantify how buggy a
piece of software is. Defect counts alone are not very
meaningful though.
• Is 12 defects a lot to have in a program? Depends on the
size of the product (as measured by features or LOC).
– 12 defects in a 200 line program = 60 defects/KLOC → low quality
– 12 defects in a 20,000 line program = 0.6 defects/KLOC → high quality
• Defect counts are more interesting (meaningful) when
tracked relative to the size of the software.
Defect Density [Cont]
• Defect density is an important measure of software quality.
• Defect density = total known defects / size.
• Defect density is often measured in defects/KLOC. (KLOC =
thousand lines of code)
• Dividing by size normalizes the measure which allows
comparison between modules of different size.
• Size is typically measured in LOC or FP’s.
• Measurement is over a particular time period (e.g. from system
test through one year after release)
• Might calculate defect density after inspections to decide which
modules should be rewritten or give more focused testing.
• Be sure to define LOC. Also, consider weighting defects: a
severe defect is worse than a trivial one.
• Beware: using defect density as a performance target gives the
wrong incentive (e.g. under-reporting defects, padding code size).
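The normalization is a one-line calculation. A sketch with illustrative names, reproducing the 200-line vs 20,000-line comparison from the previous slide:

```java
// Defect density = known defects / size, normalized here to defects per
// KLOC (thousand lines of code).
class DefectDensity {
    static double perKloc(int defects, int linesOfCode) {
        return defects / (linesOfCode / 1000.0);
    }
}
```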
Defect Density [Cont]
• Defect density measures can be used to track
product quality across multiple releases.
Defect removal effectiveness
• DRE tells you what percentage of defects that are present
are being found (at a certain point in time).
• Example: when you started system test there were 40
errors to be found. You found 30 of them. The defect
removal effectiveness of system test is 30/40 or 75%.
• The trick of course is calculating the latent number of
errors at any one point in the development process.
• Solution: to calculate latent number of errors at time x,
wait a certain period after time x to learn just how many
errors were present at time x.
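The system-test example above (30 of 40 defects found, DRE = 75%) reduces to a single ratio. A sketch with illustrative names:

```java
// Defect removal effectiveness for a phase: defects found during the phase
// divided by defects present at the start of the phase.
class DefectRemoval {
    static double effectiveness(int foundInPhase, int presentAtStart) {
        return (double) foundInPhase / presentAtStart;
    }
}
```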
Example Calculation of Defect
Removal Effectiveness
Levels of White-Box Code Coverage
• Another important testing metric is code
coverage. How thoroughly have paths
through the code been tested.
• Some of the more popular options are:
Statement coverage
Decision coverage (aka branch coverage)
Condition coverage
Basis path coverage
Path coverage
Statement Coverage
• Every statement in the code is executed.
if (a)
  if (b)
    foo();
• a=T, b=T gives statement coverage
• a=T, b=F doesn’t give statement coverage (foo() never runs)
Decision Coverage
• Decision coverage is also known as branch coverage.
• The boolean condition at every branch point (if,
while, etc.) has been evaluated to both T and F.
if (a and b)
  ...
if (c)
  ...
• a=T,b=T,c=T and a=F,b=?,c=F gives decision coverage
Does statement coverage guarantee
decision coverage?
if (a)
  foo();
• If no, give an example of input that gives
statement coverage but not decision coverage.
Condition Coverage
• Each boolean sub-expression at a branch
point has been evaluated to true and false.
if (a and b)
• a=T,b=T and a=F,b=F gives condition coverage
Condition Coverage
• Does condition coverage guarantee decision
coverage?
if (a and b)
• If no, give example input that gives
condition coverage but not decision coverage.
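The answer can be checked concretely. With the input pair (a=T, b=F) and (a=F, b=T), each sub-expression takes both truth values (condition coverage), yet the whole decision is false both times, so the true branch is never taken (no decision coverage). A minimal sketch with an illustrative class name:

```java
// Condition coverage without decision coverage on (a && b):
// the pair of tests (true,false) and (false,true) gives each sub-expression
// both values, but the decision as a whole never evaluates to true.
class CoverageDemo {
    static boolean decision(boolean a, boolean b) {
        return a && b;
    }
}
```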
Basis Path Coverage
• A path represents one flow of execution from the
start of a method to its exit.
• For example, a method with 3 sequential decisions
has 2³ = 8 paths:
if (a)
  ...
if (b)
  ...
if (c)
  ...
while a chain of 3 decisions has 4 paths:
if (a)
  ...
elseif (b)
  ...
elseif (c)
  ...
Basis Path Coverage
• Loops in code make path coverage
impractical for most programs.
• Each time through a loop is a new path.
• A practical alternative to path coverage is
basis path coverage.
Basis Path Coverage
• Basis path coverage is the set of all linearly
independent paths through a method or
section of code.
• The set of linearly independent paths
through a method are special because this
set is the smallest set of paths that can be
combined to create every other possible
path through a method.
Path Coverage
• Path coverage is the most comprehensive
type of code coverage.
• In order to achieve path coverage you need
a set of test cases that executes every
possible route through a unit of code.
• Path coverage is impractical for all but the
most trivial units of code.
• Loops are the biggest obstacle to achieving
path coverage. Each time through a loop is a
new/different path.
Path Coverage
• How many paths are there in the following
unit of code?
if (a)
if (b)
if (c)
Path Coverage
• What inputs (test cases) are needed to achieve path
coverage on the following code fragment?
procedure AddTwoNumbers()
top: print “Enter two numbers”;
read a;
read b;
print a+b;
if (a != -1) goto top;
Deciding when to stop testing
• “When the marginal cost of finding another
defect exceeds the expected loss from that
defect.”
and expected loss from that defect) can only
be estimated.
• Stopping criteria are something that should
be determined at the start of a project. Why?
Peer Reviews
Pair Programming
Code Review
Technical review vs. management review
