Software Quality Metrics Overview
Types of Software Metrics
 Product metrics – e.g., size, complexity,
design features, performance, quality
level
 Process metrics – e.g., effectiveness of
defect removal, response time of the fix
process
 Project metrics – e.g., number of
software developers, cost, schedule,
productivity
Software Quality Metrics
 The subset of metrics that focus on quality
 Software quality metrics can be divided into:
 End-product quality metrics
 In-process quality metrics
 The essence of software quality engineering
is to investigate the relationships among in-process metrics, project characteristics, and
end-product quality, and, based on the
findings, engineer improvements in quality
to both the process and the product.
Three Groups of Software
Quality Metrics
 Product quality
 In-process quality
 Maintenance quality
Product Quality Metrics
 Intrinsic product quality
   Mean time to failure
   Defect density
 Customer related
   Customer problems
   Customer satisfaction
Intrinsic Product Quality
 Intrinsic product quality is usually measured by:
   the number of “bugs” (functional defects) in the software (defect density), or
   how long the software can run before “crashing” (MTTF – mean time to failure)
 The two metrics are correlated but different
Difference Between Errors, Defects,
Faults, and Failures (IEEE/ANSI)
 An error is a human mistake that results in
incorrect software.
 The resulting fault is an accidental condition
that causes a unit of the system to fail to
function as required.
 A defect is an anomaly in a product.
 A failure occurs when a functional unit of a software-related system can no longer perform its required function or cannot perform it within specified limits.
What’s the Difference between
a Fault and a Defect?
The Defect Density Metric
 This metric is the number of defects over
the opportunities for error (OPE) during
some specified time frame.
 We can use the number of unique
causes of observed failures (failures are
just defects materialized) to approximate
the number of defects.
 The size of the software in either lines of
code or function points is used to
approximate OPE.
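The calculation can be sketched in code. The example below is an illustrative Python sketch (the function name and inputs are hypothetical, not from the source): it counts unique failure causes as the defect estimate and uses size in KLOC or function points as the approximation of OPE.

```python
# Illustrative sketch: defect density = defects / opportunities for error (OPE),
# approximating defects by the number of unique causes of observed failures
# and OPE by software size (KLOC or function points).

def defect_density(failure_causes, size):
    """Defects per unit of size (e.g., per KLOC or per function point).

    failure_causes -- identifiers of the causes of observed failures
                      (duplicates are ignored, so only unique causes count)
    size           -- software size in KLOC or function points
    """
    unique_defects = len(set(failure_causes))
    return unique_defects / size

# Example: 42 unique failure causes observed in a 120 KLOC product.
print(defect_density(range(42), size=120))   # 0.35 defects per KLOC
```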
Lines of Code
 Possible variations
   Count only executable lines
   Count executable lines plus data definitions
   Count executable lines, data definitions, and comments
   Count executable lines, data definitions, comments, and job control language
   Count lines as physical lines on an input screen
   Count lines as terminated by logical delimiters
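As a concrete illustration of how the operational definition changes the count, here is a small hypothetical Python sketch that counts a file two ways: executable lines only, and executable lines plus comments (blank lines are excluded in both). "Executable" is only approximated here; a real counter would need language-aware parsing.

```python
# Illustrative sketch: two operational definitions of LOC for a Python file.
# "Executable" is approximated as any non-blank, non-comment line; a real
# counter would need language-aware rules (data definitions, JCL, logical
# delimiters, and so on).

def count_loc(source_text):
    executable = with_comments = 0
    for line in source_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                     # blank lines are never counted
        with_comments += 1               # executable plus comment lines
        if not stripped.startswith("#"):
            executable += 1              # executable lines only
    return executable, with_comments

sample = "x = 1\n# a comment\n\ny = x + 1\n"
print(count_loc(sample))                 # (2, 3)
```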
Lines of Code (Cont’d)
 Other difficulties
   LOC measures are language dependent
   Comparisons cannot be made when different languages or different operational definitions of LOC are used
   For productivity studies the problems in using LOC are greater, since LOC is negatively correlated with design efficiency
   Code enhancements and revisions complicate the situation – the defect rate must be calculated for new and changed lines of code only
Defect Rate for New and
Changed Lines of Code
 Depends on the availability of LOC counts for both the entire product and the new and changed code
 Depends on tracking defects to the release origin (the portion of code that contains the defects) and to the release in which that code was added, changed, or enhanced
Function Points
 A function can be defined as a collection
of executable statements that performs a
certain task, together with declarations of
the formal parameters and local
variables manipulated by those
statements.
 In practice functions are measured
indirectly.
 Many of the problems associated with
LOC counts are addressed.
Measuring Function Points
 The number of function points is a weighted total of five major components that comprise an application:
   Number of external inputs x 4
   Number of external outputs x 5
   Number of logical internal files x 10
   Number of external interface files x 7
   Number of external inquiries x 4
Measuring Function Points
(Cont’d)
 The function count (FC) is a weighted total of five major components that comprise an application:
   Number of external inputs x (3 to 6)
   Number of external outputs x (4 to 7)
   Number of logical internal files x (7 to 15)
   Number of external interface files x (5 to 10)
   Number of external inquiries x (3 to 6)
 The weighting factor depends on complexity
Measuring Function Points
(Cont’d)
 Each number is multiplied by the weighting
factor and then they are summed.
 This weighted sum (FC) is further refined by
multiplying it by the Value Adjustment
Factor (VAF).
 Each of the 14 general system characteristics is assessed on a scale of 0 to 5 according to its impact on (importance to) the application.
The 14 System Characteristics
1. Data Communications
2. Distributed functions
3. Performance
4. Heavily used configuration
5. Transaction rate
6. Online data entry
7. End-user efficiency
The 14 System Characteristics
(Cont’d)
8. Online update
9. Complex processing
10. Reusability
11. Installation ease
12. Operational ease
13. Multiple sites
14. Facilitation of change
The 14 System Characteristics
(Cont’d)
 VAF = 0.65 + (sum of the 14 characteristic scores) / 100
 Notice that if an average rating of 2.5 is given to each of the 14 factors, their sum is 35 and therefore VAF = 1
 The final function point total is then the function count multiplied by VAF:
   FP = FC x VAF
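Putting the preceding slides together, the sketch below (hypothetical Python, with made-up component counts and ratings) computes FC from the five weighted components, VAF from the 14 general system characteristic scores, and FP = FC x VAF.

```python
# Illustrative sketch: FP = FC x VAF, following the formulas on the slides.

WEIGHTS = {                               # (low, average, high) weights
    "external_inputs":          (3, 4, 6),
    "external_outputs":         (4, 5, 7),
    "logical_internal_files":   (7, 10, 15),
    "external_interface_files": (5, 7, 10),
    "external_inquiries":       (3, 4, 6),
}

def function_count(counts, complexity="average"):
    idx = {"low": 0, "average": 1, "high": 2}[complexity]
    return sum(counts[name] * WEIGHTS[name][idx] for name in WEIGHTS)

def value_adjustment_factor(gsc_scores):
    # gsc_scores: the 14 general system characteristic ratings, each 0 to 5.
    assert len(gsc_scores) == 14 and all(0 <= s <= 5 for s in gsc_scores)
    return 0.65 + sum(gsc_scores) / 100.0

counts = {"external_inputs": 10, "external_outputs": 8,
          "logical_internal_files": 5, "external_interface_files": 2,
          "external_inquiries": 6}
fc  = function_count(counts)               # 168 with all-average weights
vaf = value_adjustment_factor([2.5] * 14)  # sum 35 -> VAF = 1.0
print(fc * vaf)                            # FP = 168.0
```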
Customer Problems Metric
 Customer problems are all the difficulties
customers encounter when using the product.
 They include:
   Valid defects
   Usability problems
   Unclear documentation or information
   Duplicates of valid defects (problems already fixed but not known to the customer)
   User errors
 The problem metric is usually expressed in terms of problems per user month (PUM)
Customer Problems Metric
(Cont’d)
 PUM = (Total problems that customers reported for a time period) / (Total number of license-months of the software during the period)
   where Number of license-months = (Number of installed licenses of the software) x (Number of months in the calculation period)
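A minimal sketch of the PUM calculation (hypothetical Python; the example figures are made up):

```python
# Illustrative sketch: problems per user-month (PUM).

def pum(total_problems, installed_licenses, months_in_period):
    license_months = installed_licenses * months_in_period
    return total_problems / license_months

# Example: 90 customer-reported problems against 3,000 installed licenses
# over a 3-month period.
print(pum(90, 3000, 3))   # 0.01 problems per user-month
```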
Approaches to Achieving a
Low PUM
 Improve the development process and
reduce the product defects.
 Reduce the non-defect-oriented
problems by improving all aspects of the
products (e.g., usability, documentation),
customer education, and support.
 Increase sales (the number of installed licenses) of the product.
Defect Rate and Customer
Problems Metrics
Comparison of the defect rate and problems per user-month (PUM) metrics:
 Numerator
   Defect rate: valid and unique product defects
   PUM: all customer problems (defects and nondefects, first time and repeated)
 Denominator
   Defect rate: size of product (KLOC or function points)
   PUM: customer usage of the product (user-months)
 Measurement perspective
   Defect rate: producer (the software development organization)
   PUM: customer
 Scope
   Defect rate: intrinsic product quality
   PUM: intrinsic product quality plus other factors
Customer Satisfaction Metrics
(Figure: scope of the three measures – defects are a subset of customer problems, which in turn are a subset of customer satisfaction issues.)
Customer Satisfaction Metrics
(Cont’d)
 Customer satisfaction is often measured by customer survey data via the five-point scale:
   Very satisfied
   Satisfied
   Neutral
   Dissatisfied
   Very dissatisfied
IBM Parameters of Customer
Satisfaction
 CUPRIMDSO
   Capability (functionality)
   Usability
   Performance
   Reliability
   Installability
   Maintainability
   Documentation
   Service
   Overall
HP Parameters of Customer
Satisfaction
 FURPS
   Functionality
   Usability
   Reliability
   Performance
   Service
Example Metrics for
Customer Satisfaction
1. Percent of completely satisfied
customers
2. Percent of satisfied customers (satisfied
and completely satisfied)
3. Percent of dissatisfied customers
(dissatisfied and completely
dissatisfied)
4. Percent of nonsatisfied customers
(neutral, dissatisfied, and completely
dissatisfied)
In-Process Quality Metrics
 Defect density during machine testing
 Defect arrival pattern during machine
testing
 Phase-based defect removal pattern
 Defect removal effectiveness
Defect Density During
Machine Testing
 Defect rate during formal machine
testing (testing after code is integrated
into the system library) is usually
positively correlated with the defect rate
in the field.
 The simple metric of defects per KLOC
or function point is a good indicator of
quality while the product is being tested.
Defect Density During
Machine Testing (Cont’d)
 Scenarios for judging release quality:
   If the defect rate during testing is the same as or lower than that of the previous release, then ask: Did the testing effectiveness for the current release deteriorate?
     If the answer is no, the quality perspective is positive.
     If the answer is yes, you need to do extra testing.
Defect Density During
Machine Testing (Cont’d)
 Scenarios for judging release quality (cont’d):
   If the defect rate during testing is substantially higher than that of the previous release, then ask: Did we plan for and actually improve testing effectiveness?
     If the answer is no, the quality perspective is negative.
     If the answer is yes, then the quality perspective is the same or positive.
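The two scenarios above can be read as a simple decision rule. The sketch below (hypothetical Python) encodes that rule; the factor used to decide what counts as "substantially higher" and the judgments about testing effectiveness are assumptions each organization would supply itself.

```python
# Illustrative sketch of the two release-quality scenarios. The factor used
# to decide what counts as "substantially higher" is an assumption.

def quality_perspective(current_rate, previous_rate,
                        testing_deteriorated, testing_improved_as_planned,
                        substantially_higher_factor=1.5):
    if current_rate <= previous_rate:
        # Scenario 1: same or lower test defect rate than the last release.
        return "positive" if not testing_deteriorated else "do extra testing"
    if current_rate >= previous_rate * substantially_higher_factor:
        # Scenario 2: substantially higher test defect rate than last release.
        return "same or positive" if testing_improved_as_planned else "negative"
    return "inconclusive"

print(quality_perspective(0.4, 0.5,
                          testing_deteriorated=False,
                          testing_improved_as_planned=False))   # positive
```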
Defect Arrival Pattern During
Machine Testing
 The pattern of defect arrivals gives more
information than defect density during
testing.
 The objective is to look for defect arrivals
that stabilize at a very low level, or times
between failures that are far apart before
ending the testing effort and releasing
the software.
Two Contrasting Defect Arrival
Patterns During Testing
Three Metrics for Defect
Arrival During Testing
 The defect arrivals during the testing phase by
time interval (e.g., week). These are raw
arrivals, not all of which are valid.
 The pattern of valid defect arrivals – when
problem determination is done on the reported
problems. This is the true defect pattern.
 The pattern of defect backlog over time. This is
needed because development organizations
cannot investigate and fix all reported problems
immediately. This metric is a workload
statement as well as a quality statement.
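A small sketch (hypothetical Python) of how the three patterns might be derived from a list of reported problems, each recorded as the week it arrived, whether it was later judged a valid defect, and the week it was closed (None if still open):

```python
# Illustrative sketch: weekly raw arrivals, valid defect arrivals, and
# open backlog, derived from (week_opened, is_valid, week_closed) records.

def arrival_and_backlog(reports, num_weeks):
    raw     = [0] * num_weeks    # all reported problems arriving each week
    valid   = [0] * num_weeks    # arrivals later confirmed as valid defects
    backlog = [0] * num_weeks    # problems still open at the end of each week
    for week_opened, is_valid, week_closed in reports:
        raw[week_opened] += 1
        if is_valid:
            valid[week_opened] += 1
        for week in range(week_opened, num_weeks):
            if week_closed is None or week_closed > week:
                backlog[week] += 1
    return raw, valid, backlog

reports = [(0, True, 1), (0, False, 0), (1, True, None), (2, True, 3)]
print(arrival_and_backlog(reports, num_weeks=4))
# ([2, 1, 1, 0], [1, 1, 1, 0], [1, 1, 2, 1])
```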
Phase-Based Defect Removal
Pattern
 This is an extension of the test defect
density metric.
 It requires tracking defects in all phases
of the development cycle.
 The pattern of phase-based defect
removal reflects the overall defect
removal ability of the development
process.
Defect Removal by Phase for
Two Products
Defect Removal Effectiveness
 DRE = (Defects removed during a development phase / Defects latent in the product) x 100%
 The denominator can only be approximated.
 It is usually estimated as:
   Defects removed during the phase + Defects found later
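A minimal sketch of the DRE calculation (hypothetical Python), using the approximation above for latent defects:

```python
# Illustrative sketch: defect removal effectiveness (DRE) for one phase,
# estimating latent defects as those removed in the phase plus those found later.

def dre(defects_removed_in_phase, defects_found_later):
    latent = defects_removed_in_phase + defects_found_later
    return 100.0 * defects_removed_in_phase / latent

# Example: 80 defects removed during design inspections, 20 found later.
print(dre(80, 20))   # 80.0 (%)
```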
Defect Removal Effectiveness
(Cont’d)
 When done for the front end of the
process (before code integration), it is
called early defect removal
effectiveness.
 When done for a specific phase, it is
called phase effectiveness.
Phase Effectiveness of a
Software Product
Metrics for Software
Maintenance
 The goal during maintenance is to fix the
defects as soon as possible with
excellent fix quality
 The following metrics are important:
   Fix backlog and backlog management index
   Fix response time and fix responsiveness
   Percent delinquent fixes
   Fix quality
Fix Backlog
 Fix backlog is a workload statement for
software maintenance.
 It is related to both the rate of defect
arrivals and the rate at which fixes for
reported problems become available.
 It is a simple count of reported problems
that remain at the end of each time
period (week, month, etc.)
Backlog Management Index
(BMI)
 BMI = (Number of problems closed during the month / Number of problem arrivals during the month) x 100%
 If BMI is larger than 100, the backlog is reduced.
 If BMI is less than 100, the backlog is increased.
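A minimal sketch of the BMI calculation (hypothetical Python; the example numbers are made up):

```python
# Illustrative sketch: backlog management index (BMI) for one month.

def bmi(problems_closed, problem_arrivals):
    return 100.0 * problems_closed / problem_arrivals

print(bmi(problems_closed=110, problem_arrivals=100))  # 110.0 -> backlog shrank
print(bmi(problems_closed=85,  problem_arrivals=100))  #  85.0 -> backlog grew
```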
Opened Problems, Closed
Problems, and Backlog
Management Index by Month
Fix Response Time and Fix
Responsiveness
 The fix response time metric is usually calculated as:
     Mean time of all problems from open to closed
   The metric may be used for different defect severity levels.
   Fix response time relates to customer satisfaction.
   But meeting agreed-to fix times is more than just achieving a short fix time.
   A possible metric is the percentage of delivered fixes meeting committed dates to customers.
Percent Delinquent Fixes
 The mean response time metric is a central
tendency measure.
 A more sensitive metric is the percentage of
delinquent fixes (for each fix, if the turnaround
time greatly exceeds the required response
time, it is classified as delinquent).
 Percent delinquent fixes = (Number of fixes that exceeded the response time criteria by severity level / Number of fixes delivered in a specified time) x 100%
Percent Delinquent Fixes
(Cont’d)
 This is not a real-time metric because it
is for closed problems only.
 For a real-time metric we must factor in
problems that are still open.
 We can use the following metric:
   Real-Time Delinquency Index = 100 x Delinquent / (Backlog + Arrivals)
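Both delinquency metrics can be sketched as follows (hypothetical Python; representing each delivered fix as a pair of actual and required turnaround times is an assumed data layout):

```python
# Illustrative sketch: percent delinquent fixes (closed fixes only) and the
# real-time delinquency index (factors in the open backlog and new arrivals).

def percent_delinquent(delivered_fixes):
    # delivered_fixes: (actual_turnaround_days, required_days) per fix.
    delinquent = sum(1 for actual, required in delivered_fixes
                     if actual > required)
    return 100.0 * delinquent / len(delivered_fixes)

def real_time_delinquency_index(delinquent_open, backlog, arrivals):
    return 100.0 * delinquent_open / (backlog + arrivals)

fixes = [(3, 5), (10, 5), (7, 7), (12, 10)]        # (actual, required) days
print(percent_delinquent(fixes))                    # 50.0
print(real_time_delinquency_index(6, 40, 20))       # 10.0
```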
Real-Time Delinquency Index
Fix Quality
 The number of defective fixes is another
quality metric for maintenance.
 The metric of percent defective fixes is
simply the percentage of all fixes in a
time interval that are defective.
 Record both the time the defective fix was discovered and the time the fix was made, so that the latent period of the defective fix can be calculated.
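A small sketch of the two fix quality measures (hypothetical Python; the dates and counts are made up):

```python
# Illustrative sketch: percent defective fixes in an interval, and the latent
# period of one defective fix (time from delivering the fix to discovering
# that it was defective).
from datetime import date

def percent_defective_fixes(total_fixes, defective_fixes):
    return 100.0 * defective_fixes / total_fixes

def latent_period_days(fix_delivered, defect_discovered):
    return (defect_discovered - fix_delivered).days

print(percent_defective_fixes(total_fixes=200, defective_fixes=6))   # 3.0
print(latent_period_days(date(2024, 1, 10), date(2024, 3, 1)))       # 51
```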
Examples of Metrics Programs
 Motorola
   Follows the Goal/Question/Metric paradigm of Basili and Weiss
   Goals:
     1. Improve project planning
     2. Increase defect containment
     3. Increase software reliability
     4. Decrease software defect density
     5. Improve customer service
     6. Reduce the cost of nonconformance
     7. Increase software productivity
Examples of Metrics Programs
(Cont’d)
 Motorola (cont’d)
   Measurement Areas
     Delivered defects and delivered defects per size
     Total effectiveness throughout the process
     Adherence to schedule
     Accuracy of estimates
     Number of open customer problems
     Time that problems remain open
     Cost of nonconformance
     Software reliability
Examples of Metrics Programs
(Cont’d)
 Motorola (cont’d)
   For each goal the questions to be asked and the corresponding metrics were formulated:
     Goal 1: Improve Project Planning
     Question 1.1: What was the accuracy of estimating the actual value of project schedule?
     Metric 1.1: Schedule Estimation Accuracy (SEA)
       SEA = (Actual project duration) / (Estimated project duration)
Examples of Metrics Programs
(Cont’d)
 Hewlett-Packard
   The software metrics program includes both primitive and computed metrics.
   Primitive metrics are directly measurable.
   Computed metrics are mathematical combinations of primitive metrics:
     (Average fixed defects) / (working day)
     (Average engineering hours) / (fixed defect)
     (Average reported defects) / (working day)
     Bang – a quantitative indicator of net usable function from the user’s point of view
Examples of Metrics Programs
(Cont’d)
 Hewlett-Packard (cont’d)
   Computed metrics are mathematical combinations of primitive metrics (cont’d):
     (Branches covered) / (total branches)
     Defects / KNCSS (thousand noncomment source statements)
     Defects / LOD (lines of documentation not included in program source code)
     Defects / (testing time)
     Design weight – sum of module weights (a function of token and decision counts) over the set of all modules in the design
Examples of Metrics Programs
(Cont’d)
 Hewlett-Packard (cont’d)
   Computed metrics are mathematical combinations of primitive metrics (cont’d):
     NCSS / (engineering month)
     Percent overtime – (average overtime) / (40 hours per week)
     Phase – (engineering months) / (total engineering months)
Examples of Metrics Programs
(Cont’d)
 IBM Rochester
   Selected quality metrics:
     Overall customer satisfaction
     Postrelease defect rates
     Customer problem calls per month
     Fix response time
     Number of defect fixes
     Backlog management index
     Postrelease arrival patterns for defects and problems
Examples of Metrics Programs
(Cont’d)
 IBM Rochester (cont’d)
   Selected quality metrics (cont’d):
     Defect removal model for the software development process
     Phase effectiveness
     Inspection coverage and effort
     Compile failures and build/integration defects
     Weekly defect arrivals and backlog during testing
     Defect severity
Examples of Metrics Programs
(Cont’d)
 IBM Rochester (cont’d)
   Selected quality metrics (cont’d):
     Defect cause and problem component analysis
     Reliability (mean time to initial program loading during testing)
     Stress level of the system during testing
     Number of system crashes and hangs during stress testing and system testing
     Various customer feedback metrics
     S curves for project progress
Collecting Software
Engineering Data
 The challenge is to collect the necessary
data without placing a significant burden
on development teams.
 Limit data collection to the metrics that are actually needed; avoid gathering unnecessary data.
 Automate the data collection whenever
possible.
Data Collection Methodology
(Basili and Weiss)
1. Establish the goal of the data collection
2. Develop a list of questions of interest
3. Establish data categories
4. Design and test data collection forms
5. Collect and validate data
6. Analyze data
Reliability of Defect Data
 Testing defects are generally more
reliable than inspection defects since
inspection defects are more subjective
 An inspection defect is a problem found
during the inspection process that, if not
fixed, would cause one or more of the
following to occur:

A defect condition in a later inspection
phase
Reliability of Defect Data
 An inspection defect is one that would cause (cont’d):
   A defect condition during testing
   A field defect
   Nonconformance to requirements and specifications
   Nonconformance to established standards
An Inspection Summary Form