EuroSPI 2000 
Practical and innovation based software process improvement to prepare for the new millennium.
European Software Process Improvement
SPI and Testing

 
 
 
 
 

WHEN
Release Decision Metrics
 
 
 

Niels Bruun Svendsen
& John A. Fodeh

B-K Medical A/S
Denmark


 
 
 

Introduction

When developing systems and software, the inevitable management question is: "When is the system ready for release?" Ultimately, deciding when to release a new product for production and sales means balancing time-to-market against quality. Ship now and win market share, or postpone the release for a higher-quality product? To estimate the economic impact of releasing a product, the release decision must be based on quantitative arguments and consequences.
This paper shares the experiences gained and the lessons learned from introducing metric-based release decision support.

To set up a metrics program, the process described in the CMU/SEI handbook "Goal-Driven Software Measurement" was applied. Apart from leading to a well-defined set of metrics, the impact on the organisation was remarkable. The metrics program resulted in a "Release Form", i.e. a data sheet containing a set of metrics collected during the system test phase together with other relevant information needed for assessing the product's readiness for release.
A number of the metrics included in the Release Form have been applied in multiple releases, and the results have been evaluated after each release. This paper highlights the benefits found as well as the problems encountered. Furthermore, it emphasises the effect that introducing and working with these metrics has had on the organisation.

The work has been supported by the European Commission and this paper is part of the final dissemination of the ESSI – Process Improvement Experiment (PIE) project 27498 – WHEN. The PIE has three objectives, of which two are related to release decision support, the topic of this paper.
 
 

Company Context

B-K Medical develops, produces and markets ultrasound systems for medical diagnostic imaging. The systems are sold throughout the world, with the major markets being Europe, USA and Asia. B-K Medical has 250 employees, 170 of them located in Denmark. The development department consists of 60 employees, of whom 20 are involved in software development. B-K Medical is ISO 9001 certified, and most of the products have FDA market clearance and conform to the CE-Medical Device Directive. Therefore, internal as well as external audits are performed accordingly. No formal assessment against a model has been performed. Nevertheless, an informal self-assessment using the BootCheck tool from ESI has been performed. This assessment gave maturity ratings between 2.5 and 3.25, indicating some areas in need of improvement to reach the Defined (3) level, and a general lack of the metrics required at the Managed (4) level.
 
 

Project Initiation

The work on release decision metrics was initiated by the management group. It has its basis in the release meetings held within the management group, where the decision is made whether to release a new product or not. This decision was found to be based too much on subjective input and too little on objective data. Therefore, a goal was set to improve the basis for the release decision and thereby give objective support on when to release. A brainstorm was held with the management group on the data and information needed to better support the release decision. The brainstorm served as input for the work to be done and made the management group fully aware that the improvement work had started and that it addressed their own problems and wishes.

In order to measure the impact of the improvement, a questionnaire was designed based on the output of the management brainstorm. The questionnaire was applied after each of three releases: one before the improvement work was started, one halfway through and one at the end of the improvement work.

The questionnaire consists of 7 questions, each to be rated on one of 5 levels. The questions are:

1. How would you characterise the basis for release decision in general?
2. How were the remaining known errors and their consequences presented?
3. How was the presentation of how much had been tested?
4. How was the presentation of how thorough the user evaluation was?
5. How was the estimate on remaining unknown errors?
6. How was the estimate on remaining unknown safety errors?
7. How was the post-release plan presented?

The five rating levels are:
1. ___ Non existent
2. ___ Weak (Very subjective)
3. ___ Fair (Subjective, but well argued)
4. ___ Very Good (Mainly objective)
5. ___ That’s how to do it (Objective, based on solid data)

The initial average score was 2.4, indicating that the basis for the release decision was altogether subjective. The results obtained during the WHEN project can be seen in The Results section.
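As a minimal sketch of how such an average can be computed and related to the rating levels, consider the following. The individual ratings are hypothetical (only the averages are reported here), and the helper names `average_score` and `nearest_level` are our own illustration.

```python
from statistics import mean

# The five rating levels from the questionnaire.
LEVELS = {
    1: "Non existent",
    2: "Weak",
    3: "Fair",
    4: "Very Good",
    5: "That's how to do it",
}

# Hypothetical ratings for the 7 questions (assumption, not survey data).
ratings = [3, 2, 3, 2, 2, 2, 3]


def average_score(values):
    """Return the mean rating, rounded to one decimal."""
    if not all(1 <= v <= 5 for v in values):
        raise ValueError("ratings must be on the 1-5 scale")
    return round(mean(values), 1)


def nearest_level(score):
    """Map an average score to the closest rating level name."""
    return LEVELS[min(5, max(1, int(score + 0.5)))]


score = average_score(ratings)
print(score, nearest_level(score))
```

An average between two levels is best read as lying between them; the nearest-level mapping is only a convenience for reporting.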
 
 

Setting up a Metrics Program

At the time of the WHEN project definition, a number of metrics were identified. In addition, the brainstorm with the management group gave input to additional measurements that could be applied. It was decided to test whether the planned metrics and techniques were optimal for supporting the objectives of the experiment. The "Goal-Driven Software Measurement" method was selected to set up the relation between the objectives and the measurements to be performed. This handbook, developed by SEI (Software Engineering Institute) at Carnegie Mellon University, ref. [2], delivers a formalised and structured method for decomposing the Business Goals of the PIE into a set of metrics and clarifying the dependencies between the metrics as well as the actions needed for collecting them.

The method builds on the GQM (Goal Question Metric) method by Basili and Rombach, and extends the GQM with a phase that guides the user from Business Goals through Sub-Goals to Measurement Goals. In overview, the method can be illustrated as shown in Figure NBS-JAF 1.
 


Figure NBS-JAF 1: Goal-Driven Software Measurement method

The work on Goal-Driven Software Measurement was conducted as a series of workshops involving the newly formed system test group and an external mentor. The system test group consisted of a senior test manager, a senior SW engineer and two newly employed test managers. The first step on the way from Business Goals to Sub-Goals was to ask questions concerning the process involved. As shown in Figure NBS-JAF 2, this raised another question, i.e. which process are we talking about? It was found that although work is performed according to an ISO 9001 certified quality system, the process definitions were either lacking or not detailed enough, and the terms used were not defined. The "Goal-Driven Software Measurement" handbook uses a concept called "Mental Models". Mental Models are the perception of procedures, processes and practices in the mind of the user. Such models can work when only one person is using the model, although the model has a tendency to change according to the current situation. The problem occurs when more people are involved and only Mental Models exist, because there is at least one Mental Model per person involved.
 


Figure NBS-JAF 2: Which process?

Getting the individual Mental Models aligned and written down in process definitions took quite some time. Afterwards, however, the discussions could focus on continuing the Goal-Driven Software Measurement process instead of on the proper use of terms and which sub-processes existed.

Having reached the point where a number of Sub-Goals were defined, we were ready to apply the GQ(i)M part of the process. The (i) stands for indicator and is an addition to the GQM that we found valuable. The idea is to make sketches of the desired presentation of the measurement results. This makes the measurements more real and "alive" and generates a number of additional discussions and ideas. Examples of indicators can be seen in Figure NBS-JAF 3.

Several measurements were defined using the Goal-Driven Software Measurement method. A number of these were selected as our release decision metrics. Notably, some Sub-Goals did not directly result in measurements, but rather pointed out the need for templates, checklists etc.
 


Figure NBS-JAF 3: Indicators

The final step was to prepare a plan that addressed the identified actions needed for both implementing the measures and completing the templates and checklists.
This plan set the framework for the WHEN PIE activities and established a reference for further improvements of the processes.
 
 

The Release Decision Metrics

The selected release decision metrics can be grouped into 4 main groups. Based on these groups, a Release Template was developed.
 

The Release Template

In the following, the developed Release Template is shown. The release template is a data sheet containing the gathered metrics regarding the state of the system to be released. The data sheet is usually delivered to the management group a few hours or a day before the actual release meeting, so that its contents can be studied in advance.
In the template, the actual value of each metric is shown together with a target value and a reference value (i.e. the actual value from a former release of the same or another product). At this time, targets have only been set for a few of the metrics. It is planned to add further target values as we obtain the data to base them on.
In addition to the metrics table, the release template includes two charts. One shows the stability of the system for each of the builds during the system test phase (Figure NBS-JAF 5), and the other shows the error trend based on the error detection rate (Figure NBS-JAF 6). The error trend has been one of the key metrics introduced and is described further in the Error Trending section.

The test coverage data include the information concerning the progress and completeness of the testing. A low value reveals insufficient testing effort and the risk of potential latent defects. It is planned to extend this section with code coverage data for quantifying the portion of the code that is exercised by testing, thereby showing the thoroughness of the applied testing techniques.

The system stability section delivers vital information about the reliability of the software. The data shows the mean number of operations between failures, equivalent to the widely used Mean Time Between Failures (MTBF) metric. This section refers to the stability chart, which gives a graphical presentation of the mean number of random operations between failures as a function of the build number. The chart contains two limits: the lower limit is the entry criterion for system testing, while the higher limit is the release stability criterion. In this way, it is straightforward to confirm whether the system's stability is adequate for release.
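The stability check described above can be sketched as follows. This is an illustration only: the limit values, the build data and the function names are hypothetical and are not taken from the actual Release Template.

```python
# Hypothetical limits (assumptions for illustration).
ENTRY_LIMIT = 200      # entry criterion for system testing
RELEASE_LIMIT = 1000   # release stability criterion

# Hypothetical data: build number -> (random operations executed, failures).
builds = {
    1: (5000, 40),
    2: (6000, 20),
    3: (8000, 6),
}


def mean_operations_between_failures(operations, failures):
    """Mean number of random operations between failures for one build."""
    if failures == 0:
        # No failure observed: the run length is only a lower bound.
        return float(operations)
    return operations / failures


def classify(value):
    """Compare a stability value against the two chart limits."""
    if value >= RELEASE_LIMIT:
        return "meets release criterion"
    if value >= ENTRY_LIMIT:
        return "meets entry criterion only"
    return "below entry criterion"


for build, (operations, failures) in sorted(builds.items()):
    value = mean_operations_between_failures(operations, failures)
    print(f"build {build}: {value:.0f} operations between failures, {classify(value)}")
```

Plotting these values per build against the two limits gives a chart of the kind referred to as the stability trend.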

The test system status section contains statistics regarding the problems reported during the system test phase. The total number of problems reported is shown and categorised into closed (fixed and verified) and open problems. The open problems are sorted according to their severity. These data deliver a snapshot of the system state at release time, making it possible to take into account the risk and consequence of releasing the system. For example, if the data reveals a large number of open high-severity or non-verified problems, it clearly shows that releasing the system at this moment is a high-risk decision.
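A snapshot of this kind can be derived from a list of problem reports with a few lines of code. The sketch below assumes a simplified report format of (status, severity) pairs; the status and severity names and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical problem reports as (status, severity) pairs.
reports = [
    ("closed", "high"),
    ("closed", "low"),
    ("open", "high"),
    ("open", "medium"),
    ("open", "low"),
    ("closed", "medium"),
    ("open", "low"),
]


def summarise(problem_reports):
    """Count total, closed and open problems, with open ones by severity."""
    total = len(problem_reports)
    closed = sum(1 for status, _ in problem_reports if status == "closed")
    open_by_severity = Counter(
        severity for status, severity in problem_reports if status == "open"
    )
    return {
        "total": total,
        "closed": closed,
        "open": total - closed,
        "open_by_severity": dict(open_by_severity),
    }


summary = summarise(reports)
print(summary)
```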

User feedback during the development is undoubtedly of major importance. The user evaluation section presents relevant data collected during the user evaluation activities. At this time, this section only contains a summary of the raised problem reports and their classification. It is planned to extend this section with information about covered applications, user types, countries, etc.

The post release plan section contains an overview of the activities to be performed after the release of the product, together with the responsibilities, schedules and the date for the subsequent release. The post release plan sends a clear signal that the project does not end with the release of the product. This helps prevent management from allocating all resources to new projects just after release. Instead, the transition phase between projects can be planned efficiently.
 


Figure NBS-JAF 4: Release Template

Figure NBS-JAF 5: Stability Trend

Figure NBS-JAF 6: Error Trend

Error Trending

To estimate the number of remaining unknown errors, error trending is used. Error trending is based on the graph showing the accumulated number of reported errors (y-axis) as a function of test effort (x-axis), as shown in Figure NBS-JAF 6. The test effort is expressed in terms of test days. A test day is the effort equivalent to a typical 8-hour work day of a single tester.

The dots in the graph represent the reported errors, while the line going through the dots is the best-fit line (in the least-squares sense) based on the Weibull function. This line is extrapolated, providing a prediction of how the error finding rate will evolve.

As can be seen, the graph is S-shaped and can be divided into three sections: the first is the section with the slight slope at the beginning, the second is the mid-section with the nearly linear slope, and the third is the section where the graph flattens out. This S-shape is found to correlate with empirical data from software projects. At system test initiation, the error finding rate is low (as the functionality of the software is often restricted to a few areas). The error finding rate increases with the addition of new functionality and the introduction of new errors during the correction of already found errors. Entering the third section, the error finding rate begins to decrease, as it becomes harder to find new errors. Ultimately, the graph flattens. Finding further errors at this stage requires a huge test effort, which shows that the software is possibly ready for release (or that the limitation of the applied testing technique has been reached).
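The fitting step can be illustrated with a small sketch. It assumes the cumulative Weibull form N(t) = total * (1 - exp(-(t/scale)^shape)), which is S-shaped for shape > 1; the exact parametrisation used in ref. [1] may differ. The coarse grid search, the function names and the data are our own illustration and not the tool used in the project.

```python
import math


def weibull_cdf(t, scale, shape):
    """Fraction of all errors expected to be found after t test days."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def fit_error_trend(days, found):
    """Least-squares fit of found ~ total * weibull_cdf(day, scale, shape).

    Scale and shape are searched on a coarse grid. For each pair, the best
    total has a closed form because the model is linear in total.
    """
    best = None
    max_day = max(days)
    for i in range(1, 61):          # scale: 0.05 to 3.0 times the test span
        scale = max_day * i / 20.0
        for j in range(10, 41):     # shape: 1.0 to 4.0 in steps of 0.1
            shape = j / 10.0
            f = [weibull_cdf(t, scale, shape) for t in days]
            total = sum(y * v for y, v in zip(found, f)) / sum(v * v for v in f)
            sse = sum((y - total * v) ** 2 for y, v in zip(found, f))
            if best is None or sse < best[0]:
                best = (sse, total, scale, shape)
    return best[1], best[2], best[3]


# Hypothetical, noise-free data generated from a known curve so that the
# fit can be checked; real data would be accumulated error counts.
days = list(range(5, 105, 5))
found = [120 * weibull_cdf(t, 50.0, 2.0) for t in days]

total, scale, shape = fit_error_trend(days, found)
remaining = total - found[-1]
print(f"estimated total {total:.1f}, remaining unknown errors {remaining:.1f}")
```

The estimate of remaining unknown errors is the fitted plateau minus the number of errors found so far. With real, noisy data a proper non-linear least-squares routine would normally replace the grid search.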

More details on the error trending can be found in ref. [1] and the results obtained by using it are discussed in The Results section.
 
 

The Results

Measurable results have been obtained both on the quality of the release decision support and on the precision of the error trend based estimate of the number of remaining unknown errors. The quality of the release decision support has been measured by means of the previously mentioned questionnaire. Figure NBS-JAF 7 shows how the rating of each of the 7 questions has evolved through the 3 releases, together with the average of the 7 questions for each release. The average score has increased from 2.4 to 3.6. With the level definitions in mind, this means that the basis for the release decision has moved from altogether subjective to mainly objective.
 
 

Figure NBS-JAF 7: Result of Release Decision Questionnaire

One of the major improvements is the estimate of remaining unknown errors. This estimate is based on the error trend. The error trend based estimates are compared with the actual number of errors found in the table below. We conclude from this that the error trend based estimate is an optimistic estimate. It is not highly precise, but it is fairly consistent and far more realistic than a subjective estimate. Our experience is that the error trend based estimate is nearly always perceived as being high, i.e. "Do we really have that many errors left?". In that case, it is important to note that so far the estimate has always been too low.
 

Release             Trend based estimate    Reported after release¹
Pre-WHEN release    5                       15
2100 release 1      17                      35
2100 release 2      31                      35²

¹ All reports count, including change requests.
² Only 3 months of data available. The other results are based on 6 months.
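To make the comparison explicit, the ratio between the estimate and the number of reports can be computed for each release. The sketch below uses the figures from the table; the function name is our own, and the different observation periods (3 versus 6 months) mean that the ratios are not directly comparable.

```python
# (release, trend based estimate, reported after release), from the table.
results = [
    ("Pre-WHEN release", 5, 15),
    ("2100 release 1", 17, 35),
    ("2100 release 2", 31, 35),  # only 3 months of post-release data
]


def estimate_ratio(estimate, reported):
    """Share of the reported errors that the estimate accounted for."""
    return estimate / reported


for name, estimate, reported in results:
    ratio = estimate_ratio(estimate, reported)
    print(f"{name}: estimate is {ratio:.0%} of the reports received")
```

A ratio below 1 in every row corresponds to the observation that the estimate has so far always been too low.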


Lessons Learned

Metrics are valuable in planning and decision making
Clearly, there is substantial pressure to maximise profit by releasing early. On the other hand, the economic losses of releasing a poor-quality product, as well as the damage to goodwill and reputation, may inflict irreversible harm. In the absence of metrics to support the release decision, the state of the system to be released is vague, often resulting in an unnecessary delay of the release.

In this respect, the metrics used for supporting the release decision have shown their value. By giving the management group a more objective release decision basis, a higher degree of freedom in their decision has been obtained. A visible effect has been that management has decided not to delay releases in order to reduce the number of unknown defects at time of release, but to focus on a post-release plan to bring down the impact of post-release errors.

In the planning phase, the metrics have also shown their strength. The ability to give a qualified guess on the effort size of a system test project 9 months ahead, by use of the SW development time to system test time ratio, is convincing. During the system test, the error trend has given input to the planning of the remaining amount of test and needed resources for both testing and error correction.

Furthermore, metrics have taken on the role of a common reference. The stability and error trends in particular gave a common basis for discussing system state: a simple graph that is understood and accepted by top management, project management, developers and QA staff.

Metrics demand maturity or the will to mature
While defining relevant metrics, we soon discovered that there was a need for clear definitions of the processes to base the measures on. In other words, for the metrics to be relevant, a certain level of maturity is required. We did not initially have that level of maturity, but we used the work on metrics to trigger and drive the improvement of process definitions. We experienced major benefits from that work, especially in terms of job motivation: there is no longer any need to spend time on the general way of performing regular routines, so more effort can be put into solving the specific task at hand. Moreover, when time is spent on the process, it is to improve it instead of figuring out what the process is.

As much of the work done was focused on the system test phase, the major impact has been seen in the system test group. The results obtained as well as the discussions generated during the PIE have helped greatly in forming a dynamic and committed group that considers metric-supported process improvement a vital part of the process.
 
 

Conclusions

Incorporating metrics into the development process delivers an effective tool for planning, monitoring, predicting and following up. In particular, having the right metrics has proven highly valuable for finding the proper timing for release. These metrics provide insight into the state and quality of the software system, making it possible to base the release decision on solid data and well-calculated risk rather than intuition and gut feeling.

The conclusion on the use of Goal-Driven Software Measurement to drive the definition of a metrics program is that it can be highly recommended. Although it involved far more work than initially anticipated, it was undoubtedly worthwhile. Looking back, it was a necessary step for bringing the level of maturity up to where measurements start to make sense. As we started out unaware of the missing process definitions etc., Goal-Driven Software Measurement was a perfect trigger for the needed improvement actions. Especially in the system test group, the work completed with Goal-Driven Software Measurement has helped establish a solid infrastructure consisting of well-defined, functional and efficient processes.

In relation to the release decision support, a clear and positive effect has been seen. The greatest positive effect has been seen for the error trend based estimation of number of remaining errors and for the post release plan. These improvements have also triggered an interest in other metrics based on available data. An example of this is the calculation of the general cost of delaying release and comparing that with the cost of field update of the SW. It showed that the cost of updating the SW on all scanners in the field, 6 months after release, equals the loss of delaying the release by only 10 days.
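The comparison behind such a statement is a simple break-even calculation. The sketch below shows the arithmetic only; all monetary figures, the number of scanners and the function names are hypothetical and chosen merely so that the result matches the 10 days mentioned above.

```python
def field_update_cost(scanners_in_field, cost_per_update):
    """Total cost of updating the SW on all scanners in the field."""
    return scanners_in_field * cost_per_update


def break_even_delay_days(update_cost, loss_per_delay_day):
    """Number of days of release delay whose loss equals the update cost."""
    if loss_per_delay_day <= 0:
        raise ValueError("loss per delay day must be positive")
    return update_cost / loss_per_delay_day


# Hypothetical figures (assumptions for illustration only).
update_cost = field_update_cost(scanners_in_field=200, cost_per_update=500)
days = break_even_delay_days(update_cost, loss_per_delay_day=10_000)
print(f"a field update costs as much as delaying the release by {days:.0f} days")
```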

The substantial cost of delaying the release shows the enormous pressure to release early and emphasises the importance of choosing the right release time, as the consequences of a "premature" release may be unrecoverable.

The developed release template will without doubt be used on future releases. It will be enhanced with code coverage and an improved user feedback section. It will also evolve towards defining release criteria by defining more target values.

In a broader sense, this work has helped establish process improvement as a natural part of daily life in the development department.
 
 

References

[1] Niels Bruun Svendsen, Error Trending, Why and How, in: Proceedings of the EuroSPI’99 Conference, pp. 11.41-11.51, Series A – Pori School of Technology and Economics, No A 25, Pori 1999

[2] Robert E. Park, Wolfhart B. Goethert, William A. Florac, "Goal-Driven Software Measurement – A Guidebook", Handbook CMU/SEI-96-HB-002

[3] Linda Rosenberg and Lawrence Hyatt, "Developing a Successful Metrics Program", Software Assurance Technology Center (SATC), USA, 1997

[4] Stephen H. Kan, "Metrics and Models in Software Quality Engineering", Addison-Wesley, USA
 



 

Editors
ISCN LTD, ISCN GesmbH, Schieszstattgasse 4/24, 8010 Graz, and Coordination Office, Florence House, 1 Florence Villas, Bray, Ireland, office@iscn.at, office@iscn.com, office@iscn.ie, Editing Done: 19.7.2002