EuroSPI 2000 
Practical and innovation based software process improvement to prepare for the new millennium.
European Software Process Improvement
SPI and Requirements


Linguistic methods of Requirements Engineering (NLP)

The incorporation of linguistic methods (NLP)
in the investigation and quality assurance of requirements –
an experience report from industrial practice

Christine Rupp
SOPHIST GmbH, Nürnberg


Key Words

requirements, requirements analysis, requirements engineering, linguistic methods, software analysis, specification, specification document, Neuro Linguistic Programming (NLP), Meta-model of language.


In order to avoid errors in the analysis phase of software engineering, and to further integrate clients' and users' needs in that phase, IRE 9000 (Integrated Requirements Engineering) intends to use linguistic techniques to investigate and review requirements. This is not another approach to describing requirements formally, but a method for working on natural language requirements directly at the specification document level. The overall goal of object engineering is to produce high-quality, legally binding requirements in natural language as a basis for systems development.

SOPHIST Ltd. transferred research from other sciences, i.e. linguistics and psychology, to computer science, and the result is an easy-to-use set of tools to find and avoid ambiguous, incomplete and inconsistent requirements. This paper will examine the various linguistic phenomena that underlie errors found in requirements. Furthermore, the paper will show how these errors can be found and corrected.


Many studies show that the majority of all defects in software products originate in the analysis phase. Mistakes that originate in the early phases of software development cause a "snowball effect" which propagates many more mistakes into later phases of development, where they are much harder to remedy. Sometimes the situation is beyond repair. Approximately 60% of all serious mistakes in programs can be traced back to deficiencies in the analysis phase.

In order to control exploding costs and development time, the root causes that lie in the analysis phase must be removed before they can do serious damage. Improvements at the analysis stage level are more effective than in the design phase and incomparably better than in the implementation phase. The earlier an error is found and fixed, the less damage it can do to later development and the better the important development parameters of cost and time can be controlled. The first phase in the development of a project is decisive; it usually decides the success or failure of the project.

Requirements Engineering

Requirements Engineering (RE) concerns itself with the discovery, documentation, organization, review and administration of requirements for a software product. From an analyst’s point of view, a typical scenario is as follows:

A semi-detailed specification document with excessively large sections and vague formulations of requests is submitted in natural language (prose) to the client’s analyst. The analyst may then transform this specification document into a semi-formal representation such as an SA, SADT or OO model. On the basis of this representation, the analyst researches further and adds important points that he determines through dialog with the client or user(s). The original specification document loses all relevance, because the analyst's model now shapes all new requirements.

The result of this quite everyday scenario opposes the thought and purpose of the original representation because the analysis model should be understandable for all participants in the process and not merely from the analyst’s point of view. A clear representation enables the user to follow what the analyst models. The user can immediately "change course" if he notices that the model has gaps.

Perhaps some points are missing or are not adequately addressed. Finally, (a) the user is more certain that he will receive what he wants and (b) in case of a dispute regarding system functionality (keyword: acceptance), he is not left in a legally precarious position.

For the reasons mentioned above, natural language requirements (prose requirements) must now be given primary status, contrary to the previous convention. This does not mean that the current (semi-)formal representations of requirements are incorrect or useless; object engineering requires the translation of prose requests into an integration model and a simulation model. Rather, the requirements shall be represented in natural language, and the subsequent work with the natural language requirements is an integral component of the analysis process.

General consequences for the representation of requirements

In the development of a system, two different points of view emerge. On the one hand there are the analysts, designers and implementers and on the other hand there are the clients and users.

These two often-diverging points of view must be integrated in the development process. On the one hand, the requirements are to be transposed into an executable program, so the model of the future system must include the totality of all requirements and its representation must be formal. The analyst and designer pursue this overall goal as if they were the system implementers, and therefore they must keep this "prime directive" constantly in mind. The user, however, is much more interested in a delivered system that corresponds to his ideas. If the program does not correspond to the customer’s expectations and mental "grand plan", complaints will follow.

The criteria for the model, which includes the totality of the requirements, are essentially derived from these two points of view. To further this goal, the requirement sentences should be unambiguous, complete and logically consistent.

These points are the basic requirements of a formal description and are also the basis of the legal obligation or contract. Therefore these requirements are quite significant indeed which requires one more point and that is that they are which, besides the already named reasons, greatly facilitates the understandability and hence acceptability of the requirements to the user.

Fig.CRUPP.1 : Reality, personal Knowledge, expression of the Knowledge

Consequences for the representation of requirements in prose

In a natural language representation, a few problems that lie in the nature of natural language can turn the necessary formal model into an informal one. Remarks in prose are often (usually!) broadly interpretable, so the previously mentioned criteria of unambiguity, completeness and logical consistency seem to be impossible goals! Moreover, prose is generally not suitable for describing complex content structures while maintaining clarity of representation.

To this end, a set of rules for writing and checking of prose requirements would be very expedient and helpful in removing or minimizing the deficiencies and ambiguities of natural language. The goal of an analysis process on the basis of prose requirements must be the named formal criteria of legal obligation and intelligibility for all participants. Happily, such a set of rules exists and the rules can be followed. Therefore they should be put into practice.

Method of application in the analysis of prose requirements

The goal of a systematic language analysis is to uncover formulations that are stamped by subjective experience and to replace them with clear, objective formulations. A systematic procedure is possible at this point only because natural language itself can be analyzed systematically. This system was discovered by linguists, who have already examined and defined models of language, parts of which are useful here.

The father of the theory of a systematic construction of language is Noam Chomsky, founder of generative transformational linguistics. The results of his theory make it possible to build a definite sentence by applying grammar rules to any language component, be it spoken or written. Chomsky’s theory has undergone certain expansions and alterations since its first publication because the theory didn’t adequately explain certain aspects of language.

The method of analysis of prose requirements is essentially the application of the improved theory of transformational grammar. Therefore, one must, by means of generative rules, subject sentences to transformations. Ultimately the product is a grammatically correct and exactly defined sentence.

More recently, the basic science from linguistics has been applied to a model of human communication and mode of expression. This has produced a set of psychological rules or transformations which, applied to spoken or written sentences, enable their exact meaning to be found. Correctly applied, they reveal the underlying meaning of the communication and also point to avenues of further questioning and investigation. The original set of rules is enumerated in broad strokes in the book "The Structure of Magic I" by R. Bandler and J. Grinder, the creators of therapy-based Neuro Linguistic Programming (NLP).

The firm SOPHIST Ltd. has similarly applied models of language upon areas of Computer Science. The result has been the creation and testing of simple, easy to use rules for the creation and quality assurance of prose requirements in project development.

These flexible and adaptable rules make a systematic review of natural language requirements possible. This is particularly true with regard to eliminating the ambiguous, incomplete and inconsistent statements that are often found in requirements documents. Finally, this method can and should be used when writing requirements documents, so that errors or ambiguities never even "see the light of day."

The linguistic methods in the investigation and quality assurance of requirements

In the following, the most important principles of application will be described. The chart below exhibits the most frequently observed language phenomena from a linguistic point of view. Due to extreme time limitations for the presentation, only one representative sample of each type of defect will be exhibited.

Fig.CRUPP.2 : linguistic defects

A few of these defects and their case by case removal will be clarified on the basis of suitable examples.


Deletion

Deletion is a process through which we turn our attention towards selectively determined dimensions of experience and exclude others for simplification. In other words, the process of deletion reduces the world into smaller pieces, which can be more easily handled. This reduction can in certain contexts be quite helpful, but in the area of requirement definitions for a software system, we must know exactly what information has been lost.

The following examples exhibit various types of deletions and give an insight into the spectrum of possible representations from deletion transformations.


Presuppositions

Presuppositions are statements that must be true for another statement to have any meaning. This form of deletion is frequently found in requirement documents. Implicit assumptions are also presuppositions. For example, for the sentence
        When the altitude is less than the minimum height, an alarm should be set.
to have any meaning, the statement
        There is a minimum height.
must be true.

Presuppositions or implicit assumptions must be made explicit in requirement documents in order for these to be meaningfully complete. Presuppositions frequently originate through an omission by the author of a requirement, either because they are so obvious to the author that he or she doesn’t consider them noteworthy, or because the author is not even aware that there is an implicit assumption.

Incomplete comparatives and superlatives

In comparison statements like
        This information can also be accessed from slower storage media.
you must ask for the frame of reference. Slower than what? How slow is slow?

In
        The corrections should be easily modifiable.
the question remains: what exactly is "easily"?

A comparative or superlative always requires a reference point to be completely defined. Furthermore, the unit of measure (e.g. meter, second) and the tolerance (e.g. +/- 0.1 meter, +/- 0.0001 seconds) must be declared.
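
A first-pass check for this defect can be mechanized. The following is a minimal sketch in Python, assuming a small, hypothetical word list (`COMPARATIVE_WORDS`) and treating the word "than" as the only sign of a reference point. It is not the rule set used in the projects described here, and a flagged word still needs a human to supply the reference point, unit and tolerance.

```python
import re

# Hypothetical starter vocabulary (an assumption, not taken from the paper).
COMPARATIVE_WORDS = {"slower", "faster", "better", "best", "fastest",
                     "easily", "quickly"}


def find_incomplete_comparatives(requirement: str) -> list:
    """Return comparative or vague words used without a reference point.

    Heuristic: if the sentence contains "than", a reference point is
    assumed to be present and nothing is flagged.
    """
    words = re.findall(r"[a-z]+", requirement.lower())
    if "than" in words:
        return []
    return [w for w in words if w in COMPARATIVE_WORDS]


if __name__ == "__main__":
    for sentence in (
        "This information can also be accessed from slower storage media.",
        "The corrections should be easily modifiable.",
    ):
        print(sentence, "->", find_incomplete_comparatives(sentence))
```

Each flagged word translates into a question for the author: compared with what, measured in which unit, and within which tolerance?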

Incomplete Process Words

Process words are those words in a sentence that describe a process. They need not necessarily be verbs, as adjectives and nouns can also play this role in a sentence (see also Nominalization). In order to be complete, a process word usually requires more arguments or perhaps even an entire noun phrase to be completely declared. Consider the following:
        The system should report data loss.
The process word "report" is completely defined only if the following questions are answered: Who reports? What is reported? How or in what manner is the information reported? To where or to whom is it to be reported? When is it reported? For how long a time is it reported?
        The utilization of internal system resources should be monitored.
In this statement "utilization" is in its noun form as a process word. To be completely defined, a process word must first answer the following questions: Who or what exactly is utilized? How is it utilized?

Furthermore, the process word "monitored" is referential. The statement is defined completely only if the following questions are clarified: Who monitors? What is monitored? How or in which manner is it monitored?
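
The questions attached to a process word lend themselves to a simple lookup. The sketch below is only an illustration: the question lists are the ones given above for "report" and "monitored", while the dictionary layout and the function name `open_questions` are assumptions made for this example.

```python
import re

# Questions per process word, taken from the examples in the text.
# The data structure itself is an illustrative assumption.
PROCESS_QUESTIONS = {
    "report": [
        "Who reports?",
        "What is reported?",
        "How or in what manner is the information reported?",
        "To where or to whom is it to be reported?",
        "When is it reported?",
        "For how long a time is it reported?",
    ],
    "monitored": [
        "Who monitors?",
        "What is monitored?",
        "How or in which manner is it monitored?",
    ],
}


def open_questions(requirement: str) -> dict:
    """Map each known process word found in the requirement to its questions."""
    words = re.findall(r"[a-z]+", requirement.lower())
    return {w: PROCESS_QUESTIONS[w] for w in words if w in PROCESS_QUESTIONS}


if __name__ == "__main__":
    found = open_questions("The system should report data loss.")
    for word, questions in found.items():
        print(word)
        for question in questions:
            print("  ", question)
```

A requirement counts as complete with respect to a process word only once every listed question has an answer in the document.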


Generalization

The human ability to experience through generalization is a process that is both meaningful and necessary for survival. Through generalization, we are able to transfer an experience to related contexts. It is very important to consider the related context to which the experience is transferred, particularly with regard to the information that may have been omitted in the application of the generalization.

Through the process of generalization, requirements are often stated as if they applied to a large part of a system or to the entire system. For other parts of the system these requirements can be quite wrong; a correctly defined requirement would apply only to a smaller piece of the system, and so have accurate scope and correct meaning.

Typical of generalization is the suppression or omission of special cases and error cases. In the following, the most frequently seen variations of generalization are reviewed.

Universal Quantifiers

Universal quantifiers are parts of statements that apply a behavior to every member of a group or set. Linguistic representatives of this group are words like "never," "always," "no," "every," and "all." The danger with universal quantifiers is that the specified behavior often does not truly apply to all referenced objects in the group or set. The group usually contains one or more special or exception cases which the universal quantifier covers, but for which the specified behavior is false.
        Each signal shall additionally be labeled with a time stamp.
The keyword "each" immediately raises a question. Really each and every signal? Or are there perhaps one or more cases in which the time stamp is not required?

It is critical with this type of generalization to immediately define the range of applicability so that no possibilities and occurrences of applicability are left out. Special cases must also be defined in the generalization process.
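
Spotting universal quantifiers is essentially a keyword search. The following sketch assumes the keyword set named in this section plus "each" from the example; the function name is hypothetical, and the check only locates candidates for the question "really all?", it cannot decide whether an exception exists.

```python
import re

# Keywords from the text ("never", "always", "no", "every", "all")
# plus "each" from the example requirement.
UNIVERSAL_QUANTIFIERS = ("never", "always", "no", "every", "all", "each")


def find_universal_quantifiers(requirement: str) -> list:
    """Return the universal quantifiers in the requirement, in order of appearance."""
    words = re.findall(r"[a-z]+", requirement.lower())
    return [w for w in words if w in UNIVERSAL_QUANTIFIERS]


if __name__ == "__main__":
    sentence = "Each signal shall additionally be labeled with a time stamp"
    print(find_universal_quantifiers(sentence))
```

Matching on whole words avoids false hits inside longer words such as "shall" or "know".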

Incompletely specified conditions

Incompletely specified conditions are another indicator of possible information loss through generalization. The usual markers include "if," "then," "in case of," and "depending on."
        If data X is needed by interface Y, then the time for a response shall be under 0.5 seconds.
The question this raises is: which time requirements exist in the case that data X is not requested by interface Y?
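
A rough automated check for this pattern looks for a condition marker that has no visible alternative branch. The sketch below uses the condition markers named above; the list of alternative markers ("otherwise", "else", "if not") and the function name are assumptions made for illustration, and a sentence that describes the alternative in other words would be flagged wrongly.

```python
import re

CONDITION_MARKERS = ("if", "in case of", "depending on")
# Assumed signs that the "not fulfilled" case is also described.
ALTERNATIVE_MARKERS = ("otherwise", "else", "if not")


def has_unspecified_alternative(requirement: str) -> bool:
    """True if a condition is stated but no alternative case is visible."""
    # Normalize to lowercase words separated by single spaces, padded
    # with spaces so that markers can be matched as whole words.
    text = " " + re.sub(r"[^a-z ]", " ", requirement.lower()) + " "
    text = re.sub(r"\s+", " ", text)
    has_condition = any(f" {m} " in text for m in CONDITION_MARKERS)
    has_alternative = any(f" {m} " in text for m in ALTERNATIVE_MARKERS)
    return has_condition and not has_alternative


if __name__ == "__main__":
    sentence = ("If data X is needed by interface Y, then the time "
                "for a response shall be under 0.5 seconds.")
    print(has_unspecified_alternative(sentence))
```

For the example requirement the check reports an open alternative, which corresponds to the question about the case in which data X is not requested.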

Nouns without reference index

Noun arguments without (clear) reference indices are another key type of information loss. This means that a noun exists in a statement but is not specified sufficiently for the sentence to be clear.
        The message shall be displayed at the working position.
The nouns of this sentence, message and working position, give no hint as to what exactly they refer to. Therefore at least two questions pose themselves:

Which message? Which/whose working position?


Distortion

The distortion transformation appears almost exclusively in the form of a nominalization.


Nominalization

A nominalization occurs when a process is reformulated into an event. A nominalization may change the meaning of a statement. A nominalization may also cause important information regarding a process to be lost. Linguistically one recognizes a nominalization as a process word (verb or predicate) that has been molded into an event word (noun or argument).
        Data loss shall be recognized and reported.

The process behind the noun data loss actually consists of:
        Data is being lost.

So this sentence has the related questions: Which data is being lost? How is the data being lost? How can the loss of data be recognized?
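
Nominalizations can be located with a lookup that maps the event word back to its process word. The sketch below assumes a tiny hand-made dictionary (`NOMINALIZATIONS`); the entries "loss" and "utilization" come from the examples in this paper, "transmission" is an added hypothetical entry, and a real review would need a far larger vocabulary or proper linguistic analysis.

```python
import re

# Event word -> underlying process word. Hand-made and illustrative only.
NOMINALIZATIONS = {
    "loss": "lose",
    "utilization": "utilize",
    "transmission": "transmit",  # hypothetical extra entry
}


def find_nominalizations(requirement: str) -> list:
    """Return (noun, process verb) pairs for known nominalizations in the text."""
    words = re.findall(r"[a-z]+", requirement.lower())
    return [(w, NOMINALIZATIONS[w]) for w in words if w in NOMINALIZATIONS]


if __name__ == "__main__":
    for noun, verb in find_nominalizations("Data loss shall be recognized and reported."):
        print(f'"{noun}" hides the process "{verb}": who or what, how, and when?')
```

Each hit is a prompt to restate the noun as a process and to ask the questions listed above.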

Results and conclusions

The linguistic methods of Requirements Engineering introduced here have already been successfully applied in several large industrial projects. In each case, the defined procedural guidelines of object engineering played a crucial role for the success of the project. The previously mentioned success of the linguists and psychologists suggests an easy and useful transfer of proven and utilizable tools onto the software development process -- particularly in the engineering of prose requirements.

When picking methods from the object engineering methods kit, it is hardly possible to omit the linguistic methods component, whereas the integration model and simulation model components can be left out. Object engineering forms the basis for further work in the software development process with prose requirements, which, as previously demonstrated, are the essential foundation for continued development.

Hence, with some concessions, it is possible that the scientifically necessary formal model, which is formed by the entirety of all requirements, can be represented informally with the assistance of linguistic methods. So the door is then opened such that the client and user may, for the first time, directly review and criticize the working description of the problem.



Profile and address of the author

Christine Rupp is CEO and owner of SOPHIST Ltd. and has 7 years of extensive experience in the area of software analysis methods. She has led and supported projects with organizations including the Deutsche Flugsicherung Ltd. (German Flight Control Administration), Eurocontrol, Deutsche Post AG, Daimler Benz AG, RWE DEA and Siemens AG.


SOPHIST Gesellschaft für innovative Software-Entwicklung mbH
Vordere Cramergasse 11-13
90478 Nürnberg, Germany
Tel: +49 (0)911/ 40 900-0
Fax: +49 (0)911/ 49 900-99
Copyright © 2000 by SOPHIST Ltd.

All rights reserved, including any rights arising from the granting of a patent or the registration of a patent or the registration of a utility model or design. No part of this publication may be reproduced, distributed, stored in a retrieval system, or transmitted in any form, or by any means, electronic, mechanical, photocopying, recording or otherwise without the prior consent of the publisher.
Violators will be prosecuted to the maximum extent possible under law.

Partners in EuroSPI

ISCN LTD, ISCN GesmbH, Schieszstattgasse 4/24, 8010 Graz, and Coordination Office, Florence House, 1 Florence Villas, Bray, Ireland. Editing done: 19.7.2002