Final Portfolio

Draft Research Questions

From oldest to newest:

Draft Claim

As cool as computer science is, it is not a one-on-one conversation with the machine. It’s a collaborative process that, for the most part, takes place outside the text editor. When dozens of people are working toward a single product, each with a different level of literacy in computer programming and user design, they need a guiding force. That is the purpose of the specification. However, specifications can be hard to understand, and they can even lead to bugs and implementation differences, the very things they’re meant to prevent. Natural language specifications are challenging to write clearly, and formal specifications, which usually rely on mathematical syntax, are challenging to parse for the uninitiated.

In this presentation, I aim to show that the benefits of informal natural language specifications and formal mathematical specifications can be combined with minimal downsides, while allowing the resulting specifications to be read by both humans and computers. By writing in this ideal specification language, engineers can produce specifications that can be read and verified by anyone with the appropriate domain knowledge and turned into an executable program, creating a single source of truth for an application. This can be accomplished through an extremely restricted, prescriptive grammar: with it, computers can parse specifications easily, and humans can interpret them without ambiguity.

Abstract

Specifications, documents that explicitly and precisely define the behavior of a piece of software, are commonly used in the information technology industry to help large teams design and iterate on complex systems. Traditionally, there are two kinds. Natural-language specifications are easy to read and write but very difficult to write well, and their lack of formality, a product of the over-expressiveness of natural language, creates misunderstandings that lead to grave errors in implementation. Formal specifications lean on propositional logic and set theory to construct rigorous descriptions, but they can be hard to read for anyone without advanced knowledge of discrete math and computer science. Computer scientists have long held that natural language is by definition informal and therefore unfit for writing specifications, yet natural language is still used far more often than formal notation to describe complex systems. This presentation explores a possible marriage of these opposing approaches, a formal specification language that uses a prescriptive subset of English, through a toy specification language that can be parsed and error-checked by a computer program. By formalizing the natural language used in specifications, the barrier to describing and understanding mission-critical software can be significantly lowered.

Acknowledgments

I am glad to acknowledge:

Bibliography

[1]

S. Bradner, “Key words for use in RFCs to Indicate Requirement Levels,” 1997. [Online]. Available: https://www.rfc-editor.org/rfc/rfc2119

[2]

C.-C. Chiang, “TUG: An Executable Specification Language,” in 5th IEEE/ACIS International Conference on Computer and Information Science and 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, Software Architecture and Reuse (ICIS-COMSAR’06), 2006, pp. 180–186. doi: 10.1109/ICIS-COMSAR.2006.85.

[3]

N. E. Fuchs, “Specifications are (preferably) executable,” Software Engineering Journal, vol. 7, no. 5, pp. 323–334, 1992.

[4]

I. J. Hayes and C. B. Jones, “Specifications are not (necessarily) executable,” Software Engineering Journal, vol. 4, no. 6, pp. 330–339, 1989.

[5]

A. van Kesteren and D. Denicola, “Infra Standard,” Apr. 05, 2023. [Online]. Available: https://infra.spec.whatwg.org/

[6]

H. U. Khan, I. Asghar, S. A. A. Ghayyur, and M. Raza, “An empirical study of software requirements verification and validation techniques along their mitigation strategies,” Asian Journal of Computer and Information Systems, vol. 3, no. 3, 2015, [Online]. Available: https://www.researchgate.net/profile/Ikram-Asghar/publication/281645652_An_Empirical_Study_of_Software_Requirements_Verification_and_Validation_Techniques_along_their_Mitigation_Strategies/links/55f2a17008ae199d47c4841c/An-Empirical-Study-of-Software-Requirements-Verification-and-Validation-Techniques-along-their-Mitigation-Strategies.pdf

[7]

B. Meyer, “On formalism in specifications,” IEEE Software, vol. 2, no. 1, p. 6, 1985.

Review of Sources

H. U. Khan, I. Asghar, S. A. A. Ghayyur, and M. Raza, “An empirical study of software requirements verification and validation techniques along their mitigation strategies,” Asian Journal of Computer and Information Systems, vol. 3, no. 3, 2015, [Online]. Available: https://www.researchgate.net/profile/Ikram-Asghar/publication/281645652_An_Empirical_Study_of_Software_Requirements_Verification_and_Validation_Techniques_along_their_Mitigation_Strategies/links/55f2a17008ae199d47c4841c/An-Empirical-Study-of-Software-Requirements-Verification-and-Validation-Techniques-along-their-Mitigation-Strategies.pdf

DISTILLATION: The article is composed of three sections: a literature review of requirements verification papers, a description of various verification techniques, and a survey of Pakistani software engineers on their frustrations with requirements engineering and how verification helps to resolve them. The authors believe that requirements engineering (RE)—a form of software engineering where a piece of software is explicitly designed before its production—is a crucial part of software development. An important part of the RE process is verification and validation (V&V), where engineers make sure that the software they’re writing follows the specification being implemented and is written in an orderly manner. They identify six techniques engineers commonly use throughout V&V: tracing (justifying the specification), prototyping, requirements testing, user manual writing, formal specification validation, and inspection. From their survey of 55 experts across the Pakistani software industry, they found that the greatest challenges in implementing specifications were ambiguous requirements and, to a lesser extent, inconsistent and incomplete specifications; the most effective tools for resolving these issues were implementation inspections and prototyping.

CRITIQUE: This article helped me sharpen my research question and confirmed many of my concerns surrounding my idea of “written specifications as code.” It introduced me to the term requirements engineering, which made it much easier for me to find literature related to my area of research. It also confirmed one of my suspicions about requirements engineering: translating a spec into a functional piece of software is fraught with ambiguity and inconsistency, to the point that many systems must be constructed around RE in order for it to go smoothly. I believe that many of these issues would be resolved by a subset of grammatical and semantic English that translates to computer code. With a limited subset of words and senses, it would be nearly impossible for a specification to contain semantic ambiguity. Because the specification can be translated to code, we can take advantage of existing tooling like linters to eliminate inconsistency. Having an executable specification also makes various V&V techniques unnecessary: the specification is the prototype, and tests can be included in the specification as examples. An executable specification, therefore, allows humans to focus solely on writing a clear, error-free specification instead of juggling that with writing an implementation. Once I was confident that an answer to my research question would be valuable to the CS community, I used the article as a jumping-off point to explore how humans write specifications in various industries.

S. Bradner, “Key words for use in RFCs to Indicate Requirement Levels,” 1997. [Online]. Available: https://www.rfc-editor.org/rfc/rfc2119

B. Leiba, “Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words,” 2017. [Online]. Available: https://www.rfc-editor.org/rfc/rfc8174

DISTILLATION: These two reports define a group of keywords that indicate how strictly parts of a specification must be implemented. Request for Comments (RFC) 2119 establishes these keywords, a mix of UPPERCASE verbs and adjectives such as MUST and RECOMMENDED, and assigns them specific meanings ranging from “you must implement this in order for your implementation to be valid” to “feel free to do or not to do this.” It also gives a boilerplate phrase specification authors should include in their documents to indicate that they are using these keywords with their defined meanings, along with some advice on when using them is appropriate. 2119 was created in reaction to many RFCs using keywords like these without any standard definition for them. RFC 8174 extends 2119 to fix an ambiguity in it: the casing of keywords. While 2119 presents the keywords in UPPERCASE, authors invoking 2119 would write them in both upper and lower case, leaving readers unsure whether to interpret a lowercase “must” as ordinary English or according to 2119. 8174 resolves this by clarifying that only UPPERCASE words carry the meanings defined in 2119, and gives specification authors a new boilerplate phrase to use in their documents.

CRITIQUE: I believe these articles capture perfectly the intentions of my capstone project. They demonstrate both how prescribed grammar can help computers understand natural language and how the lack of one causes ambiguity. Because 8174 requires that keywords be UPPERCASE in order to carry their prescribed meaning, computers can parse out the defined keywords and apply their meaning, while normal usage of the same words is still preserved, enabling machine-readable and human-readable language in the same document. The extension of 2119 by 8174 also demonstrates how important clear communication is in a specification, a problem I believe my question can address by putting more emphasis on the specification itself. I do have some qualms. I do not like the mix of verbs and adjectives as keywords, nor the substitute terms (one can write REQUIRED or SHALL in place of MUST), as I believe there should be exactly one way to declare something. And as much as I like being able to switch between ordinary English and 2119 meaning through casing, it overloads a single word with two definitions, which always creates an opportunity for ambiguity. It is also worth recognizing that 2119 is much harder to convey in the spoken word, because there is no standard way to read UPPERCASE words aloud in English. After looking at the RFCs, I began to explore the landscape of existing executable specifications and what they could look like.
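Before moving on: to make the machine-readable half of this concrete, here is a minimal sketch of my own, not taken from either RFC, of how a tool could pull only the UPPERCASE requirement keywords out of a paragraph while ignoring lowercase uses of the same words, which is exactly the distinction 8174 clarifies.

import re

# RFC 2119 keywords, as clarified by RFC 8174: only the UPPERCASE forms carry
# the defined meanings; lowercase "must", "should", and so on are plain English.
KEYWORDS = [
    "MUST NOT", "MUST", "REQUIRED", "SHALL NOT", "SHALL",
    "SHOULD NOT", "SHOULD", "NOT RECOMMENDED", "RECOMMENDED",
    "MAY", "OPTIONAL",
]

# Case-sensitive pattern; longer phrases are listed first so "MUST NOT" is not
# matched as a bare "MUST".
PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b")

def requirement_keywords(paragraph):
    """Return the RFC 2119 keywords used in a paragraph, in order of appearance."""
    return [match.group(1) for match in PATTERN.finditer(paragraph)]

example = ("The server MUST reject malformed input. Clients should retry, "
           "and MAY log the failure.")
print(requirement_keywords(example))  # ['MUST', 'MAY'] -- the lowercase "should" is ignored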

C.-C. Chiang, “TUG: An Executable Specification Language,” in 5th IEEE/ACIS International Conference on Computer and Information Science and 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, Software Architecture and Reuse (ICIS-COMSAR’06), 2006, pp. 180–186. doi: 10.1109/ICIS-COMSAR.2006.85.

DISTILLATION: This article describes Tree Unified with Grammar (TUG), a programming language for writing specifications that can then be executed like a program. Chiang argues that getting specifications right the first time is of high importance in requirements engineering and believes that more powerful specification tools can ease its burden. He holds that the existing mechanisms for writing specifications, natural language and programming languages, are respectively too informal and too concrete for the task, so he designed his own mathematical syntax and interpreter for a functional programming language built specifically for writing specifications. TUG’s design centers on a mathematical syntax describing a finite state machine, a common form of computer program, which parses and processes an input. Instead of constructing a natural language specification from user requirements and later writing an implementation to verify it, one would write a TUG program and iterate on it in a process similar to prototyping.

CRITIQUE: I agree with Chiang's philosophy behind TUG. I too believe that executable specifications can significantly ease the software development process, and that balancing abstraction with the concreteness of execution is an important part of designing a good system for them. However, I do not believe that TUG is the right solution for the audience that is the focus of my research: everyday engineers and the executives, designers, and software testers they work with. TUG doesn’t read like a specification or even a mathematical proof; it reads like code, and quite hard code to parse at that, even for someone like me who has written in a variety of languages and in a functional style for almost three years. For people with little or no programming knowledge, a TUG specification might come off as a foreign language and exclude them from participating in the software development process. It is ironic that, in many cases, the “informal” natural language descriptions that precede the sample specifications in the paper describe and solve the problem more succinctly than TUG itself.

I. J. Hayes and C. B. Jones, “Specifications are not (necessarily) executable,” Software Engineering Journal, vol. 4, no. 6, pp. 330–339, 1989.

DISTILLATION: Hayes and Jones argue that executable specifications should not be part of the requirements engineering process. In order to be effective tools for reasoning about the design of systems, specifications must be as abstract as possible and concern themselves only with the behavior of a system, not the minute details of its implementation. This conflicts with the concreteness that execution demands. Because of that concreteness, executable specifications cannot describe certain kinds of algorithms, such as those with infinite runtimes or those that can have multiple equally valid answers for a single input (non-determinism). Even for deterministic programs with finite runtimes, making a specification executable can burden the specification writer—for example, there may be many ways to sort a list of elements, but a specification writer should only be concerned with the list being sorted, not how it is sorted—and can result in an extremely slow program due to execution methods such as generate-and-test, where one generates every possible output for an input and verifies them one by one. Because executable specifications can be run, engineers may be tempted to use them as prototypes, but this can lead them to simply reimplement the structure of the specification when it comes time to write an implementation, instead of writing the most efficient and valid implementation possible.
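As a concrete illustration of the generate-and-test problem described above, here is a sketch of my own (the technique is the one Hayes and Jones name, but this program is not theirs): the only property stated is that the output must be an ordered rearrangement of the input, and executing that statement directly means enumerating permutations, which is correct but unusable beyond a handful of elements.

from itertools import permutations

def is_sorted(xs):
    # The property the abstract specification actually cares about.
    return all(a <= b for a, b in zip(xs, xs[1:]))

def sort_by_specification(xs):
    # Generate-and-test: try every rearrangement of the input until one
    # satisfies the property. Correct, but O(n!) in the length of the list.
    for candidate in permutations(xs):
        if is_sorted(candidate):
            return list(candidate)

print(sort_by_specification([3, 1, 2]))  # [1, 2, 3]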

CRITIQUE: I believe that abstraction is an important part of specification writing and software engineering, but most software is already semi-concrete by the time a specification is written; you are already thinking about approaches to the user requirements and balancing design constraints specific to your system as you write your spec. An executable specification does not need to be able to represent all programs; most software that gets written is concerned with being executable and producing a single correct answer. Even if a function vital to the specification cannot be implemented directly from it, we can still generate as much code as possible and let the user fill in what is most relevant to their situation. The performance of an executable specification can be poor if it is property-based, but many specifications in computer science, cooking, medicine, and other fields are instruction-based: they tell you exactly what to do and how to do it. I believe that this kind of specification will let us treat the spec just like a programming language and translate it to efficient machine code. Ultimately, I believe that the benefits of a specification that can be executed, such as easier error checking, a closer match to the iterative RE process, and faster development of implementations, outweigh the cons. The authors suggest an alternative to executable specifications that I want to explore further: a wide-spectrum language that enables writing specifications and implementing them with a single syntax.

N. E. Fuchs, “Specifications are (preferably) executable,” Software engineering journal, vol. 7, no. 5, pp. 323–334, 1992.

DISTILLATION: In this paper, Fuchs stands in direct opposition to Hayes and Jones. He believes that executable specifications can be a powerful tool for solving one of the greatest challenges in software development: correctness. Their interactive nature lets engineers significantly reduce bugs and begin the verification process early in development. It also lets users validate the parts of a specification that cannot be formally verified, such as user experience. To show that these advantages can be kept while still attaining the high levels of abstraction desired by Hayes and Jones, Fuchs rewrites their example specifications in an executable specification language named LSL, showing that only a minimal amount of abstraction needs to be sacrificed to gain the power of execution. Fuchs argues that true abstraction will never be attainable in specifications written for real-world applications, because specifications are limited not only by executability but also by the user requirements they implement and by the mutable nature of user-facing interfaces like filesystems. He does agree that executable specifications will likely be slow, but argues that even a slow specification is far more useful than one that cannot be executed at all.

CRITIQUE: Again, I am glad to see that someone shares my vision that executable specifications can be a powerful tool for software development. I originally argued that the levels of abstraction Hayes and Jones desired were not practical for most software being developed; the fact that high abstraction and execution can be achieved at the same time sweetens the deal. The paper has given me two new paths of research to follow. The first is logic programming languages like the one used in the paper, LSL; Fuchs mentions others like Prolog and ML. I believe the ability to state behavior and then derive a runnable program from it is very powerful and may ease specification writing. The second is a technique I plan on using to implement my own grammar: source-to-source transformation. The idea is that we parse code according to a grammar and, instead of interpreting it directly or translating it to machine code, rewrite it in another language, as sketched below. This technique can be very powerful because it lets you create your own language without also having to write an interpreter or compiler.
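As a toy illustration of source-to-source transformation, the sketch below recognizes one rigid English sentence pattern and rewrites it as Python instead of interpreting it. The sentence form, the [[ ]] brackets, and the rewrite rule are hypothetical stand-ins of my own, far simpler than anything a real specification grammar would need.

import re

# One rigid sentence pattern: "The integer [[ name ]] is <number>."
DECLARATION = re.compile(
    r"The integer \[\[\s*(?P<name>[^\]]+?)\s*\]\] is (?P<value>-?\d+)\."
)

def transpile(specification):
    """Rewrite every recognized sentence as a line of Python source code."""
    lines = []
    for match in DECLARATION.finditer(specification):
        identifier = match.group("name").replace(" ", "_")
        lines.append(f"{identifier} = {match.group('value')}")
    return "\n".join(lines)

spec = "The integer [[ answer ]] is 42. The integer [[ retry limit ]] is 3."
print(transpile(spec))
# answer = 42
# retry_limit = 3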

Outline (everything below this heading)

Presentation Outline

What is a specification?

Formal vs. Informal

Building It

  1. Gorr can be parsed by a computer, which makes automatic linting, verification, and even execution of a Gorr specification possible, among other things.
  2. Any grammar written in EBNF is a formal grammar. The fact that Gorr has a fully defined structure almost entirely eliminates concerns about ambiguity. There is only one way to parse a Gorr specification. I will later show how this also helps with proving that there is only one way to interpret a Gorr specification.

Cutting room floor: One consequence of this design decision is that I had to make most of the mathematical and logical operations use prefix notation in order to avoid the parsing ambiguity of statements like “3 plus 4 divided by 7,” which can result either in 3 [ 3 + (4 / 7) ] or 1 [ (3 + 4) / 7 ]. In prefix notation the two readings must be spelled out explicitly: “the addition of 3 and the division of 4 by 7” versus “the division of the addition of 3 and 4 by 7.”

How My Approach Addresses Previous Issues with Natural Language

Why Natural Language?

If This Is Possible, Why Does Everyone Say It Isn’t?

The Gorr Specification Language

Expressions

Data Types

The Gorr language has three data types: the integer, the Boolean, and void. Integers in Gorr are signed. The void type signifies an algorithm that doesn’t return a value; a variable cannot be declared with type void, and by extension cannot be assigned the return value of a call to a void algorithm.

Operations

Integers in our language have six operations—negation, addition, subtraction, multiplication, division, and modulo—that all result in integers.

In the case of division, because Gorr lacks fractional numbers, the quotient returned discards the remainder; for example, the division of 5 by 2 does not result in 2 ½, but instead 2. If one wanted the remainder of 1, they could instead use the modulo operation.
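The sketch below shows one plausible mapping of these two operations onto Python integers; the prose above only pins down the positive case, so the behavior for negative operands here is my assumption rather than something the specification states.

# A possible mapping of Gorr's division and modulo onto Python integers.
# For negative operands the specification above does not fix a rounding
# direction, so this sketch simply inherits Python's floor-division behavior.
def gorr_division(a, b):
    return a // b  # the remainder is discarded: 5 divided by 2 is 2, not 2.5

def gorr_modulo(a, b):
    return a % b   # the remainder that the division above discards

print(gorr_division(5, 2))  # 2
print(gorr_modulo(5, 2))    # 1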

Booleans have three operations—NOT, AND, and OR—that result in Booleans.

Integers have four comparison operators—greater than, greater than or equal to, less than, and less than or equal to—that result in Booleans.

Integers and Booleans share one comparison operator, equivalence, resulting in a Boolean. The equivalence operator must not be used to test the equivalence of two different data types; a specification that tries to equate an integer to a Boolean is invalid.

All operators are evaluated from left to right; Gorr has no operator precedence.

The Boolean, integer, and void types all share one operation that comes in the form of a statement: discard. It can be used to make an algorithm call and dispose of the resulting value.

Variables

A variable can be used to remember a value and retrieve it at a later time. Variable types are static; for example, initializing an integer variable with a Boolean is invalid. Variables must be initialized at declaration.

Variables declared outside of algorithms are constant and cannot be re-assigned.

Variables declared outside of algorithms have a global scope; they are accessible from any later variable or algorithm declaration. Variables declared inside an algorithm or a control flow statement have a local scope. Once an algorithm ends, the variables declared inside it are no longer accessible; variables declared in one algorithm cannot be accessed from another. Once an If statement or While statement ends, any variables declared inside it are freed and their names can be reused later in the algorithm.

Specification writers should not use lone value literals as names for variables; while [[ true ]] and true are semantically different in Gorr, they may be hard for human readers to tell apart.

A variable cannot be referenced before it is declared.
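These scoping rules are easier to check against if you imagine how an interpreter might track them. The class below is a hypothetical sketch of that bookkeeping within a single algorithm, a stack of scopes with the global scope at the bottom; it illustrates the rules above and is not part of Gorr itself.

class Environment:
    """Hypothetical bookkeeping for Gorr's scoping rules within one algorithm:
    a stack of name-to-value tables, with the global scope at the bottom."""

    def __init__(self, global_variables=None):
        self.scopes = [dict(global_variables or {})]  # index 0: the globals

    def enter_block(self):
        # Entering an algorithm body, If body, or While body opens a new scope.
        self.scopes.append({})

    def leave_block(self):
        # Leaving the block frees its variables; their names can be reused later.
        self.scopes.pop()

    def declare(self, name, value):
        if name in self.scopes[-1]:
            raise NameError(f"[[ {name} ]] is already declared in this scope")
        self.scopes[-1][name] = value

    def lookup(self, name):
        # A reference sees enclosing local scopes first, then the globals.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"[[ {name} ]] referenced before it is declared")

env = Environment({"retry limit": 3})
env.enter_block()
env.declare("attempts", 0)
print(env.lookup("retry limit"), env.lookup("attempts"))  # 3 0
env.leave_block()  # [[ attempts ]] is now freed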

Algorithms

Algorithms in Gorr are used to encapsulate logic. They can optionally take a number of arguments and must declare a return type.

An algorithm must have a return statement at the end of every branch.

Control Flow

An algorithm has access to four forms of control flow: the If statement, the Otherwise statement, the While statement, and the Return statement.

An Otherwise statement must always come after an If statement.


Examples

Factorial

The algorithm [[ factorial ]], with the signature integer [[a]] returns integer, does the following:

1. If either [[a]] is equal to 0 or [[a]] is equal to 1,

1.1. Return 1.

2. Otherwise,

2.1. Return the multiplication of [[a]] by call [[factorial]] arguments the subtraction of 1 from [[a]].

Greatest Common Divisor and Least Common Multiple

The algorithm [[ absolute value ]], with the signature integer [[ a ]] returns integer, does the following:

1. If [[ a ]] is greater than or equal to 0,

1.1. Return [[ a ]].

2. Otherwise,

2.1. Return the negation of [[ a ]].

The algorithm [[ greatest common divisor ]], with the signature integer [[ u ]], integer [[ v ]] returns integer, does the following:

1. While [[ v ]] is greater than 0,

1.1. The integer [[ previous u ]] is [[ u ]].

1.2. Set [[ u ]] to [[ v ]].

1.3. Set [[ v ]] to the modulo of [[ previous u ]] by [[ v ]].

2. Return call [[ absolute value ]] arguments [[ u ]].

The algorithm [[ least common multiple ]], with the signature integer [[ u ]], integer [[ v ]] returns integer, does the following:

1. If both [[ u ]] is greater than 0 and [[ v ]] is greater than 0,

2.1. Return the division of call [[ absolute value ]] arguments the multiplication of [[ u ]] by [[ v ]] by call [[ greatest common divisor ]] arguments [[ u ]], [[ v ]].

2. Otherwise,

2.1. Return 0.
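For readers who want to check the examples above, this is how I would expect them to translate into Python; the rendering is my own and is shown only to make the intended behavior easy to run and verify.

def factorial(a):
    if a == 0 or a == 1:
        return 1
    else:
        return a * factorial(a - 1)

def absolute_value(a):
    if a >= 0:
        return a
    else:
        return -a

def greatest_common_divisor(u, v):
    while v > 0:
        previous_u = u
        u = v
        v = previous_u % v
    return absolute_value(u)

def least_common_multiple(u, v):
    if u > 0 and v > 0:
        return absolute_value(u * v) // greatest_common_divisor(u, v)
    else:
        return 0

print(factorial(5))                     # 120
print(greatest_common_divisor(12, 18))  # 6
print(least_common_multiple(4, 6))      # 12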


Grammar

The formal context-free grammar for Gorr, notated in Extended Backus-Naur Form, is listed below:

(* Utility non-terminals. *)

⟨text⟩ → ? a string of characters that doesn’t have “[[” or “]]” as substrings ?;

⟨space⟩ → (“ ” | “\t”)+;

⟨newline⟩ → “\n”;

⟨period⟩ → “.”;

⟨comma⟩ → “,”;

(* Value literals. *)

⟨digit⟩ → “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”;

⟨integer⟩ → “-”? ⟨digit⟩+;

⟨boolean⟩ → “true” | “false”;

⟨value⟩ → ⟨integer⟩ | ⟨boolean⟩;

(* Operations. All in prefix notation except the comparison operators, which are infix. *)

⟨negation⟩ → “the negation of” ⟨space⟩ ⟨expression⟩;

⟨not⟩ → “not” ⟨space⟩ ⟨expression⟩;

⟨unary operation⟩ → ⟨negation⟩ | ⟨not⟩;

⟨addition⟩ → “the addition of” ⟨space⟩ ⟨expression⟩ ⟨space⟩ “and” ⟨space⟩ ⟨expression⟩;

⟨subtraction⟩ → “the subtraction of” ⟨space⟩ ⟨expression⟩ ⟨space⟩ “from” ⟨space⟩ ⟨expression⟩;

⟨multiplication⟩ → “the multiplication of” ⟨space⟩ ⟨expression⟩ ⟨space⟩ “by” ⟨space⟩ ⟨expression⟩;

⟨division⟩ → “the division of” ⟨space⟩ ⟨expression⟩ ⟨space⟩ “by” ⟨space⟩ ⟨expression⟩;

⟨modulo⟩ → “the modulo of” ⟨space⟩ ⟨expression⟩ ⟨space⟩ “by” ⟨space⟩ ⟨expression⟩;

⟨and⟩ → “both” ⟨space⟩ ⟨expression⟩ ⟨space⟩ “and” ⟨space⟩ ⟨expression⟩;

⟨or⟩ → “either” ⟨space⟩ ⟨expression⟩ ⟨space⟩ “or” ⟨space⟩ ⟨expression⟩;

⟨greater than or equal to⟩ → ⟨expression⟩ ⟨space⟩ “is greater than or equal to” ⟨space⟩ ⟨expression⟩;

⟨greater than⟩ → ⟨expression⟩ ⟨space⟩ “is greater than” ⟨space⟩ ⟨expression⟩;

⟨less than or equal to⟩ → ⟨expression⟩ ⟨space⟩ “is less than or equal to” ⟨space⟩ ⟨expression⟩;

⟨less than⟩ → ⟨expression⟩ ⟨space⟩ “is less than” ⟨space⟩ ⟨expression⟩;

⟨equality⟩ → ⟨expression⟩ ⟨space⟩ “is equal to” ⟨space⟩ ⟨expression⟩;

⟨binary operation⟩ → ⟨addition⟩ | ⟨subtraction⟩ | ⟨multiplication⟩ | ⟨division⟩ | ⟨modulo⟩ | ⟨and⟩ | ⟨or⟩ | ⟨greater than or equal to⟩ | ⟨greater than⟩ | ⟨less than or equal to⟩ | ⟨less than⟩ | ⟨equality⟩;

⟨operation⟩ → ⟨unary operation⟩ | ⟨binary operation⟩;

(* Leading and trailing spaces in the variable name are consumed by the parser. *)

⟨variable⟩ → “[[” ⟨space⟩? ⟨text⟩ ⟨space⟩? “]]”;

⟨algorithm call⟩ → “call” ⟨space⟩ ⟨variable⟩ (“arguments” ⟨space⟩ ⟨expression⟩ (⟨comma⟩ ⟨space⟩ ⟨expression⟩)*)?;

⟨expression⟩ → ⟨operation⟩ | ⟨algorithm call⟩ | ⟨value⟩ | ⟨variable⟩;

(* Statements. *)

⟨data type⟩ → “integer” | “Boolean”;

⟨variable declaration⟩ → “The” ⟨space⟩ ⟨data type⟩ ⟨space⟩ ⟨variable⟩ ⟨space⟩ “is” ⟨space⟩ ⟨expression⟩ ⟨period⟩;

⟨variable assignment⟩ → “Set” ⟨space⟩ ⟨variable⟩ ⟨space⟩ “to” ⟨space⟩ ⟨expression⟩ ⟨period⟩;

⟨if⟩ → “If” ⟨space⟩ ⟨expression⟩ ⟨comma⟩;

⟨otherwise⟩ → “Otherwise” ⟨comma⟩;

⟨while⟩ → “While” ⟨space⟩ ⟨expression⟩ ⟨comma⟩;

⟨return⟩ → “Return” (⟨space⟩ ⟨expression⟩)? ⟨period⟩;

⟨discard⟩ → “Discard” ⟨space⟩ ⟨expression⟩ ⟨period⟩;

⟨pass⟩ → “Pass” ⟨period⟩;

⟨algorithm statement⟩ → ⟨if⟩ | ⟨otherwise⟩ | ⟨while⟩ | ⟨return⟩ | ⟨variable declaration⟩ | ⟨variable assignment⟩ | ⟨discard⟩ | ⟨pass⟩;

(* Algorithms. *)

⟨algorithm declaration⟩ → “The algorithm” ⟨space⟩ ⟨variable⟩ ⟨comma⟩ ⟨space⟩

(* Arguments *)

“with the signature” ⟨space⟩ (⟨data type⟩ ⟨space⟩ ⟨variable⟩ (⟨comma⟩ ⟨space⟩ ⟨data type⟩ ⟨space⟩ ⟨variable⟩)* ⟨space⟩)? “returns” ⟨space⟩ (⟨data type⟩ | “void”) ⟨comma⟩ ⟨space⟩

“does the following:” ⟨newline⟩

(* Body *)

(⟨space⟩? ⟨digit⟩+ (⟨period⟩ ⟨digit⟩)* ⟨period⟩ ⟨space⟩ ⟨algorithm statement⟩ ⟨newline⟩)+;

(* Specification *)

⟨statement⟩ → (⟨variable declaration⟩ | ⟨algorithm declaration⟩) ⟨newline⟩;

⟨specification⟩ → ⟨statement⟩+;
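To show what “parsed by a computer” means in practice, here is a small sketch that recognizes a single production from this grammar, the ⟨variable declaration⟩, using a regular expression. To keep it short, it accepts only literal integer and Boolean values where the grammar allows any ⟨expression⟩; a full parser would, of course, have to cover every production above.

import re

# <variable declaration>: "The" <space> <data type> <space> <variable> <space>
#                         "is" <space> <expression> <period>
# Only value literals are accepted as the expression in this fragment.
VARIABLE_DECLARATION = re.compile(
    r"The[ \t]+(?P<type>integer|Boolean)[ \t]+"
    r"\[\[\s*(?P<name>[^\]]+?)\s*\]\][ \t]+is[ \t]+"
    r"(?P<value>-?\d+|true|false)\."
)

def parse_variable_declaration(line):
    """Return the parts of a <variable declaration>, or raise if the line is invalid."""
    match = VARIABLE_DECLARATION.fullmatch(line.strip())
    if match is None:
        raise SyntaxError(f"not a valid variable declaration: {line!r}")
    return {"type": match.group("type"),
            "name": match.group("name"),
            "value": match.group("value")}

print(parse_variable_declaration("The integer [[ retry limit ]] is 3."))
# {'type': 'integer', 'name': 'retry limit', 'value': '3'}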

Research Question

How can natural language be formalized to make the process of writing specifications for complex software easier?

Introduction and Claim

As cool as computer science is, it is not a one-on-one conversation with the machine. It’s a collaborative process that, for the most part, takes place outside the text editor. When dozens of people are working toward a single product, each with a different level of literacy in computer programming and user design, they need a guiding force. That is the purpose of the specification. However, specifications can be hard to understand, and they can even lead to bugs and implementation differences, the very things they’re meant to prevent. Natural language specifications are challenging to write clearly, and formal specifications, which usually rely on mathematical syntax, are challenging to parse for the uninitiated.

In this presentation, I aim to show that the benefits of informal natural language specifications and formal mathematical specifications can be combined with minimal downsides, while allowing the resulting specifications to be read by both humans and computers. By writing in this ideal specification language, engineers can produce specifications that can be read and verified by anyone with the appropriate domain knowledge and turned into an executable program, creating a single source of truth for an application. This can be accomplished through an extremely restricted, prescriptive grammar: with it, computers can parse specifications easily, and humans can interpret them without ambiguity.

Background

The Requirements Engineering Process

heavily pulls from Khan et al., 2015

Evidence

Limitations