I like the history of computer science. It enriches my daily activities and helps me gain perspective. A nice piece of historical reading I did recently was an article by Peter Naur[2], known for the Backus-Naur form (BNF). He won the 2005 Turing Award, in part for his contributions to ALGOL 60. His paper Programming as Theory Building[1] introduces a mental model for software development that I find useful daily.

Although the paper is from 1985, it changed my understanding of the software life cycle. It’s centered around a single idea—Theory Building—which explains why you can’t focus only on code and docs. In the following sections, I explain ideas from the paper in an approachable way. My opinions and conclusions are at the end.

The author defines programming as an activity in which programmers form a certain kind of insight, a theory, of the matters at hand. He contrasts this with a more mechanistic definition in which programming is the mere production of a program and certain other texts. Theory Building is about human experience and knowledge[1].

The Need for Theory Building View

The idea of Theory Building might seem daunting or intangible at first. However, a few examples and a clear definition will make it familiar. Engineers who view software development as Theory Building have a better understanding of their job. The article mentions two real cases in which both documentation and code failed to explain the system, so only experienced developers could come to the rescue.

The first case is about George, who created a compiler. Bob and his team picked up George’s work and built on top of the existing code. It turned out that Bob applied ideas that George had previously analyzed and rejected. Additionally, Bob added redundant modifications because he didn’t know that the features were already supported. After 10 years, George described Bob’s outcome as an amorphous set of changes to the original design.

The second example mentions Jane, who implemented a real-time fault monitoring system for production lines. It was adapted to specific needs on every deployment. Jane and her team of experienced programmers were responsible for it. They witnessed the birth of the system and participated in core design decisions. When Anne and her team tried to help with an installation, they encountered difficulties that couldn’t be resolved merely from documentation and code. Jane’s team was asked to rescue the project because of its significant experience. They relied on informal knowledge of the system.

Definition

a typed approach to Theory Building View

a person who possesses theory […] knows how to do certain things and in addition can support the actual doing with explanations, justifications, and answers to queries about activity [1]

The definition of Theory Building relies on achievable outcomes. There is no procedure to follow for a programmer to have the Theory. It’s all about what you can do or not.

A person who has the ability to connect pieces together will learn quicker. The theory relies on connecting past experiences to new requirements. It’s hard to express such resemblance when teaching someone else. Musicians learn by practicing under the supervision of a skilled teacher who helps them progress. It’s analogous to the Theory of a program which is as hard to learn as similarities of many other kinds [...] such as human faces, tunes or tastes of wine.

This definition is focused on the capabilities of a person. It’s practical and emphasizes informal skills that need to be learned and internalized. A newcomer who has just started to build knowledge cannot teach others. Mastery requires insight, and it allows one to cover the subject in an organized way.

Personal Experience

Theory Building View raises personal experience above documentation and code. Here are 3 things that matter the most:

Theory helps to understand the program in the context of the problem. A person who possesses Theory knows how real life matches the program and what falls outside of it.
Theory allows one to explain each part of the program. A person who knows Theory understands conventions, the history of changes, rejected ideas, chosen principles, and reasoning. Everything is covered by the Theory.
A person with Theory is able to effectively respond to new requirements and support them with code. They can find the best implementation reflecting similarities to existing code.

Cost of modification

The theory’s primary goal is to support modifications of the program. After all one hopes to achieve a saving of costs by making modifications of an existing program rather than by writing an entirely new program.[1] Do big physical constructions like bridges support modifications? Maybe, but not at low cost. Small modifications can be done, but substantial ones are expensive, sometimes even more expensive than a replacement.

Program modifications are not only text changes. Code is just a medium to capture ideas that translate computation to utility. It’s not easy for newcomers to modify code confidently, so we use best practices. Guidelines recommend extracting constants, adding config values, or designing for extension. It helps, but the more flexible the code is, the more it costs to maintain. First, you need to know which parts are flexible, then test them, document them, and support them. Finally, the newcomer has to learn all the abstractions that were supposed to provide flexibility but leak assumptions instead.
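
To make the cost of flexibility concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (the `fetch_with_retry` helper, its parameters, and the option values are assumptions, not something from the paper). It shows one small function after an extracted constant and two extension points have been added, and then counts how many configurations somebody now has to know, test, and document.

```python
from itertools import product

DEFAULT_RETRIES = 3  # an "extracted constant"


def fetch_with_retry(fetch, retries=DEFAULT_RETRIES, on_error=None, backoff=None):
    """Call fetch() up to `retries` times.

    `on_error` and `backoff` are hypothetical extension points.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose in this sketch
            if on_error is not None:
                on_error(attempt, error)
            if attempt == retries:
                raise  # out of attempts: re-raise the last error
            if backoff is not None:
                backoff(attempt)


def configurations(options):
    """List every combination of option values that needs a test and a doc entry."""
    names = list(options)
    return [dict(zip(names, values)) for values in product(*options.values())]


# Hypothetical values each flexibility point is expected to support.
options = {
    "retries": [1, 3, 5],
    "on_error": [None, "log", "alert"],
    "backoff": [None, "fixed", "exponential"],
}
print(len(configurations(options)))  # 3 * 3 * 3 = 27 combinations
```

Three knobs with three values each already give 27 combinations. None of them tells a newcomer which combinations the original authors actually had in mind, and that is the part that lives in the Theory.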

Another costly issue is that the program can satisfy new requirements in many ways. Some of them fit the current design poorly and leave it amorphous, while others are simple and elegant. Code modification viewed merely as text manipulation ignores the quality of a contribution and hinders future modifications. A feature that works now may unexpectedly break two weeks later, after a change to an unrelated part of the code.

Program lifecycle

The theory is hidden and hard to express. It relies on humans and dissolves when people change projects. You might get sick or retire. Regardless of team composition, the system is still useful. Old programs produce valid results even when the Theory is already gone with its creators.

The life of a program starts with birth. The founding team boots the system up and builds the Theory along the way. After a few years, the team shrinks, so modifications get harder. Fewer and fewer people know the Theory, and the program dies. But death is not the end. The program can be revived when new members recreate the Theory from existing artifacts.

New hires can speed up Theory Building by learning from experienced colleagues. Working hand in hand with former members is the most efficient way. Learning a Theory is like learning to write or to play a musical instrument. The most important educational activity is the students doing the things under suitable supervision and guidance [1].

Newcomers cannot revive Theory on their own merely from documentation. It requires humans who already know it. Documentation and code are hard nuts to crack. Sometimes even harder and more costly than building an entirely new (sub)system.

Theory Building View goes further and suggests that during revival existing program text should be discarded and the new-formed programmer team should be given the opportunity to solve the given problem afresh[1]. This is a frustrating process. It requires trial, error, and a lot of time. The new team faces a dilemma: blindly navigate the existing rules, or reinvent the design and gain understanding.

A theory formed by a new person will likely go off the rails of the original Theory. Similar challenges are faced when the program evolves. Original ideas are gone or dispersed. Developers deal with the revival phase all the time.

Lifting the Theory

Programmers lift heavy weights to uncover the Theory. Luckily, the article proposes 4 ways to train your muscles:

You can invest in education - understand principles, learn algorithms, and frameworks, and use conventions. You can experience some pain to gain experience - bugs, alerts, and incidents.
You can write better documentation. As Kent Beck suggests [1] the documentation is only good if it helps the next programmer build an adequate theory of the program. So for the sake of Theory docs should focus on metaphors, major components, and their interactions. Preferably with a lot of visuals with well-structured content.
You can write clean code which in turn will determine how easily the reader can build a coherent theory of the system[1].
You can try to adhere to methodologies. Use waterfall, scrum, or whatever helps. However, according to the author, Theory cannot be fully expressed, so methodologies may not be helpful.

My thoughts

So far I have been referring to the paper Programming as Theory Building by Peter Naur[1]. In this section, I want to focus on my opinions and conclusions. I’ll start with a general opinion and then propose 3 ways to improve the Theory.

Overview

The author draws the line between positive programming (Theory Building) and negative programming (people typing code like machines). I think the mechanistic paradigm was more prevalent in 1985. I assume coders were treated as tools in the hands of business analysts or stakeholders. From my experience, things have changed (in some companies at least). I was lucky enough to never feel like a machine with a processor in my head. My voice mattered for product and design.

Despite being a few decades old, Theory Building is a timeless model for software development. It’s important for new members who face a difficult onboarding, and just as important for existing members, who need to keep the code clean. The theory emphasizes how difficult it is to build muscles for lifting a system. A good trainer is crucial. New joiners should try to uncover the real knowledge that brought the system to life.

The theory is divided into 5 components. The article mentions them implicitly, but I find it useful to make them explicit. The names are my invention, but they are illustrative:

Individual System Theory -> the knowledge a person has about the system
Group System Theory -> complementary knowledge of all members of a team
Practical Industry Theory -> languages, frameworks
Theoretical CS Theory -> algorithms, type systems, modularization, paradigms, design patterns
Personal Exposition -> encountered bugs, habits, solutions to common problems

How to improve Theory?

I’ve identified 3 ways that will help you to improve the Theory:

Keep it small
Keep it simple
Help others

Small Theory

Number of ISO standards [6]

The smaller the Theory is, the faster you will learn it. Problem solving is hard, so people look for existing solutions. The growing number of ISO standards shows the need for acknowledged solutions. Experts gather, think about a problem, and prepare guidelines. It applies everywhere from software development to healthcare. Since the original article was written, lots of ISO standards have been added. Other bodies contributed too: for example, the IETF standardized HTTP for web communication and OAuth for authorization. Good solutions are not exclusive to standards committees. They can be built by anyone. Engaged individuals work hard on good open-source libraries that are publicly available.

Engineers reuse existing software, from operating systems up to domain-specific libraries. Programs are built from components that have already solved painful problems. Consider a web server: do you build a new one from scratch? I don’t. The same goes for ORMs and testing libraries. Programmers immediately know their purpose and use cases, and that reduces the amount of new theory to build.

Another factor that shrinks the Theory is the use of design approaches like DDD[3] or Clean Code[4], which help to isolate the code with the highest variability: your domain. You can use the same database and queue in all your jobs. But only some companies deal with healthcare problems, and only yours deals with a mobile app for patients with pacemakers. Imagine you have been onboarded to a project that matches the tech stack from your experience. The only missing piece is the actual problem the new project solves.
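
A minimal sketch of what isolating the domain can look like, in Python. All names here (`Reading`, `needs_attention`, `ReadingStore`, `InMemoryStore`) are hypothetical, and the thresholds are made-up numbers for illustration only, not clinical guidance. The domain rule is a plain function with no knowledge of storage, while storage sits behind a small interface that could be backed by whatever database you already know.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Reading:
    patient_id: str
    heart_rate_bpm: int


def needs_attention(reading: Reading, low: int = 50, high: int = 120) -> bool:
    """Domain rule: the part of the Theory that is unique to this product.

    The default thresholds are invented for this example.
    """
    return reading.heart_rate_bpm < low or reading.heart_rate_bpm > high


class ReadingStore(Protocol):
    """Infrastructure boundary: any database or queue adapter can implement it."""

    def latest(self, patient_id: str) -> Reading: ...


class InMemoryStore:
    """Stand-in adapter; a real project would wrap its usual database here."""

    def __init__(self, readings: list[Reading]) -> None:
        self._readings = {r.patient_id: r for r in readings}

    def latest(self, patient_id: str) -> Reading:
        return self._readings[patient_id]


def check_patient(store: ReadingStore, patient_id: str) -> bool:
    """Application glue: fetch the latest reading and apply the domain rule."""
    return needs_attention(store.latest(patient_id))


store = InMemoryStore([Reading("p1", 42), Reading("p2", 72)])
print(check_patient(store, "p1"), check_patient(store, "p2"))
```

A newcomer who already knows the tech stack only has to study `needs_attention` and the reasoning behind it. The rest is familiar plumbing.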

Cherish Simple Theory

A nice learning resource for Theory is a polished codebase. If you keep it clean, future developers (including yourself) will have an easy time understanding it. Adopt the practice of refactoring.

Dead systems may need revival, but developers don’t have the motivation to rebuild the project from scratch. Small continuous improvements make Theory more accessible. Important parts of a program should be refactored more often for clarity.

Another useful tool popularized since 1985 is testing. Nowadays tests are an industry standard. Developers test on many layers, from unit tests to e2e suites. In methodologies like TDD, developers write tests before code to ensure reliable software. Tests help to maintain a consistent velocity and add confidence in changes.
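
Tests can also carry a small piece of the Theory with them. The sketch below is an invented example (the `parse_duration` helper and its rules are assumptions made for illustration). In TDD style the two test functions would be written first. Their names and comments record a design decision that would otherwise live only in somebody’s head.

```python
def parse_duration(text: str) -> int:
    """Convert '90s', '5m' or '2h' into seconds (hypothetical helper)."""
    units = {"s": 1, "m": 60, "h": 3600}
    value, unit = text[:-1], text[-1:]
    if unit not in units or not value.isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return int(value) * units[unit]


def test_minutes_are_converted_to_seconds():
    assert parse_duration("5m") == 300


def test_bare_numbers_are_rejected():
    # Recorded decision: a missing unit is an error and is never
    # silently treated as seconds.
    try:
        parse_duration("90")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a duration without a unit")
```

A test runner such as pytest would pick these functions up by their `test_` prefix. Here they are plain functions, so they can also be called directly.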

A good teacher is a blessing. Someone who guides you through the code and corrects errors on the fly is irreplaceable. In software you can achieve this using code reviews. The tools that make them routine today arrived late: Git and Mercurial were created in 2005, 20 years after the article was written. Pair and mob programming are other ways to brainstorm problems. They are effective at landing better solutions and help to share implicit knowledge. A new person can ask questions and receive immediate answers.

On the organization level, Architecture Advisory Forum meetings[5] are a good practice that helps to share Theory along with quality docs. Engineers gather to discuss decisions about important changes. Documents explain decisions in their present context and facilitate feedback across the company. This helps less experienced members to understand drivers and principles, and it records the current state of knowledge.

Conclusion

Theory Building View is a useful model, and it’s still relevant to software development in 2024. Since the article was written, the industry has adopted tools that improve Theory Building:

Shrink the Theory - isolate domain in the center and learn what is available
Maintain better Theory - practice refactoring and testing
Learn Theory faster - use code reviews and decision records

Personally, I will keep Theory Building View close to my heart. It helped me to understand that modern programming practices fall close to old ideas.

Sources

[1] Peter Naur – Programming as Theory Building (1985)

[2] https://en.wikipedia.org/wiki/Peter_Naur

[3] Eric Evans – Domain-Driven Design

[4] Robert C. Martin – Clean Code

[5] Andrew Harmel-Law, Scaling the Practice of Architecture, Conversationally https://martinfowler.com/articles/scaling-architecture-conversationally.html

[6] Aguado Eduardo, ISO 9001 certification in the American Continent: statistical analysis and modeling