Parnas on Modularity
Table of Contents
Intro
For long time I wanted to dive into the ideas from David Parnas’s influential paper On the Criteria to be Used in Decomposing Systems into Modules. Paper gave birth of information hiding principle. The idea is simple and powerful although examples were hard to understand in modern engineering landscape. My goal is to explain the original paper in a approachable way and show that principles defined in 1972 are up to date in 2025.
The need for modules
Modules are known method of code organization. They are work assignment containers with clear reason to exist. Modular software has tangible benefits over blob of messy code. The importance of modules grows with the size of the system. Modular sofware have three main benefits:
- Project management – development time is shortened because people don’t step on each other’s shoes.
- Product flexibility – single module can change in isolation without impact on other modules.
- Comprehensibility – modules with clear responsibilities and contained functionality are easier to understand.
Those benefits are desired by software engineers but having modules
in a system is not enough. There are many ways to decompose a system into modules but only some have benefits. Let’s look into original example from paper to understand it better.
Two very different modularizations of the same system.
Key Word in Context Index
Imagine yourself searching for a word in index to learn all possible usages. This is exactly how Key Word in Context (KWIC) Index looks like. It contains all of the words for all of the lines in a text.
KWIC can be calculated with an algorithm that iterates over all of the lines from text. Each line is shifted circularly by repeatedly removing the first word and appending it to the end. Then all shifts are sorted alphabetically. Computed shifted lines are reordered to preserve context.
The final index is in the form of word: list of lines containing word
. The result of KWIC indexing will be larger than the original text. Below you can see an example sentence iterated to form a KWIC index.
Circular shifts of single line with indexes:
Now that we know the problem and theoretical solution lets explore some ways to code it.
Modularizations
First modularization
-
Module 1: Input – reads and stores the data in core for processing as an array of strings (
[[Char]]
) -
Module 2: Circular Shift – prepares an array with circular shifts, adds shifts to core (
[(FirstCharIdx, LineIdx)]
.) -
Module 3: Alphabetizing – uses arrays from 1 and 2 to order shifts. Result is an reordered array produced by Module 2.
-
Module 4: Output – uses arrays from 1 and 3 to pretty-print output as sentences.
-
Module 5: Master Control – controls interactions of modules, handles space allocations and errors.
Second approach
-
Module 1: Line Storage – manages data for other modules, provides functions and subroutines for clients.
-
Module 2: Input – reads input data and calls Line Storage to store it.
-
Module 3: Circular Shifter – creates circular shifts and returns them.
-
Module 4: Alphabetizer – reorders shifts using Line Storage
-
Module 5: Output – prints ordered shifts.
-
Module 6: Master Control – controls interactions of modules, handles space allocation and errors.
Comparison
Volatility
Modularizations can be evaluated against imaginary feature requests:
-
Input format change. In both decompositions this will be well isolated to one module.
-
Store text lines as linked lists instead of arrays. For the first modularization this will be a substantial change propagated to all modules because each of them uses core data. In the second decomposition it is isolated to the Line Storage module.
-
Calculate index on the fly rather than store in core. In the first decomposition it will be difficult because Output expects that all the shifts are calculated upfront. In the second decomposition the change can be isolated to the Alphabetizer, which will retrieve only a subset of the index.
Dependencies
The first modularization uses core data structure as its communication interface. The output of an upstream module needs to be finalized before a downstream one can take it over. Changes to this system propagate to all modules (e.g. change 2). All changes that alter the core cascade through the entire system.
The second modularization hides private data structures inside modules with functions and subroutines as an interface. Implementation can change without impact on other modules. Substantial modifications to resource handling are isolated to the Line Storage module. Second modularization doesn’t enforce a particular order of execution, which minimizes informal contracts between modules’ outputs.
Expected benefits
We can compare how two decompositions meet expected benefits of modular programming highlighted at the beginning of this article.
-
Code management – In second decomposition contributors depend less on each other, development is faster. More changes are isolated to one module, only input or output improvements are well isolated in both cases.
-
Product flexibility – drastic changes are better supported by second decomposition. The boundary between modules is slimmer and has less assumptions. Entire implementation can be replaced without impact on clients if the interface is stable.
-
Comprehensibility – in the first decomposition it’s harder to learn how modules work without learning their dependencies. The second decomposition uses functions and subroutines as an interface so clients need less information to use it.
The criteria
The second decomposition works better and uses information hiding as the main division criteria. Modules encapsulate logic with functions behind public contracts. Interface was defined carefully to maximize flexibility and minimize informal assumptions.
The first decomposition was divided as if it were a flowchart. Modules represent steps needed to achieve the result. This might be a good approach for small programs although it’s not scalable because dependencies are not isolated and changes are costly.
Even both programs may have exactly the same representation after compilation, they have much different code, which affects readability. Also they are different in specification and documentation. From that point of view modularity affects learning curve of the theory behind the system.
My thoughts
The article shows that it’s beneficial to give more thoughts to the design of the system but its difficult to objectively evaluate your own ideas. The tool I like to use is the list of currently planned features, possible feature requests, failure scenarios or security threats. Simply take your design and ask yourself: how it would perform under this stressor?
Another thing you can do is looking at modules boundaries. You can try to find the division of the system/classes that minimizes size of the interfaces and overall complexity. Can edge cases be ruled out by different design? Is there generic logic that could serve multiple modules? What would be the minimal set of operations to perform this task?
It might happen that during implementation you realize facts that affect foundation of the solution. Prototypes help to validate early ideas, the quicker you fail the better it is for the solution. It is a privilege to use LLM and fail quickly comparing to 50 years ago. In the paper Parnas mentions that experienced engineer would take 2 weeks to build KWIC system, while we can build it under 5 minutes!
Summary
Information hiding is foundational and Parnas shows why it’s an effective criteria to decompose systems into modules. Well-divided responsibilities result in organized, easy-to-read and maintainable code. The idea works in many layers—it’s effective in solving single-class responsibility as well as putting boundaries between domains in bigger systems.
Personally I enjoyed this paper because it introduces simple idea we take for granted. It was fun to uncover the original example and dig into state of software engineering in 1972.