Description: secure by design

By Dan Bergh Johnsson, Daniel Deogun, and Daniel Sawano

This article delves into DDD and models: what they are, how they relate, and how models work within Domain-Driven Design.


 

Models as a tool for deeper insight

Let’s start by explaining what DDD means by “model,” because models are at the center of DDD. In system development the word “model” is used for many things: UML diagrams of flows, how data is laid out in the tables of a database, and many others. In DDD we use the word “model” to mean how we’ve captured our essential understanding of the business at hand as a selected set of concepts. Why do we need such models, and what should they look like?

Domain-Driven Design isn’t a silver bullet. It is at its best when your system handles a problem which isn’t trivial to grasp. In those cases the most critical problem is understanding the complexity of the domain, and understanding and modeling that domain should be your main focus. If you fail to master the complexity of various technical aspects, you get a system which is less useful. But if you fail to master the complexity of the domain, you get a system which is close to worthless. In that regard the domain is the critical complexity.

Imagine you have a system that handles checked-in baggage at an airport. The complexity of the domain is probably your critical complexity. If you fail to properly represent how baggage is routed from check-in counters to airplanes, via conveyor belts and loading trucks, then the bags might not make it in time for the right flight, or end up on the wrong one. Passengers will be angry and the business will lose goodwill and money.

Even worse, there are important security aspects at stake. If a bag is checked, but the passenger doesn’t show up at the gate, then the baggage system must ensure the bag is unloaded. If the system isn’t properly crafted it might be possible to trick it into loading a bag onto a specific flight, or not unloading it—something that could have severe security consequences.

If you fail to capture a deep and precise understanding of baggage handling, you don’t only build a flawed system; you build something which is harmful to the business and potentially dangerous to the customers. It’s worse than bad, it makes the system meaningless. The airport might even be better off closed down than with such a flawed system in place. This isn’t a hypothetical example; the opening of Denver Airport in the 1990s was delayed a year and a half due to deficiencies in the baggage system.

In this case, understanding and modeling the domain of baggage handling should be the focus of your work. Spending time on optimizing your database connection pool would be a bad choice. The critical complexity is the domain. Failing to address the critical complexity makes the solution meaningless.

Now the connection to security. It’s hard to capture enough understanding to make a system which behaves well in all possible cases. It’s hard enough to do it for benevolent, “normal” data, with all the weird cases that can occur. It’s even harder to do it in a way which is resistant to malevolent data. Someone might try to attack your system by sending bizarre data to it, manipulating it into doing something bad. The system should still respond in a sound and safe way.

For security it’s essential to focus on building domain models. A lot of security problems are avoided as a side effect—especially business integrity problems, but to an extent it also shields the system from some technical attacks.

There are situations where focusing on modeling the domain isn’t the right choice. For example, if you write software for a network router, then I/O throughput is the most critical thing. Your critical complexity is technical in this case. But even here, you should consider whether a sloppy domain model might be a security issue. There is always a potential critical complexity. Be aware of whether it’s a technical aspect or the domain.

The main benefit of domain modeling is that it works as a vehicle for learning, at a deep level. And learning at that level is crucial. It’s not hard to “catch the lingo” of the businesspeople, and we can use that same lingo to write a requirements document that looks good. But without deep learning, such a document will contain subtle misunderstandings, inconsistencies, and logical loopholes. These flaws make it impossible to build a solid system which does the right thing in tricky situations — with security vulnerabilities as the worst consequence. Working in collaboration with domain experts to create a domain model fuels that learning.

What we need are domain models that support development in a stable and secure way.

For a domain model to be effective, it needs to:

  • Be simple to focus us on the essentials
  • Be strict enough that it can be a foundation for writing code
  • Capture deep understanding to make the system truly useful and helpful
  • Be the best choice from a pragmatic viewpoint
  • Provide us with language we can use whenever we talk about the system

Models are simplifications

A model is a simpler form of reality. It’s a simplification where we remove irrelevant parts. For example, when you check in a bag at the airport, there’s no need for the system to represent your shoe size. On the other hand, it’s probably relevant to represent the weight of the bag. To make it easier to understand and code the system, we create a model that contains the weight of the bag, but not the shoe size of the passenger. We keep the details which we think are relevant.
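As a minimal sketch of what this simplification could look like in Java, the hypothetical class below keeps the weight of the bag and nothing else. The class name, the field name, and the 100 kg upper bound are assumptions made for illustration; they are not rules from any real baggage system.

```java
// Hypothetical model of a checked bag: it keeps only the detail judged relevant.
final class CheckedBag {
    // Assumed upper bound in kilograms, chosen purely for illustration.
    private static final double MAX_WEIGHT_KG = 100.0;

    private final double weightKg;

    CheckedBag(double weightKg) {
        // Reject weights that make no sense in the model (this also rejects NaN).
        if (!(weightKg > 0.0 && weightKg <= MAX_WEIGHT_KG)) {
            throw new IllegalArgumentException("Weight out of range: " + weightKg);
        }
        this.weightKg = weightKg;
    }

    double weightKg() {
        return weightKg;
    }

    // Deliberately absent: the passenger's shoe size is not part of this model.
}
```

Because the model is this small, it is easy to state exactly which values it accepts, which is also what makes it harder to feed it bizarre data.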

To be clear, models aren’t diagrams. In many other contexts “model” means a specific diagram type, such as an entity-relationship model, which is often used for database design, or the class diagram from UML. These diagrams are representations of the model, but the model is the conceptual understanding of how our simplified view of reality works.

The usage of “model” in Domain-Driven Design is closer to another use of the word, in the phrase “model train.” When building model trains the builders put a lot of effort into keeping some aspects of reality, while totally ignoring others. What details to keep and what details to distort is key to building train models, as well as domain models.


Figure 1. A model train looks like the real original train.


There’s no doubt that what we see in figure 1 is a model train. It looks like a train and moves around on rails, but it isn’t a real train. We consider it a train model because it kept some important attributes, while we allowed it to disregard some other attributes.

Let us list some attributes the model has in common with reality:

  • Color—We think that the model of a specific train should have the same colors as the original train.
  • Relative size—We expect the proportions to be maintained. If the doors are twice as high as they are wide in reality, we expect the same ratio on the model train.
  • Shape—We expect the model train and its details to have the same shape, such as the curvature of the front window.
  • Movement—We expect the model train to move along rails in the same way as a real train does.

Let us also list some attributes where the model differs from reality, and where we think the difference is fine:

  • Material—It’s OK that the model train is made out of plastic or tin, when the original was built from other materials.
  • Absolute size—If the real wagons were thirty meters long, we are fine that they’re much smaller in the model.
  • Weight—The model is much lighter, which is OK.
  • Method of propulsion—The model doesn’t have a steam engine; it runs on electricity.
  • Rail curvature—The curves in the model are much tighter than in reality, which we accept.

Strangely enough, it’s easier to find differences between the model train and a real train than it is to find things which they have in common. Still, we have a firm opinion that this is a proper model of a train. Clearly this specific model has managed to capture the essentials of our understanding of a train.

It seems that color, relative size, shape, and movement are enough for us to understand that the model is a train. These four attributes are necessary: if the model doesn’t fulfill them we won’t play along and pretend it’s a train. And these four are sufficient: if the model fails to fulfill some other expectation, such as material, we’ll still play along and pretend it’s a train. At the end of the day, a model is a simplification of reality, a simplification we still accept as a valid representation of the real thing.

We now leave the realm of toys and take with us the idea that a model is a simplified understanding of the real thing. This goes for the models we use in system development as well. If we model a person, we might choose to grab onto a few attributes: a person has a name, is of a certain age, has a specific shoe size, and optionally has a pet. Agreed, this is a crude model, but a model nevertheless.


Figure 2. One possible model of people and pets


A model may be a simplification, but it must still be general enough that we can capture some variations that we think are interesting. In our example, we want to allow different names, different ages, and different shoe sizes, and we allow people to have pets or not. All these differences we allow to show up in the model. We don’t make any distinctions between people of different height, or pay any attention to their hairdo.


Figure 3. Joe, age 34, shoe size 9, and his dog Zarphac together with Jane, age 28, shoe size 6, no pet


We can represent this model in many different ways. We can use plain text to explain what we mean. We can use different kinds of diagrams to illustrate it. We can use code: pseudocode or actual code from a programming language. The important point here is that none of these representations is the model. Class diagrams in particular are often confused with the model, but the model as such isn’t any of its representations. The model is the conceptual understanding of what we consider essential in our modeling: in this case name, age, shoe size, and pet.


Figure 4. The same model as before, but another representation


The main benefit of keeping models as simplified versions of reality is that simple models are easier to make strict, something which is essential when we later build software from them.

Models are strict

The domain model isn’t a watered-down version of reality; what it loses in richness it gains in strictness. People are complex beings with lots of attributes and lots of relations. When we decide to focus on name, age, shoe size, and pets, we lose a lot of richness. But we gain precision in what we mean by “person”—a precision that makes it possible to represent this entity in software. The folks that understand the domain are called domain experts.

Writing software is a collaboration between two kinds of professionals who come from different directions and who need to meet in a productive way: the businesspeople and the developers. Each has different needs which must be fulfilled to create great software. Businesspeople need to see the terminology they are familiar with, not quasi-technical mumbo-jumbo. If they don’t recognize their domain, we’ve failed them. But it’s not enough to use familiar words as labels in the user interface or in the headers of printed-out reports. The system must also behave in a way that businesspeople think is reasonable, consistent, and understandable.

For this to happen, the domain model has to be strict. If the model isn’t strict and contains ambiguities, then one part of the system might behave in one way and another part in another way. For example, a screen at the check-in counter might talk about “number of bags,” another at the gate might say “baggage count,” and the tablet used by loading staff might say “luggage.” To make things worse, some of these terms might count the carry-on as part of the number, while others don’t. Whenever the personnel speak to each other, they each must remember what screen the other person is seeing and remember to add or subtract the carry-on from the number they’re seeing. Sometimes there are misunderstandings and bags are lost. The system fails the business, and not even the domain experts think it makes sense. Having many almost-synonyms for the same concept is often a sign that a model isn’t strict.
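One way a strict model could remove the carry-on ambiguity is to give each count its own type. The sketch below is only an illustration: `CheckedBagCount`, `CarryOnCount`, and `Gate.allBagsLoaded` are hypothetical names, not part of any system described here.

```java
// Hypothetical value type for the one agreed meaning of "checked bag count".
final class CheckedBagCount {
    private final int value;

    CheckedBagCount(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("Count cannot be negative: " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }
}

// Carry-ons get their own type, so the two numbers cannot be mixed up silently.
final class CarryOnCount {
    private final int value;

    CarryOnCount(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("Count cannot be negative: " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }
}

final class Gate {
    // Only checked bags are compared here. Passing a CarryOnCount is a compile error.
    static boolean allBagsLoaded(CheckedBagCount expected, CheckedBagCount loaded) {
        return expected.value() == loaded.value();
    }
}
```

With plain `int` parameters, nobody could tell from the signature whether the carry-on was included; with distinct types, the question has to be answered once, in the model.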

Another shameful variant is when a model is consistent in its terminology, but too lenient in its constraints and relations. This is often the result of using a “standard system” and “configuring it” to the domain. This is the usual way of working with, for example, Enterprise Resource Planning (ERP) products. ERP systems were first created for the manufacturing industry, to plan the usage of machines and raw materials. As factories differ, ERP systems were made highly configurable to suit the needs of each factory. Nowadays, they’re often described and sold as “standard systems” which can be configured to handle any domain, whereas under the hood they’re still the same flow-of-materials system. Even so, this line of business has successfully sold such systems to handle customer complaints, police investigations, and other completely different domains. Unfortunately, successfully selling is one thing, and successfully delivering value is another.

If you want to configure a flow-of-materials system to handle police investigations, you need to do some non-intuitive abstractions: “A police officer can be seen as a machine, and a report about burglary can be seen as a pile of raw material, which is refined by the police machine during the investigation.” In order to shoehorn one domain into another you need to be less and less specific, less and less precise. The result is often a general “object management system” where everything is an “object.” Through the user interface you can update the attributes of the objects, but the system carries little understanding of what those objects represent. You can often fill in any combination of attributes and relationships. A system which is too lenient is prone to mistakes, and such lenience can even result in security flaws. It takes both happy businesspeople and happy developers to make a good system. Both groups need their basic needs fulfilled.
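The contrast between a lenient “object management system” and a strict model can be sketched in Java. Everything here is hypothetical: `GenericObject`, `BurglaryReport`, and their fields are invented for illustration and are not taken from any real ERP product.

```java
import java.util.HashMap;
import java.util.Map;

// Lenient: any attribute name and any value are accepted.
final class GenericObject {
    private final Map<String, String> attributes = new HashMap<>();

    void set(String name, String value) {
        attributes.put(name, value);
    }

    String get(String name) {
        return attributes.get(name);
    }
}

// Strict: the type says what it represents and refuses incomplete data.
final class BurglaryReport {
    private final String caseNumber;
    private final String address;

    BurglaryReport(String caseNumber, String address) {
        this.caseNumber = requireText(caseNumber, "case number");
        this.address = requireText(address, "address");
    }

    String caseNumber() {
        return caseNumber;
    }

    String address() {
        return address;
    }

    private static String requireText(String value, String what) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing " + what);
        }
        return value;
    }
}
```

The generic object happily stores a shoe size on a burglary report; the strict type cannot even be created without the data the domain says it needs.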

It’s important to pay attention to the businesspeople. They need to recognize the domain they’re used to working in, and we should choose terminology that’s familiar to them. It’s a big mistake to fail to meet the needs of the domain professionals. It’s an equally big mistake to fail to meet the needs of the other professionals: the developers. As developers, we need strictness. It isn’t good enough to say that “most people have one pet.” We need to know if “having a pet” is strictly restricted to having only one.

This is where it takes some courage to be a developer. We need to ask the questions that make the model strict, without ambiguities. If we ask, “Can there be more than one pet?” we might get the answer, “Oh, that is really unusual.” This leaves us with two options. Either we think, “Then I need to allow for a list of pets,” or we think, “Just one pet allowed.” In the first case we end up writing a system with possibly more complexity than necessary, and sooner or later some weird combination will occur. In the second case we disallow multiple pets, only to get hammered a few months later when it turns out that there are some customers—perhaps customers we get when acquiring another company—who have two or more pets. To add insult to injury this can even turn into blame shifting toward us, with businesspeople saying, “We told you it could happen.”

The way out of this dilemma is to actively ask what should be in the model: “Shall we allow for multiple pets, or shall we restrict to having only one?” Deciding whether the unusual multi-pet people should be covered or not isn’t a technical decision—it’s a business decision. If we don’t have system support for them, then they must be handled through a separate manual routine.

On the other hand, providing scope for lots of diversity doesn’t come for free either. It’s tempting to allow for more and more general models; sooner or later everything is in a many-to-many relation with everything else, but that doesn’t make anything better in the long run. It can be hard to foresee and get an overview of the ramifications of a general model.

Say there’s a function which allows one person to “swap pets” with another person. If we also allow for multiple pets per person, then we need to figure out what it means to swap pets. Does that mean person A gets all of the pets of person B, and vice versa? Or do we only swap one pet?

If we don’t let the model reflect the business domain, we let the businesspeople down. If we don’t let the model be strict, we let the development people down. A good model must reflect the business domain and be strict.

Having a model be strict means that we’re able to build code using the model as a foundation.

When we design software we make similar choices. We make simple representations of complex phenomena. Let’s have a look at a schoolbook example of object orientation where lots of attributes and relations are ignored, and only a narrow view of a person is left:

  
    class Person {
        // The model of the domain concept "person," captured as code
        private String name;
        private int age;
        private int shoeSize;
        private Animal pet;

        void growOlder() {
            this.age++;
        }

        void swapPetWith(Person other) {
            ...
        }
    }

In this design we’ve removed tons of attributes and behaviors that a person might have, reducing it to four essential attributes. Leaving out details might seem to make the system poorer, but it gives us a great benefit.

What we gain by leaving out details is the possibility to be precise. In the domain of people, a “person” is a complex being with complex interactions, but in our model of the domain, a Person is something that has a name, an age, a shoe size, and optionally a pet, together with the ability to grow older and to swap pets. Period. This is exactly what we mean when we use the word “person.” What we lose in richness we gain in precision.
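To illustrate how precision pays off, here is one possible way the body of `swapPetWith` could look, assuming the business has decided that a person has at most one pet. This is a sketch under that assumption, not the authors’ implementation; the `Animal` stand-in, the constructors, and the accessors are added only to make the example self-contained.

```java
// Hypothetical stand-in for the Animal type used in the listing above.
final class Animal {
    private final String name;

    Animal(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

final class Person {
    private final String name;
    private Animal pet; // null means "no pet"; this model allows at most one

    Person(String name, Animal pet) {
        this.name = name;
        this.pet = pet;
    }

    String name() {
        return name;
    }

    Animal pet() {
        return pet;
    }

    // With at most one pet per person, a swap can only mean one thing.
    void swapPetWith(Person other) {
        Animal mine = this.pet;
        this.pet = other.pet;
        other.pet = mine;
    }
}
```

Had the model allowed a list of pets, this method could not be written until someone decided whether all pets or only one of them change hands.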

 

Some terminology

Domain—A part of the real world where stuff happens, for example the domain of baggage handling

Domain model—A distilled version of the domain where each concept has a specific meaning

Code—An encoded version of the domain model, written in a programming language

 

Models capture deep understanding

The earlier example of modeling a person is laughably simplistic. Real-world problems, such as airport baggage handling, are much more intricate. The strict understanding that we capture in a domain model is deeper than most people think. In fact, the knowledge we need to capture is even deeper than the understanding most domain experts exercise in their day-to-day work, when they handle situations on a case-by-case basis. The reason for this is that we don’t only need enough understanding to work in the domain; we need an understanding deep enough to build a machine. Let’s compare this with the challenge of riding a bike.

Most of us are experts at riding a bike. We can prove it by taking a bike and riding it, even in pretty challenging conditions, such as on a bumpy road and in windy weather, and perhaps even while carrying a large package under one arm. That takes expertise — compare it with the difficulties faced by a child who’s learning to ride on flat ground on a nice sunny summer day.

This expertise is comparable to the expertise of a domain expert. They know how the domain works. For example, a shipping expert knows how to route cargo containers even when conditions get tough, such as when a container is mistakenly unloaded from a ship and there’s no other ship leaving for the same destination for a substantial amount of time. The domain expert can handle even tricky cases, taking each case on its own.

Unfortunately, the understanding we need to write a software system is even deeper. We don’t have the luxury of being “at the site” to handle any situation that arises, of being able to assess and improvise to resolve the situation. We’re writing a program that should do this, without us (with all our expertise) being there in human form. The challenge we face isn’t like riding a bike, but more like building a bike-riding robot.


Figure 5. To build a bike-riding robot, you need deep understanding of how to make a right turn


If we’re to build a bike-riding robot, our understanding of bike riding needs to be much deeper than most experts have, even professional bicycle messengers or BMX pros. For example: how do you turn right while riding a bike? Think about it for a few seconds—you’ve probably done it a thousand times. Most people spontaneously answer, “I pull on the right handlebar.” Unfortunately, doing this would cause you to fall to the left, down onto the asphalt, due to the centrifugal force. What you subconsciously do is turn the handlebars left, causing you to fall to the right for a short period of time. After a few milliseconds you tilt right to the appropriate angle, and then you turn the handlebars to the right—taking you into a right turn. Your angle leaning to the right is exactly what’s needed to compensate for the centrifugal force, and you turn right, safe and stable. You do this without thinking, and without understanding the subtle kinesthetic mechanics—Classical Mechanics by Herbert Goldstein is an excellent book on the subject. But if we want to build a bike-riding robot, this is the required depth of understanding.

This bike-riding robot story gave us some bad news and some good news. The bad news is that, if we look inside the head of a domain expert, we find no ready-to-go model. No “true” model is inside there. We can’t ask the domain experts and expect to get all the answers we need. The good news is that working together with domain experts to craft a model is a fun and rewarding job. Doing this is an iterative process of exploring lots of possible models and choosing one which is appropriate for solving the problems we have at hand.

Making a model means choosing one

One of the usual myths of modeling is that there’s a “true” model somewhere, often thought to be embedded inside the head of the domain expert. This isn’t the case. Making a model involves an active choice between many possible models, and we need to choose the one that best suits our needs.

In Domain-Driven Design we sometimes use the phrase “distilling a model.” Let’s compare ourselves for a while with a whiskey distiller. The whiskey distiller starts with a large batch of fermented wort — something basically undrinkable — then adds some heat and collects the vapors. The distiller throws away the first part, which contains acetone. The middle part consists of most of the alcohol, some of the water, and the natural flavors that are dissolved therein. This is considered the good part and is kept. The last part consists of some alcohol, a lot of water, and some less attractive flavors. This is also discarded. What’s kept is what we call whiskey. Your personal attitude toward whiskey or your tastes might vary, but you get the point. When we distill a model, we throw away some parts of reality and keep others.

The important point here is that there are many ways for a distiller to do their job. They have a choice. Keeping the middle part is a choice, because the objective for the distiller is to get a high-alcohol result with some specific flavors. But the distiller could have made other choices. Had the distiller wanted acetone instead, then the distillation would have looked different. The distiller would have kept the first part and thrown away the rest. In the same way, we can distill different models from the same reality depending on what we intend to use the models for.

Our model describing a person with name, age, shoe size, and pet is one model. Another model could be to describe a person by date of birth, place of birth, mother’s name, and father’s name. Neither of these two models is more correct than the other. Each is different and good for its designed purpose. If we are keeping a registry for a dog owners’ club, the first model is clearly superior to the second. If we’re studying how a family has spread across the world through migration, the first model is worthless, and the second is excellent.


Figure 6. Two different models of people, good for different things
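The two models can also be written down as code. The sketch below is one possible encoding; the class names `ClubMember` and `FamilyMember` are assumptions, chosen to hint at the purpose each model serves.

```java
import java.time.LocalDate;

// Model 1: enough for a dog owners' club registry.
final class ClubMember {
    final String name;
    final int age;
    final int shoeSize;
    final String petName; // null when the member has no pet

    ClubMember(String name, int age, int shoeSize, String petName) {
        this.name = name;
        this.age = age;
        this.shoeSize = shoeSize;
        this.petName = petName;
    }
}

// Model 2: enough for tracing how a family has migrated.
final class FamilyMember {
    final LocalDate dateOfBirth;
    final String placeOfBirth;
    final String motherName;
    final String fatherName;

    FamilyMember(LocalDate dateOfBirth, String placeOfBirth,
                 String motherName, String fatherName) {
        this.dateOfBirth = dateOfBirth;
        this.placeOfBirth = placeOfBirth;
        this.motherName = motherName;
        this.fatherName = fatherName;
    }
}
```

Both classes describe the same human being, yet they share no fields at all. Which one is “right” depends entirely on the problem being solved.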


When doing modeling, actively try to find different models that express your domain. Try to find three different models and compare how good they are at expressing your domain problems. Finding a good model is important, because it makes it possible to talk about the domain in an efficient and unambiguous way. A good model forms a language.

The model forms the ubiquitous language

An interesting aspect of modeling is that the model creates a language—the language we speak about the system.

To start with we must realize that when domain experts speak with each other they use a language of their own. This is the domain language. It might sound like English, but it’s in one regard a subset of English — there are a lot of common English words that aren’t used in this domain-expert language. In another regard it’s a superset of English — there are a lot of domain-specific terms and idioms that aren’t used in common English. What domain experts speak to one another is a language which is geared to enabling effective communication.

Take a moment to consider the domain-expert language of system developers. Among ourselves we easily throw around terminology that makes perfect sense to us, but it’s completely impossible to understand for non-developers; we might “pool the connections” or “make that a strategy.” And the domain experts of finance, logistics, or health care have their own lingo too.

If we’re building a logistics system, it seems like a logical approach to take the terminology from logistics and encode that as a software system. This is a wonderful idea, but unfortunately flawed. The language used by logistics experts isn’t logically consistent. This isn’t because they’re particularly sloppy with terminology. Software developers are equally sloppy with their terminology. Listen in on any two seasoned developers talking, and you’ll find that they might use the words “object,” “instance,” and “class” interchangeably, as if they were synonyms. And we know they aren’t, because when we explain object orientation to beginners we are careful to distinguish between “classes” and “objects.” But when two experts talk, they can be sloppy because they understand each other anyway and the real discussion is elsewhere, on a higher level. Don’t turn into the language police, correcting domain experts when they talk to each other. Allow them to be sloppy, and allow yourself the same when talking to your peers.

If we’re building a logistics system, wouldn’t it be wonderful if we could form a language where we can talk about the system in a precise way without the risk of misunderstanding? This is exactly what the model is. If we’ve jointly, between logistics experts and developers, decided that a “leg” means a transportation from one place to another using the same vehicle, and we’ve decided that “terminating a leg” means that the cargo is unloaded at the destination, then we can use those terms and make ourselves understood. If we say, “If two transports terminate a leg at the same dock then they can be co-transported on the next leg,” then that phrase can be unambiguously understood and the functionality can be implemented.


Figure 7. The domain model forms a language in common.
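The rule about legs and docks translates almost word for word into code, which is the point of having a shared language. The following is a minimal sketch; the class and method names (`Leg`, `Transport`, `terminateLeg`, `canBeCoTransportedWith`) are assumptions derived from the phrasing of the rule, not an API from the book.

```java
// A leg: a transportation from one place to another using the same vehicle.
final class Leg {
    private final String vehicle;
    private final String origin;
    private final String destination;

    Leg(String vehicle, String origin, String destination) {
        this.vehicle = vehicle;
        this.origin = origin;
        this.destination = destination;
    }

    String vehicle() {
        return vehicle;
    }

    String origin() {
        return origin;
    }

    String destination() {
        return destination;
    }
}

final class Transport {
    private final Leg currentLeg;
    private String terminatedAt; // null until the current leg is terminated

    Transport(Leg currentLeg) {
        this.currentLeg = currentLeg;
    }

    // "Terminating a leg": the cargo is unloaded at the leg's destination.
    void terminateLeg() {
        terminatedAt = currentLeg.destination();
    }

    // Two transports that terminated a leg at the same dock
    // can be co-transported on the next leg.
    boolean canBeCoTransportedWith(Transport other) {
        return terminatedAt != null && terminatedAt.equals(other.terminatedAt);
    }
}
```

A domain expert who has never seen Java can still read `terminateLeg` and `canBeCoTransportedWith` aloud and check whether the rule matches what was agreed.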


When discussing the functionality of a system, use the words and phrasings which are part of the model. By doing this you’ll quickly realize whether the functionality can be implemented. If it’s awkward to express the functionality using the terms from the model, this is a sure sign that it’ll be awkward to implement. It might be a sign that the model needs to be extended to contain a new term, and the system refactored for consistency.

Using the terminology of Domain-Driven Design, we want the model to become the ubiquitous language when talking about the system. By ubiquitous, in this case, we mean that the terminology should be used everywhere we talk about the system. The same terms should be used in the user interface, in the manuals, in the requirements or user stories, in the code, in the database tables. Why call something a “quantity” in the user interface, refer to it as an “amount” in the manual, and name the database column “Volume”? Insisting on using the same language across disciplines helps find ambiguities which could manifest as bugs or security flaws.


Figure 8. The model is ubiquitous: “quantity” is used consistently all over the place.
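In code, using the same term everywhere might look like the sketch below. The `Quantity` and `OrderLine` types, the constant names, and the rule that a quantity must be at least 1 are all assumptions made for the sake of the example.

```java
// The model term "quantity" as a type. The "at least 1" rule is an assumption.
final class Quantity {
    private final int value;

    Quantity(int value) {
        if (value < 1) {
            throw new IllegalArgumentException("Quantity must be at least 1: " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }
}

final class OrderLine {
    // The user interface label and the database column both reuse the model term.
    static final String UI_LABEL = "Quantity";
    static final String DB_COLUMN = "quantity";

    private final Quantity quantity;

    OrderLine(Quantity quantity) {
        this.quantity = quantity;
    }

    Quantity quantity() {
        return quantity;
    }
}
```

If a colleague proposes a column called `volume` or a field called `amount`, the mismatch with `Quantity` is visible immediately and can be discussed before it turns into a bug.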


It’s worth pointing out that the persistence model might be slightly different from the conceptual model. For example, we might have to split concepts into different tables, and we might need join tables or synthetic keys which aren’t part of the conceptual model. In the same way, the classes in the code might be slightly different from the terms used in the conceptual model, for implementation-specific reasons. Nevertheless, the understanding we capture is still the same, and we try to use terminology from the ubiquitous language as much as possible when we name our constructs (classes or database tables).

This doesn’t mean we’re turning into a language police force. The model, or the domain model language, is the ubiquitous language when talking about the system. The domain experts are still allowed to use their ambiguous domain language amongst themselves, in the same way as developers are allowed to be sloppy about “objects” versus “classes” in discussions with other developers.

The important point about being precise in the ubiquitous language is that when we talk about the system we need to be precise. This is especially important when business experts and developers interact, and the risk of misunderstanding is the highest. In these situations we should insist on using the terminology of the ubiquitous language. Insist on using the words from the domain model in any requirements document. If something is hard to express in the terminology of the domain model, it’s probably hard to write as software.

It’s also worth pointing out that just because a language is ubiquitous doesn’t mean that it’s universal. It’s the ubiquitous language when talking about this specific system, not for talking about other systems (even other logistics systems). Different systems have different needs, and different focuses. They have different models and different languages. Each domain model language is the ubiquitous language within its realm, but not outside. The context for the language has an outer bound. In Domain-Driven Design we refer to this as the bounded context for the model. Within the bounded context each word in the model has a well-defined meaning, but outside the bounded context words can mean something completely different.
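A small sketch can make the idea of a bounded context concrete. In the hypothetical example below, the word “ticket” is modeled twice, once per context; the two outer classes stand in for separate packages, and both contexts are invented for illustration.

```java
// Each outer class stands in for a package, one per bounded context.
final class SupportDesk {
    // In this context a "ticket" is a reported problem.
    static final class Ticket {
        private final String problem;

        Ticket(String problem) {
            this.problem = problem;
        }

        String problem() {
            return problem;
        }
    }
}

final class Transit {
    // In this context a "ticket" is the right to travel between two stops.
    static final class Ticket {
        private final String from;
        private final String to;

        Ticket(String from, String to) {
            this.from = from;
            this.to = to;
        }

        String from() {
            return from;
        }

        String to() {
            return to;
        }
    }
}
```

The two `Ticket` types are unrelated as far as the compiler is concerned, so a ticket from one context cannot be handed to code in the other without an explicit translation.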

And that’s all for now. You’ve probably had enough discussion about models for the moment, but if you haven’t, go check out the book on liveBook and see the accompanying slide deck.