From Domain-Specific Languages Made Easy by Meinte Boersma

A domain-specific language is a software language that allows domain experts to capture their knowledge in a precise enough way to make it executable.

The following article is a standalone excerpt from the opening chapter of my book “Domain-Specific Languages Made Easy” for Manning Publications. This book is going to be available spring 2021, but is already part of the Manning Early Access Program (MEAP). You can find its product page at: https://www.manning.com/books/domain-specific-languages-made-easy and take 40% off by entering fccboersma into the discount code box at checkout.

This article contains ideas, excerpts, and material from chapter 1: “What is a Domain-Specific Language?”


Figure 1. The DSL-based (“model-driven”) approach to developing software systems.


The book focuses on implementing Domain-Specific Languages (DSLs) for business domains. With such DSLs, domain experts can capture their extensive domain knowledge as DSL content using a Domain IDE. That knowledge is often “made executable” by generating code from it. This generated code implements the domain-specific part of a software system which we’ll call the Runtime. This approach is usually called model-driven software development, with the DSL content more commonly known as the model.

What is a DSL?

A DSL is a software language that’s domain-specific. A software language is a language that’s written and read by humans, but also automatically processable by computers, and supported by software. In this section, we’ll figure out what it means for a software language to be domain-specific, and what the key aspects of a software language are.


Figure 2. What is a domain?


As a first step, let’s see what the D stands for. A domain is a particular, focused area of knowledge. It comes with a group of domain experts “inhabiting” or “owning” the domain: people possessing, shaping, extending that area’s knowledge, and sharing it with business stakeholders and other interested parties. You could even argue that a domain often exists in the first place as such a group of domain experts, with that group defining the domain in an ad hoc manner.

Domain experts typically converse with each other using a domain-specific ubiquitous language – even if they don’t realize they do. Although the term “domain-specific ubiquitous language” already gives us the S and the L, that language is not a proper DSL yet. The domain-specific ubiquitous language generally comes in both a verbal and a written form. The written form is used for the domain’s body of knowledge, and for task-specific documents such as the specification of a Runtime. The verbal form, which I like to call “domain speak”, is used by the domain experts to discuss the body of knowledge and task-specific documents.

The domain-specific ubiquitous language is usually based on a natural language, but is enriched with domain-specific terms which are given precise, domain-specific meanings. It’s also often enriched with domain-specific notation. The ubiquitous language is not yet a proper DSL: it’s not a software language, because it lacks a solid definition, as well as software tools. It’s certainly a precursor to it, though: you could call it a “pre-DSL”.


Figure 3. Extracting a DSL from the “pre-DSL”.


This pre-DSL makes up fertile ground for a proper DSL: it’s already domain-specific, but it needs work to become a software language. Let’s have a look at the key aspects of a software language, so we can determine how we can “extract” a proper DSL from the pre-DSL. With that information, we can start implementing the DSL as a software language. That implementation takes the form of tools such as the Domain IDE, and the code generator. The Domain IDE is an application that allows domain experts to write (and read) DSL content using DSL editors.

Key aspects


Figure 4. Mental model of the key aspects of a DSL.


This diagram lists the key aspects of any DSL, and the relations between them. At the center of the diagram is the actual DSL content, which is what the domain experts will see and manipulate. The “stuff” around it defines the DSL.

Notation

The most immediate and captivating part of a DSL is undoubtedly its notation: a system of writing that’s used to visually represent the DSL content so it can be understood by the domain experts. In other words: notation is the visual “stuff” that represents and codifies the meaning that’s being communicated. It serves as the interface between those domain experts’ eyeballs and the DSL content. For most intents and purposes, the notation is the DSL.


Figure 5. Mental model of only the notation aspect of a DSL.


Notation can be “anything”. Often, it is “boxology”: boxes containing text and/or sub-boxes, with lines or arrows between these. Sometimes, there are graphical elements that are not boxes so much, such as icons or other symbology, or tables.

DSL notation doesn’t need to be graphical: many DSLs are purely textual. It doesn’t need to look like code in a monospaced font, though: you can leverage good typesetting to improve readability, or even to convey specific meaning, and to make it look like a well-designed document.

Regardless of its type, a DSL’s notation is typically built up from a small set of visual building blocks. For textual DSLs, these building blocks would be keywords, identifiers, strings, and numbers: various categories of text, each with their specific meaning, often typeset in a specific style to help convey that meaning. For graphical DSLs, these would be boxes, slots in those boxes, lines or arrows, labels next to those arrows, etc. That the number of visual building blocks is small makes the language easier to “parse” visually, and therefore: to understand.

Structure

Notation is not thrown together haphazardly: every DSL has a certain fixed structure. This structure makes it possible for the domain experts to understand the DSL consistently and unambiguously, but also for a computer to process it automatically.


Figure 6. Mental model of only the structure aspect of a DSL.


Structure is governed by the DSL’s concepts, and their properties. A concept is the blueprint for a fundamental construct in the DSL. It has a name, and a collection of properties, each of which has a name as well as a type.

A property’s type determines what can be stored in that property’s value. DSL content can be represented entirely using instances of concepts. Instances assign values that occur in the DSL content to properties as settings. It’s often quite easy to discern instances and their settings in the notated DSL content because of distinctive visual cues.
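To make concepts, properties, instances, and settings a bit more concrete, here is a minimal sketch in Python. It is not taken from the book: the class names (Concept, Property, Instance), the problems() method, and the Premium example are all assumptions made up for illustration, and plain Python types merely stand in for the types a real DSL would define.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Property:
    """A named, typed slot declared by a concept."""
    name: str
    value_type: type  # assumption: a Python type stands in for a DSL type


@dataclass(frozen=True)
class Concept:
    """Blueprint for a construct in the DSL: a name plus its properties."""
    name: str
    properties: tuple[Property, ...]


@dataclass
class Instance:
    """An instance of a concept; settings map property names to values."""
    concept: Concept
    settings: dict[str, object] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Report settings that don't fit the concept's declared properties."""
        declared = {p.name: p for p in self.concept.properties}
        found = []
        for name, value in self.settings.items():
            if name not in declared:
                found.append(f"unknown property '{name}'")
            elif not isinstance(value, declared[name].value_type):
                expected = declared[name].value_type.__name__
                found.append(f"'{name}' expects {expected}")
        return found


# Hypothetical concept from an imagined insurance domain.
premium = Concept("Premium", (Property("name", str), Property("amount", int)))

ok = Instance(premium, {"name": "basic", "amount": 120})
bad = Instance(premium, {"name": "basic", "amount": "a lot", "colour": "red"})
```

In this sketch, ok.problems() comes back empty, while bad.problems() reports both the wrongly typed amount and the undeclared colour setting. That is roughly what "the property's type determines what can be stored" amounts to.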

Instances of concepts don’t live in isolation, but work together through relations. Relations between instances are also represented by particular settings. Relations come in two kinds:

  • Containment, or parent-child relations
  • Reference relations

These relations are used to build up the DSL content from separate instances of concepts. Reference relations are especially powerful since they allow DSL content to re-use other DSL content, without needing to duplicate details.
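The difference between the two kinds of relations can be sketched in a few lines of Python. Everything here is hypothetical: the Node class, the add_child helper, and the rental example are assumptions chosen for illustration, not the book's implementation.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare instances by identity, not by content
class Node:
    """An instance of a concept, reduced to what matters for relations."""
    concept: str
    name: str
    children: list["Node"] = field(default_factory=list)         # containment
    references: dict[str, "Node"] = field(default_factory=dict)  # reference
    parent: "Node | None" = None

    def add_child(self, child: "Node") -> "Node":
        """Containment: a child has exactly one parent."""
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child


# Hypothetical DSL content: a record type that contains two attributes...
record = Node("RecordType", "Rental")
period = record.add_child(Node("Attribute", "rentalPeriod"))
price = record.add_child(Node("Attribute", "rentalPrice"))

# ...and a business rule that refers to one of them without copying it.
rule = Node("BusinessRule", "discountLongRentals")
rule.references["appliesTo"] = price

# Renaming the attribute once is visible through every reference to it.
price.name = "pricePerDay"
```

Containment gives the DSL content its tree shape, with every child owned by one parent. A reference points across that tree, which is why the rule sees the renamed attribute without any duplicated details needing an update.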

Meaning


Figure 7. Mental model of only the meaning aspect of a DSL.


A DSL has to mean something. The “domain speak” and pre-DSL that the domain experts were already using before certainly carried a certain meaning. As stated before, this meaning might at that point still be a bit imprecise.

Usually, a software development project spends a large amount of effort to make the domain experts’ specification of the Runtime precise enough to be able to implement. Instead, in a DSL-based approach the specification (in the form of DSL content) is itself precise enough to automatically generate a large part of the code of the Runtime from. We say that the DSL content is executable, because we can integrate the generated code into a complete software system, and run that. This executability is made possible by the DSL, which is extracted from the pre-DSL, having been precisely defined. The code generator then effectively gives the language a specific and unambiguous meaning, in terms of how instances of concepts are mapped to code fragments in the Runtime.
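As a rough illustration of "mapping instances of concepts to code fragments", here is a toy code generator in Python. It is a sketch under stated assumptions: the RecordType and Attribute concepts, the dictionary representation of the DSL content, and the generate_class function are all hypothetical, and a real generator would produce far more than a constructor.

```python
def generate_class(record_type: dict) -> str:
    """Map one (hypothetical) RecordType instance to Python source text."""
    names = [a["name"] for a in record_type["attributes"]]
    lines = [
        f"class {record_type['name']}:",
        f"    def __init__({', '.join(['self'] + names)}):",
    ]
    # Each Attribute instance maps to one assignment in the constructor.
    lines += [f"        self.{n} = {n}" for n in names] or ["        pass"]
    return "\n".join(lines) + "\n"


# Hypothetical DSL content, written as plain data for brevity.
rental = {
    "concept": "RecordType",
    "name": "Rental",
    "attributes": [
        {"concept": "Attribute", "name": "period_in_days"},
        {"concept": "Attribute", "name": "price_per_day"},
    ],
}

source = generate_class(rental)

# "Executable": the generated code can be loaded and run like hand-written code.
namespace: dict = {}
exec(source, namespace)
booking = namespace["Rental"](period_in_days=3, price_per_day=40)
```

The generator never interprets what a rental is. It only knows how each concept translates into a code fragment, and that translation is what pins down the meaning of the DSL content.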

Constraints


Figure 8. Mental model of only the constraints aspect of a DSL.


Not everything you can write down in accordance with the DSL’s structure makes sense. The DSL’s structure might allow a domain expert to specify adding a dollar amount to a date, but the DSL as a whole certainly should prevent this. The way to do this is by augmenting the DSL’s structure with a set of constraints. A constraint runs queries against the DSL content to check for the presence of patterns that are considered to be nonsensical, or problematic. In case such a pattern is found, the Domain IDE warns the domain expert about it with a meaningful message, highlighting the offending DSL content.
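The dollar-amount-plus-date example can be turned into a small constraint checker. The following Python sketch is only an illustration: the Literal and Addition concepts, the type names "amount" and "date", and the functions type_of and check_additions are assumptions, and a real Domain IDE would attach the message to the offending content in the editor rather than return a string.

```python
from dataclasses import dataclass


@dataclass
class Literal:
    type_name: str  # hypothetical type names, e.g. "amount" or "date"
    text: str


@dataclass
class Addition:
    left: "Literal | Addition"
    right: "Literal | Addition"


def type_of(expr) -> str:
    """Type of an expression; an addition takes the type of its left operand."""
    return expr.type_name if isinstance(expr, Literal) else type_of(expr.left)


def check_additions(expr, path="root") -> list[str]:
    """Constraint: query the content for additions whose operand types differ."""
    if isinstance(expr, Literal):
        return []
    issues = check_additions(expr.left, path + ".left")
    issues += check_additions(expr.right, path + ".right")
    left, right = type_of(expr.left), type_of(expr.right)
    if left != right:
        issues.append(f"{path}: cannot add {left} and {right}")
    return issues


# Hypothetical DSL content: the structure permits both, only one makes sense.
nonsense = Addition(Literal("amount", "$100"), Literal("date", "2021-03-01"))
sensible = Addition(Literal("amount", "$100"), Literal("amount", "$20"))
```

Here check_additions(sensible) finds nothing, while check_additions(nonsense) reports the mismatch together with a path to the offending addition. The path plays the role of the highlighting that the Domain IDE would do.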

Why use a DSL-based approach for software development?

Adopting a DSL-based approach should have several positive effects on the software development process. I say “should” since adopting a DSL-based approach is not always the right answer. We’ll get back to that later, in the section titled “When (not) to use a DSL-based approach?”

For better overview, I’ve divided the positive effects into a couple of broad categories.

Empowering the domain experts

  1. The domain experts can write a precise specification themselves using the Domain IDE. They don’t have to rely on software developers to manually translate any detail to the Runtime. Instead, a large part of the Runtime’s code is automatically generated from the specification (as DSL content) written in the Domain IDE. This improves their effectiveness considerably. It also improves their efficiency because the DSL editor assists and guides them in efficiently writing valid DSL content.

An alternative way to phrase this is that it makes software development much more Agile for domain experts since they can add or change features of the Runtime, and test them on their own.

  2. Usually, there are more domain experts than software developers around, so “scaling out”, or scaling horizontally, is a viable way to increase productivity. This is compounded by the fact that the domain experts are already -by definition!- knowledgeable about the domain.

The same is probably not true for most software developers. Software developers are experts in specifying the How using general programming languages, but rarely in any specific domain. This means that they will have to learn the domain well enough that they can start translating the specification written by the domain experts into working code. That learning process takes time, which translates into higher costs and a longer running time for the entire software development effort.

Improving efficiency of the software development process

  1. Large parts of the code of the Runtime are derived automatically from the DSL content by the code generator. The software developers are relieved of the more tedious aspects of software development. How often haven’t you had to add a field to a form, a column to a database table, a field to a class, and all kinds of handling, just because someone added a single item to a specification? That kind of work just takes time and energy, so it’s a good thing to not have to do that. This not only improves efficiency, but it also leaves more time to do more interesting things, like working on the Domain IDE.
  2. Because the specification is turned into working code automatically, it also should be possible to verify changes almost instantly. It should also be possible to deploy these changes really quickly: essentially “at the touch of a button”. Again, this improves Agility of the development of the Runtime.
  3. The specification in the Domain IDE becomes the center of communications across the whole development, with the DSL acting as a common language. You could say that expressing the specification as DSL content really captures the domain, which helps not only with building software, but also with understanding the domain.

New possibilities

A DSL-based approach also opens up new possibilities that don’t exist in a traditional software development approach.

    1. Because the specification is machine-processable, it also becomes machine-checkable. This means that you can verify automatically whether it’s complete, and unambiguous, and satisfies other desirable properties. This is not something you can really do with a non-formal specification, because that would require a lot of person-power, and -time.
    2. The specification can be extended with tests that are written by the domain experts themselves. Such tests then reside “near” the specification of the business logic the tests are meant to validate. They can even be executed on-the-fly in the Domain IDE, giving the domain experts immediate feedback on the correctness, and completeness of their specification. This builds confidence across the software development process, and saves a lot of time and effort.
    3. The core of the business domain is encoded in a DSL precisely, unambiguously, and independently of the technology used to implement the Runtime, making it quite future-proof. It then requires much less effort, and is much less tedious, error-prone, and risky, to:
      1. Change the architecture, technological stack, or the overall design of the Runtime.
      2. Reuse the business knowledge for another Runtime, or for some other purpose.

When (not) to use a DSL-based approach?

Does this mean you should always use a DSL? The simple answer is: NO. As always: “it depends…” – and unfortunately not in the hard science way. The trick -or rather: knack- lies in recognizing the potential of a DSL in a given situation, and gauging whether it’d be a sensible idea.

In this book, we’ll be focusing on domains which I like to call business domains. A business domain is a domain that’s not really (computer-)technological in nature. Often, there are ulterior reasons to develop some software for the domain to support the activities undertaken in it, such as cost effectiveness, efficiency to gain a competitive edge, or regulatory compliance. The software itself (which we call the Runtime) is not the domain’s core business: it’s just something you need, preferably without it getting in the way, and without it costing too much.

Business domains are quite often good targets for a DSL-based approach. Developing software for business domains usually has two main challenges:

  1. There are not enough on-site software developers to initiate a software development project, and follow it entirely through to the end. This means outsourcing at least part of the development effort, which means extra time, costs and risks. A DSL-based approach can mitigate this by being more time and cost effective.

  2. Communicating the business knowledge in the domain (in any form) to software developers is difficult and error-prone.


Figure 9. Side comic: the chasm of software development.


Even if the domain experts are able to write precise enough specifications in free-form documents -and this is a big if!-, the software developers still have to learn enough about the domain to be able to properly read and understand them. Often, software developers unintentionally end up being domain experts at the end of the development. A DSL-based approach can mitigate this in two ways:

  • With the use of a Domain IDE, domain experts can effectively take over a large part of the software development effort.
  • The DSL, together with the content that’s written in it using the Domain IDE and that serves as a specification of the Runtime, becomes a hub for communication between domain experts and software developers. This improves the efficiency and effectiveness of that communication considerably, which usually translates into improved time and cost effectiveness as well.

None of this is quantifiable up-front, so we could do with some criteria to have a chance at making the right decisions. The table below lists the major helpful criteria. None of these are really quantitative, but the more boxes you can confidently tick, the likelier it is you could -or even: should- use a DSL-based approach.

Table 1.



There’s no such thing as a free lunch. Apart from the advantages of having a DSL, implementing and using one comes with certain costs and accompanying risks.

  • Implementing the Domain IDE takes time, and effort, both from the software developers, and from the domain experts. The software developers craft the Domain IDE, next to developing the Runtime itself – even if using the DSL makes precisely that much more productive. It can already be difficult to understand the domain well enough to be able to build a Runtime the usual way. Designing a DSL and a Domain IDE requires understanding the domain even better. The domain experts help by clarifying the domain as much as needed for the software developers to be able to create the Domain IDE.
  • Implementing a DSL requires additional skills and knowledge from the software developers. This is precisely why you should be reading (and buying) this book, of course. Even having read this book, and sticking as much as possible to mainstream technologies, implementing a Domain IDE is never going to be as easy as implementing a regular application.
  • The domain experts need to transition to using the Domain IDE. They’ll have to wean themselves off their old way of working, in which they used free-form documents to write down their business knowledge. This requires education and migration, and support from the software developers that have crafted the Domain IDE.
  • Using a DSL complicates the software development technologically.
    • It’s necessary to have and keep the Domain IDE working to work on the Runtime at all.
    • The code generator must be run as an extra step in building the Runtime.
    • The Runtime can’t consist of only generated code. It needs hand-written code as well, which needs to fit seamlessly with the generated code, and how it’s generated.

As always, these disadvantages must be weighed against the advantages.

In summary:

  • A DSL is a software language (meaning: written by humans, readable by humans, processable by computers) that’s specifically made for the experts of a domain (meaning: a particular, focused area of knowledge) to write down their knowledge in a form that’s precise enough to serve as a specification of a software system.
  • The key aspects of a DSL (and more generally: of a software language) are:
    • Notation. A DSL’s notation is its system of writing, used to visually represent DSL content. Notation is implemented through a visualizer which visualizes DSL content. The visualizer is extended into a DSL editor which can be used inside a Domain IDE to inspect as well as manipulate DSL content.
    • Structure. A DSL’s structure determines what can be expressed in the DSL. It does that by defining concepts, with these having properties, and each of those having a type. Concepts can be instantiated, and settings on an instance hold values for the properties of the concept the instance was instantiated from. Some settings represent relations between instances: this is governed by the type of their properties.
    • Meaning. A DSL is given meaning by what it produces from DSL content. In our case, the way that the Runtime, built from code generated from DSL content, behaves when executing is the DSL’s meaning.
    • Constraints. A constraint warns the domain expert about DSL content which does not make sense. “Nonsensical” here means that either the code generator will have a problem, or the execution of the Runtime will. A constraint violation is reported to the Domain IDE’s user in the form of a meaningful error message on the offending DSL content.
  • We gave a number of criteria that help to determine whether a domain and its pre-DSL likely constitute fertile ground for a proper DSL.
  • Adopting a DSL-based approach can have a lot of benefits, but also comes with ramp-up costs and risks. As always, these must be weighed against each other to determine whether a DSL-based approach makes sense.

That’s all for this article. If you want to learn more about the book, you can check it out on our liveBook platform here.