Description: https://images.manning.com/360/480/resize/book/c/6dd2fed-f419-4540-bff0-455acccfe08c/Sharvit-DOP-MEAP-HI.png

From Data-Oriented Programming by Yehonathan Sharvit

This article explores how data-oriented programming deals with retrieving data from a database.


Take 35% off Data-Oriented Programming by entering fccsharvit2 into the discount code box at checkout at manning.com.


You go for a walk with Joe in a park near your office. You seat on a bench close to a beautiful lake and gaze at the clouds in the sky. After a couple of minutes of meditation, Joe breaks the silence and asks you: “What do you see?” You tell him that this cloud looks to you like a horse and this one looks like a car. When you are back to the office, you ask Joe for an explanation about the clouds and he tells you with a mysterious smile on his lips: “A cloud is a cloud”.

Applications that run on the server side usually store data in a database. In DO, we represent data retrieved from the database like we represent any other data in our application: with generic data collections.

YOU: So far, you showed me how Data-Oriented programming represents data that lives in the memory of the application. What about data that comes from the outside?

JOE: What do you mean by outside?

YOU: Data that comes from the database.

JOE: I’ll return the question to you: How do you think that in DOP we represent data that comes from the database?

YOU: As generic data collections, I guess.

JOE: Exactly! In Data-Oriented programming, we always represent data with generic data collections.

YOU: Does it mean that we manipulate data from the database with the same flexibility as we manipulated in-memory data in Part 1?

JOE: Definitely.

In Data-Oriented programming, we represent data from the database with generic data collections and we manipulate it with generic functions

YOU: Could you show me for instance how to retrieve book search results when the catalog data is stored in a SQL database?

JOE: I’ll show you in a moment. First, tell me how you would design the tables that store catalog data?

YOU: Do you mean the exact table schemas with the information about primary keys and nullability of each and every column?

JOE: No. I only need you to give me a rough overview of the tables, their columns and the relationships between the tables.


Figure 1. The database model for books and authors


YOU: I’d have a books table with 3 columns: title, isbn and publication_year. And an authors table with two columns: for id and name.

Table 1. The books table filled with two books

title

isbn

publication_year

The Power of Habit

978-0812981605

2012

7 Habits of Highly Effective People

978-1982137274

1989

Table 2. The authors table filled with three authors

id

name

sean-covey

Sean Covey

stephen-covey

Stephen Covey

charles-duhigg

Charles Duhigg

JOE: And what about the connection between books and authors?

YOU: Let me see: A book could be written by multiple authors, and an author could write multiple books. Therefore, I need a many-to-many book_authors table that connects authors and books, with two columns: book_isbn and author_id.

Table 3. The book_authors table with rows connecting books with their authors

book_isbn

author_id

978-1982137274

sean-covey

978-1982137274

stephen-covey

978-0812981605

charles-duhigg

JOE: Let’s start with the simplest case. We are going to write code that searches books matching a title and returns basic information about the books: title, isbn and publication year.

YOU: What about the book authors?

JOE: We’ll deal with that later in the chapter, as it is a bit more complicated. Could you write a SQL query for retrieving books that contain habit in their title?

YOU: Sure.

It’s quite easy for you. Listing 1 shows the SQL query that you wrote.

Listing 1. SQL query to retrieve books whose title contain habit

 
 SELECT
   title,
   isbn,
   publication_year
   FROM
       books
  WHERE title LIKE '%habit%';
  

The results are displayed in Table 4.

Table 4. Results of the SQL query that retrieves books whose title contain habit

title

isbn

publication_year

The Power of Habit

978-0812981605

2012

7 Habits of Highly Effective People

978-1982137274

1989

JOE: How would you describe the results displayed in Table 4 as a data collection?

YOU: I would say it’s a list of maps.

Accessing data from a NoSQL database is done in a similar way to the approach presented in this chapter for accessing data from a relational database. 

JOE: Right. Could you write down the search results as a list of maps?

YOU: It doesn’t sound too complicated. The list of maps is in Listing 2.

Listing 2. Search results as a list of maps

 
 [
   {
     "title": "7 Habits of Highly Effective People",
     "isbn": "978-1982137274",
     "publication_year": 1989
   },
   {
     "title": "The Power of Habit",
     "isbn": "978-0812981605",
     "publication_year": 2012
   }
 ]
  

JOE: Could you write the JSON schema of the search results?

YOU: It shouldn’t be too difficult if you allow me to take a look at the JSON schema cheatsheet you kindly offered me the other day.

JOE: Of course. The purpose of a gift is to be used by the one who received it.

After a few minutes, with the help of the JSON schema cheatsheet from Part 1, you come up with a JSON schema for the search results

Listing 3. JSON schema cheatsheet

 
 {
   "type": "array",
   "items": {
     "type": "object",
     "properties": {
       "myNumber": {"type": "number"},
       "myString": {"type": "string"},
       "myEnum": {"enum": ["myVal", "yourVal"]},
       "myBool": {"type": "boolean"}
     },
     "required": ["myNumber", "myString"],
     "additionalProperties": false
   }
 }
  

Listing 4. The JSON schema for search results from the database

 
 var dbSearchResultSchema = {
   "type": "array",
   "items": {
     "type": "object",
     "required": ["title", "isbn", "publication_year"],
     "properties": {
       "title": {"type": "string"},
       "isbn": {"type": "string"},
       "publication_year": {"type": "integer"}
     }
   }
 };
  

JOE: Excellent. Now, I am going to show you how to implement searchBooks in a way that it fetches data from the database and returns a JSON string with the results. The cool thing is that we are going to use only generic data collections from the DB layer down to the JSON serialization.

YOU: Will it be similar to the implementation of searchBooks that we wrote in Chapter 3?

JOE: Absolutely. The only difference is that in Chapter 3, the state of the system was stored locally. We queried it with a function like _.filter(). Now, we use SQL queries to query the state that is stored in the Database. But in terms of data representation and manipulation, it’s exactly the same, as you can see in Figure 2. The Data manipulation step is implemented via generic functions that manipulate data collections. As the examples of the current chapter get more elaborate, you will see the benefits of being able to manipulate data collections with generic functions.


Figure 2. Data flow for serving a request that fetches data from the database


YOU: Sounds intriguing…

Most of our code is about generic functions that manipulate data collections

JOE: The core of it is the ability to communicate with the database through a driver that returns a list of maps. In JavaScript, you could use a SQL driver like node-postgres (https://node-postgres.com).

YOU: And in Java?

JOE: In Java, you could use JDBC in addition to a small utility function like the one in Listing 5 that converts a JDBC result set into a list of maps.

Listing 5. Converting a JDBC result set into a list of hash maps

 
 List<Map<String, Object>> convertJDBCResultSetToListOfMaps(ResultSet rs) {
     List<Map<String, Object>> listOfMaps = new ArrayList<Map<String, Object>>();
     ResultSetMetaData meta = rs.getMetaData();
     while (rs.next()) {
         Map map = new HashMap();
         for (int i = 1; i <= meta.getColumnCount(); i++) {
             String key = meta.getColumnLabel(i);
             Object value = rs.getObject(i);
             map.put(key, value);
         }
         listOfMaps.add(map);
     }
     return listOfMaps;
 }

Converting a JDBC result set into a list of hash maps is quite straightforward.

YOU: I would expect it to be much more complicated to convert a JDBC result set into a list of hash maps.

JOE: It’s straightforward because in a sense JDBC is data-oriented!

YOU: What about the field types?

JOE: When we convert a JDBC result set into a list of maps, each value is considered as an Object.

YOU: That’s annoying because it means that in order to access the value, we need to cast it to its type.

JOE: Yes and no. Look at our book search use case. We pass all the values down the road without really looking at their type. Only when we serialize the result in JSON, the concrete value type matters. And this is handled by the JSON serialization library. It’s called late binding.

With late binding, we care about data types as late as possible. 

YOU: Does it mean that in my application, I am allowed to manipulate data without dealing with concrete types?

In Data-Oriented programming, flexibility is increased as many parts of the system are free to manipulate data without dealing with concrete types

JOE: Exactly. You will see late binding in action in a moment. That’s one of the greatest benefits of Data-Oriented programming!

YOU: Interesting…

JOE: One last thing before I show you the code for retrieving search results form the database: In order to make it easier to read, I am going to write JavaScript code as if JavaScript were dealing with I/O is a synchronous way.

YOU: What do you mean?

JOE: In JavaScript, an I/O operation like sending a query to the database is done asynchronously. In real-life, it means using either callback functions or using async and await keywords.

YOU: Oh yeah. That’s because JavaScript is single-threaded.

The JavaScript snippets of this chapter are written as if JavaScript were dealing with I/O in a synchronous way. In real-life, we need to use async and await around I/O calls.

JOE: Indeed. So what I’m saying is that in the code that communicates with the database I’ll write the code as if JavaScript were dealing with I/O in a synchronous way. The code is in Listing 6.

Listing 6. Searching books in the database and returning the results in JSON

 
 var dbClient; 
 var ajv = new Ajv({allErrors: true}); 
  
 var title = "habit";
 var matchingBooksQuery = `SELECT title, isbn
                           FROM books
                           WHERE title LIKE '%$1%'`;
 var books = dbClient.query(matchingBooksQuery, [title]); 
 if(!ajv.validate(dbSearchResultSchema, books)) {
   var errors = ajv.errorsText(ajv.errors);
   throw "Internal error: Unexpected result from the database: " + errors;
 }
  
 JSON.stringify(books);
  

dbClient holds the DB connection.

Ajv (a JSON schema validation library) is initialized with allErrors: true in order to catch all the data validation errors (See Chapter 5 for details)

We use a parameterized SQL query as it is safer in terms of security

We pass the parameters to the SQL query as a list of values (in our case a list with a single value)

YOU: In a dynamically-typed language like JavaScript, I understand that the types of the values in the list of maps returned by dbClient.query() are determined at run time. But how does it work in a statically-typed language like Java: What are the types of the data fields in books?

JOE: The function convertJDBCResultSetToListOfMaps() in Listing 5 returns a list of Map<String, Object>. But JSON serialization libraries like gson (https://github.com/google/gson) know how to detect at run time the concrete type of the values in a map and serialize the values according to their type.

YOU: What do you mean by serializing a value according to its type?

JOE: For instance, the value of the field publication_year is a number, therefore it is not wrapped with quotes. However, the value of the field title is a string, therefore it is wrapped with quotes.

YOU: Nice! Now, I understand what you mean by late binding.

JOE Cool! Now, let me show you how we store data in the database. Let’s start from the beginning: parsing a client request. How do you usually receive the parameters of a client request?

YOU: It depends. The parameters could be sent as URL query parameters in a GET request or a JSON payload in the body of a POST request.

JOE: Let’s suppose we receive a JSON payload inside a web request. Could you give me an example of a JSON payload for an advanced search request?

YOU: It would contain the text that the book title should match.

JOE: What about the details about the fields to retrieve from Open Library books API?

YOU: They won’t be passed as part of the JSON payload as they are the same for all the search requests.

JOE: I could imagine a scenario where you want the client to decide what fields to retrieve. For instance a mobile client would prefer to retrieve only the most important fields and save network bandwidth.

YOU: Well, in that case I would have two different search endpoints: one for mobile and one for desktop.

JOE: What about situations where the client wants to display different pieces of information depending on the application screen. For instance in an extended search result screen, we display all the fields and in a basic search result screen, we display only the most important fields. Now you have 4 different use cases: desktop extended, desktop basic, mobile extended and mobile basic. Would you create 4 different endpoints?

YOU: OK. You convinced me: let’s have a single search endpoint and let the client decide what fields to retrieve.

JOE: Could you show me an example of a JSON payload for a search request?

YOU: Sure. It’s in Listing 7.

Listing 7. Example of search request payload where the clients decide what fields to retrieve for each search result

 {
   "title": "habit",
   "fields": ["title", "weight", "number_of_pages"]
 }

JOE: Excellent! The first step is to parse the JSON string into a data structure.

YOU: Let me guess: It’s gonna be a generic data structure.

JOE: Of course! In that case, we’ll have a map. Usually, JSON parsing is handled by the web server framework, but here I am going to show you how to do it manually.

YOU: What do you mean by a web server framework?

JOE: Stuff like express in node.js, Spring in Java, Django in Python, Ruby on Rails, ASP.net in C#.

YOU: Oh. I see. So how do you parse manually a JSON string into a map?

JOE: In JavaScript, we use JSON.parse() as in Listing 8 and in Java, we use a third-party library like gson (https://github.com/google/gson), maintained by Google, as in Listing 9.

Listing 8. Parsing a JSON string in JavaScript

 
 var jsonString = "{\"title\":\"habit\",\"fields\":[\"title\",\"weight\",\"number_of_pages\"]}";
 JSON.parse(jsonString);
  

Listing 9. Parsing a JSON string in Java with gson

 
 var jsonString = "{\"title\":\"habit\",\"fields\":[\"title\",\"weight\",\"number_of_pages\"]}";
 gson.fromJson(jsonString, Map.class);
  

JOE: Could you write the JSON schema for the payload of a search request?

YOU: Sure. It’s in Listing 10.

Listing 10. The JSON schema for a search request

 
 var searchBooksRequestSchema = {
   "type": "object",
   "properties": {
     "title": {"type": "string"},
     "fields": {
       "type": "array",
       "items": {
         "type": [
           "title",
           "full_title",
           "subtitle",
           "publisher",
           "publish_date",
           "weight",
           "physical_dimensions",
           "number_of_pages",
           "subjects",
           "publishers",
           "genre"
         ]
       }
     }
   },
   "required": ["title", "fields"]
 };
  

JOE: Nice! You marked the elements in the fields array as enums and not as strings. Where did you get the list of allowed values from?

YOU: Nancy gave me the list of fields that she wants to expose to the users. It’s in Listing 11.

Listing 11. The important fields from OpenLibrary Books API

 
 - title
 - full_title
 - subtitle
 - publisher
 - publish_date
 - weight
 - physical_dimensions
 - number_of_pages
 - subjects
 - publishers
 - genre
  

Represent a server response as a map

JOE: How does the Open Library Books API look like?

YOU: It’s quite straightforward: you create a GET request with the book ISBN and it gives you back a JSON string with extended information about the book.

JOE: Did your customer ask for any special treatment of the fields returned by the API?

YOU: She wants to keep only certain fields. The ones in Listing 11.

JOE: That’s it?

YOU: Yes. In Listing 12, you can see the JSON string returned by OpenLibrary Books API for “7 Habits of Highly Effective People”, after having kept only the necessary fields.

Listing 12. OpenLibrary Books API response for “7 Habits of Highly Effective People”

 
 var dataFromOpenLib = {
   "title":"7 Habits of Highly Effective People : Revised and Updated",
   "subtitle":"Powerful Lessons in Personal Change",
   "number_of_pages":432,
   "full_title":"7 Habits of Highly Effective People : Revised and Updated Powerful Lessons in Personal Change",
   "publish_date":"2020",
   "publishers":["Simon & Schuster, Incorporated"]
 };
  

YOU: Also, we need to keep only the fields that appear in the client request.

JOE: Do you know how to implement the double field filtering?

YOU: Yeah. I’ll parse the JSON string from the API into a hash map, like we parsed a client request and then I’ll use _.pick() twice to keep only the required fields.

JOE: It sounds like a great plan to me. Could you code it, including validating data that is returned by the Open Books API?

YOU: Sure! Let me first write the JSON schema for the Open Books API response.

You need to refresh your memory with the materials from Part 1 about schema composition in order to express the fact that either isbn_10 or isbn_13 is mandatory. The resulting schema is in Listing 13.

Listing 13. The JSON schema for the Open Books API response

 
 var basicBookInfoSchema = {
   "type": "object",
   "required": ["title"],
   "properties": {
     "title": {"type": "string"},
     "publishers": {
       "type": "array",
       "items": {"type": "string"}
     },
     "number_of_pages": {"type": "integer"},
     "weight": {"type": "string"},
     "physical_format": {"type": "string"},
     "subjects": {
       "type": "array",
       "items": {"type": "string"}
     },
     "isbn_13": {
       "type": "array",
       "items": {"type": "string"}
     },
     "isbn_10": {
       "type": "array",
       "items": {"type": "string"}
     },
     "publish_date": {"type": "string"},
     "physical_dimensions": {"type": "string"}
   }
 };
  
 var mandatoryIsbn13 = {
   "type": "object",
   "required": ["isbn_13"]
 };
  
 var mandatoryIsbn10 = {
   "type": "object",
   "required": ["isbn_10"]
 };
  
 var bookInfoSchema = {
   "allOf": [
     basicBookInfoSchema,
     {
       "anyOf": [mandatoryIsbn13, mandatoryIsbn10]
     }
   ]
 };
  

YOU: Now, assuming that I have a fetchResponseBody() function that sends a request and retrieves the body of the response as a string, the code would look like Listing 14.

Listing 14. Retrieving book information from OpenLibrary

 var ajv = new Ajv({allErrors: true});
 class OpenLibraryDataSource {
   static rawBookInfo(isbn) {
     var url = `https://openlibrary.org/isbn/${isbn}.json`;
     var jsonString = fetchResponseBody(url); 
     return JSON.parse(jsonString);
   }
  
   static bookInfo(isbn, requestedFields) {
     var relevantFields = ["title", "full_title",
                           "subtitle", "publisher",
                           "publish_date", "weight",
                           "physical_dimensions", "genre",
                           "subjects", "number_of_pages"];
     var rawInfo = rawBookInfo(isbn);
     if(!ajv.validate(bookInfoSchema, rawInfo)) {
       var errors = ajv.errorsText(ajv.errors);
       throw "Internal error: Unexpected result from Open Books API: " + errors;
     }
     var relevantInfo = _.pick(_.pick(rawInfo, relevantFields), requestedFields);
     return  _.set(relevantInfo, "isbn", isbn);
   }
 }

fetches JSON in the body of a response

The JavaScript snippets of this chapter are written as if JavaScript were dealing with I/O in a synchronous way. In real-life, we meed to use async and await around I/O calls.

JOE: Why did you add the isbn field in the map returned by bookInfo?

YOU: It will be allow me to combine information from two sources about the same book.

JOE: I like it!

That’s all for this article. If you want to learn more, check out the book on Manning’s liveBook platform here.