|
From Data-Oriented Programming by Yehonathan Sharvit This article explores how data-oriented programming deals with retrieving data from a database. |
Take 35% off Data-Oriented Programming by entering fccsharvit2 into the discount code box at checkout at manning.com.
You go for a walk with Joe in a park near your office. You seat on a bench close to a beautiful lake and gaze at the clouds in the sky. After a couple of minutes of meditation, Joe breaks the silence and asks you: “What do you see?” You tell him that this cloud looks to you like a horse and this one looks like a car. When you are back to the office, you ask Joe for an explanation about the clouds and he tells you with a mysterious smile on his lips: “A cloud is a cloud”.
Applications that run on the server side usually store data in a database. In DO, we represent data retrieved from the database like we represent any other data in our application: with generic data collections.
YOU: So far, you showed me how Data-Oriented programming represents data that lives in the memory of the application. What about data that comes from the outside?
JOE: What do you mean by outside?
YOU: Data that comes from the database.
JOE: I’ll return the question to you: How do you think that in DOP we represent data that comes from the database?
YOU: As generic data collections, I guess.
JOE: Exactly! In Data-Oriented programming, we always represent data with generic data collections.
YOU: Does it mean that we manipulate data from the database with the same flexibility as we manipulated in-memory data in Part 1?
JOE: Definitely.
In Data-Oriented programming, we represent data from the database with generic data collections and we manipulate it with generic functions.
YOU: Could you show me for instance how to retrieve book search results when the catalog data is stored in a SQL database?
JOE: I’ll show you in a moment. First, tell me how you would design the tables that store catalog data?
YOU: Do you mean the exact table schemas with the information about primary keys and nullability of each and every column?
JOE: No. I only need you to give me a rough overview of the tables, their columns and the relationships between the tables.
Figure 1. The database model for books and authors
YOU: I’d have a books
table with 3 columns: title
, isbn
and publication_year
. And an authors
table with two columns: for id
and name
.
Table 1. The books
table filled with two books
title |
isbn |
publication_year |
The Power of Habit |
978-0812981605 |
2012 |
7 Habits of Highly Effective People |
978-1982137274 |
1989 |
Table 2. The authors
table filled with three authors
id |
name |
sean-covey |
Sean Covey |
stephen-covey |
Stephen Covey |
charles-duhigg |
Charles Duhigg |
JOE: And what about the connection between books and authors?
YOU: Let me see: A book could be written by multiple authors, and an author could write multiple books. Therefore, I need a many-to-many book_authors
table that connects authors and books, with two columns: book_isbn
and author_id
.
Table 3. The book_authors
table with rows connecting books with their authors
book_isbn |
author_id |
978-1982137274 |
sean-covey |
978-1982137274 |
stephen-covey |
978-0812981605 |
charles-duhigg |
JOE: Let’s start with the simplest case. We are going to write code that searches books matching a title and returns basic information about the books: title, isbn and publication year.
YOU: What about the book authors?
JOE: We’ll deal with that later in the chapter, as it is a bit more complicated. Could you write a SQL query for retrieving books that contain habit in their title?
YOU: Sure.
It’s quite easy for you. Listing 1 shows the SQL query that you wrote.
Listing 1. SQL query to retrieve books whose title contain habit
SELECT title, isbn, publication_year FROM books WHERE title LIKE '%habit%';
The results are displayed in Table 4.
Table 4. Results of the SQL query that retrieves books whose title contain habit
title |
isbn |
publication_year |
The Power of Habit |
978-0812981605 |
2012 |
7 Habits of Highly Effective People |
978-1982137274 |
1989 |
JOE: How would you describe the results displayed in Table 4 as a data collection?
YOU: I would say it’s a list of maps.
Accessing data from a NoSQL database is done in a similar way to the approach presented in this chapter for accessing data from a relational database.
JOE: Right. Could you write down the search results as a list of maps?
YOU: It doesn’t sound too complicated. The list of maps is in Listing 2.
Listing 2. Search results as a list of maps
[ { "title": "7 Habits of Highly Effective People", "isbn": "978-1982137274", "publication_year": 1989 }, { "title": "The Power of Habit", "isbn": "978-0812981605", "publication_year": 2012 } ]
JOE: Could you write the JSON schema of the search results?
YOU: It shouldn’t be too difficult if you allow me to take a look at the JSON schema cheatsheet you kindly offered me the other day.
JOE: Of course. The purpose of a gift is to be used by the one who received it.
After a few minutes, with the help of the JSON schema cheatsheet from Part 1, you come up with a JSON schema for the search results
Listing 3. JSON schema cheatsheet
{ "type": "array", "items": { "type": "object", "properties": { "myNumber": {"type": "number"}, "myString": {"type": "string"}, "myEnum": {"enum": ["myVal", "yourVal"]}, "myBool": {"type": "boolean"} }, "required": ["myNumber", "myString"], "additionalProperties": false } }
Listing 4. The JSON schema for search results from the database
var dbSearchResultSchema = { "type": "array", "items": { "type": "object", "required": ["title", "isbn", "publication_year"], "properties": { "title": {"type": "string"}, "isbn": {"type": "string"}, "publication_year": {"type": "integer"} } } };
JOE: Excellent. Now, I am going to show you how to implement searchBooks
in a way that it fetches data from the database and returns a JSON string with the results. The cool thing is that we are going to use only generic data collections from the DB layer down to the JSON serialization.
YOU: Will it be similar to the implementation of searchBooks
that we wrote in Chapter 3?
JOE: Absolutely. The only difference is that in Chapter 3, the state of the system was stored locally. We queried it with a function like _.filter()
. Now, we use SQL queries to query the state that is stored in the Database. But in terms of data representation and manipulation, it’s exactly the same, as you can see in Figure 2. The Data manipulation step is implemented via generic functions that manipulate data collections. As the examples of the current chapter get more elaborate, you will see the benefits of being able to manipulate data collections with generic functions.
Figure 2. Data flow for serving a request that fetches data from the database
YOU: Sounds intriguing…
Most of our code is about generic functions that manipulate data collections.
JOE: The core of it is the ability to communicate with the database through a driver that returns a list of maps. In JavaScript, you could use a SQL driver like node-postgres (https://node-postgres.com).
YOU: And in Java?
JOE: In Java, you could use JDBC in addition to a small utility function like the one in Listing 5 that converts a JDBC result set into a list of maps.
Listing 5. Converting a JDBC result set into a list of hash maps
List<Map<String, Object>> convertJDBCResultSetToListOfMaps(ResultSet rs) { List<Map<String, Object>> listOfMaps = new ArrayList<Map<String, Object>>(); ResultSetMetaData meta = rs.getMetaData(); while (rs.next()) { Map map = new HashMap(); for (int i = 1; i <= meta.getColumnCount(); i++) { String key = meta.getColumnLabel(i); Object value = rs.getObject(i); map.put(key, value); } listOfMaps.add(map); } return listOfMaps; }
Converting a JDBC result set into a list of hash maps is quite straightforward.
YOU: I would expect it to be much more complicated to convert a JDBC result set into a list of hash maps.
JOE: It’s straightforward because in a sense JDBC is data-oriented!
YOU: What about the field types?
JOE: When we convert a JDBC result set into a list of maps, each value is considered as an Object
.
YOU: That’s annoying because it means that in order to access the value, we need to cast it to its type.
JOE: Yes and no. Look at our book search use case. We pass all the values down the road without really looking at their type. Only when we serialize the result in JSON, the concrete value type matters. And this is handled by the JSON serialization library. It’s called late binding.
With late binding, we care about data types as late as possible.
YOU: Does it mean that in my application, I am allowed to manipulate data without dealing with concrete types?
In Data-Oriented programming, flexibility is increased as many parts of the system are free to manipulate data without dealing with concrete types.
JOE: Exactly. You will see late binding in action in a moment. That’s one of the greatest benefits of Data-Oriented programming!
YOU: Interesting…
JOE: One last thing before I show you the code for retrieving search results form the database: In order to make it easier to read, I am going to write JavaScript code as if JavaScript were dealing with I/O is a synchronous way.
YOU: What do you mean?
JOE: In JavaScript, an I/O operation like sending a query to the database is done asynchronously. In real-life, it means using either callback functions or using async
and await
keywords.
YOU: Oh yeah. That’s because JavaScript is single-threaded.
The JavaScript snippets of this chapter are written as if JavaScript were dealing with I/O in a synchronous way. In real-life, we need to use async
and await
around I/O calls.
JOE: Indeed. So what I’m saying is that in the code that communicates with the database I’ll write the code as if JavaScript were dealing with I/O in a synchronous way. The code is in Listing 6.
Listing 6. Searching books in the database and returning the results in JSON
var dbClient; ❶ var ajv = new Ajv({allErrors: true}); ❷ var title = "habit"; var matchingBooksQuery = `SELECT title, isbn FROM books WHERE title LIKE '%$1%'`;❸ var books = dbClient.query(matchingBooksQuery, [title]); ❹ if(!ajv.validate(dbSearchResultSchema, books)) { var errors = ajv.errorsText(ajv.errors); throw "Internal error: Unexpected result from the database: " + errors; } JSON.stringify(books);
❶ dbClient
holds the DB connection.
❷ Ajv (a JSON schema validation library) is initialized with allErrors: true
in order to catch all the data validation errors (See Chapter 5 for details)
❸ We use a parameterized SQL query as it is safer in terms of security
❹ We pass the parameters to the SQL query as a list of values (in our case a list with a single value)
YOU: In a dynamically-typed language like JavaScript, I understand that the types of the values in the list of maps returned by dbClient.query()
are determined at run time. But how does it work in a statically-typed language like Java: What are the types of the data fields in books
?
JOE: The function convertJDBCResultSetToListOfMaps()
in Listing 5 returns a list of Map<String, Object>
. But JSON serialization libraries like gson (https://github.com/google/gson) know how to detect at run time the concrete type of the values in a map and serialize the values according to their type.
YOU: What do you mean by serializing a value according to its type?
JOE: For instance, the value of the field publication_year
is a number, therefore it is not wrapped with quotes. However, the value of the field title
is a string, therefore it is wrapped with quotes.
YOU: Nice! Now, I understand what you mean by late binding.
JOE Cool! Now, let me show you how we store data in the database. Let’s start from the beginning: parsing a client request. How do you usually receive the parameters of a client request?
YOU: It depends. The parameters could be sent as URL query parameters in a GET request or a JSON payload in the body of a POST request.
JOE: Let’s suppose we receive a JSON payload inside a web request. Could you give me an example of a JSON payload for an advanced search request?
YOU: It would contain the text that the book title should match.
JOE: What about the details about the fields to retrieve from Open Library books API?
YOU: They won’t be passed as part of the JSON payload as they are the same for all the search requests.
JOE: I could imagine a scenario where you want the client to decide what fields to retrieve. For instance a mobile client would prefer to retrieve only the most important fields and save network bandwidth.
YOU: Well, in that case I would have two different search endpoints: one for mobile and one for desktop.
JOE: What about situations where the client wants to display different pieces of information depending on the application screen. For instance in an extended search result screen, we display all the fields and in a basic search result screen, we display only the most important fields. Now you have 4 different use cases: desktop extended, desktop basic, mobile extended and mobile basic. Would you create 4 different endpoints?
YOU: OK. You convinced me: let’s have a single search endpoint and let the client decide what fields to retrieve.
JOE: Could you show me an example of a JSON payload for a search request?
YOU: Sure. It’s in Listing 7.
Listing 7. Example of search request payload where the clients decide what fields to retrieve for each search result
{ "title": "habit", "fields": ["title", "weight", "number_of_pages"] }
JOE: Excellent! The first step is to parse the JSON string into a data structure.
YOU: Let me guess: It’s gonna be a generic data structure.
JOE: Of course! In that case, we’ll have a map. Usually, JSON parsing is handled by the web server framework, but here I am going to show you how to do it manually.
YOU: What do you mean by a web server framework?
JOE: Stuff like express in node.js, Spring in Java, Django in Python, Ruby on Rails, ASP.net in C#.
YOU: Oh. I see. So how do you parse manually a JSON string into a map?
JOE: In JavaScript, we use JSON.parse()
as in Listing 8 and in Java, we use a third-party library like gson (https://github.com/google/gson), maintained by Google, as in Listing 9.
Listing 8. Parsing a JSON string in JavaScript
var jsonString = "{\"title\":\"habit\",\"fields\":[\"title\",\"weight\",\"number_of_pages\"]}"; JSON.parse(jsonString);
Listing 9. Parsing a JSON string in Java with gson
var jsonString = "{\"title\":\"habit\",\"fields\":[\"title\",\"weight\",\"number_of_pages\"]}"; gson.fromJson(jsonString, Map.class);
JOE: Could you write the JSON schema for the payload of a search request?
YOU: Sure. It’s in Listing 10.
Listing 10. The JSON schema for a search request
var searchBooksRequestSchema = { "type": "object", "properties": { "title": {"type": "string"}, "fields": { "type": "array", "items": { "type": [ "title", "full_title", "subtitle", "publisher", "publish_date", "weight", "physical_dimensions", "number_of_pages", "subjects", "publishers", "genre" ] } } }, "required": ["title", "fields"] };
JOE: Nice! You marked the elements in the fields
array as enums and not as strings. Where did you get the list of allowed values from?
YOU: Nancy gave me the list of fields that she wants to expose to the users. It’s in Listing 11.
Listing 11. The important fields from OpenLibrary Books API
- title - full_title - subtitle - publisher - publish_date - weight - physical_dimensions - number_of_pages - subjects - publishers - genre
Represent a server response as a map
JOE: How does the Open Library Books API look like?
YOU: It’s quite straightforward: you create a GET request with the book ISBN and it gives you back a JSON string with extended information about the book.
JOE: Did your customer ask for any special treatment of the fields returned by the API?
YOU: She wants to keep only certain fields. The ones in Listing 11.
JOE: That’s it?
YOU: Yes. In Listing 12, you can see the JSON string returned by OpenLibrary Books API for “7 Habits of Highly Effective People”, after having kept only the necessary fields.
Listing 12. OpenLibrary Books API response for “7 Habits of Highly Effective People”
var dataFromOpenLib = { "title":"7 Habits of Highly Effective People : Revised and Updated", "subtitle":"Powerful Lessons in Personal Change", "number_of_pages":432, "full_title":"7 Habits of Highly Effective People : Revised and Updated Powerful Lessons in Personal Change", "publish_date":"2020", "publishers":["Simon & Schuster, Incorporated"] };
YOU: Also, we need to keep only the fields that appear in the client request.
JOE: Do you know how to implement the double field filtering?
YOU: Yeah. I’ll parse the JSON string from the API into a hash map, like we parsed a client request and then I’ll use _.pick()
twice to keep only the required fields.
JOE: It sounds like a great plan to me. Could you code it, including validating data that is returned by the Open Books API?
YOU: Sure! Let me first write the JSON schema for the Open Books API response.
You need to refresh your memory with the materials from Part 1 about schema composition in order to express the fact that either isbn_10
or isbn_13
is mandatory. The resulting schema is in Listing 13.
Listing 13. The JSON schema for the Open Books API response
var basicBookInfoSchema = { "type": "object", "required": ["title"], "properties": { "title": {"type": "string"}, "publishers": { "type": "array", "items": {"type": "string"} }, "number_of_pages": {"type": "integer"}, "weight": {"type": "string"}, "physical_format": {"type": "string"}, "subjects": { "type": "array", "items": {"type": "string"} }, "isbn_13": { "type": "array", "items": {"type": "string"} }, "isbn_10": { "type": "array", "items": {"type": "string"} }, "publish_date": {"type": "string"}, "physical_dimensions": {"type": "string"} } }; var mandatoryIsbn13 = { "type": "object", "required": ["isbn_13"] }; var mandatoryIsbn10 = { "type": "object", "required": ["isbn_10"] }; var bookInfoSchema = { "allOf": [ basicBookInfoSchema, { "anyOf": [mandatoryIsbn13, mandatoryIsbn10] } ] };
YOU: Now, assuming that I have a fetchResponseBody()
function that sends a request and retrieves the body of the response as a string, the code would look like Listing 14.
Listing 14. Retrieving book information from OpenLibrary
var ajv = new Ajv({allErrors: true});
class OpenLibraryDataSource {
static rawBookInfo(isbn) {
var url = `https://openlibrary.org/isbn/${isbn}.json`;
var jsonString = fetchResponseBody(url); ❶
return JSON.parse(jsonString);
}
static bookInfo(isbn, requestedFields) {
var relevantFields = ["title", "full_title",
"subtitle", "publisher",
"publish_date", "weight",
"physical_dimensions", "genre",
"subjects", "number_of_pages"];
var rawInfo = rawBookInfo(isbn);
if(!ajv.validate(bookInfoSchema, rawInfo)) {
var errors = ajv.errorsText(ajv.errors);
throw "Internal error: Unexpected result from Open Books API: " + errors;
}
var relevantInfo = _.pick(_.pick(rawInfo, relevantFields), requestedFields);
return _.set(relevantInfo, "isbn", isbn);
}
}
❶ fetches JSON in the body of a response
The JavaScript snippets of this chapter are written as if JavaScript were dealing with I/O in a synchronous way. In real-life, we meed to use async
and await
around I/O calls.
JOE: Why did you add the isbn
field in the map returned by bookInfo
?
YOU: It will be allow me to combine information from two sources about the same book.
JOE: I like it!
That’s all for this article. If you want to learn more, check out the book on Manning’s liveBook platform here.