|
An excerpt from Julia as a Second Language by Erik Engheim This article covers: · Storing values on keys in dictionaries. · Working with pair objects. · Using tuples to create dictionaries. · Comparing dictionaries and arrays. Read it if you’re interested in the Julia language or in how it handles dictionaries. |
Take 25% off Julia as a Second Language by entering fccengheim into the discount code box at checkout at manning.com.
In this article, we will look at a new data type (in Julia) called a dictionary. In some other languages this datatype is also referred to as a map. In dictionaries values are looked up by keys, as opposed to an array where values are looked up exclusively using integer indices. A simple demonstration:
x = xs[42] (1) y = ys["foo"] (2) z = zs['D'] (3)
❶ Looking up the 42nd value x
in array xs
. Values in arrays are ordered. However xs
could have been a dictionary as well, since dictionary keys can be anything including integers.
❷ Looking up a value y
in dictionary ys
with the key "foo"
.
❸ Using a character 'D'
rather than a string as key in dictionary zs
to lookup value z
.
We will demonstrate the utility of dictionaries by working through a code example involving conversion of roman numerals to decimal values and back. A dictionary will be used to keep track of which values letters such as I, V and X correspond to in the decimal system.
Parsing Roman Numerals
While roman numerals are not very practical to use today, they are useful to learn about in order to understand numerical systems. In particular, because you will encounter various number systems while programming.
Both Roman numerals and the binary system used by computers can seem very cumbersome to use. However, this seems so because we (people) don’t use the numbers as they were intended.
It is harder to make calculations using Roman numerals with pen and paper than it is with Arabic numerals (which is what we use). The Romans, however, did not use pen and paper to perform calculations. They performed their calculations using a roman abacus.
Figure 1. A Roman abacus with pebbles representing different values
It is divided into multiple columns. Going from the left, you can see columns of pebbles marked as I, X and C. They each contain four pebbles. Each of these pebbles represent a different value depending on what column they are in:
- In the I column, every pebble is represents 1.
- In X, every pebble represents 10.
- In C, every pebble represents 100.
There are also columns containing a single pebble each. These columns are called V, L and D and represent the values 5, 50 and 500 (On the Roman abacus depicted here, you cannot actually see the VLD letters).
The beauty of the Roman system is that you can quickly write down exactly what the pebbles on the abacus say. Likewise, it is easy to arrange pebbles on a Roman abacus to match a Roman numeral you have read. For this reason, Roman numerals were used all the way into the 1500s in Europe, long after Arabic numerals had been introduced.
Let us look at how we can use this knowledge to parse Roman numerals and turn them into Arabic numerals. Put the code below into a text file and save it.
Listing 1. Parsing and converting Roman numerals to decimal numbers
roman_numerals = Dict('I' => 1, 'X' => 10, 'C' => 100, 'V' => 5, 'L' => 50, 'D' => 500, 'M' => 1000) function parse_roman(s) s = reverse(uppercase(s)) vals = [roman_numerals[ch] for ch in s] result = 0 for (i, val) in enumerate(vals) if i > 1 && val < vals[i - 1] result -= val else result += val end end result end
Load this file into the Julia REPL environment to test it out. This is an example of using parse_roman
with different roman numerals as input.
julia> parse_roman("II") 2 julia> parse_roman("IV") 4 julia> parse_roman("VI") 6 julia> parse_roman("IX") 9 julia> parse_roman("XI") 11
Let us go through how the code works.
Using the Dict Type
We map or translate the Roman letters I, V, X, etc. to numbers using what is called a dictionary. A dictionary is made up of multiple pairs.
julia> 'X' => 10 ❶ 'X' => 10 julia> pair = 'X' => 10 ❷ 'X' => 10 julia> dump(pair) ❸ Pair{Char,Int64} first: Char 'X' second: Int64 10 julia> pair.first ❹ 'X': ASCII/Unicode U+0058 (category Lu: Letter, uppercase) julia> pair.second 10
❶ A pair of the letter X and the number 10.
❷ Pairs can be stored in a variable and examined later.
❸ dump
allows us to look at the fields of any value. The fields of a Pair
value in this case.
❹ Extracting the first value in the pair.
A dictionary dump produces gibberish?!
Out of curiousity you may try to use the dump
function on dictionary object. It has fields such as slots
, idxfloor
, maxprobe
etc, which will not make a lot of sense to you. That is because dump
exposes implementation details. As a user of a datatype you should not need to know what fields it has, only which function you can use to operate on it.
We provide a list of these pairs to create a dictionary. The code below shows how we create a dictionary to map letters used by Roman numerals to their corresponding decimal value.
julia> roman_numerals = Dict('I' => 1, 'X' => 10, 'C' => 100, 'V' => 5, 'L' => 50, 'D' => 500, 'M' => 1000) Dict{Char,Int64} with 7 entries: 'M' => 1000 'D' => 500 'I' => 1 'L' => 50 'V' => 5 'X' => 10 'C' => 100
When used in a dictionary we refer to the first values in each pair as the keys in the dictionary. The second values in each pair form the values of the dictionary. So I, X and C are keys, while 1, 10 and 100, for example, are values.
We can ask a dictionary for the value corresponding to a key. This takes a Roman letter and returns the corresponding value.
julia> roman_numerals['C'] 100 julia> roman_numerals['M'] 1000
Looping over Characters
We can use this dictionary to help us convert roman letters to corresponding values. In the parse_roman
function we do this conversion with what is called an array comprehension:
vals = [roman_numerals[ch] for ch in s]
A comprehension is like a for-loop where a value is evaluated on each iteration and added to a collection. In this case a for-loop is used to build an array. To better understand how an array comprehension works we will look at a regular for-loop doing the exact same thing. In this example we start with the roman numerals “XIV”, which we want to convert.
julia> s = "XIV" "XIV" julia> vals = Int8[] Int8[] julia> for ch in s push!(vals, roman_numerals[ch]) end julia> vals 3-element Array{Int8,1}: 10 1 5
“XIV” is turned into the array of values [10, 1, 5]
named vals
, but the job is not quite done. We still need to combine these values into a single number.
Before converting input strings, our code turns every letter uppercase; “xiv” would not get processed correctly, because all the keys to our dictionary are uppercase.
I will walk you through the mechanics of the process, and save the explanation for why we perform these steps for the end.
We reverse the order of the letters, so we can process numerals conveniently from right to left in a loop.
julia> s = "xiv" "xiv" julia> s = reverse(uppercase(s)) "VIX"
Enumerating Values and Indicies
When processing a value val
in the loop, I want to be able to compare it with the preceding value. I could have accomplished that with a variable, say prev
, store value from a previous iteration. Instead, I use the enumerate
function to get the index i
of each value val
being processed. The value preceding val
is then simply vals[i-1]
.
for (i, val) in enumerate(vals) if i > 1 && val < vals[i - 1] result -= val else result += val end end
To better understand how enumerate
works, let me use some examples focused exclusively on enumerate
:
julia> enumerate([4, 6, 8]) enumerate([4, 6, 8])
Okay, that wasn’t very useful at all. The reason is that enumerate
is lazy. You don’t get any values out because this expression doesn’t actually need any values to be evaluated. But we can use the collect
function to collect all the values enumerate
would have produced into an array. A simple example of collecting a range:
julia> collect(2:3:11) 4-element Array{Int64,1}: 2 5 8 11
How we collect values from an enumeration is even more interesting:
julia> collect(enumerate(2:3:11)) 4-element Array{Tuple{Int64,Int64},1}: (1, 2) (2, 5) (3, 8) (4, 11) julia> collect(enumerate([4, 6, 8])) 3-element Vector{Tuple{Int64, Int64}}: (1, 4) (2, 6) (3, 8)
The collect
function will simulate looping over something, just like a for-loop. Except it will collect all the values encountered into an array, which it returns. So you can see with enumerate
you get a pair of values upon each iteration: an integer index and the value at that index.
Explaining the Conversion Process
We cannot simply add up the individual roman letters converted to their corresponding values. Consider the roman number XVI. It turns into [10, 5, 1]
. We could add that and get 16. However XIV is supposed to mean 14, because with Roman numerals when you have a smaller value in front of a larger one, such as IV, then you subtract the smaller value from the larger.
So we cannot just sum up the corresponding array [10, 1, 5]
. Instead we reverse it and work our way upwards. At every index we ask if the current value is lower than the previous one. If it is, we subtract from the result. Otherwise we add.
if i > 1 && val < vals[i - 1] result -= val else result += val end
That is what val < vals[i - 1]
does. It compares the current value val
, to the previous value vals[i -1]
. result
is used to accumulate the value of all the individual Roman letters.
Using Dictionaries
Now that we have looked at a practical code example utilizing the dictionary type Dict
in Julia, let us explore some more ways of interacting with a dictionary.
Creating Dictionaries
There are a multitude of ways to create a dictionary. Here are some examples. Multiple arguments, where each argument is a pair object:
julia> Dict("two" => 2, "four" => 4) Dict{String,Int64} with 2 entries: "two" => 2 "four" => 4
Pass an array of pairs to the dictionary constructor (a function with the same name as the type it makes instances of).
julia> pairs = ["two" => 2, "four" => 4] 2-element Array{Pair{String,Int64},1}: "two" => 2 "four" => 4 julia> Dict(pairs) Dict{String,Int64} with 2 entries: "two" => 2 "four" => 4
Pass an array of tuples to the dictionary constructor. Unlike pairs, tuples can contain more than two values. For dictionaries, however, they can only contain a key and a value.
julia> tuples = [("two", 2), ("four", 4)] 2-element Array{Tuple{String,Int64},1}: ("two", 2) ("four", 4) julia> Dict(tuples) Dict{String,Int64} with 2 entries: "two" => 2 "four" => 4
How do I know which variant to use? That depends on the problem you are trying to solve. Imagine we are reading data about pizza prices, and we got this array of tuples back:
pizzas = [ ("mexicana", 13.0), ("hawaiian", 16.5), ("bbq chicken", 20.75), ("sicilian", 12.25), ]
You might want to put this data into a dictionary to quickly lookup the price for a given pizza:
julia> pizza_dict = Dict(pizzas) Dict{String, Float64} with 4 entries: "sicilian" => 12.25 "bbq chicken" => 20.75 "mexicana" => 13.0 "hawaiian" => 16.5 julia> pizza_dict["mexicana"] 13.0
If keeping the pizza price data in order is not important, you could define this dictionary directly instead:
Dict( "sicilian" => 12.25, "bbq chicken" => 20.75, "mexicana" => 13.0, "hawaiian" => 16.5)
Sometimes you need an empty dictionary that will be filled up later. One example would be loading from file straight into a dictionary. Instead of appending values to the end of an array, you could insert them into a dictionary.
julia> d = Dict() Dict{Any,Any} with 0 entries
Notice the {Any, Any}
part. This describes what Julia has inferred is the type of the key and value in the dictionary. However when we created our pizza dictionary, you would have noticed that Julia described it as having the type Dict{String, Float64}
. String
refers to the type of the keys into the dictionary and Float64
the type of the values. We can, however, specify the type of the key and values for an empty dictionary as well:
julia> d = Dict{String, Float64}() Dict{String,Int64} with 0 entries julia> d["hawaiian"] = 16.5 16.5
What is the benefit of doing this? It makes it easier to catch incorrect usage of the dictionary at runtime. If you try to use values of the wrong type for the key and value, Julia will throw an exception to indicate the error. In this case we are trying to use the integer ‘5’ as key when a text string key is expected.
julia> d[5] = "five" ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type String
Sometimes you have keys and values in separate arrays. However you can still combine them into pairs, to create dictionaries using the zip
function.
julia> words = ["one", "two"] 2-element Array{String,1}: "one" "two" julia> nums = [1, 2] 2-element Array{Int64,1}: 1 2 julia> collect(zip(words, nums)) 2-element Array{Tuple{String,Int64},1}: ("one", 1) ("two", 2) julia> Dict(zip(words, nums)) Dict{String,Int64} with 2 entries: "two" => 2 "one" => 1
Element Access
We have already looked at one way of obtaining and setting dictionary elements. But what happens if we try to retrieve a value for a key that does not exist?
julia> d["seven"] ERROR: KeyError: key "seven" not found
We get an error. We can, of course, simply add the key:
julia> d["seven"] = 7; julia> d["seven"] 7
But how do we avoid producing an error when we are not sure if a key exists? One solution is the get()
function. If the key does not exist, a sentinel value is returned instead. The sentinel can be anything. This is a strategy followed in many programming languages, when working with dictionaries. The example below uses -1.
julia> get(d, "eight", -1) -1
Or we could simply ask the dictionary if it has the key.
julia> haskey(d, "eight") false julia> d["eight"] = 8 8 julia> haskey(d, "eight") true
Why Use a Dictionary?
In principle, you could use an array to do the conversion of roman numerals to decimal numbers. Here is an example of how you could do that.
Listing 2. Look up a value by key in an array of key-value pairs
function lookup(key, table) for (k, v) in table (1) if key == k return v (2) end end throw(KeyError(key)) (3) end
❶ Pull out the key k
and value v
of each pair in the array.
❷ Found a matching key, so return the corresponding value.
❸ If iterating over all the pairs didn’t find a matching key, then we are unable to return anything and must throw an exception instead. The KeyError
exception is convention to use in Julia in cases where keys are missing.
We could define the lookup table as an array of pairs instead of a dictionary.
numerals = ['I' => 1, 'X' => 10, 'C' => 100, 'V' => 5, 'L' => 50, 'D' => 500, 'M' => 1000]
With this we could look up values based on keys in similar fashion to a dictionary.
julia> lookup('X', roman_numerals)
10
julia> lookup('D', roman_numerals)
500
julia> lookup('S', roman_numerals) ❶
ERROR: KeyError: key 'S' not found
❶ A demonstration of looking up a key which doesn’t exist, producing an exception.
We avoid arrays when doing key-based lookup because the time to perform a lookup grows linearly with the size of the array. Looking up an element among 30 entries is going to take 3 times as long, on average, as looking up an entry among 10 elements. It is not hard to see how this does not scale well with large arrays. Looking for an element among 1 million elements will take a thousand times longer than locating it among one thousand elements.
Dictionaries, in contrast, are made so that the lookup time is independent of how many elements the dictionary contains. Looking up an element among one hundred is similar to doing it among one million.
Don’t discount arrays. Short arrays are very fast to search. Faster than a dictionary of comparable size. If the number of elements is less than one hundred, arrays are still a viable choice.
Why did I use a dictionary in our Roman numeral code example, you might be asking? Because dictionaries are really convenient to work with when dealing with key-based lookup, and you never have to worry about performance taking a nosedive if you’ve added too many elements.
There are special cases where using an array can work really well. For example, if you never modify the array. If elements are never added or removed you can simply keep the array sorted. A sorted array can be searched very quickly using Julia’s searchsortedfirst
function. In fact, our Roman numeral code example is well suited for this approach, since the mapping between numerals and decimal values is fixed. We do this by keeping the keys and values in separate arrays sorted by the key values.
Listing 3. Array of sorted keys with matching array of values
keys = ['C', 'D', 'I', 'L', 'M', 'V', 'X'] vals = [100, 500, 1, 50, 1000, 5, 10]
With searchsortedfirst,
I can find the index of a particular key.
julia> i = searchsortedfirst(keys, 'I') 3
We have made sure that the value for key I
is located at the same index i
in the vals
array:
julia> vals[i] 1
Another example:
julia> j = searchsortedfirst(keys, 'V') 6 julia> vals[j] 5
In this article we have covered all the key info about using dictionaries in Julia.
Summary
- Dictionaries hold key-value pairs, where the key has to be unique.
- Key-value pairs can quickly be looked up, added or removed from a dictionary. This differs from large arrays which may require time consuming searches.
- Arrays give better performance when the number of elements is small, or you do index based rather than key based access.
- In Julia, keys and values are typed. Hence Julia is able to catch the usage of keys which are of the wrong type at runtime, as well as attempts to insert values of the wrong type.