Description: https://images.manning.com/360/480/resize/book/e/542293a-5ae6-432b-a2c0-c388b32affd6/Kaminski-MEAP-HI.png

An excerpt from Julia for Data Analysis by Bogumil Kaminski

This article shows you how to use Multiple Dispatch in Julia.

Read it if you’re a data scientist or anyone who works with lots of data, and if you’re interested in the Julia language.


Take 25% off Julia for Data Analysis by entering fcckaminski into the discount code box at checkout at manning.com.


Let’s learn how to define functions having different methods and apply this knowledge to a function called winsorized_mean.

Rules of defining methods for a function

Fortunately, defining methods is relatively easy if you understand the principles of how Julia’s type system works. You just add the type restriction to the arguments of the function after ::. If the type specification part is omitted, then Julia assumes that value of Any type is allowed.

Assume we want to create the function fun taking a single positional argument with the following behavior:

  • if it is passed a number it should print "a number was passed" unless it is a value having Float64 type in which case we want "a Float64 value" printed;
  • in all other cases we want to print "unsupported type".

Here is an example how you can implement this behavior by defining three methods for a function fun.

 
 julia> fun(x) = println("unsupported type")
 fun (generic function with 1 method)
  
 julia> fun(x::Number) = println("a number was passed")
 fun (generic function with 2 methods)
  
 julia> fun(x::Float64) = println("a Float64 value")
 fun (generic function with 3 methods)
  
 julia> methods(fun)
 # 3 methods for generic function "fun":
 [1] fun(x::Float64) in Main at REPL[3]:1
 [2] fun(x::Number) in Main at REPL[2]:1
 [3] fun(x) in Main at REPL[1]:1
  
 julia> fun("hello!")
 unsupported type
  
 julia> fun(1)
 a number was passed
  
 julia> fun(1.0)
 a Float64 value
  

In the example above note that for instance 1 is a Number (as it is Int) but it is not Float64, so the most specific matching method is fun(x::Number).

Method ambiguity problem

What you must keep in mind when defining multiple methods for a function is to avoid method ambiguities. They happen when the Julia compiler is not able to decide which method for a given set of arguments should be selected. It is easiest to understand the problem by example. Assume you want to define a bar function taking two positional arguments. It should inform you if any of them were numbers. Here is a first attempt to implement such a function:

 
 julia> bar(x, y) = "no numbers passed"
 foo (generic function with 1 method)
  
 julia> bar(x::Number, y) = "first argument is a number"
 foo (generic function with 2 methods)
  
 julia> bar(x, y::Number) = "second argument is a number"
 foo (generic function with 3 methods)
  
 julia> bar("hello", "world")
 "no numbers passed"
  
 julia> bar(1, "world")
 "first argument is a number"
  
 julia> bar("hello", 2)
 "second argument is a number"
  
 julia> bar(1, 2)
 ERROR: MethodError: foo(::Int64, ::Int64) is ambiguous. Candidates:
   bar(x::Number, y) in Main at REPL[2]:1
   bar(x, y::Number) in Main at REPL[3]:1
 Possible fix, define
   bar(::Number, ::Number)
  

As you can see all worked nicely until we wanted to call bar by passing a number as both its first and second argument. In this case Julia complained that it does not know which method should be called as two of them potentially could be selected. Fortunately, we got a hint on how to resolve the situation. We need to define an additional method that fixes the ambiguity:

 
 julia> bar(x::Number, y::Number) = "both arguments are numbers"
 foo (generic function with 4 methods)
  
 julia> bar(1, 2)
 "both arguments are numbers"
  
 julia> methods(bar)
 # 4 methods for generic function "foo":
 [1] bar(x::Number, y::Number) in Main at REPL[8]:1
 [2] bar(x::Number, y) in Main at REPL[2]:1
 [3] bar(x, y::Number) in Main at REPL[3]:1
 [4] bar(x, y) in Main at REPL[1]:1
  

Why is multiple dispatch useful?

Understanding how methods work in Julia is an essential part of knowledge you should have. As you could see in the examples above it allows users to differentiate behavior of functions subject to the type of any positional argument of the function. Combined with the flexible type hierarchy system, multiple dispatch allows Julia programmers to write highly flexible and reusable code. Observe, that by specifying types at a suitable level of abstraction the user does not have to think of every possible concrete type that would be passed to the function while still retaining the control of what kind of values are accepted. For instance, if you defined your own Number subtype, as is done, for example, by the Decimals.jl (https://github.com/JuliaMath/Decimals.jl) package that features types supporting arbitrary precision decimal floating point calculations, you do not have to rewrite your code. All will just work with the new type even if the original code was not developed specifically to target this use case.

Improved implementation of winsorized mean

We are ready to improve our winsorized_mean function definition. Here is how you could implement it:

 
 julia> function winsorized_mean(x::AbstractVector, k::Integer)
            k >= 0 || throw(ArgumentError("k must be non-negative"))
            length(x) > 2 * k || throw(ArgumentError("k is too large"))
            y = sort!(collect(x))
            for i in 1:k
                y[i] = y[k + 1]
                y[end - i + 1] = y[end - k]
            end
            return sum(y) / length(y)
        end
 winsorized_mean (generic function with 1 method)
  

First note that we have restricted the allowed types for x and k, therefore if you try invoking the function its arguments must match the required types:

 
 julia> winsorized_mean([8, 3, 1, 5, 7], 1)
 5.0
  
 julia> winsorized_mean(1:10, 2)
 5.5
  
 julia> winsorized_mean(1:10, "a")
 ERROR: MethodError: no method matching winsorized_mean(::UnitRange{Int64}, ::String)
 Closest candidates are:
   winsorized_mean(::AbstractVector{T} where T, ::Integer) at REPL[6]:1
  
 julia> winsorized_mean(10, 1)
 ERROR: MethodError: no method matching winsorized_mean(::Int64, ::Int64)
 Closest candidates are:
   winsorized_mean(::AbstractVector{T} where T, ::Integer) at REPL[6]:1
  

Additionally, we can see several things in the code that make it robust. First, we check if passed arguments are consistent, that is, if k is negative or too large it is invalid, in which case we throw an error by calling the throw function with ArgumentError as its argument. See what happens if we pass the wrong k:

 
 julia> winsorized_mean(1:10, -1)
 ERROR: ArgumentError: k must be non-negative
  
 julia> winsorized_mean(1:10, 5)
 ERROR: ArgumentError: k is too large
  

Next make a copy of the data stored in the x vector before sorting it. To achieve this, we use the collect function which takes any iterable collection and returns an object storing the same values that has a Vector type. We pass this vector to the sort! function to sort it in-place.

You might ask why using the collect function to allocate a new Vector is needed. The reason is that for example ranges like 1:10 are read-only, therefore later we would not be able to update y with y[i] = y[k + 1] and y[end - i + 1] = y[end - k]. Additionally, in general Julia can support non-1-based indexing in arrays (see https://github.com/JuliaArrays/OffsetArrays.jl). However, Vector uses 1-based indexing. In summary using the collect function turns any collection or general AbstractVector into a standard Vector type defined in Julia that is mutable and uses 1-based indexing.

Finally note that instead of performing the for loop manually we have just used the sum function which is both simpler and more robust.

Does adding argument type annotations in methods improve their execution speed?

Adding type annotations to function arguments makes the Julia code easier to read and safer. A natural question that users often ask is if it improves code execution speed.

If you have a single method for some function, then adding type annotations does not improve code execution speed. The reason is that the Julia compiler when some function is called knows the types of arguments that you have passed to it and generates the native machine code using this information. In other words: type restriction information does not affect code generation.

However, the situation is different if you have multiple methods defined for some function. The reason is that type restrictions influence method dispatch. Then, each method can have a different implementation using an algorithm that is optimized for a value of a given type. Using multiple dispatch allows the Julia compiler to pick the implementation that is best for your data.

Let me explain it by example. Consider the sort function we introduced in chapter 2. By calling methods(sort) you can learn that it has five different methods defined in Base Julia (and possibly more if you loaded some Julia packages). There is a general method for sorting vectors with signature sort(v::AbstractVector; kws...) and a specialized method for sorting ranges like 1:3 that has signature sort(r::AbstractUnitRange).

What is the benefit of having this specialized method? The answer is that the second method is defined as sort(r::AbstractUnitRange) = r. Since we know that objects of type AbstractUnitRange are already sorted (they are ranges of values with an increment equal to 1) so we can just return the passed value. In this case, taking advantage of type restriction in method signature can improve the sort operation performance significantly.

That’s all for now. Thanks for reading.