Duck Typing, Interfaces, and Benchmarking#

Informal ‘interfaces’ and duck typing are a core part of Julia’s type system.

A lot of the power and extensibility in Julia comes from a collection of informal interfaces. By extending a few specific methods to work for a custom type, objects of that type not only receive those functionalities, but they are also able to be used in other methods that are written to generically build upon those behaviors.
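As a first taste of this, here is a minimal sketch (the `Countdown` type is hypothetical, invented just for illustration): implementing the informal iteration interface is enough to make generic functions like `collect` and `sum` work on a custom type.

```julia
# A hypothetical type for illustration: counts down from `from` to 1.
struct Countdown
    from::Int
end

# Implementing the informal iteration interface...
Base.iterate(c::Countdown, state=c.from) = state < 1 ? nothing : (state, state - 1)
Base.length(c::Countdown) = c.from
Base.eltype(::Type{Countdown}) = Int

# ...makes generic code work automatically:
collect(Countdown(3))  # [3, 2, 1]
sum(Countdown(3))      # 6
```

We never told `sum` or `collect` anything about `Countdown`; they are written against the iteration interface, and any type that implements it gets them for free.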

Example 1: UnitRange#

x = 1:30
1:30
typeof(x)
UnitRange{Int64}
supertypes(UnitRange)
(UnitRange, AbstractUnitRange{T} where T<:Real, OrdinalRange{T, T} where T<:Real, AbstractRange{T} where T<:Real, AbstractVector{T} where T<:Real, Any)
typeof(x) <: AbstractArray
true

Because it is a subtype of AbstractArray, we can do array-like things with it (it should basically behave like an array!)

x[3]
3
size(x)
(30,)
eltype(x)
Int64

However, it’s not implemented like a regular Array at all.

In fact, it’s just two numbers! We can see this by looking at its fields:

fieldnames(typeof(x))
(:start, :stop)

or just by inspecting the source code

@which UnitRange{Int64}(1, 30)
UnitRange{T}(start::T, stop::T) where T<:Real in Base at range.jl:393

It is an immutable type which just holds the start and stop values.

This means that indexing, A[i], is not just a look-up but a (small) function (try @which getindex(x, 4)).
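Conceptually, that small function is just arithmetic. Here is a hedged sketch of the idea (not Base’s actual implementation, which also does bounds checking):

```julia
# A sketch of what indexing a unit range amounts to:
# element i is simply an offset from the start.
range_getindex(r::UnitRange, i::Int) = r.start + (i - 1)

x = 1:30
range_getindex(x, 3) == x[3]  # true
```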

What’s nice about this is that we can use it in calculations without an array containing the numbers from 1 to 30 ever being created.

Allocating memory is typically costly.

@time collect(1:10_000_000);
  0.054209 seconds (2 allocations: 76.294 MiB, 28.63% gc time)

But creating an immutable type of two numbers is essentially free, no matter what those two numbers are:

@time 1:10_000_000;
  0.000000 seconds

This is so far the same as in Python, but here is a key difference:

In [1]: a = range(10000)

In [2]: a
Out[2]: range(0, 10000)

In [3]: a * 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 a * 2

TypeError: unsupported operand type(s) for *: 'range' and 'int'

As Julia uses multiple dispatch, UnitRange has its own arithmetic methods, custom-made for that type. Not only does it behave like an array under the same operations, it also does so with great performance:

# Double all elements in the range and print off the 10th element:
@time r = collect(1:10_000_000) * 2
@time r = (1:10_000_000) * 2
println(r[10])
  0.096616 seconds (4 allocations: 152.588 MiB, 36.33% gc time)
  0.000001 seconds
20
# Sum all of the elements in the range:
@time sum(collect(1:10_000_000))
@time sum(1:10_000_000)
  0.138812 seconds (2 allocations: 76.294 MiB, 67.88% gc time)
  0.000000 seconds
50000005000000

Parts of this do work in Python:

In [1]: %time sum(range(10_000_000))
CPU times: user 183 ms, sys: 169 µs, total: 183 ms
Wall time: 183 ms
using BenchmarkTools  # This will be explained later
using PythonCall  # As well as this

@btime @pyexec "sum(range(10_000_000))"
  192.622 ms (0 allocations: 0 bytes)

This effectively does the same thing the collect example does, just a few times slower, and far slower than the example using a range.

@which sum(1:10_000_000)
sum(r::AbstractRange{<:Real}) in Base at range.jl:1387
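The idea behind that specialised method (sketched here, not the exact Base source) is the arithmetic-series formula, so the cost is independent of the range length:

```julia
# A sketch of the O(1) idea behind Base's sum(::AbstractRange):
# an arithmetic series has the closed form n * (first + last) / 2.
range_sum(r::UnitRange) = length(r) * (first(r) + last(r)) ÷ 2

range_sum(1:10_000_000) == sum(1:10_000_000)  # true
```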

Duck typing and interfaces let us define specialised code that is very performant, and interacts with the rest of the language as expected.

Example 2: DiagonalMatrix#

Let’s create a simple custom DiagonalMatrix type that can represent square diagonal matrices, i.e.

\[\begin{split} D = \left( \begin{matrix} x & 0 & 0 & 0 \\ 0 & y & 0 & 0 \\ 0 & 0 & z & 0 \\ 0 & 0 & 0 & \ddots \end{matrix} \right) \end{split}\]
struct DiagonalMatrix{T} <: AbstractArray{T,2}
    diag::Vector{T}
end

In the spirit of duck typing, we integrate our DiagonalMatrix into Julia’s type hierarchy by making it a subtype (<:) of AbstractMatrix to indicate array-like behavior. (Note that this does not indicate inheritance of structure!)

Of course, to actually make it behave like a matrix (a two-dimensional array) we must also implement (parts of) the AbstractArray interface.

DiagonalMatrix([1, 2, 3])
MethodError: no method matching size(::DiagonalMatrix{Int64})
Closest candidates are:
  size(::AbstractArray{T, N}, ::Any) where {T, N} at abstractarray.jl:42
  size(::Union{LinearAlgebra.Adjoint{T, var"#s886"}, LinearAlgebra.Transpose{T, var"#s886"}} where {T, var"#s886"<:(AbstractVector)}) at /usr/local/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:173
  size(::Union{LinearAlgebra.Adjoint{T, var"#s886"}, LinearAlgebra.Transpose{T, var"#s886"}} where {T, var"#s886"<:(AbstractMatrix)}) at /usr/local/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:174
  ...

Stacktrace:
  [1] length
    @ ./abstractarray.jl:279 [inlined]
  [2] isempty(a::DiagonalMatrix{Int64})
    @ Base ./abstractarray.jl:1170
  [3] show(io::IOContext{IOBuffer}, #unused#::MIME{Symbol("text/plain")}, X::DiagonalMatrix{Int64})
    @ Base ./arrayshow.jl:364
  [4] limitstringmime(mime::MIME{Symbol("text/plain")}, x::DiagonalMatrix{Int64})
    @ IJulia /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/inline.jl:43
  [5] display_mimestring(m::MIME{Symbol("text/plain")}, x::DiagonalMatrix{Int64})
    @ IJulia /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/display.jl:71
  [6] display_dict(x::DiagonalMatrix{Int64})
    @ IJulia /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/display.jl:102
  [7] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
  [8] invokelatest
    @ ./essentials.jl:726 [inlined]
  [9] execute_request(socket::ZMQ.Socket, msg::IJulia.Msg)
    @ IJulia /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/execute_request.jl:112
 [10] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [11] invokelatest
    @ ./essentials.jl:726 [inlined]
 [12] eventloop(socket::ZMQ.Socket)
    @ IJulia /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/eventloop.jl:8
 [13] (::IJulia.var"#15#18")()
    @ IJulia ./task.jl:484
Base.size(D::DiagonalMatrix) = (length(D.diag), length(D.diag))
DiagonalMatrix([1, 2, 3])
CanonicalIndexError: getindex not defined for DiagonalMatrix{Int64}

Stacktrace:
  [1] error_if_canonical_getindex(::IndexCartesian, ::DiagonalMatrix{Int64}, ::Int64, ::Int64)
    @ Base ./abstractarray.jl:1260
  [2] getindex
    @ ./abstractarray.jl:1240 [inlined]
  [3] isassigned(::DiagonalMatrix{Int64}, ::Int64, ::Int64)
    @ Base ./abstractarray.jl:565
  [4] alignment(io::IOContext{IOBuffer}, X::AbstractVecOrMat, rows::Vector{Int64}, cols::Vector{Int64}, cols_if_complete::Int64, cols_otherwise::Int64, sep::Int64, ncols::Int64)
    @ Base ./arrayshow.jl:68
  [5] _print_matrix(io::IOContext{IOBuffer}, X::AbstractVecOrMat, pre::String, sep::String, post::String, hdots::String, vdots::String, ddots::String, hmod::Int64, vmod::Int64, rowsA::UnitRange{Int64}, colsA::UnitRange{Int64})
    @ Base ./arrayshow.jl:207
  [6] print_matrix(io::IOContext{IOBuffer}, X::DiagonalMatrix{Int64}, pre::String, sep::String, post::String, hdots::String, vdots::String, ddots::String, hmod::Int64, vmod::Int64) (repeats 2 times)
    @ Base ./arrayshow.jl:171
  [7] print_array
    @ ./arrayshow.jl:358 [inlined]
  [8] show(io::IOContext{IOBuffer}, #unused#::MIME{Symbol("text/plain")}, X::DiagonalMatrix{Int64})
    @ Base ./arrayshow.jl:399
  [9] limitstringmime(mime::MIME{Symbol("text/plain")}, x::DiagonalMatrix{Int64})
    @ IJulia /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/inline.jl:43
 [10] display_mimestring
    @ /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/display.jl:71 [inlined]
 [11] display_dict(x::DiagonalMatrix{Int64})
    @ IJulia /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/display.jl:102
 [12] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [13] invokelatest
    @ ./essentials.jl:726 [inlined]
 [14] execute_request(socket::ZMQ.Socket, msg::IJulia.Msg)
    @ IJulia /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/execute_request.jl:112
 [15] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [16] invokelatest
    @ ./essentials.jl:726 [inlined]
 [17] eventloop(socket::ZMQ.Socket)
    @ IJulia /usr/local/julia/local/share/julia/packages/IJulia/6TIq1/src/eventloop.jl:8
 [18] (::IJulia.var"#15#18")()
    @ IJulia ./task.jl:484
function Base.getindex(D::DiagonalMatrix, i::Int, j::Int)
    if i == j
        return D.diag[i]
    else
        return zero(eltype(D))
    end
end
D = DiagonalMatrix([1, 2, 3])
3×3 DiagonalMatrix{Int64}:
 1  0  0
 0  2  0
 0  0  3

Note how it’s automagically pretty printed (despite the fact that we never defined any special printing)!

D[2, 2]
2
D[1, 2]
0
size(D)
(3, 3)
D[3, 3] = 5
CanonicalIndexError: setindex! not defined for DiagonalMatrix{Int64}

Stacktrace:
 [1] error_if_canonical_setindex(::IndexCartesian, ::DiagonalMatrix{Int64}, ::Int64, ::Int64)
   @ Base ./abstractarray.jl:1354
 [2] setindex!(::DiagonalMatrix{Int64}, ::Int64, ::Int64, ::Int64)
   @ Base ./abstractarray.jl:1343
 [3] top-level scope
   @ In[25]:1
function Base.setindex!(D::DiagonalMatrix, v, i::Int, j::Int)
    if i == j
        D.diag[i] = v
    else
        throw(ArgumentError("cannot set off-diagonal entry ($i, $j)"))
    end
    return v
end
D[3, 3] = 5
5
D
3×3 DiagonalMatrix{Int64}:
 1  0  0
 0  2  0
 0  0  5
D[3, 4] = 5
ArgumentError: cannot set off-diagonal entry (3, 4)

Stacktrace:
 [1] setindex!(D::DiagonalMatrix{Int64}, v::Int64, i::Int64, j::Int64)
   @ Main ./In[26]:5
 [2] top-level scope
   @ In[29]:1

But that’s not all. Because of duck typing, all kinds of different functions now “just work”.

eltype(D) # element data type
Int64
D + D # addition
3×3 Matrix{Int64}:
 2  0   0
 0  4   0
 0  0  10
D * D # multiplication
3×3 Matrix{Int64}:
 1  0   0
 0  4   0
 0  0  25
inv(D) # inversion
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 0.0  0.5  0.0
 0.0  0.0  0.2
sin.(D) # broadcasting
3×3 Matrix{Float64}:
 0.841471  0.0        0.0
 0.0       0.909297   0.0
 0.0       0.0       -0.958924
using LinearAlgebra
eigen(D) # eigensolver
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
3-element Vector{Float64}:
 1.0
 2.0
 5.0
vectors:
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  0.0  1.0

Of course, so far, these operations have suboptimal performance because they don’t utilize the special structure inherent to our DiagonalMatrix but fall back to generic implementations.

@which D + D
+(A::AbstractArray, B::AbstractArray) in Base at arraymath.jl:6
Base.:+(Da::DiagonalMatrix, Db::DiagonalMatrix) = DiagonalMatrix(Da.diag + Db.diag)
@which D + D
+(Da::DiagonalMatrix, Db::DiagonalMatrix) in Main at In[37]:1
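The same trick works for multiplication: the product of two diagonal matrices is just the elementwise product of their diagonals. Here is a self-contained sketch (the type definition is repeated so the snippet runs on its own; in the notebook it already exists):

```julia
struct DiagonalMatrix{T} <: AbstractMatrix{T}
    diag::Vector{T}
end
Base.size(D::DiagonalMatrix) = (length(D.diag), length(D.diag))
Base.getindex(D::DiagonalMatrix, i::Int, j::Int) = i == j ? D.diag[i] : zero(eltype(D))

# Diagonal * diagonal reduces to an elementwise product of the diagonals,
# avoiding the generic O(n^3) matrix-multiplication fallback.
Base.:*(Da::DiagonalMatrix, Db::DiagonalMatrix) = DiagonalMatrix(Da.diag .* Db.diag)

D = DiagonalMatrix([1, 2, 3])
(D * D).diag  # [1, 4, 9]
```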

Important note: user-defined types are just as good as built-in types!

There is nothing special about built-in types. In fact, they are implemented in essentially the same way!

By contrast, diagonal arrays in something like NumPy are ultimately implemented in C, out of reach of this kind of extension from user code.
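For instance, UnitRange itself is written in plain Julia. The following is a simplified sketch (the real definition in Base’s range.jl also normalizes `stop` and does bounds checking; the `My*` names are ours):

```julia
# Simplified sketch of how a unit range can be defined in plain Julia:
struct MyUnitRange{T<:Real}
    start::T
    stop::T
end
mylength(r::MyUnitRange) = Int(r.stop - r.start) + 1
mygetindex(r::MyUnitRange, i::Int) = r.start + (i - 1)

r = MyUnitRange(1, 5)
mylength(r), mygetindex(r, 3)  # (5, 3)
```

Nothing here requires compiler magic or C; the “built-in” type is just a small struct with a handful of methods, exactly like our DiagonalMatrix.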

Side Note: Consistency of Interfaces#

Another benefit of the abstract type/interface/multiple dispatch style is that interfaces are very consistent within Julia, across not just the base language but across packages as well.

An article called Python vs. Julia: It’s also about Consistency shows this point quite well.

Benchmarking with BenchmarkTools.jl#

Benchmarking is difficult to do right, for several reasons:

  • computers are noisy machines

  • global vs local scope

  • the first call to a function is special in Julia, since it triggers compilation (more on this later)

g(x) = x + 2 * x
g (generic function with 1 method)
x = rand(2, 2)
@time g.(x)
  0.059905 seconds (218.68 k allocations: 11.185 MiB, 99.52% compilation time)
2×2 Matrix{Float64}:
 2.17578  0.275548
 2.57232  0.509042
function f()
    x = rand(2, 2)
    @time g.(x)
end
f (generic function with 1 method)
f()
  0.000001 seconds (1 allocation: 96 bytes)
2×2 Matrix{Float64}:
 2.84491  1.27985
 2.89121  1.09645

Fortunately, there are tools that do this for us. In addition, they also collect some statistics by running the benchmark multiple times.

General rule: don’t use @time; use @btime from BenchmarkTools.jl instead, and interpolate ($) input arguments.

using BenchmarkTools
@btime g.($x)
  44.073 ns (1 allocation: 96 bytes)
2×2 Matrix{Float64}:
 2.17578  0.275548
 2.57232  0.509042
@benchmark g.($x)
BenchmarkTools.Trial: 10000 samples with 990 evaluations.
 Range (minmax):  44.502 ns 1.551 μs  ┊ GC (min … max): 0.00% … 96.36%
 Time  (median):     46.556 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   51.344 ns ± 54.011 ns  ┊ GC (mean ± σ):  4.63% ±  4.29%

  ▃▇█▄▂▂▂▂▁▂▁▁▁                                   ▁▂▁       ▂
  ███████████████▆▄▄▄▃▃▁▁▁▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▅█████▇▆▇██ █
  44.5 ns      Histogram: log(frequency) by time        92 ns <

 Memory estimate: 96 bytes, allocs estimate: 1.

Custom types are just as good as built-in types#

Let’s compare our custom DiagonalMatrix against the standard Diagonal type that ships in the LinearAlgebra standard library.

using LinearAlgebra

x = rand(100);

Djl = Diagonal(x)
D = DiagonalMatrix(x)

@btime $Djl + $Djl;
@btime $D + $D;
  91.078 ns (1 allocation: 896 bytes)
  90.776 ns (1 allocation: 896 bytes)

Example 3: Cooler Diagonal Matrix#

What we made works quite well, but it could be even neater if we removed the Vector restriction and used AbstractVector instead. That way it could even work with ranges!

struct CoolerDiagonalMatrix{T} <: AbstractArray{T, 2}
    diag::AbstractVector{T}
end

function Base.getindex(D::CoolerDiagonalMatrix, i::Int, j::Int)
    if i == j
        return D.diag[i]
    else
        return zero(eltype(D))
    end
end

Base.:+(Da::CoolerDiagonalMatrix, Db::CoolerDiagonalMatrix) = CoolerDiagonalMatrix(Da.diag + Db.diag)
# Old version:
D = DiagonalMatrix(collect(1:10_000))
@btime $D + $D

# New:
D_cool = CoolerDiagonalMatrix(1:10_000)
@btime $D_cool + $D_cool;
  5.581 μs (2 allocations: 78.17 KiB)
  122.269 ns (3 allocations: 112 bytes)

Works! But… why are the allocations up? Let’s explore more:

D = DiagonalMatrix(collect(1:100))
@btime $D + $D

D_cool = CoolerDiagonalMatrix(collect(1:100))
@btime $D_cool + $D_cool;
  86.342 ns (1 allocation: 896 bytes)
  193.898 ns (2 allocations: 912 bytes)

Huh… the new type is slower even with the same arguments? Why?

We’ll explore this in the sessions tomorrow.
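As a teaser, here is one common fix, sketched under the assumption that the culprit is the abstractly typed `diag` field: making the vector type a parameter of the struct keeps the field concretely typed while still accepting any AbstractVector. (The `FastDiagonalMatrix` name is ours, for illustration.)

```julia
# Sketch: parameterizing on the vector type V keeps the field concretely
# typed for each instance, while still accepting any AbstractVector.
struct FastDiagonalMatrix{T, V<:AbstractVector{T}} <: AbstractMatrix{T}
    diag::V
end
Base.size(D::FastDiagonalMatrix) = (length(D.diag), length(D.diag))
Base.getindex(D::FastDiagonalMatrix, i::Int, j::Int) = i == j ? D.diag[i] : zero(eltype(D))

D = FastDiagonalMatrix(1:3)  # works with a range, no collect needed
D[2, 2]  # 2
```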

P.S. NumPy:

In [1]: import numpy as np

In [2]: d = np.diag(np.arange(1000))

In [3]: %timeit d + d
2.56 µs ± 35.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

This is still slower than any of the Julia code above.

using PythonCall

@btime @pyexec """
import numpy as np

d = np.diag(np.arange(1_000))

d + d
"""
  1.110 ms (0 allocations: 0 bytes)

Core messages of this notebook#

  • Duck typing is about shared behavior instead of shared structure.

  • User-defined types are as good as built-in types.

  • We can extend Base functions for our types to implement arithmetic and the like.

  • Subtyping an existing interface can give lots of functionality for free.

  • Functions should almost always be benchmarked with BenchmarkTools.jl’s @btime and @benchmark instead of @time.

References#