Broadcasting in Julia is a way of writing vectorised code (think Matlab) that is performant and explicit. The benefits of performant code are obvious (faster!) but explicit vectorisation is also a significant benefit.

When I first saw Matlab and how you could call the sin with a vector input, I was (slightly) blown away by the usefulness of this. It didn’t take too long for me to realise the limitations though; vectorising a complicated function can require quite a bit of code gymnastics, which doesn’t usually help the readability, particularly for those students who are relatively new to programming.

This is where Julia’s dot broadcasting (vectorisation) comes in. If you want a function to work on a vector of inputs (applying the same function to each element of the vector) you simply put a dot on the function call. For example, the sine of a vector of values becomes sin.([1.1, 0.3, 2.3]); note the extra dot between the sin and the first bracket.

For a really good introduction to this, see the blog post More Dots: Syntactic Loop Fusion in Julia.

In Julia v0.7/1.0, there were some changes under the hood to how broadcasting works. (See Extensible broadcast fusion for more details and how it can be customised by different types.) It now creates a series of Broadcasted objects that get fused together before finally being materialised to give the final answer. For example, consider

r = sqrt(sum(x.^2 .+ y.^2))

Internally this gets rewritten (“lowered”) to

r = sqrt(sum(materialize(broadcasted(+, broadcasted(^, x, 2), broadcasted(^, y, 2)))))

(This isn’t quite accurate on the details since the squaring is implemented slightly differently.) Notice the hierarchy of broadcasted calls enclosed within a call to materialize. This is where the magic of broadcast fusion happens (and enables Julia to construct performant code). The broadcasted calls create a nested set of Broadcasted objects that contain the (lazily evaluated) vectorised expression and the materialize call creates the final vector from this.

Most of the time this automatic magic is exactly what we want. But sometimes it’s not.

Consider the case above where the sum is being computed; a vector will be allocated in memory for the calculation x.^2 + y.^2 and if x and y are large then a large amount of memory will be allocated unnecessarily for this intermediate value. Since the sum function doesn’t need all the values at the same time, couldn’t we just lazily compute x.^2 + y.^2 as individual numbers and feed them to the sum one-by-one? For example, we could do something like

acc = 0.0
for i = eachindex(x, y)
acc += x[i]^2 + y[i]^2
end
r = sqrt(acc)

In this case writing out the explicit for loop is something we’re trying to avoid (otherwise why bother with broadcasting?). Can we somehow extract the lazy representation from the broadcasting without materializing the intermediate result?

The answer is yes, but unfortunately it’s not part of the base Julia (yet). The code below gives us a lazy macro that enables us to get access to that lazy representation that broadcasting creates and use it explicitly in our surrounding code.

@inline _lazy(x) = x[1]  # unwrap the tuple
macro lazy(x)
return esc(:(_lazy(_lazy.(\$x))))
end

Now we can compare the lazy version and the eager (materialized) versions.

julia> using BenchmarkTools

julia> x = rand(1_000_000) ; y = rand(1_000_000) ;

julia> @btime sqrt(sum(x.^2 .+ y.^2))  # normal eager evaluation
2.837 ms (16 allocations: 7.63 MiB)
816.7514405417339

julia> @btime sqrt(sum(@lazy x.^2 .+ y.^2))  # lazy broadcasted evaluation
1.075 ms (12 allocations: 208 bytes)
816.7514405417412

Notice the memory consumption: 7.63 MiB for the normal version versus 208 bytes for the lazily evaluated version. Similarly the lazy version is significantly faster (though that depends quite a lot on the size of the vectors used). There is a slightly different answer in the two cases since the Julia sum function uses slightly different algorithms for vectors versus iterators (so I’m not quite comparing like-for-like).

Why is the lazy version not the default? Well here is the caveat: as soon as you do lazy evaluation the performance becomes much more problem dependent – it can get faster (as in this case) but, equally, it can get slower. BenchmarkTools.jl is your friend!

Barycentric.jl

Over the past couple of years or so I’ve been getting into the Julia programming language; it’s been great to watch the language mature over time. Many people proclaim the virtues of its speed (it’s very fast for a dynamic language) but really I like its elegance – it’s a very well designed language that makes full use of multiple dispatch. (Multiple dispatch is something that I doubt most coders know much about but once you are used to it, it’s indispensable!)

My first foray into the world of Julia package development is Barycentric.jl, a small package to do polynomial interpolation using a Barycentric representation. This approach is espoused in Berrut and Trefethen, SIAM Review 2004 as a way to do polynomial interpolation with O(n) operations, rather than O(n2) operations as is more typical for interpolation with Lagrange polynomials.

While this package isn’t really a general purpose interpolation code (see Interpolations.jl for that), it is good for building numerical algorithms such as collocation.

One example of this is a simple(ish) simulation of a dynamic cantilever beam. The Euler-Bernoulli equation is the most straightforward, non-trivial model  we can use –

$$\frac{EI}{\rho AL^4}\frac{\partial^4u}{\partial x^4} + \frac{\partial^2 u}{\partial t^2} + \xi\frac{\partial u}{\partial t} = 0$$

where $E$ is Young’s modulus, $I$ is the second moment of area, $\rho A$ is the mass per unit length, $L$ is the length, and $\xi$ is the (external) damping coefficient.

Since it is a fourth-order partial differential equation in space we need four boundary conditions. For a cantilever beam we have (primes denote derivatives with respect to $x$)

$u(0, t) = 0$ (zero displacement at wall)

$u'(0,t) = 0$ (zero slope at wall)

$u''(1,t) = 0$ (zero torque at free end)

$u'''(1,t) = 0$ (zero shear at free end)

To solve the Euler-Bernoulli equation we discretise the model in space using Chebyshev polynomials (for an introduction to Chebyshev approximations to differential equations see the excellent, and relatively short, book Spectral Methods in Matlab by Nick Trefethen). This is where Barycentric.jl comes in.

In a nutshell, we’re going to use an $N$ degree polynomial to approximate the solution in the $x$ direction by constraining the polynomial to satisfy the four boundary conditions at $x=0$ and $x=1$ and then evaluating the fourth derivative for the interior of the Euler-Bernoulli equation.

I’m going to arbitrarily choose to evaluate the Euler-Bernoulli equation at the Chebyshev nodes of the $N-2$ degree Chebyshev polynomial, excluding the end points, so $N-3$ points in total. Hence these points plus the four boundary conditions gives $N+1$ equations to match the $N+1$ unknowns of the $N$ degree Chebyshev polynomial.

The code to do this is as follows. The end result is a fourth-order derivative matrix defined on the collocation points.

using Barycentric

N = 10  # degree of the polynomial
n = N-2
# Construct the polynomial
P = Chebyshev2{N}()
# Generate the differentiation matrix y' ≈ Dy
D = differentiation_matrix(P)
# Collocation points (nodes of the N-2 degree second-kind Chebyshev polynomial)
x_coll = [-cospi(j/n) for j = 1:n-1]
# Interpolation matrix from nodes(P) to x_coll
In = interpolation_matrix(P, x_coll)

# Construct the mapping from the values at the collocation points to the
# values at the nodes of the Chebyshev polynomial, simultaneously
# incorporating the  boundary conditions
In⁻¹ = inv([In;                           # interpolation to collocation points
[1 zeros(1, N)];              # u(0, t) = 0
D[1:1, :];                    # u'(0, t) = 0
(D^2)[end:end, :]             # u''(1, t) = 0
(D^3)[end:end, :]             # u'''(1, t) = 0
])[:, 1:end-4]  # remove the boundary condition inputs since they are zero

# Construct the differentiation matrix that incorporates the boundary conditions
D₄ = In*(D^4)*In⁻¹

The basic premise is to construct a fourth-order differentiation matrix on the $N$-degree Chebyshev polynomial whilst incorporating the boundary conditions. This is done by mapping from the collocation points onto the nodes of the Chebyshev polynomial, incorporating the boundary conditions, then applying the differentiation matrix before mapping back to the collocation points.

To integrate the equations of motion, the second-order (in time) differential equation is rewritten as a system of first-order ODEs and thrown into DifferentialEquations.jl.

function beammodel!(dudt, u, p, t)
n = size(p.D₄, 2)  # number of collocation points
dudt[1:n] .= u[n+1:2n]  # u̇₁ = u₂
dudt[n+1:2n] .= -p.EI/p.ρA*(p.D₄*u[1:n]) .- p.ξ*u[n+1:2n]  # u̇₂ = -EI/ρA*u₁'''' - ξ*u₂
end

Before integrating, we need some initial conditions. To avoid putting energy into the higher modes of the beam, I use the mode shape of the first beam mode for the initial conditions.

# A parameter vector for integration; a steel beam (1m × 10mm × 1mm)
p = (D₄ = D₄, EI = 1666.6, ρA = 8.0, ξ = 0.2)

# Jacobian matrix of the differential equation
using LinearAlgebra
A = [zeros(size(p.D₄)) I; -p.EI/p.ρA*p.D₄ -p.ξ*I]
ev = eigen(A)
idx = argmin(abs.(ev.values))  # lowest mode
u0 = real.(ev.vectors[:, idx])  # ignore rotations

# Integrate!
using OrdinaryDiffEq
prob = ODEProblem(beammodel!, u0, (0, 10.0), p)
sol = solve(prob, Rodas5(), dtmax=0.05)  # use a stiff solver

And to plot

using Makie
sc = Scene()
wf = wireframe!(sc, x_coll, sol.t, sol[1:N-3, :])
scale!(wf, 1.0, 1.0, 10.0)
l = lines!(sc, [x_coll[end]], sol.t, sol[N-3, :], color=:red, linewidth=3.0)

The result is at the top of this post!

While this is a largely academic example (we could solve this problem analytically) there are lots of extensions that can be made with this approach.

Installing QNX on Fedora

It turns out that installing QNX on Fedora isn’t that easy unless you know how… Firstly, QNX is a 32bit program and requires the following packages to be installed on Fedora if you don’t already have them (e.g., you are running a 64 bit system).

For the installer

• glibc.i686
• libXp.i686
• gtk2.i686

For qde

• libXmu.i686
• libXtst.i686

Then you have to work out why the QNX installer says that “A suitable JVM could not be found.”

It turns out that the QNX installer is searching for a specific version of the Java JRE (which for QNX  6.5.0 is Sun Java v1.5, though it doesn’t seem to care which update version).

The only way to find this out is to make use of the log function of the installer. Running the installer with “-is:log log.txt” will create a log file called log.txt and in there you’ll find a line which tells you the version that QNX is looking for; for me it was “Sun Microsystems Java Runtime Environment (JRE) 1.5 for Linux”. You then need to download the appropriate Java package from Oracle (the current owners of Java) or elsewhere.

Edit: it turns out that under Linux, installing service pack 1 deletes some necessary files. To fix this you must specify “-console” on the installation command line like below. (This forces QNX to use the command line based installer which works correctly unlike the graphical one…)

Once downloaded and unpacked, simply set the QNX_JAVAHOME environment variable to the correct path and run the installer. (You don’t need to permanently install the JRE since QNX installs it’s own version; it’s only needed for the installer.) The command line for me was “sudo -E QNX_JAVAHOME=/tmp/jre1.5.0_22/ ./qnxsdp-6.5.0SP1-201206271006-linux.bin – console”.

Once the base QNX package is installed, you can install service pack 1 on top. This time QNX needs to use the Java installation that it just installed alongside itself. For me this meant that I didn’t need to specify the Java path for the service pack. Again, the command line for me was “sudo -E ./qnxsdp-6.5.0SP1-201206271006-linux.bin -console”.

Once all that is done (and you’ve activated the software), it should all work!