Can't decide between Matlab and Python?

Try Julia.

Outline

  • State of the art: the two-language problem
  • What is Julia?
  • Why Julia is fast
  • Julia Ecosystem & interop

Matlab

Pros :

  • Polished product with support
  • Simulink
  • High level syntax

Cons :

  • Closed source: not everyone has access to it, and it is impossible to put a demonstrator online
  • Slow loops (better since 2015): not everything is pretty once vectorized
  • Not fast per se (Fortran bindings under the hood), and memory management is difficult
  • Not a general-purpose language (e.g. cumbersome to put a demo on the web)
  • License fees every year for a lab

Python

Pros :

  • Free and open-source, general-purpose, widely used outside the scientific community
  • Many scientific communities are embracing it
  • Lots of effort to make it fast (numba, ...)

Cons :

  • Scientific computing is not native:
    • everything falls back to C/Fortran black boxes -> limits flexibility
  • The object-oriented paradigm can be cumbersome for scientific code

Science and the two-language problem

Scientists need to easily explore new ideas :

  • Need for mathematical abstractions
  • Need for customizations (avoid black boxes, try variations)

  • But also need for performance (intrinsic goal or to allow re-use)

What is done now (when you need to go further than existing packages, e.g. for data analysis):

1) Prototype in R/Python/Matlab

2) Rewrite the whole code, or parts of it (if money/manpower allows), in a performant low-level language such as C or Fortran

Yet another language

Here comes Julia :

  • an innovative open-source, general-purpose programming language with scientific computing at its core
  • easy as Matlab, fast as Fortran, flexible as Python, deep as Lisp
  • leverages (for now...) major C/Fortran scientific packages, e.g. LAPACK, MPI, FFTW...
  • 5th language to run at 1 petaflops (Celeste project), after assembly, Fortran, C, and C++
  • state-of-the-art packages in optimization (JuMP) and differential equations (DiffEq), ... well positioned for ML (Flux, Knet, autodiff...)
  • solves the "two-language problem"

Background

  • origins at MIT and UCSB, applied math & computer science, 2010
  • founders: Viral Shah, Jeff Bezanson, Stefan Karpinski (now at Julia Computing LLC), Alan Edelman (MIT)
  • ~10-person core language team, ~870 contributors, ~2500 registered packages
  • support from Intel, Microsoft, Wall St., the Moore Foundation, NumFOCUS
  • julia-0.1 released in 2012, julia-1.0 in August 2018, now julia-1.2

Its goal: solve the two-language problem by offering a dynamic, high-level language with good mathematical expressivity that can be as fast as C/Fortran.

Disclaimer

  • I am not a Julia expert: I've been following the language for about two years and I am currently porting Python research codes to Julia.
  • Julia (like Python...) is not a drop-in replacement for Matlab; it is an alternative.
  • Don't fix what ain't broken:
    • if Matlab/R/Python/C works for you, there is no need to change
    • but if you are sometimes faced with the limitations of these languages, give Julia a try

Julia: expressive as Matlab

Julia can be as expressive as Matlab for interactive calculations, with built-in numerical libraries and plotting. A few examples...

Arrays

In [1]:
A = randn(4,4)       # 4 x 4 matrix of normally distributed random numbers
Out[1]:
4×4 Array{Float64,2}:
 -1.7897    -1.27394   -0.796802   -1.12804  
 -0.843538  -0.607065   0.187264    0.0447423
 -0.83152   -0.483635   0.0594034  -0.607388 
  0.194026   1.4971     0.533445   -0.379708 
In [2]:
A[:,1]                # familiar colon syntax --extract first column of A
Out[2]:
4-element Array{Float64,1}:
 -1.7896952667660175 
 -0.843538211646361  
 -0.8315200775522081 
  0.19402554880895634

Linear Algebra

In [3]:
using LinearAlgebra
b = randn(4)          
x = A\b               # solve Ax=b for x using backslash operator
norm(A*x-b)
Out[3]:
5.551115123125783e-16
In [4]:
U, σ, V = svd(A);     # singular value decomposition --note unicode variable name σ
In [5]:
Σ = diagm(0 => σ)
norm(A - U*Σ*V')      # compute error of SVD factorization A = U Σ V'
Out[5]:
8.164634113057559e-16

Fast as Fortran

In [6]:
function loop()
    a=0 
    for i in 1:1_000_000
        a+=1;
    end
end
@time loop() # compile time
@time loop() # now fast
  0.004979 seconds (20.83 k allocations: 1.173 MiB)
  0.000001 seconds (4 allocations: 160 bytes)

Compare with Matlab (R2018)

I know this is not Matlab-friendly code; that's the point!

function [] = loop()
    a=0;
    for i=1:1000000
        a=a+1;
    end
end

f = @() loop();

timeit(f) => 0.0018s

~1000x slower!

Benchmarks on common code patterns

Takeaway: Julia timings cluster around the C timing, while Matlab/Python/R are orders of magnitude slower.

KS-CNAB2 benchmark: CPU time versus lines of code

Takeaway: The Julia PDE code is almost as compact as Matlab/Python and almost as fast as C/Fortran.

Julia: easy, dynamic, and fast. How?

  • Just-in-time compilation (JIT)
    • user-level code is compiled to machine code on-the-fly
  • Meticulous type system
    • designed to maximize impact of JIT
    • type inference: compiler determines types of variables
  • Multiple dispatch
    • functions are compiled for each set of argument types
    • function dispatch determined at compile time when possible, run time when not
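To make the last two bullets concrete, here is a minimal dispatch sketch (the function and values are mine, not from the talk):

half(x::Int)     = x ÷ 2      # method for integers
half(x::Float64) = x / 2      # method for floats

half(10)                      # argument type known statically -> the Int method is picked at compile time
half(10.0)                    # the Float64 method
xs = Any[1, 2.0]              # element type Any: the method must be picked at run time
[half(x) for x in xs]         # dynamic dispatch, element by element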

Just-in-time / ahead-of-time compilation

Functions are compiled to machine code when first run; subsequent runs are as fast as compiled C, C++, or Fortran. It is not a tracing JIT (like PyPy).

Caveats

  • It's a marathon, not a sprint: Julia can be slow at first
  • It's an open-source language with lots of well-meaning contributors; it's not a product (see JuliaPro...)
  • If you directly translate from Matlab to Julia you will not always see an improvement: you have to exploit Julia's strengths (loops to avoid allocations, type stability, ...), as sketched below
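As an illustration of that last point, here is a small sketch (my own example, not from the talk) comparing a type-unstable accumulator with a type-stable one:

function unstable_sum(xs)
    s = 0                 # starts as an Int, becomes a Float64 inside the loop -> type-unstable
    for x in xs
        s += x^2
    end
    return s
end

function stable_sum(xs)
    s = zero(eltype(xs))  # a single concrete type for the whole loop -> type-stable
    for x in xs
        s += x^2
    end
    return s
end

xs = rand(10^6)
# @code_warntype unstable_sum(xs)   # flags the Union{Float64,Int64} instability
# @time stable_sum(xs)              # no allocations inside the loop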

A Julia type

Like a C struct, or a Python object... but without methods

struct Spaceship
    speed::Float64
    position::Array{Float64,2}
end
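A quick usage sketch (the helper function is mine): construct an instance with the default constructor and attach behaviour through ordinary functions defined outside the type.

ship = Spaceship(0.5, zeros(1, 3))    # default constructor takes one value per field
describe(s::Spaceship) = "speed $(s.speed), position $(s.position)"
describe(ship)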

Multiple dispatch

collide(a::Asteroid, s::Spaceship) # a first method of the function collide
collide(s1::Spaceship, s2::Spaceship) # another method of the function collide
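A self-contained, hedged version of this example (the Asteroid type and the method bodies are mine; the talk only shows the signatures):

struct Asteroid
    mass::Float64
end

collide(a::Asteroid, s::Spaceship)    = "an asteroid hits a spaceship moving at $(s.speed)"
collide(s1::Spaceship, s2::Spaceship) = "two spaceships collide"

collide(Asteroid(1e6), Spaceship(0.5, zeros(1, 3)))                # dispatch picks the first method
collide(Spaceship(0.5, zeros(1, 3)), Spaceship(1.0, zeros(1, 3)))  # ... and the second one here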
In [7]:
abstract type Temperature end

struct Celsius <: Temperature
    value
end

struct Kelvin <: Temperature
   value::Float64 
end

struct Fahrenheit <: Temperature
    value::Float64
end

import Base: promote_rule, convert
promote_rule(::Type{Kelvin}, ::Type{Celsius})     = Kelvin
promote_rule(::Type{Fahrenheit}, ::Type{Celsius}) = Celsius
promote_rule(::Type{Fahrenheit}, ::Type{Kelvin})  = Kelvin

convert(::Type{Kelvin},  t::Celsius)     = Kelvin(t.value + 273.15)
convert(::Type{Kelvin},  t::Fahrenheit)  = convert(Kelvin, convert(Celsius, t))
convert(::Type{Celsius}, t::Kelvin)      = Celsius(t.value - 273.15)
convert(::Type{Celsius}, t::Fahrenheit)  = Celsius((t.value - 32)*5/9)
convert(::Type{Fahrenheit}, t::Celsius)  = Fahrenheit(t.value*9/5 + 32)
convert(::Type{Fahrenheit}, t::Kelvin)   = convert(Fahrenheit, convert(Celsius, t))

import Base: +,-,*
+(x::T, y::T) where {T <: Temperature} = T(x.value + y.value)
-(x::T, y::T) where {T <: Temperature} = T(x.value - y.value)

+(x::Temperature, y::Temperature) = +(promote(x,y)...)
-(x::Temperature, y::Temperature) = -(promote(x,y)...)

*(x::Number, y::T) where {T <: Temperature} = T(x * y.value)

const °C = Celsius(1); const °F = Fahrenheit(1); const K = Kelvin(1);
In [8]:
1K+1°C
Out[8]:
Kelvin(275.15)
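With these promotion and conversion rules, other mixed-unit operations work the same way (a small usage sketch, not from the notebook):

Kelvin(273.15) - 32°F    # Fahrenheit is converted through Celsius; the result is Kelvin(0.0)
2 * 3°C                  # scaling by a plain number keeps the unit: Celsius(6)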

Introspection

$$f(x) = 4\, x\, (1-x)$$
In [9]:
f(x) = 4x*(1-x)     # define logistic map
@time f(0.3);       # first run is slow
@time f(0.4);       # second run is a thousand times faster
  0.008778 seconds (18.00 k allocations: 1.003 MiB)
  0.000003 seconds (5 allocations: 176 bytes)

Observe the compilation of $f(x)$ by stages

user Julia -> generic Julia expression -> typed Julia expression -> intermediate compiler language -> machine code

In [11]:
@code_lowered f(0.3)  # show f(x) as generic Julia expression
Out[11]:
CodeInfo(
1 ─ %1 = 4 * x
 %2 = 1 - x
 %3 = %1 * %2
└──      return %3
)
In [12]:
@code_typed f(0.3)    # show f(x) as Julia expr with inferred types, based on the arg types
Out[12]:
CodeInfo(
1 ─ %1 = (Base.mul_float)(4.0, x)::Float64
 %2 = (Base.sub_float)(1.0, x)::Float64
 %3 = (Base.mul_float)(%1, %2)::Float64
└──      return %3
) => Float64
In [13]:
@code_llvm f(0.3)    # show f(x) in intermediate compiler language (LLVM)
;  @ In[9]:1 within `f'
define double @julia_f_12939(double) {
top:
; ┌ @ promotion.jl:314 within `*' @ float.jl:399
   %1 = fmul double %0, 4.000000e+00
; └
; ┌ @ promotion.jl:315 within `-' @ float.jl:397
   %2 = fsub double 1.000000e+00, %0
; └
; ┌ @ float.jl:399 within `*'
   %3 = fmul double %1, %2
; └
  ret double %3
}

Type inference and dispatch

Below we can see that calling f on a Float64 produces floating-point machine code, while calling it on an Integer produces code specialized for integers (i64).

In [14]:
@code_native f(0.3) # show f(x) in x86-64 assembly language
	.text
; ┌ @ In[9]:1 within `f'
	movabsq	$140409732451304, %rax  # imm = 0x7FB3B039CBE8
; │┌ @ promotion.jl:314 within `*' @ float.jl:399
	vmulsd	(%rax), %xmm0, %xmm1
	movabsq	$140409732451312, %rax  # imm = 0x7FB3B039CBF0
; │└
; │┌ @ promotion.jl:315 within `-' @ float.jl:397
	vmovsd	(%rax), %xmm2           # xmm2 = mem[0],zero
	vsubsd	%xmm0, %xmm2, %xmm0
; │└
; │┌ @ float.jl:399 within `*'
	vmulsd	%xmm0, %xmm1, %xmm0
; │└
	retq
	nopw	%cs:(%rax,%rax)
; └
In [15]:
@code_native f(3)    # show f(x) in x86-64 assembly language
	.text
; ┌ @ In[9]:1 within `f'
; │┌ @ In[9]:1 within `-'
	movl	$1, %eax
	subq	%rdi, %rax
; │└
; │┌ @ int.jl:54 within `*'
	imulq	%rdi, %rax
	shlq	$2, %rax
; │└
	retq
	nopw	%cs:(%rax,%rax)
; └
In [16]:
using LinearAlgebra

# First we write a generic function (that takes an array A and a list of vectors 
# and sums the inner products of A and each vector)
function inner_sum(A,vs)
    t = zero(eltype(A))
    for v in vs
        t += inner(v,A,v)
    end
    return t
end

# we define the inner product used in our inner_sum function
inner(v,A,w) = dot(v,A*w)

## Now we want a new type, the famous One Hot Vector

import Base: *

struct OneHotVector <: AbstractVector{Bool}
  idx::UInt32
  len::UInt32
end

Base.size(xs::OneHotVector) = (Int64(xs.len),)
Base.getindex(xs::OneHotVector, i::Integer) = i == xs.idx
Base.getindex(xs::OneHotVector, ::Colon) = OneHotVector(xs.idx, xs.len)
In [17]:
# It works (generic) ! but it's slow
v = [OneHotVector(i,1000) for i in rand(1:1000,100)]
A=rand(1000,1000)

inner_sum(A,v)
@time inner_sum(A,v)

# And now specialize to get speed!
A::AbstractMatrix * b::OneHotVector = A[:, b.idx]
inner(v::OneHotVector,A,w::OneHotVector) = A[v.idx,w.idx] # How to do that in single dispatch ??

inner_sum(A,v)
@time inner_sum(A,v)
  0.037617 seconds (105 allocations: 793.922 KiB)
  0.000003 seconds (5 allocations: 176 bytes)
Out[17]:
51.52316795511521

An ecosystem of packages

  • Most packages are available on GitHub (easy collaboration)
  • The main fields are grouped into GitHub organizations (see https://julialang.org/community/)
  • Julia comes with a powerful package/environment manager
In [19]:
using Pkg #load the Pkg stdlib
Pkg.activate(".") #activate the local environment
Out[19]:
"/home/raphael/Projets/Perso/tests-julia/julia-intro/Project.toml"

Solving Differential Equations

Solve the Lorenz equations:

$$ \begin{align} \frac{dx}{dt} &= σ(y-x) \\ \frac{dy}{dt} &= x(ρ-z) - y \\ \frac{dz}{dt} &= xy - βz \\ \end{align} $$
In [49]:
#Pkg.add("DifferentialEquations") # add the Differential equation suite
using DifferentialEquations # first time is very slow (precompilation)
using Plots
gr()
Out[49]:
Plots.GRBackend()
In [3]:
function lorenz(du,u,p,t)
 du[1] = 10.0*(u[2]-u[1])
 du[2] = u[1]*(28.0-u[3]) - u[2]
 du[3] = u[1]*u[2] - (8/3)*u[3]
end

u0 = [1.0;0.0;0.0] ; tspan = (0.0,100.0);
prob = ODEProblem(lorenz,u0,tspan)
sol = DifferentialEquations.solve(prob)
Plots.plot(sol,vars=(1,2,3))
Out[3]:
[3D phase-space plot of the Lorenz attractor, axes (u1, u2, u3)]

Optimization

In [22]:
using JuMP, GLPK, Test

"""
Formulate and solve a simple LP:
    max 5x + 3y
     st 1x + 5y <= 3
         0 <= x <= 2
         0 <= y <= 30
"""
function example_basic()
    model = Model(with_optimizer(GLPK.Optimizer))

    @variable(model, 0 <= x <= 2)
    @variable(model, 0 <= y <= 30)

    @objective(model, Max, 5x + 3y)
    @constraint(model, 1x + 5y <= 3.0)

    println(model)
    JuMP.optimize!(model)

    obj_value = JuMP.objective_value(model)
    x_value = JuMP.value(x)
    y_value = JuMP.value(y)

    println("Objective value: ", obj_value)
    println("x = ", x_value)
    println("y = ", y_value)
    @test obj_value ≈ 10.6
    @test x_value ≈ 2
    @test y_value ≈ 0.2

end

example_basic()
┌ Info: Precompiling GLPK [60bf3e95-4087-53dc-ae20-288a0d20c6a6]
└ @ Base loading.jl:1186
Max 5 x + 3 y
Subject to
 x + 5 y ≤ 3.0
 x ≥ 0.0
 y ≥ 0.0
 x ≤ 2.0
 y ≤ 30.0

Objective value: 10.6
x = 2.0
y = 0.2
Out[22]:
Test Passed
In [2]:
using DifferentialEquations, Measurements, Plots

g = 9.79 ± 0.02; # Gravitational constants
L = 1.00 ± 0.01; # Length of the pendulum

#Initial Conditions
u₀ = [0 ± 0, π / 60 ± 0.01] # Initial speed and initial angle
tspan = (0.0, 6.3)

#Define the problem
function simplependulum(du,u,p,t)
    θ  = u[1]
    dθ = u[2]
    du[1] = dθ
    du[2] = -(g/L)*θ
end

#Pass to solvers
prob = ODEProblem(simplependulum, u₀, tspan)
sol = DifferentialEquations.solve(prob, Tsit5(), reltol = 1e-6)

# Analytic solution
u = u₀[2] .* cos.(sqrt(g / L) .* sol.t)

plot(sol.t, getindex.(sol.u, 2), label = "Numerical")
plot!(sol.t, u, label = "Analytic")
Out[2]:
[Plot: numerical vs. analytic pendulum solutions over t = 0..6.3]

Automatic differentiation

In [42]:
#Pkg.add("ForwardDiff")
using ForwardDiff

# Define a function f
f(x::Vector) = sum(sin, x) + prod(tan, x) * sum(sqrt, x);

x = rand(5) # small size for example's sake

# Get the gradient of f
g = x -> ForwardDiff.gradient(f, x); # g = ∇f

@show g(x)

# Get the Hessian
ForwardDiff.hessian(f, x)
g(x) = [2.57287, 2.51714, 2.3224, 1.87393, 1.8107]
Out[42]:
5×5 Array{Float64,2}:
 1.39246  4.80673  4.31075  3.52545  3.74384
 4.80673  1.37213  4.18168  3.41991  3.63178
 4.31075  4.18168  1.32308  3.06703  3.25711
 3.52545  3.41991  3.06703  1.78856  2.66366
 3.74384  3.63178  3.25711  2.66366  2.8778 

Deep learning

Classic MNIST number classification with Flux.jl

In [4]:
using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated
# using CuArrays

# Classify MNIST digits with a simple multi-layer-perceptron

# Load images
imgs = MNIST.images()
# Stack images into one large batch
X = hcat(float.(reshape.(imgs, :))...) |> gpu

labels = MNIST.labels()
# One-hot-encode the labels
Y = onehotbatch(labels, 0:9) |> gpu

# Define the neural network (two Dense layers)
m = Chain(
  Dense(28^2, 32, relu),
  Dense(32, 10),
  softmax) |> gpu

# define loss function and metric
loss(x, y) = crossentropy(m(x), y)
accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))

# repeat the full batch 100 times (100 training iterations)
dataset = repeated((X, Y), 100)
# Define a callback that shows the current loss
evalcb = () -> @show(loss(X, Y))

opt = ADAM()

# Train
Flux.train!(loss, params(m), dataset, opt, cb = throttle(evalcb, 10))
@show accuracy(X, Y)
# Test set accuracy
tX = hcat(float.(reshape.(MNIST.images(:test), :))...) |> gpu
tY = onehotbatch(MNIST.labels(:test), 0:9) |> gpu
@show accuracy(tX, tY)
loss(X, Y) = 2.2864406f0 (tracked)
loss(X, Y) = 1.495227f0 (tracked)
loss(X, Y) = 1.018613f0 (tracked)
loss(X, Y) = 0.7433482f0 (tracked)
loss(X, Y) = 0.60261893f0 (tracked)
loss(X, Y) = 0.51817137f0 (tracked)
loss(X, Y) = 0.46358913f0 (tracked)
loss(X, Y) = 0.41939938f0 (tracked)
loss(X, Y) = 0.39135918f0 (tracked)
accuracy(X, Y) = 0.8994333333333333
accuracy(tX, tY) = 0.9029
Out[4]:
0.9029

Prediction

Predict the number for a given image

In [6]:
n = rand(1:length(MNIST.images(:test)))
print("Prediction:",argmax(tY[:,n])-1)
heatmap(MNIST.images(:test)[n])
Prediction:5
Out[6]:
[Heatmap of the selected test digit image]

Ecosystem

And much more :

  • JuliaRobotics (real-time !)
  • Stats/Machine Learning
  • Web
  • GPU
  • ...

Interop

Interop is possible with Python, Matlab, R, Java, C/Fortran, ...

In [38]:
#Pkg.add("PyCall") # add python binding package

using PyCall 
@pyimport math # import python math library
@show math.sin(math.pi / 4) 
@show sin(pi / 4) ; # Julia's native sine function
math.sin(math.pi / 4) = 0.7071067811865475
sin(pi / 4) = 0.7071067811865475
In [29]:
# call C
t = ccall((:clock, "libc.so.6"), Int32, ())
Out[29]:
854653866
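R can be reached in the same spirit (a hedged sketch; RCall is assumed to be installed, it is not part of this notebook):

# call R
using RCall
R"mean(c(1, 2, 3))"   # run an R snippet from Julia; returns an RObject wrapping 2.0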

Macros: code that transforms code

In [30]:
# @time macro inserts timing and memory profiling into expression, then evaluates, and prints
@time f(2//3)
  0.076290 seconds (114.51 k allocations: 6.258 MiB, 24.08% gc time)
Out[30]:
8//9
In [31]:
# @which macro determines which function is called, provides link to source code on GitHub
@which exp(π)
Out[31]:
exp(x::Real) in Base.Math at special/exp.jl:73
In [32]:
@which exp(π*im)
Out[32]:
exp(z::Complex) in Base at complex.jl:604

Macros enable run-time code generation and transformation.

Applications :

  • generation and execution of boilerplate code
  • run-time generation and optimization of algorithms, e.g. FFTW, ATLAS
  • symbolic mathematics, automatic differentiation
  • all written like high-level Python, running like compiled C !!!
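To make "code that transforms code" concrete, here is a toy macro (my own example, not from the talk): it receives the expression unevaluated, prints it as data, then splices it back in to be evaluated.

macro show_expr(ex)
    return quote
        println("evaluating: ", $(string(ex)))
        $(esc(ex))
    end
end

@show_expr 1 + 2   # prints "evaluating: 1 + 2", then returns 3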

Parallelism in Julia: just change the array type

Some very trivial examples of Julia's built-in parallelism

In [33]:
using Distributed
#Pkg.add("DistributedArrays")
# add 4 worker processes
addprocs(4);
In [34]:
; cat codes/count_heads.jl
function count_heads(n)
    c::Int = 0
    for i=1:n
        c += rand(Bool)
    end
    c
end
In [44]:
# define the function on all processes
@everywhere include("codes/count_heads.jl")

# dispatch two tasks
a = @spawn count_heads(10000000)
b = @spawn count_heads(10000000)
(fetch(a)+fetch(b))/20000000 # estimate the probability of drawing heads
Out[44]:
0.49999545

Parallel loops with reduction

In [48]:
# distribute the simulation across all processes and sum the results
nheads = @distributed (+) for i=1:200000000
  Int(rand(Bool))
end 
nheads/200000000
Out[48]:
0.499957515

And more :

  • Distributed arrays
  • GPUArrays
  • TPUArrays ...
  • Nested Threading (PARTR)

Just changing the underlying array type can bring new hardware support to all packages, as sketched below.
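For instance, with the DistributedArrays package mentioned above, an ordinary array can be split across the worker processes (a hedged sketch, not run in this notebook):

using DistributedArrays
d = distribute(rand(10^6))   # chunks of the array live on the 4 worker processes
sum(d)                       # the same reduction now runs distributed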

Code as a first-class citizen product of research

  • The (new) Julia package manager has reproducibility at its core (each code project is linked to a deterministic set of dependencies)
  • Creating a Julia package comes with tools for documentation, unit testing, and continuous integration
  • New scientific collaborations can be based on software (see GitHub organizations such as JuliaDiff, JuliaStats, etc.)
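A minimal sketch of how such a project starts (the package name "MyAnalysis" is hypothetical):

using Pkg
Pkg.generate("MyAnalysis")   # creates MyAnalysis/Project.toml and src/MyAnalysis.jl
Pkg.activate("MyAnalysis")   # develop inside the project's own environment
Pkg.add("Statistics")        # the dependency is recorded in MyAnalysis/Project.toml
# Pkg.test("MyAnalysis")     # would run the package's test/runtests.jl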

Food for thought

Conclusions

Julia

  • Easy as Matlab, fast as Fortran, flexible as Python, deep as Lisp
  • Scientific computing, from interactive exploration to HPC
  • Paradigms (multiple dispatch) that foster collaboration

Not covered

  • large-scale programming, development ecosystem, environments, debuggers, etc.
  • Abstract Types, compositions, ...
  • rough edges: plotting, static compilation (not quite there), package loading times, young 1.x

Thanks to the Julia community for most of the examples, and to xkcd.

Julia website: http://www.julialang.org, this talk: https://github.com/raphbacher/julia-intro