snippet.juliarepl
julia> pkgchk( [  "julia" => v"1.0.2", "Optim" => v"0.17.1" ] )   ## "ForwardDiff" => nothing

Optimization With Optim.jl

  • Optim is the pure Julia implementation, discussed in this chapter. It is one of my favorite Julia packages: native, efficient, maintained.

Most of the examples in this chapter minimize the two-dimensional quartic polynomial

snippet.julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

It has an obvious minimum at [2,3], and is an ideal example function around which many optimizers were originally designed. optim-results discusses more difficult test functions.
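
A quick sanity check of the claimed minimum, in plain Julia (the function is written out here with its minus signs explicit; no packages needed):

```julia
# The chapter's test function: zero at [2, 3], positive elsewhere.
myfun(x) = (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2

println(myfun([2.0, 3.0]))   # exactly 0.0 at the optimum
println(myfun([2.1, 3.0]))   # a small positive value away from it
```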

IMPORTANT: see optim-results for some benchmarks

Overview of Optimization and Algorithms

Available Multivariate Optimization Algorithms

Name Abbrev Derivs? ~Time Notes
Nelder-Mead Simplex NM No 31 works on discontinuous(?) and nonsmooth functions.
Simulated Annealing SAMIN No 4,700 searches for global optima, but is slow
Particle Swarm Prtcl No 333,000 switches algorithms for global optimum search, but is really slow
GradientDescent GrdDsct 1st 19 often a bad idea due to excessive zig-zagging
ConjugateGradient CnjgGrd 1st 20 often better than gradient descent
BFGS BFGS 1st 40 A quasi-Newton method which cleverly extrapolates “implied” second derivatives from previous iterations.
LBFGS LBFGS 1st 29 Same, but with limited memory
Newton Nwtn 2nd 31 Newton-Raphson requires first and second derivatives.
NewtonTrustRegion NwTst 2nd 50 Like Newton-Raphson, but limits step size.
  • The time estimates are for the sample polynomial function. Optimizers will be faster or slower for other problems. The timings are just to give an idea of orders of magnitude.
  • SAMIN (and the univariate optimizers) requires box constraints. The others do not.
  • All optimizers use their defaults, a rand(2) starting point, no user-provided symbolic gradients, and no bounds (except SAMIN, which uses -100 to +100).

Showing Progress

  • Pass show_trace=true (and optionally show_every) in Optim.Options().
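
For example, a minimal sketch (the optimizer choice and the show_every value are arbitrary):

```julia
using Optim

myfun(x) = (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2

# Print a progress line every 5 iterations while optimizing.
res = optimize(myfun, [0.0, 0.0], BFGS(),
               Optim.Options(show_trace=true, show_every=5))
```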

Providing Derivatives (Gradients and Hessians) To Optimizers

You may never need to provide them: optimize() will use numeric differentiation whenever the optimizer determines it needs derivatives. Numeric derivatives tend to work quite well and are worry-free. The drawback is that they require extra function evaluations. If function evaluation is costly, here is what you can do.

(In most of the examples, z2 is a starting value which you can set to zeros(2) or rand(2).) optimize() allows as arguments any of the following:

  1. Direct provision of functions:
    1. a plain function, optimize( myfun, z2 ).
    2. a plain function with user-provided symbolic gradient, optimize( myfun, myfun_gradient, z2 ).
    3. a plain function with user-provided symbolic gradient and Hessian, optimize( myfun, myfun_gradient, myfun_hessian, z2 ).
  2. Indirect provision via a packaged object:
    1. a OnceDifferentiable object, which holds the function and its gradient. It can be created by automatic differentiation or by passing a user-provided f and g! into OnceDifferentiable.
    2. a TwiceDifferentiable object, which holds the function, its gradient, and its Hessian. It can be created by automatic differentiation or by passing a user-provided f, g!, and h! into TwiceDifferentiable.

Again, optimize will use numeric differencing for derivatives by default, so providing derivatives is always optional. Numeric differencing is recommended when function evaluation itself is quick.
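
To see why numeric differencing usually works well, here is a minimal central-difference sketch in plain Julia (an illustration only, not Optim's actual implementation, which is more careful about step sizes), compared against the exact gradient of the test function:

```julia
# Central-difference approximation of a gradient (illustration only).
function numgrad(f, x; h=1e-6)
    g = similar(x)
    for i in eachindex(x)
        xp = copy(x); xp[i] += h      # step forward in coordinate i
        xm = copy(x); xm[i] -= h      # step backward in coordinate i
        g[i] = (f(xp) - f(xm)) / (2h)
    end
    return g
end

myfun(x)  = (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2
exactg(x) = [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]

x = [0.5, 1.5]
println(maximum(abs.(numgrad(myfun, x) .- exactg(x))))   # tiny approximation error
```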

If you do need exact derivatives, perhaps because they can occasionally be more accurate, or because evaluation of the function (and thus differencing) is expensive, you have two choices:

  1. You can code the symbolic derivatives into Julia.
  2. You can use automatic differentiation, such as optimize( TwiceDifferentiable(f, zeros(n); autodiff = :forward), zeros(n), BFGS(), Optim.Options() ). However, automatic differentiation works by overloading Julia's elementary math operations, which can also slow down performance.

WARNING There are some common gotchas when coding your own gradients and Hessians. If you want to provide them, please read the appendix section providing_derivatives_to_optimizers first.

Box Constraints

All optimizers in Optim allow box constraints. See the example below.

Input Options

  • Setting Termination Conditions: x_tol= 1e-32, f_tol=1e-32, g_tol=1e-8, f_calls_limit=0, g_calls_limit=0, h_calls_limit=0, iterations=1_000
  • Setting Verbose Printouts and Intervention : store_trace=false, show_trace=false, extended_trace=false, show_every=0, callback=function, time_limit=NaN.
  • Options can be given as optimize(..., Optim.Options(g_tol=1e-12)) or optimize(..., g_tol=1e-12).
  • Oddly, there is no input option to change from minimization to maximization.
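
Since there is no such option, the usual workaround is to minimize the negative of the objective. A sketch with a made-up univariate function g (maximum at x = 1):

```julia
using Optim

g(x) = -(x - 1)^2 + 5                     # toy function with a maximum at x = 1

# Maximize g by minimizing -g over a bracket (univariate, defaults to Brent).
res = optimize(x -> -g(x), -10.0, 10.0)
println(Optim.minimizer(res))             # location of the maximum of g
println(-Optim.minimum(res))              # value of g at that maximum
```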

Output Overview

Both Univariate and Multivariate Results

Call Sample Notes
summary(res) “SAMIN” method, usually as input
Optim.minimizer(res) [ 2.0, 3.0 ] the optimal value
Optim.minimum(res) 3.534e-13 function value at optimum
Optim.iterations(res) 3451 number of iterations
Optim.iteration_limit_reached(res) false obvious
Optim.trace(res) n/a only retained if store_trace=true in optimize
Optim.x_trace(res) n/a only retained if store_trace=true in optimize
Optim.f_trace(res) n/a only retained if store_trace=true in optimize
Optim.f_calls(res) 3451 number of function calls
Optim.converged(res) false whether the optimizer converged
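
A sketch of how these accessors are used together (the exact numbers depend on the run, so none are shown):

```julia
using Optim

myfun(x) = (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2
res = optimize(myfun, [0.0, 0.0], BFGS())

println(summary(res))            # name of the algorithm used
println(Optim.minimizer(res))    # should be close to [2.0, 3.0]
println(Optim.minimum(res))      # function value there, close to 0
println(Optim.converged(res))    # whether convergence was declared
```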

Only Univariate Results

Call Sample Notes
Optim.lower_bound(res) -100 lower bound, usually as input
Optim.upper_bound(res) 100 upper bound, usually as input
Optim.x_lower_trace(res) only retained if store_trace=true in optimize
Optim.x_upper_trace(res) only retained if store_trace=true in optimize
Optim.rel_tol(res) relative tolerance, usually as input
Optim.abs_tol(res) absolute tolerance, usually as input

Only Multivariate Results

Call Sample Notes
Optim.g_norm_trace(res) n/a only retained if store_trace=true in optimize
Optim.g_calls(res) 0 number of gradient calls
Optim.x_converged(res) false whether the x tolerance was met
Optim.f_converged(res) false whether the f tolerance was met
Optim.g_converged(res) false whether the gradient tolerance was met
Optim.initial_state(res) [0.0,0.0] user input

Gradient-Free Optimizers

Nelder-Mead Simplex (Amoeba)

Nelder-Mead Simplex is (sort of) the multi-dimensional counterpart of the golden-section search in one dimension, but it is not a true equivalent: golden-section search brackets a minimum and is guaranteed to find it; Nelder-Mead is not. There are beautiful and intuitive illustrations at Wikipedia. The best description is that Nelder-Mead looks less like an amoeba contracting and more like an amoeba crawling.

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0, 0.0], NelderMead(), Optim.Options(iterations=100_000) )
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000267077758536,3.0000220787571954]
 * Minimum: 1.200777e-09
 * Iterations: 49
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 98

Simulated Annealing

Simulated Annealing (a Metropolis-Hastings-style method) is a form of Markov-Chain Monte-Carlo search: it makes random probabilistic guesses, and concentrates the search more and more once the lowest value(s) found so far already seem pretty good.

WARNING At the time of writing, SAMIN required the master branch and not the release, so use Pkg.checkout("Optim"). Note that SAMIN requires box constraints; there is no unconstrained version.

snippet.juliarepl
julia> using Optim

julia> using Random; Random.seed!(0);

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, fill(-100.0,2), fill(+100.0,2), [0.0,0.0], SAMIN(), Optim.Options(iterations=100_000) )
================================================================================
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 22101

     Obj. value:      0.0000000000

       parameter      search width
         2.00000           0.00000
         3.00000           0.00000
================================================================================

Results of Optimization Algorithm
 * Algorithm: SAMIN
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000006088701,2.999999998587848]
 * Minimum: 3.906645e-17
 * Iterations: 22101
 * Convergence: false
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = NaN
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = NaN |f(x)|
   * |g(x)| ≤ 0.0e+00: false
     |g(x)| = NaN
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 22101
 * Gradient Calls: 0

Particle Swarm

Particle Swarm moves a population of candidate points through the search space, probabilistically combining each particle's own best find with the swarm's best, in search of a global optimum.

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> using Random; Random.seed!(0);

julia> optimize( myfun, [0.0,0.0], ParticleSwarm(), Optim.Options(iterations=100_000))
Results of Optimization Algorithm
 * Algorithm: Particle Swarm
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0,3.0]
 * Minimum: 0.000000e+00
 * Iterations: 100000
 * Convergence: false
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = NaN
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = NaN |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = NaN
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: true
 * Objective Calls: 400000
 * Gradient Calls: 0

Gradient-Based Hessian-Free Optimizers

Steepest GradientDescent

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], GradientDescent())
Results of Optimization Algorithm
 * Algorithm: Gradient Descent
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0,3.0]
 * Minimum: 0.000000e+00
 * Iterations: 6
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.52e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = Inf |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 2.22e-16
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 19
 * Gradient Calls: 19

Conjugate Gradient Descent

Usually avoids excessive zig-zagging of steepest gradient descent:

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], ConjugateGradient())
Results of Optimization Algorithm
 * Algorithm: Conjugate Gradient
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000000000004,3.0000000000000004]
 * Minimum: 1.617165e-29
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.68e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.77e+15 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 7.77e-15
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 15
 * Gradient Calls: 9

BFGS

The classic quasi-Newton workhorse, which internally approximates the second derivative from how the gradients have changed across iterations.

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000000000021,3.0000000000000027]
 * Minimum: 4.427482e-28
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.75e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 7.04e+13 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 4.15e-14
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 21
 * Gradient Calls: 21

The limited-memory version is nearly as good, requiring just one extra call.

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], LBFGS())
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [1.999999999999605,2.9999999999999294]
 * Minimum: 1.610245e-25
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 4.53e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.32e+12 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 7.90e-13
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 22
 * Gradient Calls: 22

Hessian-Based Optimizers

Newton-Raphson

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], Newton())
Results of Optimization Algorithm
 * Algorithm: Newton's Method
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000000000006297,3.0000000000166733]
 * Minimum: 2.783965e-22
 * Iterations: 5
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 6.05e-04
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.32e+15 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 3.33e-11
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 15
 * Gradient Calls: 15
 * Hessian Calls: 5

Trust regions limit the step size to something sensible (i.e., so that Newton does not step out of bounds so easily).

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], NewtonTrustRegion())
Results of Optimization Algorithm
 * Algorithm: Newton's Method (Trust Region)
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0,3.0]
 * Minimum: 0.000000e+00
 * Iterations: 8
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 8.53e-09
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = Inf |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 2.22e-16
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 9
 * Gradient Calls: 9
 * Hessian Calls: 7

Optimization With Box Constraints

The call arguments of optimize change when box constraints are included. Instead of passing a call that designates the optimizer, such as NelderMead(), one passes Fminbox{NelderMead}(). (The first calls a function which returns an object that is then passed by the user into optimize; the second passes along the name of the optimizer as a type parameter.)

NelderMead() ⟶ Fminbox{NelderMead}()

The second and third arguments in optimize have to be the lower and upper bounds. The Optim.Options() argument seems to be inconsistent with Fminbox (and may screw up the entire optimization with or without an error)! The initial value has to be inside the box.

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> initial_x= [ 0.0, 0.0 ];

julia> optimize(myfun, initial_x, NelderMead() )               ## without the box constraint
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000267077758536,3.0000220787571954]
 * Minimum: 1.200777e-09
 * Iterations: 49
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 98

julia> lower= [ -999.9, -999.9 ]; upper= [ 999, 2.9 ];

julia> results = optimize(myfun, lower, upper, initial_x)	## box constraint form: x[2] is maxed at 2.9
┌ Warning: Linesearch failed, using alpha = 2.570171260006581 and exiting optimization.
│ The linesearch exited with message:
│ Linesearch failed to converge, reached maximum iterations 50.
└ @ Optim ~/.julia/packages/Optim/ULNLZ/src/utilities/perform_linesearch.jl:47
Results of Optimization Algorithm
 * Algorithm: Fminbox with L-BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000000000004565,2.8999999999999995]
 * Minimum: 1.000000e-02
 * Iterations: 8
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: true
     |x - x'| = 0.00e+00
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = 2.00e-01
   * Stopped by an increasing objective: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 1425
 * Gradient Calls: 1425

snippet.juliarepl
julia> using Optim

julia> f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
f (generic function with 1 method)

julia> lower = [1.25, -2.1]
2-element Array{Float64,1}:
  1.25
 -2.1

julia> upper = [Inf, Inf]
2-element Array{Float64,1}:
 Inf
 Inf

julia> initial_x = [2.0, 2.0]
2-element Array{Float64,1}:
 2.0
 2.0

julia> inner_optimizer = GradientDescent()
GradientDescent{LineSearches.InitialPrevious{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,getfield(Optim, Symbol("##12#14"))}(LineSearches.InitialPrevious{Float64}
  alpha: Float64 1.0
  alphamin: Float64 0.0
  alphamax: Float64 Inf
, LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}
  delta: Float64 0.1
  sigma: Float64 0.9
  alphamax: Float64 Inf
  rho: Float64 5.0
  epsilon: Float64 1.0e-6
  gamma: Float64 0.66
  linesearchmax: Int64 50
  psi3: Float64 0.1
  display: Int64 0
  mayterminate: Base.RefValue{Bool}
, nothing, getfield(Optim, Symbol("##12#14"))(), Flat())

julia> results = optimize(f, lower, upper, initial_x, Fminbox(inner_optimizer))
Results of Optimization Algorithm
 * Algorithm: Fminbox with Gradient Descent
 * Starting Point: [2.0,2.0]
 * Minimizer: [1.2500000000000002,1.5626480963230658]
 * Minimum: 6.250219e-02
 * Iterations: 11
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 2.22e-16
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = 4.26e-01
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 818877
 * Gradient Calls: 818877
  • WARNING In Julia 0.6.2 with Optim 0.14.1, adding Optim.Options() seems to be incompatible with Fminbox. optimize(myfun, [ 50.0, 50.0 ], fill(6.0, 2), fill(100.0, 2), Fminbox{NelderMead}(), Optim.Options(iterations=200000)) fails.

Passing Along Non-Optimization Parameters to User Function

refer to [https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/algo/autodiff.md].

You can either use global variables, or more elegantly a closure over the extra parameters, as in the following (here, the two fixed vectors p and m in addition to x):

snippet.juliarepl
julia> using Optim

julia> msumdeltas(x::Vector{Float64}, p::Vector{Float64}, m::Vector{Float64})::Float64 = sum( m.*(x .- p).^2 );

julia> optimize( xf->msumdeltas( xf, [ 12.0, 14.0 ], [ 1.0, 1.0 ] ),  [ 0.0, 0.0 ] )
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [12.000018215395523,14.000051397940497]
 * Minimum: 2.973549e-09
 * Iterations: 62
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 120

Univariate Function Optimization

There are two more optimizers for univariate cases, Brent and GoldenSection. To use them, omit the starting point and provide brackets.

snippet.juliafixme
julia> using Optim
 
julia> optimize( x->(3*(x-4)^2), -100, +100, Brent() )
Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [-100.000000, 100.000000]
 * Minimizer: 4.000000e+00
 * Minimum: 0.000000e+00
 * Iterations: 5
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 6
 
julia> optimize( x->(3*(x-4)^2), -100, +100, GoldenSection() )
Results of Optimization Algorithm
 * Algorithm: Golden Section Search
 * Search Interval: [-100.000000, 100.000000]
 * Minimizer: 4.000000e+00
 * Minimum: 6.668724e-16
 * Iterations: 44
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 45

More Function Examples and Comparisons

Backmatter

Useful Packages on Julia Repository

  • Optim is a pure Julia implementation, discussed in this chapter.
  • NLopt is a mixed other languages and Julia implementation
  • BlackBoxOptim focuses on non-gradient global optimization, with support for parallelism.

Notes

References

Providing Derivatives To Optimizers

The base example is still the two-dimensional quartic polynomial with minimum myfun([2,3])=0 and the following exact gradients and Hessian:

snippet.julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 )

A complete copy-and-paste example can be found towards the rear of this section.


Some optimization algorithms benefit or require gradient and Hessian information, i.e., first and second derivatives. Julia supports three ways for optimizers to obtain them:

  1. working out and providing symbolic derivatives in Julia function form;
  2. letting a built-in exact differentiator take care of it (explained below);
  3. using a built-in numerical differencing method.

The derivatives can be passed into a <Some>Differentiable object, which in turn can be passed into the optimizing functions. The optimizers know how to use these objects (picking off the derivatives or ignoring them). If only the optimizing function itself is passed, and an optimizer requires a gradient, optimize falls back on numerical differencing. If optimizing function and gradients and/or Hessians are passed, optimize can use them.

User-Provided Symbolic Derivatives

The syntax by which users can pass symbolic derivatives is unusual. The gradient and hessian functions that the user passes in require some storage as their first argument, which becomes a cache that is modified upon each call. This is for speed. The example here continues working with the symbolic myfun(), myfungradients(), and myfunhessian() for clarity. For even faster speed, these functions should not be called but coded directly into f(), g!(), and h!().

WARNING: Pay careful attention to how g!() and h!() are defined, and read the description at the end of the example.

snippet.juliarepl
julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ];

julia> myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 );

julia> function f(x) myfun(x) end;#function

julia> function g!(G,x) Gcp=myfungradients(x); G[1]=Gcp[1]; G[2]=Gcp[2]; G; end;#function  ## must not rebind with G= !!!

julia> function fg!(G,x) g!(G,x); f(x) end;#function   ## fg! computes value and gradient together, which can shorten calculations

julia> function h!(H,x) Hcp=myfunhessian(x); for i=1:length(x),j=1:length(x) ; H[i,j]= Hcp[i,j]; end; end;#function

julia> z2= zeros(2);

julia> using Optim           ## needed

julia> optimize( f, g!, z2, BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000000000021,3.0000000000000027]
 * Minimum: 4.427482e-28
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.75e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 7.04e+13 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 4.17e-14
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 21
 * Gradient Calls: 21

WARNING It is very tempting to code the more elegant

snippet.juliarepl
julia> function g!(G,x) G=myfungradients(x) end;#function

However, this is a bug. The reason is that the assignment rebinds the local name G to a new array, so the array passed in by the caller (outside g!()) never gets to see the changes. Instead, the contents of G itself must be modified. This can be done with

snippet.juliarepl
julia> function g!(G,x) Gcp=myfungradients(x); G[1]=Gcp[1]; G[2]=Gcp[2]; G; end;#function

julia> function g!(G,x); G .= myfungradients(x) end;#function  ## also works
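
This rebinding-versus-mutation distinction is plain Julia, independent of Optim; a minimal demonstration:

```julia
function rebind!(G, x)
    G = [x, x]          # rebinds the local name G; the caller's array is untouched
    return G
end

function mutate!(G, x)
    G .= [x, x]         # writes into the caller's array element-wise
    return G
end

G1 = zeros(2); rebind!(G1, 7.0)
G2 = zeros(2); mutate!(G2, 7.0)
println(G1)    # still [0.0, 0.0]
println(G2)    # now [7.0, 7.0]
```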

Built-In Automatic Exact Differentiation

ForwardDiff

Automatic differentiation is built around the ForwardDiff.jl package, which is pulled in by Optim.jl. ForwardDiff is picky about the way the user function is defined. In particular, although it understands x::Vector, it does not accept x::Vector{Float64}. (The overly concrete type can mess with dispatch precedence.) ForwardDiff works by overloading every elementary math operation and function in Julia. The math operations still return a numeric value (as always), but they also carry derivative information along with it, so that ForwardDiff can track the function's progress and apply the chain rule piece by piece.

snippet.juliarepl
julia> using ForwardDiff

julia> myfun(x)= (x[1]2)^2 + (x[2]3)^2 + ((x[1]2)*(x[2]3))^2;

julia> ForwardDiff.gradient( myfun, [ 2.0, 3.0 ] )          ## at the optimum, it is zero in both directions
2-element Array{Float64,1}:
 0.0
 0.0

WARNING Do not require the input arguments to be typed (Float) on the user function that is to be optimized. This messes up the dispatch.

Kinks in ForwardDiff

Kinks are allowed, but at a kink ForwardDiff does not look towards its rear; it just looks towards its front (here, at the sign of the zero). Thus, it gives the mathematically dubious result:

snippet.juliarepl
julia> using ForwardDiff

julia> sumabsf(x)= sum(abs.(x))
sumabsf (generic function with 1 method)

julia> ForwardDiff.gradient(sumabsf, [ 0.0 ] )              ## warning --- mathematically strange
1-element Array{Float64,1}:
 1.0

julia> ForwardDiff.gradient(sumabsf, [ -0.0 ] )             ## yes, Julia has a negative 0.0
1-element Array{Float64,1}:
 -1.0

ForwardDiff Interface to Optim: <Use>Differentiable

Optim provides two convenience functions that contain the useful differentials (and in the same form as described above). Although the end user rarely uses them to obtain gradients and Hessians, we can show how the optimizers do it.

snippet.juliarepl
julia> using Optim

julia> myfun(x)= (x[1]2)^2 + (x[2]3)^2 + ((x[1]2)*(x[2]3))^2;          ## could be x::Vector, but not x::Vector{Float64}

julia> od = OnceDifferentiable(myfun, [NaN,NaN]; autodiff = :forward);     ## [NaN,NaN] is needed to know dimensionality of myfun

julia> myfun( [2.0,3.0] ) == od.f( [2.0, 3.0] )                            ## confirming that it copied the function
true

julia> od.df( [NaN,NaN], [2.0, 3.0] )                                      ## first argument is ignored.
2-element Array{Float64,1}:
 0.0
 0.0

julia> td = TwiceDifferentiable(myfun, [ NaN, NaN ]; autodiff = :forward); ## works analogously
  • To force Optim not to use differentials (even numeric ones), define nd = NonDifferentiable(fun, z2).
  • The od and td container objects contain both the function and its derivatives, which can be passed to optimization functions in lieu of the function itself.
  • Again, there are type constraints on the user function. myfun(x::Vector{Float64})= ... does not work, because Float64 is too specific and steals the dispatch from the ForwardDiff.jl differentiator.

Built-In Numeric Differentiation

Simply do not provide the gradient and/or Hessian to optimize(), and Optim.jl will by itself fall back on DiffEqDiffTools.jl.

The Complete Copy-Paste Inputs

snippet.julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 )

# for speed, you should code the above directly into the below.

function f(x) myfun(x) end#function
function g!(G,x) G.=myfungradients(x) end#function  ## must not use G=, but G.= to replace external G contents
function fg!(G,x) g!(G,x); f(x) end#function        ## fg! computes value and gradient together, which can shorten calculations
function h!(H,x) H .= myfunhessian(x) end#function  ## must not use H=, but H.= to replace external H contents

z2= zeros(2);

using Optim           ## henceforth needed

# show optimize use via function call
optimize( f, g!, z2, BFGS())


# show optimize use via *Differentiable objects

nd = NonDifferentiable( f, z2 )
od = OnceDifferentiable( f, g!, z2 )
td = TwiceDifferentiable( f, g!, h!, z2 )

# show how optimizer functions will internally pick off results

println( ( td.f( [2.0, 2.0] ), td.df( [NaN,NaN], [2.0, 2.0] ), td.h( [NaN NaN;NaN NaN], [2.0, 2.0] ) ) )
println( ( od.f( [2.0, 2.0] ), od.df( [NaN,NaN], [2.0, 2.0] ) ) )

# show optimize use via object

optimize( td, z2, BFGS())
optimize( od, z2, BFGS())

WARNING do not forget the .= inside the g!() and h!() functions!

optim.txt · Last modified: 2018/11/22 20:48 (external edit)