Roots and Solving  Compact Optimization  Linear and Squared Optimization 

julia> pkgchk.( [ "julia" => v"1.0.3", "Optim" => v"0.17.1" ] ); ## "ForwardDiff" => nothing
Most of the examples in this chapter minimize the twodimensional quadratic polynomial
myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2;
It has an obvious minimum at [2,3]
, and is an ideal example function around which many optimizers were originally designed. optimresults discusses more difficult test functions.
IMPORTANT: see optimresults for some benchmarks
Name  Abbrev  Derivs?  ~Time  Notes 

NelderMead Simplex  NM  No  31  works on noncontinuous(?) nonsmooth functions. 
Simulated Annealing  SAMIN  No  4,700  searches for global optima, but is slow 
Particle Swarm  Prtcl  No  333,000  switches algorithms for global optimum search, but is really slow 
GradientDescent  GrdDsct  1st  19  often a bad idea due to excessive zigzagging 
ConjugateGradient  CnjgGrd  1st  20  often better than gradient descent 
BFGS  BGFS  1st  40  A quasiNewton method which cleverly extrapolates “implied” second derivative from prev turns. 
LBFGS  LBFGS  1st  29  Same, but with limited memory 
Newton  Nwtn  2nd  31  NewtonRaphson requires first and second derivatives. 
NewtonTrustRegion  NwTst  2nd  50  Like NewtonRaphson, but limits step size. 
rand(2)
starting point, no symbolic userprovided gradients, and no bounds (except SAMIN (100 to +100)).show_trace
as an option.
You may never need to provide them. optimize()
will use numeric differentiation if the optimizer determines it needs any. They tend to work quite well and are worryfree. The drawback is that they require extra function evaluations. If function evaluation is costly, here is what you can do.
(In most of the examples, z2
is a starting value which you can set to zeros(2)
or rand(2)
.)
optimize()
allows as arguments any of the following:
optimize( myfun, z2 )
.
optimize( myfun, myfun_gradient, z2 )
.
optimize( myfun, myfun_gradient, myfun_hessian, z2 )
.
OnceDifferentiated
object, which holds the function and the gradient. It can be created by automatic differentiation or by passing a user f into OnceDifferentiated.
TwiceDifferentiated
object, which holds the function, its gradient, and its Hessian. It can be created by automatic differentiation or by passing a user f and g into TwiceDifferentiated.
Again, optimize
will use differencing for derivatves by default, so providing derivatives is always optional. Numeric differencing is recommended when function evaluation itself is quick.
If you do need exact derivatives, perhaps because they occasionally can be more accurate, perhaps because calculation of the function and thus differences is expensive, you have two choices:
optimize( TwiceDifferentiable(f, zeros(n); autodiff = :forward), zeros(n), BGFS(), Optim.Options() )
. However, automatic differentiation works by overloading Julia's elementary math, which can also slow down performance.* There are some common gotchas when coding your own gradients and Hessians. If you want to provide them, please read the appendix section providing_derivatives_to_optimizers first.
All optimizers in Optim allow box constraints. See the example below.
x_tol= 1e32
, f_tol=1e32
, g_tol=1e8
, f_calls_limit=0
, g_calls_limit=0
, h_calls_limit=0
, iterations=1_000
store_trace=false
, show_trace=false
, extended_trace=false
, show_every=0
, callback=function
, time_limit=NaN
.
optimize(..., Optim.Options(g_tol=1e12))
or optimize(..., g_tol=1e12)
.
Call  Sample  Notes 

summary(res)  “SAMIN”  method, usually as input 
Optim.minimizer(res)  [ 2.0, 3.0 ]  the optimal value 
Optim.minimum(res)  3.534e13  function value at optimum 
Optim.iterations(res)  3451  number of iterations 
Optim.iteration_limit_reached(res)  false  obvious 
Optim.trace(res)  n/a  only retained if store_trace=true in optimize 
Optim.x_trace(res)  n/a  only retained if store_trace=true in optimize 
Optim.f_trace(res)  n/a  only retained if store_trace=true in optimize 
Optim.f_calls(res)  3451  number of function calls 
Optim.converged(res)  false  result 
Call  Sample  Notes 

Optim.lower_bound(res)  100  lower bound, usually as input 
Optim.upper_bound(res)  100  upper bound, usually as input 
Optim.x_lower_trace(res)  only retained if store_trace=true in optimize 

Optim.x_upper_trace(res)  only retained if store_trace=true in optimize 

Optim.rel_tol(res)  only retained if store_trace=true in optimize 

Optim.abs_tol(res)  only retained if store_trace=true in optimize 
Call  Sample  Notes 

Optim.g_norm_trace(res)  n/a  only retained if store_trace=true in optimize 
Optim.g_calls(res)  0  only retained if store_trace=true in optimize 
Optim.x_converged(res)  false  only retained if store_trace=true in optimize 
Optim.f_converged(res)  false  only retained if store_trace=true in optimize 
Optim.g_converged(res)  false  only retained if store_trace=true in optimize 
Optim.initial_state(res)  [0.0,0.0]  user input 
NelderMead Simplex is (sort of) the multidimensional equivalent of the goldensection search in one dimension. This algorithm is not an equivalent for the goldensection search, which brackets a minimum and is guaranteed to find it. Instead, there are beautiful and intuitive illustrations at Wikipedia. The best description is not that NelderMead looks less like an amoeba contractor, and more like an amoeba crawler.
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> optimize( myfun, [0.0, 0.0], NelderMead(), Optim.Options(iterations=100_000) ) Results of Optimization Algorithm * Algorithm: NelderMead * Starting Point: [0.0,0.0] * Minimizer: [2.0000267077758536,3.0000220787571954] * Minimum: 1.200777e–09 * Iterations: 49 * Convergence: true * √(Σ(yᵢȳ)²)/n < 1.0e–08: true * Reached Maximum Number of Iterations: false * Objective Calls: 99
Simulated Annealing (MetropolisHastings) seems to be a form of MonteCarlo Markov Chain simulation, in which random probabilistic guesses are salted (and more after a while when the lowest value(s) so far seems pretty good already).
This requires the master and not the release, so use Pkg.checkout("Optim")
. Is there a version without bounds??
julia> using Optim julia> using Random; Random.seed!(0); julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> optimize( myfun, fill(–100.0,2), fill(+100.0,2), [0.0,0.0], SAMIN(), Optim.Options(iterations=100_000) ) ================================================================================ SAMIN results ==> Normal convergence <== total number of objective function evaluations: 22101 Obj. value: 0.0000000000 parameter search width 2.00000 0.00000 3.00000 0.00000 ================================================================================ Results of Optimization Algorithm * Algorithm: SAMIN * Starting Point: [0.0,0.0] * Minimizer: [2.000000006088701,2.999999998587848] * Minimum: 3.906645e–17 * Iterations: 22101 * Convergence: false * x  x' ≤ 0.0e+00: false x  x' = NaN * f(x)  f(x') ≤ 0.0e+00 f(x): false f(x)  f(x') = NaN f(x) * g(x) ≤ 0.0e+00: false g(x) = NaN * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 22101 * Gradient Calls: 0
Particle Swarm tries to switch around search heuristics, also probabilistically.
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> using Random; Random.seed!(0); julia> optimize( myfun, [0.0,0.0], ParticleSwarm(), Optim.Options(iterations=100_000)) Results of Optimization Algorithm * Algorithm: Particle Swarm * Starting Point: [0.0,0.0] * Minimizer: [2.0,3.0] * Minimum: 0.000000e+00 * Iterations: 100000 * Convergence: false * x  x' ≤ 0.0e+00: false x  x' = NaN * f(x)  f(x') ≤ 0.0e+00 f(x): false f(x)  f(x') = NaN f(x) * g(x) ≤ 1.0e–08: false g(x) = NaN * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: true * Objective Calls: 400000 * Gradient Calls: 0
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> optimize( myfun, [0.0,0.0], GradientDescent()) Results of Optimization Algorithm * Algorithm: Gradient Descent * Starting Point: [0.0,0.0] * Minimizer: [2.0,3.0] * Minimum: 0.000000e+00 * Iterations: 6 * Convergence: true * x  x' ≤ 0.0e+00: false x  x' = 1.52e–07 * f(x)  f(x') ≤ 0.0e+00 f(x): false f(x)  f(x') = Inf f(x) * g(x) ≤ 1.0e–08: true g(x) = 2.22e–16 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 19 * Gradient Calls: 19
Usually avoids excessive zigzagging of steepest gradient descent:
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> optimize( myfun, [0.0,0.0], ConjugateGradient()) Results of Optimization Algorithm * Algorithm: Conjugate Gradient * Starting Point: [0.0,0.0] * Minimizer: [2.000000000000004,3.0000000000000004] * Minimum: 1.617165e–29 * Iterations: 7 * Convergence: true * x  x' ≤ 0.0e+00: false x  x' = 1.68e–07 * f(x)  f(x') ≤ 0.0e+00 f(x): false f(x)  f(x') = 1.77e+15 f(x) * g(x) ≤ 1.0e–08: true g(x) = 7.77e–15 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 15 * Gradient Calls: 9
The classic quasiNewton workhorse, which internally approximates the second derivative with how the gradients have changed across iterations.
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> optimize( myfun, [0.0,0.0], BFGS()) Results of Optimization Algorithm * Algorithm: BFGS * Starting Point: [0.0,0.0] * Minimizer: [2.000000000000021,3.0000000000000027] * Minimum: 4.427482e–28 * Iterations: 7 * Convergence: true * x  x' ≤ 0.0e+00: false x  x' = 1.75e–07 * f(x)  f(x') ≤ 0.0e+00 f(x): false f(x)  f(x') = 7.04e+13 f(x) * g(x) ≤ 1.0e–08: true g(x) = 4.15e–14 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 21 * Gradient Calls: 21
The limitedmemory version is nearly as good, requiring just one extra call.
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> optimize( myfun, [0.0,0.0], LBFGS()) Results of Optimization Algorithm * Algorithm: LBFGS * Starting Point: [0.0,0.0] * Minimizer: [1.999999999999605,2.9999999999999294] * Minimum: 1.610245e–25 * Iterations: 7 * Convergence: true * x  x' ≤ 0.0e+00: false x  x' = 4.53e–07 * f(x)  f(x') ≤ 0.0e+00 f(x): false f(x)  f(x') = 1.32e+12 f(x) * g(x) ≤ 1.0e–08: true g(x) = 7.90e–13 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 22 * Gradient Calls: 22
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> optimize( myfun, [0.0,0.0], Newton()) Results of Optimization Algorithm * Algorithm: Newton's Method * Starting Point: [0.0,0.0] * Minimizer: [2.0000000000006297,3.0000000000166733] * Minimum: 2.783965e–22 * Iterations: 5 * Convergence: true * x  x' ≤ 0.0e+00: false x  x' = 6.05e–04 * f(x)  f(x') ≤ 0.0e+00 f(x): false f(x)  f(x') = 1.32e+15 f(x) * g(x) ≤ 1.0e–08: true g(x) = 3.33e–11 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 15 * Gradient Calls: 15 * Hessian Calls: 5
Trust regions limit the step size to something sensible (i.e., so that Newton does not step out of bounds so easily).
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> optimize( myfun, [0.0,0.0], NewtonTrustRegion()) Results of Optimization Algorithm * Algorithm: Newton's Method (Trust Region) * Starting Point: [0.0,0.0] * Minimizer: [2.0,3.0] * Minimum: 0.000000e+00 * Iterations: 8 * Convergence: true * x  x' ≤ 0.0e+00: false x  x' = 8.53e–09 * f(x)  f(x') ≤ 0.0e+00 f(x): false f(x)  f(x') = Inf f(x) * g(x) ≤ 1.0e–08: true g(x) = 2.22e–16 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 9 * Gradient Calls: 9 * Hessian Calls: 7
The call arguments of optimize
change when box constraints are included. Instead of passing a function designating the optimizer, such as NelderMead()
), one passes Fminbox{NelderMead}()
(The first called a function which returns a complex object which is then passed by the user into optimize; the second passes along the name of the function as a type.)
NelderMead() ⟶ Fminbox{NelderMead}()
The second and third arguments in optimize have to be the lower and upper bounds. The Optim.Options()
argument seems to be inconsistent with Fminbox
(and may screw up the entire optimization with or without an error)! The initial value has to be inside the box.
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> initial_x= [ 0.0, 0.0 ]; julia> optimize(myfun, initial_x, NelderMead() ) ## without the box constraint Results of Optimization Algorithm * Algorithm: NelderMead * Starting Point: [0.0,0.0] * Minimizer: [2.0000267077758536,3.0000220787571954] * Minimum: 1.200777e–09 * Iterations: 49 * Convergence: true * √(Σ(yᵢȳ)²)/n < 1.0e–08: true * Reached Maximum Number of Iterations: false * Objective Calls: 98 julia> lower= [ –999.9, –999.9 ]; upper= [ 999, 2.9 ]; julia> results = optimize(myfun, lower, upper, initial_x) ## box constraint form: x[2] is maxed at 2.9 ┌ Warning: Linesearch failed, using alpha = 2.570171260006581 and exiting optimization. │ The linesearch exited with message: │ Linesearch failed to converge, reached maximum iterations 50. └ @ Optim ~/.julia/packages/Optim/ULNLZ/src/utilities/perform_linesearch.jl:47 Results of Optimization Algorithm * Algorithm: Fminbox with LBFGS * Starting Point: [0.0,0.0] * Minimizer: [2.0000000000004565,2.8999999999999995] * Minimum: 1.000000e–02 * Iterations: 8 * Convergence: true * x  x' ≤ 0.0e+00: true x  x' = 0.00e+00 * f(x)  f(x') ≤ 0.0e+00 f(x): true f(x)  f(x') = 0.00e+00 f(x) * g(x) ≤ 1.0e–08: false g(x) = 2.00e–01 * Stopped by an increasing objective: true * Reached Maximum Number of Iterations: false * Objective Calls: 1425 * Gradient Calls: 1425
julia> using Optim julia> f(x) = (1.0  x[1])^2 + 100.0 * (x[2]  x[1]^2)^2 f (generic function with 1 method) julia> lower = [1.25, –2.1] 2element Array{Float64,1}: 1.25 –2.1 julia> upper = [Inf, Inf] 2element Array{Float64,1}: Inf Inf julia> initial_x = [2.0, 2.0] 2element Array{Float64,1}: 2.0 2.0 julia> inner_optimizer = GradientDescent() GradientDescent{LineSearches.InitialPrevious{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,getfield(Optim, Symbol("##12#14"))}(LineSearches.InitialPrevious{Float64} alpha: Float64 1.0 alphamin: Float64 0.0 alphamax: Float64 Inf , LineSearches.HagerZhang{Float64,Base.RefValue{Bool}} delta: Float64 0.1 sigma: Float64 0.9 alphamax: Float64 Inf rho: Float64 5.0 epsilon: Float64 1.0e–6 gamma: Float64 0.66 linesearchmax: Int64 50 psi3: Float64 0.1 display: Int64 0 mayterminate: Base.RefValue{Bool} , nothing, getfield(Optim, Symbol("##12#14"))(), Flat()) julia> results = optimize(f, lower, upper, initial_x, Fminbox(inner_optimizer)) Results of Optimization Algorithm * Algorithm: Fminbox with Gradient Descent * Starting Point: [2.0,2.0] * Minimizer: [1.2500000000000002,1.5626480963230658] * Minimum: 6.250219e–02 * Iterations: 11 * Convergence: true * x  x' ≤ 0.0e+00: false x  x' = 2.22e–16 * f(x)  f(x') ≤ 0.0e+00 f(x): true f(x)  f(x') = 0.00e+00 f(x) * g(x) ≤ 1.0e–08: false g(x) = 4.26e–01 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 818877 * Gradient Calls: 818877
Optim.Options()
seems to be incompatible with Fminbox
. optimize(myfun, [ 50.0, 50.0 ], fill(6.0, 2), fill(100.0, 2), Fminbox{NelderMead}(), Optim.Options(iterations=200000))
fails.refer to [https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/algo/autodiff.md].
You can either use global variables, or more elegantly a version of the following (here, twoplusone parameters):
julia> using Optim julia> msumdeltas(x::Vector{Float64}, p::Vector{Float64}, m::Vector{Float64})::Float64 = sum( m.*(x . p).^2 ); julia> optimize( xf>msumdeltas( xf, [ 12.0, 14.0 ], [ 1.0, 1.0 ] ), [ 0.0, 0.0 ] ) Results of Optimization Algorithm * Algorithm: NelderMead * Starting Point: [0.0,0.0] * Minimizer: [12.000018215395523,14.000051397940497] * Minimum: 2.973549e–09 * Iterations: 62 * Convergence: true * √(Σ(yᵢȳ)²)/n < 1.0e–08: true * Reached Maximum Number of Iterations: false * Objective Calls: 120
There are two more optimizers for univariate cases, Brent
and GoldenSection
. To use them, omit the starting point and provide brackets.
[download only julia statements] julia> using Optim julia> optimize( x>(3*(x4)^2), 100, +100, Brent() ) Results of Optimization Algorithm * Algorithm: Brent's Method * Search Interval: [100.000000, 100.000000] * Minimizer: 4.000000e+00 * Minimum: 0.000000e+00 * Iterations: 5 * Convergence: max(x  x_upper, x  x_lower) <= 2*(1.5e08*x+2.2e16): true * Objective Function Calls: 6 julia> optimize( x>(3*(x4)^2), 100, +100, GoldenSection() ) Results of Optimization Algorithm * Algorithm: Golden Section Search * Search Interval: [100.000000, 100.000000] * Minimizer: 4.000000e+00 * Minimum: 6.668724e16 * Iterations: 44 * Convergence: max(x  x_upper, x  x_lower) <= 2*(1.5e08*x+2.2e16): true * Objective Function Calls: 45
The base example is still the twodimensional quadratic polynomial with minimum myfun([2,3])=0
with the following exact gradients and Hessian:
myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; myfungradients(x)= [ 2*(1 + (x[2]–3)^2)*(x[1]–2 ), 2*(1 + (x[1]–2)^2)*(x[2]–3) ] myfunhessian(x)= reshape( [ 2+2*(x[2]–3)^2, 4*(x[1]–2)*(x[2]–3), 4*(x[1]–2)*(x[2]–3) , 2+2*(x[2]–2)^2 ], 2, 2 )
A complete copyandpaste example can be found towards the rear of this section.
Some optimization algorithms benefit or require gradient and Hessian information, i.e., first and second derivatives. Julia supports three ways for optimizers to obtain them:
The derivatives can be passed into a <Some>
Differentiable object, which in turn can be passed into the optimizing functions. The optimizers know how to use these objects (picking off the derivatives or ignoring them). If only the optimizing function itself is passed, and an optimizer requires a gradient, optimize falls back on numerical differencing. If optimizing function and gradients and/or Hessians are passed, optimize can use them.
The syntax by which users can pass symbolic derivatives is unusual. The gradient and hessian functions that the user passes in require some storage as their first argument, which becomes a cache that is modified upon each call. This is for speed. The example here continues working with the symbolic myfun()
, myfungradients()
, and myfunhessian()
for clarity. For even faster speed, these functions should not be called but coded directly into f()
, g!()
, and h!()
.
: Pay careful attention to how g!()
and h!()
are defined, and read the description at the end of the example.
julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> myfungradients(x)= [ 2*(1 + (x[2]–3)^2)*(x[1]–2 ), 2*(1 + (x[1]–2)^2)*(x[2]–3) ]; julia> myfunhessian(x)= reshape( [ 2+2*(x[2]–3)^2, 4*(x[1]–2)*(x[2]–3), 4*(x[1]–2)*(x[2]–3) , 2+2*(x[2]–2)^2 ], 2, 2 ); julia> function f(x) myfun(x) end;#function julia> function g!(G,x) Gcp=myfungradients(x); G[1]=Gcp[1]; G[2]=Gcp[2]; G; end;#function ## must not use G=Gin!!! julia> function fg!(G,x) g!(G,x); f(x) end;#function (both! can be used to shorten calculations) julia> function h!(H,x) Hcp=myfunhessian(x); for i=1:length(x),j=1:length(x) ; H[i,j]= Hcp[i,j]; end; end;#function julia> z2= zeros(2); julia> using Optim ## needed julia> optimize( f, g!, z2, BFGS()) Results of Optimization Algorithm * Algorithm: BFGS * Starting Point: [0.0,0.0] * Minimizer: [2.000000000000021,3.0000000000000027] * Minimum: 4.427482e–28 * Iterations: 7 * Convergence: true * x  x' ≤ 0.0e+00: false x  x' = 1.75e–07 * f(x)  f(x') ≤ 0.0e+00 f(x): false f(x)  f(x') = 7.04e+13 f(x) * g(x) ≤ 1.0e–08: true g(x) = 4.17e–14 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 21 * Gradient Calls: 21
It is very tempting to code the more elegant
julia> function g!(G,x) G=myfungradients(x) end;#function
However, this is a bug. The reason is that the function then replaces the passedin G
, which means that the global G (outside g!()
) never gets to see the changes. Instead, G
itself must be modified. This can be done with
julia> function g!(G,x) Gcp=myfungradients(x); G[1]=Gcp[1]; G[2]=Gcp[2]; G; end;#function julia> function g!(G,x); G .= myfungradients(x) end;#function ## also works
Automatic differentiation is built around the ForwardDiff.jl
package, which is pulled in by Optim.jl
. ForwardDiff is pickier about the way the user function is defined. In particular, although it understands Vector
, it does not understand Vector{Float64}
. (This can mess with dispatch precedence.) Forwarddiff works by redefining every elementary math operation and function in Julia. The math operations still returns a numeric value (as always), but they then also keep internally a symbolic representation, so that ForwardDiff can analyze the function progress and apply the chainrule piece by piece by piece.
julia> using ForwardDiff julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; julia> ForwardDiff.gradient( myfun, [ 2.0, 3.0 ] ) ## at the optimum, it is zero in both directions 2element Array{Float64,1}: 0.0 0.0
Do not require the input arguments to be typed (Float) on the user function that is to be optimized. This messes up the dispatch.
Kinks are allowed, but ForwardDiff ignores looking towards its rear, and just looks towards its front. Thus, it gives the mathematically dubious result:
julia> using ForwardDiff julia> sumabsf(x)= sum(abs.(x)) sumabsf (generic function with 1 method) julia> ForwardDiff.gradient(sumabsf, [ 0.0 ] ) ## warning  mathematically strange 1element Array{Float64,1}: 1.0 julia> ForwardDiff.gradient(sumabsf, [ –0.0 ] ) ## yes, Julia has a negative 0.0 1element Array{Float64,1}: –1.0
Optim provides two convenience functions that contain the useful differentials (and in the same form as described above). Although the end user rarely uses them to obtain gradients and Hessians, we can show how the optimizers do it.
julia> using Optim julia> myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; ## could be x::Vector, but not x::Vector{Float64} julia> od = OnceDifferentiable(myfun, [NaN,NaN]; autodiff = :forward); ## [NaN,NaN] is needed to know dimensionality of myfun julia> myfun( [2.0,3.0] ) == od.f( [2.0, 3.0] ) ## confirming that it copied the function true julia> od.df( [NaN,NaN], [2.0, 3.0] ) ## first argument is ignored. 2element Array{Float64,1}: 0.0 0.0 julia> td = TwiceDifferentiable(myfun, [ NaN, NaN ]; autodiff = :forward); ## works analogously
nd = NonDifferentiable(fun, z2)
.
od
and td
container objects contain both the function and its derivatives, which can be passed to optimization functions in lieu of the function itself.
myfun(x::Vector{Float64})= ...
does not work, because Float64 is too specific and steals the dispatch from the ForwardDiff.jl
differentiator.
Simply do not provide the gradient and/or Hessian to optimize()
, and Optim.jl will by itself fall back on DiffEqDiffTools.jl
.
myfun(x)= (x[1]–2)^2 + (x[2]–3)^2 + ((x[1]–2)*(x[2]–3))^2; myfungradients(x)= [ 2*(1 + (x[2]–3)^2)*(x[1]–2 ), 2*(1 + (x[1]–2)^2)*(x[2]–3) ] myfunhessian(x)= reshape( [ 2+2*(x[2]–3)^2, 4*(x[1]–2)*(x[2]–3), 4*(x[1]–2)*(x[2]–3) , 2+2*(x[2]–2)^2 ], 2, 2 ) # for speed, you should code the above directly into the below. function f(x) myfun(x) end#function function g!(G,x) G.=myfungradients(x) end#function ## must not use G=, but G.= to replace external G contents function fg!(G,x) g!(G,x); f(x) end#function ## (both! can be used to shorten calculations) function h!(H,x) H .= myfunhessian(x) end#function ## must not use H=, but H.= to replace external H contents z2= zeros(2.0); using Optim ## henceforth needed # show optimize use via function call optimize( f, g!, z2, BFGS()) # show optimize use via *Differentiable objects nd = NonDifferentiable( f, z2 ) od = OnceDifferentiable( f, g!, z2 ) td = TwiceDifferentiable( f, g!, h!, z2 ) # show how optimizer functions will internally pick off results println( ( td.f( [2.0, 2.0] ), td.df( [NaN,NaN], [2.0, 2.0] ), td.h( [NaN NaN;NaN NaN], [2.0, 2.0] ) ) ) println( ( od.f( [2.0, 2.0] ), od.df( [NaN,NaN], [2.0, 2.0] ) ) ) # show optimize use via object optimize( td, z2, BFGS()) optimize( od, z2, BFGS())
do not forget the .=
inside the g!()
and h!()
functions!