
snippet.juliarepl
`julia> pkgchk.( [  "julia" => v"1.0.3", "Optim" => v"0.17.1" ] );   ## "ForwardDiff" => nothing`

# Optimization With Optim.jl

• Optim is the pure Julia implementation, discussed in this chapter. It is one of my favorite Julia packages: native, efficient, maintained.

Most of the examples in this chapter minimize the two-dimensional quadratic polynomial

snippet.julia
`myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;`

It has an obvious minimum at `[2,3]` and is the kind of well-behaved function for which many optimizers were originally designed. optim-results discusses more difficult test functions.
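
A quick sanity check at the minimum (a REPL sketch):

snippet.juliarepl
```julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> myfun( [2.0, 3.0] )                       ## zero at the minimizer [2,3]
0.0```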

IMPORTANT: see optim-results for some benchmarks

## Overview of Optimization and Algorithms

### Available Multivariate Optimization Algorithms

| Name | Abbrev | Derivs? | ~Time | Notes |
|------|--------|---------|-------|-------|
| Nelder-Mead Simplex | NM | No | 31 | works on nonsmooth, possibly even discontinuous(?), functions |
| Simulated Annealing | SAMIN | No | 4,700 | searches for global optima, but is slow |
| Particle Swarm | Prtcl | No | 333,000 | switches algorithms for global optimum search, but is really slow |
| GradientDescent | GrdDsct | 1st | 19 | often a bad idea due to excessive zig-zagging |
| BFGS | BFGS | 1st | 40 | a quasi-Newton method that cleverly extrapolates an "implied" second derivative from previous iterations |
| LBFGS | LBFGS | 1st | 29 | same, but with limited memory |
| Newton | Nwtn | 2nd | 31 | Newton-Raphson; requires first and second derivatives |
| NewtonTrustRegion | NwTst | 2nd | 50 | like Newton-Raphson, but limits the step size |
• The time estimates are for the sample polynomial function; optimizers will be faster or slower on other problems. The timings convey only orders of magnitude (see the benchmark sketch below).
• SAMIN (and the univariate optimizers) requires box constraints. The others do not.
• All optimizers are using their defaults, a `rand(2)` starting point, no user-provided symbolic gradients, and no bounds (except SAMIN, bounded at -100 to +100).
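
Timings of this sort can be reproduced along the following lines, a sketch that assumes the separately installed BenchmarkTools package:

snippet.julia
```using Optim, BenchmarkTools                       ## BenchmarkTools must be added separately
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
@btime optimize( $myfun, rand(2), NelderMead() );  ## interpolate the global for accurate timing
```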

#### Showing Progress

• Pass `show_trace=true` inside `Optim.Options()` to print progress, as in the sketch below.
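
For example, a minimal sketch (assuming `myfun` from above) that prints progress every tenth iteration:

snippet.julia
`optimize( myfun, rand(2), BFGS(), Optim.Options(show_trace=true, show_every=10) )`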

### Providing Derivatives (Gradients and Hessians) To Optimizers

You may never need to provide them: `optimize()` falls back on numeric differentiation whenever the chosen optimizer needs derivatives. Numeric differences tend to work quite well and are worry-free. The drawback is that they require extra function evaluations. If function evaluation is costly, here is what you can do.

(In most of the examples, `z2` is a starting value which you can set to `zeros(2)` or `rand(2)`.) `optimize()` allows as arguments any of the following:

1. Direct provision of functions:
   1. a plain function, `optimize( myfun, z2 )`;
   2. a plain function plus a user-provided symbolic gradient, `optimize( myfun, myfun_gradient, z2 )`;
   3. a plain function plus user-provided symbolic gradient and Hessian, `optimize( myfun, myfun_gradient, myfun_hessian, z2 )`.
2. Indirect provision via a packaged object (see the sketch after this list):
   1. a `OnceDifferentiable` object, which holds the function and its gradient. It can be created by automatic differentiation or by passing a user-written f (and g) into `OnceDifferentiable`;
   2. a `TwiceDifferentiable` object, which holds the function, its gradient, and its Hessian. It can be created by automatic differentiation or by passing user-written f, g, and h into `TwiceDifferentiable`.
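
For instance, a minimal sketch of the indirect route via automatic differentiation (see providing_derivatives_to_optimizers for the user-written variants):

snippet.julia
```using Optim
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
od= OnceDifferentiable( myfun, zeros(2); autodiff= :forward );   ## function plus autodiff gradient
optimize( od, zeros(2), BFGS() )
td= TwiceDifferentiable( myfun, zeros(2); autodiff= :forward );  ## plus autodiff Hessian
optimize( td, zeros(2), Newton() )
```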

Again, `optimize` will use differencing for derivatives by default, so providing derivatives is always optional. Numeric differencing is recommended when function evaluation itself is quick.

If you do need exact derivatives, perhaps because they can occasionally be more accurate, or because function evaluation (and thus differencing) is expensive, you have two choices:

1. You can code the symbolic derivatives into Julia.
2. You can use automatic differentiation, such as `optimize( TwiceDifferentiable(f, zeros(n); autodiff = :forward), zeros(n), BFGS(), Optim.Options() )`. However, automatic differentiation works by overloading Julia's elementary math operations, which can also slow down performance.

• There are some common gotchas when coding your own gradients and Hessians. If you want to provide them, please read the appendix section providing_derivatives_to_optimizers first.

### Box Constraints

All optimizers in Optim allow box constraints. See the example below.

### Input Options

• Setting termination conditions: `x_tol=1e-32`, `f_tol=1e-32`, `g_tol=1e-8`, `f_calls_limit=0`, `g_calls_limit=0`, `h_calls_limit=0`, `iterations=1_000`.
• Setting verbose printouts and intervention: `store_trace=false`, `show_trace=false`, `extended_trace=false`, `show_every=0`, `callback=function`, `time_limit=NaN`.
• Options can be given as `optimize(..., Optim.Options(g_tol=1e-12))` or `optimize(..., g_tol=1e-12)`.
• Oddly, there is no input option to switch from minimization to maximization; to maximize, minimize the negative of your function (see the sketch below).
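
A minimal sketch of both option styles and of the maximization workaround (assuming `myfun` from above):

snippet.julia
```using Optim
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
optimize( myfun, zeros(2), NelderMead(), Optim.Options(iterations=10_000, g_tol=1e-12) )
optimize( x -> -myfun(x), zeros(2) )             ## to maximize myfun, minimize its negative
```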

### Output Overview

#### Both Univariate and Multivariate Results

| Call | Sample | Notes |
|------|--------|-------|
| `summary(res)` | "SAMIN" | method, usually as input |
| `Optim.minimizer(res)` | [ 2.0, 3.0 ] | the optimizing argument |
| `Optim.minimum(res)` | 3.534e-13 | function value at the optimum |
| `Optim.iterations(res)` | 3451 | number of iterations |
| `Optim.iteration_limit_reached(res)` | false | obvious |
| `Optim.trace(res)` | n/a | only retained if `store_trace=true` in `optimize` |
| `Optim.x_trace(res)` | n/a | only retained if `store_trace=true` in `optimize` |
| `Optim.f_trace(res)` | n/a | only retained if `store_trace=true` in `optimize` |
| `Optim.f_calls(res)` | 3451 | number of function calls |
| `Optim.converged(res)` | false | whether the optimizer converged |
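
For example, a sketch of picking off the most common results (assuming `myfun` from above):

snippet.julia
```using Optim
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
res= optimize( myfun, zeros(2) );                ## defaults to Nelder-Mead
Optim.minimizer(res)                             ## ≈ [2.0, 3.0]
Optim.minimum(res)                               ## ≈ 0.0
Optim.converged(res)                             ## true
```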

#### Only Univariate Results

| Call | Sample | Notes |
|------|--------|-------|
| `Optim.lower_bound(res)` | -100 | lower bound, usually as input |
| `Optim.upper_bound(res)` | 100 | upper bound, usually as input |
| `Optim.x_lower_trace(res)` | | only retained if `store_trace=true` in `optimize` |
| `Optim.x_upper_trace(res)` | | only retained if `store_trace=true` in `optimize` |
| `Optim.rel_tol(res)` | | relative tolerance, usually as input |
| `Optim.abs_tol(res)` | | absolute tolerance, usually as input |

#### Only Multivariate Results

| Call | Sample | Notes |
|------|--------|-------|
| `Optim.g_norm_trace(res)` | n/a | only retained if `store_trace=true` in `optimize` |
| `Optim.g_calls(res)` | 0 | number of gradient calls |
| `Optim.x_converged(res)` | false | whether the `x_tol` criterion was met |
| `Optim.f_converged(res)` | false | whether the `f_tol` criterion was met |
| `Optim.g_converged(res)` | false | whether the `g_tol` criterion was met |
| `Optim.initial_state(res)` | [0.0,0.0] | user input |

### Nelder-Mead

Nelder-Mead Simplex is (sort of) the multi-dimensional equivalent of the golden-section search in one dimension, though not a true equivalent: golden-section brackets a minimum and is guaranteed to find it, while Nelder-Mead offers no such guarantee. There are beautiful and intuitive illustrations at Wikipedia. The best description is that Nelder-Mead looks less like a contracting amoeba and more like a crawling amoeba.

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0, 0.0], NelderMead(), Optim.Options(iterations=100_000) )
Results of Optimization Algorithm
* Starting Point: [0.0,0.0]
* Minimizer: [2.0000267077758536,3.0000220787571954]
* Minimum: 1.200777e-09
* Iterations: 49
* Convergence: true
*  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
* Reached Maximum Number of Iterations: false
* Objective Calls: 99```

### Simulated Annealing

Simulated Annealing (a Metropolis-Hastings variant) seems to be a form of Markov chain Monte Carlo simulation, in which random probabilistic guesses are tempered more and more as the lowest value(s) found so far already seem pretty good. Note that SAMIN requires box constraints; there is no unbounded version.

snippet.juliarepl
```julia> using Optim

julia> using Random; Random.seed!(0);

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, fill(-100.0,2), fill(+100.0,2), [0.0,0.0], SAMIN(), Optim.Options(iterations=100_000) )
================================================================================
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 22101

Obj. value:      0.0000000000

parameter      search width
2.00000           0.00000
3.00000           0.00000
================================================================================

Results of Optimization Algorithm
* Algorithm: SAMIN
* Starting Point: [0.0,0.0]
* Minimizer: [2.000000006088701,2.999999998587848]
* Minimum: 3.906645e-17
* Iterations: 22101
* Convergence: false
* |x - x'| ≤ 0.0e+00: false
|x - x'| = NaN
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = NaN |f(x)|
* |g(x)| ≤ 0.0e+00: false
|g(x)| = NaN
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 22101```

### Particle Swarm

Particle Swarm probabilistically switches among search heuristics while it moves a swarm of candidate points through the search space.

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> using Random; Random.seed!(0);

julia> optimize( myfun, [0.0,0.0], ParticleSwarm(), Optim.Options(iterations=100_000))
Results of Optimization Algorithm
* Algorithm: Particle Swarm
* Starting Point: [0.0,0.0]
* Minimizer: [2.0,3.0]
* Minimum: 0.000000e+00
* Iterations: 100000
* Convergence: false
* |x - x'| ≤ 0.0e+00: false
|x - x'| = NaN
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = NaN |f(x)|
* |g(x)| ≤ 1.0e-08: false
|g(x)| = NaN
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: true
* Objective Calls: 400000```

### Gradient Descent

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0, 0.0], GradientDescent() )
Results of Optimization Algorithm
* Starting Point: [0.0,0.0]
* Minimizer: [2.0,3.0]
* Minimum: 0.000000e+00
* Iterations: 6
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 1.52e-07
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = Inf |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 2.22e-16
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 19```

### Conjugate Gradient

Conjugate gradient usually avoids the excessive zig-zagging of steepest gradient descent:

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0, 0.0], ConjugateGradient() )
Results of Optimization Algorithm
* Starting Point: [0.0,0.0]
* Minimizer: [2.000000000000004,3.0000000000000004]
* Minimum: 1.617165e-29
* Iterations: 7
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 1.68e-07
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = 1.77e+15 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 7.77e-15
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 15```

### BFGS

The classic quasi-Newton workhorse, which internally approximates the second derivative from how the gradients have changed across iterations.

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], BFGS())
Results of Optimization Algorithm
* Algorithm: BFGS
* Starting Point: [0.0,0.0]
* Minimizer: [2.000000000000021,3.0000000000000027]
* Minimum: 4.427482e-28
* Iterations: 7
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 1.75e-07
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = 7.04e+13 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 4.15e-14
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 21```

The limited-memory version is nearly as good, requiring just one extra call.

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], LBFGS())
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.0,0.0]
* Minimizer: [1.999999999999605,2.9999999999999294]
* Minimum: 1.610245e-25
* Iterations: 7
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 4.53e-07
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = 1.32e+12 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 7.90e-13
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 22```

## Hessian-Based Optimizers

### Newton-Raphson

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], Newton())
Results of Optimization Algorithm
* Algorithm: Newton's Method
* Starting Point: [0.0,0.0]
* Minimizer: [2.0000000000006297,3.0000000000166733]
* Minimum: 2.783965e-22
* Iterations: 5
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 6.05e-04
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = 1.32e+15 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 3.33e-11
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 15
* Hessian Calls: 5```

Trust regions limit the step size to something sensible (i.e., so that a Newton step does not overshoot so easily).

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], NewtonTrustRegion())
Results of Optimization Algorithm
* Algorithm: Newton's Method (Trust Region)
* Starting Point: [0.0,0.0]
* Minimizer: [2.0,3.0]
* Minimum: 0.000000e+00
* Iterations: 8
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 8.53e-09
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = Inf |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 2.22e-16
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 9
* Hessian Calls: 7```

## Optimization With Box Constraints

The call arguments of `optimize` change when box constraints are included. Instead of passing the optimizer itself, such as `NelderMead()`, one wraps it in `Fminbox`, which produces a box-constrained version of the optimizer:

`NelderMead() ⟶ Fminbox(NelderMead())`

The second and third arguments of `optimize` then have to be the lower and upper bounds. The `Optim.Options()` argument seems to be inconsistent with `Fminbox` (and may break the entire optimization, with or without an error)! The initial value has to be inside the box.

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> initial_x= [ 0.0, 0.0 ];

julia> optimize(myfun, initial_x, NelderMead() )               ## without the box constraint
Results of Optimization Algorithm
* Starting Point: [0.0,0.0]
* Minimizer: [2.0000267077758536,3.0000220787571954]
* Minimum: 1.200777e-09
* Iterations: 49
* Convergence: true
*  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
* Reached Maximum Number of Iterations: false
* Objective Calls: 98

julia> lower= [ -999.9, -999.9 ]; upper= [ 999, 2.9 ];

julia> results = optimize(myfun, lower, upper, initial_x)      ## box-constraint form: x[2] is capped at 2.9
┌ Warning: Linesearch failed, using alpha = 2.570171260006581 and exiting optimization.
│ The linesearch exited with message:
│ Linesearch failed to converge, reached maximum iterations 50.
└ @ Optim ~/.julia/packages/Optim/ULNLZ/src/utilities/perform_linesearch.jl:47
Results of Optimization Algorithm
* Algorithm: Fminbox with L-BFGS
* Starting Point: [0.0,0.0]
* Minimizer: [2.0000000000004565,2.8999999999999995]
* Minimum: 1.000000e-02
* Iterations: 8
* Convergence: true
* |x - x'| ≤ 0.0e+00: true
|x - x'| = 0.00e+00
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
|f(x) - f(x')| = 0.00e+00 |f(x)|
* |g(x)| ≤ 1.0e-08: false
|g(x)| = 2.00e-01
* Stopped by an increasing objective: true
* Reached Maximum Number of Iterations: false
* Objective Calls: 1425```
snippet.juliarepl
```julia> using Optim

julia> f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
f (generic function with 1 method)

julia> lower = [1.25, -2.1]
2-element Array{Float64,1}:
  1.25
 -2.1

julia> upper = [Inf, Inf]
2-element Array{Float64,1}:
 Inf
 Inf

julia> initial_x = [2.0, 2.0]
2-element Array{Float64,1}:
 2.0
 2.0

julia> inner_optimizer = GradientDescent();        ## the semicolon suppresses the verbose object display

julia> results = optimize(f, lower, upper, initial_x, Fminbox(inner_optimizer))
Results of Optimization Algorithm
* Algorithm: Fminbox with Gradient Descent
* Starting Point: [2.0,2.0]
* Minimizer: [1.2500000000000002,1.5626480963230658]
* Minimum: 6.250219e-02
* Iterations: 11
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 2.22e-16
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
|f(x) - f(x')| = 0.00e+00 |f(x)|
* |g(x)| ≤ 1.0e-08: false
|g(x)| = 4.26e-01
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 818877```
• In Julia 0.6.2 with Optim 0.14.1, adding `Optim.Options()` seemed to be incompatible with `Fminbox`: `optimize(myfun, [ 50.0, 50.0 ], fill(6.0, 2), fill(100.0, 2), Fminbox{NelderMead}(), Optim.Options(iterations=200000))` failed.

## Passing Along Non-Optimization Parameters to User Function

You can either use global variables or, more elegantly, a version of the following (here, the argument `x` plus two parameter vectors):

snippet.juliarepl
```julia> using Optim

julia> msumdeltas(x::Vector{Float64}, p::Vector{Float64}, m::Vector{Float64})::Float64 = sum( m.*(x .- p).^2 );

julia> optimize( xf->msumdeltas( xf, [ 12.0, 14.0 ], [ 1.0, 1.0 ] ),  [ 0.0, 0.0 ] )
Results of Optimization Algorithm
* Starting Point: [0.0,0.0]
* Minimizer: [12.000018215395523,14.000051397940497]
* Minimum: 2.973549e-09
* Iterations: 62
* Convergence: true
*  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
* Reached Maximum Number of Iterations: false
* Objective Calls: 120```
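
Equivalently (a sketch), the parameters can be captured in a closure instead of passed explicitly:

snippet.julia
```using Optim
p= [ 12.0, 14.0 ]; m= [ 1.0, 1.0 ];
closedfun(x)= sum( m.*(x .- p).^2 );             ## p and m are captured from the surrounding scope
optimize( closedfun, [ 0.0, 0.0 ] )
```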

## Univariate Function Optimization

There are two more optimizers for univariate cases, `Brent` and `GoldenSection`. To use them, omit the starting point and provide brackets.

snippet.juliafixme
```[download only julia statements]
julia> using Optim

julia> optimize( x->(3*(x-4)^2), -100, +100, Brent() )
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [-100.000000, 100.000000]
* Minimizer: 4.000000e+00
* Minimum: 0.000000e+00
* Iterations: 5
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
* Objective Function Calls: 6

julia> optimize( x->(3*(x-4)^2), -100, +100, GoldenSection() )
Results of Optimization Algorithm
* Algorithm: Golden Section Search
* Search Interval: [-100.000000, 100.000000]
* Minimizer: 4.000000e+00
* Minimum: 6.668724e-16
* Iterations: 44
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
* Objective Function Calls: 45```

# Backmatter

## Useful Packages on Julia Repository

• Optim is a pure Julia implementation, discussed in this chapter.
• NLopt is an implementation mixing other languages and Julia.
• BlackBoxOptim focuses on non-gradient global optimization, with support for parallelism.
• DataFitting.jl
• tpapp/ContinuousTransformations.jl handles transforms like -∞ < log(y-1) < ∞.
• There is an interesting Turing package for modeling and inference in stochastic environments. Alas, as of May 2018, it was non-functional. It is a work in progress.

# Providing Derivatives To Optimizers

The base example is still the two-dimensional quadratic polynomial with minimum `myfun([2,3])=0`, along with its exact gradient and Hessian:

snippet.julia
```myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 )```

A complete copy-and-paste example can be found towards the rear of this section.

Some optimization algorithms benefit from or require gradient and Hessian information, i.e., first and second derivatives. Julia supports three ways for optimizers to obtain them:

1. working out and providing symbolic derivatives in Julia function form;
2. letting a built-in exact differentiator take care of it (explained below);
3. using a built-in numerical differencing method.

The derivatives can be passed into a `*Differentiable` object (`NonDifferentiable`, `OnceDifferentiable`, or `TwiceDifferentiable`), which in turn can be passed into the optimizing functions. The optimizers know how to use these objects (picking off the derivatives or ignoring them). If only the objective function itself is passed and the optimizer requires a gradient, `optimize` falls back on numerical differencing. If the objective function and its gradient and/or Hessian are passed, `optimize` can use them.
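
As a minimal illustration of the fallback (a sketch), this call gives BFGS its gradient by numerical differencing without any user input:

snippet.julia
```using Optim
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
optimize( myfun, zeros(2), BFGS() )              ## no gradient provided, so optimize differences numerically
```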

## User-Provided Symbolic Derivatives

The syntax by which users can pass symbolic derivatives is unusual. The gradient and Hessian functions that the user passes in take storage as their first argument, which acts as a cache that must be overwritten in place on each call. This is for speed. The example here keeps the symbolic `myfun()`, `myfungradients()`, and `myfunhessian()` as separate functions for clarity; for even more speed, their bodies should be coded directly into `f()`, `g!()`, and `h!()`. Pay careful attention to how `g!()` and `h!()` are defined, and read the description at the end of the example.

snippet.juliarepl
```julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ];

julia> myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 );

julia> function f(x) myfun(x) end;#function

julia> function g!(G,x) G .= myfungradients(x) end;#function   ## must overwrite G in place, not rebind it!

julia> function fg!(G,x) g!(G,x); f(x) end;#function           ## fg! can shorten calculations shared by f and g

julia> function h!(H,x) Hcp=myfunhessian(x); for i=1:length(x),j=1:length(x) ; H[i,j]= Hcp[i,j]; end; end;#function

julia> z2= zeros(2);

julia> using Optim           ## needed

julia> optimize( f, g!, z2, BFGS())
Results of Optimization Algorithm
* Algorithm: BFGS
* Starting Point: [0.0,0.0]
* Minimizer: [2.000000000000021,3.0000000000000027]
* Minimum: 4.427482e-28
* Iterations: 7
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 1.75e-07
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = 7.04e+13 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 4.17e-14
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 21```

It is very tempting to code the more elegant

snippet.juliarepl
`julia> function g!(G,x) G=myfungradients(x) end;#function`

However, this is a bug. The reason is that the function merely rebinds its local name `G` to a new array, so the array passed in from outside `g!()` never sees the changes. Instead, the contents of `G` itself must be modified. This can be done with

snippet.juliarepl
```julia> function g!(G,x) Gcp=myfungradients(x); for i=1:length(G); G[i]= Gcp[i]; end; end;#function  ## element by element

julia> function g!(G,x); G .= myfungradients(x) end;#function  ## broadcast-assignment also works```

## Built-In Automatic Exact Differentiation

### ForwardDiff

Automatic differentiation is built around the `ForwardDiff.jl` package, which is pulled in by `Optim.jl`. ForwardDiff is picky about how the user function is defined: although it understands `x::Vector`, it does not understand `x::Vector{Float64}`. (The too-specific type messes with dispatch precedence, as shown below.) ForwardDiff works by overloading every elementary math operation and function in Julia for its own dual-number type. The math operations still return a numeric value (as always), but they also carry derivative information along internally, so that ForwardDiff can track the function's progress and apply the chain rule step by step.

snippet.juliarepl
```julia> using ForwardDiff

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> ForwardDiff.gradient( myfun, [ 2.0, 3.0 ] )          ## at the optimum, it is zero in both directions
2-element Array{Float64,1}:
0.0
0.0```

Do not constrain the input arguments of the user function to concrete numeric types such as `Float64`; this messes up the dispatch.
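
A minimal sketch of the dispatch problem (`badfun` and `goodfun` are illustrative names, not part of any package):

snippet.julia
```using ForwardDiff
badfun(x::Vector{Float64})= sum(x.^2);           ## too restrictive: ForwardDiff evaluates with its Dual number type
## ForwardDiff.gradient(badfun, [1.0, 2.0])      ## would throw a MethodError
goodfun(x)= sum(x.^2);                           ## untyped (or x::AbstractVector) dispatches fine
ForwardDiff.gradient(goodfun, [1.0, 2.0])        ## returns [2.0, 4.0]
```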

### Kinks in ForwardDiff

Kinks are allowed, but ForwardDiff never looks towards its rear; it only looks towards its front. At a kink it thus gives a mathematically dubious one-sided result:

snippet.juliarepl
```julia> using ForwardDiff

julia> sumabsf(x)= sum(abs.(x))
sumabsf (generic function with 1 method)

julia> ForwardDiff.gradient(sumabsf, [ 0.0 ] )              ## warning --- mathematically strange
1-element Array{Float64,1}:
1.0

julia> ForwardDiff.gradient(sumabsf, [ -0.0 ] )             ## yes, Julia has a negative 0.0
1-element Array{Float64,1}:
-1.0```

### ForwardDiff Interface to Optim: OnceDifferentiable and TwiceDifferentiable

Optim provides two convenience constructors that package up the useful differentials (in the same form as described above). Although the end user rarely uses them to obtain gradients and Hessians directly, we can show how the optimizers do it.

snippet.juliarepl
```julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;          ## could be x::Vector, but not x::Vector{Float64}

julia> od = OnceDifferentiable(myfun, [NaN,NaN]; autodiff = :forward);     ## [NaN,NaN] is needed to know dimensionality of myfun

julia> myfun( [2.0,3.0] ) == od.f( [2.0, 3.0] )                            ## confirming that it copied the function
true

julia> od.df( [NaN,NaN], [2.0, 3.0] )                                      ## first argument is ignored.
2-element Array{Float64,1}:
0.0
0.0

julia> td = TwiceDifferentiable(myfun, [ NaN, NaN ]; autodiff = :forward); ## works analogously```
• To force Optim not to use differentials (even numeric ones), define `nd = NonDifferentiable(fun, z2)`.
• The `od` and `td` container objects contain both the function and its derivatives, which can be passed to optimization functions in lieu of the function itself.
• Again, there are type constraints on the user function. `myfun(x::Vector{Float64})= ...` does not work, because Float64 is too specific and steals the dispatch from the `ForwardDiff.jl` differentiator.

## Built-In Numeric Differentiation

Simply do not provide the gradient and/or Hessian to `optimize()`; Optim.jl will then fall back on `DiffEqDiffTools.jl` by itself.
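
For example (a sketch, assuming `myfun` from this appendix), even Newton's method runs without user-provided derivatives; both gradient and Hessian are then obtained by differencing:

snippet.julia
```using Optim
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
optimize( myfun, zeros(2), Newton() )            ## derivatives supplied internally by numeric differencing
```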

## The Complete Copy-Paste Inputs

snippet.julia
```myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 )

# for speed, you should code the above directly into the below.

function f(x) myfun(x) end#function
function g!(G,x) G.=myfungradients(x) end#function  ## must not use G=, but G.= to replace external G contents
function fg!(G,x) g!(G,x); f(x) end#function        ## (fg! can shorten calculations shared by f and g)
function h!(H,x) H .= myfunhessian(x) end#function  ## must not use H=, but H.= to replace external H contents

z2= zeros(2);

using Optim           ## henceforth needed

# show optimize use via function call
optimize( f, g!, z2, BFGS())

# show optimize use via *Differentiable objects

nd = NonDifferentiable( f, z2 )
od = OnceDifferentiable( f, g!, z2 )
td = TwiceDifferentiable( f, g!, h!, z2 )

# show how optimizer functions will internally pick off results

println( ( td.f( [2.0, 2.0] ), td.df( [NaN,NaN], [2.0, 2.0] ), td.h( [NaN NaN;NaN NaN], [2.0, 2.0] ) ) )
println( ( od.f( [2.0, 2.0] ), od.df( [NaN,NaN], [2.0, 2.0] ) ) )

# show optimize use via object

optimize( td, z2, BFGS())
optimize( od, z2, BFGS())```

Do not forget the `.=` inside the `g!()` and `h!()` functions!
