
 +
 +~~CLOSETOC~~
 +
 +~~TOC 1-3 wide~~
 +
 +---
 +
 +^  [[roots|Roots and Solving]]  ^  Compact Optimization  ^  [[JuMP|Linear and Squared Optimization]]  ^
 +
 +---
 +
 +
 +```juliarepl
 +julia> pkgchk.( [ "julia" => v"1.0.3", "Optim" => v"0.17.1" ] );   ## "ForwardDiff" => nothing
 +
 +```
 +
 +
 +
 +# Optimization With Optim.jl
 +
 +* [Optim](http://julianlsolvers.github.io/Optim.jl/latest/) is the pure Julia implementation, discussed in this chapter.  It is one of my favorite Julia packages: native, efficient, maintained.
 +
 +
 +Most of the examples in this chapter minimize the two-dimensional quadratic polynomial
 +
 +```julia
 +myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +```
 +
 +It has an obvious minimum at `[2,3]`, and is an ideal example function around which many optimizers were originally designed.  [[optim-results]] discusses more difficult test functions.
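A quick sanity check of the claimed minimum, in plain Julia with no packages: the function is zero at `[2,3]` and positive at nearby points.

```julia
# The chapter's example function.
myfun(x) = (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2

println(myfun([2.0, 3.0]))       # 0.0 at the minimizer
println(myfun([2.1, 3.0]) > 0)   # true: perturbing the first coordinate increases the value
println(myfun([2.0, 2.9]) > 0)   # true: likewise for the second coordinate
```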
 +
 +IMPORTANT: see [[optim-results]] for some benchmarks
 +
 +
 +
 +## Overview of Optimization and Algorithms
 +
 +### Available Multivariate Optimization Algorithms
 +
 +^ **Name**  ^  **Abbrev**  ^  **Derivs?**  ^  **~Time** ^ **Notes**  ^
 +| Nelder-Mead Simplex |  NM  |  No  |  31 | works even on nonsmooth (possibly discontinuous) functions.  |
 +| Simulated Annealing  |  SAMIN  |  No  |  4,700 | searches for **global** optima, but is slow  |
 +| Particle Swarm  |  Prtcl  |  No  |  333,000 | searches for a **global** optimum, but is really slow  |
 +| GradientDescent  |  GrdDsct  |  1st  |  19 | often a bad idea due to excessive zig-zagging  |
 +| ConjugateGradient  |  CnjgGrd  |  1st  |  20 | often better than gradient descent  |
 +| BFGS  |  BFGS  |  1st  |  40 | a quasi-Newton method which cleverly extrapolates an "implied" second derivative from previous iterations.  |
 +| LBFGS  |  LBFGS  |  1st  |  29 | same, but with limited memory  |
 +| Newton  |  Nwtn  |  2nd  |  31 | Newton-Raphson requires first and second derivatives.  |
 +| NewtonTrustRegion  |  NwTst  |  2nd  |  50 | like Newton-Raphson, but limits step size.  |
 +
 +
 +* The time estimates are for the sample polynomial function.  Optimizers will be faster or slower on other problems.  The timings are only meant to convey orders of magnitude.
 +
 +* SAMIN and the univariate optimizers require box constraints.  The others do not.
 +
 +* All optimizers use their defaults, a `rand(2)` starting point, no user-provided symbolic gradients, and no bounds (except SAMIN, with -100 to +100).
 +
 +#### Showing Progress
 +
 +* Use the `show_trace=true` option to print progress.
 +
 +
 +
 +### Providing Derivatives (Gradients and Hessians) To Optimizers
 +
 +You may never need to provide them.  `optimize()` will use numeric differentiation if the optimizer determines it needs any.  It tends to work quite well and is worry-free.  The drawback is that it requires extra function evaluations.  If function evaluation is costly, here is what you can do.
 +
 +(In most of the examples, `z2` is a starting value which you can set to `zeros(2)` or `rand(2)`.)
 +`optimize()` allows as arguments any of the following:
 +
 +1. Direct provision of functions:
 +
 +  1. a plain function, `optimize( myfun, z2 )`.
 +
 +  2. a plain function with user-provided symbolic gradient, `optimize( myfun, myfun_gradient,​ z2 )`.
 +
 +  3. a plain function with user-provided symbolic gradient and Hessian, `optimize( myfun, myfun_gradient,​ myfun_hessian,​ z2 )`.
 +
 +2. Indirect provision via a packaged object:
 +
 +  1. an `OnceDifferentiable` object, which holds the function and its gradient.  It can be created by automatic differentiation or by passing a user-provided f into `OnceDifferentiable`.
 +
 +  2. a `TwiceDifferentiable` object, which holds the function, its gradient, and its Hessian.  It can be created by automatic differentiation or by passing a user-provided f and g into `TwiceDifferentiable`.
 +
 +Again, `optimize` will use numeric differencing for derivatives by default, so providing derivatives is always optional.  Numeric differencing is recommended when function evaluation itself is quick.
 +
 +If you do need exact derivatives, perhaps because they can occasionally be more accurate, or because evaluating the function (and thus differencing it) is expensive, you have two choices:
 +
 +1. You can code the symbolic derivatives into Julia.
 +
 +2. You can use automatic differentiation, such as ` optimize( TwiceDifferentiable(f, zeros(n); autodiff = :forward), zeros(n), BFGS(), Optim.Options() ) `.  However, automatic differentiation works by overloading Julia's elementary math, which can also slow down performance.
 +
 +
 +WARNING: There are some common gotchas when coding your own gradients and Hessians.  If you want to provide them, please read the appendix section [[#providing_derivatives_to_optimizers]] first.
 +
 +
 +### Box Constraints
 +
 +All optimizers in Optim allow box constraints. ​ See the [[#​optimization_with_box_constraints|example below]].
 +
 +
 +### Input Options
 +
 +* Setting Termination Conditions: `x_tol= 1e-32`, `f_tol=1e-32`,​ `g_tol=1e-8`,​ `f_calls_limit=0`,​ `g_calls_limit=0`,​ `h_calls_limit=0`,​ `iterations=1_000`
 +
 +* Setting Verbose Printouts and Intervention : `store_trace=false`,​ `show_trace=false`,​ `extended_trace=false`,​ `show_every=0`,​ `callback=function`,​ `time_limit=NaN`.
 +
 +* Options can be given as `optimize(...,​ Optim.Options(g_tol=1e-12))` or `optimize(...,​ g_tol=1e-12)`.
 +
 +* Oddly, there is no input option to switch from minimization to maximization; instead, minimize the negated objective.
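To see why negating the objective suffices: the minimizer of `-f` is exactly the maximizer of `f`.  A brute-force sketch in plain Julia (the concave `f` below is just a hypothetical illustration):

```julia
f(x) = -(x - 4)^2 + 10                            # concave, maximum of 10 at x = 4

grid = -10:0.5:10
xmax        = grid[argmax(f.(grid))]              # direct maximization over the grid
xmin_of_neg = grid[argmin((x -> -f(x)).(grid))]   # minimizing the negation instead
println((xmax, xmin_of_neg))                      # both give 4.0
```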
 +
 +
 +### Output Overview
 +
 +#### Both Univariate and Multivariate Results
 +
 +^ **Call**  ^  **Sample**  ^ **Notes**  ^
 +| `summary(res)`  |  "SAMIN"  | method, usually as input  |
 +| `Optim.minimizer(res)`  |  [ 2.0, 3.0 ]  | the optimizing argument  |
 +| `Optim.minimum(res)`  | 3.534e-13 | function value at the optimum  |
 +| `Optim.iterations(res)` |  3451  | number of iterations  |
 +| `Optim.iteration_limit_reached(res)` |  false  | obvious  |
 +| `Optim.trace(res)` |  n/a  | only retained if `store_trace=true` in `optimize`  |
 +| `Optim.x_trace(res)` |  n/a  | only retained if `store_trace=true` in `optimize`  |
 +| `Optim.f_trace(res)` |  n/a  | only retained if `store_trace=true` in `optimize`  |
 +| `Optim.f_calls(res)` |  3451  | number of function calls  |
 +| `Optim.converged(res)` |  false  | whether the optimizer converged  |
 +
 +#### Only Univariate Results
 +
 +^ **Call** ​ ^  **Sample** ​ ^ **Notes** ​ ^
 +| `Optim.lower_bound(res)` ​ |  -100  | lower bound, usually as input  |
 +| `Optim.upper_bound(res)` ​ |  100  | upper bound, usually as input  |
 +| `Optim.x_lower_trace(res)` ​ |  | only retained if `store_trace=true` in `optimize` ​ |
 +| `Optim.x_upper_trace(res)` ​ |  | only retained if `store_trace=true` in `optimize` ​ |
 +| `Optim.rel_tol(res)` ​ |  | only retained if `store_trace=true` in `optimize` ​ |
 +| `Optim.abs_tol(res)` |  | only retained if `store_trace=true` in `optimize` ​ |
 +
 +
 +#### Only Multivariate Results
 +
 +^ **Call** ​ ^  **Sample** ​ ^ **Notes** ​ ^
 +| `Optim.g_norm_trace(res)` ​ | n/a | only retained if `store_trace=true` in optimize ​ |
 +| `Optim.g_calls(res)` ​ | 0 | only retained if `store_trace=true` in `optimize` ​  |
 +| `Optim.x_converged(res)` ​ | false | only retained if `store_trace=true` in `optimize` ​   |
 +| `Optim.f_converged(res)` ​ | false | only retained if `store_trace=true` in `optimize` ​   |
 +| `Optim.g_converged(res)` ​ | false | only retained if `store_trace=true` in `optimize` ​   |
 +| `Optim.initial_state(res)` ​ | [0.0,0.0] | user input  |
 +
 +
 +
 +
 +## Gradient-Free Optimizers
 +
 +### Nelder-Mead Simplex (Amoeba)
 +
 +Nelder-Mead Simplex is (sort of) the multi-dimensional analog of the golden-section search in one dimension.  It is *not* a true equivalent: golden-section search brackets a minimum and is guaranteed to find it, while Nelder-Mead carries no such guarantee.  There are beautiful and intuitive illustrations at [Wikipedia](https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method).  The best mental image is not an amoeba contracting, but an amoeba crawling downhill.
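For reference, the one-dimensional golden-section idea can be sketched in a few lines of plain Julia (a bare-bones illustration, not Optim's `GoldenSection()` implementation):

```julia
# Minimal golden-section search on [a,b]; assumes f is unimodal there.
function goldensection(f, a, b; tol = 1e-8)
    φ = (sqrt(5) - 1) / 2                  # ≈ 0.618, the golden ratio conjugate
    c, d = b - φ*(b - a), a + φ*(b - a)    # two interior probe points
    while (b - a) > tol
        if f(c) < f(d)                     # minimum lies in [a, d]
            b, d = d, c
            c = b - φ*(b - a)
        else                               # minimum lies in [c, b]
            a, c = c, d
            d = a + φ*(b - a)
        end
    end
    (a + b) / 2
end

println(goldensection(x -> 3*(x - 4)^2, -100.0, 100.0))   # ≈ 4.0
```

Because each step shrinks the bracket by the same fixed factor, convergence is guaranteed whenever the function is unimodal on the interval.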
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> optimize( myfun, [0.0, 0.0], NelderMead(),​ Optim.Options(iterations=100_000) )
 +Results of Optimization Algorithm
 + * Algorithm: Nelder-Mead
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.0000267077758536,​3.0000220787571954]
 + * Minimum: 1.200777e-09
 + * Iterations: 49
 + * Convergence:​ true
 +   ​* ​ √(Σ(yᵢ-ȳ)²)/​n < 1.0e-08: true
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 99
 +
 +```
 +
 +
 +
 +
 +
 +### Simulated Annealing
 +
 +Simulated Annealing (related to Metropolis-Hastings) seems to be a form of Markov-chain Monte-Carlo search: it takes random probabilistic guesses, accepting them liberally at first and ever more reluctantly later, once the lowest value(s) found so far seem pretty good already.
 +
 +WARNING: This requires the master branch rather than the release, so use `Pkg.checkout("Optim")`.  (It is not clear whether there is a version without bounds.)
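The acceptance rule can be sketched in plain Julia (a toy version for illustration only, not SAMIN; the step size and cooling schedule here are arbitrary choices):

```julia
using Random

# Toy simulated annealing: always accept improvements, and accept worsening
# moves with probability exp(-Δ/T), where the "temperature" T cools over time.
function anneal(f, x0; iters = 20_000, T0 = 1.0, step = 0.1)
    x, fx = copy(x0), f(x0)
    best, fbest = copy(x), fx
    for k in 1:iters
        T = T0 / k                              # simple cooling schedule
        cand = x .+ step .* randn(length(x))    # random proposal near x
        fc = f(cand)
        if fc < fx || rand() < exp(-(fc - fx) / T)
            x, fx = cand, fc
            if fx < fbest
                best, fbest = copy(x), fx       # remember the best point seen
            end
        end
    end
    best
end

Random.seed!(0);
println(anneal(x -> (x[1]-2)^2 + (x[2]-3)^2, [0.0, 0.0]))   # close to [2.0, 3.0]
```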
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> using Random; Random.seed!(0);​
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> optimize( myfun, fill(-100.0,​2),​ fill(+100.0,​2),​ [0.0,0.0], SAMIN(), Optim.Options(iterations=100_000) )
 +================================================================================
 +SAMIN results
 +==> Normal convergence <==
 +total number of objective function evaluations:​ 22101
 +
 +     Obj. value: ​     0.0000000000
 +
 +       ​parameter ​     search width
 +         ​2.00000 ​          ​0.00000
 +         ​3.00000 ​          ​0.00000
 +================================================================================
 +
 +Results of Optimization Algorithm
 + * Algorithm: SAMIN
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.000000006088701,​2.999999998587848]
 + * Minimum: 3.906645e-17
 + * Iterations: 22101
 + * Convergence:​ false
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = NaN
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
 +     |f(x) - f(x')| = NaN |f(x)|
 +   * |g(x)| ≤ 0.0e+00: false
 +     ​|g(x)| = NaN
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 22101
 + * Gradient Calls: 0
 +
 +```
 +
 +
 +
 +
 +### Particle Swarm
 +
 +Particle Swarm moves a whole population of candidate points through the search space, probabilistically mixing each particle's own best find with the swarm's overall best find.
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> using Random; Random.seed!(0);​
 +
 +julia> optimize( myfun, [0.0,0.0], ParticleSwarm(),​ Optim.Options(iterations=100_000))
 +Results of Optimization Algorithm
 + * Algorithm: Particle Swarm
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.0,3.0]
 + * Minimum: 0.000000e+00
 + * Iterations: 100000
 + * Convergence:​ false
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = NaN
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
 +     |f(x) - f(x')| = NaN |f(x)|
 +   * |g(x)| ≤ 1.0e-08: false
 +     ​|g(x)| = NaN
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: true
 + * Objective Calls: 400000
 + * Gradient Calls: 0
 +
 +```
 +
 +
 +
 +## Gradient-Based Hessian-Free Optimizers
 +
 +
 +### Steepest GradientDescent
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> optimize( myfun, [0.0,0.0], GradientDescent())
 +Results of Optimization Algorithm
 + * Algorithm: Gradient Descent
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.0,3.0]
 + * Minimum: 0.000000e+00
 + * Iterations: 6
 + * Convergence:​ true
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = 1.52e-07
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
 +     |f(x) - f(x')| = Inf |f(x)|
 +   * |g(x)| ≤ 1.0e-08: true
 +     ​|g(x)| = 2.22e-16
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 19
 + * Gradient Calls: 19
 +```
 +
 +
 +
 +### Conjugate Gradient Descent
 +
 +Usually avoids excessive zig-zagging of steepest gradient descent:
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> optimize( myfun, [0.0,0.0], ConjugateGradient())
 +Results of Optimization Algorithm
 + * Algorithm: Conjugate Gradient
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.000000000000004,​3.0000000000000004]
 + * Minimum: 1.617165e-29
 + * Iterations: 7
 + * Convergence:​ true
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = 1.68e-07
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
 +     |f(x) - f(x')| = 1.77e+15 |f(x)|
 +   * |g(x)| ≤ 1.0e-08: true
 +     ​|g(x)| = 7.77e-15
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 15
 + * Gradient Calls: 9
 +```
 +
 +
 +### BFGS
 +
 +The classic quasi-Newton workhorse, which internally approximates the second derivative from how the gradients have changed across iterations.
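The core trick is easy to see in one dimension (a sketch of the idea only, not the actual BFGS update formula): the change in gradient between two iterates, divided by the step, is a secant approximation of the second derivative.

```julia
g(x) = 2x - 8               # gradient of f(x) = (x - 4)^2, whose true f''(x) = 2

x0, x1 = 1.0, 1.5           # two successive iterates
secant_curvature = (g(x1) - g(x0)) / (x1 - x0)
println(secant_curvature)   # recovers the second derivative exactly for a quadratic
```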
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> optimize( myfun, [0.0,0.0], BFGS())
 +Results of Optimization Algorithm
 + * Algorithm: BFGS
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.000000000000021,​3.0000000000000027]
 + * Minimum: 4.427482e-28
 + * Iterations: 7
 + * Convergence:​ true
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = 1.75e-07
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
 +     |f(x) - f(x')| = 7.04e+13 |f(x)|
 +   * |g(x)| ≤ 1.0e-08: true
 +     ​|g(x)| = 4.15e-14
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 21
 + * Gradient Calls: 21
 +```
 +
 +The limited-memory version is nearly as good, requiring just one extra call.
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> optimize( myfun, [0.0,0.0], LBFGS())
 +Results of Optimization Algorithm
 + * Algorithm: L-BFGS
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [1.999999999999605,​2.9999999999999294]
 + * Minimum: 1.610245e-25
 + * Iterations: 7
 + * Convergence:​ true
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = 4.53e-07
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
 +     |f(x) - f(x')| = 1.32e+12 |f(x)|
 +   * |g(x)| ≤ 1.0e-08: true
 +     ​|g(x)| = 7.90e-13
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 22
 + * Gradient Calls: 22
 +```
 +
 +
 +
 +## Hessian-Based Optimizers
 +
 +### Newton-Raphson
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> optimize( myfun, [0.0,0.0], Newton())
 +Results of Optimization Algorithm
 + * Algorithm: Newton'​s Method
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.0000000000006297,​3.0000000000166733]
 + * Minimum: 2.783965e-22
 + * Iterations: 5
 + * Convergence:​ true
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = 6.05e-04
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
 +     |f(x) - f(x')| = 1.32e+15 |f(x)|
 +   * |g(x)| ≤ 1.0e-08: true
 +     ​|g(x)| = 3.33e-11
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 15
 + * Gradient Calls: 15
 + * Hessian Calls: 5
 +```
 +
 +
 +Trust regions limit the step size to something sensible (i.e., so that Newton does not step out of bounds so easily).
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> optimize( myfun, [0.0,0.0], NewtonTrustRegion())
 +Results of Optimization Algorithm
 + * Algorithm: Newton'​s Method (Trust Region)
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.0,3.0]
 + * Minimum: 0.000000e+00
 + * Iterations: 8
 + * Convergence:​ true
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = 8.53e-09
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
 +     |f(x) - f(x')| = Inf |f(x)|
 +   * |g(x)| ≤ 1.0e-08: true
 +     ​|g(x)| = 2.22e-16
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 9
 + * Gradient Calls: 9
 + * Hessian Calls: 7
 +```
 +
 +
 +
 +
 +
 +## Optimization With Box Constraints
 +
 +The call arguments of `optimize` change when box constraints are included.  Instead of passing the optimizer directly, such as `NelderMead()`, one wraps it in `Fminbox` (the first is a call that returns an optimizer object; the second wraps that object so that `optimize` dispatches to the box-constrained method):
 +
 +    NelderMead() ⟶ Fminbox(NelderMead())
 +
 +The second and third arguments of `optimize` then have to be the lower and upper bounds, and the initial value has to be inside the box.  The `Optim.Options()` argument seems to be inconsistent with `Fminbox` (and may ruin the entire optimization, with or without an error)!
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> initial_x= [ 0.0, 0.0 ];
 +
 +julia> optimize(myfun,​ initial_x, NelderMead() )               ## without the box constraint
 +Results of Optimization Algorithm
 + * Algorithm: Nelder-Mead
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.0000267077758536,​3.0000220787571954]
 + * Minimum: 1.200777e-09
 + * Iterations: 49
 + * Convergence:​ true
 +   ​* ​ √(Σ(yᵢ-ȳ)²)/​n < 1.0e-08: true
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 98
 +
 +julia> lower= [ -999.9, -999.9 ]; upper= [ 999, 2.9 ];
 +
 +julia> results = optimize(myfun,​ lower, upper, initial_x) ##​ box constraint form: x[2] is maxed at 2.9
 +┌ Warning: Linesearch failed, using alpha = 2.570171260006581 and exiting optimization.
 +│ The linesearch exited with message:
 +│ Linesearch failed to converge, reached maximum iterations 50.
 +└ @ Optim ~/​.julia/​packages/​Optim/​ULNLZ/​src/​utilities/​perform_linesearch.jl:​47
 +Results of Optimization Algorithm
 + * Algorithm: Fminbox with L-BFGS
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.0000000000004565,​2.8999999999999995]
 + * Minimum: 1.000000e-02
 + * Iterations: 8
 + * Convergence:​ true
 +   * |x - x'| ≤ 0.0e+00: true
 +     |x - x'| = 0.00e+00
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
 +     |f(x) - f(x')| = 0.00e+00 |f(x)|
 +   * |g(x)| ≤ 1.0e-08: false
 +     ​|g(x)| = 2.00e-01
 +   * Stopped by an increasing objective: true
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 1425
 + * Gradient Calls: 1425
 +
 +```
 +
 +
 +```juliarepl
 +
 +julia> using Optim
 +
 +julia> f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
 +f (generic function with 1 method)
 +
 +julia> lower = [1.25, -2.1]
 +2-element Array{Float64,​1}:​
 +  1.25
 + -2.1
 +
 +julia> upper = [Inf, Inf]
 +2-element Array{Float64,​1}:​
 + Inf
 + Inf
 +
 +julia> initial_x = [2.0, 2.0]
 +2-element Array{Float64,​1}:​
 + 2.0
 + 2.0
 +
 +julia> inner_optimizer = GradientDescent()
 +GradientDescent{LineSearches.InitialPrevious{Float64},​LineSearches.HagerZhang{Float64,​Base.RefValue{Bool}},​Nothing,​getfield(Optim,​ Symbol("##​12#​14"​))}(LineSearches.InitialPrevious{Float64}
 +  alpha: Float64 1.0
 +  alphamin: Float64 0.0
 +  alphamax: Float64 Inf
 +, LineSearches.HagerZhang{Float64,​Base.RefValue{Bool}}
 +  delta: Float64 0.1
 +  sigma: Float64 0.9
 +  alphamax: Float64 Inf
 +  rho: Float64 5.0
 +  epsilon: Float64 1.0e-6
 +  gamma: Float64 0.66
 +  linesearchmax:​ Int64 50
 +  psi3: Float64 0.1
 +  display: Int64 0
 +  mayterminate:​ Base.RefValue{Bool}
 +, nothing, getfield(Optim,​ Symbol("##​12#​14"​))(),​ Flat())
 +
 +julia> results = optimize(f, lower, upper, initial_x, Fminbox(inner_optimizer))
 +Results of Optimization Algorithm
 + * Algorithm: Fminbox with Gradient Descent
 + * Starting Point: [2.0,2.0]
 + * Minimizer: [1.2500000000000002,​1.5626480963230658]
 + * Minimum: 6.250219e-02
 + * Iterations: 11
 + * Convergence:​ true
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = 2.22e-16
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
 +     |f(x) - f(x')| = 0.00e+00 |f(x)|
 +   * |g(x)| ≤ 1.0e-08: false
 +     ​|g(x)| = 4.26e-01
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 818877
 + * Gradient Calls: 818877
 +
 +```
 +
 +
 +* WARNING In Julia 0.6.2 with Optim 0.14.1, adding `Optim.Options()` seems to be incompatible with `Fminbox`.  `optimize(myfun, [ 50.0, 50.0 ], fill(6.0, 2), fill(100.0, 2), Fminbox{NelderMead}(), Optim.Options(iterations=200000))` fails.
 +
 +
 +
 +## Passing Along Non-Optimization Parameters to User Function
 +
 +Refer to [the Optim autodiff notes](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/algo/autodiff.md).
 +
 +You can either use global variables or, more elegantly, wrap the extra parameters in a closure, as in the following (here, a two-element target vector plus a two-element weight vector):
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> msumdeltas(x::​Vector{Float64},​ p::​Vector{Float64},​ m::​Vector{Float64})::​Float64 = sum( m.*(x .- p).^2 );
 +
 +julia> optimize( xf->​msumdeltas( xf, [ 12.0, 14.0 ], [ 1.0, 1.0 ] ),  [ 0.0, 0.0 ] )
 +Results of Optimization Algorithm
 + * Algorithm: Nelder-Mead
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [12.000018215395523,​14.000051397940497]
 + * Minimum: 2.973549e-09
 + * Iterations: 62
 + * Convergence:​ true
 +   ​* ​ √(Σ(yᵢ-ȳ)²)/​n < 1.0e-08: true
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 120
 +
 +```
 +
 +
 +## Univariate Function Optimization
 +
 +There are two more optimizers for univariate cases, `Brent` and `GoldenSection`.  To use them, omit the starting point and provide brackets.
 +
 +```juliafixme
 +julia> using Optim
 +
 +julia> optimize( x->​(3*(x-4)^2),​ -100, +100, Brent() )
 +Results of Optimization Algorithm
 + * Algorithm: Brent'​s Method
 + * Search Interval: [-100.000000,​ 100.000000]
 + * Minimizer: 4.000000e+00
 + * Minimum: 0.000000e+00
 + * Iterations: 5
 + * Convergence:​ max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16):​ true
 + * Objective Function Calls: 6
 +
 +julia> optimize( x->​(3*(x-4)^2),​ -100, +100, GoldenSection() )
 +Results of Optimization Algorithm
 + * Algorithm: Golden Section Search
 + * Search Interval: [-100.000000,​ 100.000000]
 + * Minimizer: 4.000000e+00
 + * Minimum: 6.668724e-16
 + * Iterations: 44
 + * Convergence:​ max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16):​ true
 + * Objective Function Calls: 45
 +
 +```
 +
 +
 +
 +
 +## More Function Examples and Comparisons
 +
 +* There are many more tests and benchmark results in [[optim-results]].
 +
 +
 +* There are useful [optimization tips and tricks](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/user/tipsandtricks.md).
 +
 +
 +
 +
 +
 +
 +# Backmatter
 +
 +## Useful Packages on Julia Repository
 +
 +* [Optim](http://julianlsolvers.github.io/Optim.jl/latest/) is a pure Julia implementation, discussed in this chapter.
 +
 +* [NLopt](https://nlopt.readthedocs.io/en/latest/) is an implementation that mixes Julia and other languages.
 +
 +* [Blackbox](https://github.com/robertfeldt/BlackBoxOptim.jl) focuses on non-gradient global optimization, with support for parallelism.
 +
 +
 +## Notes
 +
 +## References
 +
 +- [JuliaOpt](http://www.juliaopt.org/)
 +
 +- [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) first package
 +
 +- [Convex](https://github.com/JuliaOpt/Convex.jl)
 +
 +- [NLopt](https://github.com/JuliaOpt/NLopt.jl)
 +
 +- [Optimization Tips and Tricks](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/user/tipsandtricks.md)
 +
 +- DataFitting.jl
 +
 +- [tpapp/ContinuousTransformations.jl](https://github.com/tpapp/ContinuousTransformations.jl) handles transforms like -inf < log(y-1) < infinity.
 +
 +- There is an interesting [Turing](https://github.com/yebai/Turing.jl/wiki/Get-started) package for modeling and inference in stochastic environments.  Alas, as of May 2018, it was non-functional.  It is a work in progress.
 +
 +
 +
 +# Providing Derivatives To Optimizers
 +
 +The base example is still the two-dimensional quadratic polynomial with minimum `myfun([2,3])=0`, with the following exact gradient and Hessian:
 +
 +```julia
 +myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2 ),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
 +myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 )
 +```
 +
 +A complete copy-and-paste example can be found toward the end of this section.
 +
 +---
 +
 +Some optimization algorithms benefit from, or require, gradient and Hessian information, i.e., first and second derivatives.  Julia supports three ways for optimizers to obtain them:
 +
 +1. working out and providing symbolic derivatives in Julia function form;
 +
 +2. letting a built-in exact differentiator take care of it (explained below);
 +
 +3. using a built-in numerical differencing method.
 +
 +The derivatives can be passed into a `<Some>Differentiable` object, which in turn can be passed into the optimizing functions.  The optimizers know how to use these objects (picking off the derivatives or ignoring them).  If only the objective function itself is passed and an optimizer requires a gradient, `optimize` falls back on numerical differencing.  If the objective function and gradients and/or Hessians are passed, `optimize` can use them.
 +
 +
 +
 +## User-Provided Symbolic Derivatives
 +
 +The syntax by which users pass symbolic derivatives is unusual.  The gradient and Hessian functions that the user passes in take some storage as their first argument, which acts as a cache that is overwritten upon each call.  This is for speed.  The example here keeps calling the symbolic `myfun()`, `myfungradients()`, and `myfunhessian()` for clarity.  For maximum speed, these functions should not be called but coded directly into `f()`, `g!()`, and `h!()`.
 +
 +WARNING: Pay careful attention to how `g!()` and `h!()` are defined, and read the description at the end of the example.
 +
 +
 +```juliarepl
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2 ),  2*(1 + (x[1]-2)^2)*(x[2]-3) ];
 +
 +julia> myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 );
 +
 +julia> function f(x) myfun(x) end;#​function
 +
 +julia> function g!(G,x) Gcp=myfungradients(x); G[1]=Gcp[1]; G[2]=Gcp[2]; G; end;#function  ## must modify G in place, never rebind it with G= !
 +
 +julia> function fg!(G,x) g!(G,x); f(x) end;#function   ## fg! can shorten calculations when f and g share work
 +
 +julia> function h!(H,x) Hcp=myfunhessian(x);​ for i=1:​length(x),​j=1:​length(x) ; H[i,j]= Hcp[i,j]; end; end;#​function
 +
 +julia> z2= zeros(2);
 +
 +julia> using Optim           ## needed
 +
 +julia> optimize( f, g!, z2, BFGS())
 +Results of Optimization Algorithm
 + * Algorithm: BFGS
 + * Starting Point: [0.0,0.0]
 + * Minimizer: [2.000000000000021,​3.0000000000000027]
 + * Minimum: 4.427482e-28
 + * Iterations: 7
 + * Convergence:​ true
 +   * |x - x'| ≤ 0.0e+00: false
 +     |x - x'| = 1.75e-07
 +   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
 +     |f(x) - f(x')| = 7.04e+13 |f(x)|
 +   * |g(x)| ≤ 1.0e-08: true
 +     ​|g(x)| = 4.17e-14
 +   * Stopped by an increasing objective: false
 +   * Reached Maximum Number of Iterations: false
 + * Objective Calls: 21
 + * Gradient Calls: 21
 +
 +```
 +
 +WARNING It is very tempting to code the more elegant
 +
 +```juliarepl
 +julia> function g!(G,x) G=myfungradients(x) end;#​function
 +```
 +
 +However, this is a bug.  The reason is that the assignment rebinds the local name `G`, so the array outside `g!()` never sees the changes.  Instead, the contents of `G` itself must be modified.  This can be done with
 +
 +```juliarepl
 +julia> function g!(G,x) Gcp=myfungradients(x);​ G[1]=Gcp[1];​ G[2]=Gcp[2];​ G; end;#​function
 +
 +julia> function g!(G,x); G .= myfungradients(x) end;#​function ​ ## also works
 +```
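The difference is easy to demonstrate outside of Optim: plain `=` rebinds the function-local name only, while `.=` overwrites the contents of the array the caller passed in (the function names below are hypothetical, chosen for the demo):

```julia
rebind(G, x)     = (G = [x, x]; nothing)    # rebinds the local name -- caller sees nothing
overwrite!(G, x) = (G .= [x, x]; nothing)   # mutates the passed-in array in place

G = zeros(2)
rebind(G, 5.0);     println(G)   # [0.0, 0.0] -- unchanged
overwrite!(G, 5.0); println(G)   # [5.0, 5.0] -- updated in place
```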
 +
 +
 +
 +## Built-In Automatic Exact Differentiation
 +
 +### ForwardDiff
 +
 +[Automatic differentiation](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/algo/autodiff.md) is built around the `ForwardDiff.jl` package, which is pulled in by `Optim.jl`.  ForwardDiff is picky about the way the user function is defined.  In particular, although it understands arguments typed as `Vector`, it does not understand `Vector{Float64}`.  (This can mess with dispatch precedence.)  ForwardDiff works by redefining *every* elementary math operation and function in Julia.  The math operations still return a numeric value (as always), but they also keep an internal symbolic representation, so that ForwardDiff can follow the function's progress and apply the chain rule piece by piece.
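A stripped-down dual-number type shows the mechanism (an illustration only; ForwardDiff's actual implementation is far more general): overloading elementary operators makes ordinary Julia code carry a derivative along with each value.

```julia
# A value `v` carrying its derivative `d`; arithmetic applies the chain rule.
struct Dual
    v::Float64
    d::Float64
end

import Base: +, -, *
+(a::Dual, b::Dual) = Dual(a.v + b.v, a.d + b.d)
-(a::Dual, b::Dual) = Dual(a.v - b.v, a.d - b.d)
*(a::Dual, b::Dual) = Dual(a.v * b.v, a.d * b.v + a.v * b.d)   # product rule

# Derivative of f at x: seed the input with slope 1 and read off the slope.
derivative(f, x) = f(Dual(x, 1.0)).d

p(x) = x*x*x - x               # p'(x) = 3x^2 - 1, so p'(2) = 11
println(derivative(p, 2.0))    # 11.0
```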
 +
 +```juliarepl
 +julia> using ForwardDiff
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +
 +julia> ForwardDiff.gradient( myfun, [ 2.0, 3.0 ] )          ## at the optimum, it is zero in both directions
 +2-element Array{Float64,​1}:​
 + 0.0
 + 0.0
 +
 +```
 +
 +WARNING: Do not type-restrict the input arguments of the user function that is to be optimized to `Float64`.  This messes up the dispatch.
 +
 +
 +### Kinks in ForwardDiff
 +
 +Kinks are allowed, but at a kink ForwardDiff simply propagates a one-sided derivative instead of flagging the non-differentiability.  Thus, it gives the mathematically dubious result:
 +
 +```juliarepl
 +julia> using ForwardDiff
 +
 +julia> sumabsf(x)= sum(abs.(x))
 +sumabsf (generic function with 1 method)
 +
 +julia> ForwardDiff.gradient(sumabsf,​ [ 0.0 ] )              ## warning --- mathematically strange
 +1-element Array{Float64,​1}:​
 + 1.0
 +
 +julia> ForwardDiff.gradient(sumabsf,​ [ -0.0 ] )             ## yes, Julia has a negative 0.0
 +1-element Array{Float64,​1}:​
 + -1.0
 +```
 +
 +
 +### ForwardDiff Interface to Optim: `<Some>Differentiable`
 +
 +Optim provides two convenience constructors whose objects carry the useful differentials (in the same form as described above).  Although the end user rarely uses them to obtain gradients and Hessians, we can show how the optimizers do it.
 +
 +```juliarepl
 +julia> using Optim
 +
 +julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2; ​         ## could be x::Vector, but not x::​Vector{Float64}
 +
 +julia> od = OnceDifferentiable(myfun,​ [NaN,NaN]; autodiff = :​forward); ​    ## [NaN,NaN] is needed to know dimensionality of myfun
 +
 +julia> myfun( [2.0,3.0] ) == od.f( [2.0, 3.0] )                            ## confirming that it copied the function
 +true
 +
 +julia> od.df( [NaN,NaN], [2.0, 3.0] )                                      ## first argument is ignored.
 +2-element Array{Float64,​1}:​
 + 0.0
 + 0.0
 +
 +julia> td = TwiceDifferentiable(myfun,​ [ NaN, NaN ]; autodiff = :forward); ## works analogously
 +
 +```
 +
 +* To force Optim not to use differentials (even numeric ones), define `nd = NonDifferentiable(fun,​ z2)`.
 +
 +* The `od` and `td` container objects contain both the function and its derivatives,​ which can be passed to optimization functions in lieu of the function itself.
 +
 +* Again, there are type constraints on the user function. ​ `myfun(x::​Vector{Float64})= ...` does not work, because Float64 is too specific and steals the dispatch from the `ForwardDiff.jl` differentiator.
 +
 +
 +
 +## Built-In Numeric Differentiation
 +
 +Simply do not provide the gradient and/or Hessian to `optimize()`, and Optim.jl will fall back on `DiffEqDiffTools.jl` by itself.
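The fallback amounts to something like the following central-difference scheme (a simplified sketch; the real package chooses step sizes and caches evaluations more carefully):

```julia
# Central-difference gradient: one forward and one backward evaluation per coordinate.
function fd_gradient(f, x; h = 1e-6)
    g = similar(x)
    for i in eachindex(x)
        e = zeros(length(x)); e[i] = h
        g[i] = (f(x .+ e) - f(x .- e)) / (2h)
    end
    g
end

myfun(x) = (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2
println(fd_gradient(myfun, [0.0, 0.0]))   # ≈ [-40.0, -30.0], the exact gradient at the origin
```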
 +
 +
 +
 +
 +
 +## The Complete Copy-Paste Inputs
 +
 +
 +```julia
 +myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;​
 +myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2 ),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
 +myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 )
 +
 +## for speed, you should code the above directly into the below.
 +
 +function f(x) myfun(x) end#​function
 +function g!(G,x) G.=myfungradients(x) end#​function ​ ## must not use G=, but G.= to replace external G contents
 +function fg!(G,x) g!(G,x); f(x) end#​function ​       ## (both! can be used to shorten calculations)
 +function h!(H,x) H .= myfunhessian(x) end#​function ​ ## must not use H=, but H.= to replace external H contents
 +
 +z2= zeros(2);
 +
 +using Optim           ## henceforth needed
 +
 +## show optimize use via function call
 +optimize( f, g!, z2, BFGS())
 +
 +
 +## show optimize use via *Differentiable objects
 +
 +nd = NonDifferentiable( f, z2 )
 +od = OnceDifferentiable( f, g!, z2 )
 +td = TwiceDifferentiable( f, g!, h!, z2 )
 +
 +## show how optimizer functions will internally pick off results
 +
 +println( ( td.f( [2.0, 2.0] ), td.df( [NaN,NaN], [2.0, 2.0] ), td.h( [NaN NaN;NaN NaN], [2.0, 2.0] ) ) )
 +println( ( od.f( [2.0, 2.0] ), od.df( [NaN,NaN], [2.0, 2.0] ) ) )
 +
 +## show optimize use via object
 +
 +optimize( td, z2, BFGS())
 +optimize( od, z2, BFGS())
 +
 +```
 +
 +WARNING do not forget the `.=` inside the `g!()` and `h!()` functions!
  
optim.txt · Last modified: 2018/12/27 14:37 (external edit)