~~CLOSETOC~~

~~TOC 1-3 wide~~

---

^  [[roots|Roots and Solving]]  ^  Compact Optimization  ^  [[JuMP|Linear and Squared Optimization]]  ^

---

```juliarepl
julia> pkgchk.( [ "julia" => v"1.0.3", "Optim" => v"0.17.1" ] );   ## "ForwardDiff" => nothing
```

# Optimization With Optim.jl

* [Optim](http://julianlsolvers.github.io/Optim.jl/latest/) is the pure Julia implementation, discussed in this chapter.  It is one of my favorite Julia packages: native, efficient, maintained.

Most of the examples in this chapter minimize the two-dimensional quadratic polynomial

```julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
```

It has an obvious minimum at `[2,3]`, and it is an ideal example function around which many optimizers were originally designed.  [[optim-results]] discusses more difficult test functions.

IMPORTANT: See [[optim-results]] for some benchmarks.

## Overview of Optimization and Algorithms

### Available Multivariate Optimization Algorithms

^ **Name**  ^  **Abbrev**  ^  **Derivs?**  ^  **~Time** ^ **Notes**  ^
| Nelder-Mead Simplex |  NM  |  No  |  31 | works on nonsmooth (even discontinuous?) functions  |
| Simulated Annealing  |  SAMIN  |  No  |  4,700 | searches for a **global** optimum, but is slow  |
| Particle Swarm  |  Prtcl  |  No  |  333,000 | switches algorithms for a **global** optimum search, but is really slow  |
| GradientDescent  |  GrdDsct  |  1st  |  19 | often a bad idea due to excessive zig-zagging  |
| ConjugateGradient  |  CnjgGrd  |  1st  |  20 | often better than gradient descent  |
| BFGS  |  BFGS  |  1st  |  40 | a quasi-Newton method which cleverly extrapolates an "implied" second derivative from previous iterations  |
| LBFGS  |  LBFGS  |  1st  |  29 | the same, but with limited memory  |
| Newton  |  Nwtn  |  2nd  |  31 | Newton-Raphson; requires first and second derivatives  |
| NewtonTrustRegion  |  NwTst  |  2nd  |  50 | like Newton-Raphson, but limits the step size  |

* The time estimates are for the sample polynomial function.  Optimizers will be faster or slower on other problems.  The timings just convey orders of magnitude.

* SAMIN (and the univariate optimizers) requires box constraints.  The others do not.

* All optimizers use their defaults, a `rand(2)` starting point, no user-provided symbolic gradients, and no bounds (except SAMIN, which uses -100 to +100).

#### Showing Progress

* Use `show_trace=true` as an option.

### Providing Derivatives (Gradients and Hessians) To Optimizers

You may never need to provide them.  `optimize()` will use numeric differentiation if the optimizer determines that it needs any.  Numeric derivatives tend to work quite well and are worry-free.  The drawback is that they require extra function evaluations.  If function evaluation is costly, here is what you can do.

(In most of the examples, `z2` is a starting value, which you can set to `zeros(2)` or `rand(2)`.)
`optimize()` accepts as arguments any of the following:

1. Direct provision of functions:

   1. a plain function, `optimize( myfun, z2 )`;

   2. a plain function with a user-provided symbolic gradient, `optimize( myfun, myfun_gradient, z2 )`;

   3. a plain function with a user-provided symbolic gradient and Hessian, `optimize( myfun, myfun_gradient, myfun_hessian, z2 )`.

2. Indirect provision via a packaged object:

   1. a `OnceDifferentiable` object, which holds the function and the gradient.  It can be created by automatic differentiation or by passing a user-provided f into `OnceDifferentiable`;

   2. a `TwiceDifferentiable` object, which holds the function, its gradient, and its Hessian.
      It can be created by automatic differentiation or by passing a user-provided f and g into `TwiceDifferentiable`.

Again, `optimize` falls back on numeric differencing for derivatives by default, so providing derivatives is always optional.  Numeric differencing is recommended when function evaluation itself is quick.

If you do need exact derivatives, perhaps because they can occasionally be more accurate, or because evaluating the function (and thus differencing it) is expensive, you have two choices:

1. You can code the symbolic derivatives in Julia.

2. You can use automatic differentiation, as in `optimize( TwiceDifferentiable(f, zeros(n); autodiff = :forward), zeros(n), BFGS(), Optim.Options() )`.  However, automatic differentiation works by overloading Julia's elementary math, which can also slow down performance.

* WARNING *There are some common gotchas when coding your own gradients and Hessians.  If you want to provide them, please read the appendix section [[#providing_derivatives_to_optimizers]] first.*

### Box Constraints

All optimizers in Optim allow box constraints.  See the [[#optimization_with_box_constraints|example below]].

### Input Options

* Setting termination conditions: `x_tol=1e-32`, `f_tol=1e-32`, `g_tol=1e-8`, `f_calls_limit=0`, `g_calls_limit=0`, `h_calls_limit=0`, `iterations=1_000`.

* Setting verbose printouts and intervention: `store_trace=false`, `show_trace=false`, `extended_trace=false`, `show_every=0`, `callback=function`, `time_limit=NaN`.

* Options can be given as `optimize(..., Optim.Options(g_tol=1e-12))` or `optimize(..., g_tol=1e-12)`.
* Oddly, there is no input option to change from minimization to maximization.

### Output Overview

#### Both Univariate and Multivariate Results

^ **Call**  ^  **Sample**  ^ **Notes**  ^
| `summary(res)`  |  "SAMIN"  | method, usually as input  |
| `Optim.minimizer(res)`  |  [ 2.0, 3.0 ]  | the optimal value  |
| `Optim.minimum(res)`  | 3.534e-13 | function value at the optimum  |
| `Optim.iterations(res)` |  3451  | number of iterations  |
| `Optim.iteration_limit_reached(res)` |  false  | obvious  |
| `Optim.trace(res)` |  n/a  | only retained if `store_trace=true` in `optimize`  |
| `Optim.x_trace(res)` |  n/a  | only retained if `store_trace=true` in `optimize`  |
| `Optim.f_trace(res)` |  n/a  | only retained if `store_trace=true` in `optimize`  |
| `Optim.f_calls(res)` |  3451  | number of function calls  |
| `Optim.converged(res)` |  false  | result  |

#### Only Univariate Results

^ **Call**  ^  **Sample**  ^ **Notes**  ^
| `Optim.lower_bound(res)`  |  -100  | lower bound, usually as input  |
| `Optim.upper_bound(res)`  |  100  | upper bound, usually as input  |
| `Optim.x_lower_trace(res)`  |  | only retained if `store_trace=true` in `optimize`  |
| `Optim.x_upper_trace(res)`  |  | only retained if `store_trace=true` in `optimize`  |
| `Optim.rel_tol(res)`  |  | only retained if `store_trace=true` in `optimize`  |
| `Optim.abs_tol(res)` |  | only retained if `store_trace=true` in `optimize`  |

#### Only Multivariate Results

^ **Call**  ^  **Sample**  ^ **Notes**  ^
| `Optim.g_norm_trace(res)`  | n/a | only retained if `store_trace=true` in `optimize`  |
| `Optim.g_calls(res)`  | 0 | only retained if `store_trace=true` in `optimize`  |
| `Optim.x_converged(res)`  | false | only retained if `store_trace=true` in `optimize`  |
| `Optim.f_converged(res)`  | false | only retained if `store_trace=true` in `optimize`  |
| `Optim.g_converged(res)`  | false | only retained if `store_trace=true` in `optimize`  |
| `Optim.initial_state(res)`  | [0.0,0.0] | user input  |

## Gradient-Free Optimizers

### Nelder-Mead Simplex (Amoeba)

Nelder-Mead Simplex is (sort of) the multi-dimensional analog of the one-dimensional golden-section search.  It is *not* a true equivalent, though: golden-section search brackets a minimum and is guaranteed to find it, while Nelder-Mead offers no such guarantee.  There are beautiful and intuitive illustrations at [Wikipedia](https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method).  The best description is that the simplex looks less like an amoeba contracting and more like an amoeba crawling.

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0, 0.0], NelderMead(), Optim.Options(iterations=100_000) )
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000267077758536,3.0000220787571954]
 * Minimum: 1.200777e-09
 * Iterations: 49
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
 * Reached Maximum Number of Iterations: false
 * Objective Calls: 99

```

### Simulated Annealing

Simulated Annealing (in the Metropolis-Hastings spirit) is a form of Markov-chain Monte-Carlo search: it takes random probabilistic guesses, and concentrates the search over time as the lowest value(s) found so far look increasingly good.

WARNING This requires the master branch and not the release, so use `Pkg.checkout("Optim")`.  (Is there a version without bounds?)
```juliarepl
julia> using Optim

julia> using Random; Random.seed!(0);

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, fill(-100.0,2), fill(+100.0,2), [0.0,0.0], SAMIN(), Optim.Options(iterations=100_000) )
================================================================================
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 22101

Obj. value:      0.0000000000

parameter      search width
2.00000           0.00000
3.00000           0.00000
================================================================================

Results of Optimization Algorithm
 * Algorithm: SAMIN
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000006088701,2.999999998587848]
 * Minimum: 3.906645e-17
 * Iterations: 22101
 * Convergence: false
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = NaN
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = NaN |f(x)|
   * |g(x)| ≤ 0.0e+00: false
     |g(x)| = NaN
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 22101
 * Gradient Calls: 0

```

### Particle Swarm

Particle Swarm moves a population of candidate points through the search space, probabilistically switching among search heuristics.
```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> using Random; Random.seed!(0);

julia> optimize( myfun, [0.0,0.0], ParticleSwarm(), Optim.Options(iterations=100_000))
Results of Optimization Algorithm
 * Algorithm: Particle Swarm
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0,3.0]
 * Minimum: 0.000000e+00
 * Iterations: 100000
 * Convergence: false
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = NaN
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = NaN |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = NaN
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: true
 * Objective Calls: 400000
 * Gradient Calls: 0

```

## Gradient-Based Hessian-Free Optimizers

### Steepest GradientDescent

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], GradientDescent())
Results of Optimization Algorithm
 * Algorithm: Gradient Descent
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0,3.0]
 * Minimum: 0.000000e+00
 * Iterations: 6
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.52e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = Inf |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 2.22e-16
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 19
 * Gradient Calls: 19
```

### Conjugate Gradient Descent

Conjugate gradient usually avoids the excessive zig-zagging of steepest gradient descent:

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], ConjugateGradient())
Results of Optimization Algorithm
 * Algorithm: Conjugate Gradient
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000000000004,3.0000000000000004]
 * Minimum: 1.617165e-29
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.68e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.77e+15 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 7.77e-15
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 15
 * Gradient Calls: 9
```

### BFGS

BFGS is the classic quasi-Newton workhorse, which internally approximates the second derivative from how the gradients have changed across iterations.

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000000000021,3.0000000000000027]
 * Minimum: 4.427482e-28
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.75e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 7.04e+13 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 4.15e-14
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 21
 * Gradient Calls: 21
```

The limited-memory version is nearly as good, requiring just one extra call.
```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], LBFGS())
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [1.999999999999605,2.9999999999999294]
 * Minimum: 1.610245e-25
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 4.53e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.32e+12 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 7.90e-13
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 22
 * Gradient Calls: 22
```

## Hessian-Based Optimizers

### Newton-Raphson

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], Newton())
Results of Optimization Algorithm
 * Algorithm: Newton's Method
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000000000006297,3.0000000000166733]
 * Minimum: 2.783965e-22
 * Iterations: 5
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 6.05e-04
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.32e+15 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 3.33e-11
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 15
 * Gradient Calls: 15
 * Hessian Calls: 5
```

Trust regions limit the step size to something sensible (i.e., so that Newton does not step out of bounds so easily).
```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], NewtonTrustRegion())
Results of Optimization Algorithm
 * Algorithm: Newton's Method (Trust Region)
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0,3.0]
 * Minimum: 0.000000e+00
 * Iterations: 8
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 8.53e-09
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = Inf |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 2.22e-16
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 9
 * Gradient Calls: 9
 * Hessian Calls: 7
```

## Optimization With Box Constraints

The call arguments of `optimize` change when box constraints are included.  Instead of passing a call that constructs the optimizer object, such as `NelderMead()`, one passes `Fminbox{NelderMead}()`.  (The first calls a function that returns an optimizer object, which the user then passes into `optimize`; the second passes along the name of the optimizer as a type parameter.)

NelderMead() ⟶ Fminbox{NelderMead}()

The second and third arguments of `optimize` then have to be the lower and upper bounds.  The `Optim.Options()` argument seems to be inconsistent with `Fminbox` (and may ruin the entire optimization, with or without an error)!  The initial value has to be inside the box.
```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> initial_x= [ 0.0, 0.0 ];

julia> optimize(myfun, initial_x, NelderMead() )               ## without the box constraint
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000267077758536,3.0000220787571954]
 * Minimum: 1.200777e-09
 * Iterations: 49
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
 * Reached Maximum Number of Iterations: false
 * Objective Calls: 98

julia> lower= [ -999.9, -999.9 ]; upper= [ 999, 2.9 ];

julia> results = optimize(myfun, lower, upper, initial_x)      ## box-constrained form: x[2] is capped at 2.9
┌ Warning: Linesearch failed, using alpha = 2.570171260006581 and exiting optimization.
│ The linesearch exited with message:
│ Linesearch failed to converge, reached maximum iterations 50.
└ @ Optim ~/.julia/packages/Optim/ULNLZ/src/utilities/perform_linesearch.jl:47
Results of Optimization Algorithm
 * Algorithm: Fminbox with L-BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000000000004565,2.8999999999999995]
 * Minimum: 1.000000e-02
 * Iterations: 8
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: true
     |x - x'| = 0.00e+00
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = 2.00e-01
   * Stopped by an increasing objective: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 1425
 * Gradient Calls: 1425

```

```juliarepl

julia> using Optim

julia> f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
f (generic function with 1 method)

julia> lower = [1.25, -2.1]
2-element Array{Float64,1}:
  1.25
 -2.1

julia> upper = [Inf, Inf]
2-element Array{Float64,1}:
 Inf
 Inf

julia> initial_x = [2.0, 2.0]
2-element Array{Float64,1}:
 2.0
 2.0

julia> inner_optimizer = GradientDescent()
GradientDescent{LineSearches.InitialPrevious{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,getfield(Optim, Symbol("##12#14"))}(LineSearches.InitialPrevious{Float64}
  alpha: Float64 1.0
  alphamin: Float64 0.0
  alphamax: Float64 Inf
, LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}
  delta: Float64 0.1
  sigma: Float64 0.9
  alphamax: Float64 Inf
  rho: Float64 5.0
  epsilon: Float64 1.0e-6
  gamma: Float64 0.66
  linesearchmax: Int64 50
  psi3: Float64 0.1
  display: Int64 0
  mayterminate: Base.RefValue{Bool}
, nothing, getfield(Optim, Symbol("##12#14"))(), Flat())

julia> results = optimize(f, lower, upper, initial_x, Fminbox(inner_optimizer))
Results of Optimization Algorithm
 * Algorithm: Fminbox with Gradient Descent
 * Starting Point: [2.0,2.0]
 * Minimizer: [1.2500000000000002,1.5626480963230658]
 * Minimum: 6.250219e-02
 * Iterations: 11
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 2.22e-16
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = 4.26e-01
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 818877
 * Gradient Calls: 818877

```

* WARNING In Julia 0.6.2 with Optim 0.14.1, adding `Optim.Options()` seems to be incompatible with `Fminbox`.  `optimize(myfun, [ 50.0, 50.0 ], fill(6.0, 2), fill(100.0, 2), Fminbox{NelderMead}(), Optim.Options(iterations=200000))` fails.

## Passing Along Non-Optimization Parameters to User Function

Refer to [the autodiff notes](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/algo/autodiff.md).
You can either use global variables or, more elegantly, a version of the following (here, with the two fixed parameter vectors `p` and `m` closed over):

```juliarepl
julia> using Optim

julia> msumdeltas(x::Vector{Float64}, p::Vector{Float64}, m::Vector{Float64})::Float64 = sum( m.*(x .- p).^2 );

julia> optimize( xf->msumdeltas( xf, [ 12.0, 14.0 ], [ 1.0, 1.0 ] ),  [ 0.0, 0.0 ] )
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [12.000018215395523,14.000051397940497]
 * Minimum: 2.973549e-09
 * Iterations: 62
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
 * Reached Maximum Number of Iterations: false
 * Objective Calls: 120

```

## Univariate Function Optimization

There are two more optimizers for univariate cases, `Brent` and `GoldenSection`.  To use them, omit the starting point and provide brackets instead.

```juliafixme
julia> using Optim

julia> optimize( x->(3*(x-4)^2), -100, +100, Brent() )
Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [-100.000000, 100.000000]
 * Minimizer: 4.000000e+00
 * Minimum: 0.000000e+00
 * Iterations: 5
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 6

julia> optimize( x->(3*(x-4)^2), -100, +100, GoldenSection() )
Results of Optimization Algorithm
 * Algorithm: Golden Section Search
 * Search Interval: [-100.000000, 100.000000]
 * Minimizer: 4.000000e+00
 * Minimum: 6.668724e-16
 * Iterations: 44
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 45

```

## More Function Examples and Comparisons

* There are many more tests and benchmark results in [[optim-results]].

* There are useful [optimization tips and tricks](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/user/tipsandtricks.md).
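The comparisons in the overview table can be reproduced in miniature by looping several optimizers over the chapter's sample polynomial from one common starting point.  This is only a sketch, assuming the Optim 0.17 API used throughout this chapter; exact call counts and minima will vary by version:

```julia
using Optim

## the chapter's sample polynomial, with its minimum at [2, 3]
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2

## run a derivative-free and several first-order optimizers from the same start
for method in ( NelderMead(), GradientDescent(), ConjugateGradient(), BFGS(), LBFGS() )
    res= optimize( myfun, [0.0, 0.0], method )
    println( summary(res), ": f(x*) = ", Optim.minimum(res),
             " after ", Optim.f_calls(res), " objective calls" )
end
```

Because no gradients are supplied, the first-order methods here silently fall back on numeric differencing, as described earlier.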
# Backmatter

## Useful Packages on Julia Repository

* [Optim](http://julianlsolvers.github.io/Optim.jl/latest/) is a pure Julia implementation, discussed in this chapter.

* [NLopt](https://nlopt.readthedocs.io/en/latest/) is an implementation that mixes other languages and Julia.

* [Blackbox](https://github.com/robertfeldt/BlackBoxOptim.jl) focuses on gradient-free global optimization, with support for parallelism.

## Notes

## References

- [JuliaOpt](http://www.juliaopt.org/)

- [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) first package

- [Convex](https://github.com/JuliaOpt/Convex.jl)

- [NLopt](https://github.com/JuliaOpt/NLopt.jl)

- [Optimization Tips and Tricks](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/user/tipsandtricks.md)

- DataFitting.jl

- [tpapp/ContinuousTransformations.jl](https://github.com/tpapp/ContinuousTransformations.jl) handles transforms like -inf < log(y-1) < infinity.

- There is an interesting [Turing](https://github.com/yebai/Turing.jl/wiki/Get-started) package for modeling and inference in stochastic environments.  Alas, as of May 2018, it was non-functional.  It is a work in progress.

# Providing Derivatives To Optimizers

The base example is still the two-dimensional quadratic polynomial with minimum `myfun([2,3])=0` and the following exact gradient and Hessian:

```julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 )
```

A complete copy-and-paste example can be found towards the end of this section.

---

Some optimization algorithms benefit from or require gradient and Hessian information, i.e., first and second derivatives.  Julia supports three ways for optimizers to obtain them:

1.
   working out and providing symbolic derivatives in Julia function form;

2. letting a built-in exact differentiator take care of it (explained below);

3. using a built-in numerical differencing method.

The derivatives can be passed into a `*Differentiable` object, which in turn can be passed into the optimizing functions.  The optimizers know how to use these objects (picking off the derivatives or ignoring them).  If only the objective function itself is passed and an optimizer requires a gradient, `optimize` falls back on numerical differencing.  If the objective function and gradients and/or Hessians are passed, `optimize` can use them.

## User-Provided Symbolic Derivatives

The syntax by which users pass symbolic derivatives is unusual.  The gradient and Hessian functions that the user passes in take a storage array as their first argument, which acts as a cache that is overwritten upon each call.  This is for speed.  The example here keeps calling the symbolic `myfun()`, `myfungradients()`, and `myfunhessian()` for clarity.  For even more speed, their contents should not be called but coded directly into `f()`, `g!()`, and `h!()`.

WARNING: Pay careful attention to how `g!()` and `h!()` are defined, and read the description at the end of the example.

```juliarepl

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ];

julia> myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 );

julia> function f(x) myfun(x) end;#function

julia> function g!(G,x) G .= myfungradients(x) end;#function   ## must mutate G in place; must not rebind with G=

julia> function fg!(G,x) g!(G,x); f(x) end;#function   ## fg! computes both at once, which can shorten calculations
julia> function h!(H,x) Hcp=myfunhessian(x); for i=1:length(x), j=1:length(x); H[i,j]= Hcp[i,j]; end; end;#function

julia> z2= zeros(2);

julia> using Optim           ## needed

julia> optimize( f, g!, z2, BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000000000021,3.0000000000000027]
 * Minimum: 4.427482e-28
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.75e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 7.04e+13 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 4.17e-14
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 21
 * Gradient Calls: 21

```

WARNING It is very tempting to code the more elegant

```juliarepl
julia> function g!(G,x) G=myfungradients(x) end;#function
```

However, this is a bug.  The reason is that the function then merely rebinds its local name `G` to a new array, so the caller's `G` (outside `g!()`) never sees the changes.  Instead, the contents of `G` itself must be modified in place.  This can be done with

```juliarepl
julia> function g!(G,x); G .= myfungradients(x) end;#function   ## broadcast assignment mutates G in place

julia> function g!(G,x) copyto!( G, myfungradients(x) ) end;#function   ## also works
```

## Built-In Automatic Exact Differentiation

### ForwardDiff

[Automatic differentiation](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/algo/autodiff.md) is built around the `ForwardDiff.jl` package, which is pulled in by `Optim.jl`.  ForwardDiff is picky about the way the user function is defined.  In particular, although it understands an argument typed `Vector`, it does not understand `Vector{Float64}`.  (This can mess with dispatch precedence.)  ForwardDiff works by overloading *every* elementary math operation and function in Julia.
The math operations still return a numeric value (as always), but they also carry derivative information along internally, so that ForwardDiff can track the progress of the computation and apply the chain rule piece by piece.

```juliarepl
julia> using ForwardDiff

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> ForwardDiff.gradient( myfun, [ 2.0, 3.0 ] )          ## at the optimum, the gradient is zero in both directions
2-element Array{Float64,1}:
 0.0
 0.0

```

WARNING Do not type-constrain the input arguments (e.g., to `Float64`) of the user function that is to be optimized.  This messes up the dispatch.

### Kinks in ForwardDiff

Kinks are allowed, but at a kink ForwardDiff does not look backward; it just looks forward.  Thus, it gives the mathematically dubious result:

```juliarepl
julia> using ForwardDiff

julia> sumabsf(x)= sum(abs.(x))
sumabsf (generic function with 1 method)

julia> ForwardDiff.gradient(sumabsf, [ 0.0 ] )              ## warning --- mathematically strange
1-element Array{Float64,1}:
 1.0

julia> ForwardDiff.gradient(sumabsf, [ -0.0 ] )             ## yes, Julia has a negative 0.0
1-element Array{Float64,1}:
 -1.0
```

### ForwardDiff Interface to Optim: `*Differentiable`

Optim provides two convenience constructors that hold the useful differentials (in the same form as described above).  Although the end user rarely uses them to obtain gradients and Hessians directly, we can show how the optimizers do it.
```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;          ## could be x::Vector, but not x::Vector{Float64}

julia> od = OnceDifferentiable(myfun, [NaN,NaN]; autodiff = :forward);     ## [NaN,NaN] is needed to convey the dimensionality of myfun

julia> myfun( [2.0,3.0] ) == od.f( [2.0, 3.0] )                            ## confirming that it copied the function
true

julia> od.df( [NaN,NaN], [2.0, 3.0] )                                      ## first argument is ignored.
2-element Array{Float64,1}:
 0.0
 0.0

julia> td = TwiceDifferentiable(myfun, [ NaN, NaN ]; autodiff = :forward); ## works analogously

```

* To force Optim not to use differentials (even numeric ones), define `nd = NonDifferentiable(fun, z2)`.

* The `od` and `td` container objects hold both the function and its derivatives, and can be passed to the optimization functions in lieu of the function itself.

* Again, there are type constraints on the user function.  `myfun(x::Vector{Float64})= ...` does not work, because `Float64` is too specific and steals the dispatch from the `ForwardDiff.jl` differentiator.

## Built-In Numeric Differentiation

Simply do not provide the gradient and/or Hessian to `optimize()`, and Optim.jl will by itself fall back on `DiffEqDiffTools.jl`.

## The Complete Copy-Paste Inputs

```julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]
myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 )

## for speed, you should code the above directly into the below.

function f(x) myfun(x) end#function
function g!(G,x) G .= myfungradients(x) end#function  ## must not use G=, but G.= to replace the external G contents
function fg!(G,x) g!(G,x); f(x) end#function          ## fg! computes both at once, which can shorten calculations
function h!(H,x) H .= myfunhessian(x) end#function    ## must not use H=, but H.= to replace the external H contents

z2= zeros(2);

using Optim           ## henceforth needed

## show optimize use via function call
optimize( f, g!, z2, BFGS())

## show optimize use via *Differentiable objects

nd = NonDifferentiable( f, z2 )
od = OnceDifferentiable( f, g!, z2 )
td = TwiceDifferentiable( f, g!, h!, z2 )

## show how the optimizer functions will internally pick off results

println( ( td.f( [2.0, 2.0] ), td.df( [NaN,NaN], [2.0, 2.0] ), td.h( [NaN NaN;NaN NaN], [2.0, 2.0] ) ) )
println( ( od.f( [2.0, 2.0] ), od.df( [NaN,NaN], [2.0, 2.0] ) ) )

## show optimize use via object

optimize( td, z2, BFGS())
optimize( od, z2, BFGS())

```

WARNING Do not forget the `.=` inside the `g!()` and `h!()` functions!
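As a final safeguard against the `g!()` and `h!()` gotchas, a hand-coded symbolic gradient can be checked against plain central differences before it is handed to `optimize`.  This is only a sketch in base Julia; `centraldiff` is a hypothetical helper name introduced here, not part of Optim:

```julia
## the chapter's sample polynomial and its hand-coded symbolic gradient
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2),  2*(1 + (x[1]-2)^2)*(x[2]-3) ]

## centraldiff (hypothetical helper): symmetric-difference gradient, O(h^2) error
function centraldiff(f, x; h=1e-6)
    g= similar(x)
    for i in eachindex(x)
        e= zeros(length(x)); e[i]= h
        g[i]= ( f(x .+ e) - f(x .- e) ) / (2h)
    end
    g
end

## compare symbolic and numeric gradients at a few points
for x in ( [0.0, 0.0], [2.0, 3.0], [5.0, -1.0] )
    @assert isapprox( myfungradients(x), centraldiff(myfun, x); atol=1e-4 )
end
println("symbolic gradient matches central differences")
```

If the assertion fails, the symbolic gradient (not the differencing) is almost always the culprit.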
optim.txt · Last modified: 2018/12/27 14:37 (external edit)
