optim [2018/12/27 14:37]

~~CLOSETOC~~

~~TOC 1-3 wide~~

---

^ [[roots|Roots and Solving]] ^ Compact Optimization ^ [[JuMP|Linear and Squared Optimization]] ^

---

```juliarepl
julia> pkgchk.( [ "julia" => v"1.0.3", "Optim" => v"0.17.1" ] );  ## "ForwardDiff" => nothing

```

# Optimization With Optim.jl

* [Optim](http://julianlsolvers.github.io/Optim.jl/latest/) is a pure Julia implementation, discussed in this chapter. It is one of my favorite Julia packages: native, efficient, and maintained.

Most of the examples in this chapter minimize the two-dimensional quadratic polynomial

```julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
```

It has an obvious minimum at `[2,3]`, and it is the kind of smooth, well-behaved function around which many optimizers were originally designed. [[optim-results]] discusses more difficult test functions.
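As a quick sanity check (my own sketch, not part of the chapter's benchmarks), the function value at `[2,3]` is exactly zero and strictly positive anywhere nearby:

```julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

## the value at the known minimum is exactly zero...
@assert myfun( [2.0, 3.0] ) == 0.0

## ...and perturbations in several directions are strictly worse
for d in ( [0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.1, -0.1] )
    @assert myfun( [2.0, 3.0] .+ d ) > 0.0
end
```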

IMPORTANT: See [[optim-results]] for some benchmarks.

## Overview of Optimization and Algorithms

### Available Multivariate Optimization Algorithms

^ **Name** ^ **Abbrev** ^ **Derivs?** ^ **~Time** ^ **Notes** ^
| Nelder-Mead Simplex | NM | No | 31 | works on nonsmooth (and possibly even discontinuous?) functions |
| Simulated Annealing | SAMIN | No | 4,700 | searches for **global** optima, but is slow |
| Particle Swarm | Prtcl | No | 333,000 | switches among heuristics in search of a **global** optimum, but is really slow |
| GradientDescent | GrdDsct | 1st | 19 | often a bad idea due to excessive zig-zagging |
| ConjugateGradient | CnjgGrd | 1st | 20 | often better than gradient descent |
| BFGS | BFGS | 1st | 40 | a quasi-Newton method which cleverly extrapolates an "implied" second derivative from previous turns |
| LBFGS | LBFGS | 1st | 29 | same, but with limited memory |
| Newton | Nwtn | 2nd | 31 | Newton-Raphson requires first and second derivatives |
| NewtonTrustRegion | NwTst | 2nd | 50 | like Newton-Raphson, but limits the step size |

* The time estimates are for the sample polynomial function; optimizers will be faster or slower on other problems. The timings are only meant to give an idea of orders of magnitude.

* SAMIN (and the univariate optimizers) requires box constraints. The others do not.

* All optimizers use their defaults, a `rand(2)` starting point, no user-provided symbolic gradients, and no bounds (except SAMIN, which uses -100 to +100).

#### Showing Progress

* Use `show_trace` as an option.
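For example (a minimal sketch, assuming the `myfun` defined above; the specific option values are just illustrative):

```julia
using Optim

myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

## print the trace every 10th iteration; store_trace=true additionally
## keeps the trace inside the result object for later inspection
res= optimize( myfun, [0.0, 0.0], NelderMead(),
               Optim.Options( show_trace=true, show_every=10, store_trace=true ) );
```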

### Providing Derivatives (Gradients and Hessians) To Optimizers

You may never need to provide them. `optimize()` will use numeric differentiation if the optimizer determines that it needs any. Numeric derivatives tend to work quite well and are worry-free. The drawback is that they require extra function evaluations. If function evaluation is costly, here is what you can do.

(In most of the examples, `z2` is a starting value, which you can set to `zeros(2)` or `rand(2)`.)

`optimize()` allows as arguments any of the following:

1. Direct provision of functions:

  1. a plain function, `optimize( myfun, z2 )`.

  2. a plain function with a user-provided symbolic gradient, `optimize( myfun, myfun_gradient, z2 )`.

  3. a plain function with a user-provided symbolic gradient and Hessian, `optimize( myfun, myfun_gradient, myfun_hessian, z2 )`.

2. Indirect provision via a packaged object:

  1. a `OnceDifferentiable` object, which holds the function and its gradient. It can be created by automatic differentiation or by passing a user-provided f into `OnceDifferentiable`.

  2. a `TwiceDifferentiable` object, which holds the function, its gradient, and its Hessian. It can be created by automatic differentiation or by passing a user-provided f and g into `TwiceDifferentiable`.

Again, `optimize` will use numeric differencing for derivatives by default, so providing derivatives is always optional. Numeric differencing is recommended when function evaluation itself is quick.

If you do need exact derivatives, perhaps because they can occasionally be more accurate, or because calculating the function (and thus the differences) is expensive, you have two choices:

1. You can code the symbolic derivatives into Julia.

2. You can use automatic differentiation, such as ` optimize( TwiceDifferentiable(f, zeros(n); autodiff = :forward), zeros(n), BFGS(), Optim.Options() ) `. However, automatic differentiation works by overloading Julia's elementary math, which can also slow down performance.

* WARNING *There are some common gotchas when coding your own gradients and Hessians. If you want to provide them, please read the appendix section [[#providing_derivatives_to_optimizers]] first.*

### Box Constraints

All optimizers in Optim allow box constraints. See the [[#optimization_with_box_constraints|example below]].

### Input Options

* Setting termination conditions: `x_tol=1e-32`, `f_tol=1e-32`, `g_tol=1e-8`, `f_calls_limit=0`, `g_calls_limit=0`, `h_calls_limit=0`, `iterations=1_000`.

* Setting verbose printouts and intervention: `store_trace=false`, `show_trace=false`, `extended_trace=false`, `show_every=0`, `callback=function`, `time_limit=NaN`.

* Options can be given as `optimize(..., Optim.Options(g_tol=1e-12))` or `optimize(..., g_tol=1e-12)`.

* Oddly, there is no input option to change from minimization to maximization.
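The standard workaround is to minimize the negated function instead (my own sketch; `g` is a hypothetical objective, not part of Optim):

```julia
using Optim

## to maximize g, minimize x -> -g(x); the minimizer of the negation is the
## maximizer of g, and the maximum is the negated minimum
g(x)= -( (x[1]-2)^2 + (x[2]-3)^2 );              ## has a maximum of 0 at [2,3]

res= optimize( x -> -g(x), [0.0, 0.0], NelderMead() );
xmax= Optim.minimizer(res);      ## approximately [2.0, 3.0], the argmax of g
gmax= -Optim.minimum(res);       ## approximately 0.0, the maximum of g
```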

### Output Overview

#### Both Univariate and Multivariate Results

^ **Call** ^ **Sample** ^ **Notes** ^
| `summary(res)` | "SAMIN" | method name, usually as input |
| `Optim.minimizer(res)` | [ 2.0, 3.0 ] | the optimal value |
| `Optim.minimum(res)` | 3.534e-13 | function value at the optimum |
| `Optim.iterations(res)` | 3451 | number of iterations |
| `Optim.iteration_limit_reached(res)` | false | whether the iteration limit was hit |
| `Optim.trace(res)` | n/a | only retained if `store_trace=true` in `optimize` |
| `Optim.x_trace(res)` | n/a | only retained if `store_trace=true` in `optimize` |
| `Optim.f_trace(res)` | n/a | only retained if `store_trace=true` in `optimize` |
| `Optim.f_calls(res)` | 3451 | number of function calls |
| `Optim.converged(res)` | false | whether the optimizer converged |

#### Only Univariate Results

^ **Call** ^ **Sample** ^ **Notes** ^
| `Optim.lower_bound(res)` | -100 | lower bound, usually as input |
| `Optim.upper_bound(res)` | 100 | upper bound, usually as input |
| `Optim.x_lower_trace(res)` | | only retained if `store_trace=true` in `optimize` |
| `Optim.x_upper_trace(res)` | | only retained if `store_trace=true` in `optimize` |
| `Optim.rel_tol(res)` | | relative tolerance, usually as input |
| `Optim.abs_tol(res)` | | absolute tolerance, usually as input |

#### Only Multivariate Results

^ **Call** ^ **Sample** ^ **Notes** ^
| `Optim.g_norm_trace(res)` | n/a | only retained if `store_trace=true` in `optimize` |
| `Optim.g_calls(res)` | 0 | number of gradient calls |
| `Optim.x_converged(res)` | false | whether the x-tolerance criterion was met |
| `Optim.f_converged(res)` | false | whether the f-tolerance criterion was met |
| `Optim.g_converged(res)` | false | whether the gradient-tolerance criterion was met |
| `Optim.initial_state(res)` | [0.0,0.0] | user input |
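Putting a few of these accessors to work (a sketch; the exact figures vary by run and by method):

```julia
using Optim

myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

res= optimize( myfun, [0.0, 0.0], BFGS() );

Optim.converged(res)                           ## this easy problem converges
Optim.minimizer(res)                           ## approximately [2.0, 3.0]
Optim.minimum(res)                             ## approximately 0.0
( Optim.f_calls(res), Optim.g_calls(res) )     ## call bookkeeping
```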

## Gradient-Free Optimizers

### Nelder-Mead Simplex (Amoeba)

Nelder-Mead Simplex is (sort of) the multi-dimensional analog of the golden-section search in one dimension, although, unlike golden-section search, it does not bracket a minimum and is not guaranteed to find one. There are beautiful and intuitive illustrations at [Wikipedia](https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method). Perhaps the best description is that Nelder-Mead looks less like an amoeba contracting and more like an amoeba crawling.

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0, 0.0], NelderMead(), Optim.Options(iterations=100_000) )
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000267077758536,3.0000220787571954]
 * Minimum: 1.200777e-09
 * Iterations: 49
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 99

```

### Simulated Annealing

Simulated Annealing (here SAMIN, with Metropolis-Hastings-style acceptance) is a form of Markov-Chain Monte-Carlo search: random candidate guesses are accepted probabilistically, with less and less randomness later on, once the lowest value(s) found so far already seem pretty good.

WARNING This requires the master branch and not the release, so use `Pkg.checkout("Optim")`. Is there a version without bounds??

```juliarepl
julia> using Optim

julia> using Random; Random.seed!(0);

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, fill(-100.0,2), fill(+100.0,2), [0.0,0.0], SAMIN(), Optim.Options(iterations=100_000) )
================================================================================
SAMIN results
==> Normal convergence <==
total number of objective function evaluations: 22101

Obj. value:      0.0000000000

       parameter      search width
         2.00000           0.00000
         3.00000           0.00000
================================================================================

Results of Optimization Algorithm
 * Algorithm: SAMIN
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000006088701,2.999999998587848]
 * Minimum: 3.906645e-17
 * Iterations: 22101
 * Convergence: false
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = NaN
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = NaN |f(x)|
   * |g(x)| ≤ 0.0e+00: false
     |g(x)| = NaN
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 22101
 * Gradient Calls: 0

```

### Particle Swarm

Particle Swarm probabilistically switches among search heuristics as it explores.

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> using Random; Random.seed!(0);

julia> optimize( myfun, [0.0,0.0], ParticleSwarm(), Optim.Options(iterations=100_000))
Results of Optimization Algorithm
 * Algorithm: Particle Swarm
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0,3.0]
 * Minimum: 0.000000e+00
 * Iterations: 100000
 * Convergence: false
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = NaN
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = NaN |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = NaN
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: true
 * Objective Calls: 400000
 * Gradient Calls: 0

```

## Gradient-Based Hessian-Free Optimizers

### Steepest GradientDescent

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], GradientDescent())
Results of Optimization Algorithm
 * Algorithm: Gradient Descent
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0,3.0]
 * Minimum: 0.000000e+00
 * Iterations: 6
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.52e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = Inf |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 2.22e-16
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 19
 * Gradient Calls: 19
```

### Conjugate Gradient Descent

Usually avoids the excessive zig-zagging of steepest gradient descent:

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], ConjugateGradient())
Results of Optimization Algorithm
 * Algorithm: Conjugate Gradient
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000000000004,3.0000000000000004]
 * Minimum: 1.617165e-29
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.68e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.77e+15 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 7.77e-15
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 15
 * Gradient Calls: 9
```

### BFGS

The classic quasi-Newton workhorse, which internally approximates the second derivative from how the gradients have changed across iterations.

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000000000021,3.0000000000000027]
 * Minimum: 4.427482e-28
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.75e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 7.04e+13 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 4.15e-14
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 21
 * Gradient Calls: 21
```

The limited-memory version is nearly as good, requiring just one extra call.

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], LBFGS())
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [1.999999999999605,2.9999999999999294]
 * Minimum: 1.610245e-25
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 4.53e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.32e+12 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 7.90e-13
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 22
 * Gradient Calls: 22
```

## Hessian-Based Optimizers

### Newton-Raphson

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], Newton())
Results of Optimization Algorithm
 * Algorithm: Newton's Method
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000000000006297,3.0000000000166733]
 * Minimum: 2.783965e-22
 * Iterations: 5
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 6.05e-04
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.32e+15 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 3.33e-11
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 15
 * Gradient Calls: 15
 * Hessian Calls: 5
```

Trust regions limit the step size to something sensible (i.e., so that Newton does not step out of bounds so easily).

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> optimize( myfun, [0.0,0.0], NewtonTrustRegion())
Results of Optimization Algorithm
 * Algorithm: Newton's Method (Trust Region)
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0,3.0]
 * Minimum: 0.000000e+00
 * Iterations: 8
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 8.53e-09
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = Inf |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 2.22e-16
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 9
 * Gradient Calls: 9
 * Hessian Calls: 7
```

## Optimization With Box Constraints

The call arguments of `optimize` change when box constraints are included. Instead of passing the optimizer itself, such as `NelderMead()`, one wraps it in `Fminbox`:

NelderMead() ⟶ Fminbox( NelderMead() )

(In older versions of Optim, the syntax was `Fminbox{NelderMead}()`, i.e., the optimizer was passed along as a type parameter rather than as a constructed object.)

The second and third arguments of `optimize` then have to be the lower and upper bounds. The `Optim.Options()` argument seems to be inconsistent with `Fminbox` (and may mess up the entire optimization, with or without an error)! The initial value has to be inside the box.

```juliarepl
julia> using Optim

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> initial_x= [ 0.0, 0.0 ];

julia> optimize(myfun, initial_x, NelderMead() ) ## without the box constraint
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000267077758536,3.0000220787571954]
 * Minimum: 1.200777e-09
 * Iterations: 49
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 98

julia> lower= [ -999.9, -999.9 ]; upper= [ 999, 2.9 ];

julia> results = optimize(myfun, lower, upper, initial_x) ## box constraint form: x[2] is capped at 2.9
┌ Warning: Linesearch failed, using alpha = 2.570171260006581 and exiting optimization.
│ The linesearch exited with message:
│ Linesearch failed to converge, reached maximum iterations 50.
└ @ Optim ~/.julia/packages/Optim/ULNLZ/src/utilities/perform_linesearch.jl:47
Results of Optimization Algorithm
 * Algorithm: Fminbox with L-BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.0000000000004565,2.8999999999999995]
 * Minimum: 1.000000e-02
 * Iterations: 8
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: true
     |x - x'| = 0.00e+00
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = 2.00e-01
   * Stopped by an increasing objective: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 1425
 * Gradient Calls: 1425

```

```juliarepl

julia> using Optim

julia> f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
f (generic function with 1 method)

julia> lower = [1.25, -2.1]
2-element Array{Float64,1}:
  1.25
 -2.1

julia> upper = [Inf, Inf]
2-element Array{Float64,1}:
 Inf
 Inf

julia> initial_x = [2.0, 2.0]
2-element Array{Float64,1}:
 2.0
 2.0

julia> inner_optimizer = GradientDescent()
GradientDescent{LineSearches.InitialPrevious{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,getfield(Optim, Symbol("##12#14"))}(LineSearches.InitialPrevious{Float64}
  alpha: Float64 1.0
  alphamin: Float64 0.0
  alphamax: Float64 Inf
, LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}
  delta: Float64 0.1
  sigma: Float64 0.9
  alphamax: Float64 Inf
  rho: Float64 5.0
  epsilon: Float64 1.0e-6
  gamma: Float64 0.66
  linesearchmax: Int64 50
  psi3: Float64 0.1
  display: Int64 0
  mayterminate: Base.RefValue{Bool}
, nothing, getfield(Optim, Symbol("##12#14"))(), Flat())

julia> results = optimize(f, lower, upper, initial_x, Fminbox(inner_optimizer))
Results of Optimization Algorithm
 * Algorithm: Fminbox with Gradient Descent
 * Starting Point: [2.0,2.0]
 * Minimizer: [1.2500000000000002,1.5626480963230658]
 * Minimum: 6.250219e-02
 * Iterations: 11
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 2.22e-16
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = 4.26e-01
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 818877
 * Gradient Calls: 818877

```

* WARNING In Julia 0.6.2 with Optim 0.14.1, adding `Optim.Options()` seemed to be incompatible with `Fminbox`: `optimize(myfun, [ 50.0, 50.0 ], fill(6.0, 2), fill(100.0, 2), Fminbox{NelderMead}(), Optim.Options(iterations=200000))` fails.

## Passing Along Non-Optimization Parameters to User Function

Refer to [https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/algo/autodiff.md].

You can either use global variables or, more elegantly, a version of the following closure (here, two-plus-one parameters):

```juliarepl
julia> using Optim

julia> msumdeltas(x::Vector{Float64}, p::Vector{Float64}, m::Vector{Float64})::Float64 = sum( m.*(x .- p).^2 );

julia> optimize( xf->msumdeltas( xf, [ 12.0, 14.0 ], [ 1.0, 1.0 ] ), [ 0.0, 0.0 ] )
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0]
 * Minimizer: [12.000018215395523,14.000051397940497]
 * Minimum: 2.973549e-09
 * Iterations: 62
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 120

```

## Univariate Function Optimization

There are two more optimizers for univariate cases, `Brent` and `GoldenSection`. To use them, omit the starting point and provide the two interval brackets instead.

```juliafixme
julia> using Optim

julia> optimize( x->(3*(x-4)^2), -100, +100, Brent() )
Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [-100.000000, 100.000000]
 * Minimizer: 4.000000e+00
 * Minimum: 0.000000e+00
 * Iterations: 5
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 6

julia> optimize( x->(3*(x-4)^2), -100, +100, GoldenSection() )
Results of Optimization Algorithm
 * Algorithm: Golden Section Search
 * Search Interval: [-100.000000, 100.000000]
 * Minimizer: 4.000000e+00
 * Minimum: 6.668724e-16
 * Iterations: 44
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 45

```

## More Function Examples and Comparisons

* There are many more tests and benchmark results in [[optim-results]].

* There are useful [optimization tips and tricks](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/user/tipsandtricks.md).

# Backmatter

## Useful Packages on Julia Repository

* [Optim](http://julianlsolvers.github.io/Optim.jl/latest/) is a pure Julia implementation, discussed in this chapter.

* [NLopt](https://nlopt.readthedocs.io/en/latest/) is an implementation that mixes Julia and other languages.

* [BlackBoxOptim](https://github.com/robertfeldt/BlackBoxOptim.jl) focuses on non-gradient global optimization, with support for parallelism.

## Notes

## References

- [JuliaOpt](http://www.juliaopt.org/)

- [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl), the first package

- [Convex](https://github.com/JuliaOpt/Convex.jl)

- [NLopt](https://github.com/JuliaOpt/NLopt.jl)

- [Optimization Tips and Tricks](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/user/tipsandtricks.md)

- DataFitting.jl

- [tpapp/ContinuousTransformations.jl](https://github.com/tpapp/ContinuousTransformations.jl) handles transforms such as -inf < log(y-1) < +inf.

- There is an interesting [Turing](https://github.com/yebai/Turing.jl/wiki/Get-started) package for modeling and inference in stochastic environments. Alas, as of May 2018, it was non-functional. It is a work in progress.

# Providing Derivatives To Optimizers

The base example is still the two-dimensional quadratic polynomial with minimum `myfun([2,3])=0`, with the following exact gradient and Hessian:

```julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2), 2*(1 + (x[1]-2)^2)*(x[2]-3) ]
myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 )
```
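Symbolic derivatives are easy to get wrong, so it is worth checking them against central finite differences before handing them to an optimizer. A pure-Julia sketch of my own (the helper `numgrad` is not part of Optim):

```julia
myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;
myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2), 2*(1 + (x[1]-2)^2)*(x[2]-3) ];

## central-difference approximation of the gradient, component by component
function numgrad(f, x; h=1e-6)
    g= similar(x)
    for i in 1:length(x)
        xp= copy(x); xp[i]+= h
        xm= copy(x); xm[i]-= h
        g[i]= ( f(xp) - f(xm) ) / (2h)
    end
    g
end

x0= [ 0.7, -1.3 ];   ## an arbitrary test point away from the optimum
@assert maximum( abs.( myfungradients(x0) .- numgrad(myfun, x0) ) ) < 1e-5
```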

A complete copy-and-paste example can be found towards the rear of this section.

---

Some optimization algorithms benefit from or require gradient and Hessian information, i.e., first and second derivatives. Julia supports three ways for optimizers to obtain them:

1. working out the symbolic derivatives and providing them in Julia function form;

2. letting a built-in exact automatic differentiator take care of it (explained below);

3. using a built-in numerical differencing method.

The derivatives can be passed into a `OnceDifferentiable` or `TwiceDifferentiable` object, which in turn can be passed into the optimizing functions. The optimizers know how to use these objects (picking off the derivatives or ignoring them). If only the objective function itself is passed and an optimizer requires a gradient, `optimize` falls back on numerical differencing. If the objective function and gradients and/or Hessians are passed, `optimize` can use them.

## User-Provided Symbolic Derivatives

The syntax by which users can pass symbolic derivatives is unusual. The gradient and Hessian functions that the user passes in require some storage as their first argument, which becomes a cache that is modified upon each call. This is for speed. The example here keeps calling the symbolic `myfun()`, `myfungradients()`, and `myfunhessian()` for clarity. For even faster speed, these functions should not be called but coded directly into `f()`, `g!()`, and `h!()`.

WARNING: Pay careful attention to how `g!()` and `h!()` are defined, and read the description at the end of the example.

```juliarepl

julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2;

julia> myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2), 2*(1 + (x[1]-2)^2)*(x[2]-3) ];

julia> myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3), 2+2*(x[1]-2)^2 ], 2, 2 );

julia> function f(x) myfun(x) end;#function

julia> function g!(G,x) Gcp=myfungradients(x); G[1]=Gcp[1]; G[2]=Gcp[2]; G; end;#function ## must modify G in place, not rebind it!

julia> function fg!(G,x) g!(G,x); f(x) end;#function ## computing both at once can shorten calculations

julia> function h!(H,x) Hcp=myfunhessian(x); for i=1:length(x),j=1:length(x) ; H[i,j]= Hcp[i,j]; end; end;#function

julia> z2= zeros(2);

julia> using Optim ## needed

julia> optimize( f, g!, z2, BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [2.000000000000021,3.0000000000000027]
 * Minimum: 4.427482e-28
 * Iterations: 7
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 1.75e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 7.04e+13 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 4.17e-14
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 21
 * Gradient Calls: 21

```

+ | |||

+ | WARNING: It is tempting to write the more elegant | ||

+ | |||

+ | ```juliarepl | ||

+ | julia> function g!(G,x) G=myfungradients(x) end;#function | ||

+ | ``` | ||

+ | |||

+ | However, this is a bug. Inside the function, `G=` merely rebinds the local name `G` to a new array; the array supplied by the caller (and inspected by Optim after the call) never sees the change. Instead, the contents of `G` itself must be overwritten. This can be done with | ||

+ | |||

+ | ```juliarepl | ||

+ | julia> function g!(G,x) Gcp=myfungradients(x); G[1]=Gcp[1]; G[2]=Gcp[2]; G; end;#function | ||

+ | |||

+ | julia> function g!(G,x); G .= myfungradients(x) end;#function ## also works | ||

+ | ``` | ||
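+ | Julia passes arrays by sharing: inside `g!()`, the assignment `G= ...` only rebinds the local name, while `G .= ...` writes into the caller's array. A minimal sketch of the difference (the names `rebind!` and `mutate!` are illustrative, not part of Optim): | ||

+ | |||

+ | ```julia | ||

+ | a= [ 0.0, 0.0 ]; | ||

+ | |||

+ | function rebind!(G) G= [ 1.0, 2.0 ]; end#function  ## rebinds only the *local* name G | ||

+ | function mutate!(G) G .= [ 1.0, 2.0 ]; end#function ## overwrites the caller's array | ||

+ | |||

+ | rebind!(a) ## a is still [0.0, 0.0] --- the caller never sees the new array | ||

+ | mutate!(a) ## a is now [1.0, 2.0] | ||

+ | ``` | ||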

+ | |||

+ | |||

+ | |||

+ | ## Built-In Automatic Exact Differentiation | ||

+ | |||

+ | ### ForwardDiff | ||

+ | |||

+ | [Automatic differentiation](https://github.com/JuliaNLSolvers/Optim.jl/blob/master/docs/src/algo/autodiff.md) is built around the `ForwardDiff.jl` package, which is pulled in by `Optim.jl`. ForwardDiff is picky about how the user function is defined: it evaluates the function on its own dual-number types, so the argument may be untyped or typed `Vector`, but not `Vector{Float64}` (the concrete annotation steals the dispatch from ForwardDiff's types). ForwardDiff works by overloading *every* elementary math operation and function in Julia. Each operation still returns a numeric value (as always), but it also carries derivative information along internally, so that ForwardDiff can track the function's progress and apply the chain rule piece by piece. | ||

+ | |||

+ | ```juliarepl | ||

+ | julia> using ForwardDiff | ||

+ | |||

+ | julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2; | ||

+ | |||

+ | julia> ForwardDiff.gradient( myfun, [ 2.0, 3.0 ] ) ## at the optimum, it is zero in both directions | ||

+ | 2-element Array{Float64,1}: | ||

+ | 0.0 | ||

+ | 0.0 | ||

+ | |||

+ | ``` | ||
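+ | ForwardDiff can also deliver the Hessian directly, via `ForwardDiff.hessian()`. A quick check against the symbolic derivatives above: at the optimum `[2,3]` the cross-partials vanish, so the Hessian is twice the identity (a sketch, reusing the same `myfun`): | ||

+ | |||

+ | ```julia | ||

+ | using ForwardDiff | ||

+ | |||

+ | myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2; | ||

+ | |||

+ | ForwardDiff.hessian( myfun, [ 2.0, 3.0 ] ) ## [2.0 0.0; 0.0 2.0] | ||

+ | ``` | ||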

+ | |||

+ | WARNING: Do not annotate the arguments of the user function with a concrete float type (such as `Float64`). This messes up the dispatch that ForwardDiff relies on. | ||

+ | |||

+ | |||

+ | ### Kinks in ForwardDiff | ||

+ | |||

+ | Kinks are allowed, but ForwardDiff does not look backward at a kink; it simply applies its forward rule at the point. It therefore returns a one-sided, mathematically dubious result: | ||

+ | |||

+ | ```juliarepl | ||

+ | julia> using ForwardDiff | ||

+ | |||

+ | julia> sumabsf(x)= sum(abs.(x)) | ||

+ | sumabsf (generic function with 1 method) | ||

+ | |||

+ | julia> ForwardDiff.gradient(sumabsf, [ 0.0 ] ) ## warning --- mathematically strange | ||

+ | 1-element Array{Float64,1}: | ||

+ | 1.0 | ||

+ | |||

+ | julia> ForwardDiff.gradient(sumabsf, [ -0.0 ] ) ## yes, Julia has a negative 0.0 | ||

+ | 1-element Array{Float64,1}: | ||

+ | -1.0 | ||

+ | ``` | ||

+ | |||

+ | |||

+ | ### ForwardDiff Interface to Optim: `OnceDifferentiable` and `TwiceDifferentiable` | ||

+ | |||

+ | Optim provides two convenience constructors, `OnceDifferentiable` and `TwiceDifferentiable`, whose objects bundle the function with its differentials (in the same form as described above). Although the end user rarely calls these objects directly to obtain gradients and Hessians, we can show how the optimizers do it. | ||

+ | |||

+ | ```juliarepl | ||

+ | julia> using Optim | ||

+ | |||

+ | julia> myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2; ## could be x::Vector, but not x::Vector{Float64} | ||

+ | |||

+ | julia> od = OnceDifferentiable(myfun, [NaN,NaN]; autodiff = :forward); ## [NaN,NaN] is needed to know dimensionality of myfun | ||

+ | |||

+ | julia> myfun( [2.0,3.0] ) == od.f( [2.0, 3.0] ) ## confirming that it copied the function | ||

+ | true | ||

+ | |||

+ | julia> od.df( [NaN,NaN], [2.0, 3.0] ) ## the first argument is the output buffer, overwritten with the gradient | ||

+ | 2-element Array{Float64,1}: | ||

+ | 0.0 | ||

+ | 0.0 | ||

+ | |||

+ | julia> td = TwiceDifferentiable(myfun, [ NaN, NaN ]; autodiff = :forward); ## works analogously | ||

+ | |||

+ | ``` | ||

+ | |||

+ | * To force Optim not to use differentials (even numeric ones), define `nd = NonDifferentiable(fun, z2)`. | ||

+ | |||

+ | * The `od` and `td` container objects contain both the function and its derivatives, which can be passed to optimization functions in lieu of the function itself. | ||

+ | |||

+ | * Again, there are type constraints on the user function. `myfun(x::Vector{Float64})= ...` does not work, because Float64 is too specific and steals the dispatch from the `ForwardDiff.jl` differentiator. | ||
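+ | Rather than constructing these objects by hand, the end user can pass the `autodiff` choice straight to `optimize()`, which then builds the differentiable object internally (keyword form as documented on the Optim autodiff page linked above): | ||

+ | |||

+ | ```julia | ||

+ | using Optim | ||

+ | |||

+ | myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2; | ||

+ | |||

+ | res= optimize( myfun, zeros(2), BFGS(); autodiff = :forward ); | ||

+ | |||

+ | Optim.minimizer(res) ## approximately [2.0, 3.0] | ||

+ | ``` | ||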

+ | |||

+ | |||

+ | |||

+ | ## Built-In Numeric Differentiation | ||

+ | |||

+ | Simply do not provide the gradient and/or Hessian to `optimize()`, and `Optim.jl` will fall back on numeric (finite-difference) differentiation via `DiffEqDiffTools.jl`. | ||
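+ | For example, calling BFGS without a `g!` quietly uses finite-difference gradients (a sketch, reusing the same `myfun`): | ||

+ | |||

+ | ```julia | ||

+ | using Optim | ||

+ | |||

+ | myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2; | ||

+ | |||

+ | ## no gradient supplied: Optim falls back on numeric differentiation | ||

+ | res= optimize( myfun, zeros(2), BFGS() ); | ||

+ | |||

+ | Optim.minimizer(res) ## approximately [2.0, 3.0] | ||

+ | ``` | ||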

+ | |||

+ | |||

+ | |||

+ | |||

+ | |||

+ | ## The Complete Copy-Paste Inputs | ||

+ | |||

+ | |||

+ | ```julia | ||

+ | myfun(x)= (x[1]-2)^2 + (x[2]-3)^2 + ((x[1]-2)*(x[2]-3))^2; | ||

+ | myfungradients(x)= [ 2*(1 + (x[2]-3)^2)*(x[1]-2 ), 2*(1 + (x[1]-2)^2)*(x[2]-3) ] | ||

+ | myfunhessian(x)= reshape( [ 2+2*(x[2]-3)^2, 4*(x[1]-2)*(x[2]-3), 4*(x[1]-2)*(x[2]-3) , 2+2*(x[2]-2)^2 ], 2, 2 ) | ||

+ | |||

+ | ## for speed, you should code the above directly into the below. | ||

+ | |||

+ | function f(x) myfun(x) end#function | ||

+ | function g!(G,x) G.=myfungradients(x) end#function ## must not use G=, but G.= to replace external G contents | ||

+ | function fg!(G,x) g!(G,x); f(x) end#function ## (both! can be used to shorten calculations) | ||

+ | function h!(H,x) H .= myfunhessian(x) end#function ## must not use H=, but H.= to replace external H contents | ||

+ | |||

+ | z2= zeros(2); | ||

+ | |||

+ | using Optim ## henceforth needed | ||

+ | |||

+ | ## show optimize use via function call | ||

+ | optimize( f, g!, z2, BFGS()) | ||

+ | |||

+ | |||

+ | ## show optimize use via *Differentiable objects | ||

+ | |||

+ | nd = NonDifferentiable( f, z2 ) | ||

+ | od = OnceDifferentiable( f, g!, z2 ) | ||

+ | td = TwiceDifferentiable( f, g!, h!, z2 ) | ||

+ | |||

+ | ## show how optimizer functions will internally pick off results | ||

+ | |||

+ | println( ( td.f( [2.0, 2.0] ), td.df( [NaN,NaN], [2.0, 2.0] ), td.h( [NaN NaN;NaN NaN], [2.0, 2.0] ) ) ) | ||

+ | println( ( od.f( [2.0, 2.0] ), od.df( [NaN,NaN], [2.0, 2.0] ) ) ) | ||

+ | |||

+ | ## show optimize use via object | ||

+ | |||

+ | optimize( td, z2, BFGS()) | ||

+ | optimize( od, z2, BFGS()) | ||

+ | |||

+ | ``` | ||

+ | |||

+ | WARNING: Do not forget the `.=` (not `=`) inside the `g!()` and `h!()` functions! | ||

optim.txt · Last modified: 2018/12/27 14:37 (external edit)