# julia

### Site Tools

multistats

snippet.juliarepl
julia> pkgchk.( [ "julia" => v"1.0.3", "DataFrames" => v"0.14.1", "GLM" => v"1.0.1", "Plots" => v"0.21.0", "RegressionTables" => v"0.2.0" ] );

# Multivariate Least-Squares Linear Regressions

This chapter demonstrates three methods to run linear least-squares regressions (sequential feeding, linear algebra, and GLM) and offers some benchmarks. However, for statistical analysis, you are almost always better off placing your data into a dataframe first and then following the tips in the second part of this chapter. With data frames, you get easy variable names, better output, and good handling of missing observations.

For smooth localized fitting with splines, refer to Bivariate Relations and Loess.

The examples in this chapter use an artificial set of numbers. The dependent variable is 1:10; the X variables are its square and its cube. The constant column of '1's is the intercept.

snippet.juliarepl
julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 )	## this will be the main example in this chapter
10×3 Array{Int64,2}:
1    1     1
1    4     8
1    9    27
1   16    64
1   25   125
1   36   216
1   49   343
1   64   512
1   81   729
1  100  1000
• lm() means fit(LinearModel,...). glm() means fit(GeneralizedLinearModel,...).
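
For example, the following minimal check (assuming only the packages pinned above) confirms that the two spellings produce the same fit:

snippet.juliarepl
julia> using GLM

julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> coef( fit(LinearModel, X, y) ) == coef( GLM.lm(X, y) )    ## identical model, two spellings
true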

## Ordinary Least-Squares (OLS) with Linear Algebra

### Coefficient Estimates

Although the statistical community is highly concerned with ill-conditioned numerical problems, for many practical applications even the naive linear algebra method is stable enough:

snippet.juliarepl
julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> inv(X' * X) * X' * y      ## a less stable OLS regression
3-element Array{Float64,1}:
  1.246997628700364
  0.2019429358219224
 -0.011642316224279095

For more accuracy, julia has the QR decomposition built in. For a tall X, the backslash operator is essentially syntactic sugar for solving the least-squares problem via QR, as in F= qr(X); F.R \ (Matrix(F.Q)' * y):

snippet.juliarepl
julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> beta= X \ y
3-element Array{Float64,1}:
  1.246997628700377
  0.2019429358219229
 -0.011642316224279031
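
A minimal sketch (assuming only LinearAlgebra) that verifies the claimed QR equivalence of the backslash operator:

snippet.juliarepl
julia> using LinearAlgebra

julia> y=[1:10;] ; X= hcat( fill(1.0,10), y.^2, y.^3 );

julia> F= qr(X);                              ## thin QR factorization of X

julia> F.R \ ( Matrix(F.Q)' * y ) ≈ X \ y     ## matches the backslash solution up to rounding
true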

QR is already very stable and its speed penalty is small, so it is well worth using almost all the time. An even more accurate method, the SVD (singular-value decomposition), offers good accuracy even in very ill-conditioned problems, but it carries a serious speed penalty:

snippet.juliarepl
julia> using LinearAlgebra

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> pinv(X)*y
3-element Array{Float64,1}:
  1.2469976287003885
  0.20194293582192274
 -0.011642316224279067

### Standard Errors of the Estimated Coefficients

To calculate the standard errors of the estimated coefficients:

snippet.juliarepl
julia> using LinearAlgebra

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> beta= X \ y;

julia> resids= y - X*beta; sigma= sqrt( sum(resids.^2)/(size(X,1)-size(X,2)) )    ## the regression standard error
0.2598203967093144

julia> sqrt.(diag(inv(X' * X)))*sigma   ## not very accurate
3-element Array{Float64,1}:
0.1792531929781652
0.015877907200886023
0.0015785864167600246
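
If you need these quantities repeatedly, it can be convenient to wrap the calculation into a small helper. The olswithse function below is a hypothetical convenience wrapper written only for illustration (it is not part of any package), using the same formulas as above:

snippet.juliarepl
julia> using LinearAlgebra

julia> function olswithse( X::AbstractMatrix, y::AbstractVector )    ## hypothetical helper, not a library function
           beta= X \ y                                               ## QR-based coefficient estimates
           resids= y .- X*beta
           sigma= sqrt( sum(abs2, resids)/(size(X,1)-size(X,2)) )    ## regression standard error
           se= sqrt.(diag(inv(X' * X))) .* sigma                     ## same (not very accurate) formula as above
           (beta=beta, se=se, sigma=sigma)
       end;

julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> olswithse(X, y).sigma ≈ 0.2598203967093144    ## matches the value computed above
true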

## Linear Regression Models with GLM.jl

The “generalized linear model” package (GLM.jl) implements linear regressions. It builds on other packages, primarily Distributions and StatsBase (which provides the abstractions for statistical models, such as StatisticalModel, together with functions such as r2, predict, fitted, residuals, coefnames, etc.).

The GLM.jl package is designed to work especially well with dataframes. This section offers only a small glimpse.

GLM also implements a number of other generalized linear models, such as probit and negative-binomial regressions.

snippet.juliarepl
julia> using GLM, DataFrames

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> GLM.lm(X, y)				## plain OLS linear model, predicting y with X
LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
      Estimate  Std.Error  t value Pr(>|t|)
x1       1.247   0.179253  6.95663   0.0002
x2    0.201943  0.0158779  12.7185    <1e-5
x3  -0.0116423 0.00157859 -7.37515   0.0002

julia> y= [1.0:10.0;]; mydf= DataFrame( y=vcat(y,999.9,missing) , x1= vcat(y.^2,missing,999.9) , x2= vcat(y.^3, 999.9, 999.9) );

julia> glm( @formula(y ~ x1+x2) , mydf , Normal() )	## force normal error presumption, changes Pr

Formula: y ~ 1 + x1 + x2

Coefficients:
               Estimate  Std.Error  z value Pr(>|z|)
(Intercept)       1.247   0.179253  6.95663   <1e-11
x1             0.201943  0.0158779  12.7185   <1e-36
x2           -0.0116423 0.00157859 -7.37515   <1e-12
• There are many other generalized linear models, in which the errors are not assumed to be normally distributed. See the GLM documentation; a brief sketch follows below.
• For these more general (non-normally distributed error) models, julia reports z-significance rather than t-significance, which changes the Pr(>|z|) column and keeps more information. As shown above, this reporting can also be forced for the normal model by passing Normal() to glm.
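
For instance, here is a hedged sketch (not taken from the GLM documentation itself) of one such non-normal family: a Poisson count model with a log link on the same artificial data. The trailing semicolon merely suppresses the printed fit:

snippet.juliarepl
julia> using GLM, DataFrames

julia> yb= [1.0:10.0;]; mydf= DataFrame( y=yb, x1=yb.^2 );

julia> glm( @formula(y ~ x1), mydf, Poisson(), LogLink() );    ## z-based inference, like the Normal() example above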

### Obtaining Coefficient Estimates

snippet.juliarepl
julia> using GLM

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> coef(mymodel), stderror( mymodel )           ## often the most important results
([1.247, 0.201943, -0.0116423], [0.179253, 0.0158779, 0.00157859])

julia> confint( mymodel )   ## default: 95%
3×2 Array{Float64,2}:
  0.823131    1.67086
  0.164398    0.239488
 -0.0153751  -0.00790955

julia> coeftable( mymodel )
      Estimate  Std.Error  t value Pr(>|t|)
x1       1.247   0.179253  6.95663   0.0002
x2    0.201943  0.0158779  12.7185    <1e-5
x3  -0.0116423 0.00157859 -7.37515   0.0002
• To give the X matrix columns names other than x$i, use data frames.

### Obtaining Other Regression Statistics

snippet.juliarepl
julia> using GLM, StatsBase

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> r2( mymodel )
0.9942721640021418

julia> adjr2( mymodel )
0.9926356394313252

julia> deviance( mymodel )          ## also called SSE
0.4725464698233003

julia> nulldeviance( mymodel )      ## also called SST
82.5

julia> sigma( lm::LinearModel )::Float64= sqrt( deviance( lm )/dof_residual( lm ) )
sigma (generic function with 1 method)

julia> sigma( mymodel )
0.2598203967093149

julia> loglikelihood( mymodel )
1.0716360731829389

julia> aic( mymodel )
5.856727853634123

julia> bic( mymodel )
7.067068225610306

julia> nullmodel= GLM.lm( hcat( fill(1,10) ), y);

julia> ftest(mymodel, nullmodel)
        Res. DOF DOF ΔDOF      SSR     ΔSSR     R²    ΔR²       F*  p(>F)
Model 1        7   4          0.4725          0.9943
Model 2        9   2   -2    82.5000 -82.0275 0.0000 0.9943 607.5510  <1e-7

### Recovering Input Observations, Residuals, and Fitted Values

snippet.juliarepl
julia> using GLM

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> nobs( mymodel ), dof( mymodel ), dof_residual( mymodel )
(10.0, 4, 7.0)

julia> dump( predict(mymodel) )     ## the fitted values
Array{Float64}((10,)) [1.4373, 1.96163, 2.75014, 3.73298, 4.84028, 6.0022, 7.14889, 8.21048, 9.11713, 9.79897]

julia> dump( residuals(mymodel) )   ## the residuals
Array{Float64}((10,)) [-0.437298, 0.0383692, 0.249858, 0.267024, 0.159719, -0.00220301, -0.148887, -0.21048, -0.117127, 0.201025]

julia> dump( mymodel.rr.y )         ## the original y values
Array{Float64}((10,)) [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

julia> mymodel.pp.X                 ## the original X values
10×3 Array{Float64,2}:
 1.0    1.0     1.0
 1.0    4.0     8.0
 1.0    9.0    27.0
 1.0   16.0    64.0
 1.0   25.0   125.0
 1.0   36.0   216.0
 1.0   49.0   343.0
 1.0   64.0   512.0
 1.0   81.0   729.0
 1.0  100.0  1000.0

### Obtaining New Predictions

snippet.juliarepl
julia> using GLM

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> predict( mymodel, [ 1.0 5.0 3.0 ] )      ## careful: this must be a matrix, not a vector!
1-element Array{Float64,1}:
 2.221785359137163

julia> predict( mymodel, [ 1.0 5.0 3.0 ]; interval=:confidence )    ## ", level=0.95" (95% bounds)
1×3 Array{Float64,2}:
 2.22179  1.91626  2.52731

For more regression statistics, refer to Multivariate Statistics With Data Frames.

### Writing your own Summary Output

snippet.juliarepl
julia> using GLM

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> import Base.show      ## allows us to define our own show function

julia> show(io::IO, x::LinearModel)= print(io, coeftable(x), "\n", "N=$(nobs(x)),  σ=$(round( sqrt(deviance(x)/dof_residual(x)); sigdigits=4)), R²=$(round( r2(x); sigdigits=4))")
show (generic function with 351 methods)

julia> show(x::LinearModel)= show(stdout, x)
show (generic function with 352 methods)

julia> mymodel
      Estimate  Std.Error  t value Pr(>|t|)
x1       1.247   0.179253  6.95663   0.0002
x2    0.201943  0.0158779  12.7185    <1e-5
x3  -0.0116423 0.00157859 -7.37515   0.0002

N=10.0,  σ=0.2598,  R²=0.9943

Standardized coefficients are good measures of the “economic” (rather than the “statistical”) significance of your regression coefficients. You can use RegressionTables, or overwrite coeftable (from GLM.jl/src/lm.jl) yourself:

snippet.juliarepl
julia> import StatsBase.coeftable

julia> using GLM, Distributions

julia> using StatsBase: CoefTable, StatisticalModel, RegressionModel

julia> function coeftable(mm::LinearModel)
           cc = coef(mm)
           se = stderror(mm)
           tt = cc ./ se
           ccn = [ cc[i]*std(mm.pp.X[:,i])/std(mm.rr.y) for i in 1:length(cc) ]    ## the new column: coef * sd(x)/sd(y)
           CoefTable(hcat(cc,se,tt,ccdf.(Ref(FDist(1, dof_residual(mm))), abs2.(tt)), ccn),
                     ["Estimate","Std.Error","t value", "Pr(>|t|)","NormCoef"],
                     ["x$i" for i = 1:size(mm.pp.X, 2)], 4)
       end##function##
coeftable (generic function with 4 methods)

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> print(mymodel)
LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
      Estimate  Std.Error  t value Pr(>|t|) NormCoef
x1       1.247   0.179253  6.95663   0.0002      0.0
x2    0.201943  0.0158779  12.7185    <1e-5  2.27936
x3  -0.0116423 0.00157859 -7.37515   0.0002 -1.32175

### Useful Contents of a Model

snippet.julianoeval [download only julia statements]
julia> dump(ols)
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
  model: LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}
    rr: LmResp{Array{Float64,1}}
      mu: Array{Float64}((3,)) [1.83333, 4.33333, 6.83333]
      offset: Array{Float64}((0,)) Float64[]
      wts: Array{Float64}((0,)) Float64[]
      y: Array{Float64}((3,)) [2.0, 4.0, 7.0]
    pp: DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}
      X: Array{Float64}((3, 2)) [1.0 1.0; 1.0 2.0; 1.0 3.0]
      beta0: Array{Float64}((2,)) [-0.666667, 2.5]
      delbeta: Array{Float64}((2,)) [0.0, 0.0]
      scratchbeta: Array{Float64}((2,)) [0.0, 0.0]
      chol: LinearAlgebra.Cholesky{Float64,Array{Float64,2}}
        factors: Array{Float64}((2, 2)) [1.73205 3.4641; 6.0 1.41421]
        uplo: Char 'U'
        info: Int64 0
      scratchm1: Array{Float64}((3, 2)) [0.0 0.0; 0.0 0.0; 0.0 0.0]
      scratchm2: Array{Float64}((2, 2)) [2.30382e-314 2.28588e-314; 2.30382e-314 0.0]
  mf: ModelFrame
    df: DataFrame  3 observations of 2 variables
      Y: Array{Int64}((3,)) [2, 4, 7]
      X: Array{Int64}((3,)) [1, 2, 3]
    terms: StatsModels.Terms
      terms: Array{Any}((1,))
        1: Symbol X
      eterms: Array{Any}((2,))
        1: Symbol Y
        2: Symbol X
      factors: Array{Bool}((2, 2)) Bool[true false; false true]
      is_non_redundant: Array{Bool}((2, 2)) Bool[false false; false false]
      order: Array{Int64}((2,)) [1, 1]
      response: Bool true
      intercept: Bool true
    nonmissing: BitArray{1}
      chunks: Array{UInt64}((1,)) UInt64[0x0000000000000007]
      len: Int64 3
      dims: Tuple{Int64}
        1: Int64 4694371872
    contrasts: Dict{Symbol,StatsModels.ContrastsMatrix}
      slots: Array{UInt8}((16,)) UInt8[0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
      keys: Array{Symbol}((16,))
        1: #undef
        2: #undef
        3: #undef
        4: #undef
        5: #undef
        ...
        12: #undef
        13: #undef
        14: #undef
        15: #undef
        16: #undef
      vals: Array{StatsModels.ContrastsMatrix}((16,))
        1: #undef
        2: #undef
        3: #undef
        4: #undef
        5: #undef
        ...
        12: #undef
        13: #undef
        14: #undef
        15: #undef
        16: #undef
      ndel: Int64 0
      count: Int64 0
      age: UInt64 0x0000000000000000
      idxfloor: Int64 1
      maxprobe: Int64 0
  mm: ModelMatrix{Array{Float64,2}}
    m: Array{Float64}((3, 2)) [1.0 1.0; 1.0 2.0; 1.0 3.0]
    assign: Array{Int64}((2,)) [0, 1]

### Missing or NaN Data

Unfortunately, the lm regression routines cannot handle missing data. Fortunately, the DataFrame/GLM/Formula interface can. Therefore, if there are missing observations, place the data into a DataFrame first. Here is a basic example:

snippet.juliarepl
julia> using GLM

julia> X=[missing,2.0,3,5,10]; Y=[2.0,4,missing,3,5];

julia> lm( X, Y )
ERROR: MethodError: no method matching fit(::Type{LinearModel}, ::Array{Union{Missing, Float64},1}, ::Array{Union{Missing, Float64},1}, ::Bool)
Closest candidates are:
  fit(!Matched::Type{StatsBase.Histogram}, ::Any...; kwargs...) at /Users/ivo/.julia/packages/StatsBase/NzjNi/src/hist.jl:340
  fit(!Matched::StatsBase.StatisticalModel, ::Any...) at /Users/ivo/.julia/packages/StatsBase/NzjNi/src/statmodels.jl:151
  fit(!Matched::Type{D<:Distributions.Distribution}, ::Any...) where D<:Distributions.Distribution at /Users/ivo/.julia/packages/Distributions/WHjOk/src/genericfit.jl:34
  ...
Stacktrace:
 [1] lm(::Array{Union{Missing, Float64},1}, ::Array{Union{Missing, Float64},1}, ::Bool) at /Users/ivo/.julia/packages/GLM/FxTmX/src/lm.jl:146 (repeats 2 times)
 [2] top-level scope at none:0

julia> using DataFrames

julia> data= DataFrame(X=X, Y=Y)
5×2 DataFrame
│ Row │ X        │ Y        │
│     │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┤
│ 1   │ missing  │ 2.0      │
│ 2   │ 2.0      │ 4.0      │
│ 3   │ 3.0      │ missing  │
│ 4   │ 5.0      │ 3.0      │
│ 5   │ 10.0     │ 5.0      │

julia> mymodel= lm( @formula(Y~X), data )
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: Y ~ 1 + X

Coefficients:
             Estimate Std.Error  t value Pr(>|t|)
(Intercept)   3.13265   1.27486  2.45726   0.2460
X            0.153061  0.194414 0.787296   0.5754

julia> nobs(ans)
3.0

## Sequential Weighted Least-Squares (WLS) Regression Filtering (AS75)

The following WLS regression does not need to keep the input data in order to obtain OLS (and WLS) regression coefficients and statistics. Instead, it allows feeding each observation and then forgetting it. It is based on Gentleman's AS75 WLS Regression in Julia, which implements a weighted least-squares one-pass regression.

If only the final coefficient estimates are needed, the AS75 algorithm is slow. However, AS75 has a very low memory footprint, and it can provide updated coefficient estimates after each observation. (If updated historical fitted values or residuals are needed after each observation, e.g., as in recursive residuals, the observations need to be kept anyway, and the memory advantage of the filtering method evaporates.)

snippet.juliarepl
julia> include("gregr-as75.jl")      ## the downloadable GIST

julia> myreg= Reg(3, "testing regression", [ "i", "constant", "i-squared", "i-cubed" ] );

julia> for i=1:10
           f= Float64(i)
           reginclude(myreg, Float64(i), [ 1.0, f*f, f*f*f ] );
       end#for##

julia> regcoefs(myreg)
3-element Array{Float64,1}:
  1.2469976287003752
  0.2019429358219231
 -0.011642316224279057

julia> regcoefsse(myreg)
3-element Array{Float64,1}:
 0.17925319297816514
 0.01587790720088601
 0.0015785864167600233

julia> myreg.rsq, myreg.adjrsq, myreg.sst, myreg.sse, myreg.sigmasq
(0.9942721640021418, 0.9926356394313252, 82.5, 0.4725464698232986, 0.06750663854618552)

julia> regpredict(myreg, [ 1.0, 4.0, -8.0 ])
2.1479079017823004

With time, I will probably improve the AS75 code on the public GIST.

## Benchmarks (2018)

These benchmarks were performed in mid-2018. They were run with code akin to the following:

snippet.julianoeval [download only julia statements]
julia> using BenchmarkTools, LinearAlgebra    ## for @btime and pinv

julia> M= 1_000_000; X= hcat( fill(1.0, M), rand( M, 5 ) ); y= rand(M);    ## make data

julia> @btime begin; z= pinv(X)* y; end;      ## or the like

### Application Recommendation

• If you do not care too much about memory use and you only need one final set of regression coefficients for all the data, then use QR or GLM.
• If your data is really ill-conditioned, use SVD.
• If storage space is of the essence, or you need sequential coefficient regression output (i.e., updating coefficient estimates as data flows in), use AS75.
### One-Time Regressions

To compare these methods on an iMac Pro, with 1 million observations and 5 random variables (plus a constant) predicting a 6th random variable:

Method                     Memory Store                     Speed
Filter AS75                <300 bytes                       1,700 ms
OLS LA: inv(X'*X)*X'*y     56 MByte Feed                    48 ms
QR: X\y                    56 MByte Feed                    58 ms
SVD: pinv(X)*y             56 MByte Feed                    150 ms
GLM.jl lm                  56 MByte Feed, 109 MByte Store   19 ms

If only a single regression is required, then AS75 is slower by orders of magnitude. However, it requires far less storage.

### Sequential Updating (Filtering) Regression Coefficients

When coefficients are needed on each iteration (recursive coefficients), this reverses. With 2 variables (plus a constant), the execution speed by number of observations is

Method             100  1,000  2,000  4,000  8,000
AS75 Speed (ms)    0.1      1      2      5     10
QR Speed (ms)      1.0     30     82    261  1,058

AS75 is not just faster, but also requires orders of magnitude less storage.

## GLM with Missing, DataFrames, and RegressionTables

GLM plays especially well with DataFrames:

1. It offers a “formula” interface, which is deliberately R-like (though less capable).
2. GLM knows how to ignore missing observations only when used with dataframes.
3. The RegressionTables.jl package is the preferred way to look at and save regression output.

snippet.juliarepl
julia> using DataFrames, GLM

julia> yb= [1.0:10.0;]; mydf= DataFrame( y=vcat(yb,999.9,missing) , x1= vcat(yb.^2,missing,999.9) , x2= vcat(yb.^3, 999.9, 999.9) );

julia> mydf
12×3 DataFrame
│ Row │ y        │ x1       │ x2      │
│     │ Float64⍰ │ Float64⍰ │ Float64 │
├─────┼──────────┼──────────┼─────────┤
│ 1   │ 1.0      │ 1.0      │ 1.0     │
│ 2   │ 2.0      │ 4.0      │ 8.0     │
│ 3   │ 3.0      │ 9.0      │ 27.0    │
│ 4   │ 4.0      │ 16.0     │ 64.0    │
│ 5   │ 5.0      │ 25.0     │ 125.0   │
│ 6   │ 6.0      │ 36.0     │ 216.0   │
│ 7   │ 7.0      │ 49.0     │ 343.0   │
│ 8   │ 8.0      │ 64.0     │ 512.0   │
│ 9   │ 9.0      │ 81.0     │ 729.0   │
│ 10  │ 10.0     │ 100.0    │ 1000.0  │
│ 11  │ 999.9    │ missing  │ 999.9   │
│ 12  │ missing  │ 999.9    │ 999.9   │

julia> myols1= lm( @formula( y ~ x1 ), mydf )
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1

Coefficients:
              Estimate  Std.Error t value Pr(>|t|)
(Intercept)    2.17582   0.353361 6.15751   0.0003
x1           0.0863422 0.00702062 12.2984    <1e-5

julia> nobs(myols1)     ## note: the result is a Float, which ignores missing observations
10.0

julia> myols2= lm( @formula( y ~ x1 + x2 ), mydf );

julia> using RegressionTables      ## show the output in nicer form

julia> regtable( myols1, myols2 )  ## output both models side by side

----------------------------------
                        y
             ---------------------
                   (1)        (2)
----------------------------------
(Intercept)    2.176***   1.247***
                (0.353)    (0.179)
x1             0.086***   0.202***
                (0.007)    (0.016)
x2                       -0.012***
                           (0.002)
----------------------------------
Estimator           OLS        OLS
----------------------------------
N                    10         10
R2                0.950      0.994
----------------------------------

julia> regtable(myols1, myols2; standardize_coef=true)   ## the coefs now are the '*sd(x)/sd(y)' and *often* what you should be interested in

----------------------------------
                        y
             ---------------------
                   (1)        (2)
----------------------------------
(Intercept)    0.000***   0.000***
                (0.000)    (0.000)
x1             0.975***   2.279***
                (0.079)    (0.179)
x2                       -1.322***
                           (0.179)
----------------------------------
Estimator           OLS        OLS
----------------------------------
N                    10         10
R2                0.950      0.994
----------------------------------

julia> regtable( myols1, myols2; renderSettings=latexOutput() )    ## latexOutput also takes an optional filename.
\begin{tabular}{lrr}
\toprule
            & \multicolumn{2}{c}{y} \\
\cmidrule(lr){2-3}
            &      (1) &        (2) \\
\midrule
(Intercept) & 2.176*** &   1.247*** \\
            &  (0.353) &    (0.179) \\
x1          & 0.086*** &   0.202*** \\
            &  (0.007) &    (0.016) \\
x2          &          &  -0.012*** \\
            &          &    (0.002) \\
\midrule
Estimator   &      OLS &        OLS \\
\midrule
$N$         &       10 &         10 \\
$R^2$       &    0.950 &      0.994 \\
\bottomrule
\end{tabular}
• RegressionTables allows modifying much output according to end-user preferences.
• RegressionTables can also render in htmlOutput().
• As noted, unlike the plain vector/matrix numeric interface to GLM, the DataFrame/Formula interface knows how to handle missing cases. However, NaN must be converted to missing first. DataFrame also provides (minimal) support for other statistics through its completecases interface.
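
For example, here is a hedged sketch of converting NaN to missing before estimation and of counting complete cases. It is written against the DataFrames version pinned at the top of this chapter; newer DataFrames versions prefer mydf[!, c] for column assignment:

snippet.juliarepl
julia> using DataFrames

julia> mydf= DataFrame( y=[1.0, NaN, 3.0, 4.0], x=[2.0, 5.0, NaN, 8.0] );

julia> for c in names(mydf)
           mydf[c]= replace( mydf[c], NaN => missing )    ## lm does not treat NaN as missing; convert it first
       end#for##

julia> sum( completecases(mydf) )    ## number of rows without any missing values
2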

# Backmatter

## Useful Packages on Julia Repository

• Mocha for neural network learning
• LightML for many statistical techniques now grouped as “machine learning.”