
snippet.juliarepl
julia> pkgchk( [ "julia" => v"1.0.2", "DataFrames" => v"0.14.1", "GLM" => v"1.0.1", "Plots" => v"0.21.0", "RegressionTables" => v"0.2.0" ] )

Multivariate Least-Squares Linear Regressions

This chapter demonstrates three methods to run linear least-squares regressions (sequential feeding, linear algebra, and GLM) and offers some benchmarks. However, for statistical analysis, you are almost always better off placing your data into a data frame first and then following the tips in the second part of this chapter. With data frames, you get easy variable names, better output, and good handling of missing observations.

For smooth localized fitting with splines, refer to Bivariate Relations and Loess.

The tips in this chapter use an artificial set of numbers. The dependent variable is 1:10; the X variables are its square and its cube. The constant '1' is the intercept.

snippet.juliarepl
julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 )	## this will be the main example in this chapter
10×3 Array{Int64,2}:
 1    1     1
 1    4     8
 1    9    27
 1   16    64
 1   25   125
 1   36   216
 1   49   343
 1   64   512
 1   81   729
 1  100  1000
  • lm() means fit(LinearModel,...). glm() means fit(GeneralizedLinearModel,...).

Ordinary Least-Squares (OLS) with Linear Algebra

Coefficient Estimates

Although the statistical community is highly concerned with ill-conditioned numerical problems, for many practical applications even the naive linear-algebra method is stable enough:

snippet.juliarepl
julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> inv(X' * X) * X' * y      ## a less stable OLS regression
3-element Array{Float64,1}:
  1.246997628700364
  0.2019429358219224
 -0.011642316224279095

For more accuracy, julia has the QR decomposition built in. The backslash is (roughly) syntactic sugar for Q,R= qr(X); inv(factorize(R)) * (Q'*y):

snippet.juliarepl
julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> beta= X \ y
3-element Array{Float64,1}:
  1.246997628700377
  0.2019429358219229
 -0.011642316224279031
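
To see the equivalence, here is a minimal sketch (the name betaqr is only for illustration) that performs the QR steps explicitly and compares them against the backslash solution:

snippet.julianoeval
[download only julia statements]
julia> using LinearAlgebra
julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );
julia> F= qr(X);				## the (thin) QR factorization of X
julia> betaqr= F.R \ ( Matrix(F.Q)' * y );	## solve R*beta = Q'*y
julia> betaqr ≈ ( X \ y )			## should return true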

QR is already very stable, and its speed penalty is small, so it is worth using almost all the time. There is an even more accurate method, the “SVD decomposition” (via pinv), which retains good accuracy even in very ill-conditioned problems, but it carries a serious speed penalty:

snippet.juliarepl
julia> using LinearAlgebra

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> pinv(X)*y
3-element Array{Float64,1}:
  1.2469976287003885
  0.20194293582192274
 -0.011642316224279067

Standard Errors of the Estimated Coefficients

To calculate the standard errors of the coefficients:

snippet.juliarepl
julia> using LinearAlgebra

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> beta= X \ y;

julia> resids= y - X*beta; sigma= sqrt( sum(resids.^2)/(size(X)[1]-size(X)[2]) )	## the regression standard error
0.2598203967093144

julia> sqrt.(diag(inv(X' * X)))*sigma   ## not very accurate
3-element Array{Float64,1}:
 0.1792531929781652
 0.015877907200886023
 0.0015785864167600246

Linear Regression Models with GLM.jl

The “generalized linear model” package (GLM.jl) implements linear regressions. It builds on other packages, primarily Distributions and StatsBase (which implements abstractions for statistical models (StatisticalModel), such as r2, predict, fitted, residuals, coefnames, etc.).

The GLM.jl package is designed to work especially well with dataframes. This section offers only a small glimpse.

GLM also implements a number of other generalized linear models, such as probit and negative-binomial regressions.
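
For example, a probit can be fit through the formula interface by combining the Binomial family with a ProbitLink. A minimal sketch follows; the data frame probitdf and its made-up binary outcome ybin are only for illustration:

snippet.julianoeval
[download only julia statements]
julia> using DataFrames, GLM
julia> probitdf= DataFrame( ybin= [0,0,0,1,0,1,1,1,1,1], x= [1.0:10.0;] );	## made-up binary outcome
julia> glm( @formula( ybin ~ x ), probitdf, Binomial(), ProbitLink() )		## probit regression of ybin on x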

snippet.juliarepl
julia> using GLM, DataFrames

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> GLM.lm(X, y)				## plain OLS linear model, predicting y with X
LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
       Estimate  Std.Error  t value Pr(>|t|)
x1        1.247   0.179253  6.95663   0.0002
x2     0.201943  0.0158779  12.7185    <1e-5
x3   -0.0116423 0.00157859 -7.37515   0.0002

julia> y= [1.0:10.0;]; mydf= DataFrame( y=vcat(y,999.9,missing) , x1= vcat(y.^2,missing,999.9) , x2= vcat(y.^3, 999.9, 999.9) );

julia> glm( @formula(y ~ x1+x2) , mydf , Normal() )	## force normal error presumption, changes Pr
StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Normal{Float64},IdentityLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
               Estimate  Std.Error  z value Pr(>|z|)
(Intercept)       1.247   0.179253  6.95663   <1e-11
x1             0.201943  0.0158779  12.7185   <1e-36
x2           -0.0116423 0.00157859 -7.37515   <1e-12
  • There are many other linear models, where the errors are not assumed to be normally distributed. See the GLM documentation.
  • For more general models (in which the errors are not assumed to be normally distributed), julia reports z-significance rather than t-significance, which changes the Pr(>|z|) column and keeps more information. As the example above shows, this reporting can also be forced for the normal model by passing Normal() to glm.

Obtaining Coefficient Estimates

snippet.juliarepl
julia> using GLM

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> coef(mymodel), stderror( mymodel )           ## often the most important results
([1.247, 0.201943, -0.0116423], [0.179253, 0.0158779, 0.00157859])

julia> confint( mymodel )   ## default: 95%
3×2 Array{Float64,2}:
  0.823131    1.67086
  0.164398    0.239488
 -0.0153751  -0.00790955

julia> coeftable( mymodel )
       Estimate  Std.Error  t value Pr(>|t|)
x1        1.247   0.179253  6.95663   0.0002
x2     0.201943  0.0158779  12.7185    <1e-5
x3   -0.0116423 0.00157859 -7.37515   0.0002
  • To give the X matrix columns names other than x$i, use data frames.
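
A minimal sketch (the column names ysq and ycub are merely illustrative; the data-frame interface is covered in detail below):

snippet.julianoeval
[download only julia statements]
julia> using DataFrames, GLM
julia> y= [1.0:10.0;]; namedf= DataFrame( y=y, ysq=y.^2, ycub=y.^3 );
julia> coeftable( lm( @formula( y ~ ysq + ycub ), namedf ) )	## coefficient rows are now labeled (Intercept), ysq, and ycub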

Obtaining Other Regression Statistics

snippet.juliarepl
julia> using GLM, StatsBase

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> r2( mymodel )
0.9942721640021418

julia> adjr2( mymodel )
0.9926356394313252

julia> deviance( mymodel )       ## also called SSE
0.4725464698233003

julia> nulldeviance( mymodel )   ## also called SST
82.5

julia> sigma( lm::LinearModel )::Float64= sqrt( deviance( lm )/dof_residual( lm ) )
sigma (generic function with 1 method)

julia> sigma( mymodel )
0.2598203967093149

julia> loglikelihood( mymodel )
1.0716360731829389

julia> aic( mymodel )
5.856727853634123

julia> bic( mymodel )
7.067068225610306

julia> nullmodel= GLM.lm( hcat( fill(1,10) ), y);

julia> ftest(mymodel, nullmodel)
        Res. DOF DOF ΔDOF     SSR     ΔSSR     R²    ΔR²       F* p(>F)
Model 1        7   4       0.4725          0.9943
Model 2        9   2    2 82.5000 82.0275 0.0000 0.9943 607.5510 <1e-7

Recovering Input Observations and Obtaining Residuals and Fitted Values

snippet.juliarepl
julia> using GLM

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> nobs( mymodel ), dof( mymodel ), dof_residual( mymodel )
(10.0, 4, 7.0)

julia> dump( predict(mymodel) )		## the fitted values
Array{Float64}((10,)) [1.4373, 1.96163, 2.75014, 3.73298, 4.84028, 6.0022, 7.14889, 8.21048, 9.11713, 9.79897]

julia> dump( residuals(mymodel) )		## the residuals
Array{Float64}((10,)) [-0.437298, 0.0383692, 0.249858, 0.267024, 0.159719, -0.00220301, -0.148887, -0.21048, -0.117127, 0.201025]

julia> dump( mymodel.rr.y )			## the original y values
Array{Float64}((10,)) [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

julia> mymodel.pp.X				## the original X values
10×3 Array{Float64,2}:
 1.0    1.0     1.0
 1.0    4.0     8.0
 1.0    9.0    27.0
 1.0   16.0    64.0
 1.0   25.0   125.0
 1.0   36.0   216.0
 1.0   49.0   343.0
 1.0   64.0   512.0
 1.0   81.0   729.0
 1.0  100.0  1000.0

Obtaining New Predictions

snippet.juliarepl
julia> using GLM

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> predict( mymodel, [ 1.0 5.0 3.0 ] )	## careful: this must be a matrix, not a vector!
1-element Array{Float64,1}:
 2.221785359137163

julia> predict( mymodel, [ 1.0 5.0 3.0 ]; interval=:confidence )	## ", level=0.95" (95% bounds)
1×3 Array{Float64,2}:
 2.22179  1.91626  2.52731

For more regression statistics, refer to Multivariate Statistics With Data Frames.

Writing your own Summary Output

snippet.juliarepl
julia> using GLM

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> import Base.show		## allows us to define our own show function

julia> show(io::IO, x::LinearModel)= print(io, coeftable(x), "\n", "N=$(nobs(x)),  σ=$(round( sqrt(deviance(x)/dof_residual(x)); sigdigits=4)),  R²=$(round( r2(x); sigdigits=4))")
show (generic function with 351 methods)

julia> show(x::LinearModel)= show(stdout, x)
show (generic function with 352 methods)

julia> mymodel
       Estimate  Std.Error  t value Pr(>|t|)
x1        1.247   0.179253  6.95663   0.0002
x2     0.201943  0.0158779  12.7185    <1e-5
x3   -0.0116423 0.00157859 -7.37515   0.0002

N=10.0,  σ=0.2598,  R²=0.9943

Adding Standardized Coefficients

Standardized coefficients are good measures of the “economic” (rather than the “statistical”) significance of your regression coefficients. You can use RegressionTables, or overwrite coeftable (from GLM.jl/src/lm.jl) yourself:

snippet.juliarepl
julia> import StatsBase.coeftable

julia> using GLM, Distributions

julia> using StatsBase: CoefTable, StatisticalModel, RegressionModel

julia> function coeftable(mm::LinearModel)
           cc = coef(mm)
           se = stderror(mm)
           tt = cc ./ se
           ccn = [ cc[i]*std(mm.pp.X[:,i])/std(mm.rr.y) for i in 1:length(cc) ]   ## the new standardized-coefficient column
           CoefTable(hcat(cc,se,tt,ccdf.(Ref(FDist(1, dof_residual(mm))), abs2.(tt)), ccn),
              ["Estimate","Std.Error","t value", "Pr(>|t|)","NormCoef"],
              ["x$i" for i = 1:size(mm.pp.X, 2)], 4)
           end##function##
coeftable (generic function with 4 methods)

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); mymodel= GLM.lm(X, y);

julia> print(mymodel)
LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
       Estimate  Std.Error  t value Pr(>|t|) NormCoef
x1        1.247   0.179253  6.95663   0.0002      0.0
x2     0.201943  0.0158779  12.7185    <1e-5  2.27936
x3   -0.0116423 0.00157859 -7.37515   0.0002 -1.32175

Useful Contents of a Model

snippet.julianoeval
[download only julia statements]
julia> dump(ols)
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
  model: LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}
    rr: LmResp{Array{Float64,1}}
      mu: Array{Float64}((3,)) [1.83333, 4.33333, 6.83333]
      offset: Array{Float64}((0,)) Float64[]
      wts: Array{Float64}((0,)) Float64[]
      y: Array{Float64}((3,)) [2.0, 4.0, 7.0]
    pp: DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}
      X: Array{Float64}((3, 2)) [1.0 1.0; 1.0 2.0; 1.0 3.0]
      beta0: Array{Float64}((2,)) [-0.666667, 2.5]
      delbeta: Array{Float64}((2,)) [0.0, 0.0]
      scratchbeta: Array{Float64}((2,)) [0.0, 0.0]
      chol: LinearAlgebra.Cholesky{Float64,Array{Float64,2}}
        factors: Array{Float64}((2, 2)) [1.73205 3.4641; 6.0 1.41421]
        uplo: Char 'U'
        info: Int64 0
      scratchm1: Array{Float64}((3, 2)) [0.0 0.0; 0.0 0.0; 0.0 0.0]
      scratchm2: Array{Float64}((2, 2)) [2.30382e-314 2.28588e-314; 2.30382e-314 0.0]
  mf: ModelFrame
    df: DataFrame  3 observations of 2 variables
      Y: Array{Int64}((3,)) [2, 4, 7]      X: Array{Int64}((3,)) [1, 2, 3]
    terms: StatsModels.Terms
      terms: Array{Any}((1,))
        1: Symbol X
      eterms: Array{Any}((2,))
        1: Symbol Y
        2: Symbol X
      factors: Array{Bool}((2, 2)) Bool[true false; false true]
      is_non_redundant: Array{Bool}((2, 2)) Bool[false false; false false]
      order: Array{Int64}((2,)) [1, 1]
      response: Bool true
      intercept: Bool true
    nonmissing: BitArray{1}
      chunks: Array{UInt64}((1,)) UInt64[0x0000000000000007]
      len: Int64 3
      dims: Tuple{Int64}
        1: Int64 4694371872
    contrasts: Dict{Symbol,StatsModels.ContrastsMatrix}
      slots: Array{UInt8}((16,)) UInt8[0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
      keys: Array{Symbol}((16,))
        1: #undef
        2: #undef
        3: #undef
        4: #undef
        5: #undef
        ...
        12: #undef
        13: #undef
        14: #undef
        15: #undef
        16: #undef
      vals: Array{StatsModels.ContrastsMatrix}((16,))
        1: #undef
        2: #undef
        3: #undef
        4: #undef
        5: #undef
        ...
        12: #undef
        13: #undef
        14: #undef
        15: #undef
        16: #undef
      ndel: Int64 0
      count: Int64 0
      age: UInt64 0x0000000000000000
      idxfloor: Int64 1
      maxprobe: Int64 0
  mm: ModelMatrix{Array{Float64,2}}
    m: Array{Float64}((3, 2)) [1.0 1.0; 1.0 2.0; 1.0 3.0]
    assign: Array{Int64}((2,)) [0, 1]

Missing or NaN Data

Unfortunately, the lm regression routines cannot handle missing data. Fortunately, the DataFrame/GLM/Formula interface can. Therefore, if there are missing observations, place the data into a DataFrame first. Here is a basic example:

snippet.juliarepl
julia> using GLM

julia> X=[missing,2.0,3,5,10]; Y=[2.0,4,missing,3,5];

julia> lm( X, Y )
ERROR: MethodError: no method matching fit(::Type{LinearModel}, ::Array{Union{Missing, Float64},1}, ::Array{Union{Missing, Float64},1}, ::Bool)
Closest candidates are:
  fit(!Matched::Type{StatsBase.Histogram}, ::Any...; kwargs...) at /Users/ivo/.julia/packages/StatsBase/NzjNi/src/hist.jl:340
  fit(!Matched::StatsBase.StatisticalModel, ::Any...) at /Users/ivo/.julia/packages/StatsBase/NzjNi/src/statmodels.jl:151
  fit(!Matched::Type{D<:Distributions.Distribution}, ::Any...) where D<:Distributions.Distribution at /Users/ivo/.julia/packages/Distributions/WHjOk/src/genericfit.jl:34
  ...
Stacktrace:
 [1] lm(::Array{Union{Missing, Float64},1}, ::Array{Union{Missing, Float64},1}, ::Bool) at /Users/ivo/.julia/packages/GLM/FxTmX/src/lm.jl:146 (repeats 2 times)
 [2] top-level scope at none:0

julia> using DataFrames

julia> data= DataFrame(X=X, Y=Y)
5×2 DataFrame
│ Row │ X        │ Y        │
│     │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┤
│ 1   │ missing  │ 2.0      │
│ 2   │ 2.0      │ 4.0      │
│ 3   │ 3.0      │ missing  │
│ 4   │ 5.0      │ 3.0      │
│ 5   │ 10.0     │ 5.0      │

julia> mymodel= lm( @formula(Y~X), data )
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: Y ~ 1 + X

Coefficients:
             Estimate Std.Error  t value Pr(>|t|)
(Intercept)   3.13265   1.27486  2.45726   0.2460
X            0.153061  0.194414 0.787296   0.5754


julia> nobs(ans)
3.0

Sequential Weighted Least-Squares (WLS) Regression Filtering (AS75)

The following WLS regression does not need to keep the input data in order to obtain OLS (and WLS) regression coefficients and statistics. Instead, it allows feeding observations one at a time and then forgetting them. It is based on Gentleman's AS75 WLS Regression in Julia (the downloadable GIST below), which implements a weighted least-squares one-pass regression. If only the final coefficient estimates are needed, the AS75 algorithm is slow. However, AS75 has a very low memory footprint, and it can provide updated coefficient estimates after each observation. (If updated historical fitted values or residuals are needed after each observation, e.g., as in recursive residuals, then the observations need to be kept anyway, and the memory advantage of the filtering method evaporates.) A conceptual sketch of the one-pass idea follows, before the actual GIST-based example.
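
The sketch below merely accumulates the cross products X'X and X'y observation by observation and solves the normal equations at the end. The names StreamReg and streaminclude! are made up, and this is not Gentleman's algorithm (it is numerically less stable than AS75), but it conveys how coefficients can be recovered without ever storing the raw data:

snippet.julianoeval
[download only julia statements]
julia> struct StreamReg; XtX::Matrix{Float64}; Xty::Vector{Float64}; end
julia> StreamReg(k::Int)= StreamReg( zeros(k,k), zeros(k) );
julia> streaminclude!(sr::StreamReg, yi::Float64, xi::Vector{Float64})= ( sr.XtX .+= xi*xi'; sr.Xty .+= yi*xi; sr );   ## add one observation
julia> sr= StreamReg(3);
julia> for i=1:10
	    f= Float64(i)
	    streaminclude!( sr, f, [ 1.0, f*f, f*f*f ] )
	end#for##
julia> sr.XtX \ sr.Xty		## recovers the same OLS coefficients, without ever storing X or y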

snippet.juliarepl
julia> include("gregr-as75.jl")		## the downloadable GIST

julia> myreg= Reg(3, "testing regression", [ "i", "constant", "i-squared", "i-cubed" ] );

julia> for i=1:10
	    f= Float64(i)
	    reginclude(myreg, Float64(i), [ 1.0, f*f, f*f*f ] );
	end#for##

julia> regcoefs(myreg)
3-element Array{Float64,1}:
  1.2469976287003752
  0.2019429358219231
 -0.011642316224279057

julia> regcoefsse(myreg)
3-element Array{Float64,1}:
 0.17925319297816514
 0.01587790720088601
 0.0015785864167600233

julia> myreg.rsq, myreg.adjrsq, myreg.sst, myreg.sse, myreg.sigmasq
(0.9942721640021418, 0.9926356394313252, 82.5, 0.4725464698232986, 0.06750663854618552)

julia> regpredict(myreg, [ 1.0, 4.0,8.0 ])
2.1479079017823004

With time, I will probably improve the AS75 code on the public GIST.


Benchmarks (2018)

These benchmarks were performed in mid-2018. They were run with code akin to

snippet.julianoeval
[download only julia statements]
julia> M= 1_000_000; X= hcat( fill(1.0, M), rand( M, 5 ) );  y= rand(M);  ## make data
julia> @btime begin; z= pinv(X)* y; end;  ## or the like

Application Recommendation

  • If you do not care too much about memory use and you only need one final set of regression coefficients for all the data, then use QR or GLM.
  • If your data is really ill-conditioned, use SVD.
  • If storage space is of the essence, or you need sequential coefficient regression output (i.e., updating coefficient estimates as data flows in), use AS75.

One-Time Regressions

To compare these methods on an iMac Pro with 1 million observations, with 5 random variables (plus a constant) predicting a 6th random variable:

Method                    Memory Store                      Speed
Filter AS75               <300 bytes                        1,700 ms
OLS LA: inv(X'*X)*X'*y    56 MByte Feed                     48 ms
QR: X\y                   56 MByte Feed                     58 ms
SVD: pinv(X)*y            56 MByte Feed                     150 ms
GLM.jl lm                 56 MByte Feed, 109 MByte Store    19 ms

If only a single regression is required, then AS75 is slower by orders of magnitude. However, it requires far less storage.

Sequential Updating (Filtering) Regression Coefficients

When coefficients are needed after each new observation (recursive coefficients), this reverses. With 2 variables (plus a constant), the execution speed by number of observations is:

Method            100  1,000  2,000  4,000  8,000
AS75 Speed (ms)   0.1      1      2      5     10
QR Speed (ms)     1.0     30     82    261  1,058

AS75 is not just faster, but also requires orders of magnitude less storage.
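
A benchmark of this kind can be run with code akin to the following sketch for the QR row (the AS75 row would call reginclude and regcoefs inside the same kind of loop; N= 2_000 is one of the sample sizes above):

snippet.julianoeval
[download only julia statements]
julia> using BenchmarkTools
julia> N= 2_000; X= hcat( fill(1.0,N), rand(N,2) ); y= rand(N);	## 2 variables plus a constant
julia> @btime for t=3:size($X,1); $X[1:t,:] \ $y[1:t]; end;	## re-estimate the QR coefficients after each new observation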

GLM with Missing, DataFrames, and RegressionTables

GLM plays especially well with DataFrames:

  1. It offers a “formula” interface, which is deliberately R-like (though less capable).
  2. Only when used with dataframes does GLM know how to ignore missing observations.
  3. The RegressionTables.jl package is the preferred way to look at and save regression output.
snippet.juliarepl
julia> using DataFrames, GLM

julia> yb= [1.0:10.0;]; mydf= DataFrame( y=vcat(yb,999.9,missing) , x1= vcat(yb.^2,missing,999.9) , x2= vcat(yb.^3, 999.9, 999.9) );

julia> mydf
12×3 DataFrame
│ Row │ y        │ x1       │ x2      │
│     │ Float64⍰ │ Float64⍰ │ Float64 │
├─────┼──────────┼──────────┼─────────┤
│ 1   │ 1.0      │ 1.0      │ 1.0     │
│ 2   │ 2.0      │ 4.0      │ 8.0     │
│ 3   │ 3.0      │ 9.0      │ 27.0    │
│ 4   │ 4.0      │ 16.0     │ 64.0    │
│ 5   │ 5.0      │ 25.0     │ 125.0   │
│ 6   │ 6.0      │ 36.0     │ 216.0   │
│ 7   │ 7.0      │ 49.0     │ 343.0   │
│ 8   │ 8.0      │ 64.0     │ 512.0   │
│ 9   │ 9.0      │ 81.0     │ 729.0   │
│ 10  │ 10.0     │ 100.0    │ 1000.0  │
│ 11  │ 999.9    │ missing  │ 999.9   │
│ 12  │ missing  │ 999.9    │ 999.9   │

julia> myols1= lm( @formula( y ~ x1 ), mydf )
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1

Coefficients:
              Estimate  Std.Error t value Pr(>|t|)
(Intercept)    2.17582   0.353361 6.15751   0.0003
x1           0.0863422 0.00702062 12.2984    <1e-5


julia> nobs(myols1)	## note: the result is a Float, which ignores missing observations
10.0

julia> myols2= lm( @formula( y ~ x1 + x2 ), mydf );

julia> using RegressionTables	## show the output in nicer form

julia> regtable( myols1, myols2 )	## output both models side by side

----------------------------------
                        y
              --------------------
                   (1)         (2)
----------------------------------
(Intercept)   2.176***    1.247***
               (0.353)     (0.179)
x1            0.086***    0.202***
               (0.007)     (0.016)
x2                       -0.012***
                           (0.002)
----------------------------------
Estimator          OLS         OLS
----------------------------------
N                   10          10
R2               0.950       0.994
----------------------------------


julia> regtable(myols1, myols2; standardize_coef=true)	## the coefficients are now scaled by sd(x)/sd(y) and are *often* what you should be interested in

----------------------------------
                        y
              --------------------
                   (1)         (2)
----------------------------------
(Intercept)   0.000***    0.000***
               (0.000)     (0.000)
x1            0.975***    2.279***
               (0.079)     (0.179)
x2                       -1.322***
                           (0.179)
----------------------------------
Estimator          OLS         OLS
----------------------------------
N                   10          10
R2               0.950       0.994
----------------------------------



julia> regtable( myols1, myols2; renderSettings=latexOutput() )  ## latexOutput also takes an optional filename.
\begin{tabular}{lrr}
\toprule
            & \multicolumn{2}{c}{y} \\
\cmidrule(lr){2-3}
            &      (1) &        (2) \\
\midrule
(Intercept) & 2.176*** &   1.247*** \\
            &  (0.353) &    (0.179) \\
x1          & 0.086*** &   0.202*** \\
            &  (0.007) &    (0.016) \\
x2          &          &  -0.012*** \\
            &          &    (0.002) \\
\midrule
Estimator   &      OLS &        OLS \\
\midrule
$N$         &       10 &         10 \\
$R^2$       &    0.950 &      0.994 \\
\bottomrule
\end{tabular}
  • RegressionTables allows modifying much of the output according to end-user preferences.
  • RegressionTables can also render in htmlOutput().
  • As noted, unlike the plain vector/matrix numeric interface to GLM, the DataFrame/Formula interface knows how to handle missing cases. However, NaN must be converted to missing first, as sketched below. For other statistics, DataFrames also provides (minimal) support through its completecases interface.
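
Here is a minimal sketch of the NaN-to-missing conversion; the small data frame nandf and its columns are made up, and the column loop uses the DataFrames 0.14 indexing syntax used elsewhere in this chapter:

snippet.julianoeval
[download only julia statements]
julia> using DataFrames
julia> nandf= DataFrame( y= [1.0, NaN, 3.0, 4.0], x= [2.0, 4.0, NaN, 8.0] );
julia> for c in names(nandf); nandf[c]= [ isnan(v) ? missing : v for v in nandf[c] ]; end;	## convert NaN to missing, column by column
julia> completecases(nandf)			## BitArray: true for rows without any missing values
julia> nandf[ completecases(nandf), : ]		## keep only the complete rows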

Backmatter

Useful Packages on Julia Repository

  • Mocha for neural network learning
  • LightML for many statistical techniques now grouped as “machine learning.”

Notes

References
