# julia

### Site Tools

multistatsmore

snippet.juliarepl
`julia> pkgchk.( [  "julia" => v"1.0.3", "DataFrames" => v"0.14.1", "GLM" => v"1.0.1", "Plots" => v"0.21.0", "CovarianceMatrices" => v"0.9.0", "LsqFit" => v"0.6.0", "RegressionTables" => v"0.2.0", "FixedEffectModels" => v"0.6.1", "CategoricalArrays" => v"0.4.0", "ANOVA" => v"0.1.0"  ] );`

# More Multivariate Regression Topics

Most of the examples reuse the artificial data set from multistats:

snippet.julianoeval
```[download only julia statements]
julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= yb.^3 );```

They should also work with Missing observations. For example, you could instead work with

snippet.julianoeval
```[download only julia statements]
julia> yb= [1.0:10.0;]; mydf= DataFrame( y=vcat(yb,999.9,missing) , x1= vcat(yb.^2,missing,999.9) , x2= vcat(yb.^3, 999.0, 999.0) );```

## Non(-Conditionally) Normal Dependent Variables

OLS is surprisingly robust to violations of the normal-error assumption. However, it is often easy to improve the specification and thus avoid having to appeal to the robustness of OLS.

### Probit and Logit (Binary Dependent Variable)

These are often used to model cases in which the dependent variable can only take two values. The following example shows first an OLS, then a Probit, and finally a Logit:

snippet.juliarepl
```julia> using GLM, DataFrames

julia> df= DataFrame( y= repeat( [1,0], 5), x1= 1:10, x2= sin.(1:10) )
10×3 DataFrame
│ Row │ y     │ x1    │ x2        │
│     │ Int64 │ Int64 │ Float64   │
├─────┼───────┼───────┼───────────┤
│ 1   │ 1     │ 1     │ 0.841471  │
│ 2   │ 0     │ 2     │ 0.909297  │
│ 3   │ 1     │ 3     │ 0.14112   │
│ 4   │ 0     │ 4     │ -0.756802 │
│ 5   │ 1     │ 5     │ -0.958924 │
│ 6   │ 0     │ 6     │ -0.279415 │
│ 7   │ 1     │ 7     │ 0.656987  │
│ 8   │ 0     │ 8     │ 0.989358  │
│ 9   │ 1     │ 9     │ 0.412118  │
│ 10  │ 0     │ 10    │ -0.544021 │

julia> lm( @formula( y ~ x1 + x2 ), df )
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate Std.Error   t value Pr(>|t|)
(Intercept)    0.644264  0.412811   1.56068   0.1626
x1           -0.0277944 0.0655179 -0.424227   0.6841
x2            0.0609811  0.271558   0.22456   0.8287

julia> glm( @formula( y ~ x1 + x2 ), df, Binomial(), ProbitLink() )	## f(mu_Y) = invnorm(P)

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate Std.Error   z value Pr(>|z|)
(Intercept)    0.374207  0.897261  0.417054   0.6766
x1           -0.0722244  0.142676 -0.506213   0.6127
x2             0.157343  0.584762  0.269072   0.7879

julia> glm( @formula( y ~ x1 + x2 ), df, Binomial(), LogitLink() )	## f(mu_Y) = log(P/(1-P))

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate Std.Error   z value Pr(>|z|)
(Intercept)   0.596395   1.44895  0.411604   0.6806
x1           -0.114614  0.230644 -0.496932   0.6192
x2            0.251826  0.938053  0.268456   0.7883```
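The link functions are easy to invert by hand. The following stdlib-only sketch (the names `logistic` and `prob` are made up here) turns the logit coefficients printed above back into fitted probabilities:

```julia
## invert the logit link by hand: P = 1/(1+exp(-index))
logistic(z) = 1 / (1 + exp(-z))

## coefficients copied from the logit output above: (Intercept, x1, x2)
b = (0.596395, -0.114614, 0.251826)
prob(x1, x2) = logistic(b[1] + b[2]*x1 + b[3]*x2)

prob(1, 0.841471)	## fitted probability for the first observation
```

The probit version is identical except that the inverse link is the standard normal cdf rather than the logistic function.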

### Poisson (Log-Linear) and Negative Binomial (Count Dependent Variable)

These are often used to model cases in which the dependent variable can only take non-negative integers, such as count data. However, the dependent variable should still be of type Float. The following example has been adapted from the GLM documentation, and shows first an OLS, then a Poisson, and finally a Negative Binomial model.

snippet.juliarepl
```julia> using DataFrames, GLM, CategoricalArrays

julia> df= DataFrame(y=[18.,17,15,20,10,20,25,13,12], x1=categorical([1,2,3,1,2,3,1,2,3]), x2=categorical([1,1,1,2,2,2,3,3,3]))
9×3 DataFrame
│ Row │ y       │ x1           │ x2           │
│     │ Float64 │ Categorical… │ Categorical… │
├─────┼─────────┼──────────────┼──────────────┤
│ 1   │ 18.0    │ 1            │ 1            │
│ 2   │ 17.0    │ 2            │ 1            │
│ 3   │ 15.0    │ 3            │ 1            │
│ 4   │ 20.0    │ 1            │ 2            │
│ 5   │ 10.0    │ 2            │ 2            │
│ 6   │ 20.0    │ 3            │ 2            │
│ 7   │ 25.0    │ 1            │ 3            │
│ 8   │ 13.0    │ 2            │ 3            │
│ 9   │ 12.0    │ 3            │ 3            │

julia> glm( @formula( y ~ x1 + x2), df, Normal() )

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate Std.Error  z value Pr(>|z|)
(Intercept)       21.0   3.40207  6.17271    <1e-9
x1: 2    -7.66667   3.72678 -2.05718   0.0397
x1: 3    -5.33333   3.72678 -1.43108   0.1524
x2: 2       0.0   3.72678      0.0   1.0000
x2: 3       0.0   3.72678      0.0   1.0000

julia> glm( @formula( y ~ x1 + x2), df, Poisson())

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate Std.Error     z value Pr(>|z|)
(Intercept)       3.04452  0.170899     17.8148   <1e-70
x1: 2      -0.454255  0.202171    -2.24689   0.0246
x1: 3      -0.292987  0.192742     -1.5201   0.1285
x2: 2  4.61065e-16       0.2 2.30532e-15   1.0000
x2: 3  3.44687e-17       0.2 1.72344e-16   1.0000

julia> glm( @formula( y ~ x1 + x2), df, NegativeBinomial())

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate Std.Error     z value Pr(>|z|)
(Intercept)      -0.04652 0.0106111     -4.3841    <1e-4
x1: 2     -0.0258006 0.0138336    -1.86507   0.0622
x1: 3     -0.0153554  0.012453    -1.23306   0.2176
x2: 2  2.66415e-13 0.0130304 2.04457e-11   1.0000
x2: 3  1.21585e-13 0.0130304 9.33093e-12   1.0000```
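Because the Poisson link is the log, a coefficient b acts multiplicatively: the conditional mean changes by a factor of exp(b). A quick check with the `x1: 2` coefficient from the output above:

```julia
## Poisson coefficients are multiplicative effects on the mean count
b_x1_2 = -0.454255         ## the "x1: 2" coefficient from the Poisson fit above
rate_ratio = exp(b_x1_2)   ## level 2 of x1 has roughly 0.63 times the base level's mean count
```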

### Tobit (Censored Dependent Variable)

The Tobit model handles a censored dependent variable with a mass point at (typically) zero.

Tobin, James (1958). “Estimation of Relationships for Limited Dependent Variables”. Econometrica 26 (1): 24-36.
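There was no mature Tobit package at the time of writing, but the log-likelihood is easy to code directly. The sketch below is mine, not from any package (all names are made up); it is stdlib-only, with the normal cdf approximated by the Abramowitz-Stegun polynomial, and covers censoring from below at zero. Maximize it over (b0, b1, sigma) with an optimizer of your choice:

```julia
phi(z) = exp(-z^2/2) / sqrt(2pi)    ## standard normal pdf

## standard normal cdf, Abramowitz-Stegun 26.2.17 (abs error < 7.5e-8)
function Phi(z)
    z < 0 && return 1 - Phi(-z)
    t = 1 / (1 + 0.2316419*z)
    poly = t*(0.319381530 + t*(-0.356563782 + t*(1.781477937 + t*(-1.821255978 + t*1.330274429))))
    1 - phi(z)*poly
end

## Tobit log-likelihood with censoring from below at 0:
##   y > 0 contributes the normal density; y == 0 contributes P(y* <= 0)
function tobit_loglik(b0, b1, sigma, y, x)
    ll = 0.0
    for (yi, xi) in zip(y, x)
        mu = b0 + b1*xi
        ll += yi > 0 ? log(phi((yi - mu)/sigma)/sigma) : log(Phi(-mu/sigma))
    end
    ll
end
```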

## No-Intercept Regressions

snippet.juliarepl
```julia> using DataFrames, GLM, CategoricalArrays

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= yb.^3 );

julia> lm( @formula(y ~ -1 + x1 + x2), df )	## note the -1 in the formula
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ x1 + x2

Coefficients:
Estimate  Std.Error  t value Pr(>|t|)
x1     0.288149  0.0261222  11.0308    <1e-5
x2   -0.0193578 0.00295594 -6.54879   0.0002```

## Independent Categorical Variables

### Dummies

Julia understands that categorical variables should be included as dummies. In the following example, x2 takes the three values 0.0, 1.0, and 2.0, and the regression output reports coefficients on the dummies for x2==1.0 and x2==2.0. (The base level x2==0.0 is absorbed into the intercept.)

snippet.juliarepl
```julia> using DataFrames, GLM, CategoricalArrays

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= categorical(yb.^3 .% 3) )
10×3 DataFrame
│ Row │ y       │ x1      │ x2           │
│     │ Float64 │ Float64 │ Categorical… │
├─────┼─────────┼─────────┼──────────────┤
│ 1   │ 1.0     │ 1.0     │ 1.0          │
│ 2   │ 2.0     │ 4.0     │ 2.0          │
│ 3   │ 3.0     │ 9.0     │ 0.0          │
│ 4   │ 4.0     │ 16.0    │ 1.0          │
│ 5   │ 5.0     │ 25.0    │ 2.0          │
│ 6   │ 6.0     │ 36.0    │ 0.0          │
│ 7   │ 7.0     │ 49.0    │ 1.0          │
│ 8   │ 8.0     │ 64.0    │ 2.0          │
│ 9   │ 9.0     │ 81.0    │ 0.0          │
│ 10  │ 10.0    │ 100.0   │ 1.0          │

julia> lm( @formula( y ~ x1+x2 ), df )
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate  Std.Error    t value Pr(>|t|)
(Intercept)     2.35583   0.557468    4.22594   0.0055
x1             0.086766 0.00774037    11.2095    <1e-4
x2: 1.0       -0.456617   0.599092  -0.762182   0.4748
x2: 2.0      -0.0455741   0.646077 -0.0705396   0.9461```
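Behind the scenes, the categorical term expands into one 0/1 column per non-base level. A hand-rolled sketch of the design matrix for the example above (the names `d1`, `d2`, and `X` are made up here):

```julia
## explicit dummy columns for the three levels of x2 shown above;
## the base level 0.0 is absorbed into the intercept
x2 = [1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0]
d1 = Float64.(x2 .== 1.0)                    ## the "x2: 1.0" dummy
d2 = Float64.(x2 .== 2.0)                    ## the "x2: 2.0" dummy
X  = hcat(ones(10), (1.0:10.0).^2, d1, d2)   ## intercept, x1, and the two dummies
```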

### Fixed Effects

The key package is https://github.com/matthieugomez/FixedEffectModels.jl. The following example is copied from the RegressionTables.jl documentation:

snippet.juliafix
```[download only julia statements]
julia> using RegressionTables, DataFrames, FixedEffectModels, RDatasets

julia> df= dataset("datasets", "iris");  ## 150 data points

julia> df[:SpeciesDummy]= categorical(df[:Species]);

julia> df[1:6, :]
6×6 DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species      │ SpeciesDummy │
│     │ Float64     │ Float64    │ Float64     │ Float64    │ Categorical… │ Categorical… │
├─────┼─────────────┼────────────┼─────────────┼────────────┼──────────────┼──────────────┤
│ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ setosa       │ setosa       │
│ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ setosa       │ setosa       │
│ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ setosa       │ setosa       │
│ 4   │ 4.6         │ 3.1        │ 1.5         │ 0.2        │ setosa       │ setosa       │
│ 5   │ 5.0         │ 3.6        │ 1.4         │ 0.2        │ setosa       │ setosa       │
│ 6   │ 5.4         │ 3.9        │ 1.7         │ 0.4        │ setosa       │ setosa       │

julia> rr1= reg(df, @model(SepalLength ~ SepalWidth   , fe = SpeciesDummy));

julia> rr2= reg(df, @model(SepalLength ~ SepalWidth + PetalLength   , fe = SpeciesDummy));

julia> rr3= reg(df, @model(SepalLength ~ SepalWidth + PetalLength + PetalWidth  , fe = SpeciesDummy));

julia> rr4= reg(df, @model(SepalWidth ~ SepalLength + PetalLength + PetalWidth  , fe = SpeciesDummy))
Fixed Effect Model
======================================================================
Number of obs:                 150  Degrees of freedom:              6
R2:                          0.635  R2 within:                   0.391
F-Statistic:               77.5189  p-value:                     0.000
Iterations:                      1  Converged:                    true
======================================================================
Estimate Std.Error t value Pr(>|t|) Lower 95%  Upper 95%
----------------------------------------------------------------------
SepalLength   0.377773  0.065569 5.76147    0.000  0.248171   0.507375
PetalLength  -0.187567 0.0834929 -2.2465    0.026 -0.352597 -0.0225366
PetalWidth     0.62571  0.123376 5.07156    0.000  0.381848   0.869573
======================================================================

julia> regtable(rr1,rr2,rr3,rr4; renderSettings = asciiOutput())

----------------------------------------------------------
                        SepalLength            SepalWidth
               ------------------------------   ----------
                    (1)        (2)        (3)          (4)
----------------------------------------------------------
SepalWidth     0.804***   0.432***   0.496***
                (0.106)    (0.081)    (0.086)
PetalLength               0.776***   0.829***      -0.188*
                           (0.064)    (0.069)      (0.083)
PetalWidth                            -0.315*     0.626***
                                      (0.151)      (0.123)
SepalLength                                       0.378***
                                                   (0.066)
----------------------------------------------------------
SpeciesDummy        Yes        Yes        Yes          Yes
----------------------------------------------------------
Estimator           OLS        OLS        OLS          OLS
----------------------------------------------------------
N                   150        150        150          150
R2                0.726      0.863      0.867        0.635
----------------------------------------------------------```
• Use `renderSettings = latexOutput("tbl.tex")` to create a LaTeX version, or `renderSettings = htmlOutput()` to create an HTML version.
• David Bates' MixedModels.jl is modelled after the R lme4 package. It allows for sophisticated fixed and random effects, including instrumental variables. It has more features and thus a steeper learning curve than FixedEffectModels.
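What `reg(..., fe = ...)` does numerically is the “within” transformation: demean every variable by its group, then run OLS on the demeaned data. A stdlib-only sketch with a made-up two-group panel (the names `demean` and `beta_within` are mine):

```julia
using Statistics

## the "within" transformation: subtract each observation's own group mean
group = [1, 1, 1, 2, 2, 2]
y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
x = [1.0, 2.0, 3.0,  4.0,  5.0,  6.0]

demean(v, g) = v .- [mean(v[g .== gi]) for gi in g]
yd = demean(y, group)
xd = demean(x, group)
beta_within = sum(xd .* yd) / sum(xd .^ 2)   ## slope, group intercepts swept out
```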

### Anova

snippet.juliarepl
```julia> using DataFrames, GLM, ANOVA

julia> yb= [1.0:20.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= categorical(yb.^3 .% 5) );

julia> amodel= fit(LinearModel, @formula(y ~ x1+x2), df, contrasts= Dict(:x2 => EffectsCoding()))
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate  Std.Error   t value Pr(>|t|)
(Intercept)     4.07463   0.566125    7.1974    <1e-5
x1            0.0447761 0.00301538   14.8492    <1e-9
x2: 1.0       -0.208955   0.739998 -0.282373   0.7818
x2: 2.0       0.0895522   0.730127  0.122653   0.9041
x2: 3.0      -0.0149254    0.73311 -0.020359   0.9840
x2: 4.0        0.104478   0.732589  0.142614   0.8886

julia> anova(amodel)
3×6 DataFrame
│ Row │ Source    │ DF        │ SS        │ MSS       │ F         │ p           │
│     │ String    │ Abstract… │ Abstract… │ Abstract… │ Abstract… │ Abstract…   │
├─────┼───────────┼───────────┼───────────┼───────────┼───────────┼─────────────┤
│ 1   │ x1        │ 1.0       │ 587.687   │ 587.687   │ 220.5     │ 5.81677e-10 │
│ 2   │ x2        │ 4.0       │ 0.247543  │ 0.0618857 │ 0.0232195 │ 0.998816    │
│ 3   │ Residuals │ 14.0      │ 37.3134   │ 2.66525   │ 0.0       │ 0.0         │```
• Because there are more parameters to fit, we cannot use the `lm` shortcut.
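The F statistics in the table are just ratios of mean squares. Recomputing the x1 row from the numbers above:

```julia
## F = (SS/df) / (SS_residual/df_residual), using the anova table above
SS_x1, df_x1   = 587.687, 1
SS_res, df_res = 37.3134, 14
F_x1 = (SS_x1/df_x1) / (SS_res/df_res)   ## ≈ 220.5, as reported in the table
```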

## Standardized Coefficients

Every statistician must understand why standardized coefficients (i.e., coefficients multiplied by `sd(x)/sd(y)`) are so useful before being let loose on humanity. They make it possible to assess the “economic significance” rather than just the “statistical significance” of a regressor. Roughly, they tell by how much one standard deviation of difference in X influences Y, measured in Y's standard deviations.

Standardized coefficients are covered in multistats.
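For a single regressor, the rescaling by `sd(x)/sd(y)` reduces the slope to the correlation. A stdlib-only sketch with made-up data:

```julia
using Statistics

x = collect(1.0:10.0)
y = 2.0 .* x .+ 1.0 .+ 0.1 .* sin.(x)    ## made-up, nearly-linear data

beta     = cov(x, y) / var(x)            ## OLS slope with a single regressor
beta_std = beta * std(x) / std(y)        ## standardized ("beta") coefficient
## with a single regressor, beta_std equals cor(x, y)
```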

## Consistent Standard Errors

For a nice introduction, see Long-Ervin (AS 2000). The original heteroskedasticity correction in econometrics traces back at least to White (1980). It is now often known as HC0. In finite samples, HC3 is better than HC0.

The examples are sensitive to the precise invocations. You must use `glm` and the dataframe interface, rather than the simpler `lm`.

### White (1980): Heteroskedasticity

snippet.juliarepl
```julia> using DataFrames, GLM, CovarianceMatrices

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= yb.^3 );

julia> mylm= glm( @formula(y~x1+x2), df, Normal() )

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate  Std.Error  z value Pr(>|z|)
(Intercept)       1.247   0.179253  6.95663   <1e-11
x1             0.201943  0.0158779  12.7185   <1e-36
x2           -0.0116423 0.00157859 -7.37515   <1e-12

julia> vcov( mylm, HC0 )		## White corrected coefficient var-cov
3×3 Array{Float64,2}:
0.0455358    -0.00294435    0.000260202
-0.00294435    0.000234629  -2.18796e-5
0.000260202  -2.18796e-5    2.08003e-6

julia> stderror( mylm, HC3 )		## Much better in Small Samples
3-element Array{Float64,1}:
0.3652189222615689
0.031003033896914146
0.003246364827550289```
• Note: the correct function name is `stderror`, not `stderr`.
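For intuition, the HC0 estimate above is the White sandwich `inv(X'X) * X' * diag(e.^2) * X * inv(X'X)`, which needs nothing beyond the standard library. A sketch on the same artificial data (variable names are mine):

```julia
using LinearAlgebra

yb = [1.0:10.0;]
X  = hcat(ones(10), yb.^2, yb.^3)           ## intercept, x1, x2
y  = yb
b  = X \ y                                  ## OLS coefficients
e  = y .- X*b                               ## residuals
A  = inv(X'X)
V_hc0  = A * (X' * Diagonal(e.^2) * X) * A  ## White/HC0 coefficient var-cov
se_hc0 = sqrt.(diag(V_hc0))
```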

### Newey-West (1994): Heteroskedasticity and Autocorrelation

snippet.juliafixme
```[download only julia statements]
julia> using GLM, CovarianceMatrices, DataFrames

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= yb.^3 );

julia> myglm = glm( @formula( y ~ x1 + x2), df, Normal() )

Formula: y ~ 1 + x1 + x2

Coefficients:
Estimate  Std.Error  z value Pr(>|z|)
(Intercept)       1.247   0.179253  6.95663   <1e-11
x1             0.201943  0.0158779  12.7185   <1e-36
x2           -0.0116423 0.00157859 -7.37515   <1e-12

julia> stderror( myglm, BartlettKernel(NeweyWest), prewhite=false )	## presumably: Bartlett kernel, Newey-West bandwidth
3-element Array{Float64,1}:
0.09847482841240222
0.010707835190040621
0.0010556957499695354```

For Andrews' spectral autocorrelation, use `vcov(mylm, QuadraticSpectralKernel(Andrews), prewhite = false)`.
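Whatever the exact invocation, the HAC correction computes a Bartlett-kernel weighted sum of autocovariances of the score contributions, wrapped in the usual sandwich. A stdlib-only sketch with a fixed lag length L (`newey_west` is a made-up name, not the package's code; the package also picks L automatically via the Newey-West rule):

```julia
using LinearAlgebra

## Bartlett-kernel (Newey-West) HAC var-cov for OLS, fixed lag length L
function newey_west(X::Matrix{Float64}, e::Vector{Float64}, L::Int)
    n = size(X, 1)
    g = X .* e                      ## per-observation score contributions
    S = (g' * g) ./ n               ## lag-0 term (this alone gives White/HC0)
    for l in 1:L
        w = 1 - l/(L + 1)           ## Bartlett weights
        G = (g[l+1:end, :]' * g[1:end-l, :]) ./ n
        S .+= w .* (G .+ G')
    end
    A = inv(X'X ./ n)
    (A * S * A) ./ n                ## the sandwich
end
```

With L = 0, this collapses to the White/HC0 estimator.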

## Fitting Non-Linear Functions Via Least-Squares Optimization (LsqFit)

LsqFit has a nice example, reproduced here.

snippet.juliarepl
```julia> using Random;  Random.seed!(0);

julia> expmodel(x, theta)= theta[1] * exp.(-x .* theta[2]);  ## or use `@. expmodel(x, p) = p[1]*exp(-x*p[2])`

julia> xdata= range(0; stop=10, length=20); trueparms= [1.0 2.0];

julia> ydata= expmodel(xdata, trueparms) + 0.1*randn(length(xdata));  ## the "noisy" data

julia> # # # above was the data generation.  now comes the loading and modification of LsqFit # # #

julia> using Statistics, LsqFit;

julia> import Base.summary		## we need a nicer summary function for the results

julia> function summary( f::LsqFit.LsqFitResult{Float64,1}; sigdigits=3 );
if (!f.converged) println("optimizer did not converge!"); return; end
(length(f.wt)==0) || error("this summary does not yet work with weights")
print("Degrees of Freedom: \$(f.dof).  ")
println("MSE: \$(round(mean(f.resid.^2);sigdigits=sigdigits)).  SDE: \$(round(std(f.resid);sigdigits=sigdigits)).")
println("Parameter Estimates: \$(f.param)")
end;##function##

julia> rsq( f::LsqFit.LsqFitResult{Float64,1}, ydata::Vector{Float64}; sigdigits=3 )= round( 1.0 - mean( f.resid.^2 )/var(ydata); sigdigits=sigdigits );

julia> # # # # now we fit the model and print the results of the fitting # # # #

julia> fit= curve_fit( expmodel, xdata, ydata, [0.5, 0.5] );

julia> summary(fit)		## the parameter estimate is not [1,2] but [1.07,1.91] --- estimation error
Degrees of Freedom: 18.  MSE: 0.00983.  SDE: 0.102.
Parameter Estimates: [1.07429, 1.90889]

julia> rsq( fit, ydata )		## R^2 = 1 - MSE/var(ydata)
0.867

julia> standard_error(fit)		## coefficient standard error
2-element Array{Float64,1}:
0.10357853033234828
0.406806155417925

julia> margin_error(fit, 1.0-0.95)	## coefficient 5% margin
2-element Array{Float64,1}:
0.2176104172710054
0.8546680180232399

julia> confidence_interval(fit, 0.05)	## plus or minus margin error
2-element Array{Tuple{Float64,Float64},1}:
(0.8566787016414885, 1.2918995361834993)
(1.0542196905114256, 2.7635557265579056)

julia> yfitted= expmodel( xdata, fit.param );

julia> using Plots;

julia> scatter( xdata, ydata, labels="actual" )

julia> plot!( xdata, yfitted, labels="fitted" )

julia> savefig("plotting/lsqfit.png")```

## Instrumental Variables

snippet.juliarepl
```julia> using DataFrames, Random; Random.seed!(0);  ## create data set first

julia> N=5000; instrument= randn(N); unobsvar=randn(N);

julia> x= 4.0 .+ 1.0*instrument .+ 2.0*unobsvar + randn(N);	## x is related both to :instrument and :unobsvar

julia> y= 5.0 .+ 0.0*x .+ 3.0*unobsvar .+ randn(N);		## the true x is unrelated to y

julia> d= DataFrame( x=x, y=y, instrument=instrument, unobsvar=unobsvar ); describe(d)
4×8 DataFrame
│ Row │ variable   │ mean       │ min      │ median     │ max     │ nunique │ nmissing │ eltype   │
│     │ Symbol     │ Float64    │ Float64  │ Float64    │ Float64 │ Nothing │ Nothing  │ DataType │
├─────┼────────────┼────────────┼──────────┼────────────┼─────────┼─────────┼──────────┼──────────┤
│ 1   │ x          │ 4.00319    │ -4.91    │ 3.99529    │ 13.5128 │         │          │ Float64  │
│ 2   │ y          │ 5.04997    │ -4.93747 │ 5.01014    │ 17.4715 │         │          │ Float64  │
│ 3   │ instrument │ -0.0266859 │ -3.6203  │ -0.0176785 │ 4.37616 │         │          │ Float64  │
│ 4   │ unobsvar   │ 0.0103825  │ -3.22047 │ -0.0138893 │ 3.65663 │         │          │ Float64  │

julia> using FixedEffectModels

julia> reg(d, @model( y ~ x + unobsvar ))		## if we could observe :unobsvar, it would be great!
Linear Model
========================================================================
Number of obs:                 5000  Degrees of freedom:               3
F Statistic:               0.434644  p-value:                      1.000
========================================================================
Estimate  Std.Error  t value Pr(>|t|) Lower 95% Upper 95%
------------------------------------------------------------------------
x            0.00381245 0.00991791 0.384401    0.701 -0.015631 0.0232559
unobsvar        2.99684  0.0241536  124.074    0.000   2.94949   3.04419
(Intercept)     5.00359  0.0419245  119.348    0.000    4.9214   5.08578
========================================================================

julia> reg(d, @model( y ~ x ))				## but we can't, so this would make a big mistake
Linear Model
====================================================================
Number of obs:               5000  Degrees of freedom:             2
F Statistic:              7381.61  p-value:                    0.000
====================================================================
Estimate Std.Error t value Pr(>|t|) Lower 95% Upper 95%
--------------------------------------------------------------------
x             1.00355 0.0116805 85.9163    0.000   0.98065   1.02645
(Intercept)   1.03257 0.0546984 18.8775    0.000  0.925335    1.1398
====================================================================

julia> reg(d, @model( y ~ ( x ~ instrument) ))        ## but we can use a crutch: our instrument
IV Model
========================================================================
Number of obs:                 5000  Degrees of freedom:               2
F-Statistic:               0.470085  p-value:                      1.000
First Stage F-stat (KP):    945.133  First Stage p-val (KP):       0.000
========================================================================
Estimate Std.Error   t value Pr(>|t|) Lower 95% Upper 95%
------------------------------------------------------------------------
x            -0.0322202 0.0469937 -0.685627    0.493 -0.124349 0.0599082
(Intercept)     5.17895  0.193555   26.7569    0.000    4.7995    5.5584
========================================================================

julia> #### Let's do this by hand.  It should get us the same coefs, but wrong standard errors ####

julia> using GLM;  xp= predict( lm( @formula( x ~ instrument ), d ));  ## First Stage: Fitted x

julia> lm( hcat(xp,fill(1.0,5000)), d.y )	## Second Stage.
LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
Estimate Std.Error   t value Pr(>|t|)
x1   -0.0322202 0.0461032 -0.698871   0.4847
x2      5.17895  0.189888   27.2738   <1e-99```
• The standard errors are only correct if you use the explicit IV formulation, rather than the 2-stage least-squares at the end.
• IV is conceptually very subtle—seemingly easy, yet difficult.
• The :instrument must affect :y only via the direct channel through :x. It must not work mysteriously through any other channels.
• If :instrument and :x have poor correlation, the final coefficient (x1 on y, as above) becomes very diffuse, and the usual asymptotics become unreliable. In the extreme, if the instrument is unrelated random noise, the IV estimate carries essentially no information and its standard error explodes.
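To make the standard-error bullet concrete: in the just-identified case, the IV estimator is `(Z'X) \ (Z'y)`, and its homoskedastic var-cov is `s2 * inv(Z'X) * (Z'Z) * inv(X'Z)`, which is what the `reg` IV call reports up to finite-sample details. A stdlib-only sketch on freshly simulated data (same design as above, variable names mine):

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 1000
z = randn(n); u = randn(n)                    ## instrument and unobservable
x = 1.0 .+ z .+ 2.0 .* u .+ randn(n)          ## x is endogenous: it depends on u
y = 5.0 .+ 0.0 .* x .+ 3.0 .* u .+ randn(n)   ## the true x coefficient is 0

Z = hcat(ones(n), z)                          ## instruments (incl. intercept)
X = hcat(ones(n), x)                          ## regressors

b  = (Z'X) \ (Z'y)                            ## just-identified IV estimate
e  = y .- X*b
s2 = sum(e.^2) / (n - 2)
V  = s2 .* (inv(Z'X) * (Z'Z) * inv(X'Z))      ## correct (homoskedastic) IV var-cov
se = sqrt.(diag(V))
```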

# Backmatter

## Useful Packages on Julia Repository

• Mocha for neural network learning
• LightML for many statistical techniques now grouped as “machine learning.”
• MultivariateStats for extended multivariate statistics: linear least-squares regression, ridge regression, data whitening, Principal Components Analysis (PCA), Canonical Correlation Analysis (CCA), Classical Multidimensional Scaling (MDS), Linear Discriminant Analysis (LDA), Multiclass LDA, Independent Component Analysis (ICA), FastICA, Probabilistic PCA, Factor Analysis, and Kernel PCA.

`END`

## Modifying the Default Output of GLM

EPIC FAILURE: an attempt to modify the default output of GLM.

The goal here is to integrate both standardized coefficients and heteroskedasticity-adjusted standard errors into the standard “output” function for glm results, and to add some more useful statistics, too (like the R^2). Ideally, we would also get rid of the long `StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Normal{Float64},IdentityLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}`, unless specifically requested.

From GLM.jl:

snippet.juliarepl
```julia> using DataFrames, RegressionTables, GLM, CovarianceMatrices, FixedEffectModels

julia> struct GLMResult <: FixedEffectModels.AbstractRegressionResult
coef::Vector{Float64}   # Vector of coefficients
vcov::Matrix{Float64}   # Covariance matrix

coefnames::Vector       # Name of coefficients
yname::Symbol           # Name of dependent variable

nobs::Int64             # Number of observations
df_residual::Int64      # degrees of freedoms

r2::Float64             # R squared
end;##struct##

julia> function tooutput(x::StatsModels.DataFrameRegressionModel, covariance::Matrix)
GLMResult(
coef(x),
covariance,
coefnames(x),
x.mf.terms.eterms[1],
Int(nobs(x)),
Int(nobs(x))-length(coefnames(x)),
0.0)	## r2 does not work for DataFrameRegressionModels?
end;##function##

julia> yb= [1.0:10.0;]; df= DataFrame( y=vcat(yb,999.9,missing) , x1= vcat(yb.^2,missing,999.9) , x2= vcat(yb.^3, 999.9, 999.9) );

julia> lm1= glm( @formula( y ~ x1+x2 ), df, Normal() )

julia> iid= tooutput(lm1, vcov(lm1))		## OLS  --- FAILS
ERROR: function title has no general method for AbstractRegressionResult

julia> rob= tooutput(lm1, vcov(lm1, HC3))	## HC = White; --- FAILS
ERROR: function title has no general method for AbstractRegressionResult

julia> regtable(iid, rob)			## FAILS
ERROR: type GLMResult has no field dof_residual```