~~TOC 1-3 wide~~

---

```juliarepl
julia> pkgchk.( [ "julia" => v"1.0.3", "DataFrames" => v"0.14.1", "GLM" => v"1.0.1", "Plots" => v"0.21.0", "CovarianceMatrices" => v"0.9.0", "LsqFit" => v"0.6.0", "RegressionTables" => v"0.2.0", "FixedEffectModels" => v"0.6.1", "CategoricalArrays" => v"0.4.0", "ANOVA" => v"0.1.0" ] );
```

# More Multivariate Regression Topics
  
Most of the examples reuse the artificial data set from [[multistats]]:

```julianoeval
julia> using DataFrames

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= yb.^3 );
```
  
They should also work with missing observations.  For example, you could instead work with:

```julianoeval
julia> yb= [1.0:10.0;]; mydf= DataFrame( y=vcat(yb,999.9,missing), x1= vcat(yb.^2,missing,999.9), x2= vcat(yb.^3, 999.9, 999.9) );
```
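If you prefer to drop the incomplete rows explicitly before fitting, a minimal sketch (assuming `dropmissing` from DataFrames; this is not part of the original example):

```julianoeval
julia> mydfc= dropmissing(mydf)      ## keeps only rows without missing values

julia> using GLM;  lm( @formula( y ~ x1 + x2 ), mydfc )
```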
  
  
## Non(-Conditionally) Normal Dependent Variables

OLS is surprisingly robust with respect to violations of the normal error assumption.  However, it is easy to improve the specification and thus not have to appeal to the robustness of OLS.

### Probit and Logit (Binary Dependent Variable)

These are often used to model cases in which the dependent variable can only take two values.  The following example shows first an OLS, then a Probit, and finally a Logit:
  
```juliarepl
julia> using GLM, DataFrames

julia> df= DataFrame( y= repeat( [1,0], 5), x1= 1:10, x2= sin.((1:10)) )
10×3 DataFrame
│ Row │ y     │ x1    │ x2        │
│     │ Int64 │ Int64 │ Float64   │
├─────┼───────┼───────┼───────────┤
│ 1   │ 1     │ 1     │ 0.841471  │
│ 2   │ 0     │ 2     │ 0.909297  │
│ 3   │ 1     │ 3     │ 0.14112   │
│ 4   │ 0     │ 4     │ -0.756802 │
│ 5   │ 1     │ 5     │ -0.958924 │
│ 6   │ 0     │ 6     │ -0.279415 │
│ 7   │ 1     │ 7     │ 0.656987  │
│ 8   │ 0     │ 8     │ 0.989358  │
│ 9   │ 1     │ 9     │ 0.412118  │
│ 10  │ 0     │ 10    │ -0.544021 │

julia> lm( @formula( y ~ x1 + x2 ), df )
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
               Estimate Std.Error   t value Pr(>|t|)
(Intercept)    0.644264  0.412811   1.56068   0.1626
x1           -0.0277944 0.0655179 -0.424227   0.6841
x2            0.0609811  0.271558   0.22456   0.8287


julia> glm( @formula( y ~ x1 + x2 ), df, Binomial(), ProbitLink() ) ## f(mu_Y) = invnorm(P)
StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Binomial{Float64},ProbitLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
               Estimate Std.Error   z value Pr(>|z|)
(Intercept)    0.374207  0.897261  0.417054   0.6766
x1           -0.0722244  0.142676 -0.506213   0.6127
x2             0.157343  0.584762  0.269072   0.7879


julia> glm( @formula( y ~ x1 + x2 ), df, Binomial(), LogitLink() ) ## f(mu_Y) = log(P/(1-P))
StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Binomial{Float64},LogitLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
              Estimate Std.Error   z value Pr(>|z|)
(Intercept)   0.596395   1.44895  0.411604   0.6806
x1           -0.114614  0.230644 -0.496932   0.6192
x2            0.251826  0.938053  0.268456   0.7883

```
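To turn such a fit into predicted probabilities, `predict` on the fitted model returns predictions on the response scale, i.e., Pr(y=1|x).  A minimal sketch, reusing `df` from above:

```julianoeval
julia> probitfit= glm( @formula( y ~ x1 + x2 ), df, Binomial(), ProbitLink() );

julia> predict(probitfit)    ## one fitted probability per row of df
```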
  
  
### Poisson (Log-Linear) and Negative Binomial (Count Dependent Variable)

These are often used to model cases in which the dependent variable can only take non-negative integer values, such as count data.  However, the dependent variable should still be of type Float.  The following example has been adapted from the GLM documentation, and shows first an OLS, then a Poisson, and finally a Negative Binomial model.

```juliarepl
julia> using DataFrames, GLM, CategoricalArrays

julia> df= DataFrame(y=[18.,17,15,20,10,20,25,13,12], x1=categorical([1,2,3,1,2,3,1,2,3]), x2=categorical([1,1,1,2,2,2,3,3,3]))
9×3 DataFrame
│ Row │ y       │ x1           │ x2           │
│     │ Float64 │ Categorical… │ Categorical… │
├─────┼─────────┼──────────────┼──────────────┤
│ 1   │ 18.0    │ 1            │ 1            │
│ 2   │ 17.0    │ 2            │ 1            │
│ 3   │ 15.0    │ 3            │ 1            │
│ 4   │ 20.0    │ 1            │ 2            │
│ 5   │ 10.0    │ 2            │ 2            │
│ 6   │ 20.0    │ 3            │ 2            │
│ 7   │ 25.0    │ 1            │ 3            │
│ 8   │ 13.0    │ 2            │ 3            │
│ 9   │ 12.0    │ 3            │ 3            │

julia> glm( @formula( y ~ x1 + x2), df, Normal() )
StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Normal{Float64},IdentityLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
              Estimate Std.Error  z value Pr(>|z|)
(Intercept)       21.0   3.40207  6.17271    <1e-9
x1: 2         -7.66667   3.72678 -2.05718   0.0397
x1: 3         -5.33333   3.72678 -1.43108   0.1524
x2: 2              0.0   3.72678      0.0   1.0000
x2: 3              0.0   3.72678      0.0   1.0000

julia> glm( @formula( y ~ x1 + x2), df, Poisson())
StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Poisson{Float64},LogLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
                 Estimate Std.Error     z value Pr(>|z|)
(Intercept)       3.04452  0.170899     17.8148   <1e-70
x1: 2           -0.454255  0.202171    -2.24689   0.0246
x1: 3           -0.292987  0.192742     -1.5201   0.1285
x2: 2         4.61065e-16       0.2 2.30532e-15   1.0000
x2: 3         3.44687e-17       0.2 1.72344e-16   1.0000

julia> glm( @formula( y ~ x1 + x2), df, NegativeBinomial())
StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},NegativeBinomial{Float64},NegativeBinomialLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
                 Estimate Std.Error     z value Pr(>|z|)
(Intercept)      -0.04652 0.0106111     -4.3841    <1e-4
x1: 2          -0.0258006 0.0138336    -1.86507   0.0622
x1: 3          -0.0153554  0.012453    -1.23306   0.2176
x2: 2         2.66415e-13 0.0130304 2.04457e-11   1.0000
x2: 3         1.21585e-13 0.0130304 9.33093e-12   1.0000

```
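Because the Poisson model uses a log link, exponentiated coefficients can be read as multiplicative (rate-ratio) effects.  A minimal sketch, reusing `df` from above:

```julianoeval
julia> pois= glm( @formula( y ~ x1 + x2), df, Poisson());

julia> exp.(coef(pois))    ## e.g., exp(-0.454255) ≈ 0.64: level x1==2 has about 0.64 times the expected count of x1==1
```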
  
### Tobit (Censored Dependent Variable)

A Tobit models a censored dependent variable with a masspoint at the censoring limit (typically zero).

Tobin, James (1958). "Estimation of Relationships for Limited Dependent Variables". Econometrica 26 (1): 24–36.
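GLM.jl does not ship a Tobit estimator, but the likelihood is straightforward to maximize directly.  The following is only a sketch, assuming the Distributions and Optim packages (neither is used elsewhere on this page): for data censored at zero, the log-likelihood uses the normal density for uncensored observations and the normal CDF below zero for censored ones.

```julianoeval
julia> using Distributions, Optim, Random; Random.seed!(0);

julia> N= 200; x= randn(N); ystar= 1.0 .+ 2.0 .* x .+ randn(N); y= max.(ystar, 0.0);  ## left-censored at 0

julia> X= hcat(ones(N), x);

julia> function tobitnegll(theta)            ## theta = [ intercept, slope, log(sigma) ]
           beta= theta[1:2]; sigma= exp(theta[3]); xb= X*beta
           ll= 0.0
           for i in 1:N
               ll+= (y[i] > 0.0) ? logpdf(Normal(xb[i], sigma), y[i]) : logcdf(Normal(xb[i], sigma), 0.0)
           end
           return -ll
       end;

julia> Optim.minimizer( optimize(tobitnegll, zeros(3), BFGS()) )  ## roughly [1.0, 2.0, log(1.0)]
```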
  
## No-Intercept Regressions

```juliarepl
julia> using DataFrames, GLM, CategoricalArrays

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= yb.^3 );

julia> lm( @formula(y ~ -1 + x1 + x2), df ) ## note the -1 in the formula
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ x1 + x2

Coefficients:
       Estimate  Std.Error  t value Pr(>|t|)
x1     0.288149  0.0261222  11.0308    <1e-5
x2   -0.0193578 0.00295594 -6.54879   0.0002

```
  
  
  
## Independent Categorical Variables

### Dummies

Julia understands that categorical variables should be included as dummies.  In the following example, x2 has three levels (0.0, 1.0, and 2.0); the regression output reports coefficients on the dummies for x2==1.0 and x2==2.0, while the base level x2==0.0 is absorbed into the intercept.

```juliarepl
julia> using DataFrames, GLM, CategoricalArrays

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= categorical(yb.^3 .% 3) )
10×3 DataFrame
│ Row │ y       │ x1      │ x2           │
│     │ Float64 │ Float64 │ Categorical… │
├─────┼─────────┼─────────┼──────────────┤
│ 1   │ 1.0     │ 1.0     │ 1.0          │
│ 2   │ 2.0     │ 4.0     │ 2.0          │
│ 3   │ 3.0     │ 9.0     │ 0.0          │
│ 4   │ 4.0     │ 16.0    │ 1.0          │
│ 5   │ 5.0     │ 25.0    │ 2.0          │
│ 6   │ 6.0     │ 36.0    │ 0.0          │
│ 7   │ 7.0     │ 49.0    │ 1.0          │
│ 8   │ 8.0     │ 64.0    │ 2.0          │
│ 9   │ 9.0     │ 81.0    │ 0.0          │
│ 10  │ 10.0    │ 100.0   │ 1.0          │

julia> lm( @formula( y ~ x1+x2 ), df )
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
               Estimate  Std.Error    t value Pr(>|t|)
(Intercept)     2.35583   0.557468    4.22594   0.0055
x1             0.086766 0.00774037    11.2095    <1e-4
x2: 1.0       -0.456617   0.599092  -0.762182   0.4748
x2: 2.0      -0.0455741   0.646077 -0.0705396   0.9461

```
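To pick a different base level for the dummies, you can pass a `contrasts` dictionary, just as the Anova example below does with `EffectsCoding()`.  A minimal sketch, assuming `DummyCoding(base=...)` from StatsModels is available in your installed version:

```julianoeval
julia> fit( LinearModel, @formula( y ~ x1 + x2 ), df, contrasts= Dict(:x2 => DummyCoding(base= 2.0)) )  ## now x2==2.0 is absorbed into the intercept
```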
  
  
### Fixed Effects

The key package is [https://github.com/matthieugomez/FixedEffectModels.jl](https://github.com/matthieugomez/FixedEffectModels.jl).  A worked example, in which four fitted models `rr1` to `rr4` are combined into one table with `regtable`, is in the [RegressionTables.jl](https://github.com/jmboehm/RegressionTables.jl) documentation.
  
* `regtable(rr1, rr2, rr3, rr4; renderSettings = latexOutput("tbl.tex"))` creates a LaTeX version of the table; `renderSettings = htmlOutput()` creates an HTML version.
  
  
* Douglas Bates' [MixedModels.jl](http://dmbates.github.io/MixedModels.jl/) is modelled after the R lme4 package.  It allows for sophisticated fixed and random effects.  It has more features and thus carries a higher learning curve than FixedEffectModels.
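A minimal random-intercept sketch with made-up data (assuming a reasonably recent MixedModels version; `(1|g)` requests a random intercept for each level of g):

```julianoeval
julia> using MixedModels, DataFrames, Random; Random.seed!(0);

julia> df= DataFrame( y= randn(12), x= repeat(1.0:4.0, 3), g= repeat(["a","b","c"], inner=4) );

julia> fit( LinearMixedModel, @formula( y ~ 1 + x + (1|g) ), df )   ## random intercept for each level of g
```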
  
  
  
### Anova

```juliarepl
julia> using DataFrames, GLM, ANOVA

julia> yb= [1.0:20.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= categorical(yb.^3 .% 5) );

julia> amodel= fit(LinearModel, @formula(y ~ x1+x2), df, contrasts= Dict(:x2 => EffectsCoding()))
StatsModels.DataFrameRegressionModel{LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
               Estimate  Std.Error   t value Pr(>|t|)
(Intercept)     4.07463   0.566125    7.1974    <1e-5
x1            0.0447761 0.00301538   14.8492    <1e-9
x2: 1.0       -0.208955   0.739998 -0.282373   0.7818
x2: 2.0       0.0895522   0.730127  0.122653   0.9041
x2: 3.0      -0.0149254    0.73311 -0.020359   0.9840
x2: 4.0        0.104478   0.732589  0.142614   0.8886


julia> anova(amodel)
3×6 DataFrame
│ Row │ Source    │ DF        │ SS        │ MSS       │ F         │ p           │
│     │ String    │ Abstract… │ Abstract… │ Abstract… │ Abstract… │ Abstract…   │
├─────┼───────────┼───────────┼───────────┼───────────┼───────────┼─────────────┤
│ 1   │ x1        │ 1.0       │ 587.687   │ 587.687   │ 220.5     │ 5.81677e-10 │
│ 2   │ x2        │ 4.0       │ 0.247543  │ 0.0618857 │ 0.0232195 │ 0.998816    │
│ 3   │ Residuals │ 14.0      │ 37.3134   │ 2.66525   │ 0.0       │ 0.0         │

```

* Because we need to pass the extra `contrasts` argument, this example uses the explicit `fit(LinearModel, ...)` form rather than the `lm` shortcut.
  
  
  
## Standardized Coefficients

Every statistician must understand why standardized coefficients (i.e., coefficients multiplied by `sd(x)/sd(y)`) are so useful before being let loose on humanity.  They make it possible to understand the "economic significance" rather than just the "statistical significance" of a regressor.  Roughly, they tell how much one standard deviation of difference in X influences Y, in terms of Y's standard deviation.

Standardized coefficients are covered in [[multistats]].
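As a reminder of the computation, a minimal sketch using the artificial data set from the top of the page (the slopes are rescaled by the regressor and dependent-variable standard deviations; the intercept is not standardized):

```julianoeval
julia> using DataFrames, GLM, Statistics

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= yb.^3 );

julia> m= lm( @formula( y ~ x1 + x2 ), df );

julia> coef(m)[2:3] .* [ std(df.x1), std(df.x2) ] ./ std(df.y)   ## standardized slopes on x1 and x2
```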
  
  
  
## Consistent Standard Errors

For a nice introduction, see Long-Ervin (AS 2000).  The original heteroskedasticity correction in econometrics traces back at least to White (1980).  It is now often known as HC0.  In finite samples, HC3 is better than HC0.

The examples are sensitive to the precise invocations.  You must use `glm` and the dataframe interface, rather than the simpler `lm`.

### White (1980): Heteroskedasticity
  
```juliarepl
julia> using DataFrames, GLM, CovarianceMatrices

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= yb.^3 );

julia> mylm= glm( @formula(y~x1+x2), df, Normal() )
StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Normal{Float64},IdentityLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
               Estimate  Std.Error  z value Pr(>|z|)
(Intercept)       1.247   0.179253  6.95663   <1e-11
x1             0.201943  0.0158779  12.7185   <1e-36
x2           -0.0116423 0.00157859 -7.37515   <1e-12


julia> vcov( mylm, HC0 ) ## White-corrected coefficient variance-covariance matrix
3×3 Array{Float64,2}:
  0.0455358    -0.00294435    0.000260202
 -0.00294435    0.000234629  -2.18796e-5
  0.000260202  -2.18796e-5    2.08003e-6

julia> stderror( mylm, HC3 ) ## much better in small samples
3-element Array{Float64,1}:
 0.3652189222615689
 0.031003033896914146
 0.003246364827550289

```
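If you want robust z-statistics rather than raw standard errors, divide the coefficients by the robust standard errors.  A minimal sketch, reusing `mylm` from above:

```julianoeval
julia> coef(mylm) ./ stderror(mylm, HC3)   ## robust z-statistics, one per coefficient
```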

* WARNING: the correct function name is `stderror`, not `stderr`.



### Newey-West (1994): Heteroskedasticity and Autocorrelation

```juliafixme
julia> using GLM, CovarianceMatrices, DataFrames

julia> yb= [1.0:10.0;]; df= DataFrame( y=yb, x1= yb.^2, x2= yb.^3 );

julia> myglm = glm( @formula( y ~ x1 + x2), df, Normal() )
StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Normal{Float64},IdentityLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: y ~ 1 + x1 + x2

Coefficients:
               Estimate  Std.Error  z value Pr(>|z|)
(Intercept)       1.247   0.179253  6.95663   <1e-11
x1             0.201943  0.0158779  12.7185   <1e-36
x2           -0.0116423 0.00157859 -7.37515   <1e-12

julia> stderror(myglm, QuadraticSpectralKernel(NeweyWest), prewhite=false)
3-element Array{Float64,1}:
 0.09847482841240222
 0.010707835190040621
 0.0010556957499695354

```

For Andrews' spectral autocorrelation, use `vcov(myglm, QuadraticSpectralKernel(Andrews), prewhite = false)`.
  
## Fitting Non-Linear Functions Via Least-Squares Optimization (LsqFit)
  
[LsqFit](https://github.com/JuliaNLSolvers/LsqFit.jl) has a nice example, reproduced here.
  
```juliarepl
julia> using Random;  Random.seed!(0);

julia> expmodel(x, theta)= theta[1] * exp.(-x .* theta[2]);  ## or use `@. expmodel(x, p) = p[1]*exp(-x*p[2])`

julia> xdata= range(0; stop=10, length=20); trueparms= [1.0 2.0];

julia> ydata= expmodel(xdata, trueparms) + 0.1*randn(length(xdata));  ## the "noisy" data

julia> # # # above was the data generation.  now comes the loading and modification of LsqFit # # #

julia> using Statistics, LsqFit;

julia> import Base.summary ## we need a nicer summary function for the results

julia> function summary( f::LsqFit.LsqFitResult{Float64,1}; sigdigits=3 );
        if (!f.converged) println("optimizer did not converge!"); return; end
               (length(f.wt)==0) || error("this summary does not yet work with weights")
        print("Degrees of Freedom: $(f.dof).  ")
               println("RMSE: $(round(mean(f.resid.^2);sigdigits=sigdigits)).  SDE: $(round(std(f.resid);sigdigits=sigdigits)).")
               println("Parameter Estimates: $(round.(f.param; digits=5))")
        end;

julia> rsq( f::LsqFit.LsqFitResult{Float64,1}, ydata::Vector{Float64}; sigdigits=3 )= 1.0 - mean( f.resid.^2 )/std(ydata);

julia> # # # # now we fit the model and print the results of the fitting # # # #

julia> fit= curve_fit( expmodel, xdata, ydata, [0.5, 0.5] );

julia> summary(fit) ## the parameter estimate is not [1,2] but [1.07,1.91] --- estimation error
Degrees of Freedom: 18.  RMSE: 0.00983.  SDE: 0.102.
Parameter Estimates: [1.07429, 1.90889]

julia> rsq( fit, ydata )
0.9638146161544147

julia> standard_error(fit) ## coefficient standard error
2-element Array{Float64,1}:
 0.10357853033234828
 0.406806155417925

julia> margin_error(fit, 1.0-0.95) ## coefficient 5% margin
2-element Array{Float64,1}:
 0.2176104172710054
 0.8546680180232399

julia> confidence_interval(fit, 0.05) ## plus or minus margin error
2-element Array{Tuple{Float64,Float64},1}:
 (0.8566787016414885, 1.2918995361834993)
 (1.0542196905114256, 2.7635557265579056)

julia> yfitted= expmodel( xdata, fit.param );

julia> using Plots;

julia> plot( xdata, ydata, labels="data" )

julia> plot!( xdata, yfitted, labels="fitted" )

julia> savefig("plotting/lsqfit.png")

```
  
{{lsqfit.png}}
  
  
## Instrumental Variables
  
```juliarepl
julia> using DataFrames, Random; Random.seed!(0);  ## create data set first

julia> N=5000; instrument= randn(N); unobsvar= randn(N);

julia> x= 4.0 .+ 1.0*instrument .+ 2.0*unobsvar + randn(N);  ## x is related both to :instrument and :unobsvar

julia> y= 5.0 .+ 0.0*x .+ 3.0*unobsvar .+ randn(N);  ## the true x is unrelated to y

julia> d= DataFrame( x=x, y=y, instrument=instrument, unobsvar=unobsvar ); describe(d)
4×8 DataFrame
│ Row │ variable   │ mean       │ min      │ median     │ max     │ nunique │ nmissing │ eltype   │
│     │ Symbol     │ Float64    │ Float64  │ Float64    │ Float64 │ Nothing │ Nothing  │ DataType │
├─────┼────────────┼────────────┼──────────┼────────────┼─────────┼─────────┼──────────┼──────────┤
│ 1   │ x          │ 4.00319    │ -4.91    │ 3.99529    │ 13.5128 │         │          │ Float64  │
│ 2   │ y          │ 5.04997    │ -4.93747 │ 5.01014    │ 17.4715 │         │          │ Float64  │
│ 3   │ instrument │ -0.0266859 │ -3.6203  │ -0.0176785 │ 4.37616 │         │          │ Float64  │
│ 4   │ unobsvar   │ 0.0103825  │ -3.22047 │ -0.0138893 │ 3.65663 │         │          │ Float64  │
  
  
julia> using FixedEffectModels
  
julia> reg(d, @model( y ~ x + unobsvar )) ## if we could observe :unobsvar, it would be great!
                              Linear Model
========================================================================
Number of obs:                 5000  Degrees of freedom:               3
R2:                           0.901  R2 Adjusted:                  0.901
F Statistic:               0.434644  p-value:                      1.000
========================================================================
               Estimate  Std.Error  t value Pr(>|t|) Lower 95% Upper 95%
------------------------------------------------------------------------
x            0.00381245 0.00991791 0.384401    0.701 -0.015631 0.0232559
unobsvar        2.99684  0.0241536  124.074    0.000   2.94949   3.04419
(Intercept)     5.00359  0.0419245  119.348    0.000    4.9214   5.08578
========================================================================
  

julia> reg(d, @model( y ~ x )) ## but we can't, so this would make a big mistake
                            Linear Model
====================================================================
Number of obs:               5000  Degrees of freedom:             2
R2:                         0.596  R2 Adjusted:                0.596
F Statistic:              7381.61  p-value:                    0.000
====================================================================
             Estimate Std.Error t value Pr(>|t|) Lower 95% Upper 95%
--------------------------------------------------------------------
x             1.00355 0.0116805 85.9163    0.000   0.98065   1.02645
(Intercept)   1.03257 0.0546984 18.8775    0.000  0.925335    1.1398
====================================================================
  

julia> reg(d, @model( y ~ ( x ~ instrument) ))        ## but we can use a crutch: our instrument
                                IV Model
========================================================================
Number of obs:                 5000  Degrees of freedom:               2
R2:                          -0.039  R2 Adjusted:                 -0.039
F-Statistic:               0.470085  p-value:                      1.000
First Stage F-stat (KP):    945.133  First Stage p-val (KP):       0.000
========================================================================
               Estimate Std.Error   t value Pr(>|t|) Lower 95% Upper 95%
------------------------------------------------------------------------
x            -0.0322202 0.0469937 -0.685627    0.493 -0.124349 0.0599082
(Intercept)     5.17895  0.193555   26.7569    0.000    4.7995    5.5584
========================================================================
  
julia> #### Let's do this by hand.  It should give us the same coefficients, but wrong standard errors ####

julia> using GLM;  xp= predict( lm( @formula( x ~ instrument ), d ));  ## First Stage: fitted x

julia> lm( hcat(xp,fill(1.0,5000)), d.y ) ## Second Stage
LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
       Estimate Std.Error   t value Pr(>|t|)
x1   -0.0322202 0.0461032 -0.698871   0.4847
x2      5.17895  0.189888   27.2738   <1e-99

```

* See [https://github.com/matthieugomez/FixedEffectModels.jl](https://github.com/matthieugomez/FixedEffectModels.jl).

* The standard errors are only correct if you use the explicit IV formulation, rather than the two-stage least-squares at the end.

* IV is conceptually very subtle---seemingly easy, yet difficult.

  - The :instrument must affect :y only via the direct channel through :x.  It must not work mysteriously through any other channels.

  - If :instrument and :x have poor correlation, the final coefficient (x1 on y, as above) becomes very diffuse.  In the extreme, if the instrument is pure unrelated noise, the IV estimator no longer identifies the effect at all and its sampling distribution becomes extremely dispersed.  A quick diagnostic is the first-stage regression, sketched below.
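A minimal first-stage check, reusing `d` from above (this sketch is not part of the original example): the size and precision of the instrument's coefficient tell you whether the instrument actually moves :x.

```julianoeval
julia> using GLM

julia> lm( @formula( x ~ instrument ), d )   ## a large, precisely estimated coefficient on :instrument means a strong first stage
```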
  
  
  
* [Data Clustering](https://github.com/JuliaStats/Clustering.jl)

* [https://github.com/matthieugomez/FixedEffectModels.jl](https://github.com/matthieugomez/FixedEffectModels.jl)
  
  
## Notes

* Do not use the statistical recipes here for [[unitimeseries|time-series]] data.  OLS is not only wrong but often badly wrong when errors can depend on lagged observations.

* See also [[unistats#descriptive_statistics|Univariate Statistics -- Descriptive Statistics]] for basic statistics.
  
  
```END```

## Modifying the Default Output of GLM

FIXME EPIC FAILURE Attempt to Modify the Default Output of GLM

The goal here is to integrate both standardized coefficients and heteroskedasticity-adjusted standard errors into the standard "output" function for glm results, and to add some more useful statistics, too (like the R^2).  Ideally, we would also get rid of the long `StatsModels.DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Normal{Float64},IdentityLink},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}` banner, unless it is specifically requested.
  
  
From [GLM.jl](https://github.com/JuliaStats/GLM.jl/issues/42):
```juliafixme
julia> yb= [1.0:10.0;]; df= DataFrame( y=vcat(yb,999.9,missing) , x1= vcat(yb.^2,missing,999.9) , x2= vcat(yb.^3, 999.9, 999.9) );

julia> lm1= glm( @formula( y ~ x1+x2 ), df, Normal() )

julia> iid= tooutput(lm1, vcov(lm1)) ## OLS  --- FAILS

```