bistats

bistats [2018/12/27 13:27] (current)
 +
 +~~CLOSETOC~~
 +
 +~~TOC 1-3 wide~~
 +
 +
 +```juliarepl
 +julia> pkgchk.( [ "julia" => v"1.0.3", "DataFrames" => v"0.14.1", "GLM" => v"1.0.1", "Loess" => v"0.5.0", "Plots" => v"0.21.0" ] );
 +
 +```
 +
 +
 +
 +# Bivariate Relations
 +
 +## Regressing Y on X
 +
 +Bivariate regressions are simply a special case of the [[multistats|Multivariate Regressions]].
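
As a minimal sketch of the bivariate case, the intercept and slope can be obtained with the backslash operator, without loading GLM. The data and the variable names below are made up for illustration:

```julia
using Statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]        # regressor (illustrative data)
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # dependent variable (illustrative data)

X = [ones(length(x)) x]              # design matrix: a constant column and x
intercept, slope = X \ y             # least-squares solution of X * b ≈ y

# with a single regressor, the same coefficients follow from the moments
slope_check = cov(x, y) / var(x)
intercept_check = mean(y) - slope_check * mean(x)

println("intercept = ", intercept, "   slope = ", slope)
```

The backslash solution and the moment-based formulas should agree up to floating-point error. For standard errors and other regression output, see the multivariate page.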
 +
 +
 +## Splines and Loess
 +
 +Loess is a popular method for fitting a smooth curve to bivariate data without specifying a functional form.  It runs low-degree polynomial regressions localized around each point, weighting nearby observations more heavily, and is thus akin to splines.  The canonical example, adapted from the [Loess.jl](https://github.com/JuliaStats/Loess.jl) docs, is:
 +
 +```juliarepl
 +julia> using Loess; using Random;
 +
 +julia> Random.seed!(0); xs= sort(10 .* rand(100)); ys= sin.(xs) .+ 0.5 * rand(100);  ## sample points to fit
 +
 +julia> model= loess(xs, ys);                                   ## the loess engine
 +
 +julia> xpoints= collect(minimum(xs):0.1:maximum(xs));          ## where to fit
 +
 +julia> ypoints= Loess.predict(model, xpoints);                 ## the fitted values
 +
 +julia> (hcat(xpoints,ypoints))[1:5, :]
 +5×2 Array{Float64,2}:
 + 0.353445  0.896297
 + 0.453445  0.92369
 + 0.553445  0.946465
 + 0.653445  0.964635
 + 0.753445  0.978406
 +
 +julia> using Plots
 +
 +julia> plot( xs, ys, seriestype= :scatter, legend= false)
 +
 +julia> plot!( xpoints, ypoints )
 +
 +julia> savefig("plotting/loess.png");
 +
 +```
 +
 +See [[graphing|Graphing and Plotting]].
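
How closely the loess curve follows the data is governed by the width of the local neighborhoods. The sketch below assumes the `span` keyword of `loess()` as described in the Loess.jl README (default 0.75); the helper `sse` and the variable names are hypothetical, and the keyword should be checked against the installed package version:

```julia
using Loess, Random

Random.seed!(0)
xs = sort(10 .* rand(100))
ys = sin.(xs) .+ 0.5 * rand(100)

smooth = loess(xs, ys, span=0.75)     # wide neighborhoods: smoother curve
wiggly = loess(xs, ys, span=0.25)     # narrow neighborhoods: follows the data more closely

fit_smooth = Loess.predict(smooth, xs)   # fitted values at the sample points
fit_wiggly = Loess.predict(wiggly, xs)

sse(fit) = sum(abs2, ys .- fit)       # hypothetical helper: in-sample sum of squared residuals

println("SSE with span 0.75: ", sse(fit_smooth))
println("SSE with span 0.25: ", sse(fit_wiggly))
```

A smaller span typically lowers the in-sample residuals at the cost of a rougher curve, so the choice is a bias-variance tradeoff rather than a matter of fit alone.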
 +
 +
 +
 +## Vector and Matrix Moments for Bivariate Matrix
 +
 +The following can be useful when working with bivariate data:
 +
 +```juliarepl
 +julia> using StatsBase, Statistics
 +
 +julia> m= reshape( [1.0:6.0;], 2,3 )
 +2×3 Array{Float64,2}:
 + 1.0  3.0  5.0
 + 2.0  4.0  6.0
 +
 +julia> mean( m ) ## overall mean
 +3.5
 +
 +julia> mean( m, dims=1) ## column means
 +1×3 Array{Float64,2}:
 + 1.5  3.5  5.5
 +
 +julia> mean( m, dims=2) ## row means
 +2×1 Array{Float64,2}:
 + 3.0
 + 4.0
 +
 +julia> std( m ) ## overall stddev
 +1.8708286933869707
 +
 +julia> std(m, dims=1) ## column stddev
 +1×3 Array{Float64,2}:
 + 0.707107  0.707107  0.707107
 +
 +julia> std(m, dims=2) ## row stddev
 +2×1 Array{Float64,2}:
 + 2.0
 + 2.0
 +
 +```
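
Because `mean` and `std` with `dims=1` return 1×N row matrices, they broadcast directly against the original matrix. A minimal sketch that standardizes each column to zero mean and unit standard deviation (the names `colmeans`, `colsds`, and `z` are illustrative):

```julia
using Statistics

m = reshape([1.0:6.0;], 2, 3)        # same 2×3 matrix as above

colmeans = mean(m, dims=1)           # 1×3 row of column means
colsds   = std(m, dims=1)            # 1×3 row of column standard deviations

z = (m .- colmeans) ./ colsds        # broadcasting applies each column's own mean and stddev

println(z)
```

The same pattern with `dims=2` standardizes rows instead.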
 +
 +
 +
 +### Vector Distances
 +
 +StatsBase has [fast functions](http://juliastats.github.io/StatsBase.jl/latest/deviation.html) to calculate distances between two vectors, such as
 +
 +* number of equal elements (`counteq`) and number of differing elements (`countne`),
 +* sum of absolute differences (`L1dist`), and its mean (`meanad`),
 +* sum of squared differences (`sqL2dist`), and its mean (`msd`); `L2dist` is the square root of the sum, i.e., the Euclidean distance,
 +* root mean-squared deviation (`rmsd`), etc.
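
A short sketch of the StatsBase distance functions on two made-up vectors (the data are illustrative; the function names are those documented by StatsBase):

```julia
using StatsBase

a = [1.0, 2.0, 3.0, 4.0]     # illustrative data
b = [1.0, 2.0, 5.0, 8.0]     # differs from a in the last two elements

n_equal  = counteq(a, b)     # number of positions where a[i] == b[i]
n_differ = countne(a, b)     # number of positions where a[i] != b[i]

sum_abs  = L1dist(a, b)      # sum of absolute differences: 0 + 0 + 2 + 4
mean_abs = meanad(a, b)      # mean absolute deviation

sum_sq   = sqL2dist(a, b)    # sum of squared differences: 0 + 0 + 4 + 16
euclid   = L2dist(a, b)      # Euclidean distance, sqrt(sum_sq)
mean_sq  = msd(a, b)         # mean squared deviation
root_msq = rmsd(a, b)        # root mean-squared deviation, sqrt(mean_sq)

println((n_equal, n_differ, sum_abs, mean_abs, sum_sq, euclid, mean_sq, root_msq))
```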
 +
 +StatsBase has [Scatter Matrix and Covariances](http://juliastats.github.io/StatsBase.jl/latest/cov.html) functionality, principally `cov()` and `cor()`; the unweighted versions are defined in the standard library `Statistics`.
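
For two plain vectors, the unweighted covariance and correlation need only the standard library. A minimal sketch with made-up data; when given a matrix, `cov` and `cor` treat columns as variables by default:

```julia
using Statistics

x = [1.0, 2.0, 3.0, 4.0]     # illustrative data
y = [2.0, 4.0, 5.0, 9.0]

cxy = cov(x, y)              # sample covariance (divides by n-1)
rxy = cor(x, y)              # Pearson correlation

m = hcat(x, y)               # 4×2 matrix: observations in rows, variables in columns
C = cov(m)                   # 2×2 covariance matrix
R = cor(m)                   # 2×2 correlation matrix

println("cov = ", cxy, "   cor = ", rxy)
```

The off-diagonal elements of `C` and `R` equal the vector results `cxy` and `rxy`.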
 +
 +
 +
 +
 +
 +
 +
 +
 +# Backmatter
 +
 +## Useful Packages on Julia Repository
 +
 +* [Mocha](https://github.com/pluskid/Mocha.jl) for neural network learning
 +
 +* [LightML](https://juliaobserver.com/packages/LightML) for many statistical techniques now grouped as "machine learning."
 +
 +## Notes
 +
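 +* `linreg(x,y)` computed the coefficients of a bivariate regression in Julia 0.6, but it is no longer available in Julia 1.0; `[ones(length(x)) x] \ y` returns the same intercept and slope.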
 +
 +* See also [[unistats|Univariate Statistics]] and [[unitimeseries|Time-Series]]
 +
 +## References
 +
 +- [GLM.jl](https://github.com/JuliaStats/GLM.jl)
 +
 +
  
bistats.txt · Last modified: 2018/12/27 13:27 (external edit)