bistats
snippet.juliarepl
julia> pkgchk( [ "julia" => v"1.0.2", "DataFrames" => v"0.14.1", "GLM" => v"1.0.1", "Loess" => v"0.5.0", "Plots" => v"0.21.0" ] )

Bivariate Relations

Regressing Y on X

A bivariate regression is simply the special case of a multivariate regression with a single regressor (plus the constant); see Multivariate Regressions.
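As a minimal sketch of that special case, the following uses only the standard library (not GLM) and made-up data in which y is exactly 2 + 3x. It fits the line once with the general least-squares machinery and once with the bivariate shortcut formula; the variable names are illustrative assumptions.

```julia
using Statistics

# Illustrative data (made up): y is exactly 2 + 3x, so the fit is easy to check.
x = collect(1.0:10.0)
y = 2.0 .+ 3.0 .* x

# Multivariate machinery with a single regressor: design matrix [1 x].
X = [ones(length(x)) x]
coefs = X \ y                 # least squares; coefs[1] = intercept, coefs[2] = slope

# Bivariate shortcut: slope = cov(x,y)/var(x), intercept from the means.
slope = cov(x, y) / var(x)
intercept = mean(y) - slope * mean(x)
```

Both routes should give the same intercept and slope up to floating-point error.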

Splines and Loess

Loess (local regression) is a popular method for fitting a smooth curve to data in two dimensions without specifying a functional form. It runs weighted regressions localized around each point, which makes it a local smoother akin to splines. The canonical example, taken from the Loess.jl docs, is:

snippet.juliarepl
julia> using Loess; using Random;

julia> Random.seed!(0); xs= sort(10 .* rand(100)); ys= sin.(xs) .+ 0.5 * rand(100);  ## sample points to fit

julia> model= loess(xs, ys);                                   ## the loess engine

julia> xpoints= collect(minimum(xs):0.1:maximum(xs));          ## where to fit

julia> ypoints= Loess.predict(model, xpoints);                 ## the fitted values

julia> (hcat(xpoints,ypoints))[1:5, :]
5×2 Array{Float64,2}:
 0.353445  0.896297
 0.453445  0.92369
 0.553445  0.946465
 0.653445  0.964635
 0.753445  0.978406

julia> using Plots

julia> plot( xs, ys, seriestype= :scatter, legend= false)

julia> plot!( xpoints, ypoints )

julia> savefig("plotting/loess.png");

See Graphing and Plotting.
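To make "regressions localized around points" concrete, here is a rough sketch of a local-linear smoother with tricube weights. It is not the Loess.jl algorithm, only an illustration of the idea; the names tricube and locallinear and the span keyword are hypothetical.

```julia
# Tricube kernel: weight falls smoothly to zero at distance 1.
tricube(u) = abs(u) < 1 ? (1 - abs(u)^3)^3 : 0.0

# Fitted value at x0 from a weighted linear regression on the nearest points.
# `span` is the fraction of the sample that defines the neighbourhood (an assumption).
function locallinear(xs, ys, x0; span=0.75)
    n = length(xs)
    q = min(n, max(3, ceil(Int, span * n)))   # number of neighbours considered
    d = abs.(xs .- x0)                        # distances to the target point
    h = max(sort(d)[q], eps())                # bandwidth: distance to the q-th nearest point
    w = sqrt.(tricube.(d ./ h))               # square roots of the kernel weights
    X = [ones(n) (xs .- x0)]                  # regressors centred at x0
    b = (w .* X) \ (w .* ys)                  # weighted least squares via row scaling
    return b[1]                               # intercept = fitted value at x0
end

# Illustrative use on made-up, exactly linear data: the smoother reproduces the line.
xs = collect(0.0:0.5:10.0)
ys = 1.0 .+ 2.0 .* xs
fitted = [locallinear(xs, ys, x0) for x0 in xs]
```

A smaller span makes the fit more local (wigglier); a larger one makes it smoother.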

Vector and Matrix Moments for Bivariate Matrix

The following can be useful when working with bivariate data:

snippet.juliarepl
julia> using StatsBase, Statistics

julia> m= reshape( [1.0:6.0;], 2,3 )
2×3 Array{Float64,2}:
 1.0  3.0  5.0
 2.0  4.0  6.0

julia> mean( m )	## overall mean
3.5

julia> mean( m, dims=1)	## column means
1×3 Array{Float64,2}:
 1.5  3.5  5.5

julia> mean( m, dims=2)	## row means
2×1 Array{Float64,2}:
 3.0
 4.0

julia> std( m )		## overall stddev
1.8708286933869707

julia> std(m, dims=1)	## column stddev
1×3 Array{Float64,2}:
 0.707107  0.707107  0.707107

julia> std(m, dims=2)	## row stddev
2×1 Array{Float64,2}:
 2.0
 2.0

Vector Distances

StatsBase has fast functions to calculate distances between two vectors, such as

  • number of differing elements (countne); counteq counts the equal ones
  • sum of absolute differences (L1dist), and its mean (meanad)
  • sum of squared differences (sqL2dist), and its mean (msd); L2dist is the square root of the sum
  • root mean-squared deviation (rmsd), etc.
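For orientation, the quantities behind these distance measures can be written out by hand. The sketch below uses only Base and Statistics on two made-up vectors; the comments name the StatsBase function that should return the same quantity, and the variable names are illustrative.

```julia
using Statistics

a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 5.0, 8.0]
d = a .- b                      # elementwise differences: 0, 0, -2, -4

ndiff   = count(a .!= b)        # number of differing elements (cf. countne)
nequal  = count(a .== b)        # number of equal elements     (cf. counteq)
l1      = sum(abs.(d))          # sum of absolute differences  (cf. L1dist)
mad     = mean(abs.(d))         # mean absolute difference     (cf. meanad)
sql2    = sum(abs2, d)          # sum of squared differences   (cf. sqL2dist)
l2      = sqrt(sql2)            # Euclidean distance           (cf. L2dist)
msdval  = mean(abs2, d)         # mean squared difference      (cf. msd)
rmsdval = sqrt(msdval)          # root mean-squared deviation  (cf. rmsd)
```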

StatsBase also has Scatter Matrix and Covariances functionality, principally scattermat() and weighted versions of cov() and cor(); the unweighted cov() and cor() come from Statistics.
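A small sketch of the unweighted case, using only the Statistics standard library and made-up data: cov() and cor() accept either two vectors or a matrix whose columns are the variables.

```julia
using Statistics

# Made-up bivariate sample.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]

cxy = cov(x, y)          # sample covariance (divides by n-1)
rxy = cor(x, y)          # Pearson correlation

data = [x y]             # 4×2 matrix, one column per variable
C = cov(data)            # 2×2 covariance matrix
R = cor(data)            # 2×2 correlation matrix
```

The off-diagonal entries of C and R should match cxy and rxy, and the diagonal of C holds the variances.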

Backmatter

Useful Packages on Julia Repository

  • Mocha for neural network learning
  • LightML for many statistical techniques now grouped as “machine learning.”

Notes

References

bistats.txt · Last modified: 2018/11/22 20:47 (external edit)