random

# Differences

This shows you the differences between two versions of the page.

 — random [2018/12/27 13:27] (current) Line 1: Line 1: + + ~~CLOSETOC~~ + + ~~TOC 1-3 wide~~ + + ```juliarepl + julia> pkgchk.( [ "​julia"​ => v"​1.0.3",​ "​Distributions"​ => v"​0.16.4"​ ] ); + + ``` + + + # Statistical Distributions and Random Variates + + An array of random numbers can be created using the `rand` or `randn` functions. + + The rand function can also be called as `rand(T, n)` to generate n random numbers of type `T`. + + Base julia has facilities to generate uniform numbers (rand) and Gaussian (Normal) random numbers (randn). ​ Other distributions rely on the [Distributions](https://​github.com/​JuliaStats/​Distributions.jl) package. + + ```juliarepl + julia> using Distributions + ``` + + + + ## Drawing Random Variables + + `rand( 1 )` draws a vector with 1 element, while `rand()` draws a scalar. ​ Thus, `y= 3.0 + rand( 1 )` results in a vector, not a scalar. + + The following is a selection of random numbers drawn from some common distributions. ​ For a more complete list of available distributions,​ refer to [Distributions](http://​juliastats.github.io/​Distributions.jl/​stable/​). + + + ### Setting the Seed + + ```juliarepl + julia> using Random; ​ Random.seed!(0);​ + + ``` + + + + ### Uniform Floats + + ```juliarepl + julia> using Random; Random.seed!(0);​ + + julia> rand(3,​2) ​           ## multiple int args are array dimensions + 3×2 Array{Float64,​2}:​ + ​0.823648 ​ 0.177329 + ​0.910357 ​ 0.27888 + ​0.164566 ​ 0.203477 + ``` + + ### Uniform Integers + + ```juliarepl + julia> using Random; Random.seed!(0); ​ + + julia> rand(Int, 3)         ## Types as argument to rand mean over the whole range + 3-element Array{Int64,​1}:​ + ​1339893410598768192 + ​1575814717733606317 + ​2119082727832020007 + ``` + + + ### Uniform Ranged Integers (e.g., Bernoullis) + + ```juliarepl + julia> using Random, Distributions;​ Random.seed!(0);​ + + julia> rand( 1:10, 3 ) ## 3 draws from 1 to 10, equally likely + 3-element Array{Int64,​1}:​ + 1 + 3 + 10 + + julia> Random.seed!(0);​ + + julia> rand( 0:1, 10) ## 10 draws from 0 to 1, equally likely + 10-element Array{Int64,​1}:​ + 0 + 0 + 1 + 1 + 0 + 0 + 0 + 1 + 1 + 0 + + julia> Random.seed!(0);​ + + julia> rand( Bernoulli( 0.15 ), 10 )    ## 10 draws, each is 1 with prob 0.15 + 10-element Array{Int64,​1}:​ + 0 + 0 + 0 + 0 + 0 + 0 + 1 + 1 + 0 + 0 + + ``` + + * Note: If speed is of the essence, there are faster methods than `rand()`. ​ For example, to draw 1,000 uniform numbers between 1 and 4, you can replace `[ rand(1:4) for i=1:1000 ]` with `rand( Random.SamplerRangeFast(1:​4),​ 1000)` or  with `1 .+ Int.(floor.(rand(1000)*4))`. + + + ### Binomial Integers + + ```juliarepl + julia> using Random, Distributions; ​ Random.seed!(0);​ + + julia> rand( Binomial( 10, 0.2 ), 10 )   ## each draw has 0.2 prob of 1, summed up 10 times + 10-element Array{Int64,​1}:​ + 3 + 4 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 5 + ``` + + + ### Poissons + + ```juliarepl + julia> using Random, Distributions; ​ Random.seed!(0);​ + + julia> rand( Poisson( 2 ), 20 ) + 20-element Array{Int64,​1}:​ + 3 + 4 + 1 + 1 + 1 + 1 + 0 + 0 + 1 + 5 + 2 + 2 + 1 + 4 + 1 + 2 + 2 + 4 + 5 + 3 + + ``` + + + ### Random Orthogonal Matrix + + ```juliarepl + julia> using Random, LinearAlgebra; ​ Random.seed!(0);​ + + julia> QR= qr(randn(3,​3)); ​       ## QR is already orthogonal + + julia> Q= QR.Q * Diagonal(sign.(diag(QR.R))) ​       ## orthogonal *and* uniformly distributed + 3×3 Array{Float64,​2}:​ + 0.60212 ​  ​-0.466433 ​ -0.647991 + 0.7345 ​    ​0.641769 ​  ​0.220551 + ​-0.312988 ​  ​0.608747 ​ -0.729017 + + julia> Q'​*Q ​               ## show that we have orthogonal vectors + 3×3 Array{Float64,​2}:​ + 1.0          -1.94289e-16 ​ 0.0 + ​-1.94289e-16 ​  ​1.0 ​         1.66533e-16 + 0.0           ​1.66533e-16 ​ 1.0 + + ``` + + + + + ### Shuffling an Array + + Shuffled values from 1:N can be used as indices to rearrange N vector elements. + + ```juliarepl + julia> using Random, Distributions; ​ Random.seed!(0);​ + + julia> Random.seed!(0);​ randperm(5) + 5-element Array{Int64,​1}:​ + 5 + 4 + 3 + 1 + 2 + + julia> Random.seed!(0);​ shuffle( collect(1:​5) ) ## a different way of shuffling + 5-element Array{Int64,​1}:​ + 5 + 4 + 2 + 3 + 1 + + julia> Random.seed!(0);​ sample( collect(10:​20),​ 3 ) ## Pick 3 random numbers from 10:20 array + 3-element Array{Int64,​1}:​ + 10 + 12 + 19 + + ``` + + StatsBase offers many [sampling variants](http://​juliastats.github.io/​StatsBase.jl/​latest/​sampling.html). + + + ## Drawing "​Perfect-Density"​ Random Variables + + Independent random draws do not have empirical densities that perfectly match their theoretical distributions. ​ For example, a million fair coin flips may have 500,021 heads, rather than 500,000 heads. ​ Sometimes, this type of true randomness is desirable. ​ Often, it is not and simply adds extra noise where it is not desired. ​ Users may want to have random draws on each draw, but have exactly 500,000 heads and 500,000 tails. ​ The randomness is in when what value shows up. + + The following draws from a "​perfect density"​ distribution,​ guaranteed to match the theoretical density function, with randomness in when each value is drawn: + + + ```juliarepl + julia> using Random, Distributions; ​ Random.seed!(0); ​ + + julia> randPerfect(d::​UnivariateDistribution,​ n::​Integer)= shuffle!(quantile.(d,​ (1:​n)/​(n+1)));​ + + julia> randPerfect( Normal(0,​1),​ 5 ) ## Example: draws from a normal distribution + 5-element Array{Float64,​1}:​ + 0.9674215661017014 + 0.43072729929545733 + ​-0.4307272992954576 + 0.0 + ​-0.9674215661017014 + + ``` + + ## Fitting Parameters of Statistical Distributions To Data + + ```juliarepl + julia> using Random, Distributions; ​ Random.seed!(0);​ + + julia> fit_mle( Normal, rand( Normal( 10 , 5 ) , 10_000_000 ) )    ## Fitting the correct distribution -> recover params + Normal{Float64}(μ=9.99913328006548,​ σ=5.000592281736906) + + julia> fit_mle( Normal, rand(1_000_000) )         ## Fitting the wrong distribution--a normal fitted to uniforms + Normal{Float64}(μ=0.5001626330884961,​ σ=0.288518549254692) + + ``` + + + ### Testing for Statistical Distributions + + See [[unistats]] for tests whether a set of random draws come from a particular distributin. + + + ## The Gaussian (Normal) Distribution + + ### Random Numbers + + ```juliarepl + julia> using Random; ​ Random.seed!(0); ​ + + julia> randn(3) + 3-element Array{Float64,​1}:​ + 0.6791074260357777 + 0.8284134829000359 + ​-0.3530074003005963 + + ``` + + + In distributions,​ + + ```juliarepl + julia> using Random, Distributions; ​ Random.seed!(0);​ + + julia> rand( Normal(0,​1),​ 4 ) + 4-element Array{Float64,​1}:​ + 0.6791074260357777 + 0.8284134829000359 + ​-0.3530074003005963 + ​-0.13485387193052173 + + ``` + + See also the example of perfect-density draws above, which also generate Normal variates. + + + ### Multivariate Gaussian (Normal) Distribution with Covariance + + + ```juliarepl + + julia> using Random, Distributions;​ Random.seed!(0);​ + + julia> gen3dim= MvNormal( 3, 100.0 )    ## generate 3-dim normal SD=100 + ZeroMeanIsoNormal( + dim: 3 + μ: [0.0, 0.0, 0.0] + Σ: [10000.0 0.0 0.0; 0.0 10000.0 0.0; 0.0 0.0 10000.0] + ) + + + julia> rand( gen3dim, 5 ) + 3×5 Array{Float64,​2}:​ + 67.9107 ​ -13.4854 ​   6.49475 ​ 157.433 ​   39.7482 + 82.8413 ​  ​58.6617 ​ -10.9017 ​  ​-68.8907 ​  ​81.163 + ​-35.3007 ​  ​29.7336 ​ -51.421 ​   -76.2804 ​ -34.6355 + + julia> Random.seed!(0);​ + + julia> x= 10.0*randn(1000);​ y= 10.0*randn(1000)+x; ​ covmat= cov(hcat( x, y )); x= y= nothing + + julia> gen2dim= MvNormal( [10.0, 20.0], covmat ) ## specific mean and covariance matrix + FullNormal( + dim: 2 + μ: [10.0, 20.0] + Σ: [106.834 98.4571; 98.4571 190.721] + ) + + + julia> rand( gen2dim, 5 ) + 2×5 Array{Float64,​2}:​ + ​11.4939 ​ 23.7481 ​ 17.1309 ​ -1.47097 ​ -8.87016 + ​30.0597 ​ 33.7182 ​ 14.2002 ​ 13.7076 ​   5.54033 + + ``` + + + ### Probability Density Function (PDF) + + ```juliarepl + julia> using Random, Distributions;​ Random.seed!(0);​ + + julia> pdf.( Normal(0,​1),​ [ -2.0, 0, 2.0 ] )  ### use logpdf for tiny values + 3-element Array{Float64,​1}:​ + ​0.05399096651318806 + ​0.3989422804014327 + ​0.05399096651318806 + ``` + + ### Log Likelihood + + ```juliarepl + julia> using Random, Distributions;​ Random.seed!(0);​ + + julia> loglikelihood( Normal(0,​1),​ rand( Normal(0,​1),​ 1_000 ) ) + -1452.9989715615147 + + ``` + + + ### Cumulative Density Funtion (CDF) + + ```juliarepl + julia> using Distributions;​ Random.seed!(0);​ + + julia> cdf.( Normal(0,​1),​ [ -2.0, 0, 2.0 ] )  ### use logcdf for tiny values + 3-element Array{Float64,​1}:​ + ​0.022750131948179205 + 0.5 + ​0.9772498680518208 + + ``` + + ### Inverse CDF + + ```juliarepl + julia> using Distributions;​ Random.seed!(0);​ + + julia> ​ quantile.( Normal(0,​1),​ [0.025,​0.05,​0.5,​0.95,​0.975] ) + 5-element Array{Float64,​1}:​ + ​-1.9599639845400592 + ​-1.6448536269514724 + 0.0 + 1.6448536269514717 + 1.9599639845400576 + ``` + + + + + ## Histograms + + ```juliarepl + julia> using Random, Distributions,​ StatsBase; Random.seed!(0);​ + + julia> x= rand( Normal(0,​1),​ 10_000); + + julia> fit( Histogram, x , closed= :left, nbins=10 ​ ) + Histogram{Int64,​1,​Tuple{StepRangeLen{Float64,​Base.TwicePrecision{Float64},​Base.TwicePrecision{Float64}}}} + edges: + -4.0:​1.0:​5.0 + weights: [10, 224, 1398, 3426, 3373, 1342, 213, 13, 1] + closed: left + isdensity: false + + julia> fit( Histogram, x , closed= :right, nbins=10 ​ ) + Histogram{Int64,​1,​Tuple{StepRangeLen{Float64,​Base.TwicePrecision{Float64},​Base.TwicePrecision{Float64}}}} + edges: + -4.0:​1.0:​5.0 + weights: [10, 224, 1398, 3426, 3373, 1342, 213, 13, 1] + closed: right + isdensity: false + ``` + + Julia'​s [[plotting]] makes it easy to plot histograms: + + ```juliaeval + using Plots + using Distributions,​ StatsBase; n=1_000; rn= quantile.( Normal(0.0,​1.0),​ [1:​n;​]/​(n+1));​ + histogram( rn.^2, legend=false ); # a chi-squared + savefig( "​plotting/​histogram1.png"​ ); + ``` + + {{histogram1.png}} + + + + + * `StatsBase` contains an `ecdf` function that is the direct empirical cdf, whereas `EmpiricalUnivariateDistribution` produces an distribution function for input into cdf, pdf, rand, etc. + + + + # Backmatter + + ## Useful Packages on Julia Repository + + ## Notes + + * EmpiricalUnivariateDistribution is [broken](https://​github.com/​JuliaStats/​Distributions.jl/​pull/​800). + + ```julianoeval + julia> using Random, Distributions,​ Plots; Random.seed!(0);​ + + julia> xr= -4.0:​0.01:​4.0;​ plot( xr, cdf.( Normal(0,​1),​ xr ), legend= false) ## plots the standard normal first + + julia> x=  max.(randn(10000),​ (rand(10000) .- 0.5)); + + julia> empcdfun= EmpiricalUnivariateDistribution( x ); + + julia> cdf.( empcdfun, [-0.5,​-0.25,​0.0,​0.5,​ 10.0] ) + 5-element Array{Float64,​1}:​ + 0.0 + ​0.0998 + ​0.2453 + ​0.6972 + 1.0 + + julia> rand( empcdfun, 4 ) + 4-element Array{Float64,​1}:​ + ​0.43878137920825677 + ​0.17368409076394875 + ​0.17510611936949205 + ​1.0969671028632133 + + julia> plot!( x -> cdf(empcdfun,​ x) ) + + ``` + + ## References