arraysortandsets

Differences

This shows you the differences between two versions of the page.

arraysortandsets [2018/12/11 01:57]
julia [Finding the First Array Element that passes a Test]
arraysortandsets [2018/12/27 17:20]
Line 1: Line 1:
- 
-~~CLOSETOC~~ 
- 
-~~TOC 1-3 wide~~ 
- 
---- 
- 
-^  [[arraysintro|Arrays Introduction]] ​ ^  [[arraysvector|Vectors and Range Iterators]] ​ ^  Sort and Sets  ^  [[arraysmatrix|Matrices+]] ​ ^ 
- 
---- 
- 
-```juliarepl 
-julia> pkgchk( [ "​julia"​ => v"​1.0.2",​ "​StatsBase"​ => v"​0.25.0"​ ] ) 
- 
-``` 
- 
- 
- 
- 
-# Sorting, Finding, and Sets (in Vectors) 
- 
-## Sorting An Array 
- 
-`sort` returns a sorted copy and leaves the passed array unchanged. To sort the array in place, use `sort!`.
-
-`sort` is highly customizable. It accepts the keyword arguments:
- 
-- `alg` : The sorting algorithm to be used. Can be one of `[InsertionSort,​ MergeSort, QuickSort]`. 
-- `rev` : Boolean argument which sorts in reverse, or decreasing order, when set to true. 
- 
- 
- 
-```juliarepl 
-julia> sort([3,​1,​2]) 
-3-element Array{Int64,​1}:​ 
- 1 
- 2 
- 3 
- 
-julia> sort([1,​2,​3],​ rev=true) 
-3-element Array{Int64,​1}:​ 
- 3 
- 2 
- 1 
-``` 
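
The difference between `sort` and `sort!`, and the `alg` and `rev` keywords described above, can be sketched as follows. This is a minimal illustration; the variable names `v`, `w`, and `u` are chosen here and do not come from the original page.

```julia
# sort returns a sorted copy; sort! reorders its argument in place.
v = [3, 1, 2]
w = sort(v)                          # v is unchanged, w is a new sorted vector
u = sort(v, alg=InsertionSort)       # same result, with an explicitly chosen algorithm
sort!(v, rev=true)                   # v itself is now in decreasing order
println(w)
println(u)
println(v)
```

After this runs, `w` and `u` hold the increasing order and `v` holds the decreasing order.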
- 
- 
-### Sorting An Array by Transform (Absolute Values, Comparison Functions) 
- 
-- `by`  : A function applied to each element to produce the key that is compared. The result still contains the original elements.
-- `lt`  : A function that compares any two elements of the array.
-
-`lt(x, y)` should return true if `x` must sort strictly before `y`, and false otherwise (including when the two are equal).
- 
-```juliarepl 
-julia> sort([1,​2,​3],​ by = x ->(-x)) 
-3-element Array{Int64,​1}:​ 
- 3 
- 2 
- 1 
- 
-julia> sort([1,2,3], lt = (x,y)->(x > y))
-3-element Array{Int64,​1}:​ 
- 3 
- 2 
- 1 
-``` 
- 
-#### Sorting an Array of Structs 
- 
-```juliarepl 
-julia> struct SomeTest; key::Int; s::String; end#​struct##​ 
- 
-julia> v= [ SomeTest(5, "​five"​),​ SomeTest(3, "​three"​),​ SomeTest(10,​ "​ten"​) ] 
-3-element Array{SomeTest,​1}:​ 
- ​SomeTest(5,​ "​five"​) 
- ​SomeTest(3,​ "​three"​) 
- ​SomeTest(10,​ "​ten"​) 
- 
-julia> sort( v, by= x->x.key ) 
-3-element Array{SomeTest,​1}:​ 
- ​SomeTest(3,​ "​three"​) 
- ​SomeTest(5,​ "​five"​) 
- ​SomeTest(10,​ "​ten"​) 
- 
-``` 
- 
- 
-### Sorting Indexes (Order function) 
- 
-The `sortperm` function returns the permutation of indexes that puts the elements in sorted order. The result can be used as an index to rearrange the array (in R, this is the `order()` function):
- 
-```juliarepl 
-julia> sortperm( [ 2,5,1,10,0 ] ) 
-5-element Array{Int64,​1}:​ 
- 5 
- 3 
- 1 
- 2 
- 4 
-``` 
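
Using the permutation as an index, as mentioned above, might look like this. It is a small sketch; the `labels` vector is a hypothetical companion array invented for illustration.

```julia
v = [2, 5, 1, 10, 0]
p = sortperm(v)                       # indexes that put v in increasing order
sorted_v = v[p]                       # same elements as sort(v)

labels = ["b", "e", "a", "j", "z"]    # hypothetical labels aligned with v
labels_by_v = labels[p]               # reorder a second vector by the order of v
println(sorted_v)
println(labels_by_v)
```

Indexing a second vector with the same permutation is the typical reason to prefer `sortperm` over `sort`.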
- 
- 
-### Checking if Array Is-Sorted 
- 
-```juliarepl 
-julia> issorted( [(1, "​b"​),​ (2, "​a"​)] , by=x->​(x[1])) ​  ## x[1] is the first element of each tuple 
-true 
-``` 
- 
- 
- 
-### Randomizing (Shuffling) Array Elements 
- 
-To put the elements of an array in a random order, use the `shuffle()` function (or `shuffle!()` to change the original array): 
- 
-```juliarepl 
-julia> using Random 
- 
-julia> Random.seed!(0);​ shuffle([1,​2,​3,​4,​5]) 
-5-element Array{Int64,​1}:​ 
- 5 
- 4 
- 2 
- 3 
- 1 
-``` 
- 
- 
- 
-### Ranking 
- 
-```juliarepl 
-julia> using StatsBase: tiedrank 
- 
-julia> v= [ 1, 2, 5, 3, 5 ]; 
- 
-julia> tiedrank(v) 
-5-element Array{Float64,​1}:​ 
- 1.0 
- 2.0 
- 4.5 
- 3.0 
- 4.5 
- 
-julia> v= unique( rand(1:10, 100) ); ## show how tiedrank and sortperm fit together 
- 
-julia> tiedrank(v[sortperm(v)]) == collect(1.0:​length(v)) ##​ sortperm puts it in order, tiedrank confirms this 
-true 
- 
-``` 
- 
-## Set Operations 
- 
-### Finding Unique Elements in an Array 
- 
-```juliarepl 
-julia> unique([1,​4,​2,​2,​3,​3,​4,​1]) 
-4-element Array{Int64,​1}:​ 
- 1 
- 4 
- 2 
- 3 
-``` 
- 
-The `unique` function works faster when the input list is already sorted.  The `rle()` function from StatsBase can be used when only consecutive repeated values should be collapsed.
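
As a rough sketch of how `rle()` differs from `unique`, assuming the StatsBase package is installed (the input vector `x` is made up for illustration):

```julia
using StatsBase: rle

x = [1, 1, 2, 2, 2, 1, 3, 3]
vals, lens = rle(x)      # values of each run, and the length of each run
# Only consecutive repeats are collapsed, so the second run of 1 is kept.
println(vals)
println(lens)
println(unique(x))       # unique drops every later repeat, wherever it occurs
```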
- 
- 
-### Testing Uniqueness 
- 
-```juliarepl 
-julia> allunique([1,​4,​2,​2,​3,​3,​4,​1]) 
-false 
-``` 
- 
- 
-### Testing Existence of an Element in an Array (Set) 
- 
- 
-```juliarepl 
-julia> in( 2, [1,2,3,4] )      ## in used as a function 
-true 
- 
-julia> 20 in [1,​2,​3,​4] ​        ## in used as an operator 
-false 
-``` 
- 
- 
- 
-### Combining Two Arrays Without Duplicates (Set Union) 
- 
-```juliarepl 
-julia> union([1,​2,​3],​ [1,​4,​5]) ​                            ## set union 
-5-element Array{Int64,​1}:​ 
- 1 
- 2 
- 3 
- 4 
- 5 
-``` 
- 
-Or just do `unique(vcat(x, y))`.
- 
- 
-### Finding Elements in Both Arrays (Set Intersection) 
- 
-The `intersect` function can be used to perform the set-intersection operation: 
- 
-```juliarepl 
-julia> intersect([1,​2,​3,​4],​ [1,2,3,5]) 
-3-element Array{Int64,​1}:​ 
- 1 
- 2 
- 3 
-``` 
- 
- 
-### Finding Elements in One Array but Not Another (Set Diff) 
- 
-```juliarepl 
-julia> setdiff([1,​2,​3,​4],​ [1,​2,​5,​6]) ​        ## 3,4 are only in first array. ​  (5,6 are ignored) 
-2-element Array{Int64,​1}:​ 
- 3 
- 4 
-``` 
- 
-If your arrays have repeated elements, and you want to keep duplicates (eliminating those in the other set), then use: 
- 
-```juliarepl 
-julia> filter( x->​(!in(x,​ [1,2,4])), [1,​2,​2,​3,​3,​4,​6,​7,​7] )   ## remove all 1s, 2s, and 4s 
-5-element Array{Int64,​1}:​ 
- 3 
- 3 
- 6 
- 7 
- 7 
-``` 
- 
- 
-## Finding Elements in Vectors 
- 
- 
-### Finding the First Array Element that passes a Test 
- 
-The `findfirst` function can be used to find the position of the first non-zero element in an array: 
- 
-```juliarepl 
-julia> findfirst( x->​(x!=0),​ [0,0,1,2,0] ) 
-3 
-``` 
- 
-`findnext` runs the same test but starts the search at a given index, here 3+1 (one past the previous match):
- 
-```juliarepl 
-julia> findnext( x->​(x!=0),​ [0,​0,​1,​2,​0],​ 4) 
-4 
-``` 
- 
-More generally, any boolean-returning function can be passed in as the "test". The position of the first element that passes the "test" (i.e., for which the function returns true) is returned:
- 
-```juliarepl 
-julia> findfirst( x->(x == 2), [0,​0,​1,​2,​0,​2]) 
-4 
-``` 
- 
-If no element passes the "test", then `findfirst` and `findnext` return `nothing`, which the REPL does not display:
- 
-```juliarepl 
-julia> findfirst( x->​(x!=0),​ [0,0,0] ) 
- 
-julia> findnext( x->​(x!=0),​ [0,0,0], 1 ) 
- 
-``` 
- 
-To find the first zero, compare elementwise against 0 and search the resulting boolean vector for the first `true`:
- 
-```juliarepl 
-julia> findfirst( x->​(x!=0),​ [1,0,1] .== 0 ) 
-2 
-``` 
-FIXME (Andreas) Maybe introduce `isequal(0)` and the negation `!isequal(0)`. They are convenient as predicates. 
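
The predicates suggested in the note above could be used along these lines. This is only a sketch of one possible treatment; the variable names are chosen here for illustration.

```julia
v = [1, 0, 1]

# isequal(0) builds the predicate x -> isequal(x, 0)
first_zero = findfirst(isequal(0), v)

# !f negates a predicate function
first_nonzero = findfirst(!isequal(0), v)

# iszero is a ready-made predicate for the same test
first_zero_alt = findfirst(iszero, v)

println((first_zero, first_nonzero, first_zero_alt))
```

These forms avoid writing an anonymous function and avoid building an intermediate boolean vector.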
- 
- 
-### Finding (Locations of) all Duplicates and Uniques 
- 
-```juliarepl 
-julia> x= [ 1, 2, 1, 2, 1, 3, 1, 3, 3, 3, 1, 3, 4 ]; 
- 
-julia> d= Dict{Int64,​Array{Int64}}();​ 
- 
-julia> for i in 1 : length(x);  push!( get!(d, x[i], Int64[]), i );  end#for
- 
-julia> d 
-Dict{Int64,​Array{Int64,​N} where N} with 4 entries: 
-  4 => [13] 
-  2 => [2, 4] 
-  3 => [6, 8, 9, 10, 12] 
-  1 => [1, 3, 5, 7, 11] 
-``` 
- 
-For [[dataframerowops#​finding_unique_and_non-unique_rows|DataFrames]],​ see also the `nonunique` function. 
- 
- 
- 
-### Finding all Matching Elements in an Array 
- 
-`findall()` returns the indexes of all elements of a vector that pass a test, here all non-zero elements.  (To find the values themselves, just use the indexes as an index.)
- 
-```juliarepl 
-julia> findall( x->​(x!=0),​ [0,0,1,2,0] ) 
-2-element Array{Int64,​1}:​ 
- 3 
- 4 
-``` 
- 
- 
-Although `.OP` operators (like `.<=`) are element-wise, they also work between an array and a scalar:
- 
-```juliarepl 
-julia> findall( x->​(x!=0),​ [10,​20,​30,​40,​20] .== 20 )    ## returns matching index values 
-2-element Array{Int64,​1}:​ 
- 2 
- 5 
-``` 
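
Because the element-wise comparison already yields a boolean vector, the extra predicate is optional. A brief sketch of two equivalent forms (the vector `v` repeats the values from the example above):

```julia
v = [10, 20, 30, 40, 20]

idx_mask = findall(v .== 20)          # findall on a boolean vector returns the indexes of true
idx_pred = findall(isequal(20), v)    # predicate form, no intermediate boolean vector

println(idx_mask)
println(idx_pred)
```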
- 
- 
-The `findall` function can also be used to find the positions of all elements that pass a "test", described by a boolean-returning function.
- 
-```juliarepl 
-julia> findall( x->(x == 0), [1,​2,​0,​2,​0,​1,​0] ) 
-3-element Array{Int64,​1}:​ 
- 3 
- 5 
- 7 
-``` 
- 
- 
-### Finding all NaN and/or Missing Values 
- 
-```juliarepl 
-julia> findall( x->​isnan(x),​ [ 0, 2, NaN, 3.0, NaN ] )    ## use ismissing for missings 
-2-element Array{Int64,​1}:​ 
- 3 
- 5 
-``` 
- 
- 
- 
- 
- 
-## Filtering Out or Replacing Specific Values 
- 
-If you'd like to remove the elements of an array that do not satisfy a condition, the `filter` function can come in handy. For example, to remove the negative numbers in an array: 
- 
-```juliarepl 
-julia> filter( x->(x >= 0), [1,​-2,​3,​-4,​0] ) 
-3-element Array{Int64,​1}:​ 
- 1 
- 3 
- 0 
-``` 
- 
-(Use `findall()` instead of `filter()` to obtain indexes.) 
- 
- 
- 
-### Removing all NaN and/or Missing Values 
- 
- 
-```juliarepl 
-julia> v= [ 0.0, 2.0, NaN, 3.0, NaN ]; 
- 
-julia> filter( x->​(!isnan(x)),​ v ) 
-3-element Array{Float64,​1}:​ 
- 0.0 
- 2.0 
- 3.0 
- 
-julia> findall( x->​!isnan(x),​ v ) 
-3-element Array{Int64,​1}:​ 
- 1 
- 2 
- 4 
- 
-julia> v[ findall( x->​!isnan(x),​ v ) ] 
-3-element Array{Float64,​1}:​ 
- 0.0 
- 2.0 
- 3.0 
-``` 
- 
- 
-### Replacing all NaN and/or Missing Values 
- 
-```juliarepl 
-julia> v= [ 0.0, 2.0, NaN, 3.0, NaN ]; 
- 
-julia> using Missings; 
- 
-julia> map( x->(isnan(x) ? missing : x), v) ## element type becomes Union{Missing, Float64}
-5-element Array{Union{Missing,​ Float64},​1}:​ 
- 0.0 
- 2.0 
-  missing 
- 3.0 
-  missing 
- 
-julia> oftype( [1.0, missing], ans ) 
-5-element Array{Union{Missing,​ Float64},​1}:​ 
- 0.0 
- 2.0 
-  missing 
- 3.0 
-  missing 
- 
-julia> replace( v, NaN => missing ) ## same result and element type, more concise
-5-element Array{Union{Missing,​ Float64},​1}:​ 
- 0.0 
- 2.0 
-  missing 
- 3.0 
-  missing 
- 
-``` 
- 
- 
- 
- 
-### Faster Sorts 
- 
-Julia'​s default sort is fast when the keys tend to be unique, and slow when they tend to be similar. ​ A faster alternative when keys are similar is in [SortingLab.jl](https://​github.com/​xiaodaigh/​SortingLab.jl). ​ `radixsort()` speeds up (fixed-length) string sorting by an order of magnitude. 
- 
- 
- 
-# Backmatter 
- 
-## Commonly Useful Array-Related Packages in the Julia Repository 
- 
-- [Julia Fixed-Size Arrays](https://​github.com/​JuliaArrays/​StaticArrays.jl) 
- 
-- [DataStructures.jl](https://​github.com/​JuliaCollections/​DataStructures.jl) contains many more useful data structures, including ordered, sorted, and disjoint sets, priority queues, etc. 
- 
-## Notes 
- 
-## References 
- 
-- [Julia Array documentation](http://​docs.julialang.org/​en/​release-0.5/​stdlib/​arrays/​) 
- 
-- [Julia Array Wiki](https://​en.wikibooks.org/​wiki/​Introducing_Julia/​Arrays_and_tuples) 
  
arraysortandsets.txt · Last modified: 2018/12/27 17:20 (external edit)