User Tools

Site Tools


arraysortandsets


snippet.juliarepl
julia> pkgchk( [ "julia" => v"1.0.2", "StatsBase" => v"0.25.0" ] )

Sorting, Finding, and Sets (in Vectors)

Sorting An Array

The sort function does not modify the passed array. To modify the array, use sort!.

The sort method is highly customizable. Sort accepts the arguments:

  • alg : The sorting algorithm to be used. Can be one of [InsertionSort, MergeSort, QuickSort].
  • rev : Boolean argument which sorts in reverse, or decreasing order, when set to true.
snippet.juliarepl
julia> sort([3,1,2])
3-element Array{Int64,1}:
 1
 2
 3

julia> sort([1,2,3], rev=true)
3-element Array{Int64,1}:
 3
 2
 1

Sorting An Array by Transform (Absolute Values, Comparison Functions)

  • by : Pass in a function to be applied on each element of the array before sorting.
  • lt : Pass in a function that can be used to compare any two elements in the array.

The idea is that lt gives true if the value is less, and false otherwise.

snippet.juliarepl
julia> sort([1,2,3], by = x ->(-x))
3-element Array{Int64,1}:
 3
 2
 1

julia> sort([1,2,3], lt = (x,y)->(!(x < y)))
3-element Array{Int64,1}:
 3
 2
 1

Sorting an Array of Structs

snippet.juliarepl
julia> struct SomeTest; key::Int; s::String; end#struct##

julia> v= [ SomeTest(5, "five"), SomeTest(3, "three"), SomeTest(10, "ten") ]
3-element Array{SomeTest,1}:
 SomeTest(5, "five")
 SomeTest(3, "three")
 SomeTest(10, "ten")

julia> sort( v, by= x->x.key )
3-element Array{SomeTest,1}:
 SomeTest(3, "three")
 SomeTest(5, "five")
 SomeTest(10, "ten")

Sorting Indexes (Order function)

Julia has a function that returns the order of elements (and which could then be used as an index to rearrange the order; in R, this is called the order() function):

snippet.juliarepl
julia> sortperm( [ 2,5,1,10,0 ] )
5-element Array{Int64,1}:
 5
 3
 1
 2
 4

Checking if Array Is-Sorted

snippet.juliarepl
julia> issorted( [(1, "b"), (2, "a")] , by=x->(x[1]))   ## x[1] is the first element of each tuple
true

Randomizing (Shuffling) Array Elements

To put the elements of an array in a random order, use the shuffle() function (or shuffle!() to change the original array):

snippet.juliarepl
julia> using Random

julia> Random.seed!(0); shuffle([1,2,3,4,5])
5-element Array{Int64,1}:
 5
 4
 2
 3
 1

Ranking

snippet.juliarepl
julia> using StatsBase: tiedrank

julia> v= [ 1, 2, 5, 3, 5 ];

julia> tiedrank(v)
5-element Array{Float64,1}:
 1.0
 2.0
 4.5
 3.0
 4.5

julia> v= unique( rand(1:10, 100) );		## show how tiedrank and sortperm fit together

julia> tiedrank(v[sortperm(v)]) == collect(1.0:length(v))	## sortperm puts it in order, tiedrank confirms this
true

Set Operations

Finding Unique Elements in an Array

snippet.juliarepl
julia> unique([1,4,2,2,3,3,4,1])
4-element Array{Int64,1}:
 1
 4
 2
 3

The unique function works faster when the input list is already sorted. The rle() function can be used when only consecutive unique values are desired.

Testing Uniqueness

snippet.juliarepl
julia> allunique([1,4,2,2,3,3,4,1])
false

Testing Existence of an Element in an Array (Set)

snippet.juliarepl
julia> in( 2, [1,2,3,4] )      ## in used as a function
true

julia> 20 in [1,2,3,4]         ## in used as an operator
false

Combining Two Arrays Without Duplicates (Set Union)

snippet.juliarepl
julia> union([1,2,3], [1,4,5])                             ## set union
5-element Array{Int64,1}:
 1
 2
 3
 4
 5

Or just do unique(append(x,y)).

Finding Elements in Both Arrays (Set Intersection)

The intersect function can be used to perform the set-intersection operation:

snippet.juliarepl
julia> intersect([1,2,3,4], [1,2,3,5])
3-element Array{Int64,1}:
 1
 2
 3

Finding Elements in One Array but Not Another (Set Diff)

snippet.juliarepl
julia> setdiff([1,2,3,4], [1,2,5,6])         ## 3,4 are only in first array.   (5,6 are ignored)
2-element Array{Int64,1}:
 3
 4

If your arrays have repeated elements, and you want to keep duplicates (eliminating those in the other set), then use:

snippet.juliarepl
julia> filter( x->(!in(x, [1,2,4])), [1,2,2,3,3,4,6,7,7] )   ## remove all 1s, 2s, and 4s
5-element Array{Int64,1}:
 3
 3
 6
 7
 7

Finding Elements in Vectors

Finding the First Array Element that passes a Test

The findfirst function can be used to find the position of the first non-zero element in an array:

snippet.juliarepl
julia> findfirst( x->(x!=0), [0,0,1,2,0] )
3

findnext can be used to find the first element equal to some value v, starting at 3+1:

snippet.juliarepl
julia> findnext( x->(x!=0), [0,0,1,2,0], 4)
4

Finally, a boolean-returning-function can be passed in as a “test”. The position of the first element to pass the “test”, (i.e. the function returns true) is returned:

snippet.juliarepl
julia> findfirst( x->(x == 2), [0,0,1,2,0,2])
4

If findfirst cannot find a value, or if no element passes the “test”, then it returns nothing:

snippet.juliarepl
julia> findfirst( x->(x!=0), [0,0,0] )

julia> findnext( x->(x!=0), [0,0,0], 1 )

To find the first zero,

snippet.juliarepl
julia> findfirst( x->(x!=0), [1,0,1] .== 0 )
2

Finding (Locations of) all Duplicates and Uniques

snippet.juliarepl
julia> x= [ 1, 2, 1, 2, 1, 3, 1, 3, 3, 3, 1, 3, 4 ];

julia> d= Dict{Int64,Array{Int64}}();

julia> for i in 1 : length(x);  d[ x[i] ]= push!( get(d, x[i], []), i );  end#for

julia> d
Dict{Int64,Array{Int64,N} where N} with 4 entries:
  4 => [13]
  2 => [2, 4]
  3 => [6, 8, 9, 10, 12]
  1 => [1, 3, 5, 7, 11]

For DataFrames, see also the nonunique function.

Finding all Matching Elements in an Array

find() returns the indexes of all non-zero elements in an vector. (To find the values themselves, just use them as index.)

snippet.juliarepl
julia> findall( x->(x!=0), [0,0,1,2,0] )
2-element Array{Int64,1}:
 3
 4

Although .OP (like .<=) are element-wise operators, they also work on array-scalars:

snippet.juliarepl
julia> findall( x->(x!=0), [10,20,30,40,20] .== 20 )    ## returns matching index values
2-element Array{Int64,1}:
 2
 5

The find function can also be used to find the positions of all elements that pass a “test”, described by a boolean-returning-function.

snippet.juliarepl
julia> findall( x->(x == 0), [1,2,0,2,0,1,0] )
3-element Array{Int64,1}:
 3
 5
 7

Finding all NaN and/or Missing Values

snippet.juliarepl
julia> findall( x->isnan(x), [ 0, 2, NaN, 3.0, NaN ] )    ## use ismissing for missings
2-element Array{Int64,1}:
 3
 5

Filtering Out or Replacing Specific Values

If you'd like to remove the elements of an array that do not satisfy a condition, the filter function can come in handy. For example, to remove the negative numbers in an array:

snippet.juliarepl
julia> filter( x->(x >= 0), [1,2,3,4,0] )
3-element Array{Int64,1}:
 1
 3
 0

(Use findall() instead of filter() to obtain indexes.)

Removing all NaN and/or Missing Values

snippet.juliarepl
julia> v= [ 0.0, 2.0, NaN, 3.0, NaN ];

julia> filter( x->(!isnan(x)), v )
3-element Array{Float64,1}:
 0.0
 2.0
 3.0

julia> findall( x->!isnan(x), v )
3-element Array{Int64,1}:
 1
 2
 4

julia> v[ findall( x->!isnan(x), v ) ]
3-element Array{Float64,1}:
 0.0
 2.0
 3.0

Replacing all NaN and/or Missing Values

snippet.juliarepl
julia> v= [ 0.0, 2.0, NaN, 3.0, NaN ];

julia> using Missings;

julia> map( x->(isnan(x) ? missing : x), v)		## type deteriorates badly in this version
5-element Array{Union{Missing, Float64},1}:
 0.0
 2.0
  missing
 3.0
  missing

julia> oftype( [1.0, missing], ans )
5-element Array{Union{Missing, Float64},1}:
 0.0
 2.0
  missing
 3.0
  missing

julia> replace( v, NaN => missing )			## type is narrower and more useful
5-element Array{Union{Missing, Float64},1}:
 0.0
 2.0
  missing
 3.0
  missing

Faster Sorts

Julia's default sort is fast when the keys tend to be unique, and slow when they tend to be similar. A faster alternative when keys are similar is in SortingLab.jl. radixsort() speeds up (fixed-length) string sorting by an order of magnitude.

Backmatter

Notes

References

arraysortandsets.txt · Last modified: 2018/11/22 20:47 (external edit)