# julia

### Site Tools

arraysortandsets

snippet.juliarepl
`julia> pkgchk.( [ "julia" => v"1.0.3", "StatsBase" => v"0.25.0" ] );`

# Sorting, Finding, and Sets (in Vectors)

## Sorting An Array

The `sort` function does not modify the passed array. To modify the array, use `sort!`.

The `sort` method is highly customizable. `Sort` accepts the arguments:

• `alg` : The sorting algorithm to be used. Can be one of `[InsertionSort, MergeSort, QuickSort]`.
• `rev` : Boolean argument which sorts in reverse, or decreasing order, when set to true.
snippet.juliarepl
```julia> sort([3,1,2])
3-element Array{Int64,1}:
1
2
3

julia> sort([1,2,3], rev=true)
3-element Array{Int64,1}:
3
2
1```

### Sorting An Array by Transform (Absolute Values, Comparison Functions)

• `by` : Pass in a function to be applied on each element of the array before sorting.
• `lt` : Pass in a function that can be used to compare any two elements in the array.

The idea is that `lt` gives true if the value is less, and false otherwise.

snippet.juliarepl
```julia> sort([1,2,3], by = x ->(-x))
3-element Array{Int64,1}:
3
2
1

julia> sort([1,2,3], lt = (x,y)->(!(x < y)))
3-element Array{Int64,1}:
3
2
1```

#### Sorting an Array of Structs

snippet.juliarepl
```julia> struct SomeTest; key::Int; s::String; end#struct##

julia> v= [ SomeTest(5, "five"), SomeTest(3, "three"), SomeTest(10, "ten") ]
3-element Array{SomeTest,1}:
SomeTest(5, "five")
SomeTest(3, "three")
SomeTest(10, "ten")

julia> sort( v, by= x->x.key )
3-element Array{SomeTest,1}:
SomeTest(3, "three")
SomeTest(5, "five")
SomeTest(10, "ten")```

### Sorting Indexes (Order function)

Julia has a function that returns the order of elements (and which could then be used as an index to rearrange the order; in R, this is called the `order()` function):

snippet.juliarepl
```julia> sortperm( [ 2,5,1,10,0 ] )
5-element Array{Int64,1}:
5
3
1
2
4```

### Checking if Array Is-Sorted

snippet.juliarepl
```julia> issorted( [(1, "b"), (2, "a")] , by=x->(x))   ## x is the first element of each tuple
true```

### Randomizing (Shuffling) Array Elements

To put the elements of an array in a random order, use the `shuffle()` function (or `shuffle!()` to change the original array):

snippet.juliarepl
```julia> using Random

julia> Random.seed!(0); shuffle([1,2,3,4,5])
5-element Array{Int64,1}:
5
4
2
3
1```

### Ranking

snippet.juliarepl
```julia> using StatsBase: tiedrank

julia> v= [ 1, 2, 5, 3, 5 ];

julia> tiedrank(v)
5-element Array{Float64,1}:
1.0
2.0
4.5
3.0
4.5

julia> v= unique( rand(1:10, 100) );		## show how tiedrank and sortperm fit together

julia> tiedrank(v[sortperm(v)]) == collect(1.0:length(v))	## sortperm puts it in order, tiedrank confirms this
true```

## Set Operations

### Finding Unique Elements in an Array

snippet.juliarepl
```julia> unique([1,4,2,2,3,3,4,1])
4-element Array{Int64,1}:
1
4
2
3```

The `unique` function works faster when the input list is already sorted. The `rle()` function can be used when only consecutive unique values are desired.

### Testing Uniqueness

snippet.juliarepl
```julia> allunique([1,4,2,2,3,3,4,1])
false```

### Testing Existence of an Element in an Array (Set)

snippet.juliarepl
```julia> in( 2, [1,2,3,4] )      ## in used as a function
true

julia> 20 in [1,2,3,4]         ## in used as an operator
false```

### Combining Two Arrays Without Duplicates (Set Union)

snippet.juliarepl
```julia> union([1,2,3], [1,4,5])                             ## set union
5-element Array{Int64,1}:
1
2
3
4
5```

Or just do `unique(append(x,y))`.

### Finding Elements in Both Arrays (Set Intersection)

The `intersect` function can be used to perform the set-intersection operation:

snippet.juliarepl
```julia> intersect([1,2,3,4], [1,2,3,5])
3-element Array{Int64,1}:
1
2
3```

### Finding Elements in One Array but Not Another (Set Diff)

snippet.juliarepl
```julia> setdiff([1,2,3,4], [1,2,5,6])         ## 3,4 are only in first array.   (5,6 are ignored)
2-element Array{Int64,1}:
3
4```

If your arrays have repeated elements, and you want to keep duplicates (eliminating those in the other set), then use:

snippet.juliarepl
```julia> filter( x->(!in(x, [1,2,4])), [1,2,2,3,3,4,6,7,7] )   ## remove all 1s, 2s, and 4s
5-element Array{Int64,1}:
3
3
6
7
7```

## Finding Elements in Vectors

### Finding the First Array Element that passes a Test

The `findfirst` function can be used to find the position of the first non-zero element in an array:

snippet.juliarepl
```julia> findfirst( x->(x!=0), [0,0,1,2,0] )
3```

`findnext` can be used to find the first element equal to some value `v`, starting at 3+1:

snippet.juliarepl
```julia> findnext( x->(x!=0), [0,0,1,2,0], 4)
4```

Finally, a boolean-returning-function can be passed in as a “test”. The position of the first element to pass the “test”, (i.e. the function returns true) is returned:

snippet.juliarepl
```julia> findfirst( x->(x == 2), [0,0,1,2,0,2])
4```

If `findfirst` cannot find a value, or if no element passes the “test”, then it returns nothing:

snippet.juliarepl
```julia> findfirst( x->(x!=0), [0,0,0] )

julia> findnext( x->(x!=0), [0,0,0], 1 )```

To find the first matching zero (or other value), you can do the obvious—or you can define your own “customized” `isequal` function:

snippet.juliarepl
```julia> findfirst( x->(x==0), [1,0,1] )  ## obvious
2

julia> findfirst( x->(x!=0), [1,0,1] .== 0 )  ## overly involved
2

julia> isone= isequal(1)		## a useful function for finding 1's and not 1's
(::Base.Fix2{typeof(isequal),Float64}) (generic function with 1 method)

julia> findfirst( isone, [3,2,1,5,6] )	## a '1' in the 3rd position
3

julia> findfirst( !isone, [1,1,1,1,3,1,1] )	## a non-'1' in the 5th position
5```

### Finding (Locations of) all Duplicates and Uniques

snippet.juliarepl
```julia> x= [ 1, 2, 1, 2, 1, 3, 1, 3, 3, 3, 1, 3, 4 ];

julia> d= Dict{Int64,Array{Int64}}();

julia> for i in 1 : length(x);  d[ x[i] ]= push!( get(d, x[i], []), i );  end#for

julia> d
Dict{Int64,Array{Int64,N} where N} with 4 entries:
4 => 
2 => [2, 4]
3 => [6, 8, 9, 10, 12]
1 => [1, 3, 5, 7, 11]```

For DataFrames, see also the `nonunique` function.

### Finding all Matching Elements in an Array

`find()` returns the indexes of all non-zero elements in an vector. (To find the values themselves, just use them as index.)

snippet.juliarepl
```julia> findall( x->(x!=0), [0,0,1,2,0] )
2-element Array{Int64,1}:
3
4```

Although `.OP` (like `.<=`) are element-wise operators, they also work on array-scalars:

snippet.juliarepl
```julia> findall( x->(x!=0), [10,20,30,40,20] .== 20 )    ## returns matching index values
2-element Array{Int64,1}:
2
5```

The `find` function can also be used to find the positions of all elements that pass a “test”, described by a boolean-returning-function.

snippet.juliarepl
```julia> findall( x->(x == 0), [1,2,0,2,0,1,0] )
3-element Array{Int64,1}:
3
5
7```

### Finding all NaN and/or Missing Values

snippet.juliarepl
```julia> findall( x->isnan(x), [ 0, 2, NaN, 3.0, NaN ] )    ## use ismissing for missings
2-element Array{Int64,1}:
3
5```

## Filtering Out or Replacing Specific Values

If you'd like to remove the elements of an array that do not satisfy a condition, the `filter` function can come in handy. For example, to remove the negative numbers in an array:

snippet.juliarepl
```julia> filter( x->(x >= 0), [1,–2,3,–4,0] )
3-element Array{Int64,1}:
1
3
0```

(Use `findall()` instead of `filter()` to obtain indexes.)

### Removing all NaN and/or Missing Values

`isnan`

snippet.juliarepl
```julia> v= [ 0.0, 2.0, NaN, 3.0, NaN ];

julia> filter( x->(!isnan(x)), v )		# one way to filter out all NaN obs
3-element Array{Float64,1}:
0.0
2.0
3.0

julia> filter( !isnan, v ) == v[ .! isnan.(v) ]	## two more ways to do the same
true

julia> findall( x->!isnan(x), v )		## find the index values of non NaNs
3-element Array{Int64,1}:
1
2
4

julia> v[ findall( x->!isnan(x), v ) ]		## so this again removes all NaN's
3-element Array{Float64,1}:
0.0
2.0
3.0

julia> ismissing.( [ 1.0, missing, 3.0 ] )	## same trick for missing values
3-element BitArray{1}:
false
true
false```

### Replacing all NaN and/or Missing Values

snippet.juliarepl
```julia> v= [ 0.0, 2.0, NaN, 3.0, NaN ];

julia> using Missings;

julia> map( x->(isnan(x) ? missing : x), v)		## type deteriorates badly in this version
5-element Array{Union{Missing, Float64},1}:
0.0
2.0
missing
3.0
missing

julia> oftype( [1.0, missing], ans )
5-element Array{Union{Missing, Float64},1}:
0.0
2.0
missing
3.0
missing

julia> replace( v, NaN => missing )			## type is narrower and more useful
5-element Array{Union{Missing, Float64},1}:
0.0
2.0
missing
3.0
missing```

### Faster Sorts

Julia's default sort is fast when the keys tend to be unique, and slow when they tend to be similar. A faster alternative when keys are similar is in SortingLab.jl. `radixsort()` speeds up (fixed-length) string sorting by an order of magnitude.

# Backmatter

• DataStructures.jl contains many more useful data structures, including ordered, sorted, and disjoint sets, priority queues, etc.

### Page Tools 