dicts

Differences

This shows you the differences between two versions of the page.

dicts [2018/06/18 13:00]
ivo [Creating Complex Dictionaries]
dicts [2018/12/27 13:27] (current)
Line 6: Line 6:
  
 ```juliarepl ```juliarepl
-julia> ​Pkg.status.(["​DataStructures"​]);​ +julia> ​pkgchk.( [ "​julia"​ => v"​1.0.3", ​"​DataStructures" => v"​0.14.0" ] ); 
- - DataStructures ​               0.7.4+
 ``` ```
 +
 +
 +
 +# Pairs
 +
 +A dictionary is a collection of *pairs*, each consisting of a (unique) key and an arbitrary value, with fast access to values by key.  A `Pair` is also a useful data type in its own right.  Each pair has exactly two elements, which can be of different types.  Pairs are useful, e.g., for the key-to-value mappings in [[Dicts]].
 +
 +```juliarepl
 +julia> x= "​hi"​ => :a ## this is x= ("​hi"​ => :a)
 +"​hi"​ => :a
 +
 +julia> dump(x)
 +Pair{String,​Symbol}
 +  first: String "​hi"​
 +  second: Symbol a
 +
 +julia> first(x)
 +"​hi"​
 +
 +julia> x.first
 +"​hi"​
 +
 +julia> last(x)
 +:a
 +
 +julia> x.second ##​ oddly, not x.last
 +:a
 +
 +julia> typeof( x )
 +Pair{String,​Symbol}
 +
 +julia> f( x::Vector{ Pair{String,​Int} } )= (); ## a function
 +
 +```
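As a minimal sketch of how a vector of pairs can be passed to a function and unpacked (the helper name `totalvalue` is made up for illustration and does not appear elsewhere on this page):

```julia
# A vector of pairs; the element type is inferred as Pair{String,Int}
prs = [ "a" => 1, "b" => 2, "c" => 3 ]

# hypothetical helper: sum the second element of every pair
totalvalue( v::Vector{Pair{String,Int}} ) = sum( last(p) for p in v )

# a pair destructures like a two-element tuple
(k, v) = prs[1]

println( totalvalue(prs) )
println( k, " ", v )
```

The type annotation is strict: a `Vector{Pair{String,Float64}}` would not match this method, because the parameters of `Pair` must agree exactly.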
 +
  
  
Line 30: Line 65:
 ## Creating a Dictionary ## Creating a Dictionary
  
-The `Dict` constructor accepts a list of `Pair`s as arguments, and returns a dictionary. ​ The following dictionary `mydict` will be reused in examples below:+The `Dict` constructor accepts a list of [[arraysintro#​Pairs|Pair]]s as arguments, and returns a dictionary. ​ The following dictionary `mydict` will be reused in examples below:
  
 ```juliarepl ```juliarepl
Line 56: Line 91:
  
 **A common gotcha:** `Dict{String,​Int64}` is a type, while `Dict{String,​Int64}()` is an empty dictionary. **A common gotcha:** `Dict{String,​Int64}` is a type, while `Dict{String,​Int64}()` is an empty dictionary.
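
The distinction can be checked directly. The following is only a sketch; the names `T` and `d` are chosen here for illustration:

```julia
T = Dict{String,Int64}      # a type, not a container
d = Dict{String,Int64}()    # an empty dictionary of that type

println( T isa DataType )   # T is the type itself
println( d isa T )          # d is an instance of that type
println( isempty(d) )

d["one"] = 1                # assignment works on the instance
# T["one"] = 1              # would fail: a type has no entries to assign to
```

Forgetting the trailing `()` typically shows up later as a `MethodError` when the first key is assigned.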
 +
  
 ### Creating More Complicated Dictionaries ### Creating More Complicated Dictionaries
  
 Various examples of working with Dictionaries of Other Types Various examples of working with Dictionaries of Other Types
 +
  
 #### Dictionary of Vectors #### Dictionary of Vectors
Line 69: Line 106:
   "​a"​ => [10.0, 2.0]   "​a"​ => [10.0, 2.0]
 ``` ```
 +
  
 #### Creating and Filling Empty Dictionary of Vectors #### Creating and Filling Empty Dictionary of Vectors
  
 ```juliarepl ```juliarepl
-julia> Dict{String,​Array{Float64,​1}}()+julia> ​x= Dict{String,​Array{Float64,​1}}()
 Dict{String,​Array{Float64,​1}} with 0 entries Dict{String,​Array{Float64,​1}} with 0 entries
  
 julia> x["​newval"​]= [12.0, 3.0 ]; x["​twoval"​]= [24.0, 4.0 ];  ## ok ## julia> x["​newval"​]= [12.0, 3.0 ]; x["​twoval"​]= [24.0, 4.0 ];  ## ok ##
    
-julia> push!( x["​twoval"​],​ 120.0 );  ## ok ##+julia> push!( x["​twoval"​],​ 120.0 ); ## ok ##
  
-julia> push!( x["​threeval"​],​ 120.0 ) .  ​## but no auto-vivification+julia> push!( x["​threeval"​],​ 120.0 )     ## but no auto-vivification
 ERROR: KeyError: key "​threeval"​ not found ERROR: KeyError: key "​threeval"​ not found
 Stacktrace: Stacktrace:
- [1] getindex(::​Dict{String,​Array{Float64,​1}},​ ::String) at ./​dict.jl:​474 
  
 julia> x julia> x
Line 89: Line 126:
   "​newval"​ => [12.0, 3.0]   "​newval"​ => [12.0, 3.0]
   "​twoval"​ => [24.0, 4.0, 120.0]   "​twoval"​ => [24.0, 4.0, 120.0]
-  ​+
 ``` ```
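
One way to work around the missing auto-vivification is `get!`, which returns the stored value if the key exists and otherwise stores and returns a default. This is a sketch of that idea, not necessarily the approach the page author had in mind:

```julia
x = Dict{String,Vector{Float64}}()

# get! returns x[key] if present; otherwise it stores the default and returns it
push!( get!( x, "threeval", Float64[] ), 120.0 )
push!( get!( x, "threeval", Float64[] ), 121.0 )   # key now exists; the default is unused

println( x["threeval"] )
```

The default `Float64[]` is allocated on every call even when it is not needed. The function form `get!( () -> Float64[], x, "threeval" )` avoids this by constructing the default only when the key is missing.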
  
-#### Dictionary of Int-Float Pairs+#### Dictionary of Other Types
  
 ```juliarepl ```juliarepl
Line 98: Line 135:
 Dict{String,​Array{Pair{Int64,​Float64},​1}} with 0 entries Dict{String,​Array{Pair{Int64,​Float64},​1}} with 0 entries
  
-julia> ​dscpif[ "​a"​ ] = [ Pair( 12, 64.0 ) ]; dscpif+julia> ​dsvpif[ "​a"​ ] = [ Pair( 12, 64.0 ) ]; dsvpif
 Dict{String,​Array{Pair{Int64,​Float64},​1}} with 1 entry: Dict{String,​Array{Pair{Int64,​Float64},​1}} with 1 entry:
   "​a"​ => Pair{Int64,​Float64}[12=>​64.0]   "​a"​ => Pair{Int64,​Float64}[12=>​64.0]
 ``` ```
  
-FIXME Dictionary of Vector-Vector+ 
 +```juliarepl 
 +julia> dtype= Dict{ String, ​Vector{Vector{Int}} } ## not a variable, but a type 
 +Dict{String,​Array{Array{Int64,​1},​1}} 
 + 
 +julia> emptydtype= Dict{ String, Vector{Vector{Int}} }() ## empty variable 
 +Dict{String,​Array{Array{Int64,​1},​1}} with 0 entries 
 + 
 +julia> Dict( "​a"​ => [ [1,2], [3,4] ] ) ## new dict with "​a"​ 
 +Dict{String,​Array{Array{Int64,​1},​1}} with 1 entry: 
 +  "​a"​ => Array{Int64,​1}[[1,​ 2], [3, 4]] 
 +```
  
  
Line 112: Line 160:
  
 julia> keys(mydict) ​                                            ## returns a KeySet (Iterator) julia> keys(mydict) ​                                            ## returns a KeySet (Iterator)
-Base.KeyIterator ​for a Dict{String,​Int64} with 3 entries. Keys:+Base.KeySet ​for a Dict{String,​Int64} with 3 entries. Keys:
   "​One"​   "​One"​
   "​Two"​   "​Two"​
Line 122: Line 170:
  "​Two"​  "​Two"​
  "​Three"​  "​Three"​
 +
 ``` ```
 +
 +
 +## Converting a Dictionary to a Vector of Pairs or Tuples
 +
 +```juliarepl
 +julia> mydict = Dict("​One"​ => 1, "​Two"​ => 2, "​Three"​ => 3);
 +
 +julia> asvector= collect(mydict)
 +3-element Array{Pair{String,​Int64},​1}:​
 +   "​One"​ => 1
 +   "​Two"​ => 2
 + "​Three"​ => 3
 +
 +julia> asvector= [ Pair(k,v) for (k,v) in mydict]
 +3-element Array{Pair{String,​Int64},​1}:​
 +   "​One"​ => 1
 +   "​Two"​ => 2
 + "​Three"​ => 3
 +
 +julia> astuple= [(k,v) for (k,v) in mydict]
 +3-element Array{Tuple{String,​Int64},​1}:​
 + ​("​One",​ 1)  ​
 + ​("​Two",​ 2)  ​
 + ​("​Three",​ 3)
 +
 +```
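
The conversion also works in the other direction, because the `Dict` constructor accepts a collection of pairs or of two-element tuples. A small sketch (variable names chosen here for illustration):

```julia
mydict = Dict( "One" => 1, "Two" => 2, "Three" => 3 )

aspairs  = collect( mydict )                  # Vector of Pair{String,Int}
astuples = [ (k, v) for (k, v) in mydict ]    # Vector of Tuple{String,Int}

# both forms can be fed back to the Dict constructor
frompairs  = Dict( aspairs )
fromtuples = Dict( astuples )

println( frompairs == mydict )
println( fromtuples == mydict )
```

Note that the order of elements in the collected vectors follows the dictionary's internal order, which is not the insertion order for a plain `Dict`.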
 +
  
  
Line 159: Line 235:
 ## Creating a Different Dictionary From an Existing Dictionary ## Creating a Different Dictionary From an Existing Dictionary
  
-`map()` can perform an *operation* on every key-value in a dictionary.  For example, using the value to suggest the number of repeats (with `fill`):+A `Dict` comprehension can perform an *operation* on every key-value pair in a dictionary (in Julia 1.0, `map()` is not defined on dictionaries).  For example, here we create a second dictionary from a first dictionary, using the value to suggest the number of repeats (with `fill`):
  
 ```juliarepl ```juliarepl
-julia> ​mydict ​= Dict("​One"​ => 1, "​Two"​ => 2, "​Three"​ => 3);+julia> ​firstdict= Dict("​One"​ => 1, "​Two"​ => 2, "​Three"​ => 3);
  
-julia> map(x -> (string(x[1], "s") => fill('*', x[2])), mydict)+julia> newdict= Dict( string(key,"SSS") => fill('*', value) for (key,value) in firstdict )
 Dict{String,​Array{Char,​1}} with 3 entries: Dict{String,​Array{Char,​1}} with 3 entries:
-  "Twos" ​  => ['​*',​ '​*'​] +  "TwoSSS" ​  => ['​*',​ '​*'​] 
-  "Ones" ​  => ['​*'​] +  "OneSSS" ​  => ['​*'​] 
-  "Threes" => ['​*',​ '​*',​ '​*'​]+  "ThreeSSS" => ['​*',​ '​*',​ '​*'​]
 ``` ```
  
Line 186: Line 262:
   1 => "​One"​   1 => "​One"​
  
-julia> map(reverse,​ mydict) ​                             ## method 2 
-Dict{Int64,​String} with 3 entries: 
-  2 => "​Two"​ 
-  3 => "​Three"​ 
-  1 => "​One"​ 
 ``` ```
  
-`map()` applies the function to each element *pair*. ​ `reverse()` switches key and value. 
  
-A "julia special"​ method ​is+A "julia special"​ method ​is
  
 ```juliarepl ```juliarepl
Line 271: Line 341:
 ERROR: KeyError: key "​NotThere"​ not found ERROR: KeyError: key "​NotThere"​ not found
 Stacktrace: Stacktrace:
- [1] getindex(::​Dict{String,​Int64},​ ::String) at ./​dict.jl:​474+
 ``` ```
  
Line 332: Line 402:
 julia> mydict = Dict("​One"​ => 1, "​Two"​ => 2, "​Three"​ => 3); julia> mydict = Dict("​One"​ => 1, "​Two"​ => 2, "​Three"​ => 3);
  
-julia> filter( (key, value) -> value % 2 == 1 , mydict )       ## keep only odd-numbered values+julia> filter( p->(last(p) % 2 == 1), mydict )       ## keep only odd-numbered values
 Dict{String,​Int64} with 2 entries: Dict{String,​Int64} with 2 entries:
   "​One" ​  => 1   "​One" ​  => 1
   "​Three"​ => 3   "​Three"​ => 3
  
-julia> filter( (key, value) -> length(key)==3 , mydict )       ## keep only three-letter keys+julia> filter( p->(length(first(p)) == 3), mydict )       ## keep only three-letter keys
 Dict{String,​Int64} with 2 entries: Dict{String,​Int64} with 2 entries:
   "​One"​ => 1   "​One"​ => 1
Line 366: Line 436:
 true true
  
-julia> imydict= map(reverse, mydict)              ## invert+julia> invmydict= Dict( value => key for (key,value) in mydict )              ## invert
 Dict{Int64,​String} with 3 entries: Dict{Int64,​String} with 3 entries:
   10 => "​Ten"​   10 => "​Ten"​
Line 372: Line 442:
   3  => "​Three"​   3  => "​Three"​
  
-julia> valuesinorder= sort(collect(keys(imydict)));+julia> valuesinorder= sort(collect(keys(invmydict)));
  
-julia> for key in valuesinorder; ​ println("​$key => $(imydict[key])"​);​ end+julia> for key in valuesinorder; ​ println("​$key => $(invmydict[key])"​);​ end##for##
 2 => Two 2 => Two
 3 => Three 3 => Three
Line 393: Line 463:
  "​d"​=>​3  "​d"​=>​3
  
-julia> sort( asarray, by = x -> last(x) )  ## last() works on each pair+julia> sort( asarray, by=x->​last(x) )  ## last() works on each pair
 4-element Array{Pair{String,​Int64},​1}:​ 4-element Array{Pair{String,​Int64},​1}:​
  "​c"​=>​2  "​c"​=>​2
Line 406: Line 476:
 ```juliarepl ```juliarepl
 julia> function showsorted(mydict::​Dict) julia> function showsorted(mydict::​Dict)
-     s= sort( collect( sumtotal ), by = x->last(x) )+     asvector= collect( mydict )
-     for i=1:length(s)+     assortedvector= sort( asvector, by = x->last(x) )
-         println(s[i][2], "\t", s[i][1])+     for i=1:length(assortedvector)
 +         println(assortedvector[i][2], "​\t", ​assortedvector[i][1])
      end#for      end#for
  end;#​function##​  end;#​function##​
  
-julia> ​mydict= ​Dict( "​a"​ => 3, "​b"​ => 5, "​c"​ => 2, "​d"​ => 4 )+julia> ​showsorted( ​Dict( "​a"​ => 3, "​b"​ => 5, "​c"​ => 2, "​d"​ => 4 ) )
- +
-julia> showsorted( mydict ​)+
 2 c 2 c
 3 a 3 a
Line 433: Line 502:
  
 julia> mydict julia> mydict
-DataStructures.OrderedDict{String,​Int64} with 4 entries:+OrderedDict{String,​Int64} with 4 entries:
  "​a"​=>​3  "​a"​=>​3
  "​B"​=>​5  "​B"​=>​5
Line 483: Line 552:
 julia> for w in split("​a ab ac ad ab ab ba ba"); wordcounts[w]= get(wordcounts,​ w, 0) + 1; end;#for julia> for w in split("​a ab ac ad ab ab ba ba"); wordcounts[w]= get(wordcounts,​ w, 0) + 1; end;#for
  
-julia> arrsortedbycount = sort( collect( wordcounts ), by = x -> last(x), rev=true )+julia> arrsortedbycount= sort( collect( wordcounts ), by=x->​last(x),​ rev=true )
 5-element Array{Pair{String,​Int64},​1}:​ 5-element Array{Pair{String,​Int64},​1}:​
  "​ab"​=>​3  "​ab"​=>​3
Line 496: Line 565:
  "​ba"​  "​ba"​
  
-julia> first.( filter( x -> (last(x)>​=2),​ arrsortedbycount ) )   ## words appearing at least 2 times+julia> first.( filter( x->​(last(x) >= 2), arrsortedbycount ) )   ## words appearing at least 2 times
 2-element Array{String,​1}:​ 2-element Array{String,​1}:​
  "​ab"​  "​ab"​
Line 508: Line 577:
  
 ```juliarepl ```juliarepl
-julia> dictofarrays=Dict{ String, Array{Int64}}()+julia> dictofarrays= Dict{ String, Array{Int64}}()
 Dict{String,​Array{Int64,​N} where N} with 0 entries Dict{String,​Array{Int64,​N} where N} with 0 entries
  
Line 581: Line 650:
  
 ```juliarepl ```juliarepl
-julia> original=Dict{ String, Array{Int64}}( "​tens"​ => [10,20], "​hundreds"​ => [100,200], "​thousands"​ => [1000, 2000, 3000] )+julia> original= Dict{ String, Array{Int64}}( "​tens"​ => [10,20], "​hundreds"​ => [100,200], "​thousands"​ => [1000, 2000, 3000] )
 Dict{String,​Array{Int64,​N} where N} with 3 entries: Dict{String,​Array{Int64,​N} where N} with 3 entries:
   "​hundreds" ​ => [100, 200]   "​hundreds" ​ => [100, 200]
Line 587: Line 656:
   "​tens" ​     => [10, 20]   "​tens" ​     => [10, 20]
  
-julia> asgn= original; ​ cp= copy(original); ​ dpcp=deepcopy(original);​+julia> asgn= original; ​ cp= copy(original); ​ dpcp= deepcopy(original);​
  
 julia> asgn["​tens"​]= [-10,​-20,​-30];​ cp["​hundreds"​]= [-100,​-200,​-300]; ​ dpcp["​thousands"​]= [-1000,​-2000,​-3000];​ julia> asgn["​tens"​]= [-10,​-20,​-30];​ cp["​hundreds"​]= [-100,​-200,​-300]; ​ dpcp["​thousands"​]= [-1000,​-2000,​-3000];​
Line 601: Line 670:
  
 ```juliarepl ```juliarepl
-julia> original=Dict{ String, Array{Int64}}( "​tens"​ => [10,20], "​hundreds"​ => [100,200], "​thousands"​ => [1000, 2000, 3000] );+julia> original= Dict{ String, Array{Int64}}( "​tens"​ => [10,20], "​hundreds"​ => [100,200], "​thousands"​ => [1000, 2000, 3000] );
  
-julia> asgn= original; ​ cp= copy(original); ​ dpcp=deepcopy(original);​+julia> asgn= original; ​ cp= copy(original); ​ dpcp= deepcopy(original);​
  
 julia> asgn["​tens"​][1]= -40; cp["​hundreds"​][2]= -400;  dpcp["​thousands"​][3]= -4000; julia> asgn["​tens"​][1]= -40; cp["​hundreds"​][2]= -400;  dpcp["​thousands"​][3]= -4000;
Line 614: Line 683:
 ``` ```
  
 +
 +
 +# Extended Example: Transform Compressed CSV File into Dict of Pair Vectors
 +
 +The `om-zerocd.csv.gz` file is compressed and has entries like
 +
 +FIXME fix this example after we know how to embed a data frame
 +
 +```julianoeval
 +julia> using DataFrames, Gzip
 +
 +julia> open( "​tmp-sample.csv.gz",​ "​w"​ ) do;
 + write( DataFrame( yyyymmdd= [ 19960102, 19960102, ​
 +yyyymmdd,​days,​rate
 +19960102,​9,​5.763067
 +19960102,​15,​5.745902
 +19960102,​50,​5.673317
 +19960102,​78,​5.608884
 +19960103,​8,​5.763067
 +19960103,​14,​5.747397
 +19960103,​49,​5.672263
 +19960103,​77,​5.603705
 +```
 +
 +We want to create a dictionary with keys that are the yyyymmdd, and values that are vectors of pairs with days and rate.
 +
 +```juliafix
 +
 +using DataFrames, CodecZlib
 +
 +zerocdcsv= readtable( GzipDecompressorStream( open("​tmp-sample.csv.gz",​ "​r"​) ) );
 +
 +zerocd= Dict{ Int, Vector{Pair{Int,​Float64}} }()
 +
 +for i=1:​nrow(zerocdcsv)
 +    yyyymmdd= Int(zerocdcsv[i,​1])
 +    entry= Pair( Int(zerocdcsv[i,​2]),​ Float64(zerocdcsv[i,​3]) )
 +    if (!haskey(zerocd,​ yyyymmdd))
 + zerocd[ yyyymmdd ]= [ entry ]
 +    else
 + push!(zerocd[ yyyymmdd ],  entry )
 +    end#if#
 +end#for#
 +
 +julia> println( zerocd[ 19960102 ] )
 +Pair{Int64,​Float64}[9=>​5.76307,​ 15=>​5.7459,​ 50=>​5.67332,​ 78=>​5.60888]
 +
 +```
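
Until the data frame version is fixed, the same transformation can be sketched without any packages by parsing a few of the sample rows listed above from an in-memory string. This is an illustration under that assumption, not the page's intended `DataFrames` solution; the variable `csvtext` is introduced here only for the example:

```julia
# sample rows copied from the listing above; header line first
csvtext = """
yyyymmdd,days,rate
19960102,9,5.763067
19960102,15,5.745902
19960103,8,5.763067
19960103,14,5.747397
"""

zerocd = Dict{ Int, Vector{Pair{Int,Float64}} }()

for line in split( strip(csvtext), '\n' )[2:end]     # skip the header line
    fields = split( line, ',' )
    yyyymmdd = parse( Int, fields[1] )
    entry = parse( Int, fields[2] ) => parse( Float64, fields[3] )
    # get! creates the empty vector the first time a date is seen
    push!( get!( zerocd, yyyymmdd, Pair{Int,Float64}[] ), entry )
end

println( zerocd[19960102] )
```

Using `get!` replaces the explicit `haskey` test: the vector for a date is created on first use and extended afterwards.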
  
  
Line 621: Line 738:
  
 - [DataStructures.jl](https://github.com/JuliaCollections/DataStructures.jl) contains many more useful data structures, including not only ordered dicts, but also sorted dicts, dicts with defaults, multi-dicts, and trees. - [DataStructures.jl](https://github.com/JuliaCollections/DataStructures.jl) contains many more useful data structures, including not only ordered dicts, but also sorted dicts, dicts with defaults, multi-dicts, and trees.
 +
 +* [Flatten.jl](https://​github.com/​JuliaStats/​Flatten.jl) converts flat to nested structures and vice-versa.
 +
  
 ## Notes ## Notes
Line 627: Line 747:
  
 ## References ## References
- 
  
dicts.txt · Last modified: 2018/12/27 13:27 (external edit)