
dicts
snippet.juliarepl
julia> pkgchk( [ "julia" => v"1.0.2", "DataStructures" => v"0.14.0" ] )

Pairs

A dictionary is a collection of pairs. Each pair is made of a (unique) key and an arbitrary value, and values can be accessed quickly by key. Pair is also a useful data type in Julia in its own right. Each pair has exactly two elements, which can be of different types. Pairs are used, e.g., to construct and map Dicts.

snippet.juliarepl
julia> x= "hi" => :a		## this is x= ("hi" => :a)
"hi" => :a

julia> dump(x)
Pair{String,Symbol}
  first: String "hi"
  second: Symbol a

julia> first(x)
"hi"

julia> x.first
"hi"

julia> last(x)
:a

julia> x.second			## oddly, not x.last
:a

julia> typeof( x )
Pair{String,Symbol}

julia> f( x::Vector{ Pair{String,Int} } )= ();		## a function that accepts only a vector of String=>Int pairs
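
A Pair iterates over its two elements, so it can be destructured like a two-element tuple. A vector of pairs can also be handed straight to the Dict constructor. The following is a minimal sketch; the names p, key, value, plist, and d are illustrative only.

```julia
p = "hi" => :a                   # the same Pair as x above

# A Pair iterates over (first, second), so it destructures like a 2-tuple
key, value = p

# A vector of Pairs (the argument type of f above) converts directly into a Dict
plist = ["One" => 1, "Two" => 2]
d = Dict(plist)                  # inferred as Dict{String,Int64}
```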

Dict(ionaries)

  • The Dict type implements what other languages have called a 'look-up table', 'hash', or 'hashmap'. Think of a dict as an array indexed by an arbitrary key (often a string) instead of a numeric index. The keys must be unique.
  • Lookups in dictionaries are very fast, but memory use can be prodigious. A rule of thumb is that a dictionary takes 3-4 times as much storage as an equivalent array.
  • Dictionaries are unordered, i.e., the order in which keys are inserted is not maintained. In an OrderedDict (from the DataStructures package), keys are retrieved in insertion order.
  • A struct is logically similar (like a compile-time dictionary with fixed, read-only keys), but it is accessed differently.
    • Dictionaries are accessed via d[key] (e.g., d[:member] for Symbol keys), while structs are accessed via s.member.

Related: Like C++ (and unlike Perl), Julia also supports “sets”. Sets can be viewed as dictionary keys that all have uninteresting values. Useful operations include testing membership and taking unions, intersections, and set differences. Sets are discussed in Arrays Sorting and Sets.
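
The following minimal sketch shows the set operations just listed, using Base's Set type. The string elements are chosen only to mirror the dictionary examples on this page.

```julia
a = Set(["One", "Two", "Three"])
b = Set(["Two", "Ten"])

has_two = "Two" in a            # membership test
both    = union(a, b)           # elements in either set
common  = intersect(a, b)       # elements in both sets
only_a  = setdiff(a, b)         # elements in a but not in b
```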

Creating a Dictionary

The Dict constructor accepts a list of Pairs as arguments, and returns a dictionary. The following dictionary mydict will be reused in examples below:

snippet.juliarepl
julia> typeof( "One" => 1 )
Pair{String,Int64}

julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3)
Dict{String,Int64} with 3 entries:
  "One"   => 1
  "Two"   => 2
  "Three" => 3

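The {String,Int64} type parameters are optional; without them, Julia infers the types from the pairs given. Declaring narrow types explicitly still helps catch bugs, especially for dictionaries that start out empty. The keys and values in a dictionary can be of any type (even user-defined).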
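
To illustrate the benefit of explicit types, here is a minimal sketch; the names loose, typed, and rejected are illustrative only. A bare Dict() is a Dict{Any,Any} that accepts anything. A typed dictionary converts an inserted value to its declared value type when that is exact, and raises an error otherwise.

```julia
loose = Dict()                   # no type parameters: Dict{Any,Any}
typed = Dict{String,Int64}()     # keys must be String, values Int64

loose["One"] = 0.5               # accepted: any key and value type goes
typed["One"] = 1.0               # 1.0 converts exactly, and is stored as the Int64 value 1

# 0.5 cannot be converted to Int64 exactly, so this insertion fails
rejected = try
    typed["Half"] = 0.5
    false
catch err
    err isa InexactError
end
```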

The Dict constructor also allows for "comprehension-like" syntax:

snippet.juliarepl
julia> Dict("x$i" => i for i in 1 : 3)                          ## note that Julia infers {String,Int} by itself
Dict{String,Int64} with 3 entries:
  "x1" => 1
  "x2" => 2
  "x3" => 3

A common gotcha: Dict{String,Int64} is a type, while Dict{String,Int64}() is an empty dictionary.

Creating More Complicated Dictionaries

Various examples of working with dictionaries whose values are of other types

Dictionary of Vectors

snippet.juliarepl
julia> Dict( "a" => [ 10.0, 2.0 ], "b" => [ 1.0, 3.0 ] )
Dict{String,Array{Float64,1}} with 2 entries:
  "b" => [1.0, 3.0]
  "a" => [10.0, 2.0]

Creating and Filling an Empty Dictionary of Vectors

snippet.juliarepl
julia> x= Dict{String,Array{Float64,1}}()
Dict{String,Array{Float64,1}} with 0 entries

julia> x["newval"]= [12.0, 3.0 ]; x["twoval"]= [24.0, 4.0 ];  ## ok ##
 
julia> push!( x["twoval"], 120.0 );		## ok ##

julia> push!( x["threeval"], 120.0 )    	## but no auto-vivification
ERROR: KeyError: key "threeval" not found
Stacktrace:

julia> x
Dict{String,Array{Float64,1}} with 2 entries:
  "newval" => [12.0, 3.0]
  "twoval" => [24.0, 4.0, 120.0]

Dictionary of Other Types

snippet.juliarepl
julia> dsvpif= Dict{ String, Vector{Pair{Int,Float64}} }()
Dict{String,Array{Pair{Int64,Float64},1}} with 0 entries

julia> dsvpif[ "a" ] = [ Pair( 12, 64.0 ) ]; dsvpif
Dict{String,Array{Pair{Int64,Float64},1}} with 1 entry:
  "a" => Pair{Int64,Float64}[12=>64.0]

snippet.juliarepl
julia> dtype= Dict{ String, Vector{Vector{Int}} }	## not a dictionary, but a type (bound to the name dtype)
Dict{String,Array{Array{Int64,1},1}}

julia> emptydtype= Dict{ String, Vector{Vector{Int}} }()	## an empty dictionary of this type
Dict{String,Array{Array{Int64,1},1}} with 0 entries

julia> Dict( "a" => [ [1,2], [3,4] ] )			## new dict with "a"
Dict{String,Array{Array{Int64,1},1}} with 1 entry:
  "a" => Array{Int64,1}[[1, 2], [3, 4]]

Obtaining all Dictionary Keys

snippet.juliarepl
julia> mydict = Dict("One" => 1, "Two" => 2, "Three" => 3);

julia> keys(mydict)                                             ## returns a KeySet (Iterator)
Base.KeySet for a Dict{String,Int64} with 3 entries. Keys:
  "One"
  "Two"
  "Three"

julia> collect(keys(mydict))                                    ## collect the KeySet into a vector of the key type, here Vector{String}
3-element Array{String,1}:
 "One"
 "Two"
 "Three"

Converting a Dictionary to a Vector of Pairs or Tuples

snippet.juliarepl
julia> mydict = Dict("One" => 1, "Two" => 2, "Three" => 3);

julia> asvector= collect(mydict)
3-element Array{Pair{String,Int64},1}:
   "One" => 1
   "Two" => 2
 "Three" => 3

julia> asvector= [ Pair(k,v) for (k,v) in mydict]
3-element Array{Pair{String,Int64},1}:
   "One" => 1
   "Two" => 2
 "Three" => 3

julia> astuple= [(k,v) for (k,v) in mydict]
3-element Array{Tuple{String,Int64},1}:
 ("One", 1)  
 ("Two", 2)  
 ("Three", 3)

Obtaining all Dictionary Values

snippet.juliarepl
julia> mydict = Dict("One" => 1, "Two" => 2, "Three" => 3);

julia> values(mydict)
Base.ValueIterator for a Dict{String,Int64} with 3 entries. Values:
  1
  2
  3

julia> collect(values(mydict))             ## collect the ValueIterator into a vector of the value type, here Vector{Int}
3-element Array{Int64,1}:
 1
 2
 3

Iterating over every Dictionary Element (Key,Value)

snippet.juliarepl
julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);

julia> for i in keys(mydict) println( i, " <-> ", mydict[i] ); end#for
One <-> 1
Two <-> 2
Three <-> 3
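
Iterating over a Dict directly yields its Pairs, which can be destructured into key and value. This avoids the second lookup mydict[i] in the loop above. The following is a minimal sketch; the lines vector exists only so the output can be checked.

```julia
mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3)

lines = String[]                      # collected output, for checking
for (key, value) in mydict            # each element is a Pair; destructure it
    println(key, " <-> ", value)
    push!(lines, string(key, " <-> ", value))
end
```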

Creating a Different Dictionary From an Existing Dictionary

A generator expression inside Dict() can perform an operation on every key-value pair of a dictionary. For example, here we create a second dictionary from a first one, using each value as the number of repeats (with fill):

snippet.juliarepl
julia> firstdict= Dict("One" => 1, "Two" => 2, "Three" => 3);

julia> newdict= Dict( string(key,"SSS") => fill('*', value) for (key, value) in firstdict )
Dict{String,Array{Char,1}} with 3 entries:
  "TwoSSS"   => ['*', '*']
  "OneSSS"   => ['*']
  "ThreeSSS" => ['*', '*', '*']

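The same transformation can also be sketched with map(). Calling map() on a Dict itself is not supported in Julia 1.x (it raises an error), so this minimal sketch collects the pairs first. It then maps each one to a new pair and builds a Dict from the result.

```julia
firstdict = Dict("One" => 1, "Two" => 2, "Three" => 3)

# collect() turns the Dict into a Vector of Pairs; the anonymous function
# maps each pair to a new key => value pair
newdict = Dict(map(p -> (string(first(p), "SSS") => fill('*', last(p))), collect(firstdict)))
```
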
Inverting a Dictionary

snippet.juliarepl
julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);

julia> allunique( values(mydict) ) || error("non-deterministic dictionary invert request")
true

julia> Dict(value => key for (key, value) in mydict)     ## method 1
Dict{Int64,String} with 3 entries:
  2 => "Two"
  3 => "Three"
  1 => "One"

A more Julia-specific method 2 collects the pairs into an array and reverses each pair:

snippet.juliarepl
julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);

julia> allunique( values(mydict) ) || error("non-deterministic dictionary invert request")
true

julia> arrnow = collect(mydict)                ## convert to array first
3-element Array{Pair{String,Int64},1}:
 "One"=>1
 "Two"=>2
 "Three"=>3

julia> arrinv= reverse.(arrnow)                ## reverse each pair
3-element Array{Pair{Int64,String},1}:
 1=>"One"
 2=>"Two"
 3=>"Three"

julia> Dict( arrinv )                          ## and make it a dictionary
Dict{Int64,String} with 3 entries:
  2 => "Two"
  3 => "Three"
  1 => "One"
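
When the values are not unique, the allunique guard above stops with an error. If no key may be lost, one possible alternative is to invert into a dictionary whose values are vectors of the original keys. This is a sketch, not a built-in function; it relies on Base's get! to create each key list on demand.

```julia
mydict = Dict("a" => 3, "b" => 5, "c" => 2, "d" => 3)

inverted = Dict{Int64,Vector{String}}()
for (key, value) in mydict
    # get! stores an empty Vector{String} the first time a value is seen,
    # then returns the stored vector so the key can be appended to it
    push!(get!(Vector{String}, inverted, value), key)
end
```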

Printing a Dictionary

snippet.juliarepl
julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);
  
julia> println(mydict)                                             ## default show
Dict("One"=>1,"Two"=>2,"Three"=>3)

julia> for key in keys(mydict); println("\t$key -=> $(mydict[key])"); end#for   ## your own display
	One -=> 1
	Two -=> 2
	Three -=> 3

Testing for the Existence of a Key or Key-Value

snippet.juliarepl
julia> mydict = Dict("One" => 1, "Two" => 2, "Three" => 3);

julia> haskey(mydict, "One")                                   ## key only
true

julia> in( ("Two" => 2), mydict )                              ## key-value pair
true
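
A few related membership tests are sketched below. Note that a Dict's elements are Pairs, so testing a bare key with in(key, dict) raises an error rather than returning false. keys() or values() is needed for that.

```julia
mydict = Dict("One" => 1, "Two" => 2, "Three" => 3)

key_found   = "One" in keys(mydict)     # same as haskey(mydict, "One")
value_found = 2 in values(mydict)       # scans all values
pair_found  = ("Two" => 2) in mydict    # key and value must both match

# A bare key is not a valid element of a Dict, so this raises an error
bare_key_fails = try
    "One" in mydict
    false
catch
    true
end
```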

Accessing Elements by Key

The getindex operator (i.e., the brackets) fetches the value corresponding to the input key. However, if the key does not exist in the dictionary, a KeyError is thrown. To avoid errors, use the get function, which returns a default value if the key is not present:

snippet.juliarepl
julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);

julia> get(mydict, "One",99)                       ## key exists
1

julia> get(mydict, "NotThere",99)                  ## key does not exist
99

julia> mydict["One"]                                 ## key exists
1

julia> mydict["NotThere"]                            ## key does not exist  ==> Error
ERROR: KeyError: key "NotThere" not found
Stacktrace:
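
get also accepts a function in place of the default value (get(f, dict, key), also usable with do-block syntax). The function is called only when the key is missing, which helps when the default is expensive to compute. In the following minimal sketch, fallback is a hypothetical helper and calls merely counts its invocations.

```julia
mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3)

calls = Ref(0)                 # counts how often the default function runs

function fallback()            # hypothetical helper producing the default value
    calls[] += 1
    return 99
end

a = get(fallback, mydict, "One")        # key exists: fallback is not called
b = get(fallback, mydict, "NotThere")   # key missing: fallback is called once
```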

Setting a Dictionary Key-Value

snippet.juliarepl
julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);

julia> mydict["Four"] = 4; mydict
Dict{String,Int64} with 4 entries:
  "One"   => 1
  "Two"   => 2
  "Three" => 3
  "Four"  => 4

Adding a New Key-Value

If you want to add a key-value pair only if the key does not yet exist:

snippet.juliarepl
julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);
       
julia> if (!(haskey(mydict, "Four"))) mydict["Four"]=4 end;#if

julia> mydict
Dict{String,Int64} with 4 entries:
  "One"   => 1
  "Two"   => 2
  "Three" => 3
  "Four"  => 4
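  • WARNING: haskey(mydict, "Four") || mydict["Four"]=4 fails with a syntax error. Because = binds more loosely than ||, Julia parses it as the invalid assignment (haskey(mydict, "Four") || mydict["Four"]) = 4. Parenthesize the assignment instead: haskey(mydict, "Four") || (mydict["Four"]=4).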
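
Base's get! does the same check-and-insert in one call. It stores the default only if the key is absent, and returns whatever value is then stored under the key. A short sketch:

```julia
mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3)

four  = get!(mydict, "Four", 4)     # "Four" absent: 4 is inserted and returned
again = get!(mydict, "Four", 40)    # already present: the existing 4 is kept
```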

Deleting an Entry (Key) from a Dictionary

snippet.juliarepl
julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);

julia> delete!(mydict, "Three")
Dict{String,Int64} with 2 entries:
  "One" => 1
  "Two" => 2

julia> delete!(mydict, "Four")                      ## not an error, but ignored
Dict{String,Int64} with 2 entries:
  "One"   => 1
  "Two"   => 2

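delete! returns the dictionary itself. Base's pop! instead returns the value of the removed entry; with a third argument, it returns that default instead of throwing a KeyError when the key is absent. A short sketch (the variable names are illustrative):

```julia
mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3)

removed = pop!(mydict, "Three")             # removes the entry, returns its value
missing_default = pop!(mydict, "Four", 0)   # absent key: returns the default 0
```
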
Filtering Out Undesired Entries by Criteria

snippet.juliarepl
julia> mydict = Dict("One" => 1, "Two" => 2, "Three" => 3);

julia> filter( p->(last(p) % 2 == 1) , mydict )       ## keep only odd-numbered values
Dict{String,Int64} with 2 entries:
  "One"   => 1
  "Three" => 3

julia> filter( p->(length(first(p)) == 3) , mydict )       ## keep only three-letter keys
Dict{String,Int64} with 2 entries:
  "One" => 1
  "Two" => 2

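filter returns a new dictionary and leaves the original untouched. When the original itself should shrink, Base's filter! removes the failing entries in place. A minimal sketch with the same predicates:

```julia
mydict = Dict("One" => 1, "Two" => 2, "Three" => 3)

odd = filter(p -> isodd(last(p)), mydict)      # new Dict; mydict unchanged
filter!(p -> length(first(p)) == 3, mydict)    # mydict itself loses "Three"
```
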
Printing Dictionary Sorted By Key

snippet.juliarepl
julia> mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);

julia> for key in sort(collect(keys(mydict)));  println("\t$key => $(mydict[key])"); end
	One => 1
	Three => 3
	Two => 2
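
Alternatively, the collected pairs can be sorted by their first element, the key. This yields key and value together without a second lookup. A minimal sketch:

```julia
mydict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3)

for (key, value) in sort(collect(mydict), by = first)   # pairs ordered by key
    println("\t", key, " => ", value)
end

sortedkeys = first.(sort(collect(mydict), by = first))  # just the keys, in order
```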

Printing Dictionary Sorted by Value

Unique Values

snippet.juliarepl
julia> mydict= Dict{String,Int64}("Ten" => 10, "Two" => 2, "Three" => 3);

julia> allunique( values(mydict) ) || error("non-deterministic dictionary invert request")
true

julia> invmydict= Dict( value => key for (key,value) in mydict )              ## invert
Dict{Int64,String} with 3 entries:
  10 => "Ten"
  2  => "Two"
  3  => "Three"

julia> valuesinorder= sort(collect(keys(invmydict)));

julia> for key in valuesinorder;  println("$key => $(invmydict[key])"); end##for##
2 => Two
3 => Three
10 => Ten

Non-Unique Values

snippet.juliarepl
julia> mydict= Dict( "a" => 3, "b" => 5, "c" => 2, "d" => 3 );

julia> asarray= collect(mydict)            ## change dict into array of pairs
4-element Array{Pair{String,Int64},1}:
 "c"=>2
 "b"=>5
 "a"=>3
 "d"=>3

julia> sort( asarray, by=x->last(x) )  ## last() works on each pair
4-element Array{Pair{String,Int64},1}:
 "c"=>2
 "a"=>3
 "d"=>3
 "b"=>5

Printing Dictionary in Sorted Value Order

snippet.juliarepl
julia> function showsorted(mydict::Dict)
           for (key, value) in sort( collect(mydict), by = x->last(x) )   ## pairs ordered by value
               println(value, "\t", key)
           end#for
       end;#function##

julia> showsorted( Dict( "a" => 3, "b" => 5, "c" => 2, "d" => 4 ) )
2	c
3	a
4	d
5	b

Keeping Dictionaries in Insertion Order

snippet.juliarepl
julia> using DataStructures

julia> mydict= OrderedDict( "a" => 3, "B" => 5, "c" => 2, "D" => 3 );

julia> mydict
OrderedDict{String,Int64} with 4 entries:
 "a"=>3
 "B"=>5
 "c"=>2
 "D"=>3

Merging Two Dictionaries

snippet.juliarepl
julia> adict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3);

julia> bdict = Dict{String,Int64}("Ten" => 10, "Two" => -20);

julia> merge(adict,bdict)
Dict{String,Int64} with 4 entries:
  "One"   => 1
  "Two"   => -20
  "Three" => 3
  "Ten"   => 10

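In plain merge, the later dictionary wins for keys present in both. Base also offers merge(combine, ...), which applies a combining function to such shared values, and merge!, which updates its first argument in place. A minimal sketch using + as the combining function:

```julia
adict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3)
bdict = Dict{String,Int64}("Ten" => 10, "Two" => -20)

summed = merge(+, adict, bdict)   # "Two" => 2 + (-20)

merge!(adict, bdict)              # adict now holds bdict's entries too (later wins)
```
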
Counting all Unique Words in a String

snippet.juliarepl
julia> wordcounts= Dict{String,Int64}();

julia> for w in split("a ab ac ad ab ab ba ba"); wordcounts[w]= get(wordcounts, w, 0) + 1; end#for

julia> wordcounts
Dict{String,Int64} with 5 entries:
  "ac" => 1
  "ad" => 1
  "ab" => 3
  "a"  => 1
  "ba" => 2

Finding X Most Common Words

(See above for printing a dictionary sorted by value.)

snippet.juliarepl
julia> wordcounts= Dict{String,Int64}();

julia> for w in split("a ab ac ad ab ab ba ba"); wordcounts[w]= get(wordcounts, w, 0) + 1; end;#for

julia> arrsortedbycount= sort( collect( wordcounts ), by=x->last(x), rev=true )
5-element Array{Pair{String,Int64},1}:
 "ab"=>3
 "ba"=>2
 "ac"=>1
 "ad"=>1
 "a"=>1

julia> first.(arrsortedbycount[ 1:2 ])                           ## most frequent 2 words
2-element Array{String,1}:
 "ab"
 "ba"

julia> first.( filter( x->(last(x) >= 2), arrsortedbycount ) )   ## words appearing at least 2 times
2-element Array{String,1}:
 "ab"
 "ba"

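When only the top few words of a long list are needed, Base's partialsort orders just the requested range instead of sorting everything. A minimal sketch on the same word counts:

```julia
wordcounts = Dict{String,Int64}()
for w in split("a ab ac ad ab ab ba ba")
    wordcounts[w] = get(wordcounts, w, 0) + 1
end

# Order only the first two positions, by count, descending
top2 = partialsort(collect(wordcounts), 1:2, by = last, rev = true)
topwords = first.(top2)
```
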
Constructing and Using Dictionaries of Arrays

snippet.juliarepl
julia> dictofarrays= Dict{ String, Array{Int64}}()
Dict{String,Array{Int64,N} where N} with 0 entries

julia> dictofarrays["ones"]= [1,2,3]; dictofarrays["tens"]= [10,20]; dictofarrays
Dict{String,Array{Int64,N} where N} with 2 entries:
  "ones" => [1, 2, 3]
  "tens" => [10, 20]

Constructing and Using Arrays of Dictionaries

snippet.juliarepl
julia> adict = Dict("One" => 1, "Two" => 2, "Three" => 3)
Dict{String,Int64} with 3 entries:
  "One"   => 1
  "Two"   => 2
  "Three" => 3

julia> bdict = Dict("Ten" => 10, "Two" => 20)
Dict{String,Int64} with 2 entries:
  "Ten" => 10
  "Two" => 20

julia> vcat(adict,bdict)
2-element Array{Dict{String,Int64},1}:
 Dict("One"=>1,"Two"=>2,"Three"=>3)
 Dict("Ten"=>10,"Two"=>20)

Constructing and Using Dictionaries of Dictionaries

snippet.juliarepl
julia> dictofdicts = Dict{ String, Dict{String,Float64}}()  ## note the () at the end to signal "instance"
Dict{String,Dict{String,Float64}} with 0 entries

julia> adict = Dict{String,Int64}("One" => 1, "Two" => 2, "Three" => 3)
Dict{String,Int64} with 3 entries:
  "One"   => 1
  "Two"   => 2
  "Three" => 3

julia> bdict = Dict("Ten" => 10, "Two" => 20)
Dict{String,Int64} with 2 entries:
  "Ten" => 10
  "Two" => 20

julia> dictofdicts["an-a-dict"]= adict
Dict{String,Int64} with 3 entries:
  "One"   => 1
  "Two"   => 2
  "Three" => 3

julia> dictofdicts["a-b-dict"]= bdict
Dict{String,Int64} with 2 entries:
  "Ten" => 10
  "Two" => 20

julia> dictofdicts
Dict{String,Dict{String,Float64}} with 2 entries:
  "a-b-dict"  => Dict("Ten"=>10.0,"Two"=>20.0)
  "an-a-dict" => Dict("One"=>1.0,"Two"=>2.0,"Three"=>3.0)

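The integer values above are stored as Float64, because the outer dictionary declares its inner dictionaries as Dict{String,Float64}. Assignment therefore stores a converted copy rather than adict itself. The following minimal sketch builds nested entries one at a time. get! creates the inner dictionary the first time an outer key is used; indexing then goes outer key first.

```julia
dictofdicts = Dict{String,Dict{String,Float64}}()

# get! inserts an empty inner Dict on first use and returns it, so a
# nested entry can be set in one step; 1 and 2 are converted to Float64
get!(Dict{String,Float64}, dictofdicts, "an-a-dict")["One"] = 1
get!(Dict{String,Float64}, dictofdicts, "an-a-dict")["Two"] = 2

two = dictofdicts["an-a-dict"]["Two"]   # outer key, then inner key
```
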
Important Reminder: Copies and Deepcopies

Only Assignment (Alias) Can Alter Referenced Dictionary

snippet.juliarepl
julia> original= Dict{ String, Array{Int64}}( "tens" => [10,20], "hundreds" => [100,200], "thousands" => [1000, 2000, 3000] )
Dict{String,Array{Int64,N} where N} with 3 entries:
  "hundreds"  => [100, 200]
  "thousands" => [1000, 2000, 3000]
  "tens"      => [10, 20]

julia> asgn= original;  cpy= copy(original);  dpcp= deepcopy(original);

julia> asgn["tens"]= [10,20,30]; cpy["hundreds"]= [100,200,300];  dpcp["thousands"]= [1000,2000,3000];

julia> original
Dict{String,Array{Int64,N} where N} with 3 entries:
  "hundreds"  => [100, 200]
  "thousands" => [1000, 2000, 3000]
  "tens"      => [10, 20, 30]

Only Alias and Copy (not Deepcopy) Can Alter Referenced Dictionary's Referenced Array

snippet.juliarepl
julia> original= Dict{ String, Array{Int64}}( "tens" => [10,20], "hundreds" => [100,200], "thousands" => [1000, 2000, 3000] );

julia> asgn= original;  cpy= copy(original);  dpcp= deepcopy(original);

julia> asgn["tens"][1]= -40; cpy["hundreds"][2]= -400;  dpcp["thousands"][3]= -4000;

julia> original
Dict{String,Array{Int64,N} where N} with 3 entries:
  "hundreds"  => [100, -400]
  "thousands" => [1000, 2000, 3000]
  "tens"      => [-40, 20]

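The behavior above can be checked directly with the identity operator ===. An alias is the very same dictionary. A copy is a new outer dictionary that still shares its inner vectors with the original, whereas a deepcopy shares nothing. A minimal sketch (cpy is used rather than cp, which would shadow Base's cp function):

```julia
original = Dict{String,Vector{Int64}}("tens" => [10, 20], "hundreds" => [100, 200])
asgn = original
cpy  = copy(original)       # new outer Dict, same inner vectors
dpcp = deepcopy(original)   # new outer Dict and new inner vectors

same_dict_alias = asgn === original
same_dict_copy  = cpy === original
shared_vec_copy = cpy["tens"] === original["tens"]
shared_vec_deep = dpcp["tens"] === original["tens"]
```
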
Extended Example: Transform Compressed CSV File into Dict of Pair Vectors

The om-zerocd.csv.gz file is compressed and has entries like the sample rows below. The following writes these rows into a gzip-compressed tmp-sample.csv.gz, which the next step reads back:

snippet.julianoeval
julia> using CodecZlib

julia> sample= """
       yyyymmdd,days,rate
       19960102,9,5.763067
       19960102,15,5.745902
       19960102,50,5.673317
       19960102,78,5.608884
       19960103,8,5.763067
       19960103,14,5.747397
       19960103,49,5.672263
       19960103,77,5.603705
       """;

julia> write( "tmp-sample.csv.gz", transcode(GzipCompressor, Vector{UInt8}(sample)) );

We want to create a dictionary with keys that are the yyyymmdd, and values that are vectors of pairs with days and rate.

snippet.juliafix
using CodecZlib

zerocd= Dict{ Int, Vector{Pair{Int,Float64}} }()

open("tmp-sample.csv.gz", "r") do file
    for (lineno, line) in enumerate( eachline( GzipDecompressorStream(file) ) )
        lineno == 1 && continue                       ## skip the header line
        yyyymmdd, days, rate= split(line, ',')
        key= parse(Int, yyyymmdd)
        entry= Pair( parse(Int, days), parse(Float64, rate) )
        if (!haskey(zerocd, key))
            zerocd[ key ]= [ entry ]
        else
            push!(zerocd[ key ], entry )
        end#if#
    end#for#
end;

julia> println( zerocd[ 19960102 ] )
Pair{Int64,Float64}[9=>5.76307, 15=>5.7459, 50=>5.67332, 78=>5.60888]

Backmatter

Useful Packages on Julia Repository

  • DataStructures.jl contains many more useful data structures, including not only ordered dicts, but also sorted dicts, dicts with defaults, multi-dicts, and trees.
  • Flatten.jl converts flat to nested structures and vice-versa.

dicts.txt · Last modified: 2018/11/22 20:48 (external edit)