strings

Differences

This shows you the differences between two versions of the page.

strings [2018/11/14 07:39]
julia [Finding the Location of Substring(s) in a String]
strings [2018/12/28 11:19] (current)
Line 5: Line 5:
  
 ```juliarepl ```juliarepl
-julia> pkgchk( [ "julia" => v"1.0.1" ] )
+julia> pkgchk.( [ "julia" => v"1.0.3" ] );
 ``` ```
  
Line 25: Line 25:
  
  
-- When defining a [[functions|function]] that takes a string argument, use `AbstractString` instead of `String`. This will be explained in [[functions|functions]]. Basically, it allows the function to work on string-like objects (like substrings), too.
+- When defining a [[fundispatch|function]] that takes a string argument, use `AbstractString` instead of `String`. This will be explained in [[functions|functions]]. Basically, it allows the function to work on string-like objects (like substrings), too.
  
  
Line 172: Line 172:
 6 6
  
-julia> sizeof("❤"), length("❤") ## in UTF-8, a heart requires 3 bytes
-(3, 1)
+julia> sizeof('❤'), sizeof("❤"), length("❤") ## in UTF-8, a heart requires 3 bytes
+(4, 3, 1)
  
 julia> lastindex("​AéB𐅍CD"​) ##​ byte length, not character length julia> lastindex("​AéB𐅍CD"​) ##​ byte length, not character length
 10 10
 ``` ```
 +
 +* A `Char` always occupies 4 bytes, but its UTF-8 encoding inside a string can be shorter. This is why `sizeof('❤') > sizeof("❤")`.
  
 * `length(s)` is always less than or equal to `lastindex(s)`.
Line 183: Line 185:
 * `ind2chr(s,i)` converts a byte index to a character index and `chr2ind(s,i)` does the reverse. Both were deprecated in Julia 0.7; on 1.0, use `length(s,1,i)` and `nextind(s,0,i)` instead.
  
-FIXME (Andreas:) Maybe note that it has to be `sizeof("​❤"​)` and not `sizeof('​❤'​)` since all characters are four bytes even though their string encoding is smaller.+
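
The byte-versus-character distinction in this hunk can be checked directly. The following is a minimal sketch, assuming Julia 1.0 or later; the variable names are illustrative only and do not come from the page.

```julia
s = "AéB𐅍CD"                    # 'é' takes 2 bytes and '𐅍' takes 4 bytes in UTF-8

nchars = length(s)               # number of characters
nbytes = sizeof(s)               # number of bytes in the UTF-8 encoding
lasti  = lastindex(s)            # byte index at which the last character starts

valid  = collect(eachindex(s))   # the byte indexes that start a character

# Julia 1.0 counterparts of the older ind2chr / chr2ind
charpos = length(s, 1, 5)        # character position of byte index 5
bytepos = nextind(s, 0, 4)       # byte index of the 4th character

charsize = sizeof('❤')           # a Char is always 4 bytes
strsize  = sizeof("❤")           # its UTF-8 encoding needs only 3
```

Iterating with `eachindex(s)` avoids computing byte offsets by hand, which is usually the less error-prone choice.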
  
 ## Testing for String Content with Digits, Alphas, UTF-8, Etc. ## Testing for String Content with Digits, Alphas, UTF-8, Etc.
Line 305: Line 307:
 * the optional `raise` allows specifying whether an impossible parse should raise an exception or not
  
-FIXME (Andreas:) Maybe mention that using a type when parsing is much more efficient than `Meta.parse`
+* `parse()` is not only less subject to surprises than `Meta.parse()`, but also far more efficient:
  
-```juliarepl
+```julianoeval
 julia> @btime parse(Float64,​ "​1.1"​) julia> @btime parse(Float64,​ "​1.1"​)
   23.581 ns (0 allocations:​ 0 bytes)   23.581 ns (0 allocations:​ 0 bytes)
 1.1 1.1
  
-julia> @btime Meta.parse("1.1")
+julia> @btime Meta.parse("1.1") ## roughly 800 times slower!
   18.917 μs (10 allocations:​ 256 bytes)   18.917 μs (10 allocations:​ 256 bytes)
 1.1 1.1
Line 319: Line 321:
 ## Converting Numbers to Strings ## Converting Numbers to Strings
  
-```juliarepl
-julia> using Printf
+### Simplest
+
  
 +```juliarepl
 julia> string(8.89) ##​ lowercase string() julia> string(8.89) ##​ lowercase string()
 "​8.89"​ "​8.89"​
 +
 +```
 +
 +### C-Style Macros: @printf and @sprintf
 +
 +C-style `printf` and `sprintf` are available as macros from the `Printf` standard library, so they are called with an `@` prefix:
 +
 +```juliarepl
 +julia> using Printf
 +
 +julia> @printf("​%12.5f",​ pi)
 +     ​3.14159
  
 julia> @sprintf("%.3f",pi) ## the format string must be a literal; the values may be computed
 "​3.142"​ "​3.142"​
  
 +```
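
To make the restriction on the macros concrete: the format string has to be written out literally, while the values being formatted can be anything computed at run time. A minimal sketch, assuming only the `Printf` standard library; the variable names are illustrative.

```julia
using Printf

radius = 1.5
area = pi * radius^2                    # computed at run time

s1 = @sprintf("%.3f", area)             # literal format, computed value
s2 = @sprintf("%5d|%-6s|", 42, "ab")    # field width and left-justification
```

Here `s1` is `"7.069"` and `s2` is `"   42|ab    |"`. Passing a variable in place of the format string is what the macro rejects.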
 +
 +### C-Style Julia: printf and sprintf
 +
 +```juliarepl
 julia> using Formatting ##​ for a compiled version julia> using Formatting ##​ for a compiled version
  
Line 338: Line 357:
 ``` ```
  
-* These functions cannot deal with vectors. To convert into a vector of string, use
+* These functions cannot deal with vectors. To convert a vector of numbers into a vector of strings, use a comprehension or broadcasting.
  
-FIXME The following is wrong. ​ How do I write a vector sprintf function? 
  
-FIXME (Andreas:) I assume you want vector of strings. So what you try to do is implicit "vectorizing" which is typically discouraged. I'd just use a comprehension here `[@sprintf("%f", xᵢ) for xᵢ in x]`
+### Vectorized (S)printing
 
-```juliafixme
-julia> sprintf(format::String, x::Vector)::Vector{String}= [ @sprintf(format, x[i]) for i=1:length(x) ]
-
-```
+Just use the comprehension expression itself:
  
 +```juliarepl
 +julia> x= 1:3;
  
-FIXME Look into `StringLiterals`
+julia> using Printf; [ @sprintf("%.3f", xi ) for xi in x ]
 +3-element Array{String,​1}:​ 
 + "​1.000"​ 
 + "​2.000"​ 
 + "​3.000"​
  
-```text
-The StringLiterals package is an attempt to bring a cleaner string literal syntax to Julia, as well as having an easier way of producing formatted strings, borrowing from both Python and C formatted printing syntax. It also adds support for using LaTex, Emoji, HTML, or Unicode entity names that are looked up at compile-time.
+julia> using Formatting; sprintf1.("%.3f", x)
+3-element Array{String,1}:
 + "​1.000"​ 
 + "​2.000"​ 
 + "​3.000"
  
-Currently, it adds a Swift style string macro, f"​...",​ which uses the Swift syntax for interpolation,​ i.e. \(expression). ​ 
 ``` ```
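
The same result can be reached without any extra package by wrapping the macro in a small scalar helper and then using a comprehension, broadcasting, or `map`. A minimal sketch; `fmt3` is a hypothetical helper name, not something defined on the page.

```julia
using Printf

x = 1:3
fmt3(v) = @sprintf("%.3f", v)     # hypothetical scalar helper with a fixed format

a = [fmt3(xi) for xi in x]        # comprehension
b = fmt3.(x)                      # broadcasting with the dot syntax
c = map(fmt3, x)                  # map
```

All three produce the same `Vector{String}`. Keeping the helper scalar and letting the caller choose how to apply it is the usual Julia convention.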
  
-### C-Style Formatting Printing With Printf and Sprinf
-FIXME (Andreas:) You have already used this in `@sprintf` so maybe reorganize a bit
+You could wrap this in a function that operates on a vector, but that is not the Julia way.
 
-C-style printf also works as a macro, requiring '@' function prefixes:
+
 +## Converting Function Names to Strings
  
 ```juliarepl ```juliarepl
-julia> using Printf
+julia> f= [ sqrt, exp, sin ] ; [ "$fi(10) = $(fi(10))" for fi in f ]
 +3-element Array{String,​1}:​ 
 + "​sqrt(10) = 3.1622776601683795"​ 
 + "​exp(10) = 22026.465794806718"​  
 + "​sin(10) = -0.5440211108893698"​
  
-julia> @printf("​%12.5f",​ pi) 
-     ​3.14159 
 ``` ```
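
Besides string interpolation, the name of a function can be obtained explicitly. A minimal sketch, assuming Julia 1.0 or later, where `nameof` accepts a function; the variable names are illustrative.

```julia
f = sqrt

a = string(f)            # the printed name as a String
b = nameof(f)            # the name as a Symbol
c = String(nameof(f))    # Symbol converted to String

labels = ["$(nameof(g))(2)" for g in (sqrt, exp, sin)]
```

For anonymous functions the name is compiler-generated, so this is mainly useful for named functions.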
- 
- 
- 
  
 ## Converting Strings to Function Names ## Converting Strings to Function Names
  
 ```juliarepl ```juliarepl
-julia> ftest()= "invoked ftest";
+julia> s= Symbol("sqrt") ## first convert the string to a symbol
 +:sqrt
 
-julia> const fnmstring= "ftest";
+julia> eval(s)(9) ## dangerous: an eval on a user input string could wreak havoc!
 +3.0
 +
 +julia> f= [ :sqrt, :exp, :sin ] ; [ "$fi(10) = $(eval(fi)(10))" for fi in f ]
 +3-element Array{String,​1}:​ 
 + "​sqrt(10) = 3.1622776601683795"​ 
 + "​exp(10) = 22026.465794806718"​ 
 + "​sin(10) = -0.5440211108893698"
  
-julia> eval( Symbol(fnmstring) )() 
-"​invoked ftest" 
 ``` ```
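
Because `eval` on arbitrary input is risky, a lookup table is often enough when the set of callable functions is known in advance. A minimal sketch of that alternative; `ALLOWED` and `callbyname` are hypothetical names introduced here for illustration.

```julia
# Whitelist of functions that may be called by name (hypothetical)
const ALLOWED = Dict("sqrt" => sqrt, "exp" => exp, "sin" => sin)

function callbyname(name::AbstractString, x)
    f = get(ALLOWED, name, nothing)
    f === nothing && throw(ArgumentError("function \"$name\" is not allowed"))
    return f(x)
end

r1 = callbyname("sqrt", 9)

# Looking a name up in a specific module also avoids eval
r2 = getfield(Base, Symbol("sqrt"))(9)
```

The dictionary version rejects unknown names with an error instead of executing them, which is the main point of avoiding `eval`.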
  
-* Useful for passing function names to other functions, which can then evaluate the function. 
- 
-FIXME (Andreas:) This should generally be discouraged. Most users shouldn'​t need to `eval` anything. 
  
 ## Preview: Arrays or Tuples With Strings ## Preview: Arrays or Tuples With Strings
Line 399: Line 424:
 ``` ```
  
-FIXME (Andreas:) Maybe mention that `'` can't be used to permute the dimensions ("transpose") a vector or matrix of strings.
+* WARNING: `[1,2]'` transposes numerical arrays just fine, but this does not work for arrays of strings such as `["1","2"]`; use `permutedims(["1","2"])` instead.
 + 
  
 ### Stringifying Numeric Arrays ### Stringifying Numeric Arrays
  
-To convert an array of numbers into an array of strings, use the [[functions#dot-postfix_functions|element-wise version]] of `string()` or use `map()`:
+To convert an array of numbers into an array of strings, use the [[funother#dot-postfix_functions|element-wise version]] of `string()` or use `map()`:
  
 ```juliarepl ```juliarepl
Line 412: Line 439:
  "​3.0"​  "​3.0"​
 ``` ```
 +
 +* you could also use the `sprintf1` or `@sprintf` facilities
 +
 +* without the dot, `string()` turns a whole array into a single string: for a multidimensional array, `string([1 2; 3 4])` returns `"[1 2; 3 4]"`
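
The difference between element-wise and whole-array conversion can be sketched as follows, assuming Julia 1.0 or later; the variable names are illustrative.

```julia
v = [1.0, 2.0, 3.0]
a = string.(v)                   # element-wise: one string per number
b = map(string, v)               # same result via map

m = [1 2; 3 4]
whole = string(m)                # the whole matrix as a single string
cells = string.(m)               # a 2×2 matrix of strings

row = permutedims(["1", "2"])    # 1×2 matrix of strings, where ' would fail
```

`permutedims` only rearranges elements, so it works for any element type, whereas `'` tries to transpose each element recursively.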
 +
 +
  
 ### Converting Character Ranges to String Ranges ### Converting Character Ranges to String Ranges
Line 423: Line 456:
  "​4"​  "​4"​
  
-julia> map( x-> x[1], ans ) ## convert back to string array
+julia> map( x-> x[1], ans ) ## convert back to char array
 4-element Array{Char,​1}:​ 4-element Array{Char,​1}:​
  '​1'​  '​1'​
Line 429: Line 462:
  '​3'​  '​3'​
  '​4'​  '​4'​
 +
 ``` ```
  
Line 551: Line 585:
 Note that `s[1]` is a character, while `s[[1]]` or `s[1:1]` is a string. ​ To convert a character to a string, use `string('​c'​)`. Note that `s[1]` is a character, while `s[[1]]` or `s[1:1]` is a string. ​ To convert a character to a string, use `string('​c'​)`.
  
-FIXME (Andreas:) Maybe repeat a caution against wide characters here, i.e.
+WARNING: Indexing by position does not work with wide (multi-byte) characters, because string indexes count bytes:
 ```juliarepl ```juliarepl
-julia> "æble"[2:4]
+julia> x="æble"
 +"​æble"​ 
 + 
 +julia> x[1] 
 +'æ': Unicode U+00e6 (category Ll: Letter, lowercase)
 + 
 +julia> x[2]
 ERROR: StringIndexError("​æble",​ 2) ERROR: StringIndexError("​æble",​ 2)
 Stacktrace: Stacktrace:
  [1] string_index_err(::​String,​ ::Int64) at ./​strings/​string.jl:​12  [1] string_index_err(::​String,​ ::Int64) at ./​strings/​string.jl:​12
- [2] getindex(::String, ::UnitRange{Int64}) at ./strings/string.jl:245
- [3] top-level scope at none:0
+ [2] getindex_continued(::String, ::Int64, ::UInt32) at ./strings/string.jl:216
+ [3] getindex(::String, ::Int64) at ./strings/string.jl:209
+ [4] top-level scope at none:0
 ``` ```
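
The error above can be avoided by asking Julia for valid indexes instead of computing them by hand. A minimal sketch, assuming Julia 1.0 or later; the variable names are illustrative.

```julia
x = "æble"                      # 'æ' occupies bytes 1 and 2

ok2    = isvalid(x, 2)          # byte 2 is in the middle of 'æ'
second = nextind(x, 1)          # byte index of the second character
ch     = x[second]              # the second character
rest   = x[second:end]          # everything after the first character
head   = first(x, 2)            # the first two characters, counted as characters
idx    = collect(eachindex(x))  # all valid byte indexes
```

`first(x, n)` and `last(x, n)` count characters rather than bytes, so they are usually the simplest way to take a prefix or suffix.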
  
Line 681: Line 723:
  
 ### Filtering (Grepping) Only Matching Strings in Vector of Strings ### Filtering (Grepping) Only Matching Strings in Vector of Strings
 +
 +The Julia way is to use `filter()` or broadcasting:
 +
 +```juliarepl
 +julia> heystack= [ "​ab1",​ "​ab2",​ "​cd1",​ "​ab3",​ "​ef5"​ ];
 +
 +julia> filter( x -> occursin("​ab",​ x), heystack ) ## method 1
 +3-element Array{String,​1}:​
 + "​ab1"​
 + "​ab2"​
 + "​ab3"​
 +
 +julia> w= occursin.( "​ab",​ heystack ) ## method 2
 +5-element BitArray{1}:​
 +  true
 +  true
 + false
 +  true
 + false
 +
 +julia> ​ heystack[ w ]
 +3-element Array{String,​1}:​
 + "​ab1"​
 + "​ab2"​
 + "​ab3"​
 +
 +```
 +
 +The non-Julia way is to define a vector function:
  
 ```juliarepl ```juliarepl
-julia> gnep(needle,heystack)= filter( hey->occursin(needle, hey), heystack )
+julia> gnep(needle,heystack::Vector)= filter( x -> occursin(needle, x), heystack )
 gnep (generic function with 1 method) gnep (generic function with 1 method)
  
Line 691: Line 762:
  "​ab2"​  "​ab2"​
  "​ab3"​  "​ab3"​
 +
 ``` ```
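
A comprehension with a filter clause is a third way to express the same selection, and the needle may also be a regular expression. A minimal sketch; the variable `haystack` is spelled differently from the page's `heystack` on purpose, so the two do not collide.

```julia
haystack = ["ab1", "ab2", "cd1", "ab3", "ef5"]

a = [s for s in haystack if occursin("ab", s)]          # comprehension with a condition
b = filter(s -> startswith(s, "ab"), haystack)          # match only at the start
c = [s for s in haystack if occursin(r"^ab\d$", s)]     # regular-expression needle
```

All three return the same three strings for this input; they differ only in how strict the match is.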
-FIXME (Andreas:) This kind of automatic vectorization is typically being discouraged. Instead you'd want to use broadcasting.+
  
 ### Finding the Location of Substring(s) in a String ### Finding the Location of Substring(s) in a String
Line 720: Line 792:
 ``` ```
  
-WARNING `rsearch(s::AbstractString, c::Char)` is deprecated, use `coalesce(findlast(isequal(c), s), 0)` instead.
+### Finding the Location of Char in a String
 + 
 +```juliarepl 
 +julia> ​s= "ab cd as cd more cd end";​ 
 + 
 +julia> findfirst(isequal('c'),s)
 +4
 + 
 +julia> ​findlast(isequal('c'),s) 
 +18 
 + 
 +julia> findfirst(isequal('0'),s) ## returns `nothing`, which the REPL does not print; test the result with `=== nothing`
 + 
 +```
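
Since a failed search now returns `nothing` rather than `0`, the result should be tested before it is used as an index. A minimal sketch, assuming Julia 1.0 or later; `describe` is a hypothetical helper introduced for illustration.

```julia
s = "ab cd as cd more cd end"

hit  = findfirst(isequal('c'), s)    # an integer byte index
miss = findfirst(isequal('0'), s)    # nothing

# hypothetical helper that handles both outcomes
describe(p) = p === nothing ? "not found" : "found at byte index $p"

d1 = describe(hit)
d2 = describe(miss)

span = findfirst("cd", s)            # searching for a substring returns a range
asint = something(miss, 0)           # fall back to 0 only if old-style code needs it
```

Testing with `=== nothing` is the idiomatic check; `something` is a convenience for supplying a default.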
  
-FIXME (Andreas) this is too literal translation. Users shouldn'​t need the `coalesce` part. The point is that users should now test against `nothing` instead of `0`. 
  
 ### Replacing String Inside Other String ### Replacing String Inside Other String
Line 788: Line 872:
 #### Good #### Good
  
-`readcsv()` understands csv and can be used on individual lines or on whole files ([[dataio#reading_and_writing_matrices_to_csv_tsv_sv_files|dataio]])
+`readdlm()` from the `DelimitedFiles` standard library understands csv and can be used on individual lines or on whole files ([[dataio#reading_and_writing_matrices_to_csv_tsv_sv_files|dataio]]). `readcsv()` was removed in Julia 1.0; use `readdlm(source, ',')` instead.
  
 ```juliarepl ```juliarepl
Line 798: Line 882:
 ``` ```
  
 +* For nontrivial cases, use the optimized [CSV.jl](https://​juliadata.github.io/​CSV.jl).
  
 +* [TextParse](https://​juliacomputing.github.io/​TextParse.jl/​) offers a `TextParse.csvread(filename)` function.
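
For a single line or a small in-memory string, `readdlm` can read from an `IOBuffer` instead of a file. A minimal sketch, assuming the `DelimitedFiles` standard library; the sample data is made up for illustration.

```julia
using DelimitedFiles

line = "1,2.5,3"
row = readdlm(IOBuffer(line), ',')       # one line gives a 1×3 matrix of Float64

text = "a,1\nb,2"
tbl = readdlm(IOBuffer(text), ',')       # mixed columns give a matrix of Any
```

Note that `readdlm` does not handle quoted fields containing commas the way a full CSV parser does, which is one reason to prefer CSV.jl for nontrivial input.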
  
-## Reading from and Writing Strings to File
+
 +## Reading Strings from and Writing Strings to Files
  
 See also [[fileio|File IO]]. See also [[fileio|File IO]].
Line 906: Line 993:
  
 * The colon prefix is used not only for a Symbol, but also for expressions,​ such as `a=3; :($a+3)`. * The colon prefix is used not only for a Symbol, but also for expressions,​ such as `a=3; :($a+3)`.
 +
 +* FIXME Look into `StringLiterals`:​
 +
 +```text
 +The StringLiterals package is an attempt to bring a cleaner string literal syntax to Julia, as well as having an easier way of producing formatted strings, borrowing from both Python and C formatted printing syntax. It also adds support for using LaTex, Emoji, HTML, or Unicode entity names that are looked up at compile-time.
 +
 +Currently, it adds a Swift style string macro, f"​...",​ which uses the Swift syntax for interpolation,​ i.e. \(expression). ​
 +```
 +
  
 ## References ## References
strings.txt · Last modified: 2018/12/28 11:19 (external edit)