julia> pkgchk.( [ "julia" => v"1.0.3" ] );
Strings are perhaps the most important building blocks of general-purpose computer languages. In modern languages, regular expressions have become central to dealing with strings. Regex are treated in Regular Expressions.
Quoting | Example | Explanation |
---|---|---|
" | "mytext" | A standard double-quoted string, with interpolation |
r" | r"[abc]" | A letter prefix designates a special type of string (here, a regular expression) |
' | 'c' | A single quote designates a (Unicode) character |
` | `ls` | A backquote (backtick), used for operating system commands |
: | :sym | A symbol is a string limited to Julia identifier characters (and without interpolation) |
Most strings are created with double quotes. Backslashes are used for quoting. Strings can contain newlines:
julia> "ab\"cd
ef"                        ## string with embedded newline
"ab\"cd\nef"

julia> "\u00a5 \u20ac ¥"   ## string with UTF-8, quoted and direct
"¥ € ¥"
There are a number of variants of strings, which can have different meanings. For example, r"[abc]" is a regular expression. A "raw" string eliminates interpolation and most special-character interpretation, which can make it easier to create a String with many special-meaning characters. The result of entering a raw string is an ordinary String, though:
julia> raw"a$x\ab\n\u00A5"
"a\$x\\ab\\n\\u00A5"

julia> typeof( ans )
String
Triple-quoted strings have special meaning and (primarily) make it easier to include double quotes:
julia> """		## start of string

ab"cd

"""		## end of string ##
"\t\t## start of string\n\nab\"cd\n\n"
julia> repeat("-+", 10)    ## two times ten is twenty characters
"-+-+-+-+-+-+-+-+-+-+"

julia> using Random

julia> Random.seed!(0);

julia> randstring(12)    ## ASCII only, not UTF-8
"0IPrGg0JVONT"

julia> Int('A'), convert(Int, 'A')    ## Unicode code point
(65, 65)

julia> convert(Char, 65)
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> '\u41'    ## unicode entry
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> string( UInt32('A'); base=16 )
"41"

julia> string( UInt32('A'); base=2 )
"1000001"

julia> ascii("ab")    ## only tests that entire string is ascii, throws exception otherwise
"ab"

julia> ascii("abπ")    ## Julia does not know how to convert UTF-8 into ASCII
ERROR: ArgumentError: invalid ASCII at index 3 in "abπ"

julia> map( c->(isascii(c) ? c : '?'), "abπ")    ## replace all non-ASCII characters with '?'
"ab?"
To convert UTF-8 into ASCII with proper escape sequences:
julia> hex(a::Char; kwargs...)= string( UInt32(a); base=16, kwargs... )
hex (generic function with 1 method)

julia> function escape_unicode(s::AbstractString)
           buf= IOBuffer()
           for c in s
               if isascii(c)
                   print(buf, c)
               else
                   i= UInt32(c)
                   if i < 0x10000
                       print( buf, "\\u", hex(c; pad= 4) )
                   else
                       print( buf, "\\U", hex(c; pad= 8) )
                   end#if
               end#if
           end#for
           return String( take!(buf) )
       end#function##
escape_unicode (generic function with 1 method)

julia> const yen= unescape_string("\\u00a5")
"¥"

julia> const oneyen= unescape_string("1 \\u00a5")
"1 ¥"

julia> escape_unicode(oneyen)
"1 \\u00a5"
In UTF-8, characters can be more than one byte long.
julia> length("AéB𐅍CD")    ## character length, not byte length!
6

julia> sizeof('❤'), sizeof("❤"), length("❤")    ## in UTF-8, a heart requires 3 bytes
(4, 3, 1)

julia> lastindex("AéB𐅍CD")    ## byte index of the last character, not character length
10
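The difference between characters and bytes can be made visible with a few Base functions. The following is a minimal sketch; the example string is arbitrary and was chosen only because 'é' needs two bytes in UTF-8.

```julia
s = "AéB"                        # 'é' occupies two bytes in UTF-8

nchars = length(s)               # number of characters
nbytes = ncodeunits(s)           # number of bytes, same as sizeof(s)

# eachindex yields only the byte indexes at which a character starts
starts = collect(eachindex(s))

# the raw UTF-8 bytes of the string
bytes = collect(codeunits(s))

println(nchars, " characters in ", nbytes, " bytes, starting at ", starts)
```

Here the character count is 3, the byte count is 4, and the valid indexes are 1, 2, and 4.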
Note that sizeof('❤') > sizeof("❤"): a Char always occupies 4 bytes, whereas the String stores only the 3 UTF-8 bytes.
length(s) is always less than or equal to lastindex(s).
ind2chr(s,i) and chr2ind(s,i) converted between byte indexes and character indexes in older versions of Julia. Since Julia 0.7, use length(s,1,i) and nextind(s,0,i) instead.

julia> function whatis(c)
           for isfunction in (isletter, isascii, iscntrl, isdigit, islowercase, isnumeric, isprint, ispunct, isspace, isuppercase, isxdigit)
               println("$(isfunction)($c)= \t$(isfunction(c))")
           end#for
       end#function##
whatis (generic function with 1 method)

julia> whatis('\u20AC')    ## for characters
isletter(€)= 	false
isascii(€)= 	false
iscntrl(€)= 	false
isdigit(€)= 	false
islowercase(€)= 	false
isnumeric(€)= 	false
isprint(€)= 	true
ispunct(€)= 	false
isspace(€)= 	false
isuppercase(€)= 	false
isxdigit(€)= 	false
julia> all(isletter, "abc23")
false

julia> all(isascii, "AéB𐅍CD")
false

julia> any(isascii, "AéB𐅍CD")
true

These tests can also be accomplished with regular expressions.

julia> "1" * "2" * "3" * "h"    ## works only with strings, not with mixed numbers and strings.
"123h"
Strings can also be concatenated with the string function. Unlike with the * operator, non-string objects are converted into strings via their print() method (which falls back to show()):
julia> string("One+", "Two+", 3, '+', :four)    ## the last is a "Symbol" type
"One+Two+3+four"

julia> const many= ( "one|" , "two|" , 3 , 3.0 , '|' , :four)    ## a tuple of elements
("one|", "two|", 3, 3.0, '|', :four)

julia> typeof(many)    ## most suitable type of each element; const is always ignored
Tuple{String,String,Int64,Float64,Char,Symbol}

julia> string(many)    ## the tuple is one string() argument
"(\"one|\", \"two|\", 3, 3.0, '|', :four)"

julia> string(many...)    ## the tuple is turned into many string() arguments
"one|two|33.0|four"
The coolest feature of Julia strings is that the $ notation can be used to substitute a user-defined variable or expression (its string equivalent, to be more precise) into any position in a string:
julia> const w= "world"; const x= 1; "hello $w $x"
"hello world 1"

julia> "two= $(x + 1)"
"two= 2"
A literal dollar sign must be escaped (\$) in order not to be confused with interpolation.

julia> parse( Float32, "1.1" )    ## basic parse with known type
1.1f0

julia> typeof( Meta.parse("12") )    ## Meta.parse tries to convert strings into a most suitable type
Int64

julia> typeof( Meta.parse("12.0") )
Float64

julia> typeof( Meta.parse("haha") )
Symbol

julia> typeof( Meta.parse("5+6") )
Expr

julia> Float64( Meta.parse("12") )    ## request specific type, can throw exception
12.0

julia> Meta.parse("a12"; raise=false)    ## do not throw; here, the result is a symbol
:a12
Int("9") is illegal. Int('9') gives the ASCII code (57), not the integer value (9).
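To get the numeric value of a digit rather than its character code, parse() is the usual route. The sketch below uses only Base functions; the variable names are illustrative.

```julia
# parse a string of digits into an integer
n = parse(Int, "9")

# a single digit character: parse it, or subtract the code of '0'
d1 = parse(Int, '9')
d2 = '9' - '0'

# tryparse returns `nothing` instead of throwing on bad input
bad = tryparse(Int, "9x")

println((n, d1, d2, bad === nothing))
```

tryparse() is convenient for user input, because the result can be tested with `=== nothing` instead of wrapping parse() in a try block.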
parse() is not only less subject to surprises than Meta.parse(), but also far more efficient:
julia> using BenchmarkTools

julia> @btime parse(Float64, "1.1")
  23.581 ns (0 allocations: 0 bytes)
1.1

julia> @btime Meta.parse("1.1")    ## roughly a factor 1,000 slower!
  18.917 μs (10 allocations: 256 bytes)
1.1
julia> string(8.89) ## lowercase string() "8.89"
C-style printf and sprintf work when used as macros, requiring '@' function prefixes:
julia> using Printf

julia> @printf("%12.5f", pi)
     3.14159

julia> @sprintf("%.3f",pi)    ## the format string must be a literal, not a value computed by the program
"3.142"

julia> using Formatting    ## for a compiled version

julia> sprintf1("%'d", 1000000)    ## note the quote, and commas in the output
"1,000,000"

julia> sprintf1("%'f", 1000000.0)
"1,000,000.000000"
Just use the comprehension expression itself:
julia> x= 1:3;

julia> using Printf; [ @sprintf("%.3f", xi ) for xi in x ]
3-element Array{String,1}:
 "1.000"
 "2.000"
 "3.000"

julia> using Formatting; sprintf1.("%.3f", x)
3-element Array{String,1}:
 "1.000"
 "2.000"
 "3.000"

julia> f= [ sqrt, exp, sin ] ; [ "$fi(10) = $(fi(10))" for fi in f ]
3-element Array{String,1}:
 "sqrt(10) = 3.1622776601683795"
 "exp(10) = 22026.465794806718"
 "sin(10) = -0.5440211108893698"

julia> s= Symbol(sqrt)    ## first convert to symbol
:sqrt

julia> eval(s)(9)    ## dangerous: an eval on a user input string could wreak havoc!
3.0

julia> f= [ :sqrt, :exp, :sin ] ; [ "$fi(10) = $(eval(fi)(10))" for fi in f ]
3-element Array{String,1}:
 "sqrt(10) = 3.1622776601683795"
 "exp(10) = 22026.465794806718"
 "sin(10) = -0.5440211108893698"
Arrays and Tuples can hold strings. For example,
julia> [ "a" "b"; "c" "d" ]    ## a two-dimensional array of strings
2×2 Array{String,2}:
 "a"  "b"
 "c"  "d"
[1,2]' transposes numerical arrays just fine, but this does not work for arrays of strings such as ["1","2"]; use permutedims(["1","2"]) instead.
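A short sketch of the non-recursive alternative: permutedims() only rearranges the elements and never tries to transpose the strings themselves. The variable names are illustrative.

```julia
v = ["1", "2"]                 # a column vector of strings

row = permutedims(v)           # a 1×2 matrix of strings

m = ["a" "b"; "c" "d"]
mt = permutedims(m)            # rows and columns swapped

println(size(row), " ", mt)
```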
To convert an array of numbers into an array of strings, use the element-wise version of string() or use map():
julia> string.( [ 1.0, 2.0, 3.0 ] )    ## or map( string, [1.0, 2.0, 3.0] )
3-element Array{String,1}:
 "1.0"
 "2.0"
 "3.0"
For finer control over the number format, use the sprintf1 or @sprintf facilities.
julia> string.( '1':'4'; )    ## convert character array to string array; note semicolon
4-element Array{String,1}:
 "1"
 "2"
 "3"
 "4"

julia> map( x-> x[1], ans )    ## convert back to char array
4-element Array{Char,1}:
 '1'
 '2'
 '3'
 '4'

julia> using DelimitedFiles

julia> readdlm( IOBuffer("1 2 3\n4 5 6"), Int )
2×3 Array{Int64,2}:
 1  2  3
 4  5  6
readdlm() is usually a file operation. However, by wrapping a string into an IOBuffer, file operations work on strings, too. For details, see ?readdlm.

julia> replace("ab cd ef gh", " " => "|"; count=2)    ## only the first two replacements
"ab|cd|ef gh"

julia> print("a\tb\n");
a	b

julia> escape_string("a\tb\n")
"a\\tb\\n"

julia> unescape_string("a\\tb\\n")
"a\tb\n"
The rstrip() (lstrip()) function can be used to remove trailing (leading) whitespace from a string.
julia> const sa1= ("ab\n", "ab\r", "ab\r\n", "ab\n\r", "ab\r\n\r\n");    ## 5 test strings

julia> chomp.(sa1)    ## removes just one trailing \n or \r\n
("ab", "ab\r", "ab", "ab\n\r", "ab\r\n")

julia> rstrip.(sa1)    ## trailing [\n\r]
("ab", "ab", "ab", "ab", "ab")

julia> const s2= " ab\ncd ";    ## another test string

julia> strip(s2), rstrip(s2), lstrip(s2)    ## leaves interior [\n\r]
("ab\ncd", " ab\ncd", "ab\ncd ")
For multi-line strings,
julia> const s3= " ab \n cd \r ef \r\n gh \n\r ij ";    ## messy multi-line test string

julia> join( strip.( split( s3, r"[\n\r]+" ) ), "\n" )    ## split lines, wipe \n\r, and rejoin w/ \n
"ab\ncd\nef\ngh\nij"

julia> rpad("ab", 10), lpad("abcdefgh", 10)    ## result is ten characters long
("ab        ", "  abcdefgh")

julia> uppercase("aBcD"), " | ", lowercase("aBcD"), " | ", titlecase("aBcD efG"), " | ", uppercasefirst("aBcD efG")
("ABCD", " | ", "abcd", " | ", "Abcd Efg", " | ", "ABcD efG")

julia> titlecase( lowercase("aBcD efG") )
"Abcd Efg"

julia> all(isuppercase,"aBcD"), all(isuppercase,"ABCD"), all(isuppercase,"abcd"), all(isuppercase,"Abcd")
(false, true, false, false)
The getindex notation [] can be used on a String to generate substrings (just as in arrays or tuples):
julia> const str= "abcdefg"
"abcdefg"

julia> str[1:3]
"abc"

julia> str[5:end]
"efg"
Note that s[1] is a character, while s[[1]] or s[1:1] is a string. To convert a character to a string, use string('c').
Because indexes count bytes, not characters, this does not work directly with multi-byte characters:
julia> x="æble"
"æble"

julia> x[1]
'æ': Unicode U+00e6 (category Ll: Letter, lowercase)

julia> x[2]
ERROR: StringIndexError("æble", 2)
Stacktrace:
 [1] string_index_err(::String, ::Int64) at ./strings/string.jl:12
 [2] getindex_continued(::String, ::Int64, ::UInt32) at ./strings/string.jl:216
 [3] getindex(::String, ::Int64) at ./strings/string.jl:209
 [4] top-level scope at none:0
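The error can be avoided by asking Julia for valid indexes instead of computing them by hand. The following is a minimal sketch with Base functions only; the variable names are illustrative.

```julia
x = "æble"

i1 = firstindex(x)             # index of the first character
i2 = nextind(x, i1)            # index of the second character; skips the 2 bytes of 'æ'
second = x[i2]

# isvalid tells whether a byte index is the start of a character
ok = isvalid(x, 2)

# the n-th character without index arithmetic (allocates an array of Char)
third = collect(x)[3]

# the first two characters, as a string
firsttwo = first(x, 2)

println((i2, second, ok, third, firsttwo))
```

nextind() and prevind() step through a string one character at a time, while collect() trades memory for the convenience of ordinary array indexing.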
Strings in Julia are similar to arrays of characters. Thus, it is possible to iterate over the characters in a String:
julia> const s= "abcd"
"abcd"

julia> for ch in s; println(ch+1); end#for
b
c
d
e

julia> for chi in eachindex(s); println( "$chi is $(s[chi]+1)" ); end#for
1 is b
2 is c
3 is d
4 is e
The map() function can be more convenient when operating over every character in the String:
julia> map(x -> x + 1, "abcd")    ## operate and return as string
"bcde"
See also Create a String Character by Character below.
julia> const ssa= split("1,2,3,4,5", ',')    ## comma is splitting char. default is whitespace.
5-element Array{SubString{String},1}:
 "1"
 "2"
 "3"
 "4"
 "5"

julia> String.(ssa)    ## (rarely) needed, convert from SubStrings to pure String
5-element Array{String,1}:
 "1"
 "2"
 "3"
 "4"
 "5"

julia> split("abc","")    ## collect("abc") gives the same pieces, but as Char rather than SubString
3-element Array{SubString{String},1}:
 "a"
 "b"
 "c"
Vector objects can be joined-and-stringified with inserted characters between them:
julia> join( ("1", "2", "3", "4", "5"), '|' )    ## join strings
"1|2|3|4|5"

julia> join( (1, 2, 3, 4, 5, 6), ", " )    ## convert ints to strings and then join
"1, 2, 3, 4, 5, 6"

julia> join( (1, 2, 3, 4, 5, 6), ", ", ", and " )    ## Julia's clever (optional) last argument
"1, 2, 3, 4, 5, and 6"
julia> reverse("abcdefg") "gfedcba"
If you want to reverse the order of the words in the String, first split into words, reverse the string array, and recombine them:
julia> join( reverse( split("abc def ghi jkl", ' ') ), ' ' )
"jkl ghi def abc"
Most searching, replacing, etc., functions work not only with regular expressions, but also with plain strings.
julia> occursin("ab", "abcd")
true

julia> startswith("abcd", "ab")
true

julia> endswith("abcd", "ab")
false
The Julia way is to use filter() or broadcasting:
julia> haystack= [ "ab1", "ab2", "cd1", "ab3", "ef5" ];

julia> filter( x -> occursin("ab", x), haystack )    ## method 1
3-element Array{String,1}:
 "ab1"
 "ab2"
 "ab3"

julia> w= occursin.( "ab", haystack )    ## method 2
5-element BitArray{1}:
  true
  true
 false
  true
 false

julia> haystack[ w ]
3-element Array{String,1}:
 "ab1"
 "ab2"
 "ab3"
The non-Julia way is to define a vector function
julia> gnep(needle,haystack::Vector)= filter( x -> occursin(needle, x), haystack )
gnep (generic function with 1 method)

julia> gnep( "ab", [ "ab1", "ab2", "cd1", "ab3", "ef5" ] )
3-element Array{String,1}:
 "ab1"
 "ab2"
 "ab3"

julia> s= "ab cd as cd more cd end";

julia> search( haystack::AbstractString, needle::AbstractString )= something(findfirst(needle,haystack), 0:-1);

julia> search( haystack::AbstractString, needle::AbstractString, nmatch::Int )= something(findnext(needle,haystack, nmatch), 0:-1);

julia> rsearch( haystack::AbstractString, needle::AbstractString )= something(findlast(needle,haystack), 0:-1);

julia> rsearch( haystack::AbstractString, needle::AbstractString, nmatch::Int )= something(findprev(needle,haystack, nmatch), 0:-1);

julia> search(s, "cd")    ## first match -> type is UnitRange{Int64}
4:5

julia> first(search( s, "cd" ))    ## first match, but just start index
4

julia> search(s, "cd", 5)    ## start search beginning char #5
10:11

julia> rsearch(s, "cd")    ## last match
18:19
julia> s= "ab cd as cd more cd end";

julia> findfirst(isequal('c'),s)
4

julia> findlast(isequal('c'),s)
18

julia> findfirst(isequal('0'),s)    ## no match: returns nothing, which can be tested against 'nothing'
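Since a failed search returns nothing, the result should be checked before it is used as an index. A minimal sketch follows; the helper name charpos and the convention of returning 0 for "not found" are assumptions made for illustration only.

```julia
s = "ab cd as cd more cd end"

# hypothetical helper: position of character c in s, or 0 if it does not occur
function charpos(c::Char, s::AbstractString)
    pos = findfirst(isequal(c), s)
    return pos === nothing ? 0 : pos
end

found = charpos('c', s)
missingpos = charpos('0', s)

println((found, missingpos))
```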
julia> replace("abc.txt", ".txt" => ".csv")
"abc.csv"

julia> in('a', "abcabcabc")    ## also works as 'a' in "abcabcabc"
true

julia> count( c-> (c == 'a') , "abcabcabc")    ## count takes a function and an object
3

julia> matchall(r::Regex,s::AbstractString; overlap::Bool=false)= collect((m.match for m = eachmatch(r, s, overlap=overlap)));

julia> length( matchall(r"ab", "abcabsdabkab") )
4

julia> ( matchall(r"abc", "abcabsdabkab") )
1-element Array{SubString{String},1}:
 "abc"
CSV is an old Microsoft Excel, but nowadays ubiquitous, standard. Parsing CSV can be challenging, because string fields can contain commas themselves, and strings can but need not be quoted.
julia> s= "\"abcd\" , \"abc,d\" , \"ab,c,d\" , \"a,b,c,d\"";

julia> split(s, ",")    ## works only if quoted strings do not contain commas; here, yikes
10-element Array{SubString{String},1}:
 "\"abcd\" "
 " \"abc"
 "d\" "
 " \"ab"
 "c"
 "d\" "
 " \"a"
 "b"
 "c"
 "d\""
readdlm() understands CSV and can be used on individual lines or on whole files (see File IO). The older readcsv() was removed in Julia 1.0; use readdlm() with a ',' delimiter instead.
julia> using DelimitedFiles

julia> readdlm( IOBuffer("\"abcd\",\"abc,d\",\"ab,c,d\",\"a,b,c,d\""), ',' )
1×4 Array{Any,2}:
 "abcd"  "abc,d"  "ab,c,d"  "a,b,c,d"
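To see why quoting matters, here is a deliberately simple sketch of a quote-aware splitter. The function name splitcsvline is hypothetical, and the sketch assumes that fields never contain escaped (doubled) quotes; for real data, readdlm() or a CSV package is the better choice.

```julia
# hypothetical helper: split one CSV line on commas that are outside double quotes
function splitcsvline(line::AbstractString)
    fields = String[]
    buf = IOBuffer()
    inquotes = false
    for c in line
        if c == '"'
            inquotes = !inquotes              # toggle state; the quote itself is dropped
        elseif c == ',' && !inquotes
            push!(fields, String(take!(buf))) # field ends at an unquoted comma
        else
            print(buf, c)
        end
    end
    push!(fields, String(take!(buf)))         # the last field has no trailing comma
    return fields
end

fields = splitcsvline("\"abcd\",\"abc,d\",\"ab,c,d\",\"a,b,c,d\"")
println(fields)
```

The only state needed is whether the scan is currently inside a quoted field, which is exactly the information that a plain split() on commas lacks.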
See also File IO.
julia> filename= "/tmp/myhellostring.txt"; mytext= "hello\n\n";

julia> write(filename, mytext)    ## returns number of bytes written
7

julia> open(filename, "w") do ofile; print(ofile, mytext); end#do#    ## another way to write to file

julia> filename= "/tmp/myhellostring.txt"
"/tmp/myhellostring.txt"

julia> read(filename, String)    ## reading back from the file
"hello\n\n"

julia> open(filename) do ifile
           for ln in enumerate(eachline(ifile)); println(ln); end#for#
       end;#do##
(1, "hello")
(2, "")
See Converting String to Numerical Array. Wrap the string into an IOBuffer first, and then use the provided file-like IO operations. For example,
julia> readline( IOBuffer( "abc\nde\n" ) )
"abc"
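The same wrapping works with the other line-oriented functions. The following sketch iterates over the lines of a string with eachline() and parses the numbers on each line; the example text and variable names are illustrative.

```julia
text = "1 2 3\n4 5 6\n"

# treat the string as a file and sum the numbers on each line
rowsums = [ sum(parse.(Int, split(ln))) for ln in eachline(IOBuffer(text)) ]

println(rowsums)
```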
Strings are read-only. Thus, it is often convenient and fast to write to an IOBuffer first, and then convert this IOBuffer into a string.
julia> using Random; Random.seed!(0);

julia> destbuf= IOBuffer();

julia> for srs in randstring(50); print(destbuf, srs); end;#for    ## just give me random stuff

julia> s= String(take!(destbuf))
"0IPrGg0JVONTEB5dhw4LVno7ocnuaJ6CBGWN2iSNQhb3wD3AaC"

julia> close(destbuf);
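When the buffer is needed only to produce one string, sprint() can manage it. The sketch below builds the same string both ways; the variable names are illustrative.

```julia
# build "1,4,9,16,25" piece by piece in an explicit IOBuffer
buf = IOBuffer()
for i in 1:5
    i > 1 && print(buf, ',')
    print(buf, i^2)
end
s1 = String(take!(buf))

# sprint creates the buffer, passes it to the function, and returns the string
s2 = sprint() do io
    for i in 1:5
        i > 1 && print(io, ',')
        print(io, i^2)
    end
end

println(s1, " ", s2)
```

Both versions avoid creating a new String for every appended piece, which is what repeated use of * would do.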
writedlm can work with arrays of any type, but you must make sure that your strings do not contain the delimiting character. See below and the File IO chapter for more. For example:
julia> using DelimitedFiles

julia> sar= string.( "5", [".1" ".2"; ".3" ".4"], "6" )    ## create an example string array by clever concatenation
2×2 Array{String,2}:
 "5.16"  "5.26"
 "5.36"  "5.46"

julia> writedlm("/tmp/fourstrings1.txt", sar, ",");    ## simplest way to write a string array.

julia> open("/tmp/fourstrings2.txt", "w") do ofile    ## an alternative
           writedlm(ofile, sar, ",")
       end;#do##
AbstractString and concrete types such as String (ASCIIString in old versions of Julia).
b"\xff" or b"\uff hello" and raw strings, e.g., r"the us$".
a=3; :($a+3).
StringLiterals: The StringLiterals package is an attempt to bring a cleaner string literal syntax to Julia, as well as having an easier way of producing formatted strings, borrowing from both Python and C formatted printing syntax. It also adds support for using LaTeX, Emoji, HTML, or Unicode entity names that are looked up at compile-time. Currently, it adds a Swift style string macro, f"...", which uses the Swift syntax for interpolation, i.e. \(expression).