
fileio
snippet.juliarepl
julia> pkgchk( [ "julia" => v"1.0.2" ] )

Files (IOStreams)

Julia stores open-file information in IOStream objects. Common file operations (like open(), print(), read(), and close()) work as expected. IOStream is a subtype of the abstract type IO, and most I/O functions are written against IO, so the same code works whether the destination is a terminal (TTY), a file, or a memory buffer. IO objects can also be queried for properties such as whether they are readable, writable, or seekable (random access).

Most I/O functions come in two forms. The first form takes an IO object as its first argument. The second form omits that argument and forwards to the first form with a default stream, typically stdout for output functions and stdin for input functions. (stdin, stdout, and stderr are IO objects, although usually not IOStreams; in the REPL, for example, they are terminal streams.)
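As a minimal sketch of that convention for your own code (the function greet is a hypothetical example, not part of Julia), define the IO-first method and let the short form forward to it:

```julia
# IO-first form: does the actual work on any IO (file, buffer, terminal, ...)
greet(io::IO, name::AbstractString) = println(io, "Hello, ", name, "!")

# short form without the IO argument: forwards to the IO form with stdout
greet(name::AbstractString) = greet(stdout, name)

greet("Ada")                  # prints Hello, Ada! to stdout
msg = sprint(greet, "Ada")    # sprint passes an IOBuffer as the first argument: "Hello, Ada!\n"
```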

Moreover, an IO need not be a file: an IOBuffer reads from or writes to memory (and can wrap the contents of a string), so input-stream-related functions can work from buffers or strings instead of from files.

For writing entire objects to various file formats, see fileformats.

List of (Prominent) IOStream Reading and Writing Functions

Write Output Stream Functions

The following are prominent functions which are used primarily for writing to file output:

| Function | Notes |
| print([os,] xs...) | universal writer; converts to a UTF-8 string via show and prints it |
| println([os,] xs...) | like print, with a trailing "\n" |
| print_shortest([os,] x) | tries to print a shorter string form |
| write(os, x) | writes the binary representation of x; returns the number of bytes written |
| writedlm(os, A[, delim]) | writes in delimited form (e.g., csv); in the DelimitedFiles standard library. see fileformats |
| serialize(os, x) | writes a binary representation of a Julia object; in the Serialization standard library. see fileformats |
| flush(os) | flush the output buffer now |
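A short sketch of writedlm() and serialize() writing to an in-memory buffer instead of a file; the sample data is made up, and both functions live in standard libraries that must be loaded with `using` in Julia 1.0 and later:

```julia
using DelimitedFiles, Serialization   # standard libraries

io = IOBuffer()
writedlm(io, [1 2; 3 4], ',')         # delimited text: one matrix row per line
csvtext = String(take!(io))           # "1,2\n3,4\n"

obj = Dict("a" => [1.0, 2.0], "b" => (3, "x"))
buf = IOBuffer()
serialize(buf, obj)                   # binary, Julia-specific format
seekstart(buf)                        # rewind before reading it back
copyobj = deserialize(buf)            # an equal Dict
```

The serialized format is meant for Julia-to-Julia exchange; for data that other programs must read, the delimited text form is the safer choice.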

String vs. Byte Representations

snippet.juliarepl
julia> x= 7310302560386184563;            ## (thanks steveng)

julia> println(x);                       ## convert number x to string and print
7310302560386184563

julia> write(stdout, x); println();      ## it's the byte representation!!!
surprise

print() converts objects to their UTF-8 text representation (by calling show()) and prints that text, whereas write() emits the raw bytes. Your own types can define their own show() methods (see Personalized Show Objects below).
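Writing the same number both ways into in-memory buffers makes the difference measurable. This sketch reuses the value of x from above; which characters the raw bytes spell depends on the machine's byte order (the "surprise" reading holds on little-endian machines such as x86):

```julia
x = 7310302560386184563

textbuf = IOBuffer(); print(textbuf, x)   # text: the decimal digits as UTF-8 characters
rawbuf  = IOBuffer(); write(rawbuf, x)    # binary: the 8 bytes of the Int64 as stored in memory

textbytes = take!(textbuf)                # 19 bytes, one per digit
rawbytes  = take!(rawbuf)                 # 8 bytes
println(length(textbytes), " text bytes vs. ", length(rawbytes), " raw bytes")

back = read(IOBuffer(rawbytes), Int64)    # read() with a type reverses write()
```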

Colored and Boldfaced Output

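Julia also has some support for TTY display and colored output, warnings, logging, etc.:

| Function | Notes |
| displaysize([io]) | returns (rows, columns) |
| printstyled([io,] xs...; color=:normal, bold=false) | prints in color or boldface (replaces print_with_color, which was removed in Julia 1.0) |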
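A small sketch of printstyled(); whether escape codes are emitted depends on the :color property of the destination, so a plain IOBuffer receives only the text, while an IOContext with `:color => true` receives ANSI codes:

```julia
rows, cols = displaysize(stdout)           # terminal size, or a default if stdout is not a terminal
printstyled("warning!\n"; color=:red, bold=true)

plain = IOBuffer()
printstyled(plain, "warning!"; color=:red, bold=true)       # no :color property: plain text only

colored = IOBuffer()
printstyled(IOContext(colored, :color => true), "warning!"; color=:red, bold=true)

plaintext = String(take!(plain))           # just "warning!"
colortext = String(take!(colored))         # "warning!" wrapped in ANSI escape codes
```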

Logging Info, Warnings, and Errors

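Julia also provides logging macros, @debug, @info, @warn, and @error, which print highlighted messages when on a TTY. (In Julia 1.0 they replace the older info() and warn() functions; error() itself throws an exception rather than logging.) To send log messages to a different destination, install a different logger, e.g., with global_logger(SimpleLogger(io)) or with_logger() from the Logging standard library.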
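A sketch of redirecting log messages into a buffer with the Logging standard library; the message text and the free_gb key are made-up examples:

```julia
using Logging                              # standard library

@info "starting"                           # goes to the current logger (the console, by default)

buf = IOBuffer()
with_logger(SimpleLogger(buf)) do          # temporarily send log records to buf
    @warn "disk almost full" free_gb = 3   # key = value pairs are attached to the record
end
logtext = String(take!(buf))               # the formatted warning, as a string
```

with_logger() affects only the code run inside the block; global_logger() changes the default for the whole session.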

Many Other Julia Functions That Print

Many common functions, like versioninfo(), join(), dump(), show(), and showerror(), are principally terminal-display operations rather than file writers. Because they accept an IO as their first argument, they can also write to any file or buffer instead of the terminal.
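sprint(f, args...) calls f(io, args...) with a fresh IOBuffer and returns what was written, which is a convenient way to capture such output as a string. A sketch (the temporary file name comes from tempname(); versioninfo() lives in InteractiveUtils, which scripts must load explicitly):

```julia
using InteractiveUtils                               # provides versioninfo() outside the REPL

joined   = sprint(join, ["a", "b", "c"], ", ")       # join(io, items, delim) writes into the buffer
errtext  = sprint(showerror, ErrorException("boom")) # showerror(io, err)
dumptext = sprint(dump, (1, "x"))                    # dump(io, x), as text

# the same idea, writing to a file instead of a string
path = tempname()
open(path, "w") do io
    versioninfo(io)                                  # the usual versioninfo() text, into the file
end
```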

Read Input Stream Functions

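| Function | Notes |
| countlines([is]) | counts the lines |
| eachline([is]) | iterator over each line of the stream |
| readdlm(is, ...) | reads delimited files (e.g., csv); in the DelimitedFiles standard library. see fileformats |
| deserialize(is) | reads a binary representation of a Julia object back; in the Serialization standard library |
| read(is, ...), read!(is, a) | read bytes (UInt8) or typed values. For an IOStream, read(is, nb; all=false) performs only one read call instead of waiting for nb bytes or eof |
| readbytes!(is, b[, nb]) | like read, but reads up to nb bytes into the existing byte array b |
| readline([is]) | reads one line into a string |
| readlines([is]) | reads the remaining lines into an array of strings |
| read(is, String) | reads the entire stream (file) into a string; replaces readstring(), which was removed in Julia 1.0 |
| readuntil(is, delim) | reads until a terminating character or string is found |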

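Line- and string-oriented reads do not throw at the end of a file: readline() and read(is, String) simply return an empty string. Reading a fixed-size value past the end, such as read(is, UInt8) or read(is, Int64), throws an EOFError.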
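A small sketch of these read functions and of the end-of-file behavior, using an IOBuffer in place of a file:

```julia
io = IOBuffer("alpha\nbeta\n")

first_line = readline(io)          # "alpha" (the newline is dropped by default)
rest = read(io, String)            # "beta\n", everything that remains
atend = eof(io)                    # true: the buffer is exhausted

empty_line = readline(io)          # "" at end of input, no exception
got_error = try
    read(io, UInt8)                # a fixed-size read past the end...
    false
catch err
    err isa EOFError               # ...throws EOFError
end
```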

Read-Write Random Access Functions

| Function | Notes |
| seek(io, pos) | position the pointer |
| flush(io) | flush the buffer now |
| seekstart(io) | go to start |
| ismarked(io) | does the stream have a marked position? |
| reset(io) | go back to the marked position (this also removes the mark) |
| mark(io) | mark the current position |
| unmark(io) | remove the mark without moving |
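A sketch of mark() and reset() on an IOBuffer (the same calls work on other seekable streams, such as a file opened for reading):

```julia
io = IOBuffer("abcdef")
read(io, Char)             # 'a'
mark(io)                   # remember the current position (just before 'b')
peeked = read(io, 3)       # the bytes of "bcd"
reset(io)                  # jump back to the mark; this also removes the mark
again = read(io, Char)     # 'b' again
```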

Open Files

Opening, Operating on, and Closing Files (IOStreams)

There is more than one way to do this:

  1. ostream= open(name, mode); operate(ostream); close(ostream);
  2. open(name, mode) do ostream; operate(ostream); end;#do##
  3. open( fout->operate(fout), name, mode );
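  4. operate( open(name, mode) ); WARNING: here the stream is flushed and closed only when it is garbage-collected, so for write operations you must run GC.gc() to force the flush and close (otherwise the output may appear late).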

Examples follow.

Sequential Open and Close (with Explicit Close)

  • Writing
snippet.juliarepl
julia> ofile= open("/tmp/fileio.tmp", "w")
IOStream(<file /tmp/fileio.tmp>)

julia> println(ofile, "fileio example, plain open and close\nLine 2")   ## prints string to ofile

julia> close(ofile)
  • Reading
snippet.juliarepl
julia> ifile= open("/tmp/fileio.tmp")        ## "r" is the default
IOStream(<file /tmp/fileio.tmp>)

julia> readline(ifile)                       ## reads one \n-terminated line from ifile into a string
"fileio example, plain open and close"

julia> close(ifile)                          ## good practice; otherwise ifile stays open until garbage collection finalizes it

IO Loops and Open-File Mode Information (with AutoClose)

A do...end construct guarantees that the file will be closed when done, even if an error occurs inside the block:

  • Writing
snippet.juliarepl
julia>  open("/tmp/fileio.tmp", "w") do ofile;
	    println(ofile, "fileio opendo example");
	end#do##
  • Reading
snippet.juliarepl
julia>  open("/tmp/fileio.tmp") do ifile;
	    readline(ifile);
	end#do##
"fileio opendo example"

Embedded Function (with AutoClose)

open() can accept a function as its first argument. Julia opens the file, runs the function on the stream, and then closes the file (even if the function throws an error). This is clean and compact for short input or output:

  • Writing
snippet.juliarepl
julia> open( ofile->println(ofile, "fileio embedded function example"),   "/tmp/fileio.tmp", "w" )
  • Reading
snippet.juliarepl
julia> open( ifile->println("readback: '", readline(ifile), "'"),   "/tmp/fileio.tmp" )
readback: 'fileio embedded function example'

One-Off File Operations (with Lazy AutoClose)

IOStreams have finalizers: Julia closes an IOStream (flushing any buffered output) when the garbage collector reclaims it. This can be (ab-)used when there is only one IOStream operation to be performed, as follows:

  • Writing
snippet.juliarepl
julia> println( open("/tmp/fileio.tmp", "w"), "fileio lazy example")   ## close at IOStream garbage collect

julia> GC.gc()           ## force flush and close, rather than just wait for nature to take its course
  • Reading
snippet.juliarepl
julia> readline(open("/tmp/fileio.tmp"))      ## no flush needed for reading; the stream is closed whenever it is garbage-collected
"fileio lazy example"

(Some) Functions With Integrated Filename Arguments

Some functions (like eachline(), read(), readline(), readuntil(), and write()) can take a filename as their first argument. They will open, operate on, and then close the file themselves. For example,

snippet.juliarepl
julia> write("/tmp/fileio.tmp", "this is the time for all good men...")            ## returns number of bytes written
36

julia> read("/tmp/fileio.tmp", String)
"this is the time for all good men..."

Learning Filename Of An Open IOStream

snippet.juliarepl
julia> f= open("/etc/passwd"); f.name
"<file /etc/passwd>"

julia> replace(f.name, r"^<file (.*)>$" => s"\1")
"/etc/passwd"

Learning File Status of an Open IOStream

snippet.juliarepl
julia> open("/etc/passwd","r") do ifile;
	  println( "/etc/passwd: eof=$(eof(ifile)). ", "isopen=$(isopen(ifile)). ",
                  "isreadonly=$(isreadonly(ifile)). ", "iswritable=$(iswritable(ifile))." )
       end;#do##
/etc/passwd: eof=false. isopen=true. isreadonly=true. iswritable=false.

Strings (instead of Files) as IOStreams

Reading from a String instead of an IOStream (File)

snippet.juliarepl
julia> reallyfromastring= IOBuffer("I am a string masquerading as a file (well, IOBuffer, not IOString)");

julia> read( reallyfromastring, String )
"I am a string masquerading as a file (well, IOBuffer, not IOString)"

Writing to a String instead of an IOStream (File)

In Julia, strings are immutable, so it is not possible to write into an existing string. Ideally, you construct a string in one operation. For example:

snippet.juliarepl
julia> using Printf

julia> s= string( @sprintf("%5.5f", 12.0), " AND ", @sprintf("%5d", 13) );  ## in effect "print to s"

julia> println( s )
12.00000 AND    13
  • If you need to build up a string with many append operations, write to an in-memory IOBuffer instead, and then convert its contents to a string (s= String(take!(io))). The Memory Buffers example below shows this.
  • If you need to write and rearrange the contents in a random-access manner, use a mutable character array (Vector{Char}) instead, and convert it with String() at the end.
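A sketch of these approaches; the strings are made-up examples, and sprint() is the Base helper that hands a fresh IOBuffer to a function and returns the text written to it:

```julia
io = IOBuffer()
for i in 1:3
    print(io, "item ", i, "; ")     # many small appends go into the buffer, not into new strings
end
s = String(take!(io))               # "item 1; item 2; item 3; "

# sprint() wraps the same pattern
s2 = sprint() do io
    for i in 1:3
        print(io, "item ", i, "; ")
    end
end

# random-access edits: a mutable Vector{Char}, converted once at the end
chars = collect("hello")
chars[1] = 'j'
s3 = String(chars)                  # "jello"
```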

Writing a Function that Can Operate on an IOBuffer or IOStream

snippet.juliarepl
julia> printme( io::IO )= println( "printme: $(read(io, String))" );

julia> printme( IOBuffer("my info\nis here\n") );
printme: my info
is here
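Because printme only relies on the IO interface, the same kind of function also works on a file. A sketch with a hypothetical countwords() helper, applied to a buffer and to a temporary file:

```julia
# hypothetical helper: counts whitespace-separated words on any readable IO
function countwords(io::IO)
    n = 0
    for line in eachline(io)
        n += length(split(line))    # split() without a delimiter splits on whitespace
    end
    return n
end

n_buf = countwords(IOBuffer("my info\nis here\n"))    # 4

path = tempname()
write(path, "one two\nthree\n")
n_file = open(countwords, path)     # 3: open(f, path) passes the IOStream to f, then closes it
```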

Reading Lines Containing String X in Field N

Assuming that you have a trivial csv file (no embedded commas):

snippet.julianorepl
julia> const SEP= ","

julia> linematches(haystack::String, needle::String, field::Int)=  occursin( needle, split( haystack, SEP )[field] )

julia> linematches(haystack::String, needle::Real, field::Int)=  ( parse( Float64, split( haystack, SEP )[field] ) == needle )

julia> abc_on3_lines= filter( line->linematches(line, "abc", 3), readlines("filename.csv") )   ## an array of lines, with 'abc' in field 3.
  • For real-world use, instead rely on CSV.jl.
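For experimenting without a file, a sketch of the same field test on an in-memory IOBuffer; the sample rows are made up, and like the code above it assumes no quoted or embedded commas:

```julia
sep = ","

# hypothetical sample data standing in for filename.csv
csvdata = IOBuffer("""
id,name,tag
1,x,abc
2,y,def
3,z,abcd
""")

# keep the lines whose 3rd field contains "abc"
hits = [line for line in eachline(csvdata) if occursin("abc", split(line, sep)[3])]
```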

Memory Buffers (instead of Files) as IOStreams

snippet.juliarepl
julia> iobuf= IOBuffer()          ## We can write-modify this easily
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> print(iobuf, "we are writing to a buffer!\n")

julia> String( take!(iobuf) )      ## convert to a String (could be assigned now); take! also empties iobuf
"we are writing to a buffer!\n"

julia> close(iobuf)

Random File Access

snippet.juliarepl
julia> write("/tmp/zeroseek.bin", zeros(10))         ## each Float64 is 8 bytes long
80

julia> myrwfile= open("/tmp/zeroseek.bin", "r+")
IOStream(<file /tmp/zeroseek.bin>)

julia> seek(myrwfile, 50)
IOStream(<file /tmp/zeroseek.bin>)

julia> position(myrwfile)
50

julia> write(myrwfile, 'X')
1

julia> write(myrwfile, 'Y')
1

julia> seek(myrwfile, 50)
IOStream(<file /tmp/zeroseek.bin>)

julia> x= read(myrwfile, 3)
3-element Array{UInt8,1}:
 0x58
 0x59
 0x00

julia> convert( Vector{Char}, x )
3-element Array{Char,1}:
 'X'
 'Y'
 '\0'

julia> seek(myrwfile, 50)
IOStream(<file /tmp/zeroseek.bin>)

julia> skipchars( x->(x == 'X'), myrwfile )     ## skip over all 'X' characters
IOStream(<file /tmp/zeroseek.bin>)

julia> ( position(myrwfile),  Char( read(myrwfile, 1)[1] ) )
(51, 'Y')

julia> close(myrwfile)
  • skipchars() can also skip over the remainders of lines after a comment character, via its linecomment keyword argument.
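A sketch of skipchars() with its linecomment keyword, on a made-up input that starts with blanks and a comment line:

```julia
io = IOBuffer("   # a comment line\n   42 rest")
skipchars(isspace, io; linecomment='#')   # skip blanks, and everything from '#' to the end of that line
rest = read(io, String)                   # "42 rest"
```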

Reading from stdin, Writing to stdout

Output functions default to stdout, and input functions default to stdin, when no stream is given. Otherwise, the user provides the IO stream.

snippet.juliarepl
julia> write( stdout, "hello\n" )
hello
6

Redirecting stdin, stdout, or stderr

snippet.juliarepl
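julia> oldstdout= stdout;            ## remember the original stdout

julia> ofile= open("/tmp/redirected.txt", "w")
IOStream(<file /tmp/redirected.txt>)

julia> redirect_stdout(ofile)
IOStream(<file /tmp/redirected.txt>)

julia> write( stdout, "my text has been diverted.\n" )
27

julia> redirect_stdout(oldstdout);   ## restore stdout before closing the file

julia> close(ofile)

julia> read("/tmp/redirected.txt", String)
"my text has been diverted.\n"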

There are equivalent redirect_stderr() and redirect_stdin() functions.
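The redirect functions also have a do-block form that restores the original stream automatically when the block finishes. A sketch, writing into a temporary file from mktemp():

```julia
path, io = mktemp()                     # a scratch file, already open for writing
redirect_stdout(io) do                  # stdout points at io only while this block runs
    println("this line goes into the file")
end                                     # stdout is restored here, even if the block throws
close(io)                               # flushes the buffered output to disk
captured = read(path, String)
```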

Reading and Writing Content to Files

Reading (Entire) File Contents into A String ("Slurping")

snippet.juliarepl
julia> mytext= "hello A\nhello B\n";

julia> write("/tmp/string.txt", mytext)                          ## returns number of bytes written
16

julia> read("/tmp/string.txt", String) == mytext
true

Reading Only Until Terminating String is Found

snippet.juliarepl
julia> readuntil("/tmp/string.txt", "A")
"hello "

Reading and Writing (Line-Separated) File Contents into A String Array ("Slurping")

snippet.juliarepl
julia> mylines= [ "a", "b", "c", "d" ];

julia> write( "/tmp/manylines.txt", join(mylines, "\n" )*"\n")     ## there is no writeline().
8

julia> mylines == readlines("/tmp/manylines.txt")                ## default: keep=false, i.e., line endings are removed
true

julia> for lncontent=eachline("/tmp/manylines.txt"); println(lncontent); end#for
a
b
c
d

Counting Lines in Files

snippet.juliarepl
julia> countlines("/tmp/manylines.txt")
4

Personalized Show Objects

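For your own types, you can define Base.show(io::IO, x::MyType) to control the compact text form used by print(), string(), and string interpolation, and Base.show(io::IO, ::MIME"text/plain", x::MyType) to control the more verbose form that the REPL displays. Methods for other MIME types (e.g., MIME"text/html") can serve other displays, too.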

Personalized Representation and Printing with Show()

snippet.juliarepl
julia> struct FFF; fv::Float64; end#struct

julia> Base.show(io::IO, f::FFF)= print(io, "Your FFF type fv value is $(f.fv)")

julia> f= FFF(2.0)
Your FFF type fv value is 2.0
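A sketch of the two show() forms side by side; Temperature is a hypothetical example type. The two-argument method is used by print(), string(), and interpolation, while the REPL displays values through the MIME"text/plain" method:

```julia
struct Temperature          # hypothetical example type
    celsius::Float64
end

# two-argument show: compact form, used by print, string(), and "$t"
Base.show(io::IO, t::Temperature) = print(io, t.celsius, "°C")

# three-argument show: verbose form, used by the REPL's display
Base.show(io::IO, ::MIME"text/plain", t::Temperature) =
    print(io, "Temperature: ", t.celsius, " degrees Celsius")

t = Temperature(21.5)
println(t)                                         # prints 21.5°C
show(stdout, MIME"text/plain"(), t); println()     # prints Temperature: 21.5 degrees Celsius
```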

Guaranteeing Atomic Appends (With Parallel Writes)

True atomic writes are unfortunately not a native feature of Julia. Moreover, lock() and unlock() on a ReentrantLock are advisory and reentrant, but (as of Julia 1.0) they coordinate tasks, not threads, so they are not thread-safe.

Thread-Safe Atomic File Appends

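OS-level file locking unfortunately differs from OS to OS. For shared deposits of data by many simultaneous worker threads of the same Julia process, you can give each chunk of work its own IOBuffer and let a single task append all buffers to the file at the end, with something like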

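snippet.julia
using Base.Threads
bufs= [IOBuffer() for i in 1:nthreads()]     ## one private buffer per chunk of work
@threads for c in 1:nthreads(); write(bufs[c], "somedata from chunk $c\n"); end   ## chunk c writes only to bufs[c]
open(fo->foreach(b->write(fo, take!(b)), bufs), "/tmp/results.txt", "a")        ## then one task appends everything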

Operating-System and Process Safe Atomic File Appends

If there are multiple programs that all want to append their results to a file, and all agree to write only via safeappend, then it should work:

snippet.julia
using FileWatching

function safeappend(filename::AbstractString, content::AbstractString; MAXWAITNUM::Int=500, LockAppend=".jlock")::Nothing

    lockname= "$filename$LockAppend"; waitnum= 0;
    while (stat(lockname).inode != 0)          ## the lock file exists: another process is appending
        waitnum+= 1;
        @assert( waitnum < MAXWAITNUM, "$lockname was held by another process for $MAXWAITNUM timeunits." )
        watch_file(lockname, 0.05)             ## blocks until the lock file changes (e.g., is removed), timeout .05 seconds
    end#while#

    ## race window: another process could create the lock between the check above and touch() below.
    touch( lockname )                          ## write an empty lock file.

    try
        open( filename, "a" ) do fo
            println(fo, content)               ## safeappend is enforcing eoln!
        end#do##
    finally
        rm(lockname)                           ## release the lock, even if the append failed
    end

    return nothing
end#function#
  • For even safer versions:
    • the lock could contain a random code, and safeappend checks that the random code is its own before it appends to the file.
    • if some other processes may be writing to the file without safeappend (e.g., /var/log/syslog), the file can first be copied to a different name, then appended, then moved back–but only if no other file of the same name has appeared. If another file has appeared, then the changes should be discarded and tried all over again. This is an expensive way to handle the problem that is best left only for absolutely necessary cases.
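One commonly used alternative, swapped in here rather than taken from the text above, avoids the check-then-touch race by using mkdir() as the lock: creating a directory fails if it already exists, and on local POSIX file systems this is a single atomic step. A sketch; with_dirlock, safeappend_mkdir, and the .jlockdir suffix are hypothetical names, and network file systems may not give the same atomicity guarantee:

```julia
# hypothetical helper: run f() while holding a lock directory
function with_dirlock(f, lockname::AbstractString; maxtries::Int=500, pause::Real=0.05)
    for attempt in 1:maxtries
        acquired = try
            mkdir(lockname)          # atomic: throws if lockname already exists
            true
        catch
            false                    # held by someone else (or not creatable): try again later
        end
        if acquired
            try
                return f()
            finally
                rm(lockname)         # always release, even if f() throws
            end
        end
        sleep(pause)
    end
    error("could not acquire $lockname after $maxtries attempts")
end

# hypothetical mkdir-based variant of safeappend
function safeappend_mkdir(filename::AbstractString, content::AbstractString)
    with_dirlock("$filename.jlockdir") do
        open(io -> println(io, content), filename, "a")
    end
    return nothing
end
```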

Performance of Quasi-Locks using Different File Systems

The following are not comparable across rows, only within row.

| OS / file system | mv-to + mv-back | touch + rm | symlink + rm | glob(*) | Computer |
| macOS HFS+ | 6.3 | 8.8 | 53.3 | 3.1 | 4.2GHz i7-7700 |
| macOS APFS | 11.1 | 7.7 | 6.4 | 21.7 | 3.2GHz Xeon W |
| linux ext4 | 1.1 | 1.5 | 1.1 | 2.0 | VM on VM, 3.2GHz i5-6500 |
| Windows | ? | | | | |
  • Do not use symlinks on HFS+. Fortunately, APFS has replaced HFS+ as the default file system in recent versions of macOS.
  • Do not use mv-to-mv-back on APFS. Instead, use the symlink version.
  • In these measurements, the linux ext4 file system was nearly an order of magnitude faster than the macOS file systems, although the machines differ (see the caution above).
snippet.julia
using BenchmarkTools, Glob;
touch("bmrw.txt")                  ## the file that the mv loop renames back and forth
@btime begin; for i=1:100000; mv("bmrw.txt", "bmrwb.txt"); mv("bmrwb.txt", "bmrw.txt"); end; end##rename
@btime begin; for i=1:100000; touch("bmrwb.txt"); rm("bmrwb.txt"); end; end##create and destroy lockfile
@btime begin; for i=1:100000; symlink("bmrw.txt", "bmrwb.txt"); rm("bmrwb.txt"); end; end## create and destroy locksym
@btime begin; for i=1:100000; glob("bmrw*.txt"); end; end##wait until the .lock disappears
rm.(glob("bmrw*.txt"))             ## backtick commands do not expand shell wildcards, so use glob()

Backmatter

Useful Packages on Julia Repository

Notes

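  • There is also Mmap.mmap() (in the Mmap standard library), which maps a file (given as a filename or an open IOStream) into memory as an array, so that the file's contents can be read and modified like an ordinary Array.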
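A sketch of Mmap.mmap() on a small scratch file (from tempname()); changes made through the array are written back to the file because mappings are shared by default:

```julia
using Mmap                                     # standard library

path = tempname()
write(path, "hello")                           # 5 bytes on disk

open(path, "r+") do io
    bytes = Mmap.mmap(io, Vector{UInt8}, 5)    # the file's 5 bytes, viewed as an array
    bytes[1] = UInt8('J')                      # modify the file through the array
    Mmap.sync!(bytes)                          # flush the change to disk
end

println(read(path, String))                    # prints Jello
```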

References

fileio.txt · Last modified: 2018/11/22 20:48 (external edit)