fileio

fileio [2018/12/27 13:27] (current)
 +
 +~~CLOSETOC~~
 +
 +~~TOC 1-3 wide~~
 +
 +```juliarepl
 +julia> pkgchk.( [ "​julia"​ => v"​1.0.3"​ ] );
 +```
 +
 +# Files (IOStreams)
 +
 +Julia stores open-file information in `IOStream` objects.  Common file operations (such as `open()`, `print()`, `read()`, and `close()`) work as expected.  (In turn, `IOStream` is a subtype of `IO`, which facilitates such features as detecting whether the destination is a TTY or a file, whether it is buffered, whether it allows random access, etc.)
 +
 +
 +Most I/O functions default to stdin, stdout, or stderr.  Specifically, file-related functions that accept an IO (or IOStream) typically have one method in which the IO is the first argument.  A second method of the same function, without the IO argument, simply forwards to the first and supplies the default destination, which is typically stdin, stdout, or stderr.
 +
 +Moreover, IO objects can also point to memory buffers and to strings instead of files (via `IOBuffer`), thus allowing input-stream-related functions to work from buffers/strings instead of from files.
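
The two-method convention described above can be sketched as follows.  This is only an illustration: `report` is a hypothetical function, not part of Julia.

```julia
# `report` is a hypothetical function, used only to illustrate the convention.
report(io::IO, x) = println(io, "value: ", x)   # method with an explicit destination
report(x) = report(stdout, x)                   # convenience method: defaults to stdout

report(42)                      # writes "value: 42" to the terminal

buf = IOBuffer()                # an in-memory IO object
report(buf, 42)                 # same function, different destination
captured = String(take!(buf))   # "value: 42\n"
```

Because the destination is just an `IO` argument, the same `report` works unchanged on files, buffers, and the terminal.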
 +
 +For writing entire objects to various file formats, see [[fileformats]].
 +
 +
 +
 +## List of (Prominent) IOStream Reading and Writing Functions
 +
 +### Write Output Stream Functions
 +
 +The following are prominent functions which are used primarily for writing to file output:
 +
 +^ **Function** ​ ^ **Notes** ​ ^
 +| print( [os,] | universal writer.  converts its arguments to text (via show where needed) and writes the text |
 +| println( [os,] | prints with trailing "\n" |
 +| print_shortest( [os,] | tries to convert to a shorter string.  no longer exported in Julia 1.0 |
 +| write( [os,] | writes a binary representation of the argument |
 +| writedlm( [os,] | writes in delimited form (e.g., csv); in the `DelimitedFiles` standard library.  see [[fileformats]] |
 +| serialize( [os,] | writes a binary representation of a Julia object; in the `Serialization` standard library.  see [[fileformats]] |
 +| flush( os | flushes the output buffer now |
 +
 +
 +#### String vs. Byte Representations
 +
 +```juliarepl
 +julia> x= 7310302560386184563; ​           ## (thanks steveng)
 +
 +julia> println(x); ​                      ## convert number x to string and print
 +7310302560386184563
 +
 +julia> write(stdout,​ x); println(); ​     ## it's the byte representation!!!
 +surprise
 +```
 +
 +`print()` by default converts objects to text (encoded as UTF-8) and writes that text, whereas `write()` emits the raw bytes.  Types can also define their own `show()` methods to control how they are printed.
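
A minimal sketch of the same text-versus-bytes distinction, using an in-memory buffer so that the bytes can be inspected (the value 258 is arbitrary):

```julia
buf = IOBuffer()

print(buf, 258)              # text form: the three characters '2', '5', '8'
textbytes = take!(buf)       # UInt8[0x32, 0x35, 0x38]

write(buf, Int16(258))       # binary form: the two bytes of an Int16, in native byte order
rawbytes = take!(buf)

lengths = (length(textbytes), length(rawbytes))   # (3, 2)
```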
 +
 +
 +#### Colored and Boldfaced Output
 +
 +Julia also has some support for tty display and colored output, warnings, logging, etc:
 +
 +^ **Function** ​ ^ **Notes** ​ ^
 +| displaysize() | returns (rows, columns) |
 +| `printstyled( [os,] ` | can print in color or boldface (replaces `print_with_color()` from Julia 0.6) |
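
A short sketch of colored output, assuming Julia 1.0 or later, where styled printing is done with `printstyled()` and its `color` and `bold` keywords.  The messages are made up.

```julia
printstyled("plain red text\n"; color=:red)
printstyled(stderr, "bold yellow text on stderr\n"; color=:yellow, bold=true)

rows, cols = displaysize(stdout)    # falls back to a default size if stdout is not a terminal
println("the display has $rows rows and $cols columns")
```

The escape codes are emitted only when the destination supports color, so the same calls are harmless when output goes to a file.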
 +
 +#### Logging Info, Warnings, and Errors
 +
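 +Julia also provides support for logging output, with the `@info`, `@warn`, and `@error` macros, which print in highlight when on a TTY.  (`error()` is different: it throws an exception.)  To set a different default destination, install another logger with `global_logger()` or `with_logger()` from the `Logging` standard library.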
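
A minimal sketch of redirecting log messages, assuming the `Logging` standard library of Julia 1.0 (`SimpleLogger`, `with_logger`).  The messages and the `n=3` key are made up.

```julia
using Logging

buf = IOBuffer()                                  # any IO works here, e.g. an open file
with_logger(SimpleLogger(buf, Logging.Info)) do
    @info "starting run" n=3                      # key=value pairs are logged with the message
    @warn "disk almost full"
end
logtext = String(take!(buf))
print(logtext)
```

`with_logger` changes the destination only inside the `do` block; `global_logger(SimpleLogger(io))` would change it for the rest of the session.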
 +
 +#### Many Other Julia Functions That Print
 +
 +Many common functions, like `versioninfo()`, `join()`, `dump()`, `showerror()` etc., are not principally file-writing but terminal-display operations.  Because they accept an IO object as an optional first argument, they can also write to any IOStream file instead of the terminal.
 +
 +
 +
 +### Read Input Stream Functions
 +
 +^ **Function** ​ ^ **Notes** ​ ^
 +| countlines( [is,] | counts the lines in the stream |
 +| eachline( [is,]   | iterator over each line of the stream |
 +| readdlm( [is,] | reads delimited-form files (e.g., csv); in the `DelimitedFiles` standard library.  see [[fileformats]] |
 +| deserialize( [is,] | reads a binary representation of a Julia object back; in the `Serialization` standard library |
 +| read(!)( [is,] | reads bytes (UInt8).  keyword `all=false` means only one read call, without waiting for eof |
 +| readbytes!( [is,] | like read!, but with the number of bytes to be read |
 +| readline( [is,]   | reads one line into a string  |
 +| readlines( [is,]  | reads all lines into an array of strings  |
 +| read( [is,] String) | reads the entire file into a string (replaces `readstring()` from Julia 0.6) |
 +| readuntil( [is,] | reads until a terminating string is found |
 +
 +Reading beyond the end of a file typically does *not* throw an exception.
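
A small sketch of what "typically" means here, using an `IOBuffer` in place of a file.  Line-oriented reads return an empty result at the end of the stream, but a typed read of a single value does throw.

```julia
io = IOBuffer("only line\n")

first_line  = readline(io)    # "only line"
second_line = readline(io)    # already at the end: returns "" rather than throwing
at_end      = eof(io)         # true

# Typed reads are stricter: asking for a byte that is not there raises an EOFError.
threw_eof = try
    read(io, UInt8)
    false
catch err
    err isa EOFError
end
```

Checking `eof(io)` before reading is therefore the safe way to distinguish an empty line from the end of the input.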
 +
 +
 +### Read-Write Random Access Functions
 +
 +^ **Function** ​ ^ **Notes** ​ ^
 +| seek(io, | position the pointer |
 +| flush(io, | flush the buffer now |
 +| seekstart(io,​ | go to start |
 +| ismarked(io,​ | does stream have a marked position? |
 +| reset(io, | go back to marked position|
 +| mark(io, | mark the current position |
 +| unmark(io, | and remove the mark |
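
A minimal sketch of the mark/reset functions from the table, again on an `IOBuffer` (the same calls work on a seekable file):

```julia
io = IOBuffer("header\nbody\n")

mark(io)                   # remember the current position (the start)
peeked = readline(io)      # "header"; the position has now advanced
reset(io)                  # jump back to the mark; this also removes the mark
again = readline(io)       # "header" once more

still_marked = ismarked(io)    # false, because reset() consumed the mark
seekstart(io)                  # back to the very beginning, no mark needed
```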
 +
 +
 +
 +
 +
 +
 +## Open Files
 +
 +### Opening, Operating on, and Closing Files (IOStreams)
 +
 +There is more than one way to do this:
 +
 +1. `ostream= open(name, mode); ​ operate(ostream);​ close(ostream);​`
 +
 +2. `open(name, mode) do ostream; ​ operate(ostream);​ end;#do##`
 +
 +3. `open( fout->​operate(fout),​ name, mode );`
 +
 +4. `operate( open(name, mode) )`;  WARNING If operate is for **write** operations, you must run `GC.gc()` to force the flush and close.
 +
 +Examples follow.
 +
 +
 +#### Sequential Open and Close (with Explicit Close)
 +
 +* Writing
 +
 +```juliarepl
 +julia> ofile= open("/​tmp/​fileio.tmp",​ "​w"​)
 +IOStream(<​file /​tmp/​fileio.tmp>​)
 +
 +julia> println(ofile, "fileio example, plain open and close\nLine 2")   ## prints string to ofile
 +
 +julia> close(ofile)
 +```
 +
 +* Reading
 +
 +```juliarepl
 +julia> ifile= open("/​tmp/​fileio.tmp"​) ​       ## "​r"​ is the default
 +IOStream(<​file /​tmp/​fileio.tmp>​)
 +
 +julia> readline(ifile)                       ## reads one \n-terminated line from ifile into a string
 +"​fileio example, plain open and close"
 +
 +julia> close(ifile)                          ## good practice: otherwise the stream stays open until ifile is garbage-collected
 +```
 +
 +
 +#### IO Loops and Open-File Mode Information (with AutoClose)
 +
 +A `do...end` construct guarantees that the file will be closed when done:
 +
 +
 +* Writing
 +
 +```juliarepl
 +julia> ​ open("/​tmp/​fileio.tmp",​ "​w"​) do ofile;
 +     println(ofile,​ "​fileio opendo example"​);​
 + end#do##
 +```
 +
 +* Reading
 +
 +```juliarepl
 +julia> ​ open("/​tmp/​fileio.tmp"​) do ifile;
 +     readline(ifile);​
 + end#do##
 +"​fileio opendo example"​
 +```
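
The guarantee also holds when the block fails.  The following sketch (with a temporary file and a deliberately simulated error) keeps a reference to the stream only to check afterwards that it was closed:

```julia
path = joinpath(mktempdir(), "fileio.tmp")    # temporary file, for illustration only
handle = Ref{IOStream}()                      # lets us inspect the stream after the block

try
    open(path, "w") do ofile
        handle[] = ofile
        println(ofile, "partial output")
        error("simulated failure")            # the block is left via an exception
    end
catch err
    println("caught: ", err.msg)
end

closed_anyway = !isopen(handle[])             # true: open() closed the file regardless
written = read(path, String)                  # the buffered output was flushed on close
```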
 +
 +
 +#### Embedded Function (with AutoClose)
 +
 +`open()` can accept a function as its first argument.  Julia opens the file, runs the function, and then closes the file.  Very clean and compact for short input or output:
 +
 +* Writing
 +
 +```juliarepl
 +julia> open( ofile->​println(ofile,​ "​fileio embedded function example"​), ​  "/​tmp/​fileio.tmp",​ "​w"​ )
 +```
 +
 +* Reading
 +
 +```juliarepl
 +julia> open( ifile->​println("​readback:​ '",​ readline(ifile)), ​  "/​tmp/​fileio.tmp"​ )
 +readback: '​fileio embedded function example
 +```
 +
 +
 +
 +#### One-Off File Operations (with Lazy AutoClose)
 +
 +Julia closes IOStreams at memory destruction time, which in turn happens at garbage collection time.  This can be (ab-)used when there is only one IOStream operation to be performed, as follows:
 +
 +* Writing
 +
 +```juliarepl
 +julia> println( open("/​tmp/​fileio.tmp",​ "​w"​),​ "​fileio lazy example"​) ​  ## close at IOStream garbage collect
 +
 +julia> GC.gc() ​          ## force flush and close, rather than just wait for nature to take its course
 +```
 +
 +* Reading
 +
 +```juliarepl
 +julia> readline(open("/​tmp/​fileio.tmp"​)) ​     ## don't worry. ​ flush for read is superfluous.
 +"​fileio lazy example"​
 +```
 +
 +
 +#### (Some) Functions With Integrated Filename Arguments
 +
 +Some functions (like `eachline()`,​ `read()`, `readline()`,​ `readuntil()`,​ and `write()`) can take a filename as their first argument. ​ They will open, operate on, and then close the file themselves. ​ For example,
 +
 +```juliarepl
 +julia> write("/​tmp/​fileio.tmp",​ "this is the time for all good men..."​) ​           ## returns number of bytes written
 +36
 +
 +julia> read("/​tmp/​fileio.tmp",​ String)
 +"this is the time for all good men..."​
 +
 +```
 +
 +
 +
 +### Learning Filename Of An Open IOStream
 +
 +```juliarepl
 +julia> f= open("/​etc/​passwd"​);​ f.name
 +"<​file /​etc/​passwd>"​
 +
 +julia> replace(f.name,​ r"​^<​file (.*)\>​$"​ => s"​\1"​)
 +"/​etc/​passwd"​
 +```
 +
 +
 +### Learning File Status of an Open IOStream
 +
 +```juliarepl
 +julia> open("/​etc/​passwd","​r"​) do ifile;
 +   println( "/​etc/​passwd:​ eof=$(eof(ifile)). ", "​isopen=$(isopen(ifile)). ",
 +                  "​isreadonly=$(isreadonly(ifile)). ", "​iswritable=$(iswritable(ifile))."​ )
 +       ​end;#​do##​
 +/​etc/​passwd:​ eof=false. isopen=true. isreadonly=true. iswritable=false.
 +```
 +
 +
 +
 +
 +### Strings (instead of Files) as IOStreams
 +
 +#### Reading from a String instead of an IOStream (File)
 +
 +```juliarepl
 +julia> reallyfromastring= IOBuffer("​I am a string masquerading as a file (well, IOBuffer, not IOString)"​);​
 +
 +julia> read( reallyfromastring,​ String )
 +"I am a string masquerading as a file (well, IOBuffer, not IOString)"​
 +```
 +
 +#### Writing to a String instead of an IOStream (File)
 +
 +In Julia, strings are immutable, so it is not possible to write into an existing string.  Ideally, you construct a string in one operation.  For example:
 +
 +```juliarepl
 +julia> using Printf
 +
 +julia> s= string( @sprintf("​%5.5f",​ 12.0), " AND ", @sprintf("​%5d",​ 13) );  ## in effect "print to s"
 +
 +julia> println( s )
 +12.00000 AND    13
 +
 +```
 +
 +
 +* If you need to build up a string with many append operations, you can write to an in-memory `IOBuffer` and then convert its contents to a string (like `s= String(take!(iobuf))`).  The memory-buffer section below shows this.
 +
 +* If you need to write and rearrange the contents in a random access manner, use a character buffer instead.
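
For the many-appends case, `sprint()` wraps the buffer handling: it creates a temporary buffer, passes it to the given function, and returns the result as a string.  A minimal sketch with made-up data:

```julia
csvline = sprint() do io
    for (i, x) in enumerate([1.5, 2.5, 3.5])
        i > 1 && print(io, ",")    # separator before every element but the first
        print(io, x)
    end
end
```

Here `csvline` ends up as `"1.5,2.5,3.5"`, built without creating an intermediate string per append.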
 +
 +
 +#### Writing a Function that Can Operate on an IOBuffer or IOStream
 +
 +```juliarepl
 +julia> printme( io::IO )= println( "​printme:​ $(read(io, String))"​ );
 +
 +julia> printme( IOBuffer("​my info\nis here\n"​) );
 +printme: my info
 +is here
 +
 +```
 +
 +
 +
 +### Reading Lines Containing String X in Field N
 +
 +Assuming that you have a trivial csv file (no embedded commas):
 +
 +```julianorepl
 +
 +julia> const SEP= ","
 +
 +julia> linematches(haystack::AbstractString, needle::AbstractString, field::Int)=  occursin( needle, split( haystack, SEP )[field] )   ## occursin takes the needle first
 +
 +julia> linematches(haystack::AbstractString, needle::Real, field::Int)=  (parse( Float64, split( haystack, SEP )[field] ) == needle)
 +
 +julia> abc_on3_lines= filter( ln->linematches(ln, "abc", 3), readlines("filename.csv") )   ## an array of strings, with 'abc's in field 3.
 +
 +```
 +
 +* For real-world use, instead rely on [CSV.jl](https://​juliadata.github.io/​CSV.jl).
 +
 +
 +### Memory Buffers (instead of Files) as IOStreams
 +
 +```juliarepl
 +julia> iobuf= IOBuffer() ​         ## We can write-modify this easily
 +IOBuffer(data=UInt8[...],​ readable=true,​ writable=true,​ seekable=true,​ append=false,​ size=0, maxsize=Inf,​ ptr=1, mark=-1)
 +
 +julia> print(iobuf,​ "we are writing to a buffer!\n"​)
 +
 +julia> String( take!(iobuf) )      ## could be assigned to a string now.  take! empties iobuf
 +"we are writing to a buffer!\n"​
 +
 +julia> close(iobuf)
 +
 +```
 +
 +
 +### Random File Access
 +
 +```juliarepl
 +julia> write("/​tmp/​zeroseek.bin",​ zeros(10)) ​        ## each Float64 is 8 bytes long
 +80
 +
 +julia> myrwfile= open("/​tmp/​zeroseek.bin",​ "​r+"​)
 +IOStream(<​file /​tmp/​zeroseek.bin>​)
 +
 +julia> seek(myrwfile,​ 50)
 +IOStream(<​file /​tmp/​zeroseek.bin>​)
 +
 +julia> position(myrwfile)
 +50
 +
 +julia> write(myrwfile,​ '​X'​)
 +1
 +
 +julia> write(myrwfile,​ '​Y'​)
 +1
 +
 +julia> seek(myrwfile,​ 50)
 +IOStream(<​file /​tmp/​zeroseek.bin>​)
 +
 +julia> x= read(myrwfile,​ 3)
 +3-element Array{UInt8,​1}:​
 + 0x58
 + 0x59
 + 0x00
 +
 +julia> convert( Vector{Char},​ x )
 +3-element Array{Char,​1}:​
 + '​X'​
 + '​Y'​
 + '​\0'​
 +
 +julia> seek(myrwfile,​ 50)
 +IOStream(<​file /​tmp/​zeroseek.bin>​)
 +
 +julia> skipchars( x->(x == '​X'​),​ myrwfile )     ## skip over all '​X'​ characters
 +IOStream(<​file /​tmp/​zeroseek.bin>​)
 +
 +julia> ( position(myrwfile), ​ Char( read(myrwfile,​ 1)[1] ) )
 +(51, '​Y'​)
 +
 +julia> close(myrwfile)
 +
 +```
 +
 +* `skipchars()` can also skip over the remainder of a line after a comment character, via its `linecomment` keyword argument.
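
A minimal sketch of the comment-skipping variant, assuming the `linecomment` keyword of `skipchars()` in Julia 1.0.  The input text is made up.

```julia
io = IOBuffer("   # a comment line\n  data")

skipchars(isspace, io; linecomment='#')   # skips blanks, and everything from '#' to end of line
rest = read(io, String)                   # what is left starts at the first real character
```

After the call, the stream is positioned at `data`, which makes this handy for skipping headers in simple text formats.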
 +
 +
 +
 +### Reading from stdin, Writing to stdout
 +
 +File output operations default to stdout.  File input operations default to stdin.  Otherwise, the user provides the IO object explicitly as the first argument.
 +
 +```juliarepl
 +julia> write( stdout, "​hello\n"​ )
 +hello
 +6
 +```
 +
 +
 +
 +### Redirecting stdin, stdout, or stderr
 +
 +```juliarepl
 +julia> ofile= open("/​tmp/​redirected.txt",​ "​w"​)
 +IOStream(<​file /​tmp/​redirected.txt>​)
 +
 +julia> redirect_stdout(ofile)
 +IOStream(<​file /​tmp/​redirected.txt>​)
 +
 +julia> write( stdout, "my text has been diverted.\n"​ )
 +27
 +
 +julia> close(ofile)
 +
 +julia> read("/​tmp/​redirected.txt",​ String)
 +"my text has been diverted.\n"​
 +```
 +
 +There are equivalent `redirect_stderr()` and `redirect_stdin()` functions.
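
The redirect functions also have a form that takes a function first, which restores the previous stdout when the block ends.  A minimal sketch with a temporary file:

```julia
path = joinpath(mktempdir(), "redirected.txt")   # temporary file, for illustration only

open(path, "w") do ofile
    redirect_stdout(ofile) do
        println("captured line")      # goes to the file, not to the terminal
    end
end                                   # stdout is restored, and the file is closed

captured = read(path, String)
println("back on the terminal: ", repr(captured))
```

This avoids leaving stdout pointing at a closed file, which can happen with the sequential version if the restore step is forgotten.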
 +
 +
 +
 +
 +## Reading and Writing Content to Files
 +
 +### Reading (Entire) File Contents into A String ​ ("​Slurping"​)
 +
 +```juliarepl
 +julia> mytext= "hello A\nhello B\n";
 +
 +julia> write("/tmp/string.txt", mytext)                          ## returns number of bytes written
 +16
 +
 +julia> read("/​tmp/​string.txt",​ String) == mytext
 +true
 +```
 +
 +#### Reading Only Until Terminating String is Found
 +
 +```juliarepl
 +julia> readuntil("/​tmp/​string.txt",​ "​A"​)
 +"hello "
 +
 +```
 +
 +
 +### Reading and Writing (Line-Separated) File Contents into A String Array ("​Slurping"​)
 +
 +```juliarepl
 +julia> mylines= [ "​a",​ "​b",​ "​c",​ "​d"​ ];
 +
 +julia> write( "/​tmp/​manylines.txt",​ join(mylines,​ "​\n"​ )*"​\n"​) ​    ## there is no writeline().
 +8
 +
 +julia> mylines == readlines("/tmp/manylines.txt")                ## default: keep=false (line endings are dropped)
 +true
 +
 +julia> for lncontent=eachline("/​tmp/​manylines.txt"​);​ println(lncontent);​ end#for
 +a
 +b
 +c
 +d
 +```
 +
 +#### Counting Lines in Files
 +
 +```juliarepl
 +julia> countlines("/​tmp/​manylines.txt"​)
 +4
 +```
 +
 +
 +
 +
 +### Personalized Show Objects
 +
 +FIXME Show how to write personalized `show` functions.  For their own types, users can define `show(stream, ::MIME"something", x::MyType)= ...` to control how objects of that type are output.  Moreover, `show(io, x)` can be used for other displays, too.
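
A minimal sketch of the two kinds of `show` methods.  `Temperature` is a hypothetical type, used only for illustration.

```julia
struct Temperature      # hypothetical example type
    kelvin::Float64
end

# Compact form: used by print(), string(), and when the object sits inside a container.
Base.show(io::IO, t::Temperature) = print(io, t.kelvin, " K")

# Richer form: used when the REPL displays the object on its own.
Base.show(io::IO, ::MIME"text/plain", t::Temperature) =
    print(io, "Temperature of ", t.kelvin, " kelvin")

short = string(Temperature(300.0))
long  = sprint(show, MIME("text/plain"), Temperature(300.0))
```

If only the two-argument method is defined, the REPL falls back to it, so the MIME method is needed only when the two displays should differ.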
 +
 +
 +
 +
 +### Personalized Representation and Printing with Show()
 +
 +```juliarepl
 +julia> struct FFF; fv::​Float64;​ end#struct
 +
 +julia> Base.show(io::​IO,​ f::FFF)= print(io, "Your FFF type fv value is $(f.fv)"​)
 +
 +julia> f= FFF(2.0)
 +Your FFF type fv value is 2.0
 +
 +
 +```
 +
 +
 +
 +## Guaranteeing Atomic Appends (With Parallel Writes)
 +
 +True atomic writes are unfortunately *not* a native feature of Julia.  Moreover, `lock()` and `unlock()` work on reentrant locks that are only advisory and (as of Julia 1.0) not thread-safe.
 +
 +
 +### Thread-Safe Atomic File Appends
 +
 +OS-type file locking is unfortunately different from OS to OS.  For shared deposits of data by many simultaneous worker threads of the same Julia process, you can use something like
 +
 +```juliafix
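 +z = [IOBuffer() for i in 1:Threads.nthreads()]     ## one private buffer per thread
 +write(z[Threads.threadid()], "somedata");          ## assumes a thread keeps its id during the write (true for Julia 1.0 @threads loops)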
 +```
 +
 +FIXME fix the IOBuffer synchronized writing example
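
One way to sidestep synchronization altogether is to give every work item (rather than every thread) its own buffer, and to merge the buffers on the main thread afterwards.  The following is only a sketch under that assumption; the chunk count, the messages, and the output file are made up.

```julia
using Base.Threads

nchunks = 10
bufs = [IOBuffer() for _ in 1:nchunks]        # one private buffer per work item

@threads for i in 1:nchunks
    println(bufs[i], "result of chunk $i")    # only iteration i ever touches bufs[i]
end

# Single-threaded from here on: merge in a fixed order, then append with one write.
merged = join(String(take!(b)) for b in bufs)
outfile = joinpath(mktempdir(), "threads.txt")   # hypothetical destination file
open(io -> write(io, merged), outfile, "a")
```

Because no buffer is shared, no lock is needed, and the merged output is in a deterministic order regardless of how the threads were scheduled.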
 +
 +
 +
 +### Operating-System and Process Safe Atomic File Appends
 +
 +If there are multiple programs that all want to append their results to a file, and all agree to write only via `safeappend`,​ then it should work:
 +
 +
 +```julia
 +using FileWatching
 +
 +function safeappend(filename::​AbstractString,​ content::​AbstractString;​ MAXWAITNUM::​Int=500,​ LockAppend="​.jlock"​)::​Nothing
 +
 +    lockname= "​$filename$LockAppend";​ waitnum= 0;
 +    while (stat(lockname).inode != 0)
 +        waitnum+= 1;
 +        @assert( waitnum < MAXWAITNUM, "​$lockname was held by another process for $MAXWAITNUM timeunits."​ )
 +        watch_file(lockname, 0.05)     ## blocks until the lock file changes, or until the .05 second timeout
 +    end#while#
 +
 +    ## very short race condition.
 +    touch( lockname ) ## write an empty lock file.
 +
 +    fo= open( filename, "​a"​ )
 +    println(fo, content) ##​ safeappend is enforcing eoln!
 +    close(fo);
 +
 +    rm(lockname)
 +
 +end#​function#​
 +
 +```
 +
 +* For even safer versions:
 +
 +  - the lock could contain a random code, and safeappend checks that the random code is its own before it appends to the file.
 +
 +  - if some other processes may be writing to the file without safeappend (e.g., /​var/​log/​syslog),​ the file can first be copied to a different name, then appended, then moved back--but only if no other file of the same name has appeared. ​ If another file has appeared, then the changes should be discarded and tried all over again. ​ This is an expensive way to handle the problem that is best left only for absolutely necessary cases.
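
The random-code idea from the first point could look roughly like the sketch below.  `try_token_lock` is a hypothetical helper, not part of `safeappend` above, and it narrows the race window rather than eliminating it.

```julia
using Random

# Hypothetical helper: returns our token if we appear to hold the lock, otherwise nothing.
function try_token_lock(lockname::AbstractString)
    ispath(lockname) && return nothing       # somebody else already holds the lock
    token = randstring(16)
    write(lockname, token)
    # If another process wrote its own token in the meantime, back off.
    return read(lockname, String) == token ? token : nothing
end

lockname = joinpath(mktempdir(), "results.txt.jlock")   # hypothetical lock file
mine  = try_token_lock(lockname)      # first attempt: the lock is free, so we get a token
other = try_token_lock(lockname)      # second attempt: the lock file exists, so we get nothing
rm(lockname)                          # release the lock
```

A caller would append to the data file only when it received a token, and retry otherwise.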
 +
 +
 +
 +#### Performance of Quasi-Locks using Different File Systems
 +
 +The timings below were measured on different computers, so they are comparable only within a row, not across rows.
 +
 +^ **OS**      ^  mv-to mv-back ^  touch rm ^  symlink rm ^  +glob(*) ^ Computer  ^
 +| macOS HFS+   |   6.3 |  8.8 |  53.3 |   3.1 | 4.2GHz i7-7700  |
 +| macOS APFS   |  11.1 |  7.7 |   6.4 |  21.7 | 3.2GHz Xeon W  |
 +| Linux ext4   |   1.1 |  1.5 |   1.1 |   2.0 | VM on VM, 3.2GHz i5-6500  |
 +| Windows ?    |       |      |       |       |   |
 +
 +
 +* Do not use symlinks on HFS+.  Fortunately, HFS+ has been replaced by APFS as the default file system in recent versions of macOS.
 +
 +* Do not use mv-to-mv-back on APFS.  Instead, use the symlink version.
 +
 +* The Linux ext4 filesystem is nearly an order of magnitude faster than the macOS file systems.
 +
 +
 +```julia
 +using BenchmarkTools, Glob;
 +touch("bmrw.txt")     ## create the file that the benchmarks below rename and link to
 +@btime begin; for i=1:100000; mv("bmrw.txt", "bmrwb.txt"); mv("bmrwb.txt", "bmrw.txt"); end; end##rename
 +@btime begin; for i=1:100000; touch("bmrwb.txt"); rm("bmrwb.txt"); end; end##create and destroy lockfile
 +@btime begin; for i=1:100000; symlink("bmrw.txt", "bmrwb.txt"); rm("bmrwb.txt"); end; end## create and destroy locksym
 +@btime begin; for i=1:100000; glob("bmrw*.txt"); end; end##wait until the .lock disappears
 +foreach(rm, glob("bmrw*.txt"))     ## clean up; Julia's run() would not expand the shell wildcard
 +```
 +
 +
 +# Backmatter
 +
 +## Useful Packages on Julia Repository
 +
 +## Notes
 +
 +* There is also `Mmap.mmap()` (in the `Mmap` standard library), which maps a file into memory so that its contents can be read and written like an array.
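
A minimal sketch of memory-mapped access, assuming the `Mmap` standard library of Julia 1.0.  The file name and size are made up.

```julia
using Mmap

path = joinpath(mktempdir(), "mmap.bin")    # temporary file, for illustration only
write(path, zeros(UInt8, 16))               # 16 zero bytes on disk

open(path, "r+") do io
    A = Mmap.mmap(io, Vector{UInt8}, 16)    # the file contents, viewed as a byte array
    A[1] = 0x58                             # assigning to the array changes the file
    Mmap.sync!(A)                           # force the change out to disk
end

firstbyte = read(path)[1]                   # 0x58, i.e. 'X'
```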
 +
 +
 +## References
 +
 +https://en.wikibooks.org/wiki/Introducing_Julia/Working_with_text_files#Writing_to_files
 +
 +[Input/Output](https://docs.julialang.org/en/stable/stdlib/io-network/#General-I/O-1)
 +
 +
  
fileio.txt · Last modified: 2018/12/27 13:27 (external edit)