This first edition was written for Lua 5.0. While still largely relevant for later versions, there are some differences.
The third edition targets Lua 5.2 and is available at Amazon and other bookstores.
By buying the book, you also help to support the Lua project.


21.2.2 – Binary Files

The simple model functions io.input and io.output always open a file in text mode (the default). In Unix, there is no difference between binary files and text files. But in some systems, notably Windows, binary files must be opened with a special flag. To handle such binary files, you must use io.open, with the letter `b´ in the mode string.

Binary data in Lua are handled similarly to text. A string in Lua may contain any bytes and almost all functions in the libraries can handle arbitrary bytes. (You can even do pattern matching over binary data, as long as the pattern does not contain a zero byte. If you want to match the byte zero, you can use the class %z instead.)

Typically, you read binary data either with the *all pattern, that reads the whole file, or with the pattern n, that reads n bytes. As a simple example, the following program converts a text file from DOS format to Unix format (that is, it translates sequences of carriage return-newlines to newlines). It does not use the standard I/O files (stdin/stdout), because those files are open in text mode. Instead, it assumes that the names of the input file and the output file are given as arguments to the program:

    local inp = assert(io.open(arg[1], "rb"))
    local out = assert(io.open(arg[2], "wb"))
    
    local data = inp:read("*all")
    data = string.gsub(data, "\r\n", "\n")
    out:write(data)
    
    assert(out:close())
You can call this program with the following command line:
    > lua prog.lua file.dos file.unix

As another example, the following program prints all strings found in a binary file. The program assumes that a string is any zero-terminated sequence of six or more valid characters, where a valid character is any character accepted by the pattern validchars. In our example, that comprises the alphanumeric, the punctuation, and the space characters. We use concatenation and string.rep to create a pattern that captures all sequences of six or more validchars. The %z at the end of the pattern matches the byte zero at the end of a string.

    local f = assert(io.open(arg[1], "rb"))
    local data = f:read("*all")
    local validchars = "[%w%p%s]"
    local pattern = string.rep(validchars, 6) .. "+%z"
    for w in string.gfind(data, pattern) do
      print(w)
    end

As a last example, the following program makes a dump of a binary file. Again, the first program argument is the input file name; the output goes to the standard output. The program reads the file in chunks of 10 bytes. For each chunk, it writes the hexadecimal representation of each byte, and then it writes the chunk as text, changing control characters to dots.

    local f = assert(io.open(arg[1], "rb"))
    local block = 10
    while true do
      local bytes = f:read(block)
      if not bytes then break end
      for b in string.gfind(bytes, ".") do
        io.write(string.format("%02X ", string.byte(b)))
      end
      io.write(string.rep("   ", block - string.len(bytes) + 1))
      io.write(string.gsub(bytes, "%c", "."), "\n")
    end
Suppose we store that program in a file named vip; if we apply the program to itself, with the call
    prompt> lua vip vip
it will produce an output like this (in a Unix machine):
    6C 6F 63 61 6C 20 66 20 3D 20    local f = 
    61 73 73 65 72 74 28 69 6F 2E    assert(io.
    6F 70 65 6E 28 61 72 67 5B 31    open(arg[1
    5D 2C 20 22 72 62 22 29 29 0A    ], "rb")).
               ...
    22 25 63 22 2C 20 22 2E 22 29    "%c", ".")
    2C 20 22 5C 6E 22 29 0A 65 6E    , "\n").en
    64 0A                            d.