This first edition was written for Lua 5.0. While still largely relevant for later versions, there are some differences.


20 – The String Library

The power of a raw Lua interpreter to manipulate strings is quite limited. A program can create string literals and concatenate them. But it cannot extract a substring, check its size, or examine its contents. The full power to manipulate strings in Lua comes from its string library.

Some functions in the string library are quite simple: string.len(s) returns the length of a string s. string.rep(s, n) returns the string s repeated n times; for instance, string.rep("a", 2^20) creates a string with 1M bytes, which can be handy for tests. string.lower(s) returns a copy of s with its upper-case letters converted to lower case; all other characters are left unchanged (string.upper converts to upper case). As a typical use, if you want to sort an array of strings regardless of case, you may write something like

    table.sort(a, function (x, y)
      return string.lower(x) < string.lower(y)
    end)
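The following runnable sketch exercises these simple functions and shows why the custom comparison matters. The list of names is made up for the example and was chosen so that the default order and the case-insensitive order disagree: by default, `<` compares character codes, so every upper-case ASCII letter sorts before every lower-case one. The outputs assume ASCII and the default "C" locale.

```lua
-- Simple string functions (outputs assume ASCII and the "C" locale)
print(string.len("hello"))           --> 5
print(string.rep("ab", 3))           --> ababab
print(string.upper("Lua 5.0"))       --> LUA 5.0
print(string.lower("MiXeD"))         --> mixed

-- Hypothetical sample data
local names = {"banana", "Cherry", "apple"}

-- Default order compares character codes, so "Cherry" comes first
local plain = {names[1], names[2], names[3]}
table.sort(plain)
print(table.concat(plain, " "))      --> Cherry apple banana

-- Case-insensitive order: compare lower-case copies of both strings
table.sort(names, function (x, y)
  return string.lower(x) < string.lower(y)
end)
print(table.concat(names, " "))      --> apple banana Cherry
```

The comparison function recomputes string.lower on every call; for large arrays you might precompute the lower-case keys once, but for small arrays the simple version is usually enough.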
Both string.upper and string.lower follow the current locale. Therefore, if you work with the European Latin-1 locale (and your strings are encoded in Latin-1), the expression
    string.upper("ação")
results in "AÇÃO".

The call string.sub(s, i, j) extracts a piece of the string s, from the i-th to the j-th character inclusive. In Lua, the first character of a string has index 1. You can also use negative indices, which count from the end of the string: the index -1 refers to the last character in a string, -2 to the one before it, and so on. Therefore, the call string.sub(s, 1, j) gets a prefix of the string s with length j, and string.sub(s, j, -1) gets a suffix of the string, starting at the j-th character. If you do not provide a third argument, it defaults to -1, so we could write this last call simply as string.sub(s, j). Finally, string.sub(s, 2, -2) returns a copy of the string s with its first and last characters removed:

    s = "[in brackets]"
    print(string.sub(s, 2, -2))   -->  in brackets
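The prefix and suffix calls above are enough to build simple prefix and suffix tests. The helper names startsWith and endsWith in this sketch are hypothetical, not part of the string library. Note the special case for an empty suffix: string.sub(s, -0) is the same as string.sub(s, 0), which returns the whole string rather than an empty one.

```lua
-- Hypothetical helpers built on string.sub (not part of the string library)
local function startsWith (s, prefix)
  return string.sub(s, 1, string.len(prefix)) == prefix
end

local function endsWith (s, suffix)
  -- string.sub(s, -0) would return all of s, so handle "" separately
  return suffix == "" or string.sub(s, -string.len(suffix)) == suffix
end

local s = "Hello, world"
print(string.sub(s, 1, 5))      --> Hello
print(string.sub(s, 8))         --> world
print(string.sub(s, -5))        --> world
print(startsWith(s, "Hell"))    --> true
print(endsWith(s, "world"))     --> true
print(endsWith(s, "Hello"))     --> false
```

When the prefix or suffix is longer than s, string.sub simply returns (at most) the whole of s, which then fails the equality test, so no explicit length check is needed.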

Remember that strings in Lua are immutable. The string.sub function, like any other function in Lua, does not change the value of a string, but returns a new string. A common mistake is to write something like

    string.sub(s, 2, -2)
and to assume that the value of s will be modified. If you want to modify the value of a variable, you must assign the new value to the variable:
    s = string.sub(s, 2, -2)
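A short sketch of the point above: calling string.sub without using its result leaves s untouched, and only the assignment makes s refer to the new string. The second part, with illustrative variables t and u, shows that rebinding one variable does not affect another variable that still refers to the original string.

```lua
local s = "[in brackets]"

string.sub(s, 2, -2)        -- the new string is discarded
print(s)                    --> [in brackets]

s = string.sub(s, 2, -2)    -- s now refers to the new string
print(s)                    --> in brackets

-- Another variable holding the old string is not affected
local t = "abc"
local u = t
t = string.upper(t)
print(t .. " " .. u)        --> ABC abc
```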

The string.char and string.byte functions convert between characters and their internal numeric representations. The function string.char takes zero or more integers, converts each one to a character, and returns a string concatenating all those characters. The function string.byte(s, i) returns the internal numeric representation of the i-th character of the string s; the second argument is optional and defaults to 1, so the call string.byte(s) returns the internal numeric representation of the first (or single) character of s. In the following examples, we assume that characters are represented in ASCII:

    print(string.char(97))                    -->  a
    i = 99; print(string.char(i, i+1, i+2))   -->  cde
    print(string.byte("abc"))                 -->  97
    print(string.byte("abc", 2))              -->  98
    print(string.byte("abc", -1))             -->  99
In the last line, we used a negative index to access the last character of the string.
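As a slightly larger illustration, the following sketch implements ROT13 (shifting each ASCII letter 13 places, wrapping around the alphabet) using only string.byte, string.char, and string.len. The function name rot13 is our own, and like the examples above, the code assumes ASCII. It collects the converted characters in a table and joins them once with table.concat, rather than concatenating inside the loop, which avoids building a new intermediate string at every step.

```lua
-- Hypothetical helper: ROT13 on ASCII letters via string.byte/string.char
local function rot13 (s)
  local out = {}
  for i = 1, string.len(s) do
    local c = string.byte(s, i)
    if c >= 65 and c <= 90 then           -- upper case 'A'..'Z'
      c = c + 13
      if c > 90 then c = c - 26 end       -- wrap past 'Z'
    elseif c >= 97 and c <= 122 then      -- lower case 'a'..'z'
      c = c + 13
      if c > 122 then c = c - 26 end      -- wrap past 'z'
    end
    out[i] = string.char(c)               -- other characters pass through
  end
  return table.concat(out)
end

print(rot13("Hello, Lua"))          --> Uryyb, Yhn
print(rot13(rot13("Hello, Lua")))   --> Hello, Lua
```

Because the alphabet has 26 letters, applying ROT13 twice gives back the original string, as the second call shows.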

The function string.format is a powerful tool for formatting strings, typically for output. It returns a formatted version of its variable number of arguments following the description given by its first argument, the so-called format string. The format string has rules similar to those of the printf function of standard C: it is composed of regular text and directives, which control where and how each argument must be placed in the formatted string. A simple directive is the character '%' plus a letter that tells how to format the argument: 'd' for a decimal number, 'x' for hexadecimal, 'o' for octal, 'f' for a floating-point number, 's' for strings, plus other variants. Between the '%' and the letter, a directive can include other options, which control the details of the format, such as the number of decimal digits of a floating-point number:

    print(string.format("pi = %.4f", math.pi))     --> pi = 3.1416
    d = 5; m = 11; y = 1990
    print(string.format("%02d/%02d/%04d", d, m, y))
      --> 05/11/1990
    tag, title = "h1", "a title"
    print(string.format("<%s>%s</%s>", tag, title, tag))
      --> <h1>a title</h1>
In the first example, %.4f means a floating-point number with four digits after the decimal point. In the second example, %02d means a decimal number ('d') with at least two digits and zero padding; the directive %2d, without the zero, would use blanks for padding. For a complete description of these directives, see the Lua reference manual. Or, better yet, see a C manual, as Lua calls the standard C library to do the hard work here. The Lua reference manual lists the few differences: the options and modifiers *, l, L, n, p, and h are not supported, and there is an extra option, q.
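The following sketch exercises a few more directives: hexadecimal and octal output, field widths, left alignment with '-', and a fixed number of decimals. The helper name formatRow and the sample rows are made up for the example; the idea is simply that a format string with fixed widths lines up columns of output.

```lua
print(string.format("%x %X %o", 255, 255, 8))   --> ff FF 10
print(string.format("[%5d] [%-5d]", 42, 42))    --> [   42] [42   ]
print(string.format("%.2f", math.pi))           --> 3.14

-- Hypothetical helper: one fixed-width report line
-- (name left-aligned in 8 columns, score right-aligned in 6 with 2 decimals)
local function formatRow (name, score)
  return string.format("%-8s%6.2f", name, score)
end

print(formatRow("alice", 93.5))   --> alice    93.50
print(formatRow("bob", 7.25))     --> bob       7.25
```

Both report lines are exactly 14 characters long, so the scores line up in a column regardless of the length of each name (as long as names fit in 8 columns).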