Programming in Lua
Part III. The Standard Libraries
Chapter 20. The String Library
The power of a raw Lua interpreter to manipulate strings is quite limited. A program can create string literals and concatenate them. But it cannot extract a substring, check its size, or examine its contents. The full power to manipulate strings in Lua comes from its string library.
Some functions in the string library are quite simple:
string.len(s)
returns the length of a string s.
string.rep(s, n)
returns the string s repeated n times.
You can create a string with 1M bytes (for tests, for instance)
with string.rep("a", 2^20).
string.lower(s)
returns a copy of s with the
upper-case letters converted to lower case;
all other characters in the string are not changed
(string.upper converts to upper case).
As a typical use, if you want to sort an
array of strings regardless of case,
you may write something like
table.sort(a, function (x, y)
  return string.lower(x) < string.lower(y)
end)
Both string.upper and string.lower follow the current locale.
Therefore, if you work with the European Latin-1 locale,
the expression
string.upper("ação")
results in "AÇÃO".
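The simple functions above can be tried together in a short sketch. The table `names` is hypothetical sample data, kept to ASCII so that the result does not depend on the current locale.

```lua
-- Hypothetical sample data; ASCII only, so the result is locale-independent.
local names = {"delta", "Alpha", "charlie", "Bravo"}

-- Sort regardless of case by comparing lower-case copies of the strings.
table.sort(names, function (x, y)
  return string.lower(x) < string.lower(y)
end)

print(table.concat(names, " "))            --> Alpha Bravo charlie delta
print(string.len("hello"))                 --> 5
print(string.rep("ab", 3))                 --> ababab
print(string.upper("Lua"))                 --> LUA
print(string.len(string.rep("a", 2^20)))   --> 1048576
```

The original strings keep their case; only the comparison uses the lower-case copies.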
The call string.sub(s,i,j) extracts a piece of the string s,
from the i-th to the j-th character inclusive.
In Lua, the first character of a string has index 1.
You can also use negative indices,
which count from the end of the string:
the index -1 refers to the last character in a string,
-2 to the previous one, and so on.
Therefore, the call string.sub(s, 1, j) gets a prefix of
the string s with length j;
string.sub(s, j, -1) gets a suffix of the string,
starting at the j-th character
(if you do not provide a third argument,
it defaults to -1,
so we could write the last call as string.sub(s, j));
and string.sub(s, 2, -2) returns a copy of the string s with
the first and last characters removed:
s = "[in brackets]"
print(string.sub(s, 2, -2)) --> in brackets
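As a further sketch of these indexing rules, the following calls take a prefix, a suffix, and a slice with negative indices from a sample string (the value of `s` here is only an illustration):

```lua
local s = "hello world"

print(string.sub(s, 1, 5))     --> hello      (prefix of length 5)
print(string.sub(s, 7, -1))    --> world      (suffix from the 7th character)
print(string.sub(s, 7))        --> world      (third argument defaults to -1)
print(string.sub(s, -5, -1))   --> world      (the last five characters)
print(string.sub(s, 2, -2))    --> ello worl  (first and last removed)
```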
Remember that strings in Lua are immutable.
The string.sub function,
like any other function in Lua,
does not change the value of a string,
but returns a new string.
A common mistake is to write something like
string.sub(s, 2, -2)
and to assume that the
value of s will be modified.
If you want to modify the value of a variable,
you must assign the new value to the variable:
s = string.sub(s, 2, -2)
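A minimal sketch of the difference, using an illustrative value for `s`: the first call discards its result, so `s` keeps its original value until the result is assigned back.

```lua
local s = "[in brackets]"

string.sub(s, 2, -2)        -- the result is discarded; s is untouched
print(s)                    --> [in brackets]

s = string.sub(s, 2, -2)    -- assign the new string back to the variable
print(s)                    --> in brackets
```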
The string.char and string.byte functions convert
between characters and their internal numeric representations.
The function string.char takes zero or more integers,
converts each one to a character,
and returns a string concatenating all those characters.
The function string.byte(s, i) returns the internal numeric
representation of the i-th character of the string s;
the second argument is optional, so that a call string.byte(s)
returns the internal numeric representation of the first
(or single) character of s.
In the following examples,
we assume that characters are represented in ASCII:
print(string.char(97)) --> a
i = 99; print(string.char(i, i+1, i+2)) --> cde
print(string.byte("abc")) --> 97
print(string.byte("abc", 2)) --> 98
print(string.byte("abc", -1)) --> 99
In the last line, we used a negative index to access
the last character of the string.
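To make the round trip concrete, here is a small sketch that converts a string to its numeric codes and back. The helper names `to_codes` and `from_codes` are hypothetical, not part of the string library, and the expected output assumes ASCII, as in the examples above.

```lua
-- Hypothetical helper: collect the numeric code of every character of s.
local function to_codes(s)
  local codes = {}
  for i = 1, string.len(s) do
    codes[i] = string.byte(s, i)
  end
  return codes
end

-- Hypothetical helper: rebuild a string from a list of numeric codes.
local function from_codes(codes)
  local chars = {}
  for i, code in ipairs(codes) do
    chars[i] = string.char(code)
  end
  return table.concat(chars)
end

local codes = to_codes("Lua")
print(table.concat(codes, " "))   --> 76 117 97
print(from_codes(codes))          --> Lua
```

Converting one character at a time keeps the sketch close to the two-argument form of string.byte described above.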
The function string.format is a powerful tool when formatting strings,
typically for output.
It returns a formatted version of its variable number of arguments
following the description given by its first argument,
the so-called format string.
The format string has rules similar to those of the printf
function of standard C:
it is composed of regular text and directives,
which control where and how each argument must be placed
in the formatted string.
A simple directive is the character '%' plus a letter that
tells how to format the argument:
'd' for a decimal number, 'x' for hexadecimal,
'o' for octal,
'f' for a floating-point number,
's' for strings, plus other variants.
Between the '%' and the letter,
a directive can include other options,
which control the details of the format,
such as the number of decimal digits of a floating-point number:
print(string.format("pi = %.4f", math.pi)) --> pi = 3.1416
d = 5; m = 11; y = 1990
print(string.format("%02d/%02d/%04d", d, m, y))
--> 05/11/1990
tag, title = "h1", "a title"
print(string.format("<%s>%s</%s>", tag, title, tag))
--> <h1>a title</h1>
In the first example, the %.4f means a floating-point number
with four digits after the decimal point.
In the second example, the %02d means a decimal number
('d'), with at least two digits and zero padding;
the directive %2d, without the zero,
would use blanks for padding.
For a complete description of those directives,
see the Lua reference manual.
Or, better yet, see a C manual,
as Lua calls the standard C libraries to do the hard work here.
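The sketch below exercises a few more directives and options. The sample values are arbitrary, and the brackets are added only to make the padding visible; the left-justification flag '-' is not discussed above but follows the same printf conventions.

```lua
print(string.format("%x", 255))            --> ff
print(string.format("%o", 8))              --> 10
print(string.format("[%5d]", 42))          --> [   42]
print(string.format("[%05d]", 42))         --> [00042]
print(string.format("[%-5s]", "ab"))       --> [ab   ]
print(string.format("[%8.3f]", 3.14159))   --> [   3.142]
```

The number before the letter is a minimum field width, so longer values are never truncated by it.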
Copyright © 2003-2004 Roberto Ierusalimschy. All rights reserved.