This first edition was written for Lua 5.0. While still largely relevant for later versions, there are some differences.


27.2 – String Manipulation

When a C function receives a string argument from Lua, there are only two rules that it must observe: Not to pop the string from the stack while accessing it, and never to modify the string.

Things get more demanding when a C function needs to create a string to return to Lua. Now, it is up to the C code to take care of buffer allocation/deallocation, buffer overflow, and the like. Nevertheless, the Lua API provides some functions to help with those tasks.

The standard API provides support for two of the most basic string operations: substring extraction and string concatenation. To extract a substring, remember that the basic operation lua_pushlstring gets the string length as an extra argument. Therefore, if you want to pass to Lua a substring of a string s ranging from position i to j (inclusive), all you have to do is

    lua_pushlstring(L, s+i, j-i+1);
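The arithmetic can be checked in plain C. The sketch below is only an illustration, not part of the Lua API: the hypothetical helper copy_substring receives the same start pointer (s+i) and length (j-i+1) that lua_pushlstring would, but copies into a caller-supplied buffer instead of pushing onto the Lua stack. Positions are 0-based, as in C, and j is assumed to be no smaller than i.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies s[i..j] (inclusive, 0-based) into out,
   using the same (pointer, length) pair that lua_pushlstring(L, s+i, j-i+1)
   would receive. Assumes j >= i and that out holds at least j-i+2 bytes. */
static void copy_substring (char *out, const char *s, size_t i, size_t j) {
  size_t len = j - i + 1;   /* an inclusive range has j-i+1 characters */
  memcpy(out, s + i, len);  /* the substring starts at s+i */
  out[len] = '\0';          /* terminate the copy */
}
```

For instance, positions 6 to 10 of "hello world" select "world".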
As an example, suppose you want a function that splits a string according to a given separator (a single character) and returns a table with the substrings. For instance, the call
    split("hi,,there", ",")
should return the table {"hi", "", "there"}. We could write a simple implementation as follows. It needs no extra buffers and puts no constraints on the size of the strings it can handle.
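    #include <string.h>
    #include "lua.h"
    #include "lauxlib.h"
    
    static int l_split (lua_State *L) {
      const char *s = luaL_checkstring(L, 1);
      const char *sep = luaL_checkstring(L, 2);
      const char *e;
      int i = 1;
    
      /* strchr(s, '\0') finds the terminator itself, so an empty
         separator would make the loop run past the end of s */
      luaL_argcheck(L, *sep != '\0', 2, "empty separator");
    
      lua_newtable(L);  /* result */
    
      /* repeat for each separator */
      while ((e = strchr(s, *sep)) != NULL) {
        lua_pushlstring(L, s, e-s);  /* push substring */
        lua_rawseti(L, -2, i++);  /* t[i] = substring; pops it */
        s = e + 1;  /* skip separator */
      }
    
      /* push last substring */
      lua_pushstring(L, s);
      lua_rawseti(L, -2, i);
    
      return 1;  /* return the table */
    }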
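To see how the loop behaves without a Lua state, here is a plain C sketch of the same strchr scan. The function split_offsets and its parameters are hypothetical: where l_split calls lua_pushlstring and lua_rawseti, this version only records each piece's start offset and length. It shows why "hi,,there" yields an empty middle field: two adjacent separators give a piece whose length e-s is 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for l_split: instead of filling a Lua table,
   it records the start offset and length of each piece.
   Returns the number of pieces stored (at most max).
   Assumes sep != '\0', for the same reason l_split rejects it. */
static size_t split_offsets (const char *str, char sep,
                             size_t *start, size_t *len, size_t max) {
  const char *s = str;
  const char *e;
  size_t n = 0;
  /* one piece per separator, as in the while loop of l_split */
  while ((e = strchr(s, sep)) != NULL && n < max) {
    start[n] = (size_t)(s - str);
    len[n] = (size_t)(e - s);   /* 0 for adjacent separators */
    n++;
    s = e + 1;                  /* skip separator */
  }
  /* last piece: everything after the final separator */
  if (n < max) {
    start[n] = (size_t)(s - str);
    len[n] = strlen(s);
    n++;
  }
  return n;
}
```

On "hi,,there" with ',' it should report the pieces (0, 2), (3, 0), and (4, 5), that is, "hi", "", and "there".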

To concatenate strings, Lua provides a specific function in its API, called lua_concat. It is equivalent to the .. operator in Lua: It converts numbers to strings and triggers metamethods when necessary. Moreover, it can concatenate more than two strings at once. The call lua_concat(L, n) will concatenate (and pop) the n values at the top of the stack and leave the result on the top.

Another helpful function is lua_pushfstring:

    const char *lua_pushfstring (lua_State *L,
                                 const char *fmt, ...);
It is somewhat similar to the C function sprintf, in that it creates a string according to a format string and some extra arguments. Unlike sprintf, however, you do not need to provide a buffer. Lua dynamically creates the string for you, as large as it needs to be. There are no worries about buffer overflow and the like. The function pushes the resulting string on the stack and returns a pointer to it. Currently, this function accepts only the directives %% (for the character '%'), %s (for strings), %d (for integers), %f (for Lua numbers, that is, doubles), and %c (accepts an integer and formats it as a character). It does not accept any options (such as width or precision).
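The text does not show how lua_pushfstring sizes its result. As a rough standard C analogue of "as large as it needs to be", the sketch below measures the output with a first vsnprintf call and then allocates exactly that much. The name format_alloc is hypothetical. Unlike lua_pushfstring, it accepts every printf directive (including widths and precisions) and returns heap memory that the caller must free, rather than leaving a string on the Lua stack.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats into a heap buffer exactly as large as
   needed. The caller owns the result and must free() it.
   Returns NULL on an encoding error or allocation failure. */
static char *format_alloc (const char *fmt, ...) {
  va_list ap;
  int n;
  char *buf;

  /* first pass: ask vsnprintf how many characters are needed */
  va_start(ap, fmt);
  n = vsnprintf(NULL, 0, fmt, ap);
  va_end(ap);
  if (n < 0) return NULL;

  buf = malloc((size_t)n + 1);  /* +1 for the terminating '\0' */
  if (buf == NULL) return NULL;

  /* second pass: write the string into the exactly sized buffer */
  va_start(ap, fmt);
  vsnprintf(buf, (size_t)n + 1, fmt, ap);
  va_end(ap);
  return buf;
}
```

The two passes need separate va_start/va_end pairs, because a va_list cannot be reused after vsnprintf has consumed it.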

Both lua_concat and lua_pushfstring are useful when we want to concatenate only a few strings. However, if we need to concatenate many strings (or characters) together, a one-by-one approach can be quite inefficient, as we saw in Section 11.6. Instead, we can use the buffer facilities provided by the auxiliary library. Auxlib implements these buffers in two levels. The first level is similar to buffers in I/O operations: It collects small strings (or individual characters) in a local buffer and passes them to Lua (with lua_pushlstring) when the buffer fills up. The second level uses lua_concat and a variant of the stack algorithm that we saw in Section 11.6 to concatenate the results of multiple buffer flushes.
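The second level rests on the stack algorithm of Section 11.6. A plain C sketch of that idea follows. It is not auxlib's code, which keeps its pieces on the Lua stack and joins them with lua_concat; here StrStack, stack_add, stack_result, and the fixed depth MAXLEVELS are hypothetical. Each new piece is merged into the one below it while that one is not longer, so for pieces of equal size the stack behaves like a binary counter and holds only a logarithmic number of pieces.

```c
#include <stdlib.h>
#include <string.h>

#define MAXLEVELS 32  /* hypothetical fixed stack depth */

/* Pieces kept in the style of the Section 11.6 string stack.
   Invariant (while not full): each piece is longer than the one above. */
typedef struct {
  char *str[MAXLEVELS];
  size_t len[MAXLEVELS];
  int top;                /* number of pieces currently on the stack */
} StrStack;

/* Concatenate the two topmost pieces into one; returns 0 on failure. */
static int merge_top (StrStack *st) {
  int k = st->top - 1;    /* index of the topmost piece */
  size_t n = st->len[k-1] + st->len[k];
  char *p = realloc(st->str[k-1], n + 1);
  if (p == NULL) return 0;
  memcpy(p + st->len[k-1], st->str[k], st->len[k] + 1);  /* copies '\0' */
  free(st->str[k]);
  st->str[k-1] = p;
  st->len[k-1] = n;
  st->top--;
  return 1;
}

/* Push a copy of s[0..l-1]; returns 0 on failure. */
static int stack_add (StrStack *st, const char *s, size_t l) {
  char *p;
  /* if full, merging the top two keeps the result correct */
  if (st->top == MAXLEVELS && !merge_top(st))
    return 0;
  p = malloc(l + 1);
  if (p == NULL) return 0;
  memcpy(p, s, l);
  p[l] = '\0';
  st->str[st->top] = p;
  st->len[st->top] = l;
  st->top++;
  /* restore the invariant: merge while the piece below is not longer */
  while (st->top > 1 && st->len[st->top-2] <= st->len[st->top-1])
    if (!merge_top(st)) return 0;
  return 1;
}

/* Collapse the whole stack into one string owned by the caller
   (an empty string if nothing was added); NULL on failure. */
static char *stack_result (StrStack *st) {
  char *p;
  while (st->top > 1)
    if (!merge_top(st)) return NULL;
  if (st->top == 0) {
    p = malloc(1);
    if (p != NULL) p[0] = '\0';
    return p;
  }
  st->top = 0;
  return st->str[0];
}
```

For example, after 1000 additions of the 2-character string "ab", the stack should hold six pieces (1024, 512, 256, 128, 64, and 16 characters, one per 1 bit of 1000), and stack_result joins them into the 2000-character result.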

To describe the buffer facilities from auxlib in more detail, let us see a simple example of its use. The next code shows the implementation of string.upper, right from the file lstrlib.c:

    static int str_upper (lua_State *L) {
      size_t l;
      size_t i;
      luaL_Buffer b;
      const char *s = luaL_checklstring(L, 1, &l);
      luaL_buffinit(L, &b);
      for (i=0; i<l; i++)
        luaL_putchar(&b, toupper((unsigned char)(s[i])));
      luaL_pushresult(&b);
      return 1;
    }
The first step for using a buffer from auxlib is to declare a variable with type luaL_Buffer, and then to initialize it with a call to luaL_buffinit. After the initialization, the buffer keeps a copy of the state L, so we do not need to pass it when calling other functions that manipulate the buffer. The macro luaL_putchar puts a single character into the buffer. Auxlib also offers luaL_addlstring, to put a string with an explicit length into the buffer, and luaL_addstring, to put a zero-terminated string. Finally, luaL_pushresult flushes the buffer and leaves the final string on the top of the stack. The prototypes of those functions are as follows:
    void luaL_buffinit (lua_State *L, luaL_Buffer *B);
    void luaL_putchar (luaL_Buffer *B, char c);
    void luaL_addlstring (luaL_Buffer *B, const char *s,
                                          size_t l);
    void luaL_addstring (luaL_Buffer *B, const char *s);
    void luaL_pushresult (luaL_Buffer *B);
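The internal layout of luaL_Buffer is not described here and changes between Lua versions, so the following is only a hypothetical model of the first level: characters go into a small fixed local array, which is flushed in one block whenever it fills up. To stay self-contained, the model flushes into a single growable heap string instead of onto the Lua stack, so it omits the second level entirely. The names Buf, buf_init, buf_putchar, buf_addlstring, buf_result, upper_copy, and BUFSIZE are hypothetical, chosen to mirror the prototypes above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 8   /* hypothetical; tiny so that flushes happen often */

typedef struct {
  char local[BUFSIZE];  /* level one: small fixed buffer */
  size_t n;             /* bytes used in local */
  char *out;            /* flushed data (stands in for the Lua stack) */
  size_t outlen;        /* bytes in out */
  int failed;           /* set if an allocation failed */
} Buf;

static void buf_init (Buf *B) {
  B->n = 0; B->out = NULL; B->outlen = 0; B->failed = 0;
}

/* move the local contents to the output in one block */
static void buf_flush (Buf *B) {
  char *p;
  if (B->n == 0) return;
  if (!B->failed) {
    p = realloc(B->out, B->outlen + B->n);
    if (p != NULL) {
      memcpy(p + B->outlen, B->local, B->n);
      B->out = p;
      B->outlen += B->n;
    }
    else B->failed = 1;   /* B->out stays valid and is freed later */
  }
  B->n = 0;   /* local buffer is empty again */
}

static void buf_putchar (Buf *B, char c) {
  if (B->n == BUFSIZE) buf_flush(B);   /* full: flush first */
  B->local[B->n++] = c;
}

static void buf_addlstring (Buf *B, const char *s, size_t l) {
  while (l--) buf_putchar(B, *s++);
}

/* final flush; returns a '\0'-terminated string the caller must free,
   or NULL if an allocation failed */
static char *buf_result (Buf *B) {
  char *p;
  buf_flush(B);
  if (B->failed) { free(B->out); return NULL; }
  p = realloc(B->out, B->outlen + 1);
  if (p == NULL) { free(B->out); return NULL; }
  p[B->outlen] = '\0';
  return p;
}

/* the str_upper loop, written against the model buffer */
static char *upper_copy (const char *s, size_t l) {
  Buf b;
  size_t i;
  buf_init(&b);
  for (i = 0; i < l; i++)
    buf_putchar(&b, (char)toupper((unsigned char)s[i]));
  return buf_result(&b);
}
```

As with the auxlib functions, the caller never sees the flushes: it only adds characters or strings and collects the result at the end.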

Using these functions, we do not have to worry about buffer allocation, overflows, and other such details. As we saw, the concatenation algorithm is quite efficient. The str_upper function handles huge strings (more than 1 MB) without any problem.

When you use the auxlib buffer, you have to worry about one detail. As you put things into the buffer, it keeps some intermediate results in the Lua stack. Therefore, you cannot assume that the stack top will remain where it was before you started using the buffer. Moreover, although you can use the stack for other tasks while using a buffer (even to build another buffer), the push/pop count for these uses must be balanced every time you access the buffer. There is one obvious situation where this restriction is too severe, namely when you want to put into the buffer a string returned from Lua. In such cases, you cannot pop the string before adding it to the buffer, because you should never use a string from Lua after popping it from the stack; but neither can you add the string to the buffer before popping it, because then the stack would be at the wrong level. In other words, you cannot do something like this:

    luaL_addstring(&b, lua_tostring(L, -1));   /* BAD CODE: string still on top */
Because this is a common situation, auxlib provides a special function to add the value on the top of the stack into the buffer:
    void luaL_addvalue (luaL_Buffer *B);
Of course, it is an error to call this function if the value on the top is not a string or a number. The buffer takes over that value, so you must not pop it yourself afterward.