(From C/C++ Users Journal, July 2001)
John Panzer
"One size fits all" has never applied to strings. Now you can have as many different kinds of strings as you need, all wrapped in the std::string interface.
Copyright © 2001 John Panzer
The standard C++ string class, std::string, is a useful and flexible tool. But it isn’t optimal in every situation. For performance reasons, it can be important to be able to choose a custom string implementation. This article presents a pair of high-performance alternative implementations that preserve the standard string interface, along with a framework for easily creating new implementations.
Recently, a programming team at my company discovered that std::string was causing a performance problem in a multithreaded server program. The reason? Constructing and initializing each string required an additional heap allocation, and the program's classes used many small strings as member variables.
Their program was getting bogged down in the memory allocator. Heap allocations can be particularly expensive in multithreaded programs because of thread contention for the shared global heap. A possible solution would have been to switch to fixed length char arrays. But while most of the strings were short in practice, there were occasional long strings that needed to be handled correctly. Also, this would involve changing a fair amount of string processing code.
After profiling their code and identifying the bottleneck, they ended up replacing std::string member variables with a custom string class designed to avoid extra heap allocations in the common cases. It has a fixed-size internal buffer to handle the common case, along with an overflow strategy to handle longer strings correctly.
Unfortunately, the interface of the custom string class was not compatible with std::string, so they still had to rewrite code. The result is no longer compatible with the std::string interface. Looking at that interface, it’s easy to see that the team made the right decision. The std::string class has a huge interface and replicating it is not something you want to tackle during the optimization phase of a project.
This is unfortunate, because ideally we would like to have a toolbox of string classes tuned for different situations, all implementing at least the standard string interface. With such a toolbox, changing string variants would involve relatively minor surgery to the source code, and experimenting with different variants would be inexpensive. And, of course, the client code would remain compatible with the standard string interface. Emulating the std::string iterator interface would be especially useful, because it allows standard library algorithms to work with variant strings. (An example of such a variant is the SGI STL rope class [1], which is optimized for dealing with large strings.)
Towards this goal, I decided to create an adaptor template class that would make it easier to create new string class implementations. The xstr::xstring adaptor, so called because it wraps any kind of implementation, provides the full standard string interface. It is designed to wrap a particular implementation interface, which is described below. Any class providing that interface can be used as an xstring implementation.
To assemble a custom string variant with a particular implementation, combine xstring with that implementation class. For example, to create an xstring implemented by a fixed size 64-character buffer:
#include <xstring.h>
#include <fixed_char_buf.h>
typedef xstr::xstring<
xstr::fixed_char_buf<64> >
small_string;
small_string s("Hello World!");
s += "\n-from small_string";
The template class xstring is contained in xstring.h (see source.zip):
template <class Impl> class xstring {...};
Its interface is essentially the same as std::string. It also provides conversions to and from std::string for convenience. The xstring library currently provides two lightweight string implementations, fixed_char_buf and var_char_buf, which are described below.
Listing 1 shows fixed_char_buf.h, the header file for the fixed_char_buf template. This template class uses an internal fixed-size buffer for storage, similar to the c_array template described in [2]. The first template argument specifies the size of the buffer, and the others are the same as for std::string:
template <size_t SIZE, class CharT=char, class Traits=std::char_traits<CharT> > class fixed_char_buf;
Unlike std::string, fixed_char_buf does not have an allocator template argument, because it never does any dynamic allocation.
That's also the reason to use this implementation: It avoids the overhead of additional heap allocation, and may outperform std::string in certain situations for that reason. Of course, the tradeoff is that it is limited to SIZE characters. If it overflows, it will throw a length_error exception. Clearly this implementation isn't acceptable as a general substitute for std::string, but it may be perfectly reasonable for certain kinds of string data, especially if it allows substantial performance or scalability improvements.
Use this implementation when you know that heap allocation is a performance problem, and all strings are limited to SIZE characters.
Listing 2 shows var_char_buf.h, the header file for the var_char_buf template. This template class is an elaboration of fixed_char_buf that uses a fixed-size buffer initially, but uses dynamic memory to handle overflow. The template arguments are similar to fixed_char_buf, but with the addition of an allocator argument:
template <size_t SIZE, class CharT=char, class Traits=std::char_traits<CharT>, class Alloc=allocator<CharT> > class var_char_buf;
This is essentially the implementation used by the development team to solve their heap contention problem, but generalized to deal with arbitrary character types and allocators. var_char_buf optimizes for the case where the majority of strings are short, less than SIZE, but a few are longer and must be handled correctly. When a long string must be handled, there is an extra memory hit, because the class cannot return the original SIZE bytes to the heap. This implementation is also slightly less efficient than fixed_char_buf even if its overflow capability is not used.
Use this implementation when heap allocation is a performance problem, and a majority of strings are short, but a minority are longer than SIZE and need to be handled correctly.
I wrote a small program, test_time.cc (Listing 3), to measure raw string construction time. This test measures how effective the specialized implementations are at reducing string construction time. I ran the tests on two platforms, Cygwin under Windows and Cygnus 99r1 under Solaris, using basic (non-thread safe) and thread safe versions of the runtime libraries. In each case the program was compiled using g++ with optimizations turned on (-O2).
Table 1 shows the results; the specialized strings were faster at construction than std::string. In the single-threaded cases, they reduced the run time by up to 30 percent. In the thread-safe cases, the specialized strings reduced the run time by up to 78 percent.
This indicates that the additional overhead of locking and unlocking heap data structures can be a significant overhead in string creation time, even if there is no actual thread contention for the heap. Thus, a string implementation that minimizes heap access can provide a significant speed increase, even if thread contention is not an issue.
Table 1: Comparison of string construction times for std::string, fixed_char_buf, and var_char_buf strings
std::string baseline |
fixed_char_buf<128> |
var_char_buf<32> |
||||
Platform |
run time (ms) |
% of baseline |
run time |
% of baseline |
run time |
% of baseline |
Basic Cygwin |
7111 |
100% |
4973 |
70% |
6112 |
86% |
Threadsafe Cygwin |
21568 |
100% |
4645 |
22% |
6208 |
29% |
Basic Solaris |
7060 |
100% |
6260 |
89% |
6530 |
92% |
Threadsafe Solaris |
12740 |
100% |
6260 |
49% |
6600 |
52% |
As a second test, I wrote a multithreaded program, test_mt.cc (Listing 4). This is the same as test_time.cc but runs the test in parallel in multiple threads. Each thread constructs a fixed number of strings. By running the test with an increasing number of threads, I compared how the run times increase in proportion to the amount of work done. I ran this test on a two CPU Solaris machine, again using g++ with optimizations on (-O2).
Table 2 shows the results. It gives the total run time for each combination of string class and number of threads. As each thread does a fixed amount of work, the total workload increases from 1 to 10 in the table. Note that as the workload goes up by a factor of 10, the run time for the std::string goes up by a factor of 68. At the same time, the run time for the specialized strings goes up only by a factor of 10. In the 10 thread case, the specialized strings are more than 10 times faster at construction than std::string (seven percent and eight percent of the base run time, respectively).
Table 2: Comparison of string construction times vs. number of threads for std::string, fixed_char_buf, and var_char_buf strings
std::string baseline |
fixed_char_buf<128> |
var_char_buf<32> |
||||
Number of |
run time (ms) |
% of baseline |
run time (ms) |
% of baseline |
run time (ms) |
% of baseline |
1 |
1270 |
100% |
620 |
49% |
680 |
54% |
2 |
6990 |
100% |
1250 |
18% |
1400 |
20% |
3 |
8960 |
100% |
1840 |
21% |
2110 |
24% |
4 |
22440 |
100% |
2460 |
11% |
2800 |
12% |
5 |
40560 |
100% |
3070 |
8% |
3560 |
9% |
10 |
87380 |
100% |
6230 |
7% |
7110 |
8% |
The testing I have done against various small single-threaded programs shows that the specialized strings tend to perform about as well as std::string when construction time is not critical. For example, the xr cross-reference generator [3] takes about 4.5 seconds to process the text of Moby Dick using either std::string or fixed_char_buf. This isn't surprising, as it creates only as many strings as there are unique words within the text, and the majority of its running time is spent in I/O and string comparisons. The significance of these timing tests is that splitting the interface and implementations apart has not produced a noticeable slowdown on any commonly used operations.
xstring implementation classes must provide a "minimal string" interface, which is a small subset of the full std::string interface. This interface is designed for ease of implementation rather than ease of use. It is an STL container; as an STL concept [4], it is a refinement of Austern's Container. Its iterators are Random Access Iterators. Its value_type is a POD (plain old data) type with an appropriate char_traits<> specialization. Beyond that, the interface supports the following operations, with semantics identical to those of std::string:
// Insert a range of elements: s.insert(iterator pos, InputIter first, InputIter last) // Erase range [first,last): s.erase(iterator first, iterator last) // Reserve space for n elements, // or throw an exception: s.reserve(size_type n) // Return the current capacity: s.capacity() // Returns a zero-terminated C // string: s.c_str() // Returns a pointer to an // internal buffer: s.data()
Finally, this interface provides the following two additional operations, whose semantics are the same as the corresponding assign methods in std::string:
// Assign from range: s.range_initialize( InputIter first, InputIter last) // Assign from n characters: s.element_initialize( size_type n, value_type c)
These requirements are much easier to satisfy than the full std::string interface;
fixed_char_buf.h and var_char_buf.h are each under 200 lines. By comparison,
xstring.h is over 1,000 lines and requires some nontrivial template metaprogramming.
Much of the xstring interface is devoted to functions that provide a more convenient
interface to functionality already present in the minimal string interface.
The xstring template inherits from its implementation class, so an implementation is free to provide additional member functions. This is arguably an abuse of inheritance, but it is the easiest way to provide a wrapper that allows for additional member functions. These can be very useful; for example, an implementation could provide acquire_buf and release_buf member functions to allow client code to write directly to its internal buffer.
To create a new implementation, I recommend starting with fixed_char_buf or var_char_buf as an example and filling out the various member functions. The semantics of each member function are the same as for the corresponding functions in std::string.
Examples of possible variant implementations are garbage collected strings, strings that use specialized reference counting schemes, or substrings (views into larger strings).
The xstring class and the implementations presented above pass the GNU C++ library test suite [5]. I've also tested it successfully against various small utility programs. In addition, the xstring class is based on the SGI STL [1] std::string class, and so benefits from being based on code from a high quality and widely used library. However, it has not yet been used in a large-scale program.
My experience so far has been that compiling with a new program does occasionally find an unimplemented feature, but so far problems have always been caught at compile time and are easily fixed. I have used this library only with recent versions of GNU/Cygnus g++; compilers that don't implement modern C++ language features such as template partial ordering may have problems.
The xstring interface provides source code compatibility with std::string. Note that passing an xstring to a function requiring an std::string will work, due to the automatic conversions provided by xstring. However, this performs a copy of the string contents, even if the function accepts a const string reference. The exception is when a function takes a non-const std::string reference. This will produce a compiler error, and you'll need to decide whether to modify the function to accept an xstring, or to copy to a local std::string and pass that. Finally, note that the fixed_char_buf variant will break code that assumes it can add an arbitrary number of characters to a string. Code should call max_size to determine the maximum possible size of a string.
You may need to use different implementations for different kinds of data; in this case, using descriptive typedef names (such as address_t and phonenum_t) may be helpful.
The xstring template provides standard-compatible string objects that can be constructed an order of magnitude faster than std::string in multithreaded programs, with comparable performance for other operations. It also provides an extensible framework for implementing new specialized string classes quickly and easily. This lowers the risk involved in using std::string, because it is easy to swap implementations--or even create a new implementation--if profiling shows the need.
Creating the xstring class would have been very difficult without the SGI STL std::string class [1]. Much of the code in xstring is derived directly from this class, and I learned a lot about template metaprogramming by studying its code. Without the work of the programmers contributing to the GNU string test suite [5], it would have many more bugs. Finally, I thank my colleagues at AOL for providing an interesting C++ design challenge.
[1] Alexander Stepanov and Matthew Austern, available at <http://www.sgi.com/tech/stl>
[2] Bjarne Stroustrup, The C++ Programming Language, 3rd Edition (Addison-Wesley, 1997), 17.5.4, Built-In Arrays.
[3] Dan Saks, "C++ Theory and Practice: Replacing Character Arrays with Strings, Part 2," C/C++ Users Journal, February 2000.
[4] Matthew Austern, Generic Programming and the STL (Addison-Wesley, 1998)
[5] The egcs libstdc+-v3 string test suite is available at <http://gcc.gnu.org/cgi-bin/cvsweb.cgi/egcs/libstdc++-v3/testsuite/21_strings/>
John Panzer is a principal software engineer with AOL Time Warner, currently working on large scale Internet server systems. In his spare time, he teaches classes in C++, Java, and the STL. His interests include system design, generic programming, and object-oriented design. He can be reached as jpanzer@acm.org or via <www.johnpanzer.com>