Tomalak's Tuesday Tip #2: String Streams

OK, so this tip thing doesn't look like it's going to be weekly. But hey, close enough, right?

This week I want to talk about two classes that are both part of the C++ Standard Library and both do similar things, yet are fundamentally different: string streams.

The modern way

If you write decent C++ you may already be familiar with std::stringstream from the <sstream> header. It provides a buffered stream interface over the standard std::string object, allowing very easy manipulation of the underlying buffer. In particular, it's often used due to the large number of conversion opportunities exposed by the stream interface.

For example if you wanted a numeric string in standard C, you'd probably have done something like this:

    #include <stdio.h>

    char str[4];
    sprintf(str, "%d", 123); // str is now "123"

Annoyingly, you had to pre-allocate the amount of memory that you thought would be required. In the example above that's easy: I knew I needed space for three characters and the terminating NUL, so I allocated four characters for the C-string.

But it wasn't always that easy. If you wanted to put input of unpredictable length into a string, you simply had to guess at the space required, and if you guessed too small, that was just too damn bad. But hey, at least you could put almost anything into it thanks to printf and its friends.

Then along came C++ with its standard string wrapper, with its dynamic allocation and, most importantly, dynamic resizing. You can concatenate with a single operator and you can add as much data to it as you like without worrying about running out of pre-allocated buffer space.

    // C
    #include <string.h>

    const char str1[7] = "Hello ";
    const char str2[6] = "world";
    char str[12] = {0};
    memcpy(str, str1, 6);
    memcpy(str + 6, str2, 5); // str is now "Hello world"

    // C++
    #include <string>

    std::string stdstr = "Hello ";
    stdstr += "world";        // stdstr is now "Hello world"

std::string concatenation is great. But the class is no good at converting other types, such as numbers, into text:

    #include <string>

    std::string str = "Hello ";
    str += (std::string)5; // error: no matching function for call to 'string::basic_string(int)'
    str += " worlds.";

Thankfully, the streams interface is great at this, since operator<< is overloaded for all sorts of types. And since stringstream is a stream built over a string buffer, we can use it to easily manipulate the underlying string in ways the string itself would never allow us to:

    #include <sstream>
    #include <string>

    std::stringstream ss;
    ss << "Hello " << 5 << " worlds.";
    std::string str = ss.str(); // str contains "Hello 5 worlds."

Of course, the above is a silly, simplistic example, as we could have written std::string str = "Hello 5 worlds." directly, but the technique is useful when you don't know in advance what that number's going to be:

    #include <cstdlib>
    #include <sstream>
    #include <string>

    std::stringstream ss;
    ss << "Hello " << std::rand() << " worlds.";
    std::string str = ss.str(); // str contains "Hello N worlds.",
                                // where N is a random number between 0 and RAND_MAX
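As a minimal sketch, the pattern can be wrapped in a small helper. The name greeting() is hypothetical (it isn't from any library); it simply lets the stream's operator<< overloads turn whatever int it's given into text.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, for illustration only: the stream's operator<<
// overloads convert the int to text, whatever its value or length.
std::string greeting(int n)
{
    std::stringstream ss;
    ss << "Hello " << n << " worlds.";
    return ss.str(); // returns a copy of the stream's internal string
}
```

Calling greeting(5) gives "Hello 5 worlds." and greeting(-42) gives "Hello -42 worlds.", with no buffer size to guess at.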

This approach is so simple that it seems like a lifesaver, and it is in fact used all the time where people would in the past have used sprintf with a fixed C-string buffer.

However, there is one oft-overlooked flaw with this approach. stringstream.str() returns a copy of the string buffer, not a reference. In fact, neither C++03 nor the C++0x drafts offer any way to get a reference to the string buffer of a stringstream. This means that every time you pull a string from a stringstream, the data is copied in memory.
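To see the copy for yourself, here is a small check. The function name str_returns_copy() is made up for illustration; it edits the string returned by str() and confirms that the stream's own buffer is unaffected.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical check, for illustration only: returns true if editing the
// string obtained from str() leaves the stream's buffer untouched,
// i.e. str() handed back an independent copy.
bool str_returns_copy()
{
    std::stringstream ss;
    ss << "Hello 5 worlds.";

    std::string copy = ss.str(); // first copy of the buffer
    copy[6] = '6';               // edit our copy only ('5' is at index 6)

    // Each call to str() makes a fresh copy of the unchanged buffer.
    return ss.str() == "Hello 5 worlds." && copy == "Hello 6 worlds.";
}
```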

It might not seem like a big deal, but if you frequently create a stringstream purely to use its conversion facilities and then grab the underlying string for further use, you're wasting memory and CPU cycles on every copy.

Looking backwards

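There exists a standard alternative that a lot of people don't know about, with a similar name: std::strstream, from the <strstream> header. It has been deprecated since the first C++ standard in 1998, and experts have been discouraging its use for at least eight years.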

This recommendation may seem a little premature when you consider that strstream is not being dropped from the upcoming C++0x, and that the next standard after that is not expected until we approach 2020.

But more importantly, what most of these experts opt not to mention is that the underlying data of a strstream is an old C-style character array rather than a C++ string object. Because of this, we get direct access to the data without having to go through a protective layer of abstraction.

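Specifically, where stringstream.str() returns a std::string by value (which copies the character data), strstream.str() returns a char* pointing at the stream's own buffer (which copies nothing). There are two caveats: the buffer is not NUL-terminated unless you write the terminator yourself (with std::ends or '\0'), and calling str() freezes a dynamically allocated buffer, so the stream won't free that memory until you call freeze(false).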

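    #include <cstdlib>
    #include <strstream>

    std::strstream ss;
    ss << "Hello " << std::rand() << " worlds." << std::ends; // std::ends appends the terminating NUL
    char* str = ss.str(); // no copy: str points into ss's own buffer, which is now frozen
    // ... use str ...
    ss.freeze(false);     // unfreeze, or ss will never free the buffer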

We'd still have to copy the data if we wanted the full functionality of a C++ string, because std::string can only be built from a C-style string by copying it; but now we have a C-style string that wasn't copied, and we can do what we like with it.
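Here is a sketch of the whole round trip, assuming a default-constructed (dynamically allocated) std::strstream and a hypothetical helper named greeting_via_strstream(). Two calls to str() hand back the very same pointer, and the only copy made is the std::string we explicitly ask for. Expect your compiler to warn that <strstream> is deprecated.

```cpp
#include <cassert>
#include <string>
#include <strstream> // deprecated, but still shipped by the major implementations

// Hypothetical helper, for illustration only. The char* from str() points
// straight into the stream's buffer; the single copy happens only when we
// choose to build a std::string from it.
std::string greeting_via_strstream(int n, bool& str_was_uncopied)
{
    std::strstream ss;
    ss << "Hello " << n << " worlds." << std::ends; // std::ends writes the NUL

    char* first  = ss.str(); // freezes the buffer, returns a pointer into it
    char* second = ss.str(); // same buffer, so the same pointer
    str_was_uncopied = (first == second);

    std::string result(first); // the one and only copy, made by choice
    ss.freeze(false);          // unfreeze so ss frees its buffer when destroyed
    return result;
}
```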

Manipulating strstream's underlying buffer directly might look like this:

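    #include <cstring>
    #include <iostream>
    #include <strstream>

    std::strstream ss;
    ss << "HI WORLD." << '\0';

    char* c = ss.str();                    // pointer straight into ss's buffer
    std::memcpy(c + 8, "!", sizeof(char)); // overwrite the '.' in place
    std::cout << ss.str();                 // Output: "HI WORLD!"
    ss.freeze(false);                      // let ss free its buffer again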
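For comparison, here is a rough sketch of the same edit done with std::stringstream (exclaim_via_stringstream() is a made-up name). Without access to the internal buffer, the only route is to copy the string out, edit the copy, and copy it back in with the str(const std::string&) overload.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, for illustration only: the same "HI WORLD." edit,
// paid for with a copy out, a copy back in, and a copy to read the result.
std::string exclaim_via_stringstream()
{
    std::stringstream ss;
    ss << "HI WORLD.";

    std::string s = ss.str(); // copy out
    s[8] = '!';               // replace the '.'
    ss.str(s);                // copy back in

    return ss.str();          // one more copy to read the result
}
```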

So it's not beautiful, but it does demonstrate the added power of direct access to the stream buffer.

If you like the options provided by the stream interface but find yourself concerned that you're copying string data needlessly, or you need to modify the underlying buffer data, stop and think for a moment before throwing strstream mercilessly to the hounds of time. It has its uses yet.

Bootnote

I apologise for the use of the past tense when referring to C. Yes, I know the language is still very much alive and kicking and that plenty of people still use it. However, in this article's C++ context it's merely a precursor. So you'll just have to get used to it.

Tom Lachecki

(Tomalak Geret'kal)
