{"id":339,"date":"2008-09-09T17:16:09","date_gmt":"2008-09-09T17:16:09","guid":{"rendered":"http:\/\/kera.name\/articles\/?p=339"},"modified":"2011-03-30T23:27:21","modified_gmt":"2011-03-30T23:27:21","slug":"tomalaks-tuesday-tip-2-string-streams","status":"publish","type":"post","link":"https:\/\/kera.name\/articles\/2008\/09\/tomalaks-tuesday-tip-2-string-streams\/","title":{"rendered":"Tomalak&#039;s Tuesday Tip #2: String Streams"},"content":{"rendered":"<p>OK, so this tip thing doesn&#039;t look like it&#039;s going to be <a title=\"kera.name :: Articles \u00bb  Tomalak's Tuesday Tip\" href=\"https:\/\/kera.name\/articles\/2008\/08\/tomalaks-tuesday-tip\/\">weekly<\/a>. But hey, close enough right?<\/p>\n<p>This week I want to talk about two classes that are both part of the C++ Standard Library, both do similar things and yet are fundamentally different. String streams.<\/p>\n<p><strong>The modern way<\/strong><\/p>\n<p>If you write decent C++ you may already be familiar with <a href=\"http:\/\/www.cplusplus.com\/reference\/iostream\/stringstream\/\"><code>std::stringstream<\/code><\/a> from the <code>&lt;sstream&gt;<\/code> header. It provides a buffered stream interface over the standard <code>std::string<\/code> object, allowing very easy manipulation of the underlying buffer. In particular, it&#039;s often used due to the large number of conversion opportunities exposed by the stream interface.<\/p>\n<p>For example if you wanted a numeric string in standard C, you&#039;d probably have done something like this:<\/p>\n<p><textarea name=\"code\" class=\"cpp:nocontrols:nogutter\" cols=\"60\" rows=\"10\">char str[4];\nsprintf(str, \"%u\", 123); \/\/str is now \"123\"<\/textarea>\n<\/p>\n<p>Annoyingly, you had to pre-allocate the amount of memory that you thought would be required. 
In the example above that&#039;s easy: I knew I needed space for three characters and the terminating <code>NUL<\/code>, so I allocated four characters for the C-string.<\/p>\n<p>But it wasn&#039;t always that easy. If you wanted to take input whose length you couldn&#039;t know in advance and put that into a string, you&#039;d simply have to guess at the space required, and if there wasn&#039;t enough, that was just too damn bad. But hey, at least you could put almost anything into it thanks to <code>printf<\/code> and its friends.<\/p>\n<p>Then along came C++ with its standard <code>string<\/code> wrapper, with its dynamic allocation and, most importantly, dynamic resizing. You can concatenate with a single operator and you can add as much data to it as you like without worrying about running out of pre-allocated buffer space.<\/p>\n<p><textarea name=\"code\" class=\"cpp:nocontrols:nogutter\" cols=\"60\" rows=\"10\">\/\/ C\nconst char str1[7] = \"Hello \";\nconst char str2[6] = \"world\";\nchar str[12] = {0};\nmemcpy(str, str1, 6);\nmemcpy(str + 6, str2, 5); \/\/ append after \"Hello \", not over it\n\n\/\/ C++\nstd::string stdstr = \"Hello \";\nstdstr += \"world\";<\/textarea>\n<\/p>\n<p><code>std::string<\/code> concatenation is great. But the class is not very good at implicit conversion:<\/p>\n<p><textarea name=\"code\" class=\"cpp:nocontrols:nogutter\" cols=\"60\" rows=\"10\">std::string str = \"Hello \";\nstr += (std::string)5;\nstr += \" worlds.\";\n\/\/ error: no matching function for call to 'string::basic_string(int)'<\/textarea>\n<\/p>\n<p>Thankfully, the streams interface is great at this as it has all sorts of varieties of conversion loaded into the <code>&lt;&lt;<\/code> operator. 
Since <code>stringstream<\/code> is a stream built over a string buffer, we can use it to easily manipulate the underlying string in ways the string itself would never allow us to:<\/p>\n<p><textarea name=\"code\" class=\"cpp:nocontrols:nogutter\" cols=\"60\" rows=\"10\">std::stringstream ss;\nss << \"Hello \" << 5 << \" worlds.\";\nstd::string str = ss.str();\n\/\/ str contains \"Hello 5 worlds.\"<\/textarea>\n<\/p>\n<p>Of course, the above is a silly, simplistic example as we could have written <code>std::string str = \"Hello 5 worlds.\"<\/code> directly, but the technique is useful when you don&#039;t know in advance what that number&#039;s going to be:<\/p>\n<p><textarea name=\"code\" class=\"cpp:nocontrols:nogutter\" cols=\"60\" rows=\"10\">std::stringstream ss;\nss << \"Hello \" << rand() << \" worlds.\";\nstd::string str = ss.str();\n\/\/ str contains \"Hello N worlds.\"\n\/\/ where N is a pseudo-random number between 0 and RAND_MAX.<\/textarea>\n<\/p>\n<p>The simplicity of this approach seems like a lifesaver, and it is in fact used all the time where people would in the past have used <code>sprintf<\/code> with a fixed C-string buffer.<\/p>\n<p>However, there is one oft-overlooked flaw with this approach. <code>stringstream.str()<\/code> returns a <em>copy<\/em> of the string buffer, not a reference. In fact, there is <em>no way<\/em> to get a reference to the string buffer of a <code>stringstream<\/code>. This means that every time you pull a string from a <code>stringstream<\/code>, the data is copied in memory.<\/p>\n<p>It might not seem like such a big deal unless you&#039;re frequently creating a <code>stringstream<\/code> purely to use its conversion facilities, then grabbing the underlying string for further use. 
In that case you&#039;re wasting memory and CPU cycles.<\/p>\n<p><strong>Looking backwards<\/strong><\/p>\n<p>There exists a standard alternative that a lot of people don&#039;t know about, with the similar name <a title=\"&lt;strstream&gt; (Standard C++ library)\" href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/4e4xe3f4(VS.80).aspx\"><code>std::strstream<\/code><\/a> in the header <code>&lt;strstream&gt;<\/code>. In fairness, <code>strstream<\/code> is deprecated and its use has been <a title=\"Prefer Stringstream Objects to Strstream Objects\" href=\"http:\/\/www.devx.com\/tips\/Tip\/14133\">discouraged by experts<\/a> for at least eight years.<\/p>\n<p>This recommendation may seem a little premature when you consider that <code>strstream<\/code> is not being dropped from the upcoming C++0x and the next standard version after that is not expected until we approach 2020.<\/p>\n<p>But more importantly, what most of these experts opt not to mention is that the underlying data of a <code>strstream<\/code> is an old C-style character array rather than a C++ <code>string<\/code> object. 
Because of this, we get direct access to the data without having to go through a protective layer of abstraction.<\/p>\n<p>Specifically, where <code>stringstream.str()<\/code> gives us a copy of a string object (which copies the string), <code>strstream.str()<\/code> gives us a copy of a pointer to characters (which does not).<\/p>\n<p><textarea name=\"code\" class=\"cpp:nocontrols:nogutter\" cols=\"60\" rows=\"10\">std::strstream ss;\nss << \"Hello \" << rand() << \" worlds.\" << std::ends; \/\/ strstream does not NUL-terminate for us\nchar* str = ss.str(); \/\/ no copy is made; this also freezes the buffer\n\/\/ ... use str ...\nss.freeze(false); \/\/ unfreeze so that ss can release the buffer<\/textarea>\n<\/p>\n<p>We&#039;d still have to create a copy of the C-style string if we wanted to use all the functionality of C++ strings because <code>std::string<\/code> doesn&#039;t give us a choice, but now we have a C-style string that wasn&#039;t copied and we can do what we like with it. One caveat: calling <code>str()<\/code> on a dynamically allocated <code>strstream<\/code> freezes its buffer, so you need to call <code>freeze(false)<\/code> when you&#039;re done with the pointer or the memory is never released.<\/p>\n<p>A direct manipulation of <code>strstream<\/code>&#039;s underlying buffer might look like this:<\/p>\n<p><textarea name=\"code\" class=\"cpp:nocontrols:nogutter\" cols=\"60\" rows=\"10\">std::strstream ss;\nss << \"HI WORLD.\" << '\\0';\n\nchar* c = ss.str(); \/\/ direct pointer into the stream buffer\nmemcpy(c+8, \"!\", sizeof(char)); \/\/ overwrite the final character in place\nstd::cout << ss.str();\n\/\/ Output: \"HI WORLD!\"\nss.freeze(false); \/\/ let the stream release its buffer<\/textarea>\n<\/p>\n<p>So it&#039;s not beautiful, but it does demonstrate the added power of direct stream buffer access.<\/p>\n<p>If you like the options provided by the stream interface and find yourself concerned that you&#039;re copying string data needlessly, or have a need to modify the underlying buffer data, stop and think for a moment before throwing <code>strstream<\/code> mercilessly to the hounds of time, because it has its uses yet.<\/p>\n<p><strong>Bootnote<\/strong><\/p>\n<p>I apologise for the use of the past tense when referring to C. Yes, I know the language is still very much alive and kicking and that plenty of people still use it. However, in this article&#039;s C++ context it&#039;s merely a precursor. 
So you&#039;ll just have to get used to it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This week I want to talk about two classes that are both part of the C++ Standard Library, both do similar things and yet are fundamentally different. String streams.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[21,20,33],"_links":{"self":[{"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/posts\/339"}],"collection":[{"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/comments?post=339"}],"version-history":[{"count":2,"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/posts\/339\/revisions"}],"predecessor-version":[{"id":641,"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/posts\/339\/revisions\/641"}],"wp:attachment":[{"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/media?parent=339"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/categories?post=339"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kera.name\/articles\/wp-json\/wp\/v2\/tags?post=339"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}