How to convert a std::string to const char* or char*?


Question

How can I convert an std::string to a char* or a const char*?

1
853
10/6/2014 7:43:51 AM

Accepted Answer

If you just want to pass a std::string to a function that needs const char* you can use

std::string str;
const char * c = str.c_str();

If you want to get a writable copy, like char *, you can do that with this:

std::string str;
char * writable = new char[str.size() + 1];
std::copy(str.begin(), str.end(), writable);
writable[str.size()] = '\0'; // don't forget the terminating 0

// don't forget to free the string after finished using it
delete[] writable;

Edit: Notice that the above is not exception safe. If anything between the new call and the delete call throws, you will leak memory, as nothing will call delete for you automatically. There are two immediate ways to solve this.

boost::scoped_array

boost::scoped_array will delete the memory for you upon going out of scope:

std::string str;
boost::scoped_array<char> writable(new char[str.size() + 1]);
std::copy(str.begin(), str.end(), writable.get());
writable[str.size()] = '\0'; // don't forget the terminating 0

// get the char* using writable.get()

// memory is automatically freed if the smart pointer goes 
// out of scope

std::vector

This is the standard way (does not require any external library). You use std::vector, which completely manages the memory for you.

std::string str;
std::vector<char> writable(str.begin(), str.end());
writable.push_back('\0');

// get the char* using &writable[0] or &*writable.begin()
1007
10/6/2014 7:44:34 AM

Given say...

std::string x = "hello";

Getting a `char *` or `const char*` from a `string`

How to get a character pointer that's valid while x remains in scope and isn't modified further

C++11 simplifies things; the following all give access to the same internal string buffer:

const char* p_c_str = x.c_str();
const char* p_data  = x.data();
char* p_writable_data = x.data(); // for non-const x from C++17 
const char* p_x0    = &x[0];

      char* p_x0_rw = &x[0];  // compiles iff x is not const...

All the above pointers will hold the same value - the address of the first character in the buffer. Even an empty string has a "first character in the buffer", because C++11 guarantees to always keep an extra NUL/0 terminator character after the explicitly assigned string content (e.g. std::string("this\0that", 9) will have a buffer holding "this\0that\0").

Given any of the above pointers:

char c = p[n];   // valid for n <= x.size()
                 // i.e. you can safely read the NUL at p[x.size()]

Only for the non-const pointer p_writable_data and from &x[0]:

p_writable_data[n] = c;
p_x0_rw[n] = c;  // valid for n <= x.size() - 1
                 // i.e. don't overwrite the implementation maintained NUL

Writing a NUL elsewhere in the string does not change the string's size(); string's are allowed to contain any number of NULs - they are given no special treatment by std::string (same in C++03).

In C++03, things were considerably more complicated (key differences highlighted):

  • x.data()

    • returns const char* to the string's internal buffer which wasn't required by the Standard to conclude with a NUL (i.e. might be ['h', 'e', 'l', 'l', 'o'] followed by uninitialised or garbage values, with accidental accesses thereto having undefined behaviour).
      • x.size() characters are safe to read, i.e. x[0] through x[x.size() - 1]
      • for empty strings, you're guaranteed some non-NULL pointer to which 0 can be safely added (hurray!), but you shouldn't dereference that pointer.
  • &x[0]

    • for empty strings this has undefined behaviour (21.3.4)
      • e.g. given f(const char* p, size_t n) { if (n == 0) return; ...whatever... } you mustn't call f(&x[0], x.size()); when x.empty() - just use f(x.data(), ...).
    • otherwise, as per x.data() but:
      • for non-const x this yields a non-const char* pointer; you can overwrite string content
  • x.c_str()

    • returns const char* to an ASCIIZ (NUL-terminated) representation of the value (i.e. ['h', 'e', 'l', 'l', 'o', '\0']).
    • although few if any implementations chose to do so, the C++03 Standard was worded to allow the string implementation the freedom to create a distinct NUL-terminated buffer on the fly, from the potentially non-NUL terminated buffer "exposed" by x.data() and &x[0]
    • x.size() + 1 characters are safe to read.
    • guaranteed safe even for empty strings (['\0']).

Whichever way you get a pointer, you must not access memory further along from the pointer than the characters guaranteed present in the descriptions above. Attempts to do so have undefined behaviour, with a very real chance of application crashes and garbage results even for reads, and additionally wholesale data, stack corruption and/or security vulnerabilities for writes.

When do those pointers get invalidated?

If you call some string member function that modifies the string or reserves further capacity, any pointer values returned beforehand by any of the above methods are invalidated. You can use those methods again to get another pointer. (The rules are the same as for iterators into strings).

See also How to get a character pointer valid even after x leaves scope or is modified further below....

So, which is better to use?

From C++11, use .c_str() for ASCIIZ data, and .data() for "binary" data (explained further below).

In C++03, use .c_str() unless certain that .data() is adequate, and prefer .data() over &x[0] as it's safe for empty strings....

...try to understand the program enough to use data() when appropriate, or you'll probably make other mistakes...

The ASCII NUL '\0' character guaranteed by .c_str() is used by many functions as a sentinel value denoting the end of relevant and safe-to-access data. This applies to both C++-only functions like say fstream::fstream(const char* filename, ...) and shared-with-C functions like strchr(), and printf().

Given C++03's .c_str()'s guarantees about the returned buffer are a super-set of .data()'s, you can always safely use .c_str(), but people sometimes don't because:

  • using .data() communicates to other programmers reading the source code that the data is not ASCIIZ (rather, you're using the string to store a block of data (which sometimes isn't even really textual)), or that you're passing it to another function that treats it as a block of "binary" data. This can be a crucial insight in ensuring that other programmers' code changes continue to handle the data properly.
  • C++03 only: there's a slight chance that your string implementation will need to do some extra memory allocation and/or data copying in order to prepare the NUL terminated buffer

As a further hint, if a function's parameters require the (const) char* but don't insist on getting x.size(), the function probably needs an ASCIIZ input, so .c_str() is a good choice (the function needs to know where the text terminates somehow, so if it's not a separate parameter it can only be a convention like a length-prefix or sentinel or some fixed expected length).

How to get a character pointer valid even after x leaves scope or is modified further

You'll need to copy the contents of the string x to a new memory area outside x. This external buffer could be in many places such as another string or character array variable, it may or may not have a different lifetime than x due to being in a different scope (e.g. namespace, global, static, heap, shared memory, memory mapped file).

To copy the text from std::string x into an independent character array:

// USING ANOTHER STRING - AUTO MEMORY MANAGEMENT, EXCEPTION SAFE
std::string old_x = x;
// - old_x will not be affected by subsequent modifications to x...
// - you can use `&old_x[0]` to get a writable char* to old_x's textual content
// - you can use resize() to reduce/expand the string
//   - resizing isn't possible from within a function passed only the char* address

std::string old_x = x.c_str(); // old_x will terminate early if x embeds NUL
// Copies ASCIIZ data but could be less efficient as it needs to scan memory to
// find the NUL terminator indicating string length before allocating that amount
// of memory to copy into, or more efficient if it ends up allocating/copying a
// lot less content.
// Example, x == "ab\0cd" -> old_x == "ab".

// USING A VECTOR OF CHAR - AUTO, EXCEPTION SAFE, HINTS AT BINARY CONTENT, GUARANTEED CONTIGUOUS EVEN IN C++03
std::vector<char> old_x(x.data(), x.data() + x.size());       // without the NUL
std::vector<char> old_x(x.c_str(), x.c_str() + x.size() + 1);  // with the NUL

// USING STACK WHERE MAXIMUM SIZE OF x IS KNOWN TO BE COMPILE-TIME CONSTANT "N"
// (a bit dangerous, as "known" things are sometimes wrong and often become wrong)
char y[N + 1];
strcpy(y, x.c_str());

// USING STACK WHERE UNEXPECTEDLY LONG x IS TRUNCATED (e.g. Hello\0->Hel\0)
char y[N + 1];
strncpy(y, x.c_str(), N);  // copy at most N, zero-padding if shorter
y[N] = '\0';               // ensure NUL terminated

// USING THE STACK TO HANDLE x OF UNKNOWN (BUT SANE) LENGTH
char* y = alloca(x.size() + 1);
strcpy(y, x.c_str());

// USING THE STACK TO HANDLE x OF UNKNOWN LENGTH (NON-STANDARD GCC EXTENSION)
char y[x.size() + 1];
strcpy(y, x.c_str());

// USING new/delete HEAP MEMORY, MANUAL DEALLOC, NO INHERENT EXCEPTION SAFETY
char* y = new char[x.size() + 1];
strcpy(y, x.c_str());
//     or as a one-liner: char* y = strcpy(new char[x.size() + 1], x.c_str());
// use y...
delete[] y; // make sure no break, return, throw or branching bypasses this

// USING new/delete HEAP MEMORY, SMART POINTER DEALLOCATION, EXCEPTION SAFE
// see boost shared_array usage in Johannes Schaub's answer

// USING malloc/free HEAP MEMORY, MANUAL DEALLOC, NO INHERENT EXCEPTION SAFETY
char* y = strdup(x.c_str());
// use y...
free(y);

Other reasons to want a char* or const char* generated from a string

So, above you've seen how to get a (const) char*, and how to make a copy of the text independent of the original string, but what can you do with it? A random smattering of examples...

  • give "C" code access to the C++ string's text, as in printf("x is '%s'", x.c_str());
  • copy x's text to a buffer specified by your function's caller (e.g. strncpy(callers_buffer, callers_buffer_size, x.c_str())), or volatile memory used for device I/O (e.g. for (const char* p = x.c_str(); *p; ++p) *p_device = *p;)
  • append x's text to an character array already containing some ASCIIZ text (e.g. strcat(other_buffer, x.c_str())) - be careful not to overrun the buffer (in many situations you may need to use strncat)
  • return a const char* or char* from a function (perhaps for historical reasons - client's using your existing API - or for C compatibility you don't want to return a std::string, but do want to copy your string's data somewhere for the caller)
    • be careful not to return a pointer that may be dereferenced by the caller after a local string variable to which that pointer pointed has left scope
    • some projects with shared objects compiled/linked for different std::string implementations (e.g. STLport and compiler-native) may pass data as ASCIIZ to avoid conflicts

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon