CS253 Strings
Operators
Java programmers aren’t used to mutable strings with operators:
string a = "alpha", b = "beta", g("gamma"), parens = "()";
b += "<"+g+'>';
auto result = parens[0]+a+b+parens[1];
result[3] = '*';
cout << result << '\n';
if (b < g) cout << "Good!\n";
(al*habeta<gamma>)
Good!
Old vs. New
// C strings
char q[] = "gamma delta";
const char *p = "alpha beta";
printf("%zu chars; first char is %c\n", strlen(q), q[0]);
printf("%zu chars; ninth char is %c\n", strlen(p), p[8]);
11 chars; first char is g
10 chars; ninth char is t
// C++ strings
string s = "ceti alpha six";
cout << s.size() << " chars; third char is " << s[2] << '\n'
<< s.length() << " chars; last char is " << s.back() << '\n';
14 chars; third char is t
14 chars; last char is x
Why C strings?
- Unfortunately, you have to deal with old-fashioned C strings
from time to time.
- For example, "foobar" is a C string, not a C++ string.
C strings
- Use C-style strings only when you have no other choice.
- Many libraries have only C-style interfaces.
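- A C-style string is an array of char, ending with '\0'.
- The '\0' is not necessarily at the end of the array, though. The string ends at the first '\0', and code that reads a C string keeps going until it finds one, or dies trying.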
- A C array is NOT stretchy—its length is fixed at compile time.
- It has the max size that you gave it when you defined the array.
- If no size is given, it’s determined from the initializer.
char foo[10] = "xyz", bar[] = "pdq";
cout << sizeof(foo) << ' ' << strlen(foo) << '\n';
cout << sizeof(bar) << ' ' << strlen(bar) << '\n';
10 3
4 3
C strings
- The null (not NULL) char, '\0', ends the string.
- OK, maybe “NUL”, but that’s super pedantic, even for me. “null” is an adjective; “NUL” is the name of that particular character, as “TAB” is the name of the tab character.
- There may be extra room in the array—that’s ok.
- Therefore, the length of the string may be less than the length of the containing array.
- C strings have no methods; they’re arrays, not objects.
- Use strlen() (a function, not a method) from <string.h> (spelled <cstring> in C++) to get the length. strlen("Bjarne") is 6, not 7, because the '\0' isn't counted.
- Use subscripting to get to individual characters—it’s an array.
C++ strings
- string, not String
- Mutable, unlike Java.
- No fixed length. All chars are allowed, even '\0'.
- Unlike Java, the string object is NOT dynamically allocated via new.
- Use the string::size() method to obtain the length.
- Use subscripting to get to individual characters.
- .at() also accesses individual chars of a C++ string, checking the index and throwing std::out_of_range if it's bad, but only Java fans use it.
- Other methods: https://en.cppreference.com/w/cpp/string/basic_string
- Learn them. You will use strings a lot.
How NOT to define a C++ string
Welcome to C++, which is not Java:
string riley = new string;
cout << riley;
c.cc:1: error: conversion from 'std::__cxx11::string*' {aka
'std::__cxx11::basic_string<char>*'} to non-scalar type
'std::__cxx11::string' {aka 'std::__cxx11::basic_string<char>'} requested
- A C++ string is not a reference—it’s an object.
- In C++, you often have objects without references.
- It’s just an object on the stack, like an int variable.
How NOT to define a C++ string
Don’t do this, either, though it does work:
string joy = string("sadness");
cout << joy;
sadness
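That, at least conceptually, creates an anonymous temporary string on the right-hand side, copies/moves it to joy, then destroys the temporary string. (Since C++17, the compiler must skip the temporary and initialize joy directly, but the extra string(...) is still redundant noise.)
Sure, it works, but … no!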
How to define a C++ string
Do it like this:
string fear = "disgust";
cout << fear;
disgust
or, if you don’t have a value for the string at first:
string anger;
anger = "Bing Bong";
cout << anger;
Bing Bong
Java programmers are trained to treat objects differently
than other types. Shake that off!
Subscripting
Subscripting on a C++ string produces a char:
string course="CS253";
cout << course << '\n';
cout << course[2] << '\n';
CS253
2
which can be modified:
string pet = "cat";
pet[0] = 'r';
cout << pet << '\n';
rat
Note the 'r', not "r". 'x' is a char; "y" is a C-string, an array of const char that decays to a const char *.
Single quotes for single characters.
Mutable
Unlike Java, C++ strings are mutable—they can be modified.
string soup = "Tomato dispue is bisgusting.";
cout << soup << '\n';
soup[7] = 'b';
soup[10] = 'q';
soup[17] = 'd';
cout << soup << '\n';
Tomato dispue is bisgusting.
Tomato bisque is disgusting.
String methods
The string class has many methods. These notes show only some of them.
Learn those methods
- Seriously, learn the string methods.
- You use strings so often that it’s worth the trouble.
- Some methods have several versions:
  - nine ctors
  - seven versions of .replace()
  - eight versions of .insert()
Truth
I freely use C string literals, like this:
void emit(string s) {
cout << "*** " << s << '\n';
}
int main() {
emit("Today is a lovely day.");
return 0;
}
*** Today is a lovely day.
- The first C string literal, "*** ", gets sent to cout.
- I could have specified a C++ string via "*** "s.
- The second literal, "Today is a lovely day.", got converted to a std::string at the point of the function call. It has some tiny cost that could become significant inside a loop, but it pales compared to the cost of output.
Some Code
char q[80] = "This is a C string.\n";
cout << q;
char r[] = "foobar";
r[3] = '\0';
cout << "r is now \"" << r << "\"\n";
const char *p = "This is also a C string";
cout << p << ", length is " << strlen(p) << '\n';
This is a C string.
r is now "foo"
This is also a C string, length is 23
string s("useless initial value");
s = "This am a C++ string"; // assign a C string to a C++ string
s[5] = 'i'; // mutable
s[6] += 6; // char is integer-like
cout << s << ", length is " << s.size() << '\n';
This is a C++ string, length is 20
Conversions
Converting from a C-style string to a C++ string is easy,
because the C++ string object has a constructor that takes
a C-style string:
char chip[] = "chocolate";
string dale(chip);
cout << dale << '\n';
chocolate
Conversions
Converting from a C++ string to a C-style string requires a method:
string wall(30, '#');
const char *p = wall;
cout << p << '\n';
c.cc:2: error: cannot convert 'std::__cxx11::string' {aka
'std::__cxx11::basic_string<char>'} to 'const char*' in initialization
string wall(30, '#');
const char *p = wall.c_str();
cout << p << '\n';
##############################
string::c_str() is useful for calling an old-fashioned library function
that wants a C-style string.
string command = "date";
system(command);
c.cc:2: error: cannot convert 'std::__cxx11::string' {aka
'std::__cxx11::basic_string<char>'} to 'const char*'
string command = "date";
system(command.c_str());
Fri Nov 22 01:21:51 MST 2024
String Literals
Literals
- Literals are constants, like 42, 1.2e-24, 'x', "foo", true, or nullptr.
- const variables are not literals. They’re variables that don’t vary.
- This is a char: 'X'.
- Single quotes for a single character.
- This is a C-style string literal: "alpha beta gamma".
- This is a std::string: "foobar"s, with the trailing s. The suffix needs using namespace std; (or using namespace std::string_literals;) to be in scope.
String Literals
A "string literal" is an anonymous array of constant characters.
These are equivalent:
cout << "FN-2187";
FN-2187
const char whatever[] = "FN-2187";
cout << whatever;
FN-2187
const char whatever[] = "FN-2187";
const char *p = &whatever[0];
cout << p;
FN-2187
- A "string literal" is like an anonymous array.
- In most expressions, an array name decays to the address of its first element.
Escape Sequences:
Sequence | Meaning | Sequence | Meaning |
\a | bell | \' | ' |
\b | backspace | \" | " |
\f | form feed | \\ | \ |
\n | newline | \ddd | 1–3 octal digits (\0 is the null char) |
\r | carriage return | \xdd | 1–∞ hex digits |
\t | horizontal tab | \udddd | Unicode U+dddd |
\v | vertical tab | \Udddddddd | Unicode U+dddddddd |
String Pasting
Two adjacent string literals are merged into one at compile-time:
cout << "alpha beta " "gamma delta "
"epsilon\n";
alpha beta gamma delta epsilon
cout << "Business plan:\n\n"
"1. Collect underpants\n"
"2. ?\n"
"3. Profit\n";
Business plan:

1. Collect underpants
2. ?
3. Profit
Raw Strings
- Sometimes, you want a string to actually contain a backslash.
- If so, you double the backslash.
- This can get tedious.
Raw Strings
A raw string starts with R"( and ends with )".
The parens are not part of the string.
cout << R"(Don’t be "afraid" of letters:
\a\b\c\d\e\f\g)";
Don’t be "afraid" of letters:
\a\b\c\d\e\f\g
Cool! Quotes inside of quotes!
However …
What if the string contains a right paren followed by a double quote? I want to emit:
A goatee! \:-)" Cool!
cout << R"(A goatee! \:-))" Cool!";
c.cc:1: warning: missing terminating " character
c.cc:1: error: missing terminating " character
That didn’t work. The )" at the bottom of the face was taken to be the end of the raw string.
Solution
A raw string starts with R"delim( and ends with )delim", where delim is whatever you like, up to sixteen characters, as long as it's the same at both ends. It may not contain spaces, parentheses, or backslashes.
cout << R"X(A goatee! \:-)" Cool!)X";
A goatee! \:-)" Cool!
cout << R"<COVID-19>(What the #"%'&*)?)<COVID-19>";
What the #"%'&*)?
cout << R"(The degenerate case)";
The degenerate case
Comparing C-Style Strings
if ("foo" < "bar")
cout << "😢";
c.cc:1: warning: comparison with string literal results in unspecified behavior
😢
- Look—“unspecified behavior”! Remember that?
- This will not compare the letters in the strings.
It will, instead, compare the addresses.
Which is at the lower address? Who knows‽
- Are the arrays "marx" and "marx" two arrays or one? Who knows‽
g++ -Wall will detect this deplorable code.
Comparing C-style strings properly
- To compare C-style strings, use the function strcmp().
- It has a peculiar return value. strcmp(a,b) returns:
  - Some value <0 if a<b.
  - 0 if a==b.
  - Some value >0 if a>b.
- Why are you even considering using C-style strings?
- Well, sometimes, you have to.
- To compare C++ std::string values, or to compare a std::string
with a C-style string, use the usual operators:
< > <= >= == !=
- Only Java geeks use string::compare(),
which has the same three-way return value as strcmp().
- This language has operator overloading. Be thankful!
Example
string name = "Conan O’Brien";
if (name == "Conan O’Brien")
cout << "good 1\n";
if (name < "Zulu")
cout << "good 2\n";
if (name > "Andy Richter")
cout << "good 3\n";
if (name == name)
cout << "good 4\n";
good 1
good 2
good 3
good 4
God help us, another string!
C++17’s string_view is a non-owning read-only view into a C-string
or std::string. It’s generally implemented as a const char * and a
length.
const char *a = "alpha";
string b = "beta";
string_view c = a;
cout << c << '\n';
c = b;
cout << c << '\n';
alpha
beta
void hero(string_view sv) {
cout << "Nice work, " << sv << "!"
<< " (len=" << sv.size() << ")\n";
}
int main() {
hero("Batman"); // C-string
hero("Robin"s); // C++ string
}
Nice work, Batman! (len=6)
Nice work, Robin! (len=5)
Methods
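- As a read-only non-owner, string_view has most of the usual string accessors: size(), find(), substr(), and so on.
- It’s light on the mutators, since it doesn’t own the data. remove_prefix() and remove_suffix() only shrink the view.
- It works with iterators, so you can use it in a range-based for loop.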
Timing: converting to const string reference
bool first(const string &csr) { return csr[0]; }
int main() {
const char s[] = "abcdefghijklmnopqrstuvwxyz";
for (int i=0; i<10'000'000; i++)
first(s);
}
Real time: 127 ms
bool first(const string &csr) { return csr[0]; }
int main() {
string s = "abcdefghijklmnopqrstuvwxyz";
for (int i=0; i<10'000'000; i++)
first(s);
}
Real time: 5.2 ms
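- Passing a C-string to a const string & parameter constructs a temporary C++ string on every call. That is slow, because it copies all the characters (and may allocate memory).
- Passing an existing C++ string just binds the reference. Nothing is copied.
- Could matter for big strings and many function calls.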
bool first(string_view sv) { return sv[0]; }
int main() {
const char s[] = "abcdefghijklmnopqrstuvwxyz";
for (int i=0; i<10'000'000; i++)
first(s);
}
Real time: 5.48 ms
bool first(string_view sv) { return sv[0]; }
int main() {
string s = "abcdefghijklmnopqrstuvwxyz";
for (int i=0; i<10'000'000; i++)
first(s);
}
Real time: 5.13 ms
- Constructing a string_view from a C-string or a C++ string is
quick, requiring no copying of data.
- The C++ string already knows its length, so that’s just copying a pointer and a length.
- A C-string is just a const char *, so the string_view ctor
has to count chars.
- Good compilers compute the length of a constant C-string at compile time.