CS253 Strings
Inclusion
To use strings, you need to:
#include <string>
Use of string_view objects requires:
#include <string_view>
The rare uses of strcmp() or strlen() require:
#include <cstring>
Operators
Java programmers aren’t used to mutable strings with operators:
string a = "alpha", b("beta"), g{"gamma"}, parens = "()";
b += "<"+g+'>';
auto result = parens[0]+a+b+parens[1];
result[3] = '*';
cout << result << '\n';
if (b < g) cout << "Good!\n";
(al*habeta<gamma>)
Good!
Old vs. New
// C strings
char q[] = "gamma delta";
const char *p = "alpha beta";   // string literals are const in C++
printf("%zu chars; first char is %c\n", strlen(q), q[0]);
printf("%zu chars; ninth char is %c\n", strlen(p), p[8]);
11 chars; first char is g
10 chars; ninth char is t
// C++ strings
string s = "ceti alpha six";
cout << s.size() << " chars; third char is " << s[2] << '\n'
<< s.length() << " chars; last char is " << s.back() << '\n';
14 chars; third char is t
14 chars; last char is x
Why C strings?
- Unfortunately, you have to deal with old-fashioned C strings
from time to time.
- For example, "foobar" is a C string, not a C++ string.
C strings
- Use C-style strings only when you have no other choice.
- Many libraries have only C-style interfaces.
- A C-style string is an array of char, ending with '\0'.
- That '\0' need not be at the end of the array. Code that reads the string scans forward until it finds a '\0', or dies trying.
- A C array is NOT stretchy—its length is fixed at compile time.
- It has the max size that you gave it when you defined the array.
- If no size is given, it’s determined from the initializer.
char foo[10] = "xyz", bar[] = "pdq";
cout << sizeof(foo) << ' ' << strlen(foo) << '\n';
cout << sizeof(bar) << ' ' << strlen(bar) << '\n';
10 3
4 3
C strings
- The null (not NULL) char, '\0', ends the string.
- OK, maybe “NUL”, but that’s super pedantic, even for me. “null” is an adjective, “NUL” is the name of that particular character, as “TAB” is the name of the tab character.
- There may be extra room in the array—that’s ok.
- Therefore, the length of the string may be less than the length of the containing array.
- C strings have no methods; they’re arrays, not objects.
- Use strlen() (a function, not a method) from <cstring> to get the length. strlen("Bjarne") is 6, not 7.
- Use subscripting to get to individual characters—it’s an array.
C++ strings
- string, not String
- Mutable, unlike Java.
- No fixed length. All chars are allowed, even '\0'.
- Unlike Java, the string object is NOT dynamically allocated via new.
- Use the string::size() method to obtain the length.
- Use subscripting to get to individual characters.
- .at() also accesses individual chars of a C++ string, and throws std::out_of_range for a bad index, but only Java fans use it.
- Other methods: https://en.cppreference.com/w/cpp/string/basic_string
- Learn them. You will use strings a lot.
How NOT to define a C++ string
Welcome to C++, which is not Java:
string riley = new string; // 🦡
cout << riley;
c.cc:1: error: conversion from ‘std::string*’ {aka
‘std::__cxx11::basic_string<char>*’} to non-scalar type
‘std::string’ {aka ‘std::__cxx11::basic_string<char>’} requested
- A C++ string is not a reference—it’s an object.
- In C++, you often have objects without references.
- It’s just an object on the stack, like an int variable.
How NOT to define a C++ string
Don’t do this, either, though it does work:
string joy = string("sadness"); // 🦡
cout << joy;
sadness
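Before C++17, that created an anonymous temporary string on the right-hand side, copied/moved it to joy, then destroyed the temporary string. Since C++17, the language guarantees that joy is initialized directly, so no temporary exists, but the string(…) wrapper is still redundant clutter.
Sure, it works, but … no!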
How to define a C++ string
Do it like this:
string fear = "disgust";
cout << fear;
disgust
or, if you don’t have a value for the string at first:
string anger;
anger = "Bing Bong";
cout << anger;
Bing Bong
Java programmers are trained to treat objects differently
than other types. Shake that off!
Subscripting
Subscripting on a C++ string produces a char:
string course="CS253";
cout << course << '\n';
cout << course[1] << '\n';
CS253
S
which can be modified:
string pet = "cat";
pet[0] = 'r';
cout << pet << '\n';
rat
Note the 'r', not "r". 'x' is a char; "y" is a C-string, an array of const char that decays to a const char *.
Single quotes for single characters.
Indexing
Declare your index variable properly:
string s = "Great horny toads!\n";
for (int i=0; i<s.size(); i++) // 🦡
if (s[i] == 'o') s[i] = '*';
cout << s;
c.cc:2: warning: comparison of integer expressions of different signedness:
‘int’ and ‘std::__cxx11::basic_string<char>::size_type’ {aka ‘long
unsigned int’}
Great h*rny t*ads!
i is int, which is signed, whereas string::size() returns a size_t, which is unsigned. The compiler dislikes comparing signed to unsigned variables.
Indexing
What’s so bad about comparing signed to unsigned variables?
The rules of the language say that, when an int is compared with an unsigned int, the int gets converted to unsigned first, which produces interesting results:
if (-2 > 3U) // 🦡
cout << "Isn’t that surprising!\n";
c.cc:1: warning: comparison of integer expressions of different signedness:
‘int’ and ‘unsigned int’
Isn’t that surprising!
Indexing
The solution: make i the right type:
string s = "Great horny toads!\n";
for (size_t i=0; i<s.size(); i++)
if (s[i] == 'o') s[i] = '*';
cout << s;
Great h*rny t*ads!
Why wouldn’t auto work?
auto would just make i the type of 0, which is int. Sure, you could say auto i=0ULL, but you may as well just use size_t. Besides, size_t is not necessarily the same as unsigned long long.
Mutable
Unlike Java, C++ strings are mutable—they can be modified.
string soup = "Tomato dispue is bisgusting.";
cout << soup << '\n';
soup[7] = 'b';
soup[10] = 'q';
soup[17] = 'd';
cout << soup << '\n';
Tomato dispue is bisgusting.
Tomato bisque is disgusting.
String methods
The string class has many methods. These are only some of them.
Learn those methods
- Seriously, learn the string methods.
- You use strings so often that it’s worth the trouble.
- Some methods have several versions:
- nine ctors
- seven versions of .replace()
- eight versions of .insert()
Truth
I freely use C string literals, like this:
void emit(string s) {
cout << "*** " << s << '\n';
}
int main() {
emit("Today is a lovely day.");
return 0;
}
*** Today is a lovely day.
- The first C string literal, "*** ", gets sent to cout.
- I could have specified a C++ string via "*** "s.
- The second literal, "Today is a lovely day.", gets converted to a std::string at the point of the function call. That conversion has a tiny cost that could become significant inside a loop, but it pales compared to the cost of output.
Some Code
char q[80] = "This is a C string.\n";
cout << q;
char r[] = "foobar";
r[3] = '\0';
cout << "r is now \"" << r << "\"\n";
const char *p = "This is also a C string";
cout << p << ", length is " << strlen(p) << '\n';
This is a C string.
r is now "foo"
This is also a C string, length is 23
string s("useless initial value");
s = "This am a C++ string"; // assign a C string to a C++ string
s[5] = 'i'; // mutable
s[6] += 6; // char is integer-like
cout << s << ", length is " << s.size() << '\n';
This is a C++ string, length is 20
Conversions
Converting from a C-style string to a C++ string is easy,
because the C++ string object has a constructor that takes
a C-style string:
char chip[] = "chocolate";
string dale(chip);
cout << dale << '\n';
chocolate
Conversions
Converting from a C++ string to a C-style string requires a method:
string wall(30, '#');
const char *p = wall; // 🦡
cout << p << '\n';
c.cc:2: error: cannot convert ‘std::string’ {aka
‘std::__cxx11::basic_string<char>’} to ‘const char*’ in
initialization
string wall(30, '#');
const char *p = wall.c_str();
cout << p << '\n';
##############################
string::c_str() is useful for calling an old-fashioned library function
that wants a C-style string.
string command = "date";
system(command); // 🦡
c.cc:2: error: cannot convert ‘std::string’ {aka
‘std::__cxx11::basic_string<char>’} to ‘const char*’
string command = "date";
system(command.c_str());
Thu Nov 21 09:55:13 MST 2024
String Literals
Literals
- Literals are constants, like 12, 3.4e-56, 'x', "foo", true, or nullptr.
- const variables are not literals—they’re variables that don’t vary.
- Expressions (2+2) aren’t literals, either.
- This is a char: 'X'.
- Single quotes for a single character.
- This is a C-style string literal: "alpha beta gamma".
- This is a std::string: "foobar"s, with the trailing s.
String Literals
A "string literal" is an anonymous array of constant characters.
These are equivalent:
cout << "FN-2187";
FN-2187
const char whatever[] = "FN-2187";
cout << whatever;
FN-2187
const char whatever[] = "FN-2187";
const char *p = &whatever[0];
cout << p;
FN-2187
- A "string literal" is like an anonymous array.
- In most expressions, an array name decays to the address of its first element; sizeof is a notable exception.
Escape Sequences:
Sequence | Meaning | Sequence | Meaning |
\a | bell | \' | ' |
\b | backspace | \" | " |
\f | form feed | \\ | \ |
\n | newline | \ooo | 1–3 octal digits (e.g., \0) |
\r | carriage return | \xhh | 1 or more hex digits |
\t | horizontal tab | \udddd | Unicode U+dddd |
\v | vertical tab | \Udddddddd | Unicode U+dddddddd |
String Pasting
Two adjacent string literals are merged into one at compile-time:
cout << "alpha beta " "gamma delta "
"epsilon\n";
alpha beta gamma delta epsilon
cout << "Business plan:\n\n"
"1. Collect underpants\n"
"2. ?\n"
"3. Profit\n";
Business plan:
1. Collect underpants
2. ?
3. Profit
Raw Strings
- Sometimes, you want a string to actually contain a backslash.
- If so, you double the backslash.
- This can get tedious.
Raw Strings
A raw string starts with R"( and ends with )".
The parens are not part of the string.
cout << R"(Don’t be "afraid" of letters:
\a\b\c\d\e\f\g)";
Don’t be "afraid" of letters:
\a\b\c\d\e\f\g
Cool! Quotes inside of quotes!
However …
What if the string contains a right paren? I want to emit:
A goatee! \:-)" Cool!
cout << R"(A goatee! \:-))" Cool!"; // 🦡
c.cc:1: warning: missing terminating " character
c.cc:1: error: missing terminating " character
That didn’t work. The )" at the bottom of the face was taken to be the end of the raw string.
Solution
A raw string starts with:
R"whatever-you-like-up-to-sixteen-chars(
and ends with:
)the-same-up-to-sixteen-chars"
The delimiter may not contain spaces, parentheses, or backslashes.
cout << R"X(A goatee! \:-)" Cool!)X";
A goatee! \:-)" Cool!
cout << R"<COVID-19>(What the #"%'&*)?)<COVID-19>";
What the #"%'&*)?
cout << R"(The degenerate case)";
The degenerate case
Comparing C-Style Strings
if ("foo" < "bar") // 🦡
cout << "😢";
😢
- Look—unspecified behavior! Remember that?
- This will not compare the letters in the strings.
It will, instead, compare the addresses.
Which string is at the lower address? Who knows‽
- Are the arrays "marx" and "marx" two arrays or one? Who knows‽
g++ -Wall will detect this deplorable code.
Comparing C-style strings properly.
- To compare C-style strings, use the function strcmp().
- It has a peculiar return value.
strcmp(a,b) returns:
- Some value <0 if a<b.
- 0 if a==b.
- Some value >0 if a>b.
- Why are you even considering using C-style strings?
- Well, sometimes, you have to.
- Often, it’s better to assign them to C++ string objects
and compare those.
- To compare C++ std::string values, or to compare a std::string
with a C-style string, use the usual operators:
< > <= >= == !=
- Only Java geeks use string::compare(),
which has the same three-way return value as strcmp().
- This language has operator overloading. Be thankful!
- Really, it’s just this easy.
- Imagine how cluttered this would be with calls to string::compare()!
string name = "Conan O’Brien";
if (name == "Conan O’Brien")
cout << "good 1\n";
if (name < "Zulu")
cout << "good 2\n";
if (name > "Andy Richter")
cout << "good 3\n";
if (name == name)
cout << "good 4\n";
good 1
good 2
good 3
good 4
God help us, another string!
C++17’s string_view is a non-owning read-only view into a C-string or std::string. It’s generally implemented as a const char * and a length.
const char *a = "alpha";
string b = "beta";
string_view c = a;
cout << c << '\n';
c = b;
cout << c << '\n';
alpha
beta
void hero(string_view sv) {
cout << "Nice work, " << sv << "!"
<< " (len=" << sv.size() << ")\n";
}
int main() {
hero("Batman"); // C-string
hero("Robin"s); // C++ string
}
Nice work, Batman! (len=6)
Nice work, Robin! (len=5)
Methods
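- As a read-only non-owner, string_view has most of the usual string accessors: size(), operator[], front(), back(), find(), substr(), and friends.
- It’s light on the mutators, since it doesn’t own the data. remove_prefix() and remove_suffix() shrink the view, not the characters.
- And it works with iterators in a for loop.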
Timing: converting to const string reference
bool first(const string &csr) { return csr[0]; }
int main() {
const char s[] = "abcdefghijklmnopqrstuvwxyz";
for (int i=0; i<10'000'000; i++)
first(s);
}
Real time: 145 ms
bool first(const string &csr) { return csr[0]; }
int main() {
string s = "abcdefghijklmnopqrstuvwxyz";
for (int i=0; i<10'000'000; i++)
first(s);
}
Real time: 5.58 ms
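- Passing a C-string to a const string & parameter constructs a temporary C++ string, which is slow, as it copies all the characters in the string.
- Passing an existing C++ string just binds the reference; nothing is copied.
- Could matter for big strings and many function calls.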
bool first(string_view sv) { return sv[0]; }
int main() {
const char s[] = "abcdefghijklmnopqrstuvwxyz";
for (int i=0; i<10'000'000; i++)
first(s);
}
Real time: 5.92 ms
bool first(string_view sv) { return sv[0]; }
int main() {
string s = "abcdefghijklmnopqrstuvwxyz";
for (int i=0; i<10'000'000; i++)
first(s);
}
Real time: 5.37 ms
- Constructing a string_view from a C-string or a C++ string is
quick, requiring no copying of data.
- The C++ string already knows its length, so that’s just a copy of a pointer and a length.
- A C-string is just a const char *, so the string_view ctor has to count chars.
- Good compilers compute the length of a constant C-string at compile time.