CS253 Strings
Old vs. New
// C strings
char q[] = "gamma delta";
const char *p = "alpha beta";
printf("%zu chars; first char is %c\n", strlen(q), q[0]);
printf("%zu chars; ninth char is %c\n", strlen(p), p[8]);
11 chars; first char is g
10 chars; ninth char is t
// C++ strings
string s = "ceti alpha six";
cout << s.size() << " chars; third char is " << s[2] << '\n'
<< s.length() << " chars; last char is " << s.back() << '\n';
14 chars; third char is t
14 chars; last char is x
Why C strings?
- Unfortunately, you have to deal with old-fashioned C strings
from time to time.
- For example, "foobar" is a C string, not a C++ string.
C strings
- Only use C-style strings when you have no other choice.
- Old libraries have only C-style interfaces.
- A C-style string is an array of char, ending with '\0'.
- Because it’s a C array, it is NOT stretchy.
- It has the maximum length that you gave it when you defined the array.
- If no length is given, it’s determined from the initializer.
- Either case: array length is fixed at compile time.
C strings
- The null (not NULL) char, '\0', ends the string.
- OK, maybe “NUL”, but that’s super pedantic.
- There may be extra room in the array; that’s OK.
- Therefore, the length of the string may be less than the length of the containing array.
- C strings have no methods; they’re arrays, not objects.
- Use the strlen() function, defined in <string.h>, to get a C string’s size.
- strlen("Jack") is 4, not 5.
- Use subscripting to get to individual characters.
C++ strings
- string, not String
- Mutable, unlike Java.
- No fixed length. All chars are allowed, even '\0'.
- Unlike Java, the string object is NOT dynamically allocated via new.
- Use the string::size() method to calculate the length.
- Use subscripting to get to individual characters.
- A method exists to access individual chars of a C++ string, but only Java buffoons use it.
- Other methods: https://en.cppreference.com/w/cpp/string/basic_string
- Learn them. You will use strings constantly.
How NOT to define a C++ string
Welcome to C++, which is different (better) than Java:
string riley = new string;
cout << riley;
c.cc:1: error: conversion from 'std::__cxx11::string*' {aka
'std::__cxx11::basic_string<char>*'} to non-scalar type
'std::__cxx11::string' {aka 'std::__cxx11::basic_string<char>'} requested
- A C++ string is not a reference—it’s an object.
- In C++, you often have objects without references.
- It’s just an object on the stack, like an int variable.
How NOT to define a C++ string
Don’t do this, either, though it does work:
string joy = string("sadness");
cout << joy;
sadness
Conceptually, that creates an anonymous temporary string on the right-hand side,
copies/moves it to joy, then destroys the temporary string
(compilers usually elide the copy, and C++17 requires them to).
Sure, it works, but … no!
How to define a C++ string
Do it like this:
string fear = "disgust";
cout << fear;
disgust
or, if you don’t have a value for the string at first:
string anger;
anger = "Bing Bong";
cout << anger;
Bing Bong
Java programmers are trained to treat objects differently
than other types. Shake that off!
Subscripting
Subscripting on a C++ string produces a char:
string course="CS253";
cout << course << '\n';
cout << course[2] << '\n';
CS253
2
which can be modified:
string pet = "cat";
pet[0] = 'r';
cout << pet << '\n';
rat
Note the 'r', not "r".
Mutable
Unlike Java, C++ strings are mutable—they can be modified.
string soup = "Tomato dispue is bisgusting.";
cout << soup << '\n';
soup[7] = 'b';
soup[10] = 'q';
soup[17] = 'd';
cout << soup << '\n';
Tomato dispue is bisgusting.
Tomato bisque is disgusting.
Learn those methods
- Seriously, learn the string methods.
- You use strings so often that it’s worth the trouble.
- Some methods have several versions:
- nine ctors
- seven versions of .replace()
- eight versions of .insert()
Truth
I freely use C string literals, like this:
void emit(string s) {
cout << "*** " << s << '\n';
}
int main() {
emit("Today is a lovely day.");
return 0;
}
*** Today is a lovely day.
- The first C string literal, "*** ", gets sent to cout.
- I could have specified a C++ string via "*** "s.
- The second literal, "Today is a lovely day.", got converted to a std::string at the point of the function call.
I suppose that has some tiny cost that could become significant
inside a loop, but it pales compared to output.
Some Code
char q[80] = "This is a C string.\n";
cout << q;
char r[] = "foobar";
r[3] = '\0';
cout << "r is now \"" << r << "\"\n";
const char *p = "This is also a C string";
cout << p << ", length is " << strlen(p) << '\n';
This is a C string.
r is now "foo"
This is also a C string, length is 23
string s("useless initial value");
s = "This am a C++ string"; // mixed
s[5] = 'i'; // mutable
s[6] += 6; // char is integer-like
cout << s << ", length is " << s.size() << '\n';
This is a C++ string, length is 20
Conversions
Converting from a C-style string to a C++ string is easy,
because the C++ string object has a constructor that takes
a C-style string:
char chip[] = "chocolate";
string dale(chip);
cout << dale << '\n';
chocolate
Conversions
Converting from a C++ string to a C-style string requires a method:
string wall(30, '#');
const char *p = wall;
cout << p << '\n';
c.cc:2: error: cannot convert 'std::__cxx11::string' {aka
'std::__cxx11::basic_string<char>'} to 'const char*' in initialization
string wall(30, '#');
const char *p = wall.c_str();
cout << p << '\n';
##############################
string::c_str() is useful for calling an old-fashioned library function
that wants a C-style string.
string command = "date";
system(command);
c.cc:2: error: cannot convert 'std::__cxx11::string' {aka
'std::__cxx11::basic_string<char>'} to 'const char*'
string command = "date";
system(command.c_str());
Fri Nov 22 02:51:29 MST 2024
String Literals
Literals
- Literals are constants, like 42, 1.2e-24, 'x', "foo", true, or nullptr.
- const variables are not literals. They’re variables that don’t vary.
- This is a char: 'X'.
- Single quotes for a single character.
- This is a C-style string literal: "alpha beta gamma".
- Its type is an array of const char (here, const char[17]), which converts implicitly to const char *.
- This is a std::string: "foobar"s, with the trailing s.
String Literals
A "string literal" is an anonymous array of constant characters.
These are equivalent:
cout << "FN-2187";
FN-2187
const char whatever[] = "FN-2187";
cout << whatever;
FN-2187
const char whatever[] = "FN-2187";
const char *p = &whatever[0];
cout << p;
FN-2187
- A "string literal" is like an anonymous array.
- An array name is the same as the address of its first element.
Escape Sequences:
Sequence | Meaning | Sequence | Meaning |
\a | bell | \' | ' |
\b | backspace | \" | " |
\f | form feed | \\ | \ |
\n | newline | \ddd | 1–3 octal digits |
\r | carriage return | \xdd | 1–∞ hex digits |
\t | horizontal tab | \udddd | Unicode U+dddd |
\v | vertical tab | \Udddddddd | Unicode U+dddddddd |
String Pasting
Two adjacent string literals are merged into one at compile-time:
cout << "alpha beta " "gamma delta "
"epsilon\n";
alpha beta gamma delta epsilon
cout << "Business plan:\n\n"
"1. Collect underpants\n"
"2. ?\n"
"3. Profit\n";
Business plan:
1. Collect underpants
2. ?
3. Profit
Raw Strings
- Sometimes, you want a string to actually contain a backslash.
- If so, you double the backslash.
- This can get tedious.
Raw Strings
A raw string starts with R"( and ends with )".
The parens are not part of the string.
cout << R"(Don't be "afraid" of letters:
\a\b\c\d\e\f\g)";
Don't be "afraid" of letters:
\a\b\c\d\e\f\g
Cool! Quotes inside of quotes!
However …
What if the string contains a right paren followed by a double quote? I want to emit:
A goatee! :-)" Cool!
cout << R"(A goatee! :-)" Cool!)";
c.cc:1: warning: missing terminating " character
c.cc:1: error: missing terminating " character
That didn’t work. The )" at the bottom of the face
was taken to be the end of the raw string.
Solution
A raw string starts with:
R"whatever-you-like-up-to-sixteen-chars(
and ends with:
)the-same-up-to-sixteen-chars"
cout << R"X(A goatee! :-)" Cool!)X";
A goatee! :-)" Cool!
cout << R"WashYourHair(What the #"%'&*)?)WashYourHair";
What the #"%'&*)?
cout << R"(The degenerate case)";
The degenerate case
Comparing C-Style Strings
if ("foo" < "bar")
cout << "😢";
c.cc:1: warning: comparison with string literal results in unspecified behavior
😢
- Look—“unspecified behavior”! Remember that?
- This will not compare the letters in the strings.
It will, instead, compare the addresses.
Which is at the lower address? Who knows‽
- Are the arrays "marx" and "marx" two arrays or one? Who knows‽
- g++ -Wall will detect this deplorable code.
Comparing C-style strings properly.
- To compare C-style strings, use the function strcmp().
- It has a peculiar return value.
- strcmp(a,b) returns:
- Some value <0 if a<b.
- 0 if a==b.
- Some value >0 if a>b.
- Why are you even considering using C-style strings?
- Well, sometimes, you have to.
- To compare C++ std::string values, or to compare a std::string with a C-style string, use the usual operators: < > <= >= == !=
- Only Java geeks use string::compare(), which has the same three-way return value as strcmp().
- This language has operator overloading. Be thankful!
Example
string name = "Conan O’Brien";
if (name == "Conan O’Brien")
cout << "good 1\n";
if (name < "Zulu")
cout << "good 2\n";
if (name > "Andy Richter")
cout << "good 3\n";
if (name == name)
cout << "good 4\n";
good 1
good 2
good 3
good 4