CS253 Not Fully Specified
What the language definition does not say.
Unlike many languages, the C++ standard leaves some choices up to the
compiler. It defines several varieties of not-fully-specified things:
- Implementation-defined (§1.9.2):
  A choice made by the compiler, must be documented
- Unspecified behavior (§1.9.3):
  A choice made by the compiler, need not be documented
- Undefined behavior (§1.9.4):
  All bets are off!
Implementation-defined behavior
A choice made by the compiler, which must be documented.
The number of bytes or bits allocated to various types:
cout << sizeof(int) << '\n';
4
Floating-point precision:
float f = 123456789;
f += 3;
f -= 123456789;
cout << f << '\n';
0
Such choices are often heavily influenced by the hardware.
Implementation-defined behavior examples
What happens when a value gets too big for its signed variable:
short s = 32767; // 🦡
cout << ++s;
-32768
Character set (ASCII, EBCDIC, HP Roman-8, Windows-1252, Big5,
Shift JIS, various flavors of Unicode, etc.):
switch ('$') {
case 0x24: cout << "ASCII or UTF-8"; break;
case 0x5b: cout << "EBCDIC"; break;
default: cout << "WTF!?"; break;
}
ASCII or UTF-8
Implementation-defined behavior examples
When >> is used to shift a signed value, what comes in
to replace the leftmost (sign) bit? It might be a copy of
the sign bit, or it might be just a zero.
cout << (-1 >> 4) << '\n'; // 🦡
-1
system() invokes the command interpreter. Its result really
depends on the host operating system.
system("date");
system("hostname");
Thu Nov 21 09:33:04 MST 2024
beethoven
Unspecified behavior
A choice made by the compiler, need not be documented or consistent,
generally a “this-or-that” sort of choice.
Do not assume that expressions are evaluated left-to-right:
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
int main() {
return foo()*bar(); // 🦡
}
foobar
Many students assume that expressions must be evaluated left-to-right.
This is not true in C++.
Unspecified behavior examples
Do not assume that arguments are evaluated left-to-right:
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
void ignore_arguments(int, int) { }
int main() {
ignore_arguments(foo(), bar()); // 🦡
}
barfoo
Unspecified behavior examples
Declaration order does not determine memory order:
int a,b;
if (&a < &b) // 🦡
cout << "a has a lower address\n";
else
cout << "b has a lower address\n";
b has a lower address
Unspecified behavior examples
I suspect that byte order (little-endian, big-endian) is unspecified,
since a program can’t detect byte order without an unspecified
operation:
int word = 0x12345678;
short *sp = reinterpret_cast<short *>(&word);
cout << hex << *sp << '\n';
5678
Undefined behavior
- With undefined behavior, all bets are off!
- Anything can happen. Consistency is not required.
- Warnings are not required. If you’re lucky, a clever compiler
might detect the problem. Or, it may not.
Undefined behavior
- The U.S. Constitution requires of a new president:
- Before he enter on the
Execution of his Office, he shall take the following Oath or
Affirmation:– I do solemnly swear (or affirm) that I will faithfully
execute the Office of President of the United States, and will to the
best of my Ability, preserve, protect and defend the Constitution of
the United States.
- No rules about hands during the oath. Hand position is undefined.
- The president may do whatever with their hands: touch a holy book,
wave the constitution, devil horns, vulcan salute, etc.
- A president doesn’t have to do the same thing if re-elected.
- Future presidents don’t have to do the same as previous presidents.
Undefined variables
If you want a value in a variable, put one there.
// Compiled without warnings or optimization.
int a,b,c,d,e,f,g,h,i,j;
cout << a << ' ' << b << ' ' << c << ' ' << d << ' ' << e << ' '
<< f << ' ' << g << ' ' << h << ' ' << i << ' ' << j << '\n';
0 0 32766 1857816928 0 4195904 0 4196528 0 0
- Variables in a function are not initialized to zero,
unless you write =0.
- The compiler might complain, or it might not, depending on the
circumstances.
- Despite this, a quarter of the students, based on quiz results,
will continue to believe that variables are magically initialized to zero.
- Your program memory is cleared, once, by the operating system
to hide data from previous programs.
Out-of-bounds array access
long a=11, b[] = {22,33}, c=44;
cout << a << '\n'
<< b[2] << '\n' // 🦡
<< c << '\n';
11
44
44
- Note that b only has two elements, but we accessed its third
element with no complaints. Don’t do that!
- Since b has only two elements, b[2]
(the third element) accesses an adjacent memory location.
Undefined behavior
Similar code can produce quite different results.
long d[2];
cout << d[1] << endl; // 🦡
cout << d[100] << endl; // 🦡
cout << d[1000] << endl; // 🦡
cout << d[1000000] << endl; // 🦡
c.cc:2: warning: ‘d’ is used uninitialized
c.cc:1: note: ‘d’ declared here
0
140736745610446
SIGSEGV: Segmentation fault
- Possibilities include:
- Detecting the problem at compile-time.
- Checking array bounds at runtime and displaying a nice error message.
- Doing the address arithmetic and just fetching whatever’s at that address.
- Or anything else, because the behavior is ⚡❢🌟💀 undefined.
- C++ compilers usually don’t do runtime bounds checking, so the result
depends on how near the array is to the edge of memory allocated
to this program. Exceeding your memory boundaries gives a segmentation
fault on typical Linux systems.
Undefined behavior examples
cout << "Hello, world!\n";
int *p = nullptr;
cout << *p << '\n'; // 🦡 Kaboom!
SIGSEGV: Segmentation fault
Why don’t we see the “Hello, world!”?
Buffering! Output does not go out immediately—that’s inefficient.
Instead, the output accumulates, piles up in a buffer, until endl,
flush, or program end. Program dies; output is lost. ☹
Interactive output is line-buffered, but these slides send the output to
a file, so it’s fully buffered.
Undefined behavior examples
// Shifting too far:
int amount=35;
cout << (1<<amount); // 🦡
8
The standard states that you can’t shift by the word size or more,
and can’t shift by a negative amount.
Why not?
Since shifting is such a common operation, most CPUs have a shift
instruction. For 32‑bit values, the shift amount is typically held in a
five‑bit field in the instruction (2⁵ = 32). Alas, 35 cannot
be represented in five bits, and we’re not going to slow down
my correct program to check for errors in your faulty code.
Undefined behavior examples
// Multiple writes to the same location
// in a single expression:
int a = 0;
cout << ++a + ++a << '\n'; // 🦡
c.cc:4: warning: operation on ‘a’ may be undefined
c.cc:4: warning: operation on ‘a’ may be undefined
4
int b;
cout << b << '\n'; // 🦡
c.cc:2: warning: ‘b’ is used uninitialized
0
g++ notices some undefined behavior, not all.
This is a QOI (Quality Of Implementation) aspect,
but not a standards-conformance issue.
But, why??
C++ is quite concerned about efficiency.
- The size of an int is implementation-defined so that the compiler
can use the natural size provided by the architecture.
- The size of a double is implementation-defined
so that the compiler can use the available hardware floating-point format.
- Some things are unspecified to give the compiler maximum freedom
to generate fast code. Perhaps the compiler evaluates function
arguments right-to-left because that’s the easiest order to push them
onto the stack.
C++’s attitude is “You break the rules, you pay the price.”
It doesn’t hold your hand.
Things Be Changin’
This is undefined behavior in C++14:
// C++ 2014
int i=5;
i = i++; // 🦡
cout << i;
c.cc:3: warning: operation on ‘i’ may be undefined
c.cc:3: warning: operation on ‘i’ may be undefined
5
C++17, regarding assignment, says “The right operand is sequenced before
the left operand”, so ++ finishes before =. The increment makes i
equal to 6, and then = stores the old value (5) that i++ yielded, so the
output of this awful code is guaranteed to be 5:
// C++ 2017
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on ‘i’ may be undefined
c.cc:3: warning: operation on ‘i’ may be undefined
5
Looks like the compiler’s warning (on the web server) hasn’t caught up to the standard.
Not just theoretical
Information from table 4–6 (page 4–11) of the
Unisys C Compiler Programming Reference Manual:
Type | Bits | sizeof | Signed Range | Unsigned Max |
char | 9 | 1 | −255 to 255 | 511 |
short | 18 | 2 | −2¹⁷+1 to 2¹⁷−1 | 2¹⁸−1 |
int | 36 | 4 | −2³⁵+1 to 2³⁵−1 | 2³⁶−2 |
long | 36 | 4 | −2³⁵+1 to 2³⁵−1 | 2³⁶−2 |
long long | 72 | 8 | −2⁷¹+1 to 2⁷¹−1 |