CS253 Not Fully Specified
What the language definition does not say.
Unlike many languages, the C++ standard leaves some choices up to the
compiler. It defines several varieties of not-fully-specified behavior:
- Implementation-defined behavior (§1.9.2): a choice made by the compiler that must be documented
- Unspecified behavior (§1.9.3): a choice made by the compiler that need not be documented
- Undefined behavior (§1.9.4): all bets are off!
Implementation-defined behavior
A choice made by the compiler, which must be documented.
The number of bytes or bits allocated to various types:
cout << sizeof(int) << '\n';
4
Floating-point precision:
float f = 123456789;
f += 3;
f -= 123456789;
cout << f << '\n';
0
Such choices are often heavily influenced by the hardware.
Implementation-defined behavior examples
What happens when a value gets too big for its signed variable:
short s = 32767;
cout << ++s;
-32768
Character set (ASCII, EBCDIC, HP Roman-8, Windows-1252, Big5,
Shift JIS, various flavors of Unicode, etc.):
switch ('$') {
case 0x24: cout << "ASCII or UTF-8"; break;
case 0x5b: cout << "EBCDIC"; break;
default: cout << "WTF!?"; break;
}
ASCII or UTF-8
Implementation-defined behavior examples
When >> is used to shift a negative signed value, what comes in
to replace the leftmost (sign) bit? It might be a copy of
the sign bit, or it might be just a zero. (Since C++20, it must be a
copy of the sign bit.)
cout << (-1 >> 4) << '\n';
-1
system() invokes the command interpreter. Its result really
depends on the host operating system.
system("date");
system("hostname");
Fri Nov 22 02:13:10 MST 2024
beethoven
Unspecified behavior
A choice made by the compiler that need not be documented or even consistent;
generally a “this-or-that” sort of choice.
// Order of evaluation of an expression (mostly):
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
int main() {
return foo()*bar(); // 🦡
}
foobar
Unspecified behavior examples
// Comparing addresses of different objects:
int a,b;
cout << boolalpha << (&a < &b); // 🦡
false
// Order of evaluation of function arguments:
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
void ignore_arguments(int, int) { }
int main() {
ignore_arguments(foo(), bar()); // 🦡
}
barfoo
Unspecified behavior examples
I suspect that byte order (little-endian, big-endian) is unspecified,
since a program can’t detect byte order without an unspecified
operation:
int word = 0x12345678;
short *sp = reinterpret_cast<short *>(&word);
cout << hex << *sp << '\n';
5678
Undefined behavior
With undefined behavior, all bets are off!
Anything can happen. Consistency is not required.
Warnings are not required.
long a=11, b[] = {22,33}, c=44;
cout << a << '\n'
<< b[2] << '\n' // 🦡
<< c << '\n';
11
44
44
That makes sense. Since b has only two elements, b[2] (the third element)
accesses an adjacent memory location, which here happens to hold c.
Undefined behavior
Similar code can produce quite different results.
long d[2];
cout << d[1] << endl; // 🦡
cout << d[100] << endl; // 🦡
cout << d[1000] << endl; // 🦡
cout << d[1000000] << endl; // 🦡
c.cc:2: warning: ‘d’ is used uninitialized
c.cc:1: note: ‘d’ declared here
0
140736928672973
SIGSEGV: Segmentation fault
- Possibilities include:
  - Detecting the problem at compile time.
  - Checking array bounds at runtime and displaying a nice error message.
  - Doing the address arithmetic and just fetching whatever’s at that address.
  - Or anything else, because the behavior is ⚡❢🌟💀 undefined.
- C++ compilers usually don’t do runtime bounds checking, so the result
depends on how near the array is to the edge of the memory allocated
to this program. Exceeding your memory boundaries gives a segmentation
fault on typical Linux systems.
Undefined behavior examples
cout << "Hello, world!\n";
int *p = nullptr;
cout << *p << '\n'; // 🦡 Kaboom!
SIGSEGV: Segmentation fault
Why don’t we see the “Hello, world!”?
Buffering! Output does not go out immediately; that would be inefficient.
Instead, the output accumulates in a buffer until the buffer fills or
something flushes it: endl, flush, or normal program exit. Here the
program dies first, so the buffered output is lost. ☹
Interactive output is line-buffered, but these slides send the output to
a file, so it’s fully buffered.
Undefined behavior examples
// Shifting too far:
int amount=35;
cout << (1<<amount); // 🦡
8
The standard says that you can’t shift by as many bits as the (promoted)
left operand has, or more, and can’t shift by a negative amount.
Why not?
Since shifting is such a common operation, most CPUs have a shift
instruction. For 32‑bit values, the shift amount is typically held in a
five‑bit field in the instruction (2⁵ = 32). Alas, 35 cannot
be represented in five bits, and we’re not going to slow down
my correct program to check for errors in your faulty code.
Undefined behavior examples
// Multiple writes to the same location
// in a single expression:
int a = 0;
cout << ++a + ++a << '\n'; // 🦡
c.cc:4: warning: operation on ‘a’ may be undefined
c.cc:4: warning: operation on ‘a’ may be undefined
4
int b;
cout << b << '\n'; // 🦡
c.cc:2: warning: ‘b’ is used uninitialized
0
g++ notices some undefined behavior, but not all.
This is a QOI (Quality of Implementation) aspect,
not a standards-conformance issue.
But, why??
C++ is quite concerned about efficiency.
- The size of an int is implementation-defined so that the compiler
can use the natural size provided by the architecture.
- The size of a double is implementation-defined
so that the compiler can use the available hardware floating-point format.
- Some things are unspecified to give the compiler maximum freedom
to generate fast code. Perhaps the compiler evaluates function
arguments right-to-left because that’s the easiest order to push them
onto the stack.
C++’s attitude is “You break the rules, you pay the price.”
It doesn’t hold your hand.
Things Be Changin’
This is undefined behavior in C++14:
// C++ 2014
int i=5;
i = i++; // 🦡
cout << i;
c.cc:3: warning: operation on ‘i’ may be undefined
c.cc:3: warning: operation on ‘i’ may be undefined
5
C++17, regarding assignment, says “The right operand is sequenced before
the left operand”, so ++ finishes (setting i to 6) before = stores the
value of i++, which is the old value, 5. The output of this awful code
is therefore guaranteed to be 5:
// C++ 2017
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on ‘i’ may be undefined
c.cc:3: warning: operation on ‘i’ may be undefined
5
The output matches C++17, but the warning suggests the compiler (on the web server) hasn’t fully caught up to the standard.
Not just theoretical
Information from table 4–6 (page 4–11) of the
Unisys C Compiler Programming Reference Manual:
Type      | Bits | sizeof | Signed Range     | Unsigned Max |
char      | 9    | 1      | −255 to 255      | 511          |
short     | 18   | 2      | −2¹⁷+1 to 2¹⁷−1  | 2¹⁸−1        |
int       | 36   | 4      | −2³⁵+1 to 2³⁵−1  | 2³⁶−2        |
long      | 36   | 4      | −2³⁵+1 to 2³⁵−1  | 2³⁶−2        |
long long | 72   | 8      | −2⁷¹+1 to 2⁷¹−1  |              |