CS253 Not Fully Specified
What the language definition does not say.
Unlike the definitions of many other languages, the C++ standard leaves some
choices up to the compiler. It defines several varieties of not fully specified behavior:
- Implementation-defined behavior (§1.9.2): a choice made by the compiler, which must be documented.
- Unspecified behavior (§1.9.3): a choice made by the compiler, which need not be documented.
- Undefined behavior (§1.9.4): all bets are off!
Implementation-defined behavior
A choice made by the compiler, which must be documented.
The number of bytes or bits allocated to various types:
cout << sizeof(int) << '\n';
4
Floating-point precision:
float f = 123456789;
f += 3;
f -= 123456789;
cout << f << '\n';
0
Such choices are often heavily influenced by the hardware.
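A program can ask the implementation what it chose rather than guess. The sketch below is only an illustration: the helper names storage_bits and survives_float are made up for this example, and it relies on nothing beyond <climits> and <limits>.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <limits>    // std::numeric_limits

// Number of bits of storage a type occupies on this implementation.
template <typename T>
constexpr int storage_bits() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// True if the integer comes back unchanged after a round trip through float.
bool survives_float(long value) {
    const float f = static_cast<float>(value);
    return static_cast<long>(f) == value;
}
```

Where float has a 24-bit significand, survives_float(123456789) is false, which is why the arithmetic above loses the 3 that was added.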
Implementation-defined behavior examples
What happens when a value gets too big for its variable:
short s = 32767;
cout << ++s;
-32768
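One way to avoid depending on that choice is to test against the limit before incrementing. A minimal sketch, with checked_increment being a hypothetical helper name:

```cpp
#include <cassert>
#include <limits>

// Increment s only if the result is representable; report whether it was done.
bool checked_increment(short& s) {
    if (s == std::numeric_limits<short>::max()) {
        return false;          // would not fit: leave s unchanged
    }
    ++s;
    return true;
}
```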
Character set (ASCII, EBCDIC, HP Roman-8, Windows-1252, Big5,
Shift JIS, various flavors of Unicode, etc.):
switch ('$') {
case 0x24: cout << "ASCII or UTF-8"; break;
case 0x5b: cout << "EBCDIC"; break;
default: cout << "WTF!?"; break;
}
ASCII or UTF-8
Implementation-defined behavior examples
When >> is used to shift a signed value, what comes in
to replace the leftmost (sign) bit? It might be a copy of
the sign bit, or it might be just a zero.
cout << (-1 >> 4) << '\n';
-1
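If sign fill is what you want, it can be written without relying on what >> does to a negative value. This is a sketch, not a library routine: shift_right_sign_fill is a hypothetical name, the code assumes two's-complement integers, and amount must be less than the width of int. (C++20 removed the question by defining >> on signed values as an arithmetic shift.)

```cpp
#include <cassert>

// Shift right with sign fill, using >> only on non-negative values.
int shift_right_sign_fill(int value, unsigned amount) {
    if (value >= 0) {
        return value >> amount;        // well defined for non-negative values
    }
    // ~value is non-negative here (two's complement assumed), so its shift
    // is well defined; complementing again restores the sign and fills with ones.
    return ~(~value >> amount);
}
```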
system() invokes the command interpreter. Its result really
depends on the host operating system.
system("date");
Thu Nov 21 07:02:44 MST 2024
Unspecified behavior
A choice made by the compiler that need not be documented or consistent;
generally a “this-or-that” sort of choice.
// Order of evaluation of an expression (mostly):
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
int main() {
return foo()*bar();
}
foobar
Unspecified behavior examples
// Comparing addresses of different objects:
int a,b;
cout << boolalpha << (&a < &b);
false
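When a consistent order of unrelated addresses is needed (as a container key, say), std::less for pointer types provides a total order even where < does not. A small sketch, with ordered_before as a made-up name:

```cpp
#include <cassert>
#include <functional>   // std::less

// Orders any two int pointers consistently, even into unrelated objects.
bool ordered_before(const int* lhs, const int* rhs) {
    return std::less<const int*>{}(lhs, rhs);
}
```

Which of two unrelated objects comes first is still up to the implementation; it just has to be consistent.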
// Order of evaluation of function arguments:
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
void ignore_arguments(int, int) { }
int main() {
ignore_arguments(foo(), bar());
}
barfoo
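If the order matters, evaluate the arguments in separate statements first. The sketch below reuses foo, bar and ignore_arguments from the slide, but records the calls in a string (call_log, a name invented here) instead of printing them; call_in_fixed_order is likewise hypothetical.

```cpp
#include <cassert>
#include <string>

std::string call_log;                       // records the order of calls

int foo() { call_log += "foo"; return 0; }
int bar() { call_log += "bar"; return 0; }
void ignore_arguments(int, int) { }

// Evaluate foo() and bar() in separate statements, so foo() always runs first.
std::string call_in_fixed_order() {
    call_log.clear();
    const int first = foo();
    const int second = bar();
    ignore_arguments(first, second);
    return call_log;
}
```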
Unspecified behavior examples
I suspect that byte order (little-endian, big-endian) is unspecified,
since a program can’t detect byte order without an unspecified
operation:
int word = 0x12345678;
short *usp = reinterpret_cast<short *>(&word);
cout << hex << *usp << '\n';
5678
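Byte order can also be inspected by copying the bytes out with std::memcpy, which avoids reading an int through a short pointer. A sketch under the assumption of 8-bit bytes; lowest_address_byte and is_little_endian are names made up for this example. (C++20 offers std::endian in <bit> for the same purpose.)

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>   // std::memcpy

// Returns the byte stored at the lowest address of a 32-bit value.
unsigned char lowest_address_byte(std::uint32_t word) {
    unsigned char bytes[sizeof word];
    std::memcpy(bytes, &word, sizeof word);   // copy the object representation
    return bytes[0];
}

// Little-endian machines store the least significant byte first.
bool is_little_endian() {
    return lowest_address_byte(0x12345678u) == 0x78;
}
```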
Undefined behavior
With undefined behavior, all bets are off!
Anything can happen. Consistency is not required.
Warnings are not required.
long a=11, b[] = {22,33}, c=44;
cout << a << '\n'
<< b[2] << '\n'
<< c << '\n';
11
44
44
That output is explicable, though not guaranteed. Since b has only two elements, b[2]
(a nonexistent third element) reads an adjacent memory location, which here happens to hold c.
Undefined behavior
Similar code can produce quite different results.
long d[2];
cout << d[1] << '\n';
c.cc:2: warning: 'd[1]' is used uninitialized in this function
0
long e[2];
cout << e[100] << '\n';
140735140355279
long f[2];
cout << f[1000] << '\n';
0
long g[2];
cout << g[10000] << '\n';
SIGSEGV: Segmentation fault
long h[2];
cout << h[1000000] << '\n';
SIGSEGV: Segmentation fault
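For contrast, std::array::at() checks the index and throws std::out_of_range rather than wandering off into memory. A minimal sketch; checked_read is a hypothetical wrapper, not something from these slides:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>   // std::out_of_range

// Reads values[index] into out; returns false instead of reading out of bounds.
bool checked_read(const std::array<long, 2>& values, std::size_t index, long& out) {
    try {
        out = values.at(index);     // at() throws std::out_of_range on a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

The check costs a comparison on every access, which is exactly the price that plain [] declines to pay.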
Undefined behavior examples
cout << "Hello, world!\n";
int *p = nullptr;
cout << *p << '\n'; // Kaboom!
SIGSEGV: Segmentation fault
Why don’t we see the “Hello, world!”?
Buffering! Output does not go out immediately—that’s inefficient.
Instead, the output accumulates, piles up in a buffer, until endl,
flush, or program end. Program dies; output is lost. ☹
Interactive output is line-buffered, but these slides send the output to
a file, so it’s fully buffered.
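Two habits help here: flush before doing anything risky, and check a pointer before dereferencing it. The sketch below shows both; print_checked is a made-up helper, and flushing only hands the buffered output on, it does not make the dereference any safer.

```cpp
#include <cassert>
#include <iostream>

// Prints *p followed by a newline, or returns false if p is null.
// The flush pushes out anything already buffered before the risky part.
bool print_checked(std::ostream& out, const int* p) {
    out << std::flush;
    if (p == nullptr) {
        return false;
    }
    out << *p << '\n';
    return true;
}
```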
Undefined behavior examples
// Shifting too far:
int amount=35;
cout << (1<<amount);
8
The standard states that the shift amount must be less than
the width of the value being shifted, and must not be negative.
Why not?
Since shifting is such a common operation, most CPUs have a shift
instruction. For 32‑bit values, the shift amount is typically held in a
five‑bit field in the instruction (2^5 = 32). Alas, 35 cannot
be represented in five bits. Here the hardware evidently kept only the
low five bits: 35 mod 32 = 3, and 1<<3 is 8.
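A range check on the shift amount keeps the operation defined. A sketch, assuming unsigned operands; checked_shift_left is a hypothetical name:

```cpp
#include <cassert>
#include <limits>

// Computes value << amount into out, or returns false if the shift is undefined.
bool checked_shift_left(unsigned value, int amount, unsigned& out) {
    const int width = std::numeric_limits<unsigned>::digits;   // bits in unsigned
    if (amount < 0 || amount >= width) {
        return false;
    }
    out = value << amount;
    return true;
}
```

With a 32-bit unsigned, a shift by 35 is rejected instead of quietly turning into a shift by 3.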
Undefined behavior examples
// Multiple writes to the same location
// in a single expression:
int a = 0;
cout << ++a + ++a << '\n';
c.cc:4: warning: operation on 'a' may be undefined
4
int b;
cout << b << '\n';
c.cc:2: warning: 'b' is used uninitialized in this function
0
g++ notices some undefined behavior, not all.
This is a QOI (Quality Of Implementation) aspect,
but not a standards-conformance issue.
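The cure for ++a + ++a is one side effect per statement. A sketch with a made-up function name, sum_of_two_increments:

```cpp
#include <cassert>

// Increments a twice and adds the two results, one side effect per statement.
int sum_of_two_increments(int& a) {
    const int first = ++a;
    const int second = ++a;
    return first + second;
}
```

Starting from a = 0 this returns 1 + 2 = 3 on every conforming compiler, unlike the 4 shown above.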
But, why??
C++ is quite concerned about efficiency.
- The size of an int is implementation-defined so that the compiler
can use the natural size provided by the architecture.
- The size of a double is implementation-defined
so that the compiler can use the available hardware floating-point format.
- Some things are unspecified to give the compiler maximum freedom
to generate fast code. Perhaps the compiler evaluates function
arguments right-to-left because that’s the easiest order to push them
onto the stack.
C++’s attitude is “You break the rules, you pay the price.”
It doesn’t hold your hand.
Things Be Changin’
This is undefined behavior in C++14:
// C++ 2014
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on 'i' may be undefined
5
C++17, regarding assignment, says “The right operand is sequenced before
the left operand”, so ++ finishes before =. Since i++ yields the old value, 5,
i briefly becomes 6 and is then assigned 5, so this output is
guaranteed to be 5:
// C++ 2017
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on 'i' may be undefined
5
The output is right, but the warning shows that the compiler (on the web server) hasn’t caught up to the standard.
Not just theoretical
Information from table 4–6 (page 4–11) of the
Unisys C Compiler Programming Reference Manual:
Type | Bits | sizeof | Signed Range | Unsigned Max |
char | 9 | 1 | −255 to 255 | 511 |
short | 18 | 2 | −2^17+1 to 2^17−1 | 2^18−1 |
int | 36 | 4 | −2^35+1 to 2^35−1 | 2^36−2 |
long | 36 | 4 | −2^35+1 to 2^35−1 | 2^36−2 |
long long | 72 | 8 | −2^71+1 to 2^71−1 |
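Code that really does depend on a particular layout can say so, and then it fails to compile on a machine like this one instead of misbehaving at run time. A sketch; the particular assumptions and the helper value_bits are examples only, not requirements from these slides:

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <limits>    // std::numeric_limits

// Compilation stops here on a platform that breaks these assumptions.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(std::numeric_limits<int>::digits >= 31,
              "this code assumes int has at least 31 value bits");

// Value bits of an integer type, not counting the sign bit.
template <typename T>
constexpr int value_bits() {
    return std::numeric_limits<T>::digits;
}
```

On the 9-bit-char compiler in the table, the first static_assert would reject the program.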