CS253 Not Fully Specified
What the language definition does not say.
The C++ standard defines several kinds of not-fully-specified behavior:
Implementation-defined (§1.9.2):
A choice made by the compiler, must be documented
Unspecified behavior (§1.9.3):
A choice made by the compiler, need not be documented
Undefined behavior (§1.9.4):
All bets are off!
Implementation-defined behavior
A choice made by the compiler, which must be documented.
// Size of variables:
cout << sizeof(int) << '\n';
4
// Maximum value of a double:
double d = 6e307;
cout << d << '\n' << d*2 << '\n' << d*3;
6e+307
1.2e+308
inf
Such choices are often heavily influenced by the hardware.
Implementation-defined behavior examples
// Storing an out-of-range value in a short (++s computes 32768 as an int):
short s = 32767;
cout << ++s;
-32768
// Character set:
switch ('$') {
case 0x24: cout << "ASCII or UTF-8\n"; break;
case 0x5b: cout << "EBCDIC\n"; break;
default: cout << "WTF!?\n"; break;
}
ASCII or UTF-8
Implementation-defined behavior examples
// The result of shifting a negative signed value right:
cout << (-1 >> 4) << '\n';
-1
// The result of system():
system("date");
Fri Nov 22 05:47:27 MST 2024
Unspecified behavior
A choice made by the compiler, need not be documented or consistent,
generally a “this-or-that” sort of choice.
// Order of evaluation of an expression (mostly):
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
int main() {
return foo()*bar();
}
foobar
Unspecified behavior examples
// Comparing addresses of different objects:
int a,b;
cout << boolalpha << (&a < &b);
false
// Order of evaluation of function arguments:
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
void ignore_arguments(int, int) { }
int main() {
ignore_arguments(foo(), bar());
}
barfoo
Unspecified behavior examples
I suspect that byte order (little-endian, big-endian) is unspecified,
since a program can’t detect byte order without an unspecified
operation:
int word = 0x12345678;
short *usp = reinterpret_cast<short *>(&word);
cout << hex << *usp << '\n';
5678
Undefined behavior
With undefined behavior, all bets are off!
Anything can happen. Consistency is not required.
Warnings are not required.
// Uninitialized & out-of-range values:
long long a[25][6];
a[0][0] = 0;
for (int j=0; j<8; j++) {
for (int i=0; i<30; i++)
cout << a[i][j] << ' ';
cout << '\n';
}
0 140735868298136 7813586406938797358 2 0 73728 139703012354264 0 256 139703012328208 139703012328304 0 0 0 0 2 0 0 0 0 0 0 4196016 1 2 140735868298112 4902899136 0 0 0
139703014666800 139703012452408 140735868297776 384 139703012375694 139703001832736 281470681751424 32 1000 139703012328248 139703012328344 0 0 0 0 0 0 0 0 0 0 0 139702998159452 4196464 4196573 4294967321 4196262 0 139703012405786 4196078
1 1 140735868297824 4196640 139703009622192 0 1 140735868298136 139703012328304 139703012328208 0 0 0 0 0 0 0 0 0 0 0 0 140735868297824 7813586406938797358 139703012404432 4196496 0 1616078941733922814 0 140735868298104
6295584 0 6294984 6295921 32 0 140735868298120 140735868298120 139703012328344 139703012328248 0 0 0 0 0 0 0 0 0 0 0 0 1 4295032831 0 139702998067173 -1616148622835308546 1696747404146518014 0 139703014666544
1 139702997864728 1 4196016 139703009622166 139703007951824 140735868298136 0 139703012328304 139703012328344 0 0 0 0 0 0 0 0 0 0 0 5153960759809 140735868297824 140735868297840 4196496 139703009443168 4196032 140733193388032 0 1
140735868298120 139702998159600 139703012375694 139703001832736 281470681751424 139703008716384 139703012375694 1000 139703012328344 0 0 0 0 0 0 0 0 0 0 0 0 139703001847112 6294984 4196486 4196032 140735868298120 140735868298112 0 0 140735868306098
140735868298136 7813586406938797358 2 0 73728 139703012354264 0 256 139703012328208 139703012328304 0 0 0 0 2 0 0 0 0 0 0 4196016 1 2 140735868298112 4902899136 0 0 0 0
139703012452408 140735868297776 384 139703012375694 139703001832736 281470681751424 32 1000 139703012328248 139703012328344 0 0 0 0 0 0 0 0 0 0 0 139702998159452 4196464 4196573 30064771096 4196262 0 139703012405786 4196078 140735868306106
Undefined behavior examples
cout << "Hello, world!\n";
int *p = nullptr;
cout << *p << '\n'; // Kaboom!
SIGSEGV: Segmentation fault
Why don’t we see the desired output?
Buffering! Sending each character out immediately would be inefficient,
so the output accumulates until endl, flush, or the end of the program.
Here the program dies first, so the output is lost. ☹
Interactive output is line-buffered, but these slides send the output to
a file, so it’s fully buffered.
Undefined behavior examples
// Shifting too far:
int amount=35;
cout << (1<<amount);
8
The standard states that the shift amount must be less than the width of the
(promoted) left operand, and must not be negative. Why?
Since shifting is such a common operation,
most CPUs have a shift instruction. For 32-bit values,
the shift amount is typically held in a five-bit field in the instruction
(2⁵ = 32).
Alas, 35 cannot be represented in five bits.
x86, for example, uses only the low five bits of the count,
so 35 becomes 3, and 1<<3 is 8.
Undefined behavior examples
// Multiple writes to the same location
// in a single expression:
int a = 0;
cout << ++a + ++a << '\n';
c.cc:4: warning: operation on 'a' may be undefined
4
int b;
cout << b << '\n';
c.cc:2: warning: 'b' is used uninitialized in this function
0
g++ notices some undefined behavior, but not all of it.
This is a QOI (Quality of Implementation) issue,
not a standards-conformance issue.
But, why??
C++ is quite concerned about efficiency.
- The size of an int is implementation-defined so that the compiler
can use the natural size provided by the architecture.
- The size of a double is implementation-defined
so that the compiler can use the available hardware floating-point format.
- Some things are unspecified to give the compiler maximum freedom
to generate fast code. Perhaps the compiler evaluates function
arguments right-to-left because that’s the easiest order to push them
onto the stack.
C++’s attitude is “You break the rules, you pay the price.”
It doesn’t hold your hand.
Things Be Changin’
This is undefined behavior in C++14:
int i=5;
i = i++;
cout << i;
c.cc:2: warning: operation on 'i' may be undefined
5
C++17, regarding assignment, says “The right operand is sequenced before
the left operand”, so i++ finishes, side effect and all, before =
stores its value. i++ yields the old value, 5, and sets i to 6;
then the assignment stores that 5 back into i. The output is
guaranteed to be 5:
// c++17
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on 'i' may be undefined
5
The output is correct. It’s the warning that hasn’t caught up to the standard.
Not just theoretical
Information from the
Unisys C Compiler Programming Reference Manual:
Type | Bits | sizeof | Signed Range | Unsigned Max |
char | 9 | 1 | −255 to 255 | 511 |
short | 18 | 2 | −2¹⁷+1 to 2¹⁷−1 | 2¹⁸−1 |
int | 36 | 4 | −2³⁵+1 to 2³⁵−1 | 2³⁶−2 |
long | 36 | 4 | −2³⁵+1 to 2³⁵−1 | 2³⁶−2 |
long long | 72 | 8 | −2⁷¹+1 to 2⁷¹−1 | |