CS253 Not Fully Specified
What the language definition does not say.
The C++ standard defines several kinds of not-fully-specified behavior:
Implementation-defined (§1.9.2):
A choice made by the compiler, must be documented
Unspecified behavior (§1.9.3):
A choice made by the compiler, need not be documented
Undefined behavior (§1.9.4):
All bets are off!
Implementation-defined behavior
A choice made by the compiler, which must be documented.
// Size of variables:
cout << sizeof(int) << '\n';
4
// Maximum value of a double:
double d = 6e307;
cout << d << '\n' << d*2 << '\n' << d*3;
6e+307
1.2e+308
inf
Such choices are often heavily influenced by the hardware.
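A program does not have to guess these choices; it can ask the implementation. Here is a minimal sketch using the standard `<limits>` and `<climits>` headers. The function names are hypothetical, chosen for illustration:

```cpp
#include <climits>   // CHAR_BIT
#include <limits>    // std::numeric_limits

// Number of bits in an int on this implementation (commonly 32).
int int_bits() {
    return static_cast<int>(sizeof(int) * CHAR_BIT);
}

// Largest finite double (about 1.8e308 for IEEE 754 doubles).
double max_double() {
    return std::numeric_limits<double>::max();
}

// True if overflowing a double produces infinity, as in the "inf" output above.
bool double_overflows_to_inf() {
    return std::numeric_limits<double>::has_infinity &&
           max_double() * 2 == std::numeric_limits<double>::infinity();
}
```

The answers differ from one implementation to another. That is the point: the program can adapt to the answers instead of assuming them.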
Implementation-defined behavior examples
// Storing an out-of-range value in a signed type
// (++s computes 32768 as an int, then converts it back to short):
short s = 32767;
cout << ++s;
-32768
// Character set:
switch ('$') {
case 0x24: cout << "ASCII or UTF-8\n"; break;
case 0x5b: cout << "EBCDIC\n"; break;
default: cout << "WTF!?\n"; break;
}
ASCII or UTF-8
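When code really depends on ASCII, a `static_assert` can turn that assumption into a compile-time check. This is a sketch. The helper `ascii_upper` is hypothetical, and it is only correct because of the assertion:

```cpp
// Fail the build, rather than misbehave at run time, if the execution
// character set is not ASCII-compatible (in EBCDIC, '$' is 0x5B).
static_assert('$' == 0x24 && 'A' == 0x41 && 'a' == 0x61,
              "this code assumes an ASCII-compatible character set");

// Relies on 'a'..'z' and 'A'..'Z' being contiguous, which ASCII
// guarantees but EBCDIC does not.
char ascii_upper(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}
```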
Implementation-defined behavior examples
// The result of shifting a negative signed value right:
cout << (-1 >> 4) << '\n';
-1
// The result of system():
system("date");
Thu Sep 26 21:39:52 MDT 2024
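Since C++20, the standard itself defines `>>` on a negative value as division by a power of two, rounded down, so `-1 >> 4` is `-1` everywhere. Code that must also build with older compilers can sidestep the question. The sketch below computes the same floor division with `/` and `%`, whose rounding (toward zero) has been specified since C++11. The function name is hypothetical:

```cpp
// Divide x by 2^k, rounding toward negative infinity (what an arithmetic
// right shift does), without ever right-shifting a negative value.
// Assumes 0 <= k and that 2^k fits in an int.
int floor_div_pow2(int x, int k) {
    const int d = 1 << k;      // 2^k
    int q = x / d;             // '/' truncates toward zero
    if (x % d != 0 && x < 0)   // negative with a remainder: round down
        --q;
    return q;
}
```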
Unspecified behavior
A choice made by the compiler, need not be documented or consistent,
generally a “this-or-that” sort of choice.
// Order of evaluation of an expression (mostly):
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
int main() {
return foo()*bar();
}
foobar
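If the order matters, do not leave it to the compiler. Split the calls into separate statements, because each full-expression is sequenced before the next one. In this sketch, `foo` and `bar` are hypothetical stand-ins that append to a string instead of printing, so the order can be checked:

```cpp
#include <string>

std::string trace;  // records the order of calls
int foo() { trace += "foo"; return 0; }
int bar() { trace += "bar"; return 0; }

// foo()*bar() leaves the order unspecified; named temporaries pin it down.
int product_in_order() {
    const int f = foo();   // runs first
    const int b = bar();   // runs second
    return f * b;
}
```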
Unspecified behavior examples
// Comparing addresses of different objects:
int a,b;
cout << boolalpha << (&a < &b);
false
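When you need a consistent order for unrelated pointers (for sorting, or as `std::set`/`std::map` keys), `std::less` is the portable tool. Its pointer specializations are guaranteed to give a strict total order, even where built-in `<` is unspecified. A sketch with a hypothetical helper name:

```cpp
#include <functional>  // std::less

// Consistent ordering of any two int pointers, unlike built-in <
// applied to pointers into different objects.
bool address_before(const int* p, const int* q) {
    return std::less<const int*>{}(p, q);
}
```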
// Order of evaluation of function arguments:
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }
void ignore_arguments(int, int) { }
int main() {
ignore_arguments(foo(), bar());
}
barfoo
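Even in C++17, the arguments of a call like `ignore_arguments(foo(), bar())` are unordered. A chained `<<` is different. Since C++17, the left operand of `<<` is sequenced before the right operand, and this holds for an overloaded `operator<<` too. This sketch assumes a C++17 (or later) compiler. As before, `foo` and `bar` are hypothetical loggers:

```cpp
#include <sstream>
#include <string>

std::string calls;  // records the order of calls
int foo() { calls += "foo"; return 1; }
int bar() { calls += "bar"; return 2; }

// (out << foo()) << bar(): the left side, including foo(), is
// evaluated before bar() under the C++17 rules.
std::string chained() {
    std::ostringstream out;
    out << foo() << bar();
    return out.str();
}
```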
Unspecified behavior examples
I suspect that byte order (little-endian, big-endian) is unspecified,
since a program can’t detect byte order without an unspecified
operation:
int word = 0x12345678;
short *usp = reinterpret_cast<short *>(&word);
cout << hex << *usp << '\n';
5678
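Strictly speaking, reading an `int` through a `short*` as above breaks C++’s aliasing rules, so it is undefined rather than merely unspecified. Examining an object’s bytes as `unsigned char`, or copying them out with `std::memcpy`, is permitted. C++20 also adds `std::endian` in `<bit>` to ask the question directly. Here is a sketch of the `memcpy` approach, with a hypothetical function name:

```cpp
#include <cstdint>  // std::uint32_t
#include <cstring>  // std::memcpy

// Returns the first byte in memory of the 32-bit value 0x12345678:
// 0x78 on a little-endian machine, 0x12 on a big-endian one.
unsigned first_byte_of_word() {
    const std::uint32_t word = 0x12345678;
    unsigned char bytes[sizeof word];
    std::memcpy(bytes, &word, sizeof word);  // copying bytes is well defined
    return bytes[0];
}
```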
Undefined behavior
With undefined behavior, all bets are off!
Anything can happen. Consistency is not required.
Warnings are not required.
// Uninitialized & out-of-range values:
long long a[25][6];
a[0][0] = 0;
for (int j=0; j<8; j++) {
for (int i=0; i<30; i++)
cout << a[i][j] << ' ';
cout << '\n';
}
0 139909606192504 140726729664936 7813586406938797358 2 0 73728 140726729664936 0 256 139909620646672 139909620646768 0 0 0 0 2 0 0 0 0 0 4196016 1 2 140726729664912 5352787392 0 0 0
0 139909622985264 139909620770984 140726729664576 384 139909620694158 139909610151200 281470681751424 139909620678368 1000 139909620646712 139909620646808 0 0 0 0 0 0 0 0 0 0 139909606477916 4196464 4196573 4294967321 4196262 0 139909620724250 4196078
1 1 1 140726729664624 4196640 139909617940656 0 1 139909620666784 139909620646768 139909620646672 0 0 0 0 0 0 0 0 0 0 0 0 7813586406938797358 139909620722896 4196496 0 8575367029477840117 0 140726729664904
139909622968480 6295584 0 6294984 6295921 32 139909620694158 139909610151200 281470681751424 139909620646808 139909620646712 0 0 0 0 0 0 0 0 0 0 0 1 4295032831 0 139909606385637 -8576209286175157003 8540323873924420853 0 139909622985008
4294967295 1 139909606183192 1 4196016 139909617940630 0 0 0 139909620646768 139909620646808 0 0 0 0 0 0 0 0 0 0 0 140726729664624 140726729664640 4196496 139909617761632 4196032 140724603453440 0 1
140726729663584 140726729664920 139909606478064 139909620694158 139909610151200 281470681751424 32 139909620694158 1000 139909620646808 0 0 0 0 0 139909607409121 0 0 0 0 0 139909610165576 6294984 4196486 4196032 140726729664920 140726729664912 0 0 140726729671331
139909606192504 140726729664936 7813586406938797358 2 0 73728 140726729664936 0 256 139909620646672 139909620646768 0 0 0 0 2 0 0 0 0 0 4196016 1 2 140726729664912 5352787392 0 0 0 0
139909622985264 139909620770984 140726729664576 384 139909620694158 139909610151200 281470681751424 139909620678368 1000 139909620646712 139909620646808 0 0 0 0 0 0 0 0 0 0 139909606477916 4196464 4196573 30064771096 4196262 0 139909620724250 4196078 140726729671339
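That garbage comes from two separate mistakes. First, `a` is never initialized, except for `a[0][0]`. Second, both loops run past the bounds: `i` goes up to 29 of 25 rows, and `j` up to 7 of 6 columns. Below is a sketch of a safer version, assuming `std::array` is acceptable in place of a built-in array. `checked_read` is a hypothetical name:

```cpp
#include <array>      // std::array
#include <cstddef>    // std::size_t
#include <stdexcept>  // std::out_of_range (thrown by at())

// Same shape as a[25][6] above, but value-initialized with {} so every
// element starts at 0, and read with at(), which throws std::out_of_range
// for a bad index instead of silently reading whatever is in memory.
long long checked_read(std::size_t i, std::size_t j) {
    const std::array<std::array<long long, 6>, 25> a{};
    return a.at(i).at(j);
}
```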
Undefined behavior examples
cout << "Hello, world!\n";
int *p = nullptr;
cout << *p << '\n'; // Kaboom!
SIGSEGV: Segmentation fault
Why don’t we see the desired output?
Buffering! Output does not go out immediately, because that’s inefficient.
Instead, the output accumulates until endl, flush, or program end.
The program dies, so the output is lost. ☹
Interactive output is line-buffered, but these slides send the output to
a file, so it’s fully buffered.
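One way to keep the greeting is to flush it before running code that might crash. This is a sketch with hypothetical function names. The standard does guarantee that `std::cerr` has `unitbuf` set, so it flushes after every output operation:

```cpp
#include <iostream>

// Push the greeting out of the buffer before doing anything risky.
// std::endl would also work: it writes '\n' and then flushes.
void greet_before_risky_code() {
    std::cout << "Hello, world!\n" << std::flush;
}

// std::cerr is unit-buffered by default, so diagnostics written
// to it are not left sitting in a buffer when the program dies.
bool cerr_flushes_each_output() {
    return (std::cerr.flags() & std::ios_base::unitbuf) != 0;
}
```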
Undefined behavior examples
// Shifting too far:
int amount=35;
cout << (1<<amount);
8
The standard states that you can’t shift by the width of the (promoted)
left operand or more, and can’t shift by a negative amount. Why?
Since shifting is such a common operation,
most CPUs have a shift instruction. For 32-bit values,
the shift amount is typically held in a five-bit field in the instruction
(2^5 = 32).
Alas, 35 cannot be represented in five bits.
The 8 above is what x86 produces: it uses only the low five bits of the
amount, so 35 mod 32 = 3, and 1<<3 = 8.
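If a shift amount comes from data, check it before shifting. This sketch uses `unsigned` to stay clear of signed overflow. Both the name `shift_left_checked` and the choice to return 0 for an out-of-range amount are illustrative assumptions, not a standard API:

```cpp
#include <climits>  // CHAR_BIT

// Shift left only when the amount is valid for unsigned int; otherwise
// return 0 ("every bit shifted out") instead of invoking undefined behavior.
unsigned shift_left_checked(unsigned value, int amount) {
    const int width = static_cast<int>(sizeof(unsigned) * CHAR_BIT);
    if (amount < 0 || amount >= width)
        return 0;
    return value << amount;
}
```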
Undefined behavior examples
// Multiple writes to the same location
// in a single expression:
int a = 0;
cout << ++a + ++a << '\n';
c.cc:4: warning: operation on 'a' may be undefined
4
// Reading an uninitialized variable:
int b;
cout << b << '\n';
c.cc:2: warning: 'b' is used uninitialized in this function
0
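Both examples are easy to make well defined. Give each increment its own statement, and initialize the variable before reading it. This is a sketch with hypothetical function names:

```cpp
// ++a + ++a modifies a twice with no sequencing between the writes.
// Separate statements make every step well defined.
int sum_of_increments() {
    int a = 0;
    const int first = ++a;   // a is now 1
    const int second = ++a;  // a is now 2
    return first + second;   // 1 + 2
}

// Initializing b removes the uninitialized read.
int read_initialized() {
    int b = 0;
    return b;
}
```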
g++ notices some undefined behavior, not all.
This is a QOI (Quality Of Implementation) aspect,
but not a standards-conformance issue.
But, why??
C++ is quite concerned about efficiency.
- The size of an int is implementation-defined so that the compiler
  can use the natural size provided by the architecture.
- The size of a double is implementation-defined so that the compiler
  can use the available hardware floating-point format.
- Some things are unspecified to give the compiler maximum freedom
  to generate fast code. Perhaps the compiler evaluates function
  arguments right-to-left because that’s the easiest order to push them
  onto the stack.
C++’s attitude is “You break the rules, you pay the price.”
It doesn’t hold your hand.
Things Be Changin’
This is undefined behavior in C++14:
int i=5;
i = i++;
cout << i;
c.cc:2: warning: operation on 'i' may be undefined
5
C++17, regarding assignment, says “The right operand is sequenced before
the left operand”. So i++ is completely evaluated first: it sets i to 6 and
yields the old value, 5. Only then does = store that 5 back into i.
This output is guaranteed to be 5:
// c++17
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on 'i' may be undefined
5
The output is correct, but the warning hasn’t caught up to the standard.
Not just theoretical
Information from the Unisys C Compiler Programming Reference Manual:
Type | Bits | sizeof | Signed Range | Unsigned Max |
char | 9 | 1 | −255 to 255 | 511 |
short | 18 | 2 | −2^17+1 to 2^17−1 | 2^18−1 |
int | 36 | 4 | −2^35+1 to 2^35−1 | 2^36−2 |
long | 36 | 4 | −2^35+1 to 2^35−1 | 2^36−2 |
long long | 72 | 8 | −2^71+1 to 2^71−1 | |