CS253: Software Development with C++

Spring 2020

Not Fully Specified

What the language definition does not say.

The C++ standard defines several kinds of not-fully specified things:

Implementation-defined (§1.9.2):

A choice made by the compiler, must be documented

Unspecified behavior (§1.9.3):

A choice made by the compiler, need not be documented

Undefined behavior (§1.9.4):

All bets are off!

Implementation-defined behavior

A choice made by the compiler, which must be documented.

// Size of variables:
cout << sizeof(int) << '\n';
4
// Maximum value of a double:
double d = 6e307;
cout << d << '\n' << d*2 << '\n' << d*3;
6e+307
1.2e+308
inf

Such choices are often heavily influenced by the hardware.

Implementation-defined behavior examples

// Converting an out-of-range value to a signed type
// (++s computes 32768 as an int, which does not fit in a short):
short s = 32767;
cout << ++s;
-32768
// Character set:
switch ('$') {
    case 0x24: cout << "ASCII or UTF-8\n"; break;
    case 0x5b: cout << "EBCDIC\n";         break;
    default:   cout << "WTF!?\n";          break;
}
ASCII or UTF-8

Implementation-defined behavior examples

// The result of shifting a negative signed value right:
cout << (-1 >> 4) << '\n';
-1
// The result of system():
system("date");
Fri Nov 22 02:10:34 MST 2024

Unspecified behavior

A choice made by the compiler that need not be documented, or even consistent. It is generally a “this-or-that” sort of choice.

// Order of evaluation of an expression (mostly):

int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }

int main() {
    return foo()*bar();
}
foobar

Unspecified behavior examples

// Comparing addresses of different objects:
int a,b;
cout << boolalpha << (&a < &b);
false
// Order of evaluation of function arguments:
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }

void ignore_arguments(int, int) { }

int main() {
    ignore_arguments(foo(), bar());
}
barfoo

Unspecified behavior examples

I suspect that byte order (little-endian, big-endian) is unspecified, since a program can’t detect byte order without an unspecified operation:

int word = 0x12345678;
short *usp = reinterpret_cast<short *>(&word);
cout << hex << *usp << '\n';
5678

Undefined behavior

With undefined behavior, all bets are off! Anything can happen. Consistency is not required. Warnings are not required.

// Uninitialized & out-of-range values:
long long a[25][6];
a[0][0] = 0;
for (int j=0; j<8; j++) {
    for (int i=0; i<30; i++)
        cout << a[i][j] << ' ';
    cout << '\n';
}
0 140727443107288 7813586406938797358 2 0 73728 140639227685080 0 256 140639227659024 140639227659120 0 0 0 0 2 0 0 0 0 0 0 4196016 1 2 140727443107264 4815359424 0 0 0 
140639229997616 140639227783224 140727443106928 384 140639227706510 140639217163552 281470681751424 32 1000 140639227659064 140639227659160 0 0 0 0 0 0 0 0 0 0 0 140639213490268 4196464 4196573 4294967321 4196262 0 140639227736602 4196078 
1 1 140727443106976 4196640 140639224953008 0 1 140727443107288 140639227659120 140639227659024 0 0 0 0 0 0 0 0 0 0 0 0 140727443106976 7813586406938797358 140639227735248 4196496 0 -4146124474441559642 0 140727443107256 
6295584 0 6294984 6295921 32 0 140727443107272 140727443107272 140639227659160 140639227659064 0 0 0 0 0 0 0 0 0 0 0 0 1 4295032831 0 140639213397989 4147062927678738854 -4152477233767149146 0 140639229997360 
1 140639213195544 1 4196016 140639224952982 140639223282640 140727443107288 0 140639227659120 140639227659160 0 0 0 0 0 0 0 0 0 0 0 5153960759809 140727443106976 140727443106992 4196496 140639224773984 4196032 140724603453440 0 1 
140727443107272 140639213490416 140639227706510 140639217163552 281470681751424 140639224047200 140639227706510 1000 140639227659160 0 0 0 0 0 0 0 0 0 0 0 0 140639217177928 6294984 4196486 4196032 140727443107272 140727443107264 0 0 140727443108517 
140727443107288 7813586406938797358 2 0 73728 140639227685080 0 256 140639227659024 140639227659120 0 0 0 0 2 0 0 0 0 0 0 4196016 1 2 140727443107264 4815359424 0 0 0 0 
140639227783224 140727443106928 384 140639227706510 140639217163552 281470681751424 32 1000 140639227659064 140639227659160 0 0 0 0 0 0 0 0 0 0 0 140639213490268 4196464 4196573 30064771096 4196262 0 140639227736602 4196078 140727443108525 

Undefined behavior examples

cout << "Hello, world!\n";
int *p = nullptr;
cout << *p << '\n'; // Kaboom!
SIGSEGV: Segmentation fault

Why don’t we see “Hello, world!” in the output?

Buffering! Output does not go out immediately—that’s inefficient. Instead, the output accumulates, until endl, flush, or program end. Program dies; output is lost. ☹

Interactive output is line-buffered, but these slides send the output to a file, so it’s fully buffered.

Undefined behavior examples

// Shifting too far:
int amount=35;
cout << (1<<amount);
8

The standard states that you can’t shift by the width of the type or more, and can’t shift by a negative amount. Why?

Since shifting is such a common operation, most CPUs have a shift instruction. For 32-bit values, the shift amount is typically held in a five-bit field in the instruction (2^5 = 32). Alas, 35 cannot be represented in five bits. A CPU that keeps only the low five bits would treat 35 as 3, which would explain the 8 above.

Undefined behavior examples

// Multiple writes to the same location
// in a single expression:
int a = 0;
cout << ++a + ++a << '\n';
c.cc:4: warning: operation on 'a' may be undefined
4
int b;
cout << b << '\n';
c.cc:2: warning: 'b' is used uninitialized in this function
0

g++ notices some undefined behavior, not all. This is a QOI (Quality Of Implementation) aspect, but not a standards-conformance issue.

But, why??

C++ is quite concerned about efficiency.

C++’s attitude is “You break the rules, you pay the price.” It doesn’t hold your hand.

Things Be Changin’

This is undefined behavior in C++14:

int i=5;
i = i++;
cout << i;
c.cc:2: warning: operation on 'i' may be undefined
5

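C++17, regarding assignment, says “The right operand is sequenced before the left operand”, so i++ finishes, side effect included, before = stores its result. i++ yields the old value, 5, and bumps i to 6; the assignment then writes 5 back, so this output is guaranteed to be 5:

// c++17
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on 'i' may be undefined
5

The output is right, but the warning is stale: it looks like the compiler’s diagnostics haven’t caught up to the standard.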

Not just theoretical

Information from the Unisys C Compiler Programming Reference Manual:

Type       Bits  sizeof  Signed Range          Unsigned Max
char          9       1  −255 to 255           511
short        18       2  −2^17+1 to 2^17−1     2^18−1
int          36       4  −2^35+1 to 2^35−1     2^36−2
long         36       4  −2^35+1 to 2^35−1     2^36−2
long long    72       8  −2^71+1 to 2^71−1