CS253: Software Development with C++

Fall 2019

Not Fully Specified


What the language definition does not say.

The C++ standard defines several kinds of behavior that are not fully specified:

Implementation-defined (§1.9.2):

A choice made by the compiler, must be documented

Unspecified behavior (§1.9.3):

A choice made by the compiler, need not be documented

Undefined behavior (§1.9.4):

All bets are off!

Implementation-defined behavior

A choice made by the compiler, which must be documented.

// Size of variables:
cout << sizeof(int) << '\n';
4
// Maximum value of a double:
double d = 6e307;
cout << d << '\n' << d*2 << '\n' << d*3;
6e+307
1.2e+308
inf

Such choices are often heavily influenced by the hardware.

Implementation-defined behavior examples

// Out-of-range conversion to a signed type
// (++s computes 32768 as an int, then narrows it back to short):
short s = 32767;
cout << ++s;
-32768
// Character set:
switch ('$') {
    case 0x24: cout << "ASCII or UTF-8\n"; break;
    case 0x5b: cout << "EBCDIC\n";         break;
    default:   cout << "WTF!?\n";          break;
}
ASCII or UTF-8

Implementation-defined behavior examples

// The result of shifting a negative signed value right:
cout << (-1 >> 4) << '\n';
-1
// The result of system():
system("date");
Sun Jun 30 09:22:20 MDT 2024

Unspecified behavior

A choice made by the compiler that need not be documented or consistent. It is generally a “this-or-that” sort of choice.

// Order of evaluation of an expression (mostly):

int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }

int main() {
    return foo()*bar();
}
foobar

Unspecified behavior examples

// Comparing addresses of different objects:
int a,b;
cout << boolalpha << (&a < &b);
false
// Order of evaluation of function arguments:
int foo() { cout << "foo"; return 0; }
int bar() { cout << "bar"; return 0; }

void ignore_arguments(int, int) { }

int main() {
    ignore_arguments(foo(), bar());
}
barfoo

Unspecified behavior examples

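I suspect that byte order (little-endian, big-endian) is unspecified. A program can observe it only by looking at the bytes of an object; the quick check here reads an int through a short pointer, which strictly speaking breaks the aliasing rules and is therefore undefined rather than merely unspecified: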

int word = 0x12345678;
short *usp = reinterpret_cast<short *>(&word);
cout << hex << *usp << '\n';
5678

Undefined behavior

With undefined behavior, all bets are off! Anything can happen. Consistency is not required. Warnings are not required.

// Uninitialized & out-of-range values:
long long a[25][6];
a[0][0] = 0;
for (int j=0; j<8; j++) {
    for (int i=0; i<30; i++)
        cout << a[i][j] << ' ';
    cout << '\n';
}
0 1 139873894699288 1 4196016 139873906456726 139873904786384 140736087507528 0 139873909162864 139873909162904 0 0 0 0 0 0 0 0 0 0 0 4196016 1 2 140736087507504 8296009152 0 0 0 
139873909260577 140736087507512 139873894994160 139873909210254 139873898667296 281470681751424 139873905550944 139873909210254 1000 139873909162904 0 0 0 0 0 0 0 0 0 0 0 0 139873894994012 4196464 4196573 4294967321 4196262 0 139873909240346 4196078 
139873894708600 140736087507528 7813586406938797358 2 0 73728 0 0 256 139873909162768 139873909162864 0 0 0 0 2 0 0 0 0 0 0 140736087507216 7813586406938797358 139873909238992 4196496 0 8266817329509147680 0 140736087507496 
139873911501360 139873909287080 140736087507168 384 139873909210254 139873898667296 281470681751424 32 1000 139873909162808 139873909162904 0 0 0 0 0 0 0 0 0 0 0 1 4295032831 0 139873894901733 -8266644026838506464 8298762116035093536 0 139873911501104 
1 1 140736087507216 4196640 139873906456752 0 1 140736087507528 139873909162864 139873909162768 0 0 0 0 0 0 0 0 0 0 0 0 140736087507216 140736087507232 4196496 139873906277728 4196032 140733193388032 0 1 
6295584 0 6294984 6295921 32 0 140736087507512 140736087507512 139873909162904 139873909162808 0 0 0 0 0 0 0 0 0 0 0 139873898681672 6294984 4196486 4196032 140736087507512 140736087507504 0 0 140736087515782 
1 139873894699288 1 4196016 139873906456726 139873904786384 140736087507528 0 139873909162864 139873909162904 0 0 0 0 0 0 0 0 0 0 0 4196016 1 2 140736087507504 8296009152 0 0 0 0 
140736087507512 139873894994160 139873909210254 139873898667296 281470681751424 139873905550944 139873909210254 1000 139873909162904 0 0 0 0 0 0 0 0 0 0 0 0 139873894994012 4196464 4196573 30064771096 4196262 0 139873909240346 4196078 140736087515790 

Undefined behavior examples

cout << "Hello, world!\n";
int *p = nullptr;
cout << *p << '\n'; // Kaboom!
SIGSEGV: Segmentation fault

Why don’t we see the desired output?

Buffering! Output does not go out immediately—that’s inefficient. Instead, the output accumulates, until endl, flush, or program end. Program dies; output is lost. ☹

Interactive output is line-buffered, but these slides send the output to a file, so it’s fully buffered.

Undefined behavior examples

// Shifting too far:
int amount=35;
cout << (1<<amount);
8

The standard states that the shift amount must be non-negative and less than the width of the type in bits. Why?

Since shifting is such a common operation, most CPUs have a shift instruction. For 32-bit values, the shift amount is typically held in a five-bit field in the instruction (2^5 = 32). Alas, 35 cannot be represented in five bits.

Undefined behavior examples

// Multiple writes to the same location
// in a single expression:
int a = 0;
cout << ++a + ++a << '\n';
c.cc:4: warning: operation on 'a' may be undefined
4
int b;
cout << b << '\n';
c.cc:2: warning: 'b' is used uninitialized in this function
0

g++ notices some undefined behavior, not all. This is a QOI (Quality Of Implementation) aspect, but not a standards-conformance issue.

But, why??

C++ is quite concerned about efficiency.

C++’s attitude is “You break the rules, you pay the price.” It doesn’t hold your hand.

Things Be Changin’

This is undefined behavior in C++14:

int i=5;
i = i++;
cout << i;
c.cc:2: warning: operation on 'i' may be undefined
5

C++17, regarding assignment, says “The right operand is sequenced before the left operand”, so the side effect of ++ finishes before = stores the old value of i, and this output is guaranteed to be 5:

// c++17
int i=5;
i = i++;
cout << i;
c.cc:3: warning: operation on 'i' may be undefined
5

The value is what C++17 requires, but the warning is stale: it looks like the compiler’s diagnostics haven’t caught up to the standard.

Not just theoretical

Information from the Unisys C Compiler Programming Reference Manual:

Type       Bits  sizeof  Signed Range          Unsigned Max
char          9       1  −255 to 255           511
short        18       2  −2^17+1 to 2^17−1     2^18−1
int          36       4  −2^35+1 to 2^35−1     2^36−2
long         36       4  −2^35+1 to 2^35−1     2^36−2
long long    72       8  −2^71+1 to 2^71−1