CLHS: Issue ARRAY-TYPE-ELEMENT-TYPE-SEMANTICS Writeup

Issue ARRAY-TYPE-ELEMENT-TYPE-SEMANTICS Writeup

Status: Passed, Jan 89 X3J13

Forum: Cleanup

Issue: ARRAY-TYPE-ELEMENT-TYPE-SEMANTICS

References: Data types and Type specifiers: CLtL p. 11; Sect. 4.5, p.45

Functions: TYPEP and SUBTYPEP; CLtL Sect. 6.2.1, p.72

ARRAY-ELEMENT-TYPE, CLtL p. 291

The type-specifiers:

ARRAY, SIMPLE-ARRAY, VECTOR, SIMPLE-VECTOR

COMPLEX

Related Issues: SUBTYPEP-TOO-VAGUE, LIST-TYPE-SPECIFIER

Category: CHANGE

Edit history: Version 1, 13-May-88, JonL

Version 2, 23-May-88, JonL

(typo fixes, comments from moon, rearrange some discussion)

Version 3, 02-Jun-88, JonL

(flush alternate proposal ["flush-upgrading"]; consequently,

move more of discussion back to discussion section.

Version 4, 01-Oct-88, Jan Pedersen & JonL

(reduce discussion, and "cleanup" wordings)

(Version 5 edit history missing)

Version 6, 6-Oct-88, Moon

(fix typos, cover subtypep explicitly, add complex,

change name of UPGRADE-ARRAY-ELEMENT-TYPE)

Version 7, 7-Oct-88, JonL (more name and wording changes)

Version 8, 8-Oct-88, Masinter (wording, discussion)

Version 9, 31-Oct-88, JonL (major re-wording to accommodate

recent discussion; esp. re-introduce and clarify "upgrading")

Problem description:

CLtL occasionally draws a distinction between type-specifiers "for

declaration" and "for discrimination"; see CLtL, section 4.5 "Type

Specifiers That Specialize" (p.45 and following) The phrase

"for declaration" encompasses type-specifiers passed in as the

:element-type argument to MAKE-ARRAY, passed in as the <result-type>

argument to COERCE, and used in THE and DECLARE type declarations. The

phrase "for discrimination" refers to the type-specifiers passed in as

the <type> argument(s) to TYPEP and SUBTYPEP.

One consequence of this distinction is that a variable declared to be of

type <certain-type>, and all of whose assigned objects are created in

accordance with that type, may still have none of its values ever satisfy

the TYPEP predicate with that type-specifier. One type-specifier with

this property is

(ARRAY <element-type>)

for various implementation dependent values of <element-type>. For

example, in most implementations of CL, an array X created with an

element-type of (SIGNED-BYTE 5) will, depending on the vendor, either

satisfy

(TYPEP X '(ARRAY (SIGNED-BYTE 8))), or

(TYPEP X '(ARRAY T))

but (almost) never will it satisfy

(TYPEP X '(ARRAY (SIGNED-BYTE 5))).

This is entirely permissible within the scope of standardization on

MAKE-ARRAY, where an implementation is required only to construct up the

result out of "the most specialized [element] type that can nevertheless

accommodate elements of the given type [the :element-type argument]"

(see CLtL, p287). That is, an implementation may in fact only provide a

very small number of equivalence classes of element-types for storing

arrays, corresponding to its repertoire of specialized storage techniques;

and it is explicitly permitted to "upgrade" any element-type request into

one of its built-in repertoire (see also CLtL, p45, second and third

paragraphs under Section 4.5.)

As a practical matter, almost every existing implementation does some

serious upgrading of the :element-type argument given to MAKE-ARRAY.

Yet the difference between "for declaration" and "for discrimination"

has been very confusing to many people. Similarly, portability is

hindered when users do not know just how a given implementation does

upgrading.

The type specifier (COMPLEX <part-type>) also falls in the domain of CLtL

Section 4.5. Currently, only one implementation actually provides any kind

of specialized storage for complex parts; and in this case, the practical

matter is less urgent, since the kind of upgrading happening is so obvious

as to cause little or no confusion.

Proposal: (ARRAY-TYPE-ELEMENT-TYPE-SEMANTICS:UNIFY-UPGRADING)

Short Summary:

** Eliminate the distinction between type-specifiers "for declaration" and

"for discrimination". In short, change the meaning of array and

complex type specifiers in favor of the "for declaration" meaning.

** Change the meaning of TYPEP to be in accord with "for declaration"

meaning of type-specifiers.

** Add an implementation-dependent function that reveals how a given

type-specifier for array element-types is upgraded. Add another such

function that reveals how a given type-specifier for complex parts is

upgraded.

** Clarify that "upgrading" implies a movement upwards in the type-

hierarchy lattice; i.e., if <type> upgrades to <Type>, then

<Type> must be a super-type of <type>.

** Clarify that upgrading an array element-type is independent of any

other property of arrays, such as rank, adjustability, fill-pointers,

etc.

** Clarify how SUBTYPEP thus behaves on array type-specifiers.

** Define how SUBTYPEP behaves on complex type-specifiers.

Note that despite this issue's name, the detailed specifications herein

apply to the type system -- not to the behavior of MAKE-ARRAY, nor to how

arrays are actually implemented.

Details:

First, some definitions: Two type-specifiers <type1> and <type2> are said

to be "type-equivalent" if and only if each one specifies a subtype of the

other one. For example, (UNSIGNED-BYTE 5) and (MOD 32) are two different

type- specifiers that always refer to the same sets of things; hence they

are type-equivalent. But (UNSIGNED-BYTE 5) and (SIGNED-BYTE 8) are not

type- equivalent since the former refers to a proper subset of the latter.

Two type-specifiers <type1> and <type2> are said to be "type-disjoint"

if their specified intersection is null. For example, INTEGER and FLOAT

are type disjoint by definition (see CLtL p.33), and (INTEGER 3 5) and

(INTEGER 7 10) are type-disjoint because the specified ranges have no

elements in common.

*. Eliminate the distinction between types "for declaration" and "for

discrimination". In particular, elminate any such reference in the

discussion of array and complex type-specifiers; this would include

documentation patterned after the discussion in section 4.5, pp. 45-7,

especially the example on p.46 that says "See ARRAY-ELEMENT-TYPE".

Change the meaning of (ARRAY <element-type>), as well as any of the

subtypes of ARRAY (such as SIMPLE-ARRAY, VECTOR, etc.) in favor of the

"for declaration" meaning. Make the similar simplification for the

<part-type> specifiers in the COMPLEX type-specifier.

*. Change the meaning of (TYPEP <x> '(ARRAY <type>)), where <type> is not

*, to be true if and only if <x> is an array that could be the result

of giving <type> as the :element-type argument to MAKE-ARRAY. While

(ARRAY *) refers to all arrays regardless of element type, (ARRAY <type>)

refers only to those arrays that can result from giving <type> as the

:element-type argument to the function MAKE-ARRAY. Change the meanings

for (SIMPLE-ARRAY <type>) and (VECTOR <type>) in the same way.

Change the meaning of (TYPEP <x> '(COMPLEX <type>)) similarly. Thus,

(COMPLEX <type>) refers to all complex numbers that can result from

giving numbers of type <type> to the function COMPLEX, plus all other

complex numbers of the same specialized representation. Remember that

both the real and the imaginary parts of any such complex number must

satisfy:

(TYPEP <real-or-imag-part> '<type>).

*. Add the function UPGRADED-ARRAY-ELEMENT-TYPE of one argument, which

returns the element type of the most specialized array representation

capable of holding items of the given argument type. Note that except

for storage allocation consequences, it could be defined as:

(DEFUN UPGRADED-ARRAY-ELEMENT-TYPE (TYPE)

(ARRAY-ELEMENT-TYPE (MAKE-ARRAY 0 :ELEMENT-TYPE TYPE)))

Since element-type upgrading is a fundamental operation implicit in

almost every existing implementation of MAKE-ARRAY, the purpose of this

added function is primarily to reveal how an implementation does its

upgrading.

Add the function UPGRADED-COMPLEX-PART-TYPE of one argument that

returns the part type of the most specialized complex number

representation that can hold parts of the given argument type.

*. Clarify that "upgrading" implies a movement upwards in the type-

hierarchy lattice. Specifically, the type-specifier <type> must be

a subtype of (UPGRADED-ARRAY-ELEMENT-TYPE '<type>). Furthermore, if

<type1> is a subtype of <type2>, then:

(UPGRADED-ARRAY-ELEMENT-TYPE '<type1>)

must also be a subtype of:

(UPGRADED-ARRAY-ELEMENT-TYPE '<type2>).

Note however, that two type-disjoint types can in fact be upgraded into

the same thing.

Clarify that ARRAY-ELEMENT-TYPE returns the upgraded element type

for the array; in particular, any documentation patterned after

the sentence on p. 291 begining "This set may be larger than the

set requested when the array was created; for example . . ." should

be embellished with this clarification.

Similarly, the type-specifier <type> must be a subtype of

(UPGRADED-COMPLEX-PART-TYPE <type>).

*. Clarify that upgrading an array element-type is independent of any

other property of arrays, such as rank, adjustability, fill-pointers,

displacement etc. For all such properties other than rank this should

be obvious (since they are not expressible in the language of

type-specifiers); but note that unless it is also independent of rank,

it would not be consistently possible to displace arrays to those of

differing rank.

*. Clarify that SUBTYPEP on ARRAY type-specifiers is as follows:

For all type-specifiers <type1> and <type2> other than *, require

(ARRAY <type1>) and (ARRAY <type2>) to be type-equivalent if and only

if they refer to arrays of exactly the same specialized representation;

and require them to be type-disjoint if and only if they refer to arrays

of different, distinct specialized representations. This definition

follows that implicitly prescribed in CLtL.

As a consequence of the preceding change to TYPEP and of the definition

of UPGRADED-ARRAY-ELEMENT-TYPE, the two type specifiers

(ARRAY <type1>) and

(ARRAY <type2>)

are type-equivalent if and only if

(UPGRADED-ARRAY-ELEMENT-TYPE '<type1>) and

(UPGRADED-ARRAY-ELEMENT-TYPE '<type2>)

are type-equivalent. This is another way of saying that `(ARRAY <type>)

and `(ARRAY ,(UPGRADED-ARRAY-ELEMENT-TYPE '<type>)) refer to the same

set of specialized array representations.

This defines the behavior of SUBTYPEP on array type-specifiers; namely:

(SUBTYPEP '(ARRAY <type1>) '(ARRAY <type2>))

is true if and only if

(UPGRADED-ARRAY-ELEMENT-TYPE '<type1>) and

(UPGRADED-ARRAY-ELEMENT-TYPE '<type2>)

are type-equivalent.

*. Define SUBTYPEP on COMPLEX type-specifiers as follows:

For all type-specifiers <type1> and <type2> other than *,

(SUBTYPEP '(COMPLEX <type1>) '(COMPLEX <type2>))

is T T if:

1. <type1> is a subtype of <type2>, or

2. (UPGRADED-COMPLEX-PART-TYPE '<type1>) is type-equivalent

to (UPGRADED-COMPLEX-PART-TYPE '<type2>); in this case,

(COMPLEX <type1>) and (COMPLEX <type2>) both refer to the

same specialized representation.

The result is NIL T otherwise.

The small differences between the SUBTYPEP specification for ARRAY and

for COMPLEX are necessary because there is no creation function for

complexes which allows one to specify the resultant part type independently

of the actual types of the parts. Thus in the case of COMPLEX, we must

refer to the actual type of the parts, although a number can be a member

of more than one type; e.g., 17 is of type (MOD 18) as well as of type

(MOD 256); and 2.3f5 is of type SINGLE-FLOAT was well as FLOAT.

The form:

(SUBTYPEP '(COMPLEX SINGLE-FLOAT) '(COMPLEX FLOAT))

must be true in all implementations; but:

(SUBTYPEP '(ARRAY SINGLE-FLOAT) '(ARRAY FLOAT))

is true only in implementations that do not have a specialized array

representation for single-floats distinct from that for other floats.

Examples:

Let <aet-x> and <aet-y> be two distinct type specifiers that are

definitely not type-equivalent in a given implementation, but for which

make-array will return an object of the same array type. This will be

an implementation dependent search, but in every implementation that

the proposer has tested, there will be some such types; often,

(SIGNED-BYTE 5) and (SIGNED-BYTE 8) will work.

Thus, in each case, both of the following forms return T T:

(subtypep (array-element-type (make-array 0 :element-type '<aet-x>))

(array-element-type (make-array 0 :element-type '<aet-y>)))

(subtypep (array-element-type (make-array 0 :element-type '<aet-y>))

(array-element-type (make-array 0 :element-type '<aet-x>)))

To eliminate the distinction between "for declaration" and "for

discrimination" both of the following should be true:

[A]

(typep (make-array 0 :element-type '<aet-x>)

'(array <aet-x>))

(typep (make-array 0 :element-type '<aet-y>)

'(array <aet-y>))

Since (array <aet-x>) and (array <aet-y>) are different names for

exactly the same set of objects, these names should be type-equivalent.

That implies that the following set of tests should also be true:

[B]

(subtypep '(array <aet-x>) '(array <aet-y>))

(subtypep '(array <aet-y>) '(array <aet-x>))

Additionally, to show that un-equivalent type-specifiers that use the

same specialized array type should be equivalent as element-type

specifiers, the following type tests should be true:

[C]

(typep (make-array 0 :element-type '<aet-y>)

'(array <aet-x>))

(typep (make-array 0 :element-type '<aet-x>)

'(array <aet-y>))

Rationale:

This proposal legitimizes current practice, and removes the obscure and

un-useful distinction between type-specifiers "for declaration" and

"for discrimination". The suggested changes to the interpretation of

array and complex type-specifiers follow from defining type-specifiers

as names for collections of objects, on TYPEP being a set membership

test, and SUBTYPEP a subset test on collections of objects.

Current Practice:

Every vendor's implementation that the proposer has queried has a finite

set of specialized array representations, such that two non-equivalent

element types can be found that use the same specialized array

representation; this includes Lucid, Vaxlisp, Symbolics, TI, Franz,

and Xerox. Most implementations fail tests [A] and [C] part 1, but pass

tests [A] and [C] part 2; this is a consequence of implementing the

distinction between "for declaration" and "for discrimination". Lucid

and Xerox both pass test [B], and the other implementations fail it.

The Explorer returns NIL for all six tests in [A], [B], and [C].

Allegedly, the PCLS implementation does no "upgrading"; each array

"remembers" exactly the type-specifier handed to the MAKE-ARRAY call

that created it. Thus the test cases are not applicable to PCLS,

since the precondition cannot be met (i.e., find two non-type-equivalent

type-specifiers that are non-trivially upgraded by make-array).

The TI Explorer offers specialized representation for complexes;

part types of SINGLE-FLOAT and DOUBLE-FLOAT are specialized.

Cost to Implementors:

This proposal is an incompatible change to the current language

specification, but only a small amount of work should be required in

each vendor's implementation of TYPEP and SUBTYPEP.

Cost to Users:

Because of the prevalence of confusion in this area, it seems unlikely

that any user code will have to be changed. In fact, it is more likely

that some of the vendors will cease to get bug reports about MAKE-ARRAY

returning a result that isn't of "the obvious type". Since the change

is incompatible, some user code might have to be changed.

Cost of non-adoption:

Continuing confusion in the user community.

Benefits:

It will greatly reduce confusion in the user community. The fact that

(MAKE-ARRAY <n> :ELEMENT-TYPE '<type>) frequently is not of type

(ARRAY <type>) has been very confusing to almost everyone.

Portability of applications will be increased slightly, since

the behavior of

(TYPEP (MAKE-ARRAY <n> :ELEMENT-TYPE <type>) '(ARRAY <type>))

will no longer be implementation-dependent.

Esthetics:

Reducing the confusing distinction between type-specifiers "for

declaration" and "for discrimination" is a simplifying step -- it is a

much simpler rule to state that the type-specifiers actually describe

the collections of data they purport to name. Thus this is a step

towards increased elegance.

Discussion:

This issue was prompted by a lengthy discussion on the Common Lisp

mailing list. See for example a series of exchanges started on Thu,

17 Dec 87 10:48:05 PST by Jeff Barnett <jbarnett@nrtc.northrop.com>

under the subject line of "Types in CL". See also the exchange started

Wed, 6 Jan 88 23:21:16 PST by Jon L White <edsel!jonl@labrea.stanford.edu>

under the subject line of "TYPEP warp implications"

Although the types STRING, BIT-VECTOR, SIMPLE-STRING, and

SIMPLE-BIT-VECTOR are subtypes of the ARRAY type, they are not

specifically discussed in this proposal. The reason is that

they are not type-specifiers "that specialize", but are merely

abbreviations as follows:

STRING == (VECTOR STRING-CHAR)

SIMPLE-STRING == (SIMPLE-ARRAY STRING-CHAR (*))

BIT-VECTOR == (VECTOR BIT)

SIMPLE-BIT-VECTOR == (SIMPLE-ARRAY BIT (*))

Thus their semantics could be affected only in an implementation that

doesn't support a specific "specialized storage" type for arrays of

bits and vectors of string-chars. But in fact, every CL implementation

must appear to support "specialized storage" for bit-arrays and strings,

even if it means nothing more than remembering the fact that such an

array was created with that element-type. This is required in order

for strings, bit-vectors, and bit-arrays to be disjoint datatypes

(see CLtL p.34; see also the definitions of BIT-ARRAY and STRING found

in CLtL p.293, Section 17.4, and in CLtL p.299.)

We considered the possibility of flushing the permission to "upgrade";

for example, it could be made a requirement that:

(ARRAY-ELEMENT-TYPE (MAKE-ARRAY <n> :ELEMENT-TYPE <type>))

always be equal to <type> (or, at least type-equivalent to <type>)

for all valid type specifiers <type>. This has several problems: it

increases the storage requirement for many kinds of arrays, and hides

a relevant part of the underlying implementation for no apparently

good reason. However, it would increase portability, since it would be

much more difficult, for example, to write a program that created an

array with one element-type, say, (UNSIGNED-BYTE 5), but operated on it

assuming a non-trivial upgraded element-type, say, (UNSIGNED-BYTE 8).

Under this proposal, it is valid for an implementation of MAKE-ARRAY

to have arrays "remember" the type-equivalence class of the original

:element-type argument; such an implementation would satisfy all of

the constraints listed above.

We considered a suggestion to restrict the set of "known" array element

types; this would gain portability at the expense of limiting the

language.

We considered leaving out of the proposal the addition of the two

functions UPGRADED-ARRAY-ELEMENT-TYPE and UPGRADED-COMPLEX-PART-TYPE.

But it was noted that every implementation of CL supports exactly

that functionality somewhere in its implementation of MAKE-ARRAY; and

exposing this to the user would be a good thing. Furthermore, the

existence of at least UPGRADED-ARRAY-ELEMENT-TYPE makes the clarifications

on "upgrading" and SUBTYPEP implications easier. Finally, there would

be no other way at all to pinpoint just how complex parts are upgraded,

since there is no type information available except for the actual

types of the parts.

Since this proposal contains the implication:

(ARRAY <type1>) is-type-equivalent-to (ARRAY <type2>)

==>

<type1> is-type-equivalent-to <type2>

then the question naturally arises "Does the reverse implication hold?"

That is, should two non-EQ but type-equivalent type-specifiers <type1>

and <type2> always give rise to the same array types? For example,

consider SHORT-FLOAT and SINGLE-FLOAT in an implementation where these

are type-equivalent (see CLtL section 2.1.3). One may desire to implement

(ARRAY SHORT-FLOAT) and (ARRAY SINGLE-FLOAT) differently. Say, for example

that the former is packed into 16-bit half-words, whereas the latter is

packed into 32-bit words; but for either kind of packing, the result of

AREF is an ordinary "single-float". The whole point of the type-specifier

to make-array is merely to specify a packing technique for "packed float"

arrays. This "krinkle", however, will not be addressed by the proposal

herein; it should simply be remembered that the implication above goes

only one way, and is not an "if-and-only-if" link.