[HARLEQUIN][Common Lisp HyperSpec (TM)] [Previous][Up][Next]


Issue SXHASH-DEFINITION Writeup

Issue:            SXHASH-DEFINITION

References: SXHASH, "similarity as constants"

Related issues: CONSTANT-COMPILABLE-TYPES

COMPILE-FILE-SYMBOL-HANDLING

CONSTANT-COLLAPSING

HASH-TABLE-KEY-MODIFICATION

Category: CLARIFICATION, CHANGE

Edit history: v1, 14 Feb 1991, Sandra Loosemore

v2, 19 Feb 1991, Sandra Loosemore (comments from KAB)

v3, 11 Mar 1991, Sandra Loosemore (additional proposals)

v4, 13 Mar 1991, Sandra Loosemore (new proposal)

v5, 26 Mar 1991, Sandra Loosemore (amendment from meeting)

Status: v5 (proposal SIMILAR-FOR-SXHASH) accepted by X3J13, Mar 1991

Problem description:

The definition of the SXHASH function is confusing. The relevant

words from CLtL are:

SXHASH computes a hash code for an object and returns the hash

code as a non-negative fixnum. A property of SXHASH is that

(EQUAL <x> <y>) implies (= (SXHASH <x>) (SXHASH <y>)).

The manner in which the hash code is computed is implementation-

dependent but independent of the particular "incarnation" or "core

image". Hash values produced by SXHASH may be written out to files,

for example, and meaningfully read in again into an instance of the

same implementation.

Different people have different interpretations of what the second

paragraph is trying to say.

An additional problem is that the effects on SXHASH of destructive

operations on the object are not well-specified. Issue

HASH-TABLE-KEY-MODIFICATION dealt only with objects used as keys in

hash tables, not with SXHASH.

Proposal (SXHASH-DEFINITION:SIMILAR-FOR-SXHASH):

Define SXHASH as a function with these properties:

(1) The result is a non-negative fixnum.

(2) (EQUAL x y) implies (= (SXHASH x) (SXHASH y)).

(3) If x and y are "similar for SXHASH", then (SXHASH x) and

(SXHASH y) return the same mathematical value even if x and y

exist in different sessions of the same Lisp implementation.

Objects are "similar for SXHASH" if they are of type string,

bit-vector, character, cons, number, pathname or symbol, and

are "similar as constants".

[This assumes that a bug in the specification of "similar as

constants" for pathnames (from issue CONSTANT-COMPILABLE-TYPES) is

going to be fixed by the editor. The problem is that the definition

of "similar as constants" now implies that the pathname components are

recursively compared for being "similar as constants", while EQUAL

talks about components merely being "equivalent". If this change to

"similar as constants" is not made, then the presentation of SXHASH

must describe the relationship for pathnames separately, in a way that

is compatible with EQUAL.]

(4) The function is intended for hashing so implementations should

return values well distributed within the range of non-negative

fixnums.

(5) The value returned by SXHASH on an object within a single session

is constant provided that the object is not visibly modified with

regard to the equivalence test EQUAL (as defined in proposal

HASH-TABLE-KEY-MODIFICATION:SPECIFY).

(6) SXHASH must terminate.

Implementation Notes:

For objects of other types that EQUAL compares with EQ, item 5

restricts SXHASH to assigning the hash code based on some immutable

attribute of the identity of the object, such as its type. Another

legitimate implementation technique would be to have SXHASH assign

(and cache) a random hash code for these objects, since there is

no requirement that "similar" but non-EQ objects have the same hash

code.

Although the "similar as constants" relationship on symbols is defined

in terms of both the symbol's name and the packages it is accessible

in, item 5 disallows using package information to compute the hash

code, since changes to the package status of a symbol are not visible

to EQUAL.

Rationale:

This is probably what CLtL meant to say. Basically, what the proposal

does is to combine the original definition of SXHASH's relationship

with EQUAL with using "similar as constants" to compare objects that

exist in different Lisp sessions. For types where EQUAL and "similar

as constants" differ, or where "similar as constants" is not

well-defined, the relationship between Lisp sessions is unspecified.

Current Practice:

Unknown. In many implementations, the hash values returned by

SXHASH for objects of types that EQUAL compares with EQ are

apparently based on the type of the object.

JonL says he knows of at least one implementation that keeps a

hash-number slot inside every stored object.

Cost to Implementors:

Probably not too large.

Cost to Users:

The current definition of SXHASH is pretty garbled and probably

most uses of SXHASH do not depend on much more than the relationship to

EQUAL. That relationship is preserved in proposal SIMILAR-FOR-SXHASH.

Some users may have interpreted the read/print consistency language

in CLtL to require stricter behavior for SXHASH than is specified by

the current proposal. For example, some uses of SXHASH might assume

that arrays that are "similar" under read/print consistency have

the same hash codes. These applications could break in implementations

that choose to assign unique hash codes to non-EQ arrays.

Cost of non-adoption:

The definition of SXHASH will continue to be garbled. SXHASH will

be of limited usefulness.

Performance impact:

Probably small.

Benefits:

The cost of non-adoption is avoided.

Esthetics:

Better than the way things are now.

Discussion:

This issue was discussed briefly on the x3j13 mailing list in

November 89, but not written up until now.

There has been quite a bit of discussion about whether SXHASH should

be required to "work" on circular objects. Some have argued that such

a requirement would make it more useful, but others claim that it's

pointless to make the requirement since EQUAL doesn't have to "work"

on circular objects.

Barrett says he is concerned about proposal SIMILAR-FOR-SXHASH

potentially breaking applications that depend on SXHASH returning

the same values for arrays, etc. that are "similar", but concludes

that trying to specify what attributes of these objects might or might

not be used to compute the hash value is probably not practical.

Loosemore, Barrett, Moon, and MacLachlan have expressed agreement in

principle with proposal SIMILAR-FOR-SXHASH.


[Starting Points][Contents][Index][Symbols][Glossary][Issues]
Copyright 1996, The Harlequin Group Limited. All Rights Reserved.