Issue: SXHASH-DEFINITIONReferences: SXHASH, "similarity as constants"
Related issues: CONSTANT-COMPILABLE-TYPES
Category: CLARIFICATION, CHANGE
Edit history: v1, 14 Feb 1991, Sandra Loosemore
v2, 19 Feb 1991, Sandra Loosemore (comments from KAB)
v3, 11 Mar 1991, Sandra Loosemore (additional proposals)
v4, 13 Mar 1991, Sandra Loosemore (new proposal)
v5, 26 Mar 1991, Sandra Loosemore (amendment from meeting)
Status: v5 (proposal SIMILAR-FOR-SXHASH) accepted by X3J13, Mar 1991
Problem description:
The definition of the SXHASH function is confusing. The relevant
words from CLtL are:
SXHASH computes a hash code for an object and returns the hash
code as a non-negative fixnum. A property of SXHASH is that
(EQUAL <x> <y>) implies (= (SXHASH <x>) (SXHASH <y>)).
The manner in which the hash code is computed is implementation-
dependent but independent of the particular "incarnation" or "core
image". Hash values produced by SXHASH may be written out to files,
for example, and meaningfully read in again into an instance of the
same implementation.
Different people have different interpretations of what the second
paragraph is trying to say.
An additional problem is that the effects on SXHASH of destructive
operations on the object are not well-specified. Issue
HASH-TABLE-KEY-MODIFICATION dealt only with objects used as keys in
hash tables, not with SXHASH.
Proposal (SXHASH-DEFINITION:SIMILAR-FOR-SXHASH):
Define SXHASH as a function with these properties:
(1) The result is a non-negative fixnum.
(2) (EQUAL x y) implies (= (SXHASH x) (SXHASH y)).
(3) If x and y are "similar for SXHASH", then (SXHASH x) and
(SXHASH y) return the same mathematical value even if x and y
exist in different sessions of the same Lisp implementation.
Objects are "similar for SXHASH" if they are of type string,
bit-vector, character, cons, number, pathname or symbol, and
are "similar as constants".
[This assumes that a bug in the specification of "similar as
constants" for pathnames (from issue CONSTANT-COMPILABLE-TYPES) is
going to be fixed by the editor. The problem is that the definition
of "similar as constants" now implies that the pathname components are
recursively compared for being "similar as constants", while EQUAL
talks about components merely being "equivalent". If this change to
"similar as constants" is not made, then the presentation of SXHASH
must describe the relationship for pathnames separately, in a way that
is compatible with EQUAL.]
(4) The function is intended for hashing so implementations should
return values well distributed within the range of non-negative
fixnums.
(5) The value returned by SXHASH on an object within a single session
is constant provided that the object is not visibly modified with
regard to the equivalence test EQUAL (as defined in proposal
HASH-TABLE-KEY-MODIFICATION:SPECIFY).
(6) SXHASH must terminate.
Implementation Notes:
For objects of other types that EQUAL compares with EQ, item 5
restricts SXHASH to assigning the hash code based on some immutable
attribute of the identity of the object, such as its type. Another
legitimate implementation technique would be to have SXHASH assign
(and cache) a random hash code for these objects, since there is
no requirement that "similar" but non-EQ objects have the same hash
code.
Although the "similar as constants" relationship on symbols is defined
in terms of both the symbol's name and the packages it is accessible
in, item 5 disallows using package information to compute the hash
code, since changes to the package status of a symbol are not visible
to EQUAL.
Rationale:
This is probably what CLtL meant to say. Basically, what the proposal
does is to combine the original definition of SXHASH's relationship
with EQUAL with using "similar as constants" to compare objects that
exist in different Lisp sessions. For types where EQUAL and "similar
as constants" differ, or where "similar as constants" is not
well-defined, the relationship between Lisp sessions is unspecified.
Current Practice:
Unknown. In many implementations, the hash values returned by
SXHASH for objects of types that EQUAL compares with EQ are
apparently based on the type of the object.
JonL says he knows of at least one implementation that keeps a
hash-number slot inside every stored object.
Cost to Implementors:
Probably not too large.
Cost to Users:
The current definition of SXHASH is pretty garbled and probably
most uses of SXHASH do not depend on much more than the relationship to
EQUAL. That relationship is preserved in proposal SIMILAR-FOR-SXHASH.
Some users may have interpreted the read/print consistency language
in CLtL to require stricter behavior for SXHASH than is specified by
the current proposal. For example, some uses of SXHASH might assume
that arrays that are "similar" under read/print consistency have
the same hash codes. These applications could break in implementations
that choose to assign unique hash codes to non-EQ arrays.
Cost of non-adoption:
The definition of SXHASH will continue to be garbled. SXHASH will
be of limited usefulness.
Performance impact:
Probably small.
Benefits:
The cost of non-adoption is avoided.
Esthetics:
Better than the way things are now.
Discussion:
This issue was discussed briefly on the x3j13 mailing list in
November 89, but not written up until now.
There has been quite a bit of discussion about whether SXHASH should
be required to "work" on circular objects. Some have argued that such
a requirement would make it more useful, but others claim that it's
pointless to make the requirement since EQUAL doesn't have to "work"
on circular objects.
Barrett says he is concerned about proposal SIMILAR-FOR-SXHASH
potentially breaking applications that depend on SXHASH returning
the same values for arrays, etc. that are "similar", but concludes
that trying to specify what attributes of these objects might or might
not be used to compute the hash value is probably not practical.
Loosemore, Barrett, Moon, and MacLachlan have expressed agreement in
principle with proposal SIMILAR-FOR-SXHASH.