A while back, Vassili Bykov put up a great discussion about what counts as a binary selector. Then the other day, he showed up to taunt me because VisualWorks still has a two-character limit on binary tokens (which isn't a bad thing; I like talking to Vassili any time I can).
So I thought I'd put this to rest. My goal is to get this changed for 7.7. First things first. There's a package called 1000001-1001110-1010011-1001001 in the Open Repository. It modifies the core Scanner class used for method compilation, as well as the RBParser, used for rewriting. I would love to have others look at it closely to see if I did it right. It seems to work, but maybe I missed something. There's also a branch version of the RBCodeHighlighter package (63.2) which makes the highlighter cope with binary selectors of arbitrary length. Thanks to John Brant for helping me get this changed.
The real question, as Vassili points out, is not how long binary selectors are, but what limitations you place on the last character. What is the proper interpretation of the expression:
3--4
Is it a single - binary message followed by a negative literal? Or is it a two character -- selector followed by the positive literal four? The ANSI standard (pages 32/33) says that it would be the latter. A leading - for a literal number must be preceded by whitespace to be recognized as such. VisualWorks does it the other way. The packages above change VisualWorks to adopt the ANSI behavior, not just make binary selectors as long as you want.
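To make the two readings concrete, here is how a few workspace expressions come out under each rule. This is a sketch based on the description above, not output captured from either scanner, and the failing `--` send assumes nobody has defined such a method on Number.

```smalltalk
"Whitespace before the minus: a binary - followed by the literal -4 under both rules."
3 - -4.    "7"

"No whitespace: this is where the two rules part ways."
3--4.      "Old VisualWorks rule: read as 3 - -4, which answers 7.
            ANSI rule: read as 3 -- 4, the selector -- sent with the argument 4,
            which ends in doesNotUnderstand: unless -- is defined."

"The same split applies to points."
0@-1.      "Old VisualWorks rule: 0 @ -1. ANSI rule: 0 @- 1."
0@ -1.     "0 @ -1 under both rules, because whitespace precedes the minus."
```

In other words, code that already puts a space before the negative literal is unaffected by the change.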
But what about old code? What about people who wrote code that relied on the old interpretation? This is a qualitative concern. So I sat down and attempted to put some quantity behind it. Who does this anyway? With much help from Alan Knight (to whom I'm very grateful), I wrote some Storp queries against both the Open Repository and Cincom's development repository. In all, they comprise about 2.3 million method versions. The query scans all MethodVersions ever submitted and looks for the pattern where a negated literal immediately follows a binary selector. It uses the RBTreeSearcher to do this, so it is more than just a text pattern search (thanks once again to John for the query constructed to do the match). Here's what I gleaned from the results:
- 0.00038 - that's how often this happens. There are 2347493 methods across both databases. This pattern shows up 896 times. In some cases, it shows up more than once in the same method. If we reduce the hits to a unique set, the count goes down to 235, or 0.0001.
- +- and -- - Thankfully... no one has ever done this. It would just be wrong. Especially the -- one. In many fonts, you can't tell if that's two $- characters, or a long dash.
- Points - This is where the predominant usage is. People have occasionally written code like: 0@-1. With the introduction of , as an XYZ point creator in the OpenGL stuff, the pattern shows up there as well (e.g. 1,-2,-1). In fact, the OpenGL work constitutes at least half of these ANSI violations, presumably done to make the examples match the C code they were ported from.
- Other Selectors - The rest of the cases are nearly all test methods. They include placing the - directly after these selectors: =, ->, / and \.
But how to fix even this small subset of cases? There are basically three techniques that can be used to address this.
- Fix the Code - John Brant's search rule actually comes with a replace rule as well. One can sweep over existing code bases and fix them up using these. The search rule is:
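The two rates quoted in the first bullet can be checked in a workspace from the counts given there. The values in the comments are rounded by hand, so the exact printed form in any particular image is an assumption.

```smalltalk
"All hits, relative to every method version scanned."
(896 / 2347493) asFloat.    "roughly 0.00038"

"Unique hits only."
(235 / 2347493) asFloat.    "roughly 0.0001"
```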
    (``@a `p: `#l) `{:node | | arg |
        node selector isInfix and: [
            arg := node arguments first.
            node selectorParts first stop + 1 = arg start
                and: [(arg source at: arg start) = $-]]}
and the replace rule is
    ``@a `p: `{
        ``@a stop + 1 = `#l parent selectorParts first start ifTrue: [
            `#l addReplacement: (RBStringReplacement
                replaceFrom: ``@a stop + 1
                to: ``@a stop
                with: ' ')].
        `#l addReplacement: (RBStringReplacement
            replaceFrom: `#l start
            to: `#l start - 1
            with: ' ').
        `#l}
- Special Messages - Alan Kay says "it's all in the messages." One way to solve this problem is to simply define those messages where we want to use this pattern. For example:
    Number>>@- aNumber
        "Return a Point constructed by interpreting the receiver as the x value and the argument aNumber as the negated y value."
        ^Point x: self y: aNumber negated
- General Handler - See above regarding the value of messages. One could solve this problem generally for all cases by assuming they involve Numbers and implementing the following exception handler on Number:
    doesNotUnderstand: aMessage
        aMessage selector last = $-
            ifTrue:
                [^self
                    perform: (aMessage selector allButLast: 1) asSymbol
                    with: aMessage arguments first negated].
        ^super doesNotUnderstand: aMessage
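As a rough illustration of the last two techniques, here is how the earlier examples would behave once the ANSI scanning rule is in place. It assumes the `@-` method and the `doesNotUnderstand:` override have been installed exactly as written above; the temporaries `selector` and `stripped` are names made up for this walk-through, and the comments are worked out by hand rather than captured from an image.

```smalltalk
| selector stripped |

"With Number>>@- defined, the two-character selector is found directly."
0@-1.       "scanned as 0 @- 1, answers the Point 0 @ -1"

"With only the doesNotUnderstand: override, the same steps can be traced by hand."
selector := '@-' asSymbol.
stripped := (selector allButLast: 1) asSymbol.    "the selector @"
0 perform: stripped with: 1 negated.              "the Point 0 @ -1"

"The override also brings back the old answer for the opening example."
3--4.       "scanned as 3 -- 4, rerouted to 3 - -4, answers 7"
```

The generality cuts both ways: the override reroutes any unknown selector ending in - that is sent to a Number, whether or not the author meant a negative literal.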