There's an interesting discussion going on between Vassili and Andres about keywords, implicit receivers, and such.
I'm going to mostly stay out of the debate. I think it would be folly to judge after only looking at bits of code. I have a theory, which I'll proffer below, on why the debate may rage on with no resolution.
What interested me was the queries being run to count keyword lengths. I contend that in such an analysis, the call sites matter, because the whole argument is supposedly about readability. If 95% of the time the keywords fall under some arbitrary threshold, but exceed it dramatically the other 5% of the time, then the nod goes to the guy contending they're short enough in practice.
My query is as such:
census := Bag new.
statician := [:string | census add: string size].
Root
	enumerateMethods:
		[:ignoredClass :ignoredSelector :code |
		code allSymbolLiteralsDo: [:each | each keywords do: statician].
		code
			allLiteralsDo:
				[:each |
				each isVariableBinding ifTrue: [statician value: each key].
				each isBindingReference ifTrue: [each path do: statician]]].
^census sorted at: census size // 2
For my development image, this comes back with a value of 8. The snippet demonstrates two things of interest (imo). One is a handy way of enumerating all code bodies in the system: methods and initializers alike. The other is that I've chosen to take the median of the numbers, not any sort of average. In my experience with data analysis, the median is often the best measure of the "normal" value; averages are skewed by outlying spikes.
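To see why the median is the better yardstick here, consider a rough Python sketch (the numbers below are hypothetical, not taken from my image): a single extreme length drags the mean well past the typical value, while the median barely notices.

```python
# Hypothetical keyword lengths with one pathological outlier.
sizes = [3, 4, 5, 5, 6, 7, 8, 8, 9, 42]

mean = sum(sizes) / len(sizes)
# Upper median by index, in the same spirit as the Smalltalk snippet's
# "sorted at: size // 2".
median = sorted(sizes)[len(sizes) // 2]

print(mean)    # 9.7 -- pulled up by the 42
print(median)  # 7   -- unaffected by the outlier
```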
I think the analysis is not complete at this point though. If the point is to measure the size of the average "word", then I think one has to recognize that we get away with patternMatchingLongSingleWordsByCamelCasingThemSoThatTheBrainStillSeesAPoorMansSequenceOfWords. So I reran the analysis, changing the statician block above to:
statician := [:string |
	string
		piecesCutWhere: [:a :b | a isLowercase and: [b isUppercase]]
		do: [:subword | census add: subword size]]
This breaks the keywords up into their constituent subwords. The median then drops to 5.
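For readers without a Smalltalk image handy, the piecesCutWhere: trick can be approximated in Python with a regular expression that cuts wherever a lowercase letter is followed by an uppercase one (the function name and sample selectors are mine, for illustration):

```python
import re

def subwords(symbol):
    # Insert a space at every lowercase->uppercase boundary, then split;
    # this mirrors the piecesCutWhere: block in the Smalltalk version.
    return re.sub(r'([a-z])([A-Z])', r'\1 \2', symbol).split()

print(subwords('allSymbolLiteralsDo:'))
# ['all', 'Symbol', 'Literals', 'Do:']
print([len(w) for w in subwords('allSymbolLiteralsDo:')])
# [3, 6, 8, 3]
```

Feeding every such subword length into the census, instead of whole keyword lengths, is what pulls the median down.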
What does this really prove? That we love these language systems that make it so easy to analyze them. That the old adage about being able to prove anything with statistics is alive and well.
My take on the implicit receiver, and its likability or dislikability, is separate from this analysis; it is hypothetical conjecture. I don't care how long the symbols are, and I've not done enough Newspeak to know. I've watched a couple of presentations and looked at some code. The argument about "symbols matching lists of words" sounds quite a bit like LISP to me, and knowing that Vassili spent some time doing LISP and liked it, this doesn't surprise me. I think in the end the debate will boil down (unresolvedly) to those who do and those who don't like to think this way. Inside the message-sending boundaries of methods, Smalltalk is really quite procedural. It reads like a children's reading primer:
I see: you.
Sam throws: football.
(Me and: you) feel: #happy.
It is blatantly, excessively, obsessively subject-verb-(object) oriented. The "symbolic list" style, by contrast, is a more mathematical/abstract composition of programming intent. When reduced to the bare minimum of symbols, that imperative nature is somewhat lost (probably to the glee of those who appreciate this style). I'm willing to trust that for at least some subset of people, this style is preferable. Vassili's experience surely says so.