Tim Bray has an interesting series of blog posts that is keeping me on the edge of my seat:
So for Java, should we abandon String and do all our work with char constructs? I don't think so, simply because I think the char primitive is just too deeply broken. Also, I want to use tricks like strcmp() and strncat(), beloved of grey-bearded Unix veterans.
Gasp! Surely he's not suggesting that we retreat to putting everything in a byte construct, and presumably revert to living in caves and courting women with clubs?!? Well no, because I am an object-oriented kinda guy, when I can get away with it. So, how do we get the heavy industrial machinery for doing superior text processing in modern languages without compromising their virtues? Stay tuned.
What could he have up his sleeve?
Tim is essentially saying that Java's immutable String class isn't suited for "heavyweight" text processing tasks. That's a big claim, since there are so many applications that require heavyweight text processing (e.g. parsers, information retrieval, semantic analysis). Tim is saying that you can't scale using String, or its mutable counterpart StringBuffer, for these types of applications! I can only guess what he's brewing, but I do recall a couple of alternative string implementations for Java (byte-based rather than char-based).
There's also the notion of "cords" or "ropes", which represent long strings as binary trees whose leaves are small sequences of characters. Strings can still be immutable, but concatenation only creates a root node with references to the two sub-ropes, and substrings can reuse many of the nodes in a subtree without copying.
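To make the rope idea concrete, here is a minimal sketch in Java. The Rope, Leaf and Concat names are my own, hypothetical ones, not taken from any library or from anything Tim has published. It assumes small String leaves, does no rebalancing, and checks bounds only as far as String itself does, so treat it as an illustration of the structure rather than a production implementation.

```java
// Hypothetical rope sketch: an immutable binary tree whose leaves hold short strings.
abstract class Rope {
    abstract int length();
    abstract char charAt(int index);
    abstract Rope substring(int start, int end); // end is exclusive, as in String
    abstract void appendTo(StringBuilder out);

    // Wrap an ordinary String as a single-leaf rope.
    static Rope of(String text) {
        return new Leaf(text);
    }

    // Concatenation allocates one new root node; neither operand is copied.
    Rope concat(Rope other) {
        if (length() == 0) return other;
        if (other.length() == 0) return this;
        return new Concat(this, other);
    }

    // Flatten to a String only when a caller actually needs one.
    @Override
    public String toString() {
        StringBuilder out = new StringBuilder(length());
        appendTo(out);
        return out.toString();
    }

    private static final class Leaf extends Rope {
        private final String text;

        Leaf(String text) { this.text = text; }

        @Override int length() { return text.length(); }
        @Override char charAt(int index) { return text.charAt(index); }

        @Override Rope substring(int start, int end) {
            // Copying happens only here, and only within one small leaf.
            if (start == 0 && end == text.length()) return this;
            return new Leaf(text.substring(start, end));
        }

        @Override void appendTo(StringBuilder out) { out.append(text); }
    }

    private static final class Concat extends Rope {
        private final Rope left;
        private final Rope right;
        private final int length; // cached so length() is O(1)

        Concat(Rope left, Rope right) {
            this.left = left;
            this.right = right;
            this.length = left.length() + right.length();
        }

        @Override int length() { return length; }

        @Override char charAt(int index) {
            int split = left.length();
            return index < split ? left.charAt(index) : right.charAt(index - split);
        }

        @Override Rope substring(int start, int end) {
            if (start == 0 && end == length) return this; // reuse the whole subtree
            int split = left.length();
            if (end <= split) return left.substring(start, end);
            if (start >= split) return right.substring(start - split, end - split);
            // The range straddles both children: trim each side, then join.
            return left.substring(start, split).concat(right.substring(0, end - split));
        }

        @Override void appendTo(StringBuilder out) {
            left.appendTo(out);
            right.appendTo(out);
        }
    }
}
```

With this sketch, Rope.of("Hello, ").concat(Rope.of("rope ")).concat(Rope.of("world")) builds a three-leaf tree without copying any characters, and taking a substring that covers a whole leaf returns that leaf itself. The trade-off is that charAt now walks the tree, so a real implementation would want to keep it balanced.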