Agile Buzz Forum - Global State is bad, mmkay?

One lesson we've all learnt, eventually, is that global state in our applications is a really bad thing. Information that leaks out of its context can be reused in odd and peculiar ways - so we tend to avoid programming with globals, even if our programming language provides them. One thing that Gilad has done in Newspeak is to get rid of global variables entirely.

Actually, aren't we really talking about global state, not global variables? So, what we're talking about is data that is placed outside its context? .. what does that mean exactly...

Time to turn a new page and observe this common wisdom from a new angle and see if it still applies. Let's talk about data, information and knowledge (let's leave wisdom out of this!).

Information Systems use data.. they interpret data and turn it into information. That's what happens when we take some raw data and put it on a screen with a label next to it.. if we have the string 'bob'.. is it a verb or a name? We don't know until we give some context to it.

Knowledge Systems use information.. they store data with annotation such that we never have a piece of data that is simply 'bob'; we instead have a 'name' that is 'bob'. Such systems can be self-describing, and one place you'll find them in common use is Lisp, Smalltalk and other such dynamic and "living" environments. It's sort of the difference between an object knowing its own type, versus declaring its type at compile time and conveniently forgetting that while the program is running to save space. Like Information Systems, Knowledge Systems also 'mark up' their information into knowledge by applying it to a scenario. If I have a parent and my parent has a parent, I have a grandparent. If you've ever used Prolog this will be very familiar to you.
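As a rough illustration of the grandparent rule, here is a minimal Python sketch. It is not how Prolog or any particular knowledge system works; the names and the `parent_of` facts are made up for the example.

```python
# Facts are (child, parent) pairs; the names are invented for illustration.
parent_of = {
    ("alice", "bob"),
    ("bob", "carol"),
    ("dave", "bob"),
}


def grandparents(facts):
    """Derive (grandchild, grandparent) pairs from (child, parent) facts."""
    return {
        (child, grandparent)
        for (child, parent) in facts
        for (middle, grandparent) in facts
        if parent == middle
    }


inferred = grandparents(parent_of)
print(sorted(inferred))
```

Nobody stored a grandparent anywhere; it falls out of applying a rule to the labelled facts.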

Now back to the problem of global state.. if we save a piece of data or information into a database and we can access it from any part of our program.. across multiple computers even.. doesn't that make it global state? .. why, yes, yes it does. So isn't that therefore a bad thing?

The answer to this question is a little heretical.. the answer is: Yes. All of the data we store everywhere, out of context with its intent and use, is just as bad as sticking a global variable into your own program. We get away with it because we're really, really careful to draw boundaries around our data. We have some very powerful tools to draw those boundaries and manage them - Relational Databases. There are other kinds of databases too but I'm picking on the most common form.

So if all that global state is bad, what should we be doing instead? Well, that's the nub of it.. we have to be pragmatic here.. because the true answer to this problem is to have only one global computer that is always running in a giant dynamic "living" environment.. sort of like one super duper stupidly big Smalltalk image. I'm not going nuts here.. people really do work on this kind of stuff.

It turns out that being able to identify a piece of data globally is really important.. so much so that we came up with a standard way of doing it - the URI. Technologies like RDF embrace this idea by assigning absolutely every piece of information that can be recorded its very own URI. I find this sort of amusing because in databases, we generally have to tell it a piece of data is its own thing - and then we give ourselves a bunch of normalization rules to make sure that we keep data as its own thing.. but anyway, I digress.

If I want to talk about me, I could have an identifier - I may have several identifiers in fact, but that's by-the-by. Let's borrow the N3 syntax for RDF instead of the XML syntax for RDF.. why? because N3 is human readable.

@prefix : <http://www.census.org/people#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
:1 foaf:name "Michael Lucas-Smith" .

There, we have just said one piece of information about http://www.census.org/people#1 -- which I'm giving the name of Michael Lucas-Smith. I'm also using Friend of a Friend to describe what kind of name I'm giving it. Turtles all the way down from here. (Coincidentally, N3 is really just Turtle with some logic programming added in).
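To make the prefix business concrete, here is a small Python sketch of how a prefixed name turns into a full URI. The `PREFIXES` table and `expand` helper are my own illustration, not part of any RDF library, and the FOAF namespace is written with its trailing slash.

```python
# Prefix table mirroring the N3 snippet; the ":" prefix is stored as "".
PREFIXES = {
    "": "http://www.census.org/people#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'foaf:name' into a full URI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


# A triple is just (subject, predicate, object); the object here is a literal.
triple = (expand(":1"), expand("foaf:name"), "Michael Lucas-Smith")
print(triple)
```

Both the subject and the predicate end up as globally unique addresses; only the name itself is a plain string.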

The point is that since we're on the internet, that piece of information is available if I know the address of it - and the identity of me is fixed.. it's global.. totally global state. We've broken this stuff out of its context in some pretty incredible ways. Let's list them:

I didn't include my middle name
We didn't describe when I got the name or how long it's going to stay like that
We didn't say what language it was in
We didn't describe who uses that name and for what purposes

In fact, the only bit of context we really have is that it's a person from some sort of world-wide 'census' and that we have a name that friends-of-a-friend should use. If it were still in its context of creation, we'd know a lot more about it. That's okay, with knowledge systems we can describe all of that too..

{ :1 foaf:name "Michael Stephen Lucas-Smith" }
   biblio:source gov:birthcertificate ;
   iso639:language iso639:english .

Saying that it's a birth certificate source in English is probably enough to then further infer who uses it and for what, by following other sources of information associated with gov:birthcertificate.. in some country.. if we knew where person#1 was born.. or currently lives? ..
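One way to picture "statements about a statement" is sketched below in Python. This is only an assumed in-memory shape, with prefixed names kept as plain strings; the `Triple` type and `describe` helper are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The fact itself, using the names from the example above.
name_fact = Triple(":1", "foaf:name", "Michael Stephen Lucas-Smith")

# Context is recorded as annotations keyed by the fact they describe.
context = {
    name_fact: {
        "biblio:source": "gov:birthcertificate",
        "iso639:language": "iso639:english",
    }
}


def describe(fact, annotations):
    """Return the annotations recorded for a fact, or an empty dict."""
    return annotations.get(fact, {})


print(describe(name_fact, context)["biblio:source"])
```

The fact can still be passed around on its own, but whoever holds the `context` table can ask where it came from.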

Information explodes rapidly when you step out of the world of relational databases. In fact, information explodes rapidly when you step out of the world of models. One thing that has been learnt in the world of knowledge representation is the idea that you can't really talk about the 'shape' of something, but you can at least describe the labels for things on something. Interestingly enough, we're still not very good at describing what something is or means.. we have technologies like OWL which go along with RDF, but they're still not that great.

When you start to explore the world of OWL you begin to understand why functional programming can be very interesting indeed. Let's say I have a brother, which is a kind of sibling. Sibling is symmetric but brother is not; the inverse of brother is functional, not declarative; and is it transitive? We begin to see how computer reasoning is such a mess and why statistical machine learning is so successful.
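Two of those property rules, "brother is a kind of sibling" and "sibling is symmetric", can be sketched in a few lines of Python. This is a toy forward-chaining loop under my own assumptions, not an OWL reasoner; the names `alex` and `sam` are invented.

```python
# Hypothetical fact: sam is alex's brother.
facts = {("alex", "brother", "sam")}

SUBPROPERTY_OF = {"brother": "sibling"}  # every brother is a sibling
SYMMETRIC = {"sibling"}                  # sibling holds in both directions


def infer(facts):
    """Apply the sub-property and symmetry rules until nothing new appears."""
    known = set(facts)
    while True:
        new = set()
        for (s, p, o) in known:
            if p in SUBPROPERTY_OF:
                new.add((s, SUBPROPERTY_OF[p], o))
            if p in SYMMETRIC:
                new.add((o, p, s))
        if new <= known:
            return known
        known |= new


result = infer(facts)
print(sorted(result))
```

Notice that the loop never concludes alex is sam's brother, because brother is not symmetric: alex might be a sister.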

If we want to capture all the meaning around a piece of data, then we have to make sure we never remove it from its context.. we have a big problem there, because capturing the context of a piece of data is pretty much impossible. It is at this point that we realize the best we can do is be pragmatic about it.. but where do we draw the line? Are relational databases enough for the future (no) and how do you scale a computer to be as big as the world (dunno) and how do we convince people to stop building models and start building ontologies? (no idea).

One important first step is to make the programming languages support these ideas. I suppose a big step forward would be to only have one kind of global state - the internet. As such, Gilad's step of removing global variables from Newspeak is a good one... yet at the same time, doesn't it also suggest that describing data globally is something a language should support too? I can label things in my programming language temporarily, but I can't label them permanently. Example:

| person |

person := Person new.

However, I can't talk about person globally - instead I have to talk about it through a library, eg:

| person |

person := session readOneOf: Person primaryKey: 77

I know what id it is but I have to hard-code where the data is coming from. Behind the scenes there is even more going on describing how the Person looks, how it is stored, how to retrieve it, how to write it back again later.. and much much more.

Companies are starting to realize that an information approach to data makes a lot of sense. Take Oracle for example - if you get their spatial database, it includes N3 triples as part of its data storage paradigm. In fact, you can access relational data as if it were N3 triples too.. basically, they've made all relational databases act as an information store, using the table definitions as the labels, which can be mapped to a particular ontology. So what I really want to write in my program is this:

person:77

In this example, I assume I have previously initialized the @prefix of 'person' to point to my database's RDF interface. Of course, the 77 is probably in a variable - but some concepts are pretty solid.. for example.. what is a Person? If I want to create a person, it's just any old "object".. a thing I'm going to describe - if I want to tell other people that it is a person, we have to agree on what that means, so I might do something like this:

person type: foaf:Person

Now we can all agree on what a Person is, which is great because when you dig into the ontologies, a foaf:Person is also a Dublin Core Person and a WordNet Person and a PIM Person and a Geo Spatial Thing Person.. and so on and so forth. So now we have a nice -global- way of talking about what a Person is. We can program against it too:

foaf:Person printOn: aStream

aStream nextPutAll: foaf:firstName; space; nextPutAll: foaf:surname

Very powerful stuff we're playing with here. All of the decisions we made internally in our program, we're now sharing with the rest of the world. We're now entering into a bigger idea of what contextually relevant information is.. we're also expanding our understanding of what a class is and what behavior is and what modularity is.
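For anyone who wants to poke at this in a language that exists today, here is a rough Python approximation of `person:77` and the `printOn:` idea. Everything in it is an assumption for illustration: the `PEOPLE` dictionary stands in for a database's RDF interface, and `RESOLVERS`, `PRINTERS`, `resolve` and `print_string` are hypothetical names, not a real library.

```python
# Hypothetical in-memory stand-in for a database's RDF interface.
PEOPLE = {
    "77": {
        "rdf:type": "foaf:Person",
        "foaf:firstName": "Michael",
        "foaf:surname": "Lucas-Smith",
    }
}

# Each prefix maps to a function that fetches a record by its local name.
RESOLVERS = {"person": PEOPLE.get}


def resolve(name):
    """Look up a prefixed name such as 'person:77'."""
    prefix, _, local = name.partition(":")
    return RESOLVERS[prefix](local)


# Behaviour is keyed by the global type name rather than by a local class.
PRINTERS = {
    "foaf:Person": lambda r: f"{r['foaf:firstName']} {r['foaf:surname']}",
}


def print_string(record):
    """Format a record using the printer registered for its rdf:type."""
    return PRINTERS[record["rdf:type"]](record)


print(print_string(resolve("person:77")))
```

The calling code never says which database or table the person lives in; swapping the resolver behind the `person` prefix changes the source without touching the caller.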

And as an added fun bonus, we can start exploring the world of inferred knowledge while we're at it, eg:

foaf:Person >> mycoolstuff:grandfather

^foaf:father foaf:father

In the above example, I've adopted some new syntactic rules. The first one is that the 'self' is implicit (like in Self and Newspeak and JavaScript). The second is that there is a new kind of symbol <url | prefixed-url> which talks about a global piece of state on the internet. Apart from that, we're just doing message sends. I'm asking the object that is a <foaf:Person> for its father, then that father's father. I'm calling that method <mycoolstuff:grandfather> ... I can stick mycoolstuff at some web address and share it with anyone that is interested.

If the function we've created is truly a function - i.e. it doesn't update any data, it's purely functional - then we can even safely share it with other programmers on the internet as a sort of cross-site-scripting. The only risk is wasting CPU cycles if their algorithm is slow.

Is the world ready for this kind of programming? .... maaaaaybe. Is Smalltalk ready for this kind of programming? ... getting there.

