Summary
More and more programmers and researchers have been suggesting heresies along the lines of "programmers should only work with a view of source code, not the source itself".
Traditionally, programmers work directly with the source code of a language in a text editor. In most languages there are numerous automated tools to help with writing and viewing the source, but in the end the programmer still works directly on the source code.
An idea that I have heard more than once is that an IDE should only present a specific view of the source code, while the real source is hidden behind the scenes in a more general format, like XML. If a source file were really an XML file underneath, and rarely edited directly by a programmer, this could have many advantages.
One of many advantages is that history tracking could be embedded in the XML source without being in programmers' faces all the time. When I first look at source code, I like to see the history of revisions: who did what, when, and why. If this is embedded in the source code as comments, I often strip it so that the code is easier for me to work with, and I know I am not the only one. If the editor only presents me with a view of the source, turning history on or off could be as simple as checking a property option.
Given the following program:
program answer {
  _main() {
    // the question of life, the universe and everything
    x = 15 + 3 * 9;
    // the answer to life, the universe and everything
    write(x);
  }
}
Here is an example of how the same program might look as XML source code:
<program name="answer">
  <function name="_main">
    <history>
      <modified>
        <author>Christopher Diggins</author>
        <date>10/21/2005</date>
        <license>BSD</license>
      </modified>
      <original>
        <author>unknown</author>
        <license>Public Domain</license>
        <url>http://www.somewhere.there</url>
      </original>
    </history>
    <statement>
      <raw>x = 15 + 3 * 9</raw>
      <ast>
        <push>x</push>
        <push>15</push>
        <push>3</push>
        <push>9</push>
        <call>_star</call>
        <call>_plus</call>
        <call>_eq</call>
      </ast>
      <comment>
        the question of life, the universe and everything
      </comment>
    </statement>
    <statement>
      <raw>write(x)</raw>
      <ast>
        <push>x</push>
        <call>write</call>
      </ast>
      <comment>
        the answer to life, the universe and everything
      </comment>
    </statement>
  </function>
</program>
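To make the "view" idea concrete, here is a minimal sketch in Python (my choice; the post doesn't name a language for such tools) of how an editor might render the XML above back into the plain program, with history and comments switched on or off by flags. It uses only the standard `xml.etree.ElementTree` module; `render_view`, `show_history` and `show_comments` are hypothetical names, and the `<ast>` elements are left out of the embedded copy because this particular view doesn't need them.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML source above; the <ast> elements are omitted
# because this view only needs <raw>, <comment> and <history>.
SOURCE = """<program name="answer">
  <function name="_main">
    <history>
      <modified>
        <author>Christopher Diggins</author>
        <date>10/21/2005</date>
        <license>BSD</license>
      </modified>
      <original>
        <author>unknown</author>
        <license>Public Domain</license>
      </original>
    </history>
    <statement>
      <raw>x = 15 + 3 * 9</raw>
      <comment>the question of life, the universe and everything</comment>
    </statement>
    <statement>
      <raw>write(x)</raw>
      <comment>the answer to life, the universe and everything</comment>
    </statement>
  </function>
</program>"""


def render_view(xml_text, show_history=False, show_comments=True):
    """Render a plain-text view of the XML source; flags pick what is shown."""
    program = ET.fromstring(xml_text)
    lines = [f"program {program.get('name')} {{"]
    for func in program.iter("function"):
        lines.append(f"  {func.get('name')}() {{")
        history = func.find("history")
        if show_history and history is not None:
            for entry in history:  # <modified>, <original>, ...
                author = entry.findtext("author", default="unknown")
                date = entry.findtext("date", default="n/a")
                lines.append(f"    // {entry.tag}: {author}, {date}")
        for stmt in func.findall("statement"):
            comment = stmt.findtext("comment")
            if show_comments and comment:
                lines.append(f"    // {comment.strip()}")
            lines.append(f"    {stmt.findtext('raw').strip()};")
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


print(render_view(SOURCE))                     # compact view
print(render_view(SOURCE, show_history=True))  # history switched on
```

With the defaults it reproduces the original program text; passing `show_history=True` adds lines such as `// modified: Christopher Diggins, 10/21/2005` without touching the stored source.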
I think storing source as XML could give a rebirth to theoretically good ideas like literate programming, which tend to be ignored in practice.
There are a lot of possibilities with using XML source representation. What are your ideas on things you would like stored with the source code that you don't want to always have to look at?
> This is an interesting concept.
>
> Let's see, people generally need comments or other tools to record:
>
> * build information/dependencies;
> * todo lists;
> * changes and rationale;
> * design notes/Doxygen-like info;
> * configuration info;
> * RPC stuff (interfaces);
> * tests;
> * debug-specific info;
> * lots of other things I'm not even aware of.
>
> I think this approach could address each of these.
Plan 9, the operating system, came with an interesting utility to find threading issues (http://www.cs.bell-labs.com/sys/doc/spin.html), but it requires you to create a model of the program. Hmm. Sounds like a job for metadata.
First, XML is crrraaaap (see other forum) and about the clumsiest format I can think of for code.
However, an AST representing the language isn't so far-fetched. In fact, this is pretty much what you get in a Smalltalk environment. The text is a serialization format of the code. The code is what you execute, and it's just data.
Having an AST (think DOM, if you like) allows interesting programmatic transformations.
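As a rough illustration of the kind of transformation an AST makes easy, the sketch below uses Python's own `ast` module as a stand-in for the hypothetical language in the post and folds constant arithmetic, turning `15 + 3 * 9` into `42`. Constant folding is just one example picked for illustration; `ConstantFolder` is a hypothetical name, and `ast.unparse` needs Python 3.9 or later.

```python
import ast


class ConstantFolder(ast.NodeTransformer):
    """Fold +, - and * on numeric literals into a single literal."""

    OPS = {
        ast.Add: lambda a, b: a + b,
        ast.Sub: lambda a, b: a - b,
        ast.Mult: lambda a, b: a * b,
    }

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = self.OPS.get(type(node.op))
        left, right = node.left, node.right
        if (op is not None
                and isinstance(left, ast.Constant)
                and isinstance(right, ast.Constant)
                and isinstance(left.value, (int, float))
                and isinstance(right.value, (int, float))):
            folded = ast.Constant(value=op(left.value, right.value))
            return ast.copy_location(folded, node)
        return node


source = "x = 15 + 3 * 9\nprint(x)"
tree = ConstantFolder().visit(ast.parse(source))
tree = ast.fix_missing_locations(tree)
print(ast.unparse(tree))
```

The transformer never touches text: it rewrites tree nodes bottom-up, and `ast.unparse` then serializes the result back into source (`x = 42` followed by `print(x)`).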
CornerStone (a database engine from Infocom in the mid 80s) had an interesting feature: user-defined names (identifiers) were separate from the internal IDs used by the system. Imagine changing the name of a function (or variable) and having it automatically change everywhere else it is used in the source code (heck, IDEs could be doing that now, but since I still use an editor, I wouldn't know).
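A minimal sketch of separating user-visible names from internal IDs, assuming a hypothetical design in which stored code refers to symbols only by ID. This is not a description of how CornerStone actually worked (the comment doesn't describe its internals); `SymbolTable`, `declare`, `rename` and `render` are names made up for illustration.

```python
import itertools


class SymbolTable:
    """Hypothetical: maps stable internal IDs to user-visible names."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.names = {}

    def declare(self, name):
        sym_id = next(self._next_id)
        self.names[sym_id] = name
        return sym_id

    def rename(self, sym_id, new_name):
        # A single update: stored code refers to the ID, never the name.
        self.names[sym_id] = new_name


def render(code, table):
    """Turn stored tokens into text; ("ref", id) tokens become names."""
    return " ".join(table.names[tok[1]] if isinstance(tok, tuple) else tok
                    for tok in code)


table = SymbolTable()
x_id = table.declare("x")
write_id = table.declare("write")
stored = [("ref", x_id), "=", "15", "+", "3", "*", "9", ";",
          ("ref", write_id), "(", ("ref", x_id), ")", ";"]

print(render(stored, table))  # x = 15 + 3 * 9 ; write ( x ) ;
table.rename(x_id, "answer")
print(render(stored, table))  # answer = 15 + 3 * 9 ; write ( answer ) ;
```

Because the stored token list holds `("ref", id)` pairs rather than strings, renaming `x` to `answer` is one dictionary update, and both uses change in the rendered view.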
I am constantly re-amazed at the world's love affair with XML. Why use XML for every file format? In the case of storing a program, what's wrong with a context free grammar? YACC and LEX still work as far as I know, and the resulting files tend to be more readable and less bulky than XML in my opinion, and you can embed information in either.
Nevertheless, I think the notion of defining a programming language in terms of its AST instead of its syntax has a lot of merit, because it makes the "source code" a storage medium describing the AST, which developers can view and manipulate in interesting ways via tools.
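To illustrate a tool working on the AST rather than on the syntax, here is a small sketch that executes the `<ast>` elements from the XML example directly, without ever parsing `x = 15 + 3 * 9` as text. The stack semantics are assumptions, since the example doesn't define them: `<push>` of a number pushes its value, `<push>` of anything else pushes a name, and `_star`, `_plus`, `_eq` and `write` pop their operands. `run` and `Name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The <ast> parts of the XML example. Assumed semantics: <push> of a number
# pushes its value, <push> of anything else pushes a name, and each <call>
# pops its operands from the stack.
SOURCE = """<function name="_main">
  <statement><ast>
    <push>x</push><push>15</push><push>3</push><push>9</push>
    <call>_star</call><call>_plus</call><call>_eq</call>
  </ast></statement>
  <statement><ast>
    <push>x</push><call>write</call>
  </ast></statement>
</function>"""


class Name(str):
    """Marks a pushed identifier so it can be assigned or looked up."""


def run(xml_text):
    env, output = {}, []

    def value(item):
        # Names resolve through the environment; numbers stand for themselves.
        return env[item] if isinstance(item, Name) else item

    for ast_elem in ET.fromstring(xml_text).iter("ast"):
        stack = []  # one operand stack per statement
        for op in ast_elem:
            text = op.text.strip()
            if op.tag == "push":
                stack.append(int(text) if text.isdigit() else Name(text))
            elif text in ("_star", "_plus"):
                right, left = value(stack.pop()), value(stack.pop())
                stack.append(left * right if text == "_star" else left + right)
            elif text == "_eq":
                val, target = value(stack.pop()), stack.pop()
                env[str(target)] = val
            elif text == "write":
                output.append(value(stack.pop()))
            else:
                raise ValueError(f"unknown call: {text}")
    return env, output


env, output = run(SOURCE)
print(env, output)  # {'x': 42} [42]
```

The result matches what the text form of the program says, even though the evaluator only ever looked at the tree.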
My IDE, IntelliJ, does interesting and very useful transformations and analysis on the code, and though I don't know how it works on the inside, I somehow doubt it is working on text. A few years back I talked to Gosling about his Jackpot project, which does this sort of stuff. But his file format is defined as a context free grammar, not an XML schema, and that grammar is curiously identical to the grammar of a little language called Java. The hardest part, according to him, was the comments, because they aren't part of the grammar:
> I am constantly re-amazed at the world's love affair with XML. Why use XML for every file format? In the case of storing a program, what's wrong with a context free grammar? YACC and LEX still work as far as I know, and the resulting files tend to be more readable and less bulky than XML in my opinion, and you can embed information in either.
I won't argue that there aren't disadvantages to XML. I am not in love with XML, but I think there are several advantages to using XML as a serialization format for source code, such as:
- there already exist numerous tools for parsing, editing, displaying, transforming, manipulating and translating
- it is a mature format
- it has encoding information embedded internally
- it is quickly recognizable
- it is unambiguous
- it can be easily extended
- it is robust (i.e. works with partial information)
- it has a tree structure
For me, however, the issue is not so much how great XML is, but rather how bad a home-grown solution would be in comparison. Most non-trivial data representation formats that are hand-rolled are riddled with bugs and ambiguities, and often lead quickly down a road to incompatible versions from every vendor.
> I won't argue that there aren't disadvantages to XML. I am not in love with XML, but I think there are several advantages to using XML as a serialization format for source-code such as:
> ...
> - it is unambiguous
I'll disagree here: given a group of developers, a hierarchical data structure and a request to represent it as XML, you are likely to get multiple formats because of the attribute/element ambiguity.
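A small sketch of the ambiguity being described, using Python's standard `xml.etree.ElementTree`: the same author record encoded once with attributes and once with child elements. Both documents are well-formed, but a reader written for one convention silently gets `None` from the other. The record and function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The same author record, encoded by two developers with two conventions.
AS_ATTRIBUTES = '<author name="Christopher Diggins" date="10/21/2005"/>'
AS_ELEMENTS = ("<author><name>Christopher Diggins</name>"
               "<date>10/21/2005</date></author>")


def name_from_attribute(xml_text):
    """Reader written for the attribute convention."""
    return ET.fromstring(xml_text).get("name")


def name_from_element(xml_text):
    """Reader written for the child-element convention."""
    return ET.fromstring(xml_text).findtext("name")


for doc in (AS_ATTRIBUTES, AS_ELEMENTS):
    print(name_from_attribute(doc), name_from_element(doc))
# Christopher Diggins None
# None Christopher Diggins
```

XML itself settles only the syntax; which of the two shapes is "the" format still has to be agreed on separately, for example in a schema.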
> > - it is unambiguous
>
> I'll disagree here: given a group of developers, a hierarchical data structure and a request to represent it as XML, you are likely to get multiple formats because of the attribute/element ambiguity.
I agree with you. I was referring to the difficulty inherent in creating a non-trivial data format with an unambiguous grammar, for instance a programming language grammar.
> When I first look at source code, I like to see the history of revisions, who did what when or why. If this is embedded in the source code as comments, I often strip it so that the code is easier for me to work with, and I know I am not the only one.
What's wrong with CVS, Subversion, ...? They provide exactly this kind of information. Including it in source files seems rather clumsy (imagine a file with 1000 revisions ...). I never understood why people want to do that. Besides, it duplicates information (in the revision control system and in the source files).
This is an old idea. 30 years ago timeshared BASIC systems stored programs as bytecodes and rendered them as text for viewing and editing by a programmer. Numerous 4GLs did the same thing. Every article that begins with the author expressing wonder that Unix has survived for so long is immediately suspicious; I think the author is either inexperienced in the real world, unable to learn something difficult, or a crank.
Tacking XML onto any old idea makes it seem new to the wide-eyed inexperienced audience Mr. Wilson is writing for. Us old relics, hanging on to Unix and vi and command line programs, who Mr. Wilson arrogantly sneers at while he waits for us to die and retire, need to get out of the way of progress. We need to make way for the "next generation" of programmers who have new ideas that can't be held hostage by text files and command line tools and languages that can't be arbitrarily extended into millions of personalized Towers of Babel.
In the future Mr. Wilson describes, every programmer will have their own custom language, and their source code will be stored in incompatible and unreadable XML files. That will make text files and emacs and gcc flags look easy and fun.
"It has only taken HTML and XML a decade to become the most popular data format in history." I think that honor actually goes to the humble plain text format. Using XML as a "data format," contrary to the original intentions or design of XML, has created a lot of brittle, overwrought software in the last five years.
Ideas like this never really go anywhere even if they sound good on paper, so I'm not too worried. In ten years we'll still have Unix and C and vi, and XML will be forgotten except as a legacy format used to send invoices and shipping manifests around.
Let's compare plain old text to XML according to these criteria:
> - there already exist numerous tools for parsing, editing, displaying, transforming, manipulating and translating.
text: YES
XML: yes, but not as many as text
> - it is a mature format
text: YES
XML: not really, it's still evolving
> - it has encoding information embedded internally
text: yes, but not according to any single standard
XML: YES, frequently unnecessarily
> - it is quickly recognizable
text: yes, and it's human-readable too
XML: yes, but not human-readable
> - it is unambiguous
text: not in the sense I think you mean
XML: not really
> - it can be easily extended
text: infinitely
XML: yes, as long as the extensions are described unambiguously
> - it is robust (i.e. works with partial information)
text: yes
XML: not in my experience; parsers crash and burn on bad XML
XML is inherently more fragile than text because there is a lot more to go wrong.
> - it has a tree structure
text: yes, can represent anything but not according to a single standard
XML: yes
I'm not sure tree structure is an advantage, though. I don't agree that program source code is always structured hierarchically.
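On the robustness point, the XML 1.0 specification treats a well-formedness violation as a fatal error, so a conforming parser has to reject a truncated document rather than hand back what it managed to read. A minimal Python sketch of that behaviour with the standard `xml.etree.ElementTree` parser; the fragment is a cut-off copy of the first statement in the XML example, and the exact error message depends on the underlying expat version, so it isn't reproduced here.

```python
import xml.etree.ElementTree as ET

# A cut-off copy of the first statement from the XML example.
PARTIAL = "<statement><raw>x = 15 + 3 * 9</raw><comment>the question"

try:
    ET.fromstring(PARTIAL)
    parsed = True
except ET.ParseError as err:
    parsed = False
    # err.position is a (line, column) pair; the message text comes from expat.
    print("rejected at", err.position)

# The same cut-off program as plain text still has a usable first line.
text_lines = "x = 15 + 3 * 9;\nwrite(x".splitlines()
print(parsed, text_lines[0])
```

The plain-text version of the same cut-off program still has one complete, readable line, which is the commenter's point about text degrading more gracefully.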
> For me however the issue is not so much how great XML is, but rather how bad a home-grown solution would be in comparison. Most non-trivial data representation formats that are hand-rolled are riddled with bugs, ambiguities, and often lead quickly down a road to incompatible versions from every vendor.
That pretty much describes the state of XML today. Everyone is rolling their own using a more bloated language.