Weblogs Forum - Programmers Shouldn't Touch the Source

83 replies on 6 pages. Most recent reply: Jul 11, 2006 2:02 PM by Hossam Mashhady

Martin Möbius

Posts: 2
Nickname: mmoebius
Registered: Oct, 2005

Re: Programmers Shouldn't Touch the Source

Posted: Oct 26, 2005 8:09 AM

You say making "things" (which things?) shorter will help us?
All the newer languages are adding information: strong typing there, field prefixes here. Annotations, access modifiers, class names, namespaces, patterns: all these things add information.
All wrong?

Harrison Ainsworth

Posts: 57
Nickname: hxa7241
Registered: Apr, 2005

separate sharable parsed layers

Posted: Oct 26, 2005 9:24 AM

ok then,

* leave the plain-text file alone
* put the analysis/parsing/semi-compilation into a separate file
* make that separate file XML or YAML or similar
* include an md5/sha1 hash of the original text file at the time of the parsing

so you now have:

* plain-text source to use as you always have, if you want
* a sophisticated form/layer openly sharable and usable by any editor/tool
* a means of knowing when to regenerate the parsed form
* a negligible overhead of serializing two file formats instead of one

The real issue is not one of adding data that isn't already there or made by editor or compiler. It is a matter of externalizing some of that processing into an openly sharable form. The above scheme appears to fulfil that, and the separation of formats allows it to be generalised further: multiple analyses can be applied simultaneously, or new ones developed, without disturbing any others.

-- no-one loses, everyone gains, global functionality-quotient increases a little! (unless I've missed something, it often happens...)
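
A minimal sketch of how the staleness check in this scheme might look, in Python. The sidecar name `.parsed.json`, the JSON layout, and the function names are assumptions made for illustration (the post only says "XML or YAML or similar"); SHA-1 is one of the two hashes suggested above.

```python
import hashlib
import json
from pathlib import Path


def source_digest(source_path: Path) -> str:
    """Return the SHA-1 hex digest of the plain-text source file."""
    return hashlib.sha1(source_path.read_bytes()).hexdigest()


def write_parsed_layer(source_path: Path, analysis: dict) -> Path:
    """Store an analysis next to the source, tagged with the source's hash."""
    # Hypothetical naming convention: hello.cpp -> hello.cpp.parsed.json
    layer_path = source_path.with_name(source_path.name + ".parsed.json")
    layer = {"source_sha1": source_digest(source_path), "analysis": analysis}
    layer_path.write_text(json.dumps(layer, indent=2))
    return layer_path


def layer_is_stale(source_path: Path, layer_path: Path) -> bool:
    """Return True when the source has changed since the layer was written."""
    layer = json.loads(layer_path.read_text())
    return layer["source_sha1"] != source_digest(source_path)
```

An editor or tool would call `layer_is_stale` before trusting the parsed layer, and regenerate the layer when the hashes differ. The plain-text file itself is never modified.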

Max Lybbert

Posts: 314
Nickname: mlybbert
Registered: Apr, 2005

Re: Programmers Shouldn't Touch the Source

Posted: Oct 26, 2005 9:49 AM

/* It would seem to me that tools that generate views of the source that only expose the agreed interface contract are the best way to meet this requirement - javadoc is a good example. unfortunately we all know how hard it is to get developers to annotate their source richly enough :-)
*/

Yes, it's possible to do this. But then you get everybody using a million different interface contracts -- Javadoc vs. Doxygen, for instance. Picking a good interface, or at least interface builder, would go a long way to avoid that.

And, yes, you're right that having the ability to document code doesn't mean people will use it.

Max Lybbert

Posts: 314
Nickname: mlybbert
Registered: Apr, 2005

Re: separate sharable parsed layers

Posted: Oct 26, 2005 9:53 AM

/* ok then,

* leave the plain-text file alone
* put the analysis/parsing/semi-compilation into a separate file
* make that separate file XML or YAML or similar
* include an md5/sha1 hash of the original text file at the time of the parsing

so you now have:

* plain-text source to use as you always have.

[and,] if you want:

* a sophisticated form/layer openly sharable and usable by any editor/tool
* a means of knowing when to regenerate the parsed form
* a negligible overhead of serializing two file formats instead of one
*/

(Yes, I edited that a little).

This looks like a decent proposal. Then again, I don't think CDiggins ever implied that XML was the only true way.

OTOH, this proposal doesn't solve any problems that CDiggins' proposal overlooks. Or does it?

Alfredo Aldundi

Posts: 6
Nickname: cheesy
Registered: Oct, 2005

Re: Programmers Shouldn't Touch the Source

Posted: Oct 26, 2005 10:28 AM

> I see no good reason to obstruct access to this information to users of a
> source file by hiding it in an SCM. I can't understand what in the world is
> possibly dangerous by providing access to this information to users in a
> standardized format.

I do not see a good reason to obstruct access to this information either. But why not just create a report tool that gathers all the information from the revision control system, the source file itself, and ... and create a standard reporting output (because it seems that is what you want...)?

Alfredo Aldundi

Posts: 6
Nickname: cheesy
Registered: Oct, 2005

Re: Programmers Shouldn't Touch the Source

Posted: Oct 26, 2005 10:34 AM

> Why do these constant translations between textfile formats (including
> parsing the source), when you can parse it once, and potentially display and
> edit it in any way you like, without having to re-parse the source after every
> change?

I don't get it. You have to parse an XML file too. So where is the difference? And unless you want to edit the raw parse tree (I doubt it...) you still have to re-parse parts of the "source view" after every change. Isn't that what modern IDEs are already doing?

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Programmers Shouldn't Touch the Source

Posted: Oct 26, 2005 10:38 AM

> I do not see a good reason to obstruct access to this
> information either. But why not just create a report tool
> that gathers all the information from the revision control
> system, the source file itself, and ... and create a
> standard reporting output (because it seems that is what
> you want...)?

I believe the meta-information is as important as the code itself. I believe you are suggesting the meta-information should be managed by the SCM, whereas I am suggesting it would be simpler for the source file itself to contain the meta-information.


<document>
  <author>Christopher Diggins</author>
  <name>Hello World program</name>
  <history>
    <version>
      <major>1</major>
      <minor>0</minor>
      <modified>2005-10-26</modified>
    </version>
  </history>
  <source lang="C++ 98">
  <![CDATA[
    #include <iostream>
    int main() {
      std::cout << "hello world" << std::endl;
    }
  ]]>
  </source>
</document>

So my question is this: what is wrong with this approach? AFAICT it is about as simple as can be.
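
For what it's worth, a wrapper like this is easy to unwrap again. The sketch below is a rough illustration in Python, not anything proposed in the thread: `extract_source` is a hypothetical helper, and it assumes the document is well-formed, with every `<version>` element properly closed.

```python
import textwrap
import xml.etree.ElementTree as ET

DOCUMENT = """\
<document>
  <author>Christopher Diggins</author>
  <name>Hello World program</name>
  <history>
    <version>
      <major>1</major>
      <minor>0</minor>
      <modified>2005-10-26</modified>
    </version>
  </history>
  <source lang="C++ 98"><![CDATA[
    #include <iostream>
    int main() {
      std::cout << "hello world" << std::endl;
    }
  ]]></source>
</document>
"""


def extract_source(xml_text: str) -> tuple[str, str]:
    """Return (language, source code) from a <document> wrapper."""
    root = ET.fromstring(xml_text)
    source = root.find("source")
    if source is None:
        raise ValueError("document has no <source> element")
    # ElementTree exposes CDATA content as ordinary element text.
    code = textwrap.dedent(source.text or "").strip()
    return source.get("lang", ""), code


language, code = extract_source(DOCUMENT)
print(language)
print(code)
```

The CDATA section is what lets the `<` characters in the C++ survive unescaped; outside of it they would have to be written as `&lt;`.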

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Programmers Shouldn't Touch the Source

Posted: Oct 26, 2005 10:53 AM

I left out a rather important element, the Abstract Syntax Tree representation:


 <document>
   <author>Christopher Diggins</author>
   <name>Hello World program</name>
   <history>
     <version>
       <major>1</major>
       <minor>0</minor>
       <modified>2005-10-26</modified>
     </version>
   </history>
   <source lang="C++ 98">
   <![CDATA[
     #include <iostream>
     int main() {
       std::cout << "hello world" << std::endl;
     }
   ]]>
   </source>
    <ast>
      <include>iostream</include>
      <function name="main">
        <type>int</type>
        <block>
          <call name="operator&lt;&lt;">
            <call name="operator&lt;&lt;">
              <eval>std::cout</eval>
              <eval>"hello world"</eval>
            </call>
            <eval>std::endl</eval>
          </call>
        </block>
      </function>
    </ast>
 </document>

John O'Shea

Posts: 4
Nickname: aehso
Registered: Oct, 2005

Re: Programmers Shouldn't Touch the Source

Posted: Oct 26, 2005 11:53 AM

> So my question is this: what is wrong with this approach?
> AFAICT it is as about as simple as can be.

It's too simple :-) Let me give you just one example of a problem, using a slightly modified schema (only because yours doesn't nicely support storing multiple revisions in the document) - I'll use Java because I'm a java head:


<document>
   <author>Christopher Diggins</author>
   <name>Hello World program</name>
   <history>
     <version>
       <major>1</major>
       <minor>1</minor>
       <modified>2005-10-26</modified>
       <source lang="Java 1.3">
         <![CDATA[
           package demo;

           public class Example {
             public static void main(String[] args) {
               System.out.println("I don't depend on anything");
             }         
           }
         ]]>
       </source>
     </version>
     <version>
       <major>1</major>
       <minor>0</minor>
       <modified>2004-11-01</modified>
       <source lang="Java 1.2">
         <![CDATA[
           package demo;
           import OtherClass;

           public class Example {
             public static void main(String[] args) {
               System.out.println("I do depend on another class");
               OtherClass other = new OtherClass();
               other.doSomething();
             }
           }
         ]]>
       </source>
     </version>
   </history>
 </document>

Now let's say I have a nice IDE that allows me to easily browse this history, and I browse back to v1.0 of Example. I switch back and see the reference to OtherClass. But OtherClass doesn't exist in my distribution, either because it was refactored out of my source or because it was contained in a different binary library that the older revision of this source depended on. Now the code just doesn't make any sense to me, and the facility is really of no use to me.

My point is that it is impossible to capture the full revision history of a single code element without also distributing the entire revision history of every other element (code or library) that ever existed in the SCM repository. The example above illustrates that in all but the most trivial cases, the user needs access to all of that data to be able to make sense of the revision history of a single code element. For a single SDK release, distributing that much metadata, let alone encoding it in a common format like XML, just isn't practical. Ask any SCM repository administrator how much bigger the repository is than a single snapshot of the source.

(Note: I should really have used a delta algorithm to reduce the size of the version elements in the example above, as SCM systems do internally. It's a simple example, so I didn't bother. In any case, doing so would cause problems in and of itself: the delta would be stored in a CDATA section, so an XML-processor-based tool couldn't interpret it anyway.)
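
To make the delta idea concrete, here is a rough Python sketch of line-based deltas: keep the newest revision in full and, for an older one, store only the line ranges that differ. `make_delta` and `apply_delta` are hypothetical names, and real SCM systems use their own formats; this only illustrates the principle.

```python
import difflib

# A delta is a list of (start, end, replacement_lines) edits against the base.
Delta = list[tuple[int, int, list[str]]]


def make_delta(base: list[str], target: list[str]) -> Delta:
    """Record only the ranges of `base` that must change to obtain `target`."""
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    return [
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(base: list[str], delta: Delta) -> list[str]:
    """Rebuild the target revision from `base` plus the stored delta."""
    result: list[str] = []
    cursor = 0
    for start, end, replacement in delta:
        result.extend(base[cursor:start])  # unchanged lines before this edit
        result.extend(replacement)         # lines that replace base[start:end]
        cursor = end
    result.extend(base[cursor:])           # unchanged tail
    return result
```

Stored this way, only the changed lines of each older revision take up space, but a tool has to apply the delta before it can show that revision, which is exactly the interpretation problem described above.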

Anyway, I think we've gone waaay off-topic here, but I just wanted to clarify why attempting to include revision history in your metadata is not achievable. I do agree with the other posters here that most modern languages have constructs that allow programmers to embed the other metadata you want to have. A tool that could extract that information into a standard XML-based format is doable, but you end up adding a layer of indirection that in most cases doesn't achieve much.

Lastly, other posters here who have an interest in Eclipse might be interested to note that they are embarking on defining a Generic Language Model (http://www.eclipse.org/proposals/dltk/main.html) that is slightly related to this topic...

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Programmers Shouldn't Touch the Source

Posted: Oct 26, 2005 1:19 PM

> It's too simple :-) Let me give you just one example of a
> problem, using a slightly modified schema (only because
> yours doesn't nicely support storing multiple revisions in
> the document) - I'll use Java because I'm a java head:

Of course I agree storing multiple revisions in the document itself is a bad idea. I don't know where I led you to believe that I thought that was a good idea.

John O'Shea

Posts: 4
Nickname: aehso
Registered: Oct, 2005

Re: Programmers Shouldn't Touch the Source

Posted: Oct 26, 2005 2:15 PM

> Of course I agree storing multiple revisions in the
> document itself is a bad idea. I don't know where I led
> you to believe that I think that that was a good idea.

Hmmm -
"One of many advantages is that history tracking could be embedded in the XML source, without being in the face of programmers all the time."

"For me the big problem with CVS is that there is a one-to-one mapping between source and CVS information, why share one without the other? If they are one and the same, why not a single file with all of the crucial data provided?"

This, and a couple of other subsequent comments, led me to believe you were including revisions in your thoughts (though I can see from your last few posts that you've changed that opinion).

Possibly embedding a URI in the current revision's metadata might be useful. The URI could use an RCS-specific protocol (e.g. pserver:x.com/cvsroot/x/y/z.java) to access earlier revisions, or a defined HTTP URL scheme to access them via a web server front end, or perhaps a standardized "RCS" web service.

All CVS clients work a little like this - when they need to do something with a source element in the local snapshot they use the metadata in the associated CVS/ folder - that metadata contains the info required to connect back to the repository.
I think part of your problem is that most OSS projects package their source but don't package the associated CVS metadata, presumably because their CVS repositories are not externally accessible (or, of course, they could be using a different repository such as ClearCase that does not embed metadata in the source tree).
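
As a rough illustration of the metadata involved, the Python sketch below reads the `CVS/Root`, `CVS/Repository` and `CVS/Entries` files from a checked-out directory. The function names are hypothetical, and it assumes `CVS/Repository` holds a path relative to the root, which is not always the case.

```python
from pathlib import Path
from typing import Optional


def repository_location(working_dir: Path) -> str:
    """Combine CVS/Root and CVS/Repository into one location string."""
    admin = working_dir / "CVS"
    root = (admin / "Root").read_text().strip()
    # Assumption: Repository holds a path relative to the repository root.
    module_path = (admin / "Repository").read_text().strip()
    return f"{root}/{module_path}"


def file_revision(working_dir: Path, filename: str) -> Optional[str]:
    """Look up the checked-out revision of `filename` in CVS/Entries."""
    entries = (working_dir / "CVS" / "Entries").read_text().splitlines()
    for line in entries:
        # File entries look like "/name/revision/timestamp/options/tagdate";
        # directory entries start with "D" and are skipped here.
        fields = line.split("/")
        if len(fields) >= 3 and fields[0] == "" and fields[1] == filename:
            return fields[2]
    return None
```

A source package that shipped these few small files would already carry enough information for a tool to find its way back to the repository, provided the repository is reachable.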

Ben St. John

Posts: 3
Nickname: jbstjohn
Registered: Aug, 2005

Re: Programmers Shouldn't Touch the Source

Posted: Oct 27, 2005 9:09 AM

Definitely an interesting topic. You seem to have backed off on the revision idea, but pushed forward other stuff (links to test suites, etc). IMO, most of this can be nicely covered by tags that are maintained by the version control system. And you *will* have a version control system, so anything that doesn't use information from it directly will require extra work from developers, which isn't too likely.

That extra work, in fact, is probably the stumbling block to having all of this meta-information at the moment, rather than any technological issue.

Somewhat conflated with the revisioning (let's call it external metadata) is the idea of abstracting out programming concepts, or what you could consider internal metadata. But as someone else astutely noted, this is already what most IDEs are doing. Yes, they're each doing it in their own way, but, at least per language, that's somewhat necessary. You need, in your language model, some way of representing every feature of that language. Which leads pretty much to a compiler/interpreter for that language.

What I see as going in the same direction, but perhaps being more useful, are things like Rational's binding of UML diagrams with code, with the certainty of a one-to-one correspondence (or at least the same correspondence as source code has to assembly...).

I would like to see more of this. I'm annoyed, for example, that with most editors, I can't click on a macro, and have it replace in my view, one level of expansion of that macro. Or a "forEach" in C++. It seems that IDEs are going in this direction -- the refactoring possibilities keep getting better, class diagrams too. Still it doesn't seem that things are consistent enough, when hopping between abstraction layers, to be as useful as we'd like.

There are other, perhaps more inherent problems too. Abstractions are often leaky (see Joel Spolsky). And for many things, it's not yet clear what the "most worthwhile" abstractions to have formalized would be. In going from assembly to source code, it was agreed sub-routines, variable names, and some flow control constructs were pretty good ideas (okay, the first two could be seen as assembly vs. machine code...). I think the formalization of some patterns (isn't that nice and vague) might be one possible next abstraction step. [I.e. diagrams or forms for a Singleton class, or Class Factory, or model/view architecture that puts a lot of "standard" stuff in the background, and maybe enforces some programming contracts -- e.g. your view doesn't change your data, or this function never allocates or frees memory...]

In short, and to get back to your original ideas, I think it will be more important to improve the tools (perhaps including their export/report generation abilities) and to clarify what can be safely abstracted away, than to think about storing code in a particular format. Version control systems mean code is already stored as a database. As long as this can be happily exported, this is a good thing. Tags in this database should take care of external metadata. And a more clearly defined abstraction layer above current source code is needed, for internal metadata.

Cheers,
Ben

Greg Jorgensen

Posts: 65
Nickname: gregjor
Registered: Feb, 2004

Re: Programmers Shouldn't Touch the Source

Posted: Oct 27, 2005 10:03 PM

> However, at least for BASIC on the home computers, there
> was a one-to-one mapping between token and textual
> representation, and they didn't store the program in any
> kind of AST form, nor did the format allow selective
> viewing of the source, such as turning on/off comments.
> The goal was to save space in the program file, not to aid
> reading of the code, or enable program transformations. So
> if this is what you refer to, then you can't really say
> it's the same idea.

I wasn't drawing a parallel between the implementations of today vs. those of 30 years ago. Tokenizing source code for storage and execution, and rendering it as text for human viewing and editing are old ideas, similar to but not the same as storing source code as XML. Nothing in the old tokenized files precludes selective viewing, transformations, or embedded metadata. The interpreted BASIC language and the simple platforms it ran on had no need for those things, and perhaps no one imagined them back then. I worked on some HP 2000 timeshared BASIC computers a long time ago and did write code (in BASIC) that manipulated the tokenized versions of other programs.

> > Every article that begins with the author
> > expressing wonder that Unix has survived for so long is
> > immediately suspicious; I think the author is either
> > inexperienced in the real world, unable to learn
> > something difficult, or a crank.
>
> I don't understand what you refer to here, because I don't
> find anything of this sort in Christopher's blog, or the
> linked-to article. Could you provide a quote?

My comment was about Mr. Wilson's article, not any of Mr. Diggins' postings. I'm sure you run across articles and blog postings that begin by wondering why text files, Unix, vi, emacs, C, what have you have hung on for so long, followed by the author proposing a better whiz-bang solution. What they often miss is that the longevity and hardiness of the simple formats and tools is the reason for their continued usefulness and adaptability. Mr. Wilson attributes the persistence of text files and vi to aging stick-in-the-mud programmers like me. I attribute the persistence (and ongoing adoption by lots of young programmers every day) to their usefulness. That's why I predicted that in ten years I (and lots of other programmers) will still be saving text files of source code from vi and its like, and XML will find its proper place as a replacement for proprietary binary EDI formats.

> You may scoff at this, but I also think this may be the
> "next big thing". A lot of indicators point in that
> direction, such as numerous research projects (several
> links were given in a posting by Christopher), conferences
> on "generative programming", etc.

When any of that moves beyond research projects into something useful I'll gladly change my mind. The disconnect between what researchers do and what happens in the world of real data and real programs is pretty big.

> I don't think Mr. Wilson thinks this will be "the end of
> command-line tools way of doing things" - using text as
> the common medium and piping it from program to program,
> as that still works fine for data structured as records or
> tables. However, it will take more than ridicule to
> counter his arguments of its shortcomings, such as the
> difficulty of representing data in a tree structure this
> way. Besides, as someone else pointed out, "text" is not a
> format in itself - it's more of a medium - so comparing
> "text" to "XML" (a way to structure text) is like
> comparing apples to oranges.

It was Mr. Wilson who offered XML as an alternative to text and ridiculed aging programmers and their devotion to text-based tools. The original posting by Mr. Diggins offered XML as a structured alternative to plain text. I don't know what the deal is with "tree structures" that keeps coming up. An outline (in text) can, for example, represent hierarchies. What do tree structures have to do with program source code?
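
To make the outline point concrete, here is a small Python sketch that reads an indented plain-text outline into a nested structure. The two-space indent and the `label`/`children` node layout are assumptions for illustration only.

```python
def parse_outline(text: str, indent: int = 2) -> list[dict]:
    """Turn an indented plain-text outline into nested nodes."""
    roots: list[dict] = []
    stack: list[tuple[int, dict]] = []  # (depth, node) along the current path
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        depth = (len(line) - len(line.lstrip(" "))) // indent
        node = {"label": line.strip(), "children": []}
        # Pop back up until the top of the stack is this node's parent.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        siblings = stack[-1][1]["children"] if stack else roots
        siblings.append(node)
        stack.append((depth, node))
    return roots


OUTLINE = """\
function main
  type int
  block
    call operator<<
"""

tree = parse_outline(OUTLINE)
print(tree[0]["children"][1]["children"][0]["label"])
```

The hierarchy lives in nothing but leading whitespace, so the file remains readable and editable in any text editor.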

> What you talk of as "personalized Towers of Babel", others
> may call a DSL.

A DSL is great as long as it stays in its specific domain. What Mr. Wilson and Mr. Diggins describe is DSLs released into the wild. Imagine the fun if all of us had our own collection of customized library routines and our own extended dialect of our favorite language. Exchanging code or writing anything portable would become close to impossible, and we'd lose the shared expertise and idioms of a common language.

> Similar to the Unix model of small
> programs doing one task well, you may also have DSLs
> working well for a specific task or domain, allowing you
> to write programs in a more natural way in that domain.
> That's already used to a great extent, today (HTML, SQL,
> regex, etc.).

Those are all good examples of text-based standards, not DSLs that every programmer twiddles and tweaks on their own.

bug not

Posts: 41
Nickname: bugmenot
Registered: Jul, 2004

Re: Programmers Shouldn't Touch the Source

Posted: Oct 29, 2005 3:15 PM

We shouldn't reinvent approaches but rather go back to using what others designed in an earlier more innovative age.

Smalltalk stores comments associated with code as well as the bytecodes. It shouldn't be too hard to change this to add more such features.

One advantage of such an approach is that you can write a database of your methods, variables, classes, and live objects. In Smalltalk, looking up who uses a method is very fast because it uses an indexed database of sorts. Eclipse is cool.. but slow by comparison. Just try Squeak, Smalltalk/X or VA Smalltalk if you don't believe me.

People are slow to adopt really revolutionary technology. We haven't had a major innovation in the OO space since Smalltalk and Self. All others seem rehashing of the same concepts with a lot less success.

Why Ruby, when Smalltalk is a better Ruby than Ruby? (Same goes for Java, C# and C++.)

Eric Gillespie

Posts: 13
Nickname: viking
Registered: Jun, 2005

Re: Programmers Shouldn't Touch the Source

Posted: Oct 31, 2005 12:13 AM

I'm rather sorry I didn't get to this conversation earlier, but here goes. Coming at this from a new programmer's mindset, and trying to learn a language (C, Java), I'd rather be able to pick up an editor, and just view the file in plain text, without having to strip off the XML envelope first. If we wrapped up the source in an XML-serialised form (revision trees notwithstanding), then I'd need an XML-capable IDE to utilise the text. That IDE would also need to be able to import plaintext source and reformat it into well-formed XML. That would also take extra time.

On my system, an IDE takes up WAY more "grunt" of the system than a simple editor. Granted, I use Eclipse, and for what I use it for (project management) an editor isn't the most ideal tool to use, but it does okay, I just prefer doing management in Eclipse. But for editing, I think Eclipse sucks (subjective opinion here) because I'm too used to how quickly I can use vim. As you can probably tell, I'm not comfortable with programming yet, so the ... tools... are a bit of a mystery to me too.

For people who have resource-constrained machines, an editor is a far better bet for editing plaintext source than an XML-reading IDE. If they could bolt an XML parser onto either vim or emacs, then this MIGHT (partially) solve the problem of deserialisation and representation, but serialisation isn't an editor's job. Not only that, but the XML parser would take up extra memory.

Am I off-topic here?
