The Artima Developer Community

Agile Buzz Forum
A handy heuristic for auditing source code

Laurent Bossavit

Posts: 397
Nickname: morendil
Registered: Aug, 2003

Laurent Bossavit's obsession is project effectiveness through clear and intentional conversations
Posted: Oct 30, 2006 11:24 AM

This post originated from an RSS feed registered with Agile Buzz by Laurent Bossavit.
Original Post: A handy heuristic for auditing source code
Feed Title: Incipient(thoughts)
Feed URL: http://bossavit.com/thoughts/index.rdf
Feed Description: You're in a maze of twisty little decisions, all alike. You're in a maze of twisty little decisions, all different.

Clients often ask me to take a look at some project's code base. "What do you think of the design? Is this well-factored?"

The last time this happened, I spent a few minutes looking at the code and came up with several observations. "There's a huge amount of generated code here, you might want to watch the complexity cost of that... This code here looks quite complex and has little separation of concerns... Here is an entire class missing unit tests... This class here has long stretches of duplicated code..."

Then my client had an interesting remark. "I don't know how you do this - spot so many things in such a short time." I realized I didn't know how I did it, until he asked - I just did it.

My technique, for what it's worth, consists of getting a file list of all source code files, then sorting that list by (decreasing) file size. I start with the largest source file, and see if there's a reason it's the largest.
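The listing step can be sketched in a few lines of Python. This is only an illustration of the idea, not a tool from the original post: the function name `files_by_size` and the set of extensions in `SOURCE_EXTENSIONS` are assumptions, to be adjusted to the project being audited.

```python
from pathlib import Path

# Assumed set of source extensions; adjust for the code base under audit.
SOURCE_EXTENSIONS = {".java", ".cs", ".cpp", ".h", ".py"}


def files_by_size(root, extensions=SOURCE_EXTENSIONS):
    """Return (size_in_bytes, path) pairs for source files under root, largest first."""
    entries = []
    for path in Path(root).rglob("*"):
        # Keep regular files whose extension marks them as source code.
        if path.is_file() and path.suffix.lower() in extensions:
            entries.append((path.stat().st_size, path))
    # Sort by decreasing size; break ties by path so the order is stable.
    entries.sort(key=lambda entry: (-entry[0], str(entry[1])))
    return entries
```

Printing the first twenty or so entries of `files_by_size(".")` is usually enough to decide where to start reading.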

  • Generated code
Very often, generated code is excessively verbose, so if a project uses code generation at all, those files tend to be at the top of the list. There are a number of anti-patterns related to code generation; I'll make a note to look into that later. (Do people have to hand-edit generated code? Are the generated code files checked into version control?) I then delete these files and generate the file list again. (Yes, delete - I'm only working on a local copy and everything is in version control anyway. If that's not the case, don't bother with the audit - getting the team to start using version control takes priority.)

  • Core class
Another reason for a class to float to the top is that it "does everything". "Yes, it's a big class - but it's the object that's at the center of all our design, the entire app revolves around it." I'll check for a large number of defects associated with that class. Not all code is equally defect-prone; a small number of source files often accounts for a disproportionate fraction of all defects. Classes (or source modules in general) tend to have fewer defects when they do one and only one thing: querying the database OR crunching numbers OR formatting reports. A class that does everything will do everything badly.

Languages like Java or C# have a useful convention of "one file, one class" which makes it easier to spot overlarge classes by looking at file sizes. In some C++ codebases things can get more complicated - but then, that's generally useful information too.

  • Smelly code
Even a class with reasonably focused responsibility can bloat up fast if you put your mind to it - that's what Copy and Paste are for. Accordingly, the largest source files are a good place to look for design smells: duplicated code, long methods, switch statements, etc. When the team is supposed to use TDD, I also check the unit tests for that class or module. TDD generally results in smaller methods (because code is written one test at a time, and tests must exercise methods in isolation) and limited duplication (because we're supposed to refactor each time we've made a test pass). So, either the team has not been applying TDD... or something else is happening that I should know about.

  • Other observations
It's also interesting to sum the file sizes (you can compare, for instance, the amount of test code and application code; an XP project is supposed to show roughly a 1:1 ratio), figure out the average, or look at the smallest files.
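As a rough sketch of these summary figures, the Python below totals file sizes, compares test code to application code, and reports the average and the smallest files. It rests on assumptions that are not in the original post: the name `size_summary`, the extension set, and above all the hypothetical convention in `is_test_file` that test code lives under a directory named `test` or `tests`. Sizes are in bytes, which is only a proxy for "amount of code".

```python
from pathlib import Path
from statistics import mean

# Assumed set of source extensions; adjust for the code base under audit.
SOURCE_EXTENSIONS = {".java", ".cs", ".cpp", ".h", ".py"}


def is_test_file(relative_path):
    """Hypothetical convention: test code sits under a 'test' or 'tests' directory."""
    directories = relative_path.parts[:-1]  # ignore the file name itself
    return any(part.lower() in {"test", "tests"} for part in directories)


def size_summary(root, extensions=SOURCE_EXTENSIONS, smallest_count=5):
    """Summarize source file sizes (in bytes) under root."""
    root = Path(root)
    sizes = {}
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            # Use paths relative to root so the root's own name is not
            # mistaken for a test directory.
            sizes[path.relative_to(root)] = path.stat().st_size
    test_total = sum(s for p, s in sizes.items() if is_test_file(p))
    app_total = sum(s for p, s in sizes.items() if not is_test_file(p))
    by_size = sorted(sizes.items(), key=lambda kv: (kv[1], str(kv[0])))
    return {
        "total": test_total + app_total,
        "test_total": test_total,
        "app_total": app_total,
        # None when there is no application code to compare against.
        "test_to_app_ratio": test_total / app_total if app_total else None,
        "average": mean(sizes.values()) if sizes else 0,
        "smallest": by_size[:smallest_count],
    }
```

A `test_to_app_ratio` far below 1 on a project that claims to practice XP is a prompt for a conversation with the team, not a verdict.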

It's not that I know exactly what I'm doing or what I'm looking for when I do this sort of thing. I know that there's some interesting research concerning, for instance, power law distributions in source code. But as far as I'm aware there are no clear models of why this or that size distribution should arise, thus little practical guidance if you find code that does not obey the expected distribution. (I'd appreciate any pointers from readers in the direction of relevant research.)

The point is more that, looking at the "macroscopic" properties of source code, I will reach results faster than by poring over details. These results will orient my next steps, if I want to drill down to the details; or, quite often, they will point to immediate steps the team can take to improve its design - which is the kind of advice I get paid for.

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use