On the occasion of a new release of the Relo code explorer this week, Artima spoke with Sinha about the problems developers face when charting their paths through large code bases, and how Relo helps developers quickly comprehend large amounts of code.
The fundamental problem Relo helps with is understanding large projects. By large, I mean any code that’s more than a single method long. In fact, we’ve been evaluating this tool on code bases of more than 100,000 lines.
Surveys indicate that developers spend on average half of their time understanding code. That shows to some extent that there is a problem with tools that try to help us understand code faster...
There’s been a lot of work on helping people understand code at the method level, or understand complex algorithms. But a harder problem most developers face today is comprehending how different methods interact with each other and, more generally, keeping track of the many interconnected elements in a project. Most tools currently have a hard time scaling in that respect, and they break down completely on code bases of over 30,000 lines.
It's important to point out that although developers spend half their time understanding code, understanding is typically secondary to what developers need to do. Fixing a bug or adding a feature is what really matters.
As a result, a typical developer will not try to understand the entire code base, because that would take too long. He will look only at the parts of the code base that matter for the task assigned to him, such as a bug fix. A manager will point him to a class to look at, and the developer needs to understand things starting from there.
The traditional exploration path many developers take is to follow methods and method references using an IDE’s built-in facilities. Our studies showed that when people explore code that way, they can remember only a limited amount of information. Once you go beyond two or three hops from the starting point, you start forgetting things. Some people at that point take notes on paper to aid their memory...
Relo provides what I would call reverse-engineering-based exploration: As you browse code inside your IDE [Editor's note: Currently only Eclipse is supported], Relo pays attention in the background to whatever code you look at and keeps track of your path.
At any time, you can open a Relo session based on your history, and Relo will create a diagram based on that history. That supplements your short-term memory about the code.
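The interview doesn't say how Relo records this history internally. The following is only a minimal sketch of the general idea. It assumes that an IDE listener reports every code element the developer opens as a plain string identifier. The class name `ExplorationHistory`, the method `record_visit`, and the Graphviz DOT output are hypothetical stand-ins, not Relo's API. Relo draws its own diagrams inside Eclipse.

```python
from __future__ import annotations

from typing import Optional


class ExplorationHistory:
    """Hypothetical recorder of the code elements a developer looks at."""

    def __init__(self) -> None:
        self.visited: list[str] = []  # elements in first-visit order, no duplicates
        self.steps: list[tuple[str, str]] = []  # distinct (from, to) navigation steps
        self._current: Optional[str] = None  # element currently shown in the editor

    def record_visit(self, element: str) -> None:
        """Assumed to be called by an IDE listener each time a new element is shown."""
        if element not in self.visited:
            self.visited.append(element)
        if self._current is not None and self._current != element:
            step = (self._current, element)
            if step not in self.steps:
                self.steps.append(step)
        self._current = element

    def to_dot(self) -> str:
        """Render only the visited elements and the steps between them as Graphviz DOT."""
        lines = ["digraph exploration {"]
        lines += [f'  "{element}";' for element in self.visited]
        lines += [f'  "{src}" -> "{dst}";' for src, dst in self.steps]
        lines.append("}")
        return "\n".join(lines)


if __name__ == "__main__":
    history = ExplorationHistory()
    for element in ["Parser.parse()", "Lexer.nextToken()", "Token", "Parser.parse()"]:
        history.record_visit(element)
    print(history.to_dot())
```

The diagram contains only what the developer actually looked at. This mirrors the point that the history supplements short-term memory rather than mapping the whole code base.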
In addition, instead of showing all the details, the diagram shows only the aspects of the code relevant to your exploration. That is one way Relo differs from most UML tools. UML tools are great at helping you understand and create new designs, but not as good at helping you understand code that already exists. They don't help you focus on the parts of an existing code base relevant to the tasks you need to accomplish.
To better understand code, you need to look at how multiple relationships interact. Eclipse’s package explorer, for instance, is great, but it shows items based only on the containment relationship. Eclipse can also generate views based on method calls. But each of those tools, or views, focuses on just one particular relationship. As soon as you want to look at multiple relationships, you have to venture further along that exploration trail, and you start forgetting what you were looking at. That's where Relo's strength comes in: it can show multiple relationships at once, focused on the parts of the code you need to know about.
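To make the contrast concrete, here is a hypothetical sketch, not a description of how Relo or Eclipse is implemented. It uses a small store of typed relationships between code elements and queries it in two ways. `single_view` returns one relationship kind across everything, roughly what a containment tree or call view offers. `focused_view` returns every relationship kind, but only among a chosen set of elements. The names `CodeModel`, `CONTAINS`, `CALLS`, `INHERITS`, and the sample elements are all assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical relationship kinds; a real tool would extract these from source code.
CONTAINS, CALLS, INHERITS = "contains", "calls", "inherits"


class CodeModel:
    """Hypothetical store of typed relationships between code elements."""

    def __init__(self) -> None:
        self.edges: set[tuple[str, str, str]] = set()  # (source, kind, target)

    def add(self, source: str, kind: str, target: str) -> None:
        self.edges.add((source, kind, target))

    def single_view(self, kind: str) -> list[tuple[str, str]]:
        """One relationship kind across the whole model, like a tree or call view."""
        return sorted((s, t) for s, k, t in self.edges if k == kind)

    def focused_view(self, focus: set[str]) -> list[tuple[str, str, str]]:
        """Every relationship kind, restricted to elements the developer cares about."""
        return sorted(
            (s, k, t) for s, k, t in self.edges if s in focus and t in focus
        )


if __name__ == "__main__":
    model = CodeModel()
    model.add("app.parse", CONTAINS, "Parser")
    model.add("app.parse", CONTAINS, "Lexer")
    model.add("Parser", CALLS, "Lexer")
    model.add("Parser", INHERITS, "BaseParser")
    model.add("Lexer", CALLS, "Logger")
    # Containment and call edges among the three focus elements appear together;
    # edges leading to BaseParser and Logger are left out.
    for edge in model.focused_view({"app.parse", "Parser", "Lexer"}):
        print(edge)
```

In this sketch, the focus set could come from an exploration history like the one Sinha describes. A single-relationship view, by contrast, forces the developer to switch views to see how containment and calls relate.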
What methods have you found effective in helping you quickly understand large code bases?