The Artima Developer Community

Java Community News
Krugle Adds CodeSpaces to Its Search Engine

6 replies on 1 page. Most recent reply: Nov 22, 2006 5:21 PM by Guido van Rossum

Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

Krugle Adds CodeSpaces to Its Search Engine Posted: Oct 30, 2006 1:10 PM
Summary
Krugle is a search engine for source code that currently indexes about 20 million code files from over 500 public source code repositories. Artima spoke with Krugle co-founder Ken Krugler about how code search can make a developer more effective, and about the latest features of his company's search engine.

Krugle released CodeSpaces, a new addition to its online source code search engine that allows users to share search results with other developers.

Krugle's search engine currently indexes about 20 million code files, or about 2 billion lines of source code from 500 repositories, according to company co-founder Ken Krugler. Artima spoke with Krugler about CodeSpaces, and the role of search in the development process:

For many developers, the most powerful development tool is search. Yet, if you ask developers [what they consider their most powerful development tool], some would say Eclipse, Visual Studio, GCC, or vi. None of them would mention search. People talk about programming methodologies, extreme programming, or test-driven development, but I haven't seen anything about the role search plays in being a good software developer...

In many ways, search ... is more a means to an end than most other developer tools [are], such as an editor. People think of [search] as something you can do to get down to the business of writing code. They don't think of it as part of the software development process.

Krugler believes that search is one of the primary ways developers can find information from multiple locations not only about the code they're working on, but also about code that has already solved a problem they are about to work on. This sort of information can make a developer much more effective, according to Krugler:

In the old days, you would get the CD, and that had all you needed to become effective [in developing software]. Today, the information you need is very widely distributed. As there are more open-source projects, less of the information [related to a project] is on a single Web site.

There's just a lot more stuff available today [that you can] reuse. If you're a Java developer, [for instance], the number of publicly available JARs out there that have acceptable licenses is large, and keeps growing. The value of search, in the sense of the likelihood of finding something that you can use, keeps going up.

How do you effectively search across different types of information to quickly find something that's useful? And, equally important, how do you decide that you've searched enough, and found that the thing [you were looking for] doesn't exist, and therefore [it] makes sense to write it yourself? Making that decision too soon can be very costly.

Krugler noted that many leading developers still associate some stigma with looking at someone else's source code as a blueprint for their own work:

When we first started out, we thought that most of our users would come from the top ten percent of developers: the alpha geeks, those who read blogs and are well connected, the people who would be at conferences.

What we found [is that] in a significant percentage of those people, there is this feeling like... here is a landscape painter, and they would never think of taking someone else's partially painted picture and painting [something] in the background. They're creating a work of art. The more there is a sense of art or passion, the more resistance there is to reuse.

That isn't to say that none of the top programmers are into reuse. There are some great programmers who accomplish a lot because they are open to the idea of reuse. We just found that it's a more prevalent attitude that if you're reusing code, then you're like a script kiddie.

When you look at the people who are most excited about what they are able to do with Krugle, a lot of them are, for instance, biologists... who are using code... and still think of themselves as programmers, but not necessarily as professional programmers.

While developers increasingly use general-purpose search engines, such as Google, to look up examples or usages of code, Krugler pointed out the importance of exploratory search, and the availability of metadata, in making code search more effective:

Exploratory search is browsing with a purpose. People think of browsing as wandering around the Web, but exploratory search is where you're on a task, but are not sure how to find the thing that you're looking for. That's where I think we currently deliver the most value.

People who are just now getting into code search start with a more semantic type of search: I'm looking for something that does X. That's one of the key reasons we went down a path that involves exploratory search, which drives our user interface.

Suppose that you're looking for an MD5 implementation. If you just search for source code, you will find that many projects use an MD5 implementation. But how do you find one that you can use in your own project?

Even if you find a project that has [the code you can use], how do you assess the project, how do you determine that that's something you want to use? And you want to look at the code, to get a sense of the quality of the code, as well as look at the activity on the code. Then you'd also want to look at the license. What other projects use this same code?

We have project information [associated with source code search results] that tells you about a project... We have about 50,000 project descriptions. Some of that is published on Apache or SourceForge, some of it is what people post on Freshmeat... For projects that have some [structured] project metadata available, we use that. For [other] projects, we have a [human editor] who goes out and collects that data from the project's home page. We pull that in, and then hook that data up with the source code...

Another thing that helps is understanding source code. We can parse out comments versus keywords, classes, or function calls, and that's [useful] in weighing the values of query terms and in ranking the results.
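The idea of weighing a query term by where it appears in a file can be illustrated with a minimal sketch. The article gives no details of Krugle's parser or weights, so everything here is an assumption: the `FIELD_WEIGHTS` values, the `split_fields` and `score` helpers, and the regex-based handling of Java/C-style comments are all hypothetical.

```python
import re

# Hypothetical weights: a hit in code identifiers counts more than a hit
# in a comment. The article does not say what weights Krugle uses.
FIELD_WEIGHTS = {"code": 2.0, "comment": 1.0}

# Matches // line comments and /* block comments */ in C-like languages.
# Simplification: a "//" inside a string literal is also treated as a comment.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_fields(source):
    """Separate comment text from the remaining code text."""
    comments = " ".join(COMMENT_RE.findall(source))
    code = COMMENT_RE.sub(" ", source)
    return {"code": code, "comment": comments}


def score(source, term):
    """Sum weighted hits: a word counts if it contains the term, ignoring case."""
    term = term.lower()
    total = 0.0
    for field, text in split_fields(source).items():
        hits = sum(1 for word in WORD_RE.findall(text) if term in word.lower())
        total += FIELD_WEIGHTS[field] * hits
    return total


if __name__ == "__main__":
    sample = "// md5 helper\nclass Md5Digest { int md5Round; }"
    print(score(sample, "md5"))
```

A real code search engine would use a proper lexer per language and distinguish more fields (class names, function calls, keywords); the sketch only shows how field-dependent weights change a term's contribution.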

It has taken us over a year of tweaking ... the scoring. The static scoring comes primarily from information about the project the code comes from. There are a bunch of metrics about a project. We have information about project downloads, project references on the web, active developers, and things like that, which lets us give each project a boost... and that influences the scores of all the files that belong to the project...
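A project-level boost of the kind Krugler describes might look roughly like the following sketch. The formula is an assumption, not Krugle's: the `Project` fields mirror the metrics named in the interview, but the logarithmic damping, the `0.1` factor, and the `project_boost` and `rank` helpers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    downloads: int
    web_references: int
    active_developers: int


def project_boost(project):
    """Static, query-independent multiplier (hypothetical formula).

    log1p dampens very large counts so one huge project does not
    drown out every other result.
    """
    signal = (
        math.log1p(project.downloads)
        + math.log1p(project.web_references)
        + math.log1p(project.active_developers)
    )
    return 1.0 + 0.1 * signal


def rank(matches, projects):
    """Rank files by match score times their project's boost.

    matches: list of (file_path, project_name, match_score) tuples.
    projects: dict mapping project name to Project.
    """
    scored = [
        (path, match_score * project_boost(projects[project_name]))
        for path, project_name, match_score in matches
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    projects = {
        "popular": Project("popular", 100_000, 500, 20),
        "obscure": Project("obscure", 10, 0, 1),
    }
    matches = [("a/Md5.java", "obscure", 1.0), ("b/Md5.java", "popular", 1.0)]
    for path, final_score in rank(matches, projects):
        print(path, round(final_score, 3))
```

Because the boost is computed once per project, every file in that project inherits it, which matches the description of a project score influencing all of its files.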

Krugler noted that exploratory search often results in multiple interesting search results, and that saving search results and sharing them with others provides benefits not available in traditional search engines. His company's latest feature, CodeSpaces, aims to facilitate just such sharing:

With CodeSpaces, as you're doing searches and are collecting sets of things, you can create links to those sets of things. You may find some source code, project information, whatever, and email that link to the collection of results to a news group, or include that in a blog. Anyone following that link will wind up in the Krugle search environment with the same set of things displayed. That becomes pretty critical in exploratory search.
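One simple way to make a collection of results shareable by link is to encode the query and the selected items in the URL itself. The sketch below shows only that general idea; the article does not describe how CodeSpaces links are built, and `BASE_URL`, the `q` and `item` parameter names, and the `make_link` and `read_link` helpers are all hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint; not a real Krugle URL.
BASE_URL = "https://search.example/codespace"


def make_link(query, result_ids):
    """Build a link that carries the query and every collected result id."""
    params = {"q": query, "item": list(result_ids)}
    # doseq=True emits one item=... pair per list element.
    return BASE_URL + "?" + urlencode(params, doseq=True)


def read_link(link):
    """Recover the query and result ids from a link made by make_link."""
    params = parse_qs(urlparse(link).query)
    return params["q"][0], params.get("item", [])


if __name__ == "__main__":
    link = make_link("md5 java", ["file:1", "proj:apache"])
    print(link)
    print(read_link(link))
```

Anyone who opens such a link can reconstruct the same set of items, which is the property the interview highlights. A production system would more likely store the collection on the server and share a short identifier instead.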

To what extent do you use search in your development work? Do you agree with Krugler that search is among the most important development tools?


Roland Pibinger

Posts: 93
Nickname: rp123
Registered: Jan, 2006

Wrong link Posted: Oct 30, 2006 1:56 PM
The link to the Krugle home page is wrong.

Bill Venners

Posts: 2284
Nickname: bv
Registered: Jan, 2002

Re: Wrong link Posted: Oct 30, 2006 2:15 PM
> The link to the Krugle home page is wrong.
>
Oops. Thanks. One too many http's. It works now.

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Krugle Adds CodeSpaces to Its Search Engine Posted: Oct 31, 2006 2:16 AM
The 'not invented here' syndrome would be reduced if programming languages were more boring...i.e. if programming languages did not allow code to be like a painting but rather like a blueprint: paintings are all different and they have the personal touch of their creator, whereas blueprints are all the same and are created by specifications...

Roland Pibinger

Posts: 93
Nickname: rp123
Registered: Jan, 2006

Re: Krugle Adds CodeSpaces to Its Search Engine Posted: Oct 31, 2006 5:40 AM
The language shall not be 'boring' but lightweight, consistent and powerful (in that order). What you call 'blueprints' are reusable functions, classes, libraries, frameworks, ... Reduce the available time and thereby automatically enhance code reuse: http://www.artima.com/weblogs/viewpost.jsp?thread=181464

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Krugle Adds CodeSpaces to Its Search Engine Posted: Oct 31, 2006 7:51 AM
> The language shall not be 'boring' but...

I disagree. A language shall be as boring as possible. Only then can reuse be truly realized. Being boring means that every possible problem has already been solved and the only interesting part is how to combine the existing stuff to solve a particular problem.

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Re: Krugle Adds CodeSpaces to Its Search Engine Posted: Nov 22, 2006 5:21 PM
How does Krugle's code search index compare to the code search on code.google.com?


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use