I received a message asking about how to decide whether to port to a new language first, redesign first, or do both at the same time.
We have a large project that is coded in language A and has existed for many years. This project now requires a redesign and at the same time we would like to recode it in language B. From your experience, would it be better to port it from language A to B first, then work on the redesign, or code it in language B with the redesign from the beginning? The advantage with the first way is that we could possibly offshore the porting, but I don't know if the complexity of redesigning after the port would offset any gains in cost that offshoring would give us.
This is not a simple question because most of the answer is not about the accepted right thing to do, but more about "who are you?" The answer will be different from one team to the next.
You'll have to reexamine all the code when you translate to a new language, and make changes to work within the new language idioms, which constitutes a redesign at least at a low level. This argues for doing it all at once.
But without knowing more about the project, taking baby steps is often a wiser choice. Most Agile processes can basically be summed up by saying "take a baby step and see what happens" (then, of course, automate that process so you don't have to repeat all the steps by hand every time).
Some questions that must be answered before you can make your decision:
What development process do you use now?
Do you have unit testing, automated building and do you use version control?
What condition is the project in? Is the design tolerable enough that you can "just change languages" or will you be forced to do some redesigns as you go?
Is the existing design good enough that a third possibility, "Change Language + Refactor," might be considered?
How well do your people adapt to change? (This may be something that you can't evaluate if you're too close).
What percentage of the people already have experience with the new language, and how much experience?
What experience does your designer or design team have in (A) the new language, (B) this system and (C) this kind of transition?
What constraints are there for this project: money, schedule, political climate, etc.?
It's primarily a people decision, not a technical one, even if technology has an influence.
The fact that you're considering offshoring suggests that you're assuming something like "the existing application will work as a spec," which may be true to a degree. What will likely happen, though, is that the offshore coders will do the simplest translation possible (using automatic translation tools if they are available), preserving all the idioms from the old language along with any coding quirks. These are easier and faster to translate than to analyze; the offshore programmer is probably going to be very risk-averse and will not chance making changes to meaning.
So there's some benefit in using offshore programmers, but now you will have a codebase that's filled with idioms and quirks and any programmer who looks at it will be uncertain what those are about, and leave them in.
This also suggests the middle path, "Change Language + Refactor":
Instead of paying an offshore company to use translation tools, do you have the infrastructure to just "sprint" it yourself?
If you don't already have the unit tests in place, you will need them, and again the offshore programmers may not have the same incentives as you do so you can end up with subpar tests. You might be better off just using coverage-generation tools yourself.
Unless you know and have worked with the offshore programmers before, you won't know what you're getting and you might spend more time fixing the result than if you had done it yourself.
Note that approximately 50% of offshore projects are being canceled before completion (from a business magazine, I think it was Fast Company). The amount of effort required onshore is usually greatly underestimated. You're basically just hiring more programmers, but you may not be vetting them yourself, they're from a different culture and are on the other side of the world. I've worked with both Indians and Czechs in offshoring; both groups I worked with are very smart but both also required significant communication. The company that worked with Indian offshoring brought several of the key Indians to the US for months at a time, and I spent time in the Czech Republic. I'm not suggesting that offshoring isn't a viable alternative, but that it can be a minefield that involves far more in time and effort than you imagine. Most of the successful offshore programs I've ever heard about involve more travel in both directions than you might expect.
So my answer to the whole question is that it's really about understanding who you are as a company and a team, in order to discover the alternatives that might work best for you. There are far too many variables in this equation to be able to produce a single answer.
I suspect this topic will bring a lot of feedback from other readers, since many have had this experience.
Time and expertise will play a big role as well. One reason you might want to port is to move an acquired product into the core language of the company. An external company (on or offshore) might be better suited to the job than creating or hiring the expertise yourself. Also, depending on how quickly you want it ported, the redesign-at-the-same-time route might be too slow. Having seen some auto-translated code, however, I would generally suggest using the opportunity of the language port to rewrite using the idioms of the new language and to remove the issues you hated in the original version.
I think the "offshoring" bit is a red herring, because similar issues apply whether or not you offshore: the team you would get for language B would have few people from the original team. Another tendency is for the redesign team to "try to get it right this time," which may mean lots of rework and lots of relearning of lessons. Taking baby steps doesn't always apply either, because that leads to a sort of coexistence model (especially for web applications) where you port some functionality to the new platform and, while the porting isn't complete, you have to live with the additional complexity of maintaining and managing two different platforms. I don't see how unit tests help that much, or are you planning on porting the tests too? :) Functional tests, though (again web app specific), would be worth a lot. My 2 cents. Deepak
Personally, I would take the "middle path" (Change Language + Refactor). Reasons: 1) Offshoring almost always results in a communication barrier between the two parties. Most probably, the offshore developers would end up auto-translating the code, and you might end up with a glob of substandard code. 2) By choosing to Change Language (without a big redesign) you will take small baby steps that are less risky. 3) By choosing to refactor, you will get the chance to get rid of any language A specific quirks. 4) While porting, you may come across parts of the code that are not designed too well, but also not too expensive to redesign. Since you chose to refactor, you can iron out such small-scale warts without taking too much risk.
We recently did a rewrite of an application, not between different languages, but between different frameworks for the same language. There were a number of reasons for it, ranging from scalability to ease of training new developers. The app needed both a revamped interface and new internals.
We opted to do both at the same time, if for no other reason than the fact we would have to do the work twice otherwise. The approach was successful, and we will be deploying sometime next week (it's on a private network, sorry). With a fresh start we were able to fix the internal design issues that have been bugging us for a long time.
While I try to do the port and then the redesign approach as much as possible, there comes a time where a straight port would be much more troublesome than the redesigned approach.
If we are dealing with developers who are highly skilled in language B and the task is porting over a very large application, I would try to find areas in the application that are highly decoupled from other areas of the application. Then port these areas to language B one at a time, while modifying the rest of the code that is built in language A to use the new code. As long as the original code has a reasonably good design, this approach should be possible and likely to provide a smooth transition.
If you are not sure what areas of the code are highly decoupled, you may want to perform a cluster analysis using some reasonable metrics that can be used to determine degrees of coupling. These metrics can be extracted either from static analysis of the source code or by building a special version of your application that can collect this data while the code is running. The second method is only likely to be possible if your original application is written in a dynamic language (e.g. Python).
One crude metric can be collected by creating a list of modules in your application and, for each module, depending on language, listing the imports or includes. With this information you could create a square matrix with the module names across both the row and column headers, and place a 1 wherever a module name in the row intersects with an import or include listed in the column headers. You should end up with a highly sparse table. If not, don’t even attempt to port the application, as it is so poorly designed you are better off just starting over, and making sure the team that wrote the original application is not involved with the rewrite.
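A minimal sketch of that import matrix, assuming the original application is written in Python and its sources are available as strings. The function names, the `density` helper and the three-module project are hypothetical, made up for illustration. The matrix is kept in sparse form, as a set of (row, column) pairs that would hold a 1.

```python
import ast

def imported_modules(source):
    """Return the top-level names of modules imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names

def dependency_pairs(sources):
    """Sparse matrix: (importer, imported) pairs, restricted to project modules."""
    pairs = set()
    for module, source in sources.items():
        for target in imported_modules(source):
            if target in sources and target != module:
                pairs.add((module, target))
    return pairs

def density(pairs, module_count):
    """Fraction of off-diagonal cells that hold a 1 (a rough sparseness check)."""
    cells = module_count * (module_count - 1)
    return len(pairs) / cells if cells else 0.0

# Hypothetical three-module project; os and json are ignored as external.
sources = {
    "gui": "import model\nimport os\n",
    "model": "from storage import load\n",
    "storage": "import json\n",
}
pairs = dependency_pairs(sources)
print(sorted(pairs))
print(density(pairs, len(sources)))
```

What counts as "highly sparse" is a judgment call; the density figure only gives a number to compare between projects.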
Now to do the cluster analysis what you want to do is rearrange the order of the module names in such a way as to get all your “1”s as close to the diagonal as possible. The column and row names have to be listed at all times in the same order. So if the order of the modules across the columns is mod1, mod5, mod2 then they should be listed in the same order in the row headings.
To find the best ordering of modules, I calculate a weight for each arrangement of the matrix and adjust the order of modules until I find the order that produces the smallest weight. To calculate the weight, I take each nonzero cell's distance from the diagonal, multiply it by the cell's value (1 in this case), and add up the results. Since the matrices are highly sparse, I don’t actually iterate over every cell but just iterate over a dictionary or association list of the sparse data points.
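As a rough sketch of that weight calculation, assuming the sparse data points are stored in a dictionary keyed by (row module, column module), with the cell value as the dictionary value. The name `matrix_weight` and the sample modules are hypothetical.

```python
def matrix_weight(order, entries):
    """Sum of (cell value * distance from the diagonal) over the nonzero cells.

    order   -- module names in the order used for both rows and columns
    entries -- dict mapping (row_module, column_module) to the cell value
    """
    position = {name: index for index, name in enumerate(order)}
    return sum(
        value * abs(position[row] - position[column])
        for (row, column), value in entries.items()
    )

# Hypothetical data: mod1 and mod2 use each other, mod5 uses mod6.
entries = {("mod1", "mod2"): 1, ("mod2", "mod1"): 1, ("mod5", "mod6"): 1}

interleaved = matrix_weight(["mod1", "mod5", "mod2", "mod6"], entries)
grouped = matrix_weight(["mod1", "mod2", "mod5", "mod6"], entries)
print(interleaved, grouped)  # prints "6 3"
```

Only the dictionary is traversed, so the cost of one weight calculation grows with the number of nonzero cells rather than with the square of the module count.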
Instead of just guessing the order of modules, I begin with a list of modules in a random order. I iterate through the module names, and for each one I calculate the weight of each version of the matrix as I shift that module name to the right. The order of modules that produces the lowest weight is then used as the starting point for moving the next module name. Keep repeating this procedure until you have cycled through all the module names. The end result will be the matrix with the lowest weight found. What you should see are clusters of “1” along the diagonal. The modules that form clusters are those that are highly coupled together and should be highly decoupled (assuming you don’t have many outliers) from other clusters. Use these clusters to determine areas of the code that should be easy to independently port to the new language.
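One way to read that procedure, sketched in Python. This is an interpretation, not necessarily the exact code I run: each module in turn is tried at every position in the list, and the lowest-weight arrangement is kept before moving on to the next module. The names `matrix_weight` and `reorder` and the sample data are hypothetical, and a single greedy pass like this is not guaranteed to find the best possible ordering.

```python
def matrix_weight(order, entries):
    """Sum of (cell value * distance from the diagonal) over the nonzero cells."""
    position = {name: index for index, name in enumerate(order)}
    return sum(
        value * abs(position[row] - position[column])
        for (row, column), value in entries.items()
    )

def reorder(start, entries):
    """Greedy pass: move each module to the position that gives the lowest weight."""
    order = list(start)
    for name in list(start):
        rest = [module for module in order if module != name]
        # Every arrangement with `name` inserted at each possible position,
        # which includes leaving it where it currently is.
        candidates = [rest[:i] + [name] + rest[i:] for i in range(len(rest) + 1)]
        order = min(candidates, key=lambda c: matrix_weight(c, entries))
    return order

# Hypothetical data: mod1 and mod2 use each other, mod5 uses mod6.
entries = {("mod1", "mod2"): 1, ("mod2", "mod1"): 1, ("mod5", "mod6"): 1}
start = ["mod1", "mod5", "mod2", "mod6"]  # in practice, a shuffled list

best = reorder(start, entries)
print(matrix_weight(start, entries), matrix_weight(best, entries))  # prints "6 3"
```

Because the current arrangement is always among the candidates, the weight never increases from one step to the next. Repeating the pass from several random starting orders is a cheap way to check whether a lower weight exists.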
Now personally I don’t like to use the import or include metric as described above, as its weighting is always 1. I would rather calculate a weighting that represents the degree to which the included or imported module is used by a given module. Counting the number of function or method calls, objects created, etc. would provide a better weighting system. This is more work but will produce better clusters than just assigning 1 to the weights. Even better would be to collect these metrics across classes instead of modules.
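A crude sketch of such a weighting, again assuming Python sources. It counts only calls written as `module.name(...)` on modules brought in with a plain `import x` or `import x as y`, so it misses `from x import y` usage, dotted imports and object creation through other paths. The function name `usage_weights` and the sample source are hypothetical.

```python
import ast
from collections import Counter

def usage_weights(source):
    """Count calls of the form `module.something(...)` per imported module."""
    tree = ast.parse(source)
    # Local names bound by plain imports: `import x` or `import x as y`.
    imported = {
        alias.asname or alias.name
        for node in ast.walk(tree)
        if isinstance(node, ast.Import)
        for alias in node.names
    }
    counts = Counter()
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id in imported
        ):
            counts[node.func.value.id] += 1
    return counts

# Hypothetical module that leans on storage more heavily than on log.
source = (
    "import storage\n"
    "import log\n"
    "storage.load('a')\n"
    "storage.save('a', 1)\n"
    "log.info('done')\n"
)
weights = usage_weights(source)
print(weights["storage"], weights["log"])  # prints "2 1"
```

These counts would replace the 1s in the matrix, so a heavily used module pulls its users closer to it than one that is imported but barely called.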
BTW, any outliers that exist in the matrix will likely be due to one of two things: either areas of code that are poorly designed, or cross-cutting aspects like security, GUI, persistent storage, etc. So if there are a few outliers, it just means you have areas in the application that are coupled too tightly and should be cleaned up during the refactoring phase. If the resulting matrix is not clustered highly along the diagonal, it means that the developers of the application have no clue how to properly architect an application, so they are not separating the various aspects. You would, for example, see this result if the development team placed all of the application code with the GUI code.
Hopefully this makes some sense and will be helpful to someone.
I don't think the offshoring makes any difference. If you had a different team sitting in the next room, the amount of travel would probably be the same, if not higher, since travel would be less expensive.
I also think that the huge amount of travel is in most cases due to a lack of proper understanding and use of communication technologies. (And I think I know what I'm talking about, since I work as an offshore developer.)
I'd always go for the redesign then recode approach. A programmer always knows best how to do a program after he has done it, so there will always be design quirks in any program. This is a good opportunity to do a redesign, which is otherwise usually hard to get financed by management.
IMO, what's the critical issue here is the analysis. Many, if not most, apps nowadays get coded without proper analysis (I can notice this at many customers we are working with, and IMO this is the number one cause for failed/dropped projects). The existing version of the app is a sound basis for a thorough analysis, based on which a proper design can be built. If a simple translation is done, the opportunity of a proper analysis followed by a sound redesign is lost.
With a proper analysis, which should be many times easier to do given the existing application, a rewrite based on a redesign may be cheaper or even significantly cheaper than a one to one translation.
> One crude metric can be collected by creating a list of
> modules in your application and, for each module, depending
> on language, listing the imports or includes. With this
> information you could create a square matrix with the
> module names across both the row and column headers, and
> place a 1 wherever a module name in the row
> intersects with an import or include listed in the column
> headers. You should end up with a highly sparse table.
> If not, don’t even attempt to port the application, as it
> is so poorly designed you are better off just starting
> over, and making sure the team that wrote the original
> application is not involved with the rewrite.
John, this technique looks familiar :-). (I've watched John do this kind of analysis in person).
I work for a Russian company, and legacy code migration and porting are part of our core expertise. Right now, we have two such projects in the works and they are very different. In project 1, we are using our proprietary automated translation tool and our customer says the quality of code has improved. In project 2, we opted for semi-automated translation and huge refactoring.
The reasons for the difference are the source and target languages and library use. Language S1 maps quite nicely to language T1, and the use of the library in the first project is minimal (it is an embedded system mostly talking to hardware). By contrast, a straightforward conversion of S2 to T2 would yield an unmaintainable result.
Also, if you go the automated translation route, beware of latent bugs which, due to certain circumstances, do not trigger crashes or incorrect behavior in the original program but would cause them in the translated program. Uninitialized variables are a good example. I would say such bugs are the #1 reason for unexpected costs and delays in projects that involve automated translation.
That said, I would suggest first fixing the original program, and possibly modifying it to make the life of the translation tool easier and improve the quality of the translated source.
If you would like a free consultation on the subject, our Web site URL is in my forum profile.
IMO, the main reason for questions like this is a lack of analysis, a lack of knowledge of the situation, or a lack of experience in one of the intended fields...
The bad thing here is that there is no "universal" correct answer that someone from the outside can give that will apply to such a situation to any degree. I'll agree with Bruce that this is entirely a decision of the writer based on his/her situation.
So, what should be gained here? The gain is experience in the field, and possibly in similar situations. But the main danger is using this experience "as-is," because it applies to the situation and people it was gained by and not to the target in question (the top level may be similar, but all the small details and nuances of a particular team and project will render the whole situation entirely different!).
The main idea is to assess the current state and the expected transition: write out all the targets, priorities, criteria and constraints, both present and expected to arise, and see what they suggest. In my experience, the answer will be obvious most of the time.
P.S. To make this a bit less clear: again, experience shows that doing something new from time to time is quite good! (If one survives it, that is.) :)