The predecessor to the Large Hadron Collider - LEP, the Large Electron-Positron Collider - produced a lot of data for its day: about 100 TB. In 1989, that was a lot to archive, and it now sits on magnetic tapes. You would think that preserving it would be fairly trivial - after all, 1 TB drives now cost $75. But storing the raw data isn't really the problem - it's all the "meta" around the data that is:
More difficult to preserve is the software necessary to make sense of the data. "Clearly, data is useless without the associated software to read and analyse it," say Holzner and co.
The problem is that computer skills are changing. While much of the original LEP software was written in Fortran, the emphasis today is on C++. How the right kind of Fortran expertise can be preserved for future generations isn't clear.
Another problem is that much of the high-level software used to analyse the data - user-specific analysis code and plotting macros - was never stored in a central database. Instead, it was kept in personal directories which are deleted a year after somebody leaves a lab. That is now lost.
So while future researchers will be able to access the raw data, they may never know exactly how it was processed into the form that appears in scientific publications.
Given the human-nature aspect of those problems, I expect to read something very similar about the piles of LHC data 30 years from now. The personal-directory issue is especially likely to be an ongoing problem...
Technorati Tags:
data retention, storage