I think the biggest problem with software metrics is that we don't have any.
Consider "coverage" for example. What does "coverage" actually measure? We know how to compute coverage (for simplicity, let's count the percentage of statements tested), but that's just a count. What's the meaning behind this count?
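To make the point concrete, here is a minimal sketch of what "computing coverage" amounts to: instrument each statement with a flag, run the tests, and report the fraction of flags that were set. (Real coverage tools such as coverage.py do this by tracing execution rather than hand-planted flags, but the arithmetic at the end is the same -- a count divided by a count.)

```python
# One flag per executable statement of absolute(); a statement is "covered"
# when its flag has been flipped by at least one test run.
hit = [False, False, False]

def absolute(x):
    hit[0] = True
    if x < 0:
        hit[1] = True
        return -x
    hit[2] = True
    return x

absolute(5)  # a one-test "test suite"

coverage = sum(hit) / len(hit)
print(f"{coverage:.0%} of statements executed")  # prints "67% of statements executed"
```

The number comes out; the question the essay asks is what, if anything, that number means.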
In most fields, measurement starts from an attribute (aka a construct), something we want to measure. For example, we might want to measure productivity, quality, intelligence, aptitude, maintainability, scalability, thoroughness of testing, reliability--these are attributes.
Given an attribute, we use a measuring instrument of some sort to map the "value" of the attribute to a number. The instrument is easy to identify in some cases--think of rulers and voltmeters. Some instruments are more complex--for example, intelligence tests. Some instruments require multiple readings under different circumstances--for example, we might try to measure how loud a sound is, to you, by having you compare it to dozens of other sounds, indicating for each comparison which sound was louder. (If you wear glasses, you've gone through this type of measurement of subjective visual clarity.)
The reading from the instrument is the value that software engineers call "the metric." (There are varying uses of the word "metric"--see Wikipedia: http://en.wikipedia.org/wiki/Metric)
In most fields that use measurement, the fundamental question is whether the instrument you are using actually measures the construct (or attribute) that you think you are measuring. That concern is called "construct validity."
If you search the ACM's Guide to the Computing Literature (which indexes ACM, IEEE, and many other computing publications), only 490 of the 1,095,884 references searched include the phrase "construct validity." There are 48,721 references that refer to "metrics" (only 490 of them mention the "construct validity" of these "metrics"). A few years ago, I read most of the available ACM-Guide-listed papers that mentioned "construct validity" (Cem Kaner & Walter P. Bond, "Software engineering metrics: What do they measure and how do we know?" 10th International Software Metrics Symposium (Metrics 2004), Chicago, IL, September 14-16, 2004, http://www.kaner.com/pdfs/metrics2004.pdf). Of those, most were discussions of social science issues (business, economics, psychology) rather than what we would normally think of as software metrics.
The problem with taking "measurements" when you don't have a clear idea of what attribute you are trying to measure is that you are likely to come up with very precise measurements of something other than the attribute you have sort-of in mind. Consider an example. Suppose we wanted to measure aptitude for algebra. We sample the population and discover a strong correlation between height and the ability to answer algebra questions in a written test. People who measure between 5" and 30" tall (who are, coincidentally, very young and don't yet know how to read) are particularly algebra-challenged. What are we really measuring?
When people tell me that you can measure the complexity of a program by counting how many IF statements it has (McCabe's metric), I wonder whether they have a clue about the meaning of complexity.
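Taking the IF-count description at face value (McCabe's actual cyclomatic complexity also counts loops and other branch points, which would separate these two a little -- but the same mismatch shows up there too), here is a toy illustration: both functions below contain exactly one IF statement, yet one is obvious at a glance and the other's behavior -- whether its loop even terminates for every input -- is the unsolved Collatz conjecture.

```python
def clamp_nonnegative(x):
    # one IF; trivially easy to understand
    if x < 0:
        return 0
    return x

def collatz_steps(n):
    # also one IF; whether this loop halts for every positive n is an
    # open problem in mathematics -- hardly the same "complexity"
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps

print(clamp_nonnegative(-7))  # 0
print(collatz_steps(6))       # 8
```

Whatever attribute "complexity" names, a branch count that scores these two functions the same is measuring something else.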
When people tell me you can measure how thoroughly a program has been tested by computing the percentage of statements tested, I wonder if they have a clue about testing. See "Software negligence and testing coverage." (Keynote address) Software Testing, Analysis & Review Conference, Orlando, FL, p. 313, May 16, 1996. http://www.kaner.com/pdfs/negligence_and_testing_coverage.pdf
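A small example of the gap, using a hypothetical function for illustration: the single test below executes 100% of the statements in average(), so a statement-coverage metric would call this testing "complete" -- yet the crash on an empty list is never exercised.

```python
def average(values):
    total = sum(values)
    return total / len(values)  # divides by zero when values is empty

# This one test executes every statement: 100% statement coverage...
assert average([2, 4, 6]) == 4

# ...but the untested input still crashes.
try:
    average([])
except ZeroDivisionError:
    print("100% statement coverage, and the bug is still there")
```

Statement coverage counts which lines ran; it says nothing about the inputs, boundaries, and data states that were never tried.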
There is a lot of propaganda about measurement, starting with the fairy tale that "you can't manage what you don't measure." (Of course we can. We do it all the time.)
Much of this propaganda is moralistic or derisive in tone. People get bullied by it and conform by using measurement systems that they don't understand (in many cases, that perhaps no one understands). The result is predictable. You can blame individual managers. I blame the theorists and consultants who push unvalidated "metrics" on the field. People trust us. When we put defective tools in the hands of executives and managers, it's like putting a loaded gun in the hands of a three-year-old and later saying, "guns don't kill people, people kill people." By all means, blame the victim.
Capers Jones wrote in one of his books (maybe many of his books) that 95% of software companies don't use software metrics. Most of the times I hear this quoted, the writer or speaker goes on to deride the laziness and immaturity of our field. Mature, sensible people would be in the 5%, not the great unwashed 95% that won't keep their records.
My experience is a little different. I'm a professor now, but I did a lot of consulting in Sili Valley. I went to company after company that didn't have software measurement systems. But when I talked to their managers and executives, they told me that they had tried a software measurement system, in this company or a previous one. Many of these folks had been involved in MANY software measurement systems. But they had abandoned them. Not because the systems were too hard or too time-consuming -- but because, time after time, they did more harm than good. It's one thing to pay a lot of money for something that gives valuable information. It's another thing to pay for golden bullets if all you're going to use them for is shooting holes in your own foot.
It takes years of work to develop valid measurement systems. We are impatient. In our impatience, we too often fund people (some of them charlatans) who push unvalidated tools instead of investing in longer term research that might provide much more useful answers in the future.