I think coming from first principles is one good way to solve problems (it worked for Einstein). So, having a clear idea of what you are trying to measure is important.
As to the ACM search, I am curious how the synonym searches for construct validity worked out. It could very well be that people are describing the same concept with different terms; it happens all the time. How did the other 48,000 papers check out? I am certain you did good research, and that you just abbreviated this description to make your point. It would be interesting to hear more about how you measured the presence or absence of 'construct validity' in the actual approaches taken in all these papers.
Another interesting thing you said concerned coverage: you ask what it means. If you want a clearer answer about what it measures, I recommend an interesting survey paper by Hong Zhu, Software Test Adequacy Criteria (it's in the ACM DL), that examined most of the work on test adequacy criteria up to that point. It seems quite appropriate, given that coverage is one adequacy criterion that could be measured. There are many other criteria, such as def-use paths, state coverage, and so on.
It seems that the whole point with metrics is to put them into context, understand the narrow story they tell about the system being measured and then make intelligent decisions. To throw complexity or coverage out completely seems to insist that since we have no perfect answers we should give up and go home.
Your statement about complexity was a further curiosity. I think you made a slight equivocation. When someone tells you about the complexity as measured by the decision points, I hope it is understood by both of you that you are using jargon. "Complexity" in this instance only references McCabe's work. And hopefully, you both realize that within that context it is a measure (or a metric) for an aspect of the system that seems to be somewhat correlated with defect density (check McCabe's '96 NIST report, where he points to a couple of projects that saw a correlation). Based on that context, a complexity score is possibly a useful thing to know and to use for improving the software.
Even if it isn't correlated in perfect lock-step all the time, anyone who has written any substantial software knows the anguish of maintaining really ugly large methods/functions. McCabe is trying to measure something we know is there. Is his measure complete? No. Is it sufficient? No. Is it useful? Yes. It seems to be supported in the studies too. If you have references for field studies that contradict the 96 report, please post them.
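For anyone unsure what "measured by the decision points" means concretely, here is a rough sketch in Python. It is only an approximation of McCabe's measure: the function name, the sample snippets, and the choice of which constructs count as a decision are my own assumptions, and real tools differ on those details.

```python
import ast

# Node types treated as decision points in this sketch. This list is an
# assumption; real tools differ in exactly which constructs they count.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as (number of decision points) + 1."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" has three operands and adds two extra branches.
            decisions += len(node.values) - 1
    return decisions + 1


# Hypothetical sample code, used only to exercise the counter.
SIMPLE = """
def add(a, b):
    return a + b
"""

BRANCHY = """
def classify(n):
    if n < 0 and n % 2 == 0:
        return "negative even"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
"""

print(cyclomatic_complexity(SIMPLE))   # straight-line code: 1
print(cyclomatic_complexity(BRANCHY))  # if + and + for + if, plus one: 5
```

The number says nothing about whether the code is correct or readable. It only counts independent paths, which is exactly the narrow story I mean when I say a metric has to be read in context.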
Later you say, "When we try to manage anything on the basis of measurements that have not been carefully validated, we are likely to create side effects of measurement ... There is a lot of propaganda about measurement, starting with the fairy tale that "you can't manage what you don't measure." (Of course we can. We do it all the time.)"
So, this seems to contradict itself. Admittedly, I haven't heard Tom DeMarco say the aphorism about managing and measuring personally, but I took it to carry an implied "good" attached to the word "managing". That is, he was saying that we cannot do a good job of managing without measuring. On the other hand, I agree that people manage (with no "good" attached) all the time without measuring. You might say that managing without measuring is a derivative form of managing on the basis of measurements that have not been carefully validated.
While we are clearing things up: I got the point you are trying to make, but what do you mean when you write, "'guns don't kill people, people kill people.' By all means, blame the victim"? Where in there is the victim blamed?
Traditionally, that argument means that the guns should not be outlawed, but that criminals should be jailed. It's an argument by gun owners to keep their guns legally. How is it blaming the victims?
On that point, you say "putting defective tools in the hands of managers". Let's not forget putting tools that require expertise in the hands of managers. That is as likely to blow up in somebody's face, and is arguably the current state of affairs.
As to your summary point, I think we agree. It takes a lot of thinking to do metrics right. Most people get them wrong. We should spend tons of money on research that validates metrics. (I am willing to co-write a grant to study crap4j, if anyone is game.)
What I disagree with is the perception that metrics are not useful, that we are managing just fine without them, and that, because some people misuse them (over and over again, no less), nobody should use them without exorbitant expenditures of time and money. It sounds a lot like trying to ignore the problem.
We must keep trying to improve our measures by studying them, by validating them, and by improving them based on that study. And without a doubt, it requires a coherent approach, and a clear understanding of what is being measured -- whether we call it construct validity or something else.