Weblogs Forum - The Problem of Overspecification

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Sponsored Link •

Weblogs Forum
The Problem of Overspecification

25 replies on 2 pages. Most recent reply: Jul 28, 2006 3:01 AM by Achilleas Margaritis

Welcome Guest
Sign In

Back to Topic List

Reply to this Topic

Search Forum

Threaded View


Previous Topic		Next Topic

Flat View: This topic has 25 replies on 2 pages [ 1 2 | » ]

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

The Problem of Overspecification (View in Weblogs)

Posted: Jul 19, 2006 9:12 AM

Summary
Two of the cardinal sins of programming are overspecification and underspecification. Today I will discuss the one most programmers are guilty of: overspecification.

Underspecifying the solution to a problem usually reveals pretty obvious bugs. The typical result is a either failure to compile, or the software throws an exception during some unchecked edge case. However there are no bells or lights that go off when you overspecify the solution to a problem. In fact it is commonplace.

Consider the following summation function in C++:

  size_t sum(const vector& values)
  {
     size_t sum = 0;
     for (size_t i=0; i < values.size(); ++i)
     {
        sum += values[i];
     }     
     return sum;
  }

So where is the problem? Well the problem is that you told the computer to iterate through the list and add each value to the result. That produces a correct answer but it is more specific than is needed. You are specifying the following superflous facts:

the order to add numbers
to execute the summation in sequence rather than concurrently

By overspecifying the solution to a problem you tie the hands of a compiler and prevent it from doing anything clever, unless it can unravel your intent from your instructions. Generally that is a losing battle.

The culprit here is the language, which prevents you from specifying the solution as succintly and as precisely as you want. In Cat I would present the solution as follows:

  def sum : (list<int>) -> (int)
  {
     [+] reduce
  }

Reduce is an atomic concurrent program which takes an associative function and applies it to a list, like a fold function does, but in any order the compiler or runtime deems appropriate.

Isaac Gouy

Posts: 527
Nickname: igouy
Registered: Jul, 2003

The problem of not using library functions?

Posted: Jul 19, 2006 12:49 PM

Consider the following summation function in C++:


   size_t sum(const vector<int>& values)
   {
      size_t sum = 0;
      for (size_t i=0; i < values.size(); ++i)
      {
         sum += values[i];
      }     
      return sum;
   }

I've never written a program in C++ or really looked at one, but I remembered some comment from Bjarne Stroustrup that C++ was a multi paradigm language
http://www.research.att.com/~bs/slashdot_interview.html

and I remembered there were a couple of different C++ programs summing integers on The Computer Language Shootout
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=sumcol&lang=gpp&id=3

which led me to Google "accumulate C++", and the notion that this might work

int sum = accumulate(values.begin(), values.end(), 0);

http://wwwasd.web.cern.ch/wwwasd/lhc++/RW/stdlibcr/acc_0611.htm

Mike Capp

Posts: 11
Nickname: mikecapp
Registered: Aug, 2005

Re: The problem of not using library functions?

Posted: Jul 20, 2006 9:26 AM

which led me to Google "accumulate C++", and the notion that this might work
int sum = accumulate(values.begin(), values.end(), 0);

Doesn't really help. The std::accumulate algorithm is defined (http://www.sgi.com/tech/stl/accumulate.html) as taking an InputIterator, which effectively specifies forward sequential access.

Mike Capp

Posts: 11
Nickname: mikecapp
Registered: Aug, 2005

Re: The Problem of Overspecification

Posted: Jul 20, 2006 10:04 AM

Chris, forgive me if this is a naive question, but doesn't 'list<int>' as the argument type still enforce unnecessary sequence? Given an array of (sufficiently large) known size and dual-core hardware, you could usefully chop the array in half, run parallel summations and sum the two results.

This seems less plausible if you're given a linked list rather than an array. Where the function to be applied is relatively trivial, like integer addition, you might as well have applied it anyway during the iteration required to find the midpoint.

Or are we understanding different things by "list"? I don't really have any background in FP.

Ivan Lazarte

Posts: 91
Nickname: ilazarte
Registered: Mar, 2006

Re: The Problem of Overspecification

Posted: Jul 20, 2006 10:22 AM

And you can't using a functor of some kind, because...?

Sean Conner

Posts: 19
Nickname: spc476
Registered: Aug, 2005

Re: The Problem of Overspecification

Posted: Jul 20, 2006 1:34 PM

The problem is summing an array of integers. The C++ solution Chris presented is a specific solution to that problem and is pretty much the solution to use on a monoprocessor system. But as you add more and more cores, the preferred solution (for a sufficiently sized array) is to split the problem among the separate processors to speed things up.

But first, how do you specify that? And secondly, why can't the language do this for you? Trying to write the code to distribute the summation of the array (or actually, anything for which reduce can be applied it) distracts from the actual problem at hand (and Chris, don't forget about map either!)

Max Lybbert

Posts: 314
Nickname: mlybbert
Registered: Apr, 2005

Re: The problem of not using library functions?

Posted: Jul 20, 2006 4:01 PM

/* The std::accumulate algorithm is defined as taking an InputIterator, which effectively specifies forward sequential access.
*/

That's easy enough to fix. Write a specialization that takes a RandomAccessIterator and that works in batches. Problem solved.

It may need to be labeled as an STL extension, but it's OK C++.

Mike Capp

Posts: 11
Nickname: mikecapp
Registered: Aug, 2005

Re: The problem of not using library functions?

Posted: Jul 20, 2006 4:19 PM

> Write a specialization [of std::accumulate] that
> takes a RandomAccessIterator and that works in batches.
> Problem solved.

Well, no. "The function object binary_op is not required to be either commutative or associative: the order of all of accumulate's operations is specified." If it works in batches, it isn't a conformant implementation of accumulate.

OK, so extend a bit further, call it something else and tighten the contract for binary_op. Fine. But at this point we're way outside the standard library; how is the *compiler* is going to make the batching optimization for you? (I'm not aware of any C++ compilers that optimize for the STL specifically, but they can do so in theory. That's not true of arbitrary user code.)

I'm not claiming that this problem isn't soluble, but it's not trivial.

Marcin Kowalczyk

Posts: 40
Nickname: qrczak
Registered: Oct, 2004

Re: The Problem of Overspecification

Posted: Jul 20, 2006 5:45 PM

Actually it doesn't produce the correct answer if the integer type used here overflows.

But more on topic is the question, does any Cat implementation really take advantage of the freedom to choose the order and parallelism? Or is it a theoretical possibility that isn't likely to be ever exercised?

Note that a sufficiently smart C++ compiler is allowed to perform this summation in parallel. The C++ language specifies only the observable effect, not the way the program is implemented. It isn't likely in practice I'm afraid: compilers aren't that smart today. Same for Cat.

The Scheme language deliberately leaves the order of evaluation of function arguments unspecified. This is a very common case of overspecification. As I understand it, in Cat the order of computing the arguments is explicit, and it's usually consistent with the order of values the function expects of the stack. You can choose another order and then permute the top of the stack; in this case the order may be different than anticipated by the function, but it's still explicit. Can Cat match Scheme in this aspect of avoiding overspecification?

Morgan Conrad

Posts: 307
Nickname: miata71
Registered: Mar, 2006

Re: The Problem of Overspecification

Posted: Jul 20, 2006 6:47 PM

Having just reread Java Concurrency in Practice, I'm beginning to thing that overspecification is GOOD. What if some other thread might be adding values to the end of the array? Then you want to go in order.

The "let the compiler figure out how to reorder and parallelize it" attitude sometimes leads to real strange and hard to understand behaviors, such as the "escaping" that takes up pages and pages in the book. The concept that

Foo foo = null;
foo = new Foo(maybe lots of arguments)

might assign a non-null value to foo before the constuctor is complete just blows my mind. It would be like

float pi = computePi(thisManyDigits)

assigning some value to pi halfway through the computation.

How would anybody expect this? I guess it's something we'll have to deal with, but it's as easy to understand as Quantum Mechanics.

Harrison Ainsworth

Posts: 57
Nickname: hxa7241
Registered: Apr, 2005

Re: The Problem of Overspecification

Posted: Jul 21, 2006 4:48 AM

It is not *always* an overspecification. In some circumstances it would be good. It is more of an overly fixed or monolithic specification --

-- you are defining an abstraction point that allows a part to be easily separated. That part can be produced from elsewhere, and at another time (up till execution). That other time and place may allow better implementation.

The important (or at least more general) factor seems to be that easy separation, rather than the detail of implemention.

Cleo Saulnier

Posts: 77
Nickname: vorlath
Registered: Dec, 2005

Re: The Problem of Overspecification

Posted: Jul 21, 2006 5:08 AM

All roads lead to Rome.

Michel Parisien

Posts: 25
Nickname: kriggs
Registered: Jul, 2005

Re: The Problem of Overspecification

Posted: Jul 21, 2006 7:04 AM

Marcin:
> Note that a sufficiently smart C++ compiler is allowed to perform this summation in parallel. The C++ language specifies only the observable effect, not the way the program is implemented. It isn't likely in practice I'm afraid: compilers aren't that smart today. Same for Cat.

There is a key difference here. C++ normally won't optimize a for loop, for example, even if the order of the sequence it travels doesn't matter, because it has a very tough time determining if the order is relevant. It is not specified by the user, so it is much safer to just let it be.

Here, with Cat, this is explicitly specified, so as to say "yes, travel me any which way you desire". The optimization can always be made, and if it doesn't work out, that is the developer's fault for saying something is what it isn't.

Morgan:
> Having just reread Java Concurrency in Practice, I'm beginning to thing that overspecification is GOOD. What if some other thread might be adding values to the end of the array? Then you want to go in order.

Well that is what design is all about. You're supposed to know whether you'll be appending before you write the code that specifies if there will be concurrency or not. In almost all cases, realizing you need to be doing something differently than you originally intended requires some code rewrite, and this would be no exception.

Harrison:

You made me think of this: C++ or Java can probably do reduce even better than Cat (perhaps a question of opinion?) by simply making a method similar as std::accumulate. Fine, this new implementation would not be STL, but that doesn't matter, since it is still easy and clear to do. In Cat, at least in this example, it seems like there needs to be a keyword ("reduce") in the language to do this operation, rather than give the user direct control over the control flow, which seems really limiting.

Isaac Gouy

Posts: 527
Nickname: igouy
Registered: Jul, 2003

Re: The Problem of Overspecification

Posted: Jul 21, 2006 7:31 AM

> You made me think of this: C++ or Java can probably do
> reduce even better than Cat (perhaps a question of
> opinion?) by simply making a method similar as
> std::accumulate. Fine, this new implementation would not
> be STL, but that doesn't matter, since it is still easy
> and clear to do. In Cat, at least in this example, it
> seems like there needs to be a keyword ("reduce") in the
> language to do this operation, rather than give the user
> direct control over the control flow, which seems really
> limiting.

This is where I came in - language vs library

Max Lybbert

Posts: 314
Nickname: mlybbert
Registered: Apr, 2005

Re: The Problem of Overspecification

Posted: Jul 21, 2006 8:53 AM

/* But at this point we're way outside the standard library; how is the *compiler* is going to make the batching optimization for you? (I'm not aware of any C++ compilers that optimize for the STL specifically, but they can do so in theory. That's not true of arbitrary user code.)
*/

and

/* The C++ language specifies only the observable effect, not the way the program is implemented. It isn't likely in practice I'm afraid: compilers aren't that smart today.
*/

GCC translates every language it works on into GIMPLE, runs various optimizations on that, translates the GIMPLE into RTL, performs various optimizations on that, and then translates the RTL into assembly code which gets handed off to the assembler.

I don't think GCC does it yet, but RTL (and Cat) have the ability to make the observation that the operator+() being called is (or is not, as the case may be) associative and may (or may not, as the case may be) perform the operation in batches.

But I do have to concede that my first solution (template specialization) would not work; and that an in-language (C++ language, that is) solution isn't a easy as I thought.

Flat View: This topic has 25 replies on 2 pages [ 1 2 | » ]

Previous Topic

Next Topic