Summary
When people think of code reuse they usually think of function libraries, object hierarchies or cut-and-paste. A very powerful and too frequently overlooked method of code reuse is the reuse of whole programs.
I have always been in awe of the productivity of Unix wizards. Given a simple bash shell, they can write in one line a complex program that would take me possibly hundreds of lines of C++.
The secret to that productivity is code reuse. Piping the output of one program into another is a deceptively simple way to write a new program. Perhaps it is because of this simplicity that it has been overlooked by languages like Java, C++, C#, and Delphi.
I know that the Perl and Ruby programmers reading this right now are probably smirking. They know something that programmers like myself, who grew up on DOS and Windows, have trouble realizing. Why reinvent the wheel when you can just invoke it from the shell? Of course, Windows and DOS never had a wheel!
Now here is an interesting problem: there is no simple and portable way to write a program in C++ and then reuse it within another program without resorting to the OS. Of course, in Windows we can write something like:
ShellExecute("some_program.exe > some_file.txt");
This is fine if we are always going to be in Windows, except that it isn't integrated with the language. It is very hard to run a program and redirect its output into a stringstream without generating a file. What I think is really lacking in C++ is the ability to write directly:
SomeProgram() > SomeStream();
The ability to write such code would make C++ behave much more like a higher-level language (or agile language, if you accept such a thing can exist).
Since C++ is so darn powerful, I have written a library which facilitates reuse of programs by allowing them to be written as objects. I have also provided them with an operator so they can be redirected to and from a stream, or piped to other programs. For instance:
A program like UpperCaseProgram is written as follows:
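// upper_case.hpp
#include <cctype>
#include <iostream>
#include "programs.hpp"

class UpperCaseProgram : public Program {
protected:
    // Copies standard input to standard output in upper case.
    virtual void Main() {
        char c;
        while (std::cin.get(c))
            std::cout.put(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
    }
};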
So what I am proposing is both a library and a technique to make program reuse in C++ much easier by writing programs as objects. The source code and a more detailed explanation of the technique are available at CodeProject.com.
What's wrong with just creating your program as a library with the desired functionality, and then wrapping that with a trivial main() method?
I regularly use this technique when programming in C++, Java, and PHP, and yes, there have been times when I've embedded one whole program within another. Designing programs as libraries forces me to think through the API I want to provide and makes me carefully define the interface to the outside world.
To take a really simplistic example, I'd write your HelloWorld > uppercase example as:
No framework required, no fancy classes, no need to do anything other than build your programs in a modular way anyway. Of course, you might want classes or frameworks, depending on how complex the problem is, but there's no need for any additional complexity.
The one thing a UNIX-like pipe solution grants is a common data type. Each program knows it will be able to deal with the output it is fed, because everything's a string (or stream, actually). But that can be a liability too: if you provide an abstraction that everybody can use, it'll be the lowest common abstraction they can all use. You're stuck parsing a lot of data because you have to encode it as a string.
Compare operating on a std::vector of file handles to parsing the output of 'ls' and then opening each individual file. Which would you rather do?
Interestingly, the Lisp world has had something similar for years, except they standardized on lists as the dominant data type. They get plenty of flak for that too: one of the main criticisms of Lisp has been that programmers tend to view everything as a list and don't use other abstract data types.
> What's wrong with just creating your program as a library with the desired functionality, and then wrapping that with a trivial main() method?
That is almost precisely what I am proposing. The only difference is that by placing the main in an object derived from Program you can then easily redirect the standard input and standard output using an overloaded operator, thus making it easier to write new programs by chaining together existing programs.
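To make the idea concrete, here is a minimal sketch of what such a base class might look like. It is not the library's actual interface: the Run() method, the Pipe() helper, HelloWorldProgram and the exact form of operator> are all my own assumptions, chosen only to illustrate how swapping stream buffers around a virtual Main() lets one program feed another.

```cpp
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical base class (an assumption, not the published library).
class Program {
public:
    virtual ~Program() {}

    // Runs Main() with std::cin and std::cout temporarily pointed at in/out.
    // Not exception-safe: a throwing Main() would leave the streams redirected.
    void Run(std::istream& in, std::ostream& out) {
        std::streambuf* old_in = std::cin.rdbuf(in.rdbuf());
        std::streambuf* old_out = std::cout.rdbuf(out.rdbuf());
        Main();
        std::cout.flush();
        std::cin.rdbuf(old_in);   // rdbuf() also clears any EOF state on cin
        std::cout.rdbuf(old_out);
    }

protected:
    virtual void Main() = 0;
};

class HelloWorldProgram : public Program {
protected:
    void Main() override { std::cout << "hello world\n"; }
};

class UpperCaseProgram : public Program {
protected:
    void Main() override {
        char c;
        while (std::cin.get(c))
            std::cout.put(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
    }
};

// prog > out: run prog with its standard output redirected to out.
void operator>(Program& prog, std::ostream& out) { prog.Run(std::cin, out); }

// Hypothetical helper: feed the output of first into second, return the result.
std::string Pipe(Program& first, Program& second) {
    std::istringstream no_input;
    std::stringstream middle;   // plays the role of the pipe
    std::ostringstream result;
    first.Run(no_input, middle);
    second.Run(middle, result);
    return result.str();
}
```

With this in place, Pipe(hello, upper) on a HelloWorldProgram and an UpperCaseProgram yields the upper-cased greeting, and `hello > some_stream` captures the output of a single program. Unlike a shell pipe, the first program runs to completion before the second starts.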
> To take a really simplistic example, I'd write your HelloWorld > uppercase example as: [snip]
What you do here is build a string rather than write to cout. Rewriting programs as functions which return strings works for this trivial example, but doesn't scale well. You would also have to modify each function so that it accepts a string as a parameter (to represent the input of the program). Because every stage's complete output must be held in memory before the next stage can start, this approach is wasteful of memory and inefficient.
Another function-style approach which I think makes more sense would be to write your main as follows:
void ProgramMain(std::istream& in, std::ostream& out) {
    // redirect cin and cout
    std::streambuf* old_in_buf = std::cin.rdbuf();
    std::streambuf* old_out_buf = std::cout.rdbuf();
    std::cin.rdbuf(in.rdbuf());
    std::cout.rdbuf(out.rdbuf());

    // do your thing here.

    // restore the original buffers
    std::cin.rdbuf(old_in_buf);
    std::cout.rdbuf(old_out_buf);
}
The details could be hidden by using a utility class with RAII semantics, so that the following could be written:
void ProgramMain(std::istream& in, std::ostream& out) {
    RedirectManager rm(in, out);

    // do your thing here.
}
The details of RedirectManager should be obvious.
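For readers who would like them spelled out anyway, here is one possible sketch. Only the name RedirectManager and its constructor arguments come from the code above; the body, and the EchoMain and RunEcho functions used to exercise it, are assumptions of mine.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical RAII guard: redirects std::cin and std::cout on construction
// and restores the original buffers on destruction, even if an exception
// propagates out of the enclosing scope.
class RedirectManager {
public:
    RedirectManager(std::istream& in, std::ostream& out)
        : old_in_(std::cin.rdbuf(in.rdbuf())),
          old_out_(std::cout.rdbuf(out.rdbuf())) {}

    ~RedirectManager() {
        std::cout.flush();
        std::cin.rdbuf(old_in_);
        std::cout.rdbuf(old_out_);
    }

    RedirectManager(const RedirectManager&) = delete;
    RedirectManager& operator=(const RedirectManager&) = delete;

private:
    std::streambuf* old_in_;
    std::streambuf* old_out_;
};

// Example program body written against std::cin and std::cout.
void EchoMain(std::istream& in, std::ostream& out) {
    RedirectManager rm(in, out);
    std::string line;
    while (std::getline(std::cin, line))
        std::cout << line << '\n';
}

// Illustrative helper: runs EchoMain on a string and returns what it printed.
std::string RunEcho(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    EchoMain(in, out);
    return out.str();
}
```

The guard is made non-copyable so that the original buffers are restored exactly once.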
This approach is fine, and is more easily understood from a procedural standpoint. Chaining programs together, however, is now more work than with the approach I was proposing. We would have to write:
So one approach uses objects, but provides a simpler syntax through operator overloading, whereas the alternative is simpler to understand, but has a slightly less elegant syntax.
Your comments sparked an idea. I have found a way to pipe from one arbitrary function to another. The only requirement is that the functions return void and take no parameters.
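I have not reproduced the actual technique here, but the general shape can be sketched under some assumptions. In the following, the Stage typedef, RunPipeline and the two example functions are hypothetical names of mine; each stage is a plain void function that reads std::cin and writes std::cout, and the output of one stage is buffered in a string and handed to the next.

```cpp
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// A stage is any function that returns void and takes no parameters.
typedef void (*Stage)();

// Runs the stages in order, feeding each one the previous stage's output.
// Stages run one after another, not concurrently as in a shell pipeline.
std::string RunPipeline(const std::vector<Stage>& stages, const std::string& input) {
    std::streambuf* old_in = std::cin.rdbuf();
    std::streambuf* old_out = std::cout.rdbuf();
    std::string data = input;
    for (Stage stage : stages) {
        std::istringstream in(data);
        std::ostringstream out;
        std::cin.rdbuf(in.rdbuf());    // also clears EOF left by the last stage
        std::cout.rdbuf(out.rdbuf());
        stage();
        std::cout.flush();
        data = out.str();
        // Restore before in/out are destroyed at the end of this iteration.
        std::cin.rdbuf(old_in);
        std::cout.rdbuf(old_out);
    }
    return data;
}

void HelloWorld() { std::cout << "hello world\n"; }

void UpperCase() {
    char c;
    while (std::cin.get(c))
        std::cout.put(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
}
```

A pipeline of HelloWorld followed by UpperCase then behaves like the HelloWorld > uppercase example discussed earlier, at the cost of holding each intermediate result in memory.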
You are one productive guy! Quite interesting to follow your line of thought.
The idea is OK, but I am not sure that I see the great win with this, as it requires changes to the code in order to work: you have to modify a working program anyway. On the other hand, making your program class wrap ShellExecute and the equivalent on other platforms seems a good idea to me. That way, you would actually be able to use existing programs (no modifications necessary) in your program, modifying a stream. In fact, a program-like class that wraps ShellExecute and supplies ordinary C++ stream operators (>> and <<) is starting to seem like something that could really be useful to me.
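As a rough illustration of that direction, here is a sketch that captures the output of an existing, unmodified program into a string. It assumes a POSIX system, where popen and pclose are available (Microsoft's C runtime offers _popen and _pclose as counterparts); RunCommand is a hypothetical name, and this is a one-way capture only, not the stream-operator class described above.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs a shell command and returns everything it
// writes to standard output. Assumes POSIX popen/pclose.
std::string RunCommand(const std::string& command) {
    std::FILE* pipe = popen(command.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("popen failed for: " + command);

    std::string output;
    char buffer[256];
    std::size_t n;
    // Read raw bytes until the child closes its end of the pipe.
    while ((n = std::fread(buffer, 1, sizeof buffer, pipe)) > 0)
        output.append(buffer, n);

    pclose(pipe);  // exit status ignored in this sketch
    return output;
}
```

The returned string can then be wrapped in a std::istringstream and fed to any code that expects an input stream.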
This is all very interesting. But there is a completely higher layer of thought that comes after you look at this. First, threading in Java together with PipedInputStream and PipedOutputStream will let you tie together code that needs an InputStream and an OutputStream. But the bigger issue for me is that this whole everything-is-a-string business really complicates the use of some of the data by continually inserting formatting and translating steps all over the place.
For me, the correct programming parallel to pipes is Jini (http://www.jini.org) services. With Jini services, clients and client-services just need to ask for something that implements an interface, and then they can use that 'service' to do what it does. It's the API that spells out whether you get back a stream of data, or a Collection of data, or something else.
With Jini services, you get complete program/service reuse, because now tools are available from wherever they are. So you don't have to have a tool on your CPU to use it either. It's really time to get past all the everything-is-a-string business. The shell tools are great, and I use them all the time to do some pretty neat stuff. But in the end, for the real future of things, I think that we need to move up a layer and use things that are already written, like you are suggesting, but without having to compile them into our software, or do other silly, intermediate integration tasks that don't really add value to the process. And having 100 programs linked into your program is not going to improve the load time of your application so that the user can use it quickly for the one task that just requires one of those 100 programs.
> You are one productive guy! Quite interesting to follow your line of thought.
Thank you, and also thanks for the link. I will (hopefully) be producing a Win32-specific library in the future, and I will try to incorporate that code.
I tend to agree with the other folks who said that ultimately, the string is the lowest common denominator, and that a lot of effort would go towards parsing such strings every time you traversed a pipe.
However, this could be improved by outputting XML instead of a string. This would give more of an indication of how to structure the data. In addition, the class designer could have a static method that returns the XML schema of the output, so that a user of the class could have some guidance as to how to parse it.
Another output format could just as easily be CSV or SQL, and the static method would then have some sort of a schema description of the output.
Just for your information: there is a C++ library that allows you to use any program from C++ in a way similar to what you described here. You may use any program; you don't need to write the program in some special way. The library works on Linux and Windows. Please look at http://libexecstream.sourceforge.net/
Thank you Artem for sharing your code. I am interested in possibly using your library combined with my latest effort to pipe between functions (see http://www.codeproject.com/useritems/function-pipes.asp ) and including it in the OOTL.