The Artima Developer Community

Weblogs Forum
Metaprogramming Musings

11 replies on 1 page. Most recent reply: Oct 17, 2005 10:06 PM by Sean Conner

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Metaprogramming Musings
Posted: Oct 16, 2005 12:50 PM
Summary
I want Heron control structures to be defined as macros. It should be easy for a programmer to introduce a do/while loop or a better for statement. One of the principal motivations is that the programmer should be able to specialize these things for specific types, or meta-types.
Consider for instance a repeat (x) {...} construct within a library. There could be two versions: one which is essentially a rewritten while loop, and another for when x is a constant known at compile time. In that case it would make sense for the loop to be unrolled, provided x is sufficiently small. Often this is done by the compiler, but under some conditions a programmer knows better than an optimizer, and can run experiments to find the specific conditions where unrolling works better.
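To illustrate in C++ terms (a sketch only, and merely an analogy for the hypothetical repeat): template recursion can force unrolling when the count is a compile-time constant:

  // A sketch of forced unrolling via C++ template recursion:
  // repeat_n<N>::run(f) expands into N inlined calls to f.
  template<int N>
  struct repeat_n {
      template<typename F>
      static void run(F f) { f(); repeat_n<N - 1>::run(f); }
  };

  template<>
  struct repeat_n<0> {
      template<typename F>
      static void run(F) { }  // base case: nothing left to do
  };

  // repeat_n<4>::run(body); would unroll into four inlined calls to body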

Compile-time constants in languages have a special status. In C++, under some conditions, an expression involving compile-time constants is resolved at compile time. When and if this occurs is often implementation-defined, and is unreliable. What I want as a programmer is explicit control over when and how compile-time evaluations occur. Macros and template metaprogramming provide very clunky methods of computing compile-time expressions.
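For example, the usual C++ workaround is template recursion, which does guarantee compile-time evaluation but is painfully roundabout next to simply writing the expression (a sketch):

  // Compile-time factorial via template metaprogramming:
  // the compiler, not the running program, computes Factorial<5>::value.
  template<unsigned N>
  struct Factorial {
      static const unsigned value = N * Factorial<N - 1>::value;
  };

  template<>
  struct Factorial<0> {
      static const unsigned value = 1;
  };

  static const unsigned fact5 = Factorial<5>::value;  // 120, resolved at compile time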

What this implies is that in the statement int x = 3 + 5;, the values 3 and 5 do not really behave like variables of type int, but rather like something different. One big difference is that they occupy zero bytes. Ironically, if I took the size, e.g. sizeof(3), I would get a non-zero value, but in reality no storage is allocated. This is why I consider constant integer literals to be instances of a separate type, which I call a meta-int. In Heron I plan on identifying a meta-int as #int. A meta-int would support implicit conversion to its corresponding run-time type, as well as possibly explicit conversion through the $ operator.
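In C++ terms, the oddity looks like this (illustrative only):

  #include <cassert>

  int main() {
      assert(sizeof(3) == sizeof(int));  // sizeof reports the size of the type...
      int x = 3 + 5;  // ...yet 3 and 5 occupy no storage of their own;
                      // most compilers emit only the folded constant 8
      return x;
  }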

Anyway, on to the problem of mapping a Heron statement to a macro call. Consider the statement:

  if (some_bool_expression) then { do_something; } else { do_something_else; };
I want to map this directly to a macro call, which is defined in a library and can even be overloaded. So what does the macro declaration look like? I don't really know! I have several ideas, but what gets me is the "then" and "else" keyword placeholders.

The syntax which is currently my leading favourite is:

  class if { }
  class then { }
  class else { }

  macro (#type x0==if, ...) =
    ERROR "invalid if statement";

  macro (#type x0==if, bool condition, #type ignore0==then, #code on_true) =
    COND condition on_true;

  macro (#type x0==if, bool condition, #type ignore0==then, #code on_true,
         #type ignore1==else, #code on_false) =
    COND condition on_true true on_false;
So that might be a confusing mess, but there is a disturbed kind of logic underlying it all. A macro would have no identifying label; it would be identified solely by the types and meta-types of its parameters. You could think of a statement as an overloaded call to the ";" macro, where every term is an argument. A term would be an identifier, a symbol, a parenthesized list of terms, or a curly-braced list of statements. There would be a core set of predefined macros, such as COND, which corresponds to the Lisp operation of the same name, or ERROR, which produces a compile-time error.
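A loose run-time analogy in C++ (a sketch only; ordinary overload resolution stands in for the compile-time dispatch, and COND is not modeled): tag types play the role of the keywords, and the overload is selected purely by the parameter types:

  struct if_tag {};
  struct then_tag {};
  struct else_tag {};

  // The overload is chosen by parameter types alone, like the proposed ";" macro.
  template<typename OnTrue>
  void statement(if_tag, bool cond, then_tag, OnTrue on_true) {
      if (cond) on_true();
  }

  template<typename OnTrue, typename OnFalse>
  void statement(if_tag, bool cond, then_tag, OnTrue on_true,
                 else_tag, OnFalse on_false) {
      if (cond) on_true(); else on_false();
  }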

Now this is purely hypothetical, and it would be some time before I could implement all of this in the Heron prototypes. I am worrying about the syntax because it makes it easier to imagine using the language in a thought experiment, and because it can affect other design decisions (e.g. the significance of ";") as well as the implementation.

I don't want to continue working on implementations until I figure this mess out, and come up with a consistent and logical design which unifies macros and the type-system.


Calvin Mathew Spealman

Posts: 13
Nickname: ironfroggy
Registered: Aug, 2005

Re: Metaprogramming Musings Posted: Oct 16, 2005 2:30 PM
Would it not be possible to represent this macro logic in more direct syntax? Instead of considering macros a different mechanism, what if they were defined as normal functions, but ones which take only special parameters representing such things as statements and constants? As these things would all be known at compile time, the function call itself can be inlined and compiled away, resulting in the same thing with cleaner syntax.


class then {
    #code do;
    void then(#code do) {
        *.do = do;
    }
};

class else {
    #code do;
    void else(#code do) {
        *.do = do;
    }
};

class if {
    void if(#statement cond, then if_true, else if_false) {
        COND cond if_true.do true if_false.do;
    }
};

Sean Conner

Posts: 19
Nickname: spc476
Registered: Aug, 2005

Re: Metaprogramming Musings Posted: Oct 16, 2005 2:58 PM
So, will there be support for something like:


x = sin(3.1415926);


sin() is a well-defined function that for any given X will return a Y. In the code above, since 3.1415926 is a constant, sin() will return a constant, so the compiler should turn it into:


x = 0


In fact, any function that will always return the same Y for a given set of parameters X (any number of parameters, provided they're all constants) should be treated this way.
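A sketch of what such a guarantee looks like, using C++11's constexpr (which postdates this thread, but makes the requirement explicit; square is just a stand-in, since std::sin is not portably constexpr):

constexpr double square(double v) { return v * v; }
constexpr double s = square(3.1415926);  // forced to be computed at compile time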

I'm also not sure if I buy your argument about meta-ints. The value 3 does take up X bytes, but in a code sequence like:


x = 3;


those bytes are typically in the code segment, not a data segment, and can't be changed (or referenced other than in the instruction). So I don't see a real difference between sizeof(int) and sizeof(3).

And the more you describe what you want, the more it sounds like you want LISP, but with a different syntax and typed variables.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Metaprogramming Musings Posted: Oct 16, 2005 3:02 PM
> Would it not be possible to represent this macro logic in
> more direct syntax? Instead of considering macros a
> different mechanism, what if they were defined as normal
> functions, but which take only special parameters
> representing such things as statements and constants. As
> these things would all be known at compile time, the
> function calling itself can be inlined and compiled away,
> resulting in the same thing with cleaner syntax.

AFAICT this is a very good idea. I'll tweak the syntax to match Heron's (at least as it currently stands):


class then {
    public
    {
        typedef do : #code;  // do is a type, not a field.

        _init() {
            then(#code do) {
                this:do = do;  // unfortunately illegal: you can't assign a type.
            }
        }
    }
}

So, to make things confusing, meta-values are in fact types, not fields. This means they can't be assigned; allowing types to change on the fly would exponentially increase the complexity.

However, all is not lost: even though the types can't be assigned, they can be initialized; we just have to pass meta-values as template parameters.

So rewriting your idea as meta-functions (type transformations) I get:


type_alias then[code do_T] = do_T;

type_alias else[code do_T] = do_T;

type_alias if[code condition, type if_true, type if_false] =
    COND[condition, if_true, true, if_false];

That is surely confusing to many readers, so I'll try to make it as close to valid C++ as possible:


template<code do_T>
struct then {
    typedef do_T do;
};

template<code do_T>
struct else {
    typedef do_T do;
};

template<code condition, typename if_true, typename if_false>
struct if : COND<condition, if_true::do, true, if_false::do>
{ };

The problem with the C++ version is that you can't pass code blocks as type parameters. You also can't write parameterized typedefs, which is infuriating.
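(For what it's worth, a parameterized typedef is exactly what C++11's later alias templates provide -- a sketch:)

// A parameterized typedef via a C++11 alias template:
template<typename T>
using ptr_t = T*;

ptr_t<int> p = 0;  // equivalent to: int* p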

Unfortunately I have moved too far away from your original proposal, and I don't know how to bridge the gap.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Metaprogramming Musings Posted: Oct 16, 2005 3:20 PM
> So, will there be support for something like:
>
> x = sin(3.1415926);

Yes.

> those bytes are typically in the code segment, not a data
> segment, and can't be changed (or referenced other than in
> the instruction). So I don't see a real difference
> between sizeof(int) and
> sizeof(3).

We should first specify that we are talking about C++ here (other languages most definitely vary). You are correct that the value 3 may occupy the code segment. Most compilers will, however, store 8 in the code segment if we write x = 3 + 5;. Semantically it is still different. Consider:


template<int N>
void fubar() { }

// inside some function:
fubar<3>();   // fine: 3 is a compile-time constant
int x;
cin >> x;
fubar<x>();   // doesn't compile: x is a run-time value


In C++, semantically speaking, a variable of type int is not the same thing as a constant of type int. Calling them both ints is misleading.

Greg Jorgensen

Posts: 65
Nickname: gregjor
Registered: Feb, 2004

Re: Metaprogramming Musings Posted: Oct 17, 2005 2:28 AM
> Compile-time constants in languages, have a special
> status. In C++ under some conditions, an expression
> involving compile-time constants is resolved at
> compile-time. When and if this occurs is often
> implementation defined, and is unreliable.

I don't understand. Under what conditions would an expression involving compile-time constants not be resolved at compile time by a real compiler? And how is the behavior of any C++ compiler with regard to constants (compile time or otherwise) implementation-defined and unreliable? Do you mean that some compilers evaluate constant expressions at compile time and others leave evaluation to run time? What do you mean by unreliable?

> What this implies is that in the statement: int x = 3
> + 5;, the value 3 and 5 do not
> really behave like variables of type int, but rather
> something different.

They behave like integer constants. Since there is no stack or heap storage required for constants (they will be embedded in the compiled code, as another poster pointed out), they of course don't behave like variables.

I think of this differently. In C/C++ a variable is a name for a storage location that contains some value of a given type. A constant is a specific value with no name and no stack or heap storage location. A constant has a type specified by the programmer, implicitly or explicitly: 3 is an integer, 3L is a long, 3.0 is a double. A constant occupies the same amount of space and is represented the same way as one of the valid types: 3 is represented exactly as an int variable containing 3, and 300.0 is represented the same as a double containing 300.0. The compiler is free to simplify constant expressions, and it evaluates constants according to the same conversion and promotion rules as any other values.

Would you say that expressions don't have types because no storage is allocated (i.e. the value of an expression may exist only in registers)? If x and y are integer variables, what is the type of (x + y)? You can't take the size of a non-constant expression in C/C++, though the value does have a size when it is evaluated.

> One big difference is they occupy
> zero bytes. Ironically if I took the size, e.g.
> sizeof(3), I would get a non-zero value, but in
> reality, no storage is allocated. This is why I consider
> constant integer literals as instances of a separate type,
> which I call a meta-int.

If you take the size of a type, e.g. sizeof(int) you get a non-zero value, too, even though no storage is allocated -- not even in the code segment. Sizeof() is itself a constant expression evaluated at compile time (gcc extensions excepted). Sizeof() doesn't just answer the question "How many bytes does this occupy?" It answers the more general question "How many bytes does this type, or something with the same type as my operand, occupy?"

> Semantically it is still
> different. Consider:
>
> template<int N>
> void fubar() { };
> fubar<3>();
> int x;
> cin >> x;
> fubar<x>(); // doesn't compile
>
> In C++, semantically speaking, a variable of type int, is
> not the same thing as constant of type int. Calling them
> both ints, is misleading.

I don't think the C or C++ standards ever say that constants and variables are the same thing. It would be clearer to say that an integer constant and the value of an integer variable have the same type. I have never been confused by the distinction between constants and variables.

In your example it appears you are faulting the C++ compiler for its inability to compile fubar<x>(). You of course know that templates are compile-time constructs; attempting to specialize a template at run-time doesn't make any sense if we're still talking about C++. The example doesn't show me any fault with C++, nor does it help me to understand the point you are trying to make about the supposed semantic difference between constants and variables in C++.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Metaprogramming Musings Posted: Oct 17, 2005 9:15 AM
> I don't understand. Under what conditions would an
> expression involving compile-time constants not be
> resolved at compile time by a real compiler?

3.141 * 2;

> Do you mean that some compilers evaluate
> constant expressions at compile time and others leave
> evaluation to run time?

Yes.

> > What this implies is that in the statement: int x = 3
> > + 5;, the value 3 and 5 do not
> > really behave like variables of type int, but rather
> > something different.
>
> They behave like integer constants. Since there is no
> stack or heap storage required for constants (they will be
> embedded in the compiled code,

Not necessarily.

> as another poster pointed
> out), they of course don't behave like variables.
> I think of this differently.

That is valid.

> Would you say that expressions don't have types because no
> storage is allocated (i.e. the value of an expression may
> exist only in registers)?

Not that they don't have a type, but that they have a different type. The result of an expression is an rvalue (also known as a temporary).

x + y

has the type "rvalue int", which is different from "lvalue int". This indicates that it has different semantics:

x + y = 12; // illegal

This extra label doesn't exist in the lexicon of C++, but the standard makes it clear that lvalues and rvalues behave differently. Just like the following types are different:

int, const int, int&, int*, int const*, int const&

Even though labels don't exist to distinguish rvalues/lvalues [yet -- the standard plans on introducing new notation to fix that], it is logical to say that they are distinct as well. Furthermore, once we start saying that some values are compile-time values and some are run-time values, these become very different things.
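For instance, reference binding already exposes the distinction (a sketch):

int x = 1, y = 2;
int& lr = x;             // OK: x is an lvalue
// int& bad = x + y;     // error: a non-const reference cannot bind to an rvalue
const int& cr = x + y;   // OK: a const reference may bind to an rvalue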

> It would be clearer to say that an integer constant and the value of an integer variable have the same type.

That would be false, because one can be reassigned and the other can't. They have different semantics. Two values/variables of the same type must have the same semantics. I know that this is contrary to popular belief, but it is the only interpretation of type that makes sense from a rigorous standpoint.

Greg Jorgensen

Posts: 65
Nickname: gregjor
Registered: Feb, 2004

Re: Metaprogramming Musings Posted: Oct 17, 2005 12:12 PM
> > I don't understand. Under what conditions would an
> > expression involving compile-time constants not
> > be resolved at compile time by a real compiler?
>
> 3.141 * 2;

The compiler can see how a constant expression is used, so in a lot of cases it can go ahead and evaluate at compile time.

double d = 3.141 * 2;

That can be evaluated at compile time; I'd be surprised not to find 6.282 in the code segment.

The compiler is under no obligation that I know of to aggressively evaluate constant expressions; is that what you mean by implementation-dependent and unreliable? I interpreted unreliable to mean that a compiler might evaluate 3.141 * 2 as 5 -- that would be a problem.

> This extra label doesn't exist in the lexicon of C++, but
> the standard makes it clear that lvalues and rvalues
> behave differently. Just like the following types are
> different:
>
> int, const int, int&, int*, int const*, int const&
>
> Even though labels don't exist to distinguish
> rvalues/lvalues [yet -- the standard plans on introducing
> new notation to fix that] it is logical to say that they
> are distinct as well. Furthermore, once we start saying
> that some values are compile-time values, and some are
> run-time values. Well then these are very different
> things.

I agree that all of those things are distinct semantically. And I think the C++ standard explains the differences between rvalues and lvalues well enough. What it doesn't do is call them different types. You are extending the notion of type to encompass semantic differences beyond the concept of types. That's fine for talking about Heron but I got the impression from your original posting that you were talking about C++, at least the part about integer constants.

> > It would be clearer to say that an integer constant and
> > the value of an integer variable have the same type.
>
> That would be false, because one can be reassigned and
> another can't. They have different semantics. Two
> values/variable of the same type must have the same
> semantics. I know that this is contrary to popular belief,
> but it is the only interpretation of type which makes
> sense from a rigorous standpoint.

Hmmm, I think this is a slippery slope. It reminds me of one of Pascal's worst shortcomings: making the size of an array part of the type. Kernighan describes the problem in his paper "Why Pascal Is Not My Favorite Programming Language," available at http://www.lysator.liu.se/c/bwk-on-pascal.html. I think expanding the definition of "type" might make the language semantics even more complicated than C++.

An integer constant 3 can (and most likely does) have the same internal representation as the contents of an integer variable that contains 3, at least in C++. In that sense the two values are the same type. The compiler enforces different semantics but I was not categorizing those semantics as a type difference.

One of the shortcomings of C and C++ (and many other languages) is that types are built around low-level representations (int, char, float) rather than around domains of legal values. This problem plagues relational databases through SQL, too. Sometimes it makes sense to say that a variable can contain any integer representable with 32 bits or less, but more often what you really want is a variable that can contain a subset of values, and those values may or may not be the same machine-level type. Object oriented programming addresses this but only for user-defined types packaged as classes, and then the conversion and compatibility problems are pushed on to the programmer. Another approach is generics and duck typing, where the code doesn't have to care about types as much and instead works with what an object or variable can do or respond to.
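A common workaround is a wrapper class that enforces the domain itself (a sketch; bounded_int and month_t are made-up names):

#include <stdexcept>

// A domain-constrained integer: only values in [Low, High] are legal.
template<int Low, int High>
class bounded_int {
    int v;
public:
    bounded_int(int x) : v(x) {
        if (x < Low || x > High) throw std::out_of_range("bounded_int");
    }
    operator int() const { return v; }  // converts back to a plain int
};

typedef bounded_int<1, 12> month_t;  // a month number, checked at run time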

Max Lybbert

Posts: 314
Nickname: mlybbert
Registered: Apr, 2005

Re: Metaprogramming Musings Posted: Oct 17, 2005 2:12 PM
/* The compiler can see how a constant expression is used, so in a lot of cases it can go ahead and evaluate at compile time.

double d = 3.141 * 2;

That can be evaluated at compile time; I'd be surprised not to find 6.282 in the code segment.
*/

With today's technology, you're probably right. Even interpreted languages are able to optimize like that.

OTOH, the current C++ standard does not *guarantee* this behavior. I think that was the point of the original post. The proposal seems to make it possible for the programmer to get a guarantee. That doesn't prevent compilers from making the same optimization when it's possible.

/* I think the C++ standard explains the differences between rvalues and lvalues well enough. What it doesn't do is call them different types.
*/

If you really want to see some gruesome rvalue/lvalue issues, try http://www.artima.com/cppsource/foreach.html .

Greg Jorgensen

Posts: 65
Nickname: gregjor
Registered: Feb, 2004

Re: Metaprogramming Musings Posted: Oct 17, 2005 4:39 PM
> OTOH, the current C++ standard does not *guarantee* this
> behavior. I think that was the point of the original
> post. The proposal seems to make it possible for the
> programmer to get a guarantee. That doesn't prevent
> compilers from making the same optimization when it's
> possible.

OK, I understand the original point now. What I don't understand is why I as a programmer would care if the compiler evaluated expressions at compile time or not, as long as the behavior of my code isn't affected. I think whether the compiler evaluates the expression or it's left for runtime, C and C++ do guarantee that the behavior is the same.

It seems that the idea is to get run-time visibility of things that the compiler may have evaluated or optimized away. Metaprogramming happens after the compiler has had its way with the source code, so some relevant information is gone... is that the issue?

Max Lybbert

Posts: 314
Nickname: mlybbert
Registered: Apr, 2005

Re: Metaprogramming Musings Posted: Oct 17, 2005 5:58 PM
/* OK, I understand the original point now. What I don't understand is why I as a programmer would care if the compiler evaluated expressions at compile time or not, as long as the behavior of my code isn't affected. I think whether the compiler evaluates the expression or it's left for runtime, C and C++ do guarantee that the behavior is the same.
*/

I'm a rather new programmer, so YMMV. However, in all honesty, I have not yet had a need for this particular guarantee. In that case, leaving the optimization up to the compiler is perfectly OK.

Then again, if you're familiar with tail recursion, you should recognize that sometimes "correct" behavior may not necessarily be the best behavior. Yes, the end result is the same, if you don't run out of memory first.

No, the original post does not add tail recursion to Heron. (I only make the disclaimer because somebody somewhere will misinterpret the example.)

OTOH, some people use Duff's Device (http://www.lysator.liu.se/c/duffs-device.html) to do what they can to avoid the loop-control overhead at the top of a for loop (the 'for (int i = 0; i < some_value; i++)' part).
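For reference, the device itself (a sketch adapted from the linked page to copy into an ordinary buffer; the original wrote to a memory-mapped output register):

// Duff's Device: an 8x-unrolled copy loop; the switch jumps into the
// middle of the do/while to handle the count % 8 leftover iterations.
void duff_copy(char *to, const char *from, int count) {
    int n = (count + 7) / 8;   // assumes count > 0
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
               } while (--n > 0);
    }
}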

/* It seems that the idea is to get run-time visibility of things that the compiler may have evaluated or optimized away. Metaprogramming happens after the compiler has had its way with the source code, so some relevant information is gone... is that the issue?
*/

That seems to also be wrapped up in this post. Unfortunately that's also a part I'm not all that clear on, so I'll need to let cdiggins answer it more fully. My impression is that there will be a way to get your macro values, *but* you'll have to identify them as macro values. The rationale is that there are subtle differences between compile-time constants and run-time variables, so it's good to remember which is which.

Sean Conner

Posts: 19
Nickname: spc476
Registered: Aug, 2005

Re: Metaprogramming Musings Posted: Oct 17, 2005 10:06 PM

template<int N>
void fubar() { };
fubar<3>();
int x;
cin >> x;
fubar<x>(); // doesn't compile


“In C++, semantically speaking, a variable of type int, is not the same thing as constant of type int. Calling them both ints, is misleading.”

Chris,

You may have to pick a better example than this one. Here, I could argue that you are mixing the concepts of compile time and runtime. Perhaps if the following:


template<int N>
void fubar() {};
fubar<3>();
const int x = 4;
fubar<x>();


doesn't compile (I don't know, I'm not a C++ expert here) then one might make the distinction between a bare constant (for lack of a better term) “3” and a defined constant x set at compile time.

I still don't see why there should be any semantic difference between x = 3 + 5 and x = 8 since both result in x having the value of 8. Sure, it may make a difference if one looks at the assembly code and sees:


mov eax,8


instead of the expected:


mov eax,3
add eax,5


but I would expect the compiler to generate the former, not the latter, code sequence (and in fact, the latter does have side effects, even if they are ignored side effects and left to the reader to puzzle out).

I do agree on a distinction between rvalues and lvalues as presented in proceduralesque languages (like Pascal, C and C++), since they are a handy shorthand for what is going on (and avoid LISP's annoying set, setq and setf functions).

And speaking of LISP (actually, Common LISP) …

For some reason, this thread has reminded me of a quirk in (Common) LISP. The following is true:

(eq A A)

which passes all Objective muster, but:

(eq 2 2)

may be false, although why 2 (regardless of how large a 2 it is) is not necessarily equal to 2 (regardless of how small a 2 it is) is puzzling (actually, I know the reason, but it's one of those implementation details of LISP that make me classify LISP as a low-level language, like Forth). Are not all bare constants equal to themselves?
