Summary
I am examing the meaning of purity with regards to functions in an attempt to close the gap between imperative and functional languages.
Advertisement
Previously I was uing the terms "clean" and "dirty" but the terms
"pure" and "impure" are perhaps more standard and understandable.
Here I attempt to clarify the meaning of purity and impurity,
and how I am dealing with it.
A pure function is one which has no side-effects and does not depend on any state beyond its local scope. This means that it can be replaced
with any other pure function which returns the same result given the same input.
This property is often referred to as referential transparency.
Consider the following psuedo-code:
def pure_f(x) = { return x + 1; }
def impure_f(x) = { Console.WriteLine(x); return x + 1; }
Note: A map function generates a list from another list by applying a function
to each member of the original list.
The conundrum is that if we don't distinguish between the uses of map in these contexts then we
are stuck having to implement map in an order-dependant way.
I am trying to bridge the gap in the Cat language. My solution is to use purity labeling which is
similar to const labeling in C++. It has some parallels to Haskell monads, Clean uniqueness
and escape analysis.
In Cat there are only functions. Functions all share a global stack. They take their parameters from
the top of the shared stack, and return their results on the top of the shared stack. Local variables
are emulated by simply pushing and popping values from the top of the stack. The local scope of a function is the part of the stack above, and including, the lowest value it observes or manipulates on the shared stack.
A Cat function is pure iff takes a constant number of parameters from the top of the stack, it returns a constant
number of results to the top of the stack, and it has no side-effects. Some of the primitives of Cat are
pure, while others aren't. It is easy enough, for the compiler to identify an impure function. If a function contains an impure function, then it is impure.
Some cat primitive functions (such as map, ifthen, or whiledo) operate on other functions, in these cases there
needs to be overloaded versions of the functions: a pure version and an impure version.
Returning to the map problem,
the proposed Cat solution, is that depending on the type of the second
parameter (i.e. is it a pure function or an impure function) then either a pure_map or impure_map is chosen
by the compiler. The pure version of map is order independant, and as such is easily optimized,
and easily parallelized (which supports my earlier post ).
Postscript
As Jules Jacobs pointed out, some impure functions, can become pure when they are passed constant values.
It's not nice, but I don't think it's pureness is simple in Cat.
For example, there are two functions: startlist and endlist. startlist pushes a unique value on the stack, say #a. endlist builds a list from the values on the stack until it encounters #a. The endlist function is not pure, because it pops a variable number of things.
Now we write this function:
f = startlist 1 2 3 endlist
This function is not pure, because it contains an impure fucntion: endlist.
F can also be written like this:
f = [1 2 3]
This version *is* pure.
Now look at the function +:
2 3 + == 5
This function is obviously pure. However, if you write this function:
g = [+] do-times
assuming that do-times executes the quoted program n times:
1 2 3 4 3 g == 10
g contains no impure functions, but g itself isn't pure.
So + isn't pure, because + pops 2 items, and pushes 1 item back. Note that in other languages, like Haskell, + *is* pure. This could be a big problem for Cat.
I have just a couple small quibbles. A function doesn't have to have side-effects to be impure. For instance, a C function that reads from non-constant global variables. Because the function isn't dependent only on the function arguments, it might be considered impure.
How would you consider a function which has side effects but reverses them before it finishes, e.g. acquiring a lock, and releasing it later?
Also, a more general definition of side-effect probably wouldn't refer to stack frames. Talking about stack frames becomes quite confusing if Cat is involved.
... > This function is obviously pure. However, if you write > this function: > > g = [+] do-times > > assuming that do-times executes the quoted program n > times:
I assume n is a value on the stack? According to my rules, g would be impure.
> 1 2 3 4 3 g == 10 > > g contains no impure functions, but g itself isn't pure.
However, "3 g" itself is pure. It always consumes four parameters and returns one result. So there is definitely a flaw in my rules.
> So + isn't pure, because + pops 2 items, and pushes 1 item > back.
+ is always pure because it always consumes a constant number of parameters (2) and returns a constant number of results (1).
> Note that in other languages, like Haskell, + *is* > pure. This could be a big problem for Cat.
I think we agree that there is a problem, but disagree where. Hmmm....
> I have just a couple small quibbles. A function doesn't > have to have side-effects to be impure. For instance, a C > function that reads from non-constant global variables. > Because the function isn't dependent only on the function > arguments, it might be considered impure.
Yes, you are definitely correct. So simple observation of state is enough to render a function impure. I think I'll update my post to reflect this.
> How would you consider a function which has side effects > but reverses them before it finishes, e.g. acquiring a > lock, and releasing it later?
I would consider it impure. This would lose referential transparency. Shifting these kinds of functions around would result in creating a potentially unstable environment. So order dependency would remain.
> Also, a more general definition of side-effect probably > wouldn't refer to stack frames. Talking about stack frames > becomes quite confusing if Cat is involved.
Good point. In Cat I consider the stack frame to start at the lowest point where a function inspects values. So a binary function, such as asdd, inspects or manipulates, only the top two stack items. That is its stack frame. In Cat this is ony an abstract notion, nothing really enforces it.
> 1 2 3 4 3 g == 10 > > g contains no impure functions, but g itself isn't pure.
This could probably be discovered by a smart compiler, uncover opportunities for purification. That is taking an impure function and making it pure again.
> I have just a couple small quibbles. A function doesn't > have to have side-effects to be impure. For instance, a C > function that reads from non-constant global variables. > Because the function isn't dependent only on the function > arguments, it might be considered impure.
Yes. That's interesting because it can be trivially resolved but only if the global variable is threaded as a parameter through other functions that don't act on it. Of course OO encourages side-effects i.e. methods acting on member variables. Since objects but not functions are the units of OO programs "referential transparency" in the OO context could only mean that two instances a, b of a class A, created with the same set of constructor parameters will always behave the same for same input sequences of applied to the same methods. I stay sceptical about the attempts to unify FP and OO and resolving the "conceptual gap" by turning back to functions in order to preserve good design.
> I assume n is a value on the stack? According to my rules, > g would be impure.
Yes, n is a value on the stack. Maybe I don't understand your rules, but I think the program [+] is pure, and do-times is pure too (because i is pure). So [+] do-times would be pure too?
Could you explain your rules, I don't think I understand them...
This function puts an infinite number of 2's on the stack. Is this function pure? Rule: a function is pure it all functions it calls are pure. 2 is pure, but is f? You don't know.
Also, is i pure?
"[a-pure-func] i" is pure. "[an-impure-func] i" is not pure. "[an-impure-func]" is pure too? (as long as you don't de-quote it)
> > I assume n is a value on the stack? According to my > rules, > > g would be impure. > > Yes, n is a value on the stack. Maybe I don't understand > your rules, but I think the program [+] is pure, and > do-times is pure too (because i is pure). So [+] do-times > would be pure too? > > Could you explain your rules, I don't think I understand > them...
Pure functions consume a constant number of elements, produce a constant number of elements, and have no side-effects.
do_times is impure because as a function it potentially consumes a non-constant number of parameters, and produces a non-constant number of results.
> f = 2 f > > This function puts an infinite number of 2's on the stack. > Is this function pure? Rule: a function is pure it all > functions it calls are pure. 2 is pure, but is f? You > don't know.
f is impure because it doesn't produce a constant number of values.
> Also, is i pure?
That depends on whether it is provided a pure function or impure function to execute.
> "[a-pure-func] i" is pure.
Yes.
> "[an-impure-func] i" is not pure.
Yes.
> "[an-impure-func]" is pure too? (as long as you don't > de-quote it)
> f is impure because it doesn't produce a constant > number of values.
I meant, according to your rules. I don't think you can create rules to check if a function is pure or not. You can reason about it, but you can't be sure that certain functions are pure or impure (a solution would be to treat these functions as impure unless you flag them as pure).
For example:
function f(n) if true pure_func else impure_func end end
This function is pure.
function f(n) if is_sum_of_two_primes_or_odd(n) pure_func else impure_func end end
Is this function pure? If is_sum_of_two_primes_or_odd(n) returns true for every n, it's pure. However, mathematicians haven't been able too prove that yet.
Let's assume that it's true, is_sum_of_two_primes_or_odd(n) returns true, n doesn't matter. Is the compiler able to figure out that it's true? The compiler probably can't prove that this theorem is correct, so it labels f as impure, which is incorrect. This is an extreme case, but I'm sure there are many other situations where a compiler can't figure out whether a function is pure or not.
> I meant, according to your rules. I don't think you can > create rules to check if a function is pure or not. You > can reason about it, but you can't be sure that certain > functions are pure or impure (a solution would be to treat > these functions as impure unless you flag them as pure).
I fully agree.
Since you can still easily identify the majority of definitively pure functions, you can optimize and rewrite those programs with ease.
> <p/> > Previously I was uing the terms "clean" and "dirty" but > the terms > "pure" and "impure" are perhaps more standard and > understandable. > Here I attempt to clarify the meaning of purity and > impurity, > and how I am dealing with it. > <p/> > A pure function is one which has no side-effects and does > not depend on any state beyond its local scope. This means > that it can be replaced > with any other pure function which returns the same result > given the same input. > This property is often referred to as referential > transparency. > <p/> > Consider the following psuedo-code: > > <pre> > def pure_f(x) = { return x + 1; } > def impure_f(x) = { Console.WriteLine(x); return x + 1; > 1; } > </pre> > > Now consider the following function: > > <pre> > def pure_g(list) = { return map(list, pure_f); } > def impure_g(list) = { return map(list, impure_f); } > </pre> > > <blockquote> > Note: A map function generates a list from another list > st by applying a function > to each member of the original list. > </blockquote> > > The conundrum is that if we don't distinguish between the > uses of map in these contexts then we > are stuck having to implement map in an order-dependant > way. > <p/> > I am trying to bridge the gap in the Cat language. My > solution is to use purity labeling which is > similar to const labeling in C++. It has some parallels to > Haskell monads, Clean uniqueness > and escape analysis. > <p/> > In Cat there are only functions. Functions all share a > global stack. They take their parameters from > the top of the shared stack, and return their results on > the top of the shared stack. Local variables > are emulated by simply pushing and popping values from the > top of the stack. The local scope of a function is the > part of the stack above, and including, the lowest value > it observes or manipulates on the shared stack. > <p/> > A Cat function is pure iff takes a constant number of > parameters from the top of the stack, it returns a > constant > number of results to the top of the stack, and it has no > side-effects. Some of the primitives of Cat are > pure, while others aren't. It is easy enough, for the > compiler to identify an impure function. If a function > contains an impure function, then it is impure. > <p/> > Some cat primitive functions (such as map, ifthen, or > whiledo) operate on other functions, in these cases there > needs to be overloaded versions of the functions: a pure > version and an impure version. > Returning to the map problem, > the proposed Cat solution, is that depending on the type > of the second > parameter (i.e. is it a pure function or an impure > function) then either a pure_map or impure_map is chosen > by the compiler. The pure version of map is order > independant, and as such is easily optimized, > and easily parallelized (which supports my <a > href="http://www.artima.com/weblogs/viewpost.jsp?thread=166 > 178">earlier post</a> ). > <p> > > <h3>Postscript</h3> > > As Jules Jacobs pointed out, some impure functions, can > become pure when they are passed constant values.
Why don't you make the compiler infer the purity attribute? it would be great if this task could be automated.
There is clean functional programming languages. Ones that, on certain circumstances, allow impure functions yet stays garbage-free and have no side effects. I am talking about languages that disallows the change of "pasts". (pasts and futures are pieces in continuations)
For a functional language having a stack-based execution style the impure features can act on the arguments stack's top ONLY. Since changing just the top of that stack doesn't impact past results, it makes no difference between impure and pure features.
I repeat: impure features is allowed to change the top of the stack only. But in the case of a stack-based executed functional language it's that stack's top the only place changes can occur!
(I bet you didn't expect that; but it was that the main reason I was so interrested in Cat!)