Sunday, May 03, 2009

MetaPython 0.2.2 with Hygienic Macros

In my ever-expanding quest to, as @jgustak recently tweeted, "introduce evil to Python to prevent even scarier evil," I have released MetaPython 0.2.2. Once again, if you aren't familiar with MetaPython, a good place to start is the tutorial, which walks you through the construction of a macro-ized collections.namedtuple from the Python 2.6 standard library. If you already know MetaPython, here's the stuff that's new in 0.2.2.



One issue that macro implementors must eventually face is whether they want their macro system to be hygienic or not. According to Wikipedia,



Hygienic macros are macros whose expansion is guaranteed not to cause collisions with existing symbol definitions. They are a feature of programming languages such as Scheme and Dylan.

So what's all that mean? The best way to explain is probably by showing the bad things unhygienic macros bring you. Here is something you could do in MetaPython 0.2:


def vadd(result, a, b):
    defcode foo:
        for i, (aa,bb) in enumerate(zip($a, $b)):
            $<result>[i] = aa+bb
    return foo

...

$vadd(?result, ?a, ?b)

Calling this macro $vadd(?z, ?x, ?y) then gives you something like:


for i,(aa,bb) in enumerate(zip(x, y)):
    z[i] = aa+bb

Which is fine, as long as you weren't planning on using the i, aa, or bb variables for anything important in the surrounding code. This is especially bad because the conflict comes as a surprise: a macro can introduce names that clash with names in the context where it is expanded, without any indication to the macro user that it is going to do so.
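To make the problem concrete, here is a small plain-Python sketch (not MetaPython syntax) in which the expansion above has been pasted by hand into code that already uses i. The surrounding variable names and values are made up for illustration.

```python
# Surrounding code that happens to use the same name as the macro body.
i = "something important"
x, y = [1, 2, 3], [10, 20, 30]
z = [0, 0, 0]

# Hand-pasted, unhygienic expansion of $vadd(?z, ?x, ?y).
for i, (aa, bb) in enumerate(zip(x, y)):
    z[i] = aa + bb

print(z)  # [11, 22, 33]
print(i)  # 2 -- the original value of i has been silently overwritten
```

Nothing at the call site hints that i, aa, and bb are about to be rebound, which is exactly the kind of collision hygiene is meant to prevent.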


In MetaPython 0.2.1 and 0.2.2, you instead declare the block as defcode foo(): which tells MetaPython that the foo block should not capture any variables from its context when it is expanded. In this case, MetaPython detects that the variables i, aa, and bb are assigned in the block and replaces them with "known unique" names (names that should not exist in the surrounding block, wherever they are expanded). In MetaPython 0.2.1 and 0.2.2, you get this expansion:


for _mpy_0,(_mpy_1,_mpy_2) in enumerate(zip(x, y)):
    z[_mpy_0] = _mpy_1 + _mpy_2

So as long as you avoid using names starting with _mpy, you should be fine.
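The renaming idea can be sketched in ordinary Python. This is not how MetaPython itself is implemented; it is a minimal illustration built on the standard ast module (and ast.unparse, so Python 3.9 or later is assumed). The function name sanitize is hypothetical, and the sketch only handles plain assigned names, not def, class, or import bindings.

```python
import ast

def sanitize(source, prefix="_mpy_"):
    """Rename every plain name assigned in `source` to a fresh prefixed name."""
    tree = ast.parse(source)
    names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
    # Number the assigned names in source order so the result is predictable.
    stored = sorted(
        (n for n in names if isinstance(n.ctx, ast.Store)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    renames = {}
    for n in stored:
        renames.setdefault(n.id, prefix + str(len(renames)))
    # Rewrite every occurrence (assignments and reads) of those names.
    for n in names:
        n.id = renames.get(n.id, n.id)
    return ast.unparse(tree)  # ast.unparse needs Python 3.9+

block = "for i, (aa, bb) in enumerate(zip(x, y)):\n    z[i] = aa + bb\n"
expanded = sanitize(block)
print(expanded)

# Run the sanitized block next to a pre-existing `i` to show that it survives.
scope = {"x": [1, 2, 3], "y": [10, 20, 30], "z": [0, 0, 0], "i": "untouched"}
exec(expanded, scope)
print(scope["z"])  # [11, 22, 33]
print(scope["i"])  # untouched
```

Note that x, y, and z are only read or subscripted in the block, never assigned as bare names, so they are left alone and still refer to the caller's variables.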


Now sometimes, you actually want to capture a value from the context into which the macro is being expanded. One example is when you're writing a class factory like namedtuple, covered in the tutorial. Your goal is to generate a new class, and if that class has some weird _mpy_* name, it's pretty useless. So MetaPython lets you specify, via the arguments to defcode, which names should be captured. In the namedtuple example, for instance, the defcode declaration looks like defcode result(typename):, where typename is a variable containing the name of the class being created. When MetaPython expands the code block, then, any names mentioned in the argument list will not be auto-renamed by the "sanitizer."
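The effect of a capture list can also be sketched in plain Python. Again, this is only an illustration using the standard ast module (Python 3.9 or later for ast.unparse), not MetaPython's own code. The function sanitize_defs and its captured parameter are hypothetical, and as a simplification the captured names are given literally rather than through a variable such as typename.

```python
import ast

def sanitize_defs(source, captured=(), prefix="_mpy_"):
    """Rename names bound by class/def statements unless they are captured."""
    tree = ast.parse(source)
    binders = (ast.ClassDef, ast.FunctionDef)
    renames = {}
    # Collect every class/def name that is not explicitly captured.
    for node in ast.walk(tree):
        if isinstance(node, binders) and node.name not in captured:
            renames.setdefault(node.name, prefix + str(len(renames)))
    # Rename both the definitions and any references to them.
    for node in ast.walk(tree):
        if isinstance(node, binders) and node.name in renames:
            node.name = renames[node.name]
        elif isinstance(node, ast.Name) and node.id in renames:
            node.id = renames[node.id]
    return ast.unparse(tree)

template = "class Point(object):\n    pass\n\norigin = Point()\n"

hidden = {}
exec(sanitize_defs(template), hidden)
print("Point" in hidden)   # False: the class only exists under a generated name

exposed = {}
exec(sanitize_defs(template, captured=("Point",)), exposed)
print("Point" in exposed)  # True: the captured name reaches the caller
```

Without the capture list the generated class is unreachable by its intended name, which is why a class factory needs this escape hatch.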


MetaPython 0.2.1 and 0.2.2 both had this ability to hygienically expand macros, but 0.2.2 added the ability to use variable arguments to the defcode block. One time when you might want to do this is in a domain-specific language context. Say you wanted to specify that a class contained certain properties, and that those properties should be accessed via the property builtin function. For instance, say you wanted to have a class Foo with properties a, b, c, and d implemented by semi-private instance variables _a, _b, _c, and _d. You might write something like this:


class Foo(object):
    $has_properties(?a, ?b, ?c, ?d)

The implementation of has_properties, then, is the following:


def has_properties(*props):
    str_props = (str(p) for p in props)
    gen = ( (p,
             '_' + p,
             '_get_' + p,
             '_set_' + p)
            for p in str_props )
    defcode result(*props):
        $for pub, pri, getter, setter in gen:
            def $<getter>(self):
                return self.$pri
            def $<setter>(self, value):
                self.$pri = value
            $pub = property($getter, $setter)
    return result

The final expanded version of Foo is then:


class Foo (object ):
    def _mpy_3 (self ):
        return self ._a

    def _mpy_7 (self ,value ):
        self ._a =value

    a =property (_mpy_3 ,_mpy_7 )
    def _mpy_1 (self ):
        return self ._b

    def _mpy_5 (self ,value ):
        self ._b =value

    b =property (_mpy_1 ,_mpy_5 )
    def _mpy_2 (self ):
        return self ._c

    def _mpy_6 (self ,value ):
        self ._c =value

    c =property (_mpy_2 ,_mpy_6 )
    def _mpy_4 (self ):
        return self ._d

    def _mpy_8 (self ,value ):
        self ._d =value

    d =property (_mpy_4 ,_mpy_8 )

Note in particular that the getters and setters were sanitized, while the actual property names a, b, c, and d were skipped.
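As a quick sanity check, here is the a portion of that expansion written out as ordinary Python and exercised. Only one property is reproduced to keep it short, and the value 42 is arbitrary.

```python
class Foo(object):
    def _mpy_3(self):
        return self._a

    def _mpy_7(self, value):
        self._a = value

    a = property(_mpy_3, _mpy_7)

foo = Foo()
foo.a = 42                     # routed through the sanitized setter _mpy_7
print(foo.a)                   # 42, read back through the getter _mpy_3
print(foo._a)                  # 42, the semi-private backing attribute
print(hasattr(Foo, "_mpy_3"))  # True: the helper stays in the class namespace
```

The sanitized helpers are still attributes of the class, but under names nobody is likely to collide with, while a keeps the public name the macro user asked for.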


So MetaPython now has what I think is a workable hygienic macro system with appropriate escapes for "non-hygienic" operation. Any comments, questions, or criticisms are welcome, as well as ideas for how you are using or might use MetaPython. Let me know what you think!

4 comments:

  1. This comment has been removed by a blog administrator.

  2. Anonymous 8:51 AM

    keep getting AssertionError in "c:\Python26\lib\tokenize.py", line 187, in add_whitespace() when I use metapython, and also in all.py testcases.

  3. Thanks for the bug report. I have added an issue on the Google Code project. The main problem is that I haven't added Python 2.6 compatibility yet, and MetaPython depends on the tokenize module, which seems to have changed a bit in the 2.5 => 2.6 upgrade.

  4. Anonymous 5:54 AM

    try replacing tokenize.generate_tokens() with tokenize.tokenize() in parse.py
