One of the things that's been nice about working with SourceForge for the last few months is the chance I get to work with new open source technology and contribute something back. Well, the first (in a long series, I hope) of projects we're putting out there as the result of recent work is a little library we wrote called Ming. It's been available over git for a while, but I just finished publishing our first "official" release, with a tutorial, over at http://merciless.sourceforge.net and http://pypi.python.org/pypi/Ming/0.1. Ming gives you a way of enforcing schemas on a MongoDB database in a Python application. We also throw in automatic, lazy schema migration for good measure.
If you haven't heard of it yet, MongoDB describes itself on its website as follows:
MongoDB (from "humongous") is a scalable, high-performance, open source, schema-free, document-oriented database.
There are lots of trade-offs to consider when using a non-relational database like MongoDB. On the plus side, MongoDB is simple to develop for (no SQL), extremely fast, and models hierarchical relationships really nicely. On the downside, we give up relational integrity constraints and transactional behavior.
With all the flexibility provided by MongoDB, however, comes some danger. When the database is schema-free, you have to spend extra code making sure that the data you put in and the data you take out is valid for your application. That's where Ming comes in. A play off the character of Ming the Merciless who ruled the planet of Mongo with an iron fist in Flash Gordon, the Ming library provides a succinct way to specify the requirements your application has for the data it produces and consumes. Ming also supports lazy migration of documents across schema revisions. To give you a flavor for how easy this can be, here is a sample schema with migration from the tutorial:
from ming.datastore import DataStore
from ming import Session
from ming import Document, Field, schema

bind = DataStore('mongo://localhost:27017/tutorial')
session = Session(bind)

class OldWikiPage(Document):

    class __mongometa__:
        session = session
        name = 'wiki_page'

    _id = Field(schema.ObjectId)
    title = Field(str)
    text = Field(str, if_missing='')
    metadata = Field(dict(
        tags=[str],
        categories=[str]))

class WikiPage(Document):

    class __mongometa__:
        session = session
        name = 'wiki_page'
        version_of = OldWikiPage
        def migrate(data):
            result = dict(
                data,
                tags=data['metadata']['tags'],
                categories=data['metadata']['categories'],
                version=1)
            del result['metadata']
            return result

    _id = Field(schema.ObjectId)
    version = Field(1)
    title = Field(str)
    text = Field(str, if_missing='')
    tags = Field([str])
    categories = Field([str])
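To make the idea of "lazy" migration concrete, here is a minimal sketch in plain Python 3 that does not use Ming or MongoDB at all. It only illustrates the general pattern of upgrading a document when it is read; the names MIGRATIONS, CURRENT_VERSION and load are assumptions for this example and are not part of Ming's API.

```python
# Hypothetical stand-alone illustration of lazy migration; not Ming's implementation.

def migrate_v0_to_v1(data):
    """Same transformation as the migrate() function in the schema above."""
    result = dict(
        data,
        tags=data['metadata']['tags'],
        categories=data['metadata']['categories'],
        version=1)
    del result['metadata']
    return result

# Map "version found in the stored document" -> "function that upgrades it by one step".
MIGRATIONS = {0: migrate_v0_to_v1}
CURRENT_VERSION = 1

def load(raw):
    """Upgrade a raw document only when it is read, one version at a time."""
    doc = dict(raw)
    # Documents written before versioning existed have no 'version' key; treat them as 0.
    while doc.get('version', 0) < CURRENT_VERSION:
        doc = MIGRATIONS[doc.get('version', 0)](doc)
    return doc

old_doc = {'title': 'Home', 'text': '',
           'metadata': {'tags': ['wiki'], 'categories': ['main']}}
new_doc = load(old_doc)
```

Documents that are already at the current version pass through load unchanged, so old and new documents can live side by side in the same collection until each one is next read.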
Hopefully, that's enough to whet your appetite for now. Rather than duplicating the entire tutorial, I'll direct you to the docs at http://merciless.sourceforge.net for more information. Let me know what you think!
Monday, December 07, 2009
Ming 0.1 Released - Python Library for MongoDB
Posted by Rick Copeland at 5:11 PM
Labels: ming, mongodb, programming, python
Sunday, May 03, 2009
MetaPython 0.2.2 with Hygienic Macros
In my ever-expanding quest to, as @jgustak recently tweeted, "introduce evil to Python to prevent even scarier evil," I have released MetaPython 0.2.2. Once again, if you aren't familiar with MetaPython, a good place to start is the tutorial, which walks you through the construction of a macro-ized collections.namedtuple from the Python 2.6 standard library. If you already know MetaPython, here's the stuff that's new in 0.2.2.
One issue that macro implementors must eventually face is whether they want their macro system to be hygienic or not. According to Wikipedia,
Hygienic macros are macros whose expansion is guaranteed not to cause collisions with existing symbol definitions. They are a feature of programming languages such as Scheme and Dylan.
So what's all that mean? The best way to explain is probably by showing the bad things unhygienic macros bring you. Here is something you could do in MetaPython 0.2:
def vadd(result, a, b):
    defcode foo:
        for i, (aa,bb) in enumerate(zip($a, $b)):
            $<result>[i] = aa+bb
    return foo
...
$vadd(?result, ?a, ?b)
Calling this macro $vadd(?z, ?x, ?y) then gives you something like:
for i,(aa,bb) in enumerate(zip(x, y)):
    z[i] = aa+bb
Which is fine, as long as you weren't planning on using the i, aa, or bb variables for anything important in the surrounding code. This is especially bad because a macro can introduce names that silently conflict with names in the context where it is expanded, without any indication to the macro user that it is going to do so.
In MetaPython 0.2.1 and 0.2.2, you declare the block as defcode foo(): which tells MetaPython that the foo block should not capture any variables from its context when it is expanded. In this case, MetaPython detects that the variables i, aa, and bb are assigned in the block and replaces them with "known unique" names (names that should not exist in the surrounding block, wherever they are expanded). With that declaration, you get this expansion:
for _mpy_0,(_mpy_1,_mpy_2) in enumerate(zip(x, y)):
    z[_mpy_0] = _mpy_1 + _mpy_2
So as long as you avoid using names starting with _mpy, you should be fine.
Now sometimes, you actually want to capture a name from the context into which the macro is being expanded. One example is when you're writing a class factory like namedtuple, covered in the tutorial. Your goal is to generate a new class, and if that class has some weird _mpy_* name, it's pretty useless. So MetaPython lets you specify, via the arguments to defcode, which names should be captured. In the namedtuple example, for instance, the defcode declaration looks like defcode result(typename):, where typename is a variable containing the name of the class being created. When MetaPython expands the code block, then, any names mentioned in the argument list will not be auto-renamed by the "sanitizer."
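The renaming step can be approximated in ordinary Python 3 with the standard ast module. The sketch below is not MetaPython's sanitizer: the function name sanitize, its capture parameter, and the counter-based naming are assumptions made for illustration, and it only handles plain variable names (not def, class, or import targets). It needs Python 3.9 or later for ast.unparse.

```python
import ast

class _AssignedNames(ast.NodeVisitor):
    """Collect names that are assigned to (Store context), in source order."""
    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store) and node.id not in self.names:
            self.names.append(node.id)

class _Renamer(ast.NodeTransformer):
    """Replace every occurrence of a mapped name, whether read or written."""
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

def sanitize(source, capture=(), prefix="_mpy_"):
    """Rename variables assigned in `source`, except those listed in `capture`."""
    tree = ast.parse(source)
    collector = _AssignedNames()
    collector.visit(tree)
    mapping = {}
    for name in collector.names:
        if name not in capture:
            mapping[name] = "%s%d" % (prefix, len(mapping))
    _Renamer(mapping).visit(tree)
    return ast.unparse(tree)

block = "for i, (aa, bb) in enumerate(zip(x, y)):\n    z[i] = aa + bb\n"
hygienic = sanitize(block)                 # i, aa, bb all renamed
sharing_i = sanitize(block, capture=("i",))  # i deliberately left visible
```

Names that are only read (x, y, z, enumerate, zip) are left alone because they are never assignment targets, which mirrors the behavior described above: only variables assigned inside the block are candidates for renaming.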
MetaPython 0.2.1 and 0.2.2 both had this ability to hygienically expand macros, but 0.2.2 added the ability to use variable arguments to the defcode block. One time when you might want to do this is in a domain-specific language context. Say you wanted to specify that a class contained certain properties, and that those properties should be accessed via the property builtin function. For instance, say you wanted to have a class Foo with properties a, b, c, and d implemented by semi-private instance variables _a, _b, _c, and _d. You might write something like this:
class Foo(object):
    $has_properties(?a, ?b, ?c, ?d)
The implementation of has_properties, then, is the following:
def has_properties(*props):
    str_props = (str(p) for p in props)
    gen = ( (p,
             '_' + p,
             '_get_' + p,
             '_set_' + p)
            for p in str_props )
    defcode result(*props):
        $for pub, pri, getter, setter in gen:
            def $<getter>(self):
                return self.$pri
            def $<setter>(self, value):
                self.$pri = value
            $pub = property($getter, $setter)
    return result
The final expanded version of Foo is then:
class Foo (object ):
    def _mpy_3 (self ):
        return self ._a
    def _mpy_7 (self ,value ):
        self ._a =value
    a =property (_mpy_3 ,_mpy_7 )
    def _mpy_1 (self ):
        return self ._b
    def _mpy_5 (self ,value ):
        self ._b =value
    b =property (_mpy_1 ,_mpy_5 )
    def _mpy_2 (self ):
        return self ._c
    def _mpy_6 (self ,value ):
        self ._c =value
    c =property (_mpy_2 ,_mpy_6 )
    def _mpy_4 (self ):
        return self ._d
    def _mpy_8 (self ,value ):
        self ._d =value
    d =property (_mpy_4 ,_mpy_8 )
Note in particular that the getters and setters were sanitized, while the actual property names a, b, c, and d were skipped.
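For comparison, roughly the same class can be built at run time in plain Python 3 with a class decorator. This is only a sketch of the equivalent result, not how MetaPython works (the macro generates source text at import time); the decorator form of has_properties and the helper _make_property are names invented for this example.

```python
def _make_property(private):
    """Build a property that reads and writes the attribute named `private`."""
    def getter(self):
        return getattr(self, private)

    def setter(self, value):
        setattr(self, private, value)

    return property(getter, setter)

def has_properties(*names):
    """Class decorator: add a property for each name, backed by '_' + name."""
    def decorate(cls):
        for name in names:
            setattr(cls, name, _make_property('_' + name))
        return cls
    return decorate

@has_properties('a', 'b', 'c', 'd')
class Foo(object):
    def __init__(self):
        self._a = self._b = self._c = self._d = 0

foo = Foo()
foo.a = 5   # goes through the generated setter and lands in foo._a
```

The getters and setters here are closures that never get a name in the class body, so there is nothing to sanitize; the macro version has to rename them precisely because it emits real def statements into the class.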
So MetaPython now has what I think is a workable hygienic macro system with appropriate escapes for "non-hygienic" operation. Any comments, questions, or criticisms are welcome, as well as ideas for how you are using or might use MetaPython. Let me know what you think!
Posted by Rick Copeland at 1:40 PM
Labels: hygiene, macros, metapython, python
Friday, April 17, 2009
MetaPython 0.2 Release
For those intrepid souls who are interested in generating Python code from the macros and code quoting facilities of MetaPython, I have spun a new release. If you aren't familiar with MetaPython, a good place to start is the tutorial, which walks you through the construction of a macro for generating a macro-ized version of collections.namedtuple from the Python 2.6 standard library. If you are already familiar with MetaPython, I will try to summarize the changes and ideas in this release here.
MetaPython 0.1 was mainly a proof of concept implementation to show that it was possible to generate a moderately useful macro facility in a short period of time. I did, however, make one really questionable decision: to use Jinja2 as the templating language for code quotes.
Don't get me wrong; Jinja2 is an awesome templating language. But if I am generating a language extension (MetaPython) that allows you to change the text of a module just before it gets imported, wouldn't it be nice to also be able to change the text of a code quote using the same syntax? Hence version 0.2.
There was also the issue of ugly syntax with the ?, $, and {%...%} operators. Although ? and $ are still there, I have tried to normalize their use a bit. $ always introduces a construct that should be executed or evaluated "earlier" than the surrounding code. (Inside a defcode...: block, this means at block construction time, otherwise it means at import time.) The ? operator now has a much more limited use as an inline code quoting operator.
Other than syntax changes, one of the main things you'll notice in MetaPython 0.2 is the introduction of import-time control statements ($for..., $if..., etc.). These allow the conditional or repeated expansion of code blocks. These constructs basically obviate the need for another template language (Jinja2) inside defcode... blocks.
Another big change and move toward normalization is that macro calls are now just import-time function calls. This means that their arguments are evaluated before they are called, not passed through as code objects. In order to get the old behavior, you can simply quote the arguments you wish to be sent through unevaluated. For instance, in the old syntax, creating a named tuple was accomplished via ?namedtuple(Point, x, y), whereas in the new syntax, you would type $namedtuple(?Point, ?x, ?y). This makes it substantially easier to write macros which need non-code arguments, and follows the Python Zen of "Explicit is better than implicit."
There has also been significant reworking of the internal MetaPython parser and code construction machinery, although this should not be user-visible. So try it out, let me know what you think, and have fun!
Posted by Rick Copeland at 9:34 AM
Labels: macros, metapython, python programming
Friday, April 10, 2009
MetaPython Presentation
Last night at the Python Atlanta meetup I gave a brief talk on MetaPython, including the motivations for doing something so profane as adding macros and code quoting to Python. The video is on blip.tv and you can find the slides on MetaPython.org. Enjoy!
Posted by Rick Copeland at 1:23 PM
Thursday, March 19, 2009
Announcing MetaPython - Macros for Python
As I mentioned in my last post, I have been considering writing some version of macros for Python and was looking for use cases. Well, having gotten the use cases I so desired from my wonderful commenters, I went ahead and put together an import hook and Google Code project that I'm calling MetaPython — all just in time for PyCon! (I have no talks, but I will be there, and would love to have a MetaPython Open Space if anyone's interested.)
So what's all the excitement about? MetaPython introduces some hooks to allow you to modify module code just before it is seen by the Python interpreter (at what I'm calling "import time"). The import-time syntax is pretty simple, and is (almost) all denoted by a question mark ? prefix (question marks are currently syntax errors in regular Python). Here is a trivial example that defines an import-time function (which as we will see can be used as a macro) that will conditionally remove a function call (and the evaluation of its associated arguments). Suppose the following text is saved in a file "cremove.mpy":
def cremove(debug, expr):
    debug = eval(str(debug))
    defcode empty_result:
        pass
    if debug:
        result = expr
    else:
        result = empty_result
    return result
The idea here is that cremove will be called with two metapython.Code values, debug and expr. cremove will convert debug to its Python code representation by calling str() and then evaluate the result. If debug is true, then expr will be returned. Otherwise a pass statement will be returned (defined using the MetaPython import-time construct defcode which defines a code template). To actually call cremove as a macro, we will need to define another MetaPython module, say "test_cremove.mpy":
import logging
logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

?from cremove import cremove

def do_test():
    ?cremove(True, log.debug('This statement will be logged'))
    ?cremove(False, log.debug('This statement will not be logged'))
Here, we do an import-time import (the ?from... import business). This makes the module we're importing available at import-time (a regular import would be seen just as another line of Python code at import-time). To actually call cremove as a macro, we just need to prefix it with the "?" as shown.
Now, to actually test this, we'll need to install MetaPython and fire up an interpreter. MetaPython is available from the CheeseShop, so to get it just run easy_install MetaPython. Once it's installed, we can test our MetaPython code as follows:
>>> import metapython; metapython.install_import_hook()
>>> import test_cremove
>>> test_cremove.do_test()
DEBUG:test_cremove:This statement will be logged
Since macro expansion can get pretty complex and it's always tricky debugging code you've never seen, the fully-expanded module is available as the __expanded__ attribute of the MetaPython module:
>>> print test_cremove.__expanded__
import logging
logging .basicConfig (level =logging .DEBUG )
log =logging .getLogger (__name__ )
def do_test ():
    log .debug ('This statement will be logged')
    pass
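If you want to experiment with the same idea without installing MetaPython, the effect of cremove can be imitated in Python 3 with the standard ast module. The sketch below is an analogy rather than MetaPython's mechanism: the names CRemove and expand are hypothetical, there is no ? syntax, and the flag must be a literal such as True or False.

```python
import ast

class CRemove(ast.NodeTransformer):
    """Rewrite statements of the form cremove(<flag>, <expr>) before execution."""

    def visit_Expr(self, node):
        call = node.value
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name)
                and call.func.id == "cremove"
                and len(call.args) == 2):
            flag, expr = call.args
            # Evaluate the flag at expansion time, before the module body runs.
            if ast.literal_eval(flag):
                # Keep the expression, dropping the cremove(...) wrapper.
                return ast.copy_location(ast.Expr(value=expr), node)
            # Replace the whole statement, arguments included, with pass.
            return ast.copy_location(ast.Pass(), node)
        return node

def expand(source):
    """Parse `source`, apply CRemove, and return a tree ready for compile()."""
    tree = CRemove().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return tree

demo_source = '''
calls = []
def do_test():
    cremove(True, calls.append("kept"))
    cremove(False, calls.append("removed"))
'''
namespace = {}
exec(compile(expand(demo_source), "<demo>", "exec"), namespace)
namespace["do_test"]()
```

After expansion the function body no longer mentions cremove at all, so the second append is never evaluated, just as the second log.debug call disappears from the expanded module above.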
There's a lot more to MetaPython, but hopefully this has whetted your appetite. There's a short tutorial available that shows how you can implement the collections.namedtuple class factory using a macro. It also shows how you can use Jinja2 syntax along with the defcode construct to dynamically produce Python code. Have fun, and let me know what you think!
Posted by Rick Copeland at 9:11 AM
Labels: lisp, macros, programming, python
Thursday, March 12, 2009
Python Macros?
I've been thinking a bit about macros and what use they might be in Python. Basically, I was contemplating writing an import hook that would allow you to use code quoting and unquoting and stuff for your Python modules. My motive was just that Lisp people seem to rave about how awesome macros are all the time, so I figured they must be cool.
As I sat down to actually start figuring out what macro definitions and uses should look like in Python, I thought, hey, I'll just throw together a use case. But I haven't been able to come up with one (yet).
Most of the examples I found on the web focused on "hey, you can implement a 'while' loop with macros in Lisp!" or "hey, look at all the cool stuff the 'setf' macro can do!" So I started to wonder whether maybe Lisp people love macros because it allows them to extend Lisp's minimalist syntax with new constructs (like object-oriented programming with CLOS, while loops, etc.). Python, OTOH, has pretty rich syntax. It has a nice OOP system with syntactic support, while and for loops, generators, iterators, context managers, primitive coroutines, comprehensions, destructuring bind, and so on. What would I use macros for? (OK, depending on the syntax, I could add a "switch" statement, but that hardly seems worth the trouble.)
I should mention that I also saw some examples of people using macros for performance; you basically get rid of a function call and you can potentially make the inner loop of some critical function run really fast. But if that's all it buys me in Python-land (well, that and a switch statement), my motivation is pretty low. Because let's face it -- if your critical inner loop is written in pure Python, you can pretty easily throw it at Cython and get better performance than Python macros could ever provide.
So here's the question: does anyone out there have an idea of what macros would add to Python's power or expressiveness? Or maybe some Lisp, MetaOCaml, or Template Haskell hackers who can enlighten me as to what macros can add to a language with already rich syntax?
Update 2009-03-19
I have implemented MetaPython 0.1, a macro and code quoting system for Python, covered in the next blog post.
Posted by Rick Copeland at 10:17 AM
Labels: lisp, macros, programming, python
Friday, August 22, 2008
Lazy Descriptors
Today I had a need to create a property on an object "lazily." The Python builtin property does a great job of this, but it calls the getter function every time you access the property. Here is how I ended up solving the problem:
First of all, I had (almost) the behavior I wanted by using the following pattern:
class Foo(object):

    def __init__(self):
        self._bar = None

    @property
    def bar(self):
        if self._bar is None:
            print 'Calculating self._bar'
            self._bar = 42
        return self._bar
There are a couple of problems with this, however. First of all, I'm polluting my object's namespace with a _bar attribute that I don't want. Secondly, I'm using this pattern all over my codebase, and it's quite an eyesore.
Both problems can be fixed by using a descriptor. Basically, a descriptor is an object with a __get__ method which is called when the descriptor is accessed as a property of a class. The descriptor I created is below:
class LazyProperty(object):

    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, obj, klass=None):
        if obj is None: return None
        result = obj.__dict__[self.__name__] = self._func(obj)
        return result
The descriptor is designed to be used as a decorator, and will save the decorated function and its name. When the descriptor is accessed, it will calculate the value by calling the function and save the calculated value back to the object's dict. Saving back to the object's dict has the additional benefit of preventing the descriptor from being called the next time the property is accessed. So I can now use it in the class above:
class Foo(object):

    @LazyProperty
    def bar(self):
        print 'Calculating self._bar'
        return 42
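To check that the getter really runs only once, here is the same descriptor as a self-contained Python 3 sketch. The call counter on Foo is an addition for this demonstration only, replacing the print statement so the behavior can be asserted.

```python
class LazyProperty(object):
    """Non-data descriptor: defines __get__ only, so an instance attribute
    with the same name takes precedence once it has been stored."""

    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, obj, klass=None):
        if obj is None:
            return None  # accessed on the class itself, as in the original
        # Compute once and cache the value in the instance dict.
        result = obj.__dict__[self.__name__] = self._func(obj)
        return result

class Foo(object):
    calls = 0  # demonstration-only counter of getter invocations

    @LazyProperty
    def bar(self):
        Foo.calls += 1
        return 42

foo = Foo()
first = foo.bar    # runs the function and caches the result
second = foo.bar   # served straight from foo.__dict__
```

The trick depends on LazyProperty not defining __set__ or __delete__. If it did, Python would treat it as a data descriptor and keep calling __get__ even after the value had been written to the instance dict.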
So I get a nice lazily calculated property that doesn't recalculate bar every time it's accessed and doesn't bother with any memoization itself. What do you think about it? Is this a pattern you use in your code?
Posted by Rick Copeland at 2:53 PM
Labels: decorator, descriptor, programming, python
