Tuesday, June 19, 2012

Declarative Schemas for MongoDB in Python using Ming

Continuing my series on MongoDB and Python, this article explores the Python MongoDB toolkit Ming further, this time looking at an alternative, declarative syntax for schema definition. If you're just getting started with MongoDB, you might want to read the previous articles in the series first.

And now that you're all caught up, let's take a look at that declarative syntax...

A couple of commenters on my last post mentioned that they didn't like the Ming syntax, particularly the use of .m. to namespace the various query operators. This article, while it might not justify that decision, will hopefully explain it, along with offering you another option for defining your schema in Ming.

Defining your schema (reprise)

Suppose we are writing a blog app and wish to store the posts, along with comments, in an 'embedded' style in MongoDB. In this case, our documents might look something like the following:

{
  _id: 'ming-declarative',
  title: 'Ming Declarative Schema Definitions',
  posted: ISODateTime(...),
  author: { 
    name: 'Rick Copeland',
    twitter: '@rick446' },
  text: 'Continuing on in my series...',
  comments: [
    { author: { name: 'Anonymous', twitter: null },
      posted: ISODateTime(...),
      text: 'This is still ugly' },
    { author: { name: 'Alessandro', twitter: '@__amol__' },
      posted: ISODateTime(...),
      text: "No, I think it's much better" } ]
}

The 'plain' schema definition for these documents might look like the following:

from datetime import datetime

from ming import collection, Field, Session, Index

user = dict(name=str, twitter=str)

session = Session()
PostFunctional = collection(
    'blog.posts', session,
    Field('_id', str),
    Field('title', str),
    Field('posted', datetime, if_missing=datetime.utcnow),
    Field('author', user),
    Field('text', str),
    Field('comments', [ dict(
        author=user, 
        posted=datetime,
        text=str) ]),
    Index('author.name'),
    Index('comments.author.name'))

The declarative syntax allows you to rewrite the PostFunctional declaration above as a class definition:

from datetime import datetime

from ming.declarative import Document
from ming import collection, Field, Session

user = dict(name=str, twitter=str)

session = Session()

class PostDeclarative(Document):
    class __mongometa__:
        session=session
        name='blog.posts'
        indexes=['author.name', 'comments.author.name']
    _id=Field(str)
    title=Field(str)
    posted=Field(datetime, if_missing=datetime.utcnow)
    author=Field(user)
    text=Field(str)
    comments=Field([dict(
        author=user, 
        posted=datetime,
        text=str) ])

There are a couple of differences to note. The most obvious is that the first method is a function call and the second is a class declaration. It's not immediately obvious from the first example, but the collection() call actually defines a class as well:

>>> PostFunctional
<class 'ming.metadata.Document<blog.posts>'>
>>> PostDeclarative
<class '__main__.PostDeclarative'>

In fact, they're both dict subclasses:

>>> issubclass(PostDeclarative, dict)
True
>>> issubclass(PostFunctional, dict)
True

Furthermore, they have exactly the same methods and properties as each other. Both use the .m. manager property to scope their queries, and both have their own constructors that create a model object from a dict (though of course you can override the constructor in the declarative syntax).
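
To make the "dict subclass" idea a bit more concrete, here is a minimal plain-Python sketch of a dict whose keys can also be read and written as attributes, which is roughly how the models above are used (self.comments, self._id). The Model class is a hypothetical stand-in I made up for illustration; it is not Ming's implementation and it does no schema validation.

```python
class Model(dict):
    """Hypothetical stand-in for a document class: a dict whose keys
    can also be read and written as attributes. Not Ming's code."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real dict
        # methods such as .get and .items are still found first.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Attribute assignment is stored as a dict key.
        self[name] = value


# The constructor builds a model object from a plain dict.
post = Model({'_id': 'ming-declarative',
              'title': 'Ming Declarative Schema Definitions'})
post.text = 'Continuing on in my series...'
```

Because the object is still a dict, post['text'] and post.text refer to the same value, and dict methods such as post.get('missing') keep working as usual.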

Why two syntaxes?

So why would you want to use the declarative syntax? Primarily because it allows you to add new methods and properties to your model. For instance, we might want to have a post_comment method:

class PostDeclarative(Document):
    ...
    def post_comment(self, author, text, posted=None):
        if posted is None: posted = datetime.utcnow()
        comment = dict(author=author, text=text, posted=posted)
        self.comments.append(comment)
        PostDeclarative.m.update_partial(
            dict(_id=self._id),
            { '$push': {'comments': comment } })
    ...
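
If you haven't used MongoDB's '$push' operator before, the following plain-Python sketch imitates what it does to the stored document: it appends the value to the named array field, creating the array if it is missing. The apply_push helper is hypothetical and exists only for illustration; it handles top-level field names only (no dotted paths) and never talks to a database.

```python
from datetime import datetime


def apply_push(document, update):
    """Apply the '$push' part of an update document to a plain dict,
    imitating the effect on the stored document (top-level fields only)."""
    for field, value in update.get('$push', {}).items():
        # Create the array if the field does not exist yet, then append.
        document.setdefault(field, []).append(value)
    return document


stored = {'_id': 'ming-declarative', 'comments': []}
comment = {
    'author': {'name': 'Anonymous', 'twitter': None},
    'text': 'This is still ugly',
    'posted': datetime(2012, 6, 19),
}
apply_push(stored, {'$push': {'comments': comment}})
```

This is why post_comment both appends to self.comments and issues the update: the first keeps the in-memory object current, while the second sends only the new comment to the server rather than the whole post.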

So if you can do everything in the declarative style that you can in the functional style, and you can add methods and properties, why would you ever want to use the functional style? The answer, as another commenter pointed out in the last post, really lies in the object-document mapper (ODM) layer of Ming, which I'll cover in a future post. For now, suffice it to say that if you just want schema validation and nothing else on your model, the functional syntax may be just a bit more succinct than the declarative syntax.

And why use that ugly .m.?

So hopefully now the purpose of the .m. is a bit clearer. It's there to allow you as much freedom as possible in naming your methods and properties in the declarative syntax. Suppose you wanted to get a post by its _id. One commenter suggested that he'd rather write the following:

>>> # Note: this will *not* work in Ming!
>>> PostDeclarative.get('ming-declarative')

Ming requires him to write the following instead:

>>> # This *will* work (note the .m.)
>>> PostDeclarative.m.get('ming-declarative')

So what's the big deal? Let's look at what PostDeclarative.get actually is:

>>> PostDeclarative.get
<method 'get' of 'dict' objects>

Aha! So that's what's going on! Ming, by scoping its methods under the m property, allows us to use the full repertoire of dict methods without shadowing any of them. Furthermore, for class-level information such as collection name and index definitions, Ming forces you to put everything in an inner class named __mongometa__.

The end result is that you only have to be aware of two names, m and __mongometa__, when you're defining your own methods and properties, rather than worrying about a plethora of methods that Ming has added to your model automatically.
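
Here is a small plain-Python sketch of the namespacing idea, assuming an in-memory dict in place of a real MongoDB collection. The Manager, ManagerDescriptor, Post and STORE names are all hypothetical and this is not how Ming is actually implemented; it only shows how hanging the query methods off a single m attribute leaves dict.get untouched.

```python
class Manager(object):
    """Hypothetical query namespace bound to one model class."""

    def __init__(self, cls, store):
        self.cls = cls
        self.store = store  # in-memory stand-in for a MongoDB collection

    def get(self, _id):
        # Look a document up by _id and wrap it in the model class.
        doc = self.store.get(_id)
        return None if doc is None else self.cls(doc)


class ManagerDescriptor(object):
    """Exposes a Manager as the class-level attribute `m`."""

    def __init__(self, store):
        self.store = store

    def __get__(self, instance, owner):
        return Manager(owner, self.store)


STORE = {
    'ming-declarative': {
        '_id': 'ming-declarative',
        'title': 'Ming Declarative Schema Definitions',
    },
}


class Post(dict):
    m = ManagerDescriptor(STORE)  # the only name added to the class


post = Post.m.get('ming-declarative')    # query, scoped under .m.
title = post.get('title')                # plain dict.get on the result
subtitle = post.get('subtitle', 'n/a')   # dict.get default still works
```

Had the query method been defined directly as Post.get, the last two lines would have tried to look up documents by _id instead of reading keys from the post.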

So maybe that doesn't necessarily justify the design of Ming in your mind, but hopefully it does explain it. And of course, Ming is still being actively developed both by SourceForge and non-SourceForge employees, and we're more than happy to hear feedback and accept pull requests. So what do you think? Do you like the declarative syntax better? Still need convincing that anything on top of pymongo isn't overkill? Let me know in the comments below!

3 comments:

  1. There are actually a few other attributes besides `m` that we might want to consider moving into the manager: from_bson, make, make_safe. I remembered make, since it tripped me up once and is documented at http://merciless.sourceforge.net/tour.html#using-ming-objects-to-represent-mongo-records. Found the rest via `set(dir(WikiPage))-set(dir(dict))`

  2. Nico Poppelier, 7:15 AM

    In my first experiments with Ming I ran into two problems. The first one was a conflict with another Python package called Ming, which is used to generate SWF files. I installed this with apt-get on my Ubuntu-based system thinking I was installing the Ming discussed here.

    After uninstalling SWF-Ming, and installing the MongoDB-Ming I got an error on the line

    from pymongo.son import SON

    in session.py. When I replaced this with

    from bson.son import SON

    I could continue my first test. Is there some version dependency between Ming and the Python driver for MongoDB?

    Replies
    1. Hi Nico,

      Thanks for the comment! I believe that the MongoDB-Ming package may be out of date -- I don't know who's responsible for doing that packaging work. The most up-to-date version of Ming is always available from sf.net/p/merciless

      -Rick