Monday, December 07, 2009

Ming 0.1 Released - Python Library for MongoDB

One of the things that's been nice about working with SourceForge for the last few months is the chance I get to work with new open source technology and contribute something back. Well, the first (of what I hope will be a long series) of projects we're putting out there as the result of recent work is a little library we wrote called Ming. It's been available via git for a while, but I just finished publishing our first "official" release, with a tutorial, over at http://merciless.sourceforge.net and http://pypi.python.org/pypi/Ming/0.1. Ming gives you a way of enforcing schemas on a MongoDB database in a Python application. We also throw in automatic, lazy schema migration for good measure.

If you haven't heard of it yet, MongoDB describes itself on its website as follows:
MongoDB (from "humongous") is a scalable, high-performance, open source, schema-free, document-oriented database.

There are lots of trade-offs to consider when using a non-relational database like MongoDB. On the plus side, MongoDB is simple to develop for (no SQL), extremely fast, and models hierarchical relationships really nicely. On the downside, we give up relational integrity constraints and transactional behavior.
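
To make that concrete, here's roughly what storing and querying a nested document looks like with plain pymongo, the Python driver Ming builds on (a quick sketch, not Ming itself; it assumes a mongod running locally on the default port):

from pymongo import Connection  # the MongoDB Python driver

db = Connection('localhost', 27017).tutorial

# tags and categories live right inside the page document --
# no join tables, no foreign keys
db.wiki_page.insert(dict(
    title='FirstPage',
    text='This is my first page',
    metadata=dict(tags=['mongodb', 'python'],
                  categories=['example'])))

# dot notation reaches into the nested structure
page = db.wiki_page.find_one({'metadata.tags': 'mongodb'})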

With all the flexibility provided by MongoDB, however, comes some danger. When the database is schema-free, you have to spend extra code making sure that the data you put in and the data you take out is valid for your application. That's where Ming comes in. Named as a play on Ming the Merciless, who ruled the planet Mongo with an iron fist in Flash Gordon, the Ming library provides a succinct way to specify the requirements your application has for the data it produces and consumes. Ming also supports lazy migration of documents across schema revisions. To give you a flavor of how easy this can be, here is a sample schema with migration from the tutorial:

from ming.datastore import DataStore
from ming import Session
from ming import Document, Field, schema

# bind a session to the 'tutorial' database on a local mongod
bind = DataStore('mongo://localhost:27017/tutorial')
session = Session(bind)

# the original, un-versioned schema
class OldWikiPage(Document):

    class __mongometa__:
        session = session
        name = 'wiki_page'

    _id = Field(schema.ObjectId)
    title = Field(str)
    text = Field(str, if_missing='')
    metadata = Field(dict(
        tags=[str],
        categories=[str]))

# the revised schema; old documents are migrated lazily via migrate()
class WikiPage(Document):

    class __mongometa__:
        session = session
        name = 'wiki_page'
        version_of = OldWikiPage
        def migrate(data):
            # hoist tags and categories out of the metadata sub-document
            result = dict(
                data,
                tags=data['metadata']['tags'],
                categories=data['metadata']['categories'],
                version=1)
            del result['metadata']
            return result

    _id = Field(schema.ObjectId)
    version = Field(1)
    title = Field(str)
    text = Field(str, if_missing='')
    tags = Field([str])
    categories = Field([str])


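To round out the example, here's roughly what working with these models looks like (a sketch along the lines of the tutorial; the values are made up):

# create and save a page under the new schema
pg = WikiPage(dict(title='FirstPage', text='This is my first page',
                   tags=['example'], categories=[]))
pg.m.save()

# documents stored under the old schema are migrated lazily: when Ming
# loads one that doesn't validate against WikiPage, it runs it through
# __mongometa__.migrate and hands back the updated version
pg = WikiPage.m.get(title='FirstPage')
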
Hopefully, that's enough to whet your appetite for now. Rather than duplicating the entire tutorial, I'll direct you to the docs at http://merciless.sourceforge.net for more information. Let me know what you think!

13 comments:

  1. Thanks to you and Mark and SF.net for getting this out there. The TurboGears team and I have been hard at work building a CMS based on this technology. It can be seen at:

    http://bitbucket.org/percious/c5t/

    cheers.
    -chris

  2. This is a really cool looking project - thanks for releasing it.

    @Chris: c5t looks really great too

    @both: I'd like to add these projects to this page:
    http://api.mongodb.org/python/1.1.2%2B/tools.html

    If you want some control over what's there just fork and add your projects and I'll merge it back in.

  3. FYI: there is another library called Ming that deals with Flash SWF files

  4. This comment has been removed by a blog administrator.

  5. Looks pretty interesting. Certainly way better than my homebrew mongo library. I'm currently using formencode to convert/validate incoming user data, then saving to the db (no schema enforcement at the db level).

    Is schema enforcement and migration the primary goal of this project, or do you see it taking on other aspects as well (document relationships, enhanced querying)? I know it's only 0.1, but I'm just curious as to where it's headed :)

    I'll download it and try it out. May be able to replace my current solution without too much hassle.

  6. @jdog - Thanks for the kind words! The initial release of Ming was focused on schema enforcement and migration. 0.2 (which you can get now from git) is focused on a higher-level "ORM-style" mapping. Right now, we have document relationships (read-only) implemented and will probably have those read-write before the 0.2 release. Let me know what you think of the library! And we're always happy to take contributions. :-)

  7. @rick
    Great to hear the ORM mapping will be added. It looks like you've got a good deal of experience with ORMs too, so I'm sure it'll be nice.

    Let me ask one question here: is there a way I can vary the collection name at runtime for a given document? For example, based on the current subdomain of the HTTP request, I need to vary the mongo collection where documents are stored/retrieved. (I realize this is probably a rare requirement, so I may need to hack a solution.)

    It appears I could set __mongometa__.name to a custom descriptor, and have that descriptor return my collection name based on the current request. Recommendations?


    I'll send any other feedback your way. Thanks and keep up the great work!

  8. @jdog
    I think the idea of setting __mongometa__.name to a custom descriptor makes a lot of sense. You might, in fact, want to use a regular object for __mongometa__ rather than the inner class I use in the docs; that might work a little better for your use case. Just make sure it has the attributes you need.
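
    Something along these lines is what I have in mind (just a sketch, untested; it assumes Ming looks up __mongometa__.name whenever it needs the collection, and get_current_subdomain() stands in for however your framework exposes the current request):

        class PerRequestName(object):
            # descriptor: computes the collection name each time it is read
            def __get__(self, obj, owner):
                # get_current_subdomain() is hypothetical -- wire it up to
                # your request object
                return 'wiki_page_%s' % get_current_subdomain()

        class WikiPage(Document):
            class __mongometa__:
                session = session
                name = PerRequestName()
            # ... fields as before ...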

  9. This comment has been removed by a blog administrator.

  10. Rick,
    Me again! I've been using ming a bit now and really like it. But, I've run into something and I'm wondering if you have any recommendations.

    I'd like to programmatically get a list of all Fields that are 'reportable' in order to implement a reporting-like feature.

    name = Field(str, reportable=True)

    In addition, nested document fields may or may not be reportable.

    So, it seems like one option would be to subclass SchemaItem (and DocumentMeta I suppose) and have it keep *args and **kwargs so they can be referenced later. That is, schema_item.kwargs.get('reportable')

    With this, I can iterate the schema and find the reportable fields (or any other 'extra' attributes). good/bad/other?

  11. This comment has been removed by a blog administrator.

  12. This comment has been removed by a blog administrator.

  13. This comment has been removed by a blog administrator.
