Just a little Python: Schema Maintenance with Ming and MongoDB

Continuing on in my series on MongoDB and Python, this article introduces the Python MongoDB toolkit Ming and what it can do to simplify your MongoDB code and ease maintenance. If you're just getting started with MongoDB, you might want to read the previous articles in the series first.

And now that you're all caught up, let's jump right in with Ming....

Why Ming?

If you've come to MongoDB from the world of relational databases, you have probably been struck by just how easy everything is: no big object/relational mapper needed, no new query language to learn (well, maybe a little, but we'll gloss over that for now), everything is just Python dictionaries, and it's so, so fast! While this is all true to some extent, one of the big things you give up with MongoDB is structure.

MongoDB is sometimes referred to as a schema-free database. (This is not technically true; I find it more useful to think of MongoDB as having dynamically typed documents. The collection doesn't tell you anything about the type of documents it contains, but each individual document can be inspected.) While this can be nice, as it's easy to evolve your schema quickly in development, it's easy to get yourself in trouble the first time your application tries to query by a field that only exists in some of your documents.

The fact of the matter is that even if the database cares nothing about your schema, your application does, and if you play too fast and loose with document structure, it will come back to haunt you in the end. The main reason Ming was created at SourceForge was to deal with just this problem. We wanted a (thin) layer on top of pymongo that would do a couple of things for us:

Make sure that we don't put malformed data into the database
Try to 'fix' malformed data coming back from the database

So, without belaboring the point of its existence, let's jump into Ming.

Defining your schema

When using Ming, the first thing you need to do is to tell it what your documents look like. For this, Ming provides the collection function.

from datetime import datetime

from ming import collection, Field, Session
from ming import schema as S

session = Session()
MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    ...)

There are a few things to note above:

The MongoDB collection name is passed as the first argument to collection
The Session object is used to abstract away the pymongo connection. We will see how to configure it below.
Each field in our schema gets its own Field definition. Fields contain a name, a schema item (S.ObjectId, str, and datetime in this example), and optional arguments that affect the field.
The special if_missing keyword argument allows you to supply default arguments which will be 'filled in' by Ming. If you pass a function, as above, the function will be called to generate a default value.
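
To make the if_missing behavior concrete, here is a rough sketch of the idea in plain Python. It is not Ming's implementation: SketchField and fill_defaults are hypothetical names invented for this illustration, and I'm assuming that a field with no if_missing simply defaults to None.

```python
from datetime import datetime, timezone


class SketchField:
    """Hypothetical stand-in for a field definition; not Ming's Field class."""

    def __init__(self, name, type_, if_missing=None):
        self.name = name
        self.type_ = type_
        self.if_missing = if_missing

    def default(self):
        # A callable default is invoked on every use, so each document
        # gets a fresh value instead of one shared at definition time.
        if callable(self.if_missing):
            return self.if_missing()
        return self.if_missing


def fill_defaults(doc, fields):
    """Return a copy of doc with any missing fields filled from their defaults."""
    result = dict(doc)
    for field in fields:
        if field.name not in result:
            result[field.name] = field.default()
    return result


fields = [
    SketchField('username', str),
    SketchField('created', datetime,
                if_missing=lambda: datetime.now(timezone.utc)),
]
filled = fill_defaults({'username': 'rick'}, fields)
```

The point of passing a function rather than a value is visible in default(): a timestamp computed once at import time would be wrong for every document created afterwards.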

Schema items bear a bit more explanation. Ming internally always works with objects from the ming.schema module, but it also provides shortcuts to ease schema definitions. The translation between shortcut and ming.schema.SchemaItem appears below:

shorthand	SchemaItem	Notes
`None`	`Anything`
`int`	`Int`
`str`	`String`	Unicode
`float`	`Float`
`bool`	`Bool`
`datetime`	`DateTime`
`[]`	`Array(Anything())`	Any valid array
`[int]`	`Array(Int())`
`{str:None}`	`Object({str:None})`	Any valid object
`{"a": int}`	`Object({"a": int})`	Embedded schema
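
If it helps to see the translation as code, the following toy normalizer walks a shorthand value and returns the name of the schema item from the table. It is only a sketch under my own assumptions: normalize is a hypothetical function, and it returns plain strings and tuples rather than real ming.schema objects.

```python
from datetime import datetime

# Shorthand-to-name mapping copied from the table; the values are just labels.
_SCALARS = {
    None: 'Anything',
    int: 'Int',
    str: 'String',
    float: 'Float',
    bool: 'Bool',
    datetime: 'DateTime',
}


def normalize(shorthand):
    """Describe a shorthand schema as nested ('Array'/'Object', ...) tuples."""
    if isinstance(shorthand, list):
        # An empty list means "an array of anything".
        if not shorthand:
            return ('Array', 'Anything')
        return ('Array', normalize(shorthand[0]))
    if isinstance(shorthand, dict):
        return ('Object', {key: normalize(value)
                           for key, value in shorthand.items()})
    return _SCALARS[shorthand]
```

For instance, normalize([int]) gives ('Array', 'Int'), and a nested dictionary is described recursively, which is what lets shorthand definitions nest to any depth.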

Note above that we can create complex schemas using Ming. A blog post might have the following definition, for example:

BlogPost = collection(
   'blog.post', session,
   Field('_id', S.ObjectId),
   Field('posted', datetime, if_missing=datetime.utcnow),
   Field('title', str),
   Field('author', dict(
       username=str,
       display_name=str)),
   Field('text', str),
   Field('comments', [
       dict(
           author=dict(
               username=str,
               display_name=str),
           posted=S.DateTime(if_missing=datetime.utcnow),
           text=str) ]))

Note in the schema above that author is an embedded document, and comments is an embedded array of documents.
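
To give a feel for what checking a nested document against such a schema involves, here is a minimal recursive validator written against the shorthand notation. It is a sketch, not Ming's validation code: validate is a hypothetical function, and the rules it applies (missing scalars become None, unknown fields are rejected) are assumptions I made for the illustration.

```python
from datetime import datetime


def validate(schema, value, path='doc'):
    """Check value against a shorthand schema; raise TypeError on a mismatch."""
    if schema is None:
        return value  # None means "anything goes"
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            raise TypeError(f'{path}: expected an embedded document')
        unknown = set(value) - set(schema)
        if unknown:
            raise TypeError(f'{path}: unexpected fields {sorted(unknown)}')
        return {key: validate(sub, value.get(key), f'{path}.{key}')
                for key, sub in schema.items()}
    if isinstance(schema, list):
        if not isinstance(value, list):
            raise TypeError(f'{path}: expected an array')
        item_schema = schema[0] if schema else None
        return [validate(item_schema, item, f'{path}[{i}]')
                for i, item in enumerate(value)]
    # Assumption of this sketch: a missing scalar is allowed and stays None.
    if value is not None and not isinstance(value, schema):
        raise TypeError(f'{path}: expected {schema.__name__}')
    return value


author = dict(username=str, display_name=str)
post_schema = dict(
    title=str,
    author=author,
    comments=[dict(author=author, posted=datetime, text=str)])

good = {
    'title': 'Hello',
    'author': {'username': 'rick', 'display_name': 'Rick'},
    'comments': [
        {'author': {'username': 'jenny', 'display_name': 'Jenny'},
         'text': 'Nice post'}],
}
cleaned = validate(post_schema, good)
```

Passing a post whose title is an integer raises a TypeError that names the offending path, which is the kind of early failure you want before malformed data reaches the database.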

Indexing

If we expected to do a lot of queries on user.username, we could add an index simply by updating the code above to read:

    ...
    Field('username', str, index=True)
    ...

Creating the indexes in the schema like this has the nice property that Ming will ensure that those indexes exist the first time it touches the database. We can also set a unique index on a field by using the unique optional argument:

    ...
    Field('username', str, unique=True)
    ...

Ming also supports specifying compound indexes by using the Index object in the collection definition. Suppose we wished to keep a separate list of users, scoped by client_id. In this case, the schema might look more like the following:

from datetime import datetime

from ming import collection, Field, Index, Session
from ming import schema as S

session = Session()
MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('client_id', S.ObjectId, if_missing=None),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    Index('client_id', 'username', unique=True),
    ...)

In the example above, the index would be created as follows:

db.user.ensure_index([('client_id', 1), ('username', 1)], unique=True)

By default, each key in an index created by Ming is sorted in ascending order. If you want to change this, you can explicitly specify the sort order for the index:

    ...
    Index(('client_id', -1), ('username', 1), unique=True)
    ...
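
The two ways of writing an index (bare field names, or (name, direction) pairs) both boil down to the list of key/direction tuples that pymongo expects. Here is a small sketch of that normalization; index_keys is a hypothetical helper of my own, not a function Ming exposes.

```python
ASCENDING, DESCENDING = 1, -1  # the same values pymongo uses for sort order


def index_keys(*fields):
    """Turn field names or (name, direction) pairs into a pymongo-style key list."""
    keys = []
    for field in fields:
        if isinstance(field, str):
            # A bare name falls back to ascending order.
            keys.append((field, ASCENDING))
        else:
            name, direction = field
            keys.append((name, direction))
    return keys


default_order = index_keys('client_id', 'username')
explicit_order = index_keys(('client_id', DESCENDING), ('username', ASCENDING))
```

Here default_order is [('client_id', 1), ('username', 1)], matching the ensure_index call shown earlier, while explicit_order keeps the descending client_id.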

Connection and configuration

Once we've defined our schema, we can use it by binding the session to the appropriate MongoDB database using ming.datastore:

from ming import datastore

session.bind = datastore.DataStore(
    'mongodb://localhost:27017', database='test')

More typically, we will create our session as a named session and bind it somewhere else in our application (perhaps in our startup script):

import ming

session = ming.Session.by_name('test')

...

ming.config.configure_from_nested_dict(dict(
    test=dict(
        master='mongodb://localhost:27017', 
        database='test')
    ))

By using named sessions, you can decouple your schema definition code from the actual configuration of your database connection. This is often useful when you will be reading connection information from a configuration file, for instance.
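
The decoupling works because a name lookup always hands back the same session object, so whoever configures it later affects every module that already holds a reference. The sketch below shows that registry pattern in plain Python. SketchSession and configure are hypothetical names, the bind here is just a dictionary of options, and none of this is Ming's actual code.

```python
class SketchSession:
    """Hypothetical named-session registry; not Ming's Session class."""

    _registry = {}

    def __init__(self, bind=None):
        self.bind = bind

    @classmethod
    def by_name(cls, name):
        # Create the session on first request, then always return the same one.
        if name not in cls._registry:
            cls._registry[name] = cls()
        return cls._registry[name]


def configure(settings):
    """Bind each named session to its connection options."""
    for name, options in settings.items():
        SketchSession.by_name(name).bind = options


# Schema module: grabs the session before any configuration exists.
session = SketchSession.by_name('test')

# Startup script: binds it later, for example from a config file.
configure(dict(
    test=dict(master='mongodb://localhost:27017', database='test')))
```

After configure runs, session.bind holds the connection options even though the schema module never saw them.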

Querying and updating

To show how Ming supports querying and updating, let's go back to our simple User schema above:

from datetime import datetime

from ming import collection, Field, Index, Session
from ming import schema as S

session = Session()
MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('client_id', S.ObjectId, if_missing=None),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    Index('client_id', 'username', unique=True),
    ...)

Now let's insert some data:

>>> import pymongo
>>> conn = pymongo.Connection()
>>> db = conn.test
>>> db.user.insert([
...     dict(username='rick'),
...     dict(username='jenny'),
...     dict(username='mark')])
[ObjectId('4fd24c96fb72f08265000000'), 
 ObjectId('4fd24c96fb72f08265000001'), 
 ObjectId('4fd24c96fb72f08265000002')]

To get the data back out, we simply use the collection's manager property m:

>>> MyDoc.m.find().all()
[{'username': u'rick', 
  '_id': ObjectId('4fd24c96fb72f08265000000'), 
  'client_id': None, 
  'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522073)}, 
 {'username': u'jenny', 
  '_id': ObjectId('4fd24c96fb72f08265000001'), 
  'client_id': None, 
  'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522195)}, 
 {'username': u'mark', 
  '_id': ObjectId('4fd24c96fb72f08265000002'), 
  'client_id': None, 
  'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522315)}]

Notice how Ming has filled in the values we omitted when creating the user documents. In this case, it's actually filling them in as they are returned from the database. We can drop down to the pymongo layer to see this by using the m.collection property on MyDoc:

>>> list(MyDoc.m.collection.find())
[{u'username': u'rick', 
  u'_id': ObjectId('4fd24c96fb72f08265000000')}, 
 {u'username': u'jenny', 
  u'_id': ObjectId('4fd24c96fb72f08265000001')}, 
 {u'username': u'mark', 
  u'_id': ObjectId('4fd24c96fb72f08265000002')}]

Now let's remove the documents we created and create some using Ming:

>>> MyDoc.m.remove()
>>> 
>>> MyDoc(dict(username='rick')).m.insert()
>>> MyDoc(dict(username='jenny')).m.insert()
>>> MyDoc(dict(username='mark')).m.insert()
>>> 
>>> MyDoc.m.collection.find_one()
{u'username': u'rick', 
 u'_id': ObjectId('4fd24f95fb72f08265000003'), 
 u'client_id': None, 
 u'created': datetime.datetime(2012, 6, 8, 19, 16, 37, 565000)}

Note that when we created the documents using Ming, we see the default values stored in the database.

Another thing to note above is that when we inserted the new documents, we didn't have to specify the collection. Ming documents are actually dict subclasses, but they "remember" where they came from. To update a document, all we need to do is to call .m.save() on the document:

>>> doc = MyDoc.m.get(username='rick')
>>> import bson
>>> doc.client_id=bson.ObjectId()
>>> doc.username
u'rick'
>>> doc.client_id
ObjectId('4fd250bdfb72f08265000006')
>>> doc.m.save()

If you'd prefer to use MongoDB's atomic updates, you can use the manager method update_partial:

>>> MyDoc.m.update_partial(
...     dict(username='rick'), 
...     {'$set': { 'client_id': None}})
{u'updatedExisting': True, u'connectionId': 232, 
 u'ok': 1.0, u'err': None, u'n': 1}

More to come

There's a lot more to Ming, which I'll cover in future articles, including data polymorphism, eager and lazy data migration, GridFS support, and an object-document mapper providing object-relational type capabilities.

So what do you think? Is Ming something that you would use for your projects? Have you chosen one of the other MongoDB mappers? Please let me know in the comments below.

Other announcements

If you're looking for MongoDB and Python training classes, please sign up to hear about it when I start offering them, and to get a 25% discount on registration. And if you happen to be attending the SouthEast LinuxFest, I'd love it if you'd drop by my talk on building your first MongoDB application on Saturday morning at 11:30.

5 comments:

Big 40wt Svetlyak, 6:55 AM
Hi, everybody like to write his own object mapper for MongoDB. I wrote mine 3 years ago: https://github.com/svetlyak40wt/pymongo-bongo but currently it is abandoned, because pymongo is good enough.

I never used Ming, yet, but some of it's sintax looks ugly for me.

For example, why don't use `MyDoc.get` instead of `MyDoc.m.get`? And why not `MyDoc(username='rick').insert()` instead of `MyDoc(dict(username='rick')).m.insert()`?

Friday, June 08, 2012
