Continuing on in my series on MongoDB and Python, this article introduces the Python MongoDB toolkit Ming and what it can do to simplify your MongoDB code and ease maintenance. If you're just getting started with MongoDB, you might want to read the previous articles in the series first:
- Getting Started with MongoDB and Python
- Moving Along With PyMongo
- GridFS: The MongoDB Filesystem
- Aggregation in MongoDB (Part 1)
- MongoDB's New Aggregation Framework
Now that you're all caught up, let's jump right in with Ming...
Why Ming?
If you've come to MongoDB from the world of relational databases, you have probably been struck by just how easy everything is: no big object/relational mapper needed, no new query language to learn (well, maybe a little, but we'll gloss over that for now), everything is just Python dictionaries, and it's so, so fast! While this is all true to some extent, one of the big things you give up with MongoDB is structure.
MongoDB is sometimes referred to as a schema-free database. (This is not technically true; I find it more useful to think of MongoDB as having dynamically typed documents. The collection doesn't tell you anything about the type of documents it contains, but each individual document can be inspected.) This flexibility is convenient, since it lets you evolve your schema quickly during development, but it can also get you into trouble the first time your application queries by a field that exists in only some of your documents.
The fact of the matter is that even if the database cares nothing about your schema, your application does, and if you play too fast and loose with document structure, it will come back to haunt you in the end. The main reason Ming was created at SourceForge was to deal with just this problem. We wanted a (thin) layer on top of pymongo that would do a couple of things for us:
- Make sure that we don't put malformed data into the database
- Try to 'fix' malformed data coming back from the database
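To make those two goals concrete, here is a toy sketch in plain Python of a layer that reports problems with a document before it is written and patches missing fields in a document after it is read. This is not Ming's implementation; `SCHEMA`, `REQUIRED`, `problems`, and `fix_on_read` are hypothetical names used only for illustration.

```python
from datetime import datetime, timezone

REQUIRED = object()  # sentinel: the field has no default and must be supplied

# Hypothetical schema: field name -> (expected type, default value or factory)
SCHEMA = {
    "username": (str, REQUIRED),
    "created": (datetime, lambda: datetime.now(timezone.utc)),
}


def problems(doc):
    """Goal 1: list what is wrong with a document before it is written."""
    errors = []
    for name, (expected, default) in SCHEMA.items():
        if name not in doc:
            if default is REQUIRED:
                errors.append(f"missing required field {name!r}")
        elif not isinstance(doc[name], expected):
            errors.append(f"{name!r} should be {expected.__name__}")
    return errors


def fix_on_read(doc):
    """Goal 2: return a copy of a stored document with missing fields filled in."""
    fixed = dict(doc)
    for name, (_expected, default) in SCHEMA.items():
        if name not in fixed and default is not REQUIRED:
            fixed[name] = default() if callable(default) else default
    return fixed
```

A write path would refuse any document for which `problems(doc)` is non-empty, while a read path would pass every stored document through `fix_on_read`.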
So, without belaboring the point of its existence, let's jump into Ming.
Defining your schema
When using Ming, the first thing you need to do is to tell it what your documents look like. For this, Ming provides the `collection` function:
```python
from datetime import datetime

from ming import collection, Field, Session
from ming import schema as S

session = Session()

MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    # ... further fields elided ...
)
```
There are a few things to note above:
- The MongoDB collection name is passed as the first argument to `collection`.
- The `Session` object is used to abstract away the `pymongo` connection. We will see how to configure it below.
- Each field in our schema gets its own `Field` definition. `Field`s contain a name, a schema item (`S.ObjectId`, `str`, and `datetime` in this example), and optional arguments that affect the field.
- The special `if_missing` keyword argument lets you supply default values, which Ming will "fill in" for you. If you pass a function, as above, the function will be called to generate a default value.
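Because `if_missing` accepts either a value or a callable, it matters whether you write `if_missing=datetime.utcnow` (the function itself) or `if_missing=datetime.utcnow()` (a value computed once, when the schema module is imported). The toy below illustrates the difference in plain Python; `fill_missing` and the counter-based `fake_now` clock are hypothetical stand-ins, not part of Ming.

```python
import itertools

# Hypothetical clock: each call returns the next integer "timestamp".
_ticks = itertools.count(1)


def fake_now():
    return next(_ticks)


def fill_missing(doc, field, if_missing):
    """Mimic the if_missing idea: call it if it is callable, else use it as-is."""
    if field not in doc:
        doc[field] = if_missing() if callable(if_missing) else if_missing
    return doc


# Passing the function: it is evaluated once per document.
a = fill_missing({}, "created", fake_now)
b = fill_missing({}, "created", fake_now)

# Passing a value: it was evaluated once, when the "schema" was defined.
frozen = fake_now()
c = fill_missing({}, "created", frozen)
d = fill_missing({}, "created", frozen)
```

Here `a` and `b` get different timestamps, while `c` and `d` share the single value captured up front, which is rarely what you want for a creation time.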
Schema items bear a bit more explanation. Internally, Ming always works with objects from the `ming.schema` module, but it also provides shortcuts to ease schema definitions. The translation between each shorthand and its `ming.schema.SchemaItem` appears below:
| Shorthand | SchemaItem | Notes |
|---|---|---|
| `None` | `Anything` | |
| `int` | `Int` | |
| `str` | `String` | Unicode |
| `float` | `Float` | |
| `bool` | `Bool` | |
| `datetime` | `DateTime` | |
| `[]` | `Array(Anything())` | Any valid array |
| `[int]` | `Array(Int())` | |
| `{str:None}` | `Object({str:None})` | Any valid object |
| `{"a": int}` | `Object({"a": int})` | Embedded schema |
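To make the table concrete, here is a toy recursive checker in plain Python that interprets the same shorthands: `None`, a type, a list, or a dict. `conforms` is a hypothetical helper for illustration only; it is not how Ming validates internally, and it ignores details such as `if_missing` and the Unicode handling of `str`.

```python
from datetime import datetime


def conforms(value, spec):
    """Toy check of a value against a Ming-style shorthand spec (hypothetical)."""
    if spec is None:                       # None -> Anything
        return True
    if isinstance(spec, list):             # [] or [item_spec] -> Array
        if not isinstance(value, list):
            return False
        item_spec = spec[0] if spec else None
        return all(conforms(v, item_spec) for v in value)
    if isinstance(spec, dict):             # {...} -> Object
        if not isinstance(value, dict):
            return False
        if str in spec:                    # {str: X}: any string keys, values match X
            return all(isinstance(k, str) and conforms(v, spec[str])
                       for k, v in value.items())
        # Embedded schema: a missing key is looked up as None.
        return all(conforms(value.get(k), s) for k, s in spec.items())
    if spec is int:                        # bool is a subclass of int in Python
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, spec)         # str, float, bool, datetime, ...
```

For example, `conforms([1, 2], [int])` is true, while `conforms([1, "x"], [int])` is false.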
As the table suggests, Ming lets us build complex, nested schemas. A blog post might have the following definition, for example:
```python
BlogPost = collection(
    'blog.post', session,
    Field('_id', S.ObjectId),
    Field('posted', datetime, if_missing=datetime.utcnow),
    Field('title', str),
    Field('author', dict(
        username=str,
        display_name=str)),
    Field('text', str),
    Field('comments', [
        dict(
            author=dict(
                username=str,
                display_name=str),
            posted=S.DateTime(if_missing=datetime.utcnow),
            text=str)
    ]))
```
Note in the schema above that `author` is an embedded document, and `comments` is an embedded array of documents.
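For reference, here is a plain-Python sketch of a document shaped like `BlogPost`, with `author` as an embedded document and `comments` as an array of embedded documents. The helper `fill_post_defaults` and the sample values are hypothetical; the helper simply applies, by hand, the two `posted` defaults the schema declares (one on the post, one on each comment), on the assumption that Ming applies nested defaults the same way it applies top-level ones.

```python
from datetime import datetime, timezone


def utcnow():
    return datetime.now(timezone.utc)


def fill_post_defaults(post):
    """Toy sketch: apply the 'posted' default at both nesting levels."""
    if "posted" not in post:
        post["posted"] = utcnow()
    for comment in post.get("comments", []):
        if "posted" not in comment:
            comment["posted"] = utcnow()
    return post


post = fill_post_defaults({
    "title": "Hello, Ming",
    "author": {"username": "rick", "display_name": "Rick"},
    "text": "First post.",
    "comments": [
        {"author": {"username": "jenny", "display_name": "Jenny"},
         "text": "Welcome!"},
    ],
})
```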
Indexing
If we expected to do a lot of queries on `user.username`, we could add an index simply by updating the code above to read:
```python
    ...
    Field('username', str, index=True),
    ...
```
Creating the indexes in the schema like this has the nice property that Ming will ensure that those indexes exist the first time it touches the database. We can also set a unique index on a field by using the `unique` optional argument:
```python
    ...
    Field('username', str, unique=True),
    ...
```
Ming also supports compound indexes through the `Index` object in the `collection` definition. Suppose we wished to keep a separate list of users for each client, scoped by `client_id`. In this case, the schema might look more like the following:
```python
from datetime import datetime

from ming import collection, Field, Index, Session
from ming import schema as S

session = Session()

MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('client_id', S.ObjectId, if_missing=None),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    Index('client_id', 'username', unique=True),
    # ... further fields elided ...
)
```
In the example above, the index would be created as follows:
```python
db.user.ensure_index([('client_id', 1), ('username', 1)], unique=True)
```
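A compound unique index constrains the combination of values rather than each field on its own: two users may share a username as long as their `client_id` differs. The toy check below models that rule in plain Python. `unique_key` and `violates_unique` are hypothetical helpers, and the string client IDs are placeholders for the ObjectIds the real schema uses. A missing field is treated as `None`, mirroring how MongoDB indexes a missing field as null in a non-sparse index.

```python
def unique_key(doc, fields):
    # Missing fields count as None, like null entries in a MongoDB index.
    return tuple(doc.get(f) for f in fields)


def violates_unique(existing, new_doc, fields=("client_id", "username")):
    """True if new_doc would collide with an existing document on the compound key."""
    key = unique_key(new_doc, fields)
    return any(unique_key(d, fields) == key for d in existing)


existing = [{"client_id": "acme", "username": "rick"}]
```

With this data, a second `rick` under client `globex` is accepted, while a second `rick` under `acme` is rejected.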
By default, each key in an index created by Ming is sorted in ascending order. If you want to change this, you can explicitly specify the sort order for the index:
```python
    ...
    Index(('client_id', -1), ('username', 1), unique=True),
    ...
```
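The rule described above, where a bare field name means ascending and a `(name, direction)` tuple sets the direction explicitly, can be sketched as a small normalizer that produces the key list `ensure_index` expects. `index_keys` is a hypothetical helper, not Ming's code; the two constants have the same values as `pymongo.ASCENDING` (1) and `pymongo.DESCENDING` (-1).

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def index_keys(*fields):
    """Normalize Index-style arguments into a list of (field, direction) pairs."""
    keys = []
    for f in fields:
        if isinstance(f, tuple):
            name, direction = f          # explicit direction
        else:
            name, direction = f, ASCENDING  # default: ascending
        keys.append((name, direction))
    return keys
```

Both `Index` calls shown in this section map cleanly: `index_keys('client_id', 'username')` yields `[('client_id', 1), ('username', 1)]`, and the tuple form keeps the `-1` on `client_id`.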
Connection and configuration
Once we've defined our schema, we can use it by binding the session to the appropriate MongoDB database using `ming.datastore`:
```python
from ming import datastore

session.bind = datastore.DataStore(
    'mongodb://localhost:27017', database='test')
```
More typically, we will create our session as a named session and bind it somewhere else in our application (perhaps in our startup script):
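```python
import ming
import ming.config

session = ming.Session.by_name('test')

# ... elsewhere, e.g. in the application's startup script:
ming.config.configure_from_nested_dict(dict(
    test=dict(
        master='mongodb://localhost:27017',
        database='test')))
```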
By using named sessions, you can decouple your schema definition code from the actual configuration of your database connection. This is often useful when you read connection information from a configuration file, for instance.
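The decoupling works because the schema module and the startup script only need to agree on a name. The toy registry below sketches that pattern in plain Python; `NamedSession` and `configure` are hypothetical and do not reproduce Ming's internals. The binding is just a tuple here, standing in for a real connection.

```python
class NamedSession:
    """Toy registry: a session is looked up by name now and bound to a database later."""

    _registry = {}

    def __init__(self, name):
        self.name = name
        self.bind = None  # filled in by configure()

    @classmethod
    def by_name(cls, name):
        # Always hand back the same object for the same name.
        if name not in cls._registry:
            cls._registry[name] = cls(name)
        return cls._registry[name]


def configure(config):
    """Bind every named session listed in a nested config dict."""
    for name, settings in config.items():
        NamedSession.by_name(name).bind = (settings["master"], settings["database"])


# Schema module: only knows the session's name.
session = NamedSession.by_name("test")
unbound_at_import = session.bind is None

# Startup script: supplies the connection details.
configure({"test": {"master": "mongodb://localhost:27017", "database": "test"}})
```

Because `by_name` returns the same object each time, the session captured by the schema module sees the binding made later by the startup script.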
Querying and updating
To show how Ming supports querying and updating, let's go back to our simple User schema above:
```python
from datetime import datetime

from ming import collection, Field, Index, Session
from ming import schema as S

session = Session()

MyDoc = collection(
    'user', session,
    Field('_id', S.ObjectId),
    Field('client_id', S.ObjectId, if_missing=None),
    Field('username', str),
    Field('created', datetime, if_missing=datetime.utcnow),
    Index('client_id', 'username', unique=True),
    # ... further fields elided ...
)
```
Now let's insert some data:
```python
>>> import pymongo
>>> conn = pymongo.Connection()
>>> db = conn.test
>>> db.user.insert([
...     dict(username='rick'),
...     dict(username='jenny'),
...     dict(username='mark')])
[ObjectId('4fd24c96fb72f08265000000'), ObjectId('4fd24c96fb72f08265000001'), ObjectId('4fd24c96fb72f08265000002')]
```
To get the data back out, we simply use the collection's manager property `m`:
```python
>>> MyDoc.m.find().all()
[{'username': u'rick', '_id': ObjectId('4fd24c96fb72f08265000000'), 'client_id': None, 'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522073)},
 {'username': u'jenny', '_id': ObjectId('4fd24c96fb72f08265000001'), 'client_id': None, 'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522195)},
 {'username': u'mark', '_id': ObjectId('4fd24c96fb72f08265000002'), 'client_id': None, 'created': datetime.datetime(2012, 6, 8, 19, 8, 28, 522315)}]
```
Notice how Ming has filled in the values we omitted when creating the user documents. In this case, it's actually filling them in as they are returned from the database. We can drop down to the `pymongo` layer to see this by using the `m.collection` property on `MyDoc`:
```python
>>> list(MyDoc.m.collection.find())
[{u'username': u'rick', u'_id': ObjectId('4fd24c96fb72f08265000000')},
 {u'username': u'jenny', u'_id': ObjectId('4fd24c96fb72f08265000001')},
 {u'username': u'mark', u'_id': ObjectId('4fd24c96fb72f08265000002')}]
```
Now let's remove the documents we created and create some using Ming:
```python
>>> MyDoc.m.remove()
>>>
>>> MyDoc(dict(username='rick')).m.insert()
>>> MyDoc(dict(username='jenny')).m.insert()
>>> MyDoc(dict(username='mark')).m.insert()
>>>
>>> MyDoc.m.collection.find_one()
{u'username': u'rick', u'_id': ObjectId('4fd24f95fb72f08265000003'), u'client_id': None, u'created': datetime.datetime(2012, 6, 8, 19, 16, 37, 565000)}
```
Note that when we created the documents using Ming, we see the default values stored in the database.
Another thing to note above is that when we inserted the new documents, we didn't have to specify the collection. Ming documents are actually `dict` subclasses, but they "remember" where they came from. To update a document, all we need to do is call `.m.save()` on the document:
```python
>>> doc = MyDoc.m.get(username='rick')
>>> import bson
>>> doc.client_id = bson.ObjectId()
>>> doc.username
u'rick'
>>> doc.client_id
ObjectId('4fd250bdfb72f08265000006')
>>> doc.m.save()
```
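The idea of a `dict` subclass that "remembers" its collection can be sketched in a few lines of plain Python. Everything here is hypothetical (`InMemoryCollection`, `_Manager`, `Document`) and only models the shape of the design: persistence methods live behind an `m` attribute, so the ordinary `dict` methods such as `get` and `update` stay untouched. Unlike Ming, the toy uses item access rather than attribute access for fields.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a MongoDB collection: documents keyed by _id."""

    def __init__(self, name):
        self.name = name
        self.docs = {}


class _Manager:
    """Holds persistence methods so they don't crowd the dict namespace."""

    def __init__(self, doc):
        self._doc = doc

    def save(self):
        # Replace the stored copy wholesale, keyed by _id.
        doc = self._doc
        doc._collection.docs[doc["_id"]] = dict(doc)
        return doc["_id"]


class Document(dict):
    """A dict that remembers which collection it belongs to (toy)."""

    def __init__(self, collection, data):
        super().__init__(data)
        self._collection = collection

    @property
    def m(self):
        return _Manager(self)


users = InMemoryCollection("user")
doc = Document(users, {"_id": 1, "username": "rick"})
doc["client_id"] = 42
doc.m.save()  # no collection argument needed: the document knows where it lives
```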
If you'd prefer to use MongoDB's atomic updates, you can use the manager method `update_partial`:
```python
>>> MyDoc.m.update_partial(
...     dict(username='rick'),
...     {'$set': {'client_id': None}})
{u'updatedExisting': True, u'connectionId': 232, u'ok': 1.0, u'err': None, u'n': 1}
```
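One practical reason to prefer `$set` over saving the whole document is the "lost update": if two clients load the same document and each saves its own full copy, the second save silently overwrites the first client's change. The in-memory sketch below models both approaches in plain Python; `save` and `update_set` are hypothetical helpers that only approximate whole-document replacement and a `$set`-style partial update.

```python
import copy


def save(store, doc):
    """Whole-document replacement: every stored field is overwritten."""
    store[doc["_id"]] = copy.deepcopy(doc)


def update_set(store, query, fields):
    """$set-style partial update: only the listed fields change. Returns match count."""
    for stored in store.values():
        if all(stored.get(k) == v for k, v in query.items()):
            stored.update(fields)
            return 1
    return 0


# Two clients load the same document, then each changes a different field.
store = {1: {"_id": 1, "username": "rick", "client_id": None}}
copy_a = copy.deepcopy(store[1])
copy_b = copy.deepcopy(store[1])

copy_a["client_id"] = "acme"
save(store, copy_a)
copy_b["username"] = "ricky"
save(store, copy_b)  # stale copy_b wipes out client A's client_id change
lost_update = dict(store[1])

# Same scenario with partial updates: both changes survive.
store = {1: {"_id": 1, "username": "rick", "client_id": None}}
update_set(store, {"_id": 1}, {"client_id": "acme"})
update_set(store, {"_id": 1}, {"username": "ricky"})
both_kept = dict(store[1])
```

In a real deployment the server applies each `$set` atomically to the current document, which is what the second half of the toy imitates.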
More to come
There's a lot more to Ming, which I'll cover in future articles, including data polymorphism, eager and lazy data migration, GridFS support, and an object-document mapper providing capabilities similar to those of an object-relational mapper.
So what do you think? Is Ming something that you would use for your projects? Have you chosen one of the other MongoDB mappers? Please let me know in the comments below.
Other announcements
If you're looking for MongoDB and Python training classes, please sign up to hear about it when I start offering them, and to get a 25% discount on registration. And if you happen to be attending the SouthEast LinuxFest, I'd love it if you'd drop by my talk on building your first MongoDB application on Saturday morning at 11:30.
Hi, everybody likes to write their own object mapper for MongoDB. I wrote mine 3 years ago: https://github.com/svetlyak40wt/pymongo-bongo but currently it is abandoned, because pymongo is good enough.
I haven't used Ming yet, but some of its syntax looks ugly to me.
For example, why don't use `MyDoc.get` instead of `MyDoc.m.get`? And why not `MyDoc(username='rick').insert()` instead of `MyDoc(dict(username='rick')).m.insert()`?
Agreed with big 40.
In the world of MongoDB ODMs, Ming is the one with the ugliest language.
We also use our own ODM based on plain pymongo + a validation system.
Thanks for the comments! I had intended to reply earlier, but apparently blogger ate my responses.
The reason that you can't use .get() is because Ming collection objects are subclasses of dict, and I didn't want to hide the dict.get method.
Passing a dict into the constructor rather than keyword arguments was for the purpose of being able to pass other keyword arguments as well, although I agree that it's kind of ugly.
And I do disagree that pymongo is "good enough", as it doesn't provide any support for schema enforcement / documentation, but to each his own I suppose! There is also quite a bit more to Ming that I'll cover in future posts, going far beyond what pymongo provides, but I think that if Ming were nothing but schema validation it'd still be useful.
Thanks again for the comments!
Leandro: do you have some examples of what your ODM code looks like? Ming is constantly evolving and actively maintained, so it might be nice to add some syntactic sugar as we go.
Thanks again!
Ming, being inspired by SQLAlchemy, has two layers; using the lower one gives you the syntax shown in the post.
While the foundation layer is the most flexible one, you would probably more often end up using the ODM layer, which has a declarative syntax and usage similar to SQLAlchemy's ORM + Declarative.
You can take a look at the Ming ODM tutorial for a quick introduction to the ODM layer: http://merciless.sourceforge.net/odm.html
The TurboGears2 ming support also provides a quick overview of the ODM layer: http://www.turbogears.org/2.1/docs/main/Ming.html