Wednesday, January 18, 2012

Getting Started with MongoDB and Python

If you've been following this blog for a while, you've seen me mention MongoDB more than once. One exciting thing for me is that I'll be co-teaching a tutorial at PyCon this year on Python and MongoDB that will cover MongoDB, PyMongo, and Ming. So to hopefully whet your appetite for learning more at the tutorial, I thought I'd write a few posts covering MongoDB, PyMongo, and Ming from a beginner's perspective.

What is MongoDB?

From MongoDB.org:

MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database.

Well, that's not all that enlightening, so I'll expand a bit here on MongoDB's features...

MongoDB is a document database

MongoDB is a document database, which means that instead of storing "rows" in "tables" like you do in a relational database, you store "documents" in "collections." Documents are basically JSON objects (technically BSON. This is to be distinguished from other NoSQL-type databases such as key-value stores (e.g. Tokyo Cabinet), column family stores (e.g. Cassandra) or column stores (e.g. MonetDB).

MongoDB has a flexible query language

This is one thing that makes MongoDB a pleasure to work with, particularly if you come from another NoSQL database where querying is either restrictive (key-value stores which can only be queried by key) or cumbersome (something like CouchDB that requires you to write a map-reduce query). MongoDB has a BSON-based query language that's a bit more restrictive than SQL, that you can still use to get a lot done.

Here's an example of a simple MongoDB query that we use at SourceForge to find all the blog posts for a project:

blog_post.find({'state':'published','app_config_id':{'$in':app_config_ids}})

There are also several other operators like '$lt', '$nin', '$not', and '$or' that allow you to construct quite complex queries, though you are somewhat restricted from what you can do in SQL (even with a single table).

MongoDB is fast and scalable

A single MongoDB node is able to comfortably serve 1000s of requests per second on cheap hardware. When you need to scale beyond that, you can use either replication (keeping several copies of the data on different servers) or sharding (partitioning the data across servers). MongoDB even includes logic to automatically load-balance your shards as your database and load increase.

Getting Started with MongoDB

While MongoDB is fairly straightforward to install on (64-bit) systems, there are also a couple of companies that provide a free tier of MongoDB hosting, MongoLab and MongoHQ that are great for getting started. I've been using, for no particular reason, MongoLab for my own things and I can recommend them, and it's what I have experience with, so that's what I'll cover here.

Let's assume you sign up for a MongoLab account. Once you've done this, you can create a database using their web-based control panel and click on it, you'll note the connection info at the top of the page:

(Your server name and port number may be different.) At this point, most tutorials would tell you to install and launch the 'mongo' command-line tool to begin exploring your database. We'll skip that here and use the python driver PyMongo directly. I like to use virtualenv myself and ipython, so that's the approach I'll take here:

$ virtualenv mongo
... install messages ...
$ source mongo/bin/activate
(mongo) $ pip install pymongo ipython
... install messages ...
(mongo) $ ipython
... banner message ...

Now that we're in ipython, we'll go ahead and connect to the database and create a document.

In [1]: import pymongo

In [2]: conn = pymongo.Connection('mongodb://tutorial-test:u3ZYh136@ds029187.mongolab.com:29187/tutorial-test')

In [3]: db = conn['tutorial-test']

In [4]: db.test_collection.insert({})
Out[4]: ObjectId('4f16f5c7eb03306a92000000')

In [5]: db.test_collection.find()
Out[5]: <pymongo.cursor.Cursor at 0x7fbb9006f350>

In [6]: list(db.test_collection.find())
Out[6]: 
[{u'_id': ObjectId('4f16f5c7eb03306a92000000')}]

Well, that's it for now. I'll be posting several followup articles in this series that will go into more detail on how to do various queries and updates using PyMongo, the MongoDB python driver, as well as how to effectively use Ming, so stay tuned!

7 comments:

  1. FYI I deleted the user and DB I used in the post, so don't go trying any funny business ;-).

    ReplyDelete
  2. Warning to others

    blog_post.find({'state':'published','app_config_id':{'$in':app_config_ids}})

    I just wasted an hour to find this syntax does not work.

    I have a collection "articles" with a field "title" and one document with title
    "Hadoop Development Environment OS X"

    these work

    temp = articles.find({"title": "Hadoop Development Environment OS X"});

    temp = articles.find({"title":{"$in":["Hadoop Development Environment OS X"]}})

    This returns nothing.
    keys = ["Hadoop"]
    temp = articles.find({"title":{"$in":keys}})

    i.e all I can get back os a perfect match not a partial match.

    Either I am misunderstanding, but I worked form an internet example the author said works fine, or there is a problem with the driver.

    I am fairly confident I did not misunderstand the mongo docs.

    ReplyDelete
    Replies
    1. Thanks for the comment. I'm sorry if the example was confusing. The example I used was looking for blog articles with an *exact match* with one of the app_config_ids passed in.

      $in is always looking for an exact match in a list. If you want to find a partial match (such as a prefix), you need to use either $regex or a compiled python regular expression. For instance, if you're trying to find the articles starting with 'Hadoop', the query would be

      articles.find({'title': {'$regex': '^Hadoop.*'}})

      Again, sorry for the confusion. I'll try to be more explicit in the future. Thanks again for commenting.

      Delete
  3. Ah, I worked out the "perfect match" just after posting this. My fault for trying to do things in a rush. Then I spent a while looking for how to use regexes.

    that did the job and got me a bit further on.


    Many Thanks.

    ReplyDelete
  4. Cool rundown, thanks Rick! In case anyone who is learning MongoDB finds it useful, I just launched a free tool called querymongo.com that translates MySQL syntax into MongoDB syntax. Hope someone can use it to get up to speed faster!

    ReplyDelete
    Replies
    1. Cool tool, Bob. Thanks for the comment!

      Delete