Friday, May 25, 2012

GridFS: The MongoDB Filesystem

In some previous posts on MongoDB, Python, and pymongo, I introduced the NoSQL database MongoDB and how you can use it from Python. This post goes beyond the basics of MongoDB and pymongo to give you a taste of MongoDB's take on filesystems: GridFS.

Why a filesystem?

If you've been working with MongoDB for a while, you may have heard about the 16 MB document size limit. When I started using MongoDB (around version 0.8), the limit was actually 4 MB. What this means is that everything works just fine and your system is screaming fast, until you try to create a document that's 4.001 MB, and boom: nothing works any more. For us at SourceForge, that meant we had to restructure our schema and use less embedding.

But what if it's not something that can be restructured? Maybe your site allows users to upload large attachments of unknown size. In such cases you can probably get away with using a Binary field type and crossing your fingers, but a better solution, in my opinion, is to actually store the contents of your upload in a series of documents (let's call them "chunks") of limited size. Then you can tie them all together with another document that specifies all the file metadata.
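
If you were rolling this yourself, the core of it might look something like the sketch below. (This is just an illustration of the idea, not any real API: the my_files and my_chunks collection names and the save_file helper are made up.)

import math
import pymongo
from bson import Binary

CHUNK_SIZE = 256 * 1024   # split payloads into 256k pieces

db = pymongo.Connection().gridfs_test

def save_file(data, filename):
    # one metadata document ties all the chunks together
    file_id = db.my_files.insert(
        {'filename': filename, 'length': len(data), 'chunkSize': CHUNK_SIZE})
    # one document per chunk, numbered so they can be reassembled in order
    num_chunks = int(math.ceil(len(data) / float(CHUNK_SIZE)))
    for n in range(num_chunks):
        chunk = data[n * CHUNK_SIZE:(n + 1) * CHUNK_SIZE]
        db.my_chunks.insert({'files_id': file_id, 'n': n, 'data': Binary(chunk)})
    return file_id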

GridFS to the rescue

Well, that's exactly what GridFS does, but with a nicer API and a few more bells and whistles than you'd probably build on your own. It's important to note that GridFS, implemented in all the MongoDB language drivers, is a convention and an API, not something that's provided natively by the server. As far as the server is concerned, it's all just collections and documents.

The GridFS schema

GridFS actually stores your files in two collections, named fs.files and fs.chunks by default, although you can change the fs prefix to something else if you'd like. The fs.files collection is where reading or writing a file begins. A typical fs.files document looks like the following:

{
  // unique ID for this file
  "_id" : <unspecified>,
  // size of the file in bytes
  "length" : data_number,
  // size of each of the chunks.  Default is 256k
  "chunkSize" : data_number,
  // date when object first stored
  "uploadDate" : data_date,
  // result of running the "filemd5" command on this file's chunks
  "md5" : data_string
}

The fs.chunks collection contains all the data for your files:

{
  // object id of the chunk in the fs.chunks collection
  "_id" : <unspecified>,
  // _id of the corresponding files collection entry
  "files_id" : <unspecified>,
  // chunks are numbered in order, starting with 0
  "n" : chunk_number,
  // the chunk's payload as a BSON binary type
  "data" : data_binary,
}
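
Because GridFS is just a convention, reading a file back amounts to fetching the fs.files document and then streaming the matching fs.chunks documents in ascending n order. Here's a rough sketch of what a driver does under the hood (simplified; real drivers also verify the md5, buffer by chunkSize, and complain about missing chunks):

def read_file(db, file_id):
    # look up the metadata document first
    info = db.fs.files.find_one({'_id': file_id})
    if info is None:
        raise IOError('no file with _id %r' % file_id)
    # then stitch the chunks back together in order
    chunks = db.fs.chunks.find({'files_id': file_id}).sort('n', 1)
    return ''.join(str(chunk['data']) for chunk in chunks)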

In the Python gridfs package (included with the pymongo driver), several other optional fields can be set as well:

filename: This is the 'human' name for the file, which may be path-delimited to simulate directories.
contentType: This is the MIME type of the file.
encoding: This is the Unicode encoding used for text files.

You can also add your own attributes to files. At SourceForge, we used things like project_id or forum_id to allow the same filename to be uploaded to multiple places on the site without worrying about namespace collisions. To keep your code future-proof, you should put any custom attributes inside an embedded metadata document, just in case the gridfs spec expands to incorporate more fields.

Using GridFS

So with all that out of the way, how do you actually use GridFS? It's pretty straightforward. The first thing you need is a reference to a GridFS filesystem:

>>> import pymongo
>>> import gridfs
>>> conn = pymongo.Connection()
>>> db = conn.gridfs_test
>>> fs = gridfs.GridFS(db)

Basic reading and writing

Once you have the filesystem, you can start putting stuff in it:

>>> with fs.new_file() as fp:
...     fp.write('This is my new file. It is teh awezum!')

Let's examine the underlying collections to see what actually happened:

>>> list(db.fs.files.find())
[{u'length': 38,
  u'_id': ObjectId('4fbfa7b9fb72f096bd000000'),
  u'uploadDate': datetime.datetime(2012, 5, 25, 15, 39, 37, 55000),
  u'md5': u'332de5ca08b73218a8777da69293576a',
  u'chunkSize': 262144}]
>>> list(db.fs.chunks.find())
[{u'files_id': ObjectId('4fbfa7b9fb72f096bd000000'),
  u'_id': ObjectId('4fbfa7b9fb72f096bd000001'),
  u'data': Binary('This is my new file. It is teh awezum!', 0),
  u'n': 0}]

You can see that there's really nothing surprising or mysterious happening here; it's just mapping the filesystem metaphor onto MongoDB documents. In this case, our file was small enough that it didn't need to be split into chunks. We can force it to be split by specifying a small chunkSize when creating the file:

>>> with fs.new_file(chunkSize=10) as fp:
...     fp.write('This is file number 2. It should be split into several chunks')
...
>>> fp
<gridfs.grid_file.GridIn object at 0x1010f5950>
>>> fp._id
ObjectId('4fbfa8ddfb72f0971c000000')
>>> list(db.fs.chunks.find(dict(files_id=fp._id)))
[{... u'data': Binary('This is fi', 0), u'n': 0},
 {... u'data': Binary('le number ', 0), u'n': 1},
 {... u'data': Binary('2. It shou', 0), u'n': 2},
 {... u'data': Binary('ld be spli', 0), u'n': 3},
 {... u'data': Binary('t into sev', 0), u'n': 4},
 {... u'data': Binary('eral chunk', 0), u'n': 5},
 {... u'data': Binary('s', 0), u'n': 6}]

Now, if we actually want to read the file back as a file, we'll need to use the gridfs API:

>>> with fs.get(fp._id) as fp_read:
...     print fp_read.read()
...
This is file number 2. It should be split into several chunks

Treating it more like a filesystem

There are several other convenience methods bundled into the GridFS object to give more filesystem-like behavior. For instance, new_file() takes any number of keyword arguments that will get added onto the fs.files document being created:

>>> with fs.new_file(
...     filename='file.txt', 
...     content_type='text/plain', 
...     my_other_attribute=42) as fp:
...     fp.write('New file')
...
>>> fp
<gridfs.grid_file.GridIn object at 0x1010f59d0>
>>> db.fs.files.find_one(dict(_id=fp._id))
{u'contentType': u'text/plain',
 u'chunkSize': 262144,
 u'my_other_attribute': 42,
 u'filename': u'file.txt',
 u'length': 8,
 u'uploadDate': datetime.datetime(2012, 5, 25, 15, 53, 1, 973000),
 u'_id': ObjectId('4fbfaaddfb72f0971c000008'), u'md5':
 u'681e10aecbafd7dd385fa51798ca0fd6'}

Better would be to encapsulate my_other_attribute into the metadata property:

>>> with fs.new_file(
...     filename='file2.txt', 
...     content_type='text/plain', 
...     metadata=dict(my_other_attribute=42)) as fp:
...     fp.write('New file 2')
...
>>> db.fs.files.find_one(dict(_id=fp._id))
{u'contentType': u'text/plain',
 u'chunkSize': 262144,
 u'metadata': {u'my_other_attribute': 42},
 u'filename': u'file2.txt',
 u'length': 10,
 u'uploadDate': datetime.datetime(2012, 5, 25, 15, 54, 5, 67000),
 u'_id':ObjectId('4fbfab1dfb72f0971c00000a'),
 u'md5': u'9e4eea3dec28d8346b52f18086437ac7'}
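
One nice side effect is that those custom attributes are directly queryable in fs.files, which is how we avoided filename collisions between projects at SourceForge. A quick illustration, reusing the file we just created (nothing here is special gridfs API, just a normal find_one plus fs.get):

>>> doc = db.fs.files.find_one(
...     {'filename': 'file2.txt', 'metadata.my_other_attribute': 42})
>>> fs.get(doc['_id']).read()
'New file 2'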

We can also "overwrite" files by filename, but since GridFS actually indexes files by _id, it doesn't get rid of the old file, it just versions it:

>>> with fs.new_file(filename='file.txt', content_type='text/plain') as fp:
...     fp.write('Overwrite the so-called "New file"')
...

Now, if we want to retrieve the file by filename, we can use get_version or get_last_version:

>>> fs.get_last_version('file.txt').read()
'Overwrite the so-called "New file"'
>>> fs.get_version('file.txt', 0).read()
'New file'

Since we've been uploading files with a filename property, we can also list the files in gridfs:

>>> fs.list()
[u'file.txt', u'file2.txt']

We can also remove files, of course:

>>> fp = fs.get_last_version('file.txt')
>>> fs.delete(fp._id)
>>> fs.list()
[u'file.txt', u'file2.txt']
>>> fs.get_last_version('file.txt').read()
'New file'

Note that since only one version of "file.txt" was removed, we still have a file named "file.txt" in the filesystem.

Finally, gridfs also provides convenience methods for determining whether a file exists and for quickly writing a short file into gridfs:

>>> fs.exists(fp._id)
False
>>> fs.exists(filename='file.txt')
True
>>> fs.exists({'filename': 'file.txt'}) # equivalent to above
True
>>> fs.put('The quick brown fox', filename='typingtest.txt')
ObjectId('4fbfad74fb72f0971c00000e')
>>> fs.get_last_version('typingtest.txt').read()
'The quick brown fox'

So that's the whirlwind tour of GridFS. I'd love to hear how you're using GridFS in your project, or if you think it might be a good fit, so please drop me a line in the comments.

20 comments:

  1. Great intro to GridFS, thanks Rick. While I'd heard of GridFS, I'd never paid attention to what it was, how it worked, or how to use it. Your post explains it all very well - cool stuff!

  2. Anonymous: The truth is, GridFS is not production-ready. You will have insane problems, and it will reduce performance on your main collections by 70%.

    Either use a second MongoDB cluster just for GridFS, or just don't :) and use the filesystem.

    1. I'm not sure what exactly you were doing to cause the problems you describe, but your experience certainly doesn't square with mine working at SourceForge. There's no magic to GridFS; reading any file under 256k will cause two document fetches from MongoDB; up to 512k, 3 document fetches, etc. Remember that *all* the gridfs magic (well, except for md5 computation, IIRC) happens in the *client*.
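
      (As a rough formula, with the default 256k chunkSize: reads per file = 1 + ceil(file length / chunkSize), i.e. one fetch for the fs.files document plus one per chunk.)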

      Perhaps if you're constantly writing to GridFS, you might cause problems, but that's due more to MongoDB's global read/write lock that I covered in a previous post.

      If you actually have any test cases that cause performance degradation due to gridfs usage, I'd be more than interested in seeing them....

  3. After playing with GridFS a bit, people start to wonder how they can serve files to web browsers from it. I've written a Python server for this, at https://bitbucket.org/btubbs/khartoum/.

    1. Thanks for the comment. I'll definitely have to check out Khartoum. I should also mention nginx-gridfs, an nginx module with similar functionality. I haven't used it, but if you want to serve frequently-changing large files, it's probably worth checking out.

  4. Shameless self-promotion: http://xm.x-infinity.com/2012/04/as-were-to-move-our-terabytes-of-files.html

    1. mod_gridfs looks interesting, thanks for pointing it out!

  5. Anonymous: What's the benefit of using this over the server's filesystem? Wouldn't an association be simpler and faster than trying to fit a file into the db? Unless of course you want to copy the db and ship it off to another server, but then again there are many file sync options out there...
    Please enlighten me.

    1. One benefit to using gridfs over the server's native filesystem is that gridfs will be available to your application servers automatically, without having to worry about setting up NFS. Another is that as you grow your MongoDB cluster, adding shards and replicas, the gridfs performance can scale as well. It's really a question (to me) of reducing the number of moving parts that can break.

      Already using MongoDB for the majority of your app's data and want to support multiple app servers, but some of your objects are too big to fit in a bson.Binary field? Gridfs is probably the shortest path to completion. Already have a filesystem shared between your app servers? Then that filesystem might be the best place to put stuff. Never intend to grow beyond the need for a single application server? Might as well use the filesystem.

  6. How many files do you store? We currently store hundreds of millions of mostly small files on a single server. However, file systems don't handle this well.

    We're looking at moving to CouchDB or Mongo. We eventually want to be able to store billions of small files.

    1. Thanks for the comment, David!

      I don't have the exact data on the number of files stored on the SourceForge MongoDB gridfs, but my off-the-cuff response is that 100s of millions of files should be fine as long as whatever you're using to query is well-indexed. Whether you can get good performance out of such a system depends on your usage patterns, hardware, etc.

      If your files are guaranteed to be under 16MB and frequently smaller than say 256kB, you're probably better off using bson.Binary objects to store them inside documents. Assuming you're usually reading or writing an entire file at once, this will perform much better than GridFS. With the advent of MongoDB 2.2, you might also consider storing your files in a separate database on the main server to reduce the impact that GridFS has on your "normal" MongoDB performance.
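
      A minimal sketch of the bson.Binary approach (the attachments collection and field names here are made up for illustration):

      >>> from bson import Binary
      >>> db.attachments.insert({
      ...     'filename': 'small.txt',
      ...     'data': Binary('tiny file contents')})
      ObjectId('...')
      >>> db.attachments.find_one({'filename': 'small.txt'})['data']
      Binary('tiny file contents', 0)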

      Hope that helps!

    2. Thanks Rick. That was very helpful!

  7. Hey Rick -
    Does GridFS require you to have an underlying NFS layer to provide storage, or does it store everything locally on each server? Are you aware of a limit on the number of servers that can be part of the same database with Mongo and GridFS? If we expand a network out to be 100 servers wide, I think we would have issues with replica sets, since I thought Mongo's limit is 12 members?

    I guess I'm just thinking through practical applications for a large scale website.

  8. This tutorial is very helpful. It really helped me get started with GridFS. Thanks a lot.

    By the way, can you let me know how you got the colors in your Python shell? Thanks.

    1. Thanks for the comment!

      I get the colors via Pygments (pygments.org)

      -Rick

  9. Hi Rick, very informative post, thanks. I have a Django app that processes a file and yields N encrypted segments, each to be stored on a different server, with segment sizes ranging from a few KB to 70 MB. Do you think GridFS would be a wise choice for such an application?

  10. If you're thinking you'll be storing the file's segments as 'chunks' in gridfs, it would only really work well if all the segments were the same size (and it looks like that's not the case). You could still store the segments as independent files in gridfs, though.

    1. Segments are all the same size; what matters to me is efficiency, since the segments retrieved by the Django server from the GridFS servers are processed and compiled together to yield the real file.
      Is there any specific configuration needed on the servers where MongoDB is installed to serve uploads/downloads?

    2. If you're trying to ensure that MongoDB places each segment on a different shard, you can do that with shard tags (see https://docs.mongodb.org/manual/core/tag-aware-sharding/ and https://docs.mongodb.org/manual/tutorial/administer-shard-tags/). You could set up gridfs to store 'chunks' of the size of your segments (this is a client option). Then if you want to force every 'chunk #0' to shard 0, you would use the 'n' field in the chunks collection as the shard key and then tag shard 0 with 'chunk #0'. This is pretty fragile, however (since you have to tag each 'n' individually), and you might be better off writing a custom gridfs-like layer that includes a shard_id in each chunk. Hopefully this helps!
