When it comes to choosing how to handle file uploads, you have a lot of options. One of them is saving your files as binary data in the database, and MongoDB GridFS applies this pattern. It is a file system abstraction on top of MongoDB in which the uploaded file is divided into chunks during the upload and reassembled during retrieval.
Let’s represent how MongoDB GridFS works in simple steps:
- During the first file upload, a new bucket named fs (unless you specify another name) is created if it does not already exist. This bucket consists of two collections (fs.chunks and fs.files).
- An index is created on each of the two collections (if it does not already exist) for the sake of fast retrieval.
- The uploaded file is divided into chunks (255KB per chunk by default, unless you specify the chunk size) and stored in the fs.chunks collection. To track the order of the chunks, this collection contains a field n that holds each chunk’s position in the file.
- A new metadata document is created for the uploaded file in the fs.files collection, describing the file (for example its filename, length, chunk size, and upload date).
- During retrieval, GridFS gets the file metadata from the fs.files collection and uses it to reassemble the file’s chunks from the fs.chunks collection, returning the file to the client as a stream or in memory.
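The chunking and reassembly steps above can be sketched in plain Node.js without touching a database. This is only an illustration of the idea, not the driver's actual implementation: the helper names splitIntoChunks and reassemble are made up for this example, and the chunk objects only loosely resemble real fs.chunks documents.

```javascript
// Illustrative only: mimics how GridFS splits a file into ordered chunks.
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 255KB, the GridFS default

// Split a Buffer into chunk objects shaped loosely like fs.chunks entries.
function splitIntoChunks(fileId, data, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    chunks.push({ files_id: fileId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

// Rebuild the original bytes by sorting the chunks on their n field.
function reassemble(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}

const file = Buffer.alloc(16 * 1024 * 1024, 1); // a 16MB file
const chunks = splitIntoChunks('demo-file', file);
console.log(chunks.length); // 65 (64 full chunks plus one smaller final chunk)
console.log(reassemble(chunks).equals(file)); // true
```

Because every chunk carries its position n, the chunks can be stored or fetched in any order and still be put back together correctly.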
You can go with MongoDB GridFS in these cases:
- If your file size exceeds 16MB (which is the maximum size of a single MongoDB document).
- If you frequently want to access specific portions of a file without loading the entire file into memory.
- If your file system limits the number of files in a directory, you can use GridFS to store as many files as you need.
- If you want to track the metadata of your files, which is provided as a built-in feature in GridFS.
- As your files are part of your database, they can benefit from MongoDB’s built-in replication, backup, and sharding features instead of you handling these manually in the file system.
- Deleting a file in GridFS is as easy as deleting an object in the database, whereas deleting from the file system can be a bit more involved.
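As a rough sketch of the partial-access case from the list above, the official Node.js driver lets you pass start and end byte offsets when opening a download stream. The helper name readRange is hypothetical, and the bucket argument is assumed to be a GridFSBucket you have already created.

```javascript
// Read only the bytes in [start, end) of a stored file.
// `bucket` is assumed to be a GridFSBucket from the official mongodb driver.
function readRange(bucket, fileId, start, end) {
  return new Promise((resolve, reject) => {
    const parts = [];
    bucket
      .openDownloadStream(fileId, { start, end }) // end is exclusive
      .on('data', (part) => parts.push(part))
      .on('error', reject)
      .on('end', () => resolve(Buffer.concat(parts)));
  });
}
```

Only the requested byte range is collected in memory, which is the point of this use case.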
There is no one-size-fits-all solution, and MongoDB GridFS is no exception. So bear in mind these limitations:
- Continuously serving big files from the database as many chunks can affect your working set (a 16MB file is retrieved as 65 chunks of up to 255KB each), especially if you deal with gigabytes or terabytes of data.
- Serving a file from the database is a bit slower than serving it from the file system.
- GridFS doesn’t natively provide a way to update the entire file atomically. So if your system frequently updates entire files, either don’t use GridFS or use a workaround as discussed below.
These are some best practices when dealing with MongoDB GridFS which mitigate its limitations:
- To mitigate the working set consumption, you can serve your files from another MongoDB server dedicated to the GridFS storage.
- Also to reduce the working set consumption, you can use a chunk size larger than the default 255KB.
- Regarding atomic updates, if your system tends to update entire files frequently, or many users access the files concurrently, you can use a versioning approach to track file updates. Then, based on your needs, you can retrieve only the latest version of a file and either delete the other versions or keep them as the file’s history.
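Here is a minimal sketch of the last two practices, assuming the official Node.js driver and an existing GridFSBucket passed in as bucket. The helper names and the 1MB chunk size are illustrative choices, not recommendations from the driver. The sketch relies on the fact that uploading under an existing filename does not overwrite anything: GridFS keeps each upload as a separate file, and revision -1 selects the most recent one.

```javascript
// Upload a new version: re-using the filename adds a newer revision
// next to the older ones instead of replacing them.
function openVersionedUpload(bucket, filename) {
  return bucket.openUploadStream(filename, {
    chunkSizeBytes: 1024 * 1024, // 1MB chunks instead of the 255KB default
  });
}

// Stream the latest revision (revision -1 means "most recent").
function openLatestVersion(bucket, filename) {
  return bucket.openDownloadStreamByName(filename, { revision: -1 });
}

// Delete every revision except the newest one and return the survivor.
async function pruneOldVersions(bucket, filename) {
  const versions = await bucket.find({ filename }).sort({ uploadDate: -1 }).toArray();
  for (const old of versions.slice(1)) {
    await bucket.delete(old._id);
  }
  return versions.length > 0 ? versions[0] : null;
}
```

If you prefer to keep the history, skip pruneOldVersions and always read through openLatestVersion.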
In this example, we will see how to upload files to a bucket, list and retrieve their metadata, and download them using GridFS.
I assume you are familiar with Node.js.
First of all, let’s create our bucket (if it does not exist) or retrieve it:
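The original snippet for this step is not shown here, so the following is only a sketch of what it could look like with the official mongodb driver. The function name connectBucket, the connection string, and the database name are placeholders to replace with your own values.

```javascript
// Connect to MongoDB and return a handle to a GridFS bucket.
// The URI, database name, and bucket name are placeholder defaults.
async function connectBucket(uri = 'mongodb://localhost:27017', dbName = 'test', bucketName = 'fs') {
  // Required here so that merely loading this file does not need the driver.
  const { MongoClient, GridFSBucket } = require('mongodb');
  const client = new MongoClient(uri);
  await client.connect();
  const bucket = new GridFSBucket(client.db(dbName), { bucketName });
  return { client, bucket };
}
```

Keep the returned client around so you can call client.close() when your application shuts down.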
Let’s upload a file using GridFS:
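As a minimal sketch (the helper name uploadFile and its parameters are illustrative, and bucket is assumed to be the GridFSBucket from the first step), an upload can be written by piping a read stream into the bucket’s upload stream:

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Stream a local file into GridFS and resolve with the new file's id.
async function uploadFile(bucket, filePath, filename, metadata = {}) {
  const uploadStream = bucket.openUploadStream(filename, { metadata });
  // pipeline resolves once the whole file has been written, or rejects on error.
  await pipeline(fs.createReadStream(filePath), uploadStream);
  return uploadStream.id;
}
```

Streaming keeps memory usage low because the file is never fully loaded into memory.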
Bear in mind that you can rely on the previous code to create your bucket during the first upload and skip the first step. However, doing the first step explicitly guarantees that the bucket is set up right after the database connection is established and gives you a reference to the bucket.
Let’s list our files metadata:
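A short sketch of listing, again assuming bucket is a GridFSBucket; the helper names listFiles and collectFilenames are made up for this example. It shows both ways of consuming the cursor: collecting everything at once, and iterating one document at a time.

```javascript
// Return the metadata documents of every file in the bucket as an array.
async function listFiles(bucket) {
  const cursor = bucket.find({}); // an empty filter matches all files
  return cursor.toArray();
}

// Iterate the cursor instead, collecting only the filenames.
async function collectFilenames(bucket) {
  const names = [];
  for await (const file of bucket.find({})) {
    names.push(file.filename);
  }
  return names;
}
```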
The find method returns a FindCursor, which you can iterate through to get your results. The toArray method returns a promise that resolves to an array containing all the documents from the cursor.
To retrieve specific file metadata:
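One possible way to do this is to pass a filter to find. The helper name findFileByName is hypothetical, and filtering by filename is just one choice; any field of the metadata document can be used in the filter.

```javascript
// Look up one file's metadata document by filename (null if not found).
async function findFileByName(bucket, filename) {
  const [file] = await bucket.find({ filename }).limit(1).toArray();
  return file ?? null;
}
```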
Finally, let’s download a file:
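A download can be sketched as the mirror image of the upload. The helper name downloadFile is illustrative, bucket is assumed to be a GridFSBucket, and fileId is the id of a file stored in it.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Stream a stored file from GridFS to a local path.
async function downloadFile(bucket, fileId, destinationPath) {
  await pipeline(bucket.openDownloadStream(fileId), fs.createWriteStream(destinationPath));
  return destinationPath;
}
```

In a web server you would typically pipe the download stream into the HTTP response instead of a local file.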
That’s it. You can find this code here in this repo.
At the end of the day, as we saw, there is no one-size-fits-all solution. Choosing GridFS as your storage option is your decision, and it depends on your needs and on your understanding of the pros and cons of the available options.
I hope you have found this useful. Thank you for reading! If you liked this article, please rate and share it to spread the word. That really encourages me to create more content like this.