Batch image uploading with Amazon S3 and RethinkDB
Many applications provide a way to upload images, offering users a convenient way to share photos and other rich content. Driven in part by the ubiquity of built-in cameras on smartphones, image uploading is practically expected in any application with social or messaging features. Fortunately, cloud-based hosting services like Amazon’s S3 can shoulder the burden of storing large amounts of user-generated content.
Of course, you can also use RethinkDB to store thumbnails, important image metadata, and application-specific details about your S3 uploads. In this tutorial, I’ll demonstrate how to handle batch image uploading with Amazon S3 and RethinkDB. The demo application puts full-sized images in an Amazon S3 bucket while using RethinkDB to store image metadata and small thumbnails. I’m also going to show you some useful techniques for building a good frontend image uploading experience on the web, featuring drag-and-drop support and a live progress bar.
Process multipart form data
I built my backend with Node.js and Express. In order to accommodate batch image uploads, I made my application support multipart form data. There are a number of third-party libraries that offer the requisite functionality, but I’m partial to multiparty.
The following code demonstrates how to use Express and multiparty to set up an API endpoint for file uploads:
var express = require("express");
var multiparty = require("multiparty");
var app = express();
app.listen(8095, function() {
  console.log("Listening on port " + 8095);
});
app.post("/upload", function(req, res) {
  new multiparty.Form().parse(req, function(err, fields, files) {
    // Stop early if the multipart body could not be parsed
    if (err) return res.status(400).json({err: err.message});
    // ...
    files.images.forEach(function(file) {
      // Each uploaded file has been cached to a temporary path on disk
      console.log(file.path, file.originalFilename);
    });
    // ...
  });
});
The parse method processes multipart form data attached to a request. When you call parse with a callback, multiparty automatically caches the uploaded files as temporary files on disk for easy access. As you will see later, the application has to delete those temporary files manually when they are no longer needed.
It’s worth noting that multiparty also provides an event-based API that exposes raw streams instead of relying on temporary files. Although the cache-based approach is simpler for this particular demo, you might want to consider a more idiomatic streaming model for other usage scenarios.
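Here is a rough sketch of that streaming model. It assumes multiparty's event-based API: a Form emits a part event for each form part, and file parts carry a filename property and are readable streams. It also emits error and close events. The helper name streamParts and its onFile callback are hypothetical. In a real application, onFile might pipe each part to S3 instead of to disk, and you would call form.parse(req) without a callback.

```javascript
// Hypothetical helper: consumes multiparty's "part" events as streams
// instead of letting the form cache files to disk.
// `form` is expected to emit "part", "error", and "close" (as multiparty.Form does).
// `onFile(part)` must read the part stream and return a Promise that settles
// once the stream has been consumed.
function streamParts(form, onFile) {
  return new Promise(function(resolve, reject) {
    const pending = [];
    form.on("part", function(part) {
      if (!part.filename) {
        // Plain (non-file) field: drain it so parsing can continue
        part.resume();
        return;
      }
      pending.push(onFile(part));
    });
    form.on("error", reject);
    // "close" fires once every part has been emitted; also wait for the consumers
    form.on("close", function() {
      Promise.all(pending).then(resolve, reject);
    });
  });
}
```

This design keeps memory use flat, because no file is written to a temporary path first. The trade-off is that each part must be read as it arrives, or parsing stalls.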
In the example above, the files parameter of the parse callback contains an object with all of the file-bearing form fields. It is keyed by field name, and each value is an array of file objects, which is why the code reads files.images. You can iterate over the files and access each one. File objects have several useful properties:
path: the location where multiparty cached the uploaded file on the local filesystem
originalFilename: the name and extension that came with the original file uploaded by the user
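A request that arrives without an images field leaves files.images undefined, so it can help to normalize the files object before iterating. The small helper below is hypothetical (filesForField is not part of multiparty). It assumes multiparty's shape of one array of file objects per field name.

```javascript
// Hypothetical helper: return the uploaded files for one form field as
// simple {path, name} records. multiparty's `files` object maps each field
// name to an array of file objects; a missing field yields an empty array
// instead of an undefined value that would crash `forEach`.
function filesForField(files, fieldName) {
  const list = (files && files[fieldName]) || [];
  return list.map(function(file) {
    return { path: file.path, name: file.originalFilename };
  });
}
```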
Upload and resize images
Amazon provides a comprehensive AWS SDK for Node.js, which makes it easy to interact with services like S3. The SDK includes a convenience method for performing S3 uploads, with support for consuming Node.js streams. The following example shows how to upload files:
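var fs = require("fs");
var aws = require("aws-sdk");
var s3 = new aws.S3();
files.images.forEach(function(file) {
  s3.upload({
    Key: file.originalFilename,
    Bucket: "rethinkdb-demo",
    ACL: "public-read",
    Body: fs.createReadStream(file.path)
  }, function(err, output) {
    // output is undefined when the upload fails
    if (err) return console.error("Upload failed:", err);
    console.log("Finished uploading:", output.Location);
  });
});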
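Using originalFilename as the Key means two uploads named photo.jpg overwrite each other in the bucket, and the name comes straight from the user. Below is a minimal sketch of one way to avoid that. It uses Node's built-in crypto module in place of a third-party ID generator. makeObjectKey and its sanitizing rule are illustrative assumptions, not part of the AWS SDK.

```javascript
const crypto = require("crypto");
const path = require("path");

// Hypothetical helper: build an S3 object key that won't collide when two
// users upload files with the same name. A random 12-character hex prefix
// makes the key unique; the user-supplied name is reduced to its base name
// and any character outside [A-Za-z0-9._-] becomes "_".
function makeObjectKey(originalFilename) {
  const id = crypto.randomBytes(6).toString("hex");
  const safeName = path.basename(originalFilename).replace(/[^A-Za-z0-9._-]/g, "_");
  return id + "_" + safeName;
}
```

You would then pass Key: makeObjectKey(file.originalFilename) to s3.upload. Keeping the original name in the key preserves a readable file name in the resulting URL.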
To create image thumbnails, I used the resize command from gm, Node.js bindings for the GraphicsMagick library. The gm library provides a number of image transformation commands that you can chain together in sequence. It also includes a toBuffer method that outputs the transformed image as a Node.js Buffer object, suitable for insertion into the database.
Although a specific example is beyond the scope of this tutorial, it’s worth noting that gm offers some functions for metadata extraction. These could be useful when you want to store additional information about an image in a database record for later use.
RethinkDB’s Node.js client driver automatically treats Buffer objects as binary data, so there’s no need to explicitly use ReQL’s r.binary command. The following example shows how to resize a file and generate a Buffer as output:
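var gm = require("gm");
// Resize to 100 pixels wide; the height scales to keep the aspect ratio
gm(file.path).resize(100).toBuffer(function(err, buffer) {
  if (err) return console.error("Resize failed:", err);
  // ... insert `buffer` into the database
});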
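Because toBuffer reports its result through a callback, it is convenient to wrap it in a Promise before combining it with other asynchronous steps. The sketch below uses a native Promise rather than bluebird. makeThumbnail is a hypothetical name, and the gm module is passed in as a parameter (normally require("gm")) so that a stub can stand in for it.

```javascript
// Hypothetical wrapper: turn gm's callback-based resize/toBuffer chain into
// a Promise that resolves with the thumbnail Buffer or rejects with gm's error.
function makeThumbnail(gm, inputPath, width) {
  return new Promise(function(resolve, reject) {
    gm(inputPath).resize(width).toBuffer(function(err, buffer) {
      if (err) reject(err);
      else resolve(buffer);
    });
  });
}
```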
Use Promises to control the flow of asynchronous operations
Uploading and resizing individual files is fairly straightforward, but the asynchronous nature of Node.js makes it harder to put everything together. The application needs to upload the files, generate thumbnails, and then perform a ReQL insert query that incorporates all of that output.
Fortunately, Promises provide a useful way to control the flow of execution and aggregate the output of the asynchronous operations. I was able to tame the beast by taking advantage of some of the advanced features included in the bluebird Promise library:
var express = require("express");
var shortid = require("shortid");
var bluebird = require("bluebird");
var multiparty = require("multiparty");
var r = require("rethinkdb");
var aws = require("aws-sdk");
var gm = require("gm");
var fs = require("fs");
// Configure the AWS SDK with my access credentials
aws.config.update({
  accessKeyId: "XXXXXXXXXXXXXXXXXXXX",
  secretAccessKey: "XXXXXXXXXXXXXXXXXXXX"
});
// Create a Promise-based wrapper around the S3 APIs
var s3 = bluebird.promisifyAll(new aws.S3());
// Initialize Express application
var app = express();
app.listen(8095, function() {
  console.log("Listening on port " + 8095);
});
// Serve static files from the "public" directory
app.use(express.static(__dirname + "/public"));
// Wrapper that adds a Promise-based interface to the
// GraphicsMagick library's image resizing function.
// It outputs a Buffer, which the RethinkDB driver stores as binary data
var resizeImg = bluebird.promisify(function(input, size, cb) {
  gm(input).resize(size).toBuffer(function(err, buffer) {
    if (err) cb(err); else cb(null, buffer);
  });
});
// Handler for image upload POST requests
app.post("/upload", function(req, res) {
  // Parse multipart form data included with the request
  new multiparty.Form().parse(req, function(err, fields, files) {
    // Reject requests that fail to parse or carry no "images" field
    if (err || !files.images) {
      return res.status(400).json({success: false,
        err: err ? err.message : "No images uploaded"});
    }
    // Iterate over files and return an array of Promises
    // that will concurrently resize and upload the images
    var operations = files.images.map(function(file) {
      // Generate a short unique ID for each file
      var id = shortid.generate();
      // Return a Promise that incorporates concurrent
      // image uploading and resizing, while also
      // passing some useful values along the chain
      return bluebird.join(id, file,
        resizeImg(file.path, 100),
        s3.uploadAsync({
          Key: id + "_" + file.originalFilename,
          Bucket: "rethinkdb-demos",
          ACL: "public-read",
          Body: fs.createReadStream(file.path)
        }));
    });
    // Connect to RethinkDB and simultaneously perform
    // the upload/resize operations
    bluebird.join(r.connect(), bluebird.all(operations),
      function(conn, images) {
        // Each element of images is [id, file, thumbnail, S3 output].
        // Replace it with a record that has only the
        // properties we want to put in the database
        var items = images.map(function(i) {
          // Delete the cached temporary file (errors are ignored)
          fs.unlink(i[1].path, function() {});
          return {id: i[0], thumb: i[2],
                  url: i[3].Location, file: i[1].originalFilename};
        });
        // Insert the database records for the new images
        // and close the DB connection when finished
        return r.table("graphics").insert(items, {returnChanges: true})
          ("changes")("new_val").without("thumb").run(conn)
          .finally(function() { conn.close(); });
      })
    .then(function(output) {
      // Pass the new records (without the binary thumbnail)
      // to the end user as JSON
      console.log("Completed upload:", output);
      res.json({success: true, images: output});
    })
    .catch(function(e) {
      // Handle any errors or failures (catch, unlike bluebird's
      // error method, also handles errors from the RethinkDB driver)
      console.log("Failed to upload:", e);
      res.status(400).json({success: false, err: e.message});
    });
  });
});
I used Bluebird’s promisify feature to create Promise-based wrappers around the desired gm and S3 library functions. Next, I used a map operation to iterate over all of the uploaded files, returning an array of Promises that perform concurrent image uploading and resizing for each item. Each of those Promises comes from bluebird.join, which resolves to an array of its arguments’ values: the ID, the file object, the thumbnail Buffer, and the S3 upload output. That is why the later code reads i[0] through i[3]. When the application passes the array of Promises to bluebird.all, I get a Promise that waits for those operations to complete and then provides all of the output. From there, I took the aggregated output and used it to craft an array of records to insert into RethinkDB.
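On a Node.js version with native async/await, the same fan-out/fan-in shape can be sketched without bluebird. In this sketch, resize and upload are hypothetical stand-ins for the promisified gm and S3 calls (each returning a Promise), and makeId stands in for shortid.generate. The returned records mirror the ones the handler above inserts into RethinkDB.

```javascript
// Sketch of the fan-out/fan-in pattern with native Promises.
// resize(path) and upload(file, id) are hypothetical stand-ins that must
// each return a Promise; makeId stands in for shortid.generate.
async function processUploads(files, resize, upload, makeId) {
  // Fan out: one Promise per file, each running its resize and upload concurrently.
  // Fan in: Promise.all resolves to the records in the same order as `files`.
  return Promise.all(files.map(async function(file) {
    const id = makeId();
    const [thumb, s3Output] = await Promise.all([resize(file.path), upload(file, id)]);
    // Same record shape the Express handler inserts into RethinkDB
    return {id: id, thumb: thumb, url: s3Output.Location, file: file.originalFilename};
  }));
}
```

Like bluebird.all in the handler above, Promise.all rejects as soon as any single resize or upload fails. A single catch at the end of the chain can therefore report the failure.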
I took advantage of the returnChanges option so that the ReQL insert query can also retrieve the new records. The ReQL query strips the binary thumbnail data from the output, returning the resulting JSON structure as the response to the user’s HTTP POST request. The application returns the JSON data so that the frontend can display the images when the upload is complete.
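Here is the shape of that result. With returnChanges enabled, an insert reports a changes array, and each entry holds an old_val (null for a new document) and a new_val. The bracket chain ("changes")("new_val").without("thumb") performs the following projection on the server. This client-side equivalent, a hypothetical helper, is shown only to illustrate the data.

```javascript
// Illustration only: the client-side equivalent of
// ("changes")("new_val").without("thumb") applied to an insert result.
function toClientRecords(insertResult) {
  return insertResult.changes.map(function(change) {
    // Copy every field of the new document except the binary thumbnail
    const record = Object.assign({}, change.new_val);
    delete record.thumb;
    return record;
  });
}
```

Doing the projection in ReQL, as the handler does, avoids sending the thumbnail bytes back from the database at all.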
Serve images from RethinkDB with Express
Now that the full-sized images are in S3 and the corresponding thumbnails are stored in RethinkDB documents, I want to present them on the frontend. I uploaded the images to S3 with the public-read permission, which means that I can load them from conventional URLs hosted on Amazon’s infrastructure.
Accessing the image in the database, however, requires a little bit more work. I created an Express URL route that dynamically fetches an image from the database and serves it to the user:
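// config.db: RethinkDB connection options, assumed to be defined elsewhere in the app
app.get("/thumb/:id", function(req, res) {
  r.connect(config.db).then(function(conn) {
    return r.table("graphics").get(req.params.id).run(conn)
      .finally(function() { conn.close(); });
  })
  .then(function(output) {
    if (!output) return res.status(404).json({err: "Not found"});
    // The driver returns the binary thumb field as a Buffer
    res.end(output.thumb);
  })
  .catch(function(e) {
    // Without this, a connection or query error would leave the request hanging
    res.status(500).json({err: e.message});
  });
});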
The above GET request handler takes the ID provided in the URL path and retrieves the corresponding RethinkDB document from the graphics table. If the document exists, the application takes the contents of its thumb property and serves the binary data directly to the user. This approach makes it possible to display the thumbnail with a conventional HTML img tag that references the URL route of a thumbnail in its src attribute.
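The handler above writes the thumbnail bytes without a Content-Type header. One way to supply one is to inspect the Buffer's leading signature bytes. This is sketched here as an assumption, not the demo's actual behavior. sniffImageType is a hypothetical helper that recognizes only PNG, JPEG, and GIF.

```javascript
// Hypothetical helper: guess an image MIME type from a Buffer's leading
// signature ("magic") bytes. Returns null for anything unrecognized.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

function sniffImageType(buf) {
  if (buf.length >= 8 && buf.subarray(0, 8).equals(PNG_SIGNATURE)) return "image/png";
  // JPEG files start with FF D8 FF
  if (buf.length >= 3 && buf[0] === 0xff && buf[1] === 0xd8 && buf[2] === 0xff) return "image/jpeg";
  if (buf.length >= 6) {
    const header = buf.toString("ascii", 0, 6);
    if (header === "GIF87a" || header === "GIF89a") return "image/gif";
  }
  return null;
}
```

In the route, you could call Express's res.type(type) before writing output.thumb whenever sniffImageType returns a non-null value.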
Build a web frontend for batch uploads
I took advantage of several useful HTML5 features when I built the accompanying browser-based frontend for my batch image uploader. It uses native drag-and-drop, making it possible for the user to drag in files from their file manager or desktop. My frontend also uses a native progress bar element to display the status of the batch upload.
I used the following HTML markup to set up the form and the div container that will receive file drop events:
<div id="dropsite">
<h1 id="instruction">Drop files here</h1>
<form id="upload" action="/upload" method="POST" enctype="multipart/form-data">
<input type="file" id="fileselect" name="images" multiple="multiple" />
</form>
<progress id="progress" max="100" value="0"></progress>
</div>
<button id="submit" onclick="uploadFiles()">Upload</button>
I attached drag-and-drop event handlers to the div tag, programming it to pass any dropped files to the file selection input tag. The advantage of this approach is that it gives users the option of using the conventional file selection dialog as an alternative to drag-and-drop.
var dropsite = document.getElementById("dropsite");
// Cancel the default dragover behavior so the div is a valid drop target
dropsite.ondragover = function() { return false; };
dropsite.ondragend = function() { return false; };
dropsite.ondrop = function(ev) {
  ev.stopPropagation(); ev.preventDefault();
  // Hand the dropped files to the file input so the form submits them
  document.getElementById("fileselect").files = ev.dataTransfer.files;
  return false;
};
Instead of standard form submission, I programmed the page to perform the image upload operation in the background with an XHR. The submit button calls an uploadFiles function that sets up the XHR and performs the upload:
function uploadFiles() {
  var req = new XMLHttpRequest();
  req.onload = function() {
    console.log(JSON.parse(req.response).images);
    document.getElementById("progress").value = 0;
  };
  req.upload.onprogress = function(ev) {
    // ev.total is only meaningful when the length is computable
    if (ev.lengthComputable)
      document.getElementById("progress").value =
        (ev.loaded / ev.total) * 100;
  };
  req.open("POST", "/upload", true);
  req.send(new FormData(document.getElementById("upload")));
}
The function instantiates a FormData object and populates it with the contents of the upload form, thereby attaching the files from the file selection input to the request in proper multipart format. I attached a callback to the upload.onprogress event so that I can regularly update the native progress bar throughout the upload process. It compares the number of uploaded bytes against the total number of bytes in order to compute the completion percentage.
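Instead of routing dropped files through the file input, you could build the FormData object directly from a list of files and skip anything that isn't an image. This is a hypothetical sketch: buildUploadForm is not part of the demo. It filters on each file's reported MIME type and uses the same images field name that the server reads from files.images.

```javascript
// Hypothetical helper: build a multipart payload from a FileList (such as
// ev.dataTransfer.files), keeping only files whose MIME type is image/*.
function buildUploadForm(fileList) {
  const form = new FormData();
  Array.from(fileList).forEach(function(file) {
    if (typeof file.type === "string" && file.type.startsWith("image/")) {
      // The third argument sets the filename the server sees as originalFilename
      form.append("images", file, file.name);
    }
  });
  return form;
}
```

A drop handler could then call req.send(buildUploadForm(ev.dataTransfer.files)) directly, which leaves the file input untouched.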
When the upload completes, the server returns a JSON object with metadata about each image. You can use that metadata to append the new images to the page. In my demo, I accomplished that step with a simple Handlebars template:
<script id="template" type="text/x-handlebars-template">
  <div class="thumb">
    <a href="{{url}}"><img src="/thumb/{{id}}"></a>
  </div>
</script>
var template = Handlebars.compile(document.getElementById("template").innerHTML);
...
function addImages(items) {
  var thumbs = document.getElementById("thumbs");
  items.forEach(function(item) {
    // Append new markup without re-parsing the existing thumbnails
    thumbs.insertAdjacentHTML("beforeend", template(item));
  });
}
To insert the new images into the page, I take the images array from the XHR's JSON response and pass it to the addImages function described above.
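If you would rather not depend on Handlebars, a hand-rolled template function can produce similar markup. The sketch below is an illustrative substitute, not what the demo uses. Because the URL and ID ultimately derive from user uploads, it HTML-escapes them, approximating what Handlebars does by default for {{...}} expressions.

```javascript
// Characters escaped here (approximating Handlebars' default escaping)
const HTML_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;",
                      "'": "&#x27;", "`": "&#x60;", "=": "&#x3D;"};

function escapeHtml(value) {
  return String(value).replace(/[&<>"'`=]/g, function(ch) { return HTML_ESCAPES[ch]; });
}

// Hypothetical replacement for the compiled template: one image record in,
// the same thumbnail markup out
function thumbHtml(item) {
  return '<div class="thumb"><a href="' + escapeHtml(item.url) + '">' +
         '<img src="/thumb/' + escapeHtml(item.id) + '"></a></div>';
}
```

addImages could then append thumbHtml(item) instead of template(item).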
Next steps
Now you know how to add batch image uploads to your own RethinkDB application. In addition to the browser-based web frontend described in this article, you could also build your own native mobile frontends that rely on the same backend URL endpoints.
You can find the complete source code for this demo application on GitHub. Install RethinkDB and try it yourself today. You can also follow our ten-minute quickstart guide to learn more about RethinkDB.