Amazon AWS case study for Gazemetrix

May 16, 2013 at 10:51 am (Uncategorized)

Gazemetrix allows brands to find out when pictures of their products appear on social media. We process millions of images every day using image processing algorithms and match these images against a set of logos. This is a compute- and IO-intensive process, and we decided Amazon would be the best platform to build our infrastructure on. Amazon has also been very generous in providing us with credits from time to time, so that we as a young startup can focus properly on our customers and business.

We use the following Amazon AWS services:

  • EC2
  • S3
  • CloudWatch
  • Route 53
  • SES
  • SNS
  • Developer level paid support

We are fans of Amazon's spot instances. They are a cheap way of doing large-scale, compute-intensive work. We use the Amazon Auto Scaling API to launch and terminate spot instances based on the current CPU utilisation of the auto-scaling group. This roughly translates to “scale up when more images are coming in and scale down when fewer are coming in”. We get many more images during daytime in the US/Europe region, and during that period our image processing cluster has its maximum number of instances running. Here is a graph showing how the incoming image load varies, along with our churn rate and the number of spot instances running at any given time:

Image

As the graph above shows, we use somewhere between 50 and 150 spot instances at any given time. Weekends are busier than weekdays, since people like to take photographs when they are having fun.
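To make the "scale up on high CPU, scale down on low CPU" rule concrete, here is a minimal sketch of the decision logic. It is not our production configuration: the CPU thresholds and the step size are made-up values, and the 50/150 bounds are simply borrowed from the range in the graph. In practice the Auto Scaling service evaluates rules like this for us.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values are illustrative assumptions, not real settings.
    min_instances: int = 50       # lower bound, taken from the graph's range
    max_instances: int = 150      # upper bound, taken from the graph's range
    scale_up_cpu: float = 70.0    # hypothetical: add capacity above this average CPU %
    scale_down_cpu: float = 30.0  # hypothetical: remove capacity below this average CPU %
    step: int = 10                # hypothetical: instances added or removed per decision


def desired_capacity(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the instance count the group should move to next."""
    if avg_cpu >= policy.scale_up_cpu:
        target = current + policy.step
    elif avg_cpu <= policy.scale_down_cpu:
        target = current - policy.step
    else:
        target = current
    # Never leave the configured bounds, whatever the CPU reading says.
    return max(policy.min_instances, min(policy.max_instances, target))
```

The gap between the two thresholds is deliberate: with a single threshold the group would keep flipping between launching and terminating instances whenever the load hovers around it.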

We have five different auto-scaling groups. Each group takes care of one part of the image matching process: it does its job and passes the result on to the next group via redis. We would like to use ElastiCache someday, but we rely on the in-memory sorted sets that redis provides. The day ElastiCache offers similar data structures, we will probably transition to it.
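The hand-off between groups can be pictured with a small sketch. To keep it runnable without a redis server, `SortedSetQueue` below is a hypothetical in-memory stand-in that mimics the two sorted-set behaviours that matter here (members are unique, and they come out in score order). The class, the `run_stage` helper and the choice of score are all assumptions for illustration, not our actual pipeline code.

```python
import bisect


class SortedSetQueue:
    """In-memory stand-in for a redis sorted set used as a work queue."""

    def __init__(self):
        self._items = []   # list of (score, member), kept sorted
        self._scores = {}  # member -> current score

    def add(self, member, score):
        # Like ZADD: re-adding a member updates its score instead of duplicating it.
        if member in self._scores:
            self._items.remove((self._scores[member], member))
        self._scores[member] = score
        bisect.insort(self._items, (score, member))

    def pop_lowest(self):
        # Remove and return (member, score) for the lowest score, or None if empty.
        if not self._items:
            return None
        score, member = self._items.pop(0)
        del self._scores[member]
        return member, score

    def __len__(self):
        return len(self._items)


def run_stage(inbox, outbox, work):
    """Drain inbox, apply work to each member, and queue results for the next stage."""
    processed = 0
    while True:
        entry = inbox.pop_lowest()
        if entry is None:
            return processed
        member, score = entry
        # Carry the score forward so the next stage sees the same ordering.
        outbox.add(work(member), score)
        processed += 1
```

For example, with the score set to an arrival timestamp, `run_stage(stage1, stage2, extract_features)` would process the oldest images first, where `extract_features` is whatever function that stage runs. The uniqueness of members also means an image queued twice is only processed once.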

We have written custom monitoring scripts that send data from the infrastructure to CloudWatch. We use SNS to get alerted when something important happens. For example, we need to keep the load on the redis queues in check, so we plot it and send alerts via email and SMS to the people who need to act on it immediately:

Image
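A monitoring script of this kind can be quite small. The sketch below is an illustration rather than our actual script: the metric name, namespace, queue names and the limit are all hypothetical. The `cloudwatch` argument is assumed to be any object with a `put_metric_data` method that accepts the `Namespace` and `MetricData` keyword arguments, such as a present-day boto3 CloudWatch client.

```python
from datetime import datetime, timezone


def queue_depth_datum(queue_name, depth):
    """Build one CloudWatch metric datum for a queue's current length."""
    return {
        "MetricName": "QueueDepth",  # hypothetical metric name
        "Dimensions": [{"Name": "Queue", "Value": queue_name}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(depth),
        "Unit": "Count",
    }


def report_queue_depths(cloudwatch, depths, limit, namespace="Example/Pipeline"):
    """Publish every queue depth and return the queues that exceed limit.

    depths maps a queue name to its current length. The namespace default
    is a made-up value. All data goes out in a single call, which assumes
    only a handful of queues.
    """
    data = [queue_depth_datum(name, depth) for name, depth in depths.items()]
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=data)
    return [name for name, depth in depths.items() if depth > limit]
```

Once the metric is in CloudWatch, the alerting itself does not have to live in the script: a CloudWatch alarm on the metric can notify an SNS topic, which then fans out to email and SMS subscribers. The returned list is only there for local logging.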

We have, to date, processed 320 million images on Amazon:

Image

We use c1.xlarge machines for the image processing groups. We like them because they combine good processing power with a decent amount of RAM.

Apart from the image processing core, we also host a web server (www.gazemetrix.com), a MongoDB server and a redis server. Both redis and MongoDB are very memory-intensive applications, so all of these machines have generous amounts of RAM. We use m2.2xlarge and m2.4xlarge machines for them.

We use S3 to store the training logos used by the image processing core, Route 53 for our DNS requirements, and SES for sending alerts to our team and mailing reports to our customers. We also use IAM to manage user logins, and enforce an MFA device policy to secure them. All of this is provided by Amazon. Here is a look at how good Amazon SES is:

Image

We played around for a while with some of the GPU-based instances that Amazon provides. GPU-based instances are great for image processing work, but we have yet to deploy them in production. We also spent some time playing around with the newly launched OpsWorks to do rapid deployments of the image processing AMIs, and it looks pretty neat.

We have been particularly impressed with, and thankful for, Amazon's Developer-level paid support plan. The responses from the team have been prompt, courteous and technically superior. Having spent around a year at my previous company working with the support team at Rackspace, I can say for sure that Amazon is far ahead in this regard as well. We actually had a big problem with inflated network transfer bills for a couple of months. It took us some time to figure out that the charges came from having our redis server in a different availability zone from the rest of the image processing groups. The Amazon team not only helped us diagnose the problem, but also reversed much of our network transfer charges in good faith.
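A simple placement check might surface this kind of mismatch early. The sketch below is an illustration only: the host names and zones in the example are hypothetical, and the mapping of host to availability zone is assumed to come from whatever inventory of instances is already at hand.

```python
def hosts_outside_zone(placements, anchor_host):
    """Return, sorted by name, the hosts whose zone differs from anchor_host's.

    placements maps a host name to its availability zone. Traffic between
    the returned hosts and anchor_host crosses zones and is billed as
    inter-zone data transfer.
    """
    anchor_zone = placements[anchor_host]
    return sorted(
        host for host, zone in placements.items() if zone != anchor_zone
    )
```

For example, with `{"redis": "us-east-1a", "matcher-1": "us-east-1b", "matcher-2": "us-east-1a"}` and `"redis"` as the anchor, only `matcher-1` is flagged.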

I would like to end the post with a big thanks to Amazon, not only for being an extremely professional and high-performing player in this field, but also for looking out for young startups like us.
