Heroku recently released the public beta of their new stack, Celadon Cedar. Check out their blog post for more information. There are a few minor adjustments to how their pricing model works; free plans are not affected, and the changes generally benefit the consumer. I'm truly excited to see this because I've been using Foreman, a gem by David Dollar that lets you nicely organize, view, and manage your processes. I'll be doing an article on Foreman in the future. Heroku's Cedar stack takes the same approach as Foreman, and you can (and should) actually use Foreman in development. If it works there, it probably works on Heroku's new Cedar stack as well.
The new capabilities of the Cedar platform, however, aren't the only thing I'm excited about. When someone told me about this stack prior to the release of the public beta and showed me a Procfile, the first thing I asked was:
“Can I ride a Unicorn?” And his answer was: “Yes.”
I’m sure most of you know, or at least have heard of, Unicorn. If you’re a Ruby on Rails developer, you’ve probably seen the Unicorn gem in your Gemfile. But for those who don’t know it, or have never tried it before, I highly encourage you to check out this post by Chris Wanstrath over at GitHub sometime.
What I’d like to point out for this article is that Unicorn has a “master” process and one or more “worker” processes. The master process itself never sees a request; instead, it manages the worker processes, which are its children, forked from the master. The kernel handles load balancing, and you connect through a single Unix socket or port as an upstream in, for example, NGINX. This is unlike Mongrel or Thin, where you upstream to multiple ports.
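To make the master/worker idea concrete, here is a minimal, illustrative sketch of the preforking model in plain Ruby. This is not Unicorn's actual implementation; it only shows the principle: the master opens one listening socket, forks workers that all call accept on it, and the kernel hands each incoming connection to one of them.

```ruby
require 'socket'

# The "master" opens the single shared listening socket before forking.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# Fork two "workers". Each inherits the same socket and calls accept,
# so the kernel distributes incoming connections between them.
pids = 2.times.map do
  fork do
    client = server.accept
    client.puts "worker #{Process.pid}"
    client.close
  end
end

# Clients connect to the one shared port; each request is served
# by whichever worker the kernel hands the connection to.
lines = 2.times.map do
  sock = TCPSocket.new('127.0.0.1', port)
  line = sock.gets
  sock.close
  line
end

puts lines
pids.each { |pid| Process.wait(pid) } # the master only supervises
```

Note that the master never calls accept itself; its job is supervision, which is exactly why it can also restart workers that hang or die.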
In any case, what’s important to understand for now is the fact that there’s a master process which loads in your application’s environment, and forks off child processes to serve your application.
Cedar and Unicorn - More concurrency per Heroku dyno
At this point you may already know where I’m going with this. Let’s create a basic Rails application.
rails new cedar-on-unicorn
Generate a simple controller that won’t interact with the database
rails generate controller home index
Open up your Gemfile and ensure you have the following defined
gem 'rails', '3.0.7'
gem 'unicorn'
And now comes the interesting part: create a unicorn config file in config/unicorn.rb
worker_processes 4 # amount of unicorn workers to spin up
timeout 30         # restarts workers that hang for 30 seconds
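If your app does talk to a database, a common pattern (my own addition here, not something the stack requires) is to preload the app in the master and give each worker its own database connection after forking, so workers don't share the master's connection:

```ruby
worker_processes 4 # amount of unicorn workers to spin up
timeout 30         # restarts workers that hang for 30 seconds
preload_app true   # load the app once in the master before forking

before_fork do |server, worker|
  # Disconnect in the master so forked workers don't inherit its connection.
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |server, worker|
  # Each worker establishes its own database connection after forking.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end
```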
And a Procfile for Heroku’s new Cedar stack in the root of your application
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
Bundle it up by running bundle install
Initialize a new Git repository, commit it, and create your new Heroku Cedar Stack-based environment and push to it
git init
git add . && git commit -m "concurrency"
heroku create --stack cedar
git push heroku master
Now, visit http://myapp.heroku.com/home/index and you should be up and running, with 4 Unicorn worker instances on a single dyno! Prior to Celadon Cedar, you could only deploy Thin application servers, which don’t have this kind of process management: Thin has no master process that forks children, so this wasn’t possible. If you wanted more concurrency or more instances, you had to spin up more Heroku dynos.
But don’t over-do it
“Too much” is never a good idea. A dyno is essentially its own little isolated environment. You probably get your own dedicated amount of CPU cycles, and there is a limit to how much RAM you may consume (for obvious reasons), even if you can’t see your memory usage. I believe there is a 300MB memory limit per dyno. Also, the more instances you try to cram into a single dyno, the worse your throughput gets once the dyno starts to reach its resource limits, in which case you’re better off with fewer instances.
Yup. I’ve done some basic ApacheBench (ab) runs to see the difference in throughput. I’m not going to show every detail of these results, but I think you’ll get the main picture.
All tests were performed against a page with no database interaction, because that’s not what I’m testing. Each test ran 1000 requests at a concurrency level of 100; every test was run multiple times and I took the average results.
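For reference, a run like the ones described above corresponds to an ApacheBench invocation along these lines (the URL here is illustrative, matching the example app from earlier):

```shell
# 1000 requests total (-n), 100 at a time (-c)
ab -n 1000 -c 100 http://myapp.heroku.com/home/index
```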
1 Unicorn Worker (Roughly equivalent to a single Thin instance)
Time taken for tests: 3.835 seconds
Requests per second: 260.77 [#/sec]
2 Unicorn Workers
Time taken for tests: 1.689 seconds
Requests per second: 592.00 [#/sec]
3 Unicorn Workers
Time taken for tests: 1.164 seconds
Requests per second: 833.91 [#/sec]
4 Unicorn Workers
Time taken for tests: 0.979 seconds
Requests per second: 918.90 [#/sec]
5 Unicorn Workers
Time taken for tests: 10.621 seconds
Requests per second: 83.32 [#/sec]
See what happened? Spawning 5 workers is “too much” and actually reduces throughput; 3-4 workers per dyno seems to be your best bet in this case. There were also runs where even 4 workers was too much. Notice how little 4 workers improve over 3: this is clearly the limit. And this is a bare-bones web application; once you add real code and more gems, 4 Unicorn workers might already be too many.
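Since the right worker count depends on your app, one way to keep it tunable without redeploying is to read it from an environment variable in your unicorn config. The WORKER_COUNT variable name below is my own invention, not something Heroku defines:

```ruby
# config/unicorn.rb -- worker count comes from the environment,
# falling back to 3 (the sweet spot found above).
worker_processes Integer(ENV['WORKER_COUNT'] || 3)
timeout 30
```

You could then adjust it with heroku config:add WORKER_COUNT=4 and restart, rather than editing the file and pushing again.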
But! If having 3 Unicorn workers isn’t enough for you, go ahead and spin up 3 more:
heroku scale web=2
This will spin up a second dyno that’ll spin up 3 more Unicorn workers, bringing it up to a total of 6 Unicorn workers spread across 2 dynos! Need 30 workers?
heroku scale web=10
With this you have 30 Unicorn workers spread across 10 dynos, and so forth.
Imagine if you were using Sinatra instead of Rails, and your app consumes a lot less memory. You may even be able to spin up 7-8 Unicorn worker instances per dyno.
Before Heroku released the Celadon Cedar stack, this simply wasn’t possible. You’d have a dyno that should (in terms of memory) be able to run 3 or 4 instances, but you could only spin up a single Thin instance per dyno. With Thin, even after the introduction of Cedar, this is still the case. But now that the application-server constraint no longer exists, you can use Unicorn instead. With this, you should be able to improve your application’s throughput by a factor of 3 to 4 on the same dyno.