Our Stack. Scalability Is Key!



We use Java and we like it! When I say this to some people, they cringe. They look at me like I'm a dinosaur who doesn't realize his extinction is coming. I usually smirk; I get it. These are folks who are so in love with Ruby/Rails or Python/Django that they forget it's just a tool for solving a certain kind of problem. I can understand why: they are both great tools! I built my personal site in Rails and it was a lot of fun. It's easy to iterate, and the community makes adding value very simple. The problem is that when you fall in love with your hammer, everything looks like a nail. In this post, I'll tell you some of the problems we faced, and why they made a pure 100% Rails app a bad choice for us. Then we will talk about how we are thinking about leveraging Ruby in our stack. But first, a little history is required.

Our Old Stack

Over three years ago our founders hired some consultants to build an MVP for their vision. They came back with PHP/Drupal and MySQL, a monolithic architecture. It was functional, allowing the founders to start growing the business. Customers were trying it out, and loving the service. The business had legs and traffic was steadily rising. Fast forward to last year, and our infrastructure was handling the flow, but it was creaking loudly. This architecture was fragile, and it didn't scale well in any direction.

Some of the issues we had with the codebase included:

  • Monolithic code base. How do you make changes to one part without releasing everything else? How do you do this with 20 devs? 100?
  • How do you improve performance in this architecture? Caching will help some if your data is mostly static. Scaling horizontally at this level may not be enough to increase throughput: it's too coarse, and it requires logical separation to avoid DB concurrency issues.
  • Concerns weren't separated, so one component going down risked taking down the whole site. Ideally you have "swim lanes", basically silos where if any component in the tech stack goes down, only that silo is affected. The product detail page can go down without taking down checkout; the homepage can be unavailable but those on product pages don't notice.
  • How do you support multiple platforms? We don't want to write the same database logic in many different places, and the monolithic code base won't support apps for iPhone, Android, or iPad.
  • The Drupal code was spaghetti code. There was SQL in the views and controllers, which meant even making simple view changes was complex and risky. Adding insult to injury, test coverage was non-existent. This made iterating a dangerous task.

Making it Scale

We like to think of scalability in terms of the AKF Scalability Cube[1]. There are three axes for scaling your application:

  1. The Y axis (splitting the architecture out by service or function)
  2. The X axis (creating N instances of a component, all replicas except for one designated for writes, fronted by a load balancer)
  3. The Z axis (splitting resources by user characteristics, e.g. the West Coast on one pool, the East on another)

The further along these axes you get, the better your application scales. We knew we needed high availability and concurrency, so we needed to scale in multiple directions.
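To make the X and Z axes a little more concrete, here is a minimal Java sketch. It is an illustration only, not our production code: the class names (ReplicaPool, RegionRouter), the region keys, and the instance names are all made up, and round-robin is just the simplest balancing policy to show. The Y axis would correspond to running one such router per service.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch. X axis: identical replicas behind a round-robin
// balancer. Z axis: a separate pool per user region.
final class ReplicaPool {
    private final List<String> replicas;   // X axis: N interchangeable instances
    private final AtomicInteger next = new AtomicInteger();

    ReplicaPool(List<String> replicas) {
        this.replicas = List.copyOf(replicas);
    }

    // Round-robin selection; floorMod keeps the index valid if the counter overflows.
    String pick() {
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}

final class RegionRouter {
    private final Map<String, ReplicaPool> poolsByRegion;   // Z axis: split by user characteristic

    RegionRouter(Map<String, ReplicaPool> poolsByRegion) {
        this.poolsByRegion = Map.copyOf(poolsByRegion);
    }

    // Pick the pool for the user's region, then a replica inside that pool.
    String route(String userRegion) {
        ReplicaPool pool = poolsByRegion.get(userRegion);
        if (pool == null) {
            throw new IllegalArgumentException("No pool for region: " + userRegion);
        }
        return pool.pick();
    }
}

final class ScalingAxesDemo {
    public static void main(String[] args) {
        RegionRouter router = new RegionRouter(Map.of(
                "west", new ReplicaPool(List.of("west-1", "west-2")),
                "east", new ReplicaPool(List.of("east-1"))));
        System.out.println(router.route("west"));   // west-1
        System.out.println(router.route("west"));   // west-2
        System.out.println(router.route("east"));   // east-1
    }
}
```

A slow or popular service gets a longer replica list; nothing else in the router has to change.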

Here is how we made this happen in practice, some of which we are still iterating on:

  • (Y axis) We started dismantling the monolithic beast and migrating to a Service Oriented Architecture (SOA), based on Java backend web services.
    • RESTful web services delivering JSON
    • The concerns are well separated, a service has a specific job and does that job well
    • Easier for development teams to own a full stack of components. This allows us to create small teams which act on goals without spelling out direction, fostering innovation.
    • Allows for more frequent releases
    • Creates disaster isolation: if only that service goes down, the rest of the site should remain available where you don't have cross swim lane dependencies.
    • Why Java? It is easy to test, easier to scale, and more powerful than most high-level languages.
  • (X axis) We have pools of service instances fronted by load balancers.
    • For a service that is slower or more popular, throw more instances in the pool
    • Since the services themselves are smaller than one big app, you can better leverage hardware resources
  • (Y axis) The view is thin, and renders whatever the services provide.
    • Create different view layers for different platforms
    • Services don't care who the client is, they just respond to requests!
  • (All) Metrics gathering. You can't make it better if you can't measure it!
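
The disaster isolation idea can be sketched in a few lines of Java. This is a rough illustration under stated assumptions, not how our view layer is actually written: the SwimLanes class, the two services, the fallback strings, and the 500 ms timeout are all hypothetical. Each page section is fetched in its own "lane", and a failure or timeout in one lane only degrades that section.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of swim lane isolation in a thin view layer.
final class SwimLanes {
    // Run one service call in isolation; use the fallback if the call
    // fails or takes longer than 500 ms (an arbitrary value for this sketch).
    static CompletableFuture<String> lane(Supplier<String> serviceCall, String fallback) {
        return CompletableFuture.supplyAsync(serviceCall)
                .completeOnTimeout(fallback, 500, TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback);
    }

    // The view just stitches together whatever each lane returned.
    static String renderProductPage(Supplier<String> productService,
                                    Supplier<String> reviewService) {
        CompletableFuture<String> product = lane(productService, "[product unavailable]");
        CompletableFuture<String> reviews = lane(reviewService, "[reviews unavailable]");
        return product.join() + " | " + reviews.join();
    }

    public static void main(String[] args) {
        String page = renderProductPage(
                () -> "Red dress, size 4",
                () -> { throw new IllegalStateException("review service is down"); });
        System.out.println(page);   // Red dress, size 4 | [reviews unavailable]
    }
}
```

The product section still renders even though the review service threw, which is the whole point of keeping lanes free of cross dependencies.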

Ruby usage at RTR

When I think about SOA and scalability, I think about the JVM. It's stable, proven, and optimized. When you factor in the size of the community and tools available, it makes a great choice for writing your services. I couldn't see myself writing highly concurrent services in Ruby or Python, but maybe it's because I'm not as familiar with those languages. With that said, I haven't heard of many companies with similar performance requirements going down that route either. Going with Java comes with some overhead. It's verbose, and isn't as convention driven. This means it takes devs longer to write/read code and understand APIs. It also means you need strong leadership pushing solid practices, otherwise you can end up with multiple approaches. This is actually where I think Rails excels, convention over configuration speeds up development. So if you don't require high scalability or have a big dev team, Rails might be the way to go. It's definitely faster time to market, and a better tool for building an MVP.

As we continue to dismantle our monolithic beast and move away from PHP, we want something light and simple that we can control. For that reason, we are moving to Sinatra/Ruby. Stay tuned for more info on how that goes...


[1] AKF Cube

Dropwizard and Quartz Part 1: Scheduled Java Jobs In A Nice Package

It's very common to take simple, regular, or automated tasks, cook them into your favorite executable, add another entry to crontab, and off you go, because you have bigger fish to fry. And this is fine, sort of. It will work well, and your job(s) may continue to run for quite some time without errors. But what happens when you have lots of jobs? Jobs that run on multiple schedules? Jobs that must not run concurrently? Jobs that rely on flaky 3rd party services that fail every now and then? How are you going to manage and keep track of that easily? Over time it can become a nasty mess. The problem is that you have outgrown CRON and need something a little more sophisticated. There are a number of solutions out there: some are free, others can be expensive; some may be too simple, others too complex or cumbersome; and in the end they may be decent but not exactly what you want. You don't want to reinvent the wheel with a grassroots solution, but you will need something that is flexible and malleable enough for your needs. If, and this may be a big IF, you are a Java shop, or have jobs/tasks that are platform/language agnostic, Dropwizard + Quartz could be a great solution for you.

Quartz is a Java scheduler that is in principle very similar to good old CRON, but with a lot more bells and whistles. Dropwizard is a well-thought-out web services framework which will provide an excellent wrapper for managing and keeping tabs on your scheduled jobs.

A little bit about Dropwizard

 Dropwizard is a Java framework for developing ops-friendly, high-performance, RESTful web services. Dropwizard pulls together stable, mature libraries from the Java ecosystem into a simple, light-weight package that lets you focus on getting things done.

That pretty much says it all. Using Jetty, Jersey, and Jackson, among other things, it makes building a web service very very simple, giving you a number of nice features out of the box like configuration and health checking which we will discuss briefly below.

A little bit about Quartz

Quartz is a full-featured, open source job scheduling service that can be integrated with, or used along side virtually any Java EE or Java SE application - from the smallest stand-alone application to the largest e-commerce system. Quartz can be used to create simple or complex schedules for executing tens, hundreds, or even tens-of-thousands of jobs; jobs whose tasks are defined as standard Java components that are programmed to fulfill the requirements of your application. The Quartz Scheduler includes many enterprise-class features, such as JTA transactions and clustering. 

Quartz does everything CRON can do and much, much more. Jobs can be stateful or stateless and monitored throughout every step of their life cycle, and Quartz comes with all of the Java error and event handling goodness you need. There are 3 main components to Quartz: the scheduler (of which you can have many, but we'll be using one), triggers, and jobs. As mentioned above, Quartz has a clustered mode of operation where schedules and jobs can be shared among multiple instances, but we haven't played around with that (yet!).

What we want out of this

The goal here is to have a centralized system that:

  • Runs our jobs exactly when and as often as we want (Flexible)
  • Handles temporary failures (Robust)
  • Sends notifications of critical/permanent failures (Reliable)
  • Handles complex jobs using (almost) any 3rd party Java lib or service (Useful)
  • Allows non-tech personnel to see what's going on (User Friendly)
  • Is testable and maintainable (Quality)

To do this we use some of Dropwizard's and Quartz's awesomesauce:

  • Create a managed instance of a Quartz scheduler for graceful start-up, shutdown, etc
  • Use a Dropwizard Health check to watch our Quartz Scheduler
  • Quartz Scheduler and  Job listeners
    • To track the current state of the system
    • Handle errors
    • Retry jobs that failed due to temporary issues (locked resources, timeouts, etc)
  • Dropwizard Configuration to set up our Quartz Scheduler
  • Quartz Job XML files to instantiate our jobs and triggers
  • Add web resources that interact with our managed Quartz Scheduler.
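
As a taste of the "retry jobs that failed due to temporary issues" item, here is a small plain-Java sketch of the idea. It deliberately does not use Quartz's listener API; it only shows the decision a listener or job wrapper would have to make. TransientJobException, RetryingRunner, and the attempt/pause parameters are hypothetical names chosen for this example.

```java
import java.util.concurrent.Callable;

// Hypothetical marker for failures worth retrying (locked resource, timeout, ...).
class TransientJobException extends Exception {
    TransientJobException(String message) {
        super(message);
    }
}

final class RetryingRunner {
    // Runs the job, retrying transient failures up to maxAttempts times with a
    // fixed pause in between. Any other exception propagates immediately.
    static <T> T run(Callable<T> job, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return job.call();
            } catch (TransientJobException e) {
                if (attempt >= maxAttempts) {
                    throw e;   // out of attempts: treat as permanent and let the caller notify someone
                }
                Thread.sleep(pauseMillis);
            }
        }
    }
}

final class RetryDemo {
    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = RetryingRunner.run(() -> {
            if (++calls[0] < 3) {
                throw new TransientJobException("resource locked");
            }
            return "done after " + calls[0] + " attempts";
        }, 5, 10);
        System.out.println(result);   // done after 3 attempts
    }
}
```

Separating "temporary" from "permanent" failures with a dedicated exception type keeps the retry policy in one place instead of scattering it across jobs.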

Dropwizard + Quartz: The Nitty Gritty

The code samples below provide a skeleton to get Quartz up and running within a Dropwizard web service. In this post we will be breaking down the most basic parts we need to give us a simple foundation to build upon in later posts. The first step is to create your main Dropwizard Service class that kicks everything off.


Our Dropwizard managed Quartz class is responsible for starting, stopping and checking in on our Scheduler and its jobs.


Configuring Dropwizard & Quartz

Dropwizard has a straightforward configuration mechanism that uses YAML or JSON configuration files, making it easy to set environment and initialization parameters. We will be making use of this to set our Quartz Scheduler properties. This could be done in a separate file, but in most cases it is better to keep all of your environment settings in one place.


Dropwizard YAML Configuration for Quartz

The YAML configuration is pulled in when the service is kicked off, using a simple command line argument. For example:

java -jar myDropwizardQuartzService.jar server production.yml


Dropwizard - Quartz Healthcheck

Dropwizard has a simple metrics and health check system that makes keeping tabs on your services or service features very straightforward. As our managed Quartz manager and its error handling get more complex, its state/health can be completely encapsulated such that the health check doesn't need to be altered.
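
The encapsulation point can be illustrated without any framework code. The sketch below uses stand-in classes only: SchedulerManager and SchedulerHealthCheck are hypothetical names, and the string results stand in for what a real Dropwizard HealthCheck would report via Result.healthy() or Result.unhealthy(message). Treat it as one possible shape, not the actual implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

// Stand-in for the managed Quartz wrapper; a real one would hold an
// org.quartz.Scheduler. All names here are hypothetical.
final class SchedulerManager {
    enum State { STOPPED, RUNNING, FAILED }

    private final AtomicReference<State> state = new AtomicReference<>(State.STOPPED);
    private volatile String lastError = "";

    void start() { state.set(State.RUNNING); }   // would start the scheduler
    void stop()  { state.set(State.STOPPED); }   // would shut it down gracefully

    // Listeners or error handlers would call this when something breaks.
    void recordFailure(String error) {
        lastError = error;
        state.set(State.FAILED);
    }

    boolean isHealthy() { return state.get() == State.RUNNING; }

    String describe() {
        State current = state.get();
        return current == State.FAILED ? current + ": " + lastError : current.toString();
    }
}

// The check asks the manager only two questions, so the manager's internals
// can grow without the check changing.
final class SchedulerHealthCheck {
    private final SchedulerManager manager;

    SchedulerHealthCheck(SchedulerManager manager) {
        this.manager = manager;
    }

    String check() {
        return manager.isHealthy() ? "OK" : "UNHEALTHY (" + manager.describe() + ")";
    }
}

final class HealthCheckDemo {
    public static void main(String[] args) {
        SchedulerManager manager = new SchedulerManager();
        SchedulerHealthCheck check = new SchedulerHealthCheck(manager);
        manager.start();
        System.out.println(check.check());   // OK
        manager.recordFailure("job store unreachable");
        System.out.println(check.check());   // UNHEALTHY (FAILED: job store unreachable)
    }
}
```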


Creating Jobs Through XML

In this example we are instantiating/scheduling our jobs by listing XML files that describe the jobs themselves, any data we want to pass in, and their triggers. It is possible to have multiple XML files, as seen in the configuration example above. Each XML file can contain multiple jobs and triggers.


For more about Quartz jobs and triggers take a look at the examples and tutorials.

In later posts we will cover and go into some more detail on the following topics:

  • More about jobs, passing data, using job and scheduler contexts
  • Scheduler and Job listeners
  • Handling and Retrying a job when it fails
  • Web resources and interface for the Quartz Service

Stay tuned!

Unit Testing With the BDD Format

Use the “Given/When/Then” template for unit tests for extra conceptual clarity. Modern software development emphasizes collaboration with a Customer, who is a representative for the final users of the software, and development in conjunction with automated unit tests, which ensure that the software meets the specifications intended by the customer.

Behavior Driven Development is a process that provides methods for clear communication between developers and customers. Its most elegant technique is a template for writing unit tests that bridges this gap and makes unit tests readable even by customers, bringing them deeper into the development loop and reducing miscommunication. This format is sometimes called the “Given/When/Then” style.

The format

Unit tests should have descriptive names that explain what they do and distinguish them from other scenarios.

The assumptions, written as code that sets up preconditions for the scenario, come first in the test. The first precondition should be described by a phrase beginning with the word “Given”, and any following preconditions should be introduced with the word “and”.

The invocation under test should be described by a phrase beginning with the word “When”. There should only be one of these, since the unit test should only be testing one single operation.

The first result being tested for should be described by a phrase beginning with the word “Then”, and any following results should be explained with phrases starting with the word “and”.


Here are two examples of a scenario and its equivalent unit test in Java + JUnit.

Example 1

Given an admin user
When user requests secret data
Then return the data
[gist id=2869367]

Example 2

Given an admin user
and it's after office hours
When user requests top secret data
Then return a message saying that we're closed for the day
[gist id=2869548]
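
For readers who cannot see the embedded gists, here is a rough sketch of what tests in this format might look like. It is not the original gist code. The User and SecretService classes, the 09:00-17:00 office hours, and the explicit clock argument are assumptions invented to make the scenarios concrete. JUnit's @Test annotation and assertEquals are replaced by a small check helper so the block compiles with only the JDK.

```java
import java.time.LocalTime;

// Hypothetical domain types, invented only for this illustration.
final class User {
    final boolean admin;
    User(boolean admin) { this.admin = admin; }
}

final class SecretService {
    static final String SECRET = "the secret data";
    static final String CLOSED = "We're closed for the day";

    // Office hours are assumed to be 09:00-17:00 for this sketch.
    String requestSecretData(User user, LocalTime now) {
        if (!user.admin) {
            throw new SecurityException("admins only");
        }
        boolean open = !now.isBefore(LocalTime.of(9, 0)) && now.isBefore(LocalTime.of(17, 0));
        return open ? SECRET : CLOSED;
    }
}

final class SecretServiceTest {
    // With JUnit, each scenario method would carry @Test.
    static void givenAdminUser_whenUserRequestsSecretData_thenReturnTheData() {
        // Given an admin user
        User admin = new User(true);
        SecretService service = new SecretService();

        // When the user requests secret data
        String response = service.requestSecretData(admin, LocalTime.of(10, 0));

        // Then return the data
        check(SecretService.SECRET.equals(response));
    }

    static void givenAdminUserAndAfterOfficeHours_whenUserRequestsSecretData_thenReturnClosedMessage() {
        // Given an admin user
        User admin = new User(true);
        // and it's after office hours
        LocalTime afterHours = LocalTime.of(20, 0);
        SecretService service = new SecretService();

        // When the user requests top secret data
        String response = service.requestSecretData(admin, afterHours);

        // Then return a message saying that we're closed for the day
        check(SecretService.CLOSED.equals(response));
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("expectation not met");
        }
    }

    public static void main(String[] args) {
        givenAdminUser_whenUserRequestsSecretData_thenReturnTheData();
        givenAdminUserAndAfterOfficeHours_whenUserRequestsSecretData_thenReturnClosedMessage();
        System.out.println("all scenarios passed");
    }
}
```

Note how each test has exactly one "When" line, and how the method name alone reads like the scenario it covers.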


This technique is ideal for pure unit tests, having only one operation under test. It is not applicable for the more comprehensive tests that simulate a conversation between a client and server, testing multiple operations.