Reliable Internal Site Analytics

In our quest to constantly improve user experience, we need to know how people are currently using our product. The “event page,” where people can purchase tickets to their favorite events, is a key component of Eventbrite’s website. But how do people get there? Do they click on a link from the homepage? Does a friend post a link on Facebook? Do they use search to find an event in their area?

Knowing how users interact with your page can tell you two important things: what they use and what they don’t.

Writing a Composable JS/CSS build system

Building frontend assets is no longer just concatenation and minification of JS/CSS files. It now involves multiple pre-compilation steps, crawling through dependency graphs, and pulling together files for your apps from various repositories. We have adopted Grunt (a Node.js-based task runner) and npm (Node’s package manager) to manage our build and its dependencies.

After adding all these tools and architecting our system we stood back and saw a need to reorganize. Namely, we needed a way to build our components living in other repositories and keep our build system neat and tidy.


Setting the title of AirDrop shares under iOS 7

In iOS 7, Apple introduced AirDrop, a way of sharing files and links between nearby devices using a combination of Bluetooth and Wi-Fi. They also updated UIActivityViewController to allow sharing over AirDrop from your own native application as well.

If you’re sharing images, or simple web links, you’ll get the behavior for free just by compiling against the new iOS SDK, but if you want to share deep links into your native application you have to do a little more work. When we added AirDrop to version 3.0 of the Eventbrite iPhone app we ran into a problem. We wanted to be able to share events from one phone to another using our custom URI scheme. It’s very easy for the app to select an appropriate URL to share based on the sharing method, so that Twitter, for instance, will get an HTTP URL but sharing via AirDrop sends the custom Eventbrite URL. Unfortunately, that ends up not looking so pretty:

[Image: share_url]


Replayable Pub/Sub Queues with Cassandra and ZooKeeper

When you first play around with Cassandra and discover how fast it is at returning the columns of a row, it appears to be an excellent choice for implementing a distributed queue. In practice, however, queues tend to hit two of Cassandra’s thorniest areas, tombstones and consistency levels, and are therefore seen as an antipattern.

Row-Based vs Column-Based

To implement a queue in Cassandra, you must choose between a row-based and a column-based design. In a row-based design, each item to be processed is stored as a row key. In a column-based design, each item to be processed is stored as a column in a specific row.

With the item to be processed stored as a row key, consistency becomes a bottleneck. Since the items to process are unknown, getting range slices across row keys is the only way to fetch data; this operation ends up querying every node when all keys are needed, as the location and number of keys are unknown ahead of time. Since not all nodes are available at any given time, this is less than ideal.
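The difference between the two layouts can be sketched with a toy in-memory model. This is only an illustration under loud assumptions: the four-node cluster, the `node_for` hashing helper, the `queue` row key, and the `job-N` items are all hypothetical, and the model ignores replication, consistency levels, and tombstones entirely. It shows only how many nodes a read has to touch in each design.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(row_key):
    """Map a row key to a node by hashing (a simplified stand-in for a partitioner)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


def fetch_row_based(nodes):
    """Items are row keys: the keys are unknown, so every node must be scanned."""
    touched, items = set(), []
    for node_id, rows in enumerate(nodes):
        touched.add(node_id)
        items.extend(rows.keys())
    return sorted(items), touched


def fetch_column_based(nodes, queue_key):
    """Items are columns of one known row: only that row's node is read."""
    node_id = node_for(queue_key)
    columns = nodes[node_id].get(queue_key, {})
    return [columns[name] for name in sorted(columns)], {node_id}


items = ["job-%d" % i for i in range(6)]  # made-up work items

# Row-based layout: each item is its own row key, spread across the nodes.
row_nodes = [dict() for _ in range(NUM_NODES)]
for item in items:
    row_nodes[node_for(item)][item] = {}

# Column-based layout: every item is a column in the single "queue" row.
col_nodes = [dict() for _ in range(NUM_NODES)]
for position, item in enumerate(items):
    col_nodes[node_for("queue")].setdefault("queue", {})[position] = item

row_items, row_touched = fetch_row_based(row_nodes)
col_items, col_touched = fetch_column_based(col_nodes, "queue")
```

In this model the row-based read touches all four nodes, while the column-based read touches only the node that owns the `queue` row.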


Tracking Method Calls During Testing

Our automated testing is broken into two broad areas: unit tests and integration tests. Unit tests are where we test the domain logic for our models, with few dependencies. The tests may hit a MySQL database to exercise Django ORM-related logic, but the test runner can’t access external services or things like Cassandra. (We’re using Django and the Django test runner, which creates test databases during setup. You may object that hitting the database means these aren’t “unit” tests. I agree. Nonetheless, we call them unit tests.) Our integration tests, on the other hand, are run against full builds of the site, and have access to all of the services that our site normally does.

Whirlwind Week: OSCON and PyOhio

I spent last week attending two very different conferences. In both cases I was honored to have the opportunity to present the work I’ve been doing at Eventbrite. It was exciting to me that even though the conferences were different in just about every way — size, venue, focus, geography, cost — they were filled with people working on interesting technologies and ideas.


Optimizing a table with composite primary keys

To scale our data storage, Eventbrite’s strategy has combined several approaches: moving data to NoSQL solutions, aggressively moving queries to slave databases, buying better database hardware, maintaining different indexes on database slaves that receive different queries, and, finally, designing the best possible tables for large, heavily used data sets.

This is the story of optimizing the design of a single MySQL table that stores multiple email addresses per user (needed by some forward-looking infrastructure we are building). We’ll discuss the Django implementation in a future post.

Multiple Email Address Table

To support multiple email addresses per user in MySQL, we need a one-to-many table. A typical access pattern is a lookup by email address followed by a join to the users table.

Here is the basic design, followed by our improvements.

The Naïve Implementation

The basic design’s one-to-many table would have an auto-increment primary key, a column for the email address, and an index on the email address. Lookups by email address will pass through that index.

DROP TABLE IF EXISTS `user_emails`;
CREATE TABLE `user_emails` (
 `id` int NOT NULL AUTO_INCREMENT,
 `email_address` varchar(255) NOT NULL,
 … -- other columns about the user
 `user_id` int, -- foreign key to users
 PRIMARY KEY (`id`),
 KEY (`email_address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
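
The access pattern described above (look up by email address, then join to the users table) can be sketched as follows. This is a minimal illustration rather than Eventbrite's code: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere, and the `users` table, its `name` column, the index name, the `find_user_by_email` helper, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption, for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL              -- hypothetical column
    );
    CREATE TABLE user_emails (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        email_address VARCHAR(255) NOT NULL,
        user_id INTEGER REFERENCES users (id)
    );
    -- Secondary index on email_address, as in the basic design.
    CREATE INDEX idx_user_emails_email ON user_emails (email_address);
""")

# Made-up sample data: one user with two email addresses.
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO user_emails (email_address, user_id) VALUES (?, ?)",
    [("ada@example.com", 1), ("ada@work.example.com", 1)],
)


def find_user_by_email(email):
    """Look up a user by email address, joining user_emails to users."""
    return conn.execute(
        """
        SELECT u.id, u.name
        FROM user_emails AS e
        JOIN users AS u ON u.id = e.user_id
        WHERE e.email_address = ?
        """,
        (email,),
    ).fetchone()
```

Calling `find_user_by_email` with either of the sample addresses returns the same user row, and an unknown address returns `None`.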
