Eventbrite CTO and Co-Founder Renaud Visage spoke at API Days in December about why it’s critical to prioritize revisiting your API and to not be afraid to scrap the entire thing if necessary.
Building frontend assets is no longer just concatenation and minification of JS/CSS files. It now involves multiple pre-compilation steps, crawling through dependency graphs, and pulling together files for your apps from various repositories. We have adopted grunt (a Node.js-based task runner) and npm (Node's package manager) to manage our build and its dependencies.
After adding all these tools and architecting our system, we stood back and saw a need to reorganize. Namely, we needed a way to build our components that live in other repositories while keeping our build system neat and tidy.
In iOS 7, Apple introduced AirDrop, a way of sharing files and links between nearby devices using a combination of Bluetooth and Wi-Fi. They also updated UIActivityViewController so that your own native application can share over AirDrop.
If you’re sharing images or simple web links, you’ll get the behavior for free just by compiling against the new iOS SDK, but if you want to share deep links into your native application, you have to do a little more work. When we added AirDrop to version 3.0 of the Eventbrite iPhone app, we ran into a problem. We wanted to be able to share events from one phone to another using our custom URI scheme. It’s very easy for the app to select an appropriate URL to share based on the sharing method, so that Twitter, for instance, gets an HTTP URL while sharing via AirDrop sends the custom Eventbrite URL. Unfortunately, that ends up not looking so pretty:
Changing the IP address of a Cassandra node is a common maintenance operation. It is done when using offsite backups (via, for example, tablesnap) to replace a failed node, or when doing an in-place upgrade of a Cassandra node’s hardware.
A frequent question on cassandra-user@ and #cassandra on freenode IRC is summarized as follows:
“X.Y.Z is the current ‘stable’ version, should I deploy it to production or should I deploy the ‘oldstable’ version?”
My stock answer to this question is to say that Cassandra’s release history shows:
When you first play around with Cassandra and discover how fast it is at handing you the columns for a row, it looks like an excellent choice for implementing a distributed queue. In reality, however, queues tend to bring out two of Cassandra’s thorniest areas, tombstones and consistency levels, and are thus seen as an antipattern.
Row-Based vs Column-Based
To implement a queue in Cassandra, you must choose either a row-based or a column-based design. In the row-based design, the item to be processed is stored as a row key. In the column-based design, the item to be processed is stored as a column in a specific row.
With the item to be processed stored as a row key, consistency becomes a bottleneck. Since the items to process are unknown, getting range slices across row keys is the only way to fetch data; this operation ends up querying every node when all keys are needed, as the location and number of keys are unknown ahead of time. Since not all nodes are available at any given time, this is less than ideal.
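To make the column-based layout concrete, here is a rough sketch using CQL through the DataStax Python driver. The keyspace, table, and queue names are made up for illustration (a CQL partition corresponds to what we call a row above), and this is only meant to show the shape of the pattern, including why it hurts: every dequeue leaves a tombstone behind.

# Sketch of the column-based queue layout using the DataStax Python driver.
# Keyspace, table, and queue names here are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('demo_keyspace')

# One partition (row) per queue; each pending item is a clustering column in that row.
session.execute("""
    CREATE TABLE IF NOT EXISTS queue_items (
        queue_name text,
        item_id timeuuid,
        payload text,
        PRIMARY KEY (queue_name, item_id)
    )
""")

# Enqueue: add a column to the queue's row.
session.execute(
    "INSERT INTO queue_items (queue_name, item_id, payload) VALUES (%s, now(), %s)",
    ('email_jobs', 'send welcome email'),
)

# Dequeue: slice the oldest columns from the single known row, then delete them.
# Each delete leaves a tombstone, which is why this pattern degrades over time.
rows = session.execute(
    "SELECT item_id, payload FROM queue_items WHERE queue_name = %s LIMIT 10",
    ('email_jobs',),
)
for row in rows:
    handle_item(row.payload)  # hypothetical worker function
    session.execute(
        "DELETE FROM queue_items WHERE queue_name = %s AND item_id = %s",
        ('email_jobs', row.item_id),
    )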
Our automated testing is broken into two broad areas: unit tests and integration tests. Unit tests are where we test the domain logic for our models, with few dependencies. The tests may hit a MySQL database to exercise Django ORM-related logic, but the test runner can’t access external services or things like Cassandra. (We’re using Django and the Django test runner, which creates test databases during setup. You may object that hitting the database means these aren’t “unit” tests. I agree. Nonetheless, we call them unit tests.) Our integration tests, on the other hand, are run against full builds of the site, and have access to all of the services that our site normally does.
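As an entirely hypothetical illustration of what “unit test” means in this setup, something like the following exercises model logic through the Django ORM against the test database the runner creates, but touches nothing external (the app, model, and slug behavior are invented for the example):

# Hypothetical example of a "unit" test in this sense: it goes through the Django ORM
# to the test database, but never reaches external services.
from django.test import TestCase

from events.models import Event  # hypothetical app and model


class EventModelTest(TestCase):
    def test_slug_is_generated_from_title(self):
        event = Event.objects.create(title="Python Meetup")
        # The domain logic under test lives on the model itself.
        self.assertEqual(event.slug, "python-meetup")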
I spent last week attending two very different conferences. In both cases I was honored to have the opportunity to present the work I’ve been doing at Eventbrite. It was exciting to me that even though the conferences were different in just about every way — size, venue, focus, geography, cost — they were filled with people working on interesting technologies and ideas.
To scale our data storage, Eventbrite’s strategy has been a combination of: moving data to NoSQL solutions, aggressively moving queries to slave databases, buying better database hardware, maintaining different indexes on database slaves that receive different queries, and finally, designing the most efficient tables possible for large and heavily utilized datasets.
This is a story of optimizing the design of a single MySQL table to store multiple email-addresses per user (needed by some forward-looking infrastructure we are building). We’ll discuss the Django implementation in a future post.
Multiple Email Address Table
To support multiple email-addresses per user in MySQL, we need a one-to-many table. A typical access pattern is a lookup by email-address and a join to the users table.
Here is the basic design, followed by our improvements.
The Naïve Implementation
The basic design’s one-to-many table would have an auto-increment primary-key, a column for the email-address, and an index on the email-address. Lookups by email-address will pass through that index.
DROP TABLE IF EXISTS `user_emails`;
CREATE TABLE `user_emails` (
  `id` int NOT NULL AUTO_INCREMENT,
  `email_address` varchar(255) NOT NULL,
  … -- other columns about the user
  `user_id` int, -- foreign key to users
  PRIMARY KEY (`id`),
  KEY (`email_address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
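For illustration, the access pattern described above (lookup by email-address, joined to the users table) would look something like this from Python. The connection details and the columns selected are placeholders, not our actual setup:

# Hypothetical lookup by email-address with a join to users, via MySQLdb.
# Connection parameters and selected columns are placeholders.
import MySQLdb

conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="app")
cursor = conn.cursor()

cursor.execute(
    """
    SELECT u.*
    FROM user_emails ue
    JOIN users u ON u.id = ue.user_id
    WHERE ue.email_address = %s
    """,
    ("person@example.com",),
)
user_row = cursor.fetchone()

The WHERE clause is satisfied by the secondary index on `email_address`, which is exactly the index this design is built around.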
No one likes to break unit tests. You get all stressed about it, feel like you’ve let your peers down, and sometimes even have to get everyone donuts the next day. Our production Python codebase is complex, and the smallest changes can have an unexpectedly large impact; this is only complicated by the fact that Python is a dynamic language, making it hard to figure out what code touches what.
nose-knows is a plugin for the nose unit test runner (and, experimentally, py.test). It traces your code while unit tests are running and figures out which files have been touched by which tests. Running your full test suite with code tracing turned on is expensive, so we have a daily Jenkins job that does it and creates an output file. nose-knows can also do the converse: it knows how to leverage this file to run specific tests.
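To give a feel for the underlying idea (only the idea; this is not how nose-knows itself is implemented), here is a toy sketch using Python’s standard-library trace module to find which source files a test touches:

# Toy illustration of test-to-file tracing with the standard-library trace module.
# This is a conceptual sketch, not the nose-knows plugin.
import trace


def touched_files(test_func):
    """Run a test callable and return the set of source files it executed."""
    tracer = trace.Trace(count=1, trace=0)
    tracer.runfunc(test_func)
    counts = tracer.results().counts  # keyed by (filename, lineno)
    return {filename for (filename, _lineno) in counts}


def test_example():
    import json  # any code the test exercises shows up in the trace
    json.dumps({"ok": True})


if __name__ == "__main__":
    for path in sorted(touched_files(test_example)):
        print(path)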