Under the Hood
We have often been asked about the challenges we faced in scaling up our technology stack to manage big data. This post, the first in a series on this and similar topics, is my attempt to answer that question.
20/Twenty was created from the ground up as the most intuitive and easy-to-use cloud-based (SaaS) Social Intelligence platform in the world. Based on our deep understanding of what marketers needed and the awesome designs we created, we signed up our first client even before the product was officially launched. The pressure to quickly deliver the first version of the product was intense.
From an engineering point of view, there is a huge amount of data (think Big Data!) that we pull, process, augment and then visualize in the platform, all on a near real-time basis. Imagine someone tweeting and the tweet appearing on our platform within a few seconds, along with augmented information including Gender, Sentiment, Engagement, Spam score etc.
The evolution of 20/Twenty has already seen a few stages of growth. The graph below shows how 20/Twenty data has grown over the last 2 years since our product launch. This is remarkable growth for a startup like Circus Social, both from a business perspective and from an engineering standpoint. We used several tricks from the books and a few practical hacks to make sure our ability to fetch, process, augment and visualize high volumes of data kept improving, though this journey was not without pain!
In our previous avatar at Circus Social, we created over 200 custom marketing applications while working with some of the biggest brands in the world. We used the same open source technologies (PHP / MySQL) to create the first version of 20/Twenty. This worked well, and as our data grew in the first few months, we continued to scale vertically by adding more capacity (CPU/RAM).
Most of the queries from the application were read queries, whereas the bulk of the write operations were performed by our data crawlers. We therefore created a master-slave architecture in which the application read from the slaves and the crawler scripts wrote to the master. This worked well in general, but the exponential increase in the volume of data meant that certain queries were running extremely slowly and hurting the user experience.
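To make the read/write split concrete, here is a minimal sketch of the routing idea. It is an illustration under stated assumptions, not the code we ran: the class name, host names and the rule for detecting a read query are all hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    # Assumption: statements starting with these words are treated as reads.
    READ_PREFIXES = ("select", "show")

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        # Round-robin iterator so read traffic is shared evenly.
        self._slaves = itertools.cycle(slaves)

    def target_for(self, sql):
        """Return the name of the server that should run this statement."""
        words = sql.split()
        if not words:
            raise ValueError("empty SQL statement")
        if words[0].lower() in self.READ_PREFIXES:
            return next(self._slaves)
        return self.master


# Hypothetical host names, for illustration only.
router = ReadWriteRouter("master-db", ["slave-db-1", "slave-db-2"])
print(router.target_for("SELECT * FROM posts WHERE brand_id = 7"))
print(router.target_for("INSERT INTO posts (text) VALUES ('hello')"))
```

One trade-off of this design is that slaves usually lag slightly behind the master, so a read issued right after a write may not see the new row yet.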
Since our data volume was growing exponentially and the relational aspects of the database were not core to our application, we realized that sooner or later we would have to move to a NoSQL database. However, the performance issues that were cropping up had to be sorted out quickly and without downtime. We soon concluded that we needed a dedicated search engine and that MySQL was not good enough for this purpose.
We explored several options and Elasticsearch came to our rescue here. Elasticsearch is a distributed, RESTful search and analytics engine that stores your data centrally in a form your applications can retrieve very quickly. Our awesome tech team deployed this in a matter of days. The improvement in performance was remarkable. The plan worked and we cheered!
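As a rough illustration of the kind of request a search engine handles well, the sketch below builds an Elasticsearch query body that combines a full-text match with exact filters. The field names (text, sentiment, posted_at) and the helper function are hypothetical and do not describe our actual index; the resulting body would be sent to the _search endpoint of an index.

```python
import json


def build_search_body(keyword, sentiment=None, since=None, size=20):
    """Build a bool query: full-text match on the post text plus optional filters."""
    filters = []
    if sentiment is not None:
        # Exact-value filter on a keyword field (hypothetical field name).
        filters.append({"term": {"sentiment": sentiment}})
    if since is not None:
        # Only posts at or after the given date (hypothetical field name).
        filters.append({"range": {"posted_at": {"gte": since}}})
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"text": keyword}}],
                "filter": filters,
            }
        },
    }


body = build_search_body("coffee", sentiment="positive", since="now-1d/d")
print(json.dumps(body, indent=2))
```

Clauses under filter narrow the result set without affecting relevance scoring, which suits attributes such as sentiment or date.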
Word spread in Singapore and Asia about how good our platform was (and our sales team did a good job too!) and we continued to sign up new clients. The volume of data continued to grow for existing clients as well as new ones. The tech stack of MySQL and Elasticsearch did not let us down, but we wanted to create an architecture that would scale infinitely, if there is such a thing.
In Stage 3, we moved the core of our database from MySQL to Cassandra (Elasticsearch was now interacting with Cassandra) and the backend code from PHP to Node.js. We also migrated most of our front-end code to AngularJS for better performance. This was a major architectural change to a live application being used by several clients, so we created a parallel, production-like environment and ran it alongside the live system for several weeks to ensure everything was working as desired before switching over.
While we did the above, we continued to work on cool new features for the product and opened up our data APIs to a few clients who wanted a deeper integration with their own applications. Other tools we used during this and other stages were Postman, GitHub and JIRA.
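One way to check a parallel environment before switching over is to replay the same requests against both stacks and log every difference. The sketch below shows that idea with in-memory stand-ins; the function name, the handlers and the sample data are hypothetical and are not taken from our migration tooling.

```python
def compare_backends(requests, legacy_handler, new_handler):
    """Run each request through both stacks and collect differing answers."""
    mismatches = []
    for request in requests:
        expected = legacy_handler(request)
        actual = new_handler(request)
        if expected != actual:
            mismatches.append(
                {"request": request, "legacy": expected, "new": actual}
            )
    return mismatches


# Stand-ins for the two environments: mention counts per brand (made-up data).
legacy_counts = {"brand-a": 120, "brand-b": 45}
new_counts = {"brand-a": 120, "brand-b": 44}

report = compare_backends(
    ["brand-a", "brand-b"], legacy_counts.get, new_counts.get
)
for item in report:
    print(item)
```

An empty report over a representative set of requests is a reasonable signal that the new stack behaves like the old one for those cases.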
As we scale further from here, we will probably have newer and more exciting technology challenges and we will keep posting about them. If you are excited to work on some of these, do connect with me on LinkedIn.
This article first appeared on the Circus Social Blog.