Internet News South Africa

Running at Facebook scale? Open code is the way to go

The busiest websites on the internet used to rely on high-end enterprise technology from large systems integrators to cope with the volumes they experienced. The database systems, the load balancers that farmed out large request volumes to multiple machines, the intelligent software to keep it all running: all were enterprise-grade solutions from specialist vendors. But in the last eight to 10 years there has been a revolution at the high end of the web: today's busiest websites use freely available software that anyone can download and install for themselves. And the companies that run them contribute fixes and improvements back to the community.

"It may seem surprising that the social media sites we all use every day (Facebook, Twitter, Google and Instagram, to name a few) use mostly open-source software to serve up their incredible loads, but it's true," said LSD MD Sven Lesicnik.

"For instance, Twitter uses the freely available MySQL database to store most of its data. Facebook also uses MySQL and has created a branch of the code, called WebScaleSQL, which is specifically designed for very high-end applications. Additionally, Facebook uses Apache Hadoop, an open data-processing platform that can crunch large amounts of data very quickly compared with previous solutions. LinkedIn uses another Apache project, the Kafka messaging framework, and makes all of its internal tools available for anyone to download and work on."

Lesicnik says there are two main reasons why open-source software underpins all of the websites we use from day to day.

Can't cope with today's volumes of data

"The first is that traditional database approaches simply can't cope with today's volumes of data. Facebook has roughly a billion users and a significant percentage of them are adding content, searching for content and browsing the site continually. That means the company needs a back-end that is designed to deal with these sorts of volumes and over which it has complete control. It's too expensive to build one from scratch, which is why it's chosen Hadoop.

"And, secondly, no one vendor will have the manpower to innovate faster than all of the high-end users put together. When Facebook improves MySQL, all users of MySQL benefit. When LinkedIn improves Kafka, Twitter, Tumblr, Netflix and Pinterest benefit as well. The speed at which these open software solutions have come to rule the very high end is simply amazing."

But he cautions that although open source gives a definite competitive advantage to the Facebooks and Twitters of the Web, it is not a panacea.

"These companies spend a great deal of time and money recruiting the very best skills to work on that code and manage their infrastructure. Companies looking to replicate this kind of advantage must be prepared to invest in the right skills or the right partner to assist them. Our role at LSD is to help our customers make well-informed open-source choices that best suit their business needs. However, the costs of enterprise open source are still far below the costs of running proprietary systems, even taking this into consideration."
