In my previous post, I explained why we started exploring serverless. This one is about what we did and how it has gone so far.
We used to have 9 servers (i.e., AWS EC2 instances) spread across multiple availability zones, each running a Docker engine and all interconnected with Rancher. To get the most out of our self-managed infrastructure, we decided not to use other AWS services like RDS or SQS. We went with MySQL as the database and RabbitMQ as the queue. All servers logged to Papertrail, and the alerts we configured there helped us notice quickly when something went wrong. The biggest problem (cough price cough), I mean the second biggest problem we encountered was that something in the Docker engine occasionally caused servers to freeze and require a reboot (or it was Rancher, I'm not sure; I never managed to understand or fix the issue). Making sure that everything worked OK was not much fun, I must say, but it wasn't a big hassle either.
The serverless adventure started with exploration. I spent some time trying to find proper equivalents of the components we were already using. The database was the easiest: RDS was the obvious answer, although we switched to PostgreSQL after running into some of MySQL's quirks while trying to generate reports for gazers. SQS was RabbitMQ's counterpart, but queueing isn't really serverless' "thing". Why would you queue things up and get to them one by one when you can handle all of them at once, right? That's where SNS came in: we created an SNS topic for each Lambda function that handles a background task and set the function to be invoked by the SNS message.
Once these were settled, refactoring started. Most of WebGazer is written in Python. Seeing Go's growing popularity, I considered switching, since we were already making changes. I spent some time with Go but couldn't get a solid grasp of it. It also felt safer to stay with Python, since I have been using it for almost 10 years.
In order not to reinvent the wheel, I did some research on frameworks, too. Zappa and Serverless were the prominent alternatives. I decided to go with Zappa at the beginning, since it is written in Python. WebGazer was running with the help of Zappa when we were #2 Product of the Day on Product Hunt (shameless self-promotion ✌️). Recently we switched to Serverless, when I realised we needed a more mature framework that provides tooling for easier resource management and maintenance.
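To make the SNS-per-function idea above a bit more concrete, here is a minimal sketch of what a Lambda handler invoked by an SNS topic can look like. It is not WebGazer's actual code: `send_notification`, the payload fields, and `make_sns_event` are hypothetical names made up for illustration. The only thing taken from AWS is the event shape, where each record carries the published string under `Sns.Message`.

```python
import json


def send_notification(payload):
    """Hypothetical background task; stands in for the real work."""
    return f"notified {payload['email']} about {payload['url']}"


def handler(event, context):
    """Lambda entry point, assumed to be subscribed to one SNS topic."""
    results = []
    for record in event.get("Records", []):
        # SNS delivers the published string under Records[].Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        results.append(send_notification(message))
    return results


def make_sns_event(payload):
    """Build a minimal SNS-shaped event for local testing (hypothetical helper)."""
    return {
        "Records": [
            {"EventSource": "aws:sns", "Sns": {"Message": json.dumps(payload)}}
        ]
    }
```

On the publishing side, any piece of code that wants the task done would publish a JSON string to that function's topic (with boto3, that is `sns.publish(TopicArn=..., Message=json.dumps(payload))`), and every message triggers its own invocation instead of waiting in a queue.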
Serverless is really great. It helps both with creating the required instances, Lambda functions, roles and log groups, and with gluing them together into a good overall infrastructure. The saddest part was that we had to leave Papertrail for CloudWatch, and since CloudWatch's alerting mechanism isn't very "intuitive", I had to implement a custom solution for issue reporting. But not worrying about servers running out of RAM or CPU, and not having to spin up more servers when we get a sign-up frenzy, is really worth it.
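The custom issue reporting isn't described in detail here, so the following is only one possible shape for it, not the actual implementation: a Lambda function attached to a log group through a CloudWatch Logs subscription filter. The assumption is that issues can be recognised by a keyword such as `ERROR`; `find_issues` and `encode_log_event` are hypothetical names. What comes from AWS is the payload format: base64-encoded, gzip-compressed JSON under `awslogs.data`.

```python
import base64
import gzip
import json


def decode_log_event(event):
    """Unpack a CloudWatch Logs subscription payload into a dict."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def find_issues(event, keyword="ERROR"):
    """Return one report line per log message containing the keyword."""
    payload = decode_log_event(event)
    return [
        f"{payload['logGroup']}: {entry['message']}"
        for entry in payload["logEvents"]
        if keyword in entry["message"]
    ]


def encode_log_event(payload):
    """Build a subscription-shaped event for local testing (hypothetical helper)."""
    raw = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(raw).decode("ascii")}}
```

A real handler would then forward the lines returned by `find_issues` to wherever alerts should go (e-mail, chat, and so on); that part is left out because it depends entirely on the setup.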
As of today, we help people keep their websites running by gazing more than 600,000 times a day and notifying them instantly when something goes wrong. And so far, it is nice not to have to spend time on servers and maintenance, and to be able to concentrate on building instead.