Google’s mastermind behind SRE, Ben Treynor, still hasn’t published a single-sentence definition, but he describes site reliability as “what happens when a software engineer is tasked with what used to be called operations.”
Site Reliability Engineer was named one of the most promising jobs of 2019 by LinkedIn, based on LinkedIn data.
What is a Site Reliability Engineer?
What exactly is SRE?
Let’s take a scenario from a development environment.
There are typically two teams: the development team and the operations team. Developers design new features and try to add more value to products. The operations team, on the other hand, has to make sure that the newly added features don’t break things that were working fine without them. Operations teams try to make as few releases as possible, while developers try to continuously deliver as many features as they can. This causes a lot of debate over what can be launched and when.
The goal of SRE is to resolve this tension. SREs have a dual purpose: developing new features while also ensuring that production systems run smoothly. SRE is often compared to DevOps, but there is a general difference: DevOps is more of a collaboration between developers and operations that focuses on deployments, while SRE is more focused on monitoring and operations.
What is the Average Salary?
The average salary for a Site Reliability Engineer is $134,762 per year in the United States. Salary estimates are based on 1,357 salaries submitted anonymously to Indeed by Site Reliability Engineer employees and users, and collected from past and present job advertisements on Indeed in the past 36 months.
According to Glassdoor, the national average salary for a Site Reliability Engineer in India is ₹14 lakhs/yr.
According to a PayScale survey, an entry-level Site Reliability Engineer (SRE) with less than 1 year of experience can expect to earn an average total compensation (including tips, bonuses, and overtime pay) of $86,253, based on 91 salaries. An early-career SRE with 1-4 years of experience earns an average total compensation of $108,462, based on 246 salaries. A mid-career SRE with 5-9 years of experience earns an average total compensation of $117,849, based on 201 salaries. An experienced SRE with 10-19 years of experience earns an average total compensation of $134,755, based on 163 salaries. In their late career (20 years and higher), SREs earn an average total compensation of $142,850.
As Google’s own SRE Andrew Widdowson describes it, “Our work is like being a part of the world’s most intense pit crew. We change the tires of a race car as it’s going 100mph.”
SRE teams are staffed with highly capable developer/sysadmin hybrids who are experts at finding problems as well as fixing them. They can easily collaborate with development teams and ensure that production systems are running smoothly while developing new features.
In fact, one of the core principles mandates that SREs spend no more than 50% of their time on operations work. As much of their time as possible should be spent writing code and building systems to improve performance and operational efficiency.
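The 50% cap on operations work can be checked with simple arithmetic. The following is a minimal sketch, assuming a hypothetical weekly log of hours tagged as either operations or engineering work; the names `ops_fraction` and `exceeds_ops_cap` and the sample hours are made up for illustration and are not from Google.

```python
OPS_CAP = 0.50  # maximum share of time on operations work, per the principle above


def ops_fraction(ops_hours: float, engineering_hours: float) -> float:
    """Return the share of all logged time that went to operations work."""
    total = ops_hours + engineering_hours
    if total <= 0:
        raise ValueError("no hours logged")
    return ops_hours / total


def exceeds_ops_cap(ops_hours: float, engineering_hours: float,
                    cap: float = OPS_CAP) -> bool:
    """Return True when operations work takes more than `cap` of the time."""
    return ops_fraction(ops_hours, engineering_hours) > cap


if __name__ == "__main__":
    # Hypothetical week: 26 hours of tickets and on-call, 14 hours of project work.
    share = ops_fraction(26, 14)
    print(f"ops share: {share:.0%}, over cap: {exceeds_ops_cap(26, 14)}")
```

A team that sees this flag stay true over several weeks would, under this principle, look at automating or handing back some of the operations load.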
Some of the responsibilities of a Site Reliability Engineer are:
- Work alongside the development team to ensure new features are developed smoothly and deployed without a hitch.
- Help prevent and investigate production issues.
- Partner with Product Engineering teams to improve operational best practices.
- Incident Management – leading the charge and coordinating the investigation of production issues, post-issue follow-up, and ensuring learned improvements are implemented.
- Coordinate with Platform and Product Engineering orgs to develop, deploy, evangelize, and enforce best-practice processes.
- Improve the predictability and reliability of software releases, workflows, and operating software.
- Reduce application deployment windows by leading the company toward a Continuous Deployment environment.
- Reduce mean time to recovery (MTTR) by helping with troubleshooting, monitoring, alerting, and automated recovery.
- Ensure the highest level of uptime and Quality of Service (QoS).
- Define service level objectives (SLOs) and service level indicators (SLIs) to represent and measure service quality.
- Automate common, repeatable tasks at large scale to streamline operational procedures.
- Advise on short-term plans so the team uses its resources effectively.
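Two of the responsibilities above, defining SLOs and SLIs and reducing MTTR, come down to simple arithmetic. The following is a minimal sketch, assuming an availability SLI measured as successful requests over total requests and MTTR measured as the average time from detection to recovery. The "error budget" (the share of failures an SLO tolerates) is a common SRE convention and is not described in this article. All function names, request counts, and incident durations are hypothetical.

```python
from datetime import timedelta
from typing import Sequence


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """Return True when the measured SLI is at or above the SLO target."""
    return sli >= slo_target


def error_budget_remaining(good_requests: int, total_requests: int,
                           slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("an SLO of 100% leaves no error budget")
    actual_failures = total_requests - good_requests
    return 1 - actual_failures / allowed_failures


def mean_time_to_recovery(durations: Sequence[timedelta]) -> timedelta:
    """MTTR: average duration of the given incidents."""
    if not durations:
        raise ValueError("no incidents recorded")
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 600 failures, 99.9% SLO.
    sli = availability_sli(999_400, 1_000_000)
    print("SLI:", sli, "meets SLO:", meets_slo(sli, 0.999))
    print("budget left:", round(error_budget_remaining(999_400, 1_000_000, 0.999), 3))

    # Hypothetical incidents lasting 30, 45, and 15 minutes.
    incidents = [timedelta(minutes=m) for m in (30, 45, 15)]
    print("MTTR:", mean_time_to_recovery(incidents))
```

Tracking these numbers over time gives a team a shared, measurable definition of "reliable enough" instead of a debate over each release.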
What are the Key Skills to be a Site Reliability Engineer?
- Good knowledge of operating systems (Linux preferred).
- Should know how to work with and manage web servers like Apache and Nginx.
- Should have sound knowledge of database management systems like MySQL, MongoDB, and PostgreSQL.
- Source Code Management – Git/SVN
- Continuous Integration (CI): Jenkins/GO/Circle
- Continuous Delivery (CD): Fabric/Capistrano
- Configuration Management (CM): Ansible/Chef/Puppet/SaltStack
- Monitoring Systems: Nagios/Icinga/Sensu
- Docker and Kubernetes
- Should know how to install and configure middleware
- Should know how to deploy and manage serverless applications
Top-rated courses for developing skills to get hired:
Divided into three parts:
2. SRE Practice areas
Video lengths range from 2 to 6 minutes
Duration: 1h 20m
Skill Level: Advanced
1. How to make systems reliable
2. Understanding SLIs, SLOs and SLAs
3. Quantifying risks to and consequences of SLOs
Google’s SRE Resources