Site Reliability Engineer

Location: Bangalore
Company: Cigna
Site Reliability Engineer IM D&A
Job Role: Site Reliability Engineer.
Location: Bangalore, India
Job Objective:
At Cigna, we're passionate about building data-intelligent analytical software products that solve business problems. We count on our site reliability engineers (SREs) to give our users and stakeholders a rich feature set, high availability, and stellar performance so they can pursue their missions. As we expand our analytics use case deployments, we are seeking an experienced SRE to help deliver insights from massive-scale data in real time. Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.
Job Responsibilities
Building Software to Help Operations and Support Teams
An SRE is in charge of proactively building and implementing services that help IT, analytics, data engineering, and support teams do their jobs better. This can be anything from adjustments to monitoring and alerting to code changes in production. A site reliability engineer can also be tasked with building a homegrown tool from scratch to address weaknesses in software delivery or incident management.
Automation of IT Operations
Handling IT operations involves performing the same functions day in, day out. Instead of performing these functions manually, an SRE automates them. As part of their core responsibilities, an SRE should build tools that aid automation in managing IT operations and support. In particular, the Site Reliability Engineer should enable automation for the following key functions:
- Continuous Integration and Delivery (CI/CD) across SDLC phases
- Monitoring
- Alerts
- Incident Response
- Infrastructure Component Provisioning
- Patching
Fixing Support Escalation Issues
As with the point above, a site reliability engineer can expect to spend time fixing support escalation cases. But as your SRE operations mature, your systems will become more reliable and you'll see fewer critical incidents in production, leading to fewer support escalations. Because an SRE team touches so many different parts of the engineering and IT organization, it can be a great source of knowledge and can help route issues to the right people and teams.
Specifying Service Level Indicators and Objectives
Service Level Objectives (SLOs) are defined target levels of service uptime or availability, measured through Service Level Indicators (SLIs), the metrics used to gauge performance. An SRE is responsible for identifying and creating these indicators and for tracking performance against them. This involves analyzing historical data and setting realistic objectives that meet Service Level Agreements (SLAs).
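As an illustration only, the relationship between an indicator and an objective can be sketched in a few lines of Python. This is a minimal sketch under assumptions not stated in this posting: the indicator is request-based availability (good requests divided by total requests), and the names `SloReport` and `availability_report` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured availability, between 0 and 1
    slo_target: float        # objective, between 0 and 1 (exclusive)
    budget_remaining: float  # fraction of the error budget left; negative if overspent


def availability_report(good_requests: int, total_requests: int,
                        slo_target: float) -> SloReport:
    """Compare a request-based availability SLI against an SLO target."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= good_requests <= total_requests:
        raise ValueError("good_requests must be between 0 and total_requests")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_requests / total_requests
    # The error budget is the number of failures the objective tolerates.
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    budget_remaining = 1.0 - actual_failures / allowed_failures
    return SloReport(sli=sli, slo_target=slo_target,
                     budget_remaining=budget_remaining)


if __name__ == "__main__":
    report = availability_report(good_requests=999_500,
                                 total_requests=1_000_000,
                                 slo_target=0.999)
    print(f"SLI: {report.sli:.4%}, budget left: {report.budget_remaining:.1%}")
```

Tracking the remaining error budget, rather than only the raw indicator, is one common way to decide whether an objective is realistic for a given service.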
Incident Management and Disaster Recovery
Possibly one of the most crucial responsibilities of an SRE is to collaborate on high-priority incident tickets and ensure system recovery within the SLA. When an outage occurs, the first step to recovery is to use monitoring systems to diagnose the root cause. Armed with this information, the SRE is expected to manage the incident properly and bring the system back online.
Optimizing On-Call Rotations and Processes
More often than not, site reliability engineers will need to take on on-call responsibilities. At most organizations, the SRE role has a lot of say in how the team can improve system reliability by optimizing on-call processes. SRE teams help add automation and context to alerts, leading to better real-time collaborative response from on-call responders. Additionally, site reliability engineers can update runbooks, tools and documentation to help prepare on-call teams for future incidents.
Documenting "Tribal" Knowledge
An SRE gains exposure to systems in both staging and production, as well as to all technical teams. The SRE takes part in work with software development, support, IT operations and on-call duties, meaning he/she builds up a great amount of historical knowledge over time. Instead of siloing this knowledge in the mind of one team or one person, the site reliability engineer documents much of what he/she knows. Constant upkeep of documentation and runbooks ensures that teams get the information they need right when they need it.
The SRE is also expected to keep records of system outages. These records provide critical insights into long-term trends and help the organization set reasonable Service Level Agreements. Moreover, keeping records of incidents, especially low-priority ones, is particularly useful for identifying and resolving elusive bugs within the system.
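To show how outage records can feed trend analysis, here is a minimal Python sketch. It assumes, hypothetically, that each record is a (start, end) pair of datetimes, and it attributes each outage to the month in which it started. The function name `downtime_minutes_by_month` is illustrative and not part of any tool named in this posting.

```python
from collections import defaultdict
from datetime import datetime


def downtime_minutes_by_month(outages):
    """Sum outage durations per calendar month.

    outages: iterable of (start, end) datetime pairs.
    Returns a dict mapping "YYYY-MM" to total downtime in minutes.
    """
    totals = defaultdict(float)
    for start, end in outages:
        if end < start:
            raise ValueError("outage ends before it starts")
        # Simplification: the whole outage counts toward its starting month.
        month = start.strftime("%Y-%m")
        totals[month] += (end - start).total_seconds() / 60.0
    return dict(totals)


if __name__ == "__main__":
    records = [
        (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),
        (datetime(2024, 1, 20, 2, 0), datetime(2024, 1, 20, 2, 15)),
        (datetime(2024, 2, 1, 0, 0), datetime(2024, 2, 1, 1, 0)),
    ]
    for month, minutes in sorted(downtime_minutes_by_month(records).items()):
        print(month, minutes)
```

Monthly totals like these give a baseline of historical downtime against which a proposed Service Level Agreement can be checked.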
Conducting Post-Incident Reviews
Without thorough post-incident reviews, you have no way to identify what's working and what's not. An SRE needs to keep teams honest and ensure that everyone, software developers and IT professionals alike, is conducting post-incident reviews, documenting their findings and taking action on what they learn. The site reliability engineer is then often tasked with action items for building or optimizing some part of the SDLC or incident lifecycle to bolster the reliability of the service.
On-Call Support and Issue Resolution
Overlapping with the responsibilities above, the SRE has to be on standby to work with developers when issues arise and get escalated. The SRE must interact with developers to provide consultation and troubleshooting when alerts are raised.
When a developer escalates an issue, the Site Reliability Engineer should investigate, diagnose the problem, and resolve it. The SRE may also bring in other engineers if required. In addition, the SRE ensures that high-priority tickets are resolved quickly enough to meet the Service Level Agreement.
Key Tasks
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large distributed software applications
- Gather and analyse metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives
Required Skills and Qualifications
- Bachelor's degree in computer science or other highly technical, scientific discipline
- Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
- Minimum of 5-7 years of relevant industry experience
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Experience working in cloud environments and with relevant tools, e.g. AWS
Preferred Qualifications
- Previous success in technical engineering
- Coding experience beyond simple scripts
About Cigna
Cigna Corporation exists to improve lives. We are a global health service company dedicated to improving the health, well-being and peace of mind of those we serve. Together, with colleagues around the world, we aspire to transform health services, making them more affordable and accessible to millions. Through our unmatched expertise, bold action, fresh ideas and an unwavering commitment to patient-centered care, we are a force of health services innovation. When you work with us, or one of our subsidiaries, you'll enjoy meaningful career experiences that enrich people's lives. What difference will you make?