The Site Reliability Engineer will build, maintain, and evolve mission-critical production systems. This includes supervising initial release deployments, rollouts, and rollbacks, and guiding the release process and system checks.
Responsibilities:
- Primary operational, engineering, and support responsibility for multiple distributed applications
- Improving all aspects of application reliability, including better monitoring, alerting and documentation
- Engaging with our software engineering teams on support issues and improvements to our tools and processes
- Being a bridge between infrastructure and development teams
- Educating teams on, and implementing, DevOps best practices and policies
Qualifications:
- Master's degree in computer science or an equivalent discipline, or relevant work experience
- Strong knowledge of Linux, systems, and cloud computing
- At least 2 to 3 years of relevant experience
- Ability to interact with both software developers and customer operations staff
- Experience using GCP and/or Azure
- Ability to program in Python or Golang
- In-depth knowledge of and experience with at least one of Ansible, Puppet, CloudFormation, Terraform, Docker, or Kubernetes, and a desire to learn more
- Ability to debug systems in real time
- Hands-on experience with CI/CD toolsets such as Jenkins, GitLab, or Artifactory is a bonus