Site Reliability Engineer

  • Pune
  • Onit India

Onit, Inc. is looking for a Site Reliability Engineer L2 to join our Core Infrastructure team. This role helps ensure the reliability of a diverse set of applications across our AWS infrastructure. To be successful in this role, you will need to collaborate and pair with team members, have strong technical skills, and bring a passion for technology. The individual we seek is skilled in observability and excellent at troubleshooting and problem-solving. You must be able to multitask in a fast-paced environment and be a self-starter who can work independently.


Responsibilities:

  • Troubleshoot deployment failures and infrastructure issues across our full AWS infrastructure stack (EKS, RDS, etc.).
    This includes dev, test, and production environments.
  • Create and maintain monitors for uptime and performance using Datadog, CloudWatch and other monitoring tools.
  • Find ways to help reduce errors in systems and reduce noise in monitors and alerts
  • Work with others on user stories to improve system health
  • Help create and prioritize work / stories
  • Participate in standups with the US and India teams
  • Help define runbooks and automation to solve production problems
  • Troubleshoot applications from a configuration and logging perspective
  • Assist with responding to and analyzing security events from security tooling
  • Help train others to take on SRE responsibilities
  • Assist with performance optimization by identifying performance bottlenecks and recommending improvements
  • Verify systems are monitored, backed up, and following best practices ... via audits and automation
  • Investigate how to take better advantage of the tools we use for monitoring, security, …


Requirements:

  • Bachelor's degree in computer science or equivalent experience is required.
  • 3+ years of experience with the following:
    • AWS (EC2, EKS, ECS, S3, RDS, CloudWatch, CloudTrail, IAM, AWS CLI, etc.). Experience with containers and EKS is a must.
    • Linux (CentOS, Amazon Linux, Ubuntu, etc.)
    • Git source code management (GitLab, GitHub)
    • Bash shell scripting or other scripting / programming experience
    • SaaS-based web application experience
    • Relational database performance and monitoring (Postgres RDS preferred)
    • Jenkins or similar CI/CD tooling


  • A solid understanding of the components that make up production systems (memory, CPU, disk space, disk I/O, network I/O, etc.) is required.
  • Strong experience with monitoring, alerting, and log aggregation tools:
    Datadog, AWS CloudWatch, PagerDuty, Statuspage.
  • Ability to read and interpret application server logs, outputs, CloudTrail and other critical logging output
  • Excellent troubleshooting skills required.


Nice to Have Skills:

  • Prior application coding and debugging experience (Ruby, Python, etc.)
  • Terraform and/or CloudFormation
  • Experience troubleshooting application integrations
  • Other Technologies:
    Cloudflare, AWS GuardDuty, CrowdStrike

