Site Reliability Engineer

  • Dindigul

Key Responsibilities:

  • Maintain and enhance the reliability, availability, and performance of large-scale distributed systems.
  • Automate deployment, monitoring, and management of production systems.
  • Implement and manage CI/CD pipelines for software delivery.
  • Collaborate with software engineers to design, build, and manage scalable and resilient infrastructure.
  • Troubleshoot complex system issues, identify root causes, and implement long-term solutions.
  • Monitor system performance and optimize configurations for better performance and cost efficiency.
  • Implement security best practices and ensure compliance with industry standards.

Required Skills:
  • Proficiency in cloud platforms (AWS, Google Cloud, or Azure) and containerization technologies like Docker and Kubernetes.
  • Strong scripting and automation skills using Python, Bash, or similar languages.
  • Experience with infrastructure as code (IaC) tools such as Terraform or Ansible.
  • Deep understanding of monitoring and logging tools (Prometheus, Grafana, ELK Stack).
  • Knowledge of database management (SQL/NoSQL) and networking fundamentals.
  • Experience with CI/CD tools like Jenkins, GitLab CI, or CircleCI.
  • Strong problem-solving skills and experience in troubleshooting large-scale systems.

Education and Experience:
  • A degree in Computer Science, Engineering, or a related field from a recognized institution.
  • Ideally, 5-10 years of experience in a similar role at a product company.
