Description

In this role, you will be responsible for the reliability of multiple Fortanix production environments. You will design operations as code, work continuously with product engineering to improve reliability, implement an actionable monitoring framework, and take part in the production on-call rotation.

Key Responsibilities

  • Improve production reliability of multiple Fortanix products via automation.
  • Participate in production upgrades, migrations, disaster recovery drills, backup/restore, cloud environment security, logging, and log analytics.
  • Work with DevOps, Networking, Customer Success, and Development to continuously improve the production environment.
  • Manage the service status and incident portal.
  • Participate in on-call incident response for critical issues.
  • Respond to and communicate with impacted customers, providing root-cause analysis and an action plan.
  • Design tests to simulate scenarios/events before they occur.
  • Manage IAM for production systems.

Requirements

Technical Experience

Experience with modern enterprise site reliability engineering, along with experience in the following areas:

  • Automation experience with tools such as Python, Ansible, Terraform, and CloudFormation.
  • Advanced experience with Linux administration and automation.
  • Experience with production debugging and the ability to implement fast workarounds.
  • Advanced experience managing software deployments to cloud and datacentre environments via pipelines (for example, Bitbucket or GitLab).
  • Understanding of DevOps practices for how modern software is deployed, upgraded, and monitored.
  • Experience with both managed (AKS, EKS, GKE) and unmanaged (on-prem) Kubernetes, especially production experience with Kubernetes and Docker.
  • Experience with high-level network infrastructure for datacentre and cloud.

Key Requirements

  • Bachelor's/Master's degree in Computer Science, Engineering, or a related field.
  • 8+ years of engineering experience, with 3+ years of core site reliability engineering experience.
  • Solid understanding of cloud technologies.
  • Demonstrated ability to coordinate cross-functional work teams toward completion.
  • Demonstrated multitasking, effective leadership, and analytical skills.
  • Must be a team player.

Benefits

  • Compensation at the top of the market range
  • A friendly culture that brings the best out of everybody
  • Mediclaim Insurance for employees and their eligible dependents, including dental coverage
  • Personal Accident Insurance
  • Internet Reimbursement