Senior SRE Engineer
Fortanix
This job is no longer accepting applications
As a Senior Site Reliability Engineer at Fortanix, you will be at the forefront of ensuring the reliability, scalability, and performance of our cutting-edge production environments. You’ll design and build operations as code, architecting automated solutions that enhance system stability. Partnering closely with our product engineering teams, you'll have a hands-on role in continuously improving the reliability of our platforms, ensuring our systems are robust and resilient. You'll develop and implement a comprehensive, actionable monitoring framework that detects and prevents issues before they impact our users.
In this role, you'll be a critical part of our production on-call rotation, responding to incidents with agility and executing post-incident reviews to drive continuous improvement. If you’re passionate about automation, enjoy tackling complex reliability challenges, and thrive in a fast-paced, high-impact environment, this role is for you!
Join us to shape the future of secure computing with a focus on building reliable, scalable, and secure production systems.
Key Responsibilities
- System Architecture & Design
  - Collaborate with software development teams to design scalable, reliable, and secure systems.
  - Architect and build robust infrastructure to handle growth and ensure system uptime.
- Automation & Infrastructure as Code (IaC)
  - Automate infrastructure deployment and management using tools like Terraform, Ansible, or CloudFormation.
  - Implement continuous integration and continuous deployment (CI/CD) pipelines for automated testing and deployment.
  - Write automation scripts and code for scaling and self-healing systems.
- Monitoring & Incident Management
  - Design and implement comprehensive monitoring and alerting solutions to detect anomalies and issues before they impact users.
  - Implement logging and observability tools to gain insight into system health and performance (e.g., Prometheus, Grafana, ELK stack).
  - Manage on-call rotations, ensure timely responses to incidents, and perform root cause analysis and post-mortems.
- Performance Tuning & Optimization
  - Perform load testing and system benchmarking to identify performance bottlenecks.
  - Optimize application and infrastructure performance, reducing latency and improving response times.
- Security & Compliance
  - Ensure systems are secure by design, incorporating security best practices (e.g., encryption, firewalls, access controls).
  - Stay up-to-date with security vulnerabilities and patch systems accordingly.
  - Implement compliance standards (e.g., GDPR, HIPAA) where applicable.
- Collaboration & Mentoring
  - Work closely with developers to ensure that applications are designed for reliability and scalability.
  - Serve as a mentor to junior engineers, fostering a culture of reliability and best practices.
  - Collaborate across teams (DevOps, Development, QA) to enhance system robustness.
- Disaster Recovery & High Availability
  - Develop and maintain disaster recovery and business continuity plans.
  - Ensure systems are highly available, designing systems that can withstand failures without service disruptions.
- Capacity Planning & Scalability
  - Forecast future system demand and plan for capacity increases as needed.
  - Design infrastructure that scales automatically to handle increased loads.
- Continuous Improvement & Reliability Culture
  - Analyze incidents and failures to identify opportunities for improving system reliability.
  - Drive a culture of reliability across the engineering organization, advocating for best practices and SRE principles.
- Cloud & Hybrid Infrastructure Management
  - Manage cloud infrastructure (AWS, GCP, Azure) and hybrid environments, ensuring optimal usage of cloud resources.
  - Implement cost optimization strategies for cloud resources while maintaining performance and reliability.
This role requires a deep understanding of both software engineering and infrastructure management, as well as strong collaboration and problem-solving skills.