Senior Site Reliability Engineer / DevOps Engineer (REMOTE - Bangalore, India or Palo Alto, CA)
We are Skyflow, a Silicon Valley startup that has built the world’s first data privacy vault delivered as an API. Our mission is to transform how businesses handle and protect their users’ financial, healthcare, and personal information: the data that powers our digital economy. Inspired by the zero-trust data vaults that Apple and Netflix built to handle customer data, we've built a cloud-based vault that is available through a simple and elegant API. With Skyflow, developers can easily build best-of-breed data privacy, security, and compliance directly into their applications, the same way they use Stripe, Twilio, or Okta.
Skyflow is based in Palo Alto, California, with offices in Bangalore, India, and team members working from locations all around the world. Our team includes former executives and leaders from companies such as Salesforce, Google, Twilio, and Oracle. Come join us!
About the role:
As a Senior Site Reliability Engineer / DevOps Engineer, you will have end-to-end accountability for the reliability of IT services within Skyflow’s application portfolio. The role calls for a “build-to-manage,” problem-solving, and innovative mindset, grounded in deep engineering expertise and applied across the design, build, testing, deployment, change, and maintenance of services. Key measures of success will include service stability, effective delivery and environment instrumentation, deployment quality, technical debt reduction, asset resiliency, risk/security compliance, and cost efficiency, as well as proactive and preventive maintenance mechanisms.
We know great Site Reliability Engineers and DevOps Engineers come from diverse backgrounds, so no single candidate is likely to have every desired skill on day one. But if you are the kind of software engineer who would have loved to engineer infrastructure solutions for the Stripe or Twilio APIs, the Slack or Zendesk apps, or the Snowflake or MongoDB platforms, we want to talk to you.
Desired qualifications:
- 3+ years of experience in a Site Reliability Engineering or DevOps Engineering role at a web-scale company
- Experience creating and editing scripts with Python or Golang
- Hands-on experience with container and orchestration technologies (Docker, ArgoCD, Helm, Borg, etc.) and microservice architectures
- Experience with monitoring and observability tools such as Splunk, Datadog, New Relic, AppDynamics, and Elasticsearch
- Experience implementing AWS/GCP/Azure services in a variety of distributed computing environments
- Proven ability to debug and troubleshoot performance issues across the stack
- Experience working with development teams in a Scrum environment
Responsibilities:
- Participate in the overall design and implementation of secure, scalable, and fault-tolerant infrastructure
- Design and implement observability tools used to optimize systems for uptime, performance, and reliability, and provide visibility to internal teams
- Automate infrastructure provisioning, demand forecasting, and capacity planning
- Refine and expand incident response best practices, ensuring that engineers, including yourself, are able to respond efficiently when incidents occur
- Propose initial technical implementations that support architectural changes to solve scaling and performance problems
Benefits:
- Excellent Health, Dental, and Vision Insurance Options (Vary by Country)
- Vanguard 401(k)
- Very generous PTO
- Flexible Hours
- Generous Equity
At Skyflow, we believe that diverse teams are the strongest teams. We invite applicants of all genders, races, ethnicities, nationalities, ages, religions, sexual orientations, disability statuses, educational experiences, family situations, and socio-economic backgrounds.