- Full Time
About the job
AI is rapidly changing the world. From processing job applications and credit decisions to making content recommendations and helping researchers analyze genetic markers at scale, many aspects of our daily lives are touched by machine learning systems in some way.
Arize is the leading machine learning observability platform to help ML teams discover issues, diagnose problems, and improve the results of machine learning models. In short: we are here to build world-class software that helps make AI work better.
Our On-Prem engineering team is responsible for the deployment of Arize in customer environments. In addition to working with customers to define infrastructure requirements, the team designs and develops software and tooling that enable the management of these systems at large scale. The On-Prem team has built expertise in Kubernetes and cloud deployment on GCP, Azure, and AWS, as well as in the networking and security aspects of on-premise deployments. The team is dynamic and relies on a few talented individuals with a high degree of autonomy and initiative.
What You’ll Do
- Work hands-on with the infrastructure that supports our distributed & highly scalable services in both SaaS and on-prem offerings
- Gather requirements from customers and adapt manifests and software to support new environments
- Use and augment monitoring tools to observe platform health and ensure performance and reliability
- Interact with the product team to test new features and package new on-prem releases
- Automate and optimize the release pipeline to make it as frictionless as possible
- Exhibit continuous curiosity for emerging technology that could solve our challenges
What We’re Looking For
- 1-2+ years of experience in site reliability engineering, DevOps, and system administration
- CS (preferred) or other technical degree, or equivalent practical experience
- Experience working with DevOps tools such as Kubernetes, Terraform, Ansible, Puppet and Chef
- Proficiency with scripting languages such as Python and bash
- Experience managing cloud infrastructure in AWS, GCP, and/or Azure
- Expertise in Linux administration, configuration, and networking protocols
Bonus Points, But Not Required
- Experience with on-prem deployment architectures
- Experience running a 24x7 SaaS platform with defined SLIs, SLOs, and SLAs
- Familiarity with operating machine learning & AI applications
Technologies You’ll Work With
- Messaging systems
- Go, Java, Python
- AWS, GCP
The estimated annual salary for this role is between $100,000 and $185,000, plus a competitive equity package. Actual compensation is determined based on a variety of job-related factors that may include transferable work experience, skill sets, and qualifications. Total compensation also includes a comprehensive benefits package, including medical, dental, vision, a 401(k) plan, unlimited paid time off, a generous parental leave plan, and other benefits for mental health and wellness support.
More About Arize
Arize’s mission is to make the world’s AI work and work for the people. Our founders came together through a common frustration: investments in AI are growing rapidly across businesses and organizations of all types, yet it is incredibly difficult to understand why a machine learning model behaves the way it does after it is deployed into the real world.
Learn more about Arize in an interview with our founders: https://www.forbes.com/sites/frederickdaso/2020/09/01/arize-ai-helps-us-understand-how-ai-works/#322488d7753c
Diversity & Inclusion @ Arize
Our company's mission is to make AI work and make AI work for the people. We hope to make an industry-wide impact on bias, and that's a big motivator for people who work here. We actively hope that individuals contribute to a good culture:
- We regularly have chats with industry experts, researchers, and ethicists across the ecosystem to advance the use of responsible AI
- We hold culturally conscious events such as LGBTQ trivia during Pride Month
- We have an active Lady Arizers subgroup