Sr. Site Reliability Engineer
OpenX, a leading provider of digital and mobile advertising technology, seeks a Site Reliability Engineer responsible for the performance and uptime of various OpenX systems and services. OpenX serves hundreds of thousands of requests per second from thousands of servers across a worldwide data center footprint. Experience at large scale is desirable, though we are willing to train people with the right skills and attitude. You will be responsible for maintaining and improving service uptime and scaling our systems for continued rapid growth.
Additionally, this role will handle planned and unplanned maintenance events and execute a consistent software release process. The ideal candidate has experience managing both physical on-premises servers at scale and on-demand, burstable virtualized environments hosted on public cloud providers. Excellent communication skills are required to interact successfully with the rest of OpenX Engineering. Developing and supporting our infrastructure presents many interesting technical challenges. We especially desire candidates with a passion for open-source software and an interest in the latest system architecture trends, such as Docker and Kubernetes.
- Design, implement, and support highly performant, highly available infrastructure with both on-premises hardware and public clouds such as AWS, GCP, or equivalent providers
- Improve the efficiency and flexibility of our data centers
- Build and maintain models for growth and capacity
- Tune large-scale clusters for optimal performance and efficiency
- Develop technologies for low-latency access to very large data resources
- Participate in on-call rotation, as needed
- Own the day-to-day health, uptime, monitoring, and reliability of all server infrastructure
- Work closely with engineering, project management, and operations peers to develop innovative technical tools and solutions
- Identify tactical issues and react to emerging areas of concern
- Adhere to the DevOps philosophy by evangelizing communication, collaboration, and integration with software development teams
- Think long-term and be unsatisfied with band-aids
- Identify unnecessary complexity and remove it
- At least three years of experience in an SRE/SysAdmin, DevOps, or equivalent role
- At least three years of experience maintaining production infrastructure hosted on AWS, GCP, or an equivalent public cloud provider
- Capability to automate tasks in at least one language (other than Bash), ideally Python, Ruby, or Perl
- Solid knowledge of the UNIX command-line and architecture
- Strong knowledge of core protocols and technologies such as TCP/IP, HTTP, DNS, load balancers, distributed file systems, and key-value and relational databases
- Solid understanding of how to manage public cloud services and tasks, such as load balancing, automation through provider APIs, VPC, serverless computing (Lambda, GCF), backup/restore procedures, and policy and resource management
- Extensive experience with configuration management tools such as Puppet, Chef, Salt, or Ansible is a big plus
- Excellent organizational skills and the ability to work in a fast-paced and hectic work environment
- Capable of technical deep-dives into code, networking, systems, and storage with very bright, experienced engineers
- Demonstrated experience in network and large-scale UNIX system troubleshooting and maintenance practices
- Humility and integrity
- Experience working with developers in programming languages such as Erlang, Java, Go, C/C++, or others is desirable
- Self-starter with the ability to independently identify and act on areas of improvement
- Knowledge of and interest in the latest system architecture trends
- Ability to rapidly learn and understand new systems
- Ability to communicate effectively and write accurate, clear documentation
Our five company values form a solid bedrock serving to define us as a group and guide the company. Our values remind us that how we do things often matters as much as what we do.
We are one
One team. No exceptions. We are a group of strong and diverse individuals unified by a clear common purpose.
Our customers define us
We know our business flourishes or dies because of our customers.
OpenX is mine
We are all owners of OpenX. We stake our personal and professional reputations on the excellence of our work.
We are an open book
We are eager to teach and share what we know with others.
We evolve fast
We take risks and confront failure openly. We recognize and repeat success aggressively. We actively seek out and provide constructive criticism. Defensiveness is for weaklings!