Manager, Site Reliability Engineering
KEYPR is redefining the guest experience at hotels worldwide. Our mission is nothing less than a whole new way to travel, taking advantage of new mobile technologies to deliver a superior hotel stay. We put guests in charge of what, when and how they want their hotel to meet their needs.
We are building the next-generation hardware and software platform for the hospitality industry and are looking for brilliant and versatile engineers who know the technology landscape and can craft simple and creative solutions.
We are currently looking for a strong, hands-on Manager, Site Reliability Engineering to help implement automated processes that result in a highly consistent, highly available, performant server stack. This position requires strong knowledge of Linux systems administration, modern build, test, and release processes (continuous integration, continuous delivery), flexible and highly scalable architectures, enterprise network administration, and cloud-based deployments.
Responsibilities
Manage a distributed DevOps team
Manage all aspects of the end-to-end release process
Create, communicate and enforce release and deployment policies and plans
Work with QA manager and Director of Engineering to ensure the integrity of the software released
Contribute to the architecture and implementation of our fully automated release pipeline
Ensure that all production systems are well monitored, instrumented, and analyzed to maintain performance and availability
Manage all aspects of the development, testing and production environments
Provide the executive team with reports on release progress and infrastructure cost based on key performance indicators (KPIs)
Optimize performance from the cloud architecture and deployment perspective
Optimize performance of web server technologies from a system perspective
Develop and manage internal automated health checks and auto-remediation systems
Job Requirements
Practical knowledge of system programming in one of the following: Python, Go, Ruby, Java, or JavaScript
Expert knowledge of cloud deployments on AWS and accompanying technologies (load balancers, cache stores, VPC, etc.)
Experience administering data infrastructure such as PostgreSQL, Elasticsearch and MongoDB
Experience debugging distributed systems
Knowledge of, and practical experience with, deploying and orchestrating containers using Kubernetes or Docker
Strong experience with build and deployment management, in particular with continuous integration systems such as Jenkins
Experience with configuration management tools such as SaltStack and Terraform
Experience releasing client/server products into 24x7 availability production environments
Experience with around-the-clock support
Minimum of 2 years of site reliability management experience
Excellent oral and written communication skills
Willingness to take part in 24/7 on-call rotations
Other Desired Skills
Experience with other cloud platforms such as Google Cloud Platform
Experience deploying web applications utilizing WSGI, Django, or Traefik
Direct experience installing, configuring, and tuning systems such as AWS Kinesis and SNS
Knowledge of networking, including protocols, router and firewall configuration, network topology models (VLANs, DMZs), wireless networking best practices, and network security
KEYPR offers a competitive salary commensurate with experience as well as comprehensive benefits plans for its employees and their qualified dependents. The Company currently pays 90% of the premium cost for employees’ and their dependents’ medical and dental coverage. Employees have the option to purchase vision, additional life, disability and AD&D coverage. Employees may also utilize Flexible Spending Accounts (Dependent Day Care and Medical) and may participate in the Company’s 401(k) plan.