Senior Site Reliability Engineer
Honey is helping millions save money when they shop online, and we're growing! We’re looking for a Senior Site Reliability Engineer to design and implement infrastructure solutions that improve the scalability and efficiency of Honey’s services. The ideal candidate has a background in systems engineering, automation, cloud computing, and DevOps tooling, as well as strong problem-solving abilities. We're still a fairly small team of 65 engineers, so this will be an extremely hands-on and important role for us.
What You'll Do:
- Maintain the core infrastructure
- Manage, monitor, and improve highly scalable, distributed systems to create highly available services
- Collaborate with engineers in the deployment and scaling of new product features
- Investigate production outages, help determine root causes, and implement fixes
- Identify and automate repetitive, manual tasks
- Develop effective tooling, alerts, and responses to both identify and address reliability risks
- Debug software at the code and infrastructure level
- Plan for the growth of Honey’s infrastructure and help define best practices
- Participate in an on-call rotation
- Provide technical leadership and mentor junior team members
What You'll Need:
- Experience with git
- Production experience with a major public cloud provider (we use GCP, but experience with AWS or Azure is also fine)
- Experience with Docker and Kubernetes
- Comfort with databases and in-memory key/value stores
- Experience with monitoring and continuous integration and delivery:
  - Monitoring: Nagios, Stackdriver, Graphite, or similar tools
  - CI/CD: Jenkins, CircleCI, Travis CI, or similar tools
- Experience with business continuity and capacity planning best practices
- Infrastructure automation
- Solid knowledge of Linux/UNIX and networking fundamentals
- Passionate about open-source software and the latest system architecture trends
- Curious, and able to communicate effectively
Bonus points for:
- Experience with Node.js and NPM
- Previous experience with GCP
- Experience with service discovery or service meshes