Senior Site Reliability Engineer (GoGuardian App) at GoGuardian
At GoGuardian, we're helping shape the future of digital learning by providing educators, students, and schools with tools to create engaging and equitable learning environments. Together, we build innovative solutions to empower students, deliver insights, and encourage experimentation. With employees around the globe, we're committed to building a culture of inclusivity, curiosity, and courage. GoGuardian's growth is fast and ever-evolving, and our teams are growing along with it - always ready to experiment and learn.
We're here for the cause, but also for the culture. We celebrate our successes and wins together, and make time to appreciate our teammates every day. Take a peek into your future at GoGuardian: our Slack channels include #gardenclub, #boardgametime, #bookrecs, and #petphotos. There's always something fun going on, including concerts, classes, book clubs, and more! From virtual trivia to local meet-ups, Guardians are always finding ways to connect.
The Senior Site Reliability Engineer is a critical role at GoGuardian as we expand our services across the globe and onto more device platforms (Windows, iOS, macOS, Android), impacting millions of students and educators every day. The GoGuardian App allows schools to administer access/deny policies and alerts for student learning devices. You'll join a small team of motivated and empowered engineers, where you will work closely together to drive the adoption of modern reliability practices like SLOs, error budget policies, actionable alerts, incident retrospectives, chaos testing, and end-to-end ownership.
What You'll Do
- Work with teams across engineering to support services and capabilities through activities like system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
- Implement the infrastructure changes needed to maintain and improve site reliability.
- Build frameworks that test the performance and resiliency of our platform services/tools
- Read, understand, and review application code to support software development efforts from a reliability/infrastructure perspective
- Develop robust monitoring and observability services and patterns to consistently improve the team's ability to identify, respond to, and recover from complex failures.
- Monitor the health of production infrastructure and investigate/analyze any issues and abnormalities to identify problems or bottlenecks.
- Communicate uptime and quality-of-service issues effectively.
- Identify and measure SLOs, SLAs, and SLIs.
- Perform demand forecasting and capacity planning for continued and/or improved site reliability.
- Participate in on-call rotations and incident response during off-hours.
- Plan, track and perform routine system maintenance and software updates to infrastructure
- Track and document reliability-related issues and incidents
- Map business goals to architectural/infrastructure decisions
Who You Are
- 5+ years of experience as an operations engineer, DevOps engineer, or SRE supporting SaaS applications in large-scale cloud environments
- Proficiency with the following technologies and practices: AWS or GCP, ECS or Kubernetes, Docker, Terraform
- On-the-job exposure to or experience with the following: Golang, Datadog, MongoDB, Redis, MemSQL, Postgres, Athena
- Strong background in network management (firewalls, proxies, IP management, routing, DNS)
- Experience supporting and troubleshooting MDM solutions such as Jamf for macOS and Chrome OS platforms, as well as for iOS, Android, and Windows mobile devices
- Familiar with common design and architectural patterns for building microservices
- Writes production-grade code for well-scoped features; integrates feedback from code reviewers
- Experience with authentication protocols such as SAML, OAuth, or OpenID Connect
- Confident in making technical decisions and explaining the reasoning behind them
- Comfortable developing solid technical solutions to ambiguous or open-ended problems
- Driven to teach, lead, and help others in your areas of greatest skill and experience
- Has software development experience and/or understanding of programming languages, data structures, and algorithms
What We Offer
- A varied and challenging role in a multinational and highly innovative company
- A robust benefits package including health insurance, 401(k) retirement savings plan with company match, employee stock option plan, paid parental leave, 13 paid company holidays, and much more
- Development and further training opportunities for shaping and realizing your career goals
- Exceptional colleagues with a passion for EdTech
P.S. - Share this with your friends or co-workers who may be interested in working at GoGuardian! We have multiple openings and are always looking for amazing people.
GoGuardian is an equal opportunity employer and makes employment decisions on the basis of merit and business needs. GoGuardian does not discriminate against employees, applicants, interns or volunteers on the basis of race, religion, color, national origin, ancestry, physical disability, mental disability, medical condition, pregnancy, marital status, sex, age, sexual orientation, military and veteran status, registered domestic partner status, genetic information, gender, gender identity, gender expression, or any other characteristic protected by applicable law.