Summary
Are you a systems-minded engineer who thrives on building resilient infrastructure, driving operational excellence, and enabling teams to move fast with confidence? As a Staff Site Reliability Engineer at Topstep, you'll play a foundational role in shaping how we approach reliability, observability, and infrastructure at scale. You'll be instrumental in building out our SRE practice, defining our incident response culture, closing observability gaps, and optimizing our AWS infrastructure for both performance and cost. This role is ideal for someone who brings both deep technical expertise and a builder's mindset: someone who's excited to establish best practices from the ground up, embed reliability into engineering culture, and create the foundations that let teams ship with speed and confidence. Join us and help define what operational excellence looks like at Topstep.
Key Responsibilities
- Set technical direction for reliability and observability across the entire engineering organization, influencing architectural decisions
- Build and mature our SRE practice by defining SLOs, incident response protocols, and on-call standards
- Own the observability stack using Datadog (primary platform for metrics, APM, logging) and CloudWatch (AWS-native monitoring), instrumenting distributed tracing and closing gaps that currently prevent diagnosis of production issues
- Partner with engineering teams to embed reliability principles early in the design process and improve system resilience
- Lead incident response and blameless post-mortems, turning outages into opportunities for systematic improvement
- Mentor engineers across the organization on reliability practices, operational thinking, and production ownership
- Champion a culture of transparency, continuous improvement, and shared ownership of production systems
Required Qualifications and Key Competencies
- 7+ years of professional experience in SRE, infrastructure, or platform engineering, with demonstrated impact building practices that scaled across multiple teams
- Proven track record of either starting an SRE function from scratch or scaling an existing practice, with measurable improvements to MTTR, MTTD, change failure rate, or availability
- Strong proficiency with Datadog for end-to-end observability (metrics, APM, logs, distributed tracing) and building alerting that catches real issues without causing fatigue
- Deep expertise with AWS infrastructure (EKS, ECS, EC2, and RDS) running production services at scale, and hands-on experience optimizing for both reliability and cost
- Solid foundation in distributed systems, networking, database performance, and debugging complex system failures across service boundaries
- Comfortable reading code, writing automation scripts, and contributing to infrastructure tooling when needed
- Proficiency with infrastructure as code (Terraform) and GitOps practices
- Track record of influencing engineering culture through documentation, tooling, mentorship, and technical leadership
- Excellent communication skills with the ability to explain complex system behavior and trade-offs to varied audiences
- Comfortable making pragmatic trade-offs between long-term platform vision and immediate business needs
Company Culture & Perks
- Topstep offers an engaging work environment, with arrangements ranging from fully remote to hybrid. We foster a culture of collaboration with cameras on during meetings and a robust Slack environment for communication.
- 10 company-paid holidays and generous family leave. Paid time off is accrued monthly.
- Competitive 401(k) matching and health, dental, and vision insurance are offered to full-time employees.
- Vacations are encouraged with a bonus for taking 5 consecutive days. Employee referrals earn a bonus. Topstep offers a food and groceries budget and contributes towards health and wellness.
New Hire Base Salary Range
- $200,000-$250,000
- Bonus: This position is eligible for a performance-based bonus as provided by the plan terms and governing documents.
- The compensation offered will take into account internal compensation structure and may vary depending on the candidate's geographic region, job-related knowledge, skills, and experience, among other factors.
Equal Opportunity Employer
Topstep is an Equal Opportunity Employer. We are committed to fostering an inclusive environment where all employees and applicants are valued. All qualified candidates will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, age, disability, or veteran status, in compliance with applicable federal, state, and local laws.
Interested in the role? Apply today with your resume and cover letter!
At this time, immigration sponsorship is not available for this position (including H-1B, STEM OPT training plans, etc.).



