Senior Site Reliability Engineer
ZEFR is hiring! We are seeking a Senior Site Reliability Engineer - Software Engineer to join our team in Venice, CA! Our SRE-SWEs solve operational problems with a software engineering mindset. We multiply Engineering's productivity and improve the production readiness of our services through coding prowess, knowledge of distributed systems, software architecture, and SRE principles. We are tightly integrated into the software development lifecycle, from inception and design through deployment, operation, and refinement.
SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
Here's what you'll get to do:
- Help to build a culture of solving Operations problems with Software Engineering discipline
- Propose and review Engineering Requests for Comments (RFCs) that drive engineering practices and software architecture at ZEFR
- Mature our container orchestration suite, e.g., Docker, service discovery, configuration management
- Improve build, continuous integration, testing, and deployment pipelines
- Utilize production experience with Amazon Web Services or Google Cloud
- Troubleshoot quickly and effectively, using root cause analysis
- Maintain the health of production environments proactively
- Create an effective monitoring plan for products
- Respond to system performance issues and outages
- Participate in change management and deployment plan creation and review
- Participate in an on-call rotation
Here's what we're looking for:
- Bachelor's degree in Computer Science or a related field, or equivalent work experience
- 5+ years building distributed systems as a software or systems engineer
- Mastery of one or more programming languages; Python, Go, Java, or Scala would be a plus
- *Must* have strong Linux skills in most major environments
- Experience with container orchestration: Docker Swarm, ECS, Kubernetes, etc.
- Familiarity with message buses (Kafka, Kinesis, etc.)
- Desire and ability to work independently and take ownership of complex tasks
- Strong written and oral communication, organization, and documentation skills