Software Engineer, Site Reliability (Remote) at Sensible Weather
Greater LA Area | Remote
Sorry, this job was removed at 3:07 p.m. (PST) on Monday, February 27, 2023
Who we are
Sensible is built to help consumers and businesses understand, plan for, and mitigate all types of climate and weather risk. We work at the intersection of deep technology, science and experience design. Our first product embeds with travel and outdoor events partners, offering their customers a guarantee against bad weather. This means a customer can have confidence that they will have a great time in the sun. If not, they get their money back!
We recognize that we're living in a world with more climate disruption than ever before. We also believe that it is one of unprecedented opportunity for solutions.
With rich data from satellites and other developing technologies, we have the right information, engineering, and technology to help us relate to our environment with a new kind of awareness and understanding.
Sensible is a team built on trust, feedback, and communication. We recognize that diversity of background, skills, and experiences makes stronger teams. We are, therefore, an equal opportunity employer.
What you'll be working on
- Coordinate with engineering and product leaders to maintain a working roadmap of projects that improve business systems reliability and developer experience
- Document and maintain SRE best practices
- Maintain existing cloud-based infrastructure, including AWS resources and Kubernetes clusters
- Maintain and improve monitoring, logging, and instrumentation/tracing systems
- Implement and improve observability, alerting, on-call systems and procedures
- Improve and implement CI/CD practices and pipelines for deploying containerized apps
- Improve and implement monitoring for basic cloud security concerns including AWS/Kubernetes access management, endpoint security, and obfuscation of sensitive information
- A bachelor's degree in a STEM-related field, or equivalent industry experience
- Commitment to the spirit of continuous improvement
- Flexibility around working hours in order to maintain high systems availability
Experience and comfort working with the following technologies or their equivalent:
- AWS: IAM, VPC, EC2, Routing/Security, EKS, S3, ALB/NLB, RDS/Aurora
- Kubernetes: Cluster management, deployments/services/pods, autoscaling, metrics, ingress, certificate management
- Observability: SLOs/SLAs, SLIs/KPIs, metrics
- CI/CD: GitHub Actions or another common CI system like CircleCI, Travis CI, AWS CodePipeline, etc.
- Programming: an imperative language like Python, Node.js, Go, Java, and/or Rust
- Tooling: Terraform, Docker, AWS CloudFormation, Git
- Experience with developing custom event-based pipelines for CI/CD and/or systems automation/management
- Experience with creating custom SlackOps integrations for systems notifications and administration
- Demonstrated ability to create basic internal tool webapps to facilitate things like configuration management, deployments, security, and/or monitoring systems
- Experience maintaining system reliability in high-traffic environments (10,000+ requests/minute)
- Experience being on, maintaining, and shepherding on-call rotations