Software Engineer, Site Reliability (Remote)

Sensible Weather | Remote

Sorry, this job was removed at 3:07 p.m. (PST) on Monday, February 27, 2023

Who we are

Sensible is built to help consumers and businesses understand, plan for, and mitigate all types of climate and weather risk. We work at the intersection of deep technology, science and experience design. Our first product embeds with travel and outdoor events partners, offering their customers a guarantee against bad weather. This means a customer can have confidence that they will have a great time in the sun. If not, they get their money back!

We recognize that we're living in a world with more climate disruption than ever before. We also believe it is a world of unprecedented opportunity for solutions.

With rich data from satellites and other developing technologies, we have the right information, engineering, and technology to help us relate to our environment with a new kind of awareness and understanding.

Sensible is a team built on trust, feedback, and communication. We recognize that diversity of background, skills, and experiences makes stronger teams. We are, therefore, an equal opportunity employer.

What you'll be working on

Coordinate with engineering and product leaders to maintain a working roadmap for business systems reliability and developer experience improvements and projects
Document and maintain SRE best practices
Maintain existing cloud-based infrastructure, including AWS resources and Kubernetes clusters
Maintain and improve monitoring, logging, and instrumentation/tracing systems
Implement and improve observability, alerting, on-call systems and procedures
Improve and implement CI/CD practices and pipelines for deploying containerized apps
Improve and implement monitoring for basic cloud security concerns including AWS/Kubernetes access management, endpoint security, and obfuscation of sensitive information

Required Qualifications

A bachelor's degree in a STEM-related field, or equivalent industry experience
Commitment to the spirit of continuous improvement
Flexibility around working hours in order to maintain high systems availability

Experience and comfort working with the following technologies or their equivalent:

AWS: IAM, VPC, EC2, Routing/Security, EKS, S3, ALB/NLB, RDS/Aurora
Kubernetes: Cluster management, deployments/services/pods, autoscaling, metrics, ingress, certificate management
Observability: SLOs/SLAs, SLIs/KPIs, metrics
CI/CD: GitHub Actions or another common CI system such as CircleCI, Travis CI, AWS CodePipeline, etc.
Programming: an imperative language like Python, JavaScript (Node.js), Go, Java, and/or Rust
Tooling: Terraform, Docker, AWS CloudFormation, Git

Desired Qualifications

Experience with developing custom event-based pipelines for CI/CD and/or systems automation/management
Experience with creating custom SlackOps integrations for systems notifications and administration
Demonstrated ability to create basic internal tool webapps to facilitate things like configuration management, deployments, security, and/or monitoring systems
Experience maintaining system reliability in high-traffic environments (10,000+ requests/minute)
Experience participating in, maintaining, and shepherding on-call rotations
