Site Reliability Engineer (Remote)

Blue Pisces Consulting Inc

Sorry, this job was removed at 11:57 a.m. (PST) on Monday, October 25, 2021

We have an exciting opportunity with a premier client of ours. Our client is a well-established, well-known company in the Financial Services industry. They are growing rapidly, offer a comprehensive and competitive compensation package, and have a great company culture. This is a full-time, W2 position with our client.

If you're an SRE with strong DevOps or development experience, this could be an incredibly exciting opportunity for you. Check out the full job description below:

About this Position:

As a Site Reliability Engineer, you will work directly with both development and operations teams.

For development, the SRE will provide and implement automated tooling for monitoring, visibility, and troubleshooting. They will partner directly with dev teams (by embedding within those teams for short periods of time) and provide recommendations for reliability, including code reviews, capacity planning, gathering and implementing charting and dashboard requirements, and designing on-call and alerting needs.
For operations, the SRE will work directly with Infra and Ops teams to provide automation enhancements across the reliability spectrum, including Monitoring as Code capabilities, automatic instrumentation of tracing technologies, continuous and canary deployment capabilities, and capacity planning. The SRE will also work within these teams' business-as-usual (BAU) workload as an Operations engineer, providing cloud infrastructure and automated pipelines via technologies such as Terraform, Ansible, and Spinnaker.

RESPONSIBILITIES & DUTIES

Embed directly into application teams as an ops engineer, and resolve pain points developers experience with the cloud platform
Rotate through on-call and directly support complex technical outages, following up with blameless post-mortems and action items
Provide and deploy automation enhancements for both Ops and Dev teams
Automate monitoring technologies such as Grafana, Prometheus, Dynatrace, and Telegraf to provide our developers with self-service monitoring systems
Debug and troubleshoot issues alongside developers
Work directly with cloud/ops engineers on automation tooling

REQUIREMENTS:

Bachelor's degree in a technical field or equivalent experience
4+ years of experience in IT or Software Development
2+ years of programming experience with at least one of:
- Golang
- Python
- Java
- JavaScript / Node.js
2+ years of experience with the following technologies:
- Infrastructure as Code (Terraform a plus)
- Containers
- Kubernetes (GKE, EKS, or Rancher a plus)
- Jenkins or a similar CI tool
- Spinnaker, Argo, or another similar Cloud Native CD tool
- Observability and Tracing platform(s) (Dynatrace a plus)
- Open-Source monitoring tools, at least one of:
  - Jaeger, Prometheus, Grafana, or InfluxDB
Deeply familiar with:
- Tracing, OpenTracing
- Designing SLOs & SLIs
- Instrumenting code for metrics and observability
- Deployment patterns (Canary, etc.)
- Cloud Architecture patterns
- Automation tooling
Must be comfortable with:
- Collaborating/Screen Sharing for 4+ hours a day
- Pair programming
- Dealing with ambiguity and producing disciplined, robust solutions
- Delivering projects all the way to production

Experience with any of the following a plus:

Nutanix
Rancher / RKE
Azure or AWS networking
Site Reliability Engineer experience
Public cloud infrastructure and/or architecture design
Cloud Native security concerns and engineering patterns (relevant examples include static code security analysis of Terraform and zero-trust container networking)

Certifications considered a plus

CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or Azure/AWS Architect
Architecture, Development, Security, or Automation certification(s) in one of the major public cloud vendors (GCP, AWS, or Azure)

Our Benefits:

Day one health, dental, and vision insurance
401(k) Plan with competitive employer match
Vacation, sick, holiday and volunteer time off
Life and disability insurance
Flexible Spending Account & Health Savings Account
Professional development
Tuition reimbursement
Company-sponsored social and philanthropy events

Please note: We cannot currently provide visa sponsorship. If you require visa sponsorship now, or in the future, please do not apply for this position.
