Site Reliability Engineer (Remote)

Sorry, this job was removed at 11:57 a.m. (PST) on Monday, October 25, 2021

We have an exciting opportunity with a premier client of ours. Our client is a well-established, well-known company in the Financial Services industry. They are growing rapidly, offer a comprehensive and competitive compensation package, and have a great company culture. This is a full-time, W2 position with our client.

If you're an SRE with strong DevOps or development experience, this could be an incredibly exciting opportunity for you. Check out the full job description below:

About this Position:

As a Site Reliability Engineer, you will work directly with both development and operations teams.

  • For development, the SRE will provide and implement automated tooling for monitoring, visibility, and troubleshooting. They will partner directly with dev teams (by embedding within those teams for short periods of time) and provide recommendations for reliability, including code reviews, capacity planning, gathering and implementing charting and dashboard requirements, and defining on-call and alerting needs.
  • For operations, the SRE will work directly with Infra and Ops teams to provide automation enhancements across the reliability spectrum, including Monitoring as Code capabilities, automatic instrumentation of tracing technologies, continuous and canary deployment capabilities, and capacity planning. The SRE will also take part in these teams' business-as-usual (BAU) work as an Operations engineer, providing cloud infrastructure and automated pipelines via technologies such as Terraform, Ansible, and Spinnaker.

RESPONSIBILITIES & DUTIES

  • Embed directly into application teams as an ops engineer, and resolve pain points developers experience with the cloud platform
  • Rotate through on-call and directly support complex technical outages, following up with blameless post-mortems and action items
  • Provide and deploy automation enhancements for both Ops and Dev teams
  • Automate monitoring technologies such as Grafana, Prometheus, Dynatrace, Telegraf, etc. to provide our developers with self-service monitoring systems
  • Debug and troubleshoot issues alongside developers
  • Work directly with cloud/ops engineers on automation tooling

REQUIREMENTS:

  • Bachelor's degree in a technical field or equivalent experience
  • 4+ years of experience in IT or Software Development
  • 2+ years Programming experience with at least one of:
    • Golang
    • Python
    • Java
    • JavaScript / Node.js
  • 2+ years' experience with the following technologies:
    • Infrastructure as Code (Terraform a plus)
    • Containers
    • Kubernetes (GKE, EKS, or Rancher a plus)
    • Jenkins or a similar CI tool
    • Spinnaker, Argo, or another similar Cloud Native CD tool
    • Observability and Tracing platform(s) (Dynatrace a plus)
    • Open-Source monitoring tools, at least one of:
      • Jaeger, Prometheus, Grafana, or InfluxDB
  • Deeply familiar with:
    • Tracing, OpenTracing
    • Designing SLOs & SLIs
    • Instrumenting code for metrics and observability
    • Deployment patterns (Canary, etc.)
    • Cloud Architecture patterns
    • Automation tooling
  • Must be comfortable with:
    • Collaborating/Screen Sharing for 4+ hours a day
    • Pair programming
    • Dealing with ambiguity and producing disciplined, robust solutions
    • Delivering projects all the way to production

Experience with any of the following is a plus:

  • Nutanix
  • Rancher / RKE
  • Azure or AWS networking
  • Site Reliability Engineer experience
  • Public cloud infrastructure and/or architecture design
  • Cloud Native security concerns and engineering patterns (relevant examples include familiarity with static code security analysis of Terraform or zero-trust container networking)

Certifications considered a plus

  • CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or Azure/AWS Architect
  • Architecture, Development, Security, or Automation certification(s) in one of the major public cloud vendors (GCP, AWS, or Azure)

Our Benefits:

  • Day one health, dental, and vision insurance
  • 401(k) Plan with competitive employer match
  • Vacation, sick, holiday and volunteer time off
  • Life and disability insurance
  • Flexible Spending Account & Health Savings Account
  • Professional development
  • Tuition reimbursement
  • Company-sponsored social and philanthropy events

Please note: We cannot currently provide visa sponsorship. If you require visa sponsorship now, or in the future, please do not apply for this position.

Location

Because we serve our clients where they are, our locations vary. Simi Valley is our headquarters, but employees are generally remote or, in rare cases, work at our client offices.

Learn more about Blue Pisces Consulting Inc