Eng III, Site Reliability

| Hybrid
Sorry, this job was removed at 6:59 a.m. (PST) on Friday, June 25, 2021

This is a 24/7 team responsible for production systems health monitoring, deployment of code changes, escalation handling, and standardized communication of all change management within the technical operations organization. The role requires multitasking and prioritizing system events according to severity and escalation procedures, and communicating accurately with both internal and external groups during production emergencies. This individual must also be comfortable navigating both Unix and Windows environments and be actively involved in troubleshooting and resolving production issues.

Job Description

  • Establish round-the-clock health monitoring of Unix and Windows environments hosting various platforms (web, mobile, and telephony) using server, network, and application monitoring systems
  • Own escalation process, bringing production issues to resolution via troubleshooting, communication, and subsequent updates. Issues are owned from start to finish and tracked in the enterprise change management ticketing system. The operation center is responsible for gathering troubleshooting information either for direct resolution or for an escalation destination party.
  • Handle stressful situations, such as initiating emergency conference bridge calls and sending quick and accurate outage notifications
  • Use standardized communications for code releases, scheduled maintenance, and service interruptions
  • Monitor the infrastructure change management policies and procedures
  • Communicate with departments, vendors and partners as a central repository for information regarding production site, customer support, help desk and core systems issues across the entire organization 
  • Deploy and release engineering code across multiple environments: communicate and apply all builds and releases to staging and production environments according to standard operating procedures
  • Provide application support for Unix and Windows applications, including performing various system administration tasks and performing standard operating procedures as needed to maintain system health
  • Perform other related duties as required and assigned
  • Demonstrate behaviors which are aligned with the organization’s desired culture and values

The ideal candidate will have the following:

  • 3+ years of previous operations center or equivalent experience 
  • Must be comfortable working in a command line as well as GUI environments
  • 3+ years of experience running full stack application deployments and infrastructure cloud services (Storage, VMs, Network, etc.)
  • 3+ years of direct Unix/Linux experience (running scripts, grepping logs, troubleshooting errors)
  • 3+ years of direct Windows experience (running scripts, processing event log messages, troubleshooting errors) 
  • 3+ years of direct VMware Horizon 7 experience
  • Hands-on experience with a backup solution such as Commvault or Veeam
  • Hands-on experience with configuration frameworks for deployment such as AWS CloudFormation Templates, Ansible, Terraform, Chef, Puppet, or Salt
  • Proficiency with at least one scripting language such as PowerShell, Python, Ruby, or JavaScript, and with build tools such as Maven, Ant, Gradle, or Ivy
  • Solid knowledge of various Linux distributions, including Ubuntu, CentOS, Debian, Red Hat, and Amazon Linux, as well as familiarity with Windows operating systems
  • Understand security and encryption technologies. Knowledge of authentication protocols including OpenID, OIDC, OAuth, SAML, and LDAP.
  • Knowledge of Docker containerization and Kubernetes/EKS cluster management for container orchestration
  • Experience with automated monitoring frameworks such as InfluxDB, Grafana, Telegraf, Fluentd, Kapacitor, etc.
  • Experience managing distributed runtime applications using programmatic interfaces
  • Practical experience migrating large-scale infrastructure onto public cloud platforms such as AWS, Azure, or GCP is a plus
  • Excellent written and oral communication skills
  • Strong business acumen and ability to interface with executive management

Location

We’re a national company with Tech hubs in Raleigh, NC; Plano, TX; Phoenix, AZ; and Moorpark, CA, plus remote workers in many states. Our Tech headquarters in Agoura Hills, CA is just miles away from Malibu Beach, nestled in the quiet hills, with access to excellent restaurants and great hiking trails.

Learn more about Pennymac