Senior Site Reliability Engineer
Sorry, this job was removed at 11:18 a.m. (PST) on Tuesday, March 24, 2020
Responsibilities:
- Create and maintain a continuous testing framework that observes, records, and trends real-time availability data for all of our clients
- Develop and maintain on-premises and cloud capacity plans that ensure we are delivering a BlackLine service that is performant and cost-effective
- Collaborate with development and other technology teams on requirements definition, capacity planning, and process refinement
- Collaborate on the development of tools and systems to automate the identification, analysis, and remediation of application events, infrastructure issues, or requests
- Establish and maintain Key Performance Indicators (KPIs) for the overall health of the service and build tools to exercise and evaluate whether these KPIs are being met
- Work cross-functionally to troubleshoot, surface common pain points, find solutions, and establish conventions
- Serve as technical lead for medium to large projects
- Regularly learn new systems and tools as the BlackLine platform and ecosystem evolve
- Mentor and train junior engineers to develop knowledge, skills, and personal qualities to solve real-life problems in a bleeding-edge, high-performance, and high-traffic environment
- Assess, test, track, predict, and report on all performance-related aspects of a suite of production applications, including responsiveness, capacity, and availability
- Publish performance findings, conclusions, and recommendations
- Support integration of performance data into customer experience analytics tools and reporting
- Participate in our on-call rotation and conduct incident reviews
- Other duties as assigned
Qualifications:
- BS or MS in Computer Science (or equivalent diploma and/or certifications) with 5+ years' related experience
- Advanced knowledge of at least one, or intermediate knowledge of at least two, of the following programming languages: C#, Visual Basic, PowerShell, Java, Go, Linux Shell, Ruby
- Demonstrated history of developing or operating production web applications and a solid understanding of HTTP(S), HTML, JavaScript, CSS, and XML
- Significant knowledge of software development best practices and the SDLC
- Baseline understanding of project management processes and procedures, with experience in agile and waterfall, and experience managing one or more small to medium projects
- Experience deploying high-availability systems and software
- Experience troubleshooting distributed web applications in a production environment
- Advanced knowledge of IIS and Windows Server, or of Linux and Apache
- Experience with infrastructure as code and platform as a service
- Experience with configuration management tools (e.g., Chef, Ansible, Puppet)
- Must possess the ability to handle multiple goals concurrently and function in a fast-paced, demanding, ever-changing, high-growth environment
- Must maintain the highest level of integrity, courtesy and respect while interacting with internal customers, employees and business contacts
- Ability to communicate effectively (orally and in writing), in a clear, direct manner, across all business relationships and levels of management
- Ability to interface with internal technical experts using professional interpersonal skills
- Experience analyzing datasets to draw conclusions and graphing the data to support those conclusions
- Intermediate-level proficiency in application load balancing methods (F5 LTM, Windows NLB, etc.)
- Working knowledge of TCP/IP and networking concepts
- Proficiency with statistical concepts: confidence intervals, hypothesis testing, sampling
- Understanding of operating system concepts such as CPU, memory, and disk queues, and of graphing and analyzing these metrics over time
- Must possess strong organizational skills and be able to work with minimal oversight
- Ability to understand new technologies quickly and adapt these into daily work and goals