Site Reliability Engineer
Responsibilities:
- Create and maintain a continuous testing framework that observes, records, and trends real-time availability data for all of our clients
- Develop and maintain on-premises and cloud capacity plans that ensure we are delivering a BlackLine service that is performant and cost-effective
- Collaborate with development and other technology teams on requirements definition, capacity planning, and process refinement
- Improve the BlackLine SaaS service experience by discovering and highlighting optimization opportunities in existing code to address application availability, performance, observability, efficiency, and security challenges.
- Develop tools and systems to automate the identification, analysis, and remediation of application events, infrastructure issues, or requests.
- Establish and maintain Key Performance Indicators for the overall health of the service and build tools to exercise and evaluate whether these KPIs are being met.
- Work cross-functionally with other teams to surface common pain points, architect solutions, establish conventions, and evangelize application development and operations best practices.
- Transform discoveries into requests to others or action items for you and your team.
- Regularly learn new systems and tools as the BlackLine platform and ecosystem evolve.
- Own and evolve the BlackLine trust site to include real-time availability and performance information
- Contribute knowledge, skills, and personal qualities to a dedicated team of top engineers solving real-life problems in a bleeding-edge, high-performance, and high-traffic environment.
- Assess, test, track, predict, and report on the performance of a suite of production applications, including responsiveness, capacity, and availability.
- Publish performance findings, conclusions, and recommendations
- Create second-tier analysis of capacity constraint points and performance, and discuss it with the development and infrastructure teams
- Support integration of performance data into customer experience analytics tools and reporting
- Ensure application and infrastructure capacity management efforts have verifiable capacity data to support business cases
- Monitor industry trends and keep abreast of new tools and technologies.
- Participate in our on-call rotation and conduct incident reviews
- Other duties as assigned
Requirements:
- BS or MS in Computer Science (or equivalent diploma and/or certifications) with 3-5 years of related experience.
- Intermediate to advanced knowledge of at least one of the following programming languages: C#, Visual Basic, PowerShell, Java, Go, Linux Shell, Ruby.
- Demonstrated history of developing or operating production web applications and solid understanding of HTTP(S), HTML, JavaScript, CSS, and XML.
- Knowledge of software development best practices and SDLC.
- Experience deploying high-availability systems and software.
- Experience with troubleshooting distributed web applications in a production environment.
- Intermediate level knowledge of IIS and Windows Server or Linux and Apache.
- Experience with infrastructure as code and platform as a service.
- Experience with configuration management tools (e.g., Chef, Ansible, Puppet).
- Must possess the ability to handle multiple goals concurrently and function in a fast-paced, demanding, ever-changing, high-growth environment
- Must maintain the highest level of integrity, courtesy and respect while interacting with internal customers, employees and business contacts
- Excellent oral and written communication skills
- Ability to interface with internal technical experts using professional interpersonal skills
- Experience analyzing datasets to draw conclusions and graphing datasets to support those conclusions
- Exhibit creative problem-solving, logical troubleshooting and analytical skills
- Basic level proficiency in application load balancing methods (F5 LTM, Windows NLB, etc.)
- Working knowledge of TCP/IP and networking concepts
- Proficiency with statistical concepts: confidence intervals, hypothesis testing, sampling
- Understanding of operating system concepts such as CPU, memory, and disk queues, and of graphing and analyzing these over time
- Must possess strong organizational skills and be able to work with minimal oversight
- Ability to understand new technologies quickly and adapt these into daily work and goals
Preferred Requirements:
- Prior C#, ASP.NET, Ruby, Go or Java development experience, preferably in an agile SaaS environment
- Significant experience with open source platforms and technologies.
- Experience with software development processes and methodologies
- Track record of architecting, developing, and implementing robust, distributed online solutions