Director, Site Reliability Engineering
The Director, Site Reliability Engineering is at the core of preparing for the continued exponential growth of BlackLine's SaaS product and other programs in the Technical Operations group that powers all of BlackLine. This role is pivotal in ensuring that BlackLine's services and infrastructure are carefully planned and deployed at the time, in the place, and in the configuration best suited to serving BlackLine's users. Your role sits at the nexus of capacity planning, technical project execution, product planning, business analysis, site reliability, and software engineering.
You must be as much at home explaining analyses and project recommendations to executives as you are discussing the technical merits of next-generation architectures with engineers or building tools to automate and scale their impact.
- Improves the BlackLine SaaS service experience by discovering and highlighting optimization opportunities with existing code to address application availability, performance, observability, efficiency, and security challenges.
- Develops tools and systems to automate the identification, analysis, and remediation of application events, infrastructure issues, or requests.
- Ability to commit to roadmaps with defined capabilities and timelines, with a proven track record of timely, cost-effective, high-quality deliverables.
- Advocates for change across the organization. Ensures the implementation of change with appropriate communications, goals, resources, metrics, and reviews.
- Partners with internal organizations and vendors to develop multi-year roadmaps influencing the direction and evolution of the operating environment and support protocols.
- Establish and maintain Key Performance Indicators for the overall health of the service and build tools to exercise and evaluate whether these KPIs are being met.
- Leads cross-functionally with other teams to surface common pain points, architect solutions, establish conventions, and evangelize application development and operations best practices.
- Transform discoveries into requests to others or action items for you and your team.
- Regularly learn new systems and tools as the BlackLine platform and ecosystem evolve.
- Own and evolve the BlackLine trust site to include real-time availability and performance information.
- Passion for pioneering engineering in an industry ripe for transformation.
- Empathy for working with support teams to identify and remedy pain points.
- Expertise in reliable and repeatable web application deployment and architecture.
- Someone energized by a fast-paced, iterative approach.
- An ability to balance urgent needs with long-term strategy.
- Strong ownership, pride of work, and ability to take things across the finish line. Someone who can see around corners and who finishes well.
- Of particular interest is a specialty in one or more of the following: single-page web apps, API integrations, monitoring/alerting, cloud infrastructure management, distributed systems, cloud networking, or application security.
- Hands-on problem-solving skills, technical leadership and mentoring qualities.
- Strong written and oral communication skills.
- Manage end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence.
- Lead by example, care for your team, and establish credibility with the quality of the team's technical execution.
- Manage on-call rotations across continents, using a follow-the-sun model.
- Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of BlackLine's services.
- Cross-system and full-stack architecture experience and awareness.
- Ability to communicate well with business owners, executives, and technical staff, at the appropriate level for each.
- Proven ability to work across a wide organization to achieve results based on technical vision.
- Strong intra-team and cross-functional collaboration skills.
- Strong quantitative and qualitative reasoning skills.
- Strong interpersonal, presentation, and communication skills.
- Strong organizational skills and attention to detail.
- 10+ years of industry experience and 5+ years in a managerial role.
- A minimum of three years of experience leading a 24x7 operations organization.
- Bachelor's degree in Computer Science or related discipline or equivalent experience.
- Ability to communicate well with both business owners and technical staff, at the appropriate level for both.
- Prior C#, ASP.NET, Ruby, Go, or Java development experience, preferably in an agile SaaS environment.
- Significant experience with open source platforms and technologies.
- Experience in recruiting and managing a team of experienced engineers on large-scale projects.
- Experience in problem solving and analyzing global-scale distributed systems.
- Proficiency in algorithms, data structures, complexity analysis, and software design, and/or expertise in Unix/Windows systems, IP networking, and performance and application issues.
- Capable of technical deep-dives into code, networking, operating systems, and storage, yet verbally and cognitively agile enough to hold your own in a strategy discussion with the leadership team.
- Experience with software development processes and methodologies.
- Track record of architecting, developing, and implementing robust, distributed online solutions.
- ITIL knowledge is a plus.