DevOps Manager
The Role
You will be part of a small, geographically distributed, cross-functional Scrum team that is massively scaling Grindr’s cutting-edge platform infrastructure. This team is committed to delivering the highest systems uptime and operational transparency. As a DevOps Manager at Grindr, you are responsible for “baking in” high scalability and high availability into our infrastructure. You will be charged with architecting, designing, developing, and supporting the most visible Internet-scale infrastructure. Managing the infrastructure that powers Grindr at massive scale requires you to master AWS cloud architectures, design effective high-availability network topologies, and have a mastery of hardware and software load balancers. You will apply a deep understanding of clustering techniques to SQL and NoSQL storage, and you will design and implement effective disaster recovery strategies. You are an expert in Ops Management tools and have a deep understanding of change control protocols and procedures. Finally, you will apply your deep knowledge of monitoring tools and NOC operations to manage a 24/7 NOC team. You will own the availability and scalability of everything you touch, whether someone else wrote it, fixed it, or modified it.
Responsibilities
- Lead geographically distributed DevOps and NOC engineering practices, including the definition of processes, metrics, tool selection, and automation.
- Deliver key DevOps and NOC initiatives, including planning, team organization, and execution.
- Ensure standardized, mature change management and release management processes, and automate deployment and maintenance procedures using industry-standard scripting languages.
- Implement automated infrastructure testing.
- Configure and tune enterprise monitoring and instrumentation systems to efficiently detect existing issues and predict future issues based on trends.
- Orchestrate the provisioning, load balancing, dynamic configuration and reconfiguration, monitoring, and spend optimization of servers across cloud providers, data centers, and availability zones.
- Improve the performance, security, redundancy and availability of systems.
- Maintain database security and control permissions.
- Monitor database performance.
- Build configuration management building blocks using tools such as Chef or Puppet.
- Perform performance analysis and capacity planning, maintaining performance models for all applications and systems.
- Respond rapidly to production issues, and troubleshoot and triage them.
- Develop a comprehensive infrastructure policy.
- Apply information security best practices and respond to security events.
- Be very comfortable in the command line, writing scripts, etc.
- Analyze processes, recommend improvements, and write process documentation.
- Work as part of an agile development team.
Skills & Requirements
Required Knowledge, Skills and Abilities
- Bachelor’s Degree in Computer Science.
- Eight (8) years of hands-on experience developing highly scalable distributed systems.
- Expert knowledge of Cloud Computing Architectures, Networking Topologies and Clustering techniques.
- Experience using AWS technologies such as EC2, RDS, CloudWatch, and Elastic Beanstalk.
- Strong scripting and automation skills including hands-on knowledge of Chef.
- Experience with massively scaling Java-based components.
- Hands-on knowledge of MongoDB or other NoSQL databases.
- Passionate about technology and eager to take on challenges.
- Able to contribute and work independently on a small team.