Engineering Manager, Network Operations at Pluto TV (Greater LA Area, CA)
*This is a 100% Remote job*
Pluto TV, a ViacomCBS company, is the leading free streaming television service in America, delivering 250+ live and original channels and thousands of on-demand movies in partnership with major TV networks, movie studios, publishers, and digital media companies. Pluto TV is available on all mobile, web and connected TV streaming devices and millions of viewers tune in each month to watch premium news, TV shows, movies, sports, lifestyle, and trending digital series. Headquartered in West Hollywood, Pluto TV has offices in New York, Silicon Valley, Chicago and Berlin.
The Manager, NOC for the Pluto TV Team is responsible for leading the incident response to any issues that may impact our quality of service, whether related to infrastructure, code deploys, partner outages, or other causes.
This is a critical role with a wide range of responsibilities, including:
- Lead the lifecycle process for incident response, mitigation, escalation, analysis and reporting.
- Create the necessary processes and documentation for responding to early alerts and detecting common patterns of issues that may escalate into larger incidents.
- Work with Engineering and Operations teams to resolve any active incidents as well as proactively mitigate future platform issues.
- Leverage our observability platforms to detect variance/outliers in our key systems and work with engineering teams and the Production Operations team to proactively avoid incidents, increasing MTBF and reducing MTTR.
- Measure everything; establish and publish relevant site/service metrics and alerting (SLA/SLO).
- Review and improve existing processes and operational runbooks for the NOC team.
- Assist with FinOps (reporting, visibility and optimization of cloud costs/spend).
Qualities / Experience We’re Seeking
We believe the right individual will have the following skills and experience in order to be successful in the role:
- Effective, clear communicator, able to communicate across all levels of the organization.
- 6+ years of supervisory experience leading a NOC or Incident Response command center.
- Strong incident management or ITIL background with increasing responsibility.
- 2+ years of experience with configuration management tools such as Terraform (preferred), CloudFormation, Ansible, Chef, or Puppet.
- 2+ years of DevOps experience with large-scale AWS services including EKS, EC2, VPC, S3, Lambda, and CloudWatch.
- Ideal candidate will have experience with Vault and Elasticsearch/Kibana.
- Monitoring experience using Prometheus and Grafana is a plus.
Paramount is an equal opportunity employer (EOE) including disability/vet.
At Paramount, the spirit of inclusion feeds into everything that we do, on-screen and off. From the programming and movies we create to employee benefits/programs and social impact outreach initiatives, we believe that opportunity, access, resources and rewards should be available to and for the benefit of all. Paramount is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ethnicity, ancestry, religion, creed, sex, national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, gender expression, and Veteran status.
If you are a qualified individual with a disability or a disabled veteran, you may request a reasonable accommodation if you are unable or limited in your ability to use or access https://www.paramount.com/careers as a result of your disability. You can request reasonable accommodations by calling 212.846.5500 or by sending an email to [email protected]. Only messages left for this purpose will be returned.