Please note that this position is a fixed-term, 6-month contract.
Discovery Networks International is the world's #1 nonfiction media company reaching more than 1.5 billion cumulative subscribers in 210 countries and territories. Discovery is dedicated to satisfying curiosity through 130-plus worldwide television networks, led by Discovery Channel, TLC, Animal Planet and Eurosport, the leading pan-regional sports entertainment destination across Europe and Asia-Pacific.
Discovery Inc is a global leader in real-life entertainment. With the recent launch of our new global OTT streaming service, Discovery+, we can now aggregate content from our massive library and distribute live and on-demand content directly to consumers on a global scale, whether it be real-life entertainment or sports. This is on top of our other OTT brands: Eurosport, Food Network Kitchen, GolfTV, Global Cycling Network and MotorTrend On-Demand.
Reporting to a Digital Operations Manager, the analyst will provide Tier 1 and 2 operational support for all issues arising from Discovery’s OTT platforms, including live events, back-end infrastructure, content supply/distribution chains and customer service escalations. While a variety of technical skill sets are involved in supporting a direct-to-consumer platform, the core principle is incident and problem management to ensure operational excellence and continual improvement.
Because the operations center runs 24/7/365 and supports multiple high-priority live events throughout the year, working hours will vary and include weekends and nights on a rotating and ad-hoc basis.
The centers are also the point of contact and owners for IT Major Incidents, and the analyst is responsible for initiating the IT Major Incident process and procedures in our Platform Infrastructure.
- Monitor Discovery’s digital platforms and video streams in real-time
- As the first point of contact for all issues, provide swift and accurate incident classification and analysis as per agreed SLAs, and create and update incidents on the company’s incident management system until resolution.
- Perform necessary escalation to internal and external support teams utilizing prescribed runbooks for analysis.
- Initiate major incident coordination and stakeholder communications with continual updates as per agreed SLAs.
- Acknowledge and co-ordinate all streaming, VOD and infra issues from both internal and external (vendors and affiliates) sources effectively until resolution.
- Provide dedicated monitoring of end-to-end infrastructure, content supply chain and streaming output for live events, which includes both automated alerting and eye-on-glass monitoring during key events (highly publicized events, infrastructure changes and onboarding).
- Continually contribute to maintaining updated runbooks and documentation.
- Continually identify, adjust and help establish new automated monitoring and alerting metrics.
- Work in conjunction with various live operations, content editorial teams, platform/infra devops, product teams and customer services to establish and improve all operational processes on a continual basis.
- Perform urgent fixes for assets and feeds (live and VOD) in the origin database and CDN distribution configs, as well as metadata changes, when escalated by other operational teams and customer services.
- Collate streaming and incident metrics on a daily basis for daily reporting to internal stakeholders.
- Assist QA, DevOps, client product and other teams with testing and monitoring when required.
Skills & Experience
- Working knowledge of end-to-end OTT platforms including live and VOD encoding & transcoding workflow, video manifest formats, CDN distribution workflow, content/metadata ingestion workflows into CMS, databases and client/user journey workflows (Authn/Authz)
- In-depth knowledge of API functionalities within an OTT tech stack
- Knowledge of AWS applications - CloudWatch metrics for analysis, Media Services and support case creation.
- Working experience with Atlassian applications - JIRA and Confluence.
- Working experience with Incident Management applications such as ServiceNow, PagerDuty and JIRA.
- Strong working experience with event and log management tools such as Grafana, Splunk, PagerDuty, AWS CloudWatch, GCP, QoE (MUX/Conviva), New Relic, DataDog and Prometheus. Must be able to demonstrate issue/pattern identification and analysis skills from a set of metrics based on affected service components.
- Junior QA automation experience is desirable. Must at least have a good understanding of product release & deployment process as well as triaging issues and analysing logs from client platforms/devices.
- Must be able to represent the ops manager/team on calls with stakeholders/teams, as well as co-ordinate ad-hoc calls and processes with internal operational teams.
- Problem determination and analytical skills.
- Strong written communication skills with the ability to explain technical concepts in a clear and easy to understand manner as this is a critical part of stakeholder communication.
- Ability to provide Total Call Ownership, including handling customers, escalating issues as appropriate and providing the necessary follow-up before incidents are closed.
- Advanced knowledge of administrative applications such as Excel, Word and Slides or Google equivalent apps.
- Proven ability to work in a high-pressure environment. Should thrive in a dynamic and fast-paced environment, being able to meet short deadlines.