Simplify Operations for the Telco Cloud

As communications service providers (CSPs) undergo digital transformation, offering digital services through a Telco Cloud environment, they face a new set of operational challenges: operating a newly virtualized network, placing greater emphasis on customer experience and providing high reliability and availability for emerging IoT services. Simplifying operations can help CSPs tackle these challenges.

The complexity of operations in a Telco Cloud stems from the following:

  • The high speed at which digital services will be deployed: Digital services require real-time, dynamic deployment, adaptation and customization. Supporting this pace requires automating many Operations Center processes, including monitoring, orchestration, feedback, audit and messaging
  • Running a hybrid network that is part virtualized, part physical: Because virtualization is expected to take 3-4 years to stabilize, extra vigilance is required as new nodes and virtual network functions (VNFs) are added or removed. Seamless operations will require systems that adapt quickly to network changes
  • Dynamic services need constant and consistent management: Policy-based management enables continuous, rapid management and leads to automated, simplified configuration in a virtualized environment
  • Additional attention to IoT operations: IoT traffic is expected to run on highly reliable, error-free networks, so IoT networks, services and devices are expected to fail as rarely as possible. Every new piece of equipment, software and device brings its own failure points and requires stronger fault management to keep the number of faults down
  • Impact on life-critical or mission-critical communication: In a hyper-connected world, failed devices or connections could not only breach SLAs and incur heavy penalties but, more importantly, endanger lives. Although complex mesh topologies with high availability and redundancy help minimize failures, they still require a highly efficient system to discover, interpret and manage faults
  • Operations Centers need to be more proactive and predictive: This is driven by the need to minimize performance degradation, prevent failures and eliminate critical problems that affect customers

Integrating and consolidating operations support system (OSS) components is the first step towards simplifying Telco Cloud operations. Automation, including machine learning, is the next.

  • Integration and consolidation: Network functions virtualization (NFV), which hosts network functions and services on common resources, inherently provides some of the required integration. Open REST APIs also help connect the OSS layers. Finally, hosting OSS functions (analytics, automation and service quality management, or SQM) in the cloud can accelerate the integration of the functions the Operations Center needs. Introducing topology-based root cause analysis ties services to the underlying network, closing the remediation loop
  • Automation: Automating the Operations Center means encapsulating best practices in standard operating procedures and using machine learning to derive or improve those procedures. Automating and orchestrating complex processes across multiple domains and functions frees up resources. It not only reduces human error and increases employee productivity but also greatly simplifies operations that involve a large number of processes. Planning, optimization and business teams, among others, can all benefit from this simplification. The highest level of automation leads to the desired zero-touch Operations Center. Building a zero-touch Operations Center for the Telco Cloud requires the following key steps:
    • Automating critical OSS actions
    • Exploiting machine learning for efficiency
    • Self-healing and optimization through a feedback loop
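
The topology-based root cause analysis described in the integration step can be illustrated with a minimal sketch. It assumes a hypothetical dependency map in which each service depends on VNFs and each VNF depends on a host; the node names, the `TOPOLOGY` map and the helper functions are illustrative assumptions, not a specific product's algorithm. The rule used here is a common heuristic: an alarmed node is a probable root cause only if nothing it depends on, directly or indirectly, is also alarmed.

```python
from typing import Dict, List, Set

# Hypothetical topology: each node lists the nodes it directly depends on
# (service -> VNF -> host). Names are illustrative only.
Topology = Dict[str, List[str]]

TOPOLOGY: Topology = {
    "video-service": ["vEPC", "vCDN"],
    "iot-service": ["vEPC"],
    "vEPC": ["host-1"],
    "vCDN": ["host-2"],
    "host-1": [],
    "host-2": [],
}


def dependencies(topology: Topology, node: str) -> Set[str]:
    """Return every node reachable from `node` via depends-on edges (excluding `node`)."""
    seen: Set[str] = set()
    stack = list(topology.get(node, []))
    while stack:
        current = stack.pop()
        if current == node or current in seen:
            continue  # skip cycles back to the start and nodes already visited
        seen.add(current)
        stack.extend(topology.get(current, []))
    return seen


def probable_root_causes(topology: Topology, alarmed: Set[str]) -> Set[str]:
    """Alarmed nodes with no alarmed (transitive) dependency are probable root causes.

    An alarm on a node that sits above another alarmed node is treated as a
    symptom of the lower-layer fault rather than as an independent problem.
    """
    return {
        node for node in alarmed
        if not (dependencies(topology, node) & alarmed)
    }


def impacted_by(topology: Topology, root: str) -> Set[str]:
    """Service impact: every node that transitively depends on `root`."""
    return {node for node in topology if root in dependencies(topology, node)}


if __name__ == "__main__":
    alarmed = {"video-service", "iot-service", "vEPC", "host-1"}
    print(probable_root_causes(TOPOLOGY, alarmed))  # {'host-1'}
    print(sorted(impacted_by(TOPOLOGY, "host-1")))  # ['iot-service', 'vEPC', 'video-service']
```

Here four alarms collapse into a single probable root cause (`host-1`), and the same topology answers the service impact question in reverse, which is the kind of correlation that lets an automated loop act on one fault instead of four tickets.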

Here are some suggested use cases for a simplified (integrated and automated) zero-touch Operations Center:

  • QoS-driven orchestration in hybrid networks: Using integrated performance and fault data on networks and services, QoS policies can be derived and applied to orchestrate both physical and virtualized (hybrid) networks. This requires an integrated SQM/automation/orchestration system
  • Management of end-to-end IoT: Analytics can forecast IoT traffic patterns and help prevent IoT network, service and device failures. This includes building dashboards for service availability, incident and unavailability breakdowns by location, and geolocation-based service impact
  • Prediction of SLA breaches: Machine learning, when integrated with analytics based on performance and fault data, provides a predictive management capability that anticipates problems and helps protect customer SLAs
  • Service impact analysis and root cause analysis: Integrating SQM with fault data enables faster service impact visualization for the Telco Cloud. Automating root cause analysis also identifies problems quickly, reducing mean time to repair (MTTR)
  • Automating outage recoveries: Automating fault management accelerates recovery from network outages. Additionally, integrating fault management with the OSS ecosystem (trouble ticketing, inventory, orchestrators, SQM, CRM, workforce management, etc.) means problems are reported and solved much faster
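
In its simplest form, the SLA breach prediction use case amounts to fitting a trend to recent KPI samples and checking when the extrapolated trend crosses the SLA threshold. The sketch below swaps in ordinary least-squares linear regression as a stand-in for the machine learning models the use case refers to; the KPI (latency in milliseconds), the threshold, the horizon and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def fit_line(samples: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * t + intercept, with t = 0, 1, 2, ..."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(samples))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, y_mean - slope * t_mean


def intervals_until_breach(
    samples: List[float], threshold: float, horizon: int
) -> Optional[int]:
    """Number of future intervals until the extrapolated KPI exceeds `threshold`.

    Returns None if no breach is predicted within `horizon` intervals.
    """
    slope, intercept = fit_line(samples)
    last_t = len(samples) - 1
    for step in range(1, horizon + 1):
        if slope * (last_t + step) + intercept > threshold:
            return step
    return None


if __name__ == "__main__":
    # Hypothetical latency samples (ms) for one service, one per polling interval.
    latency_ms = [20, 22, 24, 26, 28]
    print(intervals_until_breach(latency_ms, threshold=35, horizon=6))  # 4
```

A real deployment would likely use richer features and models, but the decision has the same shape: predict the KPI, compare it against the SLA threshold and raise a proactive alert before customers are affected.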

A simplified zero-touch Operations Center provides many benefits. However, it requires drastic changes in how OSS components integrate and interact with each other and in how network and service data is visualized and acted on in the Operations Center. Introducing analytics, machine learning, messaging bots, automated root cause analysis and orchestration will simplify the operational complexity of the hyper-converged network and its services.