Assuring the Telco Cloud

As communications service providers (CSPs) undergo digital transformation (running their business out of a cloud, selling digital services and operating like web-scale internet companies), assuring the Telco Cloud business will become a high priority. With networks being virtualized, services digitalized and IoT looming large, the business risks are much higher than originally envisaged. Assuring the Telco Cloud network and its services will be key to assuring the new Telco Cloud business.

The Telco Cloud is a virtualized telecom infrastructure for running digital services and agile operations. Accuracy, speed and error-free operation of the Telco Cloud are critical to the success of a digital business. Solutions that derive and offer customer intelligence, in addition to service assurance, will play a critical role in the success of the Telco Cloud business.

Three OSS (Operations Support Systems) concepts need revisiting to make them relevant to the unknowns of the Telco Cloud: Service Quality Management, fault management for a faultless Telco Cloud, and analytics that enable the digital service provider to become ‘an intelligent platform’.

Service Quality Management for the Telco Cloud

Service Quality Management (SQM) is not a new concept; however, it will be an important one in the coming years. Today's SQM focuses on proactively monitoring customer-facing services, which have not always required reliable, secure, fast and always-available networks. With Telco Cloud services expected to roll out more widely in 2017, however, the current functionality of SQM systems will be stretched to cover the higher speed and scale of a digital services environment.

Network Functions Virtualization (NFV), the technology underlying the Telco Cloud, evolved partly in response to consumers' growing appetite for faster, on-demand and reliable services. Some of the most popular digital services in the new networks will be video streaming, telemetry, mobile gaming and home automation.

NFV also comes with demanding SLAs between the service provider and its customers. With VoLTE, ViLTE and other advanced communication services launched as digital services, stringent corporate SLAs will be required to compete with the slick services offered by OTT (over-the-top) providers. The SLA challenge grows with IoT, where inter-communicating, sensor-equipped devices become the new ‘customers’ and may place high demands on reliability and availability, especially for ‘mission-critical’ connections such as autonomous cars or remote surgery.

SQM is important in the Telco Cloud for the following key reasons:

  • Since virtualization and the Telco Cloud promise greater agility in creating, delivering, altering and retiring services, they create a proportional need for agility in managing and maintaining QoS. The iterative, continuous deployment and tearing down of services requires SQM systems to monitor high-revenue, short-lived services that may last only a few hours to a few days
  • The virtualization of network functions introduces elasticity and dynamicity of network resources. Dynamic adjustments to network element capacities (scale-up and scale-down, topology configuration, redirection of traffic routes, etc.) have an immediate impact on the services offered, so SQM must respond to such changes much faster than before
  • In the hybrid (physical and virtualized) networks that will inevitably exist, digital services will be delivered over both domains. The SQM system should therefore act as an overarching ‘manager of managers’ that provides uniform, unbiased reporting and orchestration across both

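To make the ‘manager of managers’ idea more concrete, here is a minimal Python sketch, assuming that each domain manager (physical or virtualized) reports per-service KPI samples in a common format. The `KpiSample` type, the service names and the latency targets in `LATENCY_TARGET_MS` are hypothetical; a real SQM system would ingest them from domain managers and a service catalog rather than hard-code them. Because quality is computed from whatever samples exist, the same logic applies to a service that lives for only a few hours.

```python
from dataclasses import dataclass


@dataclass
class KpiSample:
    service_id: str
    domain: str        # hypothetical labels: "physical" or "virtual"
    latency_ms: float
    available: bool


# Hypothetical per-service latency targets (ms) taken from each service's SLA.
LATENCY_TARGET_MS = {"video-4k": 50.0, "iot-telemetry": 200.0}


def service_quality(samples):
    """Merge samples from all domain managers into one uniform view per service."""
    by_service = {}
    for sample in samples:
        by_service.setdefault(sample.service_id, []).append(sample)

    report = {}
    for service_id, rows in by_service.items():
        target = LATENCY_TARGET_MS[service_id]
        report[service_id] = {
            # Share of samples in which the service was reachable.
            "availability": sum(r.available for r in rows) / len(rows),
            # Share of samples that met the latency target.
            "latency_compliance": sum(r.latency_ms <= target for r in rows) / len(rows),
            # Which domains (physical, virtual or both) carry this service.
            "domains": sorted({r.domain for r in rows}),
        }
    return report
```

Reporting both domains side by side for each service is what gives the uniform, unbiased view: a degradation is attributed to the service first, whichever part of the hybrid network it came from.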
Reinvented for digital services, SQM helps CSPs address the new service challenges posed by the Telco Cloud.

A Faultless Telco Cloud

Cloud-based digital services are expected to run on highly reliable, error-free networks. Because digital services require real-time dynamic adaptation and customization of the communication network, network, service and device failures must be kept to a minimum.

Moreover, in an IoT-connected world, failed devices or connections might not only breach SLAs and incur massive penalties but, more importantly, disrupt life-critical or mission-critical communication. Complex mesh topologies with high availability and built-in redundancy reduce the impact of such failures, but a system is still needed to discover, interpret and manage the faults.

With the disruption that NFV and IoT bring to the network, every new piece of equipment, software and device introduces its own failure points. Traditional network and device fault management will therefore need to be taken to the next level.

Beyond this technology shift (NFV and IoT), fault management must also be revamped to meet the demand for faster service delivery and problem resolution. Monitoring and assessing the impact of failures on the new network elements and user devices is critical, especially when services are time-critical and, in many cases, life-critical too.

This justifies evolving today's Network and Service Operations Centers (NOCs/SOCs) into a zero-touch Operations Center, where extensive automation speeds up reporting, fault-finding and remediation. By feeding fault data into Service Quality Management systems, CSPs can immediately understand the impact of faults on services and use predictive algorithms to prevent faults before they occur.
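Feeding fault data into SQM essentially means mapping each faulted resource onto the services that depend on it. The sketch below assumes a hypothetical dependency model (`DEPENDS_ON`) in which services depend on virtual network functions, those functions depend on compute hosts, and physical elements sit alongside them; it walks the inverted graph breadth-first to find every impacted service. The node names and the `svc:` prefix convention are illustrative only, not taken from any particular inventory system.

```python
from collections import deque

# Hypothetical topology: each node lists the resources it depends on.
DEPENDS_ON = {
    "svc:volte": ["vnf:ims-core", "vnf:epc-gw"],
    "svc:iot-fleet": ["vnf:epc-gw", "pnf:router-7"],
    "vnf:ims-core": ["host:compute-1"],
    "vnf:epc-gw": ["host:compute-2"],
    "host:compute-1": [],
    "host:compute-2": [],
    "pnf:router-7": [],
}


def impacted_services(faulted, depends_on):
    """Return the services whose dependency chain reaches any faulted resource."""
    # Invert the graph so we can walk upwards: resource -> its dependents.
    dependents = {node: [] for node in depends_on}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    # Breadth-first propagation of the fault towards the services.
    seen = set(faulted)
    queue = deque(faulted)
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    return sorted(node for node in seen if node.startswith("svc:"))
```

For example, a fault on `host:compute-2` propagates through `vnf:epc-gw` to both services, whereas a fault on the physical `pnf:router-7` impacts only `svc:iot-fleet`. The same traversal works across physical and virtualized elements because both are simply nodes in the model.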

Many use cases can be served through a highly automated, predictive fault management system:

  • Telco Cloud orchestration: fault data fed into the SQM system highlights policy violations, which then trigger automation and orchestration across physical as well as virtualized networks
  • Predicting IoT failures: analytics applied to fault data forecast patterns in IoT traffic and help prevent IoT network, service and device failures. This includes building dashboards for service availability, incident and unavailability breakdowns by region or location, and geolocation-based service impact
  • Protecting SLAs: when integrated with fault data, machine learning offers a powerful predictive capability to anticipate problems and helps CSPs protect their customer SLAs
  • Service impact: with fault-driven SQM, service impact can be visualized faster across hybrid, NFV and IoT networks
  • Zero-touch Operations Center: automating outage recovery and device configuration, and integrating fault management with the OSS ecosystem (Trouble Ticketing, Inventory, Orchestrators, SQM, CRM, Workforce Management, etc.), will lead to an automated, zero-touch Operations Center

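The predictive use cases above rely on machine learning. As a deliberately simple stand-in for such a model, the sketch below fits an ordinary least-squares linear trend to a recent KPI history and estimates how many hours remain before an SLA threshold is crossed. The function names, the hourly sampling and the threshold values are assumptions for illustration, not a production forecasting method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_breach(hours, values, threshold):
    """Hours after the last sample until the trend crosses `threshold`.

    Returns None if the KPI is flat or improving, and 0.0 if the trend
    line has already crossed the threshold.
    """
    slope, intercept = fit_line(hours, values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - hours[-1])
```

With an error rate rising by one point per hour from 1% to 4% over hours 0 to 3 and a 10% SLA threshold, `hours_until_breach([0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0], 10.0)` returns 6.0, leaving a window in which a proactive ticket or automated remediation could be triggered before the SLA is breached.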
Automation of Operations Center processes is key to success in the virtualized and digitalized Telco Cloud environment. CSPs are working towards a fully automated, zero-touch Operations Center built on closed-loop corrective actions, complex algorithms and machine learning. To support the dynamic SLAs of the Telco Cloud, the OSS must also handle on-demand capacity configuration and dynamic topology changes, which are possible only with automated, real-time network feedback and automatic configuration.
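A closed-loop corrective action can be sketched as a monitor, analyze and act cycle that runs without human intervention. The example below assumes a hypothetical `ScalingPolicy` based on per-instance utilization and simulates a virtualized network function whose utilization depends on the offered load and the number of running instances. The thresholds, capacities and scale-by-one rule are illustrative choices, not values from any specific orchestrator.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical policy values; a real policy would come from the SLA.
    scale_out_util: float = 0.80
    scale_in_util: float = 0.30
    min_instances: int = 1
    max_instances: int = 10


def plan_action(util, instances, policy):
    """Analyze one utilization reading against the policy and pick an action."""
    if util > policy.scale_out_util and instances < policy.max_instances:
        return "scale_out", instances + 1
    if util < policy.scale_in_util and instances > policy.min_instances:
        return "scale_in", instances - 1
    return "none", instances


def closed_loop(load_series, instances, capacity_per_instance, policy):
    """Simulate the loop: measure utilization, decide, apply, repeat."""
    history = []
    for load in load_series:
        # Monitor: utilization per instance for the current offered load.
        util = load / (instances * capacity_per_instance)
        # Analyze and plan: compare with the policy.
        action, instances = plan_action(util, instances, policy)
        # Execute (simulated): the new instance count feeds the next iteration.
        history.append((load, round(util, 2), action, instances))
    return history
```

For a load series of 50, 90, 170, 170 and 40 units with 100 units of capacity per instance, starting from one instance, the loop scales out twice, holds, then scales in once. The gap between the scale-out and scale-in thresholds acts as hysteresis, so the loop does not oscillate when utilization hovers near a single threshold.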

Analytics to Evolve into an ‘Intelligent Platform’

CSPs are ready to shake off the ‘dumb pipe’ label by applying sophisticated analytics to the massive, valuable data traversing their networks. As digital service providers, they aim to monetize customer behavioral data as well as connectivity, as they aggressively launch new digital services to challenge the growing popularity of OTT services.

Using machine-learning tools, analytics reveal trends in performance, capacity and faults. Beyond these operational benefits, analytics provide critical intelligence for network monetization and service personalization by showing how the Telco Cloud, its services, its customers and their devices are actually used. For example, CSPs can proactively identify low-congestion zones or locations (‘Free Zones’) and rapidly fill spare capacity with revenue-generating traffic from new service offers such as video streaming, mobile TV or smartphone apps, contextualized by location, time and customer need.
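The Free Zones idea can be illustrated by scanning per-cell utilization over a time window and keeping the cells whose peak stays below a congestion threshold. This is a minimal sketch, assuming hourly utilization figures expressed as a fraction of capacity and per-cell capacity in Mbit/s; the cell names, the 0.4 threshold and the `free_zones` function are hypothetical.

```python
def free_zones(utilization, capacity_mbps, hours, threshold=0.4):
    """Return {cell: spare Mbit/s} for cells that stay uncongested in `hours`.

    utilization: {cell: {hour: fraction of capacity in use}}
    capacity_mbps: {cell: total capacity in Mbit/s}
    """
    zones = {}
    for cell, by_hour in utilization.items():
        # The busiest hour in the window decides whether the cell qualifies.
        peak = max(by_hour[h] for h in hours)
        if peak < threshold:
            # Spare capacity that is guaranteed even at the busiest hour.
            zones[cell] = round(capacity_mbps[cell] * (1 - peak), 1)
    # Largest spare capacity first, so those cells can be targeted first.
    return dict(sorted(zones.items(), key=lambda kv: kv[1], reverse=True))


utilization = {
    "cell-A": {18: 0.20, 19: 0.25, 20: 0.30},
    "cell-B": {18: 0.70, 19: 0.85, 20: 0.90},
    "cell-C": {18: 0.10, 19: 0.15, 20: 0.12},
}
capacity_mbps = {"cell-A": 300, "cell-B": 300, "cell-C": 150}
zones = free_zones(utilization, capacity_mbps, hours=range(18, 21))
```

In this example the evening window yields two Free Zones, `cell-A` with about 210 Mbit/s and `cell-C` with about 127.5 Mbit/s of spare capacity, while the congested `cell-B` is excluded. Adding time of day and customer context to the input would allow offers to be targeted as the paragraph above describes.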

In addition, as CSPs extend their business to become IoT service providers, machine-learning-based analytics will be widely used to process Big Data and generate critical business intelligence for each IoT industry vertical.

Underlying Technologies to Make Telco Cloud Management Successful

Next-generation virtualization, in the form of the Telco Cloud, promises to cut the time needed to create and deploy new services from a few months to a few days. In response, CTIOs are developing new service assurance architectures in which SQM, automation and analytics are key components. These architectures are based on open APIs, Big Data clustering and OpenStack capabilities.

Beyond introducing these technologies into the underlying platforms, it is important to adopt a microservices architecture with DevOps-enabled iterative processes, so that services can be developed quickly in response to customer needs. This is how customers' expectation of new features every week, or every few days, will be met. The same approach also speeds up root cause analysis and the resolution of customer issues.

An integrated approach to analytics, automation and SQM requires drastic changes in the way data is processed, visualized and acted upon. For a successful Telco Cloud launch, long-term assurance of digital services and the creation of business value from data, CSPs must redefine the features of Service Quality Management, zero-touch predictive Operations Centers and analytics for data monetization.