As Communications Service Providers (CSPs) manage the shift to 5G and the increasing virtualization of network resources, new AIOps-based solutions are required to handle the increasingly complex operations of their networks.
Typically, network operations engineers spend most of their time manually triaging low-impact tickets. Due to the complexity of modern telecom networks, they also often rely on varying levels of individual knowledge and undocumented Standard Operating Procedures (SOPs) to solve them, which leads to incorrect resolutions for a number of network incidents.
With AIOps-based solutions, operations engineers could cut down the thousands of incoming events and trouble tickets they face each day and systematically track the accuracy of root cause analysis. They could then spend their time focusing only on the most critical network and service events, or invest in upskilling for more complex engineering roles. Meanwhile, CSPs get more granular visibility into the state of their operations, eliminate silos, and reduce operating expenditure. In addition, a reporting dashboard could give Operations Centre (Network Operations Centre and Service Operations Centre) managers visibility of the network status, frequency of events, time to resolution, and performance over time.
AI/ML-based solutions for CSPs can identify the most common root causes and automatically resolve issues. These solutions could consist of three modules:
- Network Root Cause Analysis (RCA)
- Service RCA
- Auto-remediations
While Network RCA ingests network data and automates tasks like event ticketing, diagnosis, and prioritisation, Service RCA can enrich these with data available from existing Service Quality Management (SQM) layers. Auto-remediation analyses previous incident resolutions to produce recommendations that augment the knowledge base of operations engineers (who currently process all tasks manually) and speed up the resolution of incidents.
Underpinned by AI/ML, the solutions can help the CSP use pattern analysis of historical events across data sources, together with closed-loop feedback, to provide accurate, unbiased root cause analysis with either automated or user-actioned remediations. Features such as automatic suggestion of root causes, auto-generation of trouble tickets for the most common network issues, and customisable alarm prioritisation can help operations engineers work on the most impactful events first.
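To make the idea of suggesting root causes from historical events more concrete, here is a minimal sketch in Python. It is not a description of any particular product: the alarm types, root-cause labels, and function names (`build_history`, `suggest_root_causes`) are all hypothetical, and a simple frequency count stands in for whatever pattern analysis a real solution would use.

```python
from collections import Counter, defaultdict

def build_history(closed_tickets):
    """Count how often each root cause was confirmed for each alarm type.

    closed_tickets is an iterable of (alarm_type, confirmed_root_cause)
    pairs taken from resolved incidents (hypothetical data shape).
    """
    history = defaultdict(Counter)
    for alarm_type, root_cause in closed_tickets:
        history[alarm_type][root_cause] += 1
    return history

def suggest_root_causes(history, alarm_type, top_n=3):
    """Return up to top_n (root_cause, share) pairs, most frequent first."""
    counts = history.get(alarm_type)
    if not counts:
        return []  # no history for this alarm type, so nothing to suggest
    total = sum(counts.values())
    return [(cause, n / total) for cause, n in counts.most_common(top_n)]

# Illustrative, made-up history of closed tickets.
closed = [
    ("LINK_DOWN", "fibre cut"),
    ("LINK_DOWN", "fibre cut"),
    ("LINK_DOWN", "port misconfiguration"),
    ("HIGH_CPU", "routing loop"),
]
history = build_history(closed)
suggestions = suggest_root_causes(history, "LINK_DOWN")
```

In this sketch, the closed-loop feedback amounts to adding each newly resolved ticket to the history, so that later suggestions reflect what engineers actually confirmed.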
Operations engineers equipped with auto-remediation could automatically resolve issues where there is a high level of confidence. Alternatively, the user could be presented with multiple resolution options based on historical data and similar cases, which they can choose from, based on their judgement.
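The choice between automatic resolution and user-selected options can be sketched as a simple confidence gate. The example below is illustrative only: the `Remediation` type, the action names, the 0.9 threshold, and the limit of three options are assumptions made for the sketch, not values taken from any real deployment.

```python
from dataclasses import dataclass

@dataclass
class Remediation:
    action: str        # hypothetical action name
    confidence: float  # score between 0 and 1, e.g. derived from similar past cases

AUTO_THRESHOLD = 0.9   # assumed cut-off for acting without human review

def decide(candidates, threshold=AUTO_THRESHOLD, max_options=3):
    """Return ("auto", [best]) when the top candidate is confident enough,
    otherwise ("manual", ranked options) for the engineer to choose from."""
    ranked = sorted(candidates, key=lambda r: r.confidence, reverse=True)
    if ranked and ranked[0].confidence >= threshold:
        return "auto", ranked[:1]
    return "manual", ranked[:max_options]

# High confidence: the top action would be applied automatically.
mode_a, chosen_a = decide([
    Remediation("restart_interface", 0.95),
    Remediation("reroute_traffic", 0.40),
])

# Lower confidence: the engineer is shown ranked options instead.
mode_b, options_b = decide([
    Remediation("reroute_traffic", 0.55),
    Remediation("restart_interface", 0.60),
])
```

Keeping the threshold as a parameter lets an operations team start conservatively and widen the set of automatically resolved issues as the tracked accuracy of past recommendations improves.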
Such solutions would also help engineers at the Operations Centre build a library of use cases (which typically cover KPIs, algorithms and workflows) that can be used by multiple operations teams. This would improve both accuracy and agility in resolving problems.