The Rogers outage, which left 12 million people without wireless and wireline service in 2022, was caused by human error and exacerbated by management and systems “deficiencies,” an independent investigation conducted for Canada’s telecommunications regulator said.
The investigation report also stated that the steps Rogers has taken since the outage “are sufficient to improve the resiliency and reliability of Rogers’ network and address the underlying causes of the July 2022 outage.”
The 26-hour blackout began in the early morning hours of July 8, leaving individuals and businesses without access to cellphones, home phones, the internet and 911 services.
The Canadian Radio-television and Telecommunications Commission (CRTC) commissioned Xona Partners in September 2023 to investigate and identify the cause of the outage.
The engineering consultancy was also asked to assess whether the measures Rogers has taken since the outage are sufficient to prevent a recurrence.
A summary of Xona Partners’ review report was released this month. The CRTC has said the full report contains confidential information and will be released in redacted form at a later date, but that date has not yet been determined.
According to a summary of the report, in the weeks before the outage, Rogers had been going through a seven-step process to upgrade its network. The outage occurred during the sixth step of the upgrade.
“The July 2022 outage was caused by a configuration error in a distribution router within Rogers’ IP network,” the report said.
The report said the shutdown was triggered when Rogers staff removed a control filter that routes information to the proper destination.
With no filters in place, a flood of information was sent into Rogers’ core network, overloading the system and causing it to crash within minutes of the control filters being removed.
Algorithm identifies network upgrade as “low” risk
According to the report, Rogers’ core network manages wireless and wired data for internal and external customers and service providers.
“Because both wireless and wireline networks share a common IP core network, the scope of the outage was extreme, resulting in catastrophic loss of all services,” the report said.
The report said that wireless and wireline services sharing the same network is a practice “common among many service providers” and that companies see it as an efficient way to “balance cost and performance.”
According to the report, Rogers has since said it will build a new, separate network for its wireless service while keeping wireline services on the existing core network, and that work is underway.
The review said the first five stages of the network update had been carried out without issue, leading to a “risk assessment algorithm lowering the risk level for the sixth stage of the upgrade.”
Designating Stage 6 as a “low” risk allowed Rogers staff to avoid additional levels of scrutiny and approval as the upgrades proceeded, but doing so “violates industry norms,” the report said.
Rogers says it has introduced a new risk assessment algorithm to address the issue.
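The summary does not describe how either the old or the new risk assessment algorithm works. As a purely illustrative sketch, the Python below shows one way such a scheme could let earlier successes lower a risk rating while keeping a minimum rating for sensitive changes. The function name, the risk levels and the change categories are all assumptions made for this example, not details from the report.

```python
RISK_LEVELS = ["low", "medium", "high"]

# Hypothetical rule: some kinds of change never drop below a set level,
# no matter how many earlier stages succeeded.
RISK_FLOOR = {"routing_filter_change": "high"}


def assess_risk(base_level, prior_successes, change_type):
    """Lower the risk one step per prior success, but never below the floor."""
    floor = RISK_LEVELS.index(RISK_FLOOR.get(change_type, "low"))
    level = RISK_LEVELS.index(base_level) - prior_successes
    return RISK_LEVELS[max(level, floor)]


# Five clean stages would push an ordinary change down to "low" ...
ordinary = assess_risk("high", 5, "software_patch")
# ... but a filter change keeps its rating and its extra approvals.
sensitive = assess_risk("high", 5, "routing_filter_change")
```

Under these assumed rules, a run of successful stages cannot on its own remove the extra scrutiny attached to a change such as removing a filter.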
The summary of Xona Partners’ review also said that “the network outage could have been prevented” if Rogers had had an “overload prevention mechanism” to limit the amount of information entering its core network.
The review recommends that all Canadian communications networks implement overload protection mechanisms in their core networks.
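To illustrate the idea of overload protection, the sketch below models a core router that refuses new entries once a cap is reached instead of accepting everything it is sent. It is a simplified toy, assuming the “information” consists of routing entries. The class, the function and every number in it are hypothetical and do not come from the report or from Rogers’ equipment.

```python
from dataclasses import dataclass, field


@dataclass
class CoreRouter:
    """Toy model of a core router with a cap on accepted routes."""

    max_routes: int  # hypothetical limit, not a figure from the report
    routes: set = field(default_factory=set)
    rejected: int = 0

    def receive(self, advertised_routes):
        """Accept routes until the cap is reached, then drop the rest."""
        for route in advertised_routes:
            if route in self.routes:
                continue  # already known, uses no extra table space
            if len(self.routes) >= self.max_routes:
                self.rejected += 1  # overload protection: refuse the entry
                continue
            self.routes.add(route)


def distribution_router(all_routes, route_filter=None):
    """Pass on only what the filter allows; with no filter, pass everything."""
    if route_filter is None:
        return list(all_routes)
    return [r for r in all_routes if route_filter(r)]


# Made-up numbers, chosen only for illustration.
all_routes = [f"route-{i}" for i in range(1000)]
core = CoreRouter(max_routes=150)

# Filter in place: only a subset reaches the core.
core.receive(distribution_router(all_routes, lambda r: r.endswith("0")))
filtered_count = len(core.routes)

# Filter removed: everything is sent, but the cap still holds.
core.receive(distribution_router(all_routes))
```

In this toy model the core ends up holding 150 entries and turns away the other 850 rather than being overwhelmed, which is the behaviour the review says was missing.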
Network Restoration Challenges
After the systems went down, the main problem hampering Rogers’ efforts to restore them was the company’s inability to communicate properly.
When the core network went down, remote employees were unable to access Rogers’ systems or use the internet, and were unable to get online using other service providers, according to the report.
“Rogers had to dispatch staff to remote locations to gain physical access to the affected routers, delaying efforts to restore the network,” the report said.
Since then, all members of Rogers’ incident response and crisis management team have been provided with backup third-party internet access to “maintain communications capability in the event of a power outage.”
The investigation also said Rogers staff did not have access to critical error logs detailing the root cause of the outage until 14 hours after it began, “adversely impacting recovery efforts.”
John Lawford, executive director of Ottawa’s Public Interest Advocacy Centre, has called on Rogers and the CRTC to be more transparent about the outage.
He criticised the CRTC for taking two years to release its report on the outage, calling it a “cover-up that allowed both the CRTC and Rogers to avoid responsibility”.
“The report alleges that Rogers put things right, but I don’t have enough evidence to back that up,” Lawford said. “It’s just the opinion of certain experts.”
Rogers declined a CBC News request for an interview.
A Rogers spokesman said in a statement that the company “remains focused on providing the most reliable network experience so Canadians can stay connected anytime, anywhere.”
The spokesperson pointed to an August 2023 report from analytics firm umlaut, which found that Rogers had the most reliable wireless network in Canada during the study period.
“We have completed a comprehensive review of our network, strengthened the resilience of our network and implemented all of the recommendations in this report, and we will continue to invest to ensure Canadians enjoy the best network in the world,” the spokesperson said.
In a letter to Rogers, the CRTC said the company “confirmed its implementation of all measures recommended by Xona Partners.”
A spokesperson for Minister of Innovation, Science and Industry François-Philippe Champagne told CBC News that Rogers has addressed all of the report’s recommendations and continues to invest in making its network more resilient.