Impact Start Time (UTC): 2026-05-07 11:55:00
Impact End Time (UTC): 2026-05-07 13:51:00
Incident Summary:
On 2026-05-07, some NiCE CXone Mpower customers in the EU region experienced slowness when accessing sites, while others were unable to access the platform at all and received a "504 Gateway Timeout" error within the CXone Mpower Expert knowledge portal. The service degradation was caused by increased traffic volumes combined with performance limitations in certain backend processes. The impact was resolved after engineers scaled up pod resources and restarted the proxy pods, which restored platform stability.
Root Cause:
The service degradation was caused by increased traffic volumes combined with performance limitations in certain backend processes, impacting the EU regional platform.
Under elevated traffic conditions, including automated crawler activity, some requests followed less optimized processing paths, increasing system load. This was further amplified by legacy or complex page content requiring more intensive processing.
While scaling actions helped restore capacity, they also introduced temporary overhead that contributed to intermittent performance degradation. Additionally, although autoscaling functioned as designed, it reached its limits and was insufficient to address constraints related to per-pod Central Processing Unit (CPU) capacity.
Overall, evolving traffic patterns exposed underlying performance limitations, highlighting the need for targeted code optimizations and increased per-service capacity.
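The interaction between autoscaling limits and per-pod CPU capacity described above can be illustrated with a simplified sketch. It assumes a Kubernetes-style horizontal autoscaler that scales the replica count in proportion to the ratio of observed to target CPU utilization, capped at a configured maximum. All function names and numbers below are hypothetical and are not taken from the platform's actual configuration.

```python
import math


def desired_replicas(current_replicas, current_cpu_util, target_cpu_util, max_replicas):
    """Simplified proportional scaling rule, capped at the configured maximum."""
    raw = math.ceil(current_replicas * current_cpu_util / target_cpu_util)
    return min(raw, max_replicas)


def request_wall_time_s(cpu_seconds_per_request, cpu_limit_cores):
    """Best-case wall time of one single-threaded, CPU-bound request on one pod.

    A single-threaded request cannot use more than one core, and a fractional
    CPU limit throttles it proportionally (an illustrative assumption).
    """
    return cpu_seconds_per_request / min(cpu_limit_cores, 1.0)


# Hypothetical values: 10 pods at 180% of requested CPU, target 60%, maximum 20 pods.
# The rule asks for ceil(10 * 180 / 60) = 30 pods but is capped at 20.
capped = desired_replicas(current_replicas=10, current_cpu_util=180,
                          target_cpu_util=60, max_replicas=20)

# A hypothetical heavy page needing 8 CPU-seconds: more replicas do not shorten
# this request, while a larger per-pod CPU limit does.
slow = request_wall_time_s(cpu_seconds_per_request=8, cpu_limit_cores=0.5)
fast = request_wall_time_s(cpu_seconds_per_request=8, cpu_limit_cores=1.0)

print(f"replicas after cap: {capped}")
print(f"wall time at 0.5 core: {slow} s, at 1.0 core: {fast} s")
```

Under these assumptions, adding replicas increases total throughput but leaves the duration of each heavy request unchanged, which is consistent with the finding that autoscaling alone could not address per-pod CPU constraints.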
Corrective Actions:
Detection
Remediation
Prevention
Incident Timeline (UTC):
2026-05-07 11:55 - The first customer case was opened, and Tech Support (TS) engineers began the troubleshooting investigation
2026-05-07 11:56 - TS engineers notified the Network Operations Center (NOC) engineers about the reported customer impact; a major incident was proposed and confirmed
2026-05-07 12:09 - Engineers identified a suspected cause and increased the resources of the web pods to improve system performance
2026-05-07 12:18 - Engineers also scaled up resources for the Application Programming Interface (API) pods to further stabilize performance
2026-05-07 12:28 - Peak of 504 Gateway Timeout errors observed across EU sites
2026-05-07 13:18 - The platform continued to catch up, and engineers were already seeing improvements in system performance
2026-05-07 13:42 - Engineers restarted proxy pods, resulting in continued performance improvements while monitoring system stability
2026-05-07 13:48 - Platform performance returned to normal levels, with continued validation and monitoring underway
2026-05-07 13:51 - The platform stabilized fully. The impact was resolved following resource scaling and the proxy pod restart, and after successful validation, the major incident was marked as resolved
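The elapsed time from impact start to each key milestone in the timeline above can be derived with a short calculation. This is a minimal sketch: the timestamps are copied from the timeline, while the helper name and milestone labels are illustrative only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"


def minutes_between(start, end):
    """Return the whole minutes elapsed between two UTC timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


IMPACT_START = "2026-05-07 11:55"

# Milestone timestamps taken from the incident timeline (UTC).
milestones = {
    "web pods scaled up": "2026-05-07 12:09",
    "API pods scaled up": "2026-05-07 12:18",
    "peak of 504 errors": "2026-05-07 12:28",
    "proxy pods restarted": "2026-05-07 13:42",
    "incident resolved": "2026-05-07 13:51",
}

elapsed = {label: minutes_between(IMPACT_START, ts) for label, ts in milestones.items()}

for label, mins in elapsed.items():
    print(f"{label}: +{mins} min")
```

The final entry gives the total impact duration of 116 minutes (1 hour 56 minutes), matching the impact start and end times stated at the top of this report.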