Sparked Host LLC - Outbound Connection Failures – Incident details

Outbound Connection Failures

Resolved
Degraded performance
Started 21 days ago · Lasted 5 days

Affected

Locations (Miami)

Degraded performance from 8:40 PM to 3:31 PM

Updates
  • Resolved

    We've waited about a full day after stable connectivity was established before marking this incident as resolved.

    The following is a post-mortem (modified for public release) from our DDoS mitigation provider; an illustrative sketch of the failure mode it describes follows at the end of this update:

    An issue emerged, affecting outgoing connections in the 216.173.77.0/24 range. The problem was traced to our protection algorithm, which incorrectly flagged legitimate TCP SYN-ACK traffic as an attack. Initially, we explored several potential causes, including routing issues, before confirming the source of the problem within the protection system.

    To mitigate the impact while investigating further, temporary measures were implemented for affected customers. These included adjustments to allow traffic through specific rules while we worked on a more comprehensive solution. A fix was deployed promptly and extended to additional regions after thorough testing.

    During the resolution process, secondary complications arose due to overlapping reports of unrelated issues, such as threshold settings and connectivity challenges. These required separate investigations but were not linked to the main problem.

    Communication challenges throughout the process slowed down troubleshooting and resolution efforts. However, the root cause was identified and addressed, and the algorithm responsible for the issue is being retired in an upcoming update to prevent similar occurrences.

    We remain committed to improving our processes and communication to ensure more efficient handling of such incidents in the future. We appreciate your understanding and patience as we worked to resolve this issue promptly and effectively.
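
    For illustration only: the sketch below shows how a rate-based heuristic that counts inbound SYN-ACKs can misfire. Every legitimate outbound connection a server opens produces one inbound SYN-ACK in return, so under heavy normal load the same signal that characterizes a SYN-ACK reflection flood appears on its own. All names, fields, and thresholds here are hypothetical; this is not our provider's actual code.

        # Hypothetical sketch of a naive SYN-ACK rate filter.
        # All names and thresholds are made up for illustration.
        from collections import defaultdict
        import time

        SYN_ACK_THRESHOLD = 500     # hypothetical: inbound SYN-ACKs/sec per IP
        WINDOW_SECONDS = 1.0

        class NaiveSynAckFilter:
            """Drops inbound SYN-ACKs once a per-destination rate is exceeded."""

            def __init__(self):
                self.counts = defaultdict(int)   # protected IP -> SYN-ACKs seen
                self.window_start = time.monotonic()

            def should_drop(self, flags: str, dst_ip: str) -> bool:
                now = time.monotonic()
                if now - self.window_start >= WINDOW_SECONDS:
                    self.counts.clear()          # start a new counting window
                    self.window_start = now
                if flags != "SA":                # only SYN-ACK packets count
                    return False
                self.counts[dst_ip] += 1
                # The flaw: a busy server that opens many legitimate outbound
                # connections receives one inbound SYN-ACK per connection, so
                # normal load crosses the same threshold as a reflection flood.
                # Telling them apart requires tracking which SYNs the server
                # itself sent.
                return self.counts[dst_ip] > SYN_ACK_THRESHOLD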

  • Monitoring

    We've applied a blanket fix globally. We are currently waiting on a full explanation of what happened from our upstream provider.

  • Update

    We are continuing to work on the connectivity issues. Our upstream provider has identified a possible cause and is rolling out a fix globally.

    In addition, we noticed that a side effect of this issue is reduced throughput. Our testing against a number of speedtest.net servers confirms this, and it is being resolved as well.
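
    If you'd like to sanity-check throughput yourself, the sketch below times a bulk HTTP download and reports an approximate rate. The URL is just an example endpoint; substitute any large test file hosted near you.

        # Rough throughput check: time a bulk HTTP download, report Mbps.
        import time
        import urllib.request

        # Example endpoint only; substitute any large test file near you.
        TEST_URL = "https://speed.cloudflare.com/__down?bytes=25000000"

        def measure_mbps(url: str) -> float:
            start = time.monotonic()
            total = 0
            with urllib.request.urlopen(url, timeout=30) as resp:
                while chunk := resp.read(64 * 1024):   # stream 64 KiB chunks
                    total += len(chunk)
            elapsed = time.monotonic() - start
            return (total * 8) / (elapsed * 1_000_000)  # bytes -> megabits/sec

        if __name__ == "__main__":
            print(f"~{measure_mbps(TEST_URL):.1f} Mbps")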

  • Update

    We have applied a temporary fix that manually allows outbound connections over ports 32000-65000. This was suggested by our upstream provider and is showing positive results; a quick way to verify it from your own server is sketched at the end of this update.

    If you continue to encounter issues, please let us know and we will investigate further with you individually.
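
    To confirm the workaround from your own server, the sketch below attempts the same outbound connection from a source port inside and outside the allowed range. The target host and ports are examples only.

        # Compare outbound connections from source ports inside and outside
        # the 32000-65000 range. Target host/port are examples only.
        import socket

        def try_connect(src_port: int, host: str = "1.1.1.1", port: int = 443) -> bool:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.settimeout(5)
            try:
                s.bind(("", src_port))      # pin the local source port
                s.connect((host, port))
                return True
            except OSError:
                return False
            finally:
                s.close()

        for sport in (40000, 20000):        # one inside the range, one outside
            result = "ok" if try_connect(sport) else "failed"
            print(f"source port {sport}: {result}")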

  • Identified

    We have identified elevated rates of failed outbound connections. The failures appear to affect HTTP and HTTPS, but other protocols could be affected as well.

    Our network has been ruled out as the cause, and the issue has been escalated to our upstream provider so it can be resolved as quickly as possible.

    We initially suspected DNS resolution failures. After a deeper look, aided by logs provided by customers, we determined that DNS was not the cause of the failures to reach outbound destinations such as Plugin APIs, 1.1.1.1 (Cloudflare DNS), and other sources; a sketch of the kind of check that separates the two failure modes follows this update.

    We will provide more information as it becomes available.
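
    For anyone troubleshooting similar symptoms, a quick way to separate DNS failures from connection failures is to resolve the hostname and attempt the TCP connection as two distinct steps, then retry against a fixed IP. The hostname below is a placeholder.

        # Separate DNS resolution failures from TCP connection failures.
        import socket

        def diagnose(host: str = "api.example.com", port: int = 443) -> None:
            """host is a placeholder; substitute the endpoint that is failing."""
            try:
                addr = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4][0]
                print(f"DNS ok: {host} -> {addr}")
            except socket.gaierror as exc:
                print(f"DNS failure: {exc}")      # resolution itself is broken
                return
            for target in (addr, "1.1.1.1"):      # resolved IP, then a fixed IP
                try:
                    with socket.create_connection((target, port), timeout=5):
                        print(f"TCP connect ok: {target}:{port}")
                except OSError as exc:
                    print(f"TCP connect failed: {target}:{port} ({exc})")

        if __name__ == "__main__":
            diagnose()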