Hello. My name is Gabriel, I’m a network analyst here at Made4IT, and today I’m going to tell you about an interesting situation involving a CDN and traffic engineering that our consulting team ran into a few days ago.


We received a call from a customer whose local CDN traffic had dropped drastically after a traffic engineering change. Even after extensive validation, the customer was unable to find the root cause of this behavior, so they called our team.
The issue started on 11/04 with a strange decrease in inbound traffic on the interface facing the local CDN. The drop was clear-cut and represented 1 Gbps less traffic on the interface.

With this information in hand, the first thing I did was open Made4Flow to help understand what had happened, using, of course, my favorite graph: “Interface by ASN”.

I was careful to pick a time interval that covered the event exactly, from the 3rd to the 4th, to understand which ASNs were no longer “served” by the local cache.

With that in hand, I noticed 2 things right away:

– The “Orange” ASN is no longer “served” by this CDN, as the drop in its traffic in the graph makes evident.
– The “Blue” ASN(s) (a set of specific ASNs grouped through a custom configuration in our software) also showed a significant decrease in the graph. The “Blue” ASN is the one that hosts the cache.


Because the “Orange” ASN is also a Made4IT client, we were able to run more efficient tests from inside its network. The “Blue” and “Orange” ASNs are partners, which gave us total freedom to troubleshoot and validate whatever was necessary on both sides.

After some checks, we used the content provider’s own tool to validate which “node/cache” was “serving” the traffic that had stopped coming from the local CDN of the “Blue” ASN.

This traffic no longer comes from the “Blue” ASN’s CDN, but it has to come from somewhere, right?

In the image above, 2 things caught my attention:

– A node in Guarulhos, belonging to this content provider’s network, is the one that started to deliver the traffic that used to come from the local CDN of the “Blue” ASN.
– A /25 prefix: x.x.x.x.0/25?

At that point, a very important piece of information about how this content provider’s CDN nodes treat BGP announcements came to mind. They follow a rule that says:

– Direct announcements (from the ASN hosting the cache): the node accepts prefixes as specific as /27 for direct influence on traffic engineering and content delivery eligibility.
– Indirect announcements (from ASNs adjacent to the ASN hosting the cache): the node only accepts prefixes as specific as /24 for direct influence on traffic engineering and content delivery eligibility.
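The rule above can be sketched as a small check. This is only an illustration under assumptions: the function name `is_eligible` and the constants are hypothetical, the limits are simply the /27 and /24 values quoted above (IPv4 only), and the example prefixes are documentation ranges, not the customer’s real blocks.

```python
import ipaddress

# Assumed limits, taken from the rule described above (IPv4 only).
MAX_PREFIXLEN_DIRECT = 27    # announced by the ASN hosting the cache
MAX_PREFIXLEN_INDIRECT = 24  # announced by ASNs adjacent to the hosting ASN


def is_eligible(prefix: str, from_host_asn: bool) -> bool:
    """Return True if the prefix length respects the limit for its origin."""
    network = ipaddress.ip_network(prefix, strict=True)
    limit = MAX_PREFIXLEN_DIRECT if from_host_asn else MAX_PREFIXLEN_INDIRECT
    return network.prefixlen <= limit


# Documentation prefixes (RFC 5737), used purely as placeholders.
print(is_eligible("198.51.100.0/25", from_host_asn=True))   # True
print(is_eligible("198.51.100.0/25", from_host_asn=False))  # False
print(is_eligible("198.51.100.0/24", from_host_asn=False))  # True
```

Under these assumptions, a /25 is fine when it comes from the ASN hosting the cache, but the same /25 from an adjacent ASN falls outside the accepted range.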

Alright, for the “Orange” ASN traffic I already had the possible cause of the problem. While trying to induce the local node to deliver more traffic, they ended up announcing /25 blocks to the cache node. Since a /25 is more specific than the /24 limit for adjacent ASNs, the content provider’s network started to “serve” this traffic via its nodes in Guarulhos instead.

However, we still had a decrease in traffic on the “Blue” ASN itself. What happened there?

The “Blue” ASN in this case was actually 3 ASNs, grouped through a custom configuration in the Made4Flow software. Interestingly, 2 of these ASNs were “injecting” /25 prefixes into the CDN node, and the behavior was exactly the same as the one described above for the “Orange” ASN. With all this information in hand, we got our hands dirty and adjusted the announcements to respect what the content provider requests (and has documented) for its CDN. After the necessary adjustments, see the result:


Conclusion:

We know that several factors must come together for a CDN node to work, and that it involves protocols working in an orchestrated way, such as BGP, DNS and TCP, among others. In this case, the cause was prefixes announced to the CDN outside the acceptable range.

Once /25 blocks from ASNs adjacent to the ASN that hosts the node/cache were announced, the “player” started delivering traffic to those ASNs/prefixes via its CDN nodes in Guarulhos. The result was an increase in indirect traffic on our client, the “Blue” ASN, and a loss of performance and efficiency on the local node.

Users behind this network were directly affected, as their content started to be delivered by an infrastructure over 1000 km away, which increased latency and loading time and degraded the overall user experience.

After adjustments in traffic engineering respecting the content provider’s documentation and good BGP practices, the node started to behave again as expected.
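As an illustration of that kind of adjustment, here is a minimal sketch of one possible approach: replacing each over-specific prefix with its covering /24. This is an assumption for the sake of the example, not a description of the exact changes made for this customer. The helper name `compliant_announcements` is hypothetical, the prefixes are documentation ranges, and the sketch handles IPv4 only.

```python
import ipaddress

# Assumed limit for ASNs adjacent to the one hosting the cache (IPv4 only).
MAX_PREFIXLEN_INDIRECT = 24


def compliant_announcements(prefixes, max_prefixlen=MAX_PREFIXLEN_INDIRECT):
    """Widen over-specific prefixes to the allowed length and drop duplicates."""
    widened = set()
    for prefix in prefixes:
        network = ipaddress.ip_network(prefix)
        if network.prefixlen > max_prefixlen:
            # Replace the over-specific block with its covering prefix.
            network = network.supernet(new_prefix=max_prefixlen)
        widened.add(network)
    return [str(network) for network in sorted(widened)]


# Placeholder /25 blocks (RFC 5737 documentation ranges).
announced = ["203.0.113.0/25", "203.0.113.128/25", "198.51.100.0/25"]
print(compliant_announcements(announced))
# ['198.51.100.0/24', '203.0.113.0/24']
```

A covering prefix should only be announced when the ASN legitimately originates the whole block, so a result like this is a starting point for review rather than something to push to the routers as-is.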

For traffic engineering involving CDNs, always respect the documentation of the player in question and use its tools to detect which “node” is “serving” the traffic. CDN players usually also have a portal for monitoring the performance of the cache, with metrics and very important data about it. Use these tools to your advantage.

– Gabriel Henrique, Network Analyst at Made4IT, has been working with technology and ISPs for over 10 years.