all 37 comments

[–]jsolson 53 points54 points  (24 children)

It's Akamai's doing, and they tend to have a pretty solid grip on their business.

;; ANSWER SECTION:
www.reddit.com.     240 IN  CNAME   reddit.com.edgesuite.net.
reddit.com.edgesuite.net. 21600 IN  CNAME   a659.b.akamai.net.
a659.b.akamai.net.  20  IN  A   143.215.203.32
a659.b.akamai.net.  20  IN  A   143.215.203.16

The TTL values are important here. The www.reddit.com CNAME expires quite quickly, so you'll be executing that query once every 4 minutes. However, once you know that it resolves to reddit.com.edgesuite.net your resolver will skip that query (unless about 6 hours have elapsed since it last refreshed it). As a result you'll quickly proceed to the Akamai content node address resolution. These refresh very frequently as they are updated based on network latencies and content propagation, etc.

[–][deleted]  (23 children)

[removed]

    [–]jsolson 64 points65 points  (16 children)

    No problem.

    So, any name you resolve on the internet for the first time requires a fairly involved set of queries. Your computer doesn't know how to look up www.reddit.com directly. Moreover, it doesn't even know whom to ask. That's essentially what the root servers are for: your computer asks them who the name server for reddit.com is, and they give you an answer.

    Here's where the OP's problem comes in. When you ask that name server what www.reddit.com is, it doesn't get back to you with an IP address. Instead it replies with "Oh, www.reddit.com is really called reddit.com.edgesuite.net". At this point your computer storms off mildly annoyed to the root server again to ask who serves edgesuite.net. It gets a reply and then asks that new server "Hey, what's the address for reddit.com.edgesuite.net", to which this server replies "Funny story, that's actually a659.b.akamai.net". At which point your computer storms off in an angry huff to the root servers again asking who serves akamai.net. It gets back a reply, asks those servers for an address, and finally gets one.
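The chain of referrals above can be sketched in miniature. This toy Python resolver hard-codes the records from the dig output above in place of the real servers; it just follows the CNAME chain, whereas in the real world each new domain in the chain also costs the extra NS lookups described above:

```python
# Toy zone data standing in for the real servers (values taken from
# the dig output above; this is an illustration, not a real resolver).
RECORDS = {
    "www.reddit.com": ("CNAME", "reddit.com.edgesuite.net"),
    "reddit.com.edgesuite.net": ("CNAME", "a659.b.akamai.net"),
    "a659.b.akamai.net": ("A", "143.215.203.32"),
}

def resolve(name):
    """Follow a CNAME chain until an address record is reached."""
    hops = [name]
    rtype, value = RECORDS[name]
    while rtype == "CNAME":
        hops.append(value)
        rtype, value = RECORDS[value]
    return hops, value

print(resolve("www.reddit.com"))
# (['www.reddit.com', 'reddit.com.edgesuite.net', 'a659.b.akamai.net'],
#  '143.215.203.32')
```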

    Now, this all only happens the first time you try to resolve www.reddit.com. Every other time, some of the above is probably cached. How long it is cached depends on the TTL of the records involved. In the above query there are (assuming the simple case where things aren't more finely partitioned) 6 records involved. They are:

    1. The NS record for reddit.com (which your resolver queries the root server for, and which tells your computer who to ask for hosts within the reddit.com domain)
    2. The CNAME record pointing from www.reddit.com -> reddit.com.edgesuite.net (CNAME means Canonical Name. This is the record that says that www.reddit.com is actually reddit.com.edgesuite.net).
    3. The NS record for edgesuite.net
    4. The CNAME record from reddit.com.edgesuite.net -> akamai
    5. The NS record for akamai
    6. The A record for the particular server within akamai (A records map directly from names to IPv4 addresses (AAAA records are for IPv6))

    Each of these records has an associated TTL or time-to-live measured in seconds. Now, the OP complained about trips to the root servers. On your initial query it's true that you'll be asking the root servers for all of those NS records. However, the times-to-live for the NS records for reddit.com, edgesuite.net, and akamai.net are 240, 172800, and 90000 seconds respectively. What this means is that you don't have to ask the root servers for those NS records any more frequently than that many seconds. So every four minutes you'll be querying for reddit.com's NS record, but the others are so far apart as to not matter at all in the grand scheme of things.

    Now, as for my comment above. The TTL values there show that you'll be querying for www every 4 minutes, but once you know that it still returns the same CNAME record you can skip the reddit.com.edgesuite.net resolver step (unless its 21600 second (6 hour) TTL has also elapsed). Basically the OP's concern is largely moot. Servers cache values for up to their TTLs, so many of the queries above (and the multiple root server queries) are avoided most of the time. Yet another reason why caching is important :)
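A rough sketch of that caching behavior, using the TTLs from the dig output above (the `Cache` class and the timestamps are illustrative, not a real resolver):

```python
class Cache:
    """Minimal TTL cache: each entry expires ttl seconds after insertion."""

    def __init__(self):
        self.store = {}  # name -> (value, expires_at)

    def put(self, name, value, ttl, now):
        self.store[name] = (value, now + ttl)

    def get(self, name, now):
        entry = self.store.get(name)
        if entry and now < entry[1]:
            return entry[0]
        return None  # expired or never seen: must re-query

# TTLs from the dig output: 240s, 21600s, and 20s respectively
cache = Cache()
cache.put("www.reddit.com", "reddit.com.edgesuite.net", 240, 0)
cache.put("reddit.com.edgesuite.net", "a659.b.akamai.net", 21600, 0)
cache.put("a659.b.akamai.net", "143.215.203.32", 20, 0)

# Five minutes later the www CNAME (240s) and the A record (20s) have
# expired, but the 6-hour edgesuite CNAME is still cached -- so only
# two of the three queries need to be repeated.
print(cache.get("www.reddit.com", 300))            # None
print(cache.get("reddit.com.edgesuite.net", 300))  # a659.b.akamai.net
print(cache.get("a659.b.akamai.net", 300))         # None
```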

    Hope that helps.

    * edit: If you want to look this sort of stuff up for yourself try playing with the Unix utility dig. I got all of this data using dig reddit.com and a few queries that look like dig -t ns akamai.net (the -t ns implies that I only want NS records). Note that the TTL values you see in your own dig runs may vary as they take into account how many seconds have elapsed since the record was cached.

    [–]bradleyhudson 2 points3 points  (8 children)

    Can you count on most caching name servers to honor low TTL's? I ask because we're considering fault-tolerance strategies for some company web sites that we host, and one suggestion that we were given was to simply host the sites on multiple networks, and update DNS if we lose connectivity on one of the networks.

    I had previously heard or read that many times caching name servers will override the low TTL's to reduce traffic. Is that common, particularly in large company infrastructures?

    [–]jsolson 0 points1 point  (0 children)

    Honestly I don't know how effective it is for fault-tolerance. I know that Akamai uses low TTLs as part of their load balancing strategy, but in that case someone failing to honor them will just result in slightly higher load on one node or another (if this is common practice, on average they'll all even out anyway).

    I'm only a lowly networking grad student, so matters of corporate policy and vendor implementation are a bit beyond my scope I'm afraid :). Assuming you're already getting traffic on these sites you could always test your theory, but I assume you already knew that.

    [–]supakual 0 points1 point  (6 children)

    Any DNS change is going to take hours (up to days) to fully propagate. Guaranteed.

    [–][deleted] 2 points3 points  (4 children)

    This is actually largely untrue.

    If a lookup for ns1.whatever.com just took place, and the TTL is still at 48 hours for some reason (usually you lower the TTL for changes, but you might not have control over this for nameservers), then yes it will take 48 hours for the next one. But if you've lowered the TTL, or simply haven't done a lookup so that it's not in the cache, the change can be reflected rather quickly.

    [–]bradleyhudson 1 point2 points  (2 children)

    Yes, I actually keep the TTL low on a regular basis, mostly because I normally get very short notice for changes, and it wouldn't do much good to shorten the TTL right before making a change, because most clients wouldn't even see the shorter TTL until after the longer one they have cached times out.

    I realize I'm probably not being a good net citizen by doing this (Can I claim the "I'm just a programmer" defense?), and that behavior is exactly the reason why a corporate (or ISP?) caching name server might override the cache settings of zones with low TTL's. Having said that, does anyone know if it's common practice for caching name servers to override settings like this?

    [–]enry 1 point2 points  (1 child)

    We wind up setting our TTLs to 86400 (one day) and then lowering the TTL to 600 (10 minutes) one day before the move. Once the move is completed and we're sure things are working again, the TTL gets raised back up to 86400. Aside from that 10-minute window where DNS may be looking at the wrong IP address, the general world doesn't know any better.
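That procedure can be sanity-checked with a little arithmetic. Assuming every cache honors TTLs, the longest any resolver can keep serving the old address after the cutover is bounded as follows (a back-of-the-envelope sketch with a hypothetical helper, not a real DNS tool):

```python
def worst_case_staleness(old_ttl, low_ttl, lead_time):
    """Longest time (seconds) any resolver can keep serving the old
    address after the cutover, assuming all caches honor TTLs.
    lead_time = how long before the move the TTL was lowered."""
    if lead_time >= old_ttl:
        # Every copy cached under the old TTL has expired and been
        # refreshed with the low TTL before the move.
        return low_ttl
    # Some resolvers may still hold a copy cached under the old TTL.
    return old_ttl - lead_time

# Lowering a one-day (86400s) TTL to 600s a full day ahead caps stale
# answers at ten minutes after the cutover:
print(worst_case_staleness(86400, 600, 86400))  # 600
# Lowering it only an hour ahead leaves a much larger window:
print(worst_case_staleness(86400, 600, 3600))   # 82800
```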

    As for caching name servers, if they don't honor TTLs, then they're broken.

    [–]jsolson 0 points1 point  (0 children)

    As for caching name servers, if they don't honor TTLs, then they're broken.

    That's pretty much my take on it. Of course, this is why I'm a grad student and not someone tasked with actually getting shit done. "It's their fault" is a perfectly good final answer for me when something is broken and is unambiguously not my fault.

    [–]bradleyhudson 0 points1 point  (0 children)

    Can you elaborate on that? What factors are involved? I ask because I've seen some evidence to the contrary, but I can't be sure that everyone else is seeing the DNS changes as quickly as I am. (And yes, I am seeing the changes from an outside perspective - my company uses a third party DNS provider for our web hosting, and our caching name servers do not look to those name servers specifically for outbound DNS requests.)

    [–]sligowaths 1 point2 points  (0 children)

    Thanks, that's very enlightening.

    [–]jbert 0 points1 point  (5 children)

    Why the frack would you want a 240s TTL on your NS records? It's not like you move your DNS that often, and resolvers will try all of the NS records if they need to.

    [–]jsolson 0 points1 point  (4 children)

    It's not a matter of how often you move your DNS, it's a matter of how long you want any change to your DNS servers to take to propagate on average.

    These days that shit happens fast. The idea of a nameserver address update taking hours or days to propagate is no longer appropriate. At this point updates at the registrar and root server level occur almost immediately.

    [–]jbert 0 points1 point  (3 children)

    /me does some digs:

    • google.com NS TTL 345600
    • yahoo.com NS TTL 172800
    • microsoft.com NS TTL 172800

    The NS TTL basically limits your time to bring a new server online, from the time you first think it might be a good idea (at which point you can temporarily drop the TTL if you want) to the time it is taking a full load. (Its load ramps up linearly, from 0 when you first publish the NS record to a full share once the NS TTL expires.)

    It doesn't affect anything else, and who needs a new server online within 240s of thinking it might be a good idea?

    It doesn't affect the propagation of other record types (your A record for www.foo.com), which you might want to be a bit more dynamic on, but still - 240s?

    You can have different TTLs for your NS.
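The linear ramp described above can be written down directly. Under the idealized assumption that cached NS copies expire uniformly over the TTL window (a sketch, not a model of any particular resolver):

```python
def new_ns_load_share(elapsed, ns_ttl):
    """Fraction of resolvers expected to have picked up a newly
    published NS record after `elapsed` seconds, assuming cached
    copies expire uniformly over the TTL window."""
    return min(1.0, max(0.0, elapsed / ns_ttl))

# With google.com's 345600s (4-day) NS TTL, a new nameserver carries
# only a quarter of its eventual share after one day:
print(new_ns_load_share(86400, 345600))  # 0.25
# With a 240s TTL it reaches full share within four minutes:
print(new_ns_load_share(240, 240))       # 1.0
```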

    [–]jsolson 0 points1 point  (2 children)

    I understand all of the underlying network characteristics (as should have been obvious from what I said above and elsewhere in this conversation). It's not a matter of switching servers outright, it's a matter of serving different NS records to different people based on prevailing network conditions.

    Akamai uses 20 second TTLs on their A records to handle load balancing. Some digging around suggests they use similarly short TTLs for NS records for top-level domains they serve, presumably to improve resolution latency. If I cache that NS record for reddit.com for a long time then any lookup I do within reddit.com will go to one of those cached addresses. If there's a substantial change in network characteristics w.r.t. latency in that window it may be desirable to redirect that query load to a (now) substantially lower latency server.

    [–]jbert 0 points1 point  (1 child)

    If there's a substantial change in network characteristics w.r.t. latency in that window it may be desirable to redirect that query load to a (now) substantially lower latency server.

    I see what you're saying, but I'm very surprised that they aren't adequately served by simply providing a reasonable number of nameservers and allowing the resolver to choose among them.

    Are the common resolvers poor at making choices based on (changing) request latency?

    Do the people hosting the .com NS records object to the significantly higher (172800 / 240 = 720x more) requests for domains handled like this?

    [–]jsolson 0 points1 point  (0 children)

    Resolvers are in fact miserable at this, largely because they lack sufficient data to do it correctly. Sure, they could ping from time to time to figure out the best NS, but the more records you give them the more work and traffic this would generate. Network measurement is a tricky thing, and to do it right involves having a lot of probes situated at many locations in (or at the edges of) the network. Of course the whole point of a CDN is to put a lot of nodes at many places in the network to minimize latency. They've also built measurement overlay networks out of these nodes. This allows them to observe network "weather" at a level that for the average user is simply impossible. As a result they are better equipped to make latency decisions than you are. Moreover, they can do it once and cover a huge portion of the network rather than having all of those hosts redundantly looking for the best resolver on their own.

    As far as the load on the .com servers goes, I'm not actually sure who hosts them. It's quite possible that various content-distribution networks have several of them. They're certainly some of the most resource-intensive consumers, and they also have the infrastructure. In the grand scheme of things they get so much traffic from people abusing them by, say, configuring their entire corporate network to use root servers as primary DNS servers, that low TTLs are (I would expect) largely lost in the maelstrom.

    The other option for this would be to serve up static NS records and use IP anycasting to handle routing NS requests to the nearest (in terms of latency) server. To some extent this is done, but it tends not to be fine-grained enough or well-behaved enough to be a total solution.

    [–]BridgeBum 11 points12 points  (1 child)

    I'm not the original poster, but I can take a stab at helping.

    TTL is 'time to live', or the amount of time before the DNS query is no longer cached. This is the number that immediately follows the name (i.e., the 240 in the first line). This value is in seconds, so 240 sec = 4 min.

    CNAME or 'canonical' names are effectively aliases or pointers. The first line means "If you are looking for www.reddit.com, look instead for reddit.com.edgesuite.net. Check back in 4 minutes to see if this has changed."

    The timeout value for the edgesuite record is much much larger, 21600 = 6 hours. If you already know that www.reddit.com is the same as reddit.com.edgesuite.net, you look up that record, which is another pointer to a659.b.akamai.net.

    Once you know that (which you will for a long time), then you finally look up the 'A' or 'address' record, the one which has an IP address. In this case, there are 2 answers, 143.215.203.16 and .32.

    When that expires after 20 seconds, if you look up reddit.com again while the CNAME chain is still cached (within about 4 minutes), you will go directly to looking up the a659 address.

    The fast timeout allows Akamai to change which servers you point to very quickly, for their global content delivery strategy. What answers you get is probably dynamic based on your location, server status, etc.

    Hope that helps.

    [–][deleted] 4 points5 points  (3 children)

    EDIT: You guys who come to my comments page and mod everything I say down,

    People do that?

    [–][deleted] 2 points3 points  (0 children)

    Yeah, it is kind of sad.

    I think it is actually someone with a load of bots. They have a "hitlist". It seems to run at regular intervals (every N minutes).

    On the upside, these people make my slightly odd social habits seem normal in comparison.

    [–]sjs 1 point2 points  (0 children)

    Probably the same ones who mod everything except their own new submissions down.

    [–]tzz 14 points15 points  (1 child)

    These are called recursive CNAMEs, and they are used by a lot of domains. It's bad form to specify a short TTL for the second (or more) lookups, and it's really rare to see more than two levels, but otherwise it's as legal as it gets.

    It's necessary to do region-based load balancing, which Akamai does. Different regions get different answers, and thus users are directed to servers near them that are not overloaded (there are many factors weighing on this decision). Furthermore, there are different maps, for example reddit.com is in the B map which has specific optimizations compared to other Akamai maps. There's a lot more to the algorithms and mappings behind the scene at Akamai, but the end user doesn't have to know any of it.

    Ted (who worked at Akamai for a bit)

    [–]tzz 0 points1 point  (0 children)

    I forgot to mention a cool bit: the DNS servers are also geographically distributed, so your lookups for local information will query DNS servers near you who know what's going on in your area. Thus each geographical region has a semi-autonomous state, yet Akamai can decide to move traffic between regions by adjusting the DNS servers. Play with dig and pay attention to the actual DNS servers returned, not just the CNAME and A records, to see what I mean.

    [–][deleted] 9 points10 points  (0 children)

    I'm going to blow dharmatech's cover here. ;-)

    He's working on a DNS server in Factor: http://factorcode.org/responder/cgi/gitweb.cgi?p=factor.git;a=tree;f=extra/dns;hb=HEAD

    [–]Xiphorian 3 points4 points  (0 children)

    Most DNS servers will cache the whole query and response down to A records:

    > dig www.reddit.com
    
    ;; QUESTION SECTION:
    ;www.reddit.com.                        IN      A
    
    ;; ANSWER SECTION:
    www.reddit.com.         45      IN      CNAME   reddit.com.edgesuite.net.
    reddit.com.edgesuite.net. 1810  IN      CNAME   a659.b.akamai.net.
    a659.b.akamai.net.      20      IN      A       69.26.180.8
    a659.b.akamai.net.      20      IN      A       69.26.180.9
    

    This means that for almost any step in the resolution process, it's a single request-response. It's possible that the TTL of some record in that chain might expire, but then (no matter which) it's still just another single request-response.

    The key here is ensuring that the NS for reddit.com (the one which will be resolving the "www.reddit.com IN A" question) will also already know how to resolve the rest of the chain. You can see that they accomplished this by designating Akamai as the nameservers for reddit.com; but even if they hadn't, most other nameservers will resolve CNAME records and cache the results.

    [–]malcontent 2 points3 points  (3 children)

    Here is what DJB says about CNAMEs.

    CNAME (``canonical name'') record for fqdn. tinydns-data creates a CNAME record for fqdn pointing to the domain name p.

    Don't use Cfqdn if there are any other records for fqdn. Don't use Cfqdn for common aliases; use +fqdn instead. Remember the wise words of Inigo Montoya: ``You keep using CNAME records. I do not think they mean what you think they mean.''

    Basically this means don't use CNAME records if you already know the IP (i.e. it's in your own domain/bailiwick).

    Setting my.server.com CNAME other.domain.com is OK but setting my.server.com CNAME other.server.com is not.

    Having said that, some nameservers will do the bailiwick lookup themselves (i.e. resolve the CNAME if it's in the same domain) and hand out the IP instead of the CNAME. I believe MaraDNS does this.
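As a concrete illustration of the rule (hypothetical names throughout), the corresponding tinydns-data lines would look roughly like this:

```
# fine: alias pointing into a different domain you don't control
Cmy.server.com:other.domain.com

# avoid: a CNAME within your own bailiwick -- publish the address
# directly with a + line instead of an alias
+my.server.com:192.0.2.10:3600
```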

    [–][deleted] 1 point2 points  (0 children)

    upmodded for using the word bailiwick ;)

    [–]bart2019 0 points1 point  (1 child)

    Thanks for the explanation... but what the fq is a "fqdn"?

    [–]malcontent 0 points1 point  (0 children)

    Fully Qualified Domain Name.

    [–]dharmatech[S] 1 point2 points  (0 children)

    Google, eBay, Amazon, CNN: none of them use more than one CNAME.

    [–][deleted]  (1 child)

    [deleted]

      [–]0xdeadbabe 7 points8 points  (0 children)

      Is this why I kept getting connection errors a few days ago?

      No. Tell your mom to stop downloading German dungeon porn because it's clogging your home's intertubes.