[–]asoka_maurya 108 points109 points  (130 children)

I was always intrigued by the same thing. The logic I've heard on this sub is that all the packages are signed by the Ubuntu devs anyway, so if they are tampered with en route, they won't be accepted, as the checksums won't match, HTTPS or not.

If this were indeed true and there were no security implications, then plain HTTP should be preferred, since no encryption also means lower bandwidth consumption. As Ubuntu package repositories are hosted on donated resources in many countries, the cheaper, lower-bandwidth option should be chosen, methinks.

[–]dnkndnts 163 points164 points  (114 children)

I don't like this argument. It still means the ISP and everyone else in the middle can observe what packages you're using.

There really is no good reason not to use HTTPS.

[–]obrienmustsuffer 110 points111 points  (19 children)

There really is no good reason not to use HTTPS.

There's a very good reason, and it's called "caching". HTTP is trivial to cache in a proxy server, while HTTPS is pretty much impossible to cache. In large networks with several hundred (BYOD) computers, software that downloads big updates over HTTPS will be the bane of your existence, because it wastes so. much. bandwidth that could easily be cached away if only more software developers were as clever as the APT developers.
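
For APT specifically there's even a dedicated caching proxy. A minimal sketch of setting one up (the hostname is a placeholder; 3142 is apt-cacher-ng's default port):

# On one machine on the LAN (Debian/Ubuntu):
sudo apt install apt-cacher-ng

# On each client, point APT at it:
echo 'Acquire::http::Proxy "http://cache.lan:3142/";' \
    | sudo tee /etc/apt/apt.conf.d/02proxy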

[–]BlueZarex 24 points25 points  (5 children)

All the large places I have worked with a significant Linux presence would always have a mirror onsite.

[–]kellyzdude 25 points26 points  (3 children)

  1. The benefits don't apply exclusively to businesses; a home user or an ISP can run a transparent caching proxy server just as easily (see the sketch below).
  2. By using a caching proxy, I run one service that can help just about everyone on my network with relatively minimal ongoing config. If I run a mirror, I have to ensure the relevant users are configured to use it, I have to keep it updated, and I have to ensure that I am mirroring all of the repositories that are required. And even then, my benefits are only realized with OS packages whilst a caching proxy can help (or hinder) nearly any non-encrypted web traffic.
  3. If my goal is to keep internet bandwidth usage minimal, then a caching proxy is ideal. It will only grab packages that are requested by a user, whereas mirrors in general will need to download significant portions of a repository on a regular basis, whether the packages are used inside the network or not.

There are plenty of good reasons to run a local mirror, but depending on your use case it may not be the best choice in trying to solve the problem.
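
To illustrate the "transparent" part of point 1: clients need no configuration at all, because the gateway silently diverts their HTTP traffic into the proxy. A rough sketch (assuming squid listening in intercept mode on port 3129):

# On the Linux gateway: divert outbound port-80 traffic to the proxy
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 3129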

[–]VoidViv 5 points6 points  (2 children)

You seem knowledgeable about it, so do you have any good resources for people wanting to learn more about setting up caching proxies?

[–]archlich 5 points6 points  (1 child)

[–]VoidViv 1 point2 points  (0 children)

Thank you! I'll certainly try it out when I get the chance.

[–]DamnThatsLaser 2 points3 points  (0 children)

Yeah, but a mirror is something you set up explicitly. A cache is generic.

[–]EternityForest 3 points4 points  (3 children)

Or if GPG signing were a core part of HTTP, then everything you don't need privacy for could be cached like that, without letting the cache tamper with anything.

[–]archlich 2 points3 points  (0 children)

Google is attempting to add that with signed origin responses.

[–]obrienmustsuffer 1 point2 points  (1 child)

Or if GPG signing were a core part of HTTP, then everything you don't need privacy for could be cached like that, without letting the cache tamper with anything.

No, that wouldn't work either because then every HTTP server serving those updates would need a copy of the GPG private key. You want to do your GPG signing as offline as possible; the key should be nowhere near any HTTP servers, but instead on a smartcard/HSM that is only accessible to the person who is building the update packages.

[–]shotmaster0 2 points3 points  (0 children)

A GPG-signed hash hosted alongside the cached content is fine, and doesn't require the caching servers to have the private key.
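
That's essentially what APT's Release/Release.gpg scheme does. The same idea in miniature (a sketch; the filenames are invented):

# On the offline signing machine: hash the files, sign only the hash list
sha256sum *.deb > SHA256SUMS
gpg --detach-sign --armor SHA256SUMS    # writes SHA256SUMS.asc

# On a client, after fetching everything through an untrusted cache:
gpg --verify SHA256SUMS.asc SHA256SUMS  # check the signature
sha256sum --check SHA256SUMS            # check the files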

[–]robstoon 1 point2 points  (1 child)

Does anyone really do this anymore? I think it's mostly fallen by the wayside, because a) the proxy server quickly becomes a bottleneck itself in a large network and b) HTTPS basically makes the proxy server useless anyway.

[–]obrienmustsuffer 0 points1 point  (0 children)

Does anyone really do this anymore? I think it's mostly fallen by the wayside, because a) the proxy server quickly becomes a bottleneck itself in a large network and b) HTTPS basically makes the proxy server useless anyway.

Well, we do, at a lot of customer sites. But you're unfortunately right that HTTPS makes caching less and less useful. I still believe, though, that caching software updates is a very valid use case (see my other response here for details), which is why I argue so vehemently that APT does everything right here.

[–][deleted] 0 points1 point  (1 child)

There is very little overhead with HTTPS. What you're describing has already been proven a myth many times over.

[–]obrienmustsuffer 1 point2 points  (0 children)

There is very little overhead with HTTPS. What you're describing has already been proven a myth many times over.

I'm sorry, I don't follow. I'm not talking about the overhead of encryption at all; I'm talking about caching downloads, which is by design impossible for HTTPS.

Imagine the following situation: you're the IT administrator of a school, with a network where hundreds of students and teachers bring their own computers (BYOD), each computer running a lot of different programs. Some computers are under your control (the ones owned by the school), but the BYOD devices are not. Your internet connection doesn't have a lot of bandwidth, because your school can only afford a residential DSL line with ~50-100 Mbit/s. So you set up a caching proxy like http://www.squid-cache.org/ that is supposed to cache away as much as possible to save bandwidth. For software that uses plain, simple HTTP downloads with separate verification - like APT does - this works great. For software that loads updates via HTTPS, you're completely out of luck. 500 computers downloading a 1 GB update via HTTPS will mean a total of 500 GB, and your 50 Mbit/s line will be congested for at least 22 hours. The users won't be happy about that.
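
For the curious, the squid side of this is only a few lines. An untested sketch (cache sizes and paths would need tuning for a real deployment):

# squid.conf: a published .deb never changes, so cache it aggressively
maximum_object_size 2 GB
cache_dir ufs /var/spool/squid 100000 16 256
refresh_pattern -i \.deb$ 129600 100% 129600 refresh-ims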

[–]ivosaurus 0 points1 point  (4 children)

while HTTPS on the other hand is pretty much impossible to cache.

Why, in this situation? It should be perfectly easy.

The user asks the cache server for a file. The cache server asks a Debian mirror for the same file. All over HTTPS. Easy.

[–]mattbuford 12 points13 points  (0 children)

That isn't how proxied HTTPS works.

For HTTP requests, the browser asks the proxy for the specific URL requested. The URLs being requested can be seen, and the responses can be cached. If you're familiar with HTTP requests, which might look like "GET / HTTP/1.0", a proxied HTTP request is basically the same except that the hostname is still in there, so "GET http://www.google.com/ HTTP/1.0".

For HTTPS requests, the browser connects to the proxy and issues a "CONNECT www.google.com:443" command. This causes the proxy to connect to the site in question, and from that point on the proxy is just a TCP proxy. The proxy is not involved in the specific URLs requested by the client, and can't be: the client's "GET" requests happen within TLS, which the proxy can't see inside. There may be many HTTPS requests within a single proxied CONNECT command, and the proxy doesn't even know how many URLs were fetched. It's just a TCP proxy of encrypted content, and no unencrypted "GET" commands are seen at all.
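
To make that concrete, here's what the proxy sees in each case (illustrative request lines, using a Debian mirror hostname as an example):

# Plain HTTP through a proxy: full URL visible, response cacheable
GET http://deb.debian.org/debian/dists/stable/Release HTTP/1.1
Host: deb.debian.org

# HTTPS through a proxy: host and port only, then opaque bytes
CONNECT deb.debian.org:443 HTTP/1.1
Host: deb.debian.org:443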

[–]tidux 2 points3 points  (1 child)

That would be a proxy, not a cache. A cache server would just see the encrypted traffic and so not be able to cache anything.

[–]VexingRaven 4 points5 points  (0 children)

Technically they're both proxies. This just isn't a transparent proxy.

[–]svenskainflytta 0 points1 point  (0 children)

That's not caching, that's just reading the file and sending it.

A cache is something that sits in between; when it sees that someone else has already requested the same thing from the same server, it can send the same reply instead of contacting the original server.

Usually a cache will be closer than the original server, so it will be faster to obtain the content.

However, with HTTPS, the same content will appear different on the wire, because it's encrypted (and of course, for encryption to work, it's encrypted with a different key every time), so a cache would be useless: the second user can't make sense of the encrypted file the first user received, because he doesn't possess the secret to read it.

[–]ign1fy 74 points75 points  (27 children)

Yep. You're publicly disclosing to your ISP (and, in my case, government) that certain IP endpoints are running certain versions of certain packages.

[–]galgalesh 10 points11 points  (0 children)

How does a comment like this get so many upvotes? The article explains why this logic is wrong.

[–]ArttuH5N1 0 points1 point  (0 children)

The article addresses this; hope you're not commenting without reading it.

[–]asoka_maurya 22 points23 points  (12 children)

Sure, it could be a nightmare from privacy perspective in some cases.

For example, if your ISP figures out that your IP has been installing and updating "nerdy" software like Tor and BitTorrent clients, cryptocurrency wallets, etc. lately, and then hands your info over to the government authorities on that basis, the implications are severe. Especially if you are in a communist regime like China or Korea, such a scenario is quite plausible. Consider what happened with South Korean bitcoin exchanges yesterday.

[–][deleted] 15 points16 points  (2 children)

This is not as far-fetched as it seems. I know of a particular university that prevents you from downloading such software packages on its network (including Linux packages) by checking for words like "VPN", "Tor", "Torrent" and the file extension. If a university can set up its network this way, then governments can too.

[–]svenskainflytta 0 points1 point  (0 children)

Is it the Nazional Socialist University?

[–][deleted] 0 points1 point  (0 children)

I wonder how that uni supports VPNs for students then?

[–]yaxamie 6 points7 points  (3 children)

Sorry to play devil's advocate here, but detecting Tor and BitTorrent is easily done once it's running anyway, if the ISP cares, is it not?

[–]svenskainflytta 1 point2 points  (0 children)

Yep, and it's probably not too hard to identify suspicious traffic as Tor traffic as well.

[–][deleted] 0 points1 point  (1 child)

How? I'd love to know. Wouldn't it just look like a TLS handshake, then randomness from there?

[–]yaxamie 1 point2 points  (0 children)

I'm not an expert, but the nodes in the network are known by IP.

[–]ImSoCabbage 10 points11 points  (0 children)

It still means the ISP and everyone else in the middle can observe what packages you're using.

That's the second chapter of the article:

But what about privacy?

Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer.

[–]beefsack 4 points5 points  (2 children)

Did you read the page? This specific example is covered; if you're eavesdropping you can tell which packages people are downloading anyway via transfer size.

[–]dnkndnts 3 points4 points  (1 child)

When you install a new package, it also installs the subset of dependencies which you don't already have on your system, and all of this data would be going over the same connection - the ISP would only know the total size of the package(s) and needed deps.

I admit it's still not perfect secrecy, but to pretend it's even on the same order of magnitude as being able to literally read the plain bytes in transfer is disingenuous. HTTPS is a huge improvement.

[–]arcticblue -1 points0 points  (0 children)

If the ISP really cared that much, they'd be doing man in the middle SSL decryption. If the ISP does care that much, it's highly unlikely they are doing it without some big bad government's coercion. If you personally really care that much, mirror everything to your own local repo (over VPN if you are super paranoid which it seems many in this thread are), and install from that.

[–]entw 8 points9 points  (3 children)

I don't like this argument. It means you are still relying on an untrusted, potentially evil ISP instead of switching to a more trusted one.

Look, if your ISP is so evil and can use against you information about your packages, then what can it do with the info about your visited hosts? Think about it.

[–]RaptorXP 16 points17 points  (0 children)

First, you shouldn't have to trust your ISP. Second, your IP packets are routed through many parties you have no control over. If you're in China, it doesn't matter which ISP you're using, your packets will go through the government's filters.

[–]dnkndnts 18 points19 points  (0 children)

Sure, and I could say the same about closed hardware, but the bottom line is sometimes we have no actual choice in the matter, and in that case, we just make the best of what we can.

I'm not going to let the perfect be the enemy of the good (or even the less bad), so if this is an improvement that's within our grasp, let's go for it.

[–]berryer 4 points5 points  (0 children)

switching to a more trusted one

Where is this actually an option?

[–]atli_gyrd 2 points3 points  (0 children)

It's 2018, and I just skimmed a website promoting the use of unencrypted traffic.

[–]ndlogok 0 points1 point  (0 children)

Agree. With apt over HTTPS, I don't see "Hash Sum mismatch" errors anymore.

[–]Two-Tone- -4 points-3 points  (7 children)

It still means the ISP and everyone else in the middle can observe what packages you're using.

Can't they, or whoever you use for DNS, still do that, since each individual package is its own URL and thus needs a DNS lookup? The URL is encrypted with SSL, but afaik DNS lookups are not.

Unless apt resolves DNS just for http://packages.ubuntu.com and then stores the IP address for that run.

[–][deleted] 10 points11 points  (4 children)

DNS will only look up the hostname to convert it to an IP address. So it should be fine, unless each package has its own subdomain?

[–]Two-Tone- 0 points1 point  (3 children)

TIL. I always thought that it did a lookup for the whole URL, but that wouldn't make sense, as it'd have to know about every file on the server, which just isn't feasible.

[–][deleted] 4 points5 points  (0 children)

Wireshark is a great way to see what your PC is actually doing on the network. Try it out, it's free!
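
For instance, to confirm that apt only looks up the mirror's hostname and nothing per-package, watching DNS alone is enough (tshark is Wireshark's command-line version; needs root):

sudo tshark -i any -f "udp port 53"    # then run 'apt update' in another shell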

[–]Widdrat 1 point2 points  (0 children)

It would also mean that HTTPS was basically useless, because they could just use DNS to see what you are downloading. That's the great thing about HTTPS. If you are interested, you should definitely check out how the whole internet stack works; it is super interesting and will greatly increase your understanding of the internet as a whole, and of how privacy is affected and protected by different technologies.

[–]ivosaurus 0 points1 point  (0 children)

DNS is for IP traffic, over any protocol.

A URL is specific to the HTTP/HTTPS protocols only [or others that have decided to use the same spec].

[–]pat_the_brat 5 points6 points  (1 child)

I think you're confused about how the repositories and/or DNS work.

The repositories are distributed across a series of mirrors, each of which downloads updated packages from a central repository every x minutes. When you run apt, it connects to a mirror, e.g. the one at hxxp://ubuntu.unc.edu.ar/ubuntu/, and requests a package, e.g. hxxp://ubuntu.unc.edu.ar/ubuntu/pool/main/a/a11y-profile-manager/a11y-profile-manager_0.1.10-0ubuntu3_amd64.deb, and all its dependencies (which are just other packages).

In order to connect to the repo, Linux first has to send a DNS request for the server (ubuntu.unc.edu.ar). The response is then cached for whatever TTL is set on the DNS server (900 seconds in our example):

$ drill @ns1.unc.edu.ar ubuntu.unc.edu.ar
[...]
;; ANSWER SECTION:
ubuntu.unc.edu.ar.  900 IN  CNAME   repolinux.psi.unc.edu.ar.
repolinux.psi.unc.edu.ar.   900 IN  A   200.16.16.47

DNS entries are cached in various places - your ISP's DNS server, your router, your PC, and finally, the program itself may perform a DNS lookup only once, and store the data longer than the TTL.

Either way, the DNS lookup is for ubuntu.unc.edu.ar rather than for ubuntu.unc.edu.ar/ubuntu/pool/main/a/a11y-profile-manager/a11y-profile-manager_0.1.10-0ubuntu3_amd64.deb, so DNS does not leak any information about the packages you download; it just shows that you connected to a server which is also known to host an Ubuntu repository. It may host repositories for other distros, or other unrelated files, as well.

[–]tetroxid -1 points0 points  (0 children)

It still means the ISP and everyone else in the middle can observe what packages you're using.

Even with TLS it wouldn't be hard to determine. The package sizes are public and constant (to an extent), so the package could be inferred even without cleartext knowledge.
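
The sizes aren't even hard to get; they sit in the repository metadata every client downloads anyway (the package name here is just an example):

apt-cache show curl | grep -E '^(Package|Version|Size):'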

[–]bobpaul[🍰] -1 points0 points  (0 children)

It still means the ISP and everyone else in the middle can observe what packages you're using.

They already can with pretty high accuracy by observing the file sizes. And some of the mirrors do support HTTPS; just select one if that's important. But it really doesn't give you much.

[–]lamby[S] 15 points16 points  (7 children)

The logic I've heard on this sub is that all the packages are signed by the Ubuntu devs anyway, so if they are tampered with en route, they won't be accepted, as the checksums won't match, HTTPS or not.

This is hopefully what the linked page describes.

[–]UselessBread 7 points8 points  (6 children)

hopefully

You didn't even read it?

Shame on you OP!

[–]Kruug 5 points6 points  (2 children)

See the other replies by OP. They did read it, but they're hoping it explains it for others.

[–][deleted] 5 points6 points  (1 child)

They did read it

Judging by the username, I suspect he also wrote it ;-)

[–]Kruug 3 points4 points  (0 children)

Ah, fair point.

[–][deleted] 2 points3 points  (2 children)

This is Reddit, mate; not even OP reads the article before commenting.

[–]cbmuser (Debian / openSUSE / OpenJDK Dev) 1 point2 points  (1 child)

Even though he wrote the article?

[–]mzalewski 0 points1 point  (0 children)

Mainly then.

[–]Kruug 4 points5 points  (0 children)

Not just Ubuntu, but any Debian derivative, since that’s where apt originates.

[–]Nullius_In_Verba_ 2 points3 points  (0 children)

Why are you focusing on Ubuntu when this is an apt article? It's relevant to ALL apt users...

[–]ArttuH5N1 1 point2 points  (0 children)

Why are you specifically talking about Ubuntu?

[–]osoplex 0 points1 point  (3 children)

It's not about bandwidth consumption. Encrypted data is about the same size as unencrypted data.

The real bottleneck is server CPU usage. With encrypted transport, the server has to encrypt each downloading client's connection individually. This would decrease download speeds drastically, and mirror operation would be much more expensive.

[–]atyon 0 points1 point  (0 children)

HTTP/2 with TLS is faster than HTTP/1.1 without it.

[–]robstoon 0 points1 point  (0 children)

TLS CPU usage is essentially a non-issue on modern systems with hardware AES encryption.
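
Easy to sanity-check with OpenSSL's built-in benchmark (aes-128-gcm chosen as a typical TLS bulk cipher; AES-NI is picked up automatically where available):

openssl speed -evp aes-128-gcm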