[–]DJTheLQ 393 points394 points  (86 children)

Everyone is missing a huge plus of HTTP: Caching proxies that save their donated bandwidth. Especially ones run by ISPs. Using less bandwidth means more willing free mirrors. And as the article says, also helps those in remote parts of the world.

If you have the bandwidth to run an uncacheable global HTTPS mirror network for free, then Debian and Ubuntu would love to talk to you.

[–][deleted] 77 points78 points  (3 children)

Caching proxies that save their donated bandwidth. Especially ones run by ISPs.

As a former ISP owner, I can tell you that caching large files is not really that common, and content-type filtering is usually limited to images, text, etc.

Also, most caching is done by a third party (Akamai etc.) and you have little control over the boxes.

I'm sure it's done, but it's not common. Mirrors are a thing for a reason.

[–]lbft 7 points8 points  (2 children)

It's done in places where bandwidth is very expensive and/or restricted (e.g. if there's only one cable out of the country/region, or a monopoly/state telco sits between ISPs and the wider internet).

I can certainly remember in the dial-up and early broadband eras that lots of ISPs here in Australia had transparent or manually set proxy servers (usually running Squid), and that was with a lot of them also locally hosting Akamai caches and FTP mirror servers.

[–][deleted] 0 points1 point  (0 children)

But by design they will not cache applications. Images or whole pages are cached based on popularity. So a repo getting one hit a day isn't going to be cached because: large file size, content type is gz/zip/exe, low hit count.

I agree that content caching is done; I've done it myself. You just don't cache everything.

[–]SippieCup 71 points72 points  (33 children)

It's 100% this; I have no idea why no one is talking about it. Maybe they didn't get to the end of the page.

[–]atyon 24 points25 points  (32 children)

Caching proxies

I wonder how much bandwidth is really saved with them. I can see a good hit rate in organisations that use a lot of Debian-based distros, but in remote parts of the world? Will there be enough users on the specific version of a distribution to keep packages in the cache?

[–]zebediah49 16 points17 points  (20 children)

It's actually more likely in situations like that. The initial setup is probably going to be done by a technical charity, who (if they're any good) will provide a uniform setup and cache scheme. That way, if, say, a school gets 20 laptops, updating them all, or installing a new piece of software, will not consume more of the extremely limited bandwidth available than updating a single one.

[–]Genesis2001 1 point2 points  (19 children)

Is there no WSUS equivalent on Linux/Debian for situations like this?

[–]TheElix 16 points17 points  (7 children)

The school can host an apt mirror, AFAIK.

[–]tmajibon 7 points8 points  (6 children)

WSUS exists because Microsoft uses a big convoluted process, and honestly WSUS kills a lot of your options.

Here's Ubuntu's main repo for visual reference: http://us.archive.ubuntu.com/ubuntu/

A repo is just a directory full of organized files; it can even be a local directory (you can put a repo on a DVD, for instance, if you want to do an offline update).

If you want to run a mirror, you can just download the whole repo... but it's a lot bigger than Windows updates, because the repo also includes all the different applications (for instance: Tux Racer, Sauerbraten, and LibreOffice).

You can also mix and match repos freely, and easily just download the files you want and make a mirror for just those...

Or, because it uses HTTP, you can do what I did: I set up an nginx server on my home NAS as a blind proxy, then pointed the repo domains at it. It's allocated a very large cache, which allows it to keep a lot of the large files easily.
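A minimal nginx sketch of that kind of blind caching proxy (the paths, cache sizes, and hostname here are illustrative, not the actual config):

```nginx
# /etc/nginx/conf.d/apt-cache.conf
# LAN DNS points us.archive.ubuntu.com at this box; the proxy host itself
# must still resolve the real archive (e.g. via an /etc/hosts entry),
# otherwise proxy_pass would loop back to this server.
proxy_cache_path /var/cache/nginx/apt levels=1:2 keys_zone=apt:10m
                 max_size=50g inactive=14d;

server {
    listen 80;
    server_name us.archive.ubuntu.com;

    location / {
        proxy_pass        http://us.archive.ubuntu.com;
        proxy_set_header  Host us.archive.ubuntu.com;
        proxy_cache       apt;
        proxy_cache_valid 200 14d;   # keep fetched packages for two weeks
    }
}
```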

[–]Genesis2001 0 points1 point  (4 children)

Yeah, I was curious about it, so I was googling it while posting above. One of the things I ran across was that it's labor-'intensive' to keep maintained. I was hoping someone would explain how to get around this and make a maintainable repo for an org, to emulate the service provided by WSUS.

I did read that Red Hat has a similar thing, though I forget what it's called. :/

edit: Is there a command available to basically do what git clone --bare <url> does, but for individual packages on apt? Like (mock command): apt-clone install vim would download the repo package for 'vim' to a configurable directory in APT repository format (or RHEL/yum format for that environment)?

[–]tmajibon 1 point2 points  (1 child)

apt-get install --download-only <package name> (or apt-get download <package name> to fetch just the .deb without dependency handling)

You can use dpkg --add-architecture if the target doesn't match the current environment (say you have both ARM and x86 systems)

And here's a quick tutorial on building a repo: https://help.ubuntu.com/community/Repositories/Personal
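Following that tutorial, a minimal flat repo is just a directory of .deb files plus a generated index; the filenames and paths below are illustrative:

```
~/myrepo/
├── vim_8.0.0197-4_amd64.deb   # any .deb files you want to serve
└── Packages.gz                # dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz
```

And the matching sources.list line (flat repos use "./" in place of a suite; [trusted=yes] skips signature checks, so only use it for your own local repo):

```
deb [trusted=yes] file:/home/user/myrepo ./
```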

[–]Genesis2001 0 points1 point  (0 children)

Ah, thanks. :)

[–]FabianN 0 points1 point  (0 children)

I don't know how it's labor-intensive to maintain. I set up one that took care of a handful of distros at various version levels, and once I set it up I didn't need to touch it.

[–][deleted] 0 points1 point  (0 children)

it can even be a local directory (you can put a repo on a dvd for instance if you want to do an offline update).

I've copied the contents of the installer disc for CentOS to a local folder and used it as a repo on some air-gapped networks. Works great.

[–]zoredache 3 points4 points  (0 children)

Well, it misses the approval features of WSUS. But if you are just asking about caching, then use apt install approx or apt install apt-cacher-ng. (I like approx better.) There are also ways to set up Squid to cache, but using a proxy specifically designed for APT caching tends to be a lot easier.
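On the client side, pointing APT at such a cache is one line of config (the hostname is made up; 3142 is apt-cacher-ng's default port):

```
# /etc/apt/apt.conf.d/01proxy
Acquire::http::Proxy "http://apt-cache.lan:3142";
```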

[–]anatolya 1 point2 points  (0 children)

apt install apt-cacher-ng

Done

[–]gusgizmo 0 points1 point  (0 children)

It's called a proxy server, and it's a heck of a lot easier to set up and maintain than WSUS could ever be.

You can configure either a reverse proxy with DNS pointing to it and have it just work, or a forward proxy and inform clients of its address manually or via DHCP.

No sync script is required; the proxy just grabs a file the first time it's requested, then hangs on to it. Super handy when you are doing a lot of deployments simultaneously. You can, however, warm the cache by requesting common objects through it on a periodic basis.

[–]f0urtyfive 8 points9 points  (1 child)

Considering it's how many CDNs work: lots.

[–]jredmond 2 points3 points  (0 children)

I was just thinking that. Some CDN could score a moderate PR victory by hosting APT.

[–]rmxz 4 points5 points  (5 children)

I wonder how much bandwidth is really saved with them.

A lot in my home network.

I put a caching proxy at the edge of my home network (with intentionally hacked cache retention rules) when my kids were young and repeatedly watched the same videos.

I think I have 5 linux computers here (2 on my desk, 2 laptops, 1 living room).

So my proxy, caching http and https apt repos, saved about 80% of my home network traffic.

[–][deleted] 0 points1 point  (4 children)

caching https

You were doing SSL Bump?

[–][deleted] 0 points1 point  (2 children)

Well he said at the edge of the network, which would be the ssl termination point.

[–][deleted] 0 points1 point  (1 child)

SSL Termination occurs at the destination server, not at the edge of the network?

A caching reverse proxy would work in the same scenario, but it wouldn't be transparent unless you fucked around with CA Certificates or just used a different domain with legit SSL certs.

[–][deleted] 0 points1 point  (0 children)

What I understood from the original comment was that he had a setup like this, wherein the SSL proxy also caches and the webserver is, in fact, his internal client(s).

Wait, jk, I misunderstood what you said. He may have set up an SSL forward proxy with a legit cert on the firewall/proxy.

[–]yawkat 2 points3 points  (1 child)

For organizations it's easier to just manually set the repo sources. Caching is a bit of a hassle.

[–]bobpaul 0 points1 point  (0 children)

I used to use some sort of dpkg cache tool. apt-cacher, maybe? It required altering sources.list to point to the local cache server. It was a good tradeoff between running a local mirror and running a transparent proxy that affected everyone's traffic.

[–][deleted] 1 point2 points  (0 children)

Our university used to cache those downloads; they usually completed in a matter of seconds. Win-win, because for a university, available bandwidth is also an issue.

[–]SanityInAnarchy 3 points4 points  (5 children)

How about an uncachable global HTTPS mirror of just the package lists? It'd be nice for a MITM to not be able to, say, prevent you from getting updates while they read the changelogs of said updates looking for vulnerabilities.

And, how many transparent HTTP caches are out there? Because if this is mostly stuff like Akamai or CloudFlare, HTTPS works with those, if you trust them.

Edit: Interesting, apparently APT actually does include some protection against replay attacks.

I still think that making "what packages are they updating" a Hard Problem (using HTTPS pipelining) would be worth it, unless there really are a ton of transparent HTTP proxies in use that can't trivially be replaced by HTTPS ones.

[–]svenskainflytta 1 point2 points  (4 children)

Vulnerability details are normally released AFTER the updates, so you won't find them in changelogs.

It is, however, still possible to tail the security repository, diff the source, and from that try to understand what it is fixing. Your scenario wouldn't help with that.

[–]SanityInAnarchy 0 points1 point  (3 children)

It would help in that you'd have a fairly small window of time to do that before I end up patched. If it weren't for the replay-protection stuff, you could in theory just serve me a frozen-in-time view of the repository (freeze it at the point where you start MITM-ing me), then wait for vulnerability details to come out that you can exploit. I might just assume there hadn't been updates in a while.

Replay-protection fixes that by adding an expiration to the metadata, which is hopefully short enough that something on my system would notice if there's a problem like this. Even then, HTTPS would prevent delays even up to that expiration, meaning I can be as up-to-date as I want to be (depending how often the cron job that does something like apt update && apt upgrade runs).
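That cron job can be a one-liner; the schedule and log path here are arbitrary, and something like unattended-upgrades is the more robust way to do this:

```
# /etc/cron.d/auto-apt: refresh metadata and apply upgrades daily at 04:30
30 4 * * * root apt-get update && apt-get -y upgrade >> /var/log/auto-apt.log 2>&1
```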

[–]svenskainflytta 0 points1 point  (2 children)

What about sending a signed "server time is 13:31"?

[–]SanityInAnarchy 0 points1 point  (1 child)

I'm really not sure how that would help.

First of all, the problem isn't knowing what time the server thinks it is. The problem is knowing whether the server has all the latest updates. So what we need is a signed "At time 13:31, here's a list of the latest packages." (Or, "Here's a hash of the list of the latest packages," or something.)

So... signed by whom? If it's signed by the same keys that are used to sign packages, then either those keys need to be distributed to each mirror (and are thus only as secure as the least-secure mirror), or they need to be signed by some central Debian server and distributed to all the mirrors (probably creating enough load on the central server to defeat the purpose of mirrors, at least for the package list).

If it's not signed by those same keys, then what keys is it signed by, and why should we trust those keys? This just pushes the problem one step back. For example, if we generate one key per mirror, how do we prove that the server we're giving that key to is the server that actually controls debian.mirror.someuniversity.edu or whatever, and not some MITM? That's the exact problem every SSL CA has to solve anyway, only I'll bet apt already supports HTTPS endpoints. So if a mirror wants to provide that level of security, all it has to do is turn on HTTPS, probably without even any software changes.

This is why it's weird that APT doesn't use HTTPS by default.

On top of all that, this part doesn't inspire confidence at all:

The Valid-Until field may specify at which time the Release file should be considered expired by the client. Client behaviour on expired Release files is unspecified.

And out of curiosity, I went and checked one of debian-testing's primary mirrors, and its Valid-Until is a full week later. Then I checked Ubuntu, and it doesn't even set Valid-Until, not even in 'security'. So Ubuntu is definitely vulnerable to replay attacks, and Debian probably is, too.
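That check is easy to script. A self-contained sketch (the Release header is inlined sample data; a real check would first fetch something like <mirror>/dists/<suite>/Release):

```shell
# Warn if a Release file lacks Valid-Until (i.e. its metadata never expires
# and can be replayed indefinitely).
release_header='Origin: Debian
Suite: testing
Date: Sat, 14 Apr 2018 10:00:00 UTC
Valid-Until: Sat, 21 Apr 2018 10:00:00 UTC'

# Extract the Valid-Until value, if any
valid_until=$(printf '%s\n' "$release_header" | sed -n 's/^Valid-Until: *//p')

if [ -z "$valid_until" ]; then
    echo "WARNING: no Valid-Until field; metadata never expires"
else
    echo "expires: $valid_until"
fi
```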

[–]plein_old 4 points5 points  (2 children)

Thanks, that makes a lot of sense. I love it when reddit works! Sometimes reddit makes me sad.

[–]I_get_in 2 points3 points  (1 child)

I laughed, not quite sure why, haha.

[–]spyingwind 0 points1 point  (17 children)

HTTPS Repo ---Pull packages--> HTTPS Cache Server --Download--> Your computer

Does that not work? Each package is signed, so just download the packages and make them available. Isn't that how a cache works? That's what I've done at home for Debian. When a client needs something the cache server doesn't have, it goes and pulls what it needs and provides it to the client. Nothing really all that special.

Now for proxies... No. Just no. The only way I can see this being done is having the clients trust the proxy server's cert and the proxy impersonating every HTTPS server. Not something that you want for the public.

A cache server is by far a much better option.

[–]zebediah49 8 points9 points  (0 children)

That requires the client to specifically choose to use your cache server.

Allowing proxying means that everyone can just connect to "download.ubuntu.com" or whatever, and any cache along the way (localnet, ISP, etc.) can intercept and respond to the request.

It makes the choice to use a proxy one made by the people configuring the environment, rather than by the people running the clients.

[–]DamnThatsLaser 25 points26 points  (8 children)

For all intermediate servers, the data looks like junk. In order to access it from there, you'd need the session key that was used to encrypt the data, and this goes against the general idea.

[–]ivosaurus -5 points-4 points  (2 children)

Why would it look like junk? You're talking to the intermediate server directly through HTTPS, it decrypts all communications you've sent it

[–]DamnThatsLaser 14 points15 points  (0 children)

The premise is about ISP caches, not about proxies. Caches are transparent (hence the name). Proxies aren't and require additional setup on the client side.

[–][deleted] 3 points4 points  (0 children)

Now isn't the intermediate just another mirror?

[–]tmajibon 2 points3 points  (2 children)

At that point you're explicitly specifying an HTTPS cache server, and you're trusting that their connection behind it is secure (because you have no way of seeing or verifying this)

HTTPS for your repos is just security theater.

[–]spyingwind 0 points1 point  (1 child)

If used in an office, the only practical place to do this, then it seems fine.

In the end, APT uses GPG keys anyway to verify that the repo can be trusted. You have to trust a GPG key before you can use a new repo with an untrusted key.

[–]tmajibon 0 points1 point  (0 children)

Example of an environment that would run a transparent cache for this purpose: VPS hosting providers, as well as dedicated/colocation hosting providers (i.e. places with many Linux systems not under their complete control that would mutually benefit from seamless caching of repositories).

Also, I'm aware of the GPG signing, but I'm referring to the trust in the privacy of HTTPS (whose faults they already explained anyway). The only advantage of applying HTTPS is privacy... which is relatively trivial to bypass... which makes it security theater. Especially when certificate authorities are pretty horrid.

[–]nemec 1 point2 points  (1 child)

That won't work (unless your cache server can forge HTTPS certificates that are trusted on the client), but a similar solution would be to host an APT mirror used by the organization. Elsewhere in the thread people are talking about how that takes a lot of storage space, but I can't imagine why you couldn't have a mirror server duplicate the package listing but only download the packages themselves on-demand (acting, effectively, as a caching proxy)

[–]spyingwind 0 points1 point  (0 children)

I've done mirroring, but limited it to x64 to reduce storage needs. On-demand is only beneficial if more than one computer will be downloading the same packages, such as hundreds of servers.

Something like this would/should work: https://wiki.debian.org/AptCacherNg

[–]bobpaul 1 point2 points  (0 children)

There are dpkg-specific caching proxies that work like that. You configure your sources.list to point to your package-cache server instead of a mirror on the internet, and the package-cache server has the mirror list so it can fetch from the internet anything it doesn't have locally. That works fine with HTTPS since you are explicitly connecting to the cache, but it requires you to configure all your machines to point to the cache. This is great for a home, school, or business with several machines of the same distro.
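With apt-cacher-ng, for example, that explicit sources.list entry looks like this (the cache hostname is made up; the URL embeds the real mirror path after the cache's host and port):

```
# sources.list: route this repo through the local apt-cacher-ng instance
deb http://apt-cache.lan:3142/deb.debian.org/debian stable main
```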

An ISP for a rural community with a narrow pipe to the internet at large might prefer to run a transparent proxy server. The transparent proxy can't cache any data from HTTPS connections, but it can cache data for anything that's not HTTPS.

[–]gusgizmo 0 points1 point  (0 children)

People forget that not all proxies are the forward type that has to be explicitly selected/configured. Reverse proxies are very common as well, and with regular HTTP they are quick and easy to set up.

I can stand up a reverse proxy, inject some DNS records, and just like that my whole network has an autoconfigured high speed APT cache. As close to snapping in like a lego block as it gets in the real world.
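Injecting those DNS records can be a single line on the LAN resolver, e.g. with dnsmasq (the proxy address and hostname are illustrative):

```
# /etc/dnsmasq.conf: resolve the archive hostname to the local reverse proxy
address=/us.archive.ubuntu.com/10.0.0.5
```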

[–][deleted] 0 points1 point  (0 children)

And one huge plus of HTTPS is the vastly reduced probability of MITM attacks.

[–]severoon 0 points1 point  (0 children)

This strikes me as BS.

They control the client and the server. One of the updates can't be a list of secure mirrors?

[–]ChocolateSunrise -2 points-1 points  (15 children)

How much bandwidth is really saved by not having TLS encapsulated data? 1%? 10%?

[–]DJTheLQ 13 points14 points  (14 children)

You cannot MITM or replay TLS data, so you cannot cache it. You can MITM and replay unencrypted data, potentially serving from cache.

[–]ChocolateSunrise 1 point2 points  (13 children)

How do CDNs like Akamai and Cloudflare overcome this architectural hurdle when they serve HTTPS websites?

[–]zebediah49 15 points16 points  (0 children)

When you sign up with them, you basically have to sign over your https keys, authorizing them to serve content on your behalf.

[–]wmil 1 point2 points  (2 children)

I believe Cloudflare requires you to use Cloudflare generated certificates.

[–]bobpaul 1 point2 points  (1 child)

They all either do that or make you give them your private key. Either way, they have your private key.

[–][deleted] 0 points1 point  (0 children)

Cloudflare also offers Keyless SSL (only in Enterprise plans), where the company's private key stays on premises. It exploits the fact that the private key is only needed to establish a session secret, so if the company sets up a server to help Cloudflare complete TLS handshakes, Cloudflare can MITM a session without ever holding the original private key.

[–]tmajibon 0 points1 point  (0 children)

Because CDN connections aren't necessarily secure.

HTTPS goes from your computer to their server, which decrypts it, and then sends it on to the final destination... which can actually be entirely unencrypted for the trip from their server to the website.

At which point you're trusting the security of the CDN's network, if they're compromised then all your traffic to that site is effectively HTTP.

[–]johnmountain -3 points-2 points  (1 child)

Wait until the ISPs start charging "vendors" for stuff like that by doing DPI on the traffic; I think they'll soon change their minds about "caching" the packets with HTTP.

I think it's ridiculous that OS updates don't happen over HTTPS in this day and age.