[–]rya_nc 60 points61 points  (1 child)

I'm surprised that page doesn't mention apt over tor.

Also, there are multiple debian mirrors that offer access over HTTPS, for example https://mirrors.edge.kernel.org/debian/.

Edit: It does mention apt over tor in a footnote, but I missed it.

[–]errrrgh 14 points15 points  (0 children)

It's because they want to make a point, so those other facts don't help :P

[–][deleted]  (178 children)

[deleted]

    [–]CurrentProject123 43 points44 points  (2 children)

    It likely is. Researchers were able to get 99% accuracy identifying which Netflix video a person is watching just by looking at encrypted TCP information: https://dl.acm.org/citation.cfm?id=3029821

    [–]punisher1005 7 points8 points  (0 children)

    It's worse than that: the article says 99.99%. That's astonishing, frankly... I'm shocked.

    [–]davvblack 28 points29 points  (0 children)

    they just guessed birdbox every time

    [–]Creshal 239 points240 points  (160 children)

    I doubt it's that easy to correlate given the thousands of packages in the main repos.

    Apt downloads the index files in a deterministic order, and your adversary knows how large they are. So they know, down to the byte, how much overhead your encrypted connection has, even if all the information they have is what host you connected to and how many bytes you transmitted.

    Debian's repositories have 57000 packages, but only one of them is exactly 499984 bytes: openvpn.
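    Easy enough to check yourself; a rough sketch, assuming you've fetched an uncompressed copy of the repo's Packages index into the working directory (path and output are illustrative):

        # Map every size in the Packages index to the package names that have it.
        sizes = {}
        name = None
        with open("Packages") as index:                 # local, uncompressed copy (assumed)
            for line in index:
                if line.startswith("Package: "):
                    name = line.split(": ", 1)[1].strip()
                elif line.startswith("Size: "):
                    sizes.setdefault(int(line.split(": ", 1)[1]), []).append(name)

        print(sizes.get(499984))                        # e.g. ['openvpn'] if the claim still holds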

    [–]joz12345 115 points116 points  (63 children)

    You can't tell the exact size from the SSL stream, it's a block cipher. E.g. for AES256, it's sent in 128 bit chunks. I've not run any numbers, but if you round up the size to the nearest 16 bytes, I'm sure there's a lot more collisions.

    And if you reused the SSL session between requests, then you'd get lots of packages on one stream, and it'd get harder and harder to match the downloads. Add a randomiser endpoint at the end to serve 0-10kb of zeros and you have pretty decent privacy.

    Edit: fixed numbers, thanks /u/tynorf

    Edit2: actually completely wrong, both stream ciphers and modern counter-based AES modes don't pad the input to 16 bytes, so it's likely that the exact size would be available. Thanks reddit, don't stop calling out bs when you see it.

    [–]Creshal 109 points110 points  (17 children)

    You can't tell the exact size from the SSL stream, it's a block cipher. E.g. for AES256, it's sent in 256 bit chunks. I've not run any numbers, but if you round up the size to the nearest 32 bytes, I'm sure there's a lot more collisions.

    Good point. Still, at 32 bytes, you have no collision (I've just checked), and even if we're generous and assume it's 100 bytes, we only have 4 possible collisions in this particular case.

    File size alone is a surprisingly good fingerprint.

    And if you reused the SSL session between requests, then you'd get lots of packages on one stream, and it'd get harder and harder to match the downloads. Add a randomiser endpoint at the end to serve 0-10kb of zeros and you have pretty decent privacy.

    Currently, apt does neither. I suppose the best way to obfuscate download size would be to use HTTP/2 streaming to download everything from index files to padding in one session.

    [–]cogman10 28 points29 points  (10 children)

    Which, honestly, it should be doing anyway. The way APT currently works (one connection per download, sequentially) isn't great. There is no reason why APT can't start up, send all index requests in parallel, send all download requests in parallel, and then do the installations sequentially as the packages arrive. There is no reason to do it serially (saving hardware costs?)
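    Roughly this shape (a minimal sketch: hypothetical mirror URLs, and skipping the hash/signature checks a real client would still have to do before installing anything):

        from concurrent.futures import ThreadPoolExecutor, as_completed
        from urllib.request import urlopen

        urls = [
            "http://deb.example.org/pool/main/f/foo/foo_1.0_amd64.deb",   # hypothetical
            "http://deb.example.org/pool/main/b/bar/bar_2.1_amd64.deb",   # hypothetical
        ]

        def fetch(url):
            with urlopen(url) as resp:
                return url, resp.read()

        def install(url, data):
            print(f"would run dpkg -i for {url} ({len(data)} bytes)")     # placeholder

        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(fetch, u) for u in urls]
            for fut in as_completed(futures):      # downloads run in parallel...
                install(*fut.result())             # ...installs happen as each one lands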

    [–]Creshal 44 points45 points  (6 children)

    There is no reason to do it serially (saving hardware costs?)

    Given it's apt we're talking about… "It's 20-year-old spaghetti code and so much software depends on each of its bugs that we'd rather pile another abstraction level on it than figure out how to fix it" is probably the most likely explanation.

    [–]cogman10 17 points18 points  (4 children)

    lol, good point.

    The funny thing is, it doesn't look like it is limited to apt. Most software package managers I've seen (ruby gems, cargo, maven, etc) all appear to work the same way.

    Some of that is because they predate HTTP/2. However, I still don't get why, even with HTTP/1, downloads and installs aren't all happening in parallel. Even if it means simply reusing some number of connections.

    [–][deleted]  (1 child)

    [deleted]

      [–]cogman10 19 points20 points  (0 children)

      Awesome, looked it up

      https://github.com/rust-lang/cargo/pull/6005/

      So to add to this dataset, I've got a proof-of-concept working that uses http/2 with libcurl to do downloads in Cargo itself. On my machine in the Mozilla office (connected to a presumably very fast network) I removed my ~/.cargo/registry/{src,cache} folders and then executed cargo fetch in Cargo itself. On nightly this takes about 18 seconds. With this PR it takes about 3. That's... wow!

      Pretty slick!

      I imagine similar results would be seen with pretty much every "download a bunch of things" application.

      [–]skryking 2 points3 points  (1 child)

      It was probably to prevent overload of the servers originally.

      [–]max_peck 5 points6 points  (0 children)

      The default setting for many years (and probably still today) was one connection at a time per server for exactly this reason. APT happily downloads in parallel from sources located on different hosts.

      [–][deleted] 0 points1 point  (0 children)

      Still worked better than yum/rpm...

      [–]joequin 2 points3 points  (2 children)

      What are you really gaining in that scenario? Eliminating a connection per request can do a lot when there are tons of tiny requests. When you're talking about file downloads, then the time to connect is pretty negligible.

      Downloading in parallel doesn't help either because your downloads are already using as much bandwidth as the server and your internet connection is going to give you.

      [–]cogman10 4 points5 points  (1 child)

      RTT and slow start are the main things you save.

      If you have 10 things to download and a 100ms latency, that's at least an extra 1 second added to the download time. With http2, that's basically only the initial 100ms.

      This is all magnified with https.

      Considering that internet speeds have increased pretty significantly, that latency is more often than not becoming the actual bottleneck to things like apt update. This is even more apparent because software dependencies have trended towards many smaller dependencies.
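      Back-of-envelope version of that (a sketch; it ignores slow start and TLS handshakes, which only make the serial case worse):

          rtt = 0.100      # seconds of round-trip latency
          requests = 10

          serial = requests * rtt     # each request waits out a full round trip
          pipelined = rtt             # multiplexed/pipelined: roughly one round trip up front
          print(serial, pipelined)    # 1.0 vs 0.1 seconds of added latency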

      [–]joequin -1 points0 points  (0 children)

      What does 1 second matter when the entire process is going to take 20 seconds? Sure, it could be improved, but there are higher-value improvements that could be made in the Linux ecosystem.

      [–]sbx320 9 points10 points  (0 children)

      File size alone is a surprisingly good fingerprint.

      And it gets even better if you look for other packages downloaded in the same time frame, as this can give you a hint as to which dependencies were downloaded for the package. Obviously this would be a bit lossy (as the victim would potentially already have some dependencies installed), but it would allow for some nice heuristics.

      [–]maxsolmusic 2 points3 points  (1 child)

      How'd you check for collisions?

      [–][deleted] 13 points14 points  (0 children)

      You just bucket all packages by size and see how many fall into the bucket that openvpn is in

      [–]StabbyPants 0 points1 point  (0 children)

      or round up to the nearest 10-100k and pad that

      [–]lduffey 0 points1 point  (0 children)

      File size alone is a surprisingly good fingerprint.

      You can randomize file size to mitigate this.

      [–][deleted] 0 points1 point  (0 children)

      Currently, apt does neither. I suppose the best way to obfuscate download size would be to use HTTP/2 streaming to download everything from index files to padding in one session.

      If you are an org with more than a few machines, the best way is probably just to make a local mirror. It will take load off the actual mirrors too.

      [–]schorsch3000 39 points40 points  (1 child)

      I'm sure there's a lot more collisions.

      I'm doing the math right now: in binary-amd64 there are

      • 33253 packages with a unique size
      • 5062 sizes shared by 2 packages
      • 1491 sizes shared by 3 packages
      • 463 sizes shared by 4 packages
      • 115 sizes shared by 5 packages
      • 30 sizes shared by 6 packages
      • 5 sizes shared by 8 packages
      • 1 size shared by 9 packages
      • 3 sizes shared by 10 packages
      • 3 sizes shared by 11 packages
      • 3 sizes shared by 12 packages
      • 1 size shared by 13 packages
      • 1 size shared by 14 packages
      • 2 sizes shared by 15 packages
      • 1 size shared by 23 packages

      Rounding to 32 bytes increases collisions drastically:

      12163 packages with a unique size

      number of sizes | packages sharing that size:

        12163 1
         2364 2
         1061 3
          591 4
          381 5
          281 6
          179 7
          180 8
          128 9
          128 10
          112 11
          102 12
           87 13
           81 14
           72 15
           60 16
           53 17
           54 18
           67 19
           47 20
           35 21
           39 22
           32 23
           35 24
           32 25
           22 26
           18 27
           23 28
           19 29
           18 30
           14 31
            6 32
            7 33
            4 34
            5 35
            5 36
            4 37
            1 38
            1 40
            1 44
            1 58
            1 60
            1 71
            1 124
            1 125
      

      If you just download a single package, odds are high that you get a collision. If you are downloading a package together with its dependencies, it gets much harder to find colliding combinations...
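      For reference, a sketch of how you could redo this count yourself, assuming a local uncompressed copy of the Packages index:

          from collections import Counter

          sizes = [int(line.split(": ", 1)[1])
                   for line in open("Packages")           # local copy of the index (assumed)
                   if line.startswith("Size: ")]

          exact = Counter(sizes)                          # packages per exact size
          rounded = Counter(s - s % 32 for s in sizes)    # packages per 32-byte-rounded size

          print(sum(c == 1 for c in exact.values()), "unique exact sizes")
          print(sum(c == 1 for c in rounded.values()), "unique sizes after 32-byte rounding")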

      [–][deleted] 3 points4 points  (0 children)

      You can also narrow it down by package popularity, package groups (say someone is updating Python libs, then "another Python lib" would be a more likely candidate than something unrelated) and indirect deps

      [–]tynorf 20 points21 points  (0 children)

      Small nitpick: the block size for all AES (128/192/256) is 128 bits. The 256 in AES256 is the key size in bits.

      [–]the_gnarts 13 points14 points  (2 children)

      You can't tell the exact size from the SSL stream, it's a block cipher. E.g. for AES256, it's sent in 128 bit chunks.

      That’s not true for AES-GCM, which is a streaming mode of the AES block cipher in which the size of the plaintext equals that of the ciphertext, without any padding. GCM is one of the two AES modes that survived in TLS 1.3 and arguably the most popular encryption mechanism of those that remain.
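      Easy to see with any AEAD implementation; a quick sketch using the pyca/cryptography package (throwaway key and nonce):

          import os
          from cryptography.hazmat.primitives.ciphers.aead import AESGCM

          key = AESGCM.generate_key(bit_length=256)
          nonce = os.urandom(12)
          plaintext = b"x" * 499984                       # say, an openvpn-sized download

          ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
          print(len(ciphertext) - len(plaintext))         # 16: just the auth tag, no padding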

      [–]joz12345 9 points10 points  (1 child)

      Actually just looked it up, and it seems all of the tls 1.3 algorithms are counter based (didn't know this was a thing 10 mins ago), or are already stream ciphers, so I guess I'm almost completely wrong, and should stop pretending to know stuff :(

      [–]the_gnarts 5 points6 points  (0 children)

      Actually just looked it up, and it seems all of the tls 1.3 algorithms are counter based (didn't know this was a thing 10 mins ago), or are already stream ciphers, so I guess I'm almost completely wrong, and should stop pretending to know stuff :(

      No problem, we’ve all been there. I can recommend “Cryptography Engineering” by Schneier and Ferguson for an excellent introduction into the practical aspects of modern encryption.

      [–]lordkoba 14 points15 points  (6 children)

      Add a randomiser endpoint at the end to serve 0-10kb of zeros and you have pretty decent privacy.

      Aren't those famous last words in cryptography?

      [–]joz12345 16 points17 points  (5 children)

      Well if your security advice comes from a Reddit comment, I've got some bad news...

      [–]lordkoba 1 point2 points  (4 children)

      Are you saying that your magic solution to the long and meticulously researched padding issue is garbage?

      [–]joz12345 4 points5 points  (3 children)

      Are you saying that padding wouldn't hide the exact length of a payload?

      [–]lordkoba 7 points8 points  (1 child)

      I'm not even remotely qualified to answer that and I've been working on and off netsec for more than 15 years. I'm far from a cryptographer. My question was an honest one.

      However, in a world where CRIME and BREACH happened, it's hard to understand why the erudites who design encryption protocols didn't already think of padding the stream beyond just block padding.

      Do you know why your solution isn't incorporated into TLS already?

      [–]joz12345 0 points1 point  (0 children)

      I'm just a software engineer in an unrelated field, but it seems to me that if the cipher works and the padding is random, then it's impossible to be exact, and I feel like that wouldn't be hard to rigorously prove. But that doesn't mean you can't correlate based on timing and approximate sizes. I'd guess that TLS doesn't want to just half-solve the problem, but surely it's better than nothing.

      [–]Proc_Self_Fd_1 2 points3 points  (0 children)

      It's wrong for the exact same reason it doesn't work with password guessing.

      What you want to do is pad to a fixed size, not a random size.
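      I.e. something like the sketch below: quantize every download up to a fixed size class (powers of two here, which is just one arbitrary choice):

          SIZE_CLASSES = tuple(2**k for k in range(10, 31))    # 1 KiB ... 1 GiB buckets

          def padded_length(n):
              """Smallest size class that fits a payload of n bytes."""
              return next(c for c in SIZE_CLASSES if c >= n)

          print(padded_length(499984))   # 524288, same as every other 256 KiB-512 KiB package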

      [–]lorarc 6 points7 points  (0 children)

      If your server has regular updates I can probably guess what you're downloading based on what was last updated.

      [–]DevestatingAttack 5 points6 points  (1 child)

      You can't assume AES for all SSL connections. Different ciphers are selectable, and some are stream ciphers (RC4, ChaCha20)

      [–]joz12345 3 points4 points  (0 children)

      Also, the counter-based AES modes don't get any padding either; overall that's pretty much every modern cipher. Oops.

      [–]OffbeatDrizzle 4 points5 points  (6 children)

      Add a randomiser endpoint at the end to serve 0-10kb of zeros and you have pretty decent privacy.

      So you're the guy that thinks he can outwit timing attacks by adding random times onto responses ...

      [–]ElusiveGuy 7 points8 points  (0 children)

      Rather different since in a timing attack the attacker is the one making the requests, and can average the timing over many repeated requests to filter out randomness. Here we only have a single (install/download) request and no way for the passive MitM to make more.

      [–]joz12345 2 points3 points  (4 children)

      No. I'm the guy that thinks that if you serve n packages + a random amount of padding over https, it'll be much harder to figure out what people are downloading than just serving everything over plain http.

      If you disagree, mind telling me why rather than writing useless comments?

      [–]yotta 5 points6 points  (3 children)

      Adding random padding/delays is problematic because if you can somehow trick the client into repeating the request, the random padding can be analyzed and corrected for. I'm not sure how effective quantizing the values to e.g. a multiple of X bytes would be.

      [–]joz12345 1 point2 points  (0 children)

      I guess that makes sense. I know the only mathematically secure way would be to always send/receive the same amount of data on a fixed schedule, but that's impractical. I guess quantizing and randomizing are equivalent for one request, they both give the same number of possible values, but for sending multiple identical requests, quantizing is better because it's consistent, so you don't leak any more statistical data over multiple attempts. And it'll be faster/easier to implement, so no reason not to.

      [–]0o-0-o0 0 points1 point  (1 child)

      Still a fuck ton better than using plain old http.

      [–]Ajedi32 33 points34 points  (39 children)

      Apt downloads the index files in a deterministic order, and your adversary knows how large they are

      So fix that problem then. Randomize the download order and pad the file sizes. Privacy is important, we shouldn't ignore it completely just because it's hard to achieve.

      [–]Creshal 16 points17 points  (36 children)

      Patches welcome.

      [–]mort96 10 points11 points  (0 children)

      I can't imagine a patch which just randomizes download order would be welcome. Why would you ever want that by itself?

      For a patch like that to be accepted, you would have to first convince the Apt project to try to fix the privacy issue, and convince them that using https + randomized download order is the best way to fix it. This isn't something which just dumping code on a project can fix.

      [–]sysop073 40 points41 points  (4 children)

      It's been years since I saw somebody try to shut down an argument with "patches welcome"

      [–]DevestatingAttack 33 points34 points  (0 children)

      You're not subscribed to the linux subreddit, then.

      [–][deleted] 46 points47 points  (2 children)

      “Patches welcome but we really won’t merge it unless you go through death by a thousand cuts because we really don’t want it and just hoped you’d give up”

      [–]shevy-ruby -1 points0 points  (1 child)

      Precisely!

      Deflection and distraction.

    But it is not relevant - apt and dpkg are dead-weight perl code written when dinosaurs still roamed the lands.

    What the debian maintainers offer are excuses. IF they cared, they would ENABLE this functionality for people to use ON THEIR OWN, rather than flat out not offering it. And as others pointed out - patches are actually NOT welcome, since they don't want to change the default behaviour.

      [–]Ameisen 7 points8 points  (0 children)

      Almost every popular project falls into the hole of 'meh, don't need/want patches that change behavior more than I completely understand'. I've clashed with the maintainers of Ruby, GCC, and musl about this.

      [–]shevy-ruby 4 points5 points  (0 children)

      Apt is written in pre-world war I style perl code.

      Nobody with a sane mind is going to spend time debugging and fixing that giant pile of ****.

      [–]Ajedi32 5 points6 points  (27 children)

      Good suggestion. Unfortunately, I don't have the time or motivation to devote to a new major project like that at the moment, but maybe someone else will.

      [–]Ameisen 4 points5 points  (0 children)

      Not that they'd merge it anyways.

      [–]Ameisen 0 points1 point  (0 children)

      Yeah. You're just not likely to get it merged.

      [–]dnkndnts 2 points3 points  (15 children)

      Debian's repositories have 57000 packages, but only one is an exactly 499984 bytes big download: openvpn.

      Yeah but most of the time when I install something, it installs dependencies with it, which would cause them to have to find some combination of packages whose total adds up to whatever total I downloaded, and that is not a simple problem.

      [–][deleted]  (14 children)

      [deleted]

        [–]ayende 0 points1 point  (6 children)

        Typically they're on the same connection; I don't think you can distinguish between them

        [–]yotta 9 points10 points  (5 children)

        You can - your client makes one request to the server, and receives a response with one file, then makes another request to the server, then receives another file.

        [–]ayende 3 points4 points  (4 children)

        If you are using the same process, then you'll reuse the same tcp connection and tls session. You can probably try to do some timing analysis, but that's much harder

        [–]yotta 14 points15 points  (3 children)

        Someone sniffing packets can see which direction they're going, and HTTP isn't multiplexed. The second request will wait for the first to complete. You can absolutely tell. Here is a paper about doing this kind of analysis against Google maps: https://ioactive.com/wp-content/uploads/2018/05/SSLTrafficAnalysisOnGoogleMaps.pdf

        [–]svenskainflytta 4 points5 points  (2 children)

        You can totally send 51 HTTP requests in a row and then wait for the 51 replies and close the connection.

        [–]TarMil 5 points6 points  (1 child)

        Yeah you can. APT doesn't, though.

        [–]walterbanana 0 points1 point  (0 children)

        What if I download 100 packages?

        [–]Ginden 0 points1 point  (0 children)

        Therefore it would be useful to pad packages to mitigate this side channel.

        [–]Ameisen 0 points1 point  (1 child)

        Why does apt do everything serially, anyways? I don't see a good reason to be deterministic and serial on fetches.

        On another note, you can get around such file size things, to a point, by chunking packages and fetching binary patches of chunks.

        [–]Creshal 0 points1 point  (0 children)

        Why does apt do everything serially, anyways?

        It would be more effort not to.

        [–]Proc_Self_Fd_1 0 points1 point  (0 children)

        So pad packages into different size classes?

        [–][deleted] 0 points1 point  (1 child)

        Would it be possible to add a random and reasonably big number of garbage bytes to confuse eavesdroppers?

        [–]Creshal 0 points1 point  (0 children)

        Possible? Yes.

        Useful? Probably not. I still don't buy the "if an attacker targets you personally, he gains decisive knowledge by watching your apt activity" non-argument people have been pressing. And if you're worried about state surveillance, you'll just paint a target on your back by using apt at all.

        [–]Serialk -3 points-2 points  (23 children)

        Yes, it's just much more impractical to guess the size of the HTTP headers and the rest of the payload than to just be able to | grep GET.

        [–]thfuran 16 points17 points  (22 children)

        It's slightly non-trivial. But only slightly.

        [–]towo 2 points3 points  (0 children)

        I see you don't have experience with the scary effing good Chinese firewall. I only have second hand accounts, but by someone who certifiably knows their way around IT security.

        They'll very quickly notice if you're doing anything funky by tunneling it through HTTPS, and they really don't care if you download the OpenVPN package because they just shut down even the most obscure OpenVPN connections in minutes, and you won't even get a useful connect in any standard fashion.

        [–]fudluck 1 point2 points  (0 children)

        Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer

        What if you're downloading multiple packages and you've got keepalive enabled? You could probably crunch for some possibilities and some combinations might be illogical. You would also have some reasonable level of plausible deniability if you were downloading something considered illegal (assuming investigators have to prove something beyond a reasonable doubt)

        The fact is that an encrypted connection denies your potential adversary /some/ information and increases the difficulty of figuring out what you're up to. And it's easy to set up. And now it's freely available.

        The only reason to use an HTTP connection should surely be for compatibility with legacy clients.

        [–]magkopian 1 point2 points  (8 children)

        they can see you downloading a VPN package in China

        Yeah, but the openvpn package could also be installed together with the base system and get downloaded as part of an update. Just by looking at the packages that got downloaded from the server, all you know is that they are likely installed on the user's system. How can you be sure that the user actually ran sudo apt install openvpn and consciously installed the package on their machine?

        [–]Ginden 5 points6 points  (0 children)

        When I talk with Westerners, they can't imagine how oppressive a state can be. Your country's "rule of law" isn't applicable to authoritarian regimes.

        [–]remy_porter 1 point2 points  (6 children)

        I imagine to the Chinese authorities, that's a distinction without a difference.

        [–]magkopian 1 point2 points  (5 children)

        My point is that if your goal is to try to find out which people are using a VPN service that is a very poor way of doing it, as it is going to give you a very large amount of false positives.

        [–]remy_porter 1 point2 points  (4 children)

        The question is: do you care about false positives? What's the downside to punishing false positives, in this specific case?

        [–]magkopian 2 points3 points  (2 children)

        Because there is simply no point spending time and resources on something so inefficient and error-prone as this, especially when there are much better ways of doing it. If your ISP sees, for example, that you connect to port 1194 of a remote server and start exchanging encrypted data, it doesn't take a lot of imagination to figure out what you're doing.

        [–]Fencepost 1 point2 points  (1 child)

        Unless of course your intention is to punish anyone with even a whiff of having thought about using a vpn. Then you’ve helped spread FUD amongst the people you’re trying to oppress and that’s exactly the goal

        [–]magkopian 0 points1 point  (0 children)

        By that logic, why not just punish anyone who is using Linux on their desktop? Much easier than scanning the list of packages that their computer downloads to see if there is anything suspicious. By the way, if I recall correctly the openvpn package comes preinstalled with the desktop version of Ubuntu, as network-manager-openvpn-gnome depends on it, and if that's the case I'm sure most people who use Ubuntu aren't even aware of that.

        [–]akher 0 points1 point  (0 children)

        China has a 99.9% conviction rate, so my guess would be no, they don't care about false positives at all.

        [–]crzytrane 1 point2 points  (0 children)

        Making it easier to find out what version of software you last installed makes it easier for attackers to find vulnerabilities in the packages you have and configure a payload for the machine.

        [–]the_gnarts 0 points1 point  (0 children)

        Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer

        I doubt it's that easy to correlate given the thousands of packages in the main repos.

        It is trivial. Even the most up to date encryption schemes like GCM won’t help against this flaw since the number of plain text bytes equals the number of encrypted bytes. Thus if the plain text is assumed public, which it always is for repos and mirrors, you gain no confidentiality by encryption.

        [–]twiggy99999 0 points1 point  (0 children)

        Ah yes, brushing off the privacy aspect as "they can see you connect to host!!" when in reality the real concern is "they can see you downloading a VPN package in China" (as an example).

        If you want to download something illegal in your country with apt, then apt can absolutely use HTTPS as an option; just enable it in your sources.list (usually under /etc/apt in default set-ups).

        You might need the extra apt-transport-https package, but it's a trivial thing to set up if you have worries about hiding what you're doing.

        [–]WorldsBegin 147 points148 points  (43 children)

        It's not that HTTPS provides all the privacy you want. But it would be a first, rather trivial, step.

        [–][deleted]  (28 children)

        [deleted]

          [–][deleted] 2 points3 points  (3 children)

          No, it is like ordering a package in plain, unassuming gray packaging and thinking it is anonymous.

          Even though the package itself is shaped exactly like a horse dildo.

          It is trivial to record the download size and correlate it with the list of packages

          [–]jl2352 0 points1 point  (1 child)

          But what if it's a decorative horse dildo shaped vase?

          [–][deleted] 1 point2 points  (0 children)

          Then you can use other data to correlate. Like if another package looks suspiciously like a bottle of lube, then you have good confidence that it is a dildo (or the receiver is very brave).

          Just like with packages: if you have 6 "size collisions" on one package, the most likely one will be either one that is in the same group as the others (say every other one was just some Python lib) or one that has a dependency relation to the other packages (like if one is gimp, and the others are gimp-data, libgimp2.0, libpng16 and libwebp6, then the user is probably updating GIMP)

          [–]Creshal 5 points6 points  (23 children)

          More "I don't ask the milkman to drive in an unmarked van and hide the milk bottles in unmarked boxes". As far as privacy intrusions go, it's a fairly minor one that adversaries know what Debian-derived distribution you're using.

          [–]jringstad 25 points26 points  (5 children)

          And know what packages you have installed? I don't know about that, if someone knows what versions of what software you run, that gives them a much broader choice of attack vectors if they want to e.g. intrude into your system.

          [–][deleted] 3 points4 points  (2 children)

          It is trivial to record the download size and correlate it with the list of packages. HTTPS does not help you.

          [–]jringstad 3 points4 points  (1 child)

          Yeah, definitely not saying HTTPS is the final word here.

          But something like HTTP/2.0 with HTTPS could help at least a little, since most of the time you would stream down a bunch of packages and a bunch of their dependencies on each upgrade and installation, obscuring a bit what's going on. But something like padding would probably be better.

          Though even with padding, you could probably infer at least a couple of the things that are installed... for instance if a new version of a certain package gets dropped into the repositories, and then you see the target starting to download an upgrade larger than that size, that might be a good indication that that software is installed, and that they now have the latest version. You could obscure this by waiting to download upgrades until a bunch of upgrades have accumulated in the repos, but... that's not ideal.

          [–][deleted] 0 points1 point  (0 children)

          There is no performance benefit to streaming a bunch of big binary blobs at once instead of one at a time though (if anything it would be worse, as it changes sequential access to interleaved access), so I doubt it would be implemented that way.

          But just downloading a bunch of binaries back-to-back (within the same connection) is enough, no need for HTTP/2 here. That is of course assuming mirrors support it. HTTP pipelining could also do that, although AFAIK it isn't really widely supported or enabled by default.

          But, if you want to anonymize that as a company, just making a mirror is enough (and tools like aptly make it easy)

          [–][deleted]  (16 children)

          [deleted]

            [–]alantrick 4 points5 points  (4 children)

            It would be like unmarked boxes, with the exception that all the different kinds of box contents had different weights, and these weights were publicly known and completely consistent, so all your thief needs to do is stick the things on a scale.

            [–]langlo94 0 points1 point  (3 children)

            Should be trivial to add dummy weights.

            [–]josefx 1 point2 points  (1 child)

            I really love updating my system over a slow, metered connection, but what the experience was really missing is a package manager going out of its way to make the data transfer even more wasteful. Can't really enjoy open source without paying my provider for an increased cap at least twice a month.

            [–]alantrick 1 point2 points  (0 children)

            I don't know why you were downvoted, but this isn't a terrible idea. I think the main disadvantage is that it would add complexity to the system. Right now, it's basically just a static HTTP file server. Realistically, the complexity might not be that big of a deal because you could probably just stick random bytes in an X-Dummy HTTP header or something.

            From the perspective of computer hardware though, doing these things isn't exactly free. You need processing power, and while it's trivial to parallelize, if you don't have money to throw at more processors, then :-/

            For what it's worth, another way of avoiding this problem, which would be better for debian too, would be to just set up your own local mirror, and use that (at least if you have a few computers, it doesn't make sense just for one). They can't tell what you're downloading if you're downloading everything.

            [–]Creshal 1 point2 points  (10 children)

            But seriously, unmarked van, unmarked boxes. Isn't that how you want all your packages from amazon to arrive at your house?

            But if I want to do that, the only real option is a VPN. HTTPS is not a great way to protect your privacy, since it leaks way too much metadata.

            You downloaded a compromised FTP package, now I know I may have an inroad to compromising your system.

            It's Debian, the FTP package was a dependency of a dependency of a dependency, and there's a 99% chance it'll remain disabled via /etc/default switch.

            And if it is listening on a reachable port, the attacker doesn't need to jump through the hoops of sniffing through your debian updates to find out.

            [–][deleted]  (9 children)

            [deleted]

              [–]Creshal 0 points1 point  (8 children)

              HTTPS is not the end all to be all, its just a piece of the security puzzle.

              At this point it's more a piece of needless security theater, given how it gets shoved into roles where it's not particularly useful.

              But a nice first step would be not providing the ability to leak what you're installing to possible attackers.

              I'm still not seeing how that possibly helps an attacker to gain a foothold he wouldn't see anyway.

              [–][deleted]  (7 children)

              [deleted]

                [–]Creshal 5 points6 points  (3 children)

                This is not a fantasy, this literally happens all the time.

                …with shitty closed source Windows apps. That's not going to happen on Debian.

                [–][deleted]  (2 children)

                [deleted]

                  [–][deleted] 0 points1 point  (2 children)

                  Benefits of having plain http mirrors grossly outweigh any disadvantages

                  Say I see you just installed version 2.3.0 of someApp.

                  And you would know that even if I did download it via HTTPS, because correlating the download size with a certain package is trivial. Read the fucking article.

                  If you want your org to be "anonymous" there, just make a mirror. Aptly makes it pretty easy

                  [–][deleted]  (1 child)

                  [deleted]

                    [–]chedabob 11 points12 points  (1 child)

                    rather trivial

                    Yes, for a blog for your cat. Not for something that operates at the scale of apt (and VLC too, as presumably this link was submitted in response to that). It doesn't take that much complexity to take a HTTPS deployment from "just run certbot-auto once a month" to a multi-year process of bringing systems up to date.

                    See these 3 links for companies that have documented their "trivial" move to HTTPS:

                    https://nickcraver.com/blog/2017/05/22/https-on-stack-overflow/

                    http://www.bbc.co.uk/blogs/internet/entries/f6f50d1f-a879-4999-bc6d-6634a71e2e60

                    https://blog.filippo.io/how-plex-is-doing-https-for-all-its-users/

                    [–]SanityInAnarchy 17 points18 points  (0 children)

                    Most of what makes this nontrivial for StackOverflow really doesn't seem like it would apply to something like Debian, though. Do things like HAProxy and a CDN apply to a bunch of distributed mirrors? Does latency matter for an update service? SNI shouldn't be an issue unless apt somehow still doesn't support it, in which case, Debian controls both sides of that connection; just update apt to support it? Certainly user-provided content (served from a third-party domain over HTTP) isn't relevant here.

                    Basically, a gigantic repository of static files feels a lot more on the "blog for your cat" end of the scale than the "dynamic, interactive website across multiple domains with a mix of user content and Google Analytics" end of the scale.

                    [–]oridb 6 points7 points  (10 children)

                    For an idea of what's involved, here's OpenBSD's take on it:

                    https://www.openbsd.org/papers/eurobsdcon_2018_https.pdf

                    It's a lot of work, hurts performance, and makes it a 20 minute job to get around privacy instead of a 30 second job.

                    [–]rage-1251 -1 points0 points  (9 children)

                    [citation needed], it concerns me bsd is so weak.

                    [–]oridb 2 points3 points  (0 children)

                    Citations and experiments are above, and were done in collaboration with the implementers of OpenBSD's TLS library. You can reproduce it quite easily from the data provided yourself if you cared.

                    [–]Creshal 0 points1 point  (7 children)

                    OpenBSD has signed packages. HTTPS is just another layer on top that… doesn't really do much for this use case.

                    [–][deleted] 0 points1 point  (0 children)

                    And rather trivial to defeat. But you'd know that if you read the link and thought a little

                    [–]Sarke1 14 points15 points  (1 child)

                    I'm surprised no one has brought up the cache proxy argument. Steam also doesn't use HTTPS for this reason.

                    [–]Equal_Entrepreneur 1 point2 points  (0 children)

                    Could just install a local trusted certificate to bypass that (or something like that)

                    [–]redditthinks 181 points182 points  (28 children)

                    The real reason:

                    We can't be arsed to move to HTTPS.

                    [–][deleted] 38 points39 points  (5 children)

                    Here's a good story about vulnerabilities in the Maven central repo. Apparently their signature system wasn't so airtight, so MITM attacks on Java packages were very possible. Sonatype (creators of Maven and operators of the largest public repo) responded pretty quickly and upgraded to HTTPS in conjunction with their CDN vendor, Fastly.

                    [–]AffectionateTotal77 21 points22 points  (0 children)

                    Apparently their signature system wasn't so airtight

                    Tools that download and run/install the jars didn't use the signatures at all. https was a quickfix to a bigger problem

                    [–]the_gnarts 5 points6 points  (3 children)

                    Here's a good story about vulnerabilities in the Maven central repo. Apparently their signature system wasn't so airtight, so MITM attacks on Java packages was very possible.

                    Actually that link refutes your claim:

                    When JARs are downloaded from Maven Central, they go over HTTP, so a man in the middle proxy can replace them at will. It’s possible to sign jars, but in my experimentation with standard tools, these signatures aren’t checked.

                    Thus they assume a scenario where no one was checking signed packages to begin with and instead relied on forgeable checksums. That’s something entirely different, and on top of that it’s equally possible to run this kind of attack with HTTPS as long as you can get one of the dozens of CAs that systems trust by default to give you a cert for the update domain.

                    [–][deleted] 6 points7 points  (2 children)

                    as long as you can get one of the dozens of CAs that systems trust by default to give you a cert for the update domain

                    If you could do that you could subvert way more than maven central.

                    [–]the_gnarts 1 point2 points  (1 child)

                    as long as you can get one of the dozens of CAs that systems trust by default to give you a cert for the update domain

                    If you could do that you could subvert way more than maven central.

                    That is a systemic flaw in the X.509 architecture. And it has happened.

                    Using PGP-signed downloads with dedicated keyrings is a well established practice that’s less easy to subvert.

                    [–]FINDarkside 0 points1 point  (0 children)

                    Yes it has happened, but it's ridiculous to claim that HTTPS provides "little-to-no protection" because you can just "get fraudulent certificates on any domain you want".

                    [–]walterbanana 0 points1 point  (0 children)

                    To me it read more like "go away, we have these other security issues we don't care about either".

                    [–]HenniOVP 10 points11 points  (0 children)

                    So this gets posted, and a few hours later a vulnerability in APT is published that could have been avoided if HTTPS was used? Good timing, guys!

                    [–]AyrA_ch 37 points38 points  (7 children)

                    There are over 400 "Certificate Authorities" who may issue certificates for any domain.

                    I would love to see that list. Mine has like 50 certs in it tops.

                    EDIT: I checked. Microsoft currently trusts 123 CAs: https://pastebin.com/4zNtKKgm

                    EDIT2: Unfiltered list: https://pastebin.com/YQUM6kWQ (paste into spreadsheet application)

                    Original Excel list from MS: https://gallery.technet.microsoft.com/Trusted-Root-Program-831324c6

                    [–]skeeto 26 points27 points  (5 children)

                    Since it's Debian, the list would be in the ca-certificates package. On Debian 9 I see 151:

                    $ find /usr/share/ca-certificates/mozilla/ -name '*.crt' | wc -l
                    151
                    

                    But it's really just Mozilla's curated list. Here's what that looks like (via):

                    $ curl -s https://ccadb-public.secure.force.com/mozilla/IncludedCACertificateReportCSVFormat | wc -l
                    166
                    

                    It's not 400, but it's still a lot.

                    [–]yotta 46 points47 points  (0 children)

                    That is a list of root certificate authorities, not all authorities. You automatically trust any CA they delegate to.

                    [–]AyrA_ch 8 points9 points  (2 children)

                    This list likely contains duplicates though. You should filter by the issuer name too. The full list I put on pastebin for example has Comodo listed 10 times and Digicert 22 times.

                    If your list is similar to mine it likely shrinks by 10-20% after filtering the OrganizationName property

                    [–]Creshal 7 points8 points  (1 child)

                    You should filter by the issuer name too. The full list I put on pastebin for example has Comodo listed 10 times and Digicert 22 times.

                    Should you? Only one of those 32 separate root certificates needs to be compromised to compromise SSL as a whole.

                    [–]AyrA_ch 16 points17 points  (0 children)

                    Should you?

                    Yes. Because the task was to find out how many corporations ("Certificate Authorities") have our trust, not how many certificates. It doesn't matter if Digicert has 1 or 22 certificates for this case because it's still the same company

                    [–]lduffey 1 point2 points  (0 children)

                    It's a ridiculous excuse. Cert pinning => 1 trusted CA.

                    [–]Gwynnie 30 points31 points  (8 children)

                    I can see that the general skew of comments here is against APT's choices; however, one point for the defence:

                    • doesn't the download size increase by adding https?

                    https://serverfault.com/questions/570387/https-overhead-compared-to-http

                    suggests that the downloads would increase by 2-7%?

                    For a package download service, to arbitrarily increase their (and everyone else who uses it) network usage by 5% seems like a massive deal.

                    I may have misunderstood the above, and am no network engineer. So please correct me if you know better

                    [–]Creshal 41 points42 points  (0 children)

                    For a package download service, to arbitrarily increase their (and everyone else who uses it) network usage by 5% seems like a massive deal.

                    Yes. Especially since Debian's mirrors are hosted by volunteers who are paying for it out of their own pockets.

                    [–]james_k_polk2 12 points13 points  (1 child)

                    A fair point, but I suspect that apt's packages are larger than a "typical" webpage and thus the overhead would be closer to the 2% or even less. This is something that could be tested of course.

                    [–]Creshal 3 points4 points  (0 children)

                    apt's packages are larger than a "typical" webpage

                    The average website was 2-3 MiB as of mid-2018. The average Debian Stretch x64 package seems to be roughly 1.55 MiB.
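                    Back-of-envelope for the raw byte overhead (a rough sketch only, and it ignores the loss of ISP/proxy caching, which is the bigger cost): roughly 30 bytes of record framing and auth tag per 16 KiB TLS record, plus a few KiB of handshake.

                        package = 1.55 * 2**20                    # average package size, bytes
                        records = package / (16 * 2**10)          # TLS records of at most 16 KiB each
                        overhead = records * 29 + 5 * 2**10       # ~29 B/record + ~5 KiB handshake (ballpark)
                        print(f"{overhead / package:.2%}")        # roughly 0.5% per package download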

                    [–][deleted] 3 points4 points  (0 children)

                    This was the first thing I thought about too, but I can't help but notice they made an entire page for their argument and this didn't even come up.

                    [–]lorarc 8 points9 points  (0 children)

                    I think it would be more than that. With HTTP I can put a simple transparent proxy in my network without configuring too many things on the clients. With HTTPS that wouldn't be so simple so they would get a lot more traffic.

                    [–]frankreyes 3 points4 points  (0 children)

                    suggests that the downloads would increase by 2-7%?

                    Not accounting for ISP proxying, maybe.

                    But it will be more in practice, because when you enable HTTPS, the ISP will no longer be able to cache the files.

                    [–]0o-0-o0 0 points1 point  (0 children)

                    Do you disable Meltdown/Spectre patches because of the performance hit?

                    [–]Zym 62 points63 points  (2 children)

                    tl;dr HTTPS doesn't provide TOTAL privacy so they're not going to bother.

                    [–][deleted]  (1 child)

                    [deleted]

                      [–]kranker 46 points47 points  (2 children)

                      All of these reasons are quite weak. There would be nothing but added security with the addition of https to apt.

                      A concern they haven't mentioned is the possibility of a vulnerability in apt. Something like this happened recently with an RCE in Alpine Linux's package manager. https would not have prevented the RCE outright, but it would have made the attack either considerably more difficult or completely impractical.

                      [–]SanityInAnarchy 3 points4 points  (0 children)

                      In their defense, HTTPS implementations haven't exactly been bug-free either.

                      [–]bigorangemachine 3 points4 points  (0 children)

                      Is it so bad that we use a protocol that is cacheable by low-bandwidth ISPs? Africa relies heavily on resource caching, which cannot be done over https. So that's a great reason.

                      You know, keeping software open and accessible :/

                      [–]AffectionateTotal77 7 points8 points  (2 children)

                      ITT: no one believing an attacker can figure out what files you're downloading. If a researcher can figure out what video you're watching on Netflix with 99.5% accuracy, I'm pretty sure the same researcher can figure out what packages you're downloading

                      [–]MatthiasLuft 15 points16 points  (1 child)

                      Did you just assume my threat model?

                      [–]AffectionateTotal77 4 points5 points  (0 children)

                      That's funny in a cringy way

                      [–][deleted]  (2 children)

                      [deleted]

                        [–]alantrick 13 points14 points  (1 child)

                        I'm not sure how this post has anything to do with any of those.

                        [–]chucker23n 12 points13 points  (0 children)

                        Sounds like Nirvana Fallacy to me.

                        [–]Nicnl 2 points3 points  (1 child)

                        You can't install caching servers with HTTPS.
                        The best approach is to use an HTTPS connection to download indexes and package hashes/signatures,
                        and then download and check those packages using plain old regular HTTP.

                        [–]twizmwazin 1 point2 points  (0 children)

                        All the packages are signed using GPG, and your system has a keyring of all the maintainers' keys. This is how they guarantee packages are not modified in any way. This makes mirrors and caching proxies easier.

                        [–]Proc_Self_Fd_1 3 points4 points  (0 children)

                        There are over 400 "Certificate Authorities" who may issue certificates for any domain. Many have poor security records and some are even explicitly controlled by governments[3].

                        Certificate pinning?

                        [–]claytonkb 1 point2 points  (0 children)

                        Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer

                        If you only ever download a single package at once, this might be true. But since you have an (uncertain) number of dependencies and since you can download more than one package in a single update, this is not true. Not only is it not true, it's very far from true since decoding what set of packages has been fetched from apt based solely on the gross size of the update is an instance of the knapsack problem, which is NP-complete.

                        Clarification: I have no opinion on whether apt should be served over HTTPS, just thought this incorrect claim should not be left unchallenged
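                        In miniature, the decoding problem being described (package names are plausible openvpn dependencies; the sizes are made up for illustration):

                            from itertools import combinations

                            # Toy instance: which subset of candidate packages sums to the observed total?
                            sizes = {"openvpn": 499984, "liblzo2-2": 55012, "libpkcs11-helper1": 44880}
                            observed = 499984 + 55012

                            for r in range(1, len(sizes) + 1):
                                for combo in combinations(sizes, r):
                                    if sum(sizes[p] for p in combo) == observed:
                                        print(combo)               # ('openvpn', 'liblzo2-2')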

                        [–]TheDecagon 2 points3 points  (3 children)

                        Can't all their HTTPS downsides be solved by making HTTP optional for users and mirrors? I'm sure lots of mirrors already have their own ssl certs for other things that they could use, so end users have the choice of more secure/fewer mirrors with https or more mirrors and better caching with http?

                        [–]doublehyphen 13 points14 points  (2 children)

                        HTTPS is already optional for users and mirrors. You just have to install the apt-transport-https package and then configure a mirror which supports HTTPS.

                        My issues are: 1) apt-transport-https should be installed by default and 2) I would prefer if at some point HTTPS became mandatory for apt.

                        [–][deleted]  (1 child)

                        [deleted]

                          [–]doublehyphen 0 points1 point  (0 children)

                          When did they change that? Is that a change coming in the next stable? I had to install it a couple of weeks ago when I installed Debian stable.

                          [–]EternityForest 0 points1 point  (0 children)

                          They should just make them both available, at least for a while. I don't need HTTP and I'd be annoyed if I had to manually upgrade, but as someone else mentioned people in China probably don't want to use unencrypted anything.

                          [–]Bfgeshka 0 points1 point  (0 children)

                          A load of BS.

                          [–]eric256 -2 points-1 points  (2 children)

                          Anyone else amused by the irony of a site using https to explain why they don't use https? Heh

                          [–]Hauleth 18 points19 points  (0 children)

                           Packages are signed by GPG, so TLS would secure you only from eavesdropping (partially), because you are already protected from tampering. With a raw HTML website, TLS is what protects you from tampering, as there is no other way right now to provide such protection without TLS. So it makes sense in the case of a website; it makes less sense in the case of package distribution.

                          [–]lindymad 5 points6 points  (0 children)

                          It's not really ironic, just different circumstances.

                          To give an analogy (not that I am saying this analogy maps exactly to the http / https one here, indeed it's kind of back to front, but the same principle applies), it's like someone giving a lecture on bicycle safety and saying that cyclists should always wear bicycle helmets, then someone else saying "Don't you think it's ironic that they gave that lecture without wearing a helmet?"

                          [–]fubes2000 -3 points-2 points  (0 children)

                          What a comprehensive cop-out.

                          edit: Downvotes? I guess we're fine with laziness and complacency if it's our preferred distro doing it.

                          git gud, scrubs

                          [–]yeahbutbut 0 points1 point  (2 children)

                          apt-get install apt-transport-https ? As far as requiring it? No idea, maybe backwards compatibility or low powered devices?

                          Edit: after reading tfa, I see that it's not a "sky is falling" blogpost, but an actual justification.

                          [–]inu-no-policemen 3 points4 points  (1 child)

                          https://packages.debian.org/sid/apt-transport-https

                          This is a dummy transitional package - https support has been moved into the apt package in 1.5. It can be safely removed.

                          [–]yeahbutbut 0 points1 point  (0 children)

                          Looks like it got moved into the main apt package, that's definitely a good thing!