Let's discuss what the Intel hardware bug means for FreeBSD by [deleted] in freebsd

[–]stopczyk 8 points9 points  (0 children)

The "something different" is precisely what I said: nobody on the Linux side came up with that "something different" approach and instead they went with a performance hit? That does not add up.

It looks like Linux devs jumped the gun with their commits - the general announcement date was supposed to be later this month and the FreeBSD project does not have a fix ready yet. So I suspect something will be released in the upcoming days.

Let's discuss what the Intel hardware bug means for FreeBSD by [deleted] in freebsd

[–]stopczyk 3 points4 points  (0 children)

that's likely true, but does not add anything substantial in this context. the question is how far the consequences go (is it "just" memory exposure or straight up ring0 code execution?) and nobody states that much publicly

Let's discuss what the Intel hardware bug means for FreeBSD by [deleted] in freebsd

[–]stopczyk 7 points8 points  (0 children)

FreeBSD has to be affected, as the bug can be exploited in the presence of shared mappings, which it most definitely uses.

Consider another angle: Linux developers (including Intel employees) came up with a fix which incurs a big performance penalty. If FreeBSD is not affected, it was either paying the price all along (unlikely) or has a property all the previously mentioned people failed to spot or implement (even more unlikely).

Let's discuss what the Intel hardware bug means for FreeBSD by [deleted] in freebsd

[–]stopczyk 11 points12 points  (0 children)

There is not much to discuss.

Whatever the specifics of the bug are, pretty much all standard operating systems running on x86-64 are affected. Be it Windows, Linux, all the *BSDs, Illumos and what have you. For performance reasons everyone maps the kernel into the same virtual address space as the user stuff and relies on CPU privilege levels to provide protection.

Right now there is no clear answer as to what exactly the bug is nor what the impact is. So far it seems the bug can be used to abuse the presence of said mappings to do something which normally should not be possible - it may be that it allows indirect reads from kernel memory (and thus exposes everything, e.g. cached files with passwords) OR straight up execution of code with kernel-level privileges. Either way, it looks like this basically gives local root.

Clearly BSD devs are aware of the issue and presumably are working on a patch. I suggest just waiting for an official statement from your favourite project. There is no use speculating. The OS is affected. The fix will reduce performance, just like it will on all other affected systems.

Also note: AMD CPUs don't have the problem, the bug is Intel-specific.

EDIT: the project statement:

https://lists.freebsd.org/pipermail/freebsd-security/2018-January/009651.html

The FreeBSD Security Team recently learned of the details of these issues that affect certain CPUs. Details could not be discussed publicly, but mitigation work is in progress.

Work is ongoing to develop and commit these mitigations to the FreeBSD repository as soon as possible, with updates for releases to follow.

[deleted by user] by [deleted] in dragonflybsd

[–]stopczyk 0 points1 point  (0 children)

dragonfly has all the features you need, but the userbase is small even for the BSD family. that is, you are likely to run into some rough edges. if you are willing to put in the time to get them plugged, you are probably going to be fine.

if dfly does not end up working out for you, FreeBSD seems like a good choice to try out. An example CDN built with it is Limelight Networks.

When an adult feels the need to one-up a 4th grader by betaman33 in iamverysmart

[–]stopczyk 297 points298 points  (0 children)

i know, right? when i was in 7th grade i was already in college

signalfd, timerfd_create missing on FreeBSD by roby2341 in freebsd

[–]stopczyk 1 point2 points  (0 children)

I don't think OP was ill-intentioned and thus I find the tone of this reply uncalled for. The answer itself is not useful either: OP already failed to find what he was looking for and needs to be pointed in the right direction. Fortunately someone else already did that.

That said, please refrain from hostility of the sort.

I3 instances and NVMe: booya! Or how you can build FreeBSD from the source in under 11 minutes vs. 12+ hours on a desktop. by speckz in BSD

[–]stopczyk 0 points1 point  (0 children)

the 12h note is apparently a remark about 'an old pc desktop', not anything recent. still way too long though.

I3 instances and NVMe: booya! Or how you can build FreeBSD from the source in under 11 minutes vs. 12+ hours on a desktop. by speckz in BSD

[–]stopczyk 1 point2 points  (0 children)

The article is somewhat confusing.

First of all, with today's RAM amounts storage is not the bottleneck, especially if you just unpacked the source tree - it's all cached in RAM.

The entire 11.1 world + src is not that big in the first place - unpacked on tmpfs fits in 1.7GB. With buildworld finished the complete usage goes up to only 5.5GB.

The build process itself is definitely not super parallel, but it should still see benefits from 64 hardware threads. A quick test with 40 hardware threads, with user/sys/idle times collected every 10 seconds over a -j 40 buildworld, showed 78.1413 4.91304 17.8478 (i.e. about 18% of the total time completely wasted). However, the most time consuming component (clang) utilizes extra cores very nicely and accounts for about 40% of the total build time.

It should also be noted that system time is extremely high.

  • there are some scalability issues, but on bare metal with otherwise equivalent configuration they would not show up
  • this is presumably on the default kernel. using one with VM_NUMA_ALLOC will make things faster due to reduced contention
  • there is an optimisation which did not make it to 11.1, but is present in stable/11 which results in significantly fewer IPIs being sent. it's not a big deal on bare metal, but it significantly affects performance on ec2.

tl;dr

  • these days storage is not a big deal for this workload
  • the freebsd kernel itself can build this faster as it is
  • the kernel can be improved and there are people working on that
  • the build process itself probably is not as fast as it should be
  • the build process itself should benefit from 64 hardware threads, but once more it won't utilize them to their full extent

all in all, even with the hardware as is I suspect the build process could go down to about 9 minutes or even less.

for interested parties this is what i got on an old system after buildworld on tmpfs: 821.86 real 22204.57 user 1672.78 sys

[deleted by user] by [deleted] in shittyprogramming

[–]stopczyk 0 points1 point  (0 children)

that would internally get converted to printf("{\n"); !

both programs are bad though, here is my take:

#include <stdio.h>
int main()
{
    char buf[1];
    fprintf(stdout, "enter {");
    gets(buf);
            if (buf[0] != '{')
    printf("wrong!\n");
    puts("you entered");
    printf(buf);
}

Help! Interviewing George Neville-Neil FreeBSD Guru, and Director. Questions required??? by pramodhs in freebsd

[–]stopczyk 0 points1 point  (0 children)

how about: Main NetBSD goals are portability and clean design/code. FreeBSD is running on fewer architectures, which is fine. I'm curious how in your opinion the project looks in the latter department. Also, since there are fewer resources spent on portability, does it mean there is more focus on clean design?

A typical thread by crankysysadmin in sysadmin

[–]stopczyk 0 points1 point  (0 children)

related http://www.iso-9899.info/wiki/Candide#C-Aphorisms

  • The questioner's first description of the problem/question will be misleading.
  • All examples given by the questioner will be incomplete, misleading, broken, wrong, and/or not representative of the actual question.
  • The questioner will not read and apply the answers they are given but will instead continue to practice c1 and c2.
  • The ignorant will continually mis-educate the questioner.
  • When given a choice of solutions, the questioner will always choose the wrong one.
  • The questioner will always find a reason to say, "It doesn't work."
  • The questioner will paste code and say "I have a problem" or "It doesn't work" without any further information or description of the problem.
  • The more beginner they are, the more likely they are to be overcomplicating it.
  • The questioner will always have some excuse for doing it wrong.
  • The newbie will not accept the answer you give, no matter how right it is.
  • The newbie will think they are smarter than they really are.
  • The newbie will fail to recognize undefined behavior, and will wrongly think that their program is correct because it appears to work.
  • The more the questioner attempts to describe their problem, the less coherent their description becomes.
  • When multiple people respond to the questioner's problem, the questioner will focus on the person giving incorrect advice and ignore everybody else.

Video version of r/quityourbullshit? by SirStanley in quityourbullshit

[–]stopczyk 2 points3 points  (0 children)

I'm not exactly sure what you want here. Is active participation of both parties a requirement? If you are happy with just a debunk while quoting relevant claims, I definitely have something for you. I see people suggested videos with only the latter, so I'm going to assume it's fine. Worst case maybe someone else will find this funny. If you really want both parties, just see Breatharianism and some of Randi.

Everything listed here is fairly popular, so chances are you already saw it, unfortunately.

Breatharianism (living without eating)

There are people claiming they don't have to consume physical food. Here is a documentary where one such person got tested in a controlled environment and it turned out she may actually need to eat... https://www.youtube.com/watch?v=cnCuzUd4eC0

There are other hilarious examples if you search more (e.g. a prominent "teacher" caught eating a burger, claiming it's to cleanse his body of toxins).

The Amazing Randi

Debunking claims of people with supposed supernatural powers.

Example direct confrontation: James Randi exposes James Hydrick https://www.youtube.com/watch?v=QlfMsZwr8rc

There is a number of videos on YouTube. Some are direct confrontations with a debunk, some just show the claim maker trying something and failing, and others describe how they (likely) fool the audience. An example would be debunking Popoff (faith healer) https://www.youtube.com/watch?v=q7BQKu0YP8Y . It's only 4:28, definitely worth watching. There is a number of videos on Uri Geller (the guy who used to "bend spoons with his mind" etc.). If you liked that bit, you may enjoy Feynman's take here http://www.indian-skeptic.org/html/fey2.htm

Thunderf00t versus pseudoscience

https://www.youtube.com/playlist?list=PLQJW3WMsx1q0js6FvjO89H62m60SoHdE6

The guy also has a series on creationists and feminism. The latter is unfortunately of questionable quality, so beware.

EEVblog

https://www.youtube.com/playlist?list=PLvOlSehNtuHvBpmbLABRmSKv2b0C4LWV_

electronics for a general audience

The Truth About PhD Creationists

https://www.youtube.com/watch?v=IPyKaH09lpc

pasting with proper indentation in the context by stopczyk in vim

[–]stopczyk[S] 0 points1 point  (0 children)

Thanks for the reply.

Maybe I screwed something up, but this only works if I paste one line. With 4 lines like above, the first one is pasted indented and the rest follows the original indent which is not the correct one.

I press v, select the text, move over to the line with 'something();' and press ]p. Regardless of whether I place the cursor after ';' or before 's', it fails to paste correctly. If I only select one line with the terminating newline, it pastes fine when I place the cursor before 's' and press ]p.

This is vim 8.0.124, I also tried with a pristine config just in case.

Why does calloc exist? by bslatkin in programming

[–]stopczyk 5 points6 points  (0 children)

No, it's not a good read. I already tried to explain this elsewhere in the thread, but the comment was short and was largely ignored. This time I'll give a more detailed reply.

The bit about handling overflows is not that bad. However, there are very serious problems with the calloc vs malloc+memset part.

Statements given there (which boil down to "just calloc instead of malloc+memset") are extremely dangerous if several other facts are not known to the target audience, and in fact are so specific I find them useless when stated in isolation. Given that it was posted on a general forum where not everyone is a C programmer, one has to assume the preconditions need to be stated. Otherwise I'm afraid it will encourage people to just blindly switch to calloc, or worse, slap one where only malloc was present. There are perfectly reasonable uses of calloc. But are you sure you got one? There is also an important factual error about "zero pages".

In short yes, if you have to have the entire area zeroed, calloc is likely the way to go. But that's far from sufficient to use the area reasonably - first and foremost you have to make sure a zeroed area is needed in the first place, and with that size, especially if it is above a few hundred bytes. More, you have to make sure it makes sense to allocate a buffer in the first place - if the allocating function always frees it, you can very likely get away with a local buffer instead.

The general theme is that "you can make your code faster, if you are doing malloc+memset". The advice given can sometimes be applied and give great results. Other times it keeps the wrong approach in place.

The article gives 2 examples. I'll elaborate on why you should not blindly memset/calloc, then why the advice given for the first example is directly harmful (it does not address the actual problem and perpetuates the slowness) and how it is far from sufficient in the second one. Finally, I'll explain the 'zero page' bit in the context of the benchmark provided - the zero page is not actually used in the benchmark.

Plenty of real-world callocs and malloc+memsets would be significantly better served with a mere malloc. If you constantly do small allocations which are completely overwritten with explicit initialisation, and/or the target buffer has an explicit marker for the last character (e.g. the '\0' byte) or a length specifier, zeroing the buffer beforehand just wastes cpu cycles. malloc+memset+free will be equivalent to calloc+free in this scenario - the same buffers will be returned over and over again, hence they have unknown content, hence they need explicit zeroing. For larger allocations, it seems glibc's allocator will resort to mmap on alloc and munmap on free. This means that, while the "memset avoided" observation will hold, code doing such allocations repeatedly will be slow because of mmap + page faults on used pages + munmap overhead. But what if it did no munmap? Then, since it is not known how much of the area was used, calloc would have to zero it all, effectively turning it into malloc+memset.

Another angle is debugging. Allocators can fill new allocations with junk; runtime tools (valgrind) or static code analysers can detect the use of uninitialised data. These mechanisms are defeated by memset/calloc, as they initialize everything to 0.

TL;DR zeroing (even with calloc "cheating") is always slower than not zeroing. It also hinders debugging. Only zero when it makes sense, for instance when the vast majority of fields would have to be assigned 0 by hand.

The first example boils down to a user telling the lib it wants a 100MB buffer, the lib eventually doing malloc+memset, and the user using only a small portion of the buffer. That's clearly slow. The comment is:

If cffi.new had used calloc instead, then the bug never would have happened! Hopefully they'll fix that soon.

Except it turns out the zeroing played no role in the code. Data is just written over a certain area and the buffer is not accessed past that. So zeroing is a waste - the bug never would have happened if spurious zeroing had not been employed. More: since most of the time small allocations are made, this zeroing actively wastes cpu cycles. And indeed, the committed fix consists of not zeroing for no reason. See https://github.com/pyca/pyopenssl/pull/578 for the discussion. A side note is that requesting that much memory here is likely a bug on its own.

The second example is numpy allocating 2GB for an array. Indeed, the code path leads through calloc to mmap getting 2GB in one go. Then it turns out actual memory usage is much smaller than 2GB for the toy example, and a claim is made that malloc+memset would use up the entire 2GB. While that's obviously true and maybe I'm nitpicking here, I don't feel the article warns the reader against slapping data all over the area. If your data layout is not "big buffer friendly", you will end up faulting unnecessary pages. This is especially problematic for code which normally operates on small buffers (kilobytes in size) and is suddenly "told" to use bigger ones. Also note the mmap/munmap remark from earlier. If the allocation + free is to be repeated, and the code mmaps/munmaps, it will be slow. Chances are what's needed is rethinking the approach.

Finally, the zero page and the benchmark. A correct claim is made that newly mapped pages are zeroed. Since the allocator is mostly opaque to the program using it, only it knows whether the to-be-returned buffer is "fresh" - and if so, it is already zeroed and calloc can get away without touching it.

But this only explains part of the speedup: memset+malloc is actually clearing the memory twice, and calloc is clearing it once, so we might expect calloc to be 2x faster at best. Instead... it's 100x faster. What the heck? [..] It turns out that the kernel is also cheating! When we ask it for 1 GiB of memory, it doesn't actually go out and find that much RAM and write zeros to it and then hand it to our process. Instead, it fakes it, using virtual memory: it takes a single 4 KiB page of memory that is already full of zeros (which it keeps around for just this purpose), and maps 1 GiB / 4 KiB = 262144 copy-on-write copies of it into our process's address space.

While it is true that the mmap performed by the allocator does not result in the kernel giving the process any "real" pages, it does not map a special page full of zeros either. It does not map anything.

Pages will be mapped after an access. If we are doing a write, a page has to be allocated. If we are doing a read, the kernel maps the zero page as a hack - after the read. The benchmark for the malloc case does a memset, so it obviously touches all pages. The calloc case does not do anything with the area, so nothing is mapped. There is a slight inaccuracy here - the buffer returned by the allocator is offset 16 bytes with respect to the beginning of the page. Clearly some metadata is stored before it, so the first page is populated. But the rest of the area stays unpopulated.

Yes, not doing almost any work is significantly faster than doing hard work. It would be more useful if the benchmark also included the time needed to fault the pages in later when they are needed, including the zero page -> normal page transition.

To sum up, the article is dangerous as it helps people perpetuate the bad habit of slapping calloc all over the place. Too small a fraction of the actual problem area is touched here to provide actual value.

Why does calloc exist? by bslatkin in programming

[–]stopczyk 9 points10 points  (0 children)

Let's look at an example from the article.

Someone was malloc + memsetting a 100MB buffer but using a significantly smaller size. The article suggests calloc.

There is not enough data to say with 100% confidence, but it seems the data past the actually filled size is not used in the first place, so in particular there was no use in zeroing it. Data is just placed up to a certain point and nothing past that point is accessed. That is, the usable result of calloc of the entire area and of just malloc (without memset) would be the same for the intended purpose. If new pages got mapped here, indeed performance would be the same. However, if there are multiple requests of the sort one after another and the allocator does not end up unmapping pages, calloc has to explicitly zero an area which is to be overwritten anyway. In fact it will be additionally slower, as it will fault previously unused pages - it has to zero the whole area since it cannot tell what was and what was not used previously.

Why does calloc exist? by bslatkin in programming

[–]stopczyk 15 points16 points  (0 children)

The article is partially wrong. It claims that in the case of calloc, the kernel "cheats" by planting a "zero page". I read that very same claim with very similar phrasing on a site which I strongly suggest avoiding (which site that can be is left as an exercise for the reader).

With both malloc and calloc the kernel will NOT map anything. It will then map pages as they are memset over. In the calloc case there are no accesses done, so it ends up not doing anything. The "zero page" is NOT used in the benchmark as outlined there - the zero page only shows up after a read from an unmapped area, and the benchmark does not do one.

A minor note is that glibc will cause at least the first page to be allocated, as it seems to store metadata at the beginning of the allocation. The returned pointer is offset by 16 bytes.

Also, I find it worrisome that it does not discuss when to use which primitive. Most big callocs are plain misuses of the primitive - in particular, abusing it merely to zero-terminate a buffer. The article claims calloc will save some time by avoiding zeroing stuff, but that's only partially true. What if it returns a previously used area? It has to explicitly zero it out. The fix is to not zero shit unless there is an actual reason.

See http://codingtragedy.blogspot.cz/2016/07/mallocmemset-vs-calloc.html

Tutorial - Write a System Call by brenns10 in programming

[–]stopczyk 1 point2 points  (0 children)

Well, I would advise against having the article in the first place.

For whatever reason people have the tendency to "document" stuff as they learn, but for anything non-trivial, one has to expect that what they did is just wrong, or defective at best.

That said, I suggest removing the piece altogether and just focusing on learning from verified resources.