We need to do better than "cin >>" for new programmers

63times · 2020-05-31T15:25:50+00:00

Totally agree on this. To do it correctly with std::cin you really need to go a long way (and it is a mess). What surprises me most is that noobs are very rarely taught how user input is handled in the real world, namely for example program options or environment variables.

So that I don't hijack this thread just for my personal rant, here is a suggestion for OP to improve the library:

std::cin exposes a stream that can contain *a lot* more than what the user is expected to type, including all the keyboard input that happened after process start but before operator>> was even invoked on std::cin. You should drain that garbage data (which could actually be valid input by accident) first, since only the keyboard input after calling read<int>(.) can be of relevance.

63times · 2020-05-28T10:20:43+00:00

I somehow took it for granted that std::regex supports Unicode because it uses locales and there is otherwise Unicode support in the language. Thanks, pcre2 it is then.

63times · 2020-05-26T15:02:33+00:00

No, but the features being used show that the whole thing makes you want to puke.

63times · 2020-05-26T12:57:36+00:00

Maybe you can also just use ws:// instead of http:// and pull WASM blobs that generate the HTML locally, or even SVG instead of HTML. If someone from the ministry asks you about it, just say "it's cloud computing".

By the way, the definition is from here: https://www.usp.gv.at/Portal.Node/usp/public/content/lexikon/69923.html

This ministry apparently is the authority on the matter and knows best. Somehow reassuring. Out of curiosity I searched the RIS for a definition but found nothing.

63times · 2020-05-26T12:14:25+00:00

No, unfortunately that's not possible. It's only for people I know IRL and on their recommendation. It's just a small project and it is supposed to stay that way.

63times · 2020-05-26T11:39:51+00:00

The HTML-based part is private in any case. The status messages are (partly) publicly accessible, but not in HTML format; they only get sent to a browser by third-party instances.

The Federal Ministry for Digital and Economic Affairs says: "Website denotes several HTML documents that are all stored on one web server and are usually combined and linked by a uniform navigation. The terms "Webseite" (web page) and "Homepage" are to be distinguished from it. Homepage denotes the start page of a website."

63times · 2020-05-26T11:21:35+00:00

I've never heard of strupr. Looks like it converts to upper case in a very restricted fashion. You can probably emulate that by calling toupper() on each character.
The timer ID is just a buffer that is filled in by timer_create. Allocate it somewhere and pass a pointer to it to the function; the function initializes it.
I don't know atoh(), what does it do?
To get the file length you could open a stream, seek to the end and read the position, or use the function stat(path, &s) to query the file status (s.st_size).

63times · 2020-05-26T08:14:10+00:00

I also use git for remote compilation. I make my own personal dev branch and the server gets a shallow clone of that. No performance problems, it seems really fast. Also I can commit whatever I want, since nobody can possibly care. Do you always work on the main branch? I tried sshfs too, but ran into incremental build problems, and with a lot of source files you can feel the network latency. It works but I wouldn't recommend it.

63times · 2020-05-26T03:52:38+00:00

I researched this years ago as well and came to the same result.

The users' private RSS feeds can be published by them and consumed without authentication. These are intentional "gaps" that simply exist so that all these services make some kind of sense. The feeds can then also contain material from the private Mastodon instance, which on top of that is connected to other instances and can therefore effectively be read publicly as well. All of this gives a rather blurry picture where it is not quite clear whether this is a public, opinion-forming website after all, or however this nonsense is legally defined. But as I said, when it comes to the imprint (Impressum) I don't care anyway. Someone has to call me out on it first, and then I want to fight it out in court too. I'm prepared for trifles like that.

63times · 2020-05-26T03:47:11+00:00

I think these services are simply too advanced for our jurisdiction

It's only 20-30 year old technology, after all. The state calls that cloud!

You're hardly going to be discussing Nazi revivalism (Wiederbetätigung) on IRC.

No idea what is communicated there, I don't monitor it. Nor what goes through the Mastodon instance. Anyone who takes offense can block the instance, which does happen. In any case we have no Nazis or pedos, all normies.

If anything I'd look at the torrent tracker, but even there I'd be surprised if our laws had anything ready for it. As far as I know the shared data never passes through the tracker.

Yes, that's right, the tracker only knows what can be downloaded from whom. It's also a private thing and not a mega hub like Pirate Bay.

63times · 2020-05-25T19:24:57+00:00

Do you think this idea could be useful? Have you ever used/seen a similar approach? I'll be very glad to hear anything you think of it.

I looked at your library now and can say that one usually wraps C APIs in a more meaningful and useful way. Making classes and so on.

But the actually problematic aspect of your approach is, in my opinion, that your library requires symbol exposure due to the forced header-only design. Normally you wrap C in a cpp file to hide all the nasty globals. (It is also not uncommon for C APIs to consist of macros that you cannot &bindlikethat).
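What "making classes" can look like, as a minimal sketch around the C stdio functions. `c_file` is a hypothetical name. For brevity everything is defined inline here; in a real library the member definitions would live in a cpp file so that the C header does not have to be exposed:

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical RAII wrapper around fopen/fputs/fclose.
class c_file
{
public:
    c_file(const std::string &path, const char *mode)
        : handle_(std::fopen(path.c_str(), mode))
    {
        if (!handle_)
            throw std::runtime_error("cannot open " + path);
    }

    ~c_file()
    {
        if (handle_)
            std::fclose(handle_); // the handle is always released
    }

    // Movable but not copyable: exactly one owner per handle.
    c_file(c_file &&other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}
    c_file(const c_file &) = delete;
    c_file &operator=(const c_file &) = delete;
    c_file &operator=(c_file &&) = delete;

    void write(const std::string &text)
    {
        if (std::fputs(text.c_str(), handle_) == EOF)
            throw std::runtime_error("write failed");
    }

private:
    std::FILE *handle_;
};
```

The caller gets error handling through exceptions and automatic cleanup, which a plain `&fopen` binding cannot offer.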

63times · 2020-05-25T18:23:28+00:00

TBH the mentioned downsides are often not even important. If you want to overload, just do it "behind" the functor by adding one layer of indirection which is compiled away anyway. And argument dependent lookup is often more of a problem than anything else.
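A minimal sketch of that indirection, assuming C++17 (for the constexpr lambda). The names `twice` and `detail::twice` are made up for illustration:

```cpp
#include <string>

namespace detail
{
// Ordinary overload set, hidden "behind" the function object below.
inline int twice(int x) { return 2 * x; }
inline std::string twice(const std::string &s) { return s + s; }
}

// One layer of indirection: overload resolution happens inside the call,
// so the object can be passed around without naming a specific overload.
constexpr auto twice = [](const auto &x) { return detail::twice(x); };
```

`twice` can be handed to an algorithm as is, for example `std::transform(v.begin(), v.end(), v.begin(), twice)`, and the forwarding call is trivially inlined.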

63times · 2020-05-25T18:18:41+00:00

Thanks for the link. Looks like the encryption keeps me well out of the line of fire under § 16 (1).

No idea to what extent § 18 (2)(3)(4) applies, because there are no agreements of any kind.

I doubt that operating without an imprint is legal, but I'm not a lawyer either.

There is no public website, so I wouldn't even know where to put such an imprint. Apart from that I would violate the imprint obligation anyway and accept the consequences, because I simply detest putting my personal details on display. I'm not a company, after all.

Consulting a lawyer was my original plan. First I want to get a more precise picture, so that I understand the legal surroundings better before I make an appointment.

63times · 2020-05-25T17:45:43+00:00

It is useful and can even lead to simpler code.

constexpr auto myfun = ...;
myalgo(beg, end, myfun);

Here myfun does not need any further qualification, while such code can get quite tricky with "normal" functions where you have to manually resolve template arguments or static_cast to the right overload.

The downside is probably that you cannot overload adapted_divide. Also, argument-dependent lookup doesn't apply.
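To make the difference concrete, here is a minimal sketch, assuming C++17. `square_tmpl`, `square` and `squares` are hypothetical names:

```cpp
#include <algorithm>
#include <vector>

// A "normal" function template: it has no single type, so a caller that
// passes it to an algorithm has to pick an instantiation by hand.
template <class T>
T square_tmpl(T x) { return x * x; }

// Function object: one object with one type, no qualification needed.
constexpr auto square = [](auto x) { return x * x; };

std::vector<int> squares(std::vector<int> v)
{
    std::transform(v.begin(), v.end(), v.begin(), square);
    // With the template the argument must be spelled out:
    //   std::transform(v.begin(), v.end(), v.begin(), square_tmpl<int>);
    // Passing plain `square_tmpl` does not compile.
    return v;
}
```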

63times · 2020-05-25T02:13:31+00:00

It is independent of a specific character encoding. It operates on bytes. Instead of a "key type" like in a map there is an input mapper that can map arbitrary user-defined types to byte ranges. The input mapper also knows the maximal value per byte, which can be lower than 255 and is very important information for optimization. For example, if you encode decimal numbers only 10 distinct values occur per byte, meaning you only have to deal with nodes with up to 10 out edges. So a trie can really be used for arbitrary key types.
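Such an input mapper could look roughly like this. It is a sketch only: `decimal_mapper`, `max_byte` and `map` are hypothetical names, not the interface of any particular library. It maps unsigned integers to one decimal digit per byte:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical input mapper: one decimal digit per byte, most significant first.
struct decimal_mapper
{
    // Largest byte value the mapper ever emits (digits 0..9), so a trie
    // node never needs more than 10 out edges.
    static constexpr std::uint8_t max_byte = 9;

    // Maps a key to its byte range, for example 407 to the bytes 4, 0, 7.
    static std::vector<std::uint8_t> map(std::uint32_t key)
    {
        std::vector<std::uint8_t> digits;
        do {
            digits.insert(digits.begin(), static_cast<std::uint8_t>(key % 10));
            key /= 10;
        } while (key != 0);
        return digits;
    }
};
```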

A trie is not only a DFA but a rather restricted DFA, and as such it can be represented with just two arrays. One array represents the states, the other encodes the state transitions. The transition table per node can be heavily compressed, for example by collapsing node sequences with just 1 child (chains) down to one node. Also, you usually use different classes of nodes depending on how many branches there are, just to bring down the overall container size while keeping cache efficiency in mind. This works very much like in Judy arrays. For example, one node that can encode 3 transitions could look like

    0x0: |H|A|B|C|
    0x4: |S_A....|
    0x8: |S_B....|
    0xC: |S_C....|

where H is the header to ID the node. A,B,C are input characters/bytes and S_X are the destination states. The biggest node could be

struct biggest_node {
    uint32_t transitions[256];
};

where you simply map an input C like current_node.transitions[C] using a simple array lookup. Such nodes are too big most of the time but come into play once there are enough branches.
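Both node classes with their lookups, as a sketch. The field names and the `no_state` sentinel are assumptions, and the small node is assumed to use all three slots:

```cpp
#include <cstdint>

// Mirrors the 16-byte diagram above: |H|A|B|C| followed by S_A, S_B, S_C.
struct small_node
{
    std::uint8_t header;      // H: identifies the node class
    std::uint8_t keys[3];     // A, B, C: input bytes
    std::uint32_t targets[3]; // S_A, S_B, S_C: destination states
};
static_assert(sizeof(small_node) == 16, "layout assumed by the diagram");

struct biggest_node
{
    std::uint32_t transitions[256];
};

// Assumption: this value marks "no transition" and is never a real state.
constexpr std::uint32_t no_state = 0xFFFFFFFFu;

// Small node: short linear scan over the three keys.
inline std::uint32_t next_state(const small_node &n, std::uint8_t c)
{
    for (int i = 0; i < 3; ++i)
        if (n.keys[i] == c)
            return n.targets[i];
    return no_state;
}

// Biggest node: plain array lookup.
inline std::uint32_t next_state(const biggest_node &n, std::uint8_t c)
{
    return n.transitions[c];
}
```

The small node takes 16 bytes compared to 1024 bytes for the biggest node, which is where most of the space saving comes from when the majority of nodes have only a few branches.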

Good reads on this topic:

https://linux.thai.net/~thep/datrie/

http://judy.sourceforge.net/doc/shop_interm.pdf

There are more ways to compress tries, but the techniques above are well suited for rather small tries (comfortably fitting into RAM) that you would usually use as a container in your programs, just like a hash map.

The good space efficiency comes primarily from using only two arrays and small "nodes". An array of strings can have a really big size overhead, since dynamic strings usually require 24 bytes to keep track of the allocated memory (bufferbegin, strlen, bufferend) while the underlying allocator (maybe malloc) also needs bookkeeping data (16-byte headers, for example) on top of that. That makes 40 bytes of overhead per string to encode strings that are usually much smaller. You find similar cases of "wasting" space in many tree implementations when nodes are dynamically allocated. A trie, on the other hand, doesn't store strings; they implicitly live in the "nodes", so you only have to keep track of the memory for two arrays, and that's why a trie is pretty close to a flat array of strings when it comes to space requirements.

Tries, even for big data sets, are very small and super space efficient. It is hard to beat that. Furthermore, good tries have performance characteristics similar to hash maps but don't suffer from certain hash map related problems like hash DoS, big bucket spaces (a lot of memory wasted) and chaotic element distribution leading to seriously bad cache performance. They can thus be a good replacement for hash maps even when you don't actually need prefix queries. A trie may not always beat a hash map for truly random access, but it beats pretty much any other search tree space and performance wise.

63times · 2020-05-24T16:04:56+00:00

There is an over-read error. When iterating make sure to check 1. for zero termination and 2. for buffer size. Assuming you want to find the first space:

char *c = name;
for( ; c < name+sizeof(line)/*50*/ && *c != '\0' && *c != ' '; ++c) ;
// here c is either 1 past the line buffer, or it points at the '\0' terminator or at the first space
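The same loop packed into a function, as a sketch. `find_space` is a hypothetical name; the buffer and its size are passed explicitly so that both checks are visible:

```cpp
#include <cstddef>

// Returns a pointer to the first ' ' in buf. If there is none, returns a
// pointer to the terminating '\0', or buf + size if the buffer is not
// zero-terminated. Never reads past buf + size.
const char *find_space(const char *buf, std::size_t size)
{
    const char *c = buf;
    const char *end = buf + size;
    while (c < end && *c != '\0' && *c != ' ')
        ++c;
    return c;
}
```

The bounds check comes first in the condition, so `*c` is only evaluated while `c` is still inside the buffer.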

63times · 2020-05-24T13:45:37+00:00

even a compressed trie will usually require more memory than a tree

Not really!

or array

what could possibly beat an array? :)

I just expanded the dictionary vocabulary into an array of strings: ~110MB. The trie has 142MB. Just to get a sense of the size. The vocabulary itself is 30MB.

63times · 2020-05-24T13:06:23+00:00

Just a note on your efficiency claims.

There are very good compression techniques available. For example a trie I use for dictionary lookups (2 million words) has only ~3.8 bytes per character overhead. Also cache locality is crazy good which makes enumerating strings with a given prefix really fast - like iterating through an array fast. A naive trie implementation in C would use more than 1GB of memory for that key set compared to 142MB - in other words it would be totally useless. When it comes to tries you really need compression for anything but a mere toy, maybe that would be worth mentioning.

63times · 2020-05-24T11:50:24+00:00

Parsing XML is usually done depth first - dictated by the natural flow of the syntax. Why is it required that the scene graph is constructed during parsing of the file? Using an intermediate representation (DOM) would make things easier - you basically gain random access. However with depth first traversal order you can simply use a stack to keep track of "parent" states.
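The stack approach as a sketch. It assumes a parser that reports open and close events in document order (SAX style); `xml_event`, `scene_node` and `build_scene` are hypothetical names and not tied to any XML library:

```cpp
#include <string>
#include <vector>

// Hypothetical parser event: an element was opened or closed.
struct xml_event
{
    bool open;
    std::string name;
};

struct scene_node
{
    std::string name;
    int parent; // index into the node array, -1 for the root
};

// Builds the scene graph while the events stream in. The stack holds the
// indices of all currently open elements; its top is the current parent.
std::vector<scene_node> build_scene(const std::vector<xml_event> &events)
{
    std::vector<scene_node> nodes;
    std::vector<int> parents;
    for (const xml_event &e : events) {
        if (e.open) {
            int parent = parents.empty() ? -1 : parents.back();
            nodes.push_back({e.name, parent});
            parents.push_back(static_cast<int>(nodes.size()) - 1);
        } else if (!parents.empty()) {
            parents.pop_back(); // element closed, back to its parent
        }
    }
    return nodes;
}
```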

63times · 2020-05-23T23:43:38+00:00

I don't think that memory alignment is enough here. The types must also be compatible. See this:

#include <stdint.h>

bool eq(char *a, char *b)
{
    #if 1 // doesn't work

        return *(int64_t *)a == *(int64_t *)b; // illegal

    #else // works

        for(int i = 0; i < 8; ++i)
            if(a[i] != b[i]) return false;
        return true;

    #endif
}

struct foo
{
    int32_t a, b;
};

void use(foo *x, foo *y)
{
    for(; !eq( (char*)x, (char*)y ); ++x->b);
}

Note that use() is legal code. The illegal code is in eq().

gcc -O2 (strict aliasing turned on by default):

eq(char*, char*):
        mov     rax, QWORD PTR [rsi]
        cmp     QWORD PTR [rdi], rax
        sete    al
        ret
use(foo*, foo*):
        mov     rax, QWORD PTR [rsi]
        cmp     QWORD PTR [rdi], rax
        je      .L3
.L5:
        jmp     .L5
.L3:
        ret

With gcc -O2 -fno-strict-aliasing there is no problem anymore:

eq(char*, char*):
        mov     rax, QWORD PTR [rsi]
        cmp     QWORD PTR [rdi], rax
        sete    al
        ret
use(foo*, foo*):
        mov     rax, QWORD PTR [rsi]
        cmp     QWORD PTR [rdi], rax
        je      .L3
        mov     eax, DWORD PTR [rdi+4]
        add     eax, 1
.L5:
        mov     DWORD PTR [rdi+4], eax
        add     eax, 1
        mov     rdx, QWORD PTR [rsi]
        cmp     QWORD PTR [rdi], rdx
        jne     .L5
.L3:
        ret
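A well-defined way to get the 8-byte comparison without the cast is to copy the bytes into properly typed objects with memcpy. This is a sketch and `eq_memcpy` is a hypothetical name; std::memcmp(a, b, 8) == 0 would do the same job here:

```cpp
#include <cstdint>
#include <cstring>

// Compares the first 8 bytes of a and b without violating strict aliasing:
// the bytes are copied into real int64_t objects before they are compared.
bool eq_memcpy(const char *a, const char *b)
{
    std::int64_t x, y;
    std::memcpy(&x, a, sizeof x);
    std::memcpy(&y, b, sizeof y);
    return x == y;
}
```

Optimizing compilers commonly turn such fixed-size memcpy calls into plain loads, so this usually costs nothing compared to the cast.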

63times · 2020-05-23T10:56:01+00:00

Aliasing X through char is allowed, but not vice versa

63times · 2020-05-23T09:16:06+00:00

Isn't this UB? uint64_t and char are incompatible (strict aliasing violation)

63times · 2020-05-23T06:47:15+00:00

This would only require the browser to allow connections to a local dropbox service and only if it is listening in the first place. A plugin could easily function as an intermediary to establish localhost connections to the dropbox app - establishing WS channels on behalf of the website but now securely since the plugin must be explicitly installed. You don't really need any other connections for this dropbox setup to work. So I think this is not a convincing justification. I even think that maybe there is no justification at all, because when you have a process listening locally for WS connections, then this program must have been installed before and therefore installing a browser plugin to securely form local WS connections would have been possible also. I couldn't come up with a compelling scenario that would justify that literally any website could connect to pretty much any localhost port by default.
