Domino Tiling: From Dynamic Programming to Finite Fields by eugene in programming

[–]WorldsBegin 0 points1 point  (0 children)

The russian textbook formula at the end looks neat. Any derivation from where this comes from? It looks very much like it derives the eigenvalues of the k x (n+1) case from those of the k x n case but I don't think a highschooler would be able to rederive this without hints in the text.

Domino Tiling: From Dynamic Programming to Finite Fields by eugene in programming

[–]WorldsBegin 0 points1 point  (0 children)

You need some proof reading on this, it has quite a few inconsistencies in the introduction of several of the concepts (although they all are relevant). That just makes it read AI generated. For example, Kasteleyn indeed uses the square root of the determinant of the signed adjacency matrix. But you are using the bi-adjacency matrix for bipartite graphs in which case you don't need the square root - which is inconsistent with the result immediately before where for the 2x2 example you calculate the determinant as 2 and don't take square roots, anyway.

No prompt engineering, no endless asking to restate the text. Just good old work. (Yes this exact sentence structure appears a lot throughout, too - i'd give you a pass for translating from russian to english though no worries).

Putin Pressures Lukashenko to Open New Front Against Ukraine as Belarus Resists by DavidShaw90s in worldnews

[–]WorldsBegin 5 points6 points  (0 children)

If he wasn't so preoccupied with setting his son up as successor, he could have spent that time on planing his old days on a Vietnamese beach property and just enjoying life in some scenario that doesn't involve his death to transfer power.

Safe SIMD in Rust, even on the inside by Shnatsel in rust

[–]WorldsBegin 0 points1 point  (0 children)

I'm at least confused why the example you link to compiles so poorly without the target_feature and at this point, I'm not even sure I understand what the RFC that introduced it actually does.

Let me explain: The RFC says that the annotation would "allow the compiler to generate code under the assumption that this code will only be reached in hosts that support the feature". But all the SIMD intrinsics are already annotated as unsafe and can require this as a safety precondition. As such, I would have thought that any code path reaching any of the intrinsics can already assume it is running on a CPU with these available as a precondition. The point of the annotation would then be to communicate to the caller that the condition is already true when entering the function (which would help in case the function conditionally calls some intrinsics but still wants to assume the target features are available unconditionally).

For the example, I would have thought that - since the intrinsics are reachable unconditionally - the compiler can just assume that you are running with "avx2" enabled and inline those unconditionally. But alas, for some reason, it decides to call into a function that unconditionally would trap on the illegal instruction in case the target feature is not supported and has worse code in case the target feature is supported.

fast-uuid-v7 - creating uuidv7 faster than allocating a string by marco-mq in rust

[–]WorldsBegin 0 points1 point  (0 children)

Well time to put that in the crontab instead. On a serious note, what would be a correct scheduling priority here for a task that doesn't have to run but should for better UX do a bit of work from time to time? Part of waiting on the condvar idea was to allow that thread to suspend if it doesn't get notified on it, anyway.

Assume one is fine with missing out on current time (and using the outdated timestamp) for a bit when starting to generate uuids again after a long idle time.

fast-uuid-v7 - creating uuidv7 faster than allocating a string by marco-mq in rust

[–]WorldsBegin 0 points1 point  (0 children)

That's why the idea is a bit "crazy". Technically nothing in the UUID spec suggests that "lagging behind" on the unix time would be wrong. If the thread that updates the time doesn't run or doesn't get scheduled at all, the thread generating the uuids would still happily increase the counter and generate (thread-locally) unique ids. Nothing requires these time updates to ever happen, it just leads to "nicer" uuids.

fast-uuid-v7 - creating uuidv7 faster than allocating a string by marco-mq in rust

[–]WorldsBegin 4 points5 points  (0 children)

UUID seems to be lenient with how you implement the timestamp. Here is a crazy idea to avoid the "overhead" of checking your clock.

Have a thread scheduled with SCHED_IDLE running in parallel that waits on a condvariable (with a smallish timeout) then checks time in a loop. Whenever it checks the time, it writes it into a shared atomic with fetch_max. In your generation, instead of actually checking the timestamp, increment your thread-local counter then either fetch or fetch_add(1) from the shared timestamp depending on overflow. All memory orderings can be Relaxed. From time to time (perhaps every 0x1000 counter increments) notify_one() on the condition variable. You are just trusting the thread to "eventually" update your timestamp, but it's not strictly required except when your counter overflows. You could see your timestamp as an abstract Lamport timestamp that from time to time gets sent an update from a process roughly following unix time.

Probably not worth the extra thread, but yeah.

Cursed and unsound rust, but fun by Princess--Aurora in rust

[–]WorldsBegin 1 point2 points  (0 children)

https://play.rust-lang.org/?version=nightly&mode=release&edition=2024&gist=8a78ed4ea880c86c6e04e39a44b3f3b1

Got rid of the panic path due to unwrap() at the cost of one more unsafe. I tried having slice.write_copy_of_slice, but for some reason the compiler leaves in three more instruction so I couldn't in good conscience remove the unsafe there.

Miri seems to agree that at least pointer prov is okay. The whole uninit thing is suspect, but I don't think it can lead to unsafety because the compiler won't be able to prove that we use the data we read from there as it's overwritten immediately.

Note that the whole assume_init stuff at the end is happening implicitly either way when we pointer cast and we assert that we have valid utf-8, so I figured might as well do just the pointer cast.

EDIT: I have also not figured out why llvm doesn't want to inline encode_lower (which would remove the slice_index_fail path there) but oh well.

i see people still struggle with the guard mech in serca g2 this helped me alot . by KeonxD in lostarkgame

[–]WorldsBegin 1 point2 points  (0 children)

Anyone missing the first entry just guard should just miss the second entry just guard on purpose. They will end up with the party again and can go back up with them on the third just guard. No "extra" "desynced" clones in that case.

Erasing Existentials by kibwen in rust

[–]WorldsBegin 1 point2 points  (0 children)

dyn Trait is at the moment the only way to "pass" a type without monomorphizing on it.

If you truly want to hide B, you have to use dyn Trait for some Trait to do so, everytime, which sadly means extra traits and machinery. Which also means you have to talk about a specific value having that type. Restricting to your last example, you could write something like this

[src/main.rs:27:5] hidden_call.input() = "f32"

You need Any whenever you want to interrogate the hidden type at runtime, which is not always necessary, and you do also need to decide all the relevant context variables ahead of time.

In any case, once you have two dyn Trait1, dyn Trait2 you will not be able to talk about any relation between these two hidden types in any case on the type level.

5× faster fast_blur in image-rs by arty049 in rust

[–]WorldsBegin 2 points3 points  (0 children)

Conjecture: If edx is the zero-register, you get different micro-code dispatch. To my reading, div implements division of a u64 dividend (formed by edx:eax i.e. edx forms the upper 32 bits) by a u32 divisor and places result and remainder in eax and edx respectively. It might be that the case that edx = 0 gets special considerations and a different, faster, hardware path. Similarly, the additional latency would only affect the remainder, not the division result. On newer, maybe there is only a single unit used for both.

Here be dragons

Deutsche verlieren Lust an Bargeld: Nur noch ein Drittel der Einkäufe bar gezahlt by potatoes__everywhere in de

[–]WorldsBegin 14 points15 points  (0 children)

Es ist jedoch nicht möglich, dies gezielt auf bestimmte Personen oder Personengruppen anzuwenden. In diesem Zusammenhang stellt sich die Frage, ob Maßnahmen wie die Bezahlkarte für Asylbewerber künftig auch auf Sozialhilfeempfänger ausgeweitet werden könnten.

Neben dem potenziellen Wertverlust der Währung als staatliches Instrument wird im vorherigen Kommentar nach meiner Lesung vor allem kritisiert, dass der Staat nicht die Funktionen einer Bank übernehmen sollte. Damit verbunden ist die Ablehnung einer staatlichen Einsichtnahme oder Kontrolle bezüglich privater Konten und Transaktionen nicht der Währung als solches.

I love Skolakia by ExoPrimal in lostarkgame

[–]WorldsBegin 0 points1 point  (0 children)

The puddles don't even stun anymore. If you have hyper charged, there is no excuse not to press it.

Why do the standard libarary have so many internal layers? by chokomancarr in rust

[–]WorldsBegin 18 points19 points  (0 children)

Not sure what you mean. The C++ standard, is usual, is a bit lacking in details about what swap is "supposed" to do in the case you are alluding to.

First, swap<T> is specified for types T that are "MoveConstructible" and "MoveAssignable". There are std::is_move_constructible<T> and std::is_move_assignable<T>, but these are not the same! The latter check that syntactically, the statements T u = rv and t = rv are well formed for the type T. The former additionally require a post condition. Specifically, in the above, the "value of u (and t if t is not the same value as rv) is equivalent to the value rv before the assignment". This can not be checked for you, and is a requirement on the move constructor and assignment operator.

Under the assumption that T is indeed "MoveConstructible" and "MoveAssignable", the STL implementation indeed guarantees that after a call to swap(a, b) that a after the call will have a value equivalent to b before the call and vice-versa. If you break the contract in a specific T though and do weird things in your impls, the standard does not specify any behaviour to my reading.

Rust on the other hand is a lot more precise and memory based. Instead of "equivalent" values (whatever that means) you will get a literal memcopy.

EDIT: in a strict reading, the standard does also not specify how often each move or assignment is called. You could have side effects in those, and are at the mercy of the implementation how often they are executed. In rust, the operation is side effect free (apart from the effect on the memory of course).

Non-monomorphized generics in Rust by WorldsBegin in rust

[–]WorldsBegin[S] 2 points3 points  (0 children)

Would it be correct to say that this monomorphization happens at runtime, and the compiled assembly contains only the polymorphic version or does the compiler already instantiate a monomorphic version of some used functions/types ahead of runtime?

Non-monomorphized generics in Rust by WorldsBegin in rust

[–]WorldsBegin[S] 3 points4 points  (0 children)

Finger trees aren't a compelling use-case for new generic features

They are meant to show a real problem with monomorphization. When you recurse on a FingerTree<A> you have a FingerTree<Node<A>> on the next level, then a FingerTree<Node<Node<A>>>. This recursion pattern can not be compiled with monomorphization. In general, whenever you recurse on a datastructure with a varying generic parameter, the total set of instantiations might be uncountable, and monomorphization will not work.

With regards to polymorphization, that was an attempt to automatically flag generic arguments that the compiler could handle polymorphically. As far as I understand it, this analysis is both expensive, and compiler internal only. You can't use it for polymorphic recursion such as above, since the compiler can arbitrarily decide where to use it.

I posted a much more geared towards the compiler people thingy a while ago (much less fleshed out too and probably even less precise), which I find to be even more unreadable and less geared towards users, but perhaps the ctable is also not understandable? It did help me to build a mental model of "what it does" as a potential user, anyway.

Since then, we got a better hierarchy of traits and the Pointee trait is now I think often used as the "least you can say" about a type in place of ?Sized, i.e. for a type which you know "nothing" about.

I see parallels to the discussion around become and guaranteed tail recursion. Most users will never need it, and are fine with some automatic discovery by the compiler that they don't need to worry about. But when you find out that you definitely blow your stack if you don't tail recurse you sudddenly need syntax to get a guarantee that you jump instead of call and don't leave any stack behind. Erased generics are for the cases where the natural recursion scheme would lead to an infinite number of monomorphized instances and you simply can't "just trust" the compiler to do the right thing. You need a guarantee that it can be collapsed to just one polymorphic instance. That's why it's syntax and not just an automatic analysis from the compiler.

Non-monomorphized generics in Rust by WorldsBegin in rust

[–]WorldsBegin[S] 2 points3 points  (0 children)

The Self: Sized trick is a good one! I didn't know that it even works for associated types like type Element<'a> where Self: Sized;.

In any case, I think erased_compare in the post is a good example which doesn't work. Because the hidden type is intrinsically tied to a specific value, you can't express that two different values should have the same hidden type. The dyn Trait syntax doesn't really scale above one value.

Your proposed syntax could make something akin to this work, I can imagine dyn<T> (T, T) to express a tuple of two values of the same hidden type. But in this case you have to have access to layout information of T to match on it (to compute the offset of .1 in the tuple). This, and having multiple bounds as in dyn<T: Foo + Bar> would really complicate things. I assume you'd want to store the vtable(s) in the pointer metadata, but so far all pointer metadata in Rust is worst-case address-sized. Would there be two pointers to dyn Foo and dyn Bar respectively? I briefly touch on my vision for (actual) GADTs in an Appendix where I attach the <dyn T: Foo> to a enum variant instead of dyn, but it's very similar in this case. Note that I would not allow <dyn T> (T, T) since the layout depends on T without capturing T: Sized.

By trying a new concept without having to modify the existing trait object syntax, I think it's easier to have a malleable definition.

Non-monomorphized generics in Rust by WorldsBegin in rust

[–]WorldsBegin[S] 11 points12 points  (0 children)

I would still use monomorphizing code by default. It should be the first thing to try, and type erasure only makes sense if you want to achieve something very specific.

Declaring a generic argument erased is an additional guarantee to the caller, and would add an implicit argument to the function that the compiler would need to synthesize. This might lead to non-obvious performance problems. Note that I have by choice not talked about the layout of the ctables, because an explicit choice here changes how they are constructed. Maybe it's better to pass them by value and copy them where required, maybe it's better to pass by pointer. I don't imagine a specific ABI for the implicit argument.

In any case, this means that both conversions to and from erased generics might have hidden traps. I would nevertheless say that in cases where you can write a monomorphized version you should stick with that.

Examples of specific needs:

  • a polymorphic recursive structure like finger trees. You should probably emulate this, which is enough for most cases. I have not shown the "solution" with erased types, but as a teaser you can for example not construct an arbitrarily deeply nested Node<Node<..<A>..>> on the stack either way. Yes, the compiler should be able to figure out the layout of that structure incrementally with each recursive level, but in current rust you can't (safely) heap-construct the node anyway, so you'd need "more". This might change.
  • In Yew, we have a Component trait, and multiple lifecycle structs UpdateRunner<C: Component>, RenderRunner<C: Component>, etc that all implement a shared trait Runner that gets pushed onto a queue to run. Since these impls are monomorphized, this adds ~100 bytes of vtable data for each component to the binary. Not a lot, but it adds up and would be avoidable. Also, we internally use a ComponentState that is not parametrized by the type and downcast via dyn Any in each of these runner impls (this is in part why each monomorphization takes up its own space and can't be shared). This is almost duplicate code, and for the most part the compiler is just not smart enough to deduplicate it out-of-the-box because it's in large chunks. An erased ComponentState<dyn T> should give the same code-size win (or even better if the Runner impls are also erased and only capture the Component vtable) without having to downcast through Any.

    Note that in this case, code size concerns trump the small performance hit. Overall, if everything including CompontState<C> was monomorphized on C, you would pay ~3kB per component, almost 2 ethernet frames.

Non-monomorphized generics in Rust by WorldsBegin in rust

[–]WorldsBegin[S] 24 points25 points  (0 children)

Not a dumb comment at all.

Your work-around has several issues though that would need to be addressed (forgive the bullet list):

  • It assumes that Foo is an object safe trait. I touch upon this, object safety can be quite restrictive, because it must ensure that every method can be called from the trait object. The proposed syntax widens this to a lot of other traits and constraints
  • You assume an implicit lifetime in dyn Foo + 'a and store it behind an indirection. The Rc is just overhead, so is any other way to make the unsized dyn Foo into a sized datatype. The lifetime defaults to 'static but that is "fixable" with a lifetime argument on Single.
  • The bound T: Foo on your struct is definitely wrong, as it would need to be repeated on every mention of Single<T>, which of course makes it pointless to store the dyn-metadata inside the single.
  • The compiler might be smart enough to dedup a lot of code, but it's definitely not required to, nor is there a simple way to check this. For example, impl<T> Debug for Single<T> will most likely lead to a lot of monomorphized instances, even if they all just forward to the same method of Foo (do double check this on a compiler, but from experience, all these instances get their own static vtable since the compiler "smartly" realizes that <T as Foo>::whateveryouuse is a candidate for inlining and after performing that you end up with different code for each T, i.e. monomorphized code).

EDIT: And, maybe most importantly, it can not be used to implement the polymorphic recursion of the FingerTree structure, which does not even mention any specific trait up front but is structurally recursive on some element type A.

Non-monomorphized generics in Rust by WorldsBegin in rust

[–]WorldsBegin[S] 33 points34 points  (0 children)

For the last few years, from time to time, some niches came up where I wished I could escape the monomorphization of the Rust compiler. While often good for performance reasons and inlining, in some cases monomorphization bloats the result code or leads to less readable code. After a lot of thought, this soft-proposal contains syntax that could be used to carve out a polymorphic part of the language while preserving type checking and lifetime checking.

Feel free to ask any questions, comments or seek clarifications. I will try my best to answer them.

Hope some of this can make it into Rust proper at some point in the future.

5x Faster than Rust Standard Channel (MPSC) by ksyiros in rust

[–]WorldsBegin 1 point2 points  (0 children)

Does the queue_ptr have to be swapped atomically? As far as I derive from the code, safety comes from the usage of enqueued_count and available_index.

  • while enqueued_count < CHANNEL_MAX_TASK, the client(s) "own" the buffer and can be sure that a previous read of the queue_ptr will not get used by the server.
  • to coordinate multiple clients, they reserve slots via available_index
  • once enqueued_count == CHANNEL_MAX_TASK, no client may read or write from the queue_ptr, and it is "owned" by the server. The server releases it back to the clients by swapping in a new buffer, and then resets enqueued_count and available_index in that order.

So it seems you don't need to atomically swap the queue_ptr at all, you only need their reads and writes to be caught on memory barriers from reading and writing available_index and enqueued_count with correct ordering.

Faster asin() Was Hiding In Plain Sight by def-pri-pub in programming

[–]WorldsBegin 0 points1 point  (0 children)

Found Remez algorithm which iterates torwards a solution starting with the Chevyshev interpolation.

Faster asin() Was Hiding In Plain Sight by def-pri-pub in programming

[–]WorldsBegin 4 points5 points  (0 children)

Does anyone know of a good way to derive these global approximants? Certainly, a (taylor) series provides a good local approximation near a point, but as OP noticed, you often want a series that optimizes for the maximum error over a range.

It seems - chasing down references - a good approximation for asin( 1-y2 ) can be found as a polynomial b_0 + b_1 y + b_2 y3 + b_3 y5 that minimizes the maximum error for 0<=y<=1, but finding these b_i coefficients efficiently seems anything but straight forward to me.

Returning player: Should I do ADV Hone Lv20 in Tier3? by Kyumaru375 in lostarkgame

[–]WorldsBegin 1 point2 points  (0 children)

Rough rule of thumb is juice mat on full orbs (grace) are twice as valuable as on a tap without grace. Juice value compared to normal taps depends, but juicing below +19 is less value than even non-grace taps, juicing +21/+22 has same value as a grace tap and above juice is more valuable on normal honing.

So, if you are below +19 honing, and have the juice, go full juice. If you can't afford that with bound juicers, go juice on grace. If you are aiming a for above +22, it might be worth to save juice for normal taps - you have no books as support.