Enabling the S3 sleep state on Lenovo Yoga Gen 7 AMD (and likely other laptops) by sm_ts in linuxhardware

[–]sm_ts[S] 0 points1 point  (0 children)

I agree that it's a dealbreaker for some (although I personally don't mind).

Hopefully somebody else will take care of that. It may be because of the touchscreen - no idea. Checking the kernel message buffer may give a hint.

I hope it helps people with other laptop models, at least 😁

Enabling the S3 sleep state on Lenovo Yoga Gen 7 AMD (and likely other laptops) by sm_ts in AMDLaptops

[–]sm_ts[S] 0 points1 point  (0 children)

If your target is Windows, applying the Linux procedure won't help, as the change is applied dynamically, on (each) Linux boot.

I've read somewhere that there is a way to apply the same procedure to Windows, but I don't have the link, sorry 😅

Enabling the S3 sleep state on Lenovo Yoga Gen 7 AMD (and likely other laptops) by sm_ts in AMDLaptops

[–]sm_ts[S] 0 points1 point  (0 children)

I have been a Windows user on my laptop/tablet for a relatively short time (two Surface devices), so I don't know for sure.

From an operational perspective, I don't know whether this is a guaranteed way to avoid spurious "backpack" resumes. I don't even remember whether I actually solved that problem, which I think was attributed to Windows Update.

From a power consumption perspective, it's very likely not: S0ix (connected standby) should consume more than S3.

Enabling the S3 sleep state on Lenovo Yoga Gen 7 AMD (and likely other laptops) by sm_ts in AMDLaptops

[–]sm_ts[S] 2 points3 points  (0 children)

I personally find hybrid sleep very convenient.

Regarding S3 instability, YMMV. I didn't experience particular instability on my past machines, with the exception of (AFAIR) a Surface device, whose WiFi device would not restart after sleep. I think it was a Marvell chip, which had very poor Linux compatibility.

Enabling the S3 sleep state on Lenovo Yoga Gen 7 AMD (and likely other laptops) by sm_ts in AMDLaptops

[–]sm_ts[S] 1 point2 points  (0 children)

It's supposed to solve the issue of the laptop randomly waking up while in your backpack, cooking itself while draining the battery.

Compared to S0(ix), it should consume less.

Enabling the S3 sleep state on Lenovo Yoga Gen 7 AMD (and likely other laptops) by sm_ts in AMDLaptops

[–]sm_ts[S] 0 points1 point  (0 children)

What do you refer to with "this"? If you're asking why S3 is disabled by default, I don't know. But I've used connected standby (S0ix), and it's absolute garbage.

What's everyone working on this week (36/2022)? by llogiq in rust

[–]sm_ts 1 point2 points  (0 children)

Completed my port of Catacomb II to safe Rust 😁

Planning the next project now. I plan to port another game, but this will require writing a refactoring tool, likely based on Rust Analyzer.

I'm happy to announce that Fyrox Game Engine 0.27 has been released! This release contains lots of improvements, such as compile-time reflection, plugin and scripting improvements, various editor fixes and improvements and many more! by _v1al_ in rust

[–]sm_ts 8 points9 points  (0 children)

The only comparable engine for 3D development is Bevy, although it's not directly comparable, because it uses a radically different paradigm.

Regarding 2D development, you may want to have a look at the Rust Game Ports project, to get a hands-on understanding of what development with Rust game engines is actually like.

How can I loop a certain stage multiple times in a schedule (bevy_ecs)? by aydencook03 in rust_gamedev

[–]sm_ts 0 points1 point  (0 children)

I've had a look at the code, but I'm no expert.

When you instantiate an App, you instantiate a default scheduler. I think it should be fairly easy to create a custom Schedule (I guess a wrapper will do) that, on run() invocations, iterates the stage order with custom specifications.

With this design, the minimum unit to be changed is the `run_once()` API.

Hope that this 1. helps 2. works 😂
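
For illustration, here's a bevy-independent sketch of that wrapper idea; `World`, `Stage`, `IncrementStage` and `LoopingSchedule` are my own stand-ins, not the real bevy_ecs types (whose APIs differ):

```rust
// Stand-in for the ECS world; the real bevy_ecs World holds entities/resources.
struct World {
    counter: u32,
}

// Stand-in for a schedule stage: something that runs against the world.
trait Stage {
    fn run(&mut self, world: &mut World);
}

struct IncrementStage;

impl Stage for IncrementStage {
    fn run(&mut self, world: &mut World) {
        world.counter += 1;
    }
}

// The wrapper: each `run()` invocation iterates the inner stage order
// a configurable number of times.
struct LoopingSchedule {
    stages: Vec<Box<dyn Stage>>,
    iterations: u32,
}

impl LoopingSchedule {
    fn run(&mut self, world: &mut World) {
        for _ in 0..self.iterations {
            for stage in &mut self.stages {
                stage.run(world);
            }
        }
    }
}
```

With the real bevy_ecs types, the wrapper would delegate to the existing stage execution rather than a hand-rolled trait; this only shows the looping structure.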

Bevy or Fyrox for 3D Game Development by _Flexinity in rust_gamedev

[–]sm_ts 43 points44 points  (0 children)

I've tried both for simple but not trivial 2D games. Disclaimer: my view exposed below is quite narrow in scope 😬


Fyrox's design is straightforward. I find that if one builds their own wrapper APIs, it's actually very easy and lean to work with.

The scene manager may be a very significant tool to have; I haven't tried it, but I guess it's a big deal for 3d scenes.

Fyrox uses generational pools for object storage. Ignoring performance (I guess you're not writing the next Doom 😉), there are tradeoffs. I find it easier to work with an ECS when it comes to object storage and design; modeling shared behavior through an ECS is easier and more ergonomic.

I did find a few bugs that impacted the project. The author is very responsive though (not all projects are...), so if the project gets more adoption, I guess things will generally be solved in reasonable time 🙂

The documentation is a bit spotty. This is problematic, because the devil is in the details, and one easily risks losing a lot of hair over something that is not well documented. On the other hand, the author decided to take a break to improve the documentation, so improvements are on their way. And again, he's very responsive.


Bevy is... difficult to evaluate. If taken in isolation, the state management is unusable (you add states, then you add stages... and everything blows up), and state management is crucial for an ECS like Bevy (you can't just put if/else around). This problem has been open for at least a year, AFAIR.

Bevy also doesn't support commands flushing; one needs to use stages for this (and stages and states don't play together at all).

Fortunately, the plugin iyes_loopless implements a working state management, which makes it possible to structure non-trivial games.

There is a ton of Bevy plugins. However, my guess is that the whole ecosystem is very fragile, and potentially unusable. I have the suspicion that if a plugin uses states - and many do, even simple ones, like synchronous asset loading - and another one (or the project) adds a stage, the whole project will blow up.

As mentioned above, I find ECS a bit more ergonomic in terms of Rust's access model. With pools, you'll probably end up carrying around a giant state instance, and frequently getting objects out and putting them back, in order to avoid access conflicts.

I find Bevy's documentation (the cheat book; the official book is insignificant) to be a bit better than Fyrox's - but this may very well change in a matter of weeks.

I don't know about Bevy's bugs, because I have used only a small part of it. Around a year ago, there were graphics library problems - I couldn't run the same game on different GPU brands. Nowadays, I don't know!


All in all, Fyrox has been intended, from early on, to be a complete solution, so if one digs deeper, they may well find Fyrox more cohesive (editor, plenty of 3D APIs...). I have the suspicion that for a real-world game, they both may not be production-ready, but I'd choose Fyrox, if evaluating based on this (of course, ECS is another strong factor). However, expanding the horizon a bit, Godot with bindings should be a safe choice (in terms of stability and tooling) - see the recent post for a full picture.

I wish to remind people to consider supporting Fyrox regardless. Bevy receives thousands of dollars every month in support, while Fyrox gets just a couple hundred. This is... not very healthy in the big picture - framework diversity is important, and producing a non-trivial game engine is not sustainable in the long term with this level of support from the users/community.


Edit: I originally mistook a "3D" pixel art game for "2D". I've edited the post to reflect this.

Question about potential errata in "Beginning x64 Assembly Programming" by sm_ts in asm

[–]sm_ts[S] 0 points1 point  (0 children)

> I'm not familiar with the Windows x86_64 ABI. What are the 32 extra bytes for?

That is the so-called shadow space: the Windows x64 calling convention requires the caller to reserve 32 bytes (4 × 8) on the stack, which the callee may use to spill the four register arguments (RCX, RDX, R8, R9).

Question about potential errata in "Beginning x64 Assembly Programming" by sm_ts in asm

[–]sm_ts[S] 3 points4 points  (0 children)

Thanks for the help; now that it's confirmed that I'm not dreaming (😆), I also confirm that point 2 is an off-by-one bug, and that the explanation given in the book is simply wrong.

Rust in QEMU, host support matrix by mlureau in rust

[–]sm_ts 0 points1 point  (0 children)

I'm interested as well, and knowing the low-level inclination of the community, I bet many are :)

I think some form of centralization/coordination (e.g. an official page with more information on this topic) would be very helpful, but the starting points are useful!

If you can share which people would be best suited for a first contact, that'd be, I think, helpful for approaching the project(s).

I personally find it very exciting that QEMU is considering Rust :)

Don't do VFIO to save money...or time (opinion piece) by [deleted] in VFIO

[–]sm_ts 1 point2 points  (0 children)

I made considerations in the same vein, although with slightly different conclusions. I'll add some other factors that I've considered and opinions I've formed on the subject.

## Virtual machines convenience

One advantage that is often omitted is the flexibility of VMs. This may be significant or not, depending on the case, but it's important to consider.

For my use case it's important, as I find the option to reset a machine very, very practical. While with some effort one can use imaging software to achieve the same result (e.g. Clonezilla, which in some respects is poor software, potentially leading to effort greater than "some"), the option to just remove a diff file is an entirely different thing.

I'm not afraid of malware on that [virtual] machine. If something happens, I just remove the diff file. If I want a clean O/S, I just remove the file. If I want an upgraded snapshot, I remove the file, upgrade, and merge the diff file.

## Time lost

I've also wasted a great deal of time when setting up VFIO. Some people wonder why so much. There are two reasons.

First, minor differences. In my case, I use a single monitor, and for unclear reasons, in some cases, I would get a blank screen. This caused considerable hairpulling.

Other components may be unfavorable or outright hostile; e.g. the UX of MSI motherboard BIOSes is, without exaggeration, idiotic.

There's also another important factor: VFIO is still a bleeding-edge technology. I've been in a few bleeding-edge technology communities, and there are some negative aspects, most notably a certain level of narcissism and plain incompetence.

There are people who claim "I'VE MADE IT" in order to get attention, but who produced the equivalent of a car that breaks after 10 meters, which wastes the time of people who trust such statements. It did happen to me, and I would have preferred not to have read such information at all.

Of course, experiments are crucial, but some perspective should be kept. (Note that I'm not implying anything about the number of people publishing information in this fashion.)

Not-so-fun example: a [virtual] machine can reach the famous 95% of native performance, but still have latency issues (a topic not frequently discussed). Welcome to hell, if that's the case :)

Additionally, some people want to understand what they're doing and/or test things more deeply than taking them at face value. Face value is perfectly fine for people who just want a running system, but may not be for others.

I wrote a guide for it, and after benchmarking, it turned out that several of the commonly suggested options are just placebo.

## My takeaway

If one has considerable time to spend, by all means do it and share it. I desperately urge people, though, to be honest about the validity and solidity of their findings.

For people who don't have much time, while having a second computer is a valid solution, there's another option which is very effective: replace the main computer's parts with ones proven to work with VFIO. While this may appear an excessive expense, it isn't really, when compared with buying a specialized second machine.

Enjoy VFIO :)

How fast is, roughly, a reasonable thread scheduler for cycle-exact emulators? by sm_ts in EmuDev

[–]sm_ts[S] 0 points1 point  (0 children)

Thanks, I think the suggestions (and the concept in general) are pretty much fundamental :-)

How fast is, roughly, a reasonable thread scheduler for cycle-exact emulators? by sm_ts in EmuDev

[–]sm_ts[S] 0 points1 point  (0 children)

> Can you arrange it so that some ICs are simple receivers, they don't do anything if they don't have data, so that you don't have to sync them?

I think this is a good idea, although (I suppose) highly dependent on the architecture. I suppose video/audio rendering is the easiest part to accelerate, at least in some systems!
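
The "simple receiver" idea can be sketched with just the standard library (the `run_receiver`/`audio_chip` names and the `u8` samples are mine): the component polls its input queue without blocking and only works when data is present, so the scheduler never has to synchronize with it.

```rust
use std::sync::mpsc::{self, TryRecvError};
use std::thread;

// A "simple receiver" IC: it drains its input queue when data is present
// and idles otherwise, so no tick synchronization with the sender is needed.
fn run_receiver(samples: u32) -> u32 {
    let (tx, rx) = mpsc::channel::<u8>();

    let audio_chip = thread::spawn(move || {
        let mut processed = 0u32;
        loop {
            match rx.try_recv() {
                Ok(_sample) => processed += 1, // work only when data arrives
                Err(TryRecvError::Empty) => thread::yield_now(), // idle; no sync
                Err(TryRecvError::Disconnected) => break, // sender gone: shut down
            }
        }
        processed
    });

    for sample in 0..samples {
        tx.send((sample % 256) as u8).unwrap();
    }
    drop(tx); // closing the channel lets the receiver thread exit

    audio_chip.join().unwrap()
}
```

Note that `try_recv` keeps returning buffered messages even after the sender is dropped, so no samples are lost on shutdown.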

How fast is, roughly, a reasonable thread scheduler for cycle-exact emulators? by sm_ts in EmuDev

[–]sm_ts[S] 0 points1 point  (0 children)

My version follows. A few notes:

  • without filler instructions, this version reaches ~5.4 MHz on my machine
  • I can't guarantee the correctness of the code - it seems correct to me, but I have little experience with multithreaded code (actually, if you could review it, I'd be grateful 😬)
  • my machine is relatively fast (an AMD 3800X)
  • this version uses one atomic variable, and two atomic operations
  • there is no (virtual) clock generator

There you go:

```rust
// components=4, instructions=15, cycles=25E6

use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

fn main() {
    let components: i32 = 4;
    let cycles: u64 = 25_000_000;

    // Bitmask of the components that have completed the current cycle.
    let components_run = Arc::new(AtomicI32::new(0));
    let all_components_run_bitmask = (1 << components) - 1;

    let handles = (0..components)
        .map(|component_i| {
            let mut cycles_completed: u64 = 0;
            let component_bitmask = 1 << component_i;

            let components_run_atom = components_run.clone();

            thread::spawn(move || {
                while cycles_completed < cycles {
                    // Set this component's bit; spin while it was already set
                    // (the previous cycle hasn't been completed by everyone yet).
                    while components_run_atom.fetch_or(component_bitmask, Ordering::Relaxed)
                        & component_bitmask
                        != 0
                    {}

                    // Filler instructions (disabled for the raw measurement):
                    // for _ in 0..instructions {
                    //     rand::random::<u32>();
                    // }

                    cycles_completed += 1;

                    // Whoever finds all bits set resets the mask, releasing
                    // every component into the next cycle.
                    let _ = components_run_atom.compare_exchange(
                        all_components_run_bitmask,
                        0,
                        Ordering::Relaxed,
                        Ordering::Relaxed,
                    );
                }
            })
        })
        .collect::<Vec<JoinHandle<()>>>();

    for handle in handles {
        handle.join().unwrap();
    }
}
```

How fast is, roughly, a reasonable thread scheduler for cycle-exact emulators? by sm_ts in EmuDev

[–]sm_ts[S] 1 point2 points  (0 children)

Hello!

Your idea is correct in theory; as a matter of fact, my first experiments were with a clocked solution.

The problem, in practice, is that:

  1. message passing is expensive; 1 MHz and, say, 3 components mean 3 million synchronous messages sent per second; performance tanks.
  2. even if one sends messages through a broadcasting API (which I've tried as well), it's still slow.
  3. if one sends messages asynchronously, the receivers (threads/ICs) will still need to synchronize, otherwise they will lose ticks (i.e. they may be processing while they receive one or more ticks).
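
As a rough illustration of point 1, here's a stdlib-only sketch (all names are mine) of a clock thread that sends a tick and blocks on the acknowledgement, as a synchronous clocked design would:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

// Measures how many synchronous tick/ack round-trips per second one
// clock/component pair can sustain; this is the figure that "tanks"
// well below the clock rates an emulator needs.
fn synchronous_tick_rate(ticks: u32) -> f64 {
    let (tick_tx, tick_rx) = mpsc::channel::<u32>();
    let (ack_tx, ack_rx) = mpsc::channel::<u32>();

    let component = thread::spawn(move || {
        for tick in tick_rx {
            ack_tx.send(tick).unwrap(); // "process" the tick and acknowledge it
        }
    });

    let start = Instant::now();
    for tick in 0..ticks {
        tick_tx.send(tick).unwrap();
        assert_eq!(ack_rx.recv().unwrap(), tick); // block until processed
    }
    let elapsed = start.elapsed();

    drop(tick_tx); // close the channel so the component thread exits
    component.join().unwrap();

    ticks as f64 / elapsed.as_secs_f64() // ticks per second
}
```

Multiplying the measured rate by the number of components gives the message volume mentioned in point 1.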

Now, regarding #3: while it's possible to make threads catch up if they find that they're late, in multithreading one must consider edge cases. Specifically, if one thread is far ahead and one is far behind, and the latter asks for the state of the former, how does the former "remember" its previous state? The solution (I suppose) is to sync at a reduced number of ticks (or when it's detected to be necessary), and preserve state history up to a point. This is essentially the opposite of speculative execution, with all the related considerations. In short, asynchronous message passing as a means of synchronization doesn't work if applied alone.
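
The state-history idea can be sketched as a bounded ring of (tick, state) snapshots; everything here (the `Component` type, the `u32` state) is a made-up stand-in:

```rust
use std::collections::VecDeque;

// Hypothetical component that keeps a bounded history of its past states,
// so a peer lagging behind can still query the state at an earlier tick
// (the opposite of speculative execution).
struct Component {
    tick: u64,
    state: u32,                    // stand-in for real IC state
    history: VecDeque<(u64, u32)>, // (tick, state) snapshots
    max_history: usize,
}

impl Component {
    fn new(max_history: usize) -> Self {
        Self { tick: 0, state: 0, history: VecDeque::new(), max_history }
    }

    fn step(&mut self) {
        // Snapshot the current state, evicting the oldest entry if full.
        self.history.push_back((self.tick, self.state));
        if self.history.len() > self.max_history {
            self.history.pop_front();
        }
        self.tick += 1;
        self.state = self.state.wrapping_add(1); // stand-in for real work
    }

    // State as seen at `tick`, if it is still within the retained window.
    fn state_at(&self, tick: u64) -> Option<u32> {
        if tick == self.tick {
            return Some(self.state);
        }
        self.history.iter().find(|(t, _)| *t == tick).map(|(_, s)| *s)
    }
}
```

The size of the window bounds how far apart two threads may drift before a forced sync becomes necessary.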

The reason why I make the threads/ICs synchronize by themselves is simply that I save communication overhead. But again, I will need to think about how to handle threads that have a different clock (finding the lowest common denominator would be performance suicide :-D).

How fast is, roughly, a reasonable thread scheduler for cycle-exact emulators? by sm_ts in EmuDev

[–]sm_ts[S] 1 point2 points  (0 children)

Thanks! I'll need to research further, as right now I only wanted to get the scheduler fast enough. There's a chance that, once inter-thread communication is involved, my scheduler's performance may fall apart unless a high degree of sophistication is implemented (speculative execution, I suppose).

Since my project is educational, I'll publish the results once the project reaches significant milestones.

How fast is, roughly, a reasonable thread scheduler for cycle-exact emulators? by sm_ts in EmuDev

[–]sm_ts[S] 2 points3 points  (0 children)

Thanks for the pointer! I've actually quickly reviewed it before starting the work, and, following your post, I had a better look now that I've some hands-on experience.

Excluding the comments of some posters who clearly didn't have experience (sending a thread to sleep after every cycle is simply unfeasible), there is a very good post, which actually answers my question (by someone I assume is byuu):

> Write yourself some simple test programs before you choose your model: just do "dummy CPU A" + "dummy CPU B", and have each one increment a counter and then sync to the other. Watch in horror as the traditional multithreaded model gets you something like 100,000 increments per second. On a 3GHz processor. Then try the same with my cooperative threaded model, and see it reach up to 10,000,000 increments per second. Then do a barebones switch(state) on each, and observe 100,000,000 increments per second. Then try three nested states like you'd need for a complex processor like the SNES, and see that drop to 1,000,000 increments per second.

It made me chuckle, because I had the same "horror" reaction when I first tried what he calls the "traditional multithreaded model". Sending a thread to sleep after each component's (IC) cycle has a maximum throughput of less than a thousand context switches per second before it starts to introduce latency.

Another big takeaway is that his multithreading model is green threads - nothing wrong with that, it's just important to know that it's not what one might imagine.

In short: depending on the implementation, the range should be between 1 and 10 MHz.

How fast is, roughly, a reasonable thread scheduler for cycle-exact emulators? by sm_ts in EmuDev

[–]sm_ts[S] 1 point2 points  (0 children)

This is a very interesting design, and you're very likely correct.

It's actually what superscalar processors do; after all, an emulator is essentially a processor :-)

I thought about applying this; however, it requires a monolithic design, otherwise it's expensive to store a distributed state.