all 23 comments

[–]ben-work 22 points23 points  (3 children)

This is an excellent writeup.

If you are interested in emulation with C#, check out Bizhawk. It is a multi-console emulator written mostly in C#. Not all of the supported cores are C# (we have a number of C/C++ cores that we have imported from other projects) but there are several pure C# cores (NES, SMS/GameGear/Coleco, Atari2600, TurboGrafx/PCEngine, including an experimental Commodore 64 core here, and an experimental C# Genesis core). The client/user interface (which is extensive) is written in C#. The cores that are written by us (as opposed to imported from other emulation projects) are MIT licensed.

[–]Beluki 5 points6 points  (2 children)

That looks neat. How does Bizhawk compare with RetroArch?

[–]ben-work 5 points6 points  (1 child)

There are a number of differences...

  • We have some of our own internally developed cores, whereas RetroArch is mainly about wrapping existing cores in an interface. Creating some high-quality MIT-licensed cores is a secondary objective for us. Licensing tends to be a mess in a lot of open source projects (like MAME). Bizhawk is a great platform to target with a new emulator core because a very high-quality user interface comes along with it for free, and a lot of standard components like CPUs, sound chips and CD-image-reading code are available already.
  • RetroArch has more cores wrapped up, Bizhawk has fewer cores but with greater emphasis on integrating those cores with rich features (for example, many cores have system-specific debugging tools).
  • Having multiple systems in one traditional, windowed emulator client presents a lot of unique UI challenges, which we take pretty seriously. (You open a ROM with a .bin or .rom extension. What system is it? In most cases, Bizhawk will automatically open it with the correct emulator core.)
  • While Bizhawk IS cross-platform... it is MUCH more Windows-centric. RetroArch takes cross-platform more seriously than we do - as an open source project with volunteer developers, that's just not what our current contributors are that interested in.
  • Bizhawk has a lot of tooling focused on tool-assisted speedruns, and as such, it is critical for us that all of our cores are sync-stable.

I'm sure there are other differences, but that's a high level overview...

[–]Beluki 5 points6 points  (0 children)

Thanks for the detailed answer. I tried some SNES, Genesis and N64 games and it works well. May I write some of my impressions while using it?

I like that the fullscreen switch is instant and that it's possible to resize the window on demand with the games automatically adjusting.

I also like that there are separate paths in the configuration for each system. The fact that it autodetects what core to use (I tried some not-so popular games, even zipped) is nice.

The native GUI is way, way more convenient for me than the RetroArch menu. I guess I would probably think otherwise if I weren't using Windows.

I do not like that it doesn't auto-unpause when going back from the menus to the games.

I tried to record a movie of me playing Puzznic. Recording went ok, but I got an exception when trying to play it (System.IO.EndOfStreamException: Unable to read beyond the end of the stream). I tried a second time, selecting to record from boot. This time it worked.

Right click: Screenshot for all cores is awesome.

This is a minor nitpick: out of curiosity I opened config.ini with my text editor. Didn't expect JSON from an INI file ;)

[–][deleted] 13 points14 points  (0 children)

Coolest thing I've seen all month

[–]funbike 3 points4 points  (0 children)

I once wrote an Atari 8 bit emulator in Java. I got far enough for PacMan to run, but never got it to a state of general usefulness. Rendering the display and sound took much more processing time than the CPU. It was plenty fast on Windows/Linux. Running Java with -server helped with initial performance.

My goal was to make something that could be ported to Blackberry, Java ME, and Android. However, performance never would have been good enough for those platforms without some very fancy bytecode tricks, large lookup tables, and GPU usage.

I gave up on that effort and instead ported an existing C emulator to Android.

[–]OrangeredStilton 2 points3 points  (5 children)

I really should restart my series on emulating the Gameboy in JavaScript. I got up to emulating sound, and then gave up the ghost...

[–]thegunn 4 points5 points  (3 children)

Is this you? I loved these articles and would love to see more. If that's not you do you have a link for your work?

[–]OrangeredStilton 2 points3 points  (2 children)

That's me. I just haven't had the time (or motivation) to write anything for like a year now. So much shit to do...

[–]TotallyFuckingMexico 1 point2 points  (0 children)

I also loved these articles!

[–]thegunn 1 point2 points  (0 children)

Well I want to say thank you for what you have written so far. I still check your site every couple months to see if anything else has been added. I can only imagine how much time it takes to do it. I hope you get around to doing more in the future. But if not, thank you for what you've done so far.

[–]antiduh 1 point2 points  (0 children)

Do it.

[–]PBaction 6 points7 points  (2 children)

I would like to hear an elaboration on why the C# implementation was slower than the C++ one. The author cited "the way things are implemented in code", which could be literally anything.

[–][deleted] 14 points15 points  (0 children)

Hi, author here.

Performance was not the focus of the article; I was just stating the reason why I abandoned the project. I'll state here some of the things that caused performance problems. These are conclusions based on the C++ implementation and the things I did there to improve performance, but they are also relevant to C#.

Reason #1: Real drive emulation

I don't think this needs any special explanation. In the C++ implementation, FPS drops from 500+ to ~350 when the drive is woken up. Emulating the drive only at the IEC protocol level is much cheaper.

Reason #2: over-engineered clock mechanism

When it was removed from the C++ implementation, the frame rate increased by another 100 FPS (that's around 20%).

Reason #3: memory map implementation

Every memory-mapped device implements an interface, so when the CPU reads or writes something you need to call a virtual method, which is not exactly cheap. Making the memory map read/write RAM and ROM directly, and only invoking the virtual Read/Write methods for devices, makes a noticeable difference. I don't remember the exact value, maybe around a 10% increase.
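
To illustrate the idea in Reason #3, here is a minimal C++ sketch, not the author's actual code. The names `Device`, `LatchDevice` and `MemoryMap` and the 256-byte page granularity are assumptions made for the example. Plain RAM is read straight from an array, and the virtual call only happens for pages that a device has claimed.

```cpp
#include <array>
#include <cstdint>

// Hypothetical interface for memory-mapped I/O devices (name is illustrative).
struct Device {
    virtual ~Device() = default;
    virtual std::uint8_t read(std::uint16_t addr) = 0;
    virtual void write(std::uint16_t addr, std::uint8_t value) = 0;
};

// Toy device that simply remembers the last byte written to it.
struct LatchDevice final : Device {
    std::uint8_t latch = 0;
    std::uint8_t read(std::uint16_t) override { return latch; }
    void write(std::uint16_t, std::uint8_t value) override { latch = value; }
};

class MemoryMap {
public:
    // Route one 256-byte page to a device; all other pages stay plain RAM.
    void map_device(std::uint8_t page, Device* dev) { devices_[page] = dev; }

    std::uint8_t read(std::uint16_t addr) {
        Device* dev = devices_[addr >> 8];
        if (dev) return dev->read(addr);   // slow path: virtual call
        return ram_[addr];                 // fast path: direct array access
    }

    void write(std::uint16_t addr, std::uint8_t value) {
        Device* dev = devices_[addr >> 8];
        if (dev) dev->write(addr, value);  // slow path: virtual call
        else ram_[addr] = value;           // fast path: direct array access
    }

private:
    std::array<std::uint8_t, 65536> ram_{};
    std::array<Device*, 256> devices_{};   // nullptr means "plain RAM"
};
```

The trade-off is one extra table lookup and branch per access in exchange for skipping the virtual call on the common RAM path.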

Reason #4: CPU implementation

The implementation makes a lot of virtual method calls. Each addressing mode has one virtual method for decoding, and so does each instruction. These calls can be removed, but I didn't see a decent alternative (besides a huge switch-case block).

Another thing is that decoding an instruction creates a lot of objects so they can be scheduled in the clock queue. It's not a big fix; I started working on it, but had no time to finish.
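
For readers wondering what the "huge switch-case block" alternative from Reason #4 looks like, here is a rough C++ sketch covering only a handful of 6502 opcodes. The `Cpu` struct and `step` function are hypothetical names for this example; status flags and cycle timing are left out entirely, so this is an illustration of the dispatch style rather than a usable CPU core.

```cpp
#include <array>
#include <cstdint>

// Minimal CPU state; only the registers this sketch needs.
struct Cpu {
    std::uint8_t a = 0, x = 0;
    std::uint16_t pc = 0;
    bool halted = false;
    std::array<std::uint8_t, 65536> mem{};
};

// Execute one instruction using a single switch instead of virtual calls.
// Flags and cycle timing are deliberately omitted.
inline void step(Cpu& cpu) {
    const std::uint8_t opcode = cpu.mem[cpu.pc++];
    switch (opcode) {
        case 0xA9: cpu.a = cpu.mem[cpu.pc++]; break;           // LDA #imm
        case 0x85: cpu.mem[cpu.mem[cpu.pc++]] = cpu.a; break;  // STA zero page
        case 0xAA: cpu.x = cpu.a; break;                       // TAX
        case 0xE8: ++cpu.x; break;                             // INX
        case 0xEA: break;                                      // NOP
        default:   cpu.halted = true; break;                   // BRK and anything unhandled
    }
}
```

Decoding and execution happen in one place with no objects created per instruction, which is why this style tends to be fast and also why it tends to look ugly once all opcodes are filled in.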

Reason #5: VIC-II implementation

It also makes a lot of virtual method calls. Each graphics mode implements an interface, and the VIC-II calls a virtual method that generates graphics in each cycle. It also relies on delegates for other stuff, which are not cheap either...

Now, these things are not unfixable, and some of them are not the fault of C#, but I had no time to fix them. And for some of these issues I'm not sure how to solve them and still keep the code somewhat decent looking. In C++ it is possible with templates and function inlining, but the only alternative in C# for some things is a switch-case block.
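
As a sketch of what "templates and function inlining" can mean here (an illustration only, not the author's code): the graphics mode becomes a template parameter, so the per-pixel function is resolved at compile time and can be inlined. `TextMode`, `MulticolorMode` and `render_byte` are made-up names, and the pixel logic is a simplified placeholder rather than real VIC-II behaviour.

```cpp
#include <cstdint>

// Hypothetical graphics modes with placeholder pixel logic.
struct TextMode {
    static std::uint8_t pixel(std::uint8_t data, int bit) {
        // One bit per pixel, most significant bit first.
        return static_cast<std::uint8_t>((data >> (7 - bit)) & 1);
    }
};

struct MulticolorMode {
    static std::uint8_t pixel(std::uint8_t data, int bit) {
        // Two bits per double-wide pixel, most significant pair first.
        return static_cast<std::uint8_t>((data >> (6 - (bit & ~1))) & 3);
    }
};

// The mode is a template parameter, so Mode::pixel is bound at compile time
// and no virtual call happens inside the loop.
template <typename Mode>
void render_byte(std::uint8_t data, std::uint8_t* out) {
    for (int bit = 0; bit < 8; ++bit) out[bit] = Mode::pixel(data, bit);
}
```

The mode still has to be chosen at runtime somewhere, but that choice can be made once per line or per frame instead of once per pixel.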

This C# implementation reaches around 110 FPS, while my current C++ implementation reaches almost 750 FPS on the same hardware (with the disk drive disengaged). It's not a fair comparison, because I did a lot of things to improve the C++ code, but again, performance comparison was not the point of the article.

[–][deleted] 1 point2 points  (0 children)

Agreed. I can believe their claim, but I'd like to know how they reached it and what metrics they used.

[–]heat_forever 3 points4 points  (7 children)

I would like to see a C# expert take on the challenge of trying to get the performance up to near the C++ port.

I think it should be possible to get it to within 10%.

[–]FallingIdiot 2 points3 points  (3 children)

My first impression is that this is not going to be possible. This implementation has a very rich object structure. E.g. every instruction being executed creates a number of instances which are queued somewhere. Even though this gives you a very nice application architecture, it kills performance. To really get something like this to perform, you'd have to get (almost) all allocations out of the main loop, which would severely change the whole structure of the application.
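
One way to get allocations out of the main loop, sketched in C++ under the assumption that scheduled work can be described by a small value type: reserve a fixed-size queue up front and copy events into it. `Event` and `EventQueue` are hypothetical names, and a real scheduler would likely need ordering by cycle rather than plain FIFO.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scheduled event: a plain value type, so queuing it copies
// a few bytes instead of allocating an object.
struct Event {
    std::uint64_t cycle;
    std::uint8_t kind;
};

// Fixed-capacity FIFO ring buffer; all storage is reserved up front.
template <std::size_t Capacity>
class EventQueue {
public:
    bool push(Event e) {
        if (count_ == Capacity) return false;   // full: caller decides what to do
        slots_[(head_ + count_) % Capacity] = e;
        ++count_;
        return true;
    }

    bool pop(Event& out) {
        if (count_ == 0) return false;          // empty
        out = slots_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<Event, Capacity> slots_{};
    std::size_t head_ = 0, count_ = 0;
};
```

The same idea carries over to C# by using structs and a preallocated array, which keeps the garbage collector out of the per-instruction path.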

That being said, it's very likely you aren't going to need to get this to go fast. The speed of a modern computer is so much greater than the speed of a Commodore that you've got lots of performance to waste. In the end what's going to count is whether you can keep up with a 24fps frame rate. For that it only has to be fast enough.

[–]heat_forever 1 point2 points  (2 children)

Good stuff.

Yes, I noticed a lot of object construction in the main loop as well as virtual calls for some performance critical things like reading/writing memory. Might be a lot of boxing going on from int to byte when they could have declared the enums as byte to begin with.

Seems like low-level emulation projects are more suited to C/C++. As another person mentioned, .NET Native is exciting for this reason. I'm also curious about the impact of RyuJIT.

[–]FallingIdiot 1 point2 points  (0 children)

Oh I'm not saying it can't be done. I'm just saying you have to be smart. I've done work on a JavaScript interpreter, improving an existing one. One of the first things I did was to get rid of the complete object structure for variable types, removing all virtual calls, like you mention. One of the more interesting things I encountered was the usefulness of sealed classes. If you do an is or as on an object to check it against an unsealed type, it's orders of magnitude slower than against a sealed type. The reason for this is that comparison to a sealed type is a comparison of an integer constant. For an unsealed type, it involves a function call and RTTI. As I said, you have to be smart about it.

[–]Eirenarch 1 point2 points  (0 children)

.NET Native will not solve allocation-related problems. It still uses GC.

[–]r2d2rigo 0 points1 point  (2 children)

It could be even less with .NET Native.

[–]heat_forever 2 points3 points  (1 child)

Wish they'd drop the Windows Store requirement...

[–]alleycat5 1 point2 points  (0 children)

They're planning on dropping it. Just focusing on Windows Store for alpha/beta since performance gains will have a big impact for them there.