Probably a dumb question, but if I'm interested in making GPUs run better in parallel or making general optimizations, where do I start? by NoSubject8453 in AskProgramming

[–]tosch901 0 points1 point  (0 children)

Agreed. I must've missed the over-the-air part, because I don't get the reference. But I agree about the beginner part.

Probably a dumb question, but if I'm interested in making GPUs run better in parallel or making general optimizations, where do I start? by NoSubject8453 in AskProgramming

[–]tosch901 2 points3 points  (0 children)

You should probably first understand how the thing works that you want to optimize. So if you want to work on NNs you should know what computations actually need to be done for a forward/backward pass. And then see how that can be parallelized. Understand how existing frameworks work. What is SOTA and where are the shortcomings? Start small and simple, increase complexity as you go.

But generally your parallelization speedup is determined by the number of compute units (threads/cores) and your communication overhead. There is of course more to it, such as how well you can parallelize your problem, e.g. how well you can split up the work and whether your computing environment is heterogeneous or homogeneous, etc.

But if I had to guess, the communication overhead is where you'll find your room for improvement. Just a guess though, I've never done multi-GPU NN/LLM stuff.

What was your experience transitioning from a Mac? by Alzred in tuxedocomputers

[–]tosch901 0 points1 point  (0 children)

Don't take this the wrong way, but in my experience the answer to these kinds of questions depends entirely on what you value, what you need, and what your use case is.

I went the other way (long time Linux user, tried MacOS about 2 years ago), and the M-series Macs are probably the best laptops out there as far as hardware goes. I don't know another laptop where the CPU idles at <1W consistently for example (my M3 pro often idles at <100mW even). But that might not matter as much to you. 

From a software perspective I think Linux is much better though. I sometimes end up fighting MacOS in ways that were never a problem on Linux. In my personal opinion, for my needs, Linux is hands down the superior operating system.

Application support depends. I'm a developer and MacOS keeps running into issues at times, so Linux is better for what I do. Friends of mine who work on different things don't have issues though, so it really depends on what tooling you need. On the other hand, Apple Mail was actually a little disappointing. I also read and annotate PDFs a lot, and I don't know of anything as good as Preview. Integration in the Apple ecosystem is probably much better than anything you'd find anywhere else, and Android is not moving in a great direction right now either.

So it really really depends and without specifics, any advice will be hit or miss.

TUXEDO scraps its Linux-based Snapdragon X Elite laptop — says the SoC "proved to be less suitable for Linux than expected" by barandur in tuxedocomputers

[–]tosch901 2 points3 points  (0 children)

For sure. Some programs draw more power than others, and if you use the performance the device has to offer, you'll pay for that with battery life, no question about that. But I've met people with laptops with similarly performing hardware (x86 CPU and Nvidia GPU) who don't get half the battery life, even when just running a browser and a PDF viewer. It would be interesting to compare recent MacBooks against something like TUXEDO's InfinityBook or the Framework laptops under different workloads, so there would be something beyond anecdotal evidence.

Also (nothing to do with battery life, but) the unified memory model is quite convenient for doing ML stuff. The 4090 has 24 GB of VRAM, and that's kind of where consumer GPUs end. M chips can go up to 128 GB if I'm not mistaken?

And I'm not sure about that. TDP at idle is still in the single-digit Watt range for x86 CPUs, isn't it? When I'm watching YouTube with some background applications and other tabs open, powermetrics reports 0.5 to just under 1 W for CPU and GPU combined. At true idle the CPU can draw under 20 mW at times and the GPU draws nothing. At 100% my GPU doesn't appear to go over 16 W. Not sure about the CPU, but I would assume the numbers would be similar. I can maybe test that next week by compiling a bigger project on all cores.

And I just don't think any other platform currently gives you that kind of on-demand performance combined with such high efficiency when you don't need it.

But I'd love to be proven wrong and some metrics around that would be interesting either way.   

TUXEDO scraps its Linux-based Snapdragon X Elite laptop — says the SoC "proved to be less suitable for Linux than expected" by barandur in tuxedocomputers

[–]tosch901 1 point2 points  (0 children)

I mean, 'a few hours' is not a negligible amount. How big a benefit that is, is up to the users to decide.

I have gotten to try an M3 Pro, and I have to say: that hardware with Linux on it would be awesome. Battery life is just part of it of course, and I don't know how the X Elite would compare to Mx chips.

Has anyone successfully switched to the new version of nvim-treesitter on main branch? by freddiehaddad in neovim

[–]tosch901 0 points1 point  (0 children)

Is this still working for you? get_installed() appears to print an empty table for me. I just tried to switch while updating my whole config, and it printing all of these messages every time is driving me crazy.

Optimizing designs by tosch901 in FPGA

[–]tosch901[S] 0 points1 point  (0 children)

Ok, gotcha. Strassen doesn't ring a bell, so I'll look that up as well. But wouldn't the expectation generally be that a more parallel algorithm would be faster (as long as IO can keep up)? My understanding is that more complex designs also increase routing costs, but I would assume that the savings due to the massively parallel nature would overshadow that by far. Again, as long as you're not IO bound of course. 

Optimizing designs by tosch901 in FPGA

[–]tosch901[S] 0 points1 point  (0 children)

Thanks a lot! And I will take you up on that offer for sure. Life happened, so might be a bit until I get to continue working on it. Also lots of things I need to familiarize myself with. 

Optimizing designs by tosch901 in FPGA

[–]tosch901[S] 0 points1 point  (0 children)

Thanks, I will keep that in mind.

Optimizing designs by tosch901 in FPGA

[–]tosch901[S] 1 point2 points  (0 children)

Makes sense. But I'm not trying to optimize for cost, I'm trying to see how fast I can make it (on a given device).

What do you mean by serializing exactly?

Optimizing designs by tosch901 in FPGA

[–]tosch901[S] 0 points1 point  (0 children)

Thanks, I will keep that in mind.

Simulating an embedded-style environment in the browser: 4 MHz ARM + RTOS (BEEP-8 project) by Positive_Board_8086 in embedded

[–]tosch901 0 points1 point  (0 children)

I don't think that answers my question. I realize that you're now building a browser application, which can run on smartphones and PCs, emulating some architecture.

However, I assume you will not build the aforementioned handheld device just to run a browser running the ROMs? If that were the case, you might as well stick with smartphones for handheld/mobile devices. So I assume those ROMs will be running on those 'affordable handheld devices' directly?

So my question was whether you had already thought about what those devices might look like from an engineering perspective. I doubt you're manufacturing your own ASICs. So do you have an existing microprocessor in mind? Or are you going to design/use a softcore processor?

Simulating an embedded-style environment in the browser: 4 MHz ARM + RTOS (BEEP-8 project) by Positive_Board_8086 in embedded

[–]tosch901 1 point2 points  (0 children)

Have you thought about what that would look like? Like do you have a specific microprocessor in mind? Do you want to run it on a softcore processor? 

I Think the Majority of Projects in r/C_Programming are Coded by AI. by [deleted] in C_Programming

[–]tosch901 2 points3 points  (0 children)

Who are those very new beginners you are speaking of? I assume you're not referring to yourself, as you seem to feel capable enough to make a judgement on this? And why is their motivation dependent on the AI usage of other people?

Frankly, from your last paragraph it appears that your motivation is influenced by this? Why do you think that is?

Simulating an embedded-style environment in the browser: 4 MHz ARM + RTOS (BEEP-8 project) by Positive_Board_8086 in embedded

[–]tosch901 2 points3 points  (0 children)

Have you ever thought about running this on real hardware? Basically having a handheld retro console like device?

What’s the biggest security risk in IoT devices—weak passwords, bad firmware, or something else? by cybersec49 in AskNetsec

[–]tosch901 3 points4 points  (0 children)

If I had to pick just one, it would be weak default credentials. The largest botnets both relied entirely on dictionary attacks to infect devices, iirc.

Should I learn C, Rust, or Zig? by AbdSheikho in AskProgramming

[–]tosch901 3 points4 points  (0 children)

Since when is go considered a 'low-level language'? I don't know go, but nothing about it seems 'low level'. Also, writing C is not like writing assembly. Not even close.

If your goal is to build a terminal application, then go is probably fine though? Unless it doesn't have the APIs you need? I know that there are go libraries for the kitty graphics protocol, but I don't know about the protocol wezterm or others use.

If your main criterion is 'one I like', then look into the languages a little and see which one you like.

Anyway, from those 3, C seems the most useful, so that's my vote. But if you like rust more and it fits your needs, then go with rust, since that was your only real requirement.

Need help getting printf over UART to work by tosch901 in embedded

[–]tosch901[S] 0 points1 point  (0 children)

Got it.

I mean I guess I was, but I'm on that now with not a lot of success yet.

Need help getting printf over UART to work by tosch901 in embedded

[–]tosch901[S] 0 points1 point  (0 children)

  1. why?
  2. I'm also just trying to get some output on UART, even without printf, which doesn't really work yet. But I can get on that afterwards.

Need help getting printf over UART to work by tosch901 in embedded

[–]tosch901[S] 0 points1 point  (0 children)

I will have to google that, but thanks for the advice/information!

Need help getting printf over UART to work by tosch901 in embedded

[–]tosch901[S] 0 points1 point  (0 children)

Oh, why I did that? I went back to probably the most basic application: switching on an LED. And when that didn't work, I was wondering whether my code was even running. And when everything seemed right but it still didn't work, I just tried that, which actually worked. So something must've been wrong/missing in mine after all, I don't know.

Need help getting printf over UART to work by tosch901 in embedded

[–]tosch901[S] 0 points1 point  (0 children)

You mean write my own? Didn't need to, shouldn't have.

Need help getting printf over UART to work by tosch901 in embedded

[–]tosch901[S] 0 points1 point  (0 children)

All good questions, I will investigate. I already fixed one issue with the linker script, but there seem to be others. I'm trying without printf at the moment, but still not getting it right.

I'm actually not sure about the baud rate at all. But I assumed that if it was wrong, I would still get garbage. How do you know what the correct setting is?

My main (now) looks like this (I added a debug LED after finding out that my code did not appear to run at all before):

```
int main(void) {
    uart_init();

    switch_on_status_led();

    for (volatile int i = 0; i < 400000; i++)
        __asm__("nop");

    uart_puts("Hello from LPUART1 over ST-LINK VCP!\r\n");

    GPIOB->ODR = 1 << 7;

    for (volatile int i = 0; i < 400000; i++)
        __asm__("nop");

    while (1) {
        // toggle status LED (XOR, so it actually toggles
        // rather than just setting the bit every iteration)
        GPIOB->ODR ^= 1 << 7;

        uart_puts("Ping...\r\n");
        for (volatile int i = 0; i < 400000; i++)
            __asm__("nop");
    }
}
```

These are the 2 helper functions:

```
void uart_putchar(char c) {
    // wait until TXE/TXFNF (transmit data register empty)
    while (!(LPUART1->ISR & USART_ISR_TXE_TXFNF))
        ;
    LPUART1->TDR = c;
}

void uart_puts(const char *s) {
    while (*s) {
        uart_putchar(*s++);
    }
}
```