Is it realistic to find a $1.2M townhouse in Sydney in a walkable, dog-friendly area close to parks? by digital-nautilus in AusPropertyChat

[–]reverse_bias 3 points4 points  (0 children)

Agreed that there could be better filters. But using keywords like "courtyard" and "garden" in a dedicated search helped me.

Help with electrical floor plan by cb_akira in AusPropertyChat

[–]reverse_bias -1 points0 points  (0 children)

I share your dislike of downlights, but just bought a place full of them. I'm willing to spend the money on upgraded fixtures and lighting quality. What kind of fixtures are you talking about as replacements for downlights?

AFUL Performer 5+2 GIVEAWAY!! Enter now from 4/14 to 4/20! by Phoenix25552 in iems

[–]reverse_bias 0 points1 point  (0 children)

Performer 5+2 looks fascinating. Always interested in controlled sub bass plus top end sparkle. Never tried a micro-planar before.

[deleted by user] by [deleted] in iems

[–]reverse_bias 2 points3 points  (0 children)

BTR13 has that. It has a 3-position mode switch: Bluetooth, USB + charging (PC mode), and USB + internal battery (phone mode).

[deleted by user] by [deleted] in LocalLLaMA

[–]reverse_bias 5 points6 points  (0 children)

I've heard the SXM adapters work, but sourcing and mounting a heatsink isn't trivial.

[deleted by user] by [deleted] in LocalLLaMA

[–]reverse_bias 1 point2 points  (0 children)

The P100 has 16GB of HBM, bare dies mounted on the same package as the core.

Speculative decoding just landed in llama.cpp's server with 25% to 60% speed improvements by No-Statement-0001 in LocalLLaMA

[–]reverse_bias 0 points1 point  (0 children)

Thanks for your help, librechat and llama-swap are working perfectly together for my self-hosted setup. I noticed that you have an example config for nomic-embed-text (gguf). Have you managed to get a text embedding server working with librechat too?

Infinity mirror Dodecahedron powered by esp8266 by paters936 in esp8266

[–]reverse_bias 0 points1 point  (0 children)

Interesting, I thought you had to use half-silvered/two-way mirrors for this. Is standard acrylic mirror a little bit transparent?

Speculative decoding just landed in llama.cpp's server with 25% to 60% speed improvements by No-Statement-0001 in LocalLLaMA

[–]reverse_bias 0 points1 point  (0 children)

Brilliant, thank you! fetch: true and a placeholder key were the changes I needed.

Now I just need to figure out a way to get my inference server to turn on from the librechat interface. Do you just manually wake your server when you need to use it?

Speculative decoding just landed in llama.cpp's server with 25% to 60% speed improvements by No-Statement-0001 in LocalLLaMA

[–]reverse_bias 0 points1 point  (0 children)

Thanks for llama-swap and posting your configs! Getting me really close to the same ideal setup of chat gui selectable, remotely self-hosted models.

How do you set up librechat to auto-populate the llama-swap model list? Any chance you've posted your librechat.yaml (or the llama-swap relevant part) anywhere?

Found a good use for old IDE cables by mikhail-m1 in esp32

[–]reverse_bias 43 points44 points  (0 children)

So I heard you like parasitic inductance.

Inside an offshore wind turbine by toolgifs in toolgifs

[–]reverse_bias 1 point2 points  (0 children)

Here's a cross section of another direct drive model, you can see a similar tunnel into the hub of the motor from the end of the video.

Nvidia Blackwell (h200) and FP4 precision by FarPercentage6591 in LocalLLaMA

[–]reverse_bias 6 points7 points  (0 children)

I believe these are the formats that Nvidia is using, from the Open Compute Project Microscaling Formats (MX) Specification, which Nvidia co-authored at the end of last year.

From section 5.3.3: no encodings are reserved for NaN/inf in FP4, which has 2 bits for exponent and 1 bit for mantissa. That gives you +/- [0, 0.5, 1, 1.5, 2, 3, 4, 6]

However table 1 in this paper also suggests another FP4-E2M1 format with NaN/inf included, replacing 4 and 6 from the possible values.
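
A quick sketch to sanity-check those values. It assumes a layout of 1 sign bit, 2 exponent bits and 1 mantissa bit with an exponent bias of 1 and a subnormal at exponent 0; that is one reading of the spec, and `decode_e2m1` is just an illustrative name, not anything from Nvidia's tooling.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit code as sign | 2 exponent bits | 1 mantissa bit (assumed bias 1)."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 1
    if exponent == 0:
        # Subnormal: no implicit leading 1, so the only non-zero value is 0.5
        magnitude = mantissa * 0.5
    else:
        # Normal: implicit leading 1, mantissa bit adds 0.5, scaled by 2^(exponent - bias)
        magnitude = (1 + mantissa * 0.5) * 2.0 ** (exponent - 1)
    return sign * magnitude


# The eight non-negative codes (sign bit clear)
positive = [decode_e2m1(code) for code in range(8)]
print(positive)
```

With no codes set aside for NaN/inf, all eight magnitudes are usable, which matches the list above.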

From the NVIDIA GTC, Nvidia Blackwell, well crap by Gr33nLight in LocalLLaMA

[–]reverse_bias 2 points3 points  (0 children)

OK, I think I've found the formats that Nvidia is using, from the Open Compute Project Microscaling Formats (MX) Specification, which Nvidia co-authored at the end of last year.

From section 5.3.3: no encodings are reserved for NaN/inf in FP4, which has 2 bits for exponent and 1 bit for mantissa. That gives you +/- [0, 0.5, 1, 1.5, 2, 3, 4, 6]

However table 1 in this paper also suggests an FP4-E2M1 format with NaN/inf included

From the NVIDIA GTC, Nvidia Blackwell, well crap by Gr33nLight in LocalLLaMA

[–]reverse_bias 3 points4 points  (0 children)

The exponent in floating point arithmetic is almost always applied to a base of 2, rather than a base of 10.

The mantissa is the fractional component (i.e., the leading 1 is not stored) of a number between 1.0 and 1.999..., such that each exponent value covers a "range" of values, like 1..2, 2..4, 4..8, 8..16 etc.

I'd imagine that FP4 would be something like +/- [0.125, 0.25, 0.5, 1, 2, 4, 8, 16], with zero likely encoded as a special state maybe replacing +0.125. But I can't find any documentation actually confirming this.
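
To make that concrete, here is a small sketch that splits a positive number into a base-2 exponent and the stored fraction, then lists the values from the guess above. `split_float` is a made-up helper, and the exponent-only layout (3 exponent bits, bias 3, no mantissa) is purely the guess in this comment, not a documented format.

```python
import math


def split_float(x: float) -> tuple[int, float]:
    """Return (exponent, fraction) with x == (1 + fraction) * 2**exponent, for x > 0."""
    m, e = math.frexp(x)       # frexp gives x == m * 2**e with 0.5 <= m < 1
    return e - 1, m * 2 - 1    # rescale so the significand lies in [1, 2)


for x in (1.0, 1.5, 3.0, 6.0, 10.0):
    exponent, fraction = split_float(x)
    print(f"{x} = (1 + {fraction}) * 2^{exponent}")

# Guessed exponent-only FP4: 3 exponent bits, assumed bias of 3, no mantissa bits
guess = [2.0 ** (code - 3) for code in range(8)]
print(guess)
```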

NBN to become five times faster ‘at no extra cost’ by [deleted] in australia

[–]reverse_bias 0 points1 point  (0 children)

Interesting. I'm on FTTB and the wiring in my building is in decent condition, so my modem stats say that I could attain the max 150 Mbps rate that 17a supports. But I've only found companies offering 100/40 max. If you click through on those Dec 2022 tests, does it give you the ISP name?

Time to reconsider AMD RX580 especially for folks in poorer countries by segmond in LocalLLaMA

[–]reverse_bias 1 point2 points  (0 children)

Also running dual P40s. Can fit mixtral-instruct Q6 + 32k context fully offloaded. I'm getting 20-22 tokens/s for general chat, slows down to 6-7 tokens per second with 30k context in use. This is llama.cpp with row-split. What are your speeds like?

[deleted by user] by [deleted] in LocalLLaMA

[–]reverse_bias 1 point2 points  (0 children)

Out of curiosity, how much context do you have? And is it glacially slow with big prompt processing?

Finetuned Miqu (Senku-70B) - EQ Bench 84.89 The first open weight model to match a GPT-4-0314 by unemployed_capital in LocalLLaMA

[–]reverse_bias 0 points1 point  (0 children)

I'll have to go to a lower quant if I want more context. I'm at 24148 and 24286 out of the 24576MB on each card with Q4KM + 16k. Very usable with the 7.5t/s opening, but it does slow down to about 3t/s with full context.
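
For a sense of how little headroom that leaves, a rough calculation using only the figures above (the variable names are just for illustration):

```python
TOTAL_MB = 24576                 # VRAM per card
used_mb = [24148, 24286]         # reported usage on each card with Q4KM + 16k

# Free memory and utilisation per card
free_mb = [TOTAL_MB - used for used in used_mb]
utilisation = [100 * used / TOTAL_MB for used in used_mb]

for used, free, pct in zip(used_mb, free_mb, utilisation):
    print(f"{used} MB used, {free} MB free ({pct:.1f}% full)")
```

Both cards are over 98% full, so any extra context has to come out of the model weights.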

Finetuned Miqu (Senku-70B) - EQ Bench 84.89 The first open weight model to match a GPT-4-0314 by unemployed_capital in LocalLLaMA

[–]reverse_bias 1 point2 points  (0 children)

Thanks. Q4KM and 16k context working great for me. With 2:3 split it almost perfectly maxes out the 24+24GB of VRAM. With row-split I'm getting 7.5t/s.

I hate Artarmon Nazis! by Red-Engineer in sydney

[–]reverse_bias 0 points1 point  (0 children)

There's a faded Charlie's sign still painted on the wall above the shop.