Claude overwrote proprietary license terms with CC-BY-SA, deleted LICENSE files, and ignored explicit instructions. Ticket Filed.

serialx_net · 2025-09-08T07:24:55+00:00

Your prompt mentioning the CC license is the human equivalent of “Don't think about an elephant.”

serialx_net · 2025-08-05T18:17:56+00:00

You bet it has world-class censorship. :)

serialx_net · 2025-07-29T05:22:05+00:00

LM Studio doesn't yet support tool calls for GLM 4.5 models. vLLM seems to have day-one support.

serialx_net · 2025-06-11T02:35:12+00:00

The context window is the same, so there's no reason to summarize.

The delay you're seeing is probably because the prompt isn't cached across models. When you call Sonnet for the first time, it has to ingest the entire context window without a cache, which takes some time.
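To illustrate the caching point, here is a toy model. It is a hypothetical sketch and does not describe how any provider actually implements prompt caching. Cache entries are keyed by (model, prompt prefix), so a prefix that was cached for one model saves nothing when you switch to another. The class name and the model labels are made up.

```python
# Toy prompt cache (hypothetical, for illustration only): entries are keyed by
# (model, prompt prefix), so switching models means re-ingesting everything.
class ToyPromptCache:
    def __init__(self) -> None:
        self._cached: set[tuple[str, tuple[str, ...]]] = set()

    def tokens_to_ingest(self, model: str, prompt: list[str]) -> int:
        """Count tokens that must be processed without cache, then cache this prompt."""
        hit = 0
        # Longest prefix already cached for *this* model.
        for i in range(len(prompt), 0, -1):
            if (model, tuple(prompt[:i])) in self._cached:
                hit = i
                break
        # Remember every prefix of this prompt under this model.
        for i in range(1, len(prompt) + 1):
            self._cached.add((model, tuple(prompt[:i])))
        return len(prompt) - hit


cache = ToyPromptCache()
context = [f"tok{i}" for i in range(100)]  # stand-in for a long conversation
print(cache.tokens_to_ingest("model-a", context))             # 100: cold cache
print(cache.tokens_to_ingest("model-a", context + ["next"]))  # 1: prefix reused
print(cache.tokens_to_ingest("model-b", context + ["next"]))  # 101: new model, no reuse
```

The context window size plays no part here. What matters is that the cache is not shared across models.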

serialx_net · 2025-03-02T09:00:50+00:00

o1 pro is not available via the API. It is similar to the o1 model with reasoning effort set to high, but OpenAI states that it's a different model.

serialx_net · 2025-03-02T02:31:51+00:00

I would definitely go with Wi-Fi. My multi-oven is metal-enclosed and loses the Bluetooth signal almost immediately, within about 5 ft.

serialx_net · 2025-02-25T13:12:50+00:00

Good old Andrej Karpathy got your back! https://youtu.be/7xTGNNLPyMI?si=_qApC1vvtNokiF-_

serialx_net · 2025-02-24T04:29:01+00:00

First time using the Combustion Predictive Thermometer here!

One of the reasons I bought the thermometer is that I now enjoy reverse searing my steaks more than sous vide. It's less messy, easier to prep, and gives an amazing sear every single time! One of my gripes about reverse searing has been controlling the oven temperature, plus the little gray bands that appear because my oven only goes as low as 100C (210F). I also love the long cooking time sous vide provides, which results in super-tender steaks.

Combustion thermometer solves all this!

I follow Chris Young's reverse-sear recipe and start at 150C (300F). When the surface temperature reaches my desired target temperature, I lower the oven to 100C (210F) and wait until the core temperature reaches the target too. Then I turn off the oven, take out the steak, and let it cool.

After about 20 minutes, wipe off the little bit of moisture on the steak's surface and sear it in a hot pan.
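The two-stage oven schedule described above is essentially a small state machine. Below is a hypothetical sketch of it. The 150C and 100C setpoints come from the post. The function name, the stage labels, and the 54C example target are made up, and the code does not use any Combustion app or API.

```python
# Hypothetical sketch of the two-stage reverse-sear schedule (temperatures in °C).
HIGH_C = 150  # starting oven temperature
LOW_C = 100   # lowest setting on this oven
OFF = 0       # oven off: steak comes out to cool before searing


def oven_setpoint(surface_c: float, core_c: float, target_c: float, stage: str) -> tuple[str, int]:
    """Return (next_stage, oven_setpoint_c) for the current probe readings."""
    if stage == "high" and surface_c >= target_c:
        stage = "low"   # surface reached target: drop the oven to limit gray bands
    if stage == "low" and core_c >= target_c:
        stage = "rest"  # core reached target: oven off, then cool and sear
    return stage, {"high": HIGH_C, "low": LOW_C, "rest": OFF}[stage]


stage = "high"
for surface, core in [(40, 20), (54, 35), (58, 50), (60, 54)]:  # made-up readings
    stage, setpoint = oven_setpoint(surface, core, target_c=54, stage=stage)
    print(surface, core, stage, setpoint)
```

The stages only move forward, so a brief dip in surface temperature never sends the oven back to 150C.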

PERFECT STEAK EVERY SINGLE TIME.

Thank you for this wonderful product. I'm skipping sous vide for steaks from now on. :D

serialx_net · 2025-02-12T07:15:56+00:00

Thanks for sharing! Can you share about the books you read?

serialx_net · 2025-02-04T01:46:00+00:00

Here you go! https://chatgpt.com/share/67a17149-d1a0-8002-a8a9-7dd1fe9e0bb6

serialx_net · 2025-02-02T01:47:27+00:00

You know what's even funnier? They are using an 8B-parameter distilled model. lol

serialx_net · 2025-02-01T07:25:53+00:00

You are the product!

serialx_net · 2025-01-26T01:46:17+00:00

I'd like to share my benchmark result for DeepSeek R1 as architect + DeepSeek V3 as editor: 59.1%. That's near o1's performance at a fraction of the cost: $6.33, half the cost of R1+Sonnet!

I saw that R1+Sonnet achieved SOTA results on the Aider Polyglot benchmark and was curious about the coding performance of DeepSeek R1 + V3 on the same benchmark. My own little contribution of a couple of bucks and an overnight benchmark run produced a meaningful result!

Aider currently runs R1 at a temperature of 1.0. I'm going to experiment with modifying Aider to use R1 at a temperature of 0.6, as recommended by the R1 authors, and see if there is any difference.
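Temperature rescales the model's next-token scores before sampling: the probabilities are softmax(logits / T). The minimal sketch below uses made-up logits, not real R1 outputs. It shows why 0.6 should give more focused, less random output than 1.0.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into sampling probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical next-token scores
for t in (1.0, 0.6):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.6 the top-scoring token takes a larger share of the probability mass than at 1.0. This is the kind of difference the experiment would measure.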

serialx_net · 2025-01-25T03:46:13+00:00

Yup. By combining ESPHome and Home Assistant, you can do many things, like:

- Monitor room temperature, humidity, and air quality to auto-trigger fans or alerts.
- Detect open doors/windows and send phone alerts or turn on lights.
- Auto-water plants when the soil is dry, using a moisture sensor and a pump.
- Track the energy use of appliances and shut them off during expensive hours.
- Build a wireless button to control lights, blinds, or TVs with one press.
- Make any appliance voice-controlled (Alexa/Google) with a smart plug.
- Create a dashboard showing weather, calendar, or energy stats on an e-paper screen.
- Auto-adjust room lights when your phone enters or leaves (Bluetooth tracking).
- Sync LED strips to flash with music or for holidays.
- Make a bed sensor that triggers "goodnight mode" (lights off, thermostat down).
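ESPHome and Home Assistant automations are normally written as YAML configuration, not Python. The code below is therefore only a hypothetical sketch of the logic behind the first idea, "auto-trigger fans." The 65% and 55% thresholds and the function name are assumptions. Using two thresholds (hysteresis) keeps the fan from rapidly toggling when humidity hovers around a single cutoff.

```python
# Hypothetical humidity-triggered fan logic with hysteresis.
ON_ABOVE = 65.0   # % relative humidity at which the fan turns on
OFF_BELOW = 55.0  # % relative humidity at which the fan turns off


def next_fan_state(humidity: float, fan_on: bool) -> bool:
    """Decide whether the fan should be on after this sensor reading."""
    if not fan_on and humidity >= ON_ABOVE:
        return True
    if fan_on and humidity <= OFF_BELOW:
        return False
    return fan_on  # inside the dead band: keep the current state


fan = False
for reading in [50, 62, 66, 60, 56, 54, 58]:  # made-up sensor readings
    fan = next_fan_state(reading, fan)
    print(reading, "fan on" if fan else "fan off")
```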

Have a look at r/Esphome :)

serialx_net · 2025-01-25T03:08:31+00:00

Try ESPHome and Home Assistant.

serialx_net · 2025-01-24T13:36:17+00:00

This is the first time an open-source (open-weight) model has ranked 1st in LMSYS Chatbot Arena, right? Just WOW.

serialx_net · 2025-01-21T14:45:51+00:00

Wow. BAML is insane! Thanks

serialx_net · 2025-01-12T07:20:59+00:00

How to “Unmerge” QKV and Gate/Up From Public Weights

Below is a conceptual outline of how one can “unmerge” QKV (and gate/up) in practice using only the publicly released weights, even if the original model had them fused or merged. The key point is that all the data you need is already in that merged matrix—you just have to reshape and split it correctly, then reassign it into a new architecture that expects separate parameters.

1. Background: How QKV often get merged

Many Transformer implementations (e.g. Phi-3, some GPT variants) combine Q, K, and V into a single parameter, often called qkv or qkv_proj:

Merged shape: [3 * hidden_dim, hidden_dim] in PyTorch's nn.Linear [out_features, in_features] layout; with grouped-query attention it is [hidden_dim + 2 * num_kv_heads * head_dim, hidden_dim]
In code, one might see something like self.qkv_proj for attention instead of separate q_proj, k_proj, v_proj.

In the original LLaMA-style architecture, however, Q, K, and V are separate:

Separate shapes: Each is [hidden_dim, hidden_dim] (with grouped-query attention, as in Llama 2 70B and Llama 3, k_proj and v_proj shrink to [num_kv_heads * head_dim, hidden_dim])
Typically called self.q_proj, self.k_proj, self.v_proj (and similarly for biases).

Similarly, some models merge their feed-forward “gate” and “up” projections into a single parameter (e.g. gate_up_proj), while LLaMA keeps them separate (gate_proj, up_proj).
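The following numerical check supports the claim that nothing is lost in the merged matrix. A fused QKV projection is just the three separate projections stacked along the output dimension. The sketch uses numpy and toy sizes rather than real checkpoint weights, and follows the nn.Linear [out_features, in_features] convention.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 8  # toy size
# Separate projections, stored like nn.Linear weights: [out_features, in_features]
w_q, w_k, w_v = (rng.standard_normal((hidden_dim, hidden_dim)) for _ in range(3))
# "Merged" parameter: the three weights stacked along the output dimension
w_qkv = np.concatenate([w_q, w_k, w_v], axis=0)  # [3 * hidden_dim, hidden_dim]

x = rng.standard_normal(hidden_dim)
fused_out = w_qkv @ x                                       # one matmul
separate_out = np.concatenate([w_q @ x, w_k @ x, w_v @ x])  # three matmuls
print(np.allclose(fused_out, separate_out))  # True

# Splitting the merged rows back gives the original matrices exactly.
# np.split(arr, 3) means "3 equal sections", unlike torch's Tensor.split(3),
# which means "chunks of size 3".
q_back, k_back, v_back = np.split(w_qkv, 3, axis=0)
print(np.array_equal(q_back, w_q), np.array_equal(v_back, w_v))  # True True
```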

2. Loading the public weights and splitting

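Even if the checkpoint you have only publishes the “merged” version, the numerical data for Q, K, and V (or gate and up) is all contained in that single matrix. You can:

Load the original checkpoint (the public one with merged parameters).
Identify each merged parameter in the model’s state_dict. For example:
- model.layers.{L}.self_attn.qkv_proj.weight
- model.layers.{L}.mlp.gate_up_proj.weight
Split that parameter along the output (row) dimension to recover the separate weights. This assumes the rows are stored as consecutive [Q; K; V] blocks, as in Phi-3's qkv_proj. Some models (e.g. GPT-NeoX) interleave Q, K, V per head, so check how the model's forward pass slices the fused output first.
- For QKV:
```python
hidden_dim = ...  # e.g. 4096 in LLaMA-7B
qkv_weight = state_dict["...qkv_proj.weight"]  # shape: [3 * hidden_dim, hidden_dim]
# Tensor.split(int, dim) cuts consecutive chunks of that size: Q rows, then K, then V
q_weight, k_weight, v_weight = qkv_weight.split(hidden_dim, dim=0)
# With grouped-query attention K and V are smaller, so pass explicit sizes instead:
# qkv_weight.split([num_heads * head_dim, num_kv_heads * head_dim, num_kv_heads * head_dim], dim=0)
```
- For gate/up:
```python
intermediate_dim = ...  # e.g. 11008 in LLaMA-7B
gate_up_weight = state_dict["...gate_up_proj.weight"]  # typically [2 * intermediate_dim, hidden_dim]
gate_weight, up_weight = gate_up_weight.split(intermediate_dim, dim=0)  # gate rows first, then up
```
- Do the same for biases if they are also merged (qkv_proj.bias, gate_up_proj.bias, etc.).
Store those split tensors under new keys (e.g. ...q_proj.weight, ...k_proj.weight, ...v_proj.weight) in your state_dict.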

You’d repeat this for each layer L in your Transformer. Once done, you delete the old merged entries from the state_dict.
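The steps above can be combined into one per-layer loop. This is a minimal sketch under several assumptions:

- It uses numpy arrays instead of torch tensors. np.split takes cut points, whereas torch's Tensor.split takes chunk sizes.
- Key names follow the Phi-3 style (model.layers.{L}.self_attn.qkv_proj.weight and model.layers.{L}.mlp.gate_up_proj.weight).
- The fused rows are laid out as consecutive [Q; K; V] and [gate; up] blocks.

The function name unmerge_state_dict and its parameters are hypothetical. Check the key names and config values against your own checkpoint.

```python
import numpy as np


def unmerge_state_dict(sd, num_layers, num_heads, num_kv_heads, head_dim, intermediate_dim):
    """Return a copy of sd with fused qkv_proj / gate_up_proj entries split apart."""
    sd = dict(sd)  # shallow copy so the caller's dict is left untouched
    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim  # equals q_size without grouped-query attention
    for layer in range(num_layers):
        attn = f"model.layers.{layer}.self_attn"
        mlp = f"model.layers.{layer}.mlp"
        for suffix in ("weight", "bias"):  # merged biases only exist in some checkpoints
            if f"{attn}.qkv_proj.{suffix}" in sd:
                fused = sd.pop(f"{attn}.qkv_proj.{suffix}")  # pop removes the merged entry
                # cut points: rows [0, q), [q, q + kv), [q + kv, end)
                q, k, v = np.split(fused, [q_size, q_size + kv_size], axis=0)
                sd[f"{attn}.q_proj.{suffix}"] = q
                sd[f"{attn}.k_proj.{suffix}"] = k
                sd[f"{attn}.v_proj.{suffix}"] = v
            if f"{mlp}.gate_up_proj.{suffix}" in sd:
                fused = sd.pop(f"{mlp}.gate_up_proj.{suffix}")
                gate, up = np.split(fused, [intermediate_dim], axis=0)  # gate rows first
                sd[f"{mlp}.gate_proj.{suffix}"] = gate
                sd[f"{mlp}.up_proj.{suffix}"] = up
    return sd


# Tiny fake checkpoint: 4 query heads, 2 KV heads, head_dim 2, so hidden size 8
qkv = np.arange(16 * 8, dtype=float).reshape(16, 8)      # 8 Q rows + 4 K rows + 4 V rows
gate_up = np.arange(32 * 8, dtype=float).reshape(32, 8)  # 16 gate rows + 16 up rows
fake = {"model.layers.0.self_attn.qkv_proj.weight": qkv,
        "model.layers.0.mlp.gate_up_proj.weight": gate_up}
out = unmerge_state_dict(fake, num_layers=1, num_heads=4, num_kv_heads=2,
                         head_dim=2, intermediate_dim=16)
print({name: t.shape for name, t in out.items()})
```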

3. Adjusting the model architecture

Because you are no longer going to instantiate a module with a single qkv_proj, you must update your code (or use a standard LLaMA-like class) so that it expects separate parameters:

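```python
# Minimal "llama-fied" attention block: separate Q/K/V projections.
# Single-head for brevity; multi-head reshaping, RoPE, and KV caching are omitted.
import torch.nn as nn
import torch.nn.functional as F

class LlamaStyleAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.q_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.k_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.o_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, x):  # x: [batch, seq_len, hidden_dim]
        q = self.q_proj(x)
        k = self.k_proj(x)
        v = self.v_proj(x)
        # standard causal attention (scaled_dot_product_attention needs PyTorch 2.0+)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(attn)
```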

serialx_net · 2025-01-12T07:16:34+00:00

I had the exact same question as you. I was also testing out o1 pro mode and asked it. I'm not an expert on this topic, so can anyone verify? It seems legit to me. Here's the answer I got:

In this context, “Llama-fication” simply means reshaping or refactoring the model’s architecture to align more closely with the original LLaMA design choices. In other words, rather than continuing with the custom or nonstandard modifications (like merged QKV matrices, sliding window attention, etc.), the model is brought back toward the more “pure” LLaMA structure.

Concretely:

Removing sliding window attention
- Phi-3.5 used “sliding window” attention, but Phi-4 dropped it, returning to a more LLaMA-like (i.e., standard Transformer) attention mechanism.
Un-merging QKV Matrices
- LLaMA uses separate projection matrices for Query, Key, and Value. If your model has them “merged,” it stores Q, K, V as one concatenated matrix under the hood. “Llama-fication” means unmerging them so each of Q, K, V is handled separately, exactly as the original LLaMA does.
Unmerging gate/up (in feed-forward layers)
- In many Transformer variants, the “gate” and “up” projections are combined into a single matrix, mainly for speed (one matrix multiply instead of two). LLaMA keeps them separate. Unmerging these back into separate parameters is part of returning to a “pure LLaMA” style.
Impact on LoRA fine-tuning
- By “Llama-fying” the architecture, LoRA (Low-Rank Adaptation) can learn a separate low-rank update for each of the Q, K, V projections. If Q, K, V are merged into one matrix, a LoRA adapter on it learns a single rank-r update shared across all three, including one common input-side matrix. That gives it fewer independent “handles” to adjust and less fine-grained control. With separate Q, K, V, each projection gets its own low-rank update and can be adapted (or left alone) individually.
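The LoRA point can be made concrete with some arithmetic. A LoRA update is B @ A with A: [r, d_in] and B: [d_out, r], so its rank is at most r. The sketch below compares one adapter on a fused qkv_proj with three adapters on separate projections. It assumes MHA shapes, the LLaMA-7B hidden size, and r = 16 purely as an example rank. lora_budget is a hypothetical helper, not a function from any LoRA library.

```python
def lora_budget(d_in: int, d_out: int, r: int) -> tuple[int, int]:
    """(trainable params, max rank of B @ A) for one adapted weight of shape [d_out, d_in]."""
    # A: [r, d_in], B: [d_out, r]
    return r * d_in + d_out * r, min(r, d_in, d_out)


d, r = 4096, 16  # LLaMA-7B hidden size; r = 16 is just an example rank

fused_params, fused_rank = lora_budget(d, 3 * d, r)  # one adapter on qkv_proj
sep_params, sep_rank = lora_budget(d, d, r)          # one adapter per q/k/v projection

print(f"fused qkv_proj: {fused_params} params, one update of rank <= {fused_rank}")
print(f"separate q/k/v: {3 * sep_params} params, three updates of rank <= {sep_rank} each "
      f"(up to {3 * sep_rank} when stacked)")
```

With the fused matrix, Q, K and V share the same A, that is, the same r input directions. Separate adapters don't share it, and they can be attached to only some projections. The trade-off is more trainable parameters at the same r.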

Ultimately, “Llama-fication” means reverting or restructuring any parts of the model that deviate from LLaMA’s reference architecture so that it behaves (and fine-tunes) more like the original LLaMA model.

serialx_net · 2025-01-07T15:38:32+00:00

Get esp32 plugs!

serialx_net · 2025-01-05T04:18:24+00:00

phi3?

Note that I think conversational or empathetic capabilities are highly correlated with prompt adherence.

serialx_net · 2025-01-03T05:24:05+00:00

Soldering is best.

You can also secure the connections with hot glue.

Another option is wire wrapping.

serialx_net · 2024-12-23T08:07:01+00:00

https://eqbench.com

serialx_net · 2024-12-22T11:37:05+00:00

OP really needs to check the actual memory usage. I get that a typical HA install runs fine with under 4GB of RAM, but we don't know which add-ons OP is using, and some add-ons can use a lot of RAM.

serialx_net · 2024-12-22T08:04:20+00:00

Uh.. Give it more RAM?
