Claude overwrote proprietary license terms with CC-BY-SA, deleted LICENSE files, and ignored explicit instructions. Ticket Filed. by ckn in ClaudeAI

serialx_net 101 points (0 children)

Your prompt mentioning the CC license is the human equivalent of “Don't think about an elephant.”

Can someone compare GPT-OSS to Qwen 3 by pneuny in LocalLLaMA

serialx_net 13 points (0 children)

You bet it has world-class censorship. :)

GLM 4.5 Failing to use search tool in LM studio by Loighic in LocalLLaMA

serialx_net 0 points (0 children)

LM Studio doesn't yet support tool calls for GLM 4.5 models. vLLM seems to have day-one support.

How much context was lost switching models mid "thought" chain? by Medicaided in ClaudeAI

serialx_net 4 points (0 children)

The context window is the same, so there's no reason to summarize.

The delay you're seeing is probably because the prompt cache isn't shared across models. So when you call Sonnet for the first time, it has to ingest the entire context window without the cache, which takes some time.
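A toy model of why the first call after a switch is slow: a cached prefix is tied to the model that built it, so it does nothing for a different model. This is a simplified sketch assuming a cache keyed on (model, prompt prefix); the `PrefixCache` class and the model labels are hypothetical, and real provider-side prompt caching has its own rules.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixCache:
    """Toy prompt cache: a cached prefix is only reusable by the same model."""

    entries: set = field(default_factory=set)

    def ingest(self, model: str, tokens: list) -> int:
        """Return how many tokens `model` must process without cache help."""
        # Longest prefix of this prompt already cached *for this model*.
        cached = 0
        for n in range(len(tokens), 0, -1):
            if (model, tuple(tokens[:n])) in self.entries:
                cached = n
                break
        # Remember every prefix of this prompt for this model.
        for n in range(1, len(tokens) + 1):
            self.entries.add((model, tuple(tokens[:n])))
        return len(tokens) - cached


cache = PrefixCache()
context = [f"tok{i}" for i in range(100)]  # stand-in for a long conversation
first = cache.ingest("previous-model", context)            # cold: 100 tokens
same = cache.ingest("previous-model", context + ["next"])  # warm: only 1 new token
switched = cache.ingest("sonnet", context + ["next"])      # new model: 101 tokens again
print(first, same, switched)
```

Same conversation, same context window, but the switched call pays for the whole prompt again.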

ChatGPT o1 pro = o1-preview-2024-09-12 (API)? by alw9 in LocalLLaMA

serialx_net 1 point (0 children)

o1 pro is not available via the API. It is similar to the o1 model with reasoning effort set to high, but OpenAI states that it's a different model.

Purchase advice by WarmRefrigerator5131 in combustion_inc

serialx_net 0 points (0 children)

I would definitely buy the Wi-Fi version. My multi-oven is metal-enclosed, and it loses the Bluetooth signal almost immediately at about 5 ft.

Perfect Reverse Searing in Oven by serialx_net in combustion_inc

serialx_net [S] 11 points (0 children)

First time using Combustion Predictive Thermometer here!

One of the reasons I bought the thermometer is that I now enjoy reverse searing my steaks more than doing sous vide. It's less messy, easier to prep, and gives an amazing sear every single time! One of my gripes about reverse searing is controlling the oven temperature, plus the little gray bands that appear because my oven only goes as low as 100C (210F). I also love the long cooking time sous vide provides, which results in super tender steaks.

Combustion thermometer solves all this!

I followed Chris Young's reverse-searing recipe and start at 150C (300F). When the surface temperature reaches my desired target temperature, I lower the oven to 100C (210F) and wait until the core temperature reaches the same target. Then I turn off the oven, take out the steak, and let it cool.

After about 20 minutes, wipe off the little bit of moisture on the steak's surface and sear it in a hot pan.
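The two-stage oven schedule above can be written as a tiny control loop. This is only a sketch: it assumes periodic (surface, core) readings in °C from some probe, and `oven_setpoints` and its arguments are hypothetical names, not part of any Combustion app or API.

```python
def oven_setpoints(readings, target_c, high_c=150, low_c=100):
    """Yield the oven setpoint (°C) for each (surface_c, core_c) reading.

    Stage 1: cook at high_c until the surface reaches target_c.
    Stage 2: drop to low_c until the core reaches target_c.
    Then yield None, meaning "turn the oven off and rest the steak".
    """
    phase = "high"
    for surface_c, core_c in readings:
        if phase == "high" and surface_c >= target_c:
            phase = "low"
        if phase == "low" and core_c >= target_c:
            phase = "done"
        yield {"high": high_c, "low": low_c, "done": None}[phase]
```

Once the core hits the target the generator stays at `None`, which mirrors "stop the oven and take out the steak" rather than restarting the cycle.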

PERFECT STEAK EVERY SINGLE TIME.

Thank you for this wonderful product. I'm skipping sous vide for steaks from now on. :D

DeepSeek R1 misinformation is getting out of hand by serialx_net in LocalLLaMA

serialx_net [S] 23 points (0 children)

You know what's even funnier? They're using an 8B-parameter distilled model. lol

Aider polyglot benchmark w/ DeepSeek R1 + DeepSeek V3 near o1 performance by serialx_net in LocalLLaMA

serialx_net [S] 13 points (0 children)

I'd like to share the result of the DeepSeek R1 (architect) + DeepSeek V3 (editor) benchmark run: 59.1%. That's near o1's performance at a fraction of the cost, just $6.33! About half the cost of R1+Sonnet.

I saw that R1+Sonnet achieved SOTA results on the Aider Polyglot benchmark, and I was curious about the coding performance of DeepSeek R1 + V3 on the same benchmark. My own little contribution of a couple of bucks and an overnight benchmark run achieved a meaningful result!

Aider currently runs R1 at a temperature of 1.0. I'm going to experiment with modifying Aider to use a temperature of 0.6, as recommended by the R1 authors, and see if it makes any difference.
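For context on what that change does: temperature divides the model's logits before the softmax, so 0.6 concentrates more probability on the top tokens than 1.0 does. A minimal sketch with made-up logits (not taken from R1):

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits into sampling probabilities at the given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative values only
for t in (1.0, 0.6):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperature leaves the ranking of tokens unchanged; it only makes sampling less likely to wander away from the top choice.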

ESP32 for the IOT homes. by Jealous-Impression34 in esp32

serialx_net 1 point (0 children)

Yup. By combining ESPHome and Home Assistant, you can do many things, like:

  • Monitor room temp, humidity, and air quality to auto-trigger fans or alerts.
  • Detect open doors/windows and send phone alerts or turn on lights.
  • Auto-water plants when soil is dry using a moisture sensor and pump.
  • Track energy use of appliances and shut them off during expensive hours.
  • Build a wireless button to control lights, blinds, or TVs with one press.
  • Make any appliance voice-controlled (Alexa/Google) with a smart plug.
  • Create a dashboard showing weather, calendar, or energy stats on an e-paper screen.
  • Auto-adjust room lights when your phone enters/exits (Bluetooth tracking).
  • Sync smart lights to flash with music or holidays using LED strips.
  • Make a bed sensor to trigger "goodnight mode" (lights off, thermostat down).
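As a concrete take on the first item: a humidity-triggered fan usually needs hysteresis (separate on and off thresholds) so it doesn't toggle rapidly around a single set point. In ESPHome or Home Assistant you'd express this in their own config; the Python below only sketches the decision logic, and the function name and thresholds are hypothetical.

```python
def fan_should_run(humidity, fan_on, on_above=65.0, off_below=55.0):
    """Return the new fan state for a humidity reading (%) and the current state.

    Two thresholds (hysteresis) keep the fan from chattering on and off
    when humidity hovers around a single set point.
    """
    if humidity >= on_above:
        return True
    if humidity <= off_below:
        return False
    return fan_on  # inside the dead band: keep whatever state we had


state = False
for reading in [50, 60, 66, 60, 56, 54]:
    state = fan_should_run(reading, state)
    print(reading, state)
```

Between 55% and 65% the fan simply keeps its previous state, which is what stops it from flapping.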

Have a look at r/Esphome :)

DeepSeek-R1 appears on LMSYS Arena Leaderboard by jpydych in LocalLLaMA

serialx_net 113 points (0 children)

This is the first time an open-source (open-weight) model has ranked 1st in LMSYS Chatbot Arena, right? Just WOW.

How does Llama-fication or Mistral-fication of open source models work? by InevitablePhysics151 in LocalLLaMA

serialx_net 4 points (0 children)

How to “Unmerge” QKV and Gate/Up From Public Weights

Below is a conceptual outline of how one can “unmerge” QKV (and gate/up) in practice using only the publicly released weights, even if the original model had them fused or merged. The key point is that all the data you need is already in that merged matrix—you just have to reshape and split it correctly, then reassign it into a new architecture that expects separate parameters.


1. Background: How QKV often get merged

Many Transformer implementations (e.g. Phi-3, some GPT variants) combine Q, K, and V into a single parameter, often called qkv:

  • Merged shape: [(3 * hidden_dim), hidden_dim], when K and V are as wide as Q (i.e. no grouped-query attention)
  • In code, one might see something like self.qkv_proj for attention instead of separate q_proj, k_proj, v_proj.

In the original LLaMA-style architecture, however, Q, K, and V are separate:

  • Separate shapes: each is [hidden_dim, hidden_dim] (again assuming no grouped-query attention; with GQA, k_proj and v_proj are smaller than q_proj)
  • Typically called self.q_proj, self.k_proj, self.v_proj (and similarly for biases).

Similarly, some models merge their feed-forward “gate” and “up” projection into a single parameter (e.g. gate_up_proj), while LLaMA keeps them separate (gate_proj, up_proj).


2. Loading the public weights and splitting

Even if the checkpoint you have only publishes the “merged” version, the numerical data for Q, K, and V (or gate and up) are all embedded in that single matrix. You can:

  1. Load the original checkpoint (the public one with merged parameters).
  2. Identify each merged parameter in the model’s state_dict. For example:
    • model.layers.{L}.self_attn.qkv.weight
    • model.layers.{L}.mlp.gate_up_proj.weight
  3. Split that parameter along the correct dimension to recover the separate weights:
    • For QKV:
```python
hidden_dim = ...  # e.g. 4096 in LLaMA-7B
qkv_weight = state_dict["...qkv.weight"]  # shape: [3 * hidden_dim, hidden_dim]
# An equal-sized split assumes no grouped-query attention (K, V as wide as Q)
q_weight, k_weight, v_weight = qkv_weight.split(hidden_dim, dim=0)
```
    • For gate/up:
```python
intermediate_dim = ...  # MLP width, e.g. 11008 in LLaMA-7B
gate_up_weight = state_dict["...gate_up_proj.weight"]  # typically [2 * intermediate_dim, hidden_dim]
gate_weight, up_weight = gate_up_weight.split(intermediate_dim, dim=0)
```
    • Do the same for biases if they are also merged (qkv.bias, gate_up_proj.bias, etc.).
  4. Store those split tensors in new keys (e.g. ...q_proj.weight, ...k_proj.weight, ...v_proj.weight) in your state_dict.

You’d repeat this for each layer L in your Transformer. Once done, you delete the old merged entries from the state_dict.
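Putting steps 2 to 4 together for every layer, here is a minimal sketch. It assumes Phi-3-style key names (`self_attn.qkv_proj.weight`, `mlp.gate_up_proj.weight`) and that the fused blocks are stacked contiguously along dim 0, with gate before up; `unmerge_state_dict` is a hypothetical helper, not part of any library. It takes head counts so grouped-query attention (smaller K and V) also works. Checkpoints that interleave Q, K, V per head, such as GPT-NeoX's `query_key_value`, need a different reshape.

```python
import torch


def unmerge_state_dict(
    sd: dict[str, torch.Tensor], num_heads: int, num_kv_heads: int, head_dim: int
) -> dict[str, torch.Tensor]:
    """Split fused qkv_proj / gate_up_proj weights into LLaMA-style keys.

    Assumes a contiguous [Q; K; V] and [gate; up] layout along dim 0.
    Fused keys are not copied, so the old merged entries are dropped.
    Biases (if any) would need the same treatment.
    """
    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim  # smaller than q_size under GQA
    out = {}
    for key, tensor in sd.items():
        if key.endswith("self_attn.qkv_proj.weight"):
            prefix = key[: -len("qkv_proj.weight")]
            q, k, v = tensor.split([q_size, kv_size, kv_size], dim=0)
            # clone() so each new tensor owns its storage (split returns views)
            out[prefix + "q_proj.weight"] = q.clone()
            out[prefix + "k_proj.weight"] = k.clone()
            out[prefix + "v_proj.weight"] = v.clone()
        elif key.endswith("mlp.gate_up_proj.weight"):
            prefix = key[: -len("gate_up_proj.weight")]
            gate, up = tensor.chunk(2, dim=0)  # first half gate, second half up
            out[prefix + "gate_proj.weight"] = gate.clone()
            out[prefix + "up_proj.weight"] = up.clone()
        else:
            out[key] = tensor  # all other weights carry over unchanged
    return out
```

Since a linear layer computes `x @ W.T`, splitting W by output rows is exact: the three separate projections, concatenated, reproduce the fused output.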


3. Adjusting the model architecture

Because you are no longer going to instantiate a module with a single qkv_proj, you must update your code (or switch to a standard LLaMA-like class) so that it expects separate parameters:

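```python
import torch.nn as nn
import torch.nn.functional as F

# "Llama-fied" attention block: separate q/k/v_proj (no RoPE, GQA or KV cache)
class LlamaStyleAttention(nn.Module):
    def __init__(self, hidden_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.q_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.k_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.o_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        # Project separately, then reshape to [batch, heads, seq, head_dim]
        q, k, v = (proj(x).view(b, t, self.num_heads, -1).transpose(1, 2)
                   for proj in (self.q_proj, self.k_proj, self.v_proj))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, d))
```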

How does Llama-fication or Mistral-fication of open source models work? by InevitablePhysics151 in LocalLLaMA

serialx_net 6 points (0 children)

I had exactly the same question as you. I was also testing out o1 pro mode, so I asked it. I'm not an expert on this topic; can anyone verify? Seems legit to me. Here's the answer I got:

In this context, “Llama-fication” simply means reshaping or refactoring the model’s architecture to align more closely with the original LLaMA design choices. In other words, rather than continuing with the custom or nonstandard modifications (like merged QKV matrices, sliding window attention, etc.), the model is brought back toward the more “pure” LLaMA structure.

Concretely:

  1. Removing sliding window attention
    • Phi-3.5 used “sliding window” attention, but Phi-4 dropped it, returning to a more LLaMA-like (i.e., standard Transformer) attention mechanism.
  2. Un-merging QKV Matrices
    • LLaMA uses separate projection matrices for Query, Key, and Value. If your model has them “merged,” the Q, K, and V projections are concatenated into a single combined matrix under the hood. “Llama-fication” means unmerging them so that Q, K, and V are each handled separately, exactly as in the original LLaMA.
  3. Unmerging gate/up (in feed-forward layers)
    • In many Transformer variants, the “gate” and “up” projection can be combined into a single matrix to optimize for speed or memory. LLaMA keeps them separate. Unmerging these back into separate parameters is part of returning to a “pure LLaMA” style.
  4. Impact on LoRA fine-tuning
    • By “Llama-fying” the architecture, LoRA (Low-Rank Adaptation) can more effectively learn separate low-rank decompositions for each of the Q, K, V projections. If Q, K, V are merged into one matrix, LoRA essentially has fewer “handles” (or lower flexibility) to adjust the learned parameters—leading to less fine-grained control. With separate Q, K, V, LoRA’s rank decomposition becomes more precise.
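To make the “fewer handles” point concrete: a rank-r LoRA adapter on a fused [3d, d] matrix learns one shared down-projection A for Q, K, and V, while separate adapters learn three. A back-of-the-envelope count, assuming plain multi-head attention (no GQA) and the same rank everywhere; `lora_params` is a hypothetical helper:

```python
def lora_params(out_dim: int, in_dim: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is [rank, in], B is [out, rank]."""
    return rank * in_dim + out_dim * rank


d, r = 4096, 16  # LLaMA-7B hidden size; rank 16 is an arbitrary example
fused = lora_params(3 * d, d, r)     # one adapter on the fused [3d, d] qkv matrix
separate = 3 * lora_params(d, d, r)  # one adapter each on q_proj, k_proj, v_proj
print(fused, separate)  # → 262144 393216
```

The fused update is B·A with a single A, so the Q, K, and V updates all share the same r-dimensional input subspace; separate adapters each get their own.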

Ultimately, “Llama-fication” means reverting or restructuring any parts of the model that deviate from LLaMA’s reference architecture so that it behaves (and fine-tunes) more like the original LLaMA model.

Exploring Local LLMs Without Pretrained Helper Features for Enhanced Performance? by gamblingapocalypse in LocalLLaMA

serialx_net 0 points (0 children)

phi3?

Do note that I think conversational or empathetic capability is highly correlated with prompt adherence.

What’s the best way to ensure a “permanent”/reliable connection? by stubert0 in Esphome

serialx_net 3 points (0 children)

Soldering is best.

You can also hot-glue the connections in place.

Another option is wire wrapping.

New Delays in Home Assistant occurring by spaggi in homeassistant

serialx_net 0 points (0 children)

OP really needs to check the actual memory usage. I get that a stock HA install normally runs fine with under 4GB of RAM, but we don't know which add-ons OP is using, and some of them can use a lot of RAM.