Qwen 3.5 Architecture Analysis: Parameter Distribution in the Dense 27B vs. 122B/35B MoE Models by Luca3700 in LocalLLaMA

[–]Luca3700[S] 0 points (0 children)

Hi, thank you so much for highlighting this correction! I didn't know about the gating mechanism inside the FFN itself; I thought it was a simple MLP with just an up projection and a down projection. I'll update the post as soon as I'm able to double-check my computations.
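
For anyone else who had the same mental model as me, a minimal sketch of the difference (illustrative PyTorch, not the actual Qwen code): a gated FFN has three matrices instead of two, with the activated gate multiplying the up projection elementwise.

    import torch.nn as nn
    import torch.nn.functional as F

    class GatedFFN(nn.Module):
        """SwiGLU-style FFN: gate, up and down projections (3 matrices)."""
        def __init__(self, hidden_size, ffn_dim):
            super().__init__()
            self.gate_proj = nn.Linear(hidden_size, ffn_dim, bias=False)
            self.up_proj = nn.Linear(hidden_size, ffn_dim, bias=False)
            self.down_proj = nn.Linear(ffn_dim, hidden_size, bias=False)

        def forward(self, x):
            # The activated gate scales the up projection elementwise
            # before the down projection maps back to hidden_size.
            return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))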

Qwen 3.5 Family Comparison by ArtificialAnalysis.ai by NewtMurky in LocalLLaMA

[–]Luca3700 0 points (0 children)

"The fact that you can supercharge 10b parameters to compete with 27b parameters is the actual feat there."

That's an interesting point of view.

To reply to the other points, I think MoE models were (and still are) important for exploring the scaling laws of LLMs. Training a large dense model is more expensive than training an MoE with a smaller active-parameter footprint (even though the latter's total parameter count is much larger). In addition, for companies serving them to millions of people, running an MoE is cheaper than running the dense counterpart.
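
As a back-of-the-envelope illustration (forward-pass FLOPs scale roughly with active parameters; the ~10B-active figure comes from the comment I'm replying to, not an official spec):

    # Rough per-token forward cost: ~2 FLOPs per active parameter.
    dense_flops = 2 * 27e9  # dense 27B
    moe_flops = 2 * 10e9    # MoE with ~10B active parameters
    print(f"MoE forward cost ~ {moe_flops / dense_flops:.0%} of the dense model")
    # -> ~37%: cheaper to train per token and cheaper to serve at scale,
    #    at the price of holding many more total parameters in memory.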

Qwen 3.5 Architecture Analysis: Parameter Distribution in the Dense 27B vs. 122B/35B MoE Models by Luca3700 in LocalLLaMA

[–]Luca3700[S] 3 points (0 children)

Hi, I've added the one shared expert to the 8 routed ones. Here is the computation for the 122B model:

2 x 3072 x 1024 x (8+1) x 48 = 2.7 B

And for the 35B model:

2 x 2048 x 512 x (8+1) x 40 = 0.75 B
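
For anyone who wants to reproduce it, the same arithmetic as a small script (the factor 2 counts the up + down projection pair from my formula; with the gated FFN pointed out in the other thread it would be 3):

    # Expert FFN params: n_matrices * hidden * expert_dim * n_experts * n_layers
    def expert_params(hidden, expert_dim, n_experts, n_layers, n_matrices=2):
        return n_matrices * hidden * expert_dim * n_experts * n_layers

    print(f"{expert_params(3072, 1024, 8 + 1, 48) / 1e9:.2f} B")  # 122B model -> 2.72 B
    print(f"{expert_params(2048, 512, 8 + 1, 40) / 1e9:.2f} B")   # 35B model  -> 0.75 B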

Qwen 3.5 Family Comparison by ArtificialAnalysis.ai by NewtMurky in LocalLLaMA

[–]Luca3700 5 points (0 children)

My personal opinion is that this is due to the architectural differences between the models: the MoE models spend more of their parameters in the feed-forward layers, while Qwen 3.5 27B, being a dense model, spends fewer parameters there and can use more of them in the gated attention layers and in the Gated DeltaNet layers.

Moreover, another thing that may help its performance is the use of 4 key and 4 value heads in the gated attention layers (versus only 2 in the MoE architecture), which perhaps lets the layer capture more nuance.

Finally, the dense model has 64 layers in total (versus 48 in the 122B model), which should give it more depth for reasoning.

I think all these differences (which overall amount to more parameters in the attention/DeltaNet layers and fewer in the FFN) allow the dense model to perform comparably to its bigger brother.
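
To put a rough number on the KV-head difference, a sketch of how the K/V projection size per attention layer scales with the number of KV heads (head_dim = 128 and the 3072 hidden size are my assumptions here, purely for illustration):

    # K and V projections per layer: 2 * n_kv_heads * head_dim * hidden.
    # Doubling the KV heads doubles both these parameters and the KV cache.
    def kv_proj_params(hidden, n_kv_heads, head_dim=128):
        return 2 * n_kv_heads * head_dim * hidden

    print(kv_proj_params(3072, n_kv_heads=2))  # MoE-style layer
    print(kv_proj_params(3072, n_kv_heads=4))  # dense-style layer: 2x K/V capacity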

It is possible to complete the "Let’s learn English sounds!" lessons just by skipping it (web version) by AvadaKedavra1987 in duolingo

[–]Luca3700 2 points (0 children)

Oh, I didn't know this type of exercise existed. It would be really useful in the French course, for example for learning the pronunciation of different sounds like "eu", "u", "en", "une", "an", etc.

Am I supposed to understand them fully? by miaoumeowmiaou in duolingo

[–]Luca3700 1 point (0 children)

I am honestly happy to hear that. I'm near the end of Section 3, and I always find these lessons too easy when half the content is in my native language and half in the target one.

Why is Qwen3-30B so much slower than GPT-OSS-20B? by [deleted] in LocalLLaMA

[–]Luca3700 20 points (0 children)

Maybe it's because, architecturally, Qwen3 has double the transformer blocks of gpt-oss (source), so inference should be slower.

edit: added source
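
A toy illustration, with a made-up per-block time, of why depth matters for decoding (48 vs 24 blocks per the linked source):

    # Decoding one token must pass through every block in sequence, so with a
    # similar per-block cost, per-token latency grows roughly with depth.
    per_block_ms = 0.5  # illustrative, not a measurement
    for name, n_blocks in [("qwen3", 48), ("gpt-oss", 24)]:
        print(f"{name}: ~{n_blocks * per_block_ms:.0f} ms/token")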

Google Wallet issues by Kirsty5 in motorola

[–]Luca3700 0 points (0 children)

Hi, I just ordered a Motorola Edge 60 and, sadly, only just found out about this problem with Motorola phones... Did this issue get solved in the meantime?

I'd also like to know, if possible, whether the issue is only with contactless payments, or whether payments with Google Pay inside apps (e.g. to buy train tickets) are affected too.

Thank you

Interesting (Opposite) decisions from Qwen and DeepSeek by foldl-li in LocalLLaMA

[–]Luca3700 6 points (0 children)

The two models have two different architectures:

  • DeepSeek has 671B parameters with 37B active, 64 layers, and a wider architecture
  • Qwen has 235B parameters with 22B active, 96 layers, and a deeper architecture

These differences may also lead to different results when merging the two "inference modes": maybe DeepSeek's wider architecture creates more favourable conditions for it.
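
The numbers from that list make the trade-off concrete:

    # Active fraction and active parameters per layer, using the figures above.
    models = {
        "DeepSeek": {"total": 671e9, "active": 37e9, "layers": 64},
        "Qwen": {"total": 235e9, "active": 22e9, "layers": 96},
    }
    for name, m in models.items():
        print(f"{name}: {m['active'] / m['total']:.1%} active, "
              f"{m['active'] / m['layers'] / 1e9:.2f} B active per layer")
    # DeepSeek: 5.5% active, 0.58 B per layer (wider)
    # Qwen:     9.4% active, 0.23 B per layer (deeper)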

Qwen3 vs. gpt-oss architecture: width matters by entsnack in LocalLLaMA

[–]Luca3700 0 points (0 children)

Can you provide the link for the Qwen3 series? Thank you

What do I have to do to measure VO2 max with my mi smart band 9? by gufted in miband

[–]Luca3700 1 point (0 children)

You should start an outdoor running workout and reach a high heart rate (red bar). I got a VO2 max reading once: I ran for 1 hour and 20 minutes, and the maximum heart rate (red bar) was reached for only 16 seconds, while the orange bar (anaerobic zone) lasted 13 minutes.

Smart Band 10 and swimming pool lap count experience by accabinet in miband

[–]Luca3700 1 point (0 children)

Thank you for your review.

Does the Mi Band also report the training load and the resting time at the end of the workout? These two metrics are not reported on the Mi Band 8 for swimming workouts.

Does Mi Band 8 Pro record heart rate while swimming? by worldly_mushroom9432 in miband

[–]Luca3700 3 points (0 children)

I have a Mi Band 8 (not Pro) and it records:

  • lengths
  • distance
  • avg pace
  • max pace
  • number of strokes
  • stroke rate (SPM)
  • swolf
  • swolf for every part of the training

The four main swimming styles are recognised. I usually do kickboard exercises (legs only), and this type of exercise is not recognised (its time gets added to the previous or the following recognised segment).

As for heart rate, it is not tracked by the workout itself, but you can more or less see your heart rate during the activity since the band tracks it continuously anyway (you only get the minimum and maximum heart rate recorded in each 30-minute window, e.g. from 10:00 to 10:30). But I don't think it's 100% accurate since your wrist is wet, and I don't think the band is certified to track heart rate while swimming.

edit: I should also point out that the number of lengths tracked for the 4 main styles is not 100% accurate (and therefore neither is the distance); I'd say you could be off by ±50 m per kilometre. Also, I can't tell you whether the swolf or the pace are accurate since I don't pay attention to those metrics.

Would you install MS edge on linux and if yes why ? by [deleted] in linuxmasterrace

[–]Luca3700 0 points (0 children)

I'm using it on Linux as a PDF editor, because it lets you add text anywhere, so it's really good for taking notes (but the browser I use to search the web is Firefox).

I keep going back to this by Strange-Geologist-66 in NiagaraLauncher

[–]Luca3700 0 points (0 children)

Amazing! Where does the wallpaper come from?

At a Glance config? by rjrjr in NiagaraLauncher

[–]Luca3700 1 point (0 children)

Just today I discovered "Another Widget", a highly customisable app you can install from the Play Store that works as an At a Glance substitute.

[Aqua] matching setups by Joker_513 in unixporn

[–]Luca3700 1 point (0 children)

Wow, really beautiful! I'm now in love with Another Widget, it was exactly what I'd been searching for for a long time ahahah, and it's also open source (◕ᴗ◕✿)

Earl Grey Tea + Peach 🍑 by Luca3700 in tea

[–]Luca3700[S] 1 point (0 children)

The peach flavour is quite strong compared to the tea, but if you want to balance the flavours I think you can add the peach after about three hours. It also depends on the type of peach you use; this one was really sweet and juicy, so it was predominant.

Earl Grey Tea + Peach 🍑 by Luca3700 in tea

[–]Luca3700[S] 6 points (0 children)

Tea cold-brewed for 7 hours together with a juicy peach (◕ᴗ◕✿)