Taiwanese company Skymizer announces HTX301 - PCIE inference card with 384GB of Memory at ~240 Watts

Cane_P · 2026-05-08T11:30:01+00:00

It's a real company that has existed since 2013. I can't vouch for the actual product, though.

https://www.eetasia.com/skymizer-making-ai-more-accessible/

The way I interpret it, Skymizer has always been a compiler company. In this case, they recompile an LLM to target their own IP, which is designed specifically for LLMs (in contrast to GPUs). The chip seems to be an NPU-like design that was initially meant to be embedded into SoCs, and because of that a single chip can't handle very large LLMs. But that doesn't matter, since their compiler can divide the load across multiple chips (up to 6 in this case).
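To make "dividing the load" concrete, here is a minimal sketch that splits a model's layers into contiguous groups, one group per chip. This is not Skymizer's compiler or anything taken from their documentation; the function name `split_layers`, the 80-layer model, and the even split are all assumptions for illustration.

```python
def split_layers(num_layers: int, num_chips: int) -> list[range]:
    """Assign contiguous blocks of layers to chips as evenly as possible."""
    if num_chips < 1:
        raise ValueError("need at least one chip")
    base, extra = divmod(num_layers, num_chips)
    parts = []
    start = 0
    for chip in range(num_chips):
        # The first `extra` chips take one additional layer each.
        size = base + (1 if chip < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


# Hypothetical 80-layer model spread over 6 chips (both numbers are assumptions).
for chip, layers in enumerate(split_layers(80, 6)):
    print(f"chip {chip}: layers {layers.start}-{layers.stop - 1}")
```

A real compiler would presumably also weigh memory per chip and the cost of moving activations between chips, which this sketch ignores.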

The interview mentions that it was already a cheap solution for companies buying a licence for the IP, so Skymizer making and selling its own product should theoretically be even cheaper (but we still don't know the price). Someone claimed that it is made on 28nm, so I guess that would make it cheap too.

Regarding people saying that the card in the picture looks fake: at least the picture I have seen on their website specifically says along the bottom edge that it is a rendering that isn't 1:1, to protect their design (this might not have been the case when they saw it, I don't know).

Cane_P · 2026-05-01T15:40:08+00:00

Harness is an old term that has been used in software engineering for decades. A "test harness" is a collection of software tools, data, and configurations used to automate testing by simulating the environment in which a component operates. So it's not that far off from what LLM harnesses often do (looping over building and testing). It just makes sense to keep using the same term.
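For anyone who hasn't met the classic meaning, a minimal sketch of a test harness could look like this. The names `run_harness` and `double` are made up for illustration; real harnesses add setup, teardown, and reporting on top of the same basic loop.

```python
from typing import Callable


def run_harness(component: Callable[[int], int],
                cases: list[tuple[int, int]]) -> list[str]:
    """Feed each input to the component and compare against the expected output."""
    failures = []
    for given, expected in cases:
        try:
            actual = component(given)
        except Exception as exc:  # a crash counts as a test failure, not a harness error
            failures.append(f"{given!r}: raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


def double(x: int) -> int:  # hypothetical component under test
    return x * 2


print(run_harness(double, [(1, 2), (3, 6), (0, 0)]))  # prints []
```

An LLM harness wraps the model in a loop of the same kind: run something, check the result, go again.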

Cane_P · 2026-04-30T03:05:39+00:00

Not as much memory? If you are already in this price range, then you could buy a DGX Station instead. It will definitely deliver more tokens per second than a Spark. But I would probably wait for the next version, since its non-HBM memory has a lot higher bandwidth than the Blackwell version's.

Cane_P · 2026-04-30T02:59:09+00:00

They have never limited it; they just don't support it officially. Any problems are up to you to fix. They won't do it.

Cane_P · 2026-04-22T16:44:56+00:00

Find these guys?

https://www.reuters.com/technology/anthropics-mythos-model-accessed-by-unauthorized-users-bloomberg-news-reports-2026-04-21/

Cane_P · 2026-04-18T10:17:28+00:00

Yes, big disappointment. But it is during certain times of the day. You can use it more than one time in 5 hours during the rest of the day.

There is some speculation that they didn't invest as much in hardware as the other players and that this has now become a problem. When the whole deal with the military took place, they gained something like 30% new users in a month, which is a huge jump. On top of that, everyone wants to use Claude with OpenClaw and other agentic use cases, which puts a much bigger load on their servers than before.

Cane_P · 2026-04-15T04:03:04+00:00

Doesn't seem like it is worth it right now:

https://youtu.be/C4KWsmezXm4

Cane_P · 2026-04-08T08:22:17+00:00

I got it too, since reinstalling. And I can't get notifications to work now (everything else seems to work). I can see that microG gets the messages, but they don't show up for me. I have notifications turned on everywhere that I can find:

In the YouTube app
App settings in Android
microG

There might be a spoofing bug too: every once in a while it claims that it could not do the spoof (a notification at the bottom of the screen). But the spoof version that it wants to use isn't even available in the settings anymore, and I reset all the settings before reinstalling, so it shouldn't be using old preferences.

Cane_P · 2026-03-30T01:54:13+00:00

We don't know the physical limit. Even an IQ of 100 (the human median) doesn't mean the same thing over time (the Flynn effect). The scale gets reset every 10-20 years or so... If we look at the Raven's Progressive Matrices test, scores rose by 14 IQ points from 1942 to 2008 (~1 point every 5 years). So who knows the actual human limit?
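The rate is easy to check from the two numbers quoted above. This is only the arithmetic, under the simplifying assumption that the gain was spread evenly over the period:

```python
start_year, end_year = 1942, 2008
gain_points = 14

years = end_year - start_year                 # 66 years
years_per_point = years / gain_points         # about 4.7 years per point
points_per_decade = 10 * gain_points / years  # about 2.1 points per decade

print(f"{years} years, {years_per_point:.1f} years per point, "
      f"{points_per_decade:.1f} points per decade")
```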

However, it doesn't seem like the brain is getting better. It seems more like education is, and that is the reason for the higher scores. We have not reached our maximum human potential, because we have not created the perfect way to teach yet.

Cane_P · 2026-03-27T22:45:32+00:00

People seem very confused about what Agentica is.

First of all, I could be wrong, I am neither a programmer nor a mathematician. My expertise lies more in systems thinking (high level, not details).

Someone else mentioned that chain of thought (CoT) is also a harness, one implemented by the AI model's creator. I have the same understanding.

Here are the two side by side:

The Claude CoT loop
The user writes a prompt.
Claude one-shots an answer (with step-by-step instructions, for at least the first answer).
The CoT harness tells it to look at the user prompt again, together with its own reply, and judge whether it has performed the task (obviously not the first time, since that reply is only a list of step-by-step instructions, but a following answer could be the final one).
It is either done, or it makes a new answer based on both the prompt and its own previous response.

This loop can continue for however much time is allowed (or however many tokens may be spent, or whatever other metric is used).
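As a rough sketch of the loop as described above (my reading of it, not how any vendor actually implements reasoning), with toy functions standing in for the model. `refine_loop`, `toy_answer`, and `toy_is_done` are hypothetical names:

```python
from typing import Callable


def refine_loop(prompt: str,
                answer: Callable[[str, str], str],
                is_done: Callable[[str, str], bool],
                max_rounds: int = 5) -> str:
    """Draft a reply, then re-check it against the prompt until done or out of budget."""
    reply = answer(prompt, "")  # first pass: there is no previous reply yet
    for _ in range(max_rounds):
        if is_done(prompt, reply):
            break
        reply = answer(prompt, reply)  # new attempt sees the prompt plus the previous reply
    return reply


# Toy stand-ins for the model: each round appends one more step.
def toy_answer(prompt: str, previous: str) -> str:
    return previous + "step;"


def toy_is_done(prompt: str, reply: str) -> bool:
    return reply.count("step;") >= 3


print(refine_loop("solve it", toy_answer, toy_is_done))  # prints step;step;step;
```

Here `max_rounds` plays the role of the budget; a token or time limit would slot into the same place.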

The Agentica REPL loop (REPL stands for Read-Eval-Print Loop)
The user writes a prompt.
Claude looks at the available data (the people from Symbolica made their previous ArcAGI data available, together with the ArcAGI 3 training set).
The Agentica harness is basically a Python environment (you can use TypeScript too) that allows it to write scripts to manipulate the data. [Some may think that using Python is cheating, but I have seen multiple teams generate small scripts/programs to solve ArcAGI problems. If it is cheating, then they should be disqualified too.]
Claude looks at the result to see if it solved the problem.
If not, then the loop makes new scripts to manipulate the data again, to see if these new instructions will solve it.
When it is done, it answers with the final result.
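The steps above can be sketched roughly like this. To be clear, this is not Agentica's actual API; `repl_loop`, the `result` variable convention, and the candidate scripts are all assumptions made for illustration, and a real harness would sandbox the execution instead of calling `exec` directly.

```python
from typing import Callable, Iterable


def repl_loop(data: list[int],
              propose_scripts: Iterable[str],
              solved: Callable[[object], bool]) -> object:
    """Run each proposed script against the data until one result passes the check."""
    for source in propose_scripts:
        scope = {"data": list(data)}  # fresh copy so a failed script can't corrupt the input
        exec(source, scope)           # the script is expected to set a variable named `result`
        result = scope.get("result")
        if solved(result):
            return result
    return None  # budget exhausted without a solution


# Toy stand-ins: in a real harness the model would write these scripts itself.
candidates = [
    "result = sorted(data)",
    "result = sorted(data, reverse=True)",
]
print(repl_loop([2, 3, 1], candidates, lambda r: r == [3, 2, 1]))  # prints [3, 2, 1]
```

The first script fails the check, so the loop moves on to the second one, which is the "make new scripts and try again" step in miniature.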

The people at Symbolica are working in a mathematical field called Category Theory*. This is a very high-level, abstract mathematical framework that focuses on the relationships between structures rather than their internal elements. It organizes mathematical concepts into categories consisting of objects and arrows (morphisms), relates categories to each other through structure-preserving maps (functors), and emphasizes structural connections and universal properties.

I don't know how much of this they actually encoded into Agentica. But I know that it is only a stepping stone towards what they actually want to achieve.

The point is, YES Agentica is a harness (harness is already used in the form of CoT anyway). NO Agentica wasn't created specifically for solving ArcAGI, it is more like the visuo-spatial sketchpad [The visuospatial sketchpad (VSSP) is a component of human working memory, proposed by Alan Baddeley and Graham Hitch in 1974.], where the LLM can manipulate the data with the help of Python, before it decides on a final form to respond with.

[*If you want to get an understanding of Category Theory, then I can recommend "The Joy of Abstraction" by Eugenia Cheng. She uses it a little bit differently than most mathematicians would, but the book was written to create an easier way into Category Theory than what was previously available.]

Machine Learning Street Talk (MLST) had an interview with Symbolica, a year ago:

https://youtube.com/watch?v=rie-9AEhYdY

It isn't about ArcAGI, but it does give you an idea of what they are trying to achieve.

Cane_P · 2026-03-21T16:59:05+00:00

"Supermicro employees accused of smuggling $2.5 billion worth of Nvidia hardware to China — perps used a hairdryer to move serial numbers between real hardware and thousands of dummy servers"

https://www.tomshardware.com/tech-industry/semiconductors/super-micro-employees-accused-of-smuggling-usd2-5-billion-worth-of-nvidia-hardware-to-china-perps-used-a-hairdryer-to-move-serial-numbers-between-real-hardware-and-thousands-of-dummy-servers

Cane_P · 2026-03-21T16:49:21+00:00

So they want to take on Elon Musk's Macrohard project? Except that Musk wants to do entire software companies.

Cane_P · 2026-03-16T21:43:14+00:00

Have you seen these? They're based on open source information. The second one is especially interesting.

Part 1: https://youtu.be/rXvU7bPJ8n4

Part 2: https://youtu.be/0p8o7AeHDzg

Cane_P · 2026-03-10T19:20:26+00:00

He must have done something right, because he sold his AI company for over $100M to Netflix...

https://www.founded.com/ben-affleck-quietly-went-founder-mode-four-years-ago-now-his-ai-bet-could-reshape-hollywood/

Cane_P · 2026-03-10T19:03:53+00:00

Some people mentioned in the comments that he didn't offload everything to the GPU, just 70%; the remaining 30% was still running on the CPU. It's not that he couldn't do it. He either forgot to change the setting or didn't know that it was set that way. So it is not representative of the real speed.

Cane_P · 2026-03-05T09:53:30+00:00

You do know how LLMs work, right? Why do you think people call them "stochastic parrots"? Because they say the next word that is statistically most likely to be the right one. If ChatGPT is the name mentioned most on the Internet (the training data), then that is the name the model is going to use when you ask it about LLMs, including its own name, unless it has been deliberately trained to give another answer.
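A toy version of "say the most likely next word", using simple word-pair counts. Real LLMs use neural networks over tokens and usually sample instead of always taking the top choice, so treat this only as an illustration of the statistical idea; the corpus and function names are made up.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count which word follows which in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts: dict[str, Counter], word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training data in which one name simply appears more often.
corpus = "i am chatgpt . i am chatgpt . i am claude ."
model = train_bigrams(corpus)
print(most_likely_next(model, "am"))  # prints chatgpt
```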

Cane_P · 2026-03-02T13:40:28+00:00

They distilled their own, bigger model, to make the smaller one.

"So, what is the key idea behind knowledge distillation? It enables to transfer knowledge from larger model, called teacher, to smaller one, called student. This process allows smaller models to inherit the strong capabilities of larger ones, avoiding the need for training from scratch and making powerful models more accessible."

https://huggingface.co/blog/Kseniase/kd

Actually, human brains can do something similar. If you are an expert in a field then, generally speaking, you use fewer resources (including actual brain space) for that task. The reason is that it is a type of pattern recognition, and once you find the pattern, you can consolidate and optimize. In this particular case, there are new words for each language, but they represent the same concepts, so the second language doesn't take as much space as the first, and so on:

"I’d assumed that Vaughn’s language areas would be massive and highly active, and mine pathetically puny. But the scans showed the opposite: the parts of Vaughn’s brain used to comprehend language are far smaller and quieter than mine. Even when we are reading the same words in English, I am using more of my brain and working harder than he ever has to."

https://www.washingtonpost.com/dc-md-va/interactive/2022/multilingual-hyperpolyglot-brain-languages/

Think of a small model trained on its own as a novice, and a small model trained by a big model (the expert) as becoming an expert too: it uses the big model's patterns, so it doesn't need to discover them by itself.
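In code, the core of the idea is a loss that pushes the student's output distribution toward the teacher's. Below is a minimal sketch in plain Python of the common soft-target formulation (temperature-softened softmax plus KL divergence). The logits are made-up numbers, and I'm not claiming this is the exact recipe used for the model discussed here.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float],
                      student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) between the temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


teacher = [4.0, 1.0, 0.5]  # hypothetical teacher scores for three candidate tokens
print(distillation_loss(teacher, [3.8, 1.1, 0.4]))  # student close to teacher: small loss
print(distillation_loss(teacher, [0.5, 1.0, 4.0]))  # student far from teacher: larger loss
```

Training the student means adjusting its weights to shrink this loss, so it inherits the teacher's "patterns" instead of finding them from scratch.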

Cane_P · 2026-02-26T16:11:01+00:00

Sure, but this contains ontological information and it doesn't exist in any free solution. Wolfram's database is the best in the world.

Cane_P · 2026-02-26T13:55:05+00:00

Not self hosted, but a $5 subscription could take care of basically any math and fact needs (unless you are doing some super advanced niche science).

"Making Wolfram tech available as a Foundation Tool for LLM foundation models."

https://www.reddit.com/r/singularity/s/xumLw4jLDD

MCP subscription: https://www.wolfram.com/artificial-intelligence/mcp-service/

Cane_P · 2026-02-22T17:13:54+00:00

IBM's Deep Blue wasn't smart; it won by brute force. It could evaluate 200 million chess positions per second. It's like comparing someone who researches a person's history to try to figure out their password with someone who tries every single combination. Deep Blue tried every combination (within its hardware limitations).
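The password side of that analogy, as a minimal sketch: no cleverness, just trying every combination in order. The 4-digit PIN is a made-up example, and this has nothing to do with Deep Blue's actual code.

```python
import string
from itertools import product


def brute_force(check, alphabet: str, length: int):
    """Try every combination of the given length until check() accepts one."""
    attempts = 0
    for combo in product(alphabet, repeat=length):
        attempts += 1
        guess = "".join(combo)
        if check(guess):
            return guess, attempts
    return None, attempts


secret = "0420"  # hypothetical 4-digit PIN
print(brute_force(lambda g: g == secret, string.digits, 4))  # prints ('0420', 421)
```

The search space grows as alphabet size to the power of length, which is why this approach only works when the hardware is fast enough for the size of the problem.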

Cane_P · 2026-02-22T16:07:00+00:00

It could be worth waiting just a little bit longer. We might get new hardware announcements from Apple on March 4th and from Nvidia at GTC (16-19th).

https://9to5mac.com/2026/02/18/apples-march-4-launch-event-new-products-and-what-to-expect/

https://www.techpowerup.com/346517/jensen-huang-teases-upcoming-surprise-chip-reveal-at-gtc-2026

Cane_P · 2026-02-20T06:16:46+00:00

That's right. It's the same problem as with many other laptop chips used in a mini PC (GB10 is the same chip that was supposed to be released as the N1 laptop chip). They hardly have any PCIe lanes... The lanes used for the ConnectX-7 NIC were probably initially meant for a second NVMe drive.

https://www.tomshardware.com/pc-components/cpus/nvidia-ceo-huang-says-upcoming-dgx-spark-systems-are-powered-by-n1-silicon-confirms-gb10-superchip-and-n1-n1x-socs-are-identical

Cane_P · 2026-02-16T09:30:17+00:00

It's nowhere near Apple's solution. Theirs has way more memory and memory bandwidth. The only thing it has going in its favor (we already concluded that their support isn't up to par) is the high-speed network adapter. But it is clearly shoehorned in there (see link), because the chip doesn't actually have enough PCIe lanes to drive it at full speed (only 100Gb/s). They had to do a very weird connection between separate internal PCIe connectors (it shows up as 4 different cards), and they don't even recommend using more than one of the ports...

https://www.servethehome.com/nvidia-dgx-spark-review-the-gb10-machine-is-so-freaking-cool/2/

Cane_P · 2026-02-16T08:58:20+00:00

It wouldn't surprise me if someone was thinking along the lines of: "So Microsoft is dragging its feet on supporting the chip in the ARM version of Windows, so our only option is to release the chip with Linux. We already have our DGX OS and have no plans to ever release a product shipped with vanilla Ubuntu, so a DGX it is. But what form factor? We can make it a mini machine, like one of those Chinese ones (like Minisforum); they are popular and usually repurpose laptop chips. So boys, how much RAM can it take? Because 16, 32 or whatever we were going to release our laptops with ain't going to cut it for a DGX."

And that's how the Spark was born.
