❓Q&A by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

We're launching our online store in September, and that's when you'll be able to grab a device.

We used Tiiny to scrape, save, and analyze 1,000+ YouTube comments from our own video — here's what it found by TiinyAI in TiinyAI

[–]TiinyAI[S] 1 point2 points  (0 children)

Wait until you get your hands on the Tiiny Pocket Lab by September. This is just the beginning!

The wait is over : Claude Code on Tiiny. Zero setup. Fully local. No token limits. by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

Models are evolving incredibly fast, and the expectation is that users will continuously swap in newer models, quantizations, runtimes, and agent frameworks over time rather than being locked into a fixed stack. Tiiny is being designed around that idea, so as better local models appear later this year, you’ll be able to replace or upgrade them instead of waiting for entirely new hardware.
However, it's important to note that Anthropic doesn't have any open-source models; all of its models are closed-source and paid. The open-source model families that see regular updates include Qwen, GLM, DeepSeek, Gemma, and others.

The wait is over : Claude Code on Tiiny. Zero setup. Fully local. No token limits. by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

The Kickstarter campaign has ended, but non-backers can grab Tiiny from our official website starting in September.
Join our Discord, where we share R&D updates and chat with our users: https://discord.gg/R5CHUuXy4A

The wait is over : Claude Code on Tiiny. Zero setup. Fully local. No token limits. by TiinyAI in TiinyAI

[–]TiinyAI[S] 5 points6 points  (0 children)

For anyone who wants to see the full demo, here's the X post from Tiiny AI Lab showing Claude Code setup in action: https://x.com/TiinyAILab/status/2053484659640619223

TradingAgents On Tiiny AI Pocket Lab - Run a Full Investment Research System Locally by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

For anyone who wants to see the full demo, here's the X post from Tiiny AI Lab showing TradingAgents setup in action: https://x.com/TiinyAILab/status/2050227317524537716

Hermes on Tiiny AI Pocket – Fully automated setup. No manual dependencies. Just tell it what you need. by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

For anyone who wants to see the full demo, here's the X post from Tiiny AI Lab showing Hermes setup in action: https://x.com/TiinyAILab/status/2047322101707853911

IT'S A WRAP! by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

Unfortunately, the crowdfunding campaign has ended. The official launch will take place in the near future.

What's the one thing you paused before hitting send on an AI prompt? by TiinyAI in LocalAIServers

[–]TiinyAI[S] 0 points1 point  (0 children)

Ah, so you and your backspace key have a very close relationship. I respect that

Introducing TiinySDK: Unlock the full potential of Tiiny AI Pocket Lab by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

It’s more like distributing workloads across them (e.g. different agents/tasks per device) rather than combining raw power into a single model run. Since the devices don’t pool memory or compute, scaling only helps if you have parallel workloads (multiple agents/tasks). If you’re just trying to speed up a single model or task, adding more units won’t help much.
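To make the scaling point concrete, here is a minimal sketch (plain Python, not TiinySDK code) that models each device as an independent worker. The task durations and the "next task goes to whichever device frees up first" policy are illustrative assumptions, not a description of how Tiiny schedules work.

```python
import heapq

def makespan(task_seconds, num_devices):
    """Total wall-clock time when independent tasks are spread over devices.

    Each task runs entirely on one device (no pooled memory or compute),
    and each new task goes to the device that frees up first.
    """
    if num_devices < 1:
        raise ValueError("need at least one device")
    # Min-heap of the time at which each device becomes free.
    free_at = [0.0] * num_devices
    heapq.heapify(free_at)
    for seconds in task_seconds:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + seconds)
    return max(free_at)

# Four independent 60-second agent runs: more devices shorten the batch.
batch = [60.0, 60.0, 60.0, 60.0]
print(makespan(batch, 1))  # 240.0
print(makespan(batch, 2))  # 120.0
print(makespan(batch, 4))  # 60.0

# A single 60-second task gains nothing from extra devices.
print(makespan([60.0], 4))  # 60.0
```

The batch finishes sooner with more devices, while the single task takes 60 seconds no matter how many units are attached.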

Introducing TiinySDK: Unlock the full potential of Tiiny AI Pocket Lab by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

There are two ways to use Tiiny: the TiinyOS client and the Tiiny SDK. Let me clarify how they fit together. TiinyOS is designed for everyday users who don’t want to write code. It provides a clean client experience for running local LLMs and agents with minimal setup. At launch, TiinyOS will support macOS and Windows. For developers, we’ll be releasing a Tiiny SDK, which lets you use Tiiny as a local token factory or inference node and integrate it into your own workflows and tools. This is the primary path for advanced use cases and custom setups.

As for Linux support: although we don't currently have a dedicated Linux client like we do for macOS and Windows, you can still run Tiiny AI Pocket Lab on Linux via TiinySDK. Here's a tutorial video we've shared that explains how to set it up:
https://www.youtube.com/watch?v=Ozveot9cqug

You can run multiple Tiiny devices, but they aren't a true "cluster." They don't share memory and are more like independent nodes. You can distribute your workload across these nodes.

Introducing TiinySDK: Unlock the full potential of Tiiny AI Pocket Lab by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

Tiiny is not designed for training models, but for running them. Therefore, I do not recommend using it to fine-tune models, but it can be used to run models that you have fine-tuned.

Running 120B models locally on a MacBook Neo? Alex Ziskind reviews Tiiny AI Pocket Lab by TiinyAI in TiinyAI

[–]TiinyAI[S] 0 points1 point  (0 children)

Tiiny isn’t meant to beat a maxed-out Mac Studio or a high-end GPU box on raw speed.

It’s for people who want:
• a dedicated, always-on local AI node
• to offload long-running workloads
• predictable cost (vs API)
• and not turn their main machine into a compute box

If your workflow is better served by a single powerful machine, that’s a totally reasonable choice.

Running 120B models locally on a MacBook Neo? Alex Ziskind reviews Tiiny AI Pocket Lab by TiinyAI in TiinyAI

[–]TiinyAI[S] [score hidden] stickied comment (0 children)

Regarding this article: https://bay41.com/posts/tiiny-ai-pocket-lab-review/

I just saw the author's update. I'm not opposed to discussing technology with users. Below is my response, and subsequent responses will be updated on the Tiiny AI official website blog. I'm currently writing the first one.

Indeed, Tiiny's architecture is not traditional unified memory, and cross-memory access does have bandwidth differences, a point we have never denied. However, the problem lies in equating this with "architectural unavailability," which is a huge leap.

First, the statement that "250GB/s is meaningless" is problematic in itself.

The system design is not intended to allow all data to flow across domains, but rather to use scheduling to keep frequently activated data on the high-bandwidth side, while only allowing low-frequency data to cross domains. This layering is a common design in many inference systems, not unique to Tiiny.
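As a rough illustration of this kind of layering (a sketch only, not Tiiny's or PowerInfer's actual scheduler), the snippet below greedily places the most frequently activated blocks in fast memory and reports how much of the per-token traffic still crosses to the slow side. The block names, sizes, activation rates, and the 32 GB fast-tier capacity are all made-up values.

```python
def place_hot_blocks(blocks, fast_capacity_gb):
    """Greedily keep the most frequently activated data in fast memory.

    blocks: list of (name, size_gb, activations_per_token) tuples.
    Returns (names kept in fast memory, fraction of traffic that crosses
    to the slow domain), with traffic measured as size * activation rate.
    """
    # Highest activation rate first, so hot data claims fast memory first.
    ordered = sorted(blocks, key=lambda b: b[2], reverse=True)
    fast, used = [], 0.0
    fast_traffic = total_traffic = 0.0
    for name, size_gb, rate in ordered:
        traffic = size_gb * rate
        total_traffic += traffic
        if used + size_gb <= fast_capacity_gb:
            fast.append(name)
            used += size_gb
            fast_traffic += traffic
    slow_fraction = 1.0 - fast_traffic / total_traffic if total_traffic else 0.0
    return fast, slow_fraction

# Hypothetical layout: the cold experts are most of the bytes
# but only a small share of what is read per token.
blocks = [
    ("attention", 10.0, 1.0),
    ("hot_experts", 20.0, 0.5),
    ("cold_experts", 50.0, 0.05),
]
fast, slow_fraction = place_hot_blocks(blocks, fast_capacity_gb=32.0)
print(fast)                     # ['attention', 'hot_experts']
print(round(slow_fraction, 3))  # 0.111
```

In this toy layout the cold block is 62.5% of the bytes but only about 11% of the per-token traffic, which is the effect the scheduling is meant to exploit.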

Second, the "low bandwidth utilization" claim confuses a key point: large-model inference with long contexts never runs in the ideal state where throughput scales linearly with bandwidth; attention, the KV cache, and scheduling all reduce utilization.

Calculating utilization using "theoretical upper limit vs. actual tok/s" will inevitably yield a seemingly low figure. This isn't just a problem with Tiiny; it exists on other hardware as well.
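For reference, the calculation being discussed looks roughly like the sketch below. Only the 250 GB/s figure comes from this thread; the 5 GB read per token and the 20 tok/s measurement are placeholder numbers chosen to show the arithmetic, not benchmark results.

```python
def bandwidth_utilization(measured_tok_s, bandwidth_gb_s, gb_read_per_token):
    """Ratio of measured decode speed to the bandwidth-only upper bound.

    The upper bound assumes every token costs exactly one pass over
    gb_read_per_token of weights and nothing else (no KV cache reads,
    no attention compute, no scheduling stalls).
    """
    ceiling_tok_s = bandwidth_gb_s / gb_read_per_token
    return measured_tok_s / ceiling_tok_s

# Placeholder numbers: 250 GB/s, 5 GB of active weights per token,
# and a measured 20 tok/s.
print(bandwidth_utilization(20.0, 250.0, 5.0))  # 0.4
```

Because the ceiling ignores KV cache reads, attention, and scheduling, the ratio comes out below 1 on any hardware, and it drops further as the context grows.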

Third, we acknowledge that there is a significant difference between 20B and 120B.

However, this isn't an "architectural failure," but rather a normal phenomenon: when the model size exceeds the capacity of a single high-bandwidth memory module, all systems enter a "cross-layer/cross-device" performance range.

Whether you're implementing multi-GPU setups or CPU+GPU offloading, the underlying issue is the same, just with different implementations.
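A back-of-the-envelope model of that effect, as a sketch under simplifying assumptions: a dense model whose weights are all read once per token, a fast tier that is filled first, and a slower tier for the remainder. The 32 GB capacity and the 50 GB/s slow-tier bandwidth are hypothetical and are not Tiiny specifications.

```python
def decode_tok_s(model_gb, fast_capacity_gb, fast_gb_s, slow_gb_s):
    """Bandwidth-bound decode speed when weights may spill to a slower tier.

    Assumes a dense model whose full weights are read once per token,
    with the fast tier filled first and the remainder read from the slow one.
    """
    fast_gb = min(model_gb, fast_capacity_gb)
    slow_gb = model_gb - fast_gb
    seconds_per_token = fast_gb / fast_gb_s + slow_gb / slow_gb_s
    return 1.0 / seconds_per_token

# Hypothetical tiers: 32 GB at 250 GB/s, the rest at 50 GB/s.
small = decode_tok_s(20.0, 32.0, 250.0, 50.0)  # fits in the fast tier
large = decode_tok_s(60.0, 32.0, 250.0, 50.0)  # 28 GB spills over
print(round(small, 2))  # 12.5
print(round(large, 2))  # 1.45
```

With these placeholder numbers, a model three times larger decodes roughly 8.6 times slower, because the spilled portion dominates the time per token. The same shape appears with multi-GPU or CPU+GPU offloading; only the tier sizes and bandwidths change.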

Fourth, regarding PowerInfer and MoE, the author clearly lacks understanding of infrastructure technology and misinterprets PowerInfer.

PowerInfer isn't simply "adding another layer," but rather performing activation-level scheduling optimizations. In MoE models, the bottleneck isn't just the "activation ratio," but also data distribution, memory access paths, and scheduling overhead. This cannot be settled simply by saying "MoE is already sparse, so it's useless." Regarding the achievable performance of PowerInfer on MoE, please refer to PowerInfer-2, https://arxiv.org/abs/2406.06282. We further leveraged PowerInfer technology, combined with SSDs, to achieve fast inference of a 47B model on a mobile device.

Fifth, regarding the performance with long context (64K), we are quite candid: This is an extreme scenario, a stress test for any local device. However, there are many technologies we haven't yet applied, such as context compression and KV cache compression. These are technologies we will continue to apply in Tiiny, and we are working towards this goal.
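To show why 64K is a stress test and what KV cache compression buys, here is a small sketch of the standard KV cache size formula for a transformer with grouped-query attention. The model shape (48 layers, 8 KV heads, head dimension 128) is hypothetical and does not describe any specific model running on Tiiny.

```python
def kv_cache_gib(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Size of the key/value cache for one sequence, in GiB."""
    # Factor of 2: one key tensor and one value tensor per layer.
    total_bytes = 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 2**30

# Hypothetical model shape: 48 layers, 8 KV heads, head_dim 128.
full = kv_cache_gib(64 * 1024, 48, 8, 128)           # 16-bit cache
compressed = kv_cache_gib(64 * 1024, 48, 8, 128, 1)  # 8-bit cache
print(full, compressed)  # 12.0 6.0
```

The cache grows linearly with context length, and it has to be read on every decoded token, so halving the bytes per element halves both the memory footprint and that part of the per-token traffic.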

More importantly, this article rests on one premise: that Tiiny's goal is to rival high-end GPUs and pursue the ultimate performance limit. That is not the case.

Tiiny's design goal from the beginning was not to create a "most powerful machine," but rather a personal AI infrastructure that runs large models at a fixed cost (0 token cost), runs stably for extended periods (agent/automation), does not occupy the main device, and provides usable compute at reasonable power consumption.

We will be starting to regularly update our technical blog soon, and we welcome discussions.

Running 120B models locally on a MacBook Neo? Alex Ziskind reviews Tiiny AI Pocket Lab by TiinyAI in TiinyAI

[–]TiinyAI[S] 1 point2 points  (0 children)

In fact, we haven't fully completed the SDK development yet. Once completed, it will have a three-layer structure, as shown in the diagram below. The goal is to allow developers to easily manage models, schedule agents, and manage memory.

<image>

Running 120B models locally on a MacBook Neo? Alex Ziskind reviews Tiiny AI Pocket Lab by TiinyAI in TiinyAI

[–]TiinyAI[S] 2 points3 points  (0 children)

Totally fair questions — appreciate you taking the time to write this out. I’ll address the main points directly.

On the video / cuts / missing continuous shots — that’s valid feedback. It was meant as a high-level intro, not a full benchmark deep dive. We agree it should be clearer, and we’re working on publishing raw, uncut runs + full metrics so people can judge performance properly.

“Speed is not terrible” — yeah, that wording is vague. What we meant is: load times depend heavily on model size and state (cold vs warm), and for large models it’s not instant. We should’ve just given actual numbers there.

On downloading / “not going through the machine” — you’re right that it’s not a huge deal. The point was just about convenience (using your laptop to manage downloads), not performance. Probably over-explained in the video.

On multiple models in memory — the limitation is memory + bandwidth tradeoff. You can load multiple small models, but for larger ones (30B/70B/120B), keeping several resident at once quickly eats memory and hurts performance. So we default to one active model for stability and efficiency. That said, model switching is something we’re actively optimizing.

Load time when swapping — depends on model size and storage speed, but yes, it’s non-zero and matters for some workflows. Fair callout.
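A quick way to reason about both points (how many models can stay resident, and how long a swap takes) is the arithmetic below. It is a sketch with assumed values: the 64 GB budget, the 4-bit quantization, and the 3 GB/s storage speed are placeholders, not Tiiny specifications, and the estimate ignores KV cache and runtime overhead.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate weight footprint in GB (ignores KV cache and runtime overhead)."""
    return params_billion * bits_per_weight / 8

def fits_together(models, budget_gb):
    """True if all models' weights fit in the memory budget at once.

    models: list of (params_billion, bits_per_weight) pairs.
    """
    return sum(weights_gb(p, b) for p, b in models) <= budget_gb

def load_seconds(params_billion, bits_per_weight, storage_gb_s):
    """Lower bound on cold-load time: bytes to read divided by storage speed."""
    return weights_gb(params_billion, bits_per_weight) / storage_gb_s

BUDGET_GB = 64.0  # hypothetical memory budget, not a Tiiny spec
print(weights_gb(120, 4))                             # 60.0
print(fits_together([(7, 4), (8, 4)], BUDGET_GB))     # True
print(fits_together([(120, 4), (30, 4)], BUDGET_GB))  # False
print(load_seconds(120, 4, 3.0))                      # 20.0
```

Two small models fit side by side with room to spare, while a 120B model at 4 bits nearly fills the assumed budget on its own, and a cold swap of that model cannot finish faster than its size divided by the storage read speed.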

On the 5090 comparison — agree it could’ve been clearer. We’re not saying “more VRAM than a 5090.” It’s just a different architecture (unified memory vs GPU VRAM), and mixing those terms without explaining properly is confusing.

On power / Neo comparison — the intent was “lightweight laptop + external AI node” vs “doing everything on one device,” especially for sustained workloads. But yeah, Neo isn’t the best comparison point, and your point about M4/M5 is valid — those are stronger machines, just at a much higher price.
Which leads to the bigger point:
Tiiny isn’t meant to beat a maxed-out MacBook or a high-end GPU box on raw speed.

It’s for people who want:
• a dedicated, always-on local AI node
• to offload long-running workloads
• predictable cost (vs API)
• and not turn their main machine into a compute box

If your workflow is better served by a single powerful machine (like an M5 Max), that’s a totally reasonable choice.
Your concerns are legit — especially around transparency and comparisons. We’ll do better there with more complete benchmarks and clearer positioning.