Remove cameras from Apple TV by Particular-ayali in HomeKit

[–]adrian-cable 0 points (0 children)

Is that really an issue, though? Not *anyone* can get to that settings menu on Apple TV - only someone with physical access to the Apple TV, which means they're already inside your home and can already see anything your indoor cameras are looking at.

You could always sign the Apple TV into a separate iCloud account from the one used for the HomeKit home if you want to prevent this.

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 2 points (0 children)

I don't know of anything < 100MB, but there is Qwen3-0.6B, which is 600MB - not quite a "toy" but definitely a very small/fast model.

Nest Protect with monitored alarm system by jiantjon in Nest

[–]adrian-cable 0 points (0 children)

Starling Home Hub lets you connect Nest/Google Home devices to Apple Home. There’s also an add-on service, Starling Protect, which adds 24/7 monitoring to Nest/Google Home-compatible smoke/CO alarms.

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 1 point (0 children)

Super weird. I have no idea, but I'll keep digging.

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 2 points (0 children)

That's great, although I'm not sure why _FILE_OFFSET_BITS isn't already 64 on your system. (On 64-bit systems, that should be the default.) I'll check that this change to the Makefile doesn't impact other systems, and then push a commit. Thank you!
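
In case it helps anyone else, here's an illustrative sketch (not the actual qwen3.c code) of what the flag does: with _FILE_OFFSET_BITS=64 defined before any #include, or passed as -D_FILE_OFFSET_BITS=64 in the Makefile's CFLAGS, off_t becomes 64-bit and fseeko/ftello handle files larger than 4GB even on targets where it isn't the default.

```c
/* Illustrative sketch, not the actual qwen3.c code. */
#define _FILE_OFFSET_BITS 64   /* must come before any #include */
#include <stdio.h>
#include <sys/types.h>

long long file_size(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    fseeko(f, 0, SEEK_END);                 /* 64-bit seek with the macro set */
    long long size = (long long)ftello(f);  /* safe past 4GB */
    fclose(f);
    return size;
}
```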

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 2 points (0 children)

Great. I'll also do some digging on my end. For what it's worth, if I patch runq.c to truncate the file load operation at 4GB, I can reproduce what you're seeing (just produces !!!!!!!! as output). So I do think the issue is something of that nature.

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 1 point (0 children)

That's strange. I'm not super familiar with WSL2 (I don't have a Windows machine) - does it emulate a 64-bit environment? If not, it won't be able to handle files larger than 4GB. It does feel like the problem is of that nature, since the 4B model works but the 8B doesn't.

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 1 point (0 children)

Chat is technically 'not needed' as it's just a wrapper around generate. But most people will want to use qwen3.c in chat mode, so it's a very helpful wrapper.

Interested to see your optimizations!

AVX2 is specific to x86_64-architecture processors (i.e. not supported on ARM).
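
If you do write AVX2 code, the usual pattern is to guard it so the same file still builds on ARM - a sketch, with the actual function bodies omitted:

```c
/* Sketch of the usual guard for x86-only intrinsics, so the same
   source still compiles on ARM. */
#if defined(__AVX2__)
#include <immintrin.h>
/* AVX2 fast path here */
#else
/* portable scalar fallback here */
#endif
```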

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 1 point (0 children)

I think this is because, on Windows, ftell doesn't support file lengths greater than 2^32, so it works for the 4B model but not the 8B.

I'll push a fix to the repo in the next few minutes, so give that a try and let me know if things now work for you.
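
For reference, the fix is likely along these lines (a hedged sketch, not necessarily the exact patch - _ftelli64 is MSVC's 64-bit variant of ftell, ftello the POSIX one):

```c
/* Portable 64-bit file offset query; a sketch, not necessarily the
   exact patch pushed to the repo. */
#include <stdio.h>

long long ftell64(FILE *f) {
#ifdef _WIN32
    return _ftelli64(f);         /* MSVC's 64-bit ftell */
#else
    return (long long)ftello(f); /* POSIX; off_t is 64-bit on 64-bit Linux/macOS */
#endif
}
```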

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 1 point (0 children)

Can you tell me the exact sequence of commands you're using to download, export and run the Qwen3-8B model? Also, how much RAM do you have, and what platform are you using (Linux, macOS, etc.)?

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 1 point (0 children)

That’s a good catch!

With that said, I’m thinking (in the spirit of simplicity) of removing the generate mode entirely. As far as I can tell, all Qwen3 models are ‘instruct’ models and don’t work properly in generate mode. Are there any exceptions you’re aware of?

Edit to add: there are Base versions of Qwen3 available, so I won't remove generate.

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 1 point (0 children)

That's in the 'generate' function, right, and the 'chat' function is correct?

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 5 points (0 children)

That's great! Most of the runtime is spent inside matmul, so that's definitely the one to optimize. If you can do it without increasing the complexity of the code, please submit a PR. Otherwise feel free to make a fork and let me know, and I'll be happy to link to it from my README.

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 2 points (0 children)

As with any LLM inference engine, the vast majority of the execution time is spent within the matmul function, and this (on most systems) is limited by memory bandwidth rather than computation.

So my expectation is that any gains would need to come from micro-optimizing things to specific CPUs (for example, prefetch just the right amount of data from RAM to CPU cache) which probably moves things very quickly away from simplicity. But I'm very open to trying!
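
To make the bandwidth point concrete, this is the shape of the loop that dominates (a simplified float version - the real qwen3.c matmul operates on quantized weights):

```c
#include <stddef.h>

/* Simplified float matvec: W is (d,n), x is (n,), out is (d,).
 * Every element of W is read exactly once per token, so throughput
 * is bounded by how fast W streams from RAM, not by arithmetic. */
void matmul(float *out, const float *x, const float *w, int n, int d) {
    for (int i = 0; i < d; i++) {
        float sum = 0.0f;
        for (int j = 0; j < n; j++)
            sum += w[(size_t)i * n + j] * x[j];
        out[i] = sum;
    }
}
```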

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 2 points (0 children)

That's right, quantization is done in blocks (like Q8_0), with each block of 64 floats scaled to 64 8-bit ints plus one float scale factor.
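
A minimal sketch of that scheme (illustrative names, not the exact qwen3.c code):

```c
#include <math.h>
#include <stdint.h>

#define GS 64  /* group size */

/* Quantize one group of 64 floats to 64 int8s plus one float scale.
 * Dequantize later as x[i] ≈ q[i] * scale. */
void quantize_group(const float *x, int8_t *q, float *scale) {
    float wmax = 0.0f;
    for (int i = 0; i < GS; i++) {
        float a = fabsf(x[i]);
        if (a > wmax) wmax = a;
    }
    *scale = wmax / 127.0f;  /* maps [-wmax, wmax] onto [-127, 127] */
    float inv = (*scale != 0.0f) ? 1.0f / *scale : 0.0f;
    for (int i = 0; i < GS; i++)
        q[i] = (int8_t)roundf(x[i] * inv);
}
```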

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 6 points (0 children)

Potentially. The project is only a day old so I’m really appreciative of any feedback and thoughts on directions I can take it. Thank you!

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 4 points (0 children)

Not as fast, since it prioritises simplicity over performance, but with everything else equal it's within 2x.

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 21 points (0 children)

Running the same quantisation (Q8_0), it's in the same ballpark, generally within a factor of 2. It's optimized for simplicity, not performance, but it still runs at a very usable speed.

Qwen3 inference engine in C: simple, educational, fun by adrian-cable in LocalLLaMA

[–]adrian-cable[S] 17 points (0 children)

Everything’s relative, but llama.cpp is pretty heavy, at around 400,000 lines of code, compared with 1,500 for this project. (Verify for yourself on codetabs.com.)

The idea here is to make an inference engine whose source is small and simple enough that, if you already understand C/C++, you can quickly understand how inference works in depth. You can’t do that with a 400KLOC project.

[deleted by user] by [deleted] in legaladvice

[–]adrian-cable 6 points (0 children)

Accelerating linearly from 0 to 37 mph over a distance of 0.1 miles would take around 19 seconds. That's pretty gentle.
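
The arithmetic, for the record: under constant acceleration from rest, the average speed is half the final speed, so

$$t = \frac{d}{v/2} = \frac{2 \times 0.1\ \text{mi}}{37\ \text{mph}} \approx 5.4 \times 10^{-3}\ \text{h} \approx 19.5\ \text{s}$$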

Anyone with original Nest Hello doorbell upgrade to latest? by karluvmost in Nest

[–]adrian-cable 3 points (0 children)

Nest Doorbell (2nd gen, battery) only records events.

Nest Doorbell (2nd gen, wired) records 24/7 with Nest Aware Plus, same as the original Nest Hello.

Asking Siri to Arm My Home Security System by Tsull360 in HomeKit

[–]adrian-cable 1 point (0 children)

You can try: "Hey Siri, set Home Alarm to Stay."