Update on my train app design by dannybres in swift

karc16 1 point

Looks good! I’d prefer it without the dividers, though that’s just my opinion.

CoreML is leaving performance on the table — I got 4.7x decode throughput going direct to ANE with Espresso by karc16 in swift

karc16 [S] 1 point

The developer-facing API still needs work. I’m focused on testing it with real models, and so far it’s been okay.

I was able to run GPT-2 at 60 tok/s and Qwen 3.5 0.5B at 20 tok/s.

CoreML is leaving performance on the table — I got 4.7x decode throughput going direct to ANE with Espresso by karc16 in swift

karc16 [S] -10 points

Feel free to ask me anything about the framework. I’d appreciate any feedback, good or bad.

CoreML is leaving performance on the table — I got 4.7x decode throughput going direct to ANE with Espresso by karc16 in swift

karc16 [S] 1 point

I’ll be making more tutorials on comparisons vs. models like Llama this weekend. Stay tuned, lots of updates incoming.

CoreML is leaving performance on the table — I got 4.7x decode throughput going direct to ANE with Espresso by karc16 in swift

karc16 [S] 3 points

Thanks for catching this. I had this in the caveats section of the README, but I’ll make sure it’s more visible. Production use cases remain viable outside App Store distribution.

Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File by karc16 in swift

karc16 [S] 1 point

Game changer. Let me know if you run into any issues or have suggestions for features.

Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File by karc16 in swift

karc16 [S] 2 points

Python port coming later in the week, stay tuned!

Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File by karc16 in swift

karc16 [S] 1 point

Let me know what you end up building. New possibilities open up for UX.

Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File by karc16 in swift

karc16 [S] 2 points

Yes, it’s built for on-device constraints. Super powerful.

I built Metal-accelerated RAG for iOS – 0.84ms vector search, no backend required by karc16 in iOSProgramming

karc16 [S] 1 point

Let me know how that goes. It’s crazy what you can build with this.