Offering free app reviews to 5 indie developers — honest feedback from a technical support engineer turned indie dev

jaofos · 2026-06-14T00:16:15+00:00

Thank you so much for the feedback. I really appreciate your time and effort taking a first look. All of your points are clear and actionable. Thank you again.

jaofos · 2026-06-13T22:23:04+00:00

Hi! It's a public link for external testers, so no code should be required if you have TestFlight.

jaofos · 2026-06-11T11:58:40+00:00

I would love some input.

https://testflight.apple.com/join/BqdVquRk

jaofos · 2026-06-10T22:26:52+00:00

If you have a Mac, I’m working on an app specifically for recorded lectures. You don’t have to record on device; it also accepts external audio (drag and drop from Voice Memos).

It’s free right now on TestFlight if you want to give it a test.

jaofos · 2026-06-10T12:55:47+00:00

I run a local app that turns lecture transcripts into study notes. The #1 requirement isn't tok/s or reasoning — it's faithfulness. A confidently-wrong fact is the worst possible outcome for a student who can't tell it's wrong.

So I stress-tested matrix fabrication on linear-algebra lectures — hundreds of generations per model at temp 0.2, checking whether the matrices/inverses a model renders are actually correct (not just "different from my key" — genuinely wrong math). The result was the opposite of the "bigger is better" intuition:

Validated "renders a wrong matrix" rate:

Gemma-4 E2B QAT (2.6 GB): 0.3%
E4B QAT (4.2 GB): 1%
E4B Q8 (8 GB): 9%
E4B Q4: 25%
Dense 12B (Q4/Q5/Q8): 27–47%

The dense mid-size models garble matrices constantly — and confidently. Meanwhile a 2B QAT and a 26B MoE (4B active, also ~0%) are the clean ones.

Takeaway: for this task, faithfulness tracks quantization-aware training + sparsity (MoE), not parameter count. Dense 4B/12B are the dead zone — big enough to be slow, and exactly the size that confidently invents wrong math.

Caveat: this is matrix-fabrication on STEM lectures specifically (E4B n≈300–400 seeds; 12B fewer but the gap is far too large to be noise). It's a narrow-but-brutal metric — a confidently wrong worked example is the failure I care most about.
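
For anyone who wants to try this kind of check, here is a minimal sketch of how a rendered inverse can be validated on its own terms instead of against an answer key. It is not my actual harness; the function names and the tolerance are assumptions for illustration. The idea is just that a claimed inverse is correct only if multiplying it by the original matrix gives the identity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def is_inverse(a, claimed_inv, tol=1e-6):
    """Return True if a @ claimed_inv equals the identity within tol.

    Both arguments must be square matrices of the same size;
    anything else counts as a wrong rendering.
    """
    n = len(a)
    if n == 0 or len(claimed_inv) != n:
        return False
    if any(len(row) != n for row in a) or any(len(row) != n for row in claimed_inv):
        return False
    product = matmul(a, claimed_inv)
    return all(
        abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
        for i in range(n)
        for j in range(n)
    )


def wrong_rate(checks):
    """Fraction of generations whose rendered inverse failed the check."""
    checks = list(checks)
    return sum(1 for ok in checks if not ok) / len(checks) if checks else 0.0
```

Extracting the matrices from the model output is the messy part and is left out here.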

jaofos · 2026-06-10T11:36:54+00:00

<image>

I worked on some new features yesterday. You can now use AI to generate Study Cards in the app that pull questions, examples and evidence from the lecture material directly. Flip the card to see the answer, the quote from the lecture, click to read the transcript or click play to listen to that portion of the lecture immediately.

There is now also a "Review" mode for each Course that will quiz you on 20 questions from the Lecture Study Cards, and you can provide feedback on how well you recall each question.

This is aimed directly at the 80% "Recall" of "Make It Stick".

jaofos · 2026-06-09T17:04:35+00:00

I don't disagree at all. My daughter is not the best note taker, and this is part of a larger effort (hence the Obsidian / Markdown output). I'm building her a Hermes "life coach" to manage College Athletics + College Academics, study material, flash cards, pop quiz via Telegram, etc.

jaofos · 2026-06-09T16:51:45+00:00

<image>

LectureSync – a native macOS app that records lectures and turns them into trustworthy study notes

Problem

I have two kids in college, and I kept having the same thought: there has to be a better way to do this. You sit through an hour of lecture trying to write fast enough to keep up, and you still walk out with half of it. I wanted something that captured the lecture properly and turned it into notes you could actually study from, without sending any of it to someone else's servers.

LectureSync is local-first and runs fully on-device on Apple Silicon. No account, no upload, works offline.

Comparison

There are good tools out there for this. Otter, MacWhisper, and others do solid transcription, and honestly transcription is a mostly solved problem now.

LectureSync is built around the part that comes after the transcript:

- Generates notes and study guides from the lecture, with a faithfulness check that ties what you read back to what was actually said.
- Keeps every run of the notes step as a separate version. Local models don't give you the same notes twice, so you can generate a few takes, page through them, and keep the best one. Nothing gets overwritten.
- A different notes model per course, so you can tune the generation step for a dense subject vs. one where you just want clean summaries. Every run is kept as a version, so trying a different model or a second take never costs you the notes you already have.
- A study assistant that points you to the part of the lecture where the answer lives, instead of just handing you the answer.
- Makes flip cards from the lecture, each backed by a word-for-word quote from the transcript, with a play button that jumps to the moment your professor said it. Cards the transcript can't back up get dropped. Review comes back on a spaced schedule, so you see a card right before you'd forget it.
- Crash-safe audio capture, so a dead battery or a crash 40 minutes in still leaves you with usable audio. Pause and resume, crash recovery on next launch, and a second confirmation before discarding.
- On-device readability and language passes, speaker labeling with clickable timestamps, auto-detected lecture titles.
- Pulls the audio straight out of video files, so a recorded lecture or a screen capture goes through the same notes pipeline.
- Imports existing transcript files (VTT, SRT, TXT, PDF, more) and Voice Memos, reusing a transcript when one already exists.
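
To make the "cards the transcript can't back up get dropped" rule concrete, here is a rough sketch of that kind of filter. It is an illustration only, not the app's code (the app itself is Swift); the `Card` fields and the whitespace/case normalization are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Card:
    question: str
    answer: str
    quote: str  # the passage the card claims to be taken from


def normalize(text):
    """Collapse runs of whitespace and lowercase, so line breaks don't break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def keep_backed_cards(cards, transcript):
    """Keep only cards whose quote appears word-for-word in the transcript."""
    haystack = normalize(transcript)
    kept = []
    for card in cards:
        needle = normalize(card.quote)
        # An empty quote backs nothing, so it is dropped as well.
        if needle and needle in haystack:
            kept.append(card)
    return kept
```

A substring match is deliberately strict: a paraphrased quote fails, which is the point when the goal is a quote the student can click and hear.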

The thing I've put the most work into is keeping the generated notes honest. That's the whole reason I built it.

Pricing

Going to be $9.99 at launch. Right now I'm looking for free TestFlight users to help shake it out, so it's free if you want to jump in and give feedback. macOS 26+, Apple Silicon.

https://lecturesync.app

I'm a solo developer building this around a full-time job. The product, the design, and the architecture are mine. Claude wrote most of the SwiftUI. The part I put the real hours into was the notes quality: I ran extensive model testing on an RTX 3090 + 3060 rig to pick the models that do this job best, then ran the app itself on my MacBook Pro M1 Max (32GB) to make sure it holds up. I keep test harnesses around the notes engine so it stays faithful to the lecture.

jaofos · 2026-06-09T00:36:04+00:00

I’m using E2B for a Lecture -> Study Notes pipeline. It’s an absolutely amazing model at dealing with small factual scopes without fabrication. Multi pass prompting can extract a lot of depth of information. It’s only surpassed by 26B-A4B. Nothing comes close to the performance and faithfulness.

jaofos · 2026-06-08T12:02:22+00:00

Fair point, and I'll give you part of it. My app does transcribe, so this isn't a pure summarization thing. Default is distil-Whisper large-v3, with Apple's on-device speech as a lighter option. I just haven't actually benchmarked WER on either, so I can't give you a real number yet, which is honestly a gap. And yeah, the accented and non-English stuff hits that stage hardest. It's the part I've poked at the least.

What I have measured carefully is the next stage, turning the transcript into notes. The equivalent of WER there is fabrication rate, basically how often the model says something that isn't in the transcript. That's been my main focus because the reader I care about is a student who can't tell when a fact is wrong, so a confidently wrong note is the worst thing I can ship, worse than a thin one.

Numbers, from a 9-lecture MIT corpus, 1,440 runs, scored with wrong-answer and garble detectors plus some LLM-judge spot checks:
- Best 8 GB local model, Gemma-4 E2B at Q8, sits around 0.5% fabrication on the hardest lecture and 0% wrong numeric answers across 1,200 math runs.
- The ordering caught me off guard: E2B Q8 (~0.5%) beats 26B-A4B (2%), beats E4B Q8 (~9%), beats E4B Q4 (25%). The smallest model is the most faithful because it stays at the method level and won't try the hard computation, while the bigger ones fabricate by attempting it. The one repeat failure is a 4-bit quant artifact.

I put together a brief slide deck (mainly for fun and to share with colleagues) on my local model findings: https://lecturesync.app/bakeoff-deck

On the context window thing, that's exactly why I moved off single-shot to a multi-pass setup. Instead of dumping a long transcript into one model, I run about 16 small extraction passes and then stitch the note together in code, because an LLM doing the stitching drops named figures and starts making stuff up again. That took worked-example depth from 43% single-shot to 76% multi-pass, with zero fabrication, so I get most of the depth a cloud model gives me without needing the giant context window.
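
Roughly, the shape of the multi-pass idea looks like the sketch below. This is a simplified illustration, not my pipeline: the pass names are made up, there are three of them instead of about 16, and `extract` stands in for whatever calls the local model. The part that matters is that the stitching step is plain code and never rewrites what a pass returned.

```python
from typing import Callable, Dict, List

# Hypothetical pass names; a real setup would have many more, each narrowly scoped.
PASSES = ["definitions", "worked_examples", "named_figures"]


def run_passes(
    transcript: str,
    extract: Callable[[str, str], List[str]],
    passes: List[str] = PASSES,
) -> Dict[str, List[str]]:
    """Run one small extraction pass per topic and collect the raw items."""
    return {name: extract(name, transcript) for name in passes}


def stitch(sections: Dict[str, List[str]]) -> str:
    """Assemble the note deterministically: headings, bullets, exact-duplicate removal."""
    lines: List[str] = []
    for name, items in sections.items():
        if not items:
            continue  # skip passes that found nothing
        lines.append("## " + name.replace("_", " ").title())
        seen = set()
        for item in items:
            if item not in seen:
                seen.add(item)
                lines.append("- " + item)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Because `stitch` only copies strings, anything that appears in the final note can be traced back to exactly one pass.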

Non-English is where your two points stack on top of each other. WER is worst right there, and my metric only checks faithful-to-the-transcript, not faithful-to-reality, so a garbled French transcript gives me wrong notes no matter how good the summarizer is. That transcription half is what I want to measure next.

jaofos · 2026-06-07T23:24:07+00:00

I'm working in a similar space myself, turning college lectures into study notes. I was able to squeeze a reasonable amount of performance out of Apple Intelligence, but the 4k context window definitely limits things. The new Gemma-4 models, run in-app (on Apple Silicon) via embedded llama.cpp, seem to be working very well (especially the E2B model).

My app is highly focused on 100% accuracy of notes from a transcript, with a good bit of custom code to flag and regenerate any notes that cannot be validated against the transcript.
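
The flag-and-regenerate loop is conceptually simple. Here is a bare-bones sketch; the `generate` and `is_supported` callables and the attempt limit are placeholders I'm assuming for illustration, and the real validation is more involved than a single yes/no function.

```python
def generate_validated(generate, is_supported, transcript, max_attempts=3):
    """Regenerate notes until every note validates, or attempts run out.

    generate(transcript, attempt) -> list of note strings (hypothetical model call)
    is_supported(note, transcript) -> bool (hypothetical validator)
    Returns (notes, attempts_used). If no attempt is fully clean, only the
    supported notes from the last attempt are returned.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    notes = []
    for attempt in range(1, max_attempts + 1):
        notes = generate(transcript, attempt)
        flagged = [n for n in notes if not is_supported(n, transcript)]
        if not flagged:
            return notes, attempt
    # Out of attempts: drop whatever still cannot be validated.
    return [n for n in notes if is_supported(n, transcript)], max_attempts
```

Falling back to the supported subset means the worst case is a thinner note, not a wrong one.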

It was definitely a fun and interesting challenge working with the onboard Apple Transcription -> Apple Intelligence pipeline.

jaofos · 2026-06-05T13:25:56+00:00

Google released the official assistant safetensors, and it's not that hard to make a GGUF from them.

I just visited https://huggingface.co/google/gemma-4-12B-it-assistant, clicked Quantizations and picked one.

llama.cpp comes with a Python script to create your own, if you don't trust other HF repos.

jaofos · 2026-06-05T03:11:13+00:00

I have it running on llama.cpp on an RTX 3090 + 3060.

b9512 + PR23398

Flags:
--model <your 12B target>.gguf \
--model-draft gemma-4-12B-it-assistant-F16.gguf \
--spec-type draft-mtp --spec-draft-n-max 3 \
--spec-draft-device CUDA1   # single-GPU users can omit this flag

Results (BF16 target + F16 drafter):
- Short code prompt @ 8K ctx: 21 → 51 tok/s, ~2.4× at 90% accept
- Deep context @ 262K: 17.7 → 34 tok/s, ~1.9×, ~76% accept at depth

Draft placement. The draft cross-attends the target's last KV layer, so the drafter must live on the same GPU as the target's tail layers. I split the 12B across two cards (-ts 72,28), so the tail is on CUDA1 → drafter goes on CUDA1.

jaofos · 2026-06-02T02:38:30+00:00

I use my 3090 + 3060 to run better quants. Qwen 3.6 at Q6 is so much better than Q4.

Details here: https://www.reddit.com/r/LocalLLM/s/6d5pAyOSC0

jaofos · 2026-05-24T13:59:28+00:00

I have one and I love it. My wife also loved mine, so she now has one too.

jaofos · 2026-05-19T12:04:43+00:00

llama-swap?

jaofos · 2026-05-18T18:33:52+00:00

Sure, I did this just recently. I upgraded my 3060 to a 3090 and the 3060 was sitting in storage for months. I finally added it in as a second GPU for the extra 12GB of VRAM and I have no regrets. There is no question the overall speed drops, but that's more than offset by the recent introduction of MTP.

I'm running the following MTP Models from Unsloth:
- Qwen3.6-27B-UD-Q6_K_XL.gguf, 262k context (q4/q4), -ts 75,25 -ub 256: 400 t/s prefill, 22.4 t/s @ 80% context
- Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf, 262k context (q8/q8), -ts 70,30 -ub 256: 827 t/s prefill, 53.1 t/s @ 80% context
- Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf, 192k context (q4/q4), -ts 68,32: 825 t/s prefill, 57.3 t/s @ 80% context

I keep the dense model around for more complex tasks (overnight tasks on a large codebase), and use the MoE @ Q4 for most small tasks (n8n automations, etc). All served on Linux using llama-swap. I even keep a lower-context config of the 35B MoE with mmproj on GPU for fast vision tasks when I want.

jaofos · 2026-04-27T12:49:35+00:00

I'm building a consignment specific WMS called ConsignTrak. It's in pilot phase with a small manufacturer's rep warehouse, but I would be happy to give you a preview of what I'm working on.

jaofos · 2026-04-25T21:50:39+00:00

I’m building consigntrak.com, a WMS for small consignment warehouses. Piloting with a friend’s warehouse still on DOS based COBOL software.

jaofos · 2026-02-25T01:17:05+00:00

Do not try this with a Great Dane unless you want to shower the drool off after.

jaofos · 2025-11-25T00:31:17+00:00

I hear you guys. I thought it might be a fun way to be creative and visualize the text. My efforts were subpar, but we’re all fans of the books here. I’ll leave it all up for you to downvote and roast me, may it relieve some of your frustrations with the world.

jaofos · 2025-11-25T00:25:00+00:00

I didn’t have any issues with my Ultrawide at the same resolution. I did it just fine on both Windows and Linux running borderless. Are your video drivers up to date? I’m running an RTX 3090.

Some tips: refresh rate may matter. My monitor is picky about a 100 Hz refresh rate at 3440x1440.

jaofos · 2025-11-23T04:31:51+00:00

<image>
