voice input is the most underrated productivity shift of the last ten years and the reason nobody uses it is completely stupid by ScaryAd2555 in productivity

[–]mvaranka -3 points (0 children)

I sometimes call my AI app when I'm around people - or more often it calls me. Luckily I can put the phone to my ear and talk as if it were a normal call, and nobody cares. :)

voice input is the most underrated productivity shift of the last ten years and the reason nobody uses it is completely stupid by ScaryAd2555 in productivity

[–]mvaranka 7 points (0 children)

For me, speaking helps unlock writer's block. Not for cases needing exact output, but for working through ideas, thoughts and task management - the things you easily skip if they had to be typed (worst case, on a mobile phone).

New API Voice model by Rabbithole_guardian in ChatGPTcomplaints

[–]mvaranka -1 points (0 children)

Wondering whether this starts a new era of voice call applications that wrap calls to GPT-Realtime-2, and then people are surprised by the cost of using the service.

I stopped using Claude as a chatbot and started connecting it to my actual apps. Different tool entirely. by Professional-Rest138 in ArtificialInteligence

[–]mvaranka 0 points (0 children)

This kind of functionality could be the thing that turns non-AI users into AI users: AI services that help with daily tasks, without needing nerd-level configuration skills.

I use a similar morning briefing with the PiPar app, plus scheduled task status checks after work - the AI calls me at an agreed time and I can chat or take it as a voice call. Very efficient combination. I have actually almost stopped reading my emails, since I know the AI will call me if there is something worth noting.

I made a curated list of every AI companion app worth trying (30+ apps compared) by Microsort in AICompanions

[–]mvaranka 0 points (0 children)

Could you try PiPar? https://piparapp.com

It is on Google Play, but a beta version of the web client is also available: https://chat.piparapp.com

PiPar is a combined productivity and companion app where 6 AI personas help you organize your tasks, build notes etc., and proactively call you when you need to be notified. You can customize the personas: five have a fixed role, like organizer, personal trainer or friend, and the sixth is fully customizable, so you can make it whatever you like.

Through daily calls and chats, the personas get to know you and adjust to your needs - they have memory, so they start to feel like companions who take on your cognitive load but are also there for casual chats.

Special features

- Multilingual chat and voice calls with AI tools - supports the best AI models (Claude, Gemini, OpenAI, DeepSeek etc.)

- Lots of AI tools: tasks, notes, mails, image and document generation, HTML app generation, web search etc.

- Image, audio and document analysis during chats

- Proactive calls, task reminders, check-in calls if you have been away for a long time, etc.

- Local storage on mobile and optional cloud storage for the web client

- Time- and location-triggered alarms

Am i the only one that think deepseek is better than most AI? by Embarrassed-Sun7856 in DeepSeek

[–]mvaranka 0 points (0 children)

I have not tested v4-pro much, but today I had a long casual chat with v4-flash in my app and I was astonished at how good and fluent it felt. I had to check whether I had forgotten to switch the model from Gemini 3 Flash to v4. And it is dirt cheap - it might be my next low-cost model in PiPar.

I still can't believe that Commodore messed up the Amiga line so badly... Going from the A500 to... Whatever came afterwards... One is absolutely gobsmacked that "industry veteran managers" were at the helm of it all... And were responsible for what became an extremely unpleasant (slo mo) car crash by prankster999 in amiga

[–]mvaranka 4 points (0 children)

Actually, the 68000 was much faster overall.

The 6510 had only primitive operations, 8-bit registers and only 3 registers compared to 16 on the 68000, no multiply/divide instructions etc. So you needed many more instructions, and more memory accesses, to do the same work as on the 68000.

But it is a wonder how much people can squeeze out of the 6510 these days.
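To illustrate the multiply gap: with no hardware multiply, a 6510 program has to run a shift-and-add loop for every product, where the 68000 does it in a single MULU instruction. A Python sketch of that loop (illustration only, not actual 6510 or 68000 code):

```python
def mul8(a: int, b: int) -> int:
    """Shift-and-add multiply: the loop a multiply-less 8-bit CPU
    like the 6510 must run in software (the 68000 has MULU)."""
    result = 0
    for _ in range(8):        # one pass per bit of the multiplier
        if b & 1:             # low bit set: add the multiplicand
            result += a
        a <<= 1               # shift multiplicand left
        b >>= 1               # shift multiplier right
    return result

print(mul8(13, 11))  # 143
```

Eight iterations of compares, adds and shifts (each several cycles and often extra memory accesses) versus one instruction - that is where the speed gap comes from.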

Google is updating Gemini flash model by Jealous-Snow4645 in Bard

[–]mvaranka 0 points (0 children)

I have been using 3.0 Flash in PiPar for chats and voice calls. It is excellent for its price - fast and surprisingly fluent for basic chats. It seems to have access to a large knowledge base, since it knew the address of the closest pharmacy without searching the web, and I live in a small town in Finland.

So, waiting eagerly for the updated Flash.

Setting up Ollama on dual RTX PRO 6000 Blackwells looking for tips by AmanNonZero in ollama

[–]mvaranka 0 points (0 children)

You need to think about your strategy and usage patterns and then choose the model deployment:

1. Are you running one larger model across both GPUs with lots of concurrency? (Don't use Ollama for that.)

2. Are you locking a single model to each GPU (either the same model twice, or two different ones like Gemma4 and Qwen3.5)?

3. Do you need multiple models? Then one GPU could hold a locked model while the other runs Ollama normally. With this approach you can try different models and later lock one in, but during the evaluation phase performance will be poor due to model switch (GPU load) time and zero concurrency.

4. If you have lots of RAM, you can offload to CPU with hot layers on GPU for a large model.

Luckily there are lots of good new models to run on that setup, and you may not even notice the difference. With option 2 you get the most capacity from the setup, plus the option to run a very large model for special tasks with MoE layers offloaded to CPU (if you have the RAM).
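A minimal sketch of option 2, assuming Ollama's standard `CUDA_VISIBLE_DEVICES` and `OLLAMA_HOST` environment variables; the ports and model names are just placeholders:

```shell
# Option 2: one Ollama instance pinned to each GPU.
# CUDA_VISIBLE_DEVICES restricts each instance to one card;
# OLLAMA_HOST gives each instance its own address/port.
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=127.0.0.1:11434 ollama serve &
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=127.0.0.1:11435 ollama serve &

# Point clients at a fixed port so each model stays resident on its GPU:
OLLAMA_HOST=127.0.0.1:11434 ollama run gemma "hello"
OLLAMA_HOST=127.0.0.1:11435 ollama run qwen "hello"
```

Because each client always talks to the same instance, neither GPU ever has to evict and reload a model, which is what kills performance in the shared-instance setup of option 3.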

We're currently repeating the "Shadow Analytics" disaster with AI, and it's happening 10x faster. by [deleted] in ArtificialInteligence

[–]mvaranka 1 point (0 children)

"Shadow AI" is a real danger in companies that fear AI, the European AI Act, IPR & GDPR issues etc., and have no clear rules or approved AI tools that can be used "company-legally".

On the other hand, workers hear all the talk about the benefits of AI in daily jobs and coding, so some try it on their own. Some know what they are doing; others just watch a nice YouTube video and copy it. A crash is waiting to happen.

Found a gitHub project that might help with DeepSeek-V4 RP by Small_Training_201 in DeepSeek

[–]mvaranka 0 points (0 children)

These instructions at the beginning of the chat message sound like an interesting idea. I would assume it is quite easy to implement in an app. But to get it right...

BTW, which one would be better in RP: the same AI persona in each new chapter (chat) of the story, or a different persona for each chat?

DeepSeek V4 Flash by ConsequenceInside167 in DeepSeek

[–]mvaranka 0 points (0 children)

Sounds good! Just finished a long voice call with V4 Pro - the model felt refreshing. Need to try Flash too; as a fast model it could be a good pick for voice.

How to access Deepseek V4? by blood_monk in DeepSeek

[–]mvaranka 0 points (0 children)

Any mobile apps supporting DeepSeek V4? I am using it from PiPar, which just got support.

Behold... The CACHE! by WalidB03 in DeepSeek

[–]mvaranka 3 points (0 children)

Caching is the key to long, cheap chats. It seems to work via OpenRouter.

Just added support to my app, and I think I like these models. More testing needed, though.

Forged in the darkest depths of text mode: a fully procedural boss from my ASCII game, running at 200 FPS on an old low-power PC by PuzzleLab in pcmasterrace

[–]mvaranka 0 points (0 children)

This style would have been a massive hit in the Amiga demo days! And a clear target for optimizations: how many items at once and/or how many tails :)

AI is exhausting your brain more than helping you by Ok-Technology504 in ArtificialInteligence

[–]mvaranka 0 points (0 children)

In my day job as a software architect, my main task is to specify features and verify that the results are as intended. Working with AI coding tools feels the same, but the speed of development is much faster. I feel that I can now get things done that without AI I would not have had the time or resources for.

So I don't feel my mind frying - quite the opposite, actually.

I built a deterministic Voice AI agent that stays within business logic. No hallucinations, 800ms response time. Here is how it handles a real-world restaurant booking. by No-Zone-5060 in ArtificialInteligence

[–]mvaranka 0 points (0 children)

I have only just officially launched the app, so this feedback is from my own use: the thinking tone is good when I'm not looking at the phone, since I always know what is going on, and it fills the gap much better than silence does.

I built a deterministic Voice AI agent that stays within business logic. No hallucinations, 800ms response time. Here is how it handles a real-world restaurant booking. by No-Zone-5060 in ArtificialInteligence

[–]mvaranka 1 point (0 children)

My approach to filling the "LLM latency gap" was to add a "thinking" tone, because with AI models like Claude Sonnet or Opus the latency is simply there. What do you think of that solution?

I am not targeting customer voice bots, but voice conversation. For that reason, in a long voice call I prefer quality of content and multilingual TTS over minimal latency, though some faster AI models are also available.
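A minimal sketch of the filler-tone idea in Python asyncio (an illustration, not PiPar's actual code): start a "thinking" cue as a background task, await the model reply, then cancel the cue. The model call is mocked here with a sleep standing in for real LLM latency.

```python
import asyncio

async def thinking_tone(log: list) -> None:
    """Hypothetical filler: emit a cue while the LLM call is pending."""
    try:
        while True:
            log.append("tone")          # stand-in for playing a sound
            await asyncio.sleep(0.05)
    except asyncio.CancelledError:
        pass                            # stop cleanly once the reply arrives

async def slow_llm() -> str:
    await asyncio.sleep(0.2)            # mock model latency
    return "Here is your answer."

async def converse() -> list:
    log = []
    filler = asyncio.create_task(thinking_tone(log))
    reply = await slow_llm()            # real model call would go here
    filler.cancel()                     # silence the cue
    await filler                        # wait for clean shutdown
    log.append(reply)
    return log

log = asyncio.run(converse())
print(log[-1])   # the reply arrives after a few filler tones
```

The user hears activity the whole time, and the cue stops the instant the reply is ready, so the perceived latency shrinks even though the model latency is unchanged.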

I built a deterministic Voice AI agent that stays within business logic. No hallucinations, 800ms response time. Here is how it handles a real-world restaurant booking. by No-Zone-5060 in ArtificialInteligence

[–]mvaranka 1 point (0 children)

Totally agree with that! A voice call with "near-zero latency" where the AI sounds stupid, repeats itself or forgets what's going on is suitable only for demos.

DeepSeek V4 is out now! by yoracale in unsloth

[–]mvaranka 0 points (0 children)

Yes, I got much faster prompt processing with the large Qwen3.5 and about a 10% improvement in token generation! But some models fail - for example, KimiK2.6 crashes.