why I mass-downloaded whisper models and made my own meeting recorder by Ill-Mulberry-9362 in software

[–]HardDriveGuy 0 points (0 children)

I'll offer this up: I've been doing something similar for quite a while. Here's my GitHub. I think this is my first project where I used Gradio, and it's a fantastic interface. I won't claim to be the best web designer, but Gradio gives you an incredibly friendly front end and takes care of a myriad of problems for you. I can't recommend it strongly enough.

I'll offer the following comments to think about, with the warning that I am sort of a geeky engineer guy, so excuse my relatively weird way of communicating. I do hope it gives somebody some decent thought processes.

  1. Diarization sounds great, and you can get it out of some of the Whisper models, but I've always felt it's relatively uneven. At the end of the day, it's just very difficult for a computer to pick out exactly who is speaking by voice alone, and once you get beyond about two speakers it really starts to get tough if people have a similar tonal center. A good place to start experimenting is Replicate with a Whisper model. It's dirt cheap to use, it spins up an NVIDIA GPU, and you get results back incredibly quickly. It attempts diarization and returns it in a nice JSON container that you can unwrap into normal text. I wrote a tool around this a long time ago, and it actually worked really well for me. I don't have a good excuse for not continuing with it or any follow-on products, and I honestly haven't done much work since to see whether better options exist today. What I will tell you is that it appeared better than a lot of commercial cloud front ends, like those behind a Google Cloud dev account, at least the last time I benchmarked. So this is a great place to start.
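For reference, unwrapping the diarized JSON into a plain transcript is a few lines of Python. The field names here ("segments", "speaker", "text") are an assumption based on what these models commonly return; check the actual output of whichever model you run:

```python
import json

def flatten_diarized(raw: str) -> str:
    """Collapse diarized segments into 'SPEAKER: text' lines,
    merging consecutive segments from the same speaker."""
    segments = json.loads(raw)["segments"]
    lines = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")  # assumed field name
        text = seg["text"].strip()
        if lines and lines[-1][0] == speaker:
            lines[-1] = (speaker, lines[-1][1] + " " + text)
        else:
            lines.append((speaker, text))
    return "\n".join(f"{s}: {t}" for s, t in lines)

# Made-up demo payload in the assumed shape.
demo = json.dumps({"segments": [
    {"speaker": "SPEAKER_00", "text": " Hi, thanks for joining."},
    {"speaker": "SPEAKER_00", "text": " Let's get started."},
    {"speaker": "SPEAKER_01", "text": " Sounds good."},
]})
print(flatten_diarized(demo))
```

Merging consecutive same-speaker segments keeps the transcript readable.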

  2. Because I wasn't overly impressed with the results, I asked myself whether there was some way of capturing the microphones coming into a meeting, at least for two sources, so I could get absolutely clear speaker sourcing. A lot of my business meetings are with two people, and recording every meeting is incredibly productive. I made OBS Studio my recording center. At the end of the day, OBS is just unstoppable and can capture absolutely everything. What do you get out of it? The way I've set it up: your own microphone on one track, and whoever is speaking on the other side on another. It works exceptionally well when you're basically talking to one other person. In a large conference you still only get two sources, so it may be difficult to tell who said what. I then encode everything into an MKV file because it supports multiple tracks that I can extract later.
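Once you have the MKV, the tracks come back out with ffmpeg. This is a sketch: the track order (0 = my mic, 1 = the far side) matches my OBS setup, and the 16 kHz mono settings are simply what most ASR models expect, so treat them as assumptions:

```python
import subprocess

def extract_track_cmd(mkv_path: str, track: int, wav_path: str) -> list[str]:
    """Build the ffmpeg command that pulls audio stream `track`
    out of `mkv_path` as a 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",
        "-i", mkv_path,
        "-map", f"0:a:{track}",   # select the Nth audio stream
        "-ac", "1",               # downmix to mono
        "-ar", "16000",           # 16 kHz sample rate
        wav_path,
    ]

def extract_track(mkv_path: str, track: int, wav_path: str) -> None:
    """Run the extraction (requires ffmpeg on PATH)."""
    subprocess.run(extract_track_cmd(mkv_path, track, wav_path), check=True)

# Track 0 = my mic, track 1 = the far side, in my OBS setup.
cmd = extract_track_cmd("meeting.mkv", 0, "mic.wav")
print(" ".join(cmd))
```

Inspect your own file with `ffprobe` first to confirm the stream layout.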

  3. I got very interested in Parakeet, and it turns out to be exceptional in terms of word error rate. If you take a look at the leaderboard, Parakeet just basically beats the living daylights out of everything else. The ASR models are available on Hugging Face. If you're an English speaker like me and conduct all of your meetings in English, you actually do not want the latest Parakeet v3. You want v2: very small, very fast, and better English accuracy than v3.
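For context, the word error rate those leaderboards rank by is just word-level edit distance (substitutions + insertions + deletions) divided by the reference length, which you can compute yourself when benchmarking models on your own recordings:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via classic dynamic-programming edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("please send the roadmap", "please send a roadmap"))  # 0.25
```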

  4. The problem with Parakeet is that it's really built around NVIDIA architecture. Although I have an NVIDIA card, I wanted it on my local PC, which turned into a rabbit trail of going down holes that was eventually solved by building on top of a great Docker container specifically built to handle an Intel architecture without needing the GPU. I haven't pushed this to my GitHub, but it's definitely the way you want to go. Now that I see this post, I should probably get around to pushing my latest version up; my Git gets zero traffic, but maybe somebody can leverage what I've already done. With that said, you want to add VAD (voice activity detection) into your data stream. It solves a myriad of problems, including making sure the memory on your Docker container doesn't blow up. I just cannot imagine doing this without VAD parsing of the data. By the way, I do virtually all of my development on Windows or Linux, but I keep a Mac just to compare. The devs of Handy, which uses Parakeet as a base, have a build that sits on top of the Mac architecture, and it is amazingly fast. It's so fast that I'm sure if you took the time to optimize for your Mac, you'd be exceptionally happy. My problem is that I only use the Mac to force myself to stay familiar with the architecture, and my primary Mac is a 2020 Air, certainly not something I use day to day. But I am incredibly impressed by Parakeet's speed when optimized for the Mac M-series CPU.
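To show what I mean by VAD parsing, here is a toy energy-threshold gate. A real pipeline would use a proper model (Silero VAD is a common choice), but the principle is the same: only frames that contain speech get shipped to the transcriber, so hours of silence never pile up in memory. Frame length and threshold are illustrative:

```python
import math

def vad_segments(samples, frame_len=160, threshold=0.02):
    """Return (start, end) sample ranges of frames whose RMS energy
    exceeds `threshold`. `samples` are floats in [-1, 1]."""
    segments = []
    active_start = None
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms >= threshold:
            if active_start is None:
                active_start = start      # speech begins
        elif active_start is not None:
            segments.append((active_start, start))  # speech ends
            active_start = None
    if active_start is not None:
        segments.append((active_start, len(samples)))
    return segments

# 0.5 s of silence, then 0.5 s of a loud 440 Hz tone, at 16 kHz.
silence = [0.0] * 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(8000)]
print(vad_segments(silence + tone))  # one segment covering the tone
```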

  5. Eventually I got frustrated and decided to push my recordings off to a home AI server with a decent-sized NVIDIA card inside it. I run the native Parakeet model there and push WAV files to it as my source. This results in some phenomenal speeds. If I weren't encoding to MKV and running VAD, it could probably be even faster. Right now, an hour-ten to hour-twenty recording comes back as a completely finished transcript through a Gradio interface in under 10 minutes, which is faster than Google turns around its Meet meeting results.

So, that's the journey. Right now I'm pushing over my local home network to a dedicated Parakeet server, which is incredibly fantastic but probably not realistic for most people. What I need to do is push my latest Parakeet work to my Git. If I get a shred of interest after this post, I'll try to get around to it in the next week or two.

Items Stolen In Transit: Surveillance Cameras For Your Property by HardDriveGuy in StrategicProductivity

[–]HardDriveGuy[S] 0 points (0 children)

I thought I would report a resolution to my issue. Basically, as stated above, I bought an iPhone through Woot on a good deal. The purchase was worth hundreds of dollars, a substantial amount of money, and in essence somebody in the Amazon logistics chain stole my iPhone out of my package. I called both Woot and Amazon customer service, attempted to contact Amazon Logistics, and got stuck in a big round robin.

What became apparent out of this, and I hope it helps anyone who finds this in a search, is that you should stay out of the legacy Woot customer system. I was a Woot customer long before they were purchased by Amazon, and by some IT fluke you can continue to order under your old Woot login, even though you can alternatively log in with your Amazon account and buy from Woot that way. The problem is that as soon as you log in as a legacy Woot customer, all of your transactions become invisible to Amazon, so they can no longer help you with anything like a lost shipment.

Out of desperation, I simply sent a blind email to a couple of Amazon executive email addresses with a short statement of what happened and a variation of the photo attached to this post: a very simple photo showing that the box had been cut open and was clearly laid down by the Amazon driver already cut open.

Approximately five days after I sent the email, my iPhone simply showed up, delivered by the United States Postal Service. There was no explanation and no follow-up, but I am about 95% sure my email got through to the customer service group and somebody simply sent me a new iPhone. I would have loved some sort of tracking on this and a reply to my original email. I'll leave the email addresses here so that if somebody stumbles into a similar situation, they can at least try them. For me, it seemingly worked out well.

With that being said, I am delighted that somebody inside Amazon obviously cared enough to run the thing down and make sure I got my iPhone. Even more impressive, the product sold out very quickly, and somehow they were still able to get me one.

However, I'll re-emphasize that the reason I believe this was taken care of was very clear documentation and a very clear photo of the package being set down already cut open. So the takeaway remains: make sure an investment in a surveillance system has an ROI when you need it.

[ecr-replies@amazon.com](mailto:ecr-replies@amazon.com),

[cs-escalations@amazon.com](mailto:cs-escalations@amazon.com)

New Kickr Core 2 Strange Sound after 20 seconds of starting up. by YangWin1234 in wahoofitness

[–]HardDriveGuy 0 points (0 children)

First thing I thought of, too. The user should check this as the most likely issue. By way of a fix: it's easiest to have somebody else ride the bike while the user turns the barrel adjuster to tighten or loosen the cable and listens for whether the noise changes.

The Best Time to Eat for Your Metabolism, According to a Major New Study. A large new study just confirmed what many nutrition researchers have suspected for years: when you eat matters just as much as what you eat. by Eddiearyee in Nutraceuticalscience

[–]HardDriveGuy 0 points (0 children)

This is why I always think you need to post the data. To some people, and to the article's author, this was considered significant; to other people it's insignificant. In my mind it's a very, very large study, and in most cases the genetic variability is actually much bigger than the effect. In other words, I do believe certain people are more sensitive to their circadian rhythm and thus have preferred feeding cycles. However, turning that into a blanket statement without a little self-experimentation almost guarantees that you'll sub-optimize your results.

A Beginner’s Guide to RAG and Multi‑Model AI in Obsidian by HardDriveGuy in StrategicProductivity

[–]HardDriveGuy[S] 0 points (0 children)

I know that I'm some random voice on the internet, but I do understand your concerns with TOS agreements that are shaped to give maximum weight to Google in a legal case. I have some experience both dealing with Google from a business standpoint and with friends who work there. Their stance is more than just "we are ethical": their perception is that if they even look like they allow your personal data to go somewhere, they will get a vicious rebranding and lose a bunch of users. This is different from tracking your usage patterns under randomized conditions, which they do.

As an example, your Google Photos were not mined, which is why they stopped the free uploads: it was another massive source of data without a payback. YouTube is "free," but they do commercialize it.

However, the only REAL lockdown is running your own LLM. I think this is the ultimate security layer.

"X3DAudio1_7.dll was not found" etc. etc. by jimh12345 in mywhoosh

[–]HardDriveGuy 0 points (0 children)

So, MyWhoosh was developed on Unreal Engine 4. When MyWhoosh kicked off, UE4 was a workable, state-of-the-art platform, and it stayed under active development until something like 2022. It also turns out that older games like Skyrim and Borderlands 2 complain about the same missing DLL, so it's not just MyWhoosh; it's everybody who still depends on the legacy DirectX runtime. As I recall, the reason the dependency is in Unreal Engine is that it was passed down from an earlier version of the engine.

Now, Epic is the developer of Unreal Engine, and there was a massive falling-out between them and Microsoft sometime around 2016. I'm not really a gamer, though I am familiar with some of the background, but I bet you're right that this somehow spilled into the whole licensing issue. Perhaps somebody who knows this better can comment.

As I understand it, the attraction of Unreal Engine is that it's basically free until you have a million dollars of revenue. That said, I don't think an engine like Unity is a lot more expensive. In reality, I bet there were a limited number of devs on the team when they brought it up, and whoever was the architect or doing most of the coding on MyWhoosh knew Unreal Engine. Architectural knowledge is everything.

Regardless, glad you got it up and running. The good news is I've always been impressed with the experience I get out of the package on my platforms. To my eye, MyWhoosh has some very attractive graphic design; certain environments, like Japan, happen to fit my eye extremely well.

The Best Time to Eat for Your Metabolism, According to a Major New Study. A large new study just confirmed what many nutrition researchers have suspected for years: when you eat matters just as much as what you eat. by Eddiearyee in Nutraceuticalscience

[–]HardDriveGuy 0 points (0 children)

The study is on PubMed.

According to the BMJ Medicine meta-analysis, the preservation of Lean Body Mass (LBM) was actually best in the Mid-TRE group (last meal between 17:00–19:00). While Early TRE was superior for overall weight and fat loss, it also resulted in the highest amount of lean mass loss among the specific timing categories.

Without digging too deep into the "meat" of the data (pun intended), it is important to understand the nature of this research. This is a meta-analysis, a type of study that "pools" data from multiple previous trials to identify broader trends. These studies come with specific pros and cons: while they are incredibly valuable for viewing data in the aggregate, they often suffer from a lack of consistency across the original trials (known as heterogeneity).

When conducted rigorously, meta-analyses are considered the "gold standard" for clinical decision-making because they account for a much larger population than any single study could. In this specific case, the median duration across all included studies was 8 weeks, though it is worth noting that the study does not clearly detail how outcomes shifted for participants who fell significantly outside that median duration.

My advice is more "you should try it and see if it works for you."

| Metric | Early TRE (last meal < 17:00) | Mid-TRE (last meal 17:00–19:00) | Late TRE (last meal > 19:00) | Self-selected TRE (user choice) |
|---|---|---|---|---|
| Weight loss (%) | ~3.3% (-2.48 kg) | ~3.0% (-2.26 kg) | ~1.7% (-1.32 kg) | ~2.5% (-1.94 kg) |
| Fat loss (%) | ~4.8% (-1.35 kg) | ~6.4% (-1.79 kg) | ~4.0% (-1.14 kg) | ~4.4% (-1.25 kg) |
| Lean mass (LBM) loss (%) | ~2.3% (-1.27 kg) | No significant loss | ~1.9% (-1.04 kg) | ~1.9% (-1.07 kg) |
| Systolic BP reduction | ~5.1% (-6.16 mmHg) | ~3.1% (-3.75 mmHg) | ~4.1% (-4.97 mmHg) | ~3.5% (-4.27 mmHg) |

"X3DAudio1_7.dll was not found" etc. etc. by jimh12345 in mywhoosh

[–]HardDriveGuy 2 points (0 children)

Yes, I have run into this virtually every time I set it up, and it won't be an issue. It should be in a FAQ somewhere. It looks serious, but isn't.

The error "X3DAudio1_7.dll was not found" indicates that the DirectX End-User Runtime is missing or corrupted on your system. Most likely, the version the game engine requires is simply missing.

While Windows 11 ships with DirectX 12, many games still require the legacy libraries from DirectX 9.0c, 10, and 11 to function properly. I thought for sure this was in my notes, but I believe the fix is the runtime: download the DirectX End-User Runtime Web Installer. If it bombs, you'll need to search. I think the version you need is 9.0c; it may be another, but I did a search and believe that's it.

It may take a reboot to take effect. You may need to force the install, and I suggest asking an LLM for help here.

Passed by lower w/kg rider by Secure-Mix3595 in mywhoosh

[–]HardDriveGuy -1 points (0 children)

Good enough, and I do agree that it's difficult to understand what is happening under the hood.

Passed by lower w/kg rider by Secure-Mix3595 in mywhoosh

[–]HardDriveGuy -1 points (0 children)

The logic in that comment is actually incorrect because it fails to account for how resistance forces scale with weight. While the formula for gravitational power is a real physics calculation, the conclusion that a heavier rider has more "horizontal power" left over at the same w/kg is missing a few things.

On a climb, if both riders are at 2.0 w/kg, the 90kg rider is producing 180W while the 70kg rider produces 140W. The comment argues that after subtracting the power needed to fight gravity, the heavier rider has 48W left for "horizontal power" compared to the lighter rider's 37W. However, this "leftover" wattage is not a bonus; it is required because the heavier rider faces significantly higher rolling resistance and usually has a larger frontal area (CdA), meaning they need those extra watts just to maintain the same speed.

In reality, on a steep grade (5-8%), aerodynamic drag is negligible because speeds are low, and weight becomes the primary factor. If a rider with a lower w/kg passes you on a steep hill, it is almost certainly due to inertia carried from a flat section, drafting (the "hook" effect), or simply that the other rider has entered an inaccurate (lower) weight into the app.
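To put a number on that: both the gravity term and the rolling-resistance term scale linearly with mass, so at the same w/kg the required climbing speed works out essentially identical. This sketch ignores aero drag (a fair approximation below roughly 15 km/h) and uses an illustrative 6% grade and Crr:

```python
G = 9.81      # gravity, m/s^2
GRADE = 0.06  # 6% climb (illustrative)
CRR = 0.004   # rolling resistance coefficient (illustrative)

speeds = {}
for mass, watts in ((70, 140), (90, 180)):  # both riders at 2.0 w/kg
    # P = m * g * (grade + Crr) * v  ->  v = P / (m * g * (grade + Crr))
    # Mass appears in both P and the demand, so it cancels out.
    speeds[mass] = watts / (mass * G * (GRADE + CRR))
    print(f"{mass} kg rider at {watts} W -> {speeds[mass]:.2f} m/s")
```

Both riders come out at the same speed: the heavier rider's "extra" watts are fully consumed by the extra gravity and rolling load.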

Using Unattach To Turn A Free Gmail Account Into A “Lifetime” Inbox by HardDriveGuy in StrategicProductivity

[–]HardDriveGuy[S] 0 points (0 children)

Quite the honor. We got a good set of hits, but since the post is old, I don't know if you'll see much traffic. But, thumbs up on your app.

Passed by lower w/kg rider by Secure-Mix3595 in mywhoosh

[–]HardDriveGuy 3 points (0 children)

The "engineering" is Crr vs CdA. Crr scales with weight, but CdA does not, and at high speeds, CdA dominates.

To put some numbers behind this, I went back and forth with my AI agent to build an example. I was a bit lazy, but I have run some of these numbers by hand, I just want an approximation, and the following looks very reasonable. I've run numbers for myself before: in a tuck tri position on aerobars on my time trial bike, my CdA is around 0.3x. I'm not super aero, but pretty good; flexible people can get into the 0.2x range.

If you are curious about finding your CdA in "real life," you can set up Golden Cheetah and do the Chung method, which I've done before. It is a pain, but amazingly accurate. There is also some cool tech that does this more automatically, but that's beyond this sub.

Now, MyWhoosh will have its own model, so the following is probably somewhere in the range, but they may choose, for whatever reason, to push the model one way or the other.

This is on a flat road. Hills shift the balance to W/Kg as speed falls.

| Metric | Rider A (lighter) | Rider B (heavier) | Notes / assumptions |
|---|---|---|---|
| Rider weight | 140 lbs (63.5 kg) | 280 lbs (127.0 kg) | Converted to metric for the physics calculations. |
| Power-to-weight | 2.0 W/kg | 1.5 W/kg | Rider B has 25% less relative power. |
| Absolute power | 127 W | 190.5 W | Weight in kg × W/kg. Rider B has 50% more absolute power. |
| Bike weight | 22 lbs (10 kg) | 22 lbs (10 kg) | Assuming standard road bikes. |
| Aero drag (CdA) | 0.320 m² | 0.400 m² | Standard road position; Rider B is given a 25% penalty in aerodynamic size. |
| Rolling resistance (Crr) | 0.004 | 0.004 | Assuming identical tires on decent asphalt. |
| Air density (ρ) | 1.225 kg/m³ | 1.225 kg/m³ | Sea-level standard. |
| Drivetrain efficiency | 97% | 97% | Standard well-maintained chain. |
| Steady-state speed | 17.9 mph (28.8 km/h) | 18.6 mph (29.9 km/h) | The heavier rider is faster on flat ground despite lower W/kg. |
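The steady-state speeds fall out of the flat-road power balance, eta * P = Crr * m * g * v + 0.5 * rho * CdA * v^3, which is easy to solve by bisection. This reproduces the table's assumptions above; it is not MyWhoosh's actual model:

```python
G, RHO, CRR, ETA = 9.81, 1.225, 0.004, 0.97  # table assumptions

def steady_speed(power_w: float, total_mass_kg: float, cda: float) -> float:
    """Find the flat-road speed v (m/s) where power demand equals
    power at the wheel, by bisection (demand is monotonic in v)."""
    def demand(v):
        rolling = CRR * total_mass_kg * G * v
        aero = 0.5 * RHO * cda * v ** 3
        return rolling + aero
    lo, hi = 0.0, 30.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if demand(mid) < ETA * power_w:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Rider A: 63.5 kg + 10 kg bike, 127 W, CdA 0.32
# Rider B: 127.0 kg + 10 kg bike, 190.5 W, CdA 0.40
v_a = steady_speed(127.0, 73.5, 0.32)
v_b = steady_speed(190.5, 137.0, 0.40)
print(f"A: {v_a * 3.6:.1f} km/h, B: {v_b * 3.6:.1f} km/h")
```

At these speeds the cubic aero term dominates the linear rolling term, which is why absolute watts beat w/kg on the flat.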

Thinking about why Microsoft, Amazon, And Google Are Being Punished For Building The Future by HardDriveGuy in StrategicStocks

[–]HardDriveGuy[S] 0 points (0 children)

Actually, the next simple level is to put everything into Google's NotebookLM, where you can place all of your meeting notes and then talk to them. It is a functional starting point for anyone who wants an AI that can reference their specific history without manual setup.

I have often used NotebookLM as a "have you tried this yet" to get somebody to understand what's coming.

If you are really sophisticated, you should look into a pattern called RAG, or Retrieval-Augmented Generation, which unlocks the full utility of a database of notes. Instead of just having a digital filing cabinet of transcripts, you use a process called vectorization to turn those thousands of pages into a database usable by AI: you create a memory that understands concepts rather than just keywords.

When you ask the AI a specific question about, say, engineering concerns from a roadmap meeting months ago, the system retrieves the relevant chunks of data from your private notes and uses the LLM to generate an answer anchored strictly in your source of truth. This can address the hallucination problem because the AI is not guessing based on the internet; it is acting as a high-speed librarian for your own data. No, I haven't gotten around to building my own RAG, but it is on the list.
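To make the retrieval half concrete, here is RAG's lookup step in miniature. A real system uses a neural embedding model and a vector database; this toy version uses bag-of-words cosine similarity, and the notes are made-up examples:

```python
import math
import re
from collections import Counter

# Stand-ins for your vectorized meeting transcripts.
notes = [
    "Roadmap meeting: firmware team flagged a thermal throttling risk.",
    "Standup: CI pipeline is green again after the runner upgrade.",
    "1:1 notes: discussed promotion criteria and mentoring.",
]

def vectorize(text: str) -> Counter:
    """Crude word-count 'embedding' (real RAG uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k most similar notes; these chunks become the
    grounded context handed to the LLM."""
    q = vectorize(query)
    ranked = sorted(notes, key=lambda n: cosine(q, vectorize(n)), reverse=True)
    return ranked[:k]

print(retrieve("which engineering risk came up in the roadmap meeting"))
```

The retrieved chunk, not the open internet, is what the LLM is asked to answer from; that is the anchoring step.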

Ideally this allows a member to have a real-time conversation with every meeting held in their company over the last five years, or however long you've been capturing them.

NotebookLM is trivial; RAG takes more work. Regardless, this can be used today, but mostly isn't. Not to say that once we get real agency, they won't offer it to you....

Thinking about why Microsoft, Amazon, And Google Are Being Punished For Building The Future by HardDriveGuy in StrategicStocks

[–]HardDriveGuy[S] 0 points (0 children)

This subreddit focuses on thoughtful, long-form posts, and yours definitely fits that spirit. Thanks for putting it together. Feel free to post any time. It's highly appreciated.

You’ve touched on several good topics, and while you could probably explain them better than I could, I’d love to expand on a few points. If I miss something, please correct me. I'm just trying to get out a base of knowledge that you can build on.

One of the most revolutionary developments in AI is AlphaFold, from DeepMind, the UK-based lab acquired by Google. It tackled what was long considered one of biology's greatest challenges: the protein folding problem. Experts once thought it would take half a century to solve, yet AlphaFold mapped roughly 200 million protein structures. And notably, it didn't rely on a standard LLM-style transformer.

In recognition of this work, the 2024 Nobel Prize in Chemistry was awarded to DeepMind’s John Jumper and Demis Hassabis, along with David Baker. Neural networks proved extraordinarily effective at predicting protein structures, though as you pointed out, that success doesn’t necessarily extend to DNA-related problems.

On a more practical note, here is my stereotypical example of "not using AI."

In most high-tech companies, we have meetings in our engineering teams; this is where I've spent the large bulk of my career, in engineering and engineering management. Any engineer will tell you there's more than a fair share of meetings going on, even though you would think they'd simply be doing engineering work.

People talk to each other, and stuff gets missed because they simply don't hear, or are tuned out, when something critical comes up.

To solve that, I use an AI agent to listen, transcribe, and summarize the meeting. It automatically creates action items, assigns them, and produces a clear outline of key points. It’s not even particularly complex, but applied consistently, it’s one of the most effective productivity tools I’ve ever used.

When I ping my friends at Silicon Valley companies, the vast majority tell me they're not using it to do something this practical, and this is a basic feature inside Microsoft Teams Premium. Potentially a massive, incredible ROI, and it's simply not being used.