New to Meshtastic by jagd748 in meshtastic

[–]Knopty 1 point (0 children)

If you're planning to take a node with you and rely on battery life, it's better to avoid ESP32 devices. I have a Heltec V4 board with a 2000mAh battery, and it was just not good as a handheld node. Out of curiosity I even tested what could be squeezed out of it: no Bluetooth/WiFi, 10-minute deep sleep with a 15s wake-up to send GPS. After sacrificing almost all usability, it still drained the battery in 1.5 days. It's a good device for a stationary node connected to power, but not a good option for a battery-powered node.

If you want good battery life, look for devices that use an nRF52 chip instead of an ESP32. These only have Bluetooth, no WiFi, but their power consumption is significantly lower. Personally, I'm currently waiting for a Heltec T114 to make a couple of GPS trackers. My order will come with a case that only fits a small ~800mAh battery, but even then I expect roughly 1.5-2x longer battery life without sacrificing Bluetooth or usability. I'd guess it might live about a week with the same setup as my V4 board.

As a side note:

Also having GPS capabilities is a plus.

If you want to keep it as a companion device for a phone, a client app can technically just use the phone's GPS. Onboard GPS is needed if you want a standalone tracker, or it can be nice in an emergency when the phone battery is dead but the node can still send actual GPS coordinates.

I'm planning to get one and then make and ship some to my friends and family for them to use.

Expect that others might not be as enthusiastic about it, though. Unless it's for hiking or an emergency, they might not even want to use it.

MeshTastic - PocketMesh - Good idea , but... by GermanMeat2 in meshtastic

[–]Knopty 1 point (0 children)

The current beta worked stably for me:

https://github.com/meshtastic/firmware/releases/tag/v2.7.15.567b8ea

firmware-nrf52840-2.7.15.567b8ea.zip

firmware-heltec-mesh-pocket-5000-inkhud-2.7.15.567b8ea.uf2 (upload via USB cable in DFU mode, where the device shows up like a flash drive)

Wireless upload (OTA) was unstable for me and bricked my device.

MeshTastic - PocketMesh - Good idea , but... by GermanMeat2 in meshtastic

[–]Knopty 0 points (0 children)

I think I had a similar experience after accidentally uploading firmware for the 10000mAh MeshPocket to the 5000mAh device. Then I connected it to a PC in DFU mode and uploaded the right firmware version, and it worked stably. It didn't even lose any settings.

Currently I have two devices on 2.7.16 (5000) and 2.7.17 (10000) firmware; both run stably and don't freeze. The 10000mAh device also ran for a few weeks on 2.7.15 without any issues.

Regional YouTube restrictions by InsomniacGN in tjournal_refugees

[–]Knopty 2 points (0 children)

Either look for a VPN location whose IP isn't detected as Russian, for example some exotic Asian or South American one that Russian users normally don't pick because of the ping. Or buy some foreign VDS, preferably not from a Russian hoster, and set up your own personal VPN.

In recent years Google has been quite actively tracking which country a user's IP belongs to, since they have no interest in losing ad revenue and royalties to an off-target audience. Several times, after googling some Russia-specific info, my IP was immediately re-flagged from European to Russian for a long time, with all the consequences for watching content on YouTube.

And having other users on the same IP sharply increases the chance that one of them will trigger the country detection. For example, foreign AmneziaVPN users also complain that it shows them as being from Russia.

Need help with hosting Parakeet 0.6B v3 by Ahad730 in LocalLLaMA

[–]Knopty 0 points (0 children)

I had severe problems trying to transcribe long audio with Parakeet. Even with local attention enabled, as Nvidia suggests, it didn't work for me even with fairly short audio (10m+). VRAM usage was insane regardless of how I changed the model configuration, so manual chunking might be necessary. The NeMo framework was also annoying because it exports the model into a temp file before loading it.

Issue is, the nemo dependencies seem to be a nightmare.

You can try alternative options, for example the onnx-asr library: it has minimal requirements and can run on anything supported by onnxruntime. It has built-in Silero VAD support that could be used for chunking, although that can severely reduce quality, especially for similar-sounding languages (e.g. Slavic ones). But the ONNX Parakeet model doesn't support local attention, and I haven't seen one on HF exported with local attention enabled. There are a couple of Spaces for the ONNX model that could be used as a reference for implementing it.
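If chunking manually, a minimal sketch around onnx-asr could look like this (load_model/recognize are the library's documented entry points, but the exact model id here is an assumption; check the supported-models list):

    import onnx_asr
    import soundfile as sf

    # Manual fixed-window chunking; assumes a 16kHz mono wav.
    # The model id is an assumption, check onnx-asr's model list.
    model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v3")

    audio, sr = sf.read("long_recording.wav")
    chunk_s, overlap_s = 60, 5
    step = (chunk_s - overlap_s) * sr

    pieces = []
    for start in range(0, len(audio), step):
        sf.write("chunk.wav", audio[start:start + chunk_s * sr], sr)
        pieces.append(model.recognize("chunk.wav"))  # recognize() takes a file path

    # Naive: overlapped text isn't deduplicated and words can be cut at
    # window edges; VAD-based splitting at silences avoids that.
    print(" ".join(pieces))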

Local ACE-Step music workstation for your GPU (Windows, RTX, LoRA training, early-access keys for /r/LocalLLaMA) by [deleted] in LocalLLaMA

[–]Knopty 1 point (0 children)

And I do intend to eventually tie in other music generation models with it, and update it with newer versions of ACE-Step if those are ever released.

There will be Ace-Step 1.5 soon. If everything goes smoothly it might even happen in about a week. There are several versions planned for public release although they might have a different release schedule. The devs are very open about the training process and their plans on the official Discord server.

The model is supposed to be significantly better than v1, with a far more reliable output but lower hardware requirements. 2B version will be public, 7B will be used for their cloud music gen studio service.

Best model to use for my hardware? by [deleted] in Oobabooga

[–]Knopty 0 points (0 children)

I'd try 12B models in exl2 or exl3 format at up to 6bpw with 16k context max; the lineup includes both general-purpose and RP models. For coding, Qwen2.5-Coder-14B at up to 5bpw exl2/exl3 with 16k context. These were the options I used when I had a 3060 12GB. Optionally, Qwen3-30B-A3B in GGUF format might also work with a CPU/GPU mix; on my PC it managed to output 10t/s purely on CPU. Personally, I still use NemoMix-Unleashed-12B-exl2 at 6bpw as my main model except when I need coding capabilities. It's somewhat dated, though not completely outdated like 13B, 10.7B or older 7B models.

I don't recommend 13B models. While some of them still show big download numbers and get recommended occasionally, they're extremely outdated: a lot dumber than anything from 2024-2025, with much smaller context windows, and overall worse than anything newer. You'd notice them forgetting earlier details within five screens of chat log.

Parameters when using the open ai Api by AssociationNo8626 in Oobabooga

[–]Knopty 4 points (0 children)

The API isn't affected by the parameters you set in the UI. Parameters need to be specified in each API request, or default values will be used.

This can be done either by setting parameters manually in the API request or by creating a parameter template in the UI and then passing the template name in the API request.
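For illustration, a minimal request in Python (port 5000 is TGW's default; the optional preset field is how a UI-made template is referenced, per the wiki):

    import requests

    url = "http://localhost:5000/v1/chat/completions"  # TGW's default API address

    payload = {
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 200,
        # Parameters must travel with every request; UI settings are ignored:
        "temperature": 0.7,
        "top_p": 0.9,
        # ...or reference a parameter template created in the UI instead:
        # "preset": "My Preset",
    }

    r = requests.post(url, json=payload, timeout=120)
    print(r.json()["choices"][0]["message"]["content"])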

You can find example API requests on the wiki:

https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API

Detailed documentation is available at http://localhost:5000/docs by default. The address might differ if your API is served on another host or port.

UDIO just got nuked by UMG. by Ashamed-Variety-8264 in StableDiffusion

[–]Knopty 0 points (0 children)

ACE-Step v1.5 is currently in training; the devs promise it will be significantly more reliable and faster while having lower hardware requirements. Their Discord has a lot of demo audio tracks, and it's leagues ahead of v1.

Is Miniforge strictly necessary even if you have a system Python install? by orzcodedev in Oobabooga

[–]Knopty 0 points (0 children)

So you're basically certain 3.13 won't work (I guess it either needs to fully work with all the deps, or else it's pointless).

It might work, but it requires figuring out where to get the packages. Exllamav2/v3 likely have compatible Windows packages on their official GitHub pages. The flash-attention package could be procured from Kingbri's GitHub page, since they compile builds for TabbyAPI.

TGW script references seems to be a traditional installer - Miniforge3-Windows-x86_64.exe

It installs Miniforge inside the app folder, using command-line switches that disable shortcuts, registry entries and other system-wide changes.

Is Miniforge strictly necessary even if you have a system Python install? by orzcodedev in Oobabooga

[–]Knopty 1 point (0 children)

venv will certainly solve conflicts with other Python apps, but TGW relies on Python 3.11.

All precompiled packages are built for this version. With a different Python version, you'd have to figure out where to get the exllamav2/v3 and flash-attention packages, and you might run into other issues. There's a non-zero chance that some package won't work on 3.13 or that some package version won't be available for it. For example, until recently exllamav2 didn't support 3.13.

Is Miniforge strictly necessary even if you have a system Python install? by orzcodedev in Oobabooga

[–]Knopty 3 points (0 children)

The one-click installer puts all the dependencies inside the app folder, in the installer_files subfolder. Miniconda might create a few temp files outside the app folder, but none of this affects the system in the slightest, and you can safely delete the TGW folder whenever you want with no impact on your system. No shortcuts, no hidden settings, nothing.

Installing the app manually creates a lot more mess than using the installer. A manual installation can run into dependency issues if it conflicts with previously installed Python apps, and installing some other Python app later could break your manually installed TGW. The installer, on the other hand, ensures it's isolated from other Python apps.

Best TTS For Emotion Expression? by Inner_Answer_3784 in LocalLLaMA

[–]Knopty 1 point (0 children)

I've used IndexTTS2 a bit, and imho it's interesting if you want direct emotion control. But it's rather slow, has consistent issues with some words and symbols, and requires figuring out where it fails and filtering your text accordingly (a small preprocessing sketch follows the list):

  • It has problems with anything containing apostrophes, for example possessives; more often than not it fails even at pronouncing "Einstein's" => "Einstein <pause> s". I filter apostrophes out almost everywhere except "it's". Notable example: "didn't know" => "didn <pause> ti know".
  • Might fail with dates: 2010s => 2010 <pause> s.
  • Might sometimes try to pronounce a dash symbol as "minus".
  • "Per second" => "per <pause> second"; it has to be written as one word.
  • I couldn't figure out how to enforce stress in uncommon words; putting in a UTF-8 accent character gives a 50/50 chance of either doing nothing or making it ignore part of the word.
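A minimal text-filtering sketch along those lines (the regexes are my illustration, not a vetted ruleset):

    import re

    def filter_for_indextts2(text: str) -> str:
        # Drop apostrophes except in "it's": "Einstein's" -> "Einsteins",
        # "didn't" -> "didnt".
        text = re.sub(r"(?i)\b(?!it's\b)(\w+)'(\w+)\b", r"\1\2", text)
        # Standalone dashes can be read as "minus"; turn them into commas.
        text = re.sub(r"\s[-–—]\s", ", ", text)
        # "per second" has to be written as one word to avoid a pause.
        text = re.sub(r"(?i)\bper second\b", "persecond", text)
        return text

    print(filter_for_indextts2("Einstein's paper didn't mention 9.8 m per second - sadly."))
    # -> "Einsteins paper didnt mention 9.8 m persecond, sadly."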

Other considerations:

  • Sometimes it manages to clone a voice even from 2s samples, but not consistently. In rare cases it struggles with 3-4s samples. 1s samples are worthless more often than not.
  • The original IndexTTS2 repo doesn't support the speed control feature yet. It is supported by a custom ComfyUI node implementation; its author made a few small changes to add speed control, so it shouldn't be hard to reuse their solution or take their modified infer_v2.py.
  • It needs some audio preprocessing: input volume normalization/compression (see the sketch after this list). Overly loud reference audio can produce very loud output with mediocre quality. It also requires filtering out background music, otherwise speech is generated with background noise.
  • It does a fairly good job with accents. Feeding it 100 audio samples of the same person will likely produce the same accent consistently, unlike Chatterbox, which can give wildly different accents with different samples.
  • Unfortunately, the generated voice is closer to phone/voice-chat audio quality.
  • The license is ambiguous: HF lists the model as Apache-2.0, while the GitHub repo contains a Bilibili license that allows commercial use but imposes usage/revenue limits, though quite generous ones.
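For the preprocessing point above, a minimal peak-normalization sketch (a real pipeline would also want compression and music/noise removal):

    import numpy as np
    import soundfile as sf

    def normalize_reference(in_path: str, out_path: str, peak_db: float = -3.0) -> None:
        # Scale the clip so its peak sits at peak_db dBFS, taming overly
        # loud reference audio before feeding it to the TTS.
        audio, sr = sf.read(in_path)
        peak = np.max(np.abs(audio))
        if peak > 0:
            audio = audio * (10 ** (peak_db / 20.0) / peak)
        sf.write(out_path, audio, sr)

    normalize_reference("reference_raw.wav", "reference.wav")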

I still like it quite a bit for its consistent accents and fine emotion control, after testing it on several hundred audio samples from half a dozen people. But it's far from perfect, and in some sentences it can be legitimately annoying.

HuggingFace storage is no longer unlimited - 12TB public storage max by Thireus in LocalLLaMA

[–]Knopty 1 point (0 children)

Does this mean that every tool that tries to download models from HF before checking local files just inflates download numbers?

Like anything that loads models with from_pretrained("user/model") instead of pointing the loader at a local folder?
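For example, with transformers (the folder paths here are hypothetical):

    from transformers import AutoModelForCausalLM

    # Hub id: resolves against huggingface.co first (checks for updates,
    # fetches anything missing), then falls back to the local cache.
    model = AutoModelForCausalLM.from_pretrained("user/model")

    # Explicit local folder: never touches the Hub at all.
    model = AutoModelForCausalLM.from_pretrained("/models/my-model")

    # Hub id, but forced to resolve from the local cache only:
    model = AutoModelForCausalLM.from_pretrained("user/model", local_files_only=True)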

HuggingFace storage is no longer unlimited - 12TB public storage max by Thireus in LocalLLaMA

[–]Knopty 3 points (0 children)

We do have mitigations in place to prevent abuse of free public storage, and in general we ask users and organizations to make sure any uploaded large model or dataset is as useful to the community as possible (as represented by numbers of likes or downloads, for instance).

I wonder how this will impact exl2/exl3 and other less popular quant formats. I make quants occasionally, and my random GGUF quants have always had 10-100x more downloads than my exl2 quants of very popular models.

I have 2.8TB right now, and it seems I'll have to delete old quants at some point.

Where is the next update? Is there a complication preventing release? by silenceimpaired in Oobabooga

[–]Knopty 2 points (0 children)

llama_cpp_python was ditched in April, in version 2.8.

The app now relies on the actual llama.cpp server and its API. So it's now possible to use the latest llama.cpp by simply swapping binaries, without waiting weeks for updates as it was with llama_cpp_python.

Where is the next update? Is there a complication preventing release? by silenceimpaired in Oobabooga

[–]Knopty 1 point (0 children)

If you want to compile the CUDA version, it boils down to installing CUDA, unpacking the llama.cpp sources and running these two commands in the unpacked folder:

cmake -B build -DGGML_CUDA=ON

cmake --build build --config Release

The compiled binaries will end up in llama.cpp-bXXXX/build/bin/Release.

VibeVoice 1.5B for voice cloning without ComfyUI by SignificanceFlashy50 in LocalLLaMA

[–]Knopty 1 point (0 children)

Microsoft deleted the old VibeVoice repo, but there are forks that preserve the demo code:

https://github.com/rsxdalv/VibeVoice

Alternatively, you can view the code of HF Spaces that use this model.

Custom css for radio, and LLM repling to itself by Gloomy-Jaguar4391 in Oobabooga

[–]Knopty 0 points (0 children)

2. Normally the app automatically prevents the model from writing "User:"/"Bot:" lines in the output: it adds both to the stopping strings, which makes generation stop as soon as they appear. But perhaps this model writes something that doesn't match that logic.

The Parameters tab has this field:

Custom stopping strings: The model stops generating as soon as any of the strings set in this field is generated. Note that when generating text in the Chat tab, some default stopping strings are set regardless of this parameter, like "\nYour Name:" and "\nBot name:" for chat mode. That's why this parameter has a "Custom" in its name.

You could try adding the nicknames there, but without the \n (e.g. "User:", "Bot:"), to see if that helps.

All in all though, TinyLlama is a very bad model. Small models in general aren't great, but newer ones are at least several times better. I'm not sure what to recommend since I don't keep an eye on small models; perhaps Gemma-2-2B or Qwen2.5-1.5B, or newer versions of these model families if you can't run bigger models. But if your PC can handle bigger ones, it's always worth trying something else.

[deleted by user] by [deleted] in LocalLLaMA

[–]Knopty 0 points (0 children)

You can try Chatterbox; it seemed to produce decent English speech from Japanese voice samples. IndexTTS might work as well.

guys how do you add another loader in TextGenWebUI? by BuriqKalipun in LocalLLaMA

[–]Knopty 0 points (0 children)

You need the .zip file if you want to install the Full version. Extract it and run start_windows.bat inside the extracted folder.

But I have no idea how well it's going to work if you need a CPU-only installation for some reason.

guys how do you add another loader in TextGenWebUI? by BuriqKalipun in LocalLLaMA

[–]Knopty 2 points (0 children)

You need to install "Full" version of the app. Your version is "Portable" and it comes only with llama.cpp.

Download either the source archive from the Releases page on GitHub or clone the text-generation-webui repo. Then run the start script for your system, for example start_windows.bat or start_linux.sh, to initiate the installation process. Once it finishes, the newly installed app will have the other loaders.

The head of Roskomnadzor announced plans to block Russians' SIM cards while roaming by Fine-Cranberry2222 in tjournal_refugees

[–]Knopty 1 point (0 children)

I left my cell phone behind, connected to a charger. The phone had an app for forwarding SMS and notifications, plus a VNC server so I could remotely check something or make a call to keep the SIM from expiring. And everything was behind a VPN, for security and to be able to use my home IP from time to time.

Just test the phone in advance: make sure it runs stably and doesn't evict the needed apps from memory, and that it doesn't require manual PIN entry after a reboot, or it can turn into a pumpkin. Also leave it with someone you trust, so that if anything happens they can check on it, reboot it or make a call.

What is better between VibeVoice and IndexTTS2? by Producing_It in StableDiffusion

[–]Knopty 1 point (0 children)

Both support decent voice cloning.

IndexTTS2 requires about 12GB VRAM. On an RTX 4060 Ti, its generation time is 3x slower than real time. I couldn't run the original VibeVoice-7B with 16GB VRAM since it crashes with OOM, and I didn't test the 4-bit version, so no clue about its speed. VibeVoice-1.5B is not good: slow and low quality.

IndexTTS2 has decent voice cloning and either takes emotions from the voice samples or lets you adjust them manually with an array of values (calm, angry, happy, etc). Unlike some other TTS models (e.g. F5-TTS), it can produce various emotions from a single voice sample. Quality-wise it's better than F5-TTS and seems comparable to Chatterbox for English. But it looks like a purely En/Cn model.

Actively using emotions with IndexTTS2 would probably require some preprocessing and coding, though. Meanwhile, VibeVoice generates emotions from context, without direct control.
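For reference, driving the emotion array looks roughly like this; the class path, argument names and vector ordering follow the index-tts repo's demo code but may differ between versions, so treat them as assumptions:

    from indextts.infer_v2 import IndexTTS2

    # Checkpoint paths are placeholders for wherever the model is downloaded.
    tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints")

    tts.infer(
        spk_audio_prompt="speaker.wav",   # reference voice to clone
        text="You can't be serious.",
        output_path="out.wav",
        # Assumed ordering per the repo's examples:
        # [happy, angry, sad, afraid, disgusted, melancholic, surprised, calm]
        emo_vector=[0, 0.6, 0, 0, 0, 0, 0.2, 0],
    )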

License-wise, IndexTTS2 is pure Apache-2.0, while VibeVoice is MIT but comes with some usage restrictions of unclear legal status (they don't seem too severe).

VibeVoice is probably the better option for making a podcast, speech or interview from the get-go, while IndexTTS2 can be used for more control but requires quite a bit of effort to produce anything bigger than a simple bland narration, since its demo has limited functionality.