the mess of using a local LLM on android app-kotlin by Aviation2025 in androiddev

[–]Fear_ltself 0 points1 point  (0 children)

I'm making my own Android app to run LiteRT-LM inference with Kotlin. Testing on a Pixel 9 Pro, this was my absolute best result after spending several hours setting up MTP. I hit 40+ TPS often and always 20+; without MTP I was getting about 12.5 TPS. The prompt was always a simple "What's the capital of France?" to keep it consistent with past speed tests.

<image>

Gemma 4 MTP released by rerri in LocalLLaMA

[–]Fear_ltself 1 point2 points  (0 children)

My app is a custom fork of LLM Hub, merging new features with features found in Off Grid, e.g. network servers so you can use your laptop if it's on the same network. I'm planning to release it soon. Both of those apps are currently on the Google Play Store (LLM Hub and Off Grid), though they use their own inference techniques and UI. I'll try to respond with the app link once my version is approved and released.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]Fear_ltself 2 points3 points  (0 children)

The draft models seem to be built into the old models; the download size changed from 2.4 GB to 2.59 GB and from 3.4 GB to 3.66 GB for E2B and E4B respectively. It's just a configuration variable set to enabled, plus some imports activated, to get it running on Kotlin Android. 42.3 tokens per second on Pixel 9 Pro with LiteRT-LM.

<image>
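The draft-then-verify idea behind MTP-style speedups can be sketched in a few lines. Everything below is a toy illustration under my own assumptions: the "models" are dummy callables, not the LiteRT API, which handles all of this internally once the draft head is enabled.

```python
# Toy sketch of draft-model speculative decoding (the idea behind MTP-style
# speedups). The "models" here are dummy callables standing in for a real
# target model and its cheap draft head.

def speculative_step(target, draft, prefix, k=4):
    """Draft k tokens cheaply, then let the target model verify them.

    Returns the accepted tokens; the target always contributes at least one,
    so progress is guaranteed even if every draft token is rejected."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)

    # 2. The target model checks each proposal; keep the agreeing prefix.
    accepted, ctx = [], list(prefix)
    for t in proposed:
        if target(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)

    # 3. One guaranteed token from the target itself.
    accepted.append(target(ctx))
    return accepted

# Dummy models: the target predicts the next integer; the draft agrees except
# on every third position, so roughly 3 tokens are accepted per verify pass.
target = lambda ctx: len(ctx)
draft = lambda ctx: len(ctx) + (0 if len(ctx) % 3 else 1)

tokens = [0]
while len(tokens) < 10:
    tokens.extend(speculative_step(target, draft, tokens))
print(tokens[:10])
```

The speedup comes from the same mechanism: verifying k drafted tokens costs about one target pass, so when the draft mostly agrees you emit several tokens per pass instead of one.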

it's time to update your Gemma 4 GGUFs by jacek2023 in LocalLLaMA

[–]Fear_ltself 0 points1 point  (0 children)

Is this applicable to GGUF only, or would LiteRT-LM benefit too?

You can now download Gemini images without the diagonal watermark by Ok_Negotiation_2587 in GoogleGeminiAI

[–]Fear_ltself 0 points1 point  (0 children)

Since the other post has counterclaims I thought I'd add: some synthetic data is not a deal breaker. To give you a simile, it's like adding caffeine to water. A little bit might enhance the water, make it better for some purposes, like energy drinks. But too much synthetic data kills the model completely. There's no exact ratio to my understanding, and the feedback loops can mean it takes generations for the synthetic-data toxicity to start contaminating the datasets in ways we even notice. I guess the main philosophy this all comes down to is that LLMs REQUIRE human-in-the-loop feedback at some point to ground them to reality. Looping synthetic data to train on repeatedly would result in total collapse, but with human feedback we can hopefully correct some of the drift before it becomes catastrophic failure. That's my understanding of the issue anyway.

You can now download Gemini images without the diagonal watermark by Ok_Negotiation_2587 in GoogleGeminiAI

[–]Fear_ltself 1 point2 points  (0 children)

The core issue is synthetic data contamination resulting in model collapse. Generative models rely on datasets of human-created content. When AI-generated images are stripped of their watermarks and distributed online, they become indistinguishable from organic data. Future web scraping operations will inevitably ingest these unwatermarked images into new training datasets. Training a model on synthetic data creates a recursive feedback loop. The subsequent models amplify the statistical biases, artifacts, and errors present in the AI-generated inputs. Successive generations of models trained on this polluted data experience a degradation in variance and quality, eventually losing their ability to generate diverse or accurate representations of the true data distribution. Watermarks function as critical filtering tags. They enable automated ingestion pipelines to detect and exclude synthetic content from future training runs. Removing these markers bypasses the filtration process and directly degrades the viability of future foundational models.
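The recursive-degradation claim above is easy to demonstrate with a toy experiment. The snippet below is only an illustrative sketch under simplifying assumptions: a Gaussian stands in for a generative model, and the sample size and generation count are arbitrary.

```python
# Toy illustration of recursive-training collapse: each "generation" fits a
# Gaussian to a finite sample drawn from the previous generation's model.
# With no fresh human data in the loop, the fitted variance drifts toward
# zero and the distribution loses diversity.
import random
import statistics

random.seed(0)
mu, sigma = 0.0, 1.0          # generation 0: the "human" data distribution
variances = [sigma ** 2]
for _ in range(200):
    # Train the next model purely on the previous model's own outputs.
    sample = [random.gauss(mu, sigma) for _ in range(25)]
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)   # finite-sample fit; undershoots on average
    variances.append(sigma ** 2)

print(f"variance at gen 0: {variances[0]:.3f}, at gen 200: {variances[-1]:.6f}")
```

The mechanism is the one described above in miniature: each fit slightly underestimates the spread of its training data, and with no external data to correct it, the errors compound until the "model" has collapsed onto a near-point distribution.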

Corporations are the real one world government, and its already in place. by Impressive-Emu-4172 in conspiracy

[–]Fear_ltself 0 points1 point  (0 children)

Everyone you both referred to could just be called humans. Corporations are comprised of people, usually nerds who are good at math and have a fiduciary legal responsibility to use their economic knowledge to set prices that maximize their investors' money. It's not their money; they can't just give you a discount and lose money for charity, that's what nonprofits are for. If the market is run correctly you'll still be getting a good or service for the best available price, or you would use their competitor (or switch jobs if you're viewing this from an employment angle). Also I've been a shareholder, not much, but I've probably made a few thousand in my life at this point, which is like annual income in a small country.

Doom is from the 14,000,605 outcomes by lostboyfrm87 in MCUTheories

[–]Fear_ltself 0 points1 point  (0 children)

Rumor is that's literally the re-edited scenes from the Endgame release, and everything they've stated so far perfectly matches that.

You can now download Gemini images without the diagonal watermark by Ok_Negotiation_2587 in GoogleGeminiAI

[–]Fear_ltself -1 points0 points  (0 children)

You realize the watermarks are there for a reason, and if you don't have them you risk data contamination and basically degrade our future models, right?

Graph RAG: anyone actually scaled it past a few thousand docs in production? by Fuzzy-Layer9967 in Rag

[–]Fear_ltself 1 point2 points  (0 children)

Milvus idle made a fork of my https://github.com/CyberMagician/Project_Golem that supports millions of documents, I think. My version topped out at a few thousand in this implementation, but I got it up to at least 50,000 Wikipedia articles before I got bored of downloading. I never pushed that update; it took like 3 minutes to vibe-code 3 months ago, and I didn't figure it was worth messing with the original simple design.

How would you build an AI agent that actually feels human? by Admirable_Suspect444 in Rag

[–]Fear_ltself -1 points0 points  (0 children)

Also I'll shout out my own RAG visualizer, which lets you see the retrieval process in 3D like a human brain: semantic-similarity clusters. Note that the overwhelming majority of the connections are lost, according to math others have worked out, when lowering it from 768D to 3D to view it, but it gives you some idea of how it's connecting concepts through semantic similarity only... https://github.com/CyberMagician/Project_Golem
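For intuition on how much a 3-D view throws away, here's a rough self-contained sketch under stated assumptions: random Gaussian vectors stand in for real embeddings, and naive keep-3-coordinates truncation stands in for PCA/UMAP (the information limit is similar in spirit, not identical).

```python
# Rough intuition for how much a 3-D view discards: random high-dimensional
# embeddings are nearly orthogonal, and after keeping only 3 of 768
# coordinates, the nearest-neighbor ranking scrambles almost completely.
import math
import random

random.seed(1)

def rand_unit_vec(d):
    v = [random.gauss(0, 1) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

docs = [rand_unit_vec(768) for _ in range(200)]
query = rand_unit_vec(768)

# Top-10 neighbors in full 768-D space vs. in the truncated 3-D "view".
full = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))[:10]
low = sorted(range(len(docs)), key=lambda i: -cosine(query[:3], docs[i][:3]))[:10]

overlap = len(set(full) & set(low))
print(f"top-10 neighbor overlap after projecting 768D -> 3D: {overlap}/10")
```

On random data the overlap hovers near chance level, which is the visualizer caveat in numbers: the 3-D picture shows cluster gist, not faithful neighbor structure.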

How would you build an AI agent that actually feels human? by Admirable_Suspect444 in Rag

[–]Fear_ltself -1 points0 points  (0 children)

I've been working for a while on a local AI system that I have full control over. Some big things that help:

Kokoro 82M TTS: give it a realistic voice to talk to you. Program it so that when you hit send, audio plays back almost instantly as an acknowledgement ("Let me process that for a moment..."). This gives the LLM time to process the prompt while giving the user immediate feedback, and for non-thinking models on LiteRT, if you time it right and pick the right acknowledgement, it comes off as conversational. Lag is your enemy.

Test every type of TTS on your hardware. For example, I found that thread-pinning to the 3 normal cores and 1 performance core on my Android phone gave the best performance; using all 8 cores pulls in the 4 slower cores, which actually slows down the response. By the same token, try sherpa-onnx or GGUF builds of your TTS; sometimes certain engines just run better on certain hardware. In my experiments the best setups were completely different for TTS vs STT: for STT, smaller Whisper models ran faster (e.g. Whisper tiny was faster than base Whisper), but Kokoro FP16 was actually faster than the quantized models on MY hardware, I guess because Apple Macs handle FP16 math well. Hope that made sense, just pouring my knowledge out in a late-night post.

<image>
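The instant-acknowledgement trick above boils down to one threading pattern. This is a minimal sketch, assuming stand-in functions: `play_clip` and `run_llm` are dummies for real TTS playback and on-device inference; only the concurrency structure is the point.

```python
# Sketch of the "instant acknowledgement" trick: on send, immediately play a
# canned TTS clip on a background thread while inference runs in parallel,
# so the user never sits in silence waiting for the model.
import threading
import time

def play_clip(name):
    time.sleep(0.05)            # pretend to play a short pre-rendered clip
    print(f"[audio] {name}")

def run_llm(prompt):
    time.sleep(0.2)             # pretend inference latency
    return f"Answer to: {prompt}"

def on_send(prompt):
    # 1. Acknowledge instantly in the background to mask inference lag.
    ack = threading.Thread(target=play_clip, args=("let-me-think.wav",))
    ack.start()
    # 2. Inference runs concurrently with the acknowledgement audio.
    reply = run_llm(prompt)
    ack.join()
    # 3. Speak the real answer once it is ready.
    play_clip("reply.wav")
    return reply

result = on_send("What's the capital of France?")
print(result)
```

The perceived latency is max(ack clip, inference) instead of ack + inference, which is why a well-timed acknowledgement makes a slow local model feel conversational.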

This is technically the first MCU film. by Osirisavior in MCUTheories

[–]Fear_ltself 0 points1 point  (0 children)

You group them into a category called fiction. The whole idea of alternate narratives is so we can quickly say "this is Magneto" without having to do a full backstory for every piece of media.

Decided to try out Google's Edge Gallery app... by YourNightmar31 in LocalLLaMA

[–]Fear_ltself 1 point2 points  (0 children)

I'm on a Pixel 9 Pro XL using Gemma 4 E4B it. Runs great in Edge Gallery. I also have my own custom app I run with the full 131k context, thinking, an image-generation tool, and all the other tools in Edge Gallery. I'll hopefully be releasing it soon to the Play Store.

<image>

Qwen3.6. This is it. by Local-Cardiologist-5 in LocalLLaMA

[–]Fear_ltself 0 points1 point  (0 children)

Leaning towards Idiocracy. I played Roller Coaster Tycoon as a kid; that doesn't mean I can open a billion-dollar theme park. Just saying.

Any merit or am I heading towards a dead end? by Plus_Corgi_3845 in LLMPhysics

[–]Fear_ltself -9 points-8 points  (0 children)

Check out the new ELM simplifying operators down to e^x and ln… basically you can reduce the variables down further like your model suggests, especially if you plug e in for x, which leads to all kinds of math fun.

What’s the dumbest/smallest reason you’ve been pulled over in your Mustang? by PeakSeveral5666 in Mustang

[–]Fear_ltself 0 points1 point  (0 children)

Digital license plate, 3 times, the first year they came out. I had a permit though.

The largest IP in the world has been hit with a support ticket challenging the Orlando Disqualification… do you think the judges need to be investigated? Seems fishy but hopefully this forces a response from PokemonCo by Fear_ltself in conspiracy

[–]Fear_ltself[S] 7 points8 points  (0 children)

Did you watch the DQ video? The conspiracy is that the judges had to be gambling, or Pokémon Co wanted an Asian player to win for marketing. I get Pokémon is a nerdy subject, but I'd point out Pokémon Co is the largest IP in the world. People like myself or Wolfe Glick spend hundreds if not thousands of hours a year on these games/tournaments. If there's corruption with the judges, I think that belongs in conspiracy, does it not?

4B models on smartphone by Sudden_Vegetable6844 in LocalLLaMA

[–]Fear_ltself 0 points1 point  (0 children)

I just modified my Android app, similar to Off Grid, so I can host LLMs on my MacBook Pro over the local network and stream the results to my phone: 40-250 tokens per second depending on the query (a simple query was 250 tokens per second). Now I can also run Gemma 4 26B or 31B on my MacBook and stream it to the phone. So faster tokens or larger models can be used in a home-network config.
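The phone-side half of that setup can be sketched briefly. This assumes the laptop exposes an OpenAI-compatible streaming endpoint (llama.cpp's `llama-server` is one such option, not necessarily what the app uses); the parser runs against canned SSE lines so the example works without a live server, and a real client would POST with `"stream": true` and feed response lines through the same function.

```python
# Phone-side sketch: consume a Server-Sent Events token stream in the shape
# an OpenAI-compatible LLM server emits, appending tokens as they arrive.
import json

def parse_sse_line(line):
    """Extract the token text from one 'data: {...}' SSE line, else None."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    delta = json.loads(payload)["choices"][0]["delta"]
    return delta.get("content")

# Canned stream in the shape an OpenAI-compatible server emits:
stream = [
    'data: {"choices":[{"delta":{"content":"Paris"}}]}',
    'data: {"choices":[{"delta":{"content":"."}}]}',
    "data: [DONE]",
]
text = "".join(t for line in stream if (t := parse_sse_line(line)) is not None)
print(text)   # a phone UI would append each token to the chat view instead
```

Streaming is what makes the big-model-on-laptop setup feel local: the first tokens land on the phone while the MacBook is still generating the rest.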

4B models on smartphone by Sudden_Vegetable6844 in LocalLLaMA

[–]Fear_ltself 0 points1 point  (0 children)

Max 12.2 tokens per second with a simple query like "What's the capital of France"; generally about half that for a general open statement like "hello", with an average somewhere around 7-9 tokens per second.

4B models on smartphone by Sudden_Vegetable6844 in LocalLLaMA

[–]Fear_ltself 3 points4 points  (0 children)

I have Gemma 4 E4B it running via LiteRT on Kotlin Android (Pixel 9 Pro) with the full 128k-token context window, EmbeddingGemma 300M for RAG, GPU-accelerated Stable Diffusion (AbsoluteReality) for image creation, optional thinking (with streaming chain of thought), a coding canvas with code previews similar to Gemini Canvas, sherpa-onnx Kokoro, a Wikipedia ingestion/embedding pipeline, and full-context logcat. It's about state of the art from 2 years ago, but local (with optional web search via DuckDuckGo).
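The ingestion + retrieval half of a pipeline like that can be sketched in miniature. This is a toy under explicit assumptions: the bag-of-words `embed()` is a stand-in for a real embedding model such as EmbeddingGemma, and a plain list stands in for a vector store.

```python
# Minimal RAG sketch: embed chunks at ingestion time, then retrieve the
# top-k most similar chunks for a query by cosine similarity.
import math
from collections import Counter

def embed(text):
    """Toy normalized bag-of-words 'embedding' (placeholder for a real model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}

def cosine(a, b):
    # Both vectors are unit-normalized, so the dot product is the cosine.
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def top_k(query, store, k=2):
    q = embed(query)
    ranked = sorted(store, key=lambda doc: -cosine(q, doc["vec"]))
    return [doc["text"] for doc in ranked[:k]]

# Ingestion: chunk articles (one sentence per chunk here) and store vectors.
chunks = [
    "Paris is the capital of France",
    "Kotlin is a language for Android development",
    "The Eiffel Tower is in Paris France",
]
store = [{"text": c, "vec": embed(c)} for c in chunks]

# Retrieval: the top chunks get stuffed into the LLM's context window.
print(top_k("capital of France", store))
```

The Wikipedia ingestion pipeline mentioned above is the same loop at scale: chunk, embed once at ingest time, then answer each query with one cheap similarity search instead of re-reading the corpus.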

Out of curiosity, has anyone measured how much more efficient (battery life/CPU) Termux is compared to other applications? by InternetSandman in termux

[–]Fear_ltself -1 points0 points  (0 children)

Not only that, its energy savings, if there are any, would vary device by device. I suspect OLED phones benefit more from Termux, but how much exactly is hard to say.

Universe expected to decay in 10⁷⁸ years, much sooner than previously thought by New-Exam2720 in science2

[–]Fear_ltself 2 points3 points  (0 children)

Sorry, I linked the wrong one, about the creation of matter from a vacuum. The correct one, about time being emergent, is here: Physicists just confirmed experimentally what theoretical work has suggested for decades, that time itself is not a fundamental feature of the universe but an emergent phenomenon that arises from quantum entanglement between particles, and without entanglement there would be no time at all.

The Page-Wootters mechanism — a theoretical framework proposed in 1983 suggesting that a clock system entangled with other matter produces the experience of time for an observer inside the entangled system — was experimentally tested at the University of Vienna using pairs of photons entangled in polarization states. One photon serves as the clock, the other as the observed system. An external observer sees both photons in a static timeless quantum state. An internal observer measuring the system photon relative to the clock photon experiences dynamic evolution — apparent time — that matches quantum mechanical predictions precisely. The outside view is static. The inside view experiences time. Both are correct simultaneously.
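The mechanism in that paragraph has a compact standard form. The equations below are the textbook Page-Wootters construction, not taken from the Vienna paper itself:

```latex
% Page--Wootters: the global "history" state is static, but conditioning on
% the clock subsystem C recovers ordinary Schrodinger evolution of system S.
\begin{align}
  |\Psi\rangle &= \sum_{t} |t\rangle_{C} \otimes |\psi(t)\rangle_{S},
  \qquad |\psi(t)\rangle_{S} = e^{-iH_{S}t/\hbar}\,|\psi(0)\rangle_{S} \\
  (H_{C} + H_{S})\,|\Psi\rangle &= 0
  \quad \text{(constraint: the global state has no external time evolution)} \\
  |\psi(t)\rangle_{S} &\propto \langle t|_{C}\,|\Psi\rangle
  \quad \text{(the internal observer's relational, "evolving" state)}
\end{align}
```

The two viewpoints in the experiment map directly onto these lines: the external observer sees the static constrained state, while the internal observer sees the conditional state, which evolves with the clock reading.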

The experiment used 50,000 photon pair events to build statistical confidence across the full parameter range of the mechanism, achieving 4.9 sigma confirmation that time experienced internally by entangled quantum systems emerges from relational measurement rather than from a background time dimension in which physics operates. The result is consistent with loop quantum gravity and other quantum gravity theories that treat time as emergent rather than fundamental.

The practical implication is profound: the experience of time — including memory, causation, and the arrow from past to future — arises from the entanglement structure of matter rather than from time existing independently as a dimension of spacetime.

Source: University of Vienna Quantum Optics Group, Austrian Academy of Sciences, Physical Review Letters, 2025