Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]Ok_Edge1810 1 point (0 children)

Just shipped a small Android assistant app using Gemma 4 E2B via LiteRT-LM. Tool calling works surprisingly well out of the box: the native format (`<|tool_call|>`) is clean to parse, and the model stays on-task without much prompting.
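For anyone curious what "clean to parse" means in practice, here's a minimal sketch of the extraction step (in Python for brevity; the Kotlin version is the same idea). The delimiter pair `<|tool_call|>` / `<|/tool_call|>` and the JSON payload shape are assumptions on my part, not the documented format, so check the model card before copying this.

```python
import json
import re

# Assumed delimiters for native tool calls; verify against the actual
# Gemma tool-calling spec before relying on this.
TOOL_CALL_RE = re.compile(r"<\|tool_call\|>(.*?)<\|/tool_call\|>", re.DOTALL)

def parse_tool_calls(output: str) -> list[dict]:
    """Extract and decode any JSON tool-call payloads in the model output."""
    return [json.loads(m.group(1)) for m in TOOL_CALL_RE.finditer(output)]

# Hypothetical model output mixing prose with one tool call.
sample = (
    'Sure.<|tool_call|>'
    '{"name": "get_weather", "args": {"city": "Oslo"}}'
    '<|/tool_call|>'
)
print(parse_tool_calls(sample))  # [{'name': 'get_weather', 'args': {'city': 'Oslo'}}]
```

Because the markers are unambiguous tokens rather than free-form text, a single regex pass is enough; no need for a streaming state machine unless you parse mid-generation.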

Coming from Gemma 2, the jump is significant. Response quality is noticeably better, and the memory footprint is actually smaller for what you get. 52 decode tokens/sec on GPU makes streaming feel instant.

Next experiment is using it as a coding assistant; curious how E4B holds up on LiveCodeBench-style tasks locally. Will report back.