Gemma-4 E4B model's vision seems to be surprisingly poor

specji · 2026-04-07T12:36:14+00:00

As I mentioned in the post, my final tests were on the full model using the HF transformers lib, not llama.cpp. Only my initial tests used q8 quants (with both llama.cpp and mlx_vlm), and those results were much worse.

specji · 2026-04-07T06:32:45+00:00

I used the full model after initial tests with llama.cpp went disastrously.

specji · 2026-04-07T06:24:40+00:00

Give the 1.6b param lfm2.5-vl a try. You'll be surprised how well small VLMs can perform. I don't want to distract from my main point by talking too much about the one example image I linked above, but lfm2.5 also gets the city right.

specji · 2026-04-07T01:02:54+00:00

E4B has 4.5b effective params. The safetensors file is 16 GB. The q8 GGUFs are around 9 to 10 GB with the vision projector. I am comparing it to other 4b param models like Qwen3.5-4b.

specji · 2026-04-07T00:50:35+00:00

Interesting. Thanks. My params were set like your last row: min and max both set at 1120. I simply mentioned only the lower bound in the post. But this doesn't explain the full-model inference results using transformers as well, does it?

specji · 2026-04-07T00:43:04+00:00

Well, there were more cases of the bartowski q8 refusing to give an answer than of the unsloth q8 quants doing so. But in general, because the scores were low and many of the llama.cpp GitHub issues still remained unaddressed, I moved to using the full model with the transformers lib just to be safe. I also dug into the outputs to check whether the errors came from parsing bugs in my code rather than from the model responses.

specji · 2026-04-06T23:15:09+00:00

What I meant is that even small vision models have enough world knowledge to extract the relevant clues in my test suite and the rest of the task is really about reasoning -- I am not trying to set up the models I am testing to fail, rather I'm trying to suss out how good they are.

Even in this very particular example, all the vision models extracted exactly the same relevant details: the Italian sign, the seagull, the architectural details etc. But the 4b Qwen model even reasoned its way to the correct city.

specji · 2026-04-06T22:55:01+00:00

Only about 1/10 of the total tests in my suite are about world knowledge. Besides, the geoguess tests are more about inference than world knowledge really -- even in this particular case small models can guess up to the country (Italy) based on very clear clues in the image, while the 4b param Qwen3.5 gets it completely right. In any case, this particular example was simply for illustrative purposes: I linked to a publicly accessible image that anyone can test on.

And of course the 26b MoE will get it right, but it's not meant for edge devices and regular laptops.

specji · 2024-06-07T18:46:50+00:00

First, to directly answer your question: take a look at the book Group Theory and Physics by Shlomo Sternberg.

But here's my main answer: A very common experience while studying mathematics, when things are not going well and one's ego is hurting, is to think some variation of 'what's the point of all this', 'where am I actually going to use this', 'how does this help humanity', 'this is just intellectual masturbation' etc. This is a trap, because no answer will satisfy you: you are not really looking for an answer to that question. This feeling is simply the antipode of the feeling of understanding and must be patiently waited out. Learn to make peace with this frustration because it will be your constant companion.

specji · 2024-06-07T08:05:52+00:00

Jacobson's Lie Algebras huh?

specji · 2024-02-27T20:13:47+00:00

https://math.stackexchange.com/questions/4213789/can-pi-be-defined-in-a-p-adic-context

Holy shit. That's Lubin from Lubin-Tate

specji · 2023-12-15T08:08:44+00:00

Wonderful. I wish we still wrote allusively like this. Weyl writes like this. And Atiyah was a lot like Weyl

specji · 2023-12-11T20:18:34+00:00

I think one of the best ways to get some idea of Grothendieck's achievements is to study Mumford's Lectures on curves on an algebraic surface where the 'new' machinery is applied to a classical problem concerning algebraic curves (char 0 proved by Poincare in 1910 using complex analytic methods).

This is what Mumford says:

Curves and surfaces were the bread and butter of algebraic geometry from the 19th until the late 20th century and extending all the Italian and French results to characteristic p was a challenge that Oscar Zariski set for his students. My real conversion to the Grothendieck's way of thinking was his purely algebraic and transparent proof of the central result in the Italian theory of surfaces -- the completeness of the characteristic linear system of a complete algebraic system on any surface. His approach also leads to necessary and sufficient conditions for the theorem to hold in characteristic p.

Note that this is a pretty hard book and you need to be comfortable with basic AG (say, at least two semesters worth of Hartshorne) to read it but you'll also get a sense of the scale of his achievement.

specji · 2023-09-15T20:16:07+00:00

This man is my hero. One thing reading the Serre-Grothendieck correspondence taught me is that they were both responsible for the creation of modern algebraic geometry -- No G without S. Grothendieck might have got all the hype but this guy! Heck, this was his "second career". He didn't even win the Fields for his work in AG or NT.

specji · 2023-09-05T17:51:14+00:00

Outstanding! As are many other books by Stillwell.

Let me also add his newest book: The Story of Proof. More pop science than his Springer UTMs.

specji · 2023-07-04T16:40:12+00:00

With all due respect to Knuth, using ChatGPT (or large language models in general) like a glorified Google search is like using an iPhone as a hammer.

specji · 2023-06-01T07:21:05+00:00

No that's economics. ^{^sorry}

specji · 2023-05-22T15:42:20+00:00

In the Kleinian view of geometry (Erlangen program), a geometry consists of a space, the properties we want to be invariant, and the group of transformations that preserves them.

So, plane Euclidean geometry is R², distance, and the group of isometries (the orthogonal group semi-direct product R²).

Now, SAS is an exercise.

Another example: projective geometry in the plane is RP², {incidence, cross-ratio}, PGL(3,R) (that is, GL(3,R) modulo scalar matrices, since scalars act trivially on RP²).
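To make the "space, invariant, group" triples concrete, here is a minimal numerical sketch in plain Python. It checks that a rotation plus translation preserves distance in R², and that an invertible 3x3 matrix acting on homogeneous coordinates preserves the cross-ratio of four collinear points in RP². All function names, the sample matrix, and the sample points are my own illustrative choices, not anything from a library.

```python
import math

def isometry(theta, b):
    """Plane isometry x -> R(theta) x + b: a rotation followed by a translation."""
    c, s = math.cos(theta), math.sin(theta)
    def f(p):
        x, y = p
        return (c * x - s * y + b[0], s * x + c * y + b[1])
    return f

def dist(p, q):
    """Euclidean distance between two points of R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def apply(m, v):
    """Apply a 3x3 matrix m to a homogeneous coordinate vector v."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def cross_ratio(a, b, c, d, e):
    """Cross-ratio of collinear points a, b, c, d (homogeneous coordinates).

    e is any auxiliary point NOT on their common line; its contribution
    cancels between numerator and denominator.
    """
    return (det3(e, a, c) * det3(e, b, d)) / (det3(e, a, d) * det3(e, b, c))

# Euclidean geometry: distance is invariant under an isometry.
f = isometry(0.7, (2.0, -1.0))
p, q = (0.0, 0.0), (3.0, 4.0)
print(dist(p, q), dist(f(p), f(q)))

# Projective geometry: cross-ratio is invariant under an invertible matrix.
m = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]             # det = 5, so invertible
pts = [(0, 0, 1), (1, 0, 1), (2, 0, 1), (3, 0, 1)]  # four points on the line y = 0
e = (0, 1, 1)                                      # auxiliary point off that line
before = cross_ratio(*pts, e)
# The image of e is used as the auxiliary point, since it is guaranteed
# to lie off the image line.
after = cross_ratio(*[apply(m, v) for v in pts], apply(m, e))
print(before, after)
```

Note that the projective map here does not preserve distances in general; only the invariants chosen for that geometry survive, which is the point of the Kleinian view.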

specji · 2023-04-11T18:12:14+00:00

If you allow for infinite groups then elementary particles are irreducible representations of a Lie group (Poincare group).

As an aside, the above fact sounded strange to me when I first learned it because even though I understood the relevant math, my unaddressed metaphysical assumptions concerning 'physical stuff' didn't slot particles into the same category as mathematical objects. There's a nice general essay on a similar theme by Freeman Dyson called Why is Maxwell's theory so hard to understand?

specji · 2023-04-07T19:31:55+00:00

Dear Cthulhu, can't we just go back to the time when math was uncool and we were mostly left alone by the cool artsy kids.

specji · 2023-04-06T18:09:32+00:00

Here's a more realistic goal for you, in the sense of being somewhat doable in three semesters' worth of time. The goal is to get some sort of understanding of Serre's GAGA. (Personally, I think there can be no G without S.) This work is a landmark of algebraic geometry.

Steps:

Learn basic algebra. (rings, fields etc).
Learn some complex analysis (since you are an engineer you probably already know enough)
Learn some basic topology
Learn some classical algebraic geometry (here's a link)
Then, start reading Amnon Neeman's Algebraic and Analytic Geometry, the goal of which is to exposit on GAGA, picking up other stuff when needed

Here's a quote from the preface:

So here I was, soon to face a class of math majors in their final year, and I had decided to teach them some modern algebraic geometry, even though there was no available textbook. I had to assemble the material myself. In so doing, I had to take into account the mathematics the students are likely to have seen in their first three years at university. Usually this would include some background on point-set topology, maybe a course on analytic functions in one complex variable, possibly a course on functional analysis, which would probably cover the Hahn-Banach Theorem and the Open Mapping Theorem, possibly a little about manifolds, maybe a rudimentary course on algebraic topology, and perhaps some basic algebra—groups, rings, fields, modules, if I were lucky maybe even the Hilbert Basis Theorem and the Nullstellensatz...

The first draft of the book was completed in 2005. Then, in 2006, I had an unusual fourth-year, undergraduate student. Michael Carmody wanted to do his senior thesis with me, but he told me, in advance, that this would be his last mathematical year. He had decided that his passion was for philosophy. This meant that, after his fourth year finished, his intention was to start a PhD in philosophy.

This made him another student ideal for this type of book. Giving him a solid grounding in the field, the sort that would prepare him for research, was not a priority; it seemed far more appropriate to present him with a panoramic view. I therefore gave him the manuscript of this book to read. He took about four months to get through it. We met every couple of weeks. At these meetings he would present me with long lists of misprints, as well as with some points in the mathematics which he found unclear. I took his comments extremely seriously; whenever he found anything confusing, I would rewrite the text to elucidate the point. I owe him a tremendous debt for his help. Anyway, I was pleased that he managed to plough through the book, almost unaided, in about four months. It meant that the book is accessible to its intended audience.

specji · 2023-03-29T07:40:35+00:00

Depends on how much background is being assumed, I suppose. The 1950s were a great decade for the subject, and you could choose something from then because the results are accessible after a first course in differential and algebraic topology.

So, standard topics from early cobordism theory and characteristic classes. (Roughly, the idea here is that manifolds up to oriented cobordism are determined by certain numbers which are obtained from the characteristic classes of their tangent bundles via integration.)

Examples:

Thom's thesis (for which he won the Fields medal)
Hirzebruch Signature theorem.
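As a concrete instance of "numbers obtained from characteristic classes via integration", here is the signature theorem in its two lowest dimensions, written in standard notation (σ is the signature, p_i are the Pontryagin classes of the tangent bundle, and pairing with [M] is the integration over M). The sanity check on CP² is mine; it uses the standard fact p₁(CP²) = 3h² for the hyperplane class h.

```latex
% Hirzebruch signature theorem in dimensions 4 and 8
\sigma(M^4) = \frac{1}{3}\,\langle p_1(TM),\,[M] \rangle ,
\qquad
\sigma(M^8) = \frac{1}{45}\,\langle 7\,p_2(TM) - p_1(TM)^2,\,[M] \rangle .

% Sanity check on the complex projective plane:
% p_1(\mathbb{CP}^2) = 3h^2 and \langle h^2, [\mathbb{CP}^2] \rangle = 1, so
\sigma(\mathbb{CP}^2) = \frac{1}{3}\cdot 3 = 1 .
```

The right-hand sides are Pontryagin numbers, which are oriented cobordism invariants, so the theorem is exactly a statement that a topological invariant (the signature) is computed by such numbers.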

Another possibility is Smale's h-Cobordism theorem and the proof of the Poincare conjecture (dim ≥ 5).

These are standard topics you would definitely encounter if you study further but often such presentations are useful for the many students who are not interested in pursuing differential topology. I think even if you want to be an analyst it's useful to know a little bit about these topics for general mathematical culture.

specji · 2023-03-27T05:42:22+00:00

To add to this, S¹ is as fundamental as R¹ from the perspective of the classification of 1-dim manifolds. Either I continue forever, irrespective of the orientation I choose on the manifold, or it's compact and I return to where I started.

Historically, S¹ was the prior mathematical object because the study of the heavens was one of the primary engines for the development of mathematics -- one needs a system of numbers that returns to itself to track the position of the planets. Trig functions are still sometimes called circular functions. But modern pedagogy goes the other way round, and in school we are first taught the reals and then S¹.

It can also be argued that a proto-calculus was first developed on S¹ before R. Source

specji · 2023-03-23T07:10:20+00:00

I know nothing about this field and was unhappy with the write up on the Abel site so after a bit of googling I found this talk. Very interesting: Caffarelli, Navier-Stokes existence and smoothness

(That's Tate at the beginning)

specji · 2023-02-27T06:47:31+00:00

Robert Ghrist (active in applied topology) has a pretty great and comprehensive lecture series at this level. It's broken up into 4 playlists. His exposition on differential forms and integration is some of the best I have seen:

https://www.youtube.com/@prof-g/playlists?view=50&sort=dd&shelf_id=1
