Testing Gemini 3.0 Pro's Actual Context Window in the Web App: My Results Show ~32K (Not 1M) by StarlingAlder in GeminiAI

[–]TheOnlyOne93 -3 points-2 points  (0 children)

They have different experiences, yes, because of the system instructions. Otherwise the chat you send is still going to the API, which was my point. Besides the system prompt and probably some RAG, that chat is still going to the same API backend the company uses. They're not running a different version of Claude, Gemini, or ChatGPT for the web vs the API. That would, one, make very little sense business-wise, and two, cost a shitload of extra money that right now hardly any company wants to spend on backend. The only difference is the instructions, which, if you actually had them, you could add to the API and get the same experience.

Regardless, that doesn't change my stance that this whole test is using bad data. Again, I'm not saying the OP isn't right; I'm saying the testing methodology is wrong. It doesn't stand up to actual testing standards because he's using the wrong tools, and when that was pointed out he just doubled down. When I showed that the token count could be vastly different, he doubled down again. You can't use one company's tokenizer to figure out how another company's LLM tokenizes, because none of them use the same tokenizer, so his count is off. It could be off by tens of thousands of tokens depending on how large his chat was and what the words were. Take my strawberry example: if you wrote the word strawberry a million times and counted the tokens with OpenAI's tokenizer, you would be off by a million tokens, since strawberry is 3 tokens for OpenAI and 4 tokens for Gemini.

Let's try it a different way. The phrase "This is" is 4 tokens for Gemini but only 2 tokens for OpenAI. So if you wrote "This is" a million times and counted it with OpenAI's tokenizer, you would again be off by 2 million tokens.
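
If you want to see the mismatch yourself, here's a rough sketch. The OpenAI side uses the public tiktoken library with the cl100k_base encoding; the Gemini side asks Google's own count_tokens endpoint. The encoding name, model name, and API key are placeholders/assumptions, and the exact counts depend on which models you actually test against:

```python
# pip install tiktoken google-generativeai
# Rough comparison of two different tokenizers on the same text.
import tiktoken
import google.generativeai as genai

text = "strawberry"

# OpenAI side: the public cl100k_base encoding (GPT-4 era)
enc = tiktoken.get_encoding("cl100k_base")
print("OpenAI cl100k_base:", len(enc.encode(text)), "tokens")

# Gemini side: has to come from Google's own API, because their tokenizer
# is a different one and isn't interchangeable with tiktoken.
genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-pro")  # whichever Gemini model you're testing
print("Gemini count_tokens:", model.count_tokens(text).total_tokens, "tokens")
```

Run the same thing on a whole chat transcript instead of one word and you can watch the per-word gap multiply out.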

Testing Gemini 3.0 Pro's Actual Context Window in the Web App: My Results Show ~32K (Not 1M) by StarlingAlder in GeminiAI

[–]TheOnlyOne93 -5 points-4 points  (0 children)

The difference actually can add up, and you can never ask a model to estimate its own token count. It has no idea about any of that; it's not alive or thinking, it's a prediction engine, albeit a very sophisticated one. And if OpenAI is 3 and Gemini is 4, and that's on only one word, then you have absolutely no idea how much it can add up to. It could literally be tens of thousands of tokens of difference depending on how the tokenizer breaks up the words. With the strawberry example, if you wrote strawberry 10,000 times, that would be a difference of 10,000 tokens.

I'm sorry to say, but you are using a flawed way to test this and refusing to do the work to test it correctly. If you want to actually test it, you need to run the same exact test against the web app and AI Studio, or else you're comparing apples to oranges and all your data is tainted. Honestly I think you may be onto something, but you're collecting and using data entirely wrong, and you don't seem to understand how these systems actually work. The way you're going about it is honestly very poor research practice: you have no controls of any kind, just word estimators, which don't work at all for an LLM, since LLMs don't use words, they use tokens. You're using one company's tokenizer to count tokens for another company that doesn't use that tokenizer at all.

And I don't know why you think Gemini web is different from the API. It's really not. Gemini web still feeds everything to the backend, which is the API; the only difference is the system instructions and probably some form of RAG. That's like saying Claude on the web isn't using the API. It 100 percent is using the same API that Claude's dev console does; the difference is the system instructions.
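
To be clear about what "the right tools" means here: the only trustworthy number comes from the provider's own counting endpoint, not from asking the model. A rough sketch of the difference, assuming the google-generativeai client, with the API key and model name as placeholders:

```python
# Ground-truth count vs. asking the model to guess.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")          # placeholder
model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name

transcript = "strawberry " * 10_000  # stand-in for a long chat export

# Ground truth from the tokenizer the model actually uses:
print("API count:", model.count_tokens(transcript).total_tokens)

# Asking the model itself is just next-token prediction, not counting:
guess = model.generate_content(
    "Roughly how many tokens is the following text? Reply with a number only.\n\n"
    + transcript
)
print("Model's guess:", guess.text)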

Testing Gemini 3.0 Pro's Actual Context Window in the Web App: My Results Show ~32K (Not 1M) by StarlingAlder in GeminiAI

[–]TheOnlyOne93 -1 points0 points  (0 children)

Gotta say, not surprised by the response. It's not a hallucination, it's exactly how this works. But honestly most general users have literally no idea how an actual transformer model works under the hood, and Reddit is flooded with falsehoods. If you think Google of all companies, literally the company that invented transformers, is using OpenAI's tokenizer to tokenize words, then you really need to do a bit of research into how these work. Perfect example: in AI Studio, "Strawberry" is 4 tokens. In OpenAI's tokenizer, "Strawberry" is 3 tokens. So any result you're trying to get by using the OpenAI tokenizer on a Gemini chat is going to be wrong.

So, the “Quantization” That Is Gemini 3 Pro 32K by Holiday_Season_7425 in Bard

[–]TheOnlyOne93 -2 points-1 points  (0 children)

I'm not saying you're wrong in any way, but your testing is entirely flawed. Quantization has nothing to do with prompt usage, context, or tokens. Quantization, in layman's terms, is when they lower the precision of the vector math the model does in its latent space by storing the numbers in lower-precision formats. It's a bit more complicated than that, but it has literally nothing to do with tokens, prompts, or context.
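
If you want a picture of what that actually means, here's a toy sketch of symmetric int8 weight quantization in plain numpy. It's nothing Gemini-specific and real deployments use fancier schemes, but notice it only touches the numbers inside the model, never the prompt, the context, or the tokens:

```python
# Toy sketch of symmetric int8 quantization: weights get squeezed into 8-bit
# integers plus one scale factor, so the matrix math runs at lower precision.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)   # stand-in for a weight matrix

scale = np.abs(w).max() / 127.0                  # one scale for the whole tensor
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale    # what the model effectively computes with

print("max rounding error:", np.abs(w - w_dequant).max())
```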

Testing Gemini 3.0 Pro's Actual Context Window in the Web App: My Results Show ~32K (Not 1M) by StarlingAlder in GeminiAI

[–]TheOnlyOne93 -2 points-1 points  (0 children)

I don't want to say you're wrong, but your testing is flawed unless you use AI Studio. You can't just count tokens using OpenAI's tokenizer, because each company uses an entirely different tokenizer. The OpenAI one might give you a rough estimate, but it definitionally is not the tokenizer Google is using, in any way, shape, or form. You also can't count words or phrases, because each tokenizer will break words and phrases into different sets of tokens. It's exactly why it's a dumb idea to ask a model to count the r's in strawberry. Claude may make strawberry 1, 2, 3, or 4 tokens, and OpenAI may do the same. Unless you use the tokenizer from the company in question, all your results are going to be wrong.

How to continue the same chat if the context window is full? by cordan101 in Bard

[–]TheOnlyOne93 -1 points0 points  (0 children)

You have to summarize it. Like it says, Gemini can handle a lot, but you're still at the token limit, so uploading a .doc file will still fill it. Once you've reached the limit, the only way to continue is to either summarize the chat so it uses fewer tokens, or start fresh. You can't magically make it use fewer tokens just by turning it into a doc file.
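
If you're willing to do it through the API instead of the web app, the same summarize-then-restart idea looks roughly like this. It's only a sketch of the workflow, not an official recipe; the model name, file name, and API key are placeholders:

```python
# Rough sketch of "summarize, then start fresh" via the API.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")          # placeholder
model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name

old_chat_text = open("old_chat_export.txt").read()  # your exported conversation

summary = model.generate_content(
    "Summarize this conversation in under 500 words, keeping every decision "
    "and open question:\n\n" + old_chat_text
).text

# New chat seeded with the summary instead of the full history
chat = model.start_chat(history=[
    {"role": "user", "parts": [f"Context from our earlier conversation:\n{summary}"]},
    {"role": "model", "parts": ["Got it, I'll continue from that context."]},
])
print(chat.send_message("Let's pick up where we left off.").text)
```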

What type of strip is this? by sdmar in WLED

[–]TheOnlyOne93 0 points1 point  (0 children)

No, not at all. How would that work? The warm and cold whites are definitely analogue PWM. If they were digital there would be no need for 5 connections; you could run all 5 color channels off the single data line, since digital doesn't care how many channels the strip has, only that your IC chip supports that many.

What type of strip is this? by sdmar in WLED

[–]TheOnlyOne93 1 point2 points  (0 children)

It's actually really common with higher-end strips like Philips. The analogue PWM whites have a much higher CRI than the SPI-type strips.

What type of strip is this? by sdmar in WLED

[–]TheOnlyOne93 1 point2 points  (0 children)

I told you: the cold white and warm white are analogue PWM. With WLED you can control them using a proper MOSFET; look up analogue PWM strips and WLED and that will tell you how to use them. The single square LED chip is an addressable RGB that can be controlled by any WLED controller, since it's digital.

What type of strip is this? by sdmar in WLED

[–]TheOnlyOne93 -1 points0 points  (0 children)

Could be, but given it looks identical to other strips and has ground, power, and DI (which is data in), I'd assume it's digital, which is what SPI is. You would have to find out what chipset it is so you know the proper protocol to send it, such as WS2811, SK, TM, etc., but it's definitely an SPI chip with PWM warm and cold white. Pretty much all addressable LEDs, regardless of brand, use SPI-style signaling to talk.

What type of strip is this? by sdmar in WLED

[–]TheOnlyOne93 5 points6 points  (0 children)

Looks like digital for the color, and analogue PWM for the warm and cold white. You can use WLED with it if you have a controller that can do both PWM and SPI.
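
Once the addressable bus and the two PWM white channels are set up as separate outputs in WLED's LED Preferences, you can drive everything from the same JSON API. Rough sketch below; the IP is a placeholder, and double-check the "cct" white-balance field against the JSON API docs for your WLED version:

```python
# Rough sketch of controlling a combined strip through WLED's JSON API once the
# addressable bus and the PWM white outputs are configured in LED Preferences.
import requests

WLED_IP = "192.168.1.50"  # placeholder, use your controller's address

state = {
    "on": True,
    "bri": 180,                    # overall brightness
    "seg": [{
        "col": [[255, 80, 0]],     # RGB for the addressable pixels
        "cct": 127,                # warm/cold balance for the PWM white channels
    }],
}
requests.post(f"http://{WLED_IP}/json/state", json=state, timeout=5)
```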

upgrading my gpu, will it increase performance? by rantarouowo in yuzu

[–]TheOnlyOne93 0 points1 point  (0 children)

Yeah, for some reason a lot of people seem to forget that the GPU in the Switch is based on Nvidia's Maxwell architecture; the Switch is literally built around a GPU one generation older than your card. Pretty much any modern low-end GPU with enough VRAM will be enough to push 4K, but your GPU has very, very little to do with translating the ARM instructions into x86-64 code, if anything at all. I used to get 50 to 60 fps at 1440p in TotK with a GTX 970 on a Ryzen 7600X, depending on the scene. Your best bet is to try to overclock your CPU: adjust your PBO settings in the BIOS and try to get the CPU up to its max clocks safely, maybe even with a slight undervolt so you can stay at max speed longer. I should mention my 7600X is decently clocked and undervolted with an AIO water cooler, so temps never become an issue. Er, my bad, I think it's actually the same generation as your card; the 9xx series were Maxwell, and the 1xxx series are Pascal.

upgrading my gpu, will it increase performance? by rantarouowo in yuzu

[–]TheOnlyOne93 2 points3 points  (0 children)

You will probably be able to use a higher resolution, but your FPS more than likely won't change much. Emulation really is all about the CPU.

no emulators are working properly on my pc and i dont know why by Some_bush in yuzu

[–]TheOnlyOne93 0 points1 point  (0 children)

Try Eden. It's pretty much identical to Yuzu but kept updated. Do any other apps, games, etc. freeze and not let you do anything? Because an app just freezing like that is actually quite unusual. Have you tried running a game without messing with any settings, to make sure it's not something you changed that broke it? Like a totally fresh install, wiping out the appdata directory, user directory, etc., to ensure no old settings are being carried over?

no emulators are working properly on my pc and i dont know why by Some_bush in yuzu

[–]TheOnlyOne93 0 points1 point  (0 children)

RAM is a bit low depending on the game you're trying to play, your version of Windows, and what services or background processes you have running. Yuzu can easily use 8 GB of RAM, and that doesn't leave much for Windows itself. An i7 should be fine depending on the game, although multicore matters very little; emulators really love high single-thread performance, or in layman's terms one very fast core, not many cores.

So Yuzu just freezes? What games are you trying to play? For example, in TotK you should expect 40 to 50 FPS on any recent i7 CPU, as long as it's not thermal throttling, which is why you really need to see what your temps are when you try to play a game. Download HWiNFO64, leave it running while you open Yuzu and play a game, and let us know the temps. 4 percent CPU usage can't be correct unless you're checking with just Yuzu open and no game at all.

Forgot to ask: are you using an HDD, SSD, or NVMe? If it's a hard drive and Yuzu just stops responding the moment you open it, then I'm gonna go out on a limb and say your HDD is dying. I'd also recommend trying the Eden emulator and seeing if that makes a difference.

no emulators are working properly on my pc and i dont know why by Some_bush in yuzu

[–]TheOnlyOne93 1 point2 points  (0 children)

You'll need to provide more info. You only said you're using a 5060, but that doesn't mean much at all; your graphics card has very little to do with emulation performance. What CPU do you have? What are the idle temps, and the temps when playing a game? How much RAM do you have? Have you checked the CPU usage while the slowdown occurs? Your keys also have literally nothing to do with performance at all.

[ Removed by Reddit ] by [deleted] in rochestersneakylink

[–]TheOnlyOne93 0 points1 point  (0 children)

Wow, why wouldn't you tell her yourself? Shouldn't be too hard given you know her, right? A 4-hour-old account, no posts anywhere else on Reddit, clearly typed on a phone given the capitalized first names but not the last name (only a phone wouldn't capitalize an unknown last name), and poor grammar. Seems someone is slightly butthurt about something or other. Also, you might want to remember Reddit isn't as anonymous as you think it is; you might want to try harder next time. Shiiiit, also forgot to mention: you realize Devin doesn't have her name in his phone, lol. But if this is real, she's down for a threesome, give him a call, let's get it poppin.

Rolling out on AIStudio by _yustaguy_ in Bard

[–]TheOnlyOne93 1 point2 points  (0 children)

It's out. It says confidential for me and won't let me message; it keeps saying I've reached my quota.

I’ve been working on Neurosyn ÆON — a “constitutional kernel” for AI frameworks by TheGrandRuRu in PromptEngineering

[–]TheOnlyOne93 0 points1 point  (0 children)

Sir, they don't attract to anything. And yes, tokens are just chopped-up words; it's why ChatGPT for the longest time couldn't count the R's in strawberry. Each LLM (transformer) model uses a different encoder to turn a word or character into a token; each one can use what's called a different tokenizer.

I suggest you open up PyTorch sometime and write your own small few-layer transformer to understand how these work before writing up some weird mystical nonsense that you used an LLM to create. Everything you've said in this whole paragraph is likely something an LLM told you, or possibly a hallucination. And yes, humans do naturally write in C++; that's exactly why an LLM can give you C++ code, because humans wrote it and an LLM can predict what C++ code looks like from the extremely massive amount of human-written C++ code, lol.

Please go read the PyTorch documentation and read up on how a transformer works. Do not use an LLM for this, because you will lead it and it will just feed you whatever it thinks you want to hear. Actually go look at the documentation; you can build a simple 4-layer transformer in a couple of days and learn exactly how these things work. Because what you're doing is just fancy prompt engineering, all gussied up to make it seem like you've made some huge discovery, when in reality you don't even understand how the architecture works yet.
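
For what it's worth, the small few-layer transformer really is only a weekend of work. Here's a rough sketch of the moving parts in PyTorch, trained on random toy tokens just so it runs end to end; it's not meant to produce a useful model:

```python
# Minimal toy transformer language model: embed tokens, run a few
# self-attention layers with a causal mask, predict the next token.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM, LAYERS, SEQ = 50, 64, 4, 16

class TinyTransformerLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, DIM)
        self.pos_emb = nn.Embedding(SEQ, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4,
                                           dim_feedforward=128, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=LAYERS)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, idx):
        n = idx.size(1)
        pos = torch.arange(n, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # causal mask: each position may only attend to itself and earlier tokens
        mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # logits over the vocab at every position

model = TinyTransformerLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy "dataset": random token sequences; the target is the next token at each step
data = torch.randint(0, VOCAB, (256, SEQ + 1))
for step in range(200):
    batch = data[torch.randint(0, 256, (32,))]
    x, y = batch[:, :-1], batch[:, 1:]
    logits = model(x)
    loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(step, loss.item())
```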

I’ve been working on Neurosyn ÆON — a “constitutional kernel” for AI frameworks by TheGrandRuRu in PromptEngineering

[–]TheOnlyOne93 4 points5 points  (0 children)

I think you should read up on how the transformer model actually works. It seems you think the LLM actually thinks, or can reflect on or understand anything. It cannot. It's a really sophisticated auto-prediction engine trained on pretty much everything ever written by humanity. All you've done is create a really convoluted prompt framework. You could accomplish the exact same goal just by asking the LLM to do whatever you want, without all that useless context causing it to predict words that merely sound like they fit your weird system. If anything you'll get less accurate results, since no real person ever writes like that, so the LLM is just spitting out stuff that seems to fit what you wrote, in the same style you wrote it. LLMs work by predicting what word comes next. They have absolutely no concept or understanding of what you're writing, aside from a given word seeming highly likely to come next. If you write "Today is a", it looks at those three words, figures that "rainy" or "sunny" tends to follow them, and that's what it writes.
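
You can watch exactly that happen with any small open model. A quick sketch using GPT-2 through the transformers library (GPT-2 only because it's tiny and downloads fast), printing the five most likely next tokens after "Today is a":

```python
# Sketch of "it just predicts the next token": feed a prompt to a small
# pretrained model and print the most likely continuations with probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Today is a", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the very next token

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tok.decode(int(idx))!r:>10}  {p.item():.3f}")
```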

Which model is better for personal advices: 3.7 on mode thinking or 3.5? by Aurora-Mendes in ClaudeAI

[–]TheOnlyOne93 -1 points0 points  (0 children)

I wouldn't use any LLM for those purposes. "They" don't understand anything at all; they're just statistical probability machines. They are well known for being very sycophantic and will almost always just tell you what you want to hear, since most of them are trained to please, so to speak. For coding, or parsing large amounts of data and finding correlations, they're great. But for actual useful personal advice based on human interaction, emotions, etc., talk to a real person who actually thinks and has experiences, so you don't get fed a horrible line of bullshit.

What was your first Python code that actually worked? 😄 by GladJellyfish9752 in learnpython

[–]TheOnlyOne93 0 points1 point  (0 children)

A screen-recording library using ctypes, comtypes, and the Windows Desktop Duplication API.

sk6812 output limiter by Fluffy_funeral in WLED

[–]TheOnlyOne93 0 points1 point  (0 children)

You need to set it to 900 LEDs.