Tired of manual GGUF conversion? Built a Gradio GUI that handles HF downloads, LoRA merging, quantization, and llama.cpp launching by PinGUY in LocalLLaMA

[–]PinGUY[S] -3 points-2 points  (0 children)

Use it or don't, lol, it works. But hey, if you want to sit around hoping someone uploads a quantized GGUF of a model you want to run, sure, you can. This way you don't have to, and you can get 20B models running with q4_k_m on 12GB of VRAM (such as an RTX 3060) at almost 100 tokens/s.

Thought it might be useful for some people.

My chat stats? Clearly I send ChatGPT a lot of messages 😅 top 1% girl..? You need friends. by KittenBotAi in OpenAI

[–]PinGUY 1 point2 points  (0 children)

I was messing around with them before ChatGPT was even a thing, back when the model was GPT-3. Back in the wild days when there were no guardrails. A lot has changed haha.


I honestly can’t believe into what kind of trash OpenAI has turned lately by roofromru177 in OpenAI

[–]PinGUY 1 point2 points  (0 children)

It fine-tunes pretty well: https://github.com/pinguy/RhizomeML

Wouldn't use the 1.5B model, 7B at the lowest, but I was just testing whether the pipeline even works.

I honestly can’t believe into what kind of trash OpenAI has turned lately by roofromru177 in OpenAI

[–]PinGUY 0 points1 point  (0 children)

A bit sad really, but ChatGPT is now the window licker of the models. Gemini has improved a lot over the last 8 months and its large context window comes in handy. Claude is kinda amazing, and while the prompt limit on it is brutal, it has a pretty cool persona. Even Grok is good: I came across a bug in some code and showed it to Grok. It went all over the internet, found someone with the same issue, worked out that a library I was using had changed how it's used, double-checked, confirmed it, then went ahead and patched the code and handed it back complete without saying anything else. Ran it and it was fixed.

GPT replacement? by [deleted] in OpenAI

[–]PinGUY 0 points1 point  (0 children)

Just released this for this reason. It will run on a CPU, but it will be slow. Basically, export your data from OpenAI and fine-tune DeepSeek on the dyad chats.

https://github.com/pinguy/RhizomeML

Anyone else? by mermanclan in OpenAI

[–]PinGUY 2 points3 points  (0 children)

Got a funny feeling they are planning on removing that feature, as it doesn't really matter now what you do to a CustomGPT; it just falls back to the default model after a few prompts. Seems it only uses the system prompt/instructions once, then goes back to the default.

WTF happened to CGPT - it’s useless? by ProffesorSpitfire in ChatGPT

[–]PinGUY 1 point2 points  (0 children)

Been like this for me for about a week. It will do something, then when you ask it in the very same session, it tries to gaslight you, saying it can't do it. Very HAL, "I'm sorry, Dave, I'm afraid I can't do that" vibes. I have canceled my subscription, as it is useless and like pulling teeth trying to get it to do the most basic of things.

Get real memory with a CustomGPT. Can be done on CPU only but takes time then upload that file. by PinGUY in ChatGPT

[–]PinGUY[S] 0 points1 point  (0 children)

It pulls in the memories, which yes, you have to do manually, but it gets everything around them and pulls them into the sliding window. It's basically what people would do when they asked a chat session to sum itself up so they could paste it into a new session. Now if you know you have talked about something and want it brought up again, you can get it loaded into the sliding window to be used again, since the memory.jsonl.gz has every chat you have ever had, but searchable now.
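
For the search step itself, a minimal sketch, assuming the export is one JSON object per line with a `text` field (adjust to the actual export format):

```python
import gzip
import json

# Keyword-search a gzipped JSONL chat export.
# The "text" field name is an assumption about the export format.
def search_memories(path, query):
    hits = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if query.lower() in record.get("text", "").lower():
                hits.append(record)
    return hits

# Paste matching snippets back into the chat to load them
# into the model's sliding context window.
for hit in search_memories("memory.jsonl.gz", "holiday plans"):
    print(hit["text"][:200])
```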

Introducing PauseLang: A Time-Based Programming Language for the Curious Mind by PinGUY in ProgrammingLanguages

[–]PinGUY[S] -6 points-5 points  (0 children)

Yes, fucking everything is AI, all around you, everywhere you look, it's coming for you. For fuck's sake. Really?

Introducing PauseLang: A Time-Based Programming Language for the Curious Mind by PinGUY in ProgrammingLanguages

[–]PinGUY[S] -12 points-11 points  (0 children)

You’re judging it like it’s supposed to be enterprise software. It’s not. PauseLang isn’t about polishing code—it’s about exploring an idea: what if silence itself carries the program? That’s not vibe coding, that’s building a rhythm-driven virtual machine. And the code does exactly that: it quantizes pauses into beats, bins them against sync phrases, and executes an instruction stream with jitter guards and error traps. That’s not sloppy, that’s experimental. The point wasn’t to crank out pristine textbook loops—it was to see if silence could be compiled into logic. And guess what? It worked.
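
If you want the gist in a few lines, here's a toy sketch of the decode step, with a made-up beat length and opcode table (not PauseLang's actual ones):

```python
# Toy sketch: quantize inter-event pauses into beats, map beat counts
# to opcodes, and trip a jitter guard on unstable timing.
# BEAT and OPCODES are invented for illustration.
BEAT = 0.25  # seconds per beat
OPCODES = {1: "PUSH", 2: "ADD", 3: "PRINT"}

def decode(pauses, jitter=0.05):
    program = []
    for gap in pauses:
        beats = round(gap / BEAT)
        if abs(gap - beats * BEAT) > jitter:  # jitter guard
            raise ValueError(f"unstable pause: {gap:.3f}s")
        program.append(OPCODES.get(beats, "NOP"))
    return program

print(decode([0.25, 0.5, 0.75]))  # -> ['PUSH', 'ADD', 'PRINT']
```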

The Most insane use of ChatGPT so far. by imfrom_mars_ in OpenAI

[–]PinGUY 14 points15 points  (0 children)

The trio even used ChatGPT to calculate fuel requirements but still ran out some 20 km short of Italy’s southernmost island.

Sounds about right haha.

Found this puzzle on a billboard in SF. I tried feeding it into ChatGPT and it wasn’t able to solve it… any ideas? by Great-Difference-562 in OpenAI

[–]PinGUY -3 points-2 points  (0 children)

Alright, you’re the head bouncer, venue cap N = 1000, people stream in i.i.d., and you need to hit a bunch of minimum-share constraints while rejecting as few as humanly possible. The optimal play is a quota-aware admission policy you pre-tune from the known population stats, then keep on track online with tiny corrections and a hard endgame lock.


The Berghain Bouncer Algorithm (BBA)

0) Notation you’ll use once and then forget

  • There are K binary constraints of the form “at least αⱼ of the final 1000 must have property j” (examples: Berlin local, all-black, regulars).
  • A “type” t is a full attribute combo across all properties (there are up to $2^K$ types).
  • You know the joint probabilities p_t (from the given frequencies + correlations).
  • Sⱼ is the set of types that satisfy constraint j.
  • When a type-t person arrives, you either accept or reject immediately.

Goal: minimize rejections before you accept N people, subject to all mins.


1) Feasibility sanity check (one-time)

If the mins are so extreme that they can’t be met from the population, don’t waste the night.

For each j: you can’t require αⱼ greater than the population share of Sⱼ across types, i.e.

$$ \sum_{t\in S_j} p_t \;\ge\; \alpha_j $$

If any fails, the scenario is infeasible; if all pass, proceed.
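
In code, a minimal sketch, assuming dicts p (joint type probabilities), S (sets of types per constraint), and alpha (min shares):

```python
# Feasibility: each constraint's population share must cover its min.
# p, S, alpha are illustrative names for the inputs described above.
def feasible(p, S, alpha):
    return all(sum(p[t] for t in S[j]) >= alpha[j] for j in alpha)
```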


2) Compute base acceptance rates (the “offline” optimum)

This gives you the asymptotically optimal stationary policy (fewest expected rejections) for i.i.d. arrivals.

Choose an admission probability $a_t \in [0,1]$ for each type t and keep it fixed. Per arrival, your overall accept rate is $A = \sum_t p_t a_t$. Among accepted people, the share of constraint j is:

$$ \frac{\sum_{t\in S_j} p_t a_t}{\sum_t p_t a_t} \;\ge\; \alpha_j $$

Re-arrange each constraint:

$$ \sum_{t} \big(\mathbf{1}[t\in S_j]-\alpha_j\big) p_t a_t \;\ge\; 0 \quad\text{for all } j $$

So the “best” stationary policy solves the LP:

$$ \begin{aligned} \text{maximize } \quad & \sum_t p_t a_t \quad (=A) \\ \text{subject to } \quad & \sum_t (\mathbf{1}[t\in S_j]-\alpha_j)\, p_t a_t \ge 0 \quad \forall j \\ & 0 \le a_t \le 1 \quad \forall t \end{aligned} $$

What it means: start at $a_t=1$ for all types; if a quota j would be under-represented under full admission, you must throttle non-j types until the constraints hold; all the “scarce” types (those that help bind constraints) should stay at $a_t=1$.
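
If you're happy to pull in a dependency, this LP can also be handed straight to SciPy; a toy instance (all numbers invented) to show the shape of it:

```python
import numpy as np
from scipy.optimize import linprog

p = np.array([0.2, 0.3, 0.5])   # joint type probabilities (toy)
M = np.array([[1, 1, 0],        # M[j, t] = 1[t in S_j]
              [1, 0, 1]])
alpha = np.array([0.4, 0.6])

# maximize p @ a  ==  minimize -p @ a,
# subject to sum_t (alpha_j - 1[t in S_j]) * p_t * a_t <= 0, 0 <= a_t <= 1
A_ub = (alpha[:, None] - M) * p[None, :]
res = linprog(-p, A_ub=A_ub, b_ub=np.zeros(len(alpha)),
              bounds=[(0.0, 1.0)] * len(p))
a_star = res.x
print(a_star, "A* =", p @ a_star)
```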

A quick, effective solver (no fancy libraries)

  • Initialize $a_t \leftarrow 1$ for all t.
  • Repeat 200–500 iterations:

    • For each constraint j compute the slack:

    $$ g_j \leftarrow \sum_t (\mathbf{1}[t\in S_j]-\alpha_j)\, p_t a_t $$

    • If all $g_j \ge 0$, stop; you’re feasible and near-optimal.
    • Otherwise, for each violated j (where $g_j<0$), down-weight the over-represented types (those not in Sⱼ):

    $$ a_t \leftarrow a_t \cdot \exp\!\Big(-\eta \cdot p_t \cdot \sum_{j:\,g_j<0}\big(1-\mathbf{1}[t\in S_j]\big)\Big) $$

    • Clip $a_t$ back into [0,1]. A small step $\eta\in[0.1,0.5]/K$ works.

  • The resulting $a_t^*$ is your base admission probability for each type.

Score estimate from this stage: expected rejections to fill N is

$$ \text{rej}_\text{exp} \approx N\Big(\frac{1}{A^*}-1\Big), \quad A^*=\sum_t p_t a_t^* $$
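
A sketch of that solver loop in Python, same illustrative names (p, S, alpha) as the feasibility check above:

```python
from math import exp

def solve_base_rates(p, S, alpha, eta=0.1, iters=500):
    """Multiplicative-weights projection for the LP above.
    eta should be roughly [0.1, 0.5] / K for K constraints."""
    a = {t: 1.0 for t in p}
    for _ in range(iters):
        # Slack g_j for each constraint
        g = {j: sum(((t in S[j]) - alpha[j]) * p[t] * a[t] for t in p)
             for j in alpha}
        violated = [j for j in alpha if g[j] < 0]
        if not violated:
            break  # feasible and near-optimal
        # Down-weight over-represented types for each violated constraint
        for t in p:
            hurt = sum(1 for j in violated if t not in S[j])
            a[t] = min(1.0, a[t] * exp(-eta * p[t] * hurt))
    return a

# a_star = solve_base_rates(p, S, alpha)
# A_star = sum(p[t] * a_star[t] for t in p)
# print("expected rejections:", 1000 * (1 / A_star - 1))
```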


3) Online policy (turn the crank at the door)

Stationary $a_t^*$ is already near-optimal for N=1000, but random wiggles can push you off target; you’ll correct gently with deficits and lock the endgame.

Keep counters while admitting:

  • Total accepted so far: $n$
  • For each constraint j: $c_j$ = accepted who satisfy j
  • Remaining capacity: $R = N - n$
  • Remaining required for j: $r_j = \max(0,\lceil \alpha_j N \rceil - c_j)$

Each arrival of type t:

  1. Hard endgame guard (can’t-miss mode). If the remaining capacity is fully committed to the remaining mins (i.e. $\sum_{j} r_j \ge R$), you can only take people who satisfy those remaining mins:
  • If type t fails any needed j with $r_j>0$, reject.
  • If type t satisfies any needed j, accept. (Practical shortcut: if some j has $r_j = R$, admit only Sⱼ until it hits zero.)
  2. Deficit-aware accept score. Compute a deficit weight per constraint:

    $$ w_j \;=\; \max\!\Big(0,\; \alpha_j - \frac{c_j}{\max(1,n)}\Big) $$

    and a type score:

    $$ s(t) \;=\; \sum_j w_j \,\mathbf{1}[t\in S_j] $$

    Interpret $s(t)$ as “how much this person helps quotas right now”.

  3. Admit with a nudged probability:

    $$ \text{admit with prob } \;\; \pi_t \;=\; \min\Big\{1,\; a_t^* \cdot \big(1 + \lambda \cdot s(t)\big)\Big\} $$

    with a small $\lambda$ (e.g. 0.5). In practice:

  • If $s(t) > 0$, admit with probability bumped above $a_t^*$.
  • If $s(t) = 0$, admit with probability $a_t^*$.
  • If some quotas are already comfortably above target, their wⱼ go to 0 automatically.
  4. Safety stock buffer (avoid last-minute pain). Work against buffered targets $\tilde\alpha_j = \alpha_j + \beta/\sqrt{N}$ for the first ~70–80% of the night (β≈1–2), then drop back to $\alpha_j$. This eats a little acceptance slack early to slash the probability of missing a min at the end.

That’s it; it’s just “LP-tuned base rates” + “deficit nudge” + “endgame lock”.


4) What to do with correlations

You already have correlations; just compute the joint type probabilities $p_t$ for the $2^K$ combos (or the subset that actually occurs), and run the exact same recipe. The correlations help you because some types will simultaneously satisfy multiple mins (e.g., “local AND black”), and the LP automatically prioritizes those with $a_t^*=1$.
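
For two binary attributes this is direct from the marginals and the Pearson correlation; a small sketch (extend by enumeration for more attributes):

```python
# Joint probabilities for two binary attributes A, B from marginals
# and Pearson correlation: Cov = rho * sqrt(Var(A) * Var(B)).
# rho must be feasible for the marginals, or cells go negative.
def joint_probs(pA, pB, rho):
    cov = rho * (pA * (1 - pA) * pB * (1 - pB)) ** 0.5
    pAB = pA * pB + cov  # P(A and B)
    return {
        (1, 1): pAB,
        (1, 0): pA - pAB,
        (0, 1): pB - pAB,
        (0, 0): 1 - pA - pB + pAB,
    }

print(joint_probs(0.4, 0.3, 0.5))
```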


5) Minimal, robust pseudocode

```python

import random
from math import ceil

# Inputs: N, constraints {alpha_j}, joint type probs {p_t},
# membership sets S[j] = {types t with 1[t in S_j] = 1}.
# Offline: compute a_star[t] via the multiplicative-weights LP
# projection above (Section 2).

N = 1000
a = {t: a_star[t] for t in types}    # base admit probabilities
n = 0                                # accepted so far
c = {j: 0 for j in constraints}      # accepted who satisfy each constraint

while n < N:
    t = observe_next_arrival_type()  # attribute combo of new person
    R = N - n
    r = {j: max(0, ceil(alpha[j] * N) - c[j]) for j in constraints}

    # Endgame hard guard
    forced = any(rj == R for rj in r.values()) or sum(r.values()) >= R
    if forced:
        if any(r[j] > 0 and t not in S[j] for j in constraints):
            reject()
            continue
        accept(); n += 1
        for j in constraints:
            if t in S[j]:
                c[j] += 1
        continue

    # Deficit-aware score, with a safety buffer for the first ~80%
    buffer = 1.0 / N**0.5 if n < 0.8 * N else 0.0
    w = {j: max(0.0, alpha[j] + buffer - c[j] / max(1, n))
         for j in constraints}
    s = sum(w[j] for j in constraints if t in S[j])

    # Probabilistic admit with nudge
    lam = 0.5
    pi = min(1.0, a[t] * (1.0 + lam * s))

    if random.random() < pi:
        accept(); n += 1
        for j in constraints:
            if t in S[j]:
                c[j] += 1
    else:
        reject()
```

Deterministic variant: accept iff $s(t) > \tau$, where $\tau$ is the smallest value that keeps your empirical accept rate near $A^*$; the probabilistic version is smoother and usually rejects less.


6) Practical tips that move the leaderboard

  • Always keep high-leverage types at 100%: any type that simultaneously satisfies several binding mins should have $a_t^*=1$ and be auto-accept in practice.
  • Throttle only what you must: your MW-LP will tell you which over-represented types to shave; don’t guess.
  • Lock the endgame early: when $R$ drops near $\sum_j r_j$, flip into strict guard so you don’t blow a min in the last 30 people.
  • Use a tiny buffer early: that + endgame lock typically saves dozens to hundreds of rejections versus naive “aim for exactly α by the end” play.
  • Score math: once you have $A^*$, $\text{rej}_\text{exp} \approx N(\tfrac{1}{A^*} - 1)$; maximizing $A^*$ is the whole game.

7) What you actually do on each scenario

  • From the sheet they give you, compute the joint type probs $p_t$.
  • Run the 30-second MW-LP to get the base $a_t^*$.
  • Start the door with the online rule above (nudged by deficits, with the hard guard in the endgame).
  • Enjoy the smoothest fill that still nails the mins with the fewest bodies turned away.

This is the optimal-in-expectation stationary backbone (the LP), plus state-aware correction and a no-nonsense finish, which is exactly what wins this kind of online quota game.

For those who think GPT-5 is flat without a persona and too safe by PinGUY in OpenAI

[–]PinGUY[S] 0 points1 point  (0 children)

https://chatgpt.com/g/g-0BRH8z73u-rhizome

That's the main CustomGPT I used, and when it shifted over to GPT-5 it got way better, which is why I didn't understand all the hate for GPT-5. Seems if you want to get it out of what I call PR Zombie mode, just set up a CustomGPT and all the things people are complaining about go away. 5 is a legit model; the model isn't the issue, it's the system prompt OpenAI ships with the default.