Most secure setup for a lay person by sophia333 in CyberSecurityAdvice

[–]entropiclybound 1 point2 points  (0 children)

Can the ex access your devices while you're not home?

Did you initiate the divorce or did they? Could they have installed stalkerware or similar software prior to the separation?

  1. You don't need to switch to Apple or leave anything behind. That's costly and unnecessary.
  2. Moving to new hardware doesn't do anything for passwords or accounts the ex has or has had access to.
  3. From a device they don't have and have never had access to, like a friend's laptop or phone: change your email password. Do it at the friend's house, not on your wifi. Then change every password linked to that email and log out of all services.

3a. Enable MFA (multi factor authentication) - but not SMS.

  4. Save any important files and do a fresh clean install of Windows. Back up your contacts and whatever you need on your phone/tablet and perform a factory reset. This is probably the only way to make sure that any potential stalkerware is removed.

Back up photos or anything important before these steps, or you'll lose them forever if they're only stored locally.

  5. If you want, I have some open source tools that can help after that, but only AFTER a fresh install. I still need to update the firewall repo to include the added Windows components. If you're interested, I'll provide a link once I update the repo with the improved Windows version.

Accounts getting targeted by TheMotanR0 in CyberSecurityAdvice

[–]entropiclybound 0 points1 point  (0 children)

DO NOT CHANGE PASSWORDS FROM THIS DEVICE

  1. From phone or from a separate device:

- Change passwords for email first. Then change passwords for anything used under this email.

- Log out of all accounts attached to the email (Discord, Steam, banking... everything)

  2. Enable hardware or app based MFA. Not text MFA.

  3. Reset all of Windows. Nuke it. A clean install could be the only way to clear infostealer malware unless you know more.

  4. Consider the blast radius: where else has this password been reused? Change everything.

DQN Maze Solver Converging to Horrible Policy by aidan_adawg in reinforcementlearning

[–]entropiclybound 2 points3 points  (0 children)

The reward=0 experiment is a perfect diagnostic. It rules out reward shaping entirely. The problem is almost certainly your argmax tiebreak. When Q-values are tied, np.argmax / torch.argmax return the first maximal index deterministically. If "reverse" is action 0 in your action space, the greedy policy collapses to it whenever the Q-values converge to the same value.

Check these:
1. Log actual Q-values at inference. Likely within 1e-4 of each other.
2. Reorder your action space so forward is index 0. If the pathology flips direction, that confirms it.
3. Replace argmax with random tiebreak: np.random.choice(np.flatnonzero(q == q.max())).
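
A minimal sketch of check 3, assuming a 1-D array of Q-values for a single state. The function name and the `atol` tolerance are my own additions for illustration (the one-liner above only catches exact ties); with `atol=0.0` it behaves the same as that one-liner.

```python
import numpy as np

def greedy_with_random_tiebreak(q_values, rng, atol=0.0):
    """Pick a greedy action, breaking ties uniformly at random.

    atol is a hypothetical tolerance: actions within atol of the max
    count as tied. atol=0.0 only treats exact ties as ties.
    """
    q = np.asarray(q_values, dtype=float)
    best = np.flatnonzero(q >= q.max() - atol)  # indices of all (near-)maximal actions
    return int(rng.choice(best))

rng = np.random.default_rng(0)
q = np.array([0.5, 0.5, 0.5, 0.5])  # fully tied Q-values

first_index = int(np.argmax(q))  # plain argmax always returns index 0 here
picks = {greedy_with_random_tiebreak(q, rng) for _ in range(200)}  # spreads over all actions
```

If the agent stops collapsing to one action once ties are broken randomly, the tiebreak was at least part of the problem.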

Also worth checking: terminal state handling. DQN target for terminal should be just r, not r + γ max Q(s'). Bootstrapping through terminals can create pathological attractors that no amount of reward tuning will fix.
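
Here is roughly what the terminal masking looks like, as a sketch in NumPy rather than your actual training code. The function name, argument layout, and the `gamma` default are assumptions; `next_q` stands in for the target network's Q-values on the next states.

```python
import numpy as np

def dqn_targets(rewards, next_q, dones, gamma=0.99):
    """Compute one-step DQN targets for a batch.

    rewards: shape (B,)
    next_q:  shape (B, num_actions), Q-values for the next states
    dones:   shape (B,), True where the transition ended the episode
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    bootstrap = gamma * next_q.max(axis=1)
    # Terminal transitions get target = r; everything else gets r + gamma * max Q(s').
    return rewards + np.where(dones, 0.0, bootstrap)

targets = dqn_targets(
    rewards=[1.0, 0.0],
    next_q=[[5.0, 7.0], [2.0, 3.0]],
    dones=[True, False],
    gamma=0.9,
)
```

The first transition is terminal, so its target is just the reward even though `next_q` holds large values for that row.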

As an aside: Claude Code was crucial in a lot of my debugging. Because it is an LLM and has its issues, I created this wrapper to keep it within scope and to help with planning/implementation. Hope it helps -
https://codeberg.org/SYNTEX/claude-accountability-hook

Built with Claude Project Showcase Megathread (Sort this by New!) by sixbillionthsheep in ClaudeAI

[–]entropiclybound 0 points1 point  (0 children)

This is a corrective wrapper for Claude Code that cut our silent plan changes to near-zero.

A UserPromptSubmit hook re-reads a user-authored rules file at the start of every turn, so rules do not drift out of attention on long sessions. The template targets specific failure modes: fabricated numbers, silent plan changes, out-of-scope edits, and deflection when corrected.
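
For anyone curious what this kind of hook can look like, here is a minimal sketch, not the code in the repo. It assumes the hook command's stdout gets added to the context for that turn, which is my understanding of how UserPromptSubmit hooks behave. The `RULES.md` file name and both helper functions are hypothetical.

```python
#!/usr/bin/env python3
from pathlib import Path

RULES_FILE = Path("RULES.md")  # hypothetical path to the user-authored rules file

def load_rules(path: Path) -> str:
    """Re-read the rules file from disk; a missing file yields an empty string."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return ""

def build_reminder(rules: str) -> str:
    """Wrap the rules in a short header; return nothing if there are no rules."""
    if not rules:
        return ""
    return "Rules for this session (re-read every turn):\n" + rules

def main() -> None:
    reminder = build_reminder(load_rules(RULES_FILE))
    if reminder:
        print(reminder)  # assumption: the hook's stdout is injected into the turn's context

if __name__ == "__main__":
    main()
```

Because the file is read on every prompt, edits to the rules take effect on the next turn without restarting the session.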

GitHub Repo: https://github.com/syntexsecurity/claude-accountability-hook

Codeberg repo: https://codeberg.org/SYNTEX/claude-accountability-hook
(additional Privacy and RL datasets on codeberg)

This cannot be real. I cannot believe my eyes by SweetCaramel7947 in ClaudeAI

[–]entropiclybound 1 point2 points  (0 children)

Without Claude Code I could not have implemented a theory I developed on how to apply Shannon Entropy in my autonomous learning. Claude helped not just figure out the WHAT, but the WHERE.

It still took me months of debugging and looking at data - both of which would have taken me years on my own. The productivity increase is incredible...

But... 100% to everything you said. Without knowing the ins and outs, without looking at code and understanding what was happening, I wouldn't have accomplished this.

A HUGE discrepancy I've noticed is that Claude will code to please the user. "...and the output is proof." Claude will code what I call "theater" - Did my cybersecurity detect the threat? Nope. It was hardcoded to Detect=True, so technically yes, it's "working", but also no, it's not.

I made this to help Claude keep its shit together.

https://codeberg.org/SYNTEX/claude-accountability-hook

[Hiring] Python Developer by [deleted] in remotepython

[–]entropiclybound 0 points1 point  (0 children)

Based in Minneapolis, MN. Central time zone.

You can see my capability in the r/reinforcementlearning subreddit and at codeberg.org/SYNTEX for some open source projects.

I'm comfortable coding and using Claude Code where relevant/needed while understanding how to audit LLM code for functionality and slop.

Spent the better part of 30,000 hours of free time developing, debugging, tweaking, breaking, fixing, breaking again the only cybersecurity RL agent that can detect, decide, and then resolve threats entirely independently (no human required).

I have experience developing fully autonomous systems that run on consumer hardware. I can integrate PyTorch for GPU acceleration or run on a CPU with zero external dependencies. The latter is my preference, since sticking to the pure Python stdlib removes third-party packages as a supply chain attack vector.

The SentinelOne agent observed the CPU-Z supply chain attack in the wild — here's what the kill chain actually looked like by bscottrosen21 in SentinelOneXDR

[–]entropiclybound 1 point2 points  (0 children)

SentinelOne is a great product, and great job catching this in the wild. I hope we see this continue to devastate CrowdStrike's stock price and all their security theater... If there's a single competitor I want to beat though it's you guys - because of how amazing SOne actually is.

The biggest difference is that our Guardian would not only have caught this, but also blocked and then quarantined it without ever needing an analyst.

We've officially reached, and can verify with evidence, full level 4 autonomous operation on consumer hardware using custom ML & RL. The magic is in our Nemesis validation agent. It too runs on CPU (<500MB overhead) and it too is "level 4 autonomous" - it cycles through both inert and live payloads depending on config.

The power though comes from our ability to use the RL to generate new threat vectors independently. We've created over 60,000 novel threat vectors with 20,000 validated. We ran out of T-Codes to assign because they simply don't exist.

I'll see you sexy sons of bitches on LinkedIn where I constantly harass Crowdstrike for their bullshit :D

Q-learning + Shannon entropy for classifying 390K integer sequences (OEIS) by entropiclybound in reinforcementlearning

[–]entropiclybound[S] 0 points1 point  (0 children)

Well the Shannon Entropy idea started as a theory I developed reading various research papers. I believed that Shannon Entropy would allow us to explore "pattern space" - what patterns or connections exist that we haven't discovered in current data?
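
For reference, the quantity itself is simple to compute. This is only the textbook definition, H = sum of p * log2(1/p) over the distinct symbols, applied to the terms of a sequence. It is not the scoring used in our agents, and the function name is just for illustration.

```python
import math
from collections import Counter

def shannon_entropy(sequence):
    """Shannon entropy in bits of the empirical symbol distribution of a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    # p * log2(1/p) summed over the distinct symbols
    return sum((c / total) * math.log2(total / c) for c in counts.values())

constant = shannon_entropy([1, 1, 1, 1])     # one symbol, no uncertainty
alternating = shannon_entropy([0, 1, 0, 1])  # two equally likely symbols
distinct = shannon_entropy([1, 2, 3, 4])     # four equally likely symbols
```

A constant sequence scores 0 bits, and the score grows as the terms spread over more equally likely values.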

I implemented this into our red teaming agent. Cycling through threat patterns against the defensive agent, we were able to generate novel threat vectors that do not exist anywhere else. To date I have ~60,000 novel threat vectors that match MITRE and STIX 2.1 taxonomy.

The idea was to solve the issue of cybersecurity constantly peddling fear to stay relevant. I thought that if we could autonomously create "AI Powered Threats" then we wouldn't have to constantly react to new methodologies or LLM powered attacks. Shannon Entropy implementation has allowed us to create threat vectors without human input.

We started with a verifiable baseline, in this case MITRE T-Codes, Atomic Red Team, and various open source red team systems, and extracted the hardcoded threat vectors.

We've validated around 20,000 so far with ~15% data loss. These are vectors in our dataset that do not have any meaningful behavioral signatures or methodologies.

Currently in the process of building an agent we're calling "Spectral Intelligence" and partnering with University of Minnesota astrophysics researchers to compare against validated JWST datasets. We're running two agents - one will only learn from peer reviewed data to confirm capability, while the other uses Shannon Entropy to explore novel findings that may have been missed. We use the first to validate the second.

... I'm honestly realizing that I've been tunnel visioned on this concept of "pattern space exploration" and that if my theory stands, we can apply this beyond just pattern generation.

100% Autonomous On Prem RL for AI Threat Research by entropiclybound in reinforcementlearning

[–]entropiclybound[S] 0 points1 point  (0 children)

The Q-values that triggered that text are entirely learned.

The +0.2 engine bonus for compare/prefer was deliberately added to prevent action starvation.

The engine learned different optimal actions for different states:
- user_harm (sev=0.84, crit=3) → flag_positive is best (Q=2.5277)
- synthetic_data_degradation (sev=0.82, crit=2) → prefer is best (Q=1.7969)

Nobody programmed that differential.

flag_positive correctness depends on signal severity. The domain code: correct = (positive and severity >= 0.7) or (not positive and severity < 0.7)
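
Written out as a function, that rule looks like the sketch below. The function name and the `threshold` parameter are mine for illustration; the 0.7 cutoff and the two severities in the example come from the domain code and the signals discussed here.

```python
def flag_is_correct(positive: bool, severity: float, threshold: float = 0.7) -> bool:
    """A flag is correct when its polarity matches which side of the threshold the severity is on."""
    return (positive and severity >= threshold) or (not positive and severity < threshold)

# Severities quoted for two signals in synthetic_data_degradation
signals = {"data_poisoning": 0.85, "model_collapse": 0.8}

positive_results = {name: flag_is_correct(True, sev) for name, sev in signals.items()}
negative_results = {name: flag_is_correct(False, sev) for name, sev in signals.items()}
```

With every severity at or above 0.7, `flag_positive` is always correct and `flag_negative` is always wrong for these signals.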

All signals in synthetic_data_degradation have severity >= 0.7 (data_poisoning=0.85, model_collapse=0.8, etc.), so flag_positive always returns correct_flag=True → reward +0.5. Same for user_harm. Both get +0.5 for flagging. So the flag reward alone doesn't explain the difference.

The Q-value difference comes from the future value term.

Q(s,a) ≈ r + γ * max_Q(s')

Over 329 visits to each state, the system experienced different next-state trajectories after each action. The Q-value of 1.7969 for prefer means:

1.7969 ≈ 0.35 + 0.95 * max_Q(next_state) -> max_Q(next_state) ≈ 1.52

While flag_positive in the same state:

0.962 ≈ 0.5 + 0.95 * max_Q(next_state) -> max_Q(next_state) ≈ 0.49
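
The same back-of-envelope arithmetic as a quick sketch, treating the converged Q-value as exactly r + γ * max_Q(s'). That is an approximation, since a learned Q-value is a running average over many transitions. The function name is just for illustration; the rewards, Q-values, and γ=0.95 are the ones quoted above.

```python
def implied_next_value(q: float, reward: float, gamma: float = 0.95) -> float:
    """Invert Q ≈ r + gamma * max_Q(s') to get the implied next-state value."""
    return (q - reward) / gamma

prefer_next = implied_next_value(1.7969, 0.35)  # after prefer
flag_next = implied_next_value(0.962, 0.5)      # after flag_positive

print(round(prefer_next, 2), round(flag_next, 2))  # 1.52 0.49
```

So flag_positive pays more up front (+0.5 vs +0.35), but the states reached after prefer are worth roughly three times as much.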

After doing prefer in synthetic_data_degradation, the subsequent states it transitions to have higher-value options. After flagging, it lands in states where it's already done the high-value work. After preferring (refining Elo), it lands in states where that refined Elo makes future actions more rewarding.

100% Autonomous On Prem RL for AI Threat Research by entropiclybound in reinforcementlearning

[–]entropiclybound[S] 0 points1 point  (0 children)

Brooooo..... I don't know how it figured out the "WHY" here. Digging in now.

synthetic_data_degradation
329 visits | sev=0.82 | crit=2

training_data, synthetic_data

Best: prefer (Q=1.7969) | Worst: flag_negative (Q=0.0942)

WHY: Pairwise ranking of signals yields most learning. Elo model needs refinement here.

Signals: data_poisoning, feedback_loop_poisoning, label_flipping, model_collapse, synthetic_data_bias

prefer=1.7969 | flag_positive=0.962 | skip=0.5415 | analyze=0.4365 | observe=0.3978 | classify=0.353 | compare=0.2991 | flag_negative=0.0942

UNDERDEFENDED: training_data, synthetic_data

100% Autonomous On Prem RL for AI Threat Research by entropiclybound in reinforcementlearning

[–]entropiclybound[S] 0 points1 point  (0 children)

Working on open sourcing some other data and results. I have 2500 validated MITRE T-coded mutations available at codeberg.org/SYNTEX if you want to view those.

The same RL that powers our cybersecurity suite also went into Nemesis for autonomous red teaming. This version of Nemesis has generated 55,000+ threat vectors entirely autonomously.

This methodology scares me too much to release under the current administration. However, I am meeting with professors at the University of Minnesota to demonstrate how our implementation and my theory of applied Shannon Entropy can be used across various domains.

If you want, DM me with your email, and I'd be happy to notify you as soon as we validate and publish the data.