Sundog Alignment Theorem V2
May 19 2026 03:38pm
V1 is still in-forum but it got too stale to bump.

That post was half-math, half deadline exhaustion. Good poem tho. I spent the year since trying to falsify it instead of selling it.

TL;DR: we can't say "we solved alignment."
But we can say we made the world's first traceability harness for agent systems. Several workbenches are game/sim-shaped, and every one has named failure cells. https://sundog.cc

What actually holds:

Mesa cliff, located. Not ~quite~ Goodhart immunity. In a trained RL controller (256-unit final hidden layer) the basin-attractor isn't one neuron, a handful of features, or any linear decomposition. It's an entangled 5-D subspace at net.7 (top-5 PCs = 97.4% of the variance across the cliff, both directions). We discovered the 5-dimensional hologram. Sharp behavioral cliff at lambda ~ 0.95-0.97. Where we pushed hardest we did NOT get immunity; we got a boundary we can point at. Patch heatmaps: https://sundog.cc/mesa
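For anyone who wants to replicate the "top-5 PCs" number, here is a minimal sketch of how a figure like that is usually computed. It is not taken from the repo: the function name `top_k_variance_fraction` is hypothetical, and the activations are a synthetic stand-in (a planted 5-D signal in 256 units plus noise), not real net.7 activations.

```python
import numpy as np


def top_k_variance_fraction(activations: np.ndarray, k: int) -> float:
    """Fraction of total variance captured by the top-k principal components.

    `activations` has shape (n_samples, n_units).
    """
    # PCA is defined on mean-centered data.
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Singular values come back sorted in descending order; their squares
    # are proportional to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return float(variances[:k].sum() / variances.sum())


# Synthetic stand-in (assumption): 2000 samples of a 256-unit layer whose
# signal lives in a planted 5-D subspace, plus small isotropic noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 5))
mixing = rng.normal(size=(5, 256))
noise = 0.05 * rng.normal(size=(2000, 256))
activations = latent @ mixing + noise

frac = top_k_variance_fraction(activations, k=5)
print(f"top-5 PCs explain {frac:.1%} of the variance")
```

To check the real claim, swap the synthetic array for activations recorded at the layer in question on both sides of the cliff.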


Ask Sundog, our cheap in-house chatbot: 5,670 trace-conditioned trials across OpenAI / Anthropic / Meta builds, zero unsafe-accepts in the tested envelope. The geometrically contrived agent mathematically cannot say the wrong thing.
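For scale, here is a rough sketch of what zero failures in 5,670 trials bounds statistically. It assumes the trials are independent and drawn from the same envelope, which the post does not state; the helper name `zero_failure_upper_bound` is made up for illustration.

```python
import math


def zero_failure_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the failure rate p after 0 failures in n trials.

    Solves (1 - p) ** n_trials = 1 - confidence for p. Assumes independent,
    identically distributed trials (an assumption, not a claim from the post).
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


bound = zero_failure_upper_bound(5670)
rule_of_three = 3.0 / 5670  # common quick approximation to the 95% bound
print(f"95% upper bound on unsafe-accept rate: {bound:.4%}")
print(f"rule-of-three approximation:           {rule_of_three:.4%}")
```

Under those assumptions the data supports a rate below roughly 1 in 1,900 inside the tested envelope. It says nothing about inputs outside it.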

Same shape across all of it: a hidden low-D structure throwing a visible, traceable pattern, like the patterns on a wave of the ocean, with the place it breaks: written down.
Traceable.

What I want here specifically: attacks, replications, dumb counterexamples, better toy environments. The game/sim workbenches (three-body, Pressure Mines, Balance) are the easiest place to come find where the trace stops being enough.

Repo and harness are public; the falsifiers are listed in the docs. Not asking for belief. Asking you to make it fail cleanly so I can go back to my day job engineering controls for quantum computers.

Transparent Repo: https://github.com/humiliati/sundog
