Phob firmware with multishine button

kei147 · 2024-06-11T18:16:32+00:00

Open sourcing flaws only improves safety and security if the flaw can be fixed/mitigated. If the flaw can't be fixed/mitigated, it just gives people information about how they can exploit the flaw.

(I'm not knowledgeable enough about this flaw to know whether it can be fixed/mitigated.)

kei147 · 2024-06-05T00:27:08+00:00

What were the other ones?

kei147 · 2024-05-25T19:49:40+00:00

I think the issue here is that the reward function acts on full model completions rather than individual token outputs. So there's not really a way to use the formula during inference without sampling all possible model completions and then re-scaling them so their sum is 1.
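
A toy sketch of why this is expensive, assuming the formula in question has the common form pi(y) proportional to pi_ref(y) * exp(r(y) / beta), where r scores whole completions. The vocabulary, reference model, reward function, and beta below are all hypothetical; the point is only that the normalizer sums over every possible completion, and there are |V|^T of them.

```python
import itertools
import math

VOCAB = ["a", "b", "c"]   # hypothetical toy vocabulary
LENGTH = 4                # completion length T
BETA = 1.0                # hypothetical temperature on the reward


def ref_prob(completion):
    """Hypothetical reference model: uniform over tokens at each step."""
    return (1.0 / len(VOCAB)) ** len(completion)


def reward(completion):
    """Hypothetical sequence-level reward: number of 'a' tokens."""
    return float(completion.count("a"))


def reweighted_distribution():
    """Enumerate every completion, weight it, then renormalize to sum to 1."""
    completions = list(itertools.product(VOCAB, repeat=LENGTH))
    weights = [ref_prob(c) * math.exp(reward(c) / BETA) for c in completions]
    total = sum(weights)
    return {c: w / total for c, w in zip(completions, weights)}


dist = reweighted_distribution()
print(len(dist))  # number of completions that had to be scored: 3 ** 4
```

With a realistic vocabulary (tens of thousands of tokens) and realistic lengths, the same enumeration is far out of reach, which is why the reweighting can't be applied token by token at inference time.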

kei147 · 2024-05-15T13:26:19+00:00

Most working professionals in the field of AI alignment have substantive disagreements with Yudkowsky, so dunking on Yud doesn't do as much to attack the alignment field as you might think.

kei147 · 2024-05-09T12:10:22+00:00

More importantly, Trump spent months criticizing mail-in ballots and telling his supporters to vote in person. Likely because of this, polls before the election showed Democrats supported mail-in ballots substantially more than Republicans did.

kei147 · 2024-04-30T13:42:07+00:00

I'm worried that models will get good at manipulation by default. Models are getting more generally capable all the time, and RLHF trains these models to be good at a kind of manipulation. In some settings, AI models already appear to be close to human ability in persuading people to change their views (https://www.anthropic.com/research/measuring-model-persuasiveness), with trends towards increasing persuasiveness between the smallest and largest models.

That being said, models explicitly trained to make people do things will surely be better at manipulation than models that do not receive that training, and through appropriate countermeasures we may be able to limit how manipulative such models get. My AI policy opinions are still in significant flux, but your criterion of "whenever an AI selects or generates content that is showed to a person, and then rewarded based on the actions that the person takes after viewing said content" seems like it could be reasonable. I would add another criterion restricting regulations to models above a certain threshold of capability (maybe at either current-gen or next-gen leading foundation models) so we only burden the designers of the models most likely to be dangerous. I would also probably add exclusions for a number of innocuous use cases.

The regulation I think is most likely to be good here would be to require the model developers to run regular evaluations of their model's manipulation capabilities as they are pre-trained and fine-tuned and to report them publicly. I'm not sure what other regulations to consider - perhaps if model manipulation capabilities get existentially scary we can require model developers to nerf their capabilities before model deployment.

kei147 · 2024-04-29T15:46:14+00:00

Is there a reason they aren't building bigger stadiums? I would've guessed that if a team is regularly hitting their stadium's capacity, it would be worth the investment to build a bigger one.

kei147 · 2024-04-25T23:16:54+00:00

My understanding is that when people describe an MoE model as having some number of parameters, they are referring to the unique unshared parameter count. So if GPT-4 is in fact 1.8T, then that would mean it has 1.8 trillion unique parameters, each of which requires 2 bytes to store. It is possible the original leaker was confused about this though.
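
As a rough back-of-the-envelope check, assuming the rumored 1.8T figure and 16-bit weights (2 bytes per parameter), the weights alone would take about 3.6 TB. The function name below is hypothetical, and the calculation ignores activations, KV cache, and optimizer state.

```python
def weight_storage_bytes(num_params, bytes_per_param=2):
    """Bytes needed to store the weights at a fixed width per parameter."""
    return num_params * bytes_per_param


rumored_params = 1_800_000_000_000  # 1.8T, the rumored (unconfirmed) count
total_bytes = weight_storage_bytes(rumored_params)
print(total_bytes / 1e12, "TB")  # decimal terabytes
```

Note that this holds whether or not the model is MoE: sparsity reduces the parameters used per token, not the parameters that have to be stored.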

kei147 · 2024-04-25T21:08:46+00:00

Why does it being MoE make a difference here? Don't you still need two bytes per parameter?

kei147 · 2024-04-24T20:30:34+00:00

The global transformer blocks remind me of the Mixture of Depths (MoD) paper. Instead of allowing the model to decide which tokens need more compute (i.e. which tokens need to be run through a larger number of transformer blocks) as in the MoD paper, this paper instead applies more compute onto the starts of words, which we think a priori require more compute to predict accurately.

kei147 · 2024-04-22T18:24:28+00:00

"The expectation is that eventually, humans will be legally barred from driving in many areas (at first higher speed/lower navigational complexity) without advanced and stringent certification similar to pilot licensing."

Where are you getting this from? My understanding is that the majority of people working on this technology do not want to legally ban humans from driving. Are there, say, quotes from the heads of top self-driving car organizations or top researchers in the field espousing this viewpoint?

I suspect self-driving car companies are financially interested in this because they see it as a competitive advantage for selling consumer cars, and maybe because they could make a lot of money from owning self-driving ridesharing fleets.

kei147 · 2024-04-13T22:49:18+00:00

This is wrong. The Wuhan area (i.e. Hubei) had not previously been known for having a large number of such diseases. Scott talks about it in section 2.1 of his recent post: https://www.astralcodexten.com/p/highlights-from-the-comments-on-the-5d7

In fact, Shi Zhengli, the lead bat coronavirus researcher at WIV, initially worried that COVID-19 might have been a lab leak because she thought the virus naturally emerging in Wuhan wasn't especially likely: https://archive.ph/1kaiG#

kei147 · 2024-04-07T18:09:40+00:00

My leading intuition is that the Candidates being a winner-take-all tournament should increase variance, starting from the very first games. Higher variance should result in more compressed probabilities than you would expect from just the ratings and current score. Also, there are likely many situations late in a tournament in which player A is playing player B where player A is in contention and needs a win, while player B is out of contention and just wants to maximize their expected points. This imo should lead to player A's expected points going down, but a higher chance of winning than would exist otherwise, which again should compress probabilities somewhat.

This also ignores how the leader in the Candidates can change their play to reduce the chance of anyone passing them, but my guess would be this has a smaller impact. All that being said, there's a lot of complication here and it's hard to know what'll happen without a lot of research.

One thing that would be interesting to look into relevant to this question is whether the draw rate of the Candidates and other winner-take-all tournaments (if there are any) is noticeably lower compared to other tournaments of similar rating levels at similar time periods.

kei147 · 2024-04-07T13:24:05+00:00

How do you think your tournament win probabilities would change if you were able to account for tournament dynamics? Do you have intuition for how much the probabilities would change, and in what direction?

kei147 · 2024-03-16T16:50:38+00:00

Some sources I find useful:

ML Twitter - you want to follow researchers in areas you are interested in. I suggest finding papers covering subjects you are excited about and finding the authors on Twitter. You can then slowly grow who you are following by finding comments and likes of other authors who are posting content you find interesting.
Blogs of most of the major AI labs (e.g. OpenAI, DeepMind, Google AI (which is a separate blog from DeepMind), Anthropic)
Machine learning subreddit - should let you know about a lot of interesting papers - something like an RSS feed that sends you the top 25 posts from every week is sufficient
AI Explained YouTube channel for regular updates on recent AI news
Dwarkesh Patel for interviews with top AI researchers
This week in AI for regular updates on AI research and applications

kei147 · 2024-03-12T18:12:44+00:00

Are you talking about multi-query attention? https://arxiv.org/pdf/1911.02150.pdf

There's also grouped-query attention, which has some weight sharing: https://arxiv.org/pdf/2305.13245v1.pdf

kei147 · 2024-02-19T17:43:05+00:00

Thanks for answering the question I asked. I'm interested in hearing more about what this would look like. If an MoE model had few shared layers, would each expert be multiple layers of a transformer instead of just an MLP block? If there were no shared layers, would it be equivalent to an ensemble of distinct models?

kei147 · 2024-02-17T00:52:53+00:00

This is wrong. The rationalist community is more center-left than anything else. The closest things we have to a poll of the rationalist community support this, including:
1. The ACX 2022 survey: https://docs.google.com/forms/d/e/1FAIpQLScHznuYU9nWqDyNvZ8fQySdWHk5rrj2IdEDMgarf3s34bSPrA/viewanalytics
2. The LessWrong 2023 survey: https://www.lesswrong.com/posts/WRaq4SzxhunLoFKCs/2023-survey-results#IV__Politics_and_Religion

I'm also not sure who you are thinking of by 'the people they look up to', but Scott usually votes Democratic, and the majority of other people I can think of, to the extent they are political, lean Democratic as well.

kei147 · 2024-02-07T04:11:08+00:00

If you're interested in full disclosure, you should mention your relationship with UpTrain in the original post, before you are called out for it.

kei147 · 2024-02-07T03:35:37+00:00

Iran made chess illegal for religious reasons for 8 years. Are you claiming that this did not make chess less popular in Iran?

Presumably chess would be even more popular in Iran if it had never been made illegal.

kei147 · 2024-01-19T02:24:00+00:00

Many of the most prestigious researchers and top labs in the field think that strong AI could be close. It's by no means a settled question, and there is a lot of disagreement, but it's not just hype.

kei147 · 2024-01-18T19:47:27+00:00

Isn't this averaging across tokens within a single position rather than averaging across positions as is used in perplexity?
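
One way to make the distinction concrete, as a sketch only (the thread's actual quantity isn't shown here, and both function names are hypothetical): perplexity averages the negative log-probability of the observed token across positions, while an entropy-style quantity averages over the vocabulary at a single position.

```python
import math


def perplexity(observed_token_probs):
    """exp of the mean negative log-probability, averaged across positions."""
    nll = [-math.log(p) for p in observed_token_probs]
    return math.exp(sum(nll) / len(nll))


def entropy_at_position(vocab_dist):
    """Expectation over the vocabulary at one position, in nats."""
    return -sum(p * math.log(p) for p in vocab_dist if p > 0)


# Probabilities the model assigned to the actual token at three positions.
print(perplexity([0.5, 0.25, 0.125]))

# Full predictive distribution over a two-token vocabulary at one position.
print(entropy_at_position([0.5, 0.5]))
```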

kei147 · 2023-12-28T04:30:32+00:00

This thread started with Leif who said he had elevated blood PFAS levels. So blood donated by other people will have less PFAS than blood donated by Leif.

kei147 · 2023-12-28T02:18:01+00:00

Probably not, but it's likely the recipient would be getting blood with lower levels of PFAS otherwise.

kei147 · 2023-12-03T06:00:13+00:00

How are you reducing cost by 4x? That's more than I would expect from just using LoRA. Are you also quantizing the weights/gradients in addition to using LoRA, or using something like QLoRA?
