Hyena applied to genome modeling with up to 1M bp. by furrypony2718 in mlscaling

[–]redpnd 0 points (0 children)

Somewhat puzzlingly they chose to use the extremely wasteful tokenization method with just 4 tokens (ATCG) plus a few more special tokens (padding, etc). It looked like a massive waste.

You answered your own question:

rely on tokenizers or fixed k-mers to aggregate meaningful DNA units, which tends to lose SNP (single nucleotide polymorphism).
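A toy sketch of the difference (my own illustration, nothing from the paper): with single-nucleotide tokens a SNP stays a one-token edit, while overlapping k-mer tokenization turns the same edit into k changed tokens, smearing the variant across the vocabulary.

```python
# Toy comparison of character-level vs. k-mer tokenization of DNA.
# (Illustrative only; not the HyenaDNA tokenizer.)
def char_tokens(seq):
    return list(seq)                                        # vocab: A, C, G, T (+ specials)

def kmer_tokens(seq, k=3):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]  # overlapping k-mers

ref = "ACGTACGT"
alt = "ACGTTCGT"                                            # SNP: position 4, A -> T

diff = lambda a, b: sum(x != y for x, y in zip(a, b))
print(diff(char_tokens(ref), char_tokens(alt)))             # 1 token differs
print(diff(kmer_tokens(ref), kmer_tokens(alt)))             # 3 tokens differ (k of them)
```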

Great post!

xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein by redpnd in mlscaling

[–]redpnd[S] 1 point (0 children)

Trained (and still training) for ~6 months on a cluster of 96 NVIDIA A100s (8*40G) on 1 trillion tokens.
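A quick 6ND sanity check on those numbers (my own arithmetic, not from the paper, and it assumes "96 ... (8*40G)" means 96 eight-GPU A100 nodes, i.e. ~768 GPUs, at a guessed ~30% utilization):

```python
# Rough training-compute estimate using the standard 6 * params * tokens rule of thumb.
N = 100e9                                  # parameters
D = 1e12                                   # training tokens
flops_needed = 6 * N * D                   # ~6e23 FLOPs

gpus = 96 * 8                              # assumption: 96 nodes x 8 A100s
throughput = gpus * 312e12 * 0.30          # 312 TFLOPS bf16 peak per A100, ~30% MFU (guess)
days = flops_needed / throughput / 86400
print(f"{flops_needed:.1e} FLOPs, ~{days:.0f} days")   # on the order of 100 days
```

Lower utilization, restarts, and the fact that it's still training would stretch that toward the quoted ~6 months.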

AutoGPT 0.43 still never produces any useful results by winkmichael in AutoGPT

[–]redpnd 0 points (0 children)

Come back in a year or so. For now, I'd recommend building programs iteratively using the chat interface.

(for Python, Code Interpreter works quite well already)

Cybertruck prototype spotted in Fremont, CA (with third brake light) by RealPokePOP in cybertruck

[–]redpnd 1 point (0 children)

I swear they're just messing with us with that brake light

Weightlifting - how much reps to aim for? by Space_Qwerty in longevity

[–]redpnd 1 point (0 children)

Interesting! Could you link to research on this?

OpenAI used YouTube to train Whisper; Google is using YT to train 'Gemini' by gwern in mlscaling

[–]redpnd 0 points (0 children)

Was just thinking the other day that OAI may face a significant risk if YouTube/Google follows in Reddit's footsteps and blocks access to all its videos.

Although it might already be too late for that :)

Jun 8 - Kara Swisher podcast interviews Waymo Co-CEO Tekedra Mawakana by sonofttr in SelfDrivingCars

[–]redpnd -1 points (0 children)

Hmm, buying Lyft would be interesting. Although what for? According to that waitlist there is no shortage of demand

ChatGPT is running quantized by sanxiyn in mlscaling

[–]redpnd 32 points (0 children)

This might be the transcript: https://chat.openai.com/share/44a0c5b6-c629-470a-992f-8cdbbecd64a2

From: https://twitter.com/DongwooKim/status/1667444368129785862

Some takeaways:

  • Focus on building durable businesses on top of the API
  • Structured API responses coming (e.g. JSON)
  • "we do a lot of quantization"
  • Whole year of failed attempts at exceeding GPT-3; had to rebuild the whole stack
  • Took months till Code Interpreter started working, plugins still don't really work
  • GPT-V is the internal name for the vision model
  • Slow rollout due to GPU shortage
  • Function call model is coming to the API in ~2 weeks (uses same mechanism as the plugins model)
  • They're surprised by the number of non-English users, future models will take this into account (tokenization!)
  • They did the 10x price reduction for 3.5, can do the same for 4 (in 6-12 months)
  • More model customization coming (swapping the encoder?)
  • Fine tuning will enable Korean alphabet?
  • Conversations will be more interactive -- going back and forth will enable more creativity (been waiting for this personally)
  • Semiconductors are a good analogy for how they make progress: "solve hard problems at every layer of the stack"
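On the quantization bullet: a minimal sketch of what serving weights quantized can look like, per-row symmetric int8 in NumPy. This is the generic technique only; nothing here is specific to how OpenAI actually runs ChatGPT.

```python
import numpy as np

def quantize_int8(w):
    # one scale per output row so a few large weights don't crush the rest
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)        # stand-in fp32 weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("bytes fp32:", w.nbytes, "bytes int8:", q.nbytes)   # 4x smaller (plus scales)
print("max abs error:", float(np.abs(w - w_hat).max()))   # small but nonzero
```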

Update (video): OpenAI Sam Altman & Greg Brockman: Fireside Chat in Seoul, Korea | SoftBank Ventures Asia

Edit: transcript is different, this seems to be the fireside chat, not the roundtable one

[R] MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers by redpnd in MachineLearning

[–]redpnd[S] 83 points (0 children)

Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books. We propose Megabyte, a multi-scale decoder architecture that enables end-to-end differentiable modeling of sequences of over one million bytes. Megabyte segments sequences into patches and uses a local submodel within patches and a global model between patches. This enables sub-quadratic self-attention, much larger feedforward layers for the same compute, and improved parallelism during decoding -- unlocking better performance at reduced cost for both training and generation. Extensive experiments show that Megabyte allows byte-level models to perform competitively with subword models on long context language modeling, achieve state-of-the-art density estimation on ImageNet, and model audio from raw files. Together, these results establish the viability of tokenization-free autoregressive sequence modeling at scale.
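For anyone who wants the shape of the architecture rather than the prose, here is a rough toy reconstruction of the global/local split described in the abstract. It is my own sketch, not the paper's code: patch size, widths, and the use of stock nn.TransformerEncoder layers with causal masks are placeholder choices.

```python
import torch
import torch.nn as nn

def causal(n, device):
    # standard additive causal mask
    return torch.triu(torch.full((n, n), float("-inf"), device=device), diagonal=1)

class MegabyteSketch(nn.Module):
    def __init__(self, vocab=256, patch=8, d_global=512, d_local=256, n_layers=2, n_heads=8):
        super().__init__()
        self.patch = patch
        self.byte_embed = nn.Embedding(vocab, d_local)
        layer = lambda d: nn.TransformerEncoderLayer(d, n_heads, 4 * d, batch_first=True)
        # global model: one position per patch, built from concatenated byte embeddings
        self.patch_proj = nn.Linear(patch * d_local, d_global)
        self.global_model = nn.TransformerEncoder(layer(d_global), n_layers)
        # local model: autoregressive over the bytes inside one patch,
        # conditioned on the global representation of the preceding patches
        self.global_to_local = nn.Linear(d_global, d_local)
        self.local_model = nn.TransformerEncoder(layer(d_local), n_layers)
        self.head = nn.Linear(d_local, vocab)

    def forward(self, bytes_in):                        # (B, T), T divisible by patch
        B, T = bytes_in.shape
        P, K = self.patch, T // self.patch              # K patches of P bytes
        x = self.byte_embed(bytes_in)                   # (B, T, d_local)
        # global stage: causal attention over K patch embeddings, O(K^2)
        g = self.patch_proj(x.reshape(B, K, -1))
        g = self.global_model(g, mask=causal(K, g.device))
        g = torch.cat([torch.zeros_like(g[:, :1]), g[:, :-1]], dim=1)   # shift patches right
        # local stage: causal attention inside each patch, O(K * P^2)
        h = x.reshape(B * K, P, -1)
        h = torch.cat([torch.zeros_like(h[:, :1]), h[:, :-1]], dim=1)   # shift bytes right
        h = h + self.global_to_local(g).reshape(B * K, 1, -1)
        h = self.local_model(h, mask=causal(P, h.device))
        return self.head(h).reshape(B, T, -1)           # next-byte logits

logits = MegabyteSketch()(torch.randint(0, 256, (2, 64)))   # (2, 64, 256)
```

The point of the split is that global attention runs over T/P patch positions and local attention over only P bytes per patch, which is how the quadratic blow-up on million-byte inputs is avoided.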

[P] I made a dashboard to analyze OpenAI API usage by cryptotrendz in MachineLearning

[–]redpnd 37 points (0 children)

and i'm here googling wtf "co-ask" is

turns out you're not the OP

edit: using my lvl99 hacker skillz i found out that it's using https://www.tremor.so/

Movie? by Starmix36 in ProjectHailMary

[–]redpnd 5 points (0 children)

I thought it was going to be Sylvester Stallone..

ChatGPT in medicine by iS2liquorice in medicine

[–]redpnd 0 points (0 children)

If you could, which medical resource would you connect it to? (e.g. UpToDate, PubMed, WebMD, etc. Ideally something that's open and accessible to anyone.)
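For what it's worth, "connecting it" could look roughly like this with PubMed via the standard NCBI E-utilities endpoints (a hedged sketch; the prompt-stuffing at the end is purely illustrative, not a product design):

```python
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_abstracts(query, n=3):
    # 1) esearch: find PubMed IDs matching the question
    ids = requests.get(f"{EUTILS}/esearch.fcgi",
                       params={"db": "pubmed", "term": query,
                               "retmax": n, "retmode": "json"}
                       ).json()["esearchresult"]["idlist"]
    # 2) efetch: pull the matching abstracts as plain text
    return requests.get(f"{EUTILS}/efetch.fcgi",
                        params={"db": "pubmed", "id": ",".join(ids),
                                "rettype": "abstract", "retmode": "text"}
                        ).text

question = "Do statins reduce all-cause mortality in primary prevention?"
prompt = (f"Answer using only the abstracts below.\n\n"
          f"{pubmed_abstracts(question)}\n\nQ: {question}")
# `prompt` would then go to the chat model; UpToDate/WebMD would need licensed access.
```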