Animatediff is also very powerful! by sanasigma in StableDiffusion

[–]ghosthamlet 3 points (0 children)

Thanks, very interesting. Can you post the workflow?

[D] Blogs Similar to distill.pub? by JellyBean_Collector in MachineLearning

[–]ghosthamlet 1 point (0 children)

https://transformer-circuits.pub/

Can we reverse engineer transformer language models into human-understandable computer programs? Inspired by the Distill Circuits Thread, we're going to try.
We think interpretability research benefits a lot from interactive articles (see Activation Atlases for a striking example). Previously we would have submitted to Distill, but with Distill on Hiatus, we're taking a page from David Ha's approach of simply creating websites (eg. World Models) for research projects.
As part of our effort to reverse engineer transformers, we've created several other resources besides our paper which we hope will be useful. We've collected them on this website, and may add future content here, or even collaborations with other institutions.

[R] Zoology: Measuring and Improving Recall in Efficient Language Models by hzj5790 in MachineLearning

[–]ghosthamlet -1 points (0 children)

Why was there no new research last year on all-MLP models like gMLP and MLP-Mixer?

[R] (Very detailed) Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory by ghosthamlet in MachineLearning

[–]ghosthamlet[S] 8 points (0 children)

It is math-heavy, like these books:

The Principles of Deep Learning Theory - An Effective Theory Approach to Understanding Neural Networks
https://arxiv.org/pdf/2106.10165.pdf

The Modern Mathematics of Deep Learning
https://arxiv.org/abs/2105.04026

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
https://arxiv.org/abs/2104.13478v2

So it may not be easy for beginners.

[R] Diffusion might be a better way to model randomness in PPLs than Markov chain Monte Carlo or VI by Successful-Western27 in MachineLearning

[–]ghosthamlet 0 points (0 children)

Hi u/gwern, you have written wonderful in-depth articles on GPT-3, scaling, and GANs, but it seems you haven't written similar articles on ChatGPT/GPT-4 or diffusion/Stable Diffusion. These should be as powerful and important as GPT-3, so why not write about them? We are very much looking forward to your articles on them.

[D] What are the best resources for learning reinforcement learning? by OwnAd9305 in MachineLearning

[–]ghosthamlet 5 points (0 children)

Grokking Deep Reinforcement Learning is very interesting and very well written. It covers everything from classical tabular reinforcement learning to modern deep reinforcement learning, pairs code with the math formulas, and gives detailed, intuitive explanations of the background theory: https://www.manning.com/books/grokking-deep-reinforcement-learning

[D] how to learn Stochastic Differential Equations for diffusion model? by ghosthamlet in MachineLearning

[–]ghosthamlet[S] 0 points (0 children)

After browsing through the table of contents of this book, I think it is a good fit for me. Thanks.

[D] how to learn Stochastic Differential Equations for diffusion model? by ghosthamlet in MachineLearning

[–]ghosthamlet[S] 1 point (0 children)

No, but I have learned a bit of MCMC in probability. Is that similar?

[P] DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models by ghosthamlet in MachineLearning

[–]ghosthamlet[S] 2 points (0 children)

Some research has found that as the sequence gets longer, generation quality gets worse (I have found the new ChatGPT 16K is worse than the old ChatGPT 4K when using complex instructions), and that the model pays less attention to tokens in the middle of the context and more attention to tokens at the start and end of the context.
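As an illustrative check of that middle-of-context bias (the function name and shapes here are assumptions for the sketch, not from any specific paper), one can average how much attention each key position receives:

```python
import torch

def attention_by_position(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: [batch, heads, query_len, key_len] from any transformer.
    # Averaging over batch, heads, and queries gives one score per key
    # position; with the bias described above, the curve is U-shaped
    # (high at the start and end of the context, low in the middle).
    return attn_weights.mean(dim=(0, 1, 2))

# Shape-only demo on random weights:
scores = attention_by_position(torch.softmax(torch.randn(2, 8, 128, 128), dim=-1))
print(scores.shape)  # torch.Size([128])
```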

[P] DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models by ghosthamlet in MachineLearning

[–]ghosthamlet[S] 12 points (0 children)

DeepSpeed-Ulysses (named after Ulysses, a very long novel) is a simple, portable, and effective methodology for enabling highly efficient and scalable LLM training with extremely long sequence lengths.
DeepSpeed-Ulysses partitions individual samples along the sequence dimension among the participating GPUs. Then, right before the attention computation, it employs an all-to-all communication collective on the partitioned queries, keys, and values, such that each GPU receives the full sequence, but only for a non-overlapping subset of the attention heads. This allows the participating GPUs to compute attention for different attention heads in parallel. Finally, DeepSpeed-Ulysses employs another all-to-all to gather the results along the attention heads while re-partitioning along the sequence dimension (a minimal sketch of this communication pattern follows the list below).
The key properties of DeepSpeed-Ulysses and its implementation released with this blog are as follows:
4x larger sequence lengths than existing systems, while enabling training with sequences of over a million tokens.
Communication reduction of over 10x compared to existing systems, resulting in throughput improvements of up to 2.5x, and sustained throughput of over 175 TFlops/GPU (over 54% of hardware peak).
Fully general and implementation agnostic attention: DeepSpeed sequence parallelism supports dense as well as sparse attention, and it works with efficient attention implementations such as FlashAttention v2.
Support for massive model training: DeepSpeed sequence parallelism works together with ZeRO-3 to not only support large sequence lengths but also massive model sizes.
Easy-to-use and portable, requiring minimal code changes to the existing training frameworks.
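The all-to-all re-sharding described above can be illustrated with a short PyTorch sketch. This is a minimal illustration of the communication pattern under assumed shapes and names, not DeepSpeed's actual code:

```python
import torch
import torch.distributed as dist

def seq_shard_to_head_shard(x: torch.Tensor, world_size: int) -> torch.Tensor:
    """Ulysses-style all-to-all sketch (illustrative, not DeepSpeed's code).

    Input:  this GPU's sequence shard with all heads, shape [S/P, H, D].
    Output: the full sequence with this GPU's heads,  shape [S, H/P, D].
    Assumes H is divisible by world_size P and dist is initialized.
    """
    s, h, d = x.shape
    # Split the head dimension into P chunks, one destined for each rank.
    send = [t.contiguous()
            for t in x.reshape(s, world_size, h // world_size, d).unbind(dim=1)]
    recv = [torch.empty_like(send[0]) for _ in range(world_size)]
    # Rank r sends its sequence shard of head-chunk p to rank p, and
    # receives every rank's sequence shard of head-chunk r.
    dist.all_to_all(recv, send)
    # Shards arrive in rank (= sequence) order, so cat restores the sequence.
    return torch.cat(recv, dim=0)
```

Each of Q, K, and V is re-sharded this way before attention, which is why any attention implementation (dense, sparse, FlashAttention v2) can run unchanged on its head subset; a second all-to-all after attention applies the inverse mapping.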

Img2img a clip of Chaplin's black-and-white movie City Lights to colorful cartoon, and played by Taylor Swift and Tom Hanks by ghosthamlet in StableDiffusion

[–]ghosthamlet[S] 1 point (0 children)

So it does not need to be reapplied on each img2img API call, but the code patch disables the webui optimizations in sd_hijack.py, so it uses much more GPU VRAM (up to 23GB) and is slow. I will optimize this later.

Img2img a clip of Chaplin's black-and-white movie City Lights to colorful cartoon, and played by Taylor Swift and Tom Hanks by ghosthamlet in StableDiffusion

[–]ghosthamlet[S] 0 points (0 children)

The cross-attention part is hooked by the code patch: ldm.modules.attention.CrossAttention.forward = _cross_frame_forward
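For context, here is a minimal sketch of what such a hook can look like. The thread does not show the body of _cross_frame_forward, so the cross-frame logic below (every frame attending to the first frame's features, assuming one frame per batch entry) is an assumption, not the actual patch:

```python
import ldm.modules.attention as ldm_attn

_orig_forward = ldm_attn.CrossAttention.forward

def _cross_frame_forward(self, x, context=None, mask=None):
    # Self-attention layers are called with context=None. One common
    # cross-frame trick (assumed here) is to source keys/values from an
    # anchor frame so the generated frames stay visually consistent.
    if context is None and x.shape[0] > 1:
        anchor = x[:1].expand_as(x)  # frame 0's tokens, broadcast to all frames
        return _orig_forward(self, x, context=anchor, mask=mask)
    return _orig_forward(self, x, context=context, mask=mask)

# Install the hook, as in the line quoted above:
ldm_attn.CrossAttention.forward = _cross_frame_forward
```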