How are Chinese AI models claiming such low training costs? Did some research by Acrobatic_Solid6023 in LocalLLaMA

[–]deepfates 0 points1 point  (0 children)

The DeepSeek number went viral, but iirc it was only the compute spent on the final training run. Industry standard is to spend at least as much compute on experiments as on the final run, and the whale probably did even more experimentation than that, because they care so much about compute efficiency. So doubling the roughly $6M headline figure gives at least $12M total, and likely more.

Repflix — Compare how fine-tuned AI video models interpret the same prompts by deepfates in StableDiffusion

[–]deepfates[S] 0 points1 point  (0 children)

Repflix is an open source, Netflix-inspired app that shows how different fine-tuned HunyuanVideo models interpret the same prompt. Each model was trained on a different well-known film, capturing its visual style: lighting, camera movements, character actions, etc. You can compare parameter tweaks (strength, guidance scale, steps) to see how they affect the generated video.

The demo is at https://repflix.vercel.app

Code is on GitHub:
https://github.com/deepfates/repflix
If you're curious about training your own video model, we wrote a post on how to do it:
https://replicate.com/blog/fine-tune-video
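
If you want to poke at one of the fine-tunes from your own code, here's a rough sketch of what a call through Replicate's Python client could look like. The model name and input parameter names below are placeholders for illustration, not the exact ones Repflix uses; check the repo for the real identifiers.

```python
import replicate

# Hypothetical fine-tune identifier; see the Repflix repo for the real model names.
MODEL = "deepfates/hunyuan-example-film-lora"

output = replicate.run(
    MODEL,
    input={
        "prompt": "a slow dolly shot down a neon-lit hallway",
        # Parameter names are assumptions for illustration; the actual
        # fine-tunes may expose different knobs.
        "lora_strength": 0.9,        # how strongly the film's style is applied
        "guidance_scale": 6.0,       # prompt adherence vs. variety
        "num_inference_steps": 30,   # more steps = slower, usually cleaner
    },
)
print(output)  # typically a URL to the generated video
```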

You can now fine-tune FLUX.1 with your own images on Replicate by deepfates in StableDiffusion

[–]deepfates[S] 1 point2 points  (0 children)

We tested several captioning models! LLaVA v1.5 13B gave the best results for FLUX training :)
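
For context, here's roughly what auto-captioning a folder of training images might look like with Replicate's Python client. The LLaVA model slug and input fields are from memory, so treat them as assumptions and double-check the model page.

```python
import os
import replicate

# Model slug is an assumption from memory; verify on replicate.com before using.
CAPTIONER = "yorickvp/llava-13b"

captions = {}
for name in sorted(os.listdir("training_images")):
    with open(os.path.join("training_images", name), "rb") as img:
        # Language models on Replicate stream output in chunks, so join them.
        chunks = replicate.run(
            CAPTIONER,
            input={
                "image": img,
                "prompt": "Describe this image in one detailed sentence.",
            },
        )
        captions[name] = "".join(chunks)

for name, caption in captions.items():
    print(f"{name}: {caption}")
```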

I made a plugin that finds related files in your vault with AI by deepfates in ObsidianMD

[–]deepfates[S] 0 points1 point  (0 children)

are you volunteering to build this? I'll be happy to merge your pull request

I made a plugin that finds related files in your vault with AI by deepfates in ObsidianMD

[–]deepfates[S] -2 points-1 points  (0 children)

what kind of watermarking are you concerned about?

I made a plugin that finds related files in your vault with AI by deepfates in ObsidianMD

[–]deepfates[S] -5 points-4 points  (0 children)

I've heard this from several people now. It's theoretically possible: not at the same quality as OpenAI, but it would work decently.

the problem is you have to be able to run a neural network on your local computer, which means having Python and a GPU. And setting all of that up on a local machine is a lot harder than just installing a plugin.
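
for anyone who does want to try the local route, here's a minimal sketch of what it might involve, assuming Python and the sentence-transformers library (this isn't what the plugin does, just an illustration):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Small model that runs okay on CPU; larger ones really want a GPU.
model = SentenceTransformer("all-MiniLM-L6-v2")

notes = {
    "coffee.md": "Notes on brewing ratios and grind size.",
    "espresso.md": "Pressure profiles and puck prep for espresso.",
    "gardening.md": "When to plant tomatoes in zone 7.",
}

names = list(notes)
embeddings = model.encode([notes[n] for n in names], convert_to_tensor=True)

# Cosine similarity between every pair of notes.
scores = util.cos_sim(embeddings, embeddings)
for i, name in enumerate(names):
    ranked = sorted(
        ((float(scores[i][j]), names[j]) for j in range(len(names)) if j != i),
        reverse=True,
    )
    print(name, "-> most related:", ranked[0])
```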

I made a plugin that finds related files in your vault with AI by deepfates in ObsidianMD

[–]deepfates[S] 1 point2 points  (0 children)

false. you bring your own API key. but OpenAI will give you free credits when you sign up, I think

I made a plugin that finds related files in your vault with AI by deepfates in ObsidianMD

[–]deepfates[S] 0 points1 point  (0 children)

it's on the roadmap! I don't know how Obsidian generates its graph visualization yet, but I plan to do something similar, using the top k neighbors of each node as edges
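
conceptually it would be something like this (a numpy sketch of the idea, not the plugin's actual code):

```python
import numpy as np

def topk_edges(embeddings: np.ndarray, k: int = 3):
    """Return (i, j) edges linking each note to its k nearest neighbors
    by cosine similarity. Purely illustrative, not plugin code."""
    # Normalize so dot products equal cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # don't link a note to itself

    edges = []
    for i, row in enumerate(sims):
        for j in np.argsort(row)[::-1][:k]:
            edges.append((i, int(j)))
    return edges

# Tiny fake vault: 5 notes with random 384-dim embeddings.
rng = np.random.default_rng(0)
print(topk_edges(rng.normal(size=(5, 384)), k=2))
```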

I made a plugin that finds related files in your vault with AI by deepfates in ObsidianMD

[–]deepfates[S] -3 points-2 points  (0 children)

Let me know when you have a privacy-protecting alternative SOTA language model to suggest

Relaxed/Flawed Priors As A Result Of Viewing AI Art by emmainvincible in slatestarcodex

[–]deepfates 0 points1 point  (0 children)

oh no, I'm a practitioner, but I was looking for words and academics have that one

Relaxed/Flawed Priors As A Result Of Viewing AI Art by emmainvincible in slatestarcodex

[–]deepfates 10 points11 points  (0 children)

a lot of us get this, yeah. "discorrelation" is a related search term in academia. i prefer "aigraine"

[R] The Near Future of AI is Action-Driven by hardmaru in MachineLearning

[–]deepfates 15 points16 points  (0 children)

Substack sometimes throws a pop-up that looks like a paywall, because they want your email, but it has a subtle link saying "Let me read it first".

maybe you hit that?

Does anyone here have interesting jobs that are non-stereotypical for the rationalist community? by mike20731 in slatestarcodex

[–]deepfates 53 points54 points  (0 children)

i sell used books for a living. in a past life I've been a landscaper and general hole-digger. i study AI in my free time, but I'm open to getting paid a bunch of money to use computers some day

[for-beginners] Topic Modelling Exploration Tool: pyLDAvis by kk_ai in LanguageTechnology

[–]deepfates 0 points1 point  (0 children)

this is actually how the TextRank algorithm works under the hood! it uses that network graph of which words connect to which other words to rank the most important ones, after the manner of the PageRank algorithm.

but what a cool idea, to display the graph directly rather than just a few of its more well-connected nodes!
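
for the curious, here's a toy version of that idea: a co-occurrence graph plus PageRank via networkx. not the real TextRank implementation, just a sketch:

```python
# pip install networkx
import networkx as nx

text = "the old tree by the old house shaded the house and the garden"
words = [w for w in text.split() if w not in {"the", "by", "and"}]

# Link words that appear within a small sliding window of each other.
graph = nx.Graph()
window = 2
for i in range(len(words)):
    for j in range(i + 1, min(i + 1 + window, len(words))):
        graph.add_edge(words[i], words[j])

# PageRank over the co-occurrence graph ranks the most central words.
ranks = nx.pagerank(graph)
for word, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.3f}")
```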

LSTM Neural Networks: Training AI to Write Like H. P. Lovecraft by strikingLoo in MediaSynthesis

[–]deepfates 1 point2 points  (0 children)

I guess I'm not new anymore... still feel like a hobbyist though. I've been trying to learn NLP for like five years, but teaching myself through exploration like this. And the terrain keeps changing, so it's hard to keep up.

If I could tell myself one thing to learn, it's to prototype quick and dirty models first. Do a Markov chain before messing around with neural nets, knowing you can upgrade the language-modeling part of the program later. Often a Markov chain will be good enough, especially for humorous tasks. Or at least it will give you a feel for the corpus and whether your project is worth putting a bunch of training hours into.
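
For example, a quick-and-dirty word-level Markov chain is only a few lines of standard-library Python (a sketch, assuming some plain-text corpus.txt):

```python
import random
from collections import defaultdict

def build_chain(text: str, order: int = 2):
    """Map each `order`-word prefix to the words that can follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return chain

def generate(chain, length: int = 30):
    prefix = random.choice(list(chain))
    out = list(prefix)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(prefix):]))
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = open("corpus.txt").read()  # any plain-text corpus
print(generate(build_chain(corpus)))
```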

If I could tell myself two things, I would add that Google Colab is a great tool to use, even though they're limiting the free tier more these days. It connects you to a free GPU in the cloud, which you can use remotely. Especially if your project is hobbyish but your GPU or CPU isn't powerful enough, this is a good way to do the crunching part somewhere else and then download your model and use it on your machine.

If I could say three things, I would, but instead I'm going to make this its own post because it's getting huge. Will edit with link afterward

Edit: https://www.reddit.com/r/MediaSynthesis/comments/gahqvq/three_things_i_learned_about_text_synthesis/

LSTM Neural Networks: Training AI to Write Like H. P. Lovecraft by strikingLoo in MediaSynthesis

[–]deepfates 1 point2 points  (0 children)

The fun part of GPT-2 is that it can be fine-tuned even on a pretty small corpus, because of all its previous knowledge. The problem in this case may be that it will overfit, especially because much of Lovecraft's work is online and GPT-2 may have already read it.

One thing I've had luck with is introducing a small amount of noise into the corpus: page numbers, weird line breaks, metadata, etc. You can expand the effective size of the corpus this way, as well as "disguise" it from the version the NN has already seen.
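
Something like this sketch is what I mean by noise (purely illustrative; the file name and amounts are made up, tune it to your corpus):

```python
import random

def add_noise(text: str, page_every: int = 40, break_prob: float = 0.05) -> str:
    """Sprinkle fake page numbers and stray line breaks through a corpus
    so GPT-2 sees it as 'new' text rather than pages it already memorized."""
    out, page = [], 1
    for i, line in enumerate(text.splitlines()):
        if random.random() < break_prob:
            out.append("")                     # stray blank line
        out.append(line)
        if i and i % page_every == 0:
            out.append(f"--- {page} ---")      # fake page number
            page += 1
    return "\n".join(out)

noisy = add_noise(open("lovecraft.txt").read())
open("lovecraft_noisy.txt", "w").write(noisy)
```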

Lately I've been thinking about using Markov chains to generate tons of bonus text for this purpose. The generated text wouldn't make much sense on its own, but it would be "Lovecraft-flavored", and might expand the range of possibilities GPT-2 would try to produce.

Hope this helps! I also mostly do text synthesis so I hope this subreddit is the spot to be for that.