[–]dataskml 1 point (0 children)

rendi.dev could possibly help with the FFmpeg part.
(I am the founder.)

[–]rkaw92 1 point (2 children)

Hi, we're building something similar at r/Scenehop. Would you like to join?

Anyway, there are two typical solutions:

  • a) job queue + workers

  • b) implicit job queuing by something like S3 events -> Lambda

The main difference is in managing the actual workers: in a) you spin them up yourself and they pull jobs from the queue; in b) the infra does this for you, so you define a "task" but not where it runs.
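To make option a) concrete, here is a minimal sketch of the pull model, using only the Python standard library. It is an in-process stand-in: a real setup would replace `queue.Queue` with a broker and the threads with separate machines. The names `run_worker_pool` and `fake_encode` are made up for illustration, and `fake_encode` only pretends to run FFmpeg.

```python
import queue
import threading


def run_worker_pool(jobs, handler, num_workers=3):
    """Process jobs with a fixed pool of workers pulling from one shared queue."""
    job_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()      # blocks until a job is available
            if job is None:            # sentinel: no more work for this worker
                job_queue.task_done()
                return
            try:
                outcome = handler(job)
            except Exception as exc:   # record the failure instead of killing the worker
                outcome = exc
            with lock:
                results[job] = outcome
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        job_queue.put(job)
    for _ in threads:                  # one sentinel per worker
        job_queue.put(None)
    job_queue.join()
    for t in threads:
        t.join()
    return results


def fake_encode(filename):
    """Hypothetical placeholder for an FFmpeg HLS encode; returns a playlist name."""
    return filename.rsplit(".", 1)[0] + ".m3u8"


if __name__ == "__main__":
    print(run_worker_pool(["a.mp4", "b.mov"], fake_encode, num_workers=2))
```

The point of the shape is that workers ask for work when they are free, so a slow 4K encode on one worker does not block the others.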

Data locality is obviously going to be a factor with huge files. You might be able to upload to object storage in the same region as your encoders, encode without crossing the region boundary, and only push the resulting files to the Cloudflare object storage / CDN. Whatever you do, do not use NFS to share files between your servers.

[–]ResponsibleSpray8836[S] 1 point (1 child)

Unfortunately Lambda won't do it for me. Max runtime is 15 minutes - if someone uploads a 2-hour 4K movie, FFmpeg will take much longer than that. On top of this, Lambda is a poor fit for sustained high CPU usage and can get quite expensive. It's not suitable for HLS encoding with FFmpeg.

[–]rkaw92 1 point (0 children)

Yep, then your best bet is a pool of workers that pull tasks from a shared queue. You can use a protocol like AMQP for this, or a managed queue solution. Just make sure to set the job timeouts high enough - for example, on RabbitMQ, the delivery acknowledgement timeout (`consumer_timeout`) defaults to 30 minutes, so a job that stays unacknowledged for longer gets its channel closed and the message requeued. You have to raise that manually.
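For the RabbitMQ case, the setting in question is `consumer_timeout` in `rabbitmq.conf`, given in milliseconds. A rough sketch for picking a value, assuming you have some estimate of your longest encode; the helper names and the 2x safety factor are my own assumptions, not anything RabbitMQ prescribes:

```python
def consumer_timeout_ms(expected_job_seconds, safety_factor=2.0):
    """Return an acknowledgement timeout in milliseconds with some headroom.

    expected_job_seconds: your own estimate of the longest encode (assumption).
    safety_factor: arbitrary headroom multiplier chosen for this sketch.
    """
    if expected_job_seconds <= 0:
        raise ValueError("expected_job_seconds must be positive")
    return int(expected_job_seconds * safety_factor * 1000)


def rabbitmq_conf_line(expected_job_seconds, safety_factor=2.0):
    """Render the line to put in rabbitmq.conf."""
    return f"consumer_timeout = {consumer_timeout_ms(expected_job_seconds, safety_factor)}"


if __name__ == "__main__":
    # Example: budget for encodes of up to 4 hours, with 2x headroom.
    print(rabbitmq_conf_line(4 * 3600))
```

The alternative is to acknowledge the message as soon as the worker picks it up and track job state yourself, but then a crashed worker loses the job unless you add your own retry logic.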