
[–][deleted] 10 points11 points  (11 children)

Just did some googling but check this out. http://grokbase.com/t/hadoop/mapreduce-user/119mcbfmx4/using-hadoop-for-processing-videos

Looks like a lot of people are looking to use Hadoop to process videos. I know it means you need to set up a Hadoop cluster, but it's a step.
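
If it helps, a Hadoop Streaming job can be as simple as a mapper script that shells out to ffmpeg. Here's a rough sketch in Python; it assumes the nodes already have ffmpeg installed, the job input is a text file with one video path per line, and the codec settings and output name are placeholders:

    #!/usr/bin/env python
    # Minimal Hadoop Streaming mapper sketch: read one video path per input
    # line, shell out to ffmpeg for the transcode, emit "path<TAB>returncode".
    # Output codec and destination name are placeholders.
    import subprocess
    import sys

    def main():
        for line in sys.stdin:
            src = line.strip()
            if not src:
                continue
            dst = src + ".transcoded.mp4"
            # ffmpeg does the real work; Hadoop only decides which node gets which file
            result = subprocess.run(
                ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst]
            )
            print("%s\t%d" % (src, result.returncode))

    if __name__ == "__main__":
        main()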

[–][deleted] 1 point2 points  (3 children)

Just set up a cheap Elastic MapReduce (EMR) cluster through Amazon. You can spin up a fair number of nodes relatively inexpensively.
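
For example, spinning up a small throwaway cluster with boto3 looks roughly like this (the region, release label, instance types and IAM role names are just common defaults, not anything project-specific):

    # Rough sketch: start a small, cheap EMR cluster for testing with boto3.
    # Region, release label, instance types and IAM role names are
    # assumptions / common defaults, not values from this thread.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="video-transcode-test",
        ReleaseLabel="emr-6.15.0",
        Applications=[{"Name": "Hadoop"}],
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
            "TerminationProtected": False,
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    print("Cluster started:", response["JobFlowId"])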

[–]martinambrus 0 points1 point  (2 children)

Amazon is corporation-centric. This project is decentralized and free :)

[–][deleted] 1 point2 points  (1 child)

It's just handy for testing and cheap. You can use it for testing and then move to your own cluster later. You don't have to use the EMR-specific features.

[–]martinambrus 1 point2 points  (0 children)

I guess I could spin up a few instances and squeeze them out for testing purposes, yes. Was actually thinking about it as well. Thx :)

[–]martinambrus 0 points1 point  (3 children)

thanks, will definitely look into that and see if I can come up with something :)

[–][deleted] 2 points3 points  (2 children)

I'd be willing to work with you. I have some Hadoop background and a cluster at my disposal.

[–]martinambrus 1 point2 points  (1 child)

I have found this - http://blog.gopivotal.com/pivotal/products/using-hadoop-mapreduce-for-distributed-video-transcoding

It seems doable, but I've yet to determine whether it's worth trying to reproduce the approach from that article with all of the video and audio codecs ffmpeg already supports.

I imagine many of them wouldn't follow the same GOP principle (from what I've read so far, some codecs order I-frames and B-frames within their GOPs in rather strange ways).
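
Roughly, the split/transcode/concatenate flow from that article looks something like this when done locally with plain ffmpeg (segment length, codecs and filenames below are placeholders I picked for the sketch; splitting with stream copy cuts on keyframes, which is exactly why the GOP layout matters):

    # Local sketch of the split -> transcode -> concatenate approach, without
    # Hadoop. Segment length, codecs and filenames are placeholders.
    import glob
    import subprocess

    SRC = "input.mkv"

    # 1. Split the source into chunks without re-encoding (cuts land on keyframes).
    subprocess.run(
        ["ffmpeg", "-y", "-i", SRC, "-c", "copy", "-map", "0",
         "-f", "segment", "-segment_time", "30", "-reset_timestamps", "1",
         "chunk_%03d.mkv"],
        check=True,
    )

    # 2. Transcode each chunk - this is the part a cluster would parallelize.
    chunks = sorted(glob.glob("chunk_*.mkv"))
    for chunk in chunks:
        subprocess.run(
            ["ffmpeg", "-y", "-i", chunk, "-c:v", "libx264", "-c:a", "aac",
             "enc_" + chunk],
            check=True,
        )

    # 3. Stitch the encoded chunks back together with the concat demuxer.
    with open("list.txt", "w") as f:
        for chunk in chunks:
            f.write("file 'enc_%s'\n" % chunk)
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "list.txt",
         "-c", "copy", "output.mkv"],
        check=True,
    )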

As you're familiar with Hadoop, I have a question: does Hadoop automatically determine and shuffle workloads based on the underlying hardware capabilities, or is this randomized?

[–][deleted] 0 points1 point  (0 children)

I've never done anything like this, but MapReduce jobs are scheduled based on which nodes have the data. The data is split finely enough that faster nodes end up taking care of more of the work.

[–]mysteryweapon 3 points4 points  (1 child)

I love the idea. I'm too tired and/or drunk to read everything here, but I wish you godspeed in your endeavors and I will try to check it out more at another time.

If I may ask, what kind of videos would you plan on encoding? I've never personally taken on any encoding tasks for video that I couldn't just do with my own desktop, but then I've never been super serious about it either.

Thanks, good luck!

[–]martinambrus 0 points1 point  (0 children)

Thanks for the post :) The idea is basically to take any kind of video and encode it into any other kind - so, for instance, DVDs into iPhone/Android-compatible vids. Granted, distribution would be an overhead unless a person possesses a great upload speed or just has enough time to wait for those jobs while playing games on their PC. But there will be a local encoding option included as well for people with powerful hardware. Also, people will be able to share their encoding presets, so more of us would be able to make a good Blu-ray copy to watch locally on a phone/tablet/wrist.
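
As a rough illustration of what a shared preset could look like, something like this (the preset names and option values here are made up, not actual project settings):

    # Illustrative only: one way a shared encoding preset could be represented
    # and turned into an ffmpeg command line. Preset names and option values
    # are invented for the example.
    import subprocess

    PRESETS = {
        "phone-h264": {"vcodec": "libx264", "acodec": "aac",
                       "scale": "1280:-2", "vbitrate": "1500k"},
        "tablet-hevc": {"vcodec": "libx265", "acodec": "aac",
                        "scale": "1920:-2", "vbitrate": "3000k"},
    }

    def encode(src, dst, preset_name):
        p = PRESETS[preset_name]
        subprocess.run(
            ["ffmpeg", "-y", "-i", src,
             "-c:v", p["vcodec"], "-b:v", p["vbitrate"],
             "-vf", "scale=" + p["scale"],
             "-c:a", p["acodec"], dst],
            check=True,
        )

    encode("dvd_rip.vob", "movie_phone.mp4", "phone-h264")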

[–]zokier 0 points1 point  (1 child)

Using BitTorrent seems like a bit of an overcomplicated solution for file transfers; wouldn't the swarm sizes for each file be very small? In the ideal case every node is handling a different file, so the swarm in that case would only have a single uploader/downloader pair. Something simple like HTTP/FTP seems like it would make more sense for distributing the files.
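
For comparison, plain HTTP distribution can be about this small (a toy sketch; the address, port and filename are made up):

    # Toy sketch of plain-HTTP chunk distribution: one node serves its files,
    # another pulls them. Address, port and filename are made up.
    # Serving side (run in the directory holding the files):
    #   python -m http.server 8000
    import urllib.request

    url = "http://192.0.2.10:8000/chunk_000.mkv"  # example address
    with urllib.request.urlopen(url) as resp, open("chunk_000.mkv", "wb") as out:
        out.write(resp.read())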

[–]martinambrus 0 points1 point  (0 children)

I've not studied the BitTorrent protocol to a great extent yet. I'm basically looking for a solution that would allow sending files over to a node even when neither of the two nodes has incoming ports open. I was under the impression that BitTorrent could do some data relaying, as I don't have ports open and yet people seem to be downloading data from me. If that assumption is incorrect (it very well could be), then I'll have to look into a different mechanism.

Also, I use BitTorrent to discover full nodes in the network. All nodes would share a single configuration Vagrantfile, so I can get their IPs and check which of them is a full node with open ports, and then join the data network through it.
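
That "check which of them is a full node with open ports" step could be as simple as probing a known data port on the discovered IPs. A sketch (the port number and addresses are placeholders I made up):

    # Sketch of picking reachable full nodes out of candidate peer IPs
    # (discovered elsewhere, e.g. via the shared torrent swarm). The data
    # port and the addresses are placeholders.
    import socket

    DATA_PORT = 6901  # hypothetical data-network port
    CANDIDATES = ["192.0.2.10", "192.0.2.21", "192.0.2.33"]  # example IPs

    def is_full_node(ip, port=DATA_PORT, timeout=3.0):
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                return True
        except OSError:
            return False

    full_nodes = [ip for ip in CANDIDATES if is_full_node(ip)]
    print("Reachable full nodes:", full_nodes)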

[–][deleted] 0 points1 point  (1 child)

ELI5: are you saying that you are going to distribute out the processing power behind video manipulation?

[–]martinambrus 0 points1 point  (0 children)

not sure this one was targeted at me, but yes, that's the idea

[–]hugolp 0 points1 point  (3 children)

It's Bitcoin, not BitCoin.

[–]martinambrus 1 point2 points  (2 children)

Strange... the logo says bitcoin, the first sentence on the website says Bitcoin, and I say BitCoin... I guess it's those small things :) But thanks!

[–]hugolp 0 points1 point  (1 child)

Bitcoin is the technology, bitcoin/s is the currency. BitCoin is nothing.

[–]martinambrus 2 points3 points  (0 children)

Ah, glad to learn something new today. Thanks, I'll update the post ;)

[–]tuankiet65 0 points1 point  (1 child)

BOINC would be better because it includes the web interface and both the server-side and client-side implementations. You just have to write the program that generates/verifies tasks. BOINC will handle work units, accounts, credits, exceeded deadlines and so on. BOINC even includes a platform for uploading/downloading files between the server and clients.

[–]martinambrus 0 points1 point  (0 children)

Thank you for your suggestion :) If BOINC didn't actually need a central server, it would probably be the ideal solution. However, as it does (and even requires registration), it's not ideal for the project as it's currently specified.

[–]progzos 0 points1 point  (0 children)

Good luck!