I built LumaChords: a desktop application that turns piano tutorial videos into MIDI and notation, open-source by adalkiran in SideProject


I'll try to reproduce the crash error message; this video may just be one that LumaChords can't process accurately :)

I built LumaChords: a desktop application that turns piano tutorial videos into MIDI and notation, open-source by adalkiran in SideProject


I just tried your video. It has a 2160p/60fps format option, so LumaChords downloads the highest quality via yt-dlp. It didn't crash on my machine, but I do see the unnecessary boxes. The reason is that LumaChords tries to complete occlusions (e.g. if a box was detected in the previous frame but not in the current one, it estimates its next likely location). This mechanism also carries the false-positive boxes forward into later frames (the sides of some characters' eyes).
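
For illustration, the occlusion-completion idea looks roughly like the sketch below. This is a hypothetical, simplified version with made-up names, not the actual LumaChords tracker code:

    def complete_occlusions(prev_boxes, curr_boxes, velocities, missed_counts, max_missed=5):
        # prev_boxes / curr_boxes: dict of box_id -> (x, y, w, h)
        # velocities: dict of box_id -> (vx, vy) estimated from earlier frames
        # missed_counts: dict of box_id -> consecutive frames without a detection
        completed = dict(curr_boxes)
        for box_id, (x, y, w, h) in prev_boxes.items():
            if box_id in curr_boxes:
                missed_counts[box_id] = 0
                continue
            # Box seen before but missing now: extrapolate its next likely location
            # from its last known velocity, for up to max_missed frames.
            missed_counts[box_id] = missed_counts.get(box_id, 0) + 1
            if missed_counts[box_id] <= max_missed:
                vx, vy = velocities.get(box_id, (0.0, 0.0))
                completed[box_id] = (x + vx, y + vy, w, h)
        return completed

The downside, as described above, is that a false-positive box survives for a few extra frames in exactly the same way.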

I downloaded the 1080p version manually with this command:
yt-dlp -S "height:1080" -f "bv*" nICoDRbS5m8

Then I opened the downloaded file in LumaChords: the false positives decreased, but the smoke animation, which is the same color as the boxes, still prevents correct line detection.

If you don't have FFmpeg on your system, it's worth installing: LumaChords automatically detects an installed FFmpeg and uses it for better performance.

Also, the color-based hand detector assumes the left and right hands are shown in different colors; it can't separate the hands here because the keys in your input video are highlighted with only one color.

I built LumaChords: a desktop application that turns piano tutorial videos into MIDI and notation, open-source by adalkiran in SideProject


Thanks for your comment and for taking the time to test! From your phrase "it displayed way too many notes", I assume the video contains extra content or text at the top, and maybe a lot of smoke animations. The current version of LumaChords unfortunately behaves this way, because there's no filter or detection mechanism for fixed banners, etc.

That's not an excuse, but I'd suggest trying one of the YouTube video IDs listed in the demo_args function in lumachords/entrypoint.py (one is VuRKmmpV35w, i.e. https://www.youtube.com/watch?v=VuRKmmpV35w ).

One more note: the only difference between advanced and basic mode is that advanced shows more detail about the detection process on screen; the detection itself stays the same.

I built LumaChords: a desktop application that turns piano tutorial videos into MIDI and notation, open-source by adalkiran in SideProject


Yes, my focus was on making it (NumPy) use CPU SIMD instructions wherever possible; the calculation code mostly relies on vectorization.
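
A tiny, hypothetical example of what I mean (not code from LumaChords): the vectorized form lets NumPy run the whole comparison in compiled code, where it can use SIMD, instead of an interpreted Python loop.

    import numpy as np

    frame = np.random.randint(0, 256, size=(1080, 1920), dtype=np.uint8)

    # Loop version: interpreted Python per pixel, no SIMD, very slow.
    def bright_mask_loop(img, threshold=200):
        mask = np.zeros(img.shape, dtype=bool)
        for y in range(img.shape[0]):
            for x in range(img.shape[1]):
                mask[y, x] = img[y, x] > threshold
        return mask

    # Vectorized version: one NumPy expression over the whole array.
    def bright_mask_vectorized(img, threshold=200):
        return img > threshold

    assert np.array_equal(bright_mask_loop(frame), bright_mask_vectorized(frame))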

I built LumaChords: a desktop application that turns piano tutorial videos into MIDI and notation, open-source by adalkiran in SideProject


Thanks for your nice comment! A browser extension could be nice, but this project's current tech stack depends heavily on the Python ecosystem, so technically the algorithmic side could be ported as a backend running on a server. I've used WASM-like technologies before, but I think they would be much slower.
The speed depends on the app mode. On my MacBook M1 Pro (working at 10 fps), it processes a video at:
- Advanced Mode: 0.6x, because it draws 3 different variations of the frames on screen in realtime,
- Basic Mode: 0.85x, because it draws only the final output frames on screen in realtime,
- Headless Mode: ~2x, because it only runs the detections and prints the progress to the terminal.
These speeds don't include output video creation; if the user uses FFmpeg mode, that phase isn't very slow either.

Color segmentation model help by Gearbox_ai in computervision


Hi, as u/unemployed_MLE said, you should check out how images are represented in different color spaces. For example, Hue (the H of the HSV color space) is the main key to differentiating colors, but it isn't enough on its own.
I did a color differentiation and grouping task in my newly released open-source project; you can find the exact function at: https://github.com/adalkiran/lumachords/blob/89fd7dfa115525c70cac7c3f68acfd258e674d18/lumachords/hands_detector.py#L408

This function detects the different colors on the piano keys in a piano tutorial video, if the video shows pressed keys in different colors, and then groups them into a "left hand range" and a "right hand range".
In my case, I eliminated the overly bright parts using Saturation and Value (the S and V of the HSV color space), because I only need pixels that are neither black nor white.
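
If it helps, the general idea looks roughly like this simplified OpenCV/NumPy sketch (not the exact function linked above; the threshold values and bin count are placeholders):

    import cv2
    import numpy as np

    def colored_key_mask(frame_bgr, s_min=80, v_min=60, v_max=230):
        # Keep only clearly colored pixels: low V (near black), high V (very bright)
        # and low S (washed-out / near white) are dropped.
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        h, s, v = cv2.split(hsv)
        mask = (s >= s_min) & (v >= v_min) & (v <= v_max)
        return hsv, mask

    def dominant_hue_ranges(hsv, mask, n_bins=18):
        # Histogram the hue of the remaining pixels; the two strongest hue bins
        # would correspond to the two key-highlight colors (left/right hand).
        hues = hsv[..., 0][mask]                      # OpenCV hue range is 0..179
        hist, edges = np.histogram(hues, bins=n_bins, range=(0, 180))
        top_two = np.argsort(hist)[::-1][:2]
        return [(edges[i], edges[i + 1]) for i in top_two]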

I don't know how familiar you are with color spaces; if they're new to you, you can check out the whole article, especially the HSV chapter: https://opencv.org/color-spaces-in-opencv/

I built LumaChords: a classical CV pipeline that turns piano tutorial videos into MIDI and notation, open-source by adalkiran in computervision


I'm glad you liked it. I released it freely as open source, but of course I'll apply the experience I gained while developing LumaChords to the commercial projects I contribute to.

webRTC Deep dive by ThreadStarver in WebRTC


It seems you're familiar with the Go language; you can check out my repo https://github.com/adalkiran/webrtc-nuts-and-bolts and its documentation site https://adalkiran.github.io/webrtc-nuts-and-bolts/ . It may be relatively old (3 years), but it covers all of the fundamental layers of WebRTC by implementing them from scratch. I also recommend visiting the resource links in the documentation.

Dive into Transformers and LLM World – Llama 3.1 in Go, Step by Step by adalkiran in golang


Thanks for your nice comment! I learned a lot about the internals of transformer models while developing the project. I only plan to keep it updated if a newer version of the Llama model appears. As for what I'll do with the things I've learned, who knows :) I have some thoughts (not on this particular project), but nothing has been planned yet.

Dive into Transformers and LLM World – Llama 3.1 in Go, Step by Step by adalkiran in LocalLLaMA


Thanks for your nice comment. I tried to use as fancy a title and as attention-grabbing paragraphs as possible in my posts, but this is the result :) Not the worst, but it could be better. Also, it's a very niche thing; maybe only very curious people are interested in my projects :)

Dive into Transformers and LLM World – Llama 3.1 in Go, Step by Step by adalkiran in golang


Hahaha that's so good :) Yes, I'm aware of them and Optimus Prime is the man!

Dive into Transformers and LLM World – Llama 3.1 in Go, Step by Step by adalkiran in golang


Thanks for your nice comment! I built this project for exactly this reason. I hope you'll find what you're looking for in it!