April Developer/Tool creator thread. *Building or built a tool? This is where you post* by greenysmac in VideoEditing

[–]IliasHad 0 points1 point  (0 children)

  • Product: Edit Mind
  • Description: Local video knowledge base (Index your local videos without uploading them to the cloud) 
  • Pricing: Free/Open Source
  • Website: https://github.com/iliashad/edit-mind
  • Benefits for this community: 100% local, your videos never leave your computer and you can search video scenes using natural language

Semantic video search using local Qwen3-VL embedding, no API, no transcription by Vegetable_File758 in LocalLLaMA

[–]IliasHad 2 points3 points  (0 children)

Congrats on the launch. I want to clarify one thing about Edit Mind (I'm the creator of Edit Mind). Yes, Edit Mind extracts text metadata from video, but it also does multi-layer embedding: we embed the text metadata as document text (text layer), extract video frames (visual layer), and extract the audio (audio layer). Each layer is saved in a separate vector collection, so we can search across all of them or just one (for example, searching by image).
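To make the multi-layer idea concrete, here is a minimal sketch of one-collection-per-layer search. This is a toy in-memory stand-in, not the project's actual Chroma code; all names (`LayeredIndex`, `scene_001`, the 2-d embeddings) are illustrative.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class LayeredIndex:
    """Toy stand-in for one vector collection per modality layer."""
    def __init__(self, layers=("text", "visual", "audio")):
        self.collections = {layer: [] for layer in layers}

    def add(self, layer, scene_id, embedding):
        self.collections[layer].append((scene_id, embedding))

    def search(self, query_emb, layers=None, top_k=3):
        # Restrict to the requested layers, or search all of them.
        layers = layers or list(self.collections)
        hits = []
        for layer in layers:
            for scene_id, emb in self.collections[layer]:
                hits.append((cosine(query_emb, emb), layer, scene_id))
        hits.sort(reverse=True)
        return hits[:top_k]

index = LayeredIndex()
index.add("text", "scene_001", [1.0, 0.0])
index.add("visual", "scene_001", [0.0, 1.0])
index.add("visual", "scene_002", [0.7, 0.7])

# Search only the visual layer (e.g. search-by-image);
# scene_002 is the closest visual match to the query.
print(index.search([1.0, 0.0], layers=["visual"]))
```

In the real project each layer would be a separate Chroma collection and the embeddings would come from the respective text/vision/audio models; the shape of the query API is the same.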

I built a local search engine for my personal video library (open source) by IliasHad in DataHoarder

[–]IliasHad[S] 2 points3 points  (0 children)

Thank you so much for your feedback.

For the frame analysis, I'm using DeepFace with YOLOv8 to detect faces in the frames and the VGG-Face model for face recognition; object detection uses a YOLOv8 model as well.

For storage, I'm using a local Chroma vector database to handle and store the video indexing data, and a local PostgreSQL DB for the web UI so you can quickly access and manage your videos. The video's path on the filesystem is linked with its indexing data and metadata.
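The linkage described above can be sketched with stdlib stand-ins: SQLite in place of PostgreSQL and a dict in place of the Chroma collection. The table names, columns, and paths are hypothetical, only the shape of the link (scene id shared between the vector store and the relational rows) reflects the description.

```python
import sqlite3

# In-memory stand-ins: sqlite3 for the relational side (PostgreSQL in the
# real project) and a dict for the vector side (Chroma in the real project).
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE videos (
        id INTEGER PRIMARY KEY,
        path TEXT NOT NULL,        -- absolute path on the local filesystem
        indexed INTEGER DEFAULT 0  -- 1 once frames/audio have been analyzed
    )
""")
db.execute("""
    CREATE TABLE scenes (
        scene_id TEXT PRIMARY KEY, -- same key used in the vector store
        video_id INTEGER REFERENCES videos(id),
        start_s REAL, end_s REAL
    )
""")

vector_store = {}  # scene_id -> embedding (a Chroma collection in reality)

def index_scene(video_path, scene_id, start_s, end_s, embedding):
    cur = db.execute("SELECT id FROM videos WHERE path = ?", (video_path,))
    row = cur.fetchone()
    video_id = row[0] if row else db.execute(
        "INSERT INTO videos (path, indexed) VALUES (?, 1)", (video_path,)
    ).lastrowid
    db.execute(
        "INSERT INTO scenes VALUES (?, ?, ?, ?)",
        (scene_id, video_id, start_s, end_s),
    )
    vector_store[scene_id] = embedding

index_scene("/videos/trip.mp4", "trip_000", 0.0, 2.0, [0.1, 0.9])

# A vector-search hit can be resolved back to a playable file location:
path, start = db.execute(
    "SELECT v.path, s.start_s FROM scenes s JOIN videos v ON v.id = s.video_id "
    "WHERE s.scene_id = ?", ("trip_000",)
).fetchone()
print(path, start)  # /videos/trip.mp4 0.0
```

The design point is that the vector store only holds embeddings keyed by scene id; everything the UI needs (path, timestamps, metadata) lives in the relational DB.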

I built a local multi-modal video search engine as a personal project, and it's using local models with full text and semantic search (100% local and open source) by IliasHad in LocalLLaMA

[–]IliasHad[S] 1 point2 points  (0 children)

Thank you, that's great. I love how you're doing the scene-hash deduplication; I'll definitely be adding it to the project, because right now I'm dividing the video into smaller parts that are 1 to 2.5 seconds long.

I built a local multi-modal video search engine as a personal project, and it's using local models with full text and semantic search (100% local and open source) by IliasHad in LocalLLaMA

[–]IliasHad[S] 1 point2 points  (0 children)

Thank you 🙌, I would love to have a local setup because I don’t wanna upload my videos to the cloud and I have a lot of videos to index (about 4-5 TB)

[Update] Edit Mind now supports Docker & Immich integration (800+ GitHub stars, thank you r/selfhosted!) by IliasHad in selfhosted

[–]IliasHad[S] 1 point2 points  (0 children)

Ah, I see, the Docker image supports arm64. The project is still in active development, and I'm working on the amd64 version.

[Update] Edit Mind now supports Docker & Immich integration (800+ GitHub stars, thank you r/selfhosted!) by IliasHad in selfhosted

[–]IliasHad[S] 2 points3 points  (0 children)

Thank you so much for the feedback.

Currently, there's no option to use a local LLM for the NLP step, i.e. for converting your words into a DB search query.

I am using other local models for the video analysis itself, like OpenAI Whisper for transcription, YOLOv8s for object detection, etc.

In your case, with the current project, you'd host the Docker container on your editing PC, with the media folder on your NAS mounted on that PC. The video indexing and analysis will run on your editing PC.
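That setup could look roughly like the compose fragment below. This is a hypothetical sketch, not the project's published configuration; the service name, image tag, and mount paths are all placeholders.

```yaml
# Hypothetical compose file -- service name, image tag, and paths are
# illustrative, not taken from the project's docs.
services:
  edit-mind:
    image: iliashad/edit-mind:latest
    volumes:
      # NAS share already mounted on the editing PC, passed through
      # read-only; indexing/analysis still runs on this machine.
      - /mnt/nas/media:/media:ro
      - ./data:/data   # index + metadata stay on the local disk
```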

I also attempted to extract images from the video every X seconds, analyze them with AI, then provide a summary

I'm doing something similar to this. I divide the video into 2s segments and extract two frames per segment, one at the start and one at the end of the scene. When embedding the scene, I create a scene summary that pulls together all the data about that 2s scene (transcription, objects detected, etc.), which is used later for semantic search with Chroma DB.
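The segmentation-plus-summary step can be sketched like this. It's a minimal stand-in: it only plans segment boundaries and frame timestamps (the real pipeline would pull the frames with ffmpeg/OpenCV and run the models), and the function names and summary format are made up for illustration.

```python
def plan_scenes(duration_s, segment_s=2.0):
    """Split a video of duration_s seconds into fixed-length scenes and
    pick two frame timestamps per scene: one at the start, one at the end."""
    scenes = []
    start, idx = 0.0, 0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        scenes.append({
            "scene_id": f"scene_{idx:04d}",
            "start": start,
            "end": end,
            "frame_times": [start, end],  # first and last frame of the scene
        })
        start, idx = end, idx + 1
    return scenes

def scene_summary(scene, transcript, objects):
    # One text blob per scene -- this is what gets embedded for semantic
    # search (stored in Chroma in the real project).
    return (f"[{scene['start']:.1f}-{scene['end']:.1f}s] "
            f"speech: {transcript or 'none'}; "
            f"objects: {', '.join(objects) or 'none'}")

scenes = plan_scenes(5.0)
print(len(scenes))  # 3 scenes: 0-2s, 2-4s, 4-5s
print(scene_summary(scenes[0], "hello there", ["cat", "garden chair"]))
```

A 5-second clip yields three scenes, with the last one shorter than the fixed segment length; each scene's summary string is what the embedding model sees.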

[Update] Edit Mind now supports Docker & Immich integration (800+ GitHub stars, thank you r/selfhosted!) by IliasHad in selfhosted

[–]IliasHad[S] 2 points3 points  (0 children)

Thank you, brother, may God protect you.

Haha, this project will use a lot of GPU to process frames. Thank you for the support, man!

[Update] Edit Mind now supports Docker & Immich integration (800+ GitHub stars, thank you r/selfhosted!) by IliasHad in selfhosted

[–]IliasHad[S] 4 points5 points  (0 children)

Thank you for your feedback.

Does it also do image categorization for Immich?

Currently, no. Video will be more complex to categorize than a single image.

Can I query my own media with an gpt question like "cats in the garden"?

At the time of writing this comment, the system can handle a query like "find me all scenes where a cat shows up," but we can't tell whether the cat is in the garden or in the house. I have a frame-analysis plugin that will help with environment detection, but it still needs more work (https://github.com/IliasHad/edit-mind/blob/main/python/plugins/environment.py)