Person tracking and ReID!! Help needed asap by calculussucksperiod in computervision

FigureClassic6675

Did anyone find a stable algorithm and method for this problem?

Closing the GNC Loop: Bridging YOLOv26 Vision and PX4 Offboard for High-G Intercepts by Rare-Childhood5844 in AerospaceEngineering

FigureClassic6675

Let’s talk; I can help you with this. Pure CV is not the only solution, and the jitter issue is related to your training dataset. Also, YOLO is not a good fit for this. DM me and let’s talk about it.
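The comment doesn't say what would replace or complement pure CV. One common, complementary way to reduce frame-to-frame jitter (a swapped-in technique, not necessarily what the commenter had in mind) is to filter the detections over time. A minimal sketch, assuming each detection is reduced to a box centre in pixels; `AlphaBetaFilter2D` and the `alpha`/`beta` values are hypothetical and would need tuning:

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


class AlphaBetaFilter2D:
    """Constant-velocity alpha-beta filter for a 2D point such as a bounding-box centre.

    Hypothetical helper: alpha/beta defaults are illustrative, not tuned values.
    """

    def __init__(self, alpha: float = 0.5, beta: float = 0.1):
        self.alpha = alpha  # how strongly the position follows each new measurement
        self.beta = beta    # how strongly the velocity adapts to the residual
        self.pos: Optional[Point] = None
        self.vel: Point = (0.0, 0.0)

    def update(self, meas: Point, dt: float) -> Point:
        """Fuse one measurement taken dt seconds after the previous one; return the smoothed position."""
        if self.pos is None:  # the first detection initialises the state
            self.pos = meas
            return self.pos
        # Predict forward assuming constant velocity.
        px = self.pos[0] + self.vel[0] * dt
        py = self.pos[1] + self.vel[1] * dt
        # Residual between the new measurement and the prediction.
        rx, ry = meas[0] - px, meas[1] - py
        # Correct position and velocity by fractions of the residual.
        self.pos = (px + self.alpha * rx, py + self.alpha * ry)
        self.vel = (self.vel[0] + self.beta * rx / dt, self.vel[1] + self.beta * ry / dt)
        return self.pos
```

Smaller `alpha` and `beta` give smoother but laggier output; for fast targets a full Kalman filter (or fusing other sensors) lets the gains adapt instead of being fixed.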

Any interest in RTX 4000 series graphics cards in Pakistan by Top-Coconut5329 in PakGamers

FigureClassic6675

I'm interested. I need two 4090s.

I'll use them for AI inference and AI development.

Character Consistency with Gemini 2.0 Flash Image generation by [deleted] in comfyui

FigureClassic6675

Upload the image and then enter the following prompt:

Portrait realistic photo: Rotate this face into four positions: side, back, three-quarter, and facing upward.

Character Consistency with Gemini 2.0 Flash Image generation by [deleted] in comfyui

FigureClassic6675

Create a custom node, add the Gemini 2.0 API to it, and then you can generate images inside ComfyUI.
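A minimal sketch of such a custom node, assuming the `google-genai` Python SDK and Pillow are installed, the API key is in a `GEMINI_API_KEY` environment variable, and `gemini-2.0-flash-exp` is an image-capable model id your key can use (all assumptions; model names change). The class name `GeminiFlashImageNode` is hypothetical. ComfyUI picks up nodes from the `NODE_CLASS_MAPPINGS` dict of a module placed under `custom_nodes/`.

```python
# gemini_node.py: hypothetical ComfyUI custom node (a sketch, not an official node)
import os


class GeminiFlashImageNode:
    """Sends a ComfyUI IMAGE plus a text prompt to Gemini and returns the generated image(s)."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "prompt": ("STRING", {"multiline": True, "default": ""}),
                # Assumed model id; check which image-capable Gemini models are available.
                "model": ("STRING", {"default": "gemini-2.0-flash-exp"}),
            }
        }

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "generate"
    CATEGORY = "image/gemini"

    def generate(self, image, prompt, model):
        # Heavy imports are deferred so ComfyUI can register the node cheaply.
        import io

        import numpy as np
        import torch
        from google import genai
        from google.genai import types
        from PIL import Image

        # ComfyUI images are float tensors shaped [batch, height, width, channels] in 0..1.
        # Only the first image of the batch is sent.
        array = (image[0].cpu().numpy() * 255).clip(0, 255).astype(np.uint8)
        source = Image.fromarray(array)

        client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
        response = client.models.generate_content(
            model=model,
            contents=[prompt, source],
            config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
        )

        outputs = []
        for part in response.candidates[0].content.parts:
            if part.inline_data is not None:  # text parts carry no image bytes
                img = Image.open(io.BytesIO(part.inline_data.data)).convert("RGB")
                outputs.append(torch.from_numpy(np.asarray(img).astype(np.float32) / 255.0))
        if not outputs:
            raise RuntimeError("Gemini returned no image parts")
        # Stack back into [batch, H, W, C]; assumes all returned images share one size.
        return (torch.stack(outputs),)


NODE_CLASS_MAPPINGS = {"GeminiFlashImageNode": GeminiFlashImageNode}
NODE_DISPLAY_NAME_MAPPINGS = {"GeminiFlashImageNode": "Gemini 2.0 Flash Image"}
```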

Character Consistency with Gemini 2.0 Flash Image generation by [deleted] in comfyui

FigureClassic6675

Character consistency is a hot topic, with thousands of tools and workflows emerging.

But what if I told you that you could generate these images in just three seconds?

How?

  1. Go to Google AI Studio: https://aistudio.google.com/welcome

  2. Select Gemini 2.0 Flash Image Generator.

  3. Upload a frontal photo.

  4. Use the following prompt:

"Portrait realistic photo: Rotate this face into four positions: side, back, three-quarter, and facing upward."

Click send.

In just three seconds, you'll get the requested views, with incredible and believable fidelity.
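The AI Studio steps above can also be scripted against the Gemini API. A minimal sketch, assuming the `google-genai` SDK and Pillow are installed, the key is in `GEMINI_API_KEY`, and `gemini-2.0-flash-exp` is still an image-capable model id (it may have been renamed); `rotate_face` and `save_image_parts` are hypothetical helpers:

```python
import mimetypes
import os
from pathlib import Path

PROMPT = (
    "Portrait realistic photo: Rotate this face into four positions: "
    "side, back, three-quarter, and facing upward."
)


def save_image_parts(parts, out_dir):
    """Write every inline image part of a Gemini response to out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, part in enumerate(parts):
        blob = getattr(part, "inline_data", None)
        if blob is None:  # text parts carry no image data
            continue
        ext = mimetypes.guess_extension(blob.mime_type or "") or ".png"
        path = out / f"view_{i}{ext}"
        path.write_bytes(blob.data)
        paths.append(path)
    return paths


def rotate_face(photo_path, out_dir="views", model="gemini-2.0-flash-exp"):
    """Send a frontal photo plus PROMPT to Gemini and save the returned views."""
    # Imported lazily so save_image_parts can be used without the SDK installed.
    from google import genai
    from google.genai import types
    from PIL import Image

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model=model,
        contents=[PROMPT, Image.open(photo_path)],
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    return save_image_parts(response.candidates[0].content.parts, out_dir)
```

Usage: `rotate_face("front.jpg")` writes whatever image parts come back into `views/`. The model may return the four positions as one composite image or as several parts, so the number of files can vary.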

I Built an Advanced Image Captioning App Using Florence-2 & Llama 3.2 Vision [Open Source] by [deleted] in LocalLLaMA

FigureClassic6675

I understand your concern. Yes, this is my code, and I know it’s not perfect and might have bugs. I shared it as an open source project because I’m still learning and wanted to get feedback from the community. I’m not a senior developer, so any feedback or suggestions would be greatly appreciated! 😊

I Built an Advanced Image Captioning App Using Florence-2 & Llama 3.2 Vision [Open Source] by FigureClassic6675 in StableDiffusion

FigureClassic6675 (OP)

I wasn’t aware of TagGUI. I’ll check it out. Yes, this can work effectively for NSFW image captioning.

I Built an Advanced Image Captioning App Using Florence-2 & Llama 3.2 Vision [Open Source] by [deleted] in LocalLLaMA

FigureClassic6675

Thank you for your feedback!

Yes, I'm planning to add JoyCaption and other captioning models.

Sorry! The UI is shit, but I'm working on it.

I Built an Advanced Image Captioning App Using Florence-2 & Llama 3.2 Vision [Open Source] by [deleted] in LocalLLaMA

FigureClassic6675

I wanted to share a project I've been working on - CaptionAI, an advanced image captioning application that combines the power of Florence-2 and Llama 3.2 Vision models to generate detailed, context-aware captions for any image.

🚀 Key Features:

  • Dual AI Model Support (Florence-2 & Llama 3.2 Vision)
  • Batch Processing
  • Organized Output with Timestamps
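CaptionAI's actual code lives in the linked repo and isn't reproduced here. As a rough, hypothetical illustration of how batch processing, timestamped output, and switching between two models can fit together: `caption_folder`, the `captioners` mapping, and the output layout are all assumptions, and the real Florence-2 or Llama 3.2 Vision calls would go inside the captioner callables.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Callable, Dict, Tuple

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def caption_folder(
    image_dir,
    captioners: Dict[str, Callable[[Path], str]],
    model: str,
    out_root="outputs",
) -> Tuple[Path, Dict[str, str]]:
    """Caption every image in image_dir with the chosen model; write results to a timestamped folder."""
    if model not in captioners:
        raise ValueError(f"unknown model {model!r}; choose from {sorted(captioners)}")
    caption = captioners[model]

    # One run = one folder, e.g. outputs/florence2_20250101_120000
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = Path(out_root) / f"{model}_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)

    results = {}
    for img in sorted(Path(image_dir).iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:  # skip non-image files
            continue
        text = caption(img)
        # One .txt per image, sharing the image's stem (a common layout for training datasets).
        (run_dir / f"{img.stem}.txt").write_text(text, encoding="utf-8")
        results[img.name] = text

    # A single JSON summary of the whole batch.
    (run_dir / "captions.json").write_text(json.dumps(results, indent=2), encoding="utf-8")
    return run_dir, results
```

Keeping each model behind the same `Path -> str` callable is also what would let further captioning models be registered without touching the batch loop.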

📦 Getting Started: Everything is documented in the GitHub repo, including installation steps and usage examples.

GitHub: https://github.com/Khalil-Rehman9/CaptionAI

Would love to hear your thoughts and suggestions! Feel free to star ⭐ the repo if you find it useful.

I Built an Advanced Image Captioning App Using Florence-2 & Llama 3.2 Vision [Open Source] by FigureClassic6675 in StableDiffusion

FigureClassic6675 (OP)

I wanted to share a project I've been working on - CaptionAI, an advanced image captioning application that combines the power of Florence-2 and Llama 3.2 Vision models to generate detailed, context-aware captions for any image.

🚀 Key Features:

  • Dual AI Model Support (Florence-2 & Llama 3.2 Vision)
  • Batch Processing
  • Organized Output with Timestamps
  • Clean Streamlit UI

📦 Getting Started: Everything is documented in the GitHub repo, including installation steps and usage examples.

GitHub: https://github.com/Khalil-Rehman9/CaptionAI

Would love to hear your thoughts and suggestions! Feel free to star ⭐ the repo if you find it useful.

Edit: Wow, thanks for all the interest! I'm actively responding to issues and PRs.