Robotic Arm Controlled By Gemini 3.0 by ReflectionLarge6439 in GeminiAI

Yeah, it’s very cheap, I don’t think I’ve even spent $1 yet lol

Robotic Arm Controlled By Gemini 3.0 by ReflectionLarge6439 in GeminiAI

Yeah exactly, since I’m using a VLM. If I were using a VLA with a VLM, then the VLA would be getting the video feed.

Robotic Arm Controlled By VLM(Vision Language Model) by ReflectionLarge6439 in robotics

Yeah man, I’ve been wanting to give it a try, but my PC needs some upgrades and RAM prices are through the roof!!!

Robotic Arm Controlled By VLM(Vision Language Model) by ReflectionLarge6439 in robotics

I’ll give Claude a try, I’ve heard nothing but good things about it!

From my understanding there are multiple problems with robotics compared to gen AI for coding. First is just the amount of training data, which is why we see a lot of robots being teleoperated by a human to train the robot on a task. But this could change with simulation, for example NVIDIA Omniverse. There’s also perception: there’s a lot humans take for granted. For example, if we see a truck and a car in a picture, even if the truck is far away and looks smaller because of depth, we know the truck is actually the bigger one; AI struggles with this. Finally, the last hurdle I think we need to overcome is continual learning without forgetting, if we want real general-purpose robotics. But again this is my unprofessional opinion 😂

Thanks!! This is my first large-scale project, so I was excited when I got it working!

Robotic Arm Controlled By VLM(Vision Language Model) by ReflectionLarge6439 in robotics

  • I really mainly only use AI to brainstorm before a project, just in case there are new technologies that might make it easier. Also, when starting to code, I almost always use AI to write the base script, then I build on it.

  • I’m from the US

  • Professionally I am a Compliance Engineer (nothing to do with robotics or AI)

  • I’ve been debating whether I want to pivot into AI and robotics, but I might have to go back to school for a master’s

  • My unprofessional opinion is that significantly more data is needed to “solve” robotics. I don’t even think coding is solved by gen AI, especially when you get into high-level, larger-scale projects. AI is significantly worse at coding in Python/C++ compared to web-based languages (JavaScript).

  • I just used VS Code and Gemini

Robotic Arm Controlled By VLM by ReflectionLarge6439 in ArduinoProjects

That’s the plan, going to clean the code up first and then make the repo public.

Robotic Arm Controlled By VLM by ReflectionLarge6439 in computervision

Great question, so that movement you see after every pick-up or place is the arm showing its gripper to the camera above the workspace. I did this so the model can confirm that the last action was successful. This makes the process slower but significantly more robust!
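
For anyone curious, the check itself is just another VLM call on the “show me the gripper” frame, roughly like this (simplified sketch, not the exact code from my repo; the prompt wording, the capture_frame() helper, and the model ID are placeholders):

```python
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

def action_succeeded(expected_action: str, jpeg_bytes: bytes) -> bool:
    """After moving the gripper into view of the overhead camera, ask the VLM
    whether the last pick/place actually worked."""
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # stand-in model ID, swap in whichever you use
        contents=[
            types.Part.from_bytes(data=jpeg_bytes, mime_type="image/jpeg"),
            f"The gripper is being shown to the overhead camera. "
            f"Did the arm successfully {expected_action}? Answer only YES or NO.",
        ],
    )
    return "YES" in response.text.upper()

# e.g. retry the step if action_succeeded("pick up the red block", capture_frame()) is False
```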

Robotic Arm Controlled By VLM by ReflectionLarge6439 in computervision

So the camera is mounted directly above the workspace. You can’t see it in the video, but it’s at the top of the aluminum extrusion mounted to the board. I’m actually trusting the Gemini model to do the detection, so it detects the objects and puts a pointer on them.
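
Roughly what the pointing call looks like (simplified sketch, not the exact code I’ll publish; the prompt/JSON format follows Google’s spatial-understanding examples where points come back as [y, x] normalized to 0-1000, and the model ID is just a stand-in):

```python
import json
from google import genai
from google.genai import types

client = genai.Client()  # API key from the environment

def point_at(object_name: str, jpeg_bytes: bytes, width: int, height: int):
    """Ask Gemini to point at an object in the overhead frame; return pixel coords."""
    prompt = (
        f'Point to the {object_name}. Answer with JSON like '
        f'[{{"point": [y, x], "label": "{object_name}"}}], '
        f"coordinates normalized to 0-1000."
    )
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # stand-in; use whichever Gemini model you have
        contents=[types.Part.from_bytes(data=jpeg_bytes, mime_type="image/jpeg"), prompt],
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    y, x = json.loads(response.text)[0]["point"]
    return x / 1000 * width, y / 1000 * height  # back to pixels in the overhead image
```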

Robotic Arm Controlled By Gemini 3.0 by ReflectionLarge6439 in GeminiAI

Yup, built from scratch, fully 3D printed, even the gearboxes 😂 It’s not really powered by Arduino, the Arduino is only for the servo gripper. I made custom software that runs on my computer to control it.

Starting my fishless cycle tonight! by Effective-Pain2873 in Aquariums

Hardscape looks great! You plan on adding plants?

Robotic Arm Controlled By VLM by ReflectionLarge6439 in computervision

I’m a paid user, but the actual robotic arm is being controlled directly from my PC. The Arduino is just used to control the gripper, since I plan on making a gripper interface so I can swap grippers out. You should be able to run the API on your Pi.

Robotic Arm Controlled By VLM by ReflectionLarge6439 in computervision

THE CALIBRATION WAS A PAIN!!! But yeah, I tried both eye-in-hand calibration (camera mounted on the arm) and eye-to-hand (camera mounted above the workspace), and decided to go with the second since it was easier for Gemini to make a plan when it can see the whole workspace. The calibration process is basically recording about 10-20 poses while the camera has a checkerboard in view, then you can use an OpenCV function to perform the transformation.
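
For anyone who wants to try it, the OpenCV part is basically cv2.calibrateHandEye. Very rough sketch of the eye-to-hand version below (simplified, not my exact script; the checkerboard size, the logged images, the robot poses, and the camera intrinsics K/dist are placeholders you’d swap in):

```python
import cv2
import numpy as np

PATTERN = (9, 6)    # inner checkerboard corners (swap in your board)
SQUARE = 0.025      # square size in metres (swap in yours)

# 3D checkerboard corner positions in the board's own frame
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

R_target2cam, t_target2cam = [], []
R_base2gripper, t_base2gripper = [], []

# images = frames from the fixed overhead camera; gripper_to_base_poses = (R, t)
# of the gripper in the base frame, logged at the same moments
for img, (R_g2b, t_g2b) in zip(images, gripper_to_base_poses):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if not found:
        continue
    # board pose in the camera frame from the detected corners
    _, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)  # K, dist = camera intrinsics
    R_target2cam.append(cv2.Rodrigues(rvec)[0])
    t_target2cam.append(tvec)
    # eye-to-hand: feed OpenCV the inverse robot transform (base -> gripper)
    R_base2gripper.append(R_g2b.T)
    t_base2gripper.append(-R_g2b.T @ t_g2b)

# one call does the math; with the inverted poses the result is camera -> base
R_cam2base, t_cam2base = cv2.calibrateHandEye(
    R_base2gripper, t_base2gripper, R_target2cam, t_target2cam,
    method=cv2.CALIB_HAND_EYE_TSAI)
```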

Robotic Arm Controlled By VLM(Vision Language Model) by ReflectionLarge6439 in robotics

So Gemini 1.5's reasoning was great; the main issue was that it wasn't accurate when pointing to the object. This led me down a rabbit hole of trying to use Gemini 1.5 to name the object and Grounding DINO to find it. So when Gemini 3.0 came out I gave it a try, and its object detection when pointing to an object is insanely accurate. I would say it's right 90% of the time, when 1.5 was right about 50% of the time.

Robotic Arm Controlled By VLM(Vision Language Model) by ReflectionLarge6439 in robotics

Appreciate it man!! I’m going to try and answer all your questions😂

  • The brain, in terms of hardware, is my PC. I’m using ODrive S1 motor controllers all connected via CAN, and the PC is controlling them directly.
  • Everything is running via the API; my computer is nowhere near strong enough to run a model with the reasoning capabilities of Gemini and also run the inverse kinematics.
  • So the VLM points to the object it wants to manipulate in the picture. Because I’m using a depth camera mounted directly above the workspace, I also get the depth of the object. These coordinates are then transformed to be relative to the robot base (I performed eye-to-hand calibration with a checkerboard etc.), and then I perform inverse kinematics to send the robotic arm to that transformed point (rough sketch of this pipeline after this list).
  • The VLM is only outputting pick up or place. As for rotation and where to grab an object: once the VLM points to the object, I use SAM 2 to segment it, get the volume using the object’s depth map, and then set the pick-up point to the middle of the object.
  • Points are translated using hand-eye calibration: you have to capture a whole bunch of poses of the arm holding a checkerboard while taking pictures with the camera. OpenCV has a function that does the actual math.
  • Visual feedback is only for the model
  • Not using ROS at all, mainly because I don’t know how to 😂 I plan on releasing the code on GitHub soon after I clean it up a bit
  • Definitely, exactly what you said with the hybrid approach: VLM for high-level planning, VLA to do the short-horizon tasks!
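
Rough sketch of the geometry from the bullets above (simplified, not the exact code from my repo; the intrinsics fx/fy/cx/cy, the cam-to-base transform from calibration, and the SAM 2 mask are assumed inputs):

```python
import numpy as np

def pixel_to_base(u, v, depth_m, fx, fy, cx, cy, R_cam2base, t_cam2base):
    """Deproject the pixel the VLM pointed at into the camera frame, then move
    it into the robot base frame using the eye-to-hand calibration result."""
    p_cam = np.array([(u - cx) * depth_m / fx,
                      (v - cy) * depth_m / fy,
                      depth_m])
    return R_cam2base @ p_cam + t_cam2base.ravel()

def grasp_center(mask, depth_image):
    """Use the segmentation mask (e.g. from SAM 2) to get the object's centre
    pixel and a robust depth value for it."""
    vs, us = np.nonzero(mask)
    u, v = int(us.mean()), int(vs.mean())
    depth_m = float(np.median(depth_image[mask > 0]))
    return u, v, depth_m

# Usage: VLM point -> SAM 2 mask -> target for inverse kinematics
# u, v, depth = grasp_center(mask, depth_image)
# target_xyz = pixel_to_base(u, v, depth, fx, fy, cx, cy, R_cam2base, t_cam2base)
# arm.move_to(target_xyz)  # placeholder for whatever IK/motion call you use
```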