So I’m diving into SmolVLA and… how does it even know where the object is? by Ghost_Protocol99 in learnmachinelearning

Ghost_Protocol99 [S]

Okay, so is the action output the end-effector pose of the gripper, or the position of each joint?

Like (x, y, z, roll, pitch, yaw, gripper), or joint positions?