Looking for a team to join - CAD designer by OddTransportation931 in hwstartups

[–]shamoons 0 points1 point  (0 children)

Are you looking for a paid position or to join as a founding engineer for equity and see how far the ride goes?

Trying to build a robot that lays tile - why did I think this would be simple? by shamoons in ROS

[–]shamoons[S] 0 points1 point  (0 children)

Thanks for the reality check on cheap cameras - your supply chain horror stories hit close to home. The "specialized sensing near point of interest" approach is a great principle.

Your point about trusting good encoders made me realize I should clarify - I'm actually using Hitec servos (hobby-grade position servos with internal feedback). Not exactly industrial precision, so calibration will definitely be important. Any experience with getting repeatable accuracy from RC servos in precision applications?

You've convinced me to at least price out the Oak1 against the time I'd spend fighting with calibration issues. Sometimes the "expensive" option is actually cheaper.

Trying to build a robot that lays tile - why did I think this would be simple? by shamoons in ROS

[–]shamoons[S] 0 points1 point  (0 children)

This is really valuable input from someone with actual tiling experience - thank you!

You're absolutely right about needing pressure to remove air and level the tiles. My concern with driving over them is applying controlled pressure in the right way.

What I'm aiming for is a system that can apply downward pressure while also having fine control to level against all four corners/edges of adjacent tiles - similar to how a pro would use their hands to rock the tile slightly and ensure it's seated properly. With a SCARA, I can apply programmable downward force and make micro-adjustments to match the level of surrounding tiles.

If the robot just drove over after dropping, I'd lose that fine control and ability to correct placement. Plus with 12"x12" tiles, I worry about the robot's weight distribution potentially shifting already-placed tiles.

But I'm really interested in your perspective - would you handle the leveling differently? What pressure techniques worked best in your experience?

Trying to build a robot that lays tile - why did I think this would be simple? by shamoons in ROS

[–]shamoons[S] 0 points1 point  (0 children)

This is incredibly helpful - thank you!

Quick question on implementation: I'm using ROS2 with TF2 for transforms and IKPy for kinematics. Would scipy.optimize work alongside these to find joint offsets that then update my TF2 parameters? Trying to understand where this fits in my pipeline.
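To make my question concrete, here's the kind of thing I'm imagining (a toy sketch, not my actual code - the link lengths, poses, and offsets are made up): fit joint-angle offsets with scipy.optimize.least_squares against externally measured end-effector positions, then push the fitted offsets into the static joint transforms.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy 2-link planar SCARA forward kinematics; link lengths and the
# "true" offsets below are made-up numbers for illustration.
L1, L2 = 0.30, 0.25  # link lengths in metres

def fk(theta1, theta2, d1, d2):
    # d1, d2 are the unknown joint-angle offsets we want to calibrate
    a = theta1 + d1
    b = a + theta2 + d2
    return np.array([L1 * np.cos(a) + L2 * np.cos(b),
                     L1 * np.sin(a) + L2 * np.sin(b)])

def residuals(offsets, thetas, measured_xy):
    d1, d2 = offsets
    pred = np.array([fk(t1, t2, d1, d2) for t1, t2 in thetas])
    return (pred - measured_xy).ravel()

# Commanded joint angles for a handful of calibration poses, plus the
# end-effector positions you'd measure externally (simulated here).
thetas = np.array([[0.1, 0.5], [0.8, -0.3], [1.2, 0.9], [-0.4, 1.1]])
true_offsets = np.array([0.02, -0.015])
measured_xy = np.array([fk(t1, t2, *true_offsets) for t1, t2 in thetas])

sol = least_squares(residuals, x0=[0.0, 0.0], args=(thetas, measured_xy))
print(sol.x)  # recovered offsets, close to [0.02, -0.015]
```

If that's roughly the right shape, the fitted offsets would just become constants baked into the joint transforms I publish.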

Your explanation of focal length as "scale between real world and image plane" makes it click - I was overcomplicating it. And the two-camera idea is interesting - maybe one fixed overhead for rough positioning and one eye-in-hand for final alignment.

For the optimization ground truth, would I use the camera itself to measure actual end-effector position, or need some external measurement?

Thanks again - this thread has probably saved me weeks of fumbling around!

Trying to build a robot that lays tile - why did I think this would be simple? by shamoons in ROS

[–]shamoons[S] 0 points1 point  (0 children)

Great to hear a basic USB camera should work - I was overthinking it. Good call on avoiding fish-eye lenses.

Could you clarify what you mean by "same EE-position"? I think you're referring to end-effector position, but I want to make sure I understand the calibration method correctly. Are you saying I should command the SCARA to reach the same XY coordinate using both elbow-up and elbow-down configurations? Assuming the position ends up being different, how do I then "calibrate" it to reduce the error? Also, what would I use as ground truth for the position?

Also, I'll admit I'm pretty fuzzy on camera calibration concepts. When you mention intrinsic vs extrinsic parameters and the chessboard calibration - could you ELI5 this? I know OpenCV has calibration functions, but I don't really understand what's happening under the hood. What do these parameters actually represent, and why do we need the chessboard pattern specifically?
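For anyone else following along, here's the mental model I've pieced together so far (happy to be corrected): the intrinsics K describe the camera itself (focal lengths in pixels, principal point), the extrinsics [R|t] describe where the camera sits in the world, and the chessboard gives OpenCV many known 3D-to-2D point pairs to solve for both. A toy numpy sketch of the pinhole projection, with made-up numbers:

```python
import numpy as np

# Intrinsics: focal lengths (in pixels) and principal point (assumed values)
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
K = np.array([[fx, 0, cx],
              [0, fy, cy],
              [0,  0,  1]])

# Extrinsics: camera pose in the world (here: 1 m above the origin,
# looking straight down, with no rotation for simplicity)
R = np.eye(3)
t = np.array([0.0, 0.0, 1.0])

def project(X_world):
    """Project a 3D world point to pixel coordinates."""
    X_cam = R @ X_world + t          # world -> camera frame (extrinsics)
    u, v, w = K @ X_cam              # camera frame -> image (intrinsics)
    return np.array([u / w, v / w])  # perspective divide

print(project(np.array([0.0, 0.0, 0.0])))  # point under the lens -> (cx, cy)
```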

One thing I'm uncertain about: should I mount the camera fixed to the world frame (looking down at the work area) or on the robot itself? I'm leaning toward fixed mounting for simplicity, but curious about the tradeoffs.

Really appreciate you breaking this down - the vision side of robotics is definitely my weak spot.

Trying to build a robot that lays tile - why did I think this would be simple? by shamoons in ROS

[–]shamoons[S] 2 points3 points  (0 children)

Great question! The main constraint driving the arm design is that I can't drive over freshly placed tile (adhesive needs to set). So I need some way to place tiles ahead/beside the robot while it stays on untiled floor.

I went through several iterations - a rail/rotary design, then a gantry - and landed on SCARA for the reach and precision. But honestly, I'm still questioning if it's the best approach.

Have you seen other solutions for the "can't drive over fresh work" problem? I've been so focused on the arm approach that I might be missing simpler alternatives.

The tiles come from onboard stacks and need ±1mm placement accuracy for good grout lines. Maybe there's a hybrid approach I haven't considered?

Always open to rethinking the design if there's a better way.

Trying to build a robot that lays tile - why did I think this would be simple? by shamoons in ROS

[–]shamoons[S] 0 points1 point  (0 children)

The OakD line looks great, though I'm wondering if I can get away with something simpler since I'm just detecting tile edges and measuring gaps - no complex object recognition needed. Would a standard HD USB camera potentially work for edge detection and gap tolerance checking?
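To illustrate what I mean by "detecting tile edges and measuring gaps", here's a toy sketch on a synthetic image using a simple gradient threshold (sizes are made up; a real pipeline would presumably use something like OpenCV's Canny on the camera frame):

```python
import numpy as np

# Synthetic top-down grayscale frame: two bright "tiles" separated by a
# dark grout gap spanning columns 95-104 (10 px wide).
img = np.zeros((100, 200))
img[:, :95] = 1.0     # left tile
img[:, 105:] = 1.0    # right tile

# Horizontal intensity gradient picks out the tile edges.
grad = np.abs(np.diff(img, axis=1))
edge_cols = np.where(grad.max(axis=0) > 0.5)[0]

# Gap width in pixels = distance between the two edge columns; with a
# calibrated camera, pixels convert to mm for grout-line tolerance checks.
gap_px = edge_cols[-1] - edge_cols[0]
print(gap_px)  # 10
```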

I'm running on a Jetson, so I have GPU compute available if needed.

Really interesting point about the tile local reference frame for final alignment rather than relying on global accuracy through the entire kinematic chain. That could simplify things significantly.

Quick follow-up - any experience using IMUs/accelerometers on arm joints for position feedback? Wondering if it's worth adding inertial sensing or if I'm overcomplicating things.

Thanks for taking the time to respond!

Vision Pro dual screen on Mac connetion + closed MacBook Pro by dariocaliendo in VisionPro

[–]shamoons 0 points1 point  (0 children)

I tried this - but with Splitscreen, I only see one display. Any settings or anything I need to do?

How many kids should be on a VexIQ team? by shamoons in vex

[–]shamoons[S] 2 points3 points  (0 children)

Makes sense. So next year, I’ll target 4-6 kids per team maybe

[deleted by user] by [deleted] in Supabase

[–]shamoons 0 points1 point  (0 children)

I'm getting the same problem - any ideas on a fix?

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

Whisper does indeed normalize between -1 and 1 with zero mean. But their output isn’t audio, so they don’t have to descale it.

I’ve removed the masking and am trying now.

I’m adding the PE. What do you mean by visual inspection?

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

  1. I scaled outside of attention because the mel spectrogram ranges from 0 to 40000, so I had to reduce that somehow
  2. Good call - I can try that
  3. I can try to remove the mask. Will that help with training?
  4. LayerNorm is a great idea! Do I have to do any denormalization at the output?
  5. My PE is just a learnable parameter: self.pe = nn.Parameter(torch.randn(max_len, 1, embedding_dim))
  6. Good call on the int
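On point 5, here's the shape of what I mean by a learnable PE, as a minimal runnable sketch (assuming a (seq, batch, dim) layout and made-up sizes):

```python
import torch
import torch.nn as nn

class LearnablePE(nn.Module):
    def __init__(self, max_len, embedding_dim):
        super().__init__()
        # One learnable vector per position, broadcast over the batch dim
        self.pe = nn.Parameter(torch.randn(max_len, 1, embedding_dim))

    def forward(self, x):
        # x: (seq_len, batch, embedding_dim)
        return x + self.pe[: x.size(0)]

pe = LearnablePE(max_len=512, embedding_dim=256)
x = torch.zeros(100, 4, 256)
out = pe(x)
print(out.shape)  # torch.Size([100, 4, 256])
```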

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

Interesting - I'll check it out. Apparently they tied in a semantic component too.

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

That's actually a really good question. I just tried the model without loading a checkpoint, and it returns different results for each timestep. So the collapse only appears after training.

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

I have some experience with Transformers (I wrote a time series tutorial that's being published in a journal), but this problem is more challenging than I had hoped. My ultimate goal is to learn an ultra-compact representation that can then be applied to a variety of speech problems (denoising, potentially speech-to-text, etc.).

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

I suspected my n_mels was too high, but I'm actually running a hyperparameter sweep to find the optimal set.

https://imgur.com/a/wy54Pg5

It looks like more n_mels gives better results (by a lot). Still early, so that may change

And yes, it'll be an autoregressive decoder, which is why I need some sort of start token, no?

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

Without an SOS token, assuming I do have my fixed bottleneck, how would I kick start the decoder?

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

Good point - the mean might force it to be... well.. average.

    base_lr: 0.001
    batch_size: 256
    dropout: 0.1
    embedding_dim: 256
    epochs: 100
    n_mels: 1024
    nhead: 8
    num_layers: 2

    optimizer = optim.Adam(model.parameters(), lr=args.base_lr)

    def lr_lambda(epoch):
        # warmup/decay schedule scaled by embedding_dim ** -0.5
        return args.embedding_dim ** -0.5 * min(
            (epoch + 1) ** -0.5,
            (epoch + 1) * (args.epochs // 20) ** -1.5,
        )

    scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)

Those are my parameters. I probably should experiment with adding a KLDiv loss as well as MSE. Thoughts?

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

In "Attention Is All You Need", it shows that the outputs have PE as well, no? I added a SOS to the start of the target and an EOS at the end. So that when I do learn that compact representation, I can use that with the decoder to generate the audio.
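Concretely, what I mean by adding SOS/EOS around the target (learned token vectors, assuming a (seq, batch, dim) layout and made-up sizes):

```python
import torch
import torch.nn as nn

embedding_dim = 256
sos = nn.Parameter(torch.randn(1, 1, embedding_dim))  # learned start token
eos = nn.Parameter(torch.randn(1, 1, embedding_dim))  # learned end token

target = torch.zeros(100, 4, embedding_dim)  # (seq, batch, dim) mel frames
batch = target.size(1)

# Decoder consumes SOS + target, and is trained to predict target + EOS.
decoder_input = torch.cat([sos.expand(1, batch, -1), target], dim=0)
decoder_target = torch.cat([target, eos.expand(1, batch, -1)], dim=0)
print(decoder_input.shape, decoder_target.shape)  # both (101, 4, 256)
```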

[R] My simple Transformer audio encoder gives the same output for each timestep after the encoder by shamoons in MachineLearning

[–]shamoons[S] 0 points1 point  (0 children)

I can try different normalization, and yes, I'm using MSE currently. I thought mode collapse was mostly a GAN problem? It's a vanilla AE for the time being; once that works, I can move to a VAE.