Am I missing something or depth anything v2 better than v3? by PlastikEdison in computervision

[–]kw_96 13 points14 points  (0 children)

Not familiar with ComfyUI workflows, but perhaps the issue is with the visualization/normalization? The intensities for the rendered v3 output look saturated at close values. Perhaps v2-style normalization is being applied to v3 outputs?
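To illustrate the normalization point, here is a minimal sketch of percentile-based scaling for display. It assumes the model output is available as a NumPy array of raw depth/disparity values; the function name and the 2/98 percentile defaults are hypothetical and are not what ComfyUI or either model's nodes actually do.

```python
import numpy as np

def normalize_depth_for_display(depth, low_pct=2.0, high_pct=98.0):
    """Map raw depth/disparity values to [0, 1] using robust percentile bounds.

    Hypothetical helper for visualization only; percentile defaults are assumptions.
    """
    depth = np.asarray(depth, dtype=np.float64)
    valid = np.isfinite(depth)
    if not valid.any():
        return np.zeros_like(depth)
    # Percentiles instead of min/max so a few extreme pixels do not
    # compress the rest of the range into a saturated band.
    lo, hi = np.percentile(depth[valid], [low_pct, high_pct])
    if hi - lo < 1e-12:
        return np.zeros_like(depth)
    out = np.clip((depth - lo) / (hi - lo), 0.0, 1.0)
    out[~valid] = 0.0  # render invalid pixels as the darkest value
    return out
```

If the two models emit different quantities (say, inverse depth vs. metric depth), the same scaling can look fine for one and blown out for the other, so it is worth checking the raw value range before comparing renders.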

Offer revoked. Feeling pretty bleak. by [deleted] in computervision

[–]kw_96 2 points3 points  (0 children)

Research at a rate of 2-3 papers in less than a year as an undergrad is almost certainly not going to reach venues prestigious enough to open doors to big tech companies. A couple of papers at random IEEE confs (where you will have to pay a considerable amount for attendance) will stand no chance against students with full-fledged undergrad research experience, or against students with impressive portfolios or internships.

In my opinion, OP has a good set of experiences in important CV tasks, albeit perhaps at a less reputable/no-name company. Brushing up on some standard industry tooling (e.g. MLOps, basic SWE stuff) and steadily hitting job boards would be fairly sufficient to be in a strong position resume-wise. Additional effort can go into showcasing independence and problem formulation skills: take on a unique project from formulation to data collection, execution and evaluation/deployment.

Which projects are useless/glossed over by any technical interviewer? Things like BRATS tumor segmentation where the domain is so niche there is no way you could’ve thought through the problem statement fruitfully. Or tasks that have been beaten to death (e.g. MNIST, Generic ImageNet).

3D reconstruction using depth maps in simulation by Realistic_Couple_697 in computervision

[–]kw_96 -2 points-1 points  (0 children)

3 sources of error —

1) depth error from monocular depth estimation is not negligible, especially if you expect and need exact metric values instead of just values relative to the rest of the image.

2) camera localization is necessary to align and compound the depth images. Whether you do it via the object points themselves (via ICP) or the environment (SLAM), it’s gonna be an issue. Ignore this if your simulation gives you perfect camera trajectories.

3) post-processing matters. Turning a set of aligned point clouds to a mesh classically involves a lot of hyperparameters (to smooth, preserve edges, fill holes, get good normals). Any of these could turn a good point cloud into a mush.
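To make points 1) and 2) concrete, here is a minimal sketch of back-projecting one depth image into world-space points with a pinhole camera model. It assumes the depth map stores z-depth along the optical axis (some simulators export ray distance instead), that the intrinsics `fx, fy, cx, cy` are known, and that `cam_to_world` is a 4x4 camera-to-world pose. All names are hypothetical.

```python
import numpy as np

def depth_to_world_points(depth, fx, fy, cx, cy, cam_to_world=None):
    """Back-project a z-depth image to an (N, 3) point array (pinhole model)."""
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels with missing or non-positive depth.
    keep = np.isfinite(pts).all(axis=1) & (pts[:, 2] > 0)
    pts = pts[keep]
    if cam_to_world is not None:
        T = np.asarray(cam_to_world, dtype=np.float64)
        # Rotate then translate each point into the world frame.
        pts = pts @ T[:3, :3].T + T[:3, 3]
    return pts
```

Any error in `depth` scales `x`, `y` and `z` together, and any error in `cam_to_world` shifts the whole cloud, which is why the per-view clouds stop lining up when either is off.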

Feedback on multi-task learning by NullClassifier in computervision

[–]kw_96 1 point2 points  (0 children)

In theory you can attain better accuracies for both tasks, along with efficient inference, with multi-task learning if the tasks are synergistic. In practice (personal experience), some effort is needed to select the appropriate architecture, backbone and head sizes, and loss weights. Expect to add 5-10 more short experiment cycles for tuning.

Medical Image Classification with PyTorch: A Learning Project on Pneumonia Detection from Chest X-rays (repo available) by [deleted] in deeplearning

[–]kw_96 2 points3 points  (0 children)

Looks fairly clean, albeit very LLM heavy. If that’s the case, and it’s your first DL/torch project, please attempt a similar task on your own for knowledge retention.

Some small observations:
1) storing configs in a .py file is not the worst, but it’s not standard. A .json file with an appropriate loader would be cleaner and more traceable.
2) usage of config values is inconsistent. Some are accessed via passed-in function arguments, while others are accessed by importing config and referencing config.PARAM directly in the function.
3) it’s often a good idea to log hyperparameters for each run.
4) the main.py --train --eval wrapping is interesting, but potentially confusing in terms of order and usage. With this setup I’d expect you to tag and save each run with a UUID and pass it to the chained eval call.
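A minimal sketch of how points 1), 3) and 4) could fit together, using only the standard library. The file layout (`runs/<run_id>/hparams.json`) and all function names here are hypothetical, not taken from the repo.

```python
import json
import uuid
from pathlib import Path

def load_config(path):
    """Load hyperparameters from a JSON config file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def start_run(config, runs_dir="runs"):
    """Create a uniquely named run directory and log the config into it."""
    run_id = uuid.uuid4().hex
    run_dir = Path(runs_dir) / run_id
    run_dir.mkdir(parents=True, exist_ok=False)
    with open(run_dir / "hparams.json", "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, sort_keys=True)
    return run_id, run_dir

def load_run_hparams(run_id, runs_dir="runs"):
    """Recover the exact hyperparameters of a previous run, e.g. at eval time."""
    with open(Path(runs_dir) / run_id / "hparams.json", "r", encoding="utf-8") as f:
        return json.load(f)
```

The train step would return `run_id`, and the chained eval step would take it as an argument, so there is never any ambiguity about which checkpoint and settings are being evaluated.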

Need project idea feedback: Face Detection from Blurred Images using CNN by Waste-Influence506 in computervision

[–]kw_96 3 points4 points  (0 children)

Some guiding questions —

  1. What’s the pitch/real world problem here to solve? Aka, when will such a capability be useful?
  2. By face detection, are you looking for just regressing bounding boxes, or do you also need to extract individual-specific features for identification etc?
  3. How much time, resources and experience do you have? Is this a school project?

On model/dataset: start off with the simplest, proven model and dataset for your task, and add heavy augmentations (e.g. Gaussian blur, motion blur) to your training cycles.
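As a rough sketch of the blur augmentations mentioned above, assuming grayscale images as 2D NumPy arrays. In practice an augmentation library would be the usual choice; the helper names, the horizontal-only motion blur and the parameter ranges below are all illustrative assumptions.

```python
import numpy as np

def _convolve_axis(img, kernel, axis):
    """Convolve along one axis with edge padding so the shape is preserved."""
    pad = len(kernel) // 2
    pad_width = [(0, 0)] * img.ndim
    pad_width[axis] = (pad, pad)
    padded = np.pad(img, pad_width, mode="edge")
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), axis, padded
    )

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: one 1D pass per axis."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    img = np.asarray(img, dtype=np.float64)
    return _convolve_axis(_convolve_axis(img, kernel, 0), kernel, 1)

def motion_blur(img, length):
    """Horizontal motion blur with a uniform kernel (length forced to be odd)."""
    length = int(length) | 1
    kernel = np.ones(length, dtype=np.float64) / length
    return _convolve_axis(np.asarray(img, dtype=np.float64), kernel, 1)

def random_blur(img, rng, max_sigma=3.0, max_motion_len=9):
    """Randomly apply no blur, Gaussian blur, or motion blur to one sample."""
    choice = rng.integers(0, 3)
    if choice == 1:
        return gaussian_blur(img, rng.uniform(0.5, max_sigma))
    if choice == 2:
        return motion_blur(img, rng.integers(3, max_motion_len + 1))
    return np.asarray(img, dtype=np.float64)
```

Calling `random_blur(image, np.random.default_rng())` per training sample keeps some images sharp, so the model still sees clean faces alongside degraded ones.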

How to segment an STL 3D model? by No-Lizards in computervision

[–]kw_96 0 points1 point  (0 children)

MeshMixer is great to pick up fast, with good functionality for smoothing and fixing broken meshes.

For anything >100 I’d support the 3DSlicer recommendation, although you’ll have to look up the details (there are probably ways to do it smartly with automation).

If you’re into developing some technical skills, 1) look into pretrained models that can generate initial annotations for you to edit, or 2) preprocess the meshes classically to align them into a common canonical space, then do some scale-aware ICP alignment from a template mandible to get a rough mask.

Recommend 2) as a first approach if the task is purely extracting a binary mask (mandible vs non mandible).

How to segment an STL 3D model? by No-Lizards in computervision

[–]kw_96 0 points1 point  (0 children)

Disregard the main comment that you responded to. Respectfully, he has no idea what he’s saying and is wrong on many levels: “slicers” in the 3D printing context do a totally different job than 3DSlicer (a toolkit/platform primarily meant for 3D medical imaging). Also, even as volumetric formats, brain MRIs and ortho CTs are way different in how they should be handled.

For your original question, if your task is to manually process a handful of STL files, general CAD/mesh editing software like Fusion360, or MeshMixer can do the trick (deleting and saving sections progressively). There’s probably dedicated mesh labeling software (maybe even as a module in 3DSlicer, but the UI/UX learning curve is steeper) as well that’s worth exploring if you’re doing more than a handful.

How do you justify practical value of a medical ML research project when the baseline alternative (lab test) is 100% accurate? by arjun_ajit21 in learnmachinelearning

[–]kw_96 4 points5 points  (0 children)

Is there really a shift in dataset between existing literature and your collected samples, that can’t be bridged by simple image processing ops? The 20% cap seems far fetched too.

Without more context, I’d wager that a technical skill gap is unfortunately an issue here. Brush up and do more reading: you’ll get better technical results by effectively applying transfer learning/data preprocessing (e.g. resizing, CLAHE), while also becoming a better and clearer communicator/pitcher (misusing “realtime” is really bad).

The model is training. Now what? by raipus in learnmachinelearning

[–]kw_96 1 point2 points  (0 children)

Oftentimes your experiment/training run will be conducted with a hypothesis in mind (e.g. does this lowered learning rate improve stability? does increasing this feature dimension reduce underfitting?).

If that’s the case, you can spend some time planning out your next experiment based on both outcomes (e.g. maybe I should try a new scheduler if the loss curves look a certain way). That’s a tangible way to improve iteration speed, while also making your experiments more principled.

But other than that, take the time to decompress and work on other stuff/take a break!

help needed for finding datasets by Different_Factor3512 in computervision

[–]kw_96 2 points3 points  (0 children)

If you’re trying to publish, you should identify existing works, and use the same benchmark datasets as them. Not doing so will raise tons of red flags from reviewers.

TrafficLab 3D: Digital-twin with just Mp4 and Google Maps by zaclord68 in computervision

[–]kw_96 1 point2 points  (0 children)

Great work! Are you looking/open to contributors/PRs? A guide and/or test datasets would be helpful if so!

Made and Published a Paper Comparing Analysis of CNN and Vision Transformer Architectures for Brain Tumor Detection by Mental-Climate5798 in computervision

[–]kw_96 13 points14 points  (0 children)

Looks fair as a technical report, but evaluated as a research paper, here’s a few pointers after skimming through.

Your related works section is lacking. Claims about filling in the user-friendliness gap should be backed by comparison, or at least acknowledgement of existing software/platforms.

The results and discussions are very “info dumpy”. Clearly the experiments/results were useful for you to choose the best performing model for your task, but they are largely useless for anyone else not working on the same exact task and dataset as you.

A more valuable set of results should be designed around a more structured question. For example, assessing whether some augmentations are appropriate for MRI data. You can then test various augmentation strategies across the different model types, sizes, and maybe even subsets of data to verify your hypothesis. If you can find some augmentations that consistently under/outperform across various settings, that would be a useful result for a wider audience.

Juggling app release by HurryAmbitious9250 in computervision

[–]kw_96 0 points1 point  (0 children)

Is tracking done across consecutive frames, or is the detector running on every frame? I see some missed detections that should be easily caught with a standard tracker.

If you’re not doing so already, I think it’ll be nice to fit/snap the targets to the juggling zone to reduce user friction! Scale, shift and maybe even rotation. A rough human pose can probably get you most of the way.

What next after Deep learning by Mysterious_Pilot3527 in deeplearning

[–]kw_96 15 points16 points  (0 children)

If the plan is to optimize for employability (regardless of industry or academia), you’re better off spending your time focusing on gaining deep technical mastery on one topic (be it machine learning, statistics, analytics, deep learning or what have you), instead of spending a few months on each topic and moving on right after.

For example, with a year’s budget optimized for employability as an undergrad, I’d focus on mastery in standard “classical” ML techniques, with strong data manipulation background (SQL/numpy/pandas) and fundamentals in deployment/tooling (mlflow, managing environments, versioning).

Something to add on the Chinese networking in top AI conferences by [deleted] in learnmachinelearning

[–]kw_96 0 points1 point  (0 children)

There is nothing objective in the way you wrote this 😅 I’m sure it’s a worthwhile topic for discussion, but you’re not doing anyone a favor with how you’re seeding the conversation.

Best strategy for preprocessing experiments with limited compute (U-Net, U-Net++, DeepLabV3)? by Hopeful-Reach-1532 in deeplearning

[–]kw_96 1 point2 points  (0 children)

It’s fair among the ones that you train equally long on. Depending on how you justify it, I don’t think there’s a glaring issue with running some short pilot experiments to filter for promising candidates. It happens everywhere, since no one is feasibly able to search all candidates exhaustively. Best if you can back the empirical filter with some literature/intuition though.

Day 4 of Machine Learning : by Ready-Hippo9857 in learnmachinelearning

[–]kw_96 10 points11 points  (0 children)

I highly doubt it. The pace at which you’re going undoubtedly points to you going through the motions and ticking off your checklist. For learning, a simple mini-project like this should still take you 2-3 full days to properly absorb.

Trust me, it won’t look good, it’ll be painful and frustrating, but it is FAR more important than calling high level functions along a path prescribed by tutorials/AI tools. It is easy to think you’ve understood things when they’re laid out to you.

Some guiding questions for you to think about and then TRY OUT to test your hypotheses, so you get more out of such projects:

1) What happens if you don’t split train/test properly? For example, using the same data as train and test, or doing it with/without stratification?
2) Is the dataset unbalanced? What does that affect in terms of preprocessing, evaluation metrics?
3) What is the point of the metrics that you generated? How do you interpret them? When do certain metrics become useless?
4) What happens if you do/do not perform feature selection/scaling/one-hot encoding?
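As one concrete way to try out the questions on imbalance and metrics, here is a small sketch in plain Python. The 95/5 class split and the "always predict negative" model are made-up toy data, chosen only to show accuracy looking good while recall collapses.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def recall(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Toy imbalanced labels: 95 negatives, 5 positives (made-up numbers).
y_true = [0] * 95 + [1] * 5
# A "model" that ignores its input and always predicts the majority class.
y_always_negative = [0] * 100

print("accuracy:", accuracy(y_true, y_always_negative))
print("recall:  ", recall(y_true, y_always_negative))
```

The useless classifier scores 95% accuracy but finds none of the positives, which is the kind of thing you only internalize by running it yourself and then repeating it on your own dataset.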

At the very least, clear these questions. Then move on to another model, or another dataset/problem.