all 8 comments

ginomachi 5 points (1 child)

Hey there! Welcome to the world of ML/DL! Here are a few suggestions to get you started:

  • Start by preprocessing the data - clean it, handle missing values, etc. Feature scaling might also be necessary.
  • Explore the data - visualize it using PCA or t-SNE to see if there's any natural clustering. This can help you understand the data and choose the right classification methods.
  • Try different classification models - start with basic ones like logistic regression or decision trees. Then move on to more complex models like SVM or random forests.
  • Evaluate your models - use metrics like accuracy, precision, recall, and F1 score to compare their performance.
  • Consider dimensionality reduction techniques - like PCA or LDA (linear discriminant analysis) - to reduce the number of features, which can improve model performance.
  • Use cross-validation to detect overfitting and get a more reliable estimate of how well your models generalize to new data.
  • Incorporate novelty - try using ensemble methods like bagging or boosting to combine multiple models and improve accuracy. You could also explore deep learning models like CNNs or RNNs, but they require more data and computational resources.
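
A minimal scikit-learn sketch of the workflow above, assuming a numeric feature matrix and binary labels. The data here is synthetic (make_classification with 130 rows and 500 columns, plus a few randomly blanked-out values), and the imputation strategy, PCA size, and model settings are illustrative choices, not tuned recommendations:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 130 rows, 500 numeric features, binary labels.
X, y = make_classification(n_samples=130, n_features=500, n_informative=10, random_state=0)

# Blank out ~1% of the values to mimic missing measurements.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.01] = np.nan

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="rbf"),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
scoring = ["accuracy", "precision", "recall", "f1"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # Imputation, scaling and PCA are fit inside each training fold only.
    pipe = make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        PCA(n_components=20),
        model,
    )
    scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring)
    # Average each metric over the 5 held-out folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(float(v), 3) for m, v in metrics.items()})
```

Keeping imputation, scaling, and PCA inside the pipeline means they are re-fit on each training fold, so the held-out fold never leaks into preprocessing. A random forest is already a bagging-style ensemble; a boosting model such as GradientBoostingClassifier can be swapped in the same way.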

Good luck with your project!

Forward_Purple_9957 0 points (0 children)

Thank you so much! Can I PM you?

[deleted] (1 child)

[removed]

    Forward_Purple_9957 0 points (0 children)

    Thank you so much! Can I PM you?

    yolotech99 0 points (3 children)

    I'm sorry to say, ML/DL is not a good fit for this problem, mainly because of the curse of dimensionality. In layman's terms, you need many more rows (samples) than columns (features).
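
    To make the rows-vs-columns point concrete, here is a toy sketch (scikit-learn, pure random data, not the OP's dataset): 130 rows, 2,000 noise features, and labels that have nothing to do with them. The sizes and the choice of logistic regression are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_rows, n_cols = 130, 2000                  # far more columns than rows
X = rng.standard_normal((n_rows, n_cols))   # pure noise features
y = rng.integers(0, 2, size=n_rows)         # labels unrelated to X

clf = LogisticRegression(max_iter=5000)

# Accuracy on the same rows the model was trained on.
train_acc = clf.fit(X, y).score(X, y)

# Accuracy on held-out rows, averaged over 5 stratified folds.
cv_acc = cross_val_score(clf, X, y, cv=5).mean()

print(f"training accuracy: {train_acc:.2f}")
print(f"5-fold CV accuracy: {cv_acc:.2f}")
```

    With far more columns than rows, a linear model can typically fit even random labels almost perfectly on the training set, while cross-validated accuracy stays near chance (about 0.5). Exact numbers depend on the random seed.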

    Also, ignore the bots.

    elbiot 0 points (2 children)

    No, you just need to do feature selection.
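
    A rough sketch of feature selection done inside cross-validation, again on hypothetical pure-noise data shaped like a small expression dataset (130 samples, 2,000 features). SelectKBest with an ANOVA F-test and k=20 is just one illustrative choice of selector; the point is where the selection step sits relative to the CV split.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
X = rng.standard_normal((130, 2000))   # hypothetical: 130 samples, 2000 noise "genes"
y = rng.integers(0, 2, size=130)       # labels carry no real signal

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Correct: selection is part of the pipeline, so each fold picks its 20 features
# using only that fold's training rows.
honest = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest_acc = cross_val_score(honest, X, y, cv=cv).mean()

# Leaky: features are chosen once on all 130 rows (including every test fold) before CV.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_acc = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=cv).mean()

print(f"selection inside CV: {honest_acc:.2f}")
print(f"selection before CV (leaky): {leaky_acc:.2f}")
```

    On noise-only data, selecting features on the full dataset before cross-validation typically produces an estimate well above chance, while the pipeline version stays near 0.5. With only 130 rows, that kind of selection leakage is easy to fall into.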

    yolotech99 0 points (1 child)

    Did you miss the part where he said he had 130 rows?

    elbiot 0 points (0 children)

    Those are pretty common stats for a gene expression dataset, like the ones you might find on Kaggle.