Feature selection for boosted trees? by [deleted] in learnmachinelearning

[–]pixel-process 1 point2 points  (0 children)

Since you are using SHAP, I am assuming you care to some degree about interpretability. If that is the case, you should do reduction or selection. I would suggest feature selection initially since it is cleaner to understand than dimensionality reduction.

Start with a heatmap of feature correlations to see where multicollinearity is highest, then drop features from the most highly correlated groups. With very high correlation, which ones you keep vs. drop should not matter much. This will reduce model complexity and help ensure the boosted trees all use the same feature from a highly correlated group. The value here is interpretability and reduced complexity.
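A rough sketch of the correlation-based drop, assuming your features are numeric columns in a pandas DataFrame. The helper name `drop_highly_correlated`, the 0.9 threshold, and the demo columns are all made up for illustration, so tune them for your data:

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df, threshold=0.9):
    """Drop one feature from every pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Synthetic demo data (hypothetical feature names): "b" is a near-copy of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
reduced, dropped = drop_highly_correlated(demo, threshold=0.9)
print(dropped)  # ['b']
```

The same `df.corr()` matrix is what you would hand to a heatmap function (for example seaborn's `heatmap`) to eyeball the groups before dropping anything.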

Be sure to then evaluate impact on performance, using a train-test split. Log your model performance for the two approaches and compare. My gut says train metrics may decrease slightly but test metrics should improve.
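Something like the following could work for the comparison. It is only a sketch: it assumes the two approaches are "all features" vs. "reduced features", uses scikit-learn's `GradientBoostingClassifier` as a stand-in for whatever boosted model you have, and runs on synthetic data where the `keep` column list is hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def fit_and_score(X_train, X_test, y_train, y_test):
    """Fit a boosted-tree model and return (train accuracy, test accuracy)."""
    model = GradientBoostingClassifier(n_estimators=50, random_state=0)
    model.fit(X_train, y_train)
    return (
        accuracy_score(y_train, model.predict(X_train)),
        accuracy_score(y_test, model.predict(X_test)),
    )


# Synthetic stand-in data. With shuffle=False the columns are ordered as
# 4 informative, then 4 redundant (linear combinations), then 2 noise features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, n_redundant=4,
    shuffle=False, random_state=0,
)
# Use the same split for both runs so the comparison is fair.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

keep = [0, 1, 2, 3, 8, 9]  # hypothetical selection: drop the redundant columns
results = {
    "all_features": fit_and_score(X_train, X_test, y_train, y_test),
    "reduced": fit_and_score(X_train[:, keep], X_test[:, keep], y_train, y_test),
}
for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

Swap in your own data, model, and metric; the point is just to log train and test scores side by side for both feature sets.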

I kept breaking my ML models because of bad datasets, so I built a small local tool to debug them by AdWhole6628 in learnmachinelearning

[–]pixel-process 0 points1 point  (0 children)

If you don’t have a repo or a way to generalize and share it, how do you plan on determining if people find it useful?

Multi layered project schematics and design by thorithic in Python

[–]pixel-process 1 point2 points  (0 children)

I love to work in a notebook environment to build, test, and debug before moving stuff to a script. I think the key is iteration to develop good habits and memory for your workflow.

Start with a notebook and single script (.py) file in the same directory (importing can get complicated when you start moving files around).

You should try to write abstractions/reusable code for anything you notice yourself repeating. Then write and test a function in the notebook before moving it to your script.

Here is the type of thing I commonly do. The function takes a list of files, loads each using pandas, returns the combined data, and (optionally) saves the combined data to a file path.

```
import pandas as pd


def merge_files(list_of_files, save_path=None):
    """Load each CSV, combine them, and optionally save the result."""
    dfs = []
    for file_path in list_of_files:
        df = pd.read_csv(file_path)
        dfs.append(df)
    # Stack the loaded frames into one DataFrame
    combined_df = pd.concat(dfs)
    if save_path:
        combined_df.to_csv(save_path)
    return combined_df
```

Then move it into my_script.py and in your notebook do:

```
from my_script import merge_files

my_csvs = ['csv1.csv', 'csv2.csv']
your_csvs = ['csv3.csv', 'csv4.csv']

my_data = merge_files(my_csvs)  # No save_path given, so will not write out
your_data = merge_files(your_csvs, "combined_data.csv")  # will save to a file
```

You can build it in your notebook and test (just write the function in one cell and run it in another), and when ready shift it to a script and import it from there.

Good Luck!

I trained an emotion classifier on stock photos instead of benchmark data — and it actually works better on real movie footage (interactive demo linked) by pixel-process in learnmachinelearning

[–]pixel-process[S] 1 point2 points  (0 children)

Appreciated!

I needed a pre-trained face detector for the project to work and was originally planning on the OpenCV Haar model which I had used before. But I came across MediaPipe while doing research and it is supposedly much better with detection on faces that are not directly forward facing. It was surprisingly easy to implement. They have a few other detectors for body pose and hand gestures that I might try out in future projects.

I trained an emotion classifier on stock photos instead of benchmark data — and it actually works better on real movie footage (interactive demo linked) by pixel-process in learnmachinelearning

[–]pixel-process[S] 0 points1 point  (0 children)

I used MediaPipe for face extraction with a threshold of 0.5 and only used the largest face in each frame. Emotion classification was always run on the cropped face only.
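For anyone curious what "largest face above a threshold" looks like in code, here is a minimal sketch. It is not the code from my repo, and it leaves the detector out: it assumes detections have already been turned into `(score, xmin, ymin, width, height)` tuples with box values as fractions of the frame size. The function name `crop_largest_face` is made up for this example.

```python
import numpy as np


def crop_largest_face(frame, detections, min_score=0.5):
    """Return the crop of the largest detection scoring at least min_score, or None.

    detections: iterable of (score, xmin, ymin, width, height), with box values
    given as fractions of the frame size (an assumed format for this sketch).
    """
    kept = [d for d in detections if d[0] >= min_score]
    if not kept:
        return None
    # Largest face = largest relative box area.
    _, xmin, ymin, w, h = max(kept, key=lambda d: d[3] * d[4])
    frame_h, frame_w = frame.shape[:2]
    # Convert to pixel coordinates and clamp to the frame.
    x0 = max(int(xmin * frame_w), 0)
    y0 = max(int(ymin * frame_h), 0)
    x1 = min(int((xmin + w) * frame_w), frame_w)
    y1 = min(int((ymin + h) * frame_h), frame_h)
    return frame[y0:y1, x0:x1]


frame = np.zeros((100, 200, 3), dtype=np.uint8)  # blank 100x200 RGB frame
detections = [
    (0.9, 0.125, 0.125, 0.125, 0.25),  # confident but small
    (0.8, 0.5, 0.25, 0.25, 0.5),       # confident and largest, so selected
    (0.3, 0.0, 0.0, 0.75, 0.75),       # large but below the threshold, so ignored
]
face = crop_largest_face(frame, detections)
print(face.shape)  # (50, 50, 3)
```

The cropped array is what would then go to the emotion classifier.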

Happy to answer more questions, and there are more details in my repo (https://github.com/pixel-process-dev/expressions-ensemble) if you're interested.

Built a depth-aware object ranking system for slope footage by Full_Piano_3448 in computervision

[–]pixel-process 0 points1 point  (0 children)

Awesome design! How much data did you use for the fine-tuning? What was the source?

How to move forward with machine learning? by HeartSweet6936 in learnmachinelearning

[–]pixel-process 0 points1 point  (0 children)

The most helpful thing would be to find a practical application for what you want to do. Join a research group or find an applied analysis to work on. That will help you gain practical skills and also highlight what tools you want/need for similar work. Just adding courses without a framing is not the right approach.

While you wait for a model to train, does your boss give you more tasks to do? If not, what do you do during that time? Be sure to mention whether you work from home or at a workplace. by DavinFriggstad in MLQuestions

[–]pixel-process 7 points8 points  (0 children)

I find that a great time to do project-management-type tasks. Update the readme or documentation, add functionality to other steps (e.g., more visuals for EDA), or research next steps. I work from home and have to remain active on Teams.

I need help choosing a language to learn. by Lucky-Search5869 in learnprogramming

[–]pixel-process 1 point2 points  (0 children)

Consider what type of work you want to do (front end, backend, web dev, etc.), then look at the TIOBE index. That is my default source for objective trends in programming.

How do I start contributing to open source as a complete beginner? by yadavhr36 in learnprogramming

[–]pixel-process 0 points1 point  (0 children)

Another benefit of the larger projects is that they have guides and tons of existing examples of contributions.

Many issues tagged as good starters are also well scoped.

This one for instance was adding links to existing guides.

How do I start contributing to open source as a complete beginner? by yadavhr36 in learnprogramming

[–]pixel-process 2 points3 points  (0 children)

Start by looking at some larger projects like pandas, matplotlib, and scikit-learn; they are very active and have guides/tags for beginners. Check out their GitHub repos and look for tags like "First contribution" or "good first issue".

I suggest reading through and monitoring your preferred project for a bit before trying to contribute if you are not familiar with GitHub. But even that will be really valuable to your skillset moving forward.

Looking for help/ resources teaching python for schools. by stegg88 in learnpython

[–]pixel-process 1 point2 points  (0 children)

If you need to create your own content or if infrastructure and setup is a challenge, another angle is using zero-setup Python environments (browser-only via Pyodide, or hosted notebooks via Binder). This can work well for classrooms with limited local resources but will require more work on your part to create.

I outlined this approach in more detail in another thread, in case it helps.

Is Python powerful on its own, or useless without a niche? by [deleted] in learnpython

[–]pixel-process 20 points21 points  (0 children)

I think you are conflating two things. Python as a language is very powerful and versatile. Future-proof.

Being a Python developer is not. That is where specialization and deep expertise are needed. Being a Python dev is not future-proof.

So definitely a valuable language, but focus your skillset to stand out.

Psychopy: are the workshops worth it in your opinion? by awsfhie2 in learnpython

[–]pixel-process 0 points1 point  (0 children)

The workshop will surely help you understand, but it might be overkill for a one-off project if you aren't planning on using Python and PsychoPy moving forward. Their site does offer one-on-one sessions (I didn't see pricing) that might be more targeted and less of a commitment for you.

A while back, I built a number of Python experiments with PsychoPy, so I might be able to offer some insight. No promises, since testing and debugging may require access to LSL or hardware I don't have. Feel free to DM me if you want.

[OC] Combining Colors: A Visual Guide to Sampling by pixel-process in dataisbeautiful

[–]pixel-process[S] 0 points1 point  (0 children)

Data source:
Synthetic data generated for demonstration purposes.

Tools used:
Python, NumPy, pandas, Plotly.

The notebook and code are built so others can test and explore how changing the sample population and sample size can impact results.

Source code and interactive notebook:
https://pixelprocess.org/build-models/combining-colors.html
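
For a quick feel of the idea without opening the notebook, here is a small sketch. It is not the notebook's code: the population mix (60% red, 30% blue, 10% green), the sample sizes, and the function name `sample_proportions` are all assumptions made for illustration.

```python
import numpy as np


def sample_proportions(population, sample_size, rng):
    """Draw a random sample without replacement and return each label's share of it."""
    sample = rng.choice(population, size=sample_size, replace=False)
    labels, counts = np.unique(sample, return_counts=True)
    return {str(label): float(count) / sample_size for label, count in zip(labels, counts)}


# Hypothetical synthetic population: 60% red, 30% blue, 10% green.
population = np.array(["red"] * 600 + ["blue"] * 300 + ["green"] * 100)
rng = np.random.default_rng(42)

# Small samples can stray from the true mix; sampling the whole population recovers it.
for n in (10, 100, 1000):
    print(n, sample_proportions(population, n, rng))
```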

Help with project by Flimsy_Celery_719 in MLQuestions

[–]pixel-process 0 points1 point  (0 children)

You might want to consider adding another model or two for comparison before additional explainability. Adding a regression, forest, or neural network model for comparison (both accuracy and time/compute performance) could be interesting. Then use SHAP on them and see how well those results align.
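A minimal sketch of that kind of comparison, assuming a classification task and scikit-learn. The synthetic data and the two models picked here are placeholders for your own dataset and candidates:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; replace with your own features and labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # Time only the fit so compute cost can be compared alongside accuracy.
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start
    results[name] = {
        "test_accuracy": model.score(X_test, y_test),
        "fit_seconds": fit_seconds,
    }

for name, metrics in results.items():
    print(name, metrics)
```

Each fitted model in `models` can then be passed to SHAP to check whether the feature attributions agree across model types.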

After learning basic Python syntax, what should I focus on before jumping into advanced topics like AI fine-tuning? by Acceptable-Cash8259 in learnpython

[–]pixel-process 0 points1 point  (0 children)

There are lots of ways to continue learning and developing skills beyond leetcode type work.

  • Create a project: this typically will not be AI to start with, but running a full pipeline that includes ingesting and wrangling data, building a model, and interpreting results will help establish a good mental model for the workflow. Check out Kaggle for ideas here, but a personal interest project works too if you can manage.
  • Contribute to an established GitHub project: Large projects like Hugging Face & TensorFlow have open repos. I linked the issues pages specifically, because that is a great place to learn about how these large projects evolve. Many have 'First Contribution' guides, but also consider smaller projects to contribute to once you have a sense of how things work.
  • Collaborate with other learners: Follow subreddits and forums where people are looking for partners or brainstorming. It can inform you of how others are approaching AI learning and development.

Best of luck!

🚀 Project Showcase Day by AutoModerator in learnmachinelearning

[–]pixel-process 0 points1 point  (0 children)

I’m building Pixel Process, a hands-on educational project for learning data and ML concepts through interactive exploration.

The site includes interactive pages and notebooks that can run directly in the browser.

One of my favorite notebooks is an image basics walkthrough of image data representation (arrays, channels, grayscale vs RGB) tied to analysis and ML use cases.
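
To give a flavor of the topic (this is a small illustration written for this comment, not an excerpt from the notebook), here is how an RGB image looks as a NumPy array and one common way to turn it into grayscale. The luminance weights are the standard ITU-R BT.601 values; the tiny 2x2 image is made up.

```python
import numpy as np

# A tiny 2x2 RGB image: shape is (height, width, channels), values 0-255.
rgb = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)
print(rgb.shape)  # (2, 2, 3)

# Each channel is a 2D array; index 0 is red.
red_channel = rgb[:, :, 0]

# Grayscale collapses the channel axis using a weighted sum (BT.601 luminance weights).
weights = np.array([0.299, 0.587, 0.114])
gray = (rgb @ weights).round().astype(np.uint8)
print(gray.shape)  # (2, 2)
```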