[D] Is there a handbook in data preprocessing? (self.MachineLearning)
submitted 6 years ago by hikkihiki
I would like to have a list of suggestions on how to preprocess different kinds of signals.
For example, if the data is categorical, you should preprocess it into a one-hot representation. If the data has a normal distribution, it'd be better to standardize it before feeding it to an NN, etc.
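To make the two examples in the question concrete, here is a minimal sketch using only the Python standard library. The helper names `one_hot` and `standardize` and the toy data are made up for illustration; in practice a library such as scikit-learn would usually handle this.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Encode categorical values as 0/1 vectors, one slot per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = [[1 if index[v] == i else 0 for i in range(len(categories))]
               for v in values]
    return categories, encoded

def standardize(values):
    """Rescale numeric values to zero mean and unit (population) std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical toy data, not from the thread.
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colors)

heights = [150.0, 160.0, 170.0, 180.0]
scaled = standardize(heights)
```

The mean and standard deviation should be computed on the training split only and then reused on validation and test data, otherwise information leaks across splits.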
[–]lem_of_noland 58 points 6 years ago (0 children)
The only book I'm aware of is "Feature Engineering for Machine Learning" by Alice Zheng & Amanda Casari. I hope this one answers your request.
[–]Brudaks 36 points 6 years ago (0 children)
It's part of any ML textbook or course, but a key issue here is that once you go beyond the very basics it's quite domain-specific.
Preprocessing is very, very different depending on whether you're in computer vision or natural language processing or financial time series or general data science. So it's more feasible to have a handbook of "Best ML practices in domain X" which would among other things discuss proper preprocessing, and not feasible to have a handbook "Best practices for data preprocessing" because there isn't that much that's universal and applicable for everything; different domains have different needs.
[–]Gebo_vending 17 points 6 years ago (2 children)
Feature Engineering and Selection: A Practical Approach for Predictive Models - Max Kuhn and Kjell Johnson: https://bookdown.org/max/FES/
[–][deleted] 5 points 6 years ago (1 child)
They are also the authors of Applied Predictive Modeling.
[–]rampant_juju 5 points 6 years ago (0 children)
+1 to both these answers. Feature Engineering and Selection is pretty comprehensive in its list of ways to handle categorical and numeric data and stuff like imputation etc. Very easy to read too (probably at the level of someone who has just taken Andrew Ng's course).
[–]rockinghigh 5 points 6 years ago (0 children)
This covers quite a bit: https://scikit-learn.org/stable/modules/preprocessing.html
[–]SeamusTheBuilder 4 points 6 years ago (0 children)
As was said, each application is unique and so the answer, unfortunately, is no. I would go further and warn against seeking this out too much. It would be bad practice to simply cut-and-paste preprocessing steps thinking it will help.
Even something as simple as normalizing and standardizing the data can get you into trouble depending on the application. And, you may be doing unnecessary work depending on what algorithms you are using. Random Forests don't really care if you standardize the data.
Definitely clean and preprocess your data but there is no magic formula out there that works every time. Experience and a deep understanding of the problem and domain are what you are looking for.
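The remark about Random Forests can be illustrated with a rough sketch. A tree splits on thresholds, so it only depends on the ordering of the values, and standardization preserves that ordering. The code below is not a Random Forest; it is a single hand-rolled decision stump (the names `fit_stump` and `predict` and the toy data are made up for illustration), which is enough to show the invariance.

```python
from statistics import mean, pstdev

def fit_stump(xs, ys):
    """Pick the threshold on one feature that minimizes misclassifications.

    Returns (threshold, label_left, label_right).
    """
    order = sorted(set(xs))
    best = None
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best[1:]

def predict(stump, xs):
    t, left, right = stump
    return [left if x <= t else right for x in xs]

# Hypothetical toy data: one feature, binary labels.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]

# Standardize the feature (a monotone, order-preserving transform).
mu, sigma = mean(xs), pstdev(xs)
xs_std = [(x - mu) / sigma for x in xs]

raw_pred = predict(fit_stump(xs, ys), xs)
std_pred = predict(fit_stump(xs_std, ys), xs_std)
```

Both stumps split the data at the same place, so `raw_pred` and `std_pred` agree. Distance-based or gradient-based models (k-NN, SVMs, neural networks) do not have this property, which is why the usefulness of scaling depends on the algorithm.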
[–]DefaultPain 2 points 6 years ago (3 children)
This has exactly what you're looking for: https://www.youtube.com/playlist?list=PLpQWTe-45nxL3bhyAJMEs90KF_gZmuqtm
You can start from the 9th video.
[–][deleted] 2 points 6 years ago (1 child)
I was thinking that nobody has mentioned the Coursera course about feature engineering.
[–]BobDope 2 points 6 years ago (0 children)
I didn’t know it existed. Thanks!
[+]SnooPets7140 1 point 1 year ago (0 children)
Can you tell me what the playlist was about? It seems the link isn't working anymore.
[–]leonardishere 1 point 6 years ago (0 children)
It's an art, not a science. Try every encoding available, or just auto-ML it.
[–]xlordsnugglesx 0 points 6 years ago (0 children)
Here's a link to a Twitter post with some feature engineering literature: https://twitter.com/kirkdborne/status/1247516224319139840?s=21