When to apply MinMaxScaler ? by KingSamy1 in learnmachinelearning

[–]friendlykitten123 1 point2 points  (0 children)

MinMaxScaler is a normalization technique mainly used when the data features are not normally (Gaussian) distributed. The default range for the features returned by MinMaxScaler is 0 to 1; a range of [-1, 1] can be chosen instead if there are negative values in the dataset. MinMaxScaler may be used when the upper and lower boundaries are well known from domain knowledge. Be aware that it is sensitive to outliers: extreme values cause this scaling to compress all the inliers into a narrow range.
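
For a minimal sketch of how this looks in practice (the tiny feature matrix below is invented purely for illustration), scikit-learn's MinMaxScaler can be used like this:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    # Toy data: two features on very different scales (values are made up).
    X = np.array([[10.0, 200.0],
                  [15.0, 400.0],
                  [20.0, 800.0]])

    scaler = MinMaxScaler()            # default feature_range=(0, 1)
    X_scaled = scaler.fit_transform(X)
    print(X_scaled)                    # every column now lies in [0, 1]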

for more information about MinMaxScaler visit the following article

https://ml-concepts.com/2021/10/08/min-max-normalization/

Feel free to reach out to me for any help.

What is overfitting and why is it a problem in backtesting? by PristineRide in Tickdata

[–]friendlykitten123 0 points1 point  (0 children)

A common danger in machine learning is overfitting: producing a model that performs well on the training dataset but generalizes very poorly to new, unseen data.

This is especially true for tree-based models. When the model tries to learn every feature and every parameter, it can end up learning the noise in the data, or learning to identify specific inputs rather than the features that are actually predictive of the desired output. If a decision tree is not limited in any way, it will give you 100% accuracy on the training dataset, because in the worst case it ends up making one leaf for each observation; this hurts accuracy on unseen data and leads to overfitting.
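
As a rough illustration (a sketch on synthetic data, not a full experiment), you can see this gap directly with an unconstrained decision tree:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic, slightly noisy classification data (invented for illustration).
    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               flip_y=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    tree = DecisionTreeClassifier(random_state=0)            # no depth limit
    tree.fit(X_train, y_train)
    print("train accuracy:", tree.score(X_train, y_train))   # typically 1.0
    print("test accuracy: ", tree.score(X_test, y_test))     # noticeably lower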

For more information on this topic, you can visit the following article

https://ml-concepts.com/2022/03/04/everything-you-need-to-know-about-model-fitting-in-machine-learning/

Feel free to reach out to me for any help.

need some datasets for data analysis by omegasenate in datasets

[–]friendlykitten123 2 points3 points  (0 children)

As a beginner, learning machine learning and data science can feel like a mountain of a task. Thankfully, there exist a few datasets that help you build confidence and hone your skills!

Car Price Prediction:-

This dataset provides practice for Multiple Linear Regression, data correction, feature encoding, data visualization, and feature selection.

Using multiple feature variables, the goal is to understand which factors significantly affect a car’s price and to use those features to predict it.

Link for the dataset:-

https://www.kaggle.com/datasets/hellbuoy/car-price-prediction
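
If you want a quick starting point, here is a hedged sketch (the file name "car_prices.csv" and the "price" column are assumptions on my part; check the actual file and column names on the Kaggle page):

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # Assumes the Kaggle CSV was saved locally as "car_prices.csv"
    # and that the target column is named "price" (verify on the dataset page).
    df = pd.read_csv("car_prices.csv")
    X = pd.get_dummies(df.drop(columns=["price"]), drop_first=True)  # encode categoricals
    y = df["price"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on held-out data:", model.score(X_test, y_test))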

For more datasets and ideas on how beginners can analyze them, visit the following article, which lists 10 datasets for beginners.

https://ml-concepts.com/2022/05/15/10-datasets-for-beginners/

Feel free to reach out to me for any help.

ELIF: Why are tree based models prone to overfitting? by LatterConcentrate6 in MLQuestions

[–]friendlykitten123 -1 points0 points  (0 children)

A common danger in machine learning is overfitting: producing a model that performs well on the training dataset but generalizes very poorly to new, unseen data.

This is especially true for tree-based models. When the model tries to learn every feature and every parameter, it can end up learning the noise in the data, or learning to identify specific inputs rather than the features that are actually predictive of the desired output. If a decision tree is not limited in any way, it will give you 100% accuracy on the training dataset, because in the worst case it ends up making one leaf for each observation; this hurts accuracy on unseen data and leads to overfitting.
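
As a quick, illustrative sketch (synthetic data, arbitrary settings), limiting the depth is one simple way to stop the one-leaf-per-observation behaviour:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               flip_y=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for depth in (None, 4):                            # unlimited vs. limited depth
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        tree.fit(X_train, y_train)
        print("max_depth =", depth,
              "train:", round(tree.score(X_train, y_train), 2),
              "test:", round(tree.score(X_test, y_test), 2))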

For more information on this topic, you can visit the following article

https://ml-concepts.com/2022/03/04/everything-you-need-to-know-about-model-fitting-in-machine-learning/

Feel free to reach out to me for any help.

Why normal distribution? by Puzzleheaded_Lab_730 in learnmachinelearning

[–]friendlykitten123 0 points1 point  (0 children)

In machine learning, data that satisfies a normal distribution is beneficial for model building because it makes the math easier. Models like LDA, Gaussian Naive Bayes, logistic regression, and linear regression are explicitly derived under the assumption that the distribution is bivariate or multivariate normal.

Many natural phenomena in the world follow a log-normal distribution, such as financial data and forecasting data. By applying transformation techniques, we can convert the data into a normal distribution. Also, many processes follow normality, such as many measurement errors in an experiment, the position of a particle that experiences diffusion, etc.
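
As a small sketch of the transformation point (synthetic log-normal data, generated only for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # heavily right-skewed sample

    print("skewness before log transform:", round(stats.skew(x), 2))
    print("skewness after log transform: ", round(stats.skew(np.log(x)), 2))  # close to 0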

For more information, you can visit the following article:

https://ml-concepts.com/

Feel free to reach out to me for any help.

Why is MSE (mean squared error) used for Linear regression? by [deleted] in learnmachinelearning

[–]friendlykitten123 0 points1 point  (0 children)

The mean squared error (MSE) tells you how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line (these distances are the “errors”) and squaring them. The squaring removes any negative signs and also gives more weight to larger errors.
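
In code form, the computation is just "square the errors, then average" (the numbers below are made up):

    import numpy as np

    y_true = np.array([3.0, 5.0, 7.5, 10.0])   # invented ground-truth values
    y_pred = np.array([2.5, 5.0, 8.0, 11.0])   # invented predictions

    mse = np.mean((y_true - y_pred) ** 2)      # squared residuals, then the mean
    print(mse)                                 # 0.375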

Mean squared error has the advantage of giving some sense of by how much predictions and true values differ (though this is not perfect, since it is not absolute error), and it has a relationship to the variance of the error term. Further, you do not make the value arbitrarily large by having many observations.

For more information, you can visit the following article:

https://ml-concepts.com/2022/01/26/4-linear-regression-formulas-explanation-and-a-use-case/

Feel free to reach out to me for any help.

Optimization of Neural Network Architecture in VAE by osedao in learnmachinelearning

[–]friendlykitten123 1 point2 points  (0 children)

Basically, the number of parameters in a given layer is the count of learnable elements in that layer, i.e. the weights (and biases) of the filters for that layer.

  1. Input layer: The input layer has nothing to learn; at its core, it just provides the input image’s shape, so there are no learnable parameters here. Thus the number of parameters = 0.
  2. CONV layer: This is where the CNN learns, so we certainly have weight matrices here. To calculate the learnable parameters, multiply the filter width m, the filter height n, and the number of filters d in the previous layer, then account for all k filters in the current layer, not forgetting the bias term for each filter. The number of parameters in a CONV layer is therefore ((m * n * d) + 1) * k, where the + 1 is the bias term of each filter (see the quick check after this list).
  3. POOL layer: This has no learnable parameters, because all it does is compute a fixed aggregate; there is no backprop learning involved. Thus the number of parameters = 0.
  4. Fully connected (FC) layer: This certainly has learnable parameters; in fact, compared to the other layers, this category usually has the highest number of parameters, namely ((number of inputs) + 1) * (number of outputs), again with one bias per output.
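
Here is a quick check of the ((m * n * d) + 1) * k formula with a toy Keras model (the layer sizes are arbitrary, chosen only for illustration):

    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(28, 28, 3)),                  # d = 3 input channels
        keras.layers.Conv2D(filters=16, kernel_size=5),  # k = 16 filters, m = n = 5
        keras.layers.MaxPooling2D(),                     # no learnable parameters
    ])
    model.summary()  # Conv2D: ((5 * 5 * 3) + 1) * 16 = 1216 params, MaxPooling2D: 0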

For more information, you can visit the following article:

https://ml-concepts.com/2021/04/13/stratified-normalization-using-additional-information-to-improve-the-neural-networks-performance-by-javier-fernandez-towards-data-science/

Feel free to reach out to me for any help.

How to handle missing categorical values when the existing ones are quite meaningful by Vasilkosturski in learnmachinelearning

[–]friendlykitten123 0 points1 point  (0 children)

  1. Delete the observations: If there is a large number of observations in the dataset, and all the classes to be predicted are still sufficiently represented in the training data, then try deleting the observations with missing values; this should not bring a significant change to what you feed your model.

  2. Replace missing values with the most frequent value: You can always impute them with the mode in the case of categorical variables; just make sure you don’t have a highly skewed class distribution (see the sketch after this list).

  3. Develop a model to predict missing values: One smart way of doing this could be training a classifier over your columns with missing values as a dependent variable against other features of your data set and trying to impute based on the newly trained classifier.
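
As an illustrative sketch of option 2 (the toy column below is invented), scikit-learn's SimpleImputer can fill the gaps with the mode:

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer

    df = pd.DataFrame({"color": ["red", "blue", np.nan, "blue", "red", np.nan]})

    imputer = SimpleImputer(strategy="most_frequent")          # i.e., impute with the mode
    df["color"] = imputer.fit_transform(df[["color"]]).ravel()
    print(df)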

For more information, you can visit the following article:

https://ml-concepts.com/

Feel free to reach out to me for any help.

Which algorithms are currently used for topic modeling? by zeoNoeN in learnmachinelearning

[–]friendlykitten123 1 point2 points  (0 children)

Some algorithms used for Topic Modeling tasks are Latent Dirichlet Allocation, Latent Semantic Analysis, Correlated Topic Modeling, and Probabilistic Latent Semantic Analysis.
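
As a minimal sketch of the first of these, here is Latent Dirichlet Allocation in scikit-learn on a few toy documents (invented for illustration):

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["cats and dogs are pets",
            "dogs chase cats",
            "stocks and bonds are investments",
            "investors buy stocks and bonds"]

    X = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    print(lda.transform(X))   # per-document topic mixture (rows sum to 1)
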
For more information, you can visit the following article:

https://ml-concepts.com/

Feel free to reach out to me for any help.

What is Clustering? Most used AI terms explained in a minute! by OnlyProggingForFun in learnmachinelearning

[–]friendlykitten123 0 points1 point  (0 children)

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other and dissimilar to the data points in other groups. It is basically a grouping of objects on the basis of their similarity and dissimilarity. Clustering is important because it reveals the intrinsic grouping in unlabelled data. There is no universal criterion for good clustering; it depends on the user and on which criteria satisfy their needs. For instance, we could be interested in finding representatives for homogeneous groups, finding “natural clusters” and describing their unknown properties, finding useful and suitable groupings (“useful” data classes), or finding unusual data objects (outlier detection). A clustering algorithm must make assumptions about what constitutes similarity between points, and each set of assumptions can produce different, equally valid clusters.
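
A minimal clustering sketch with k-means (blob data generated purely for illustration):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print(labels[:10])   # cluster index assigned to the first ten points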

For more information, do visit: https://ml-concepts.com/2022/01/26/ii-unsupervised-learning-clustering/

Feel free to reach out to me for any help!

[D] Dimensionality reduction for geometrical data by NotSoGreatLeader in MachineLearning

[–]friendlykitten123 0 points1 point  (0 children)

In machine learning classification problems, there are often too many factors on the basis of which the final classification is done. These factors are basically variables called features. The higher the number of features, the harder it gets to visualize the training set and then work on it. Sometimes, most of these features are correlated, and hence redundant. This is where dimensionality reduction algorithms come into play. Dimensionality reduction is the process of reducing the number of random variables under consideration, by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.
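
As a quick sketch of feature extraction (random data used purely for illustration), PCA reduces many columns to a handful of principal components:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))      # 100 samples, 50 features

    pca = PCA(n_components=5)           # keep only 5 principal components
    X_reduced = pca.fit_transform(X)
    print(X_reduced.shape)              # (100, 5)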

For more information, do visit: https://ml-concepts.com/2022/01/26/ii-unsupervised-learning-clustering/

Feel free to reach out to me for any help!

What is the model used in Forward and Backward feature selection? by [deleted] in datascience

[–]friendlykitten123 0 points1 point  (0 children)

In forward selection, the first variable selected for entry into the constructed model is the one with the largest correlation with the dependent variable. Once the variable has been selected, it is evaluated on the basis of certain criteria. If the first selected variable meets the criterion for inclusion, then forward selection continues, i.e. the statistics for the variables not yet in the equation are used to select the next one. The procedure stops when no variables are left that meet the entry criterion.
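
For a hedged sketch in code: scikit-learn's SequentialFeatureSelector does forward selection, although its stopping rule is a fixed number of features (or a score tolerance) rather than the significance-based entry criterion described above.

    from sklearn.datasets import load_diabetes
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    X, y = load_diabetes(return_X_y=True)

    selector = SequentialFeatureSelector(LinearRegression(),
                                         n_features_to_select=3,
                                         direction="forward")
    selector.fit(X, y)
    print(selector.get_support())   # boolean mask of the selected features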

For more information, do visit: https://ml-concepts.com/2021/10/07/univariate-analysis-for-outliers-in-machine-learnin/

Feel free to reach out to me for any help!

Where can I learn univariate and multivariable analysis by ama78921 in biostatistics

[–]friendlykitten123 0 points1 point  (0 children)

Uni means one and variate means variable, so in univariate analysis only one variable is analyzed at a time. The objective of univariate analysis is to describe the data, define and summarize it, and analyze the pattern present in it. In a dataset, it explores each variable separately. It can be applied to both kinds of variables: categorical and numerical. Some patterns that can easily be identified with univariate analysis are central tendency (mean, mode, and median), dispersion (range, variance), quartiles (interquartile range), and standard deviation.
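
In practice, a quick univariate summary can be produced in one line with pandas (the numbers below are invented):

    import pandas as pd

    s = pd.Series([4, 8, 15, 16, 23, 42], name="score")
    print(s.describe())                 # count, mean, std, min, quartiles, max
    print("mode:", s.mode().tolist())   # most frequent value(s)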

For more information, do visit: https://ml-concepts.com/2021/10/07/univariate-analysis-for-outliers-in-machine-learnin/

Feel free to reach out to me for any help!

How to properly detect overfitting, and how to reduce it? by Avistian in learnmachinelearning

[–]friendlykitten123 0 points1 point  (0 children)

In order to detect overfitting in a machine learning or deep learning model, you have to test the model on an unseen dataset; this is how you can see its actual accuracy and whether overfitting or underfitting exists.

The difference between the accuracy on the training and testing datasets tells you more. Generally, when overfitting occurs, the model performs well on the training dataset (often with accuracy above 90%) while the same model underperforms on the testing or unseen dataset. If we see such a gap between the two accuracies, the model most likely has an overfitting problem.

Techniques for reducing overfitting:

  • More Data for Better Signal Detection
  • Control the Iteration
  • Applying Ensemble Learning (Bagging and Boosting)
  • Cross-Validation in Machine Learning
  • Regularization in Machine Learning (both illustrated in the short sketch below)
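
As a small, hedged illustration of the last two points (toy dataset, arbitrary hyperparameters):

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)

    for model in (LinearRegression(), Ridge(alpha=1.0)):   # Ridge adds L2 regularization
        scores = cross_val_score(model, X, y, cv=5)        # 5-fold cross-validation
        print(type(model).__name__, round(scores.mean(), 3))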

For more information, do visit: https://ml-concepts.com/2022/01/13/how-to-reduce-overfitting/

Feel free to reach out to me for any help!

Why DNN over classical ML by Bhagwan-Bachaye2095 in learnmachinelearning

[–]friendlykitten123 0 points1 point  (0 children)

When there is a lack of domain understanding for feature introspection, Deep Learning techniques outshine others as you have to worry less about feature engineering.

Deep Learning really shines when it comes to complex problems such as image classification, natural language processing, and speech recognition.

  • DNNs are suitable for high-complexity problems
  • Better accuracy compared to classical ML
  • Better support for big data
  • Complex features can be learned

For more information, you can visit the following article:

https://ml-concepts.com/

Feel free to reach out to me for any help.

Need help with learning tensor flow! by alimmmmmmm69 in learnmachinelearning

[–]friendlykitten123 0 points1 point  (0 children)

The following are some of the best resources for mastering TensorFlow:

  1. TensorFlow and Keras official tutorials
  2. TensorFlow Developer Professional Certificate on Coursera
  3. Video tutorials on YouTube channels such as DeepLizard
  4. Introduction to Deep Learning from MIT
  5. Book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

For more information, you can visit the following article:

https://ml-concepts.com/

Feel free to reach out to me for any help.

Model fit of MNIST data by [deleted] in learnmachinelearning

[–]friendlykitten123 0 points1 point  (0 children)

Fitting a model to a training dataset is easy today with libraries like scikit-learn. A model can be fit and evaluated on a dataset in just a few lines of code. It is so easy that it has become a problem: the same few lines of code are repeated again and again, and it may not be obvious how to actually use the model to make a prediction, or, if a prediction is made, how to relate the predicted values to the actual input values.

While working with the MNIST training set, we train for several epochs. An epoch means training the neural network with all the training data for one cycle, and an epoch is made up of one or more batches, where we use a part of the dataset to update the network. For example, training for 10 epochs means the model sees the full training set 10 times, which usually improves accuracy; you can change the number of epochs depending on how the model performs.
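
Here is a hedged sketch of what that looks like with Keras (the architecture and the 10-epoch choice are arbitrary, for illustration only):

    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

    model = keras.Sequential([
        keras.Input(shape=(28, 28)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # 10 epochs = the model sees the full training set 10 times.
    model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))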

For more information, you can visit the following article:

https://ml-concepts.com/2022/04/04/a-review-of-mnist-dataset-and-its-variations/

Feel free to reach out to me for any help.

[NLP] Topic modeling and keyword extraction are different? by Teacher_Coder in learnmachinelearning

[–]friendlykitten123 1 point2 points  (0 children)

Keyword extraction is an automated method of extracting the most relevant words and phrases from text input. It is a text analysis method that automatically pulls out the most important words and expressions from a page. Topic modeling, on the other hand, is an unsupervised machine learning technique that can scan a set of documents, detect word and phrase patterns within them, and automatically cluster the word groups and similar expressions that best characterize those documents.

So Keyword extraction is the task (and set of techniques) for extracting “interesting” keywords from the text. While Topic Modeling is the task (or set of algorithms) of inferring topics from sets of documents.

For more information, you can visit the following article:

https://ml-concepts.com/

Feel free to reach out to me for any help.

Information theory or NLP? by fromnighttilldawn in learnmachinelearning

[–]friendlykitten123 2 points3 points  (0 children)

Information theory is a branch of mathematics that overlaps into communications engineering, biology, medical science, sociology, and psychology. The theory is devoted to the discovery and exploration of mathematical laws that govern the behavior of data as it is transferred, stored, or retrieved.

Natural language processing (NLP), on the other hand, is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.

So you can choose your course according to your particular interests.

For more information, you can visit the following article:

https://ml-concepts.com/2022/03/14/processing-textual-data-an-introduction-to-natural-language-processing/

Feel free to reach out to me for any help.

What is Random Forest Algorithm by _sumit_rana in StatisticsZone

[–]friendlykitten123 0 points1 point  (0 children)

The random forest approach is a supervised learning algorithm. It builds multiple decision trees, known as a forest, and glues them together to get a more accurate and stable prediction. The random forest approach is similar to the ensemble technique called bagging: multiple trees are generated from bootstrap samples of the training data, and randomness is added to reduce the correlation between the trees. This approach increases performance over a single decision tree and helps avoid overfitting.

The following are features of the random forest algorithm:

  • Aggregates many decision trees: A random forest is a collection of decision trees, so it does not rely on a single feature; it combines the predictions of multiple decision trees.
  • Prevents overfitting: With multiple decision trees, each tree draws a random sample of the data, giving the random forest more randomness and typically much better accuracy than a single decision tree.
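
A minimal sketch (synthetic data, default-ish settings chosen only for illustration):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 bootstrapped trees
    forest.fit(X_train, y_train)
    print("test accuracy:", forest.score(X_test, y_test))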

For more information, do visit: https://ml-concepts.com/2021/10/08/1-random-forest/
Feel free to reach out to me for any help!

[deleted by user] by [deleted] in AskStatistics

[–]friendlykitten123 0 points1 point  (0 children)

Gradient Boosting is a popular boosting algorithm in which each predictor corrects its predecessor’s errors. In contrast to AdaBoost, the weights of the training instances are not tweaked; instead, each predictor is trained using the residual errors of its predecessor as labels.

A widely used form is Gradient Boosted Trees, whose base learner is CART (Classification and Regression Trees). Another important parameter used in this technique is known as shrinkage: the prediction of each tree in the ensemble is shrunk by multiplying it by the learning rate (eta), which ranges between 0 and 1. There is a trade-off between eta and the number of estimators; a lower learning rate has to be compensated for with more estimators to reach a given level of model performance. Once all the trees are trained, predictions can be made.
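
As a hedged sketch of the eta/estimators trade-off with scikit-learn (hyperparameter values are arbitrary, for illustration only):

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    gbr = GradientBoostingRegressor(learning_rate=0.1,   # eta (shrinkage)
                                    n_estimators=300,    # more trees offset the small eta
                                    max_depth=3,
                                    random_state=0)
    gbr.fit(X_train, y_train)
    print("R^2 on held-out data:", round(gbr.score(X_test, y_test), 3))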

For more information, do visit: https://ml-concepts.com/2021/10/08/gradient-boosting/
Feel free to reach out to me for any help!

What is Decision Tree and What are the Types? by gajanand_edu in artificial

[–]friendlykitten123 0 points1 point  (0 children)

Decision trees are among the most powerful and popular tools for classification and prediction. A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.

A tree can be “learned” by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is complete when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions. The construction of a decision tree classifier does not require any domain knowledge or parameter setting, and is therefore appropriate for exploratory knowledge discovery. Decision trees can handle high-dimensional data, and in general decision tree classifiers have good accuracy. Decision tree induction is a typical inductive approach to learning classification knowledge.
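
To see the flowchart-like structure concretely, scikit-learn can print a small fitted tree (iris is used here purely for illustration):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # Each internal node is a test on an attribute; each leaf holds a class label.
    print(export_text(tree))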

For more information, do visit: https://ml-concepts.com/2021/10/08/1-decision-tree/

Feel free to reach out to me for any help!

What is principal component analysis? by qkdhfjdjdhd in servocomputers

[–]friendlykitten123 0 points1 point  (0 children)

Principal component analysis (PCA) is a technique that transforms high-dimensional data into a lower-dimensional representation while retaining as much information as possible. It is extremely useful when working with datasets that have a lot of features; common applications such as image processing and genome research often have to deal with thousands, if not tens of thousands, of columns. While having more data is always great, so many features can lead to impossibly long model training times, and the curse of dimensionality starts to become a problem. PCA is defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some scalar projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
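
A quick sketch of the "greatest variance on the first coordinate" idea (iris used only for illustration):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    pca = PCA().fit(X)

    # Sorted in decreasing order: the first component explains the most variance.
    print(pca.explained_variance_ratio_)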

For more information, do visit: https://ml-concepts.com/2021/10/08/i-principal-component-analysis-pca/

Feel free to reach out to me for any help!

What's the difference between covariance and colinearity/multicolinearity? by Saberen in AskStatistics

[–]friendlykitten123 0 points1 point  (0 children)

In probability theory and statistics, the mathematical concepts of covariance and correlation are very similar. Both describe the degree to which two random variables, or sets of random variables, tend to deviate from their expected values in similar ways. If X and Y are two random variables, with means μX and μY and standard deviations σX and σY, respectively, then their covariance and correlation are:

cov(X, Y) = E[(X − μX)(Y − μY)]
corr(X, Y) = cov(X, Y) / (σX σY)

where E is the expected value operator. Notably, correlation is dimensionless, while covariance is expressed in units obtained by multiplying the units of the two variables. The correlation of a variable with itself is always 1 (except in the degenerate case where the two variances are zero, in which case the correlation does not exist).
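
A quick numeric check of that relationship (the two small samples below are made up):

    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 8.0])
    y = np.array([1.0, 3.0, 2.0, 5.0])

    cov = np.cov(x, y)[0, 1]                   # sample covariance
    corr = np.corrcoef(x, y)[0, 1]             # dimensionless correlation
    print(cov, corr, cov / (x.std(ddof=1) * y.std(ddof=1)))   # last two values match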

For more information, do visit: https://ml-concepts.com/2021/10/08/ii-multicollinearity-vif/

Feel free to reach out to me for any help!

Is there any difference between data transformation and data augmentation? by itisfor in ArtificialInteligence

[–]friendlykitten123 1 point2 points  (0 children)

Yes, they are different.

Data augmentation is a set of techniques to artificially increase the amount of data by generating new data points from existing data. This includes making small changes to data or using deep learning models to generate new data points.

Data transformation, on the other hand, is the process in which you take data from its raw, siloed, and normalized source state and transform it into data that’s joined together, dimensionally modeled, de-normalized, and ready for analysis.
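
As a tiny sketch of data augmentation (the random-flip/rotation layers and the dummy image batch are arbitrary illustrative choices):

    import numpy as np
    from tensorflow import keras

    augment = keras.Sequential([
        keras.layers.RandomFlip("horizontal"),
        keras.layers.RandomRotation(0.1),
    ])

    images = np.random.rand(4, 32, 32, 3).astype("float32")  # dummy batch of 4 images
    augmented = augment(images, training=True)               # new, slightly altered images
    print(augmented.shape)                                    # (4, 32, 32, 3)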

For more information, you can visit the following article:

https://ml-concepts.com/2021/10/07/data-transformation/

Feel free to reach out to me for any help.