## 7 Machine learning algorithms to know A beginner’s Guide Machine learning algorithms are a set of methods that allow a computer to learn from data without being explicitly programmed. These algorithms build models based on sample data to make predictions or decisions without being specifically told how to perform the task.

There are many different machine learning algorithms, including:

Supervised learning algorithms that learn from labelled training data, where the desired output is provided for each example in the training set. The goal is to make predictions about new, unseen models by finding patterns in the training data.

Unsupervised learning algorithms: These algorithms learn from unlabeled data, finding patterns and relationships in the data without prior knowledge of the desired output.

Semi-supervised learning algorithms: These algorithms learn from a combination of labelled and unlabeled data, using the labelled data to make predictions about the unlabeled data.

Reinforcement learning algorithms: These algorithms learn by interacting with their environment and receiving rewards or punishments for specific actions.

Machine learning algorithms are used in many applications, including image and speech recognition, natural language processing, and self-driving cars.

Here are the 7 Machine Learning Algorithms that you must know:

## Linear regression

Linear regression is a supervised machine learning algorithm used to predict a continuous outcome variable (also known as the dependent variable) based on one or more predictor variables (also known as independent variables). It is called "linear" because the relationship between the predictor and outcome variables is modelled using a linear equation.

The linear regression model assumes a linear relationship between the predictor variables and the outcome variable and estimates the coefficients of the linear equation based on the training data. The linear equation can predict the outcome variable for new data.

Here is an example of a linear regression model with one predictor variable:

y = b0 + b1*x

Where y is the outcome variable, x is the predictor variable, b0 is the intercept term, and b1 is the coefficient for the predictor variable. The intercept term represents the expected mean value of the outcome variable when all predictor variables are zero. The coefficient for the predictor variable represents the change in the outcome variable associated with a one-unit change in the predictor variable, holding all other predictor variables constant.

Linear regression can be extended to handle multiple predictor variables by adding additional terms to the linear equation. In this case, the model is called multiple linear regression.

Linear regression is a simple and widely used statistical model that is easy to interpret and implement. Still, it can be limited in capturing non-linear relationships between the predictor and outcome variables.

## Logistic regression

Logistic regression is a supervised machine learning algorithm used for classification tasks. It is called "logistic" because it uses a logistic function to predict the probability that an example belongs to a particular class.

In logistic regression, the goal is to learn a model that can predict the probability that an example belongs to a positive class (e.g. "yes" or "1") given one or more predictor variables (also known as independent variables). The model can then predict the class of new, unseen examples by estimating the probability that they belong to the positive class.

The logistic function models the probability that an example belongs to the positive class. It is defined as follows:

p = 1 / (1 + exp(-z))

Where p is the probability that the example belongs to the positive class, and z is a linear combination of the predictor variables:

z = b0 + b1x1 + b2x2 + ... + bn*xn

where b0 is the intercept term, and b1, b2, ..., bn are the coefficients for the predictor variables x1, x2, ..., xn.

The coefficients of the logistic regression model are estimated using maximum likelihood estimation, which seeks to find the values of the coefficients that maximise the probability of the observed data.

Logistic regression is a simple and widely used statistical model that is easy to interpret and implement and can handle multiple predictor variables. It is often used for binary classification tasks but can also be extended to handle multi-class classification tasks. However, it needs to be improved to capture more complex relationships between the predictor and outcome variables.

## Naive Bayes

Naive Bayes is a supervised machine learning algorithm used for classification tasks. It is based on the idea of Bayesian probability, which is a way to represent the uncertainty of an event based on prior knowledge.

In the context of classification, the goal of a Naive Bayes model is to predict the class of a given an example based on the example's features (also known as predictors or independent variables). The model estimates the probability of each class given an example's features and assigns the example to the class with the highest probability.

The "naive" part of the name comes from the assumption that the features of the example are independent of each other, given the class. This assumption is often unrealistic, but it allows the model to be trained and evaluated efficiently.

Several variations of the Naive Bayes algorithm include the Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes. These variations differ in how they estimate the probabilities of the classes and features.

Naive Bayes is a simple and widely used statistical model that is easy to implement and can handle large amounts of data efficiently. It is often used for text classification tasks, such as spam filtering and sentiment analysis. However, it can be limited in capturing complex relationships between the features and the class.

## Decision tree

A decision tree is a supervised machine-learning algorithm for classification and regression tasks. It is a tree-like model of decisions, where an internal node represents a feature (or attribute), a branch represents a decision based on the value of the quality, and a leaf node represents the final decision or prediction.

The goal of a decision tree is to create a model that can make predictions about the target variable (also known as the dependent variable or class) based on the values of the input variables (also known as the predictor variables or features).

To build a decision tree, the algorithm starts at the root node and splits the data into subsets based on the values of the input variables. It then repeats the process for each child node until it reaches a leaf node that predicts the target variable.

There are several algorithms for building decision trees, including the Classification and Regression Tree (CART) algorithm and the ID3 algorithm. The CART algorithm is used for classification tasks, and the ID3 algorithm is used for regression tasks.

Decision trees are simple, interpretable, and can handle both categorical and numerical data. However, they can be prone to overfitting, especially if the tree is allowed to grow too deep. To prevent overfitting, the tree can be pruned, or the maximum depth of the tree can be limited.

## Random forest algorithm

Random forests are an ensemble machine-learning algorithm used for classification and regression tasks. A costume is a method that combines the predictions of multiple models to make a more accurate and stable prediction.

A random forest is an ensemble of decision trees, where each tree is trained on a different training data sample. The final prediction is made by averaging the predictions of the individual trees. This helps reduce the model's variance and improve the generalisation performance.

The algorithm first generates many decision trees to build a random forest using a bootstrapped training data sample to create a random forest. A bootstrapped sample is a random sample of the training data drawn with replacement. This means that some examples in the piece may be repeated while others may be left out.

In addition to using different training data, the decision trees in a random forest are also trained on a random subset of the features. This helps to decorrelate the trees, making the random forest more robust to correlated features.

Random forests are simple to implement and can handle large amounts of data efficiently. They are often used for many applications, including feature selection, anomaly detection, and classification. However, they can be less interpretable than a single decision tree, as the prediction is based on the average of many decision trees.

## KNN algorithm

K-nearest neighbours (KNN) is a supervised machine learning algorithm for classification and regression tasks. It is based on finding the K training examples closest (in some sense) to the test example and using their class labels or values to make a prediction.

In KNN, the number of neighbours (K) is a hyperparameter that the user specifies. A smaller value of K means that the model will be more sensitive to the influence of individual training examples. In comparison, a more significant matter of K means that the overall distribution of the training data will influence the model.

To classify a test example using KNN, the algorithm first calculates the distance between the test example and each training example using a distance metric, such as Euclidean distance or Manhattan distance. It then selects the K training examples closest to the test example and uses their class labels to predict the class of the test example. For regression tasks, the prediction is made by averaging the values of the K nearest neighbours.

KNN is a simple and intuitive algorithm that is easy to implement. It requires calculating the distance between the test example and all training examples. It is sensitive to the scale of the features, as the scale influences the distance metric. To address these issues, it is common to normalize the features before applying KNN.

KNN is often used for classification tasks, but it can also be used for regression tasks. It is often used for applications such as image classification, anomaly detection, and recommendation systems.

## K means algorithm

K-means is an unsupervised machine-learning algorithm that is used for clustering tasks. It is a method for finding clusters (also known as groups or clusters) in a dataset, where each set is represented by its centroid, which is the mean of the points in the cluster.

In K-means, the user specifies the number of clusters (K) they want to find in the dataset. The algorithm then assigns each example in the dataset to one of the K clusters based on the distance between the sample and the cluster's centroid.

The K-means algorithm iteratively performs two steps:

Reassignment step: In this step, the algorithm calculates the distance between each example and the centroids of the K clusters and assigns each instance to the cluster with the closest centroid.

Update step: In this step, the algorithm calculates the new centroids of the K clusters by taking the mean of the examples in each group.

The algorithm repeats these two steps until the clusters converge, which means that the assignment of examples to groups and the location of the centroids do not change between iterations.

K-means is a simple clustering algorithm that can quickly implement and scale to large datasets. It is exposed to the scale of the features, as the scale influences the distance metric. It can be sensitive to the initial placement of centroids and is prone to finding suboptimal solutions if the actual clusters have a non-convex shape. To address these issues, it is common to normalise the features before applying K-means.