# Machine Learning Algorithms

Nowadays, making decisions in minimal time, finding ways to grow a business at 10x speed, and saving lives through weather forecasting all rely on **Machine Learning Algorithms** (MLA).

In this blog, I cover the most important and most commonly used algorithms.

**Note: Every algorithm has its own approach and references. Develop your own approach too, and you may well find some out-of-the-box predictions.**

**Machine Learning Algorithms** are broadly divided into two categories:

- Supervised
- Unsupervised

Let's dive deep into some important and common algorithms.

**Supervised Algorithm**

1. **Linear Regression**

"Linear" is fairly self-explanatory: the relationship changes at a constant rate. Whenever your dataset has this kind of structure, for example when the difference between consecutive values is always the same, the machine can predict new values on the basis of that difference.

Linear regression is a supervised learning algorithm that fits a linear equation to the data in order to model the relationship between a continuous target variable and one or more independent variables.

For linear regression to be a good choice, there must be a linear relationship between the independent variable(s) and the target variable. Many methods exist to analyse the relationship between variables, such as scatter plots and correlation matrices. The plot below, for example, shows a positive correlation between an independent variable (x-axis) and a dependent variable (y-axis): as one rises, so does the other.

A linear regression model aims to fit the regression line that best describes the relationship or correlation in the data points. Ordinary least squares (OLS) is the most common technique. It finds the best regression line by minimizing the sum of the squared distances between the data points and the line.
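As a minimal sketch of the least-squares fit described above, the following pure-Python snippet computes the slope and intercept for a single independent variable in closed form. The function name `fit_line` and the sample data are assumptions made for illustration; they do not come from any particular library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data: y grows by 2 every time x grows by 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predicted y for an unseen x = 6
print(slope, intercept, prediction)  # prints: 2.0 1.0 13.0
```

Because the sample data has a constant difference between consecutive values, the fitted line passes through every point exactly; real data would leave some residual error.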

2. **Support Vector Machine**

Support Vector Machine (SVM) is a supervised learning algorithm used mainly for classification tasks, but also for regression.

SVM separates classes by drawing a decision boundary. The most significant aspect of SVM algorithms is how this decision boundary is determined. Before the boundary is created, each observation (or data point) is plotted in n-dimensional space, where "n" is the number of features used. If, for example, "length" and "distance" are used to classify various cells, the observations are plotted in a 2-dimensional space and the decision boundary is a line. If three features are used, the decision boundary is a plane in 3-dimensional space.
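To make the idea of a decision boundary concrete, here is a rough sketch of a linear SVM in two dimensions. It assumes one common training approach, batch subgradient descent on the regularized hinge loss; production libraries typically use more sophisticated solvers. The function names, hyperparameters (`lam`, `lr`, `epochs`) and data points are all hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=5000):
    """Minimize lam/2 * |w|^2 + mean hinge loss by batch subgradient descent.

    Labels must be +1 or -1. Returns the weights w and bias b of the
    boundary w . x + b = 0.
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Two features per observation, so the decision boundary is a line.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
print(classify(w, b, (0, 0)), classify(w, b, (6, 6)))
```

With three features the same code works unchanged; `w` simply gains a third component and the boundary becomes a plane.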

3. **Naive Bayes Classifier**

The Naive Bayes classifier belongs to the family of simple probabilistic classifiers. It is based on applying Bayes' theorem with strong (naive) independence assumptions between the features: P(A|B) = P(B|A) · P(A) / P(B). Here P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the class prior probability, and P(B) is the predictor prior probability.
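The sketch below shows how these probabilities might be combined in a tiny categorical Naive Bayes classifier. It is only an illustration under stated assumptions: the helper names, the toy weather data and the add-one (Laplace) smoothing are choices made for this example, not something prescribed above.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict(model, features):
    """Pick the class with the highest prior * product of likelihoods."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_prob = math.log(count / total)  # log of the class prior
        for i, value in enumerate(features):
            # Likelihood with add-one smoothing so unseen values are not zero.
            numerator = feature_counts[(label, i)][value] + 1
            denominator = count + len(feature_values[i])
            log_prob += math.log(numerator / denominator)
        scores[label] = log_prob
    return max(scores, key=scores.get)


# Toy data: (outlook, temperature) -> whether a game is played.
samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "cool"),
           ("rainy", "mild"), ("sunny", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes", "no", "yes"]
model = train_naive_bayes(samples, labels)
print(predict(model, ("sunny", "hot")))   # prints: no
print(predict(model, ("rainy", "cool")))  # prints: yes
```

The predictor prior P(B) is the same for every class, so the sketch skips it and compares only the numerators.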

4. **Least Squares Regression**

If you know statistics, you have probably come across linear regression before. Least squares is a technique for performing linear regression. You can think of linear regression as the job of fitting a straight line through a set of points. There are several possible ways to do this, and the "ordinary least squares" method goes like this: you draw a line, and then for each data point you measure the vertical distance between the point and the line, square it, and add these values up. The fitted line is the one for which this sum is as small as possible.
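The "draw a line, measure, add up" procedure can be sketched as follows. The three candidate lines and the data are hypothetical; the point is only to show how the sum of squared vertical distances ranks them.

```python
def sum_squared_residuals(slope, intercept, xs, ys):
    """Add up the squared vertical distances between each point and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# Hypothetical candidate lines as (slope, intercept) pairs.
candidates = [(1.0, 2.0), (2.0, 0.0), (1.9, 0.0)]

for slope, intercept in candidates:
    print(slope, intercept, sum_squared_residuals(slope, intercept, xs, ys))

# The candidate with the smallest sum is the best fit among those tried.
best = min(candidates, key=lambda line: sum_squared_residuals(line[0], line[1], xs, ys))
print(best)  # prints: (1.9, 0.0)
```

Ordinary least squares does not actually try lines one by one; it solves for the minimizing slope and intercept directly. For this data that solution happens to be the third candidate.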

5. **Decision Tree**

A highly interpretable classification or regression model that splits the data on feature values into branches at decision nodes (e.g. if a feature is a colour, each colour becomes a new branch) until a final decision output is reached.
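Here is a small sketch of a single decision node, where each value of the chosen feature becomes its own branch. It assumes Gini impurity as the splitting criterion, which is one common choice among several; the function names and fruit data are made up for illustration.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity: 0.0 when every label in the branch is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_by_feature(rows, labels, feature):
    """Group the labels into one branch per distinct value of the feature."""
    branches = defaultdict(list)
    for row, label in zip(rows, labels):
        branches[row[feature]].append(label)
    return branches


def build_node(rows, labels):
    """Choose the feature with the lowest weighted impurity and label each branch."""
    def weighted_gini(feature):
        branches = split_by_feature(rows, labels, feature)
        return sum(len(b) / len(labels) * gini(b) for b in branches.values())

    feature = min(rows[0].keys(), key=weighted_gini)
    branches = split_by_feature(rows, labels, feature)
    # Each branch predicts its most common label.
    leaves = {value: Counter(b).most_common(1)[0][0] for value, b in branches.items()}
    return feature, leaves


def predict(node, row):
    feature, leaves = node
    return leaves.get(row[feature])  # None if the value was never seen


rows = [
    {"colour": "red", "size": "small"}, {"colour": "red", "size": "large"},
    {"colour": "green", "size": "small"}, {"colour": "green", "size": "large"},
    {"colour": "yellow", "size": "small"}, {"colour": "yellow", "size": "large"},
]
labels = ["apple", "apple", "pear", "pear", "banana", "banana"]
node = build_node(rows, labels)
print(node[0])                                             # prints: colour
print(predict(node, {"colour": "green", "size": "small"}))  # prints: pear
```

A full tree would repeat this step recursively inside each branch until the branches are pure or some stopping rule is met.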

**Unsupervised Algorithm**

**K-means clustering**

Groups data into k clusters of items with similar characteristics (as defined by the model, not in advance by a human). The K-means algorithm finds k centroids and then assigns each data point to the nearest cluster, while keeping each cluster as compact as possible, that is, keeping the distances between points and their centroid small.

The **'means'** in K-means refers to averaging the data, that is, finding the centroid.
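The assign-then-average loop can be sketched in a few lines of plain Python. This is a simplified illustration: the starting centroids are passed in by hand (an assumption made to keep the result deterministic), whereas real implementations usually pick them randomly or with a smarter seeding scheme.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        centroids = new_centroids
    return centroids, labels


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, labels = kmeans(points, centroids=[(1, 1), (8, 8)])
print(labels)  # prints: [0, 0, 0, 1, 1, 1]
print(centroids)
```

The update step is where the "means" come in: every centroid is literally the average of the points currently assigned to it.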

**Gaussian mixture model**

A generalization of k-means clustering that allows more flexibility in the size and shape of the groups (clusters). It is used for extracting features from speech data, and has also been used extensively for tracking multiple objects, where the number of mixture components and their means predict the object locations in each frame of a video sequence.
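To illustrate how a mixture model differs from k-means, the following sketch fits a two-component Gaussian mixture to one-dimensional data with the expectation-maximization (EM) algorithm, a standard fitting method that is assumed here rather than taken from the text. Each point receives a soft responsibility for every component instead of a hard cluster label. Function names, starting means and data are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, means, iterations=50):
    """EM for a 1-D Gaussian mixture; assumes reasonable starting means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: how responsible is each component for each point?
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances = fit_gmm_1d(data, means=[0.0, 6.0])
print(weights, means, variances)
```

Unlike k-means, each component carries its own variance and weight, which is what lets the clusters differ in size and spread.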

**Hierarchical clustering**

Splits clusters along a hierarchical tree to form a classification system. It can be used, for example, to cluster loyalty-card customers. Agglomerative hierarchical clustering starts with k = N clusters, one per data point, and merges the two closest clusters into one, giving k = N-1 clusters. This merging step is repeated until the required number of clusters K is reached. Ward's (1963) algorithm, using the Euclidean distance, can decide which two clusters to merge. Either the centroid or the medoid then represents each final cluster. Hierarchical clustering is deterministic, which means it is reproducible.
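The merging procedure can be sketched as below. The sketch assumes Ward's criterion in its usual form, where the cost of merging two clusters is the increase in within-cluster squared Euclidean distance; the function names and points are illustrative only, and the brute-force pair search is far slower than what a library would do.

```python
def centroid(cluster):
    return tuple(sum(v) / len(cluster) for v in zip(*cluster))


def ward_cost(a, b):
    """Increase in within-cluster squared distance if a and b were merged."""
    ca, cb = centroid(a), centroid(b)
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared


def agglomerate(points, k):
    """Start with one cluster per point and merge until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the cheapest pair
        del clusters[j]
    return clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
clusters = agglomerate(points, k=2)
print(clusters)
print([centroid(c) for c in clusters])  # one representative per final cluster
```

There is no random initialization anywhere in the procedure, which is why repeated runs on the same data give the same clusters.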

**Recommender system**

A recommendation system is a system that can predict a consumer's preferences for a number of items and suggest the top items. One important reason we need recommender systems in modern society is that the Internet is so prevalent. In the past, people used to buy in physical shops, where only a limited selection of items was available.
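As one possible illustration of "predict preferences, then suggest the top items", here is a sketch of user-based collaborative filtering with cosine similarity. This particular technique is an assumption chosen for the example; the text above does not say which method a recommender should use. The users, items and ratings are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Predict ratings for unseen items as a similarity-weighted average."""
    seen = ratings[user]
    weighted, sim_total = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in seen:
                continue  # only recommend items the user has not rated
            weighted[item] = weighted.get(item, 0.0) + sim * rating
            sim_total[item] = sim_total.get(item, 0.0) + sim
    predicted = {item: weighted[item] / sim_total[item] for item in weighted}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


ratings = {
    "alice": {"book": 5, "film": 4},
    "bob": {"book": 5, "film": 4, "game": 5, "album": 1},
    "carol": {"book": 1, "film": 5, "game": 1, "album": 5},
}
print(recommend("alice", ratings))  # prints: ['game', 'album']
```

Alice's tastes match Bob's more closely than Carol's, so Bob's high rating for the game carries more weight in her predicted scores.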

**PCA/t-SNE**

Used primarily for reducing the dimensionality of the data. These algorithms reduce the number of features to a small number of components, such as 3 or 4.
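A rough sketch of the PCA half of this idea follows: it finds the single direction of greatest variance and projects the data onto it, turning two features into one. Power iteration is used here as a simple way to get the leading eigenvector of the covariance matrix; that choice, the function name and the data are assumptions for illustration. t-SNE works quite differently and is not shown.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return the top principal direction and the data projected onto it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the start vector is not orthogonal to the top direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projected


# Two features that move together: the second is always twice the first.
rows = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, projected = first_principal_component(rows)
print(direction)   # roughly proportional to (1, 2)
print(projected)   # one number per row instead of two
```

Keeping more components means repeating the process on what remains after removing the directions already found, which library implementations handle with a full eigen- or singular value decomposition.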