DECISION TREE

In this blog I will be writing about a widely used classification algorithm in machine learning: the Decision Tree.

Here I will explain what a Decision Tree is, the types of Decision Tree algorithms, how to create a Decision Tree, applications of Decision Trees, their advantages and disadvantages, and finally I will provide a link to my briefly explained Jupyter Notebook implementing the decision tree algorithm from scratch.

So, without any further ado, let's get started.

Below are some assumptions that we make while using a decision tree:

Decision Trees are classified into two types based on the target variable: trees for a categorical target variable (classification trees) and trees for a continuous target variable (regression trees).
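For instance, in scikit-learn (used here only as an illustration, not as part of the from-scratch implementation later in this post), the two types correspond to two separate estimators:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Categorical target variable -> classification tree
clf = DecisionTreeClassifier()

# Continuous target variable -> regression tree
reg = DecisionTreeRegressor()
```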

Let's see what a decision tree looks like, and how it works when a new input is given for prediction.

Below is an image explaining the basic structure of a decision tree. Every tree has a root node, through which the inputs are passed. The root node is further divided into sets of decision nodes, where observations are routed based on conditions. The process of dividing a single node into multiple nodes is called splitting. If a node doesn't split into further nodes, it's called a leaf node, or terminal node. A subsection of a decision tree is called a branch or sub-tree (e.g. the box in the image below).
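As a rough sketch of this structure (the class and attribute names below are made up for illustration and are not from any particular library), a tree node can be represented like this:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One node of a decision tree (illustrative sketch)."""
    feature: Optional[str] = None                   # attribute tested at this node (None for a leaf)
    children: dict = field(default_factory=dict)    # branch value -> child Node
    prediction: Optional[str] = None                # class label stored at a leaf node

    def is_leaf(self) -> bool:
        # A node with no children is a leaf (terminal) node.
        return not self.children

# The root node tests a feature; each branch leads to a decision node or a leaf.
root = Node(feature="Outlook", children={
    "Sunny": Node(prediction="No"),
    "Rainy": Node(prediction="Yes"),
})
```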

There is also another concept that is the opposite of splitting. If there are decision rules that can be eliminated, we cut them from the tree. This process is known as pruning, and it is useful for minimizing the complexity of the model.
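As one concrete illustration, scikit-learn exposes cost-complexity pruning through the ccp_alpha parameter of DecisionTreeClassifier; this is just one of several pruning strategies, sketched here on the built-in Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An unpruned tree keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Increasing ccp_alpha removes branches whose extra complexity is not
# justified by the reduction in impurity (cost-complexity pruning).
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

print(full_tree.get_n_leaves(), pruned_tree.get_n_leaves())  # the pruned tree has fewer leaves
```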

In this section, we shall discuss the core algorithms that describe how decision trees are created. These algorithms depend on the type of target variable, and so they differ between classification and regression trees.

There are several techniques that are used to decide how to split the given data. The main goal of decision trees is to make the best splits between nodes, so that the data is optimally divided into the correct categories. To do this, we need to use the right decision rules; these rules directly affect the performance of the algorithm.

There are some assumptions that need to be considered before we get started:

Let's get started with the commonly used techniques for splitting nodes and thereby constructing the decision tree.

1. Information Gain
When we use a node in a decision tree to partition the training instances into smaller subsets, the entropy changes. Information gain is a measure of this change in entropy.
Definition: Suppose S is a set of instances, A is an attribute, Sv is the subset of S with A = v, and Values(A) is the set of all possible values of A. Then

Gain(S, A) = Entropy(S) - Σ over v ∈ Values(A) of (|Sv| / |S|) * Entropy(Sv)

Entropy
Entropy is a measure of the uncertainty of a random variable; it characterizes the impurity of an arbitrary collection of examples. The higher the entropy, the higher the information content.
Definition: If S is a set of instances and pi is the proportion of instances in S that belong to class i, then

Entropy(S) = - Σ over classes i of pi * log2(pi)

Example:
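A minimal sketch of computing entropy (the label lists below are made-up toy examples, not data from this post):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a collection of class labels: -sum(p_i * log2(p_i))."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

# A perfectly mixed set has the highest entropy; a pure set has zero entropy.
print(entropy(["+", "+", "-", "-"]))  # 1.0
print(entropy(["+", "+", "+", "+"]))  # -0.0 (a pure subset has zero entropy)
```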

Building Decision Tree using Information Gain
The essentials:

Example:
Now, let's draw a decision tree for the following data using information gain.

Training set: 3 features and 2 classes

Here, we have 3 features and 2 output classes.
To build a decision tree using information gain, we take each feature and calculate the information gain obtained by splitting on it.

Split on feature X

Split on feature Y

Split on feature Z

From the splits above, we can see that the information gain is maximum when we split on feature Y. So feature Y is the best-suited feature for the root node. We can also see that when the dataset is split by feature Y, each child node contains a pure subset of the target variable, so we don't need to split the dataset any further.
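A sketch of this feature-selection step on a tiny made-up dataset with binary features X, Y, and Z (the rows below are illustrative, not the dataset from the figures above; splitting on Y separates the classes perfectly, just as in the example):

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, feature, target="label"):
    """Entropy of the whole set minus the weighted entropy of each subset after splitting on `feature`."""
    labels = [r[target] for r in rows]
    gain = entropy(labels)
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain

# Made-up toy dataset: Y separates the classes perfectly, X and Z do not.
rows = [
    {"X": 1, "Y": 1, "Z": 0, "label": "I"},
    {"X": 1, "Y": 1, "Z": 1, "label": "I"},
    {"X": 0, "Y": 0, "Z": 1, "label": "II"},
    {"X": 1, "Y": 0, "Z": 0, "label": "II"},
]
best = max(["X", "Y", "Z"], key=lambda f: information_gain(rows, f))
print(best)  # 'Y': splitting on Y yields the highest information gain
```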

The final tree for the above dataset would look like this:

2. Gini Index

The Gini index measures the impurity of a set of instances: Gini = 1 - Σ pi², where pi is the proportion of instances of class i. A pure subset has a Gini index of 0, and the attribute with the lowest weighted Gini index is preferred for the split.

Example:
Let's consider the dataset in the image below and draw a decision tree using the Gini index.

In the dataset above there are 5 attributes, of which attribute E is the target feature, containing 2 classes (Positive & Negative). Both classes appear in equal proportion.
With the Gini index, we first choose some threshold values to categorize each attribute. The values chosen for this dataset are:

Calculating the Gini index for Var A:
Value >= 5: 12 instances
Attribute A >= 5 & class = positive: 5/12
Attribute A >= 5 & class = negative: 7/12
Gini(5, 7) = 1 - [(5/12)² + (7/12)²] = 0.486

Value < 5: 4 instances
Attribute A < 5 & class = positive: 3/4
Attribute A < 5 & class = negative: 1/4
Gini(3, 1) = 1 - [(3/4)² + (1/4)²] = 0.375

Weighting and summing the Gini indices:
gini(Target, A) = (12/16) * 0.486 + (4/16) * 0.375 = 0.458

Calculating the Gini index for Var B:
Value >= 3: 12 instances
Attribute B >= 3 & class = positive: 8/12
Attribute B >= 3 & class = negative: 4/12
Gini(8, 4) = 1 - [(8/12)² + (4/12)²] = 0.444

Value < 3: 4 instances
Attribute B < 3 & class = positive: 0/4
Attribute B < 3 & class = negative: 4/4
Gini(0, 4) = 1 - [(0/4)² + (4/4)²] = 0

Weighting and summing the Gini indices:
gini(Target, B) = (12/16) * 0.444 + (4/16) * 0 = 0.333

Of the two attributes computed here, B has the lower weighted Gini index (0.333 vs 0.458 for A), so it would be preferred for the split.
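A small sketch of this weighted Gini computation; the class counts below simply mirror the ones worked out above:

```python
def gini(counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(splits):
    """Weighted Gini index of a split, given (positive, negative) counts per branch."""
    total = sum(sum(branch) for branch in splits)
    return sum((sum(branch) / total) * gini(branch) for branch in splits)

# Attribute A: 12 instances with A >= 5 (5 positive, 7 negative), 4 with A < 5 (3 positive, 1 negative)
print(weighted_gini([(5, 7), (3, 1)]))  # ~0.458

# Attribute B: 12 instances with B >= 3 (8 positive, 4 negative), 4 with B < 3 (0 positive, 4 negative)
print(weighted_gini([(8, 4), (0, 4)]))  # ~0.333
```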

Decision Tree is one of the basic and widely used algorithms in the field of Machine Learning. It is applied across different areas in both classification and regression modeling. Because the resulting model can be visualized, one can easily draw insights from the modeling process. Here are a few examples of where a Decision Tree could be used:

There are a few pros and cons that come along with decision trees. Let's discuss the advantages first. Decision trees take very little time to process the data compared to other algorithms. A few preprocessing steps, such as normalization, transformation, and scaling of the data, can be skipped. Even when there are missing values in the dataset, the performance of the model is not affected much. A decision tree model is intuitive and easy to explain to technical teams and stakeholders, and can be implemented across several organizations.

Now for the disadvantages. In decision trees, small changes in the data can cause a large change in the structure of the tree, which in turn leads to instability. The training time increases drastically with the size of the dataset. In some cases, the calculations can become complex compared to other traditional algorithms.

Step 1: Import important libraries and load the data

Step 2: Calculate the entropy for each attribute and select the attribute with the maximum information gain as the root node

Step 3: Build the Decision Tree

Step 4: Make predictions
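Below is a minimal from-scratch sketch covering Steps 1-4 (an ID3-style tree for categorical features, not the exact code from the linked notebook; the data and column names are made up for illustration):

```python
import pandas as pd
from math import log2

# Step 1: import libraries and load the data (hypothetical data defined inline for this sketch).
data = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast"],
    "Windy":   ["False", "True", "False", "False", "True", "True"],
    "Play":    ["No", "No", "Yes", "Yes", "No", "Yes"],
})
target = "Play"

def entropy(series):
    """Entropy of a column of class labels."""
    probs = series.value_counts(normalize=True)
    return -sum(p * log2(p) for p in probs)

def information_gain(df, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    gain = entropy(df[target])
    for value, subset in df.groupby(attribute):
        gain -= (len(subset) / len(df)) * entropy(subset[target])
    return gain

# Step 2: pick the attribute with the maximum information gain as the root node.
def best_attribute(df, target):
    attributes = [c for c in df.columns if c != target]
    return max(attributes, key=lambda a: information_gain(df, a, target))

# Step 3: build the decision tree recursively (stored as nested dicts).
def build_tree(df, target):
    if df[target].nunique() == 1:        # pure node -> leaf
        return df[target].iloc[0]
    if df.shape[1] == 1:                 # no attributes left -> majority class
        return df[target].mode()[0]
    root = best_attribute(df, target)
    tree = {root: {}}
    for value, subset in df.groupby(root):
        tree[root][value] = build_tree(subset.drop(columns=[root]), target)
    return tree

tree = build_tree(data, target)

# Step 4: make predictions by walking down the tree.
def predict(tree, row):
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))
    branch = tree[attribute].get(row[attribute])
    return predict(branch, row) if branch is not None else None

print(tree)
print(predict(tree, {"Outlook": "Sunny", "Windy": "False"}))  # 'No'
```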

I have tried to provide all the important information on getting started with Decision Trees and their implementation. I hope you will find something useful here. Thank you for reading till the end.
