Forest Cover Type Prediction Using Machine Learning

In the vast and diverse world of forests, each type of vegetation holds its own unique ecological importance. Being able to predict these types of vegetation is incredibly important for ecological conservation, managing natural resources, and deepening our understanding of the natural world. This is where machine learning comes in.

The task at hand involves decoding the forest's secrets - predicting what type of vegetation covers a particular area based on a wide range of environmental characteristics. Machine learning algorithms are like modern-day codebreakers in this endeavor, revealing the hidden patterns concealed within the vast amounts of data collected. These cover types can be anything from tall Spruce/Fir trees to hardy Krummholz, and each of them plays a vital role in the rich biodiversity of the ecosystem. In essence, machine learning helps us unveil the forest's mysteries and contributes to our efforts to protect and manage these vital natural resources.

Data Summary

The study area comprises four wilderness areas in the Roosevelt National Forest of northern Colorado. Each observation represents a 30m x 30m patch. The task is to predict an integer class for the forest cover type, which falls into one of seven categories:

Spruce/Fir
Lodgepole Pine
Ponderosa Pine
Cottonwood/Willow
Aspen
Douglas-fir
Krummholz

The training dataset, comprising 15,120 observations, provides both features and the Cover_Type. On the other hand, the test set contains only the features, requiring participants to predict the Cover_Type for each of the 565,892 observations in the test set.

Key Data Fields

Elevation: Measurement in meters denoting height
Aspect: Measurement in degrees azimuth representing direction
Slope: Measurement in degrees indicating a gradient
Horizontal_Distance_To_Hydrology: Measurement indicating the horizontal distance to the nearest water features
Vertical_Distance_To_Hydrology: Measurement indicating the vertical distance to the nearest water features
Horizontal_Distance_To_Roadways: Measurement indicating the horizontal distance to the nearest roadways
Hillshade_9am, Hillshade_Noon, Hillshade_3pm: Measurements representing the hillshade index at 9 am, noon, and 3 pm during the summer solstice
Horizontal_Distance_To_Fire_Points: Measurement indicating the horizontal distance to the nearest wildfire ignition points
Wilderness_Area: Binary columns indicating the presence (1) or absence (0) of the wilderness area
Soil_Type: Binary columns indicating the presence (1) or absence (0) of soil type
Cover_Type: Designation denoting forest cover type (1-7)

The wilderness areas are categorized as follows:

Rawah Wilderness Area
Neota Wilderness Area
Comanche Peak Wilderness Area
Cache la Poudre Wilderness Area

The soil types are:

Cathedral family - Rock outcrop complex, extremely stony.
Vanet - Ratake families complex, very stony.
Haploborolis - Rock outcrop complex, rubbly.
Ratake family - Rock outcrop complex, rubbly.
Vanet family - Rock outcrop complex, rubbly.
Vanet - Wetmore families - Rock outcrop complex, stony.
Gothic family.
Supervisor - Limber families complex.
Troutville family, very stony.
Bullwark - Catamount families - Rock outcrop complex, rubbly.
Bullwark - Catamount families - Rock land complex, rubbly.
Legault family - Rock land complex, stony.
Catamount Family - Rock land - Bullwark family complex, rubbly.
Pachic Argiborolis - Aquolis complex.
unspecified in the USFS Soil and ELU Survey.
Cryaquolis - Cryoborolis complex.
Gateview family - Cryaquolis complex.
Rogert family, very stony.
Typic Cryaquolis - Borohemists complex.
Typic Cryaquepts - Typic Cryaquolls complex.
Typic Cryaquolls - Leighcan family, till substratum complex.
Leighcan family, till substratum, extremely bouldery.
Leighcan family, till substratum - Typic Cryaquolls complex.
Leighcan family, extremely stony.
Leighcan family, warm, extremely stony.
Granile - Catamount families complex, very stony.
Leighcan family, warm - Rock outcrop complex, extremely stony.
Leighcan family - Rock outcrop complex, extremely stony.
Como - Legault family complex, extremely stony.
Como family - Rock land - Legault family complex, extremely stony.
Leighcan - Catamount families complex, extremely stony.
Catamount family - Rock outcrop - Leighcan family complex, extremely stony.
Leighcan - Catamount families - Rock outcrop complex, extremely stony.
Cryorthents - Rock land complex, extremely stony.
Cryumbrepts - Rock outcrop - Cryaquepts complex.
Bross family - Rock land - Cryumbrepts complex, extremely stony.
Rock outcrop - Cryumbrepts - Cryorthents complex, extremely stony.
Leighcan - Moran families - Cryaquolls complex, extremely stony.
Moran family - Cryorthents - Leighcan family complex, extremely stony.
Moran family - Cryorthents - Rock land complex, extremely stony.

Now, we will try to build a model that can predict the cover type of a forest.

Python Code to Predict Forest Cover Type using ML

Importing Libraries

import warnings
warnings.filterwarnings('ignore')
import pandas
import numpy

Reading the Dataset

df_dataset = pandas.read_csv("../input/train.csv") 

# Drop the initial 'Id' column as it solely contains serial numbers, which have no relevance in the prediction procedure.
df_dataset = df_dataset.iloc[:,1:]

Statistics of the Dataset

Summary statistics describe the key numerical characteristics and properties of a dataset.

# Size of the dataframe
print(df_dataset.shape)

Output:

There are 15,120 instances, each with 55 attributes. The data has been loaded successfully, since the dimensions match the data description.

# Datatypes of the attributes
print(df_dataset.dtypes)

Output:

Data types of all attributes have been inferred as int64.

# Statistical description
pandas.set_option('display.max_columns', None)
print(df_dataset.describe())

Output:

The following observations were noticed:

No attributes have missing values since the count is consistent at 15,120 for all attributes. Thus, all rows can be utilized.
There are negative values in the 'Vertical_Distance_To_Hydrology,' which makes certain tests like chi-squared inapplicable.
Both 'Wilderness_Area' and 'Soil_Type' have undergone one-hot encoding. Therefore, they could potentially be converted back for specific analyses.
Attributes 'Soil_Type7' and 'Soil_Type15' can be excluded as they remain constant.
Not all attributes share the same scales, implying that rescaling and standardization might be necessary for certain algorithms.
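
The checks behind these observations can be reproduced directly rather than read off the describe() table. The following is a minimal sketch on a small made-up frame; the `toy` data and the variable names are assumptions for illustration, not the actual training set.

```python
import pandas as pd

# Toy frame standing in for the real training data (values are made up).
toy = pd.DataFrame({
    "Elevation": [2596, 2590, 2804, 2785],
    "Vertical_Distance_To_Hydrology": [0, -6, 65, -12],
    "Soil_Type7": [0, 0, 0, 0],   # constant column
    "Soil_Type8": [0, 1, 0, 0],
})

# Missing values per column; all zeros means every row can be used.
missing_per_column = toy.isna().sum()

# Columns containing at least one negative value.
negative_columns = [c for c in toy.columns if (toy[c] < 0).any()]

# Columns with a single distinct value carry no information for prediction.
constant_columns = [c for c in toy.columns if toy[c].nunique() == 1]

print(missing_per_column.sum())
print(negative_columns)
print(constant_columns)
```

On the real data, the same three expressions can be applied to df_dataset instead of `toy`.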

# Skewness of the distribution
print(df_dataset.skew())

Output:

Here, values approaching zero indicate minimal skewness. Several 'Soil_Type' attributes exhibit significant skewness. Correcting this skewness could potentially benefit certain algorithms.
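
To see why one-hot columns tend to be heavily skewed, it may help to compare a symmetric column with a rarely set binary flag. This is a small illustrative sketch; the column names and values are invented for the example.

```python
import pandas as pd

# 'symmetric' is evenly spread; 'rare_flag' is 1 in only one of 20 rows,
# much like a soil type that occurs in very few patches.
toy = pd.DataFrame({
    "symmetric": list(range(20)),
    "rare_flag": [0] * 19 + [1],
})

skewness = toy.skew()
print(skewness)
```

The symmetric column has a skewness of essentially zero, while the rare flag has a large positive skewness, which is the pattern seen for many 'Soil_Type' columns.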

# Number of instances belonging to each class
df_dataset.groupby('Cover_Type').size()

Output:

We observe that each class is equally represented, indicating no need for class rebalancing.

Interaction with Dataset

Here we explore the dataset through correlations and scatter plots.

Correlation

import numpy

# Correlation indicates the relationship between two attributes.
# For Correlation it is necessary to have continuous data. Hence, ignore Wilderness_Area and Soil_Type as they are binary

# sets the number of features considered
size = 10 

# create a dataframe with only 'size' features
data=df_dataset.iloc[:,:size] 

# get the names of all the columns
cols=data.columns 

# Compute Pearson coefficients for all possible combinations
corr_data = data.corr()

# Setting the threshold to choose only attributes with strong correlations
threshold = 0.5

# List of pairs along with correlation above threshold
list_corr = []

#Search for the highly correlated pairs
for i in range(0,size): #for 'size' features
    for j in range(i+1,size): #avoid repetition
        if abs(corr_data.iloc[i,j]) >= threshold and corr_data.iloc[i,j] < 1:
            list_corr.append([corr_data.iloc[i,j],i,j]) #store correlation and columns index

#Sort to show higher ones first            
s_list_corr = sorted(list_corr,key=lambda x: -abs(x[0]))

#Print correlations and column names
for v,i,j in s_list_corr:
    print ("%s and %s = %.2f" % (cols[i],cols[j],v))

# Significant correlation is noted among the following pairs, suggesting a potential to reduce the feature set through techniques like PCA

Output:

Here, the correlations provide insight into how the different environmental variables are related to each other.
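
The same search for strongly correlated pairs can also be written without explicit index loops. The sketch below is one possible vectorized variant on synthetic data; the helper name `strong_pairs` and the toy columns are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: b and c are built from a, d is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.1, size=200),   # strongly positive with a
    "c": -a + rng.normal(scale=0.1, size=200),      # strongly negative with a
    "d": rng.normal(size=200),                      # unrelated
})

def strong_pairs(frame, threshold=0.5):
    """Return pairs with |Pearson r| >= threshold, strongest first."""
    corr = frame.corr()
    # Keep the strict upper triangle: each pair once, diagonal skipped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # The comparison is False for NaN, so masked entries are dropped here too.
    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

result = strong_pairs(toy)
for (left, right), value in result.items():
    print("%s and %s = %.2f" % (left, right, value))
```

Masking the upper triangle plays the same role as starting the inner loop at i+1: each pair is reported once and self-correlations are ignored.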

Scatter Plot

#import plotting libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Scatter plot of only the highly correlated pairs
for v,i,j in s_list_corr:
    sns.pairplot(df_dataset, hue="Cover_Type", height=6, x_vars=[cols[i]], y_vars=[cols[j]])
    plt.show()

Output:

Following are the points that are noted from the above plots:

The plots illustrate the classification of data points into their respective classes. There is some overlap in the distribution of classes within the plots.
The hillshade patterns exhibit appealing ellipsoid shapes when compared to each other.
The 'Aspect' and 'Hillshades' attributes together create a sigmoid pattern.
Horizontal and vertical distances to hydrology display an almost linear relationship.

Visualisation of Data

Now we will visualize the data with violin plots and then group the one-hot encoded attributes.

Box & Density Plot

# We will visualize all the attributes using Violin Plot - a combination of box and density plots

#names of all the attributes 
cols = df_dataset.columns

#number of attributes (exclude target)
size = len(cols)-1

# The x-axis has a target attribute to distinguish between classes
x = cols[size]

# The y-axis shows the values of an attribute
y = cols[0:size]

#Plot violin for all attributes
for i in range(0,size):
    sns.violinplot(data=df_dataset,x=x,y=y[i])  
    plt.show()

Output:

Following are the observations that are made through the violin plots:

Elevation exhibits a distinct distribution for most classes. It is highly correlated with the target variable, making it a significant attribute.
Aspect displays multiple normal distributions across several classes.
Horizontal distance to both roads and hydrology follows a similar distribution.
Hillshade at 9 a.m. and 12 p.m. exhibits a left skew.
Hillshade at 3 p.m. follows a normal distribution.
There are numerous zeros in the vertical distance to hydrology.
Wilderness_Area3 does not provide clear class distinction as it lacks values. However, other wilderness areas offer some potential for distinguishing classes.
Certain Soil_Type values, specifically 1, 5, 8, 9, 12, 14, and 18-22, as well as 25-30, and 35-40, contribute to class distinction due to their absence in many classes.

Grouping of One Hot Encoded Attributes

# Group one-hot encoded variables of a category into one single variable

#names of all the columns
cols = df_dataset.columns

#number of rows=r , number of columns=c
r,c = df_dataset.shape

#Create a new dataframe with r rows, one column for each encoded category, and target in the end
data = pandas.DataFrame(index=numpy.arange(0, r),columns=['Wilderness_Area','Soil_Type','Cover_Type'])

#Make an entry in 'data' for each r as category_id, target value
for i in range(0,r):
    w=0
    s=0
    # Category1 range
    for j in range(10,14):
        if (df_dataset.iloc[i,j] == 1):
            w=j-9  #category class
            break
    # Category2 range        
    for k in range(14,54):
        if (df_dataset.iloc[i,k] == 1):
            s=k-13 #category class
            break
    #Make an entry in 'data' for each r as category_id, target value        
    data.iloc[i]=[w,s,df_dataset.iloc[i,c-1]]

#Plot for Category1    
sns.countplot(x="Wilderness_Area", hue="Cover_Type", data=data)
plt.show()
#Plot for Category2
plt.rc("figure", figsize=(25, 10))
sns.countplot(x="Soil_Type", hue="Cover_Type", data=data)
plt.show()

Output:

Following are the things that we can conclude from the plot:
Wilderness_Area4 is strongly present in Cover_Type 4, indicating a strong class distinction.
Wilderness_Area3 doesn't provide significant class distinction.
Soil_Type 1-6, 10-14, 17, 22-23, 29-33, 35, and 38-40 contribute significantly to class distinction, as they have notably high counts in some cases.
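
Looping over every row with iloc can be slow on larger frames. As a hedged alternative, the one-hot blocks can be collapsed with array operations. The sketch below uses a tiny made-up frame; the helper name `decode_one_hot` is hypothetical, and it assumes the columns of each block are complete and ordered by their numeric suffix.

```python
import pandas as pd

# Tiny made-up frame with two one-hot blocks and the target.
toy = pd.DataFrame({
    "Wilderness_Area1": [1, 0, 0],
    "Wilderness_Area2": [0, 0, 1],
    "Soil_Type1": [0, 1, 0],
    "Soil_Type2": [1, 0, 0],
    "Soil_Type3": [0, 0, 0],
    "Cover_Type": [5, 2, 1],
})

def decode_one_hot(frame, prefix):
    """Collapse columns named <prefix><number> into one 1-based category id."""
    block = frame.filter(regex=f"^{prefix}\\d+$").to_numpy()
    # Position of the first 1 in each row, shifted to a 1-based id.
    ids = block.argmax(axis=1) + 1
    # Rows with no active flag get 0, like the default in the loop version.
    ids[block.sum(axis=1) == 0] = 0
    return pd.Series(ids, index=frame.index, name=prefix)

grouped = pd.DataFrame({
    "Wilderness_Area": decode_one_hot(toy, "Wilderness_Area"),
    "Soil_Type": decode_one_hot(toy, "Soil_Type"),
    "Cover_Type": toy["Cover_Type"],
})
print(grouped)
```

The resulting frame has the same three columns as the loop-based version and can be passed to sns.countplot in the same way.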

Dataset Cleaning

Now we will remove the unnecessary columns.

#Removal list initialize
rem = []

#Add constant columns as they don't help in the prediction process
for c in df_dataset.columns:
    if df_dataset[c].std() == 0: #standard deviation is zero
        rem.append(c)

#drop the columns        
df_dataset.drop(rem,axis=1,inplace=True)

print(rem)

Output:

Above are the columns that are dropped.

Dataset Preparation

Here we prepare the following versions of the data:

Original (untransformed)
Rows deleted or values imputed, in case of missing data
StandardScaler
MinMaxScaler
Normalizer

#get the number of rows and columns
r, c = df_dataset.shape

#get the list of columns
cols = df_dataset.columns
#create an array that has indexes of columns
i_cols = []
for i in range(0,c-1):
    i_cols.append(i)
#array of importance rank of all features  
ranks = []

#Extract only the values
array = df_dataset.values

#Y is the target column, X has the rest
X = array[:,0:(c-1)]
Y = array[:,(c-1)]

#Validation chunk size
val_size = 0.1

#Use a common seed in all experiments so that the same chunk is used for validation
seed = 0

#Split the data into chunks
from sklearn.model_selection import train_test_split
X_train, val_X, Y_train, val_Y = train_test_split(X, Y, test_size=val_size, random_state=seed)

#Import libraries for data transformations
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import Normalizer

#All features
X_all = []
#Additionally we will make a list of subsets
all_X_add =[]

#columns to be dropped
rem = []
#indexes of columns to be dropped
i_rem = []

#List of combinations
comb = []
comb.append("All+1.0")

#Add this version of X to the list 
X_all.append(['Orig','All', X_train,val_X,1.0,cols[:c-1],rem,ranks,i_cols,i_rem])

#point where categorical data begins
size=10

#Standardized
#Apply transform only for non-categorical data; fit on the training chunk and reuse its statistics for validation
scaler = StandardScaler().fit(X_train[:,0:size])
X_temp = scaler.transform(X_train[:,0:size])
val_X_temp = scaler.transform(val_X[:,0:size])
#Concatenate non-categorical data and categorical
X_con = numpy.concatenate((X_temp,X_train[:,size:]),axis=1)
val_X_con = numpy.concatenate((val_X_temp,val_X[:,size:]),axis=1)
#Add this version of X to the list 
X_all.append(['StdSca','All', X_con,val_X_con,1.0,cols,rem,ranks,i_cols,i_rem])

#MinMax
#Apply transform only for non-categorical data; fit on the training chunk and reuse its range for validation
scaler = MinMaxScaler().fit(X_train[:,0:size])
X_temp = scaler.transform(X_train[:,0:size])
val_X_temp = scaler.transform(val_X[:,0:size])
#Concatenate non-categorical data and categorical
X_con = numpy.concatenate((X_temp,X_train[:,size:]),axis=1)
val_X_con = numpy.concatenate((val_X_temp,val_X[:,size:]),axis=1)
#Add this version of X to the list 
X_all.append(['MinMax', 'All', X_con,val_X_con,1.0,cols,rem,ranks,i_cols,i_rem])

#Normalize
#Apply transform only for non-categorical data
X_temp = Normalizer().fit_transform(X_train[:,0:size])
val_X_temp = Normalizer().fit_transform(val_X[:,0:size])
#Concatenate non-categorical data and categorical
X_con = numpy.concatenate((X_temp,X_train[:,size:]),axis=1)
val_X_con = numpy.concatenate((val_X_temp,val_X[:,size:]),axis=1)
#Add this version of X to the list 
X_all.append(['Norm', 'All', X_con,val_X_con,1.0,cols,rem,ranks,i_cols,i_rem])

#Impute
#Imputer is not used as no data is missing

#List of transformations
trans_list = []

for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
    trans_list.append(trans)
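
A common convention is to learn scaling statistics from the training chunk only and reuse them on the validation chunk, while leaving the binary columns untouched. The sketch below shows one way to express that with scikit-learn's ColumnTransformer on synthetic data; the array shapes and the name `n_continuous` are assumptions for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 3 continuous columns followed by 2 binary columns.
rng = np.random.default_rng(0)
n_continuous = 3
continuous = rng.normal(loc=100.0, scale=25.0, size=(200, n_continuous))
binary = rng.integers(0, 2, size=(200, 2))
X_toy = np.hstack([continuous, binary])
y_toy = rng.integers(1, 8, size=200)

X_tr, X_val, y_tr, y_val = train_test_split(
    X_toy, y_toy, test_size=0.1, random_state=0
)

# Scale only the leading continuous columns; pass the binary ones through.
scaler = ColumnTransformer(
    [("scale", StandardScaler(), list(range(n_continuous)))],
    remainder="passthrough",
)

# Statistics come from the training chunk and are reused for validation.
X_tr_scaled = scaler.fit_transform(X_tr)
X_val_scaled = scaler.transform(X_val)

print(X_tr_scaled[:, :n_continuous].mean(axis=0).round(6))
```

Because the continuous columns come first, the column order of the output matches the input, so it can be used as a drop-in replacement for the manual concatenate step.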

Feature Selection

Feature selection is a critical step in the data preprocessing phase of machine learning. It involves choosing a subset of the most relevant and informative features (variables or columns) from the dataset while discarding irrelevant or redundant ones.
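
As a compact illustration of importance-based selection, the sketch below fits a tree ensemble, ranks the features by feature_importances_, and keeps a chosen fraction. It runs on synthetic data; the helper name `top_fraction` and the dataset parameters are assumptions for this example only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data: 8 features, only 3 of which are informative.
X_toy, y_toy = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

def top_fraction(model, X, y, ratio):
    """Fit the model and split column indexes into (selected, dropped)."""
    model.fit(X, y)
    # Column indexes ordered from most to least important.
    order = np.argsort(-model.feature_importances_)
    keep = int(ratio * X.shape[1])
    return order[:keep], order[keep:]

selected, dropped = top_fraction(
    ExtraTreesClassifier(n_estimators=50, random_state=0), X_toy, y_toy, 0.5
)
print("selected:", selected)
print("dropped:", dropped)
```

Any estimator that exposes feature_importances_ after fitting, such as a random forest or gradient boosting classifier, can be passed in place of the extra-trees model.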

#Select top 75%,50%,25%
list_ratio = [0.75,0.50,0.25]

#List of feature selection models
feat = []

#List of names of feature selection models
feat_list =[]

#Import the libraries
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

#Add ExtraTreeClassifiers to the list
n = 'ExTree'
feat_list.append(n)
for val in list_ratio:
    comb.append("%s+%s" % (n,val))
    feat.append([n,val,ExtraTreesClassifier(n_estimators=c-1,max_features=val,n_jobs=-1,random_state=seed)])      

#Add GradientBoostingClassifiers to the list 
n = 'GraBst'
feat_list.append(n)
for val in list_ratio:
    comb.append("%s+%s" % (n,val))
    feat.append([n,val,GradientBoostingClassifier(n_estimators=c-1,max_features=val,random_state=seed)])   

#Add RandomForestClassifiers to the list 
n = 'RndFst'
feat_list.append(n)
for val in list_ratio:
    comb.append("%s+%s" % (n,val))
    feat.append([n,val,RandomForestClassifier(n_estimators=c-1,max_features=val,n_jobs=-1,random_state=seed)])   

#Add XGBClassifier to the list 
n = 'XGB'
feat_list.append(n)
for val in list_ratio:
    comb.append("%s+%s" % (n,val))
    feat.append([n,val,XGBClassifier(n_estimators=c-1,random_state=seed)])
        
#For all transformations of X
for trans,s, X, val_X, d, cols, rem, ra, i_cols, i_rem in X_all:
    #For all feature selection models
    for name,v, model in feat:
        #Train the model against Y (labels shifted to start at 0, which recent xgboost releases expect; importances are unaffected)
        model.fit(X,Y_train-1)
        #Combine the importance and index of the column in the array joined
        joined = []
        for i, pred in enumerate(list(model.feature_importances_)):
            joined.append([i,cols[i],pred])
        #Sort in descending order    
        joined_sorted = sorted(joined, key=lambda x: -x[2])
        #Starting point of the columns to be dropped
        rem_start = int((v*(c-1)))
        #List of names of columns selected
        cols_list = []
        #Indexes of columns selected
        i_cols_list = []
        #Ranking of all the columns
        rank_list =[]
        #List of columns not selected
        rem_list = []
        #Indexes of columns not selected
        i_rem_list = []
        #Split the array. Store selected columns in cols_list and remove them in rem_list
        for j, (i, col, x) in enumerate(list(joined_sorted)):
            #Store the rank
            rank_list.append([i,j])
            #Store selected columns in cols_list and indexes in i_cols_list
            if(j < rem_start):
                cols_list.append(col)
                i_cols_list.append(i)
            #Store not selected columns in rem_list and indexes in i_rem_list    
            else:
                rem_list.append(col)
                i_rem_list.append(i)    
        #Sort the rank_list and store only the ranks. Drop the index 
        #Append model name, array, columns selected and columns to be removed to the additional list        
        all_X_add.append([trans,name,X,val_X,v,cols_list,rem_list,[x[1] for x in sorted(rank_list,key=lambda x:x[0])],i_cols_list,i_rem_list])    

#Set figure size
plt.rc("figure", figsize=(25, 10))

#Plot a graph for different feature selectors        
for f_name in feat_list:
    #Array to store the list of combinations
    leg=[]
    fig, ax = plt.subplots()
    #Plot each combination
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        if(name==f_name):
            plt.plot(rank_list)
            leg.append(trans+"+"+name+"+%s"% v)
    #Set the tick names to the names of columns
    ax.set_xticks(range(c-1))
    ax.set_xticklabels(cols[:c-1],rotation='vertical')
    #Display the plot
    plt.legend(leg,loc='best')    
    #Plot the rankings of all the features for all combinations
    plt.show()

Output:

Ranking Summary

A ranking summary is a report or listing that provides information about the importance or ranking of individual features (variables) within a dataset. This summary is crucial for understanding the relevance and contribution of each feature to a machine-learning task and for making decisions about which features to include or exclude in a predictive model.

df_rank = pandas.DataFrame(data=[x[7] for x in all_X_add],columns=cols[:c-1])
_ = df_rank.boxplot(rot=90)
# The Below plot summarizes the rankings according to the standard feature selection techniques
#Top ranked attributes are ... first 10 attributes, Wilderness_Area1,4 ...Soil_Type 3,4,10,38-40

Output:

Rank Features Based on Median

Taking the median of the ranks that the feature selectors above assign to each feature is a straightforward way to combine them; a lower median rank means a more important feature.

df_rank = pandas.DataFrame(data=[x[7] for x in all_X_add],columns=cols[:c-1])
med = df_rank.median()
print(med)
#Write medians to output file for exploratory study on ML algorithms
med.rename("Median").to_csv("median.csv", index_label="Column", header=True)

Output:

Highest Median Rank (Least Important):

Soil_Type8
Soil_Type25
Wilderness_Area2

Lowest Median Rank (Most Important):

Elevation
Horizontal_Distance_To_Hydrology
Horizontal_Distance_To_Fire_Points
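
To make the median-rank idea concrete, here is a small sketch with invented ranks from three hypothetical selectors (rank 0 is the most important). The numbers are made up for illustration and are not results from the forest data.

```python
import pandas as pd

# One row per selector, one column per feature; 0 = most important.
ranks = pd.DataFrame(
    [[0, 1, 2, 3],
     [0, 2, 1, 3],
     [1, 0, 3, 2]],
    columns=["Elevation", "Aspect", "Slope", "Soil_Type8"],
)

# Median rank per feature, sorted so the most important comes first.
median_rank = ranks.median().sort_values(kind="stable")

# Keep the top 50% of the features by median rank.
ratio = 0.5
keep = int(ratio * len(median_rank))
selected = median_rank.index[:keep].tolist()

print(median_rank)
print(selected)
```

Using the median rather than the mean makes the combined ranking less sensitive to a single selector that ranks a feature very differently from the others.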

Now we will select features based on the median ranking, as it looks like the best option among the feature selection approaches.

#Select top 75%,50%,25%
list_ratio = [0.75,0.50,0.25]

#Median of rankings for each column
unsorted_rank = [0,8,11,4,5,2,5,7.5,9.5,3,8,28.5,14.5,2,35,19.5,12,14,37,25.5,50,44,9,28,20.5,19.5,40,38,20,38,43,35,44,22,24,33,49,42,46,47,27.5,19,31.5,23,28,42,30.5,46,40,12,13,18]

#List of feature selection models
feat = []

#Add Median to the list 
n = 'Median'
for val in list_ratio:
    feat.append([n,val])   

for trans,s, X, val_X, d, cols, rem_cols, ra, i_cols, i_rem in X_all:
    #Create subsets of feature lists based on ranking and list_ratio
    for name, v in feat:
        #Combine the importance and index of the column in the array joined
        joined = []
        for i, pred in enumerate(unsorted_rank):
            joined.append([i,cols[i],pred])
        #Sort in ascending order of median rank (lower rank = more important)
        joined_sorted = sorted(joined, key=lambda x: x[2])
        #Starting point of the columns to be dropped
        rem_start = int((v*(c-1)))
        #List of names of columns selected
        cols_list = []
        #Indexes of columns selected
        i_cols_list = []
        #Ranking of all the columns
        rank_list =[]
        #List of columns not selected
        rem_list = []
        #Indexes of columns not selected
        i_rem_list = []
        #Split the array. Store selected columns in cols_list and remove them in rem_list
        for j, (i, col, x) in enumerate(list(joined_sorted)):
            #Store the rank
            rank_list.append([i,j])
            #Store selected columns in cols_list and indexes in i_cols_list
            if(j < rem_start):
                cols_list.append(col)
                i_cols_list.append(i)
            #Store not selected columns in rem_list and indexes in i_rem_list    
            else:
                rem_list.append(col)
                i_rem_list.append(i)    
        #Sort the rank_list and store only the ranks. Drop the index 
        #Append model name, array, columns selected and columns to be removed to the additional list        
        all_X_add.append([trans,name,X,val_X,v,cols_list,rem_list,[x[1] for x in sorted(rank_list,key=lambda x:x[0])],i_cols_list,i_rem_list])

#Import plotting library    
import matplotlib.pyplot as plt    

#Dictionary to store the accuracies for all combinations 
acc = {}

#List of combinations
comb = []

#Initialize an empty accuracy list for each transformation
for trans in trans_list:
    acc[trans]=[]

Model

We will proceed to employ a range of machine-learning algorithms.
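
Each algorithm is evaluated the same way: fit on a view of the training data restricted to some columns, then score on the matching validation view. That shared pattern can be sketched as a small helper. The function name `evaluate`, the `views` tuple layout, and the synthetic data are assumptions for illustration and differ from the list structure used in this article.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def evaluate(model, views, y_train, y_val):
    """Fit `model` on each view and return {view name: validation accuracy}."""
    scores = {}
    for name, X_train, X_val, columns in views:
        model.fit(X_train[:, columns], y_train)
        scores[name] = model.score(X_val[:, columns], y_val)
    return scores

# Synthetic stand-in for the forest data (assumption: 6 features, 3 classes).
X_toy, y_toy = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_toy, y_toy, test_size=0.1, random_state=0
)

# Each view: (name, training array, validation array, column indexes to use).
views = [
    ("All", X_tr, X_val, list(range(6))),
    ("First 3", X_tr, X_val, [0, 1, 2]),
]

results = {
    "LDA": evaluate(LinearDiscriminantAnalysis(), views, y_tr, y_val),
    "NB": evaluate(GaussianNB(), views, y_tr, y_val),
}
for algo, scores in results.items():
    for view, accuracy in scores.items():
        print(f"{algo} + {view}: {accuracy:.3f}")
```

Wrapping the loop in a function keeps each algorithm's section down to constructing the model and calling the helper.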

01. Linear Discriminant Analysis

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

#Set the base model
model = LinearDiscriminantAnalysis()
algo = "LDA"

##Set figure size
#plt.rc("figure", figsize=(25, 10))

#Accuracy of the model using all features
for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
    model.fit(X[:,i_cols_list],Y_train)
    result = model.score(val_X[:,i_cols_list], val_Y)
    acc[trans].append(result)
    #print(trans+"+"+name+"+%d" % (v*(c-1)))
    #print(result)
comb.append("%s+%s of %s" % (algo,"All",1.0))
        
#Accuracy of the model using a subset of features    
for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
    model.fit(X[:,i_cols_list],Y_train)
    result = model.score(val_X[:,i_cols_list], val_Y)
    acc[trans].append(result)
    #print(trans+"+"+name+"+%d" % (v*(c-1)))
    #print(result)
for v in list_ratio:
    comb.append("%s+%s of %s" % (algo,"Subset",v))

02. Logistic Regression

from sklearn.linear_model import LogisticRegression

C_list = [100]

for C in C_list:
    #Set the base model
    model = LogisticRegression(n_jobs=-1,random_state=seed,C=C)
   
    algo = "LR"

    ##Set figure size
    #plt.rc("figure", figsize=(25, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    comb.append("%s with C=%s+%s of %s" % (algo,C,"All",1.0))

    #Accuracy of the model using a subset of features    
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    for v in list_ratio:
        comb.append("%s with C=%s+%s of %s" % (algo,C,"Subset",v))

03. KNN

#Evaluation of various combinations of KNN Classifier using all the views

#Import the library
from sklearn.neighbors import KNeighborsClassifier

n_list = [1]

for n_neighbors in n_list:
    #Set the base model
    model = KNeighborsClassifier(n_jobs=-1,n_neighbors=n_neighbors)
   
    algo = "KNN"

    ##Set figure size
    #plt.rc("figure", figsize=(25, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    comb.append("%s with n=%s+%s of %s" % (algo,n_neighbors,"All",1.0))

    #Accuracy of the model using a subset of features    
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    for v in list_ratio:
        comb.append("%s with n=%s+%s of %s" % (algo,n_neighbors,"Subset",v))

04. Naive Bayes

#Evaluation of various combinations of Naive Bayes using all the views

#Import the library
from sklearn.naive_bayes import GaussianNB

#Set the base model
model = GaussianNB()
algo = "NB"

##Set figure size
#plt.rc("figure", figsize=(25, 10))

#Accuracy of the model using all features
for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
    model.fit(X[:,i_cols_list],Y_train)
    result = model.score(val_X[:,i_cols_list], val_Y)
    acc[trans].append(result)
    #print(trans+"+"+name+"+%d" % (v*(c-1)))
    #print(result)
comb.append("%s+%s of %s" % (algo,"All",1.0))
        
#Accuracy of the model using a subset of features    
for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
    model.fit(X[:,i_cols_list],Y_train)
    result = model.score(val_X[:,i_cols_list], val_Y)
    acc[trans].append(result)
    #print(trans+"+"+name+"+%d" % (v*(c-1)))
    #print(result)
for v in list_ratio:
    comb.append("%s+%s of %s" % (algo,"Subset",v))

05. Decision Tree Classifier

#Evaluation of various combinations of CART using all the views

#Import the library
from sklearn.tree import DecisionTreeClassifier

d_list = [13]

for max_depth in d_list:
    #Set the base model
    model = DecisionTreeClassifier(random_state=seed,max_depth=max_depth)
   
    algo = "CART"

    #Set figure size
    plt.rc("figure", figsize=(15, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    comb.append("%s with d=%s+%s of %s" % (algo,max_depth,"All",1.0))

    #Accuracy of the model using a subset of features    
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    for v in list_ratio:
        comb.append("%s with d=%s+%s of %s" % (algo,max_depth,"Subset",v))

06. Support Vector Machine

#Evaluation of various combinations of SVM using all the views

#Import the library
from sklearn.svm import SVC

c_list = [10]

for C in c_list:
    #Set the base model
    model = SVC(random_state=seed,C=C)

    algo = "SVM"

    #Set figure size
    #plt.rc("figure", figsize=(15, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    comb.append("%s with C=%s+%s of %s" % (algo,C,"All",1.0))

07. Bagged Decision Tree

#Evaluation of various combinations of Bagged Decision Trees using all the views

#Import the library
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

#Base estimator
base_estimator = DecisionTreeClassifier(random_state=seed,max_depth=13)

n_list = [100]

for n_estimators in n_list:
    #Set the base model
    model = BaggingClassifier(n_jobs=-1,estimator=base_estimator, n_estimators=n_estimators, random_state=seed)
   
    algo = "Bag"

    #Set figure size
    plt.rc("figure", figsize=(20, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"All",1.0))

    #Accuracy of the model using a subset of features    
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    for v in list_ratio:
        comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"Subset",v))

08. Random Forest Classifier

#Evaluation of various combinations of Random Forest using all the views

#Import the library
from sklearn.ensemble import RandomForestClassifier

n_list = [100]

for n_estimators in n_list:
    #Set the base model
    model = RandomForestClassifier(n_jobs=-1,n_estimators=n_estimators, random_state=seed)
   
    algo = "RF"

    #Set figure size
    plt.rc("figure", figsize=(20, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"All",1.0))

    #Accuracy of the model using a subset of features    
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    for v in list_ratio:
        comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"Subset",v))

09. Extra Tree Bagging

from sklearn.ensemble import ExtraTreesClassifier

n_list = [100]

for n_estimators in n_list:
    #Set the base model
    model = ExtraTreesClassifier(n_jobs=-1,n_estimators=n_estimators, random_state=seed)
   
    algo = "ET"

    #Set figure size
    plt.rc("figure", figsize=(20, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
    #Record the combination name once, after all views have been scored
    comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"All",1.0))

    #Accuracy of the model using a subset of features    
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
    for v in list_ratio:
        comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"Subset",v))

10. AdaBoost (Boosting)

from sklearn.ensemble import AdaBoostClassifier

n_list = [100]

for n_estimators in n_list:
    #Set the base model
    model = AdaBoostClassifier(n_estimators=n_estimators, random_state=seed)
   
    algo = "Ada"

    #Set figure size
    plt.rc("figure", figsize=(20, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
    comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"All",1.0))

    #Accuracy of the model using a subset of features    
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)

    for v in list_ratio:
        comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"Subset",v))

11. Gradient Boosting Classifier

from sklearn.ensemble import GradientBoostingClassifier

d_list = [9]

for max_depth in d_list:
    #Set the base model
    model = GradientBoostingClassifier(max_depth=max_depth, random_state=seed)
   
    algo = "SGB"

    #Set figure size
    plt.rc("figure", figsize=(20, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
    comb.append("%s with d=%s+%s of %s" % (algo,max_depth,"All",1.0))

12. Voting Classifier

from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

estimators_list = []

#Combine Extra Trees, Random Forest and Bagged Decision Trees
estimators = []
model_01 = ExtraTreesClassifier(n_jobs=-1,n_estimators=100, random_state=seed)
estimators.append(('et', model_01))
model_02 = RandomForestClassifier(n_jobs=-1,n_estimators=100, random_state=seed)
estimators.append(('rf', model_02))
base_estimator = DecisionTreeClassifier(random_state=seed,max_depth=13)
model_03 = BaggingClassifier(n_jobs=-1,base_estimator=base_estimator, n_estimators=100, random_state=seed)
estimators.append(('bag', model_03))

estimators_list.append(['Voting',estimators])

for name, estimators in estimators_list:
    #Set the base model
    model = VotingClassifier(estimators=estimators, n_jobs=-1)
   
    algo = name

    #Set figure size
    plt.rc("figure", figsize=(20, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
        #print(trans+"+"+name+"+%d" % (v*(c-1)))
        #print(result)
    comb.append("%s+%s of %s" % (algo,"All",1.0))

    #Accuracy of the model using a subset of features    
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
    for v in list_ratio:
        comb.append("%s+%s of %s" % (algo,"Subset",v))
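
The VotingClassifier above is created without a voting argument, so scikit-learn falls back to its default, hard (majority) voting. The sketch below illustrates that rule in plain Python. The three prediction lists are made-up values, majority_vote is a hypothetical helper rather than a scikit-learn function, and the tie-break (the smallest label wins) is an assumption made for illustration.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Return the most common label per sample; ties go to the smallest label."""
    voted = []
    # zip(*...) regroups the predictions so each tuple holds one sample's votes
    for sample_preds in zip(*predictions_per_model):
        counts = Counter(sample_preds)
        top = max(counts.values())
        voted.append(min(label for label, n in counts.items() if n == top))
    return voted

# Hypothetical cover-type predictions (1-7) from three models on four patches
et_pred = [1, 2, 7, 3]
rf_pred = [1, 2, 5, 6]
bag_pred = [2, 2, 7, 4]

final_pred = majority_vote([et_pred, rf_pred, bag_pred])
print(final_pred)
```

With three voters, a single model that disagrees is outvoted; only when all three disagree does the tie-break decide the label.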

13. XGBoost

from xgboost import XGBClassifier

n_list = [300]

for n_estimators in n_list:
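    #Set the base model
    #Note: recent XGBoost releases expect class labels numbered from 0;
    #if Y_train holds 1..7, encode it first (for example with LabelEncoder)
    model = XGBClassifier(n_estimators=n_estimators, random_state=seed,subsample=0.25)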
   
    algo = "XGB"

    #Set figure size
    plt.rc("figure", figsize=(20, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)
    comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"All",1.0))

    #Accuracy of the model using a subset of features    
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
        model.fit(X[:,i_cols_list],Y_train)
        result = model.score(val_X[:,i_cols_list], val_Y)
        acc[trans].append(result)

    for v in list_ratio:
        comb.append("%s with n=%s+%s of %s" % (algo,n_estimators,"Subset",v))

Model Evaluation

#Evaluation of baseline model of MLP using all the views

#Import libraries for deep learning
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense

#Import libraries for encoding
from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder

#no. of output classes
y = 7

#random state
numpy.random.seed(seed)

# one hot encode class values
encoder = LabelEncoder()
Y_train_en = encoder.fit_transform(Y_train)
Y_train_hot = np_utils.to_categorical(Y_train_en,y)
# reuse the mapping fitted on the training labels for the validation labels
val_Y_en = encoder.transform(val_Y)
val_Y_hot = np_utils.to_categorical(val_Y_en,y)


# Layer arguments below use the Keras 2 names (kernel_initializer)
# define baseline model
def baseline(v):
    # layer sizes must be integers, so cast the feature count
    n = int(v*(c-1))
    # create model
    model = Sequential()
    model.add(Dense(n, input_dim=n, kernel_initializer='normal', activation='relu'))
    # softmax yields one probability per class, matching categorical_crossentropy
    model.add(Dense(y, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# define a smaller model
def smaller(v):
    n = int(v*(c-1))
    # create model with a hidden layer half the size of the input
    model = Sequential()
    model.add(Dense(n//2, input_dim=n, kernel_initializer='normal', activation='relu'))
    model.add(Dense(y, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# define a deeper model
def deeper(v):
    n = int(v*(c-1))
    # create model with two hidden layers
    model = Sequential()
    model.add(Dense(n, input_dim=n, kernel_initializer='normal', activation='relu'))
    model.add(Dense(n//2, kernel_initializer='normal', activation='relu'))
    model.add(Dense(y, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Optimize using dropout and decay
from keras.optimizers import SGD
from keras.layers import Dropout
from keras.constraints import max_norm

def dropout(v):
    # layer sizes must be integers, so cast the feature count
    n = int(v*(c-1))
    #create model
    model = Sequential()
    # kernel_constraint is the Keras 2 name for the weight constraint argument
    model.add(Dense(n, input_dim=n, kernel_initializer='normal', activation='relu', kernel_constraint=max_norm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(n//2, kernel_initializer='normal', activation='relu', kernel_constraint=max_norm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(y, kernel_initializer='normal', activation='softmax'))
    # Compile model
    sgd = SGD(lr=0.1,momentum=0.9,decay=0.0,nesterov=False)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

# define decay model
def decay(v):
    n = int(v*(c-1))
    # create model
    model = Sequential()
    model.add(Dense(n, input_dim=n, kernel_initializer='normal', activation='relu'))
    model.add(Dense(y, kernel_initializer='normal', activation='softmax'))
    # Compile model with a learning rate that decays over time
    sgd = SGD(lr=0.1,momentum=0.8,decay=0.01,nesterov=False)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

    
est_list = [('MLP',baseline),('smaller',smaller),('deeper',deeper),('dropout',dropout),('decay',decay)]

for name, est in est_list:
 
    algo = name

    #Set figure size
    plt.rc("figure", figsize=(20, 10))

    #Accuracy of the model using all features
    for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in X_all:
        #epochs is the Keras 2 name for the older nb_epoch argument
        model = KerasClassifier(build_fn=est, v=v, epochs=10, verbose=0)
        model.fit(X[:,i_cols_list],Y_train_hot)
        result = model.score(val_X[:,i_cols_list], val_Y_hot)
        acc[trans].append(result)
    #    print(trans+"+"+name+"+%d" % (v*(c-1)))
    #    print(result)
    comb.append("%s+%s of %s" % (algo,"All",1.0))

    ##Accuracy of the model using a subset of features    
    #for trans,name,X,val_X,v,cols_list,rem_list,rank_list,i_cols_list,i_rem_list in all_X_add:
    #    model = KerasClassifier(build_fn=est, v=v, nb_epoch=10, verbose=0)
    #    model.fit(X[:,i_cols_list],Y_train_hot)
    #    result = model.score(val_X[:,i_cols_list], val_Y_hot)
    #    acc[trans].append(result)
    #    print(trans+"+"+name+"+%d" % (v*(c-1)))
    #    print(result)
    #for v in list_ratio:
    #    comb.append("%s+%s of %s" % (algo,"Subset",v))
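
The MLP models above are trained on one-hot encoded labels, produced by LabelEncoder followed by to_categorical. As a rough illustration of what those two steps produce, here is a NumPy-only sketch. The label values are made up, and np.unique and np.eye stand in for the scikit-learn and Keras helpers; they are not the calls the code above uses.

```python
import numpy as np

# Hypothetical Cover_Type labels in the dataset's 1-7 range
labels = np.array([1, 7, 3, 3, 5, 2, 4, 6])

# Step 1 (the LabelEncoder role): map the sorted unique labels to 0..n_classes-1
classes, encoded = np.unique(labels, return_inverse=True)

# Step 2 (the to_categorical role): one row per sample with a single 1 per row
n_classes = len(classes)
one_hot = np.eye(n_classes)[encoded]

print(encoded)
print(one_hot.shape)  # (8, 7)
```

Each row of one_hot has a 1 in the column of its class, which is the target format that categorical_crossentropy expects.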

#Plot the accuracies of all combinations
fig, ax = plt.subplots()
#Plot each transformation
for trans in trans_list:
    ax.plot(acc[trans])
#Set the tick names to names of combinations
ax.set_xticks(range(len(comb)))
ax.set_xticklabels(comb,rotation='vertical')
#Add a legend with the transformation names
ax.legend(trans_list,loc='best')
#Display the plot
plt.show()
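
A plot with this many tick labels can be hard to read, so it may help to list the best combinations as text as well. The sketch below assumes that acc maps each transformation name to a list of accuracies aligned with comb, as in the code above. The sample values, the transformation names "Orig" and "StdSca", and the helper rank_combinations are hypothetical.

```python
# Hypothetical stand-ins for the acc and comb structures built above
comb = [
    "KNN with n=1+All of 1.0",
    "RF with n=100+All of 1.0",
    "ET with n=100+All of 1.0",
]
acc = {
    "Orig": [0.80, 0.85, 0.86],
    "StdSca": [0.82, 0.84, 0.87],
}

def rank_combinations(acc, comb, top=3):
    """Return the top (accuracy, transformation, combination) triples."""
    rows = [
        (score, trans, comb[i])
        for trans, scores in acc.items()
        for i, score in enumerate(scores)
    ]
    # Sorting tuples in reverse order puts the highest accuracy first
    return sorted(rows, reverse=True)[:top]

best = rank_combinations(acc, comb)
for score, trans, name in best:
    print("%.2f  %-8s %s" % (score, trans, name))
```

The ranking is only meaningful if every list in acc has exactly one entry per name in comb, so it is worth checking the lengths before relying on it.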

Output:

The model evaluation leads to the following observations:

Linear Discriminant Analysis: The best estimated performance is 65%, reached with all features and no transformation. The MinMax scaler and Normalizer perform notably worse.
Logistic Regression: The best estimated performance is close to 67%, reached with C=100, all features, and standardized data. Performance tends to improve as C increases. The Normalizer and MinMax scaler generally perform poorly.
KNN: The best estimated performance is around 86%, reached with n_neighbors=1 on normalized data.
Naive Bayes: The best estimated performance is approximately 64%. The original data, even with only a 50% subset of the features, outperforms every transformed variant.
Decision Tree Classifier: The best estimated performance is nearly 79%, reached with a maximum depth of 13 on the original dataset.
SVM: Training takes significantly longer than for the other algorithms. Performance on the original dataset is poor, which underscores the importance of transforming the data. The best estimated performance is around 77%, reached with C=10, StandardScaler, and a feature subset of 0.25.
Bagged Decision Tree: The best estimated performance is nearly 82%, reached with 100 estimators on the original dataset.
Random Forest: The best estimated performance is almost 85% with 100 estimators.
Extra Trees: The best estimated performance approaches 88% with 100 estimators, StandardScaler, and a feature subset of 0.75.
AdaBoost: The best estimated performance is only about 38% with 100 estimators.
Gradient Boost: Training takes very long. The best estimated performance is close to 86% when the depth is set to 7.
Voting: The best estimated performance approaches 86%.
XGBoost: The best estimated performance is nearly 80% with 300 estimators, subsample=0.25, and a feature subset of 0.75.
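
The SVM observation above, that accuracy depends heavily on transforming the data first, can be tried out with a small experiment. This is only a sketch under stated assumptions: the data is synthetic (generated with make_classification and rescaled so that columns have very different ranges, loosely imitating elevation in meters next to slope in degrees), so the printed scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

seed = 0

# Synthetic 7-class data (assumption), with each column stretched to a different scale
X, y = make_classification(n_samples=700, n_features=10, n_informative=8,
                           n_redundant=0, n_classes=7, random_state=seed)
rng = np.random.RandomState(seed)
X = X * rng.uniform(1, 1000, size=X.shape[1])

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=seed)

# Same SVC settings, with and without standardization
raw_svm = SVC(C=10, random_state=seed)
scaled_svm = make_pipeline(StandardScaler(), SVC(C=10, random_state=seed))

raw_score = raw_svm.fit(X_train, y_train).score(X_val, y_val)
scaled_score = scaled_svm.fit(X_train, y_train).score(X_val, y_val)
print("SVC on raw features:      %.3f" % raw_score)
print("SVC on standardized data: %.3f" % scaled_score)
```

Wrapping the scaler and the classifier in one pipeline means the scaling statistics are learned from the training split only and then reused on the validation split.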

KNN, Voting, Extra Trees, and Random Forest achieve the highest accuracy for predicting forest cover type, so any of these algorithms is a reasonable choice for future predictions.
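
As a starting point for such future predictions, the four shortlisted algorithms can be compared side by side on a held-out split. The sketch below uses synthetic 7-class data in place of the forest cover features, and 50 estimators instead of 100 to keep it quick; both are assumptions made for illustration, so the accuracies it prints are not the ones reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

seed = 0

# Synthetic 7-class data standing in for the forest cover features (assumption)
X, y = make_classification(n_samples=700, n_features=20, n_informative=10,
                           n_classes=7, random_state=seed)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=seed)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=1),
    "RF": RandomForestClassifier(n_estimators=50, random_state=seed),
    "ET": ExtraTreesClassifier(n_estimators=50, random_state=seed),
}
# Voting combines Extra Trees, Random Forest and bagged depth-13 trees.
# The base tree is passed positionally because its keyword name differs
# between scikit-learn versions.
models["Voting"] = VotingClassifier(estimators=[
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=seed)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
    ("bag", BaggingClassifier(DecisionTreeClassifier(max_depth=13, random_state=seed),
                              n_estimators=50, random_state=seed)),
])

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_val, y_val)
    print("%-6s validation accuracy: %.3f" % (name, scores[name]))
```

On the real data, the same loop could be pointed at the training and validation arrays used throughout this article.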

Conclusion

Forest cover type prediction using machine learning is a vital tool for the preservation and sustainable management of our forests. It enables us to make informed decisions, protect biodiversity, and ensure the longevity of these critical ecosystems. As technology and data continue to advance, so too will our ability to understand and protect the forests that are essential to life on Earth.
