Traffic Prediction Using Machine Learning

Traffic prediction has always been a challenge for transportation planners and city managers. With the increasing growth of cities and the number of vehicles on the roads, the need for accurate and reliable traffic predictions has become more pressing. In recent years, machine learning has shown great promise in solving this problem.

Traffic prediction involves estimating the future behavior of traffic in a particular area. This information is useful for a variety of purposes, including reducing congestion, optimizing transportation systems, and improving road safety. In the past, traffic prediction has been based on traditional methods such as rule-based models and time-series analysis. However, these methods are often limited in their ability to capture the complexity and variability of traffic patterns.

Machine learning, on the other hand, is well-suited to handle large and complex datasets, making it an ideal tool for traffic prediction. Machine learning algorithms can automatically identify patterns and relationships in traffic data and use these to make predictions about future traffic conditions.

There are several types of machine learning algorithms that can be used for traffic prediction, including regression, time-series models, and artificial neural networks. Regression models use historical traffic data to predict future traffic conditions based on past trends. Time-series models look at the patterns in traffic data over time and use these patterns to make predictions. Artificial neural networks, which are loosely inspired by the structure of the human brain, are also commonly used for traffic prediction.

One of the key advantages of machine learning for traffic prediction is its ability to handle large and complex datasets. For example, traffic data may include information on traffic flow, vehicle speed, and traffic density, as well as other factors such as weather conditions, road conditions, and time of day. Machine learning algorithms can process this data and identify the most important factors that influence traffic patterns, making them ideal for traffic prediction.

Another advantage of machine learning for traffic prediction is its ability to adapt to changing conditions. Traditional traffic prediction methods are often limited in their ability to handle changes in traffic patterns, but machine learning algorithms can automatically adjust to these changes and continue to make accurate predictions.

In addition to these advantages, machine learning can also be used to improve the accuracy of traffic predictions by incorporating other sources of data, such as GPS data from vehicles, traffic cameras, and social media. For example, GPS data from vehicles can provide real-time information on traffic conditions, while traffic cameras can provide detailed information on traffic flow and density. Social media data, such as tweets about traffic conditions, can also be used to help improve the accuracy of traffic predictions.

While machine learning has many advantages for traffic prediction, it is not without its challenges. One of the biggest challenges is the quality of the data used for training the machine learning algorithms. For example, traffic data may be incomplete or inaccurate, and this can affect the accuracy of the predictions. Additionally, machine learning algorithms require large amounts of data to be effective, and this can be difficult to obtain in some cases.

Another challenge is the complexity of the algorithms used for traffic prediction. Machine learning algorithms can be difficult to understand and interpret, making it challenging to identify the factors that are driving the predictions. This can make it difficult to make changes to the algorithms or to improve their accuracy.

We will now explore a dataset covering four junctions and build a model to predict the traffic at each of them. This could help address traffic congestion by providing a better understanding of traffic patterns, which in turn can inform infrastructure planning.

Code Implementation

Importing Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import tensorflow
from statsmodels.tsa.stattools import adfuller
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras import callbacks
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense, LSTM, Dropout, GRU, Bidirectional
from tensorflow.keras.optimizers import SGD
import math
from sklearn.metrics import mean_squared_error

import warnings
warnings.filterwarnings("ignore")

Loading the Dataset

dataset = pd.read_csv("traffic.csv")
dataset.head()

Output:

About the Data

This dataset is a compilation of hourly counts of automobiles at four intersections. There are four features in the CSV file:

DateTime
Junction
Vehicles
ID

The sensors at these junctions collected data over different time periods, so the junctions do not all cover the same date range. Some of the junctions provided only sparse or limited data.
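To see how much each junction actually covers, it can help to look at the first and last timestamp per junction. The following is a minimal sketch on a small hypothetical frame (the values are made up; only the column names follow the CSV described above):

```python
import pandas as pd

# Hypothetical toy data with the same column names as the CSV described above.
toy = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2015-11-01 00:00", "2015-11-01 01:00", "2015-11-01 02:00",
        "2017-01-01 00:00", "2017-01-01 01:00",
    ]),
    "Junction": [1, 1, 1, 4, 4],
    "Vehicles": [15, 13, 10, 3, 1],
})

# First timestamp, last timestamp and number of hourly records per junction.
coverage = toy.groupby("Junction")["DateTime"].agg(["min", "max", "count"])
print(coverage)
```

Running the same aggregation on the real dataset should make the differing date ranges visible at a glance.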

Data Exploration

Feature engineering for EDA
Plotting time series
Parsing dates

dataset["DateTime"]= pd.to_datetime(dataset["DateTime"])
dataset = dataset.drop(["ID"], axis=1) #dropping IDs column
dataset.info()

Output:

# dataframe to be used for EDA
dataframe=dataset.copy()

# Let's plot the Timeseries
colors = [ "#FFD4DB","#BBE7FE","#D3B5E5","#dfe2b6"]
plt.figure(figsize=(20,4),facecolor="#627D78")
Time_series=sns.lineplot(x=dataframe['DateTime'],y="Vehicles",data=dataframe, hue="Junction", palette=colors)
Time_series.set_title("Years of Traffic at Junctions")
Time_series.set_ylabel("Vehicles in Number")
Time_series.set_xlabel("Date")

Output:

Text(0.5, 0, 'Date')

Notable details in the plot above:

The first junction shows a clear upward trend.
Data for the fourth junction is sparse and only available from 2017 onward.
The plot above does not reveal much about seasonality; to learn more about it, we must break the DateTime column down into its components.

Feature Engineering

At this stage, we use DateTime to build a few additional features, namely:

Year
Month
Date in the given month
Days of week
Hour

# Exploring more features
dataframe["Year"]= dataframe['DateTime'].dt.year
dataframe["Month"]= dataframe['DateTime'].dt.month
dataframe["Date_no"]= dataframe['DateTime'].dt.day
dataframe["Hour"]= dataframe['DateTime'].dt.hour
dataframe["Day"]= dataframe.DateTime.dt.strftime("%A")
dataframe.head()

Output:

Exploratory Data Analysis

The newly formed features are going to be plotted now.

# Let's plot the vehicle counts against each of the new features
new_features = [ "Year","Month", "Date_no", "Hour", "Day"]

for i in new_features:
    plt.figure(figsize=(10,2),facecolor="#627D78")
    ax=sns.lineplot(x=dataframe[i],y="Vehicles",data=dataframe, hue="Junction", palette=colors )
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Output:

The plot described above leads to the following conclusions:

With the exception of the fourth junction, all junctions have shown a rising yearly tendency. As was already stated above, the fourth junction contains scant data that doesn't go back more than a year.
We can observe that around June, there is an increase in traffic at the first and second junctions. This, we assume, may be related to summer vacation and related activities.
Across the dates of the month, the data is fairly consistent.
For a given day, we can observe peaks in the morning and evening and a drop in activity during the night. This is as expected.
Due to fewer automobiles on the roadways on Sundays than on other days of the week, traffic flows more smoothly. The traffic is consistent from Monday through Friday.
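One practical detail when comparing weekdays: the "Day" feature holds weekday names as strings, so aggregations and plots do not follow calendar order by default. A minimal sketch, using a hypothetical toy frame, of how an ordered categorical could fix the ordering:

```python
import pandas as pd

# Hypothetical toy frame; "Day" holds weekday names as produced by strftime("%A").
toy = pd.DataFrame({
    "Day": ["Sunday", "Monday", "Sunday", "Friday", "Monday", "Friday"],
    "Vehicles": [10, 30, 14, 28, 34, 32],
})

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# An ordered categorical makes groupby results follow calendar order
# instead of alphabetical order.
toy["Day"] = pd.Categorical(toy["Day"], categories=weekday_order, ordered=True)

# observed=True keeps only the weekdays that actually occur in the data.
mean_by_day = toy.groupby("Day", observed=True)["Vehicles"].mean()
print(mean_by_day)
```

With the real data, the same aggregation gives one mean per weekday, which is a quick numeric check of the Sunday dip seen in the plot.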

plt.figure(figsize=(12,5),facecolor="#627D78")
count = sns.countplot(data=dataframe, x =dataframe["Year"], hue="Junction", palette=colors)
count.set_title("Years of Traffic at Junctions")
count.set_ylabel("Number of hourly records")  # countplot counts rows, not vehicles
count.set_xlabel("Date")

Output:

Text(0.5, 0, 'Date')

The count plot shows the number of hourly records per year and junction, not the number of vehicles, so the height of each bar reflects how much of that year the data covers. Since we only have data for 2017 up to the seventh month, no year-over-year conclusion can be drawn for 2017 from this plot.
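Because a count plot counts rows, it is worth keeping the number of records and the number of vehicles apart. The sketch below uses made-up numbers to show that the two can even point in opposite directions:

```python
import pandas as pd

# Hypothetical toy data: 2016 has more records but fewer vehicles in total.
toy = pd.DataFrame({
    "Year": [2015, 2015, 2016, 2016, 2016],
    "Vehicles": [50, 60, 10, 10, 10],
})

records_per_year = toy.groupby("Year")["Vehicles"].count()   # what a count plot shows
vehicles_per_year = toy.groupby("Year")["Vehicles"].sum()    # total traffic volume
mean_per_record = toy.groupby("Year")["Vehicles"].mean()     # average per hourly record

print(pd.DataFrame({
    "records": records_per_year,
    "vehicles": vehicles_per_year,
    "mean": mean_per_record,
}))
```

For comparing years with different coverage, the mean per record is usually the more informative of the three.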

corrmat = dataframe.corr(numeric_only=True)  # skip non-numeric columns such as "Day"
plt.subplots(figsize=(10,10),facecolor="#627D78")
sns.heatmap(corrmat,cmap= "Pastel2",annot=True,square=True, )

Output:

Unsurprisingly, the strongest correlation is with the pre-existing feature.

A pair plot will be used to wrap up our EDA. It gives an interesting overall view of the data.

Output:

Conclusions We Have Reached After This EDA:

Each of the four junctions covers a different span of data. For the fourth junction, only data from 2017 is available.
The yearly trend for Junctions 1, 2, and 3 has varying slopes.
The first junction has a stronger weekly seasonality than the other junctions.

For these reasons, we believe each junction should be transformed according to its own characteristics.

Data Transformation and Preprocessing

We shall proceed in the following order for this step:

Creating a separate frame for each junction and plotting them
Transforming the series and plotting the result
Using the Augmented Dickey-Fuller test to check whether the transformed series are stationary
Creating training and test sets

# Pivoting dataset from junction
dataframe_junction = dataset.pivot(columns="Junction", index="DateTime")
dataframe_junction.describe()

Output:

# Creating new dataframes
dataframe_1 = dataframe_junction[[('Vehicles', 1)]]
dataframe_2 = dataframe_junction[[('Vehicles', 2)]]
dataframe_3 = dataframe_junction[[('Vehicles', 3)]]
dataframe_4 = dataframe_junction[[('Vehicles', 4)]]
dataframe_4 = dataframe_4.dropna() # Junction 4 has data for only a few months, so drop the missing rows

# The frames in list_dfs have two-level column labels; drop level 1 (the junction number)
list_dfs = [dataframe_1, dataframe_2, dataframe_3, dataframe_4]
for i in list_dfs:
    i.columns= i.columns.droplevel(level=1)  

# Function to plot the four junction series one above the other for comparison
def Sub_Plots4(dataframe_1, dataframe_2,dataframe_3,dataframe_4,title):
    fig, axes = plt.subplots(4, 1, figsize=(15, 8),facecolor="#627D78", sharey=True)
    fig.suptitle(title)
    #J1
    pl_1=sns.lineplot(ax=axes[0],data=dataframe_1,color=colors[0])
    #pl_1=plt.ylabel()
    axes[0].set(ylabel ="Junction 1")
    #J2
    pl_2=sns.lineplot(ax=axes[1],data=dataframe_2,color=colors[1])
    axes[1].set(ylabel ="Junction 2")
    #J3
    pl_3=sns.lineplot(ax=axes[2],data=dataframe_3,color=colors[2])
    axes[2].set(ylabel ="Junction 3")
    #J4
    pl_4=sns.lineplot(ax=axes[3],data=dataframe_4,color=colors[3])
    axes[3].set(ylabel ="Junction 4")
   
   
# Plot the series before transformation to inspect them for stationarity
Sub_Plots4(dataframe_1.Vehicles, dataframe_2.Vehicles,dataframe_3.Vehicles,dataframe_4.Vehicles,"Dataframe Before Transformation")

Output:

A time series is said to be stationary if its statistical properties, such as mean and variance, do not change over time, which means it shows no trend or seasonality. In the EDA, however, we observed a weekly periodicity and an upward trend over time. The plot above again shows that Junctions one and two are trending upward. The weekly seasonality would be easier to see if we restricted the time span. We will skip that step for now and continue with the appropriate transformations of the dataset.

Steps for Transforming:

Normalizing
Differencing

# Normalize Function
def Normalize(dataframe,column):
    average = dataframe[column].mean()
    stdev = dataframe[column].std()
    df_normalized = (dataframe[column] - average) / stdev
    df_normalized = df_normalized.to_frame()
    return df_normalized, average, stdev

# Differencing Function
def Difference(dataframe,column, interval):
    values = dataframe[column]
    diff = []
    for i in range(interval, len(dataframe)):
        # .iloc selects by position; plain [] on a date-indexed Series looks up labels
        value = values.iloc[i] - values.iloc[i - interval]
        diff.append(value)
    return diff

In light of the aforementioned observations, the following differencing procedure should be used to remove seasonality:

We'll be using the difference in weekly numbers for Junction 1.
The difference of consecutive days is a preferable option for junction two.
The difference between the hourly numbers will be used for Junctions 3 and 4.
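To illustrate why a weekly difference removes weekly seasonality, the sketch below builds a synthetic hourly series (a linear trend plus a weekly sine wave, both made up) and differences it at a lag of one week. It also shows that pandas' built-in Series.diff gives the same values as an explicit loop of the kind used in this tutorial:

```python
import numpy as np
import pandas as pd

period = 24 * 7                      # one week of hourly readings
n = period * 3                       # three weeks of synthetic data
index = pd.date_range("2017-01-01", periods=n, freq=pd.Timedelta(hours=1))
t = np.arange(n)

# Synthetic series: slow linear trend plus a pattern that repeats every week.
series = pd.Series(0.01 * t + 5 * np.sin(2 * np.pi * t / period), index=index)

# Built-in lagged difference; the first `period` values are NaN.
weekly_diff = series.diff(periods=period)

# Equivalent explicit loop over positions.
loop_diff = [series.iloc[i] - series.iloc[i - period] for i in range(period, n)]

print(weekly_diff.dropna().head())
```

The weekly pattern cancels out exactly, and the linear trend is reduced to a constant (0.01 per hour times 168 hours). The same idea applies with a lag of 24 for daily patterns and a lag of 1 for hour-to-hour changes.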

# Normalize and difference each series to make it stationary
dataframe_N1, avg_J1, std_J1 = Normalize(dataframe_1, "Vehicles")
Diff_1 = Difference(dataframe_N1, column="Vehicles", interval=(24*7)) #taking a week's difference
dataframe_N1 = dataframe_N1[24*7:]
dataframe_N1.columns = ["Norm"]
dataframe_N1["Diff"]= Diff_1

dataframe_N2, avg_J2, std_J2 = Normalize(dataframe_2, "Vehicles")
Diff_2 = Difference(dataframe_N2, column="Vehicles", interval=(24)) #taking a day's difference
dataframe_N2 = dataframe_N2[24:]
dataframe_N2.columns = ["Norm"]
dataframe_N2["Diff"]= Diff_2

dataframe_N3, avg_J3, std_J3 = Normalize(dataframe_3, "Vehicles")
Diff_3 = Difference(dataframe_N3, column="Vehicles", interval=1) #taking an hour's difference
dataframe_N3 = dataframe_N3[1:]
dataframe_N3.columns = ["Norm"]
dataframe_N3["Diff"]= Diff_3

dataframe_N4, avg_J4, std_J4 = Normalize(dataframe_4, "Vehicles")
Diff_4 = Difference(dataframe_N4, column="Vehicles", interval=1) #taking an hour's difference
dataframe_N4 = dataframe_N4[1:]
dataframe_N4.columns = ["Norm"]
dataframe_N4["Diff"]= Diff_4

Plots of Transformed Dataframe

Sub_Plots4(dataframe_N1.Diff, dataframe_N2.Diff,dataframe_N3.Diff,dataframe_N4.Diff,"Transformation of the Dataframe After")

Output:

The plots above no longer show an obvious trend. An Augmented Dickey-Fuller test will be run to check that the series are stationary.

# Stationarity check using the Augmented Dickey-Fuller test
def Stationary_check(dataframe):
    check = adfuller(dataframe.dropna())
    print(f"ADF Statistic: {check[0]}")
    print(f"p-value: {check[1]}")
    print("Critical Values:")
    for key, value in check[4].items():
        print('\t%s: %.3f' % (key, value))
    if check[0] > check[4]["1%"]:
        print("Time Series is Non-Stationary")
    else:
        print("Time Series is Stationary")
 

# examining the series' stationary state

List_df_ND = [ dataframe_N1["Diff"], dataframe_N2["Diff"], dataframe_N3["Diff"], dataframe_N4["Diff"]]
print("Checking the transformed series for stationarity:")
for i in List_df_ND:
    print("\n")
    Stationary_check(i)

Output:

Preparing the data for the neural network :

Splitting the test train sets
Assigning X as features and y as target
Reshaping data for neural net

# Several NA values were produced as a result of differencing using a week's worth of data.
dataframe_J1 = dataframe_N1["Diff"].dropna()
dataframe_J1 = dataframe_J1.to_frame()

dataframe_J2 = dataframe_N2["Diff"].dropna()
dataframe_J2 = dataframe_J2.to_frame()

dataframe_J3 = dataframe_N3["Diff"].dropna()
dataframe_J3 = dataframe_J3.to_frame()

dataframe_J4 = dataframe_N4["Diff"].dropna()
dataframe_J4 = dataframe_J4.to_frame()

# Splitting the dataset
def Split_data(dataframe):
    training_size = int(len(dataframe)*0.90)
    data_len = len(dataframe)
    train, test = dataframe[0:training_size],dataframe[training_size:data_len]
    train, test = train.values.reshape(-1, 1), test.values.reshape(-1, 1)
    return train, test
# Splitting the training and test datasets
Junction1_train, Junction1_test = Split_data(dataframe_J1)
Junction2_train, Junction2_test = Split_data(dataframe_J2)
Junction3_train, Junction3_test = Split_data(dataframe_J3)
Junction4_train, Junction4_test = Split_data(dataframe_J4)

# Target and Feature
def target_and_feature(dataframe):
    end_len = len(dataframe)
    X = []
    y = []
    steps = 32
    for i in range(steps, end_len):
        X.append(dataframe[i - steps:i, 0])
        y.append(dataframe[i, 0])
    X, y = np.array(X), np.array(y)
    return X ,y

# fixing the shape of X_test and X_train
def FeatureFixShape(train, test):
    train = np.reshape(train, (train.shape[0], train.shape[1], 1))
    test = np.reshape(test, (test.shape[0],test.shape[1],1))
    return train, test

# Assigning features and target
X_train_Junction1, y_train_Junction1 = target_and_feature(Junction1_train)
X_test_Junction1, y_test_Junction1 = target_and_feature(Junction1_test)
X_train_Junction1, X_test_Junction1 = FeatureFixShape(X_train_Junction1, X_test_Junction1)

X_train_Junction2, y_train_Junction2 = target_and_feature(Junction2_train)
X_test_Junction2, y_test_Junction2 = target_and_feature(Junction2_test)
X_train_Junction2, X_test_Junction2 = FeatureFixShape(X_train_Junction2, X_test_Junction2)

X_train_Junction3, y_train_Junction3 = target_and_feature(Junction3_train)
X_test_Junction3, y_test_Junction3 = target_and_feature(Junction3_test)
X_train_Junction3, X_test_Junction3 = FeatureFixShape(X_train_Junction3, X_test_Junction3)

X_train_Junction4, y_train_Junction4 = target_and_feature(Junction4_train)
x_test_Junction4, y_test_Junction4 = target_and_feature(Junction4_test)
X_train_Junction4, x_test_Junction4 = FeatureFixShape(X_train_Junction4, x_test_Junction4)
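To make the windowing concrete, here is a small sketch of the same idea on a toy array: each sample consists of the previous 32 values and the target is the value that follows. It uses NumPy's sliding_window_view as an alternative to the explicit loop; the toy series is hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

steps = 32
# Toy stand-in for a train or test array of shape (n, 1): the values 0..39.
series = np.arange(40, dtype=float).reshape(-1, 1)

# All windows of length `steps`; the last window has no following value, so drop it.
windows = sliding_window_view(series[:, 0], steps)[:-1]

X = windows[..., np.newaxis]   # (samples, timesteps, features) as recurrent layers expect
y = series[steps:, 0]          # the value right after each window

print(X.shape, y.shape)
```

With 40 values and a window of 32, only 8 samples remain, which is why short series such as the fourth junction's leave comparatively little data to train on.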

Model Building

We have decided to use a Gated Recurrent Unit (GRU) network for this project. In this part, we define a function that builds the neural network and fits it to the data of any of the four junctions.
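For readers unfamiliar with GRUs, the following is a minimal NumPy sketch of a single GRU time step. It is illustrative only: biases are omitted, the weights are random, and formulations differ between papers and libraries in details such as which of z and 1 - z weights the previous state, so this is not a reproduction of the Keras layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step for a single sample (biases omitted for brevity)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x_t + Uz @ h_prev)              # update gate: how much old state to keep
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate: how much old state feeds the candidate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev))   # candidate state
    return z * h_prev + (1.0 - z) * h_cand           # blend of old state and candidate

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
shapes = [(n_hidden, n_in), (n_hidden, n_hidden)] * 3   # Wz, Uz, Wr, Ur, Wh, Uh
params = tuple(rng.normal(scale=0.5, size=s) for s in shapes)

h = np.zeros(n_hidden)
for x in [0.1, -0.3, 0.7]:            # a short toy input sequence
    h = gru_step(np.array([x]), h, params)

print(h)
```

The two gates let the unit decide, per time step, how much of its memory to keep, which is what makes GRUs a reasonable fit for sequences with daily and weekly patterns.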

#Model for the prediction
def GRU_model(X_Train, y_Train, X_Test):
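    # No validation data is passed to fit(), so monitor the training loss
    # (the default monitor, "val_loss", would never be available).
    early_stopping = callbacks.EarlyStopping(monitor="loss", min_delta=0.001,patience=10, restore_best_weights=True)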
   
    #The GRU model
    model = Sequential()
    model.add(GRU(units=150, return_sequences=True, input_shape=(X_Train.shape[1],1), activation='tanh'))
    model.add(Dropout(0.2))
    model.add(GRU(units=150, return_sequences=True, input_shape=(X_Train.shape[1],1), activation='tanh'))
    model.add(Dropout(0.2))
    model.add(GRU(units=50, return_sequences=True, input_shape=(X_Train.shape[1],1), activation='tanh'))
    model.add(Dropout(0.2))
    model.add(GRU(units=50, return_sequences=True, input_shape=(X_Train.shape[1],1), activation='tanh'))
    model.add(Dropout(0.2))
   
    model.add(GRU(units=50, input_shape=(X_Train.shape[1],1), activation='tanh'))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))
   
    # Compiling the model
    # (newer Keras optimizers no longer support the legacy `decay` argument, so it is omitted)
    model.compile(optimizer=SGD(momentum=0.9),loss='mean_squared_error')
    model.fit(X_Train,y_Train, epochs=50, batch_size=150,callbacks=[early_stopping])
    pred_GRU= model.predict(X_Test)
    return pred_GRU

# To determine the root mean squared prediction error
def RMSE_Value(test,predicted):
    rmse = math.sqrt(mean_squared_error(test, predicted))
    print("The root mean squared error is {}.".format(rmse))
    return rmse

# Plotting the true values against the predictions
def PredictionsPlot(test,predicted,m):
    plt.figure(figsize=(12,5),facecolor="#627D78")
    plt.plot(test, color=colors[m],label="True Value",alpha=0.5 )
    plt.plot(predicted, color="#627D78",label="Predicted Values")
    plt.title("GRU Traffic Prediction Vs True values")
    plt.xlabel("DateTime")
    plt.ylabel("Number of Vehicles")
    plt.legend()
    plt.show()

Fitting the Model

We will now fit the model to the transformed training set of each of the four junctions and compare its predictions with the transformed test sets.

Plotting the predictions and test set while fitting the first junction

#Predictions For First Junction
PredJ1 = GRU_model(X_train_Junction1,y_train_Junction1,X_test_Junction1)

Output:

#Results for J1
RMSE_J1=RMSE_Value(y_test_Junction1,PredJ1)
PredictionsPlot(y_test_Junction1,PredJ1,0)

Output:

The root mean squared error is 0.245881146563882.

Plotting the predictions and test set while fitting the second junction

#Predictions For Second Junction
PredJ2 = GRU_model(X_train_Junction2,y_train_Junction2,X_test_Junction2)

Output:

#Results for J2
RMSE_J2=RMSE_Value(y_test_Junction2,PredJ2)
PredictionsPlot(y_test_Junction2,PredJ2,1)

Output:

The root mean squared error is 0.5585970393765944.

Plotting the predictions and test set while fitting the third junction

#Predictions For Third Junction
PredJ3 = GRU_model(X_train_Junction3,y_train_Junction3,X_test_Junction3)

Output:

#Results for J3
RMSE_J3=RMSE_Value(y_test_Junction3,PredJ3)
PredictionsPlot(y_test_Junction3,PredJ3,2)

Output:

The root mean squared error is 0.6061366783632264.

Plotting the predictions and test set while fitting the fourth junction

#Predictions For Fourth Junction
PredJ4 = GRU_model(X_train_Junction4,y_train_Junction4,x_test_Junction4)

Output:

#Results for J4
RMSE_J4=RMSE_Value(y_test_Junction4,PredJ4)
PredictionsPlot(y_test_Junction4,PredJ4,3)

Output:

The root mean squared error is 1.0241982484501175.

Results of the model

# Collect the RMSE values of the four junctions in lists
Junctions = ["Junction1", "Junction2", "Junction3", "Junction4"]
RMSE = [RMSE_J1, RMSE_J2, RMSE_J3, RMSE_J4]
list_of_tuples = list(zip(Junctions, RMSE))
# Creates pandas DataFrame.
Results = pd.DataFrame(list_of_tuples, columns=["Junction", "RMSE"])
Results.style.background_gradient(cmap="Pastel1")  

Output:

Note: The root mean squared error is hard to interpret on its own, because it is computed here on the normalized and differenced series rather than on vehicle counts. For that reason, we also include the result plots in this project.
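Since each series was normalized with its own mean and standard deviation, the RMSE values above are expressed in standard deviations of the respective junction and are not directly comparable between junctions. Multiplying by the junction's standard deviation converts the error into vehicles. A minimal sketch with made-up numbers (the values, mean and standard deviation below are hypothetical):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical values in normalized units, plus a hypothetical mean and std.
y_true_norm = np.array([0.10, -0.20, 0.30])
y_pred_norm = np.array([0.20, -0.10, 0.10])
mean_vehicles, std_vehicles = 45.0, 20.0

rmse_norm = rmse(y_true_norm, y_pred_norm)       # in standard deviations
rmse_vehicles = rmse_norm * std_vehicles         # in vehicles

# Same result when both series are first mapped back to vehicle counts.
rmse_check = rmse(y_true_norm * std_vehicles + mean_vehicles,
                  y_pred_norm * std_vehicles + mean_vehicles)

print(rmse_norm, rmse_vehicles, rmse_check)
```

The mean cancels out in the error, so only the standard deviation matters for the conversion.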

Inversing the Transformation of the Data

In this part, we will reverse the transformations we applied to remove the seasonality and trends from the datasets. This brings the predictions back to the original scale, that is, to numbers of vehicles.
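As a sanity check of the idea, the sketch below runs the full round trip on a toy array: normalize, difference at a lag, then invert both steps. Note that the inverse of a lagged difference adds back the observation from the same number of steps earlier. The lag of 3 and the values are hypothetical:

```python
import numpy as np

interval = 3   # hypothetical lag; this tutorial uses 24*7, 24 and 1
raw = np.array([10., 12., 15., 11., 14., 18., 13., 17., 20.])

# Forward transforms: z-score normalization, then lagged differencing.
mean, std = raw.mean(), raw.std(ddof=1)   # ddof=1 matches the pandas default
normed = (raw - mean) / std
diffed = normed[interval:] - normed[:-interval]

# Inverse differencing: add back the value observed `interval` steps earlier.
restored_norm = diffed + normed[:-interval]

# Inverse normalization: back to the original units.
restored = restored_norm * std + mean

print(restored)
```

The restored values match the original series from position `interval` onward, which confirms that the two inverse steps undo the two transforms.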

# Functions to inverse transforms and plot comparative plots
# invert differenced forecast
def inverse_difference(last_ob, value):
    inversed = value + last_ob
    return inversed

#Plotting the comparison
def Sub_Plots2(df_1, df_2,title,m):
    fig, axes = plt.subplots(1, 2, figsize=(18,4), sharey=True,facecolor="#627D78")
    fig.suptitle(title)
   
    pl_1=sns.lineplot(ax=axes[0],data=df_1,color=colors[m])
    axes[0].set(ylabel ="Prediction")
   
    pl_2=sns.lineplot(ax=axes[1],data=df_2["Vehicles"],color="#627D78")
    axes[1].set(ylabel ="Original")

The first junction's inverse transform

# invert the differenced forecast for Junction 1
recover1 = dataframe_N1.Norm[-1412:-1].to_frame()
recover1["Pred"]= PredJ1
Transform_reverssed_J1 = inverse_difference(recover1.Norm, recover1.Pred).to_frame()
Transform_reverssed_J1.columns = ["Pred_Normed"]
#Invert the normalization J1
Final_J1_Pred = (Transform_reverssed_J1.values* std_J1) + avg_J1
Transform_reverssed_J1["Pred_Final"] =Final_J1_Pred
#Plotting the Predictions with originals
Sub_Plots2(Transform_reverssed_J1["Pred_Final"], dataframe_1[-1412:-1],"Predictions And Originals For Junction 1", 0)

Output:

The second junction's inverse transformation

#Invert the differenced J2
recover2 = dataframe_N2.Norm[-1426:-1].to_frame() #len as per the diff
recover2["Pred"]= PredJ2
Transform_reverssed_J2 = inverse_difference(recover2.Norm, recover2.Pred).to_frame()
Transform_reverssed_J2.columns = ["Pred_Normed"]
Final_J2_Pred = (Transform_reverssed_J2.values* std_J2) + avg_J2
Transform_reverssed_J2["Pred_Final"] =Final_J2_Pred
#Plotting the Predictions with originals
Sub_Plots2(Transform_reverssed_J2["Pred_Final"], dataframe_2[-1426:-1],"Predictions And Originals For Junction 2", 1)

Output:

The third junction's inverse transform

#Invert the differenced J3
recover3 = dataframe_N3.Norm[-1429:-1].to_frame() #len as per the diff
recover3["Pred"]= PredJ3
Transform_reverssed_J3 = inverse_difference(recover3.Norm, recover3.Pred).to_frame()
Transform_reverssed_J3.columns = ["Pred_Normed"]
#Invert the normalization J3
Final_J3_Pred = (Transform_reverssed_J3.values* std_J3) + avg_J3
Transform_reverssed_J3["Pred_Final"] =Final_J3_Pred
Sub_Plots2(Transform_reverssed_J3["Pred_Final"], dataframe_3[-1429:-1],"Predictions And Originals For Junction 3", 2)

Output:

The fourth junction's inverse transformation

#Invert the differenced J4
recover4 = dataframe_N4.Norm[-404:-1].to_frame()  #len as per the test set
recover4["Pred"]= PredJ4
Transform_reverssed_J4 = inverse_difference(recover4.Norm, recover4.Pred).to_frame()
Transform_reverssed_J4.columns = ["Pred_Normed"]
#Invert the normalization J4
Final_J4_Pred = (Transform_reverssed_J4.values* std_J4) + avg_J4
Transform_reverssed_J4["Pred_Final"] =Final_J4_Pred
Sub_Plots2(Transform_reverssed_J4["Pred_Final"], dataframe_4[-404:-1],"Predictions And Originals For Junction 4", 3)

Output:

Summary on the Dataset

In this project, we trained a GRU neural network to predict the traffic at four junctions. To obtain stationary time series, we applied normalization and differencing transforms. Because the junctions differ in trend and seasonality, we used a different differencing strategy for each of them. We used the root mean squared error as the model's evaluation metric, and we also plotted the predictions next to the original test values. The following conclusions are drawn from the data analysis:

Compared to junctions two and three, junction one is seeing a faster increase in the number of vehicles. Junction four has very little data, so we can't draw any conclusions from it.

Junction one's traffic has a stronger weekly as well as hourly seasonality. In comparison, the other junctions are considerably more linear.

Conclusion

In conclusion, traffic prediction using machine learning is an effective solution for addressing traffic congestion in urban areas. With the availability of vast amounts of traffic data, machine learning algorithms can accurately predict traffic flow and congestion patterns in real time. These predictions can be used to optimize traffic flow and improve the overall efficiency of transportation systems. While there are some challenges associated with traffic prediction using machine learning, the potential benefits are significant and can lead to improved transportation systems and reduced economic losses.
