Fake News Detection Using Machine Learning

In this digital age, fake news is a serious problem because it harms real-world communities by spreading misinformation, damaging reputations, and igniting social unrest.

Fake news can result from unintentional misinformation, or it can be a deliberate attempt to mislead people. As social media has grown, it has become harder and harder to tell legitimate news from fake news.

At the same time, identifying and correcting fake news is a significant concern for any news organization, and this is where machine learning can help.

Machine learning techniques have shown promising results in detecting fake news: they analyze vast amounts of data, identify patterns in it, and produce predictions based on those patterns. Machine learning can be applied in several ways to detect false information.

Strategies for Applying Machine Learning To Detecting Fake News

One strategy is to examine the language used in the news story using natural language processing (NLP) methods. NLP algorithms can recognize language patterns that are frequently present in fake news articles. For instance, fake news pieces frequently distort facts, use sensational headlines, and employ more emotive language. By examining the language an article uses, machine learning algorithms can estimate whether it is legitimate or fake.
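The language cues mentioned above can be turned into numeric features. The following is a minimal sketch under stated assumptions: the `EMOTIVE_WORDS` list, the `style_features` function, and the three features are hypothetical illustrations, not the features used later in this article.

```python
import re

# Hypothetical, hand-picked list of emotive words; a real system would learn
# such cues from labelled data rather than hard-code them.
EMOTIVE_WORDS = {"shocking", "outrageous", "unbelievable", "disaster", "amazing"}

def style_features(headline: str, body: str) -> dict:
    """Return a few simple language-based signals for one article."""
    words = re.findall(r"[A-Za-z']+", body)
    headline_words = re.findall(r"[A-Za-z']+", headline)
    n_words = max(len(words), 1)  # avoid division by zero for empty bodies
    return {
        # Share of body words that appear in the emotive word list
        "emotive_ratio": sum(w.lower() in EMOTIVE_WORDS for w in words) / n_words,
        # Exclamation marks in the headline, a crude sensationalism cue
        "headline_exclamations": headline.count("!"),
        # Headline words written entirely in capitals (ignoring 1-letter words)
        "headline_all_caps": sum(w.isupper() and len(w) > 1 for w in headline_words),
    }

features = style_features(
    "SHOCKING truth revealed!!!",
    "This unbelievable story is a disaster for everyone involved.",
)
print(features)
```

Features like these could be fed to a classifier alongside, or instead of, raw word counts.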

Utilizing network analysis is another method for spotting fake news. In this method, the network of social media accounts that are disseminating the news is analyzed by machine learning algorithms. A network of phoney accounts or automated programmes frequently spreads false news pieces. Machine learning algorithms can find patterns that are frequently present in networks of fake news by examining the network of accounts that are disseminating the news.
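One simple network signal is how similar the sharing behaviour of two accounts is. The sketch below is only an illustration: the account names, the `shares` data, and the 0.8 threshold are all hypothetical, and real coordinated-behaviour detection would use far richer signals.

```python
from itertools import combinations

# Hypothetical toy data: which stories each account has shared.
shares = {
    "acct_a": {"story1", "story2", "story3"},
    "acct_b": {"story1", "story2", "story3"},
    "acct_c": {"story2", "story3", "story4"},
    "acct_d": {"story9"},
}

def jaccard(a: set, b: set) -> float:
    """Overlap between two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suspicious_pairs(shares: dict, threshold: float = 0.8) -> list:
    """Return account pairs whose sharing behaviour is nearly identical."""
    pairs = []
    for u, v in combinations(sorted(shares), 2):
        score = jaccard(shares[u], shares[v])
        if score >= threshold:
            pairs.append((u, v, score))
    return pairs

pairs = suspicious_pairs(shares)
print(pairs)
```

Here only `acct_a` and `acct_b` are flagged, because they shared exactly the same set of stories.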

Finally, machine learning algorithms can detect fake news items using fact-checking databases. The statements made in a news story can be cross-checked against databases of facts that have already been verified. By comparing the claims in a news report with the facts in the database, the algorithm can evaluate the credibility of the report.
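As a rough illustration of cross-checking, the sketch below matches a claim against a tiny database by word overlap. The `FACT_DB` contents, the `check_claim` function, and the 0.6 threshold are hypothetical; production systems typically rely on semantic matching rather than shared words.

```python
import re

# Hypothetical mini fact-checking database: verified statement -> verdict.
FACT_DB = {
    "water boils at 100 degrees celsius at sea level": True,
    "the great wall of china is visible from the moon": False,
}

def tokens(text: str) -> set:
    """Lowercase a text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def check_claim(claim: str, min_overlap: float = 0.6):
    """Return (matched_fact, verdict), or (None, None) if nothing is similar enough."""
    claim_tokens = tokens(claim)
    best_fact, best_score = None, 0.0
    for fact in FACT_DB:
        fact_tokens = tokens(fact)
        # Jaccard similarity between the claim and the stored statement
        score = len(claim_tokens & fact_tokens) / len(claim_tokens | fact_tokens)
        if score > best_score:
            best_fact, best_score = fact, score
    if best_fact is None or best_score < min_overlap:
        return None, None
    return best_fact, FACT_DB[best_fact]

print(check_claim("Water boils at 100 degrees Celsius"))
print(check_claim("Bananas are blue"))
```

Word overlap cannot handle negation or paraphrase, so this only shows the lookup idea, not a reliable verifier.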

Large datasets of both real and fake news items are necessary to train machine learning algorithms for fake news identification. These datasets are used to train the algorithms so that they can recognize the patterns present in fake news. The precision and accuracy of a machine learning algorithm can be improved by tuning it according to user feedback.

The use of machine learning for the detection of fake news is still in its early phases.

Even though fake news has serious consequences, machine learning has the potential to combat it. By detecting false information before it can spread, machine learning can lessen the effect of fake news.

Machine learning algorithms used for fake news detection can be divided into two main categories: supervised and unsupervised learning.

Supervised learning algorithms are trained on labelled datasets, where each news article is labelled as either real or fake. The algorithm learns from the labelled dataset and is then used to classify new news articles as real or fake. Supervised learning algorithms include logistic regression, decision trees, support vector machines, and neural networks.

Unsupervised learning algorithms, on the other hand, do not require labelled datasets. Instead, they use clustering techniques to group news articles into clusters based on their similarities. The algorithm then identifies the characteristics of the clusters that contain fake news articles. Unsupervised learning algorithms include k-means clustering, hierarchical clustering, and association rule learning.
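The clustering idea can be sketched with scikit-learn's `KMeans` on TF-IDF vectors. The four toy documents are made up for illustration, and the choice of two clusters is an assumption; note that clustering only groups similar articles, so deciding which cluster is "fake" still needs human inspection or some labels.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy documents: two sober reports and two sensational pieces
docs = [
    "government passes new budget bill in parliament",
    "parliament debates budget bill amendments",
    "shocking miracle cure doctors hate revealed",
    "miracle cure revealed shocking secret doctors hide",
]

# Turn each document into a TF-IDF vector
X = TfidfVectorizer().fit_transform(docs)

# Group the documents into two clusters without using any labels
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)
```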

Advantages of Machine Learning For Detecting Fake News

There are several advantages of using machine learning for detecting fake news:

Machine learning algorithms are capable of swiftly and effectively analyzing massive volumes of data. Because there are so many news articles published every day, it is impossible for humans to manually analyze every article. News outlets and social media platforms can easily identify false news because of machine learning algorithms' ability to handle massive volumes of data quickly.
Algorithms that use machine learning can find links and patterns in data that may not be obvious to people. Machine learning algorithms can precisely identify fake news stories by examining the wording, sources, and social media networks linked to news pieces.
In order to stop the spread of incorrect information, social media platforms and news organizations can immediately take action thanks to machine learning algorithms' ability to identify fake news stories in real-time.
Algorithms that use machine learning are able to pick up new information and adapt. Machine learning algorithms may be trained to recognize new trends and identify new sorts of fake news stories as fake news tactics advance.
The process of identifying false news stories may be automated with machine learning algorithms. Humans will have less work to do, as a result, freeing them up to work on things like fact-checking and investigative journalism.
Machine learning algorithms can detect fake news stories at a reasonable cost. Once trained, the algorithms can be deployed widely without incurring much additional expense.

Limitations of Machine Learning For Detecting Fake News

Fake news detection using machine learning has its limitations.

Machine learning algorithms are only as good as the data that they are trained on. If the dataset is biased, the algorithm will be biased too. So we have to make sure that the datasets are diverse and contain news articles from various sources.
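One quick, partial check for this kind of bias is to look at how labels are distributed per source. The sketch below assumes hypothetical `(source, label)` pairs and a hypothetical helper `label_share_by_source`; it is a sanity check, not a complete bias audit.

```python
from collections import Counter

# Hypothetical (source, label) pairs taken from a dataset's metadata
articles = [
    ("outlet_a", "real"), ("outlet_a", "real"), ("outlet_a", "real"),
    ("outlet_b", "fake"), ("outlet_b", "fake"), ("outlet_b", "fake"),
    ("outlet_c", "real"), ("outlet_c", "fake"),
]

def label_share_by_source(articles):
    """Return, for each source, the fraction of its articles carrying each label."""
    counts = {}
    for source, label in articles:
        counts.setdefault(source, Counter())[label] += 1
    return {
        source: {label: n / sum(c.values()) for label, n in c.items()}
        for source, c in counts.items()
    }

shares = label_share_by_source(articles)

# Sources whose articles all carry the same label: a model trained on them
# may learn to recognize the source rather than the truthfulness of the text.
single_label_sources = sorted(
    s for s, share in shares.items() if max(share.values()) == 1.0
)
print(single_label_sources)
```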

Machine learning techniques are capable of identifying fake news, but they are not entirely reliable, as there is always a possibility of misidentifying true news as fake and vice versa. Therefore, we need to combine multiple strategies, such as fact-checking, to evaluate the authenticity of the news.

Code:

Now, we will try to implement machine learning methods for the detection of fake news. Here we will have two datasets: "Fake.csv" and "True.csv".

One contains fake news, and the other contains true news.

Importing Libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
import re
import string

Importing Dataset

dataframe_fake = pd.read_csv("Fake.csv")
dataframe_true = pd.read_csv("True.csv")

dataframe_fake.head()

Output:

Now we will insert a column named "class" into both datasets; it will be the target feature. In the fake dataframe, we will give the class a value of 0, and in the true dataframe, we will give it a value of 1.

Note: 0 means it is fake news, and 1 means it is true news (the same encoding that the manual testing code below relies on).

dataframe_fake["class"] = 0
dataframe_true["class"] = 1

# Now, we will look at the shape of both datasets
dataframe_fake.shape, dataframe_true.shape

Output:

dataframe_fake dataset contains 23481 rows and 5 columns.

dataframe_true dataset contains 21417 rows and 5 columns.

Let's have some manual testing

# We will hold out the last 10 rows of each dataset for manual testing.
# .copy() gives independent dataframes, so later column assignments are safe.
dataframe_fake_manual_testing = dataframe_fake.tail(10).copy()
dataframe_fake = dataframe_fake.drop(dataframe_fake.tail(10).index)

dataframe_true_manual_testing = dataframe_true.tail(10).copy()
dataframe_true = dataframe_true.drop(dataframe_true.tail(10).index)

# Let's have a look at the change in the shape of both datasets
dataframe_fake.shape, dataframe_true.shape

Output:

If you look here, there is a decrease in the number of rows. It is because we took 10 rows from each dataset for manual testing.

#Inserting the class column in both of the manual testing datasets
dataframe_fake_manual_testing["class"] = 0
dataframe_true_manual_testing["class"] = 1

dataframe_fake_manual_testing.head(10)

Output:

Merging True and Fake Dataframes

Here, we will merge 'dataframe_fake' and 'dataframe_true' to form a new dataset so that we can perform the machine learning operations on it.

dataframe_merge = pd.concat([dataframe_fake, dataframe_true], axis =0 )
dataframe_merge.head(10)

Output:

After concatenating the datasets, the rows are not in random order: all the fake news rows come first, followed by all the true news rows.

# We will remove the columns that are not required for us
dataframe = dataframe_merge.drop(["title", "subject","date"], axis = 1)

# Let's check if there are any null values in the dataset
dataframe.isnull().sum()

Output:

Luckily, we don't have any missing values in our dataset.

Because we have only concatenated the two datasets, the fake and true news rows are arranged one block after the other. So we need to create randomness in the dataset, which we can do by shuffling its rows.

# Here is the random shuffling of the rows in dataset 
dataframe = dataframe.sample(frac = 1)
dataframe.head()

Output:

Here, we have created the randomness in the dataset by shuffling the rows.

As you may have noticed, the shuffle has left the index out of order, so we will fix it.

dataframe.reset_index(inplace = True)
dataframe.drop(["index"], axis = 1, inplace = True)
dataframe.head()

Output:

We have fixed the indexing in the dataset.
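The shuffle and the index reset can also be combined into one reproducible step. This is a small sketch on a made-up dataframe; the `random_state=42` seed is an arbitrary assumption that simply makes the shuffle repeatable between runs.

```python
import pandas as pd

# Hypothetical toy dataframe standing in for the merged news dataset
toy = pd.DataFrame({"text": ["a", "b", "c", "d"], "class": [0, 0, 1, 1]})

# Shuffle all rows with a fixed seed, then renumber the index from 0;
# drop=True discards the old index instead of keeping it as a column.
shuffled = toy.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```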

Function to Process the Texts

Here we will create a function that can process the texts in the news so that it is understandable for algorithms.

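def wordopt(t):
    # Lowercase so that "News" and "news" become the same token
    t = t.lower()
    # Remove text inside square brackets
    t = re.sub(r'\[.*?\]', '', t)
    # Remove URLs and HTML tags while their punctuation is still intact
    t = re.sub(r'https?://\S+|www\.\S+', '', t)
    t = re.sub(r'<.*?>+', '', t)
    # Replace every remaining non-word character (including newlines) with a space
    t = re.sub(r'\W', ' ', t)
    # Remove leftover punctuation (only underscores survive the step above)
    t = re.sub('[%s]' % re.escape(string.punctuation), '', t)
    # Remove words that contain digits
    t = re.sub(r'\w*\d\w*', '', t)
    return t
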
dataframe["text"] = dataframe["text"].apply(wordopt)

#Now we will define the dependent variable and independent variables
x = dataframe["text"]
y = dataframe["class"]

# Splitting the Dataset into a Training and Testing Set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
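A split like the one above is drawn at random on every run. As a hedged variation, the sketch below uses made-up data and adds two optional `train_test_split` parameters that the article's code does not use: `random_state` for a repeatable split and `stratify` to keep the class proportions equal in both parts.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 40 articles, half of them in each class
texts = [f"article {i}" for i in range(40)]
labels = [0] * 20 + [1] * 20

x_tr, x_te, y_tr, y_te = train_test_split(
    texts,
    labels,
    test_size=0.25,      # 25% of the rows go to the test set
    random_state=42,     # fixed seed, so the same split is produced every run
    stratify=labels,     # keep the 50/50 class balance in both parts
)
print(len(x_tr), len(x_te), sum(y_te))
```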

Convert Text to Vectors

Text to vectors is a technique that involves transforming text data into numerical formats suitable for use by machine learning algorithms. This is significant because machine learning algorithms can only work with numerical inputs, and by converting text into vectors, we can represent textual data in a manner that is simple to analyze and process using these algorithms.

from sklearn.feature_extraction.text import TfidfVectorizer


vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(x_train)
xv_test = vectorization.transform(x_test)
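To make the text-to-vector step less of a black box, the sketch below computes TF-IDF weights by hand using only the standard library. It follows the formula documented for scikit-learn's default settings (smoothed IDF and L2-normalized rows), but it is a simplified illustration with a hypothetical `tfidf` helper, not a replacement for `TfidfVectorizer`.

```python
import math
import re
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows) where each row is an L2-normalized TF-IDF vector."""
    # Tokens are lowercase runs of two or more word characters
    tokenized = [re.findall(r"\b\w\w+\b", d.lower()) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed inverse document frequency
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

vocab, rows = tfidf(["fake news spreads fast", "real news reports facts"])
print(vocab)
print([round(v, 3) for v in rows[0]])
```

The word "news" appears in both documents, so it receives a lower weight than words such as "fake" that appear in only one.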

Modelling

Modelling means fitting an algorithm to the training data so that it learns the patterns and correlations in that data. When given new data, the trained model can make predictions based on what it has learned.

Here we will use different machine learning algorithms to train them on the dataset and later use them for the prediction of fake news.

1. Logistic Regression

from sklearn.linear_model import LogisticRegression


LR = LogisticRegression()
LR.fit(xv_train,y_train)

Output:

pred_lr=LR.predict(xv_test)
LR.score(xv_test, y_test)

Output:

The accuracy of the model is quite high, about 99%.

2. Decision Tree Classifier

from sklearn.tree import DecisionTreeClassifier


DT = DecisionTreeClassifier()
DT.fit(xv_train, y_train)

Output:

pred_dt = DT.predict(xv_test)
DT.score(xv_test, y_test)

Output:

The accuracy of the Decision Tree Classifier is also around 99%, which is close to perfect.

3. Gradient Boosting Classifier

from sklearn.ensemble import GradientBoostingClassifier


GBC = GradientBoostingClassifier(random_state=0)
GBC.fit(xv_train, y_train)

Output:

pred_gbc = GBC.predict(xv_test)
GBC.score(xv_test, y_test)

Output:

The same is the case with the Gradient Boosting Classifier.

4. Random Forest Classifier

from sklearn.ensemble import RandomForestClassifier


RFC = RandomForestClassifier(random_state=0)
RFC.fit(xv_train, y_train)

Output:

pred_rfc = RFC.predict(xv_test)
RFC.score(xv_test, y_test)

Output:

The Random Forest Classifier's accuracy is also high.

The accuracy of all the machine learning models is almost the same, 99%.
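Accuracy alone does not show which kind of mistakes a model makes. The `accuracy_score` and `classification_report` functions imported at the start can break the result down per class. The sketch below uses made-up labels for eight articles; in this tutorial, `y_test` and a prediction array such as `pred_lr` could be passed in the same way.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true and predicted labels for eight test articles
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 1]

# Overall share of correct predictions
accuracy = accuracy_score(y_true, y_pred)

# Per-class precision, recall and F1 as a dictionary
report = classification_report(y_true, y_pred, output_dict=True)

print(f"accuracy: {accuracy:.2f}")
print(classification_report(y_true, y_pred))
```

Here one article of each class is misclassified, which the per-class recall values make visible even though a single accuracy number would hide it.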

Model Testing

Here we are going to use all four models to check whether they are capable of detecting fake news. We have to check manually.

def output_lable(n):
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "Not A Fake News"
   
def manual_testing(news):
    testing_news = {"text":[news]}
    new_def_test = pd.DataFrame(testing_news)
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    new_x_test = new_def_test["text"]
    new_xv_test = vectorization.transform(new_x_test)
    pred_LR = LR.predict(new_xv_test)
    pred_DT = DT.predict(new_xv_test)
    pred_GBC = GBC.predict(new_xv_test)
    pred_RFC = RFC.predict(new_xv_test)


    print("\n\nLR Prediction: {} \nDT Prediction: {} \nGBC Prediction: {} \nRFC Prediction: {}".format(
        output_lable(pred_LR[0]),
        output_lable(pred_DT[0]),
        output_lable(pred_GBC[0]),
        output_lable(pred_RFC[0])))

news = str(input())
manual_testing(news)

Output:

Absolutely right; the prediction is correct.

news = str(input())
manual_testing(news)

Output:

Absolutely right; the prediction is correct.

dataframe_true.head()
news = str(input())
manual_testing(news)

Output:

Absolutely right; the prediction is correct.

The models we have built produce accurate results: all of them reached an accuracy of almost 99% and classified the manually tested articles correctly. So we can say that machine learning can be used as a tool for detecting fake news.

Conclusion

Fake news detection using machine learning algorithms is a promising approach to combating fake news. Machine learning algorithms can analyze large datasets and identify patterns that are commonly found in fake news articles. By detecting fake news articles before they are widely disseminated, machine learning algorithms can prevent the harm caused by fake news. However, it is important to use diverse datasets and other techniques, such as fact-checking, to verify the authenticity of news articles.
