Real-Time Data Analysis from Social Media Data in Python

This tutorial covers:

Global and local patterns
Prettifying the output
Finding common trends
Exploring the hot trend
Digging deeper
Frequency analysis
Activity around the trend
A table that speaks a thousand words
Analyzing the languages used
Summary

Global and Local Patterns

Even if we are not Twitter enthusiasts, we have to admit that it has a significant impact on the world. Besides being rich in insights, Twitter data lets us monitor Twitter storms in near real time. This means we can pick up on waves of global thought and emotion as they emerge.

As with any place that holds riches, Twitter has security guards that keep us from walking straight in: to collect data through its application programming interfaces (APIs), several authentication steps must be completed first. Since our goal today is to learn how to extract insights from data, we already have a green pass from security: the data is available in the datasets folder. Twitter provides both local and global trends. Let's load and inspect the data for topics that were trending worldwide (WW) and in the United States (US) at the time of the query. Each file is an example of a JSON response from a call to Twitter's GET trends/place API.

Social Media Dataset

With the appropriate Python libraries imported and the right dataset loaded, we can begin real-time data analysis of social media data using Python.

Main table (for reference only):

ScreenName	Text	Followers	Favorites
katyperry	RT @katyperry: Sure, the Mueller report is out, but @lildickymessages' "Earth" will be, too, tonight. Could you not say I never tried to save? …	107195569	10557
katyperry		107195568	10556
The Ellen Show	RT @TheEllenShow: .@lildickymessages, @justinbieber, @MileyCyrus, @katyperry, @ArianaGrande, and more, all in one music video. My head might …	77474846	10086
Leo Di	RT @LeoDi: The @ didn't partners that will benefit from this collaboration include @SharkRayFund, @SolutionsProj, @GreengrantsFun…	18988898	78137
Leo Di	RT @LeoDi: Thank you to @lildickymessages and all the artists that came together to make this happen. Net profits from the song, video,…	18988898	416018
halsey	RT @halsey: 🐯🐯🐯 meeeeeeeow 👅 #WeLikeTheEarth https://t.co/JJr6vmKG7U	10564844	69684
halsey		10564841	69684
Scooter braun	RT @scooterbraun: Watch #EARTH NOW! #WeLikeTheEarth …	3885179	14635
Spotify	RT @Spotify: This is epic. @lildickymessages got @justinbieber, @arianagrande, @halsey, @sanbenito, @edsheeran, @SnoopDogg, @ShawnMendes, …	4973477	437459
TomHall	RT @TomHall: 🐆 Look Out! 🐆 #Cheetah @LeoDi #FridayFeeling #WeLikeTheEarth https://t.co/w1O1NyTbwX …	590841	454
thingy	RT @thingy: / rat / YO GUYS LISTEN TO THIS SONGGGGG #WeLikeTheEarth https://t.co/trUgRG7QBH …	445977	43
SB__Projects	RT @SB__Projects: #WeLikeTheEarth out now with @lildickymessages, @justinbieber, @ArianaGrande, @zacbrownband, and 45+ of your other faves htt…	98418	730

Reading the Data

Source Code Snippet

import json
# Load the worldwide (WW) and US trends data into WW_trends and US_trends respectively
WW_trends = json.loads(open('/content/WWTrends.json').read())
US_trends = json.loads(open('/content/usTrends.json').read())

Prettifying the output

The raw data is hard to read! Fortunately, we can use the json.dumps() function to format it as a neatly indented JSON string.

Source Code Snippet

# Pretty-print the results: first the worldwide (WW) trends, then the US trends.
print("WW trends:")
print(json.dumps(WW_trends, indent=1))
print("\n", "US trends:")
print(json.dumps(US_trends, indent=1))

Output (abridged, shown as the raw Python structure):

WW trends: [{'trends': [{'name': '#BeratKandili', 'url': 'http://twitter.com/search?q=%23BeratKandili', 'promoted_content': None, 'query': '%23BeratKandili', 'tweet_volume': 46373}, {'name': '#GoodFriday', 'url': 'http://twitter.com/search?q=%23GoodFriday', 'promoted_content': None, 'query': '%23GoodFriday', 'tweet_volume': 81891}, {'name': '#WeLikeTheEarth', 'url': 'http://twitter.com/search?q=%23WeLikeTheEarth', 'promoted_content': None, 'query': '%23WeLikeTheEarth', 'tweet_volume': 159698}, {'name': '#195TLdenTTVeril ... 'promoted_content': None, 'query': '%E6%B1%A0%E8%A4%8B%E3%81%AE%E4%BA%8B%E6%95%85', 'tweet_volume': 34381}, {'name': '????', 'url': 'http://twitter.com/search?q=%E3%83%97%E3%83%AA%E3%84%A6%E3%84%B9', 'promoted_content': None, 'query': '%E3%83%97%E3%83%AA%E3%84%A6%E3%84%B9', 'tweet_volume': 44944}, {'name': 'Hemant Karkare', 'url': 'http://twitter.com/search?q=%22Hemant+Karkare%22', 'promoted_content': None, 'query': '%22Hemant+Karkare%22', 'tweet_volume': 44067}, ...]}]
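
To make the effect of json.dumps() concrete, here is a minimal sketch on a tiny hand-made sample. The sample_trends list is a hypothetical stand-in that only mimics the shape of the response, and ensure_ascii=False is an optional extra that the tutorial's own code does not use.

```python
import json

# Hypothetical sample that mimics the shape of the trends response.
sample_trends = [{"trends": [
    {"name": "#WeLikeTheEarth", "tweet_volume": 159698},
    {"name": "#GoodFriday", "tweet_volume": 81891},
]}]

# indent=1 puts every key on its own line; ensure_ascii=False keeps
# non-English trend names readable instead of turning them into \uXXXX escapes.
pretty = json.dumps(sample_trends, indent=1, ensure_ascii=False)
print(pretty)
```

The formatted string is still valid JSON, so it can be parsed back with json.loads() if needed.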

Finding common trends

From the pretty-printed results (the output of the previous task), we can see that:

Each item in our array of trend objects includes the name of the trending topic, the query parameter that can be used to search for the topic on Twitter Search, the search URL, and the volume of tweets over the last 24 hours, if available. The trends are refreshed every five minutes.
The hashtags #BeratKandili, #GoodFriday, and #WeLikeTheEarth were trending worldwide (WW) at the time of the query.
"tweet_volume" shows that, of the three, #WeLikeTheEarth was the most popular.
The results are not sorted by "tweet_volume".
Several trends are unique to the US.
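
Since the results are not ordered by volume, one way to rank them ourselves is sketched below. The trends list is a hypothetical sample in the same shape as the trend objects, using the three volumes quoted in the output plus one made-up entry without a count, because the volume is only present "if available".

```python
# Hypothetical sample in the same shape as the list of trend objects.
trends = [
    {"name": "#BeratKandili", "tweet_volume": 46373},
    {"name": "#GoodFriday", "tweet_volume": 81891},
    {"name": "#WeLikeTheEarth", "tweet_volume": 159698},
    {"name": "Made-up trend without a count", "tweet_volume": None},
]

# tweet_volume can be None, so treat a missing count as 0 when sorting.
by_volume = sorted(trends, key=lambda t: t["tweet_volume"] or 0, reverse=True)

for trend in by_volume:
    print(trend["name"], trend["tweet_volume"])
```

Treating a missing volume as 0 simply pushes those trends to the end; dropping them instead would be an equally reasonable choice.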

We could skim the two sets of trends to spot the common ones, but let's avoid doing that "manually". Using Python's set data structure, we can loop through the two trend objects, cast the lists of names to sets, and call the intersection method to get the names common to both sets.

Source Code Snippet

# Extract all the WW trend names from WW_trends
world_trends = set([trend['name'] for trend in WW_trends[0]['trends']])
# Extract all the US trend names from US_trends
us_trends = set([trend['name'] for trend in US_trends[0]['trends']])
# Get the intersection of the two sets of trends
common_trends = world_trends.intersection(us_trends)
# Inspect the data
print(world_trends, "\n")
print(us_trends, "\n")
print(len(common_trends), "common trends:", common_trends)

Output:

{'#BLACKPINKxCorden', '東京・池袋衝突事故', '#ViernesSanto', '#Karfreitag', '#AFLNorthDons', '#HardikPatel', '#NRLBulldogsSouths', 'Lyra McKee', '#DuyguAsena', '�rg�tde?il arkada?grubu', 'Shiv Sena', '#يوم_الجمعه', 'プリウス', '#TheJudasInMyLife', '#ConCalmaRemix', '#DragRace', '重体の女性と女児', '#WeLikeTheEarth', 'Derry', 'Lil Dicky', '#ShivSena', 'Derrick White', 'Priyanka Chaturvedi', '#KpuJanganCurang', '#BeratKandili', '#Hay?rl?Cumalar', '#HanumanJayanti', '#CHIvLIO', '#Jersey', '#DinahJane1', '#JunquerasACN', '브이알', ...}
{'#BLACKPINKxCorden', 'Seth Abramson', '#WorldofWarcraftMains', '#MakeAMovieSensual', '#AFLNorthDons', 'Kevin Durant', '#WGAMIX', '#NRLBulldogsSouths', '#GSWvsLAC', 'Lyra McKee', 'Mike Anderson', 'Oshie', '#WhatStopsYouFromGoingHome', '#FridayFeeling', '#GossipShouldBe', '#StarTrekDiscovery', 'David Fletcher', '#TimeToImpeach', 'Tomas Hertl', 'Silky', '#ConCalmaRemix', '"Earth"', '#TheLegendOfVoxMachina', '#DragRace', '#LilDicky', 'Derry', '#WeLikeTheEarth', 'Lil Dicky', '#MyInnerDemonSaid', 'WE LIKE THE EARTH', 'Derrick White', 'Yvie', '#RPDR', 'Gallant', 'Lone Wolf and Cub', '#DinahJane1', '#HustleAndSoul', '#fridaymotivation', '#MyDrunkUncleSays', '#CUZILIKEYOU', 'Shy Glizzy', 'George Conway', 'Servais', '#CriticalRoleSpoilers', '#WeirdDateStories', '#DontChangeOutNow', '#Earth', 'Kazuo Koike', '#rupaulsdragrace', 'Game 6'}

11 common trends: {'#BLACKPINKxCorden', 'Derrick White', '#AFLNorthDons', '#NRLBulldogsSouths', 'Lyra McKee', '#ConCalmaRemix', '#WeLikeTheEarth', '#DinahJane1', '#DragRace', 'Derry', 'Lil Dicky'}
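
The same set machinery can also answer the complementary question of which trends appear in only one of the two lists. This is a small sketch rather than part of the original analysis; world_sample and us_sample are hypothetical subsets of the trend names printed above.

```python
# Small hypothetical subsets of the trend names printed above.
world_sample = {"#WeLikeTheEarth", "Lil Dicky", "#GoodFriday", "#BeratKandili"}
us_sample = {"#WeLikeTheEarth", "Lil Dicky", "Kevin Durant", "#FridayFeeling"}

common = world_sample & us_sample      # same result as .intersection()
only_us = us_sample - world_sample     # same result as .difference()
only_world = world_sample - us_sample

print(sorted(common))
print(sorted(only_us))
print(sorted(only_world))
```

The operators & and - are shorthand for the intersection and difference methods, so either spelling works.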

Hot Trend Exploration

The intersection (the last output) shows that, out of the two sets of trends (each of size 50), there are 11 overlapping topics. That overlap could easily have been much larger or much smaller: it depends on whether, at the time we queried the trends, people in the US were caught up in topics relevant only to them. One common trend in particular looks quite intriguing: #WeLikeTheEarth. It's encouraging to see Twitter users expressing their love for Mother Earth.

We've found a hot trend: #WeLikeTheEarth. Let's see what story it is screaming to tell us! Using the hashtag as the query parameter, we can retrieve the related tweets through Twitter's search API. The response from the search API has been saved in the file 'WeLikeTheEarth.json'. Let's load this dataset and dig deeper into the trend.

Source Code Snippet

# Load the data
messages = json.loads(open('/content/WeLikeTheEarth.json').read())
# Inspect the first two tweets
messages[0:2]

Output:

[{'contributors': None,
  'coordinates': None,
  'created_at': 'Fri Apr 19 08:46:48 +0000 2019',
  'entities': {'hashtags': [{'indices': [30, 45], 'text': 'WeLikeTheEarth'}],
   'symbols': [],
   'urls': [{'display_url': 'youtu.be/pvuN_WvF1to',
     'expanded_url': 'https://youtu.be/pvuN_WvF1to',
     'indices': [46, 69],
     'url': 'https://t.co/L44XsoT5P1'}],
   'user_mentions': [{'id': 1409516660,
     'id_str': '1409516660',
     'indices': [3, 18],
     'name': 'LD',
     'screen_name': 'lildickymessages'}]},
  'favorite_count': 0,
  'favorited': False,
  'geo': None, 'id': 1119160405470543904,
  'id_str': '1119160405470543904', 'in_reply_to_screen_name': None,
  'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None,
  'is_quote_status': False, 'lang': 'en',
  'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'place': None,
  'possibly_sensitive': False,
  'retweet_count': 7484,  'retweeted': False, 'retweeted_status': {'contributors': None,  'coordinates': None, 'created_at': 'Fri Apr 19 04:44:49 +0000 2019',
   'entities': {'hashtags': [{'indices': [10, 45], 'text': 'WeLikeTheEarth'}],
    'symbols': [], 'urls': [{'display_url': 'youtu.be/pvuN_WvF1to',
      'expanded_url': 'https://youtu.be/pvuN_WvF1to',  'indices': [46, 49],
      'url': 'https://t.co/L44XsoT5P1'}], 'user_mentions': []},
   'favorite_count': 13317, 'translator_type': 'none',
    'url': 'https://t.co/aFrPkkJKqs', 'utc_offset': None,
    'verified': True}},
  'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
  'text': 'RT @lildickymessages: 🌎 out now #WeLikeTheEarth https://t.co/L44XsoT5P1',   'truncated': False,
  'user': {'contributors_enabled': False,
   'created_at': 'Fri Nov 05 44:45:49 +0000 2010',
   'default_profile': False,
   'default_profile_image': False,
   'description': "''you just got knocked the fuck out''",
   'entities': {'description': {'urls': []},
    'url': {'urls': [{'display_url': 'likeeeujdb.tumblr.com',
       'expanded_url': 'http://likeeeujdb.tumblr.com/',
       'indices': [0, 43]}]
 ...

Digging deeper

Printing the first two tweet items makes us realize that a tweet is much more than the short text we usually think of. But let's not be overwhelmed by the amount of information in a tweet object! Let's see whether we can extract insights by focusing on a few interesting fields.

Source Code Snippet

# Extract the text of every tweet in the messages list
texts = [tweet['text'] for tweet in messages]
# Extract the screen names of the users mentioned in tweets about #WeLikeTheEarth
names = [user_mention['screen_name'] for tweet in messages for user_mention in tweet['entities']['user_mentions']]
# Extract all the hashtags used when talking about this topic
hashtags = [hashtag['text'] for tweet in messages for hashtag in tweet['entities']['hashtags']]
# Inspect the first ten results
print(json.dumps(texts[0:10], indent=1), "\n")
print(json.dumps(names[0:10], indent=1), "\n")
print(json.dumps(hashtags[0:10], indent=1), "\n")

Output:

[ "RT @lildickymessages: \ud83c\udf0e out now #WeLikeTheEarth https://t.co/L44XsoT5P1",
 "\ud83d\udc9a\ud83c\udf0e\ud83d\udc9a  #WeLikeTheEarth \ud83d\udc47\ud83c\udffc",
 "RT @cabeyoomoon: Ta piosenka to bop,  wpada w ucho  i dochody z niej id\u0105 na dobry cel,  warto s\u0142ucha\u0107 w k\u00f3\u0142ko i w k\u00f3\u0142ko gdziekolwiek si\u0119 ty\u2026",
 "#WeLikeTheEarth \nCzemu ja si\u0119 pop\u0142aka\u0142am",
 "RT @Spotify: This is epic. @lildickymessages got @justinbieber, @arianagrande, @halsey, @sanbenito, @edsheeran, @SnoopDogg, @ShawnMendes, @Kr\u2026",
 "RT @biebercentineo: Justin : are we gonna die? \nLil dicky: you know bieber we might die \n\nBTCH IM CRYING #EARTH #WeLikeTheEarth #WELIKEEART\u2026",
 "RT @dreamsiinflate: #WeLikeTheEarth \u201ci am a fat fucking pig\u201d okay brendon urie https://t.co/FdJmq31xZc",
 "Literally no one:\n\nMe in the past 4 hours:\n\nI'm a koala and I sleep all the time, so what, it's cute \ud83c\udfb6\n\n#WeLikeTheEarth #EdSheeranTheKoala",
 "RT @Yuuupthatsme: Mia\u0142e\u015b by\u0107 \u017cyraf\u0105 #WeLikeTheEarth https://t.co/0kNCpU8o6q",
 "RT @jaguareffects: eu prestando aten\u00e7\u00e3o no \u00e1udio pra identificar cada artista\n\n#WeLikeTheEarth https://t.co/0cDtiV4t1E" ]

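The comprehensions above index straight into tweet['entities'], which works as long as every tweet carries those keys. A more defensive variant is sketched below, assuming some tweets might lack them; extract_entities and sample_tweets are hypothetical names introduced only for this illustration.

```python
def extract_entities(tweet):
    """Return (mentions, hashtags) for one tweet dict, tolerating missing keys."""
    entities = tweet.get("entities", {})
    mentions = [m["screen_name"] for m in entities.get("user_mentions", [])]
    hashtags = [h["text"] for h in entities.get("hashtags", [])]
    return mentions, hashtags


# Hypothetical sample: one retweet with entities, one tweet without any.
sample_tweets = [
    {"text": "RT @lildickymessages: out now #WeLikeTheEarth",
     "entities": {"user_mentions": [{"screen_name": "lildickymessages"}],
                  "hashtags": [{"text": "WeLikeTheEarth"}]}},
    {"text": "a tweet with no entities at all"},
]

results = [extract_entities(t) for t in sample_tweets]
print(results)
```

With dict.get() and empty defaults, a tweet without entities contributes empty lists instead of raising a KeyError.
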
Frequency analysis

Just from the first few results of the last extraction, we can deduce that:

The trend is about a song about loving the Earth;
Many well-known musicians, including Lil Dicky, are driving this Twitter trend;
Ed Sheeran appears as a cuddly koala; and
There is even a hashtag for it: "EdSheeranTheKoala"!

Looking at the first ten items of the interesting fields gave us a feel for the data. We can now take a closer look with a simple but very useful exercise: computing frequency distributions. Starting with frequencies is generally a good way to begin an analysis, since it suggests where to go next.

Source Code Snippet

# Import modules
from collections import Counter
# Count occurrences to get the frequency distribution of all names and hashtags
for item in [names, hashtags]:
    c = Counter(item)
    # Inspect the ten most common items in c
    print(c.most_common(10), "\n")

Output:

[('lildickymessages', 104), ('LeoDi', 44), ('ShawnMendes', 33), ('Halsey', 31), ('ArianaGrande', 30), ('Justin Bieber', 49), ('Spotify', 46), ('edsheeran', 46), ('sanbenito', 45), ('SnoopDogg', 45)]

[('WeLikeTheEarth', 313), ('4future', 14), ('19aprile', 14), ('EARTH', 11), ('fridaysforfuture', 10), ('EarthMusicVideo', 3), ('ConCalmaRemix', 3), ('Earth', 3), ('aliens', 4), ('AvengersEndgame', 4)]

Activity around the trend

Based on the last frequency distributions, we can develop our conclusions further:

We can more safely say that this was a music video by Lil Dicky, given the hashtag "EarthMusicVideo".
Leo Di is not a musician, but he is an environmentalist, so it is not surprising to see his name here.
We can also say that the video was released on a Friday, very likely on April 19.

We have been able to extract quite a lot of insight. That is quite powerful.

Do all the retweets revolve around one particular tweet? Let's examine the data further to look for patterns in the activity around the tweets. If a tweet has been retweeted, its 'retweeted_status' field contains a wealth of information about the original tweet and its author.

We can gauge a tweet's popularity by looking at its retweet count and favorite count. With so many celebrities in the picture, if we also extract the follower count of the tweet's author, can we tell whether their support for #WeLikeTheEarth moved a sizable fraction of their followers?

The retweet_count field gives the total number of times the original tweet was retweeted. It should be the same in the original tweet and in all of its retweets. To get your head around the many fields, play around with some sample tweets and the official documentation.

Source Code Snippet

# Collect (retweet count, favorites, followers, screen name, text) for every retweet
retweets = [(tweet['retweet_count'], tweet['retweeted_status']['favorite_count'], tweet['retweeted_status']['user']['followers_count'], tweet['retweeted_status']['user']['screen_name'], tweet['text']) for tweet in messages if 'retweeted_status' in tweet]
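
To check whether the retweets revolve around a single original tweet, we could group them by the original tweet they point to. The sketch below assumes that each 'retweeted_status' object carries its own 'id' and 'user' fields, as the top-level tweets in the output do; the sample list is hypothetical.

```python
from collections import Counter

# Hypothetical sample: three retweets of two originals, plus one original tweet.
sample = [
    {"id": 1, "retweeted_status": {"id": 100, "user": {"screen_name": "lildickymessages"}}},
    {"id": 2, "retweeted_status": {"id": 100, "user": {"screen_name": "lildickymessages"}}},
    {"id": 3, "retweeted_status": {"id": 200, "user": {"screen_name": "LeoDi"}}},
    {"id": 4},  # not a retweet, so it has no 'retweeted_status'
]

# Count how many retweets point at each (original id, author) pair.
originals = Counter(
    (t["retweeted_status"]["id"], t["retweeted_status"]["user"]["screen_name"])
    for t in sample
    if "retweeted_status" in t
)

# Share of all collected tweets that are retweets.
retweet_share = sum(originals.values()) / len(sample)

print(originals.most_common(1))
print(retweet_share)
```

If one pair dominates the counter, most of the activity is concentrated on that single tweet.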

A table that speaks a thousand words

Let's manipulate the data further and present it in a richer, better-looking form: "looks matter!"

Source Code Snippet

# Import modules
import matplotlib.pyplot as plot
import pandas as pd
# Create a DataFrame, aggregate per tweet, and sort by follower count
df = pd.DataFrame(retweets, columns=['Retweets', 'Favorites', 'Followers', 'ScreenName', 'Text']).groupby(['ScreenName', 'Text', 'Followers']).sum().sort_values(by=['Followers'], ascending=False)
# Display the table with a color gradient (renders in a notebook)
df.style.background_gradient()

Analyzing the languages used

According to our table, Lil Dicky's followers reacted the most: 44.4% of them liked his first tweet.

Even though Katy Perry and Ellen DeGeneres have enormous Twitter followings, few of their followers responded.
Leo received the most retweets and likes in absolute numbers, but only 2.19 percent of his followers liked his first tweet (416,018 favorites against 18,988,898 followers in the reference table).

The stark contrast in responses may be explained by the fact that this was Lil Dicky's music video. Leo still gained more traction than Katy or Ellen because of his significant contribution to this campaign.
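
The percentages in this comparison are simply favorites divided by followers. A minimal sketch of that calculation follows, using follower and favorite counts copied from the reference table near the top of this tutorial; like_rate is a hypothetical helper name.

```python
# (screen name, followers, favorites) taken from the reference table.
rows = [
    ("katyperry", 107195569, 10557),
    ("TheEllenShow", 77474846, 10086),
    ("LeoDi", 18988898, 416018),
]


def like_rate(followers, favorites):
    """Favorites as a percentage of followers (0.0 if there are no followers)."""
    return 100 * favorites / followers if followers else 0.0


rates = {name: round(like_rate(followers, favorites), 2)
         for name, followers, favorites in rows}
print(rates)
```

Normalizing by follower count is what makes accounts of very different sizes comparable.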

Can we find more interesting patterns in the data? From the text of the tweets, we could spot several languages, so let's build a frequency distribution of the languages.

Source Code Snippet

# Extract the language of each tweet and append it to the list of languages
messages_languages = []
for tweet in messages:
    messages_languages.append(tweet['lang'])
# Extract the source (the client used to post) of each tweet
messages_sources = []
for tweet in messages:
    messages_sources.append(tweet['source'])
# Plot the distribution of languages (%matplotlib inline is a Jupyter/Colab magic)
%matplotlib inline
plot.hist(messages_languages)

Output:

(array([303., 107.,  44.,  14.,  36.,  34.,   3.,   4.,   1.,   4.]),
 array([ 0. ,  1.3,  2.6,  3.9,  5.2,  6.5,  7.8,  9.1, 10.4, 11.7, 13. ]),
 <a list of 10 Patch objects>)
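
A histogram over string labels groups the categories into evenly spaced bins, so the raw arrays above are hard to map back to individual languages. As an alternative sketch (not the tutorial's own method), counting the language codes directly gives one number per language; sample_langs is a hypothetical list standing in for the extracted 'lang' values.

```python
from collections import Counter

# Hypothetical language codes standing in for the 'lang' field of each tweet.
sample_langs = ["en", "en", "pl", "en", "it", "und", "es", "pl"]

lang_counts = Counter(sample_langs)

# Print a simple text bar chart, most frequent language first.
for lang, n in lang_counts.most_common():
    print(f"{lang:>4} {n:3d} {'#' * n}")
```

The resulting counts could also be passed to a bar chart if a plot is preferred.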

Summary

The last histogram reveals that most tweets were in English, and that Polish, Italian, and Spanish were the next three most common languages. There were also quite a few tweets in a language that Twitter could not identify (lang = "und").

Why is this kind of information useful? It can help us understand the "category" of people interested in this topic (clustering). We could also analyze the device type used to post the tweets, to answer questions such as, "Does owning an iPhone rather than an Android phone influence people's inclination towards this trend?" I'll leave that to you as a further exercise!
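
As a possible starting point for that exercise, the 'source' field shown in the tweet output is an HTML anchor whose link text names the client, for example "Twitter for Android". The sketch below pulls that text out with a regular expression and counts it; client_name and sample_sources are hypothetical, and the iPhone entry is an assumed example that simply follows the same pattern.

```python
import re
from collections import Counter

# Matches the visible text between the opening and closing anchor tags.
ANCHOR_TEXT = re.compile(r">([^<]+)<")


def client_name(source):
    """Return the link text of a 'source' anchor, or 'unknown' if none is found."""
    match = ANCHOR_TEXT.search(source)
    return match.group(1).strip() if match else "unknown"


# Hypothetical sample; the Android string matches the output shown earlier.
sample_sources = [
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '',
]

device_counts = Counter(client_name(s) for s in sample_sources)
print(device_counts.most_common())
```

Applied to the full list of sources, the same counter would show which clients dominate the conversation.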

What a thrilling journey it has been! We have come a long way from the near-complete ignorance we started with.

From location-based comparisons to analyzing tweet activity to finding patterns across languages and devices, we have covered a lot today. Let's give ourselves a well-deserved pat on the back!
