Difference between Web Content, Web Structure, and Web Usage Mining
Web mining is the application of data mining techniques to extract knowledge from web data, including web documents, the hyperlinks between them, and website usage logs. It aims to discover useful and interesting patterns in large data sets, building on classic data mining; the big data available on the web serves as its data set. Web data includes information, documents, structure, and profiles. Web mining can be viewed from two perspectives: process-based and data-driven. In practice, it typically involves several steps, such as collecting data, selecting and preprocessing the data, discovering knowledge, and analyzing the results.
The internet has become a crucial part of our lives, so techniques that help extract knowledge from the web are an active area of research. In web mining, at least one of structure or usage (Weblog) data is used in the mining process, with or without other types of web data. In general, Web mining tasks can be classified into three categories: Web content mining, Web structure mining, and Web usage mining.
All three categories focus on the process of discovering implicit, previously unknown, and potentially useful information from the web. Each of them focuses on different mining objects of the web. Let's look briefly at each of the three categories.
What is Web Content Mining?
Web content mining is the mining of useful data, information, and knowledge from web page content. It scans and mines the text, images, and groups of web pages that match the content of a query, much as a search engine does when it displays a list of results.
It differs from data mining because web data are mainly semi-structured or unstructured, while data mining deals primarily with structured data. It also differs from text mining because of the semi-structured nature of the web, while text mining focuses on unstructured text. Web content mining therefore requires creative applications of data mining and text mining techniques, as well as its own unique approaches.
In the past few years, there has been a rapid expansion of activity in the web content mining area. This is not surprising given the phenomenal growth of web content and the significant economic benefit of such mining. However, due to the heterogeneity and lack of structure of web data, automated discovery of targeted or unexpected knowledge still presents many challenging research problems. Web content mining can be divided into two approaches:
1. Agent-based Approach
This approach involves intelligent systems. It aims to improve information finding and filtering, and it usually relies on autonomous agents that can identify relevant websites. It can be divided into the following three categories:
2. Data-based Approach
The data-based approach organizes semi-structured data present on the internet into structured data. It models web data in a more structured form so that standard database querying mechanisms and data mining applications can be used to analyze it.
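The idea of turning semi-structured web data into structured records can be illustrated with a small sketch. It assumes the page contains a simple HTML table whose first row is a header; the sample markup and the names `TableRowParser` and `html_table_to_records` are hypothetical and only serve the illustration. It uses Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_records(html):
    """Return one dict per body row, keyed by the header row's cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment used only for this example.
SAMPLE_HTML = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Laptop</td><td>900</td></tr>
  <tr><td>Phone</td><td>500</td></tr>
</table>
"""

records = html_table_to_records(SAMPLE_HTML)
# Once the data is structured, an ordinary query-style filter applies.
cheap = [r["product"] for r in records if int(r["price"]) < 600]
print(records)
print(cheap)
```

Real pages are rarely this regular, so a production extractor would need to handle nested tables, missing cells, and inconsistent markup.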
Web Content Mining Challenges
Web content mining faces the following challenges, listed with their solutions:
What is Web Structure Mining?
The challenge for Web structure mining is to deal with the structure of the hyperlinks within the web itself. Link analysis is an old area of research, but with the growing interest in Web mining, research on structure analysis has increased. These efforts resulted in a newly emerging research area called Link Mining, which is located at the intersection of the work in link analysis, hypertext, web mining, relational learning, inductive logic programming, and graph mining.
Web structure mining uses graph theory to analyze a website's node and connection structure. According to the type of web structural data, web structure mining can be divided into two kinds:
The web contains a variety of objects with almost no unifying structure, with differences in authoring style and content much greater than in traditional collections of text documents. The objects in the WWW are web pages, and the links between them are in-links, out-links, and co-citations (two pages linked to by the same page). Attributes include HTML tags, word appearances, and anchor texts. Web structure mining uses the following terminology:
An example of a web structure mining technique is the PageRank algorithm, which Google uses to rank search results. A page's rank is determined by the number and quality of the links pointing to it.
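To make the "number and quality of links" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production implementation. The four-page graph, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank by repeatedly redistributing rank along links.

    graph maps each page to the list of pages it links to.
    """
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in graph.items():
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked to by A, B, and D.
GRAPH = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(GRAPH)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this graph, page C ends up with the highest score because three pages link to it, while D, which nobody links to, keeps only the baseline share.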
Link mining has had an impact on some traditional data mining tasks. Below we summarize the link mining tasks that are applicable to Web structure mining:
What is Web Usage Mining?
Web usage mining focuses on techniques that can predict the behavior of users while they interact with the WWW. It discovers user navigation patterns from web data, extracting useful information from the secondary data derived from users' interactions while surfing the web. Web usage mining collects data from Weblog records to discover user access patterns of web pages. Several research projects and commercial tools analyze those patterns for different purposes. The resulting knowledge can be used in personalization, system improvement, site modification, business intelligence, and usage characterization.
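As a rough sketch of the first step, the snippet below turns raw Weblog records into one navigation path per visitor. It assumes the server writes the Common Log Format and that records are already in time order. The sample lines are invented, and identifying a visitor by host address alone is a simplification, since real preprocessing usually also splits visits by time gaps and filters out robots.

```python
import re
from collections import defaultdict

# Pattern for the Common Log Format: host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Invented sample records, for illustration only.
SAMPLE_LOG = """\
10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 2326
10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /home HTTP/1.1" 200 2326
10.0.0.1 - - [10/Oct/2023:13:56:01 +0000] "GET /products HTTP/1.1" 200 5120
10.0.0.1 - - [10/Oct/2023:13:56:30 +0000] "GET /missing HTTP/1.1" 404 512
10.0.0.2 - - [10/Oct/2023:13:57:02 +0000] "GET /cart HTTP/1.1" 200 1024
"""


def paths_by_visitor(log_text):
    """Group successfully served pages by host, keeping the order of requests."""
    paths = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        # Skip malformed lines and failed requests such as 404s.
        if match and match.group("status") == "200":
            paths[match.group("host")].append(match.group("path"))
    return dict(paths)


visitor_paths = paths_by_visitor(SAMPLE_LOG)
print(visitor_paths)
```

The resulting per-visitor page lists are the kind of input that the pattern discovery techniques below operate on.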
The only information left behind by many users visiting a Web site is the path through the pages they have accessed. Most Web information retrieval tools use only textual information and ignore link information that could be very valuable. In general, four kinds of data mining techniques are mainly applied in web usage mining to discover user navigation patterns:
1. Association Rule Mining
Association rule mining is the most basic data mining method and is used more than any other method in web usage mining. It helps a website organize its content more efficiently or provide recommendations for effective cross-selling of products.
These rules are statements of the form X => Y, where X and Y are sets of items available in a series of transactions. The rule X => Y states that transactions that contain the items in X are likely to also contain the items in Y. In web usage mining, association rules are used to find relationships between pages that frequently appear together in user sessions.
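A rule X => Y is usually judged by two standard measures that the text above does not name: support (the fraction of sessions containing both X and Y) and confidence (among sessions containing X, the fraction that also contain Y). The sketch below computes both for a handful of made-up sessions; the page names are hypothetical.

```python
# Each session is the set of pages one user visited (invented data).
SESSIONS = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/blog"},
    {"/products", "/cart"},
]


def support(itemset, sessions):
    """Fraction of sessions that contain every page in itemset."""
    itemset = set(itemset)
    return sum(itemset <= session for session in sessions) / len(sessions)


def confidence(x, y, sessions):
    """Estimate how often sessions containing x also contain y."""
    support_x = support(x, sessions)
    if support_x == 0:
        return 0.0  # x never occurs, so the rule has no evidence
    return support(set(x) | set(y), sessions) / support_x


rule_support = support({"/products", "/cart"}, SESSIONS)
rule_confidence = confidence({"/products"}, {"/cart"}, SESSIONS)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here the rule {/products} => {/cart} holds in two of the four sessions, and two of the three sessions that contain /products also contain /cart. Algorithms such as Apriori search for all rules above chosen support and confidence thresholds instead of checking one rule at a time.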
2. Sequential Patterns
Sequential patterns are used to discover subsequences in a large volume of sequential data. In web usage mining, they are used to find user navigation patterns that frequently appear in sessions. Sequential patterns may look like association rules, but they also take time into account: the order in which events occurred is part of the pattern. Algorithms that extract association rules can also be adapted to generate sequential patterns. Two types of algorithms are used for sequential pattern mining.
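The difference from association rules is easiest to see in code. The following sketch counts ordered page pairs, where "A then B" means A was visited before B in the same session, not necessarily immediately before. It is a deliberately simplified stand-in for full sequential pattern mining algorithms, and the sessions and the `min_support` threshold are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Each session is an ordered list of visited pages (invented data).
SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/home", "/blog", "/products"],
    ["/products", "/home"],
]


def frequent_ordered_pairs(sessions, min_support):
    """Return ordered pairs (a, b), a visited before b, seen in enough sessions."""
    counts = Counter()
    for session in sessions:
        # combinations() keeps the input order, so (a, b) means a came first.
        # A set is used so that each pair counts once per session.
        pairs = {(a, b) for a, b in combinations(session, 2) if a != b}
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}


patterns = frequent_ordered_pairs(SESSIONS, min_support=2)
print(patterns)
```

All three sessions contain both /home and /products, but only two of them visit /home first, so the ordered pattern has a count of 2 where an unordered association would count 3.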
3. Clustering Mining
Clustering techniques identify groups of similar items among high volumes of data. This is done using distance functions that measure the degree of similarity between different items. In web usage mining, clustering is used to group similar sessions. What matters in this type of analysis is the contrast between groups and individual users. Two interesting kinds of clustering can be found in this area: user clustering and page clustering.
Clustering of user records is commonly used in web mining and web analytics tasks. The knowledge derived from clustering is also used to segment the market in e-commerce. Different methods and techniques are used for clustering, including:
In other clustering methods, repetitive patterns are first extracted from the users' sessions using association rules. These patterns are then used to construct a graph whose nodes are the visited pages and whose edges connect two or more pages. If the connected pages occur together in an extracted pattern, a weight is assigned to the edge to show the strength of the relationship between the nodes. For clustering, this graph is then recursively divided until user behavior groups are detected.
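A simplified version of this graph-based idea can be sketched as follows. Two substitutions are made to keep it short, and both are assumptions of this example, not the method described above: edge weights are plain co-occurrence counts instead of mined association rules, and the graph is split once by dropping weak edges and taking connected components instead of being partitioned recursively. The sessions and the `min_weight` threshold are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Invented sessions, each an ordered list of visited pages.
SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/products", "/cart"],
    ["/home", "/products", "/cart"],
    ["/blog", "/about"],
    ["/blog", "/about", "/home"],
]


def page_clusters(sessions, min_weight):
    """Group pages that co-occur in at least min_weight sessions."""
    # Edge weight = number of sessions in which both pages appear.
    weights = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            weights[(a, b)] += 1

    # Keep only the strong edges.
    graph = defaultdict(set)
    for (a, b), weight in weights.items():
        if weight >= min_weight:
            graph[a].add(b)
            graph[b].add(a)

    # Each connected component of the remaining graph is one cluster.
    pages = {page for session in sessions for page in session}
    clusters, seen = [], set()
    for start in sorted(pages):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


clusters = page_clusters(SESSIONS, min_weight=2)
print(clusters)
```

With a threshold of 2, the weak links between /home and the blog pages are dropped, leaving a shopping cluster and a reading cluster.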
4. Classification Mining
Discovering classification rules allows one to develop a profile of items belonging to a particular group according to their common attributes. This profile can then be used to classify new data items added to the database. In Web mining, classification techniques allow one to develop a profile of clients who access particular server files, based on the demographic information available on those clients or on their navigation patterns.
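As one possible illustration of classifying users by navigation pattern, the sketch below labels a new session with the class of its most similar known session, using Jaccard similarity between sets of visited pages. This nearest-neighbor approach is only one of many classification techniques and is chosen here for brevity. The labels "buyer" and "browser" and all sessions are hypothetical.

```python
# Known sessions with a class label (invented training data).
LABELED_SESSIONS = [
    ({"/home", "/products", "/cart", "/checkout"}, "buyer"),
    ({"/products", "/cart", "/checkout"}, "buyer"),
    ({"/home", "/blog"}, "browser"),
    ({"/home", "/blog", "/about"}, "browser"),
]


def jaccard(a, b):
    """Similarity of two page sets: shared pages divided by all pages."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(pages, labeled_sessions):
    """Return the label of the labeled session most similar to pages."""
    best_pages, best_label = max(
        labeled_sessions, key=lambda example: jaccard(pages, example[0])
    )
    return best_label


print(classify({"/products", "/cart"}, LABELED_SESSIONS))
print(classify({"/blog", "/about"}, LABELED_SESSIONS))
```

In practice the feature set would likely include more than visited pages, for example time spent per page or the demographic attributes mentioned above.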
Web usage mining has many advantages, making this technology attractive to corporations, including government agencies.
Web usage mining by itself does not create issues, but when used on data of personal nature, this technology might cause concerns.
Web Usage Mining Applications
The main objective of web usage mining is to collect data about users' navigation patterns. This information can be used to improve Web sites from the user's point of view. There are three main applications of this mining:
1. Personalization of web content
Web usage mining techniques can be used to personalize content for web users. For example, a user's behavior can be predicted on the fly by comparing their current browsing pattern with the patterns extracted from the log files. Recommendation systems, a real application in this area, suggest links that direct users to pages they are likely to be interested in. Some sites also organize and present their product catalogs based on the predicted interests of a specific user.
2. Pre-fetching
The results of web usage mining can be used to improve the performance of Web servers and Web-based applications. Web usage mining can inform pre-fetching and caching strategies and thus reduce the response time of Web servers.
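One way to drive the caching strategy described above is to predict which page a user is likely to request next and fetch it ahead of time. The sketch below does this with simple transition counts (a first-order Markov model), which is an assumption of this example and not a method named in the text. The sessions and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Invented sessions, each an ordered list of visited pages.
SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/blog"],
]


def build_transitions(sessions):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent follower of page, or None if none was seen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


transitions = build_transitions(SESSIONS)
print(predict_next(transitions, "/home"))
print(predict_next(transitions, "/cart"))
```

A server could warm its cache with the predicted page while the user is still reading the current one. When no follower has been observed, the function returns `None` and nothing is pre-fetched.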
3. Improvement of Web site design
Usability is one of the most important issues in designing and implementing websites. The results of web usage mining can help improve the design of websites. Adaptive websites are one application of this type of mining: website content and structure are dynamically reorganized based on data derived from user behavior on these sites.
Difference between Web Content, Web Structure, and Web Usage Mining
The following are the differences between web content, web structure, and web usage mining: