Data Mining MCQ

This section of interview questions and answers focuses on "Data Mining". You can practice these questions to strengthen the concepts needed for various interviews (campus interviews, walk-in interviews, and company interviews).

1) Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?

Supervised learning
Unsupervised learning
Hybrid learning
Reinforcement learning

Answer: b

Explanation: Unsupervised learning is a type of machine learning that is generally used to find hidden structures and patterns in unlabeled data.

2) Which one of the following refers to querying the unstructured textual data?

Information access
Information update
Information retrieval
Information manipulation

Answer: c

Explanation: Information retrieval refers to querying unstructured textual data. It can also be understood as the activity of obtaining, from a large collection of information, the system resources that are relevant to an information need.

3) Which of the following can be considered as the correct process of Data Mining?

Infrastructure, Exploration, Analysis, Interpretation, Exploitation
Exploration, Infrastructure, Analysis, Interpretation, Exploitation
Exploration, Infrastructure, Interpretation, Analysis, Exploitation
Exploration, Infrastructure, Analysis, Exploitation, Interpretation

Answer: a

Explanation: The process of data mining contains many sub-processes in a specific order. The correct order in which the sub-processes of data mining execute is Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.

4) Which of the following is an essential process in which the intelligent methods are applied to extract data patterns?

Warehousing
Data Mining
Text Mining
Data Selection

Answer: b

Explanation: Data mining is a process in which intelligent methods are applied to extract meaningful patterns from a huge collection (or set) of data.

5) What is KDD in data mining?

Knowledge Discovery Database
Knowledge Discovery Data
Knowledge Data definition
Knowledge data house

Answer: a

Explanation: KDD stands for Knowledge Discovery in Databases (given in the options as "Knowledge Discovery Database"). It refers to the broad process of discovering knowledge in data and emphasizes the high-level application of specific data mining techniques.

6) The adaptive system management refers to:

The science of making machines perform tasks that would require intelligence when performed by humans.
A computational procedure that takes some values as input and produces some values as the output.
It uses machine learning techniques, in which programs learn from their past experience and adapt themselves to new conditions or situations.
All of the above.

Answer: c

Explanation: Generally, adaptive system management refers to the use of machine learning techniques in which programs learn from their past experience and adapt themselves to new conditions and events.

7) For what purpose do analysis tools pre-compute summaries of huge amounts of data?

In order to maintain consistency
For authentication
For data access
To obtain the query response

Answer: d

Explanation:

Whenever a query is fired, its response is expected very quickly. To make that possible, the analysis tools pre-compute summaries of the huge amount of data. To understand this in more detail, consider the following example:

Suppose you type a keyword into Google search to get some information. Because Google's analytical tools have already pre-computed summaries over large amounts of data, they can quickly return output related to the keywords you have written.
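
The pre-computation idea can be sketched in a few lines of Python. This is a minimal illustration, not how any real search engine or analysis tool works: the data, the `build_summary` function, and the `query_total` function are all hypothetical names invented for this example.

```python
from collections import defaultdict

def build_summary(rows):
    """Pre-compute total sales per region in a single pass over the raw rows."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)

# Hypothetical raw data: (region, sale amount) pairs.
raw_rows = [("north", 10.0), ("south", 5.0), ("north", 2.5)]

# The summary is computed once, ahead of any query.
summary = build_summary(raw_rows)

def query_total(region):
    """Answer a query from the pre-computed summary instead of rescanning raw_rows."""
    return summary.get(region, 0.0)

print(query_total("north"))
```

Each query is then a dictionary lookup rather than a scan of the full data set, which is the reason for summarizing in advance.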

8) What are the functions of Data Mining?

Association and correlation analysis, classification
Prediction and characterization
Cluster analysis and Evolution analysis
All of the above

Answer: d

Explanation: Data mining offers several functionalities for performing different types of tasks. Common ones are cluster analysis, prediction, characterization, and evolution analysis. Association and correlation analysis and classification are also important functionalities of data mining.

9) In the following given diagram, which type of clustering is used?

Hierarchical
Naive Bayes
Partitional
None of the above

Answer: a

Explanation: The diagram above shows hierarchical clustering, which categorizes data over a variety of scales by building a cluster tree. So the correct answer is A.

10) Which of the following statements is incorrect about hierarchical clustering?

Hierarchical clustering is also known as HCA
The choice of an appropriate metric can influence the shape of the cluster
In general, the splits and merges both are determined in a greedy manner
All of the above

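Answer: d

Explanation: Statements A, B, and C each describe hierarchical clustering accurately: it is abbreviated HCA, the choice of metric influences the shape of the clusters, and splits and merges are determined greedily. None of them is incorrect on its own, and the intended answer for this question is D.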

11) Which one of the following can be considered as the final output of hierarchical clustering?

A tree that displays how close things are to each other
Assignment of each point to clusters
Finalize estimation of cluster centroids
None of the above

Answer: a

Explanation: Hierarchical clustering, for example in its agglomerative (merging) form, produces a tree (a dendrogram) that shows how close items are to each other.
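
To make the tree output concrete, here is a minimal sketch of agglomerative clustering on one-dimensional points. It assumes single linkage (distance between the closest members of two clusters), which is only one of several possible linkage choices, and the function name `agglomerative` and the sample points are made up for illustration.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns the merge history; read in order, it describes the cluster tree.
    """
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        # Delete j first (j > i) so that index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges

history = agglomerative([1.0, 2.0, 6.0, 7.5, 20.0])
for left, right, distance in history:
    print(left, "+", right, "at distance", distance)
```

Each entry in the merge history corresponds to one internal node of the tree, so the full history is the final output rather than a single flat assignment of points to clusters.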

12) Which one of the following statements about the K-means clustering is incorrect?

The goal of the k-means clustering is to partition (n) observation into (k) clusters
K-means clustering can be defined as the method of quantization
The nearest neighbor is the same as the K-means
All of the above

Answer: c

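Explanation: K-means and nearest neighbor are unrelated methods. K-means is a clustering technique that partitions observations into k groups, whereas nearest neighbor assigns a label to a point based on the labeled points closest to it.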
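
The partitioning goal of k-means can be illustrated with a short sketch on one-dimensional data. This is a simplified version with fixed starting centroids and a fixed number of iterations; the function name `kmeans_1d` and the sample values are assumptions made for this example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Simplified k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

centers, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0])
print(centers, clusters)
```

No labels are used anywhere in this procedure, which is the key difference from nearest-neighbor classification.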

13) Which of the following statements about hierarchical clustering is incorrect?

Hierarchical clustering can primarily be used for the aim of exploration
Hierarchical clustering should not be primarily used for the aim of exploration
Both A and B
None of the above

Answer: b

Explanation: Hierarchical clustering can be used for exploration because it is a deterministic clustering technique. The statement that it should not be used for exploration is therefore the incorrect one.

14) Which one of the clustering techniques needs the merging approach?

Partitioned
Naïve Bayes
Hierarchical
Both A and C

Answer: c

Explanation: Hierarchical clustering is one of the most commonly used methods for analyzing social network data. In this method, nodes are compared with each other on the basis of their similarity, and larger groups are formed by merging nodes or groups of nodes that have similar characteristics.

15) The self-organizing maps can also be considered as the instance of _________ type of learning.

Supervised learning
Unsupervised learning
Missing data imputation
Both A & C

Answer: b

Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial Neural Network which is trained through unsupervised learning.

16) The following statement can be considered an example of _________

Suppose one wants to predict the number of newborns according to the size of storks' population by performing supervised learning

Structural equation modeling
Clustering
Regression
Classification

Answer: c

Explanation: The above-given statement can be considered as an example of regression. Therefore the correct answer is C.
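
Regression fits here because the quantity being predicted is a number. As a minimal sketch, the following fits a straight line by ordinary least squares; the stork and newborn counts are made-up values used only for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up numbers for illustration only.
stork_counts = [10, 20, 30, 40]
newborn_counts = [25, 45, 65, 85]

slope, intercept = fit_line(stork_counts, newborn_counts)
predicted = slope * 50 + intercept  # predicted newborns for a stork population of 50
print(slope, intercept, predicted)
```

A good fit on such data shows correlation only; it says nothing about whether storks cause births.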

17) In the example predicting the number of newborns, the final number of total newborns can be considered as the _________

Features
Observation
Attribute
Outcome

Answer: d

Explanation: In the example of predicting the total number of newborns, the result will be represented as the outcome. Therefore, the total number of newborns will be found in the outcome or addressed by the outcome.

18) Which of the following statements is true about classification?

It is a measure of accuracy
It is a subdivision of a set
It is the task of assigning a classification
None of the above

Answer: b

Explanation: The term "classification" refers to dividing the given data into sub-classes or groups according to their similarities or on the basis of a specific set of rules.

19) Which of the following statements is correct about data mining?

It can be referred to as the procedure of mining knowledge from data
Data mining can be defined as the procedure of extracting information from a set of the data
The procedure of data mining also involves several other processes like data cleaning, data transformation, and data integration
All of the above

Answer: d

Explanation: The term data mining can be defined as the process of extracting information from the massive collection of data. In other words, we can also say that data mining is the procedure of mining useful knowledge from a huge set of data.

20) In data mining, how many categories of functions are included?

Answer: c

Explanation: There are two categories of functions in data mining: Descriptive, and Classification and Prediction. Therefore the correct answer is C.

21) Which of the following can be considered as the classification or mapping of a set or class with some predefined group or classes?

Data set
Data Characterization
Data Sub Structure
Data Discrimination

Answer: d

Explanation: The discrimination refers to the mapping (or classification) of a class with some predefined groups or classes. So the correct answer is D.

22) The analysis performed to uncover interesting statistical correlations between associated attribute-value pairs is known as the _______.

Mining of association
Mining of correlation
Mining of clusters
All of the above

Answer: b

Explanation: Mining of correlation refers to the additional analysis performed to uncover interesting statistical correlations between associated attribute-value pairs.

23) Which one of the following can be defined as the data object which does not comply with the general behavior (or the model of available data)?

Evaluation Analysis
Outlier Analysis
Classification
Prediction

Answer: b

Explanation: An outlier may be defined as a data object that does not comply with the general behavior or with the model of the available data.
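
One simple way to flag objects that do not fit the general behavior is a z-score test, sketched below. This is only one of many possible approaches; the threshold of 2.0, the function name `find_outliers`, and the sample readings are assumptions chosen for the example.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    if s == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - m) / s > threshold]

# Hypothetical sensor readings with one value far from the rest.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_outliers(readings))
```

Here the "model of the available data" is just the mean and standard deviation; more elaborate models would change which objects count as outliers.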

24) Which one of the following statements is correct about data cleaning?

It refers to the process of data cleaning
It refers to the transformation of wrong data into correct data
It refers to correcting inconsistent data
All of the above

Answer: d

Explanation: Data cleaning is a process applied to a data set to remove noise and inconsistent data. It also involves transformation, in which wrong data is converted into correct data. In other words, data cleaning is a pre-processing step in which the given data set is prepared for the data warehouse.
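
A small sketch of what such a cleaning step might look like is shown below. The records, the `clean_records` function, and the policy of dropping rows with missing values are all hypothetical; real pipelines may repair or impute values instead.

```python
def clean_records(records):
    """Drop records with missing values and normalise inconsistent spellings."""
    cleaned = []
    for name, city in records:
        if not name or not city:
            continue  # missing value: skip the record (one simple policy among many)
        # Remove stray whitespace and use consistent capitalisation.
        cleaned.append((name.strip().title(), city.strip().title()))
    return cleaned

# Hypothetical raw records with noise and inconsistencies.
raw = [(" alice ", "PARIS"), ("Bob", None), ("carol", "paris ")]
print(clean_records(raw))
```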

25) The classification of the data mining system involves:

Database technology
Information Science
Machine learning
All of the above

Answer: d

Explanation: Generally, the classification of a data mining system depends on the following criteria: Database technology, machine learning, visualization, information science, and several other disciplines.

26) In order to integrate heterogeneous databases, how many types of approaches are there in the data warehousing?

Answer: d

Explanation: In general, data warehousing consists of data integration, data cleaning, and data consolidation. To integrate heterogeneous databases, there are two approaches: the update-driven approach and the query-driven approach. So the correct answer is D.

27) Issues like the efficiency and scalability of data mining algorithms come under _______

Performance issues
Diverse data type issues
Mining methodology and user interaction
All of the above

Answer: a

Explanation: In order to extract information effectively from a huge collection of data in databases, the data mining algorithm must be efficient and scalable. Therefore the correct answer is A.

28) Which of the following is the correct advantage of the Update-Driven Approach?

This approach provides high performance.
The data can be copied, processed, integrated, annotated, summarized and restructured in the semantic data store in advance.
Both A and B
None of the above

Answer: c

Explanation: The statements given in both A and B are the advantage of the Update-Driven Approach in Data Warehousing. So the correct answer is C.

29) Which of the following statements about the query tools is correct?

Tools developed to query the database
Attributes of a database table that can take only numerical values
Both A and B
None of the above

Answer: a

Explanation: The query tools are used to query the database. Or we can also say that these tools are generally used to get only the necessary information from the entire database.

30) Which one of the following correctly defines the term cluster?

Group of similar objects that differ significantly from other objects
Symbolic representation of facts or ideas from which information can potentially be extracted
Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
All of the above

Answer: a

Explanation: The term "cluster" refers to a set of similar objects or items that differ significantly from the other available objects. In other words, clustering can be understood as forming groups of objects with similar characteristics from all available objects. Therefore the correct answer is A.

31) Which one of the following refers to the binary attribute?

This takes only two values. In general, these values will be 0 and 1, and they can be coded as one bit
The natural environment of a certain species
Systems that can be used without knowledge of internal operations
All of the above

Answer: a

Explanation: In general, a binary attribute takes only two values, 0 and 1, and these values can be coded as one bit. So the correct answer is A.

32) Which of the following correctly refers to data selection?

A subject-oriented integrated time-variant non-volatile collection of data in support of management
The actual discovery phase of a knowledge discovery process
The stage of selecting the right data for a KDD process
All of the above

Answer: c

Explanation: Data selection can be defined as the stage in which the right data is selected for a knowledge discovery (KDD) process. Therefore the correct answer is C.

33) Which one of the following correctly refers to the task of the classification?

A measure of the accuracy, of the classification of a concept that is given by a certain theory
The task of assigning a classification to a set of examples
A subdivision of a set of examples into a number of classes
None of the above

Answer: b

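Explanation: The classification task refers to assigning a classification to a set of examples. A subdivision of a set of examples into a number of classes (option C) describes the classification itself rather than the task. Therefore the correct answer is B.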

34) Which of the following correctly defines the term "Hybrid"?

Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
Combining different types of method or information
None of these

Answer: c

Explanation: The term "hybrid" refers to combining two or more things, such as methods or sources of information, into a single whole that has features of each.

35) Which of the following correctly defines the term "Discovery"?

It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
It is a process of extracting implicit, previously unknown, and potentially useful information from data
None of the above

Answer: c

Explanation: The term "discovery" means finding something that was not previously known. It can also be interpreted as the process of extracting implicit, previously unknown, and potentially useful information from data.

36) The Euclidean distance measure can also be defined as ___________

The process of finding a solution for a problem simply by enumerating all possible solutions according to some predefined order and then testing them
The distance between two points as calculated using the Pythagoras theorem
A stage of the KDD process in which new data is added to the existing selection.
All of the above

Answer: b

Explanation: The Euclidean distance between two points, in a plane or in three-dimensional space, is the length of the line segment connecting them. It can also be defined as the distance between two points as calculated using the Pythagoras theorem. Therefore the correct answer is B.
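
As a quick illustration of the Pythagoras-based definition, the sketch below computes the distance between two points of equal dimension. The function name `euclidean_distance` is chosen for this example only.

```python
import math

def euclidean_distance(p, q):
    """Length of the straight segment between points p and q (same dimension)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))        # the classic 3-4-5 right triangle
print(euclidean_distance((1, 2, 3), (1, 2, 3)))  # identical points
```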

37) Which one of the following can be considered as the correct application of the data mining?

Fraud detection
Corporate Analysis & Risk management
Management and market analysis
All of the above

Answer: d

Explanation: Data mining is highly useful in a variety of areas such as fraud detection, corporate analysis, and risk management, and market analysis, etc., so the correct option is D.

38) Which one of the following correctly refers to the class under study in data characterization?

Final class
Study class
Target class
Both A and C

Answer: c

Explanation: In data characterization, the class under study is generally called the target class; it is the class whose data is being summarized.

39) Which of the following refers to the sequence of pattern that occurs frequently?

Frequent sub-sequence
Frequent sub-structure
Frequent sub-items
All of the above

Answer: a

Explanation: In data mining, the frequent sub-sequence refers to a certain sequence of patterns that occurs frequently, for example, buying a camera followed by the memory card. So the correct answer will be A.
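
The camera-then-memory-card example can be sketched as a count of ordered item pairs across purchase sequences. This is a deliberately simplified illustration limited to pairs, not a full sequential pattern mining algorithm; the `frequent_pairs` function, the purchase data, and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sequences, min_support):
    """Keep ordered pairs (a, then later b) that occur in at least min_support sequences."""
    counts = Counter()
    for seq in sequences:
        # combinations preserves input order, so (a, b) means a occurred before b.
        # Using a set counts each pair at most once per sequence.
        counts.update(set(combinations(seq, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical purchase histories, one list per customer, in time order.
purchases = [
    ["camera", "memory card", "tripod"],
    ["camera", "memory card"],
    ["memory card", "camera"],
]
print(frequent_pairs(purchases, min_support=2))
```

Note that the third customer bought the same two items in the opposite order, so that sequence does not add to the support of "camera, then memory card".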

40) Which one of the following describes and models regularities or trends for objects whose behavior changes over time?

Prediction
Evolution analysis
Classification
Both A and B

Answer: b

Explanation: In general, evolution analysis describes and models regularities or trends for objects whose behavior changes over time.

41) Issues like "handling of relational and complex types of data" come under which of the following categories?

Diverse Data Type
Mining methodology and user interaction Issues
Performance issues
All of the above

Answer: a

Explanation: A database often contains multiple types of data, complex objects, temporal data, and so on, so a single system cannot be expected to handle all of them. This type of issue therefore comes under the category Diverse Data Type. So the correct answer is A.

42) Which of the following is used as the first step in the knowledge discovery process?

Data selection
Data cleaning
Data transformation
Data integration

Answer: b

Explanation: Data cleaning is included as one of the first steps of the knowledge discovery process. So the correct answer is B.

43) Which of the following refers to the steps of the knowledge discovery process, in which the several data sources are combined?

Data selection
Data cleaning
Data transformation
Data integration

Answer: d

Explanation: The step "data integration" of the knowledge discovery process refers to combining several data sources. Therefore the correct answer is D.

44) Which of the following can be considered as the drawback of the query-Driven approach in data warehousing?

This approach is expensive for queries that require aggregations
This approach is inefficient and expensive for very frequent queries
This approach requires a very complex integration and filtering process
All of the above

Answer: d

Explanation: All statements given in the above question are drawbacks of the query-driven approach. Therefore the correct answer is D.

45) Which of the following correctly refers to the term "Data Independence"?

It means that the programs are not dependent on the logical attributes
It refers to data that is defined separately and not included in the program
It means that the programs are not dependent on the physical attributes of data
Both A and C

Answer: d

Explanation: The term "Data Independence" means that programs depend on neither the physical attributes nor the logical attributes of the data.

46) Which of the following is generally used by the E-R model to represent the weak entities?

Diamond
Doubly outlined rectangle
Dotted rectangle
Both B & C

Answer: b

Explanation: Generally, the double outline rectangle is used in the E-R model to represent the weak entities.

47) Which one of the following refers to the Black Box?

It refers to a system that can be used without knowledge of its internal operations
It refers to the natural environment of a specific species
It takes only two values at most that are 0 and 1
All of the above

Answer: a

Explanation: A black box is a system that can be used without knowledge of its internal operations. Therefore the correct answer is A.

48) Which one of the following issues must be considered before investing in data mining?

Compatibility
Functionality
Vendor consideration
All of the above

Answer: d

Explanation: Important issues such as functionality, compatibility, and vendor considerations should always be discussed before investing in data mining. Therefore the correct answer is D.

49) The term "DMQL" stands for _____

Data Marts Query Language
DBMiner Query Language
Data Mining Query Language
None of the above

Answer: c

Explanation: The term "DMQL" refers to the Data Mining Query Language. Therefore the correct answer is C.

50) When it is not clear what kind of pattern needs to be found, data mining should _________:

Try to perform all possible tasks
Perform both predictive and descriptive task
It may allow interaction with the user so that he can guide the mining process
All of the above

Answer: c

Explanation: In some data mining operations it is not clear what kind of pattern needs to be found, so the user can guide the mining process. The user usually has a good sense of which type of pattern is wanted, and can therefore rule out the discovery of non-required patterns and focus the process on the required pattern by setting up some rules. Therefore the correct answer is C.
