M.Sc. Computer Science, 2nd Sem, CS - 203 : Data Mining and Data Warehousing (University of Pune, Pune - 2013)
M.Sc. (Semester - II)
COMPUTER SCIENCE
SEAT No. :
[Total No. of Pages : 3
CS - 203 : Data Mining and Data Warehousing
(2011 Pattern)
Time : 3 Hours] [Max. Marks : 80
Instructions to the candidates:-
1) All questions are compulsory.
2) Draw neat diagrams wherever necessary.
3) Figures to the right indicate full marks.
Q1) Attempt any eight of the following : [8 × 2 = 16]
a) What is the need for data warehousing?
b) Define an FP-tree.
c) Define an association rule.
d) What are visualization techniques?
e) What are the types of sequence mining?
f) What is clustering?
g) What do you mean by active learning?
h) When is the chi-square test applied?
i) What are the key terms used in a confusion matrix?
j) What are Bayesian classifiers?
Q2) Attempt any four of the following : [4 × 4 = 16]
a) What are the data mining applications?
b) What are the challenges in web mining?
c) What is WEKA? What are the advantages of WEKA?
d) What are the two approaches to avoid overfitting?
e) What are the different ways of handling noisy data?
Q3) Attempt any two of the following : [2 × 8 = 16]
a) Suppose that a data warehouse of a match consists of the four dimensions date, spectator, location and game, and the two measures count and charge, where charge is the fare that a spectator pays when watching a game on a given date. Spectators may be students, adults or seniors, with each category having its own charge rate.
Draw a star schema diagram for the data warehouse.
b) The following table consists of training tuples from the AllElectronics customer database. The data have been generalized and are given below:

RID  age          income  student  credit rating  Class: buys-computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle-aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle-aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle-aged  medium  no       excellent      yes
13   middle-aged  high    yes      fair           yes
14   senior       medium  no       excellent      no

i) Draw a decision tree for the concept buys-computer.
ii) Compute the information gain of the attribute age, to find the splitting criterion for the tuples. (The class label attribute buys-computer has two distinct values. Let class C1 correspond to yes and class C2 correspond to no.)
c) Explain the basic data mining tasks of the predictive model.
Q4) Attempt any four of the following : [4 × 4 = 16]
a) What are data preprocessing techniques? Explain any one.
b) Differentiate between text mining and web mining.
c) Explain the sampling algorithm with an example.
d) How is a data warehouse different from a database? How are they similar?
e) Differentiate between OLTP and OLAP.
Q5) Attempt any four of the following : [4 × 4 = 16]
a) Explain the major steps of decision tree classification.
b) How does the k-means clustering algorithm work?
c) Write a note on non-linear regression.
d) Explain the following accuracy measures (any two) :
i) Bootstrap
ii) F-measure
iii) Precision
iv) Cross-validation
e) Differentiate between agglomerative and divisive clustering methods.
[4339] - 203