M.Sc. Computer Science, 2nd Sem, CS - 203 : Data Mining and Data Warehousing (University of Pune, Pune, 2013)

Friday, 28 November 2014 12:42, Nitha

M.Sc. (Semester - II)

COMPUTER SCIENCE

SEAT No. :

[Total No. of Pages : 3

CS - 203 : Data Mining and Data Warehousing

(2011 Pattern)

Time : 3 Hours] [Max. Marks : 80

Instructions to the candidates:-

1) All questions are compulsory.

2) Draw neat diagrams wherever necessary.

3) Figures to the right indicate full marks.

Q1) Attempt any eight of the following : [ 8 × 2 = 16]

a) What is the need for data warehousing?

b) Define an FP-tree.

c) Define an association rule.

d) What are visualization techniques?

e) What are the types of sequence mining?

f) What is clustering?

g) What do you mean by active learning?

h) When is the chi-square test applied?

i) What are the key terms used in a confusion matrix?

j) What are Bayesian classifiers?

Q2) Attempt any four of the following : [ 4 × 4 = 16]

a) What are the data mining applications?

b) What are the challenges in web mining?

c) What is WEKA? What are the advantages of WEKA?

d) What are the two approaches to avoid overfitting?

e) What are the different ways of handling noisy data?

Q3) Attempt any two of the following : [ 2 × 8 = 16]

a) Suppose that a data warehouse of a match consists of the four dimensions date, spectator, location and game, and the two measures count and charge, where charge is the fare that a spectator pays when watching a game on a given date. Spectators may be students, adults or seniors, with each category having its own charge rate.

Draw a star schema diagram for the data warehouse.

b) The following table consists of training tuples from the AllElectronics customer database. The data have been generalized and are given below:

RID  age          income  student  credit rating  Class: buys-computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle-aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle-aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle-aged  medium  no       excellent      yes
13   middle-aged  high    yes      fair           yes
14   senior       medium  no       excellent      no

i) Draw a decision tree for the concept buys-computer.

[4339] - 203 2

ii) Compute the information gain of the attribute age, to find the splitting criterion for the tuples. (The class label attribute, buys-computer, has two distinct values. Let class C1 correspond to yes and class C2 correspond to no.)

c) Explain the basic data mining tasks of the predictive model.
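
The information-gain computation asked for in the training-tuple question above can be checked with a short script. This is only a minimal sketch, not a model answer: it assumes the usual entropy-based definition, Gain(age) = Info(D) - Info_age(D), and the names `ROWS`, `entropy` and `information_gain` are illustrative choices, not part of the paper.

```python
from collections import Counter
from math import log2

# (age, buys-computer) pairs for RID 1..14, read off the training table.
ROWS = [
    ("youth", "no"), ("youth", "no"), ("middle-aged", "yes"),
    ("senior", "yes"), ("senior", "yes"), ("senior", "no"),
    ("middle-aged", "yes"), ("youth", "no"), ("youth", "yes"),
    ("senior", "yes"), ("youth", "yes"), ("middle-aged", "yes"),
    ("middle-aged", "yes"), ("senior", "no"),
]


def entropy(labels):
    """Info(D): expected number of bits needed to classify a tuple."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows):
    """Info(D) minus the weighted entropy of the partitions by attribute value."""
    labels = [label for _, label in rows]
    partitions = {}
    for value, label in rows:
        partitions.setdefault(value, []).append(label)
    expected = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - expected


if __name__ == "__main__":
    print("Info(D)   =", entropy([label for _, label in ROWS]))
    print("Gain(age) =", information_gain(ROWS))
```

With 9 yes and 5 no tuples, Info(D) comes out at roughly 0.94 bits and the gain for age at roughly 0.25 bits. The same function can be reused for income, student and credit rating by swapping the first element of each pair.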

Q4) Attempt any four of the following : [ 4 × 4 = 16]

a) What are data preprocessing techniques? Explain any one.

b) Differentiate between text mining and web mining.

c) Explain the sampling algorithm with an example.

d) How is a data warehouse different from a database? How are they similar?

e) Differentiate between OLTP and OLAP.

Q5) Attempt any four of the following : [ 4 × 4 = 16]

a) Explain the major steps of decision tree classification.

b) How does the k-means clustering algorithm work?

c) Write a note on nonlinear regression.

d) Explain the following accuracy measures (any two) :

i) Bootstrap

ii) F-measure

iii) Precision

iv) Cross-validation

e) Differentiate between agglomerative and divisive clustering methods.
