Anna University Trichy 2009 B.Tech Information Technology Data Warehousing & Mining - Question Paper

Branch: Dept. of Information Technology    Date: 16.09.09
Year/Sem: IV / VII Sem    Total Marks: 50
Sub. Code / Name: CS1004 – Data Warehousing & Mining    Time: 1½ Hrs.

Part A    7 × 2 = 14

1. Explain the difference between the Star and Snowflake schemas.
2. What are the tasks to be accomplished as part of data preprocessing?
3. What is the frequent itemset (Apriori) property?
4. What is overfitting, and how can it be prevented?
5. What do you mean by relevance analysis?
6. What is a cuboid, and how is it used in data mining?
7. How does OLAP help in concept characterization?
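For Question 6, the following Python sketch illustrates one way to view cuboids: for n dimensions, a data cube is a lattice of 2^n cuboids, each a group-by aggregation over one subset of the dimensions, from the 0-D apex cuboid (the grand total) down to the n-D base cuboid. The dimensions `time`, `item` and `location`, the fact rows, and the function names `cuboid` and `lattice` are hypothetical illustrations, not part of the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (time, item, location, sales_amount)
FACTS = [
    ("Q1", "TV", "Chennai", 400),
    ("Q1", "PC", "Trichy", 250),
    ("Q2", "TV", "Trichy", 300),
    ("Q2", "TV", "Chennai", 150),
]
DIMENSIONS = ("time", "item", "location")


def cuboid(dims):
    """Aggregate sales over one subset of dimensions (a single cuboid)."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in FACTS:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


def lattice():
    """Compute every cuboid, from the 0-D apex to the n-D base cuboid."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, k):
            cube[dims] = cuboid(dims)
    return cube


if __name__ == "__main__":
    cube = lattice()
    print(len(cube))
    print(cube[()])
    print(cube[("item",)])
```

With three dimensions this yields 2^3 = 8 cuboids. A real OLAP engine would typically materialize only selected cuboids rather than computing the full lattice on demand.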

Part B    3 × 12 = 36
1. A database has 5 transactions. Let min_sup = 60% and min_conf = 80%.

TID Items_bought
T100 {M,O,N,K,E,Y}
T200 {D,O,N,K,E,Y}
T300 {M,A,K,E}
T400 {M,U,C,K,Y}
T500 {C,O,O,K,I,E}

(a) Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the efficiency of the two mining processes.
(b) List all of the strong association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers, and item_i denotes variables representing items (e.g., "A", "B", etc.):

$$\forall X \in \text{transaction},\; buys(X, item_1) \wedge buys(X, item_2) \Rightarrow buys(X, item_3) \quad [s, c]$$
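A minimal Python sketch of the Apriori part of Question 1, assuming each transaction is treated as a set (so the repeated O in T500 counts once) and that min_sup = 60% of 5 transactions means a support count of 3. It uses a simplified join step (the union of two frequent (k-1)-itemsets) followed by the usual Apriori pruning step, then enumerates rules of the metarule's two-antecedent, one-consequent form from the frequent 3-itemsets. The function names are hypothetical, and the sketch does not implement FP-growth or the efficiency comparison asked for in part (a).

```python
from fractions import Fraction
from itertools import combinations

# Transactions from Question 1; sets, so T500's repeated "O" counts once.
TRANSACTIONS = [
    {"M", "O", "N", "K", "E", "Y"},  # T100
    {"D", "O", "N", "K", "E", "Y"},  # T200
    {"M", "A", "K", "E"},            # T300
    {"M", "U", "C", "K", "Y"},       # T400
    {"C", "O", "K", "I", "E"},       # T500
]
MIN_SUP = Fraction(60, 100)   # exact fractions avoid float rounding at thresholds
MIN_CONF = Fraction(80, 100)


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_sup):
    """Level-wise search; returns {frozenset: support_count} for all frequent itemsets."""
    min_count = min_sup * len(transactions)
    items = sorted(set().union(*transactions))
    # L1: frequent 1-itemsets
    current = {}
    for item in items:
        cnt = support_count(frozenset([item]), transactions)
        if cnt >= min_count:
            current[frozenset([item])] = cnt
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Simplified join: unions of two frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            cnt = support_count(c, transactions)
            if cnt >= min_count:
                current[c] = cnt
        frequent.update(current)
        k += 1
    return frequent


def metarule_rules(frequent, n, min_conf):
    """Rules item1 AND item2 => item3, built from frequent 3-itemsets."""
    rules = []
    for itemset, cnt in frequent.items():
        if len(itemset) != 3:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            conf = Fraction(cnt, frequent[antecedent])
            if conf >= min_conf:
                rules.append((tuple(sorted(antecedent)), consequent,
                              Fraction(cnt, n), conf))
    return sorted(rules)


if __name__ == "__main__":
    freq = apriori(TRANSACTIONS, MIN_SUP)
    for itemset, cnt in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), cnt)
    for (i1, i2), i3, s, c in metarule_rules(freq, len(TRANSACTIONS), MIN_CONF):
        print(f"buys(X, {i1}) ^ buys(X, {i2}) => buys(X, {i3}) "
              f"[s={float(s):.0%}, c={float(c):.0%}]")
```

On these transactions the sketch finds 11 frequent itemsets, the largest being {E, K, O}, and two rules that meet min_conf: E ∧ O ⇒ K and K ∧ O ⇒ E, each with 60% support and 100% confidence (E ∧ K ⇒ O reaches only 75% confidence).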

2. Construct the frequent-pattern tree (FP-tree) for the following transactions using FP-growth.

TID List_of_item_IDs
T100 I1,I2,I5
T200 I2,I4
T300 I2,I3
T400 I1,I2,I4
T500 I1,I3
T600 I2,I3
T700 I1,I3
T800 I1,I2,I3,I5
T900 I1,I2,I3
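A minimal Python sketch of FP-tree construction for Question 2. The question does not state a minimum support, so the sketch assumes a minimum support count of 2 (a hypothetical choice for illustration); ties in the support-descending item order are broken by item name, which is also an assumption. The class `FPNode` and the functions `build_fp_tree` and `show` are hypothetical names.

```python
from collections import defaultdict

# Transactions T100..T900 from Question 2
TRANSACTIONS = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
MIN_COUNT = 2  # assumption: the question gives no minimum support


class FPNode:
    """One FP-tree node: an item, its count, a parent link and child nodes."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Two scans: count items, then insert each transaction in L order."""
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    support = {i: c for i, c in counts.items() if c >= min_count}
    # L: descending support; ties broken by item name (an assumption)
    order = sorted(support, key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> its nodes (the node-links)
    for t in transactions:
        node = root
        # Keep only frequent items, sorted into L order, then walk/extend the path
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, order


def show(node, depth=0):
    """Print the tree, one indented 'item:count' line per node."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


if __name__ == "__main__":
    root, header, order = build_fp_tree(TRANSACTIONS, MIN_COUNT)
    print("L =", order)
    show(root)
```

Under these assumptions the item order is I2, I1, I3, I4, I5 and the root has two branches, I2:7 and I1:2. Mining the tree (building conditional pattern bases by following the header-table node-links) would be the next step of FP-growth and is not shown.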

3. Calculate the expected information gain for the following class-labeled training tuples.

RID  age          income  student  credit_rating  Class: buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle_aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle_aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle_aged  medium  no       excellent      yes
13   middle_aged  high    yes      fair           yes
14   senior       medium  no       excellent      no
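For Question 3, a Python sketch of the ID3-style information-gain computation, where Info(D) = -Σ p_i log2(p_i) and Gain(A) = Info(D) - Info_A(D), with Info_A(D) the size-weighted entropy of the partitions that attribute A induces on D. The data is copied from the table above; the function names `info` and `gain` are hypothetical.

```python
from collections import Counter
from math import log2

ATTRS = ("age", "income", "student", "credit_rating")
# (age, income, student, credit_rating, buys_computer) for RID 1..14
DATA = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def info(labels):
    """Expected information (entropy) in bits: -sum(p_i * log2(p_i))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain(data, attr_index):
    """Gain(A) = Info(D) - Info_A(D), Info_A being the weighted partition entropy."""
    labels = [row[-1] for row in data]
    partitions = {}
    for row in data:
        partitions.setdefault(row[attr_index], []).append(row[-1])
    info_a = sum(len(p) / len(data) * info(p) for p in partitions.values())
    return info(labels) - info_a


if __name__ == "__main__":
    print(f"Info(D) = {info([row[-1] for row in DATA]):.3f} bits")
    for i, attr in enumerate(ATTRS):
        print(f"Gain({attr}) = {gain(DATA, i):.3f} bits")
```

For this table Info(D) ≈ 0.940 bits, and age has the highest gain (≈ 0.247 bits), so a decision-tree inducer using this measure would split on age at the root.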

