
Birla Institute of Technology (BIT Mesra) 2006 DATA MINING - Question Paper

Saturday, 19 January 2013 10:35

Birla Institute of Technology & Science, Pilani

Distance Learning Programmes Division

Second Semester 2006-2007

Mid-Semester Test

(EC-1 Regular)

Course No. : IS ZC415

Course Title : DATA MINING

No. of Pages = 1

No. of Questions = 5
Nature of examination : Closed Book

Weightage : 40%

Duration : 2 Hours

Date of examination : 03/02/2007 (AN)

Note:

1. Please follow all the Instructions to Candidates provided on the cover page of the answer book.

2. All parts of a question should be answered consecutively. Every answer should begin on a fresh page.

3. Mobile phones and computers of any type should not be brought inside the exam hall.

4. Use of any unfair means will result in severe disciplinary action.

Q.1. Explain why concept hierarchies are important in data mining. (4)

Q.2. Give one innovative example where the study of outliers is useful. Also propose a few methods to detect outliers. (4)

Q.3. Suppose a group of 12 sales price records has been sorted as follows:

5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215

(a) Partition them into 3 bins using equal width partitioning.

(b) Use smoothing by bin means to smooth the data, using a bin of depth 3.

(c) Perform z-score normalization.

(d) Sketch an example of stratified sampling: use a sample of size six and the strata “low”, “medium”, and “high”. (6)
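The four parts of Q.3 can be sketched in Python as follows. This is a minimal illustration, not a model answer: the function names are hypothetical, the z-score uses the population standard deviation (the paper does not say which one is intended), and the strata are assumed to be the lowest, middle and highest four prices, with two records drawn from each.

```python
import random
from statistics import pstdev

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

def equal_width_bins(values, k):
    """Split values into k bins whose intervals all have the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins

def smooth_by_bin_means(values, depth):
    """Equal-depth bins of `depth` values; each value becomes its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        chunk = ordered[start:start + depth]
        smoothed.extend([sum(chunk) / len(chunk)] * len(chunk))
    return smoothed

def z_scores(values):
    """z = (v - mean) / std, with the population std (an assumption)."""
    mu = sum(values) / len(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def stratified_sample(strata, per_stratum, seed=0):
    """Draw the same number of records, without replacement, per stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

bins = equal_width_bins(prices, 3)        # (a) bin width is (215 - 5) / 3 = 70
smoothed = smooth_by_bin_means(prices, 3) # (b)
z = z_scores(prices)                      # (c)
# (d) assumed strata: four lowest, four middle, four highest prices
strata = {"low": prices[0:4], "medium": prices[4:8], "high": prices[8:12]}
sample = stratified_sample(strata, 2)     # 2 per stratum gives a sample of 6
```

With a bin width of 70, nine of the twelve prices fall into the first bin, which shows how sensitive equal-width partitioning is to the two large values 204 and 215.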

Q.4. A data warehouse consists of the four dimensions date, spectator, location and game, and a measure charge. Charge is the fare that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate. Draw a star schema diagram for the data warehouse. Show the attributes of the fact and dimension tables and mark the primary and foreign keys. Discuss how we can perform association rule mining. (6)

Q.5. A database has five transactions. Let min_sup = 60% and min_conf = 80%.

TID     Items Bought
T100    M, O, N, K, E, Y
T200    D, O, N, K, E, Y
T300    M, A, K, E
T400    M, U, C, K, Y
T500    C, O, K, I, E

(a) Use Apriori to obtain the frequent itemsets of the first four transactions. Mark non-frequent and pruned itemsets by NF and P respectively.

(b) Use the Sampling algorithm to obtain all frequent itemsets of the whole dataset. The solution of (a) can be taken as the set of potential itemsets.

(c) Obtain all strong association rules.

(d) Use FP Growth to obtain the frequent itemsets. (5 + 4 + 3 + 8 = 20)
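The Apriori and rule-generation steps asked for in Q.5 can be sketched as below. This is a hedged illustration rather than a model answer: it runs Apriori on all five transactions (not only the first four required in part (a)), does not implement the Sampling algorithm or FP Growth, reads transaction T400 as M, U, C, K, Y, and uses hypothetical function names. Thresholds are compared with integer arithmetic to avoid floating-point edge cases at exactly 60% and 80%.

```python
from itertools import combinations

# Transactions from the table above (T400 read as M, U, C, K, Y: an assumption).
db = [
    set("MONKEY"),
    set("DONKEY"),
    set("MAKE"),
    set("MUCKY"),
    set("COKIE"),
]
MIN_SUP_PCT = 60
MIN_CONF_PCT = 80

def apriori(transactions, min_sup_pct):
    """Return {frozenset: support count} for every frequent itemset."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting the minimum support.
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count * 100 >= min_sup_pct * n:
                level[c] = count
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))]
    return frequent

def strong_rules(frequent, min_conf_pct):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence = support(itemset) / support(antecedent)
                if count * 100 >= min_conf_pct * frequent[lhs]:
                    rules.append((lhs, itemset - lhs, count / frequent[lhs]))
    return rules

frequent = apriori(db, MIN_SUP_PCT)
rules = strong_rules(frequent, MIN_CONF_PCT)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"confidence={conf:.2f}")
```

The lookup `frequent[lhs]` is safe because every subset of a frequent itemset is itself frequent, which is the same property the prune step relies on.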

********
