Anna University Chennai 2008-1st Sem B.Tech Information Technology CS 1004 – DATA WAREHOUSING AND MINING Anna university seventh semester - Question Paper

Saturday, 02 March 2013 10:20

B.E. / B.Tech DEGREE EXAMINATION, NOVEMBER/DECEMBER 2008

Seventh Semester

Information Technology

CS 1004 – DATA WAREHOUSING AND MINING

(Regulation 2004)

Time: 3 hours Maximum: 100 marks

Answer ALL questions.

PART A – (10 x 2=20 marks)

1. What is the difference between a view and a materialized view?
2. Explain the difference between the star and snowflake schemas.
3. Mention the different tasks to be accomplished as part of data pre-processing.
4. Define Data Mining.
5. What is over fitting and what can you do to prevent it?
6. In classification trees, what are surrogate splits, and how are they used?
7. What is the objective function of the K-Means algorithm?
8. What assumption does the naïve Bayes classifier make that motivates its name?
9. What is the frequent itemset property?
10. Mention the advantages of Hierarchical clustering.
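For Question 7 above, the objective that K-Means minimizes is the within-cluster sum of squared Euclidean distances (SSE) from each point to its assigned centroid. A minimal sketch of evaluating that objective; the points, centroids, and assignment below are made-up illustration data, not part of the paper:

```python
# K-Means objective: within-cluster sum of squared errors (SSE).
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.8, 8.2)]  # hypothetical data
centroids = [(1.1, 0.9), (7.9, 8.1)]                        # k = 2
assignment = [0, 0, 1, 1]  # index of the centroid assigned to each point

def sse(points, centroids, assignment):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum((px - centroids[a][0]) ** 2 + (py - centroids[a][1]) ** 2
               for (px, py), a in zip(points, assignment))

print(sse(points, centroids, assignment))  # ≈ 0.08 for this toy assignment
```

Lloyd's algorithm alternates reassigning points to their nearest centroid and recomputing centroids as cluster means, and each step can only decrease this SSE.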

PART B – (5 x 16 = 80 marks)

11. (a) Enumerate the building blocks of a data warehouse. Discuss the
importance of metadata in a data warehouse environment. What are the
challenges in metadata management? [Marks 16]

Or

(b) (i) Distinguish between the entity-relationship modeling technique
and dimensional modeling. Why is the entity-relationship modeling
technique not suitable for the data warehouse? [Marks 8]

(ii) Create a star schema diagram that will enable FIT-WORLD GYM
INC. to analyze their revenue. The fact table will include – for every
instance of revenue taken – attribute(s) useful for analyzing
revenue. The star schema will include all dimensions that can be
useful for analyzing revenue. Formulate the query: “Find the
percentage of revenue generated by members in the last year”.
How many cuboids are there in the complete data cube? [Marks 8]
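On the last part of 11(b)(ii): for a cube with n dimension attributes and no concept hierarchies, the complete data cube materializes one cuboid per subset of the dimensions – from the apex (no dimensions) to the base cuboid (all dimensions) – giving 2^n cuboids. The value of n depends on the dimensions chosen for the star schema, which the question leaves open; the sketch below just evaluates the count:

```python
def cuboid_count(n_dimensions):
    # One cuboid per subset of the dimension attributes: 2 ** n,
    # including the apex (all) cuboid and the base cuboid.
    return 2 ** n_dimensions

print(cuboid_count(4))  # 16 cuboids for a star schema with 4 dimensions
```

With concept hierarchies of L_i levels on dimension i, the count generalizes to the product of (L_i + 1) over all dimensions.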

12. (a) Explain the five steps in the Knowledge Discovery in Databases (KDD)
process. Explain in brief the characterization of data mining algorithms.
Explain in brief important implementation problems in data mining.
[Marks 5 + 6 + 5]

Or

(b) Distinguish between statistical inference and exploratory data analysis.
Enumerate and explain different statistical techniques and methods for
data analysis. Write a short note on machine learning. What are
supervised and unsupervised learning? Write a short note on regression
and correlation. [Marks 16]

13. (a) Decision tree induction is a popular classification method. Taking one
typical decision tree induction algorithm, briefly outline the method of
decision tree classification. [Marks 16]

Or

(b) Consider the following training dataset and the original decision tree
induction algorithm (ID3). Risk is the class label attribute. The Height
values have already been discretized into disjoint ranges. Compute the
information gain if Gender is chosen as the test attribute. Compute the
information gain if Height is chosen as the test attribute. Draw the final
decision tree (without any pruning) for the training dataset. Generate all
the “IF-THEN” rules from the decision tree.

Gender Height Risk

F (1.5, 1.6) Low
M (1.9, 2.0) High
F (1.8, 1.9) Medium
F (1.8, 1.9) Medium
F (1.6, 1.7) Low
M (1.8, 1.9) Medium
F (1.5, 1.6) Low
M (1.6, 1.7) Low
M (2.0, ∞) High
M (2.0, ∞) High
F (1.7, 1.8) Medium
M (1.9, 2.0) Medium
F (1.8, 1.9) Medium
F (1.7, 1.8) Medium
F (1.7, 1.8) Medium
[Marks 16]
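The information-gain arithmetic asked for in 13(b) can be cross-checked with a short ID3-style script. This is a checking aid under one encoding assumption, not the expected hand computation: the height intervals are stored as plain strings, with the open top range written as "2.0+".

```python
from collections import Counter
from math import log2

# Training set transcribed from Question 13(b): (Gender, Height range, Risk).
data = [
    ("F", "1.5-1.6", "Low"),    ("M", "1.9-2.0", "High"),
    ("F", "1.8-1.9", "Medium"), ("F", "1.8-1.9", "Medium"),
    ("F", "1.6-1.7", "Low"),    ("M", "1.8-1.9", "Medium"),
    ("F", "1.5-1.6", "Low"),    ("M", "1.6-1.7", "Low"),
    ("M", "2.0+", "High"),      ("M", "2.0+", "High"),
    ("F", "1.7-1.8", "Medium"), ("M", "1.9-2.0", "Medium"),
    ("F", "1.8-1.9", "Medium"), ("F", "1.7-1.8", "Medium"),
    ("F", "1.7-1.8", "Medium"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """ID3 criterion: class entropy minus the expected entropy after
    partitioning the rows on the attribute at index `attr`."""
    gain = entropy([r[-1] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print(round(info_gain(data, 0), 3))  # Gain(Gender) ≈ 0.322
print(round(info_gain(data, 1), 3))  # Gain(Height) ≈ 1.323
```

Height has the larger gain, so ID3 would choose it as the root test; only the (1.9, 2.0) branch is impure and needs a further split.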
14. (a) Given the following transactional database

1 C, B, H
2 B, F, S
3 A, F, G
4 C, B, H
5 B, F, G
6 B, E, O

(i) We want to mine all the frequent itemsets in the data using the
Apriori algorithm. Assume the minimum support level is 30%. (You need to provide the sets of frequent itemsets L1, L2, … and candidate itemsets C1, C2, ….) [Marks 9]

(ii) Find all the association rules that involve only B, C, H (on either the left-
or right-hand side of the rule). The minimum confidence is 70%. [Marks 7]
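A hand-worked answer is what the question expects, but the level-wise candidate generation and the rule confidences can be verified with a compact Apriori sketch over the six transactions above (with minimum support 30%, i.e. a support count of at least 2):

```python
from itertools import combinations

# Transactions transcribed from Question 14(a); items are single letters.
transactions = [
    {"C", "B", "H"}, {"B", "F", "S"}, {"A", "F", "G"},
    {"C", "B", "H"}, {"B", "F", "G"}, {"B", "E", "O"},
]
MIN_COUNT = 2  # 30% of 6 transactions rounds up to a count of 2

def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# L1: frequent 1-itemsets.
items = sorted({x for t in transactions for x in t})
level = [frozenset({i}) for i in items
         if support_count(frozenset({i})) >= MIN_COUNT]
frequent = list(level)

k = 2
while level:
    # Join step: unions of two frequent (k-1)-itemsets that have size k.
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    # Prune step: every (k-1)-subset must be frequent; then count support.
    level = [c for c in candidates
             if all(frozenset(s) in frequent for s in combinations(c, k - 1))
             and support_count(c) >= MIN_COUNT]
    frequent += level
    k += 1

def confidence(lhs, rhs):
    """conf(lhs -> rhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support_count(lhs | rhs) / support_count(lhs)

print(len(frequent))                                # 11 frequent itemsets
print(confidence(frozenset("C"), frozenset("BH")))  # C -> B,H : 1.0
print(confidence(frozenset("B"), frozenset("C")))   # B -> C   : 0.4
```

On this data the frequent levels come out as L1 = {B, C, F, G, H}, L2 = {BC, BF, BH, CH, FG}, L3 = {BCH}, so for part (ii) the rules over {B, C, H} with confidence ≥ 70% are exactly those whose left-hand side contains C or H.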

Or

(b) Define the multi-dimensional association rule, giving a suitable example. [Marks 16]

15. (a) BIRCH and CLARANS are two interesting clustering algorithms that perform effective clustering in large data sets.
(i) Outline how BIRCH performs clustering in large data sets. [Marks 10]
(ii) Compare and outline the major differences between the two scalable clustering algorithms: BIRCH and CLARANS. [Marks 6]

Or

(b) Write a short note on the web mining taxonomy. Discuss the various activities of text mining. What are the current trends in data mining? [Marks 6 + 5 + 5]


