The Institution of Engineers, India 2006 A.M.I.E.T.E Electronics & Communication Engineering DATA WAREHOUSING AND DATA MINING - Question Paper

Saturday, 15 June 2013 06:15

Code: T-19 Subject: DATA WAREHOUSING AND DATA MINING

Time: 3 Hours June 2006 Max. Marks: 100

NOTE: There are 9 Questions in all.

Question 1 is compulsory and carries 20 marks. The answer to Q.1 must be written in the space provided for it in the answer book supplied and nowhere else.

Out of the remaining EIGHT Questions answer any FIVE Questions. Each question carries 16 marks.

Any required data not explicitly given may be suitably assumed and stated.

Q.1 Choose the correct or best alternative in the following: (2x10)

a. An example of OLTP is

(A) printing of payslips in payroll system

(B) withdrawal of money at ATM

(C) preparation of reservation chart in railway reservation system.

(D) calculation of Income Tax.

b. Which of the following is true for OLAP?

(A) usually requires read-only access.

(B) can be easily performed on any relational database.

(C) works on detailed operational data.

(D) satisfies the needs of operational level tasks in database systems.

c. Fact-constellation schema in a data warehouse is

(A) a normalized star schema

(B) a normalized snowflake schema

(C) collection of star schemata

(D) a subject oriented star schema.

d. Data integration means

(A) combining relevant fields of the record to derive aggregates.

(B) selecting task relevant data from different data sources.

(C) combining task relevant data from different data sources.

(D) normalizing and aggregating data using a well-defined technique.

e. During decision tree induction, pre-pruning leads to

(A) construction of full grown trees.

(B) small and accurate trees.

(C) small and inaccurate trees.

(D) oversimplified and accurate trees.

f. Machine learning algorithms are not suitable for data mining because

(A) there is no machine for data mining.

(B) the algorithms are not scalable.

(C) they are not designed to mine association rules.

(D) they cannot discover hidden patterns from data.

g. ETL tools are used for

(A) populating a relational database.

(B) populating a data warehouse.

(C) mining large relational database.

(D) mining large data warehouses.

h. PCA is a technique used for

(A) Mining patterns. (B) Reducing data.

(C) Integrating data. (D) Cleaning data.

i. Whether a loan application is to be accepted or rejected can be decided by the following technique:

(A) Wavelet transform (B) Classification.

(C) Association rule. (D) Normalization.

j. Whether items X and Y should be put on sale together can be decided by the following technique

(A) decision tree (B) association rule

(C) binning (D) sampling

Answer any FIVE Questions out of EIGHT Questions.

Each question carries 16 marks.

Q.2 a. List two problems that arise with naturally evolving architectures. Explain with the help of suitable examples. (8)

b. List the results that are achieved by monitoring the data warehouse environment. (8)

Q.3 a. What are the perceived goals of executive information systems? What are the practical difficulties in the implementation of these systems? (6)

b. With the help of a diagram, show how an EIS is supported by a data warehouse. (6)

c. Explain Event mapping with the help of an example. (4)

Q.4 a. What are the problems relating to the storage and use of external data in a data warehouse? How are these problems handled? (6)

b. What do you understand by Metadata? How does it facilitate the use of external data in a data warehouse? (6)

c. Why is archiving of external data an important activity in a data warehouse? (4)

Q.5 a. Why are derived data and DSS data kept out of the corporate data model? (4)

b. Explain the difference between a migration plan and a methodology. Why do methodologies fail during implementation? (2+6)

c. Why is a feedback loop important for the success of a data warehouse implementation? (4)

Q.6 a. List and explain four differences between an operational database system and a data warehouse. (8)

b. What are the advantages of using an update-driven approach over a query-driven approach for integrating multiple heterogeneous information sources? (6)

c. What is a data cube? (2)

Q.7 Consider a data warehouse, University, which consists of the following dimensions: Student, Course, Semester and Instructor, and two measures: count and grade. Make assumptions about the key fields for the dimensions.

i. Draw a star schema diagram for the data warehouse.

ii. Using DMQL, define the cube and dimensions.

iii. Assume that semester can be rolled up to a year. Draw the snowflake schema corresponding to this.

iv. What is the OLAP operation to list the average grade of CS courses for the year?

v. Specify the SQL query corresponding to (iv) above assuming the implementation in ROLAP. (3+3+2+4+4)
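
Worked sketch for Q.7 (iv) and (v), not part of the original paper. One reasonable reading is that (iv) calls for a roll-up on the Semester dimension from semester to year, a roll-up of Student and Instructor to "all", and a slice on Course restricted to CS. The Python snippet below runs an equivalent ROLAP-style SQL query against an in-memory SQLite database. Every table name, column name and data row (university_fact, department, enrol_count and so on) is an assumption made for illustration, since the question leaves the key fields to the candidate. The Student and Instructor dimension tables are omitted because the query does not touch them.

```python
import sqlite3

# Hypothetical star-schema fragment: one fact table plus the two
# dimension tables that the query needs. All names are assumptions.
SCHEMA_AND_DATA = """
CREATE TABLE course (
    course_key INTEGER PRIMARY KEY, course_name TEXT, department TEXT);
CREATE TABLE semester (
    semester_key INTEGER PRIMARY KEY, semester TEXT, year INTEGER);
CREATE TABLE university_fact (
    student_key INTEGER, course_key INTEGER, semester_key INTEGER,
    instructor_key INTEGER, enrol_count INTEGER, grade REAL);

INSERT INTO course VALUES
    (1, 'Databases', 'CS'), (2, 'Algorithms', 'CS'), (3, 'Circuits', 'EC');
INSERT INTO semester VALUES
    (1, 'Spring', 2005), (2, 'Fall', 2005), (3, 'Spring', 2006);
INSERT INTO university_fact VALUES
    (101, 1, 1, 11, 1, 8.0),
    (102, 1, 2, 11, 1, 6.0),
    (101, 2, 3, 12, 1, 9.0),
    (103, 3, 1, 13, 1, 5.0);
"""

# Roll-up to year (GROUP BY), slice on department = 'CS' (WHERE),
# and aggregate the grade measure with AVG.
AVG_CS_GRADE_BY_YEAR = """
SELECT s.year, AVG(f.grade) AS avg_grade
FROM university_fact AS f
JOIN course   AS c ON c.course_key   = f.course_key
JOIN semester AS s ON s.semester_key = f.semester_key
WHERE c.department = 'CS'
GROUP BY s.year
ORDER BY s.year;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
rows = conn.execute(AVG_CS_GRADE_BY_YEAR).fetchall()
print(rows)  # [(2005, 7.0), (2006, 9.0)]
```

In a snowflake schema where year is split into its own table, the same query would need one more join from semester to that year table.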

Q.8 a. Explain four methods of handling missing values during data cleaning. (8)

b. What is normalization during data transformation? Differentiate between the three types of normalization. Give an example for each type. (8)
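
Illustrative sketch for Q.8 b, not part of the original paper. It assumes that the three types intended are the ones usually listed in data mining textbooks: min-max normalization, z-score normalization and normalization by decimal scaling. The function names and the sample income values are hypothetical, and the code assumes the input values are not all identical (otherwise the range and the standard deviation are zero).

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]


def z_score(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10^j, with j the smallest integer such that max|v'| < 1."""
    peak = max(abs(v) for v in values)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [200, 300, 400, 600, 1000]  # hypothetical sample data

print(min_max(incomes))          # [0.0, 0.125, 0.25, 0.5, 1.0]
print(z_score(incomes))          # mean 0, standard deviation 1
print(decimal_scaling(incomes))  # [0.02, 0.03, 0.04, 0.06, 0.1]
```

Min-max preserves the relative spacing of the values but is sensitive to outliers at either end; z-score is useful when the true minimum and maximum are unknown; decimal scaling only moves the decimal point.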

Q.9 a. Given the following transaction database D, mine for association rules with minimum support = 0.5 and minimum confidence = 0.8 (6)

Transaction database D:

Tid	Items
1	A B D
2	A C
3	A B C
4	B D E
5	D E
6	A C
7	B D
8	B D E

b. Write a Generic decision tree induction algorithm. (5)

c. Given the following decision tree for the concept buys-computer, which indicates whether or not a customer is likely to purchase a computer, extract classification rules. (5)
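
Worked sketch for Q.9 a, not part of the original paper. The Python snippet below mines the transaction database D with a small level-wise (Apriori-style) search, using the thresholds from the question: minimum support 0.5 and minimum confidence 0.8. The function names are hypothetical, and exact fractions are used so that the threshold comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

# Transaction database D from Q.9 a (Tid -> items).
D = {
    1: {"A", "B", "D"},
    2: {"A", "C"},
    3: {"A", "B", "C"},
    4: {"B", "D", "E"},
    5: {"D", "E"},
    6: {"A", "C"},
    7: {"B", "D"},
    8: {"B", "D", "E"},
}
MIN_SUPPORT = Fraction(1, 2)
MIN_CONFIDENCE = Fraction(4, 5)


def support(itemset):
    """Fraction of transactions that contain every item of itemset."""
    hits = sum(1 for items in D.values() if itemset <= items)
    return Fraction(hits, len(D))


def frequent_itemsets():
    """Level-wise search: frequent k-itemsets are built from (k-1)-itemsets."""
    items = sorted(set().union(*D.values()))
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent
                        for s in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent


def association_rules(frequent):
    """Rules lhs => rhs whose confidence meets MIN_CONFIDENCE."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= MIN_CONFIDENCE:
                    rules.append((sorted(lhs), sorted(itemset - lhs), sup, conf))
    return sorted(rules)


frequent = frequent_itemsets()
rules = association_rules(frequent)
for lhs, rhs, sup, conf in rules:
    print(lhs, "=>", rhs, "support", sup, "confidence", conf)
# ['B'] => ['D'] support 1/2 confidence 4/5
# ['D'] => ['B'] support 1/2 confidence 4/5
```

With 8 transactions, an itemset needs at least 4 occurrences. A (4), B (5) and D (5) qualify, C and E (3 each) do not, and the only frequent pair is {B, D} with 4 occurrences. Both B => D and D => B have confidence 4/5 = 0.8, which just meets the threshold.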
