Jawaharlal Nehru Technological University Kakinada, 2009, IV B.Tech 1st Semester Regular Examinations, DATA WAREHOUSING AND DATA MINING (Computer Science & Engineering)

Friday, 09 August 2013 10:15

Code No: M0502

Set No. 1

IV B.Tech I Semester Regular Examinations, November 2009

DATA WAREHOUSING AND DATA MINING

(Computer Science & Engineering)

Time: 3 hours

Max Marks: 80

Answer any FIVE Questions

All Questions carry equal marks

1. (a) Explain data mining as a step in the process of knowledge discovery.

(b) Differentiate between operational database systems and data warehousing. [8+8]

2. (a) Briefly discuss the forms of data preprocessing with a neat diagram.

(b) Briefly discuss the parametric and non-parametric methods of numerosity reduction. [8+8]

3. Explain the syntax for the following data mining primitives:

(a) Task-relevant data

(b) The kind of knowledge to be mined

(c) Interestingness measures

(d) Presentation and visualization of discovered patterns [16]

4. (a) How can we perform attribute relevance analysis for concept description? Explain.

(b) Briefly explain the presentation of class comparison descriptions. [8+8]

5. Compare and contrast the mining of single-dimensional Boolean association rules and multilevel association rules from transactional databases. [16]

6. (a) Why is naive Bayesian classification called naive? Briefly outline the major ideas of naive Bayesian classification.

(b) Define regression. Briefly explain linear, non-linear and multiple regression. [8+8]

7. (a) Use a diagram to illustrate how, for a constant MinPts value, density-based clusters with respect to a higher density (i.e., a lower value for ε, the neighborhood radius) are completely contained in density-connected sets obtained with respect to a lower density.

(b) Give an example of how specific clustering methods may be integrated, for example, where one clustering algorithm is used as a preprocessing step for another. [8+8]

8. (a) Explain similarity search in multimedia data.

(b) Explain similarity search in time-series analysis.

(c) What is meant by authoritative Web pages? Explain mining the Web's link structures to identify authoritative Web pages. [5+6+5]

Set No. 2

1. (a) Explain the efficient computation of data cubes.

(b) Discuss the efficient processing of OLAP queries. [8+8]

2. Briefly discuss the discretization and concept hierarchy generation techniques. [16]

3. Explain the syntax for the following data mining primitives:

(a) Task-relevant data

(b) The kind of knowledge to be mined

(c) Interestingness measures

(d) Presentation and visualization of discovered patterns. [16]

4. (a) Differentiate between attribute generalization threshold control and generalized relation threshold control.

(b) Differentiate between predictive and descriptive data mining. [8+8]

5. Propose a method for mining hybrid-dimension association rules (multidimensional association rules with repeating predicates) and explain it with an example. [16]

6. (a) Why is naive Bayesian classification called naive? Briefly outline the major ideas of naive Bayesian classification.

(b) Define regression. Briefly explain linear, non-linear and multiple regression. [8+8]

7. The following table contains the attributes name, gender, trait-1, trait-2, trait-3, and trait-4, where name is an object-id, gender is a symmetric attribute, and the remaining trait attributes are asymmetric, describing personal traits of individuals who desire a penpal. Suppose that a service exists that attempts to find pairs of compatible penpals.

Name      gender  trait-1  trait-2  trait-3  trait-4
Kevan     M       N        P        P        N
Caroline  F       N        P        P        N
Erik      M       P        N        N        P
...       ...     ...      ...      ...      ...

For asymmetric attribute values, let the value P be set to 1 and the value N be set to 0. Suppose that the distance between objects (potential penpals) is computed based only on the asymmetric variables.

(a) Show the contingency matrix for each pair given Kevan, Caroline, and Erik.

(b) Compute the simple matching coefficient for each pair.

(c) Compute the Jaccard coefficient for each pair.

(d) Who do you suggest would make the best pair of penpals? Which pair of individuals would be the least compatible? [4+4+4+4]
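A minimal Python sketch of how parts (a) to (c) of the penpal question might be checked numerically. It assumes the similarity forms of both coefficients, with q = both P, r = P/N, s = N/P and t = both N: simple matching = (q + t)/(q + r + s + t) and Jaccard = q/(q + r + s). Some textbooks define both as distances instead (one minus these values). The helper names are illustrative and not part of the question paper.

```python
from itertools import combinations

# Asymmetric trait values (trait-1..trait-4) taken from the table; P -> 1, N -> 0.
people = {
    "Kevan": "NPPN",
    "Caroline": "NPPN",
    "Erik": "PNNP",
}


def to_bits(traits):
    """Map a string of P/N values to a list of 1/0 values."""
    return [1 if c == "P" else 0 for c in traits]


def contingency(a, b):
    """Return (q, r, s, t): counts of 1/1, 1/0, 0/1 and 0/0 positions."""
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    t = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    return q, r, s, t


def simple_matching(a, b):
    """Simple matching similarity: matches of either kind over all positions."""
    q, r, s, t = contingency(a, b)
    return (q + t) / (q + r + s + t)


def jaccard(a, b):
    """Jaccard similarity: 0/0 matches are ignored.

    Assumption: returns 0.0 when neither object has any positive value,
    a case the coefficient leaves undefined.
    """
    q, r, s, _ = contingency(a, b)
    denom = q + r + s
    return q / denom if denom else 0.0


for (name_a, traits_a), (name_b, traits_b) in combinations(people.items(), 2):
    a, b = to_bits(traits_a), to_bits(traits_b)
    print(name_a, name_b, contingency(a, b), simple_matching(a, b), jaccard(a, b))
```

Under these assumptions, higher values indicate more compatible pairs; if the distance forms are wanted instead, subtract each value from 1.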

8. (a) What is a multimedia database? Explain mining multimedia databases.

(b) What is a time-series database? What is a sequence database? Explain mining time-series and sequence data. [8+8]

Set No. 3

1. (a) Explain data mining as a step in the process of knowledge discovery.

(b) Differentiate between operational database systems and data warehousing. [8+8]

2. Explain various data reduction techniques. [16]

3. The four major types of concept hierarchies are: schema hierarchies, set-grouping hierarchies, operation-derived hierarchies, and rule-based hierarchies.

(a) Briefly define each type of hierarchy.

(b) For each hierarchy type, provide an example. [16]

4. (a) Differentiate between attribute generalization threshold control and generalized relation threshold control.

(b) Differentiate between predictive and descriptive data mining. [8+8]

5. (a) Explain constraint-based association mining.

(b) Give an example of association rule mining. Classify association rules. [8+8]

6. (a) Given a decision tree, you have the option of (i) converting the decision tree to rules and then pruning the resulting rules, or (ii) pruning the decision tree and then converting the pruned tree to rules. What advantages does the former option have over the latter? Explain.

(b) Can any ideas from association rule mining be applied to classification? Explain. [8+8]

7. Explain the following:

(a) DBSCAN

(b) OPTICS

(c) DENCLUE

(d) BIRCH. [4+4+4+4]

8. (a) What is a spatial data warehouse? What are the different types of dimensions in a spatial data cube? What are the different types of measures in a spatial data cube?

(b) What is keyword-based association analysis? How can automated document classification be performed?

(c) Briefly discuss mining the World Wide Web. [2+2+2+2+2+6]

Set No. 4

1. (a) Describe three challenges to data mining regarding data mining methodology

and user interaction issues.

(b) Explain indexing of OLAP data. [8+8]

2. Explain various data reduction techniques. [16]

3. (a) Discuss the various forms of visualizing the discovered patterns.

(b) Discuss the task-relevant data specification. [8+8]

4. Suppose that the data for analysis include the attribute age. The age values for the data tuples are (in increasing order):

13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.

(a) What is the mean of the data?

(b) What is the median?

(c) What is the mode of the data? Comment on the data's modality.

(d) What is the midrange of the data?

(e) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?

(f) Give the five-number summary of the data.

(g) Show a box plot of the data.

(h) How is the quantile-quantile plot different from a quantile plot? [16]
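A short Python sketch of how the numeric parts (a) to (f) of the age question could be checked with the standard library `statistics` module. The quartile values depend on the interpolation convention; this sketch assumes the module's default `method="exclusive"`, so other conventions may give slightly different Q1 and Q3. The variable names are illustrative, and the box plot in part (g) is not drawn here.

```python
import statistics

# Age values copied from the question (27 values, already sorted).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)            # (a) arithmetic mean
median = statistics.median(ages)        # (b) middle value of the sorted data
modes = statistics.multimode(ages)      # (c) all most-frequent values
midrange = (min(ages) + max(ages)) / 2  # (d) average of the extremes

# (e) Quartiles: n=4 yields the three cut points Q1, Q2, Q3.
# Assumption: default method="exclusive"; other methods can differ.
q1, _, q3 = statistics.quantiles(ages, n=4)

# (f) Five-number summary: minimum, Q1, median, Q3, maximum.
five_number = (min(ages), q1, median, q3, max(ages))

print("mean:", round(mean, 2))
print("median:", median)
print("modes:", modes)
print("midrange:", midrange)
print("Q1, Q3:", q1, q3)
print("five-number summary:", five_number)
```

More than one value in `modes` indicates that the data are multimodal, which is what the modality comment in part (c) is asking about.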

5. Sequential patterns can be mined with methods similar to those used for mining association rules. Design an efficient algorithm to mine multilevel sequential patterns from a transaction database. An example of such a pattern is the following: "A customer who buys a PC will buy Microsoft software within three months," on which one may drill down to find a more refined version of the pattern, such as "A customer who buys a Pentium PC will buy Microsoft Office within three months." [16]

6. (a) What is classification? What is prediction? Describe issues regarding classification and prediction.

(b) Explain Bayesian belief networks. How is a Bayesian belief network trained? [8+8]

7. (a) Write algorithms for k-Means and k-Medoids. Explain.

(b) Discuss density-based methods. [8+8]

8. (a) Explain the classification and prediction analysis of multimedia data.

(b) What are the basic measures for text retrieval? What methods are there for information retrieval?

(c) What is meant by authoritative Web pages? Explain mining the Web's link structures to identify authoritative Web pages. [4+6+6]
