DOEACC Society 2006 DOEACC C Level CE1 - Advanced Computer Architecture - Question Paper

Friday, 14 June 2013 04:30

these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss
rate is 2%, how much faster would the computer be if all instructions were cache hits?
b) A pipeline P is found to provide a speedup of 6.16 when operating at 100 MHz with an
efficiency of 88 percent.
i) How many stages does P have?
ii) What are P’s MIPS and CPI performance levels?
c) Let’s use an in-order execution computer for the first example, such as the UltraSPARC
III. Assume the cache miss penalty is 100 clock cycles, and all instructions normally take
1.0 clock cycles (ignoring memory stalls). Assume the average miss rate is 2%, there is
an average of 1.5 memory references per instruction, and the average number of cache
misses per 1000 instructions is 30. What is the impact on performance when the behavior of
the cache is included? Calculate the impact using both misses per instruction and miss
rate.
(4+8+6)
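
The arithmetic behind parts (b) and (c) can be sketched as follows. This is a minimal sketch, not a model answer: the function names are hypothetical, part (b) assumes the usual definitions efficiency = speedup / stages and throughput = efficiency × clock rate (one instruction per cycle at full efficiency), and part (c) assumes memory stalls simply add to the base CPI.

```python
# Sketch for question 4 (b) and (c). All function names are illustrative.

def pipeline_stages(speedup, efficiency):
    """Number of stages k, assuming efficiency = speedup / k."""
    return speedup / efficiency

def pipeline_mips(efficiency, clock_mhz):
    """Throughput in MIPS, assuming throughput = efficiency * clock rate."""
    return efficiency * clock_mhz

def pipeline_cpi(efficiency):
    """Average cycles per instruction, assuming CPI = 1 / efficiency."""
    return 1.0 / efficiency

def cpi_from_misses_per_instruction(base_cpi, misses_per_1000_instr, miss_penalty):
    """CPI including stalls, from misses per 1000 instructions."""
    return base_cpi + (misses_per_1000_instr / 1000) * miss_penalty

def cpi_from_miss_rate(base_cpi, refs_per_instr, miss_rate, miss_penalty):
    """CPI including stalls, from miss rate and references per instruction."""
    return base_cpi + refs_per_instr * miss_rate * miss_penalty

if __name__ == "__main__":
    print("stages:", round(pipeline_stages(6.16, 0.88)))
    print("MIPS:", pipeline_mips(0.88, 100))
    print("CPI:", round(pipeline_cpi(0.88), 2))
    print("CPI via misses/instr:", cpi_from_misses_per_instruction(1.0, 30, 100))
    print("CPI via miss rate:", cpi_from_miss_rate(1.0, 1.5, 0.02, 100))
```

Under these assumptions the pipeline has 7 stages, delivers 88 MIPS at a CPI of about 1.14, and in part (c) both methods give a CPI of 4.0, so including cache behavior makes the computer four times slower than the 1.0-CPI ideal.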
5.
a) What is a nonblocking cache? Explain its advantages.
b) Discuss the compiler-controlled prefetching technique.
c) What is a virtual cache? Discuss why virtual caches are not popular.
d) What can interleaving and wide memory buy? Consider the following description of a
computer and its cache performance:
CE1-R3 Page two of three January, 2006
Block size = one word
Memory bus width = one word
Miss rate = 3%
Memory accesses per instruction = 1.2
Cache miss penalty = 64 cycles (as above)
Average cycles per instruction (ignoring cache misses) = 2
If we change the block size to two words, the miss rate falls to 2%, and a 4-word block has
a miss rate of 1.2%. What is the improvement in performance of interleaving 2 ways
versus doubling the width of memory and the bus?
(4+4+5+5)
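
Part (d) can be explored with a small calculation. This is only a sketch under a loud assumption: the question says the 64-cycle penalty is "as above" without giving the breakdown, so the code assumes the common textbook split of 4 cycles to send the address, 56 cycles per memory access, and 4 cycles to transfer one word. The function names are hypothetical.

```python
import math

# ASSUMED breakdown of the 64-cycle miss penalty (not stated in the question).
ADDR_CYCLES, ACCESS_CYCLES, TRANSFER_CYCLES = 4, 56, 4

def effective_cpi(base_cpi, refs_per_instr, miss_rate, miss_penalty):
    """Base CPI plus memory stall cycles per instruction."""
    return base_cpi + refs_per_instr * miss_rate * miss_penalty

def penalty_wide(block_words, width_words):
    """Miss penalty when memory and bus are width_words wide, no interleaving."""
    accesses = math.ceil(block_words / width_words)
    return accesses * (ADDR_CYCLES + ACCESS_CYCLES + TRANSFER_CYCLES)

def penalty_interleaved(block_words, banks):
    """Miss penalty with a one-word bus and `banks` banks accessed in parallel."""
    rounds = math.ceil(block_words / banks)
    return rounds * (ADDR_CYCLES + ACCESS_CYCLES + banks * TRANSFER_CYCLES)

if __name__ == "__main__":
    base = effective_cpi(2, 1.2, 0.03, penalty_wide(1, 1))
    inter = effective_cpi(2, 1.2, 0.02, penalty_interleaved(2, 2))
    wide = effective_cpi(2, 1.2, 0.02, penalty_wide(2, 2))
    print("one-word block CPI:", round(base, 3))
    print("two-word block, 2-way interleaved CPI:", round(inter, 3))
    print("two-word block, double-width CPI:", round(wide, 3))
    print("wide vs interleaved:", round(inter / wide, 3))
```

With the assumed timing, the two-word block gives a CPI of about 3.63 with two-way interleaving and about 3.54 with doubled width, against about 4.30 for the original one-word organization, so the two options differ by under 3%.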
6.
a) Explain Flynn’s classification of processors.
b) Compare shared memory multiprocessor architecture and distributed memory
architecture.
c) Suppose you want to achieve a speedup of 80 with 100 processors. What fraction of the
original calculation can be sequential?
d) Suppose we have an application running on a 32-bit multiprocessor, which has a 400 ns
time to handle a reference to a remote memory. For this application, assume that all the
references other than those involving communication hit in the local memory hierarchy,
which is slightly optimistic. Processors are stalled on a remote request, and the
processor clock rate is 1 GHz. If the base IPC (assuming that all references hit in the
cache) is 2, how much faster is the multiprocessor if there is no communication versus if
0.2% of the instructions involve a remote communication reference?
(3+4+6+5)
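
Parts (c) and (d) reduce to short formulas, sketched below. The sketch assumes Amdahl's Law in the form speedup = 1 / (s + (1 - s) / N) for sequential fraction s on N processors, and for part (d) assumes each remote reference stalls the processor for the full 400 ns. The function names are hypothetical.

```python
# Sketch for question 6 (c) and (d). All function names are illustrative.

def max_sequential_fraction(target_speedup, processors):
    """Solve Amdahl's Law, S = 1 / (s + (1 - s) / N), for s."""
    return (processors / target_speedup - 1) / (processors - 1)

def cpi_with_remote_refs(base_ipc, remote_fraction, remote_ns, clock_ghz):
    """Base CPI plus stall cycles from remote references."""
    remote_cycles = remote_ns * clock_ghz  # ns * cycles-per-ns
    return 1.0 / base_ipc + remote_fraction * remote_cycles

if __name__ == "__main__":
    s = max_sequential_fraction(80, 100)
    print("sequential fraction: {:.4%}".format(s))
    cpi_comm = cpi_with_remote_refs(2, 0.002, 400, 1.0)
    cpi_ideal = 1.0 / 2
    print("CPI with communication:", round(cpi_comm, 3))
    print("slowdown:", round(cpi_comm / cpi_ideal, 2))
```

Under these assumptions the sequential fraction can be at most about 0.25% of the original calculation, and the multiprocessor with no communication is about 2.6 times faster (CPI 0.5 versus 1.3).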
7.
a) Compare shared and switched interconnection media.
b) Define the following terms associated with multiprocessor operating systems
and MIMD algorithms:
i) Protection mechanisms
ii) Scheduling
iii) Degree of decomposition of a parallel algorithm.
c) The CM-5 supercomputer used wormhole routing, with every switch buffer being just 4
bits per port. Compare the efficiency of store-and-forward versus wormhole routing for a
128-node machine using a CM-5 interconnection sending a 16-byte payload. Assume every
switch takes 0.25 µs and that the transfer rate is 20 MB/sec.
(4+[3×3]+5)
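
The latency comparison in part (c) can be sketched numerically. Two values below are assumptions that the question does not state: that a message in the 128-node network crosses seven switches, and that each packet carries a 4-byte header on top of the 16-byte payload. The function names are hypothetical, and MB is taken as 10^6 bytes.

```python
# Sketch for question 7 (c). Switch count and header size are ASSUMPTIONS.
ASSUMED_SWITCHES = 7       # hops through the 128-node network (assumed)
ASSUMED_HEADER_BYTES = 4   # per-packet header (assumed)

def transfer_time_us(packet_bytes, rate_mb_per_s):
    """Time to push one packet over one link; bytes / (MB/s) gives microseconds."""
    return packet_bytes / rate_mb_per_s

def store_and_forward_us(switches, switch_delay_us, packet_bytes, rate_mb_per_s):
    """The whole packet is received at every switch before being forwarded."""
    t = transfer_time_us(packet_bytes, rate_mb_per_s)
    return switches * switch_delay_us + (switches + 1) * t

def wormhole_us(switches, switch_delay_us, packet_bytes, rate_mb_per_s):
    """The packet is pipelined through the switches, so it is transferred once."""
    t = transfer_time_us(packet_bytes, rate_mb_per_s)
    return switches * switch_delay_us + t

if __name__ == "__main__":
    size = 16 + ASSUMED_HEADER_BYTES
    sf = store_and_forward_us(ASSUMED_SWITCHES, 0.25, size, 20)
    wh = wormhole_us(ASSUMED_SWITCHES, 0.25, size, 20)
    print("store-and-forward (µs):", sf)
    print("wormhole (µs):", wh)
    print("ratio:", round(sf / wh, 2))
```

With those assumed values, store-and-forward takes 9.75 µs and wormhole routing 2.75 µs, so wormhole routing cuts latency by more than a factor of three.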