Computer Architecture 22200/32200 Homework #3
Due May 4


Turnin the cache simulator by using the instructions given here. The simulator is due BEFORE class on May 4th. Turnin the other parts of the homework either in email to Xuehai or in class on May 4th.
  1. (32200 only, 20 points) Readings In Computer Architecture
    Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers by N. P. Jouppi

  2. (22200/32200, 70 points) Questions from H&P book
    5.4a-e (50 points), 5.18 (20 points)
    Use the following assumptions for 5.4a-e:

  3. (22200/32200, 40 points) Write a loop that induces memory accesses such that the L2 cache only incurs compulsory misses, but every memory access causes a TLB compulsory or capacity miss. Assume an 128-entry fully associative TLB, 4KB pages, and a 256KB 4-way L2 cache with a 64-byte cache line. Briefly explain why the loop results in many more TLB misses than L2 cache misses even though the TLB implicitly caches more data.

  4. (22200/32200, 200 points) Write a simulator for an inclusive and an exclusive 2-level cache and run experiments. We provide the skeleton code and memory access traces (both courtesy of Tim Sherwood from UC, Santa Barbara) for experiments. You must implement the interface provided in the skeleton code.
    Skeleton Code
    • Download the skeleton code ( http://people.cs.uchicago.edu/~hai/courses/cs322/hw3/cachesim.tar.gz).
    • Run "make" to create the executable "cachesimulator".
    • Already you can run the simulator by passing it a zipped trace, "cachesimulator -f crafty_300k.cache.gz". The default implementation always indicates a miss.
    • You MUST use the same driver code to turn in your simulator. Edit the CacheSim files to implement the simulator. You can add any functionality you need, but make sure you implement the interface provided: the CacheSim constructor, destructor, and access method.
    • Although you must use the provided driver to turn in the code, you can, of course, write your own driver to test the simulator.

    Traces
    The trace files are in the world readable directory /home/mstrout/traces. Please do not copy the trace files so that we conserve disk space. Each gzip file can be passed as a command line parameter to the cachesimulator executable that is created by the skeleton code Makefile. For example, "cachesimulator -f /home/mstrout/traces/crafty_300k.cache.gz". The cachesimulator driver code reads the memory reference traces in a gzipped format, so do not ungzip them. This assignment will use six traces, a small one and a large one for each of the following three applications:
    • Trace 1: gcc, A trace from part of the gcc compiler
    • Trace 2: gzip, A trace taken from the compression of a graphic file using gzip
    • Trace 3: crafty, A trace from a chess playing program

    You will be implementing the CacheSim class that has been started in the cachesim.h and cachesim.cpp files. To test your cachesim implementation, we will be calling the constructor with the following parameters:
    • L1_size - L1 cache size in bytes
    • L1_assoc - L1 set associativity
    • L1_blocksize - L1 blocksize in bytes
    • L2_size - L2 cache size in bytes
    • L2_assoc - L2 set associativity
    • L2_blocksize - L2 blocksize in bytes
    • L2_type - whether the L2 cache should be exclusive or inclusive

    Your cache simulator should satisfy the following requirements and can make the following assumptions:

    • an associativity will be one, two, or four
    • L2.size > L1.size
    • L2.blocksize >= L1.blocksize
    • Implement an LRU replacement policy
    • Return hit or miss for L1 and L2 separately. If L1 is a hit then the default return value for L2 can be miss. (Although it won't be counted as such in the summary results).
    • Assume that all accesses are reads. Therefore, you need not worry about the write-back or write-through policy and write-allocate or write no-allocate.
    • The traces use 32 bit addresses.
    • For an inclusive L2 cache, if a block is in the L1 cache then it must be in the L2 cache. However, a block can get kicked out of the L1 cache and still be in the L2 cache.
    • For an exclusive L2 cache, a block in the L1 cache is never in the L2 cache. See Section 5.4 in H&P.

    Experiments
    1. Graph the global miss rate of an inclusive L2 versus an exclusive L2 when the L1 cache is 8KB direct-mapped with a 32byte block size. Let the L2 block size be the same, and let the L2 have 2-way set associativity. Vary the L2 cache size from 16K to 1MB via powers of two. Generate one graph for each big trace.
    2. Compare the global and local miss rates on the large traces for any two existing computer architectures by plugging in their L1 and L2 cache characteristics.

mstrout@mcs.anl.gov April 2004