Mitigating the impact of decompression latency in L1 compressed data caches via prefetching

Rea, Sean

dc.contributor.advisor	Atoofian, Ehsan
dc.contributor.author	Rea, Sean
dc.date	2017
dc.date.accessioned	2018-03-06T15:52:03Z
dc.date.available	2018-03-06T15:52:03Z
dc.date.issued	2017
dc.identifier.uri	https://knowledgecommons.lakeheadu.ca/handle/2453/4134
dc.description.abstract	Expanding cache size is a common approach for reducing cache miss rates and increasing processor performance. This approach, however, comes at the cost of increased static and dynamic power consumption by the cache. Static power scales with the number of transistors in the design, while dynamic power increases with the number of transistors being switched and the effective operating frequency of the cache. Cache compression is a technique that can increase the effective capacity of cache memory without incurring the same increase in static and dynamic power consumption. Alternatively, this technique can reduce the physical size, and therefore the static and dynamic energy usage, of the cache while maintaining reasonable effective cache capacity. A drawback of compression is that a delay, the decompression latency, is incurred when accessing compressed data, and this delay lies on the critical execution path of the processor. This latency can have a noticeable impact on processor performance, especially when compression is implemented in first-level caches. Cache prefetching techniques have been used to hide the latency of lower-level memory accesses. This work investigates the combination of current prefetching techniques and cache compression techniques to reduce the effect of decompression latency and thereby improve the feasibility of power reduction via compression in high-level caches. We propose an architecture that combines L1 data cache compression with table-based prefetching to predict which cache lines will require decompression. The architecture then performs decompression in parallel, moving the decompression delay off the critical path of the processor. The architecture is verified using 90 nm CMOS technology simulations in a new branch of SimpleScalar, using Wattch as a baseline and cache model inputs from CACTI.
Compression and decompression hardware are synthesized using the 90 nm Cadence GPDK and verified at the register-transfer level. The results of our verifications demonstrate that Base-Delta-Immediate (BΔI) compression, in combination with Last Outcome (LO), Stride (S), or Two-Level (2L) prefetch methods, or hybrid combinations of these methods (S/LO or 2L/S), provides a performance improvement over BΔI compression alone in the L1 data cache. On average, across the SPEC CPU 2000 benchmarks tested, BΔI compression alone results in a slowdown of 3.6%. Implementing a 1K-set Last Outcome prefetch mechanism reduces the slowdown to 2.1% and reduces the energy consumption of the L1 data cache by 21% versus a baseline scheme with no compression.	en_US
dc.language.iso	en_US	en_US
dc.subject	Cache compression	en_US
dc.subject	Power consumption	en_US
dc.subject	Cache prefetching techniques	en_US
dc.title	Mitigating the impact of decompression latency in L1 compressed data caches via prefetching	en_US
dc.type	Thesis
etd.degree.name	Master of Science	en_US
etd.degree.level	Master	en_US
etd.degree.discipline	Engineering : Electrical & Computer	en_US
etd.degree.grantor	Lakehead University	en_US

Files in this item

Name: ReaS2017m-3b.pdf
Size: 3.243 MB
Format: PDF

This item appears in the following Collection(s)

Electronic Theses and Dissertations from 2009 [1744]