Power-aware caches for GPGPUs

dc.contributor.advisorAtoofian, Ehsan
dc.contributor.authorSaghir, Ahsan
dc.date.accessioned2016-01-11T19:33:26Z
dc.date.available2016-01-11T19:33:26Z
dc.date.created2015
dc.date.issued2015
dc.description.abstractIn this thesis, we propose two optimization techniques to reduce power consumption in L1 caches (data, texture and constant), shared memory and L2 cache. The first optimization technique targets static power. Evaluation of GPGPU applications shows that once a cache block is accessed by a thread, it takes several hundreds of clock cycles until the same block is accessed again. The long inter-access cycle can be used to put cache cells into drowsy mode and reduce static power. While drowsy cells reduce static power, they increase access time as voltage of a cache cell in drowsy mode should be raised before the block can be accessed. To mitigate performance impact of drowsy cells, we propose a novel technique called coarse grained drowsy mode. In coarse grained drowsy mode, we partition each cache into regions of consecutive cache blocks and wake up a region upon cache access. Due to temporal and spatial locality of cache accesses, this method dramatically reduces performance impact caused by drowsy cells. The second optimization technique relies on branch divergence in GPGPUs. The execution model in GPGPUs is Single Instruction Multiple Thread (SIMT) which means processing cores execute the same instruction with different data for GPGPU threads. The SIMT execution model may result in divergence of threads when a control instruction is executed. GPGPUs execute branch instructions in two phases. In the first phase, threads in the taken path are active and the rest are idle. In the second phase, threads in the not-taken path are executed and the rest are idle. Contemporary GPGPUs access all portions of cache blocks, although some threads are idle due to branch divergence. We propose accessing only portions of cache blocks corresponding to active threads. By disabling unnecessary sections of cache blocks, we are able to reduce dynamic power of caches. Our results show that on average, the two optimization techniques together reduce power of caches by up to 98% and 15% for static and dynamic power, respectively.en_US
dc.identifier.urihttp://knowledgecommons.lakeheadu.ca/handle/2453/711
dc.language.isoen_USen_US
dc.subjectGeneral Purpose Graphics Processing Units (GPGPUs)en_US
dc.subjectMicroprocessorsen_US
dc.subjectPower consumptionen_US
dc.titlePower-aware caches for GPGPUsen_US
dc.typeThesis
etd.degree.disciplineEngineering : Electrical & Computeren_US
etd.degree.grantorLakehead Universityen_US
etd.degree.levelMasteren_US
etd.degree.nameMaster of Scienceen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
SaghirA2015m-1a.pdf
Size:
1.34 MB
Format:
Adobe Portable Document Format
Description:

License bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
Description: