Power-aware caches for GPGPUs

Saghir, Ahsan

dc.contributor.advisor	Atoofian, Ehsan
dc.contributor.author	Saghir, Ahsan
dc.date.accessioned	2016-01-11T19:33:26Z
dc.date.available	2016-01-11T19:33:26Z
dc.date.created	2015
dc.date.issued	2015
dc.identifier.uri	http://knowledgecommons.lakeheadu.ca/handle/2453/711
dc.description.abstract	In this thesis, we propose two optimization techniques to reduce power consumption in the L1 caches (data, texture and constant), shared memory and the L2 cache. The first optimization technique targets static power. Evaluation of GPGPU applications shows that once a cache block is accessed by a thread, it takes several hundred clock cycles until the same block is accessed again. This long inter-access interval can be used to put cache cells into drowsy mode and reduce static power. While drowsy cells reduce static power, they increase access time, because the voltage of a cache cell in drowsy mode must be raised before the block can be accessed. To mitigate the performance impact of drowsy cells, we propose a novel technique called coarse-grained drowsy mode. In coarse-grained drowsy mode, we partition each cache into regions of consecutive cache blocks and wake up a region upon a cache access. Due to the temporal and spatial locality of cache accesses, this method dramatically reduces the performance impact caused by drowsy cells. The second optimization technique relies on branch divergence in GPGPUs. The execution model in GPGPUs is Single Instruction Multiple Thread (SIMT), which means that processing cores execute the same instruction on different data for GPGPU threads. The SIMT execution model may result in divergence of threads when a control instruction is executed. GPGPUs execute branch instructions in two phases. In the first phase, threads in the taken path are active and the rest are idle. In the second phase, threads in the not-taken path are executed and the rest are idle. Contemporary GPGPUs access all portions of cache blocks, even though some threads are idle due to branch divergence. We propose accessing only the portions of cache blocks that correspond to active threads. By disabling unnecessary sections of cache blocks, we are able to reduce the dynamic power of caches.
Our results show that, on average, the two optimization techniques together reduce the static and dynamic power of caches by up to 98% and 15%, respectively.	en_US
dc.language.iso	en_US	en_US
dc.subject	General Purpose Graphics Processing Units (GPGPUs)	en_US
dc.subject	Microprocessors	en_US
dc.subject	Power consumption	en_US
dc.title	Power-aware caches for GPGPUs	en_US
dc.type	Thesis
etd.degree.name	Master of Science	en_US
etd.degree.level	Master	en_US
etd.degree.discipline	Engineering : Electrical & Computer	en_US
etd.degree.grantor	Lakehead University	en_US
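
The two ideas in the abstract can be illustrated with a small sketch. It is not taken from the thesis. The names `count_wakeups` and `active_sectors`, the idle threshold `drowsy_after`, the 32-thread warp, and the equal-sized block sectors are all assumptions made for illustration. The abstract does not say how regions return to drowsy mode or how cache blocks are subdivided, so the sketch assumes a region becomes drowsy after a fixed number of idle cycles and that thread i of a warp reads the i-th word of the block.

```python
from typing import Iterable, List, Tuple


def count_wakeups(accesses: Iterable[Tuple[int, int]],
                  blocks_per_region: int,
                  drowsy_after: int) -> int:
    """Count accesses that find their region drowsy and must wake it up.

    accesses: (cycle, block_index) pairs in increasing cycle order.
    blocks_per_region: consecutive cache blocks grouped into one region.
    drowsy_after: assumed idle time (in cycles) after which a region
        is put back into drowsy mode (hypothetical policy).
    """
    last_access = {}  # region index -> cycle of its most recent access
    wakeups = 0
    for cycle, block in accesses:
        region = block // blocks_per_region
        previous = last_access.get(region)
        # All regions are assumed to start out drowsy.
        if previous is None or cycle - previous > drowsy_after:
            wakeups += 1
        last_access[region] = cycle
    return wakeups


def active_sectors(active_mask: int,
                   sectors_per_block: int = 4,
                   warp_size: int = 32) -> List[int]:
    """Return the block sectors touched by at least one active thread.

    active_mask: bit i is 1 if thread i of the warp is active.
    Assumes thread i accesses word i of the block, so each sector
    serves warp_size // sectors_per_block consecutive threads.
    """
    threads_per_sector = warp_size // sectors_per_block
    sectors = []
    for sector in range(sectors_per_block):
        low_bit = sector * threads_per_sector
        sector_mask = ((1 << threads_per_sector) - 1) << low_bit
        if active_mask & sector_mask:
            sectors.append(sector)
    return sectors
```

With a trace that touches four consecutive blocks ten cycles apart, `count_wakeups` reports four wake-ups when every block is its own region and one wake-up when the four blocks share a region, which is the locality effect the abstract describes. For a divergent warp whose active mask is `0x0000FFFF`, `active_sectors` selects only the first two of four sectors, so the remaining sectors could stay disabled.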

Files in this item

Name: SaghirA2015m-1a.pdf
Size: 1.342Mb
Format: PDF


This item appears in the following Collection(s)

Electronic Theses and Dissertations from 2009 [1635]
