    Zero-shot pruning of transformer language models using non-dominated sorting genetic algorithm

    View/Open
    Embargoed until May 11, 2026 (1001 KB)
    Date
    2025
    Author
    Jahadi, Reza
    Abstract
    Large Language Models (LLMs) are advanced neural networks trained on massive text corpora to understand and generate human language. While they have grown significantly in power and capability, their extensive parameter counts result in high computational costs. Current pruning techniques typically apply uniform sparsity across all layers of an LLM, yet not all layers contribute equally to the model's performance. We therefore propose a non-uniform sparsity mapping algorithm that assigns a different sparsity level to each layer according to its impact on model performance. To identify effective allocation schemes, we construct a search space consisting of a population of candidate sparsity mappings for the LLM, and we use an evolutionary algorithm to apply crossover and mutation to the top-performing candidates in this population, guided by performance evaluations. To select the best sparsity mappings, we employ the Non-dominated Sorting Genetic Algorithm II (NSGA-II), which yields a set of Pareto-optimal solutions balancing the trade-off between pruning ratio and performance. We adopt an unstructured pruning approach to maximize sparsity: within each layer, weights are sorted and removed using a Hessian approximation (the second-order term of the Taylor series expansion) as the selection criterion. Furthermore, we employ a novel technique for efficiently measuring the importance scores of LLM layers: the LLM is divided into several chunks of layers, and at each iteration the importance score of the selected chunk is computed, with scores collected over mini-batches of input data via gradient accumulation. We apply our algorithm to the GPT-2 architecture to demonstrate its applicability to LLMs ranging from millions to billions of parameters. Comprehensive experiments on the Wikitext and PTB datasets show that our method yields substantial performance improvements on the GPT-2 Medium, Large, and XL models. Remarkably, the GPT-2 model pruned with our algorithm is 15.8% and 3.8% smaller than models produced by the state-of-the-art techniques DistilGPT-2 and ZipLM, respectively, while suffering less performance degradation. Notably, our approach requires no retraining or fine-tuning, in contrast to these existing methods, which rely on extensive retraining.
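
    To make the search concrete, below is a minimal, self-contained sketch of an NSGA-II loop over per-layer sparsity mappings. It is an illustration of the technique the abstract names, not the thesis implementation: the layer count, population size, genetic operators, and the surrogate eval_candidate objective (a toy stand-in for pruning a real GPT-2 and measuring perplexity) are all assumptions made for demonstration.

        import numpy as np

        rng = np.random.default_rng(0)
        N_LAYERS = 24            # illustrative; GPT-2 Medium has 24 transformer blocks
        POP, GENS = 40, 30       # assumed search-budget parameters

        def eval_candidate(s):
            """Objectives for one per-layer sparsity vector s.

            Stand-in for the real evaluation: prune each layer of GPT-2 at its
            assigned sparsity and measure perplexity on held-out text. Here a
            toy penalty replaces perplexity.
            """
            ratio = s.mean()                               # maximize: overall pruning ratio
            tolerance = np.linspace(0.4, 1.0, N_LAYERS)    # toy: later layers tolerate pruning better
            penalty = float(np.sum((s / tolerance) ** 2))  # minimize: proxy performance loss
            return ratio, penalty

        def dominates(a, b):
            # a dominates b: no worse in both objectives, strictly better in one.
            return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

        def non_dominated_sort(objs):
            # Fast non-dominated sort: partition candidates into Pareto fronts.
            n = len(objs)
            dominated = [[] for _ in range(n)]
            counts = [0] * n
            fronts = [[]]
            for i in range(n):
                for j in range(n):
                    if dominates(objs[i], objs[j]):
                        dominated[i].append(j)
                    elif dominates(objs[j], objs[i]):
                        counts[i] += 1
                if counts[i] == 0:
                    fronts[0].append(i)
            f = 0
            while fronts[f]:
                nxt = []
                for i in fronts[f]:
                    for j in dominated[i]:
                        counts[j] -= 1
                        if counts[j] == 0:
                            nxt.append(j)
                fronts.append(nxt)
                f += 1
            return fronts[:-1]

        pop = rng.uniform(0.0, 0.9, size=(POP, N_LAYERS))
        for gen in range(GENS):
            objs = [eval_candidate(c) for c in pop]
            fronts = non_dominated_sort(objs)
            # Elitism: keep the best fronts as parents (crowding-distance
            # tie-breaking, used by full NSGA-II, is omitted for brevity).
            order = [i for front in fronts for i in front]
            parents = [pop[i] for i in order[: POP // 2]]
            children = []
            while len(parents) + len(children) < POP:
                i, j = rng.choice(len(parents), size=2, replace=False)
                mask = rng.random(N_LAYERS) < 0.5                # uniform crossover
                child = np.where(mask, parents[i], parents[j])
                child = np.clip(child + rng.normal(0.0, 0.05, N_LAYERS), 0.0, 0.95)  # mutation
                children.append(child)
            pop = np.vstack(parents + children)

        # Report a few trade-offs from the final non-dominated front.
        best = non_dominated_sort([eval_candidate(c) for c in pop])[0]
        for i in best[:3]:
            r, p = eval_candidate(pop[i])
            print(f"pruning ratio {r:.2f}  proxy penalty {p:.2f}")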
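
    The weight-selection criterion can be sketched in the same spirit. The snippet below scores each weight by the second-order Taylor term and zeroes the lowest-saliency fraction dictated by the layer's assigned sparsity. The Hessian diagonal is approximated here by squared gradients accumulated over mini-batches, an empirical-Fisher stand-in; this estimator and the toy loss are assumptions, not the thesis's exact procedure.

        import numpy as np

        rng = np.random.default_rng(1)
        W = rng.normal(size=(64, 64))      # one hypothetical layer's weight matrix
        h_diag = np.zeros_like(W)          # accumulator for the Hessian diagonal

        # Accumulate squared gradients over mini-batches (gradient accumulation,
        # as the abstract describes; the estimator itself is an assumption).
        for _ in range(8):
            x = rng.normal(size=(32, 64))
            target = rng.normal(size=(32, 64))
            g = x.T @ (x @ W - target) / len(x)   # gradient of a toy squared-error loss
            h_diag += g ** 2

        # Second-order Taylor saliency: removing weight w_i costs ~ 0.5 * H_ii * w_i^2
        # (the first-order term is taken as negligible at a trained model).
        saliency = 0.5 * h_diag * W ** 2

        sparsity = 0.6                     # this layer's share of pruning, e.g. from the search
        threshold = np.quantile(saliency, sparsity)
        W_pruned = np.where(saliency > threshold, W, 0.0)
        print("achieved sparsity:", float((W_pruned == 0).mean()))

    In a full pipeline, each layer's sparsity argument would come from one of the Pareto-optimal mappings returned by the NSGA-II search sketched above.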
    URI
    https://knowledgecommons.lakeheadu.ca/handle/2453/5502
    Collections
    • Electronic Theses and Dissertations from 2009 [1635]

    Lakehead University Library
    Contact Us | Send Feedback