    Design and optimization of heterogeneous coded distributed computing

    ZhangS2025m-2b.pdf (1.623Mb)
    Date
    2025
    Author
    Zhang, Siyu
    Abstract
    The massive increase in data volume in recent years has posed significant challenges for traditional data processing systems. Although distributed computing is considered an effective solution, its efficient implementation is hindered by the high communication overhead incurred by data exchange (shuffling) between workers. Coded Distributed Computing (CDC) has been proposed to reduce the shuffling load through coded multicasting. To the best of our knowledge, existing works on CDC consider only input files of uniform size, which limits their practicality in real-world applications. To address this limitation, we propose a Heterogeneous Coded Distributed Computing (HetCDC) scheme to handle input files of nonuniform sizes. We then formulate a joint optimization problem over the file placement and coded shuffling strategies to minimize the shuffling load. Through reformulation, we convert the nonconvex optimization problem into an integer linear programming problem and solve it with the branch-and-cut method. Numerical studies show that the proposed HetCDC outperforms existing works. Based on HetCDC, we further develop a Heterogeneous TeraSort algorithm to improve the sorting time of traditional TeraSort, which is a key building block for many big data processing algorithms.
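As a rough illustration of how coded multicasting can reduce the shuffling load, the sketch below works through the classic three-worker case with equal-size intermediate values. It is not the HetCDC scheme from the thesis, which handles nonuniform file sizes. The placement, the byte strings, and all names (`v`, `xor_bytes`, `halves`) are assumptions made up for this example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


def halves(x: bytes) -> tuple[bytes, bytes]:
    """Split a byte string into two equal-length halves."""
    mid = len(x) // 2
    return x[:mid], x[mid:]


# Assumed placement: worker 0 stores files {0, 1}, worker 1 stores {1, 2},
# worker 2 stores {0, 2}. Worker q runs reduce task q and therefore needs
# the intermediate value of the one file it does not store.
# v[(q, n)] is the (hypothetical) map output for reduce task q on file n.
v = {
    (0, 2): b"AAAAaaaa",  # needed by worker 0, computable by workers 1 and 2
    (1, 0): b"BBBBbbbb",  # needed by worker 1, computable by workers 0 and 2
    (2, 1): b"CCCCcccc",  # needed by worker 2, computable by workers 0 and 1
}

a0, a1 = halves(v[(0, 2)])
b0, b1 = halves(v[(1, 0)])
c0, c1 = halves(v[(2, 1)])

# Each worker multicasts one XOR of two half-values it can compute locally.
msg0 = xor_bytes(b0, c0)  # sent by worker 0, useful to workers 1 and 2
msg1 = xor_bytes(c1, a0)  # sent by worker 1, useful to workers 2 and 0
msg2 = xor_bytes(b1, a1)  # sent by worker 2, useful to workers 1 and 0

# Each receiver cancels the half it already knows from its own files.
rec_a = xor_bytes(msg1, c1) + xor_bytes(msg2, b1)  # decoded at worker 0
rec_b = xor_bytes(msg0, c0) + xor_bytes(msg2, a1)  # decoded at worker 1
rec_c = xor_bytes(msg0, b0) + xor_bytes(msg1, a0)  # decoded at worker 2

coded_load = len(msg0) + len(msg1) + len(msg2)
uncoded_load = sum(len(x) for x in v.values())
print(coded_load, uncoded_load)
```

In this toy setting every worker recovers its missing value while the total number of bytes sent is half of what unicasting the three values would require. With files of different sizes the XORed pieces no longer line up so neatly, which is the situation the thesis addresses.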
    URI
    https://knowledgecommons.lakeheadu.ca/handle/2453/5546
    Collections
    • Electronic Theses and Dissertations from 2009 [1745]

    Lakehead University Library
    Contact Us | Send Feedback

     

     

