Skip to content

GitLab

  • Menu
Projects Groups Snippets
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
  • nomad-FAIR nomad-FAIR
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 141
    • Issues 141
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 14
    • Merge requests 14
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Monitor
    • Monitor
    • Incidents
  • Packages & Registries
    • Packages & Registries
    • Package Registry
    • Container Registry
    • Infrastructure Registry
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • nomad-lab
  • nomad-FAIRnomad-FAIR
  • Issues
  • #751

Closed
Open
Created Feb 20, 2022 by Theodore Chang@thchangDeveloper

Optimize ArchiveReader read disk performance

The current implementation that uses blocks receives the entry id from external and uses binary search to search blocks to locate if the target entry is in the archive.

The _load_toc_block() method performs the following steps.

  1. compute the block number of the target entry
  2. load the corresponding block from the disk
  3. unpack entries and populate then into the self._toc dict

Consider an archive of n entries and, for simplicity, assume 1 entry per block so that entry and block are interchangable in the following context.

When search for each entry, the binary search requires O(log(n)) operations with O(log(n)) disk reads. Assume to read all n entries, then O(n*log(n)) disk reads are required. However, O(n) disk reads are required to read all entries.

To fix:

Maintain an internal list to record if the block has been read. If so, skip reading the block from disk. Reduce disk opeartion from O(n*log(n)) to O(n).

Edited Feb 20, 2022 by Theodore Chang
Assignee
Assign to
Time tracking