nomad-lab / nomad-FAIR, issue #110
Issue created Feb 18, 2019 by Markus Scheidgen (@mscheidg, Owner). 27 of 27 checklist items completed.

Perform the migration

Situation

We have scripts for various operations that help with the migration. The most useful are:

  • package index
  • coe index
  • upload/migrate packages
  • find missing packages
  • some commands for manipulating uploads

What was processed:

  • all data on /nomad/repository
  • lots of missing stuff from /data@web-repository-nomad

Everything was processed in varying quality over multiple versions.

Checks to do

  • restricted files match restricted calcs: no, not even with_embargo fits
  • duplicates with no user
  • uploads do not belong to anyone
  • duplicates in 290, 125
  • what to do with MP (502)
  • duplicates with same PID
  • duplicates in the rest
  • nomad versions
  • compression on repo files
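
The duplicate checks above could be sketched roughly as follows. This is a minimal illustration and not the actual migration script: the record layout (dicts with `pid`, `calc_hash`, and `upload_id` keys), the sample values, and the helper name `find_duplicates` are all assumptions made for the example.

```python
from collections import defaultdict


def find_duplicates(calcs, key):
    """Group calc records by `key` and return only groups with more than one entry."""
    groups = defaultdict(list)
    for calc in calcs:
        value = calc.get(key)
        if value is not None:  # records without the key cannot be compared
            groups[value].append(calc)
    return {value: group for value, group in groups.items() if len(group) > 1}


# Hypothetical records; field names and values are illustrative only.
calcs = [
    {"pid": 1, "calc_hash": "a", "upload_id": "290"},
    {"pid": 1, "calc_hash": "a", "upload_id": "125"},
    {"pid": 2, "calc_hash": "b", "upload_id": "290"},
    {"pid": 3, "calc_hash": "b", "upload_id": "290"},
]

same_pid = find_duplicates(calcs, "pid")         # duplicates with the same PID
same_hash = find_duplicates(calcs, "calc_hash")  # duplicates by content hash
```

Grouping by one key at a time keeps "same PID" and "same hash, different PID" as separate result sets, which matters because the two cases are handled differently in this migration.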

Definite things to do

  • repair embargo data in non 290/125
  • remove full pid dups for non 290/125
  • merge, reprocess, delete pid dups for 290
  • repair embargo in 290 uploads
  • merge, reprocess, delete pid dups for 125
  • repair embargo in 125 uploads
  • remove full non-nomad user uploads
  • reprocess with latest nomad version
  • move uploads to right user
  • move everything to the 'production' deployment
  • re-usable 'manual' to migrate new legacy data

Open issues

(will be tracked with further issues)

  • some missing files (use nomad migration missing), see also Known exceptions below
  • migrate stuff that was added later
  • not all files are compressed
  • in the db, calc.metadata.upload_time is stored as a string and not as a date (so far this has not caused any issues)
  • There are still duplicates (<250k); the number might be wrong since the underlying hash was changed mid-migration. There are duplicates even with multiple PIDs, and we ought to keep them.
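
If the string-valued upload_time ever needs to be normalized, a conversion could look roughly like the sketch below. It assumes the stored strings are ISO 8601 formatted, which this issue does not confirm; the helper name `parse_upload_time` and the sample value are hypothetical.

```python
from datetime import datetime


def parse_upload_time(value):
    """Return a datetime for `value`, accepting a datetime or an ISO 8601 string."""
    if isinstance(value, datetime):
        return value  # already converted, nothing to do
    # Assumption: the string is ISO 8601; other formats raise ValueError.
    return datetime.fromisoformat(value)


# Hypothetical metadata record with a string-valued upload_time.
metadata = {"upload_time": "2019-02-18T10:30:00"}
metadata["upload_time"] = parse_upload_time(metadata["upload_time"])
```

Accepting both types makes the helper safe to run repeatedly over records that were already converted.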

Known exceptions

  • two exciting-based uploads (very small, 4 calcs) where parsing just does not stop. Package ids are: 8TF65FgwQgSSrzFESyvS4w and DhSOQ163THOMlrWlBOpnWA
Edited Aug 08, 2019 by Markus Scheidgen