Perform the migration
Situation
We have scripts for various operations that help with the migration. The most useful are:
- package index
- coe index
- upload/migrate packages
- find missing packages
- some commands for manipulating uploads
What was processed:
- all data on /nomad/repository
- lots of missing stuff from /data@web-repository-nomad

Everything in varying qualities over multiple versions
Checks to do
- restricted files match restricted calcs: no, not even the with_embargo fits
- duplicates with no user
- uploads do not belong to anyone
- duplicates in 290, 125
- what to do with MP (502)
- duplicates with same PID
- duplicates in the rest
- nomad versions
- compression on repo files
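
The duplicate checks above could be sketched roughly as follows. This is only an illustration on plain dictionaries: the field names `calc_hash`, `pid`, and `uploader` and the sample records are assumptions, not the actual NOMAD schema or data.

```python
from collections import defaultdict


def group_duplicates(calcs, key):
    """Group calc records by `key`; keep only groups with more than one entry."""
    groups = defaultdict(list)
    for calc in calcs:
        value = calc.get(key)
        if value is not None:
            groups[value].append(calc)
    return {value: group for value, group in groups.items() if len(group) > 1}


def classify_group(duplicate_group):
    """Tell whether all duplicates in a group share one PID or carry several."""
    pids = {calc.get("pid") for calc in duplicate_group}
    return "same_pid" if len(pids) == 1 else "multiple_pids"


# Hypothetical records; field names and values are made up for illustration.
calcs = [
    {"calc_id": "a", "calc_hash": "h1", "pid": 1, "uploader": "user1"},
    {"calc_id": "b", "calc_hash": "h1", "pid": 1, "uploader": None},
    {"calc_id": "c", "calc_hash": "h2", "pid": 2, "uploader": "user2"},
    {"calc_id": "d", "calc_hash": "h2", "pid": 3, "uploader": "user2"},
    {"calc_id": "e", "calc_hash": "h3", "pid": 4, "uploader": "user3"},
]

by_hash = group_duplicates(calcs, "calc_hash")
report = {h: classify_group(group) for h, group in by_hash.items()}
# Duplicates that have no user attached.
no_user = [
    calc["calc_id"]
    for group in by_hash.values()
    for calc in group
    if calc["uploader"] is None
]
```

Separating "same PID" from "multiple PIDs" groups matters here because the two cases are handled differently: full PID duplicates are candidates for removal, while duplicates with distinct PIDs are kept.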
Definite things to do
- repair embargo data in non 290/125
- remove full pid dups for non 290/125
- merge, reprocess, delete pid dups for 290
- repair embargo in 290 uploads
- merge, reprocess, delete pid dups for 125
- repair embargo in 125 uploads
- remove full non nomad user uploads
- reprocess with latest nomad version
- move uploads to the right user
- move everything to the 'production' deployment
- re-usable 'manual' to migrate new legacy data
Open issues
(will be tracked with further issues)
- some missing files (use `nomad migration missing`), see also Known exceptions below
- migrate stuff that was added later
- not all files are compressed
- in the db, calc.metadata.upload_time is stored as a string and not a date (for now this has not caused any issues)
- There are still duplicates (<250k); the number might be wrong since the underlying hash was changed mid-migration. There are duplicates even with multiple PIDs and we ought to keep them.
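
If the string-typed upload_time ever needs repairing, a conversion could look like the sketch below. It assumes the stored strings are ISO 8601 formatted, which has not been verified against the db, and it works on a plain dictionary rather than on the real database documents.

```python
from datetime import datetime


def parse_upload_time(value):
    """Return a datetime for a stored upload_time; datetimes pass through unchanged."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        # Assumption: the string is ISO 8601, e.g. "2019-03-14T10:15:30".
        return datetime.fromisoformat(value)
    raise TypeError(f"unexpected upload_time type: {type(value).__name__}")


# Hypothetical metadata record; the value is made up for illustration.
metadata = {"upload_time": "2019-03-14T10:15:30"}
metadata["upload_time"] = parse_upload_time(metadata["upload_time"])
```

Passing existing datetimes through unchanged keeps the conversion safe to run repeatedly over records that were already fixed.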
Known exceptions
- two exciting-based uploads (very small, 4 calcs) where parsing just does not stop. Package ids are: `8TF65FgwQgSSrzFESyvS4w` and `DhSOQ163THOMlrWlBOpnWA`
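
One generic way to keep a non-terminating parse from blocking a batch is to run it in a child process with a time limit. The sketch below is not how the NOMAD processing handles this; the commands are stand-ins for an actual parser invocation, and the timeout values are arbitrary.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command; return True if it finished, False if it hit the time limit."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        subprocess.run(command, timeout=timeout_seconds, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False


# Stand-in commands: one that sleeps far longer than the limit, one that exits at once.
hanging = [sys.executable, "-c", "import time; time.sleep(60)"]
quick = [sys.executable, "-c", "pass"]

finished_quick = run_with_timeout(quick, 30)
finished_hanging = run_with_timeout(hanging, 1)
```

Packages whose run returns False could then be recorded as known exceptions instead of stalling the rest of the migration.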