Cannot delete uploads if .volumes lies on an NFS share
I created a nomad deployment at FHI via an adapted docker-compose file. For backup and storage reasons, we moved the .volumes folder to an NFS share. However, this makes it impossible to delete any uploads via the nomad GUI. The uploads persist and the status changes to "Process delete_upload failed: OSError: [Errno 39] Directory not empty: 'archive'". A second try changes the status again, to "Process delete_upload failed: OSError: [Errno 16] Device or resource busy: '.nfs000000000c01329400000001'".
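To check this outside nomad, a small script along the following lines might reproduce the failure. It is only a sketch: the folder layout (upload/archive/data.msg) and the function name delete_while_file_open are hypothetical, chosen to mimic an upload whose archive folder contains a file that is still held open. On a local Linux disk the deletion is expected to succeed; when base_dir points to a directory on the NFS share, the open file is typically renamed to a hidden .nfsXXXX file, and shutil.rmtree would be expected to fail with errors like the ones above.

```python
import os
import shutil
import tempfile


def delete_while_file_open(base_dir):
    """Hypothetical reproduction: delete a folder while one of its files is open.

    Creates <base_dir>/upload/archive/data.msg, keeps a read handle open, and
    calls shutil.rmtree on <base_dir>/upload. Returns None if the deletion
    succeeded, otherwise the OSError that was raised.
    """
    upload_dir = os.path.join(base_dir, "upload")
    archive_dir = os.path.join(upload_dir, "archive")
    os.makedirs(archive_dir)
    path = os.path.join(archive_dir, "data.msg")
    with open(path, "wb") as f:
        f.write(b"dummy")
    # Keep a handle open during deletion, like a reader that was never closed.
    handle = open(path, "rb")
    try:
        shutil.rmtree(upload_dir)
        return None
    except OSError as err:
        return err
    finally:
        handle.close()


if __name__ == "__main__":
    # Replace with a directory on the NFS share to compare against a local disk.
    target = tempfile.mkdtemp()
    print(delete_while_file_open(target))
```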
I also tried deleting it via nomad admin uploads rm. It shows a similar error message:
1 uploads selected, deleting ...
ERROR nomad.cli 2022-11-22T07:29:36 could not delete files
- exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/nomad/cli/admin/uploads.py", line 424, in delete_upload
    upload_files.delete()
  File "/usr/local/lib/python3.7/site-packages/nomad/files.py", line 671, in delete
    shutil.rmtree(self.os_path)
  File "/usr/local/lib/python3.7/shutil.py", line 494, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/local/lib/python3.7/shutil.py", line 432, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/local/lib/python3.7/shutil.py", line 452, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/usr/local/lib/python3.7/shutil.py", line 450, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs000000000c01329400000001'
- exception_hash: JM34EDxYaKn6rnUZaYL8erVJcKaF
- nomad.commit: 88fba0386
- nomad.deployment: oasis
- nomad.service: nomad_oasis_app
- nomad.version: 1.1.5
This is the error message after the first deletion attempt with nomad admin uploads rm:
ERROR nomad.cli 2022-11-22T07:32:02 could not delete files
- exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/nomad/cli/admin/uploads.py", line 424, in delete_upload
    upload_files.delete()
  File "/usr/local/lib/python3.7/site-packages/nomad/files.py", line 671, in delete
    shutil.rmtree(self.os_path)
  File "/usr/local/lib/python3.7/shutil.py", line 494, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/local/lib/python3.7/shutil.py", line 436, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/usr/local/lib/python3.7/shutil.py", line 434, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'archive'
- exception_hash: i4sNv8DQkHjsy-qHI4I0XBUjcmeu
- nomad.commit: 88fba0386
- nomad.deployment: oasis
- nomad.service: nomad_oasis_app
- nomad.version: 1.1.5
Running the CLI command removes the uploads from the nomad GUI, but their files still persist in .volumes/fs.
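To see which leftovers block the removal after the CLI run, a helper like the one below could list the hidden NFS files under the upload storage. The function name find_nfs_leftovers is hypothetical, and the .volumes/fs path is assumed to be relative to the directory that holds the docker-compose file.

```python
import os


def find_nfs_leftovers(root):
    """Hypothetical helper: return sorted paths of hidden .nfsXXXX files below root.

    The NFS client leaves such files behind when a file is deleted while some
    process still holds it open; they normally vanish once that handle is closed.
    A root that does not exist simply yields an empty list.
    """
    leftovers = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(".nfs"):
                leftovers.append(os.path.join(dirpath, name))
    return sorted(leftovers)


if __name__ == "__main__":
    # Assumed location of the upload files of the deployment.
    for path in find_nfs_leftovers(".volumes/fs"):
        print(path)
```

For each reported path, running lsof on that path on the host that mounts the share might reveal which process still holds the file open.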
This seems to be related to NFS creating hidden .nfsXXXX files when a file is deleted while it is still open (see https://bugzilla.redhat.com/show_bug.cgi?id=1362667). Therefore, rmdir is not able to remove the folder. So any open handles to the upload's files should probably be closed before deleting it (if possible?). Alternatively, os.remove(...) could be used recursively to delete these hidden NFS files before calling os.rmdir(...).
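Note that in the first traceback shutil.rmtree already calls os.unlink on the .nfs file and gets EBUSY, so removing the hidden files directly is unlikely to succeed while the file is still open. A minimal sketch of a more tolerant deletion, assuming the open handle is released shortly (for example when a worker finishes its access), retries the removal whenever it fails with EBUSY or ENOTEMPTY. rmtree_with_retry and its attempts/delay defaults are hypothetical and not part of nomad.

```python
import errno
import os
import shutil
import time

# Error codes that, on NFS, usually mean a file in the tree is still open
# somewhere (a silly-renamed .nfsXXXX file, or a directory still containing one).
RETRYABLE = {errno.EBUSY, errno.ENOTEMPTY}


def rmtree_with_retry(path, attempts=5, delay=1.0):
    """Hypothetical helper: delete a directory tree, retrying on EBUSY/ENOTEMPTY.

    Returns True once the tree is gone, False if it still exists after all
    attempts. Any other OSError is re-raised unchanged.
    """
    for attempt in range(attempts):
        if not os.path.exists(path):
            return True
        try:
            shutil.rmtree(path)
            return True
        except OSError as err:
            if err.errno not in RETRYABLE:
                raise
        # Give the process holding the file a moment to close it.
        if attempt < attempts - 1:
            time.sleep(delay)
    return not os.path.exists(path)
```

Retrying only helps if the handle is short-lived. If the .nfs files survive all attempts, some long-running process is keeping the file open, which points back to closing the access to the upload before deleting it.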