Currently, there seems to be no option to cancel an upload during processing. Trying to delete a pending upload results in the error: `Unexpected error: "The upload is still being processed". Please try again and let us know, if this error keeps happening.`
I have several instances of uploads hanging indefinitely, e.g., upload ids: -NGoTYl-QYGg1W4ze0HcXA, Yvvzqxc0Q-6HfXTTMYWHog. Not sure what the issue is in these cases, but it seems to happen to me occasionally with MD data when the processing time is relatively long.
Do we want to give the user the option to delete and try again? Or is this something that they should always have to contact someone about?
I'm not sure how to go about finding the source of the issue in these individual cases. Since there are other issues related to the size of my data (e.g., the GUI sometimes crashing, as we previously discussed), I wonder whether fixing some of those other symptoms might reduce how often this happens.
In the meantime, just for my personal development usage (i.e., local GUI on my laptop), is there any way to override this and delete the entry? Otherwise I end up having to clear all my uploads when this happens (which is ultimately not that big of a deal, though).
Interesting. I feel a bit responsible for the process system since I did a rather major refactoring. We try to catch SoftTimeLimitExceeded exceptions and set the upload to the FAILURE state in that case. It would be interesting to look at the logs, but for that I think I would need your help @mscheidg.
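The pattern is roughly the following (a minimal sketch with placeholder helper names and a placeholder time limit, not the actual NOMAD code):

```python
from celery import Celery
from celery.exceptions import SoftTimeLimitExceeded

app = Celery('proc', broker='pyamqp://guest@localhost//')

def run_processing(upload_id: str) -> None:
    """Placeholder for the actual (potentially long-running) processing."""

def set_upload_status(upload_id: str, status: str) -> None:
    """Placeholder for persisting the upload's process status."""

@app.task(soft_time_limit=3600)  # worker raises SoftTimeLimitExceeded after 1 hour
def process_upload(upload_id: str) -> None:
    try:
        run_processing(upload_id)
    except SoftTimeLimitExceeded:
        # The soft limit is raised inside the task, so we can still record
        # the failure before the hard time limit kills the worker process.
        set_upload_status(upload_id, 'FAILURE')
        raise
```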
Typically the processing should fail at some point with a timeout. In your case the processing did not even start, i.e., there was no failure due to timeouts. We have to find a way to deal with processings stuck in the "pending" state. @dsikter I think SoftTimeLimitExceeded will not really apply in this case. Maybe check the celery docs regarding the "PENDING" state.
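For context on why PENDING is tricky: as far as I know, celery reports PENDING both for tasks actually waiting in the queue and for task ids it has never seen, so the state alone cannot tell us whether the job ever reached the broker. A small illustration (broker/backend URLs are placeholders):

```python
from celery import Celery

app = Celery('check', broker='pyamqp://guest@localhost//', backend='rpc://')

# Querying a task id that was never sent still yields PENDING, because
# PENDING is celery's default state for unknown ids, not a confirmation
# that the job is actually waiting in the queue.
result = app.AsyncResult('00000000-0000-0000-0000-000000000000')
print(result.state)  # -> 'PENDING'
```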
After some thinking and experimenting, I think the problem is that the job was, for some reason, not successfully sent to celery/rabbitmq. I can also get uploads stuck like this if I simply shut down the rabbitmq docker process and then try to add a file to an upload. I think this problem is also easily fixable, by adding a try/except and setting the status to FAILURE if an exception occurs (i.e., if the job could not be sent to the queue).
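Roughly what I have in mind (a sketch only; the helper and task names are placeholders, not the actual NOMAD code):

```python
import logging

from celery import Celery
from kombu.exceptions import OperationalError

logger = logging.getLogger(__name__)
app = Celery('proc', broker='pyamqp://guest@localhost//')

def set_upload_status(upload_id: str, status: str) -> None:
    """Placeholder for persisting the upload's process status."""

@app.task
def process_upload(upload_id: str) -> None:
    """Placeholder for the actual processing task."""

def send_process_job(upload_id: str) -> None:
    try:
        # celery raises kombu's OperationalError here if the broker
        # (rabbitmq) cannot be reached when the message is published.
        process_upload.apply_async(args=[upload_id])
    except OperationalError:
        # Instead of leaving the upload stuck in PENDING, record the failure.
        set_upload_status(upload_id, 'FAILURE')
        logger.exception('could not queue processing for upload %s', upload_id)
```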
During testing I also discovered a strange behavior in the current celery version: when I shut down the rabbitmq process, I get an exception the first time I try to add a job to the queue, but subsequent attempts to add jobs instead block, waiting for the connection to rabbitmq to be restored, and then continue. The first and subsequent calls are thus treated differently, and I think the way the later calls are treated is bad: it means that initiating a process can take arbitrarily long, and risks leaving us with an upload stuck in the PENDING status. So I tested upgrading celery to the latest version, and this inconsistent behavior disappeared (with celery upgraded, we always get an exception, which is what I want).
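For completeness: celery also has configuration settings that should make publishing fail fast rather than block. I haven't verified how these interact with the old version's inconsistent behavior, so treat this as an assumption/sketch rather than a tested alternative to the upgrade:

```python
from celery import Celery

app = Celery('proc', broker='pyamqp://guest@localhost//')

# Fail fast instead of blocking while waiting for rabbitmq to come back.
app.conf.update(
    broker_connection_retry=False,  # don't keep retrying the broker connection
    broker_connection_timeout=4.0,  # give up on connecting after 4 seconds
    task_publish_retry=False,       # don't retry publishing the task message
)

# The publish retry can also be disabled per call:
# process_upload.apply_async(args=[upload_id], retry=False)
```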
To upgrade celery I basically just bumped the versions of celery and three dependent python packages (click, pytz and uvicorn), and I also ticked up the version of the docker image in the docker-compose.yml. CI works without any problems. Are there any special considerations/actions needed when upgrading celery that I don't know of, or does it sound reasonable that this would be all that is needed? @mscheidg?
Ah, ok. I did bump the rabbitmq version in the docker image, but it's not actually necessary, so I reverted it. Should be fine to merge now. I also checked the logs, and it looks like the problem with these two uploads was indeed that the api failed to connect to rabbitmq at the time.