Add timeout support for individual steps and overall deployment
We just found an issue where our CI deployments via Octopus had been backed up for more than a day because one of them was hung on an FTP to Azure issue. We found out because a tester tried to run a promotion to Test which didn't complete in a timely manner.
This highlighted that there is no step timeout available, so we cannot, for example, have a step fail if it doesn't complete within X minutes. Similarly, we should be able to fail the overall deployment if the steps collectively take longer than Y minutes.
Until we implement this feature, the workaround is to monitor the deployment using the API and cancel it once a timeout has been reached.
I have written up a script (https://octopus.com/blog/automating-octopus-with-azure-functions) that retrieves all running deployments for a project and cancels a deployment if:
- it has been running for more than 30 minutes
- the first step has been running for more than 20 minutes
- the first step has not output any logs in 5 minutes
This can run as a scheduled task. Alternatively, you can use the subscriptions feature to kick off a piece of code (e.g. an Azure Function) that does this polling, as described in this blog post: https://octopus.com/blog/automating-octopus-with-azure-functions
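For readers who cannot reach the script, the three cancellation conditions above might be expressed roughly as follows. This is only a sketch of the decision logic, not the script itself: `RunningDeployment` and `cancel_reason` are hypothetical names, and fetching the timestamps and issuing the cancellation through the Octopus API are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the three conditions listed above.
MAX_DEPLOYMENT = timedelta(minutes=30)
MAX_FIRST_STEP = timedelta(minutes=20)
MAX_LOG_SILENCE = timedelta(minutes=5)


@dataclass
class RunningDeployment:
    """Hypothetical snapshot of one running deployment (not an Octopus API type)."""
    started: datetime             # when the deployment began
    first_step_started: datetime  # when its first step began
    last_log_entry: datetime      # timestamp of the most recent log line


def cancel_reason(d: RunningDeployment, now: datetime) -> Optional[str]:
    """Return why the deployment should be cancelled, or None to leave it running."""
    if now - d.started > MAX_DEPLOYMENT:
        return "deployment exceeded 30 minutes"
    if now - d.first_step_started > MAX_FIRST_STEP:
        return "first step exceeded 20 minutes"
    if now - d.last_log_entry > MAX_LOG_SILENCE:
        return "no log output for 5 minutes"
    return None
```

A scheduled task would build one snapshot per running deployment, call `cancel_reason`, and cancel the task through the API whenever the result is not `None`.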
— Robert W
Aaron Roydhouse commented
This forum issue also has Octopus suggesting you run your own scheduler and script outside of Octopus to polyfill for the lack of timeout functionality.
Unfortunately, all these workarounds cancel the whole Task, not just the Step, so there is no opportunity to run on-fail, recovery, clean-up, or notify Steps later in the Task. There is still no real solution or complete workaround for Step timeouts.
Still not a feature at this time, and Octopus said they just don't see it ever happening because of Dev time constraints and priorities. This is a make-or-break feature for my company, as it leaves several people sitting and waiting on a long-running step that is failing. This costs far too much time, which means money and frustration, and it makes it impossible to rely on Octopus for smooth rollouts. We are considering not using Octopus anymore because of this unreliability alone, and we've invested a lot in it already. Even after the huge time and financial investment, it's not worth the headache and unpredictability.
Pretty sad that this isn't a core feature.
Tudor Pastor commented
Also, it's almost 2019. Why don't we have proper timeout mechanics?
Tudor Pastor commented
Where can we find this script?
Yes, please implement a custom timeout feature within each Octopus step that would make it easier to avoid endless deployment time.
Michael W commented
Kyle Carmitchel commented
I'm having an issue with a downstream system (a functional testing tool) that is itself hanging, and unfortunately the state in which it's hanging causes Octopus to spin indefinitely on the step talking to it. This then causes all deployments behind it to back up and hang indefinitely as well.
If I could simply specify a timeout on the step talking to the testing tool, where if it takes longer than X minutes, simply cancel and fail the step, it would be a huge help to us.
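Until something like that exists, one stopgap is to enforce the limit inside the step's own script. The sketch below is only an illustration under assumptions: the step runs a Python script, it calls the external tool as a child process, and a non-zero exit code makes the step fail. `run_with_timeout` is a hypothetical helper, not an Octopus feature.

```python
import subprocess
import sys
from typing import Sequence


def run_with_timeout(command: Sequence[str], timeout_seconds: float) -> int:
    """Run a command; return its exit code, or 1 if it exceeds the timeout."""
    try:
        # subprocess.run kills the child process and raises TimeoutExpired
        # when the timeout elapses.
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        print(f"Timed out after {timeout_seconds} seconds: {list(command)}", file=sys.stderr)
        return 1
    return completed.returncode
```

The script would end with `sys.exit(run_with_timeout([...], 600))` so that a hang surfaces as a failed step rather than a cancelled task. Note that only the direct child process is killed; anything it spawned may keep running.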
Andreas Gullberg Larsen commented
Same story here: we frequently get hung deployments because a large web app (250+ MB) takes a long time to deploy to an Azure web app and seems to hang for days instead of timing out.
I'd expect it to time out to a sane default (30 minutes?) that is configurable for each process step as well as the entire process as a whole.
When it does time out, I'd expect an email to be sent to the administrators.