Automatically retry steps if they fail
Currently, when a step fails you have to turn on guided failure in order to retry it. It would be handy to have a setting that says "always retry failed steps".
For example, if you are doing continuous deployment where a build server triggers a deployment from a build, you have to:
- log in to Octopus
- select the correct project
- manually intervene and assign it to you
- retry it
It would be much better if the retry could be done automatically.
A band-aid I would recommend for people who have these kinds of steps that routinely fail is to clone the step and, on the clone, set the Run Condition to "Failure: only run when a previous step failed".
Furthermore, if you have other steps that may fail beyond the one you know works with a retry, you can leverage the variable run condition "Variable: only run when the variable expression is true". This is probably the cleanest way to handle it, but it is more advanced for some. You can add a custom script with some PowerShell to set an output variable to true or false.
Steve Morgan commented
I am using Azure App Service and my deployments fail regularly, so this feature would be fantastic. If possible, I would like it to retry the failed step and any previous steps that you specify.
Aaron Roydhouse commented
It would be great to combine this with timeouts for Steps, so that if an idempotent Step gets stuck, it can eventually time out and retry. Currently your Step can wedge for _days_, and you can only cancel the whole Task, not the Step.
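Until something like that exists, a script step could approximate "time out, then retry" itself. The following is only a rough sketch and not an Octopus feature: `run_with_timeout_and_retry` and its parameters are made-up names, and it assumes the stuck work can be run as an external command that is safe to run more than once.

```python
import subprocess
import sys


def run_with_timeout_and_retry(command, timeout_seconds, max_attempts=2):
    """Run `command`, killing it after `timeout_seconds`, up to `max_attempts` times.

    Returns True as soon as one attempt exits with code 0, otherwise False.
    Hypothetical helper: only safe when the command is idempotent.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child process and raises TimeoutExpired
            # when the timeout elapses.
            result = subprocess.run(command, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            print(f"Attempt {attempt} timed out after {timeout_seconds}s")
            continue
        if result.returncode == 0:
            return True
        print(f"Attempt {attempt} exited with code {result.returncode}")
    return False


if __name__ == "__main__":
    # Stand-in for the real work: a tiny Python command that finishes quickly.
    ok = run_with_timeout_and_retry([sys.executable, "-c", "print('step ran')"], timeout_seconds=30)
    sys.exit(0 if ok else 1)
```

This only covers work the script launches itself; it does nothing for a built-in Step that wedges, which is why a real per-Step timeout would still be needed.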
Steve Land commented
In our case, we frequently have failures with several causes, such as:
> Flaky proxy server
> SFTP errors
> Delay between Terraform steps completing & services becoming live
> Azure Zipdeploy failures
In 90% of these cases an immediate retry succeeds, and since the majority of our deployments are CI triggered, it would be a really great user experience if we didn't have to do this manually.
Bryan Roth commented
This would be very useful. I've noticed built-in Octopus steps failing because of a locked file, and simply retrying that step often succeeds. It would be nice to have an option to retry a step when a failure occurs, up to a certain number of retries.
I know it's possible to bake in retry functionality into script steps or step templates that you create.
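For anyone wondering what that baked-in retry might look like, here is a minimal sketch. It is not Octopus functionality: `run_with_retries`, `max_attempts` and `delay_seconds` are names invented for illustration, and it assumes the work inside the script can be wrapped in a function that raises an exception on failure.

```python
import time


def run_with_retries(action, max_attempts=3, delay_seconds=0.0):
    """Call `action` until it succeeds or `max_attempts` is reached.

    Returns the action's result; re-raises the last error if every attempt fails.
    Hypothetical helper, not part of any Octopus API.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as error:
            if attempt == max_attempts:
                raise  # out of attempts: let the step fail as it would today
            print(f"Attempt {attempt} of {max_attempts} failed: {error}; retrying")
            time.sleep(delay_seconds)


if __name__ == "__main__":
    attempts = []

    def copy_package():
        # Stand-in for real work: fails once, as if a file were briefly locked.
        attempts.append(1)
        if len(attempts) == 1:
            raise OSError("file is locked")
        return "copied"

    print(run_with_retries(copy_package, max_attempts=3))
```

The drawback is the one raised elsewhere in this thread: every script step or step template has to carry its own copy, and it does not help with built-in steps.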
Duangchan Ueta commented
Please add this. It would help us overcome failures when turning on the VMs.
This would be brilliant, i.e. try 3 times and then stop with failure, etc.
This feature would be greatly appreciated, especially for long-running projects. We have sporadic timeout issues with big files and it would be extremely useful.
Please add this. Many of our deploys are on Windows and there are so many things that can go wrong where a simple retry, just once, seems to correct it. To get around this issue, we've had to add our own custom retry methods, which have helped a lot, but we still have many step templates to port over. It would be so much more convenient and scalable to have this built into Octopus.
Mathew Gallagher commented
This feature would be helpful. Our particular case is an issue we have starting one of our Windows services. Due to old code, the service manager will sometimes time out. Simply retrying the step in a guided failure usually gets it to start. Would love to automate this error test and correction.
This feature would be useful. Have a system var that you can set for Deployment.MaxRetry. We have sporadic file access issues, but the main one is 'log4net:WARN Cannot RollFile', which clears itself on first retry.
Lee Cherry commented
This would also be useful in instances of locked files, believe it or not. I have seen lots of instances where the deployment is waiting on a retry because it could not update a file. I don't do anything other than hit retry and it works, which suggests a file was locked at that moment in time.
Another instance where it would be useful is when multiple deployments are happening to the same machine, e.g. IIS websites. You may want to stop IIS for one of the sites to be deployed, and then start it again. If there were multiple different sites on the same box, it would be good to deploy all projects at once and let Octopus do the retry should IIS be stopped when it needs to be started.
I would need this if a Tentacle was offline, to do the deployment when it comes online.
At my workplace, not every deployment target is online all the time.
Sujit Singh commented
For Azure deployment steps, I wonder if the number of retries could be added to the step template as a parameter to achieve this.
We would also really appreciate a retry setting. We are also having issues with failed Azure resource group deployments that usually work on the first manual retry.
Glen Boonzaier commented
We really need this for Azure deployments. I have to manually intervene about 6-10 times per deployment as steps often fail due to timeout or connectivity problems.
This would be great, especially with the email step. Sometimes the email step fails on connection to SMTP and marks the whole deployment as failed, even though that was the only portion that failed.