Replies: 2 comments
-
A few options to consider:
- Break the work into smaller jobs: split your long-running jobs into smaller, more manageable chunks that each complete within the 5-day window, for example by splitting the deep learning training into stages that checkpoint and resume (see the sketch after this list).
- Explore alternative CI/CD platforms or self-hosted solutions that offer more flexibility on job duration; some platforms have longer timeout limits or allow more customization of job execution.
- Optimize the training process to reduce overall runtime, whether by improving the code, using more efficient algorithms, or making better use of your hardware resources.
- Reach out to GitHub support or the community forums to ask whether there are workarounds, or future updates planned, to address this limitation for long-running jobs on self-hosted runners.
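For the first option, a common pattern is to make the training loop resumable via checkpoints, so each CI job runs only a bounded chunk of work and the next job picks up where the previous one stopped. Below is a minimal sketch assuming a PyTorch setup; `build_model`, `build_optimizer`, and `train_one_epoch` are hypothetical placeholders for your own code, and the checkpoint path and epoch budget are arbitrary.

```python
# Minimal sketch: resumable training so each CI job stays well under the 5-day limit.
# Assumes PyTorch; build_model(), build_optimizer(), and train_one_epoch() are
# hypothetical stand-ins for project-specific code.
import os
import torch

CHECKPOINT = "checkpoint.pt"   # persist this between jobs (cache, artifact, or runner disk)
EPOCHS_PER_JOB = 20            # size one job so it finishes comfortably within 5 days
TOTAL_EPOCHS = 200

def main():
    model = build_model()
    optimizer = build_optimizer(model)
    start_epoch = 0

    # Resume from the checkpoint left by the previous job, if any.
    if os.path.exists(CHECKPOINT):
        state = torch.load(CHECKPOINT, map_location="cpu")
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        start_epoch = state["epoch"] + 1

    end_epoch = min(start_epoch + EPOCHS_PER_JOB, TOTAL_EPOCHS)
    for epoch in range(start_epoch, end_epoch):
        train_one_epoch(model, optimizer)
        # Save after every epoch so a timed-out job loses at most one epoch of work.
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "epoch": epoch},
            CHECKPOINT,
        )

    if end_epoch < TOTAL_EPOCHS:
        print("Training not finished; trigger the next job or workflow run to continue.")

if __name__ == "__main__":
    main()
```

The checkpoint then has to be handed from one job to the next, for example via actions/cache, an artifact upload/download, or a fixed path on the self-hosted runner's disk.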
-
🕒 Discussion Activity Reminder 🕒
This Discussion has been labeled as dormant by an automated system for having no activity in the last 60 days. Please consider one of the following actions:
1️⃣ Close as Out of Date: If the topic is no longer relevant, close the Discussion as out of date.
2️⃣ Provide More Information: Share additional details or context, or let the community know if you've found a solution on your own.
3️⃣ Mark a Reply as Answer: If your question has been answered by a reply, mark the most helpful reply as the solution.
Note: This dormant notification will only apply to Discussions with the dormant label.
Thank you for helping bring this Discussion to a resolution! 💬
-
Select Topic Area
Question
Body
Recently there was an update in GitHub Actions (blog post) that makes jobs in workflows on self-hosted runners time out after 5 days (while still allowing the workflow itself to run for up to 35 days).
This is an inconvenient limitation: it forces us to artificially split long-running jobs (we have several) into multiple jobs, which is not always trivial to do. For example, our use case is launching deep learning model training on limited hardware resources, so self-hosted runners were a convenient tool for us -- and no longer are.
So the question is: what solutions might we have overlooked that would allow some jobs to run for more than 5 days under the new limitations?