Aborted work and a lot of wasted time - again


walli

Joined: 4 Nov 12
Posts: 5
Credit: 14,685,197
RAC: 198
Message 95210 - Posted: 23 Apr 2020, 14:19:14 UTC

Hi guys,

I just noticed that hundreds of my work units were aborted by the server yesterday, and I'm not only talking about tasks that hadn't started yet, but also about tasks that had already been running for hours. Altogether I lost about 32.6 days (2,816,696.11 seconds) of work/runtime!
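For anyone who wants to check the seconds-to-days conversion of the runtime figure above, here is a minimal sketch. The function and variable names are made up for illustration; only the seconds value comes from the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def seconds_to_days(seconds: float) -> float:
    """Convert a runtime given in seconds to days."""
    return seconds / SECONDS_PER_DAY


# Total lost runtime in seconds, as quoted in the post (hypothetical variable name).
lost_runtime_seconds = 2816696.11
print(f"{seconds_to_days(lost_runtime_seconds):.2f} days")
```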

I understand if you don't need the results anymore (they were *all* "aborted by project - no longer usable"), but then please at least consider some credit compensation in the future...

Thanks,

walli
bartonius

Joined: 4 Apr 20
Posts: 1
Credit: 70,262
RAC: 0
Message 95275 - Posted: 24 Apr 2020, 5:27:31 UTC - in response to Message 95210.  

It was the same for me, although I had far fewer aborted WUs. We donate our CPU time, and effectively some money as well through our electricity bills, and then the server just aborts the work and doesn't even award credit for the work done.

I have stopped crunching for Rosetta now; that's no way to treat your users.
Admin
Project administrator

Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 95277 - Posted: 24 Apr 2020, 6:55:01 UTC

We used to grant claimed credit to all canceled, past deadline, and invalid results and I see no reason why we shouldn't continue doing this. I think this task was lost after our last server update. I just restarted it and it will run every 6 hours.
walli

Joined: 4 Nov 12
Posts: 5
Credit: 14,685,197
RAC: 198
Message 95297 - Posted: 24 Apr 2020, 11:00:28 UTC

Hi guys,

We used to grant claimed credit to all canceled


Nope, see above or the other big thread where this topic is currently under discussion - people got no credits for the server-side aborted tasks... https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13852

past deadline


Yes and no. About 2-3 weeks ago I had a bunch of tasks ("rb*" or "*robetta*") that got no credit at all once they were past the deadline, even by a single second. There was nothing in the stderr log that pointed to any cause other than the deadline.

and invalid results


I'm pretty sure that this is not the case, but I have no work units/tasks to look at right now to confirm it, because the results are purged very quickly.

I think this task was lost after our last server update.


Which "task"? Do you mean a setting, or something like a cron job, for this credit problem...?

Thanks for your reply and best regards,

walli
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1481
Credit: 14,602,674
RAC: 15,288
Message 95298 - Posted: 24 Apr 2020, 11:14:40 UTC - in response to Message 95297.  
Last modified: 24 Apr 2020, 11:15:38 UTC

Hi guys,
We used to grant claimed credit to all canceled
Nope, see above or the other big thread where this topic is currently under discussion - people got no credits for the server-side aborted tasks... https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13852
Double-check what was posted there:
We used to grant claimed credit to all canceled

Grant
Darwin NT
walli

Joined: 4 Nov 12
Posts: 5
Credit: 14,685,197
RAC: 198
Message 95302 - Posted: 24 Apr 2020, 13:02:44 UTC - in response to Message 95298.  

My bad, apologies. :)
JohnDK

Joined: 6 Apr 20
Posts: 33
Credit: 2,390,240
RAC: 354
Message 95317 - Posted: 24 Apr 2020, 18:49:53 UTC

Very annoying: 4 WUs cancelled, all of which had been running for over 50,000 secs. Is it me or what?

https://boinc.bakerlab.org/rosetta/results.php?hostid=4063805&offset=0&show_names=0&state=6&appid=
Admin
Project administrator

Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 95319 - Posted: 24 Apr 2020, 19:59:27 UTC

Please let us know if the claimed credit is not granted within 6 hours. I added the credit-granting task as a cron job that runs every 6 hours.

I have confirmed that it is working.
Sid Celery

Joined: 11 Feb 08
Posts: 1981
Credit: 38,448,817
RAC: 14,577
Message 95324 - Posted: 24 Apr 2020, 21:24:37 UTC - in response to Message 95317.  

Very annoying, 4 WUs cancelled, all running over 50.000 secs. Is it me or what?

https://boinc.bakerlab.org/rosetta/results.php?hostid=4063805&offset=0&show_names=0&state=6&appid=

Not you - it's the size of the upload file again
And none of your runtimes were unreasonably long either

<error_code>-131 (file size too big)</error_code>
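If you want to scan a saved task log for this kind of entry, a minimal sketch could look like the following. It assumes the `<error_code>` element appears exactly in the format quoted above; the function name and the sample string are hypothetical and not part of BOINC itself.

```python
import re

# Matches an <error_code> element in the format quoted in this thread,
# e.g. "<error_code>-131 (file size too big)</error_code>".
ERROR_CODE_RE = re.compile(
    r"<error_code>\s*(-?\d+)\s*\(([^)]*)\)\s*</error_code>"
)


def parse_error_code(text):
    """Return (code, description) for the first <error_code> element, or None."""
    match = ERROR_CODE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


# Sample input copied from the quoted log line.
sample = "<error_code>-131 (file size too big)</error_code>"
print(parse_error_code(sample))
```

A `None` result just means the text contains no element in that format; it says nothing about whether the task succeeded.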

Sid Celery

Joined: 11 Feb 08
Posts: 1981
Credit: 38,448,817
RAC: 14,577
Message 95361 - Posted: 25 Apr 2020, 19:29:35 UTC - in response to Message 95324.  

Very annoying, 4 WUs cancelled, all running over 50.000 secs. Is it me or what?

https://boinc.bakerlab.org/rosetta/results.php?hostid=4063805&offset=0&show_names=0&state=6&appid=

Not you - it's the size of the upload file again
And none of your runtimes were unreasonably long either

<error_code>-131 (file size too big)</error_code>

Credited, I now notice. Sounds like the clean-up job caught up with it. Good news.




©2024 University of Washington
https://www.bakerlab.org