All FFD_ units ending with Validate error

Message boards : Number crunching : All FFD_ units ending with Validate error

Trotador
Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 0
Message 78703 - Posted: 6 Sep 2015, 6:33:15 UTC
Last modified: 6 Sep 2015, 6:33:49 UTC

Hi

It seems that every FFD_ WU issued in the last two or three days is ending in a validation error on my computer.

Is anyone else seeing the same thing?

wyxchari
Joined: 27 Nov 14
Posts: 11
Credit: 85,318
RAC: 0
Message 78704 - Posted: 6 Sep 2015, 7:24:38 UTC

It is a bug, and they are working on a fix. Pause Rosetta for a few days and then try again; in the meantime, let your computer crunch other projects. That is what I have done.

756338212 685529164 5 Sep 2015 18:03:44 UTC 5 Sep 2015 18:42:32 UTC Over Client error Compute error 0.00 0.00 ---
756310232 685503587 5 Sep 2015 16:53:59 UTC 5 Sep 2015 16:58:04 UTC Over Client error Compute error 0.00 0.00 ---
756307390 685500806 5 Sep 2015 17:08:01 UTC 5 Sep 2015 17:12:11 UTC Over Client error Compute error 0.00 0.00 ---
756247891 685454889 5 Sep 2015 9:29:07 UTC 5 Sep 2015 16:53:59 UTC Over Client error Compute error 0.00 0.00 ---
756247277 685454435 5 Sep 2015 9:25:01 UTC 5 Sep 2015 9:29:07 UTC Over Client error Compute error 0.00 0.00 ---
756242469 685450844 5 Sep 2015 8:53:27 UTC 5 Sep 2015 9:25:01 UTC Over Client error Compute error 0.00 0.00 ---
756227466 685409102 5 Sep 2015 7:15:32 UTC 5 Sep 2015 8:49:20 UTC Over Client error Compute error 0.00 0.00 ---
756225269 685438074 5 Sep 2015 7:02:02 UTC 5 Sep 2015 7:03:12 UTC Over Client error Compute error 0.00 0.00 ---
756222663 685435942 5 Sep 2015 6:50:29 UTC 5 Sep 2015 7:02:02 UTC Over Client error Compute error 0.00 0.00 ---
756221134 685434760 5 Sep 2015 6:40:17 UTC 5 Sep 2015 6:46:20 UTC Over Client error Compute error 0.00 0.00 ---
756215685 685430302 5 Sep 2015 6:46:20 UTC 5 Sep 2015 6:50:29 UTC Over Client error Compute error 0.00 0.00 ---
756215055 685429746 5 Sep 2015 7:07:17 UTC 5 Sep 2015 7:11:24 UTC Over Client error Compute error 0.00 0.00 ---
756214923 685429614 5 Sep 2015 7:11:24 UTC 5 Sep 2015 7:15:32 UTC Over Client error Compute error 0.00 0.00 ---
756148171 685380906 4 Sep 2015 21:55:44 UTC 5 Sep 2015 6:40:17 UTC Over Client error Compute error 0.00 0.00 ---
756147522 685380434 4 Sep 2015 21:51:38 UTC 4 Sep 2015 21:55:44 UTC Over Client error Compute error 0.00 0.00 ---
756146726 685379842 4 Sep 2015 21:43:22 UTC 4 Sep 2015 21:47:31 UTC Over Client error Compute error 0.00 0.00 ---
756146379 685379499 4 Sep 2015 21:47:31 UTC 4 Sep 2015 21:51:38 UTC Over Client error Compute error 0.00 0.00 ---
756142269 685191678 4 Sep 2015 21:11:45 UTC 4 Sep 2015 21:43:22 UTC Over Client error Compute error 0.00 0.00

Trotador
Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 0
Message 78705 - Posted: 6 Sep 2015, 11:31:35 UTC

Yours could be a different type of error, one that makes the WUs abort right at the beginning. FFD_ units are mostly failing at validation, which is worse, since they are crunched in full and sent back to the server only to fail there. I'm seeing that all wingmen are also getting Validate errors for these WUs.

wyxchari
Joined: 27 Nov 14
Posts: 11
Credit: 85,318
RAC: 0
Message 78706 - Posted: 6 Sep 2015, 15:05:16 UTC

Sep 3, 2015. The minirosetta application has been updated to 3.62.

Since the 4th, after the upgrade to Rosetta 3.62:

Task ID 755987746
Name rb_09_01_58658_103423_ab_stage0_h001___robetta_IGNORE_THE_REST_11_18_302525_12_0
Workunit 685246348
Created 4 Sep 2015 6:18:57 UTC
Sent 4 Sep 2015 10:13:57 UTC
Received 4 Sep 2015 11:42:49 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741795 (0xffffffffc000001d)
Computer ID 2201387
Report deadline 18 Sep 2015 10:13:57 UTC
CPU time 0
stderr out
<core_client_version>7.4.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741795 (0xc000001d)
</message>
]]>
Validate state Invalid
Claimed credit 0
Granted credit 0
application version 3.62
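
A side note on reading the exit status above: BOINC reports it as a signed 32-bit number, and reinterpreting it as unsigned gives the hex value shown in the message. 0xC000001D is the Windows NTSTATUS code STATUS_ILLEGAL_INSTRUCTION, which would suggest the application hit a CPU instruction the host does not support, although the log above does not confirm the cause. A minimal sketch of the conversion (the function names and the small lookup table are just for illustration):

```python
# Well-known Windows NTSTATUS values; this table is illustrative, not exhaustive.
KNOWN_NTSTATUS = {
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def describe(exit_code: int) -> str:
    """Look up the symbolic name of an exit code, if it is in the table."""
    return KNOWN_NTSTATUS.get(exit_code & 0xFFFFFFFF, "unknown")


print(to_ntstatus(-1073741795), describe(-1073741795))
```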

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,649,459
RAC: 0
Message 78711 - Posted: 6 Sep 2015, 21:21:30 UTC

There are FFD__ jobs like FFD__xxxxx_insulinxxxx that are completing and validating properly, but none of the FFH__xxxxx_abinitoDocking jobs are.

Sid Celery
Joined: 11 Feb 08
Posts: 2145
Credit: 41,560,787
RAC: 9,320
Message 78714 - Posted: 7 Sep 2015, 0:56:56 UTC - in response to Message 78711.  

There are FFD__ jobs like FFD__xxxxx_insulinxxxx that are completing and validating properly, but none of the FFH__xxxxx_abinitoDocking jobs are.

Well spotted. Unfortunately, I don't have any of those in my own task queue.

dkester788
Joined: 22 Oct 14
Posts: 2
Credit: 1,285,173
RAC: 0
Message 78720 - Posted: 7 Sep 2015, 19:56:09 UTC

I'm having the same issue with the FFH__xxxxx_abinitoDocking WUs. The jobs run to completion but fail validation when done. It sounds like they're working on the problem but I haven't seen any of the above WUs today.

Good Luck!

Trotador
Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 0
Message 78727 - Posted: 8 Sep 2015, 5:18:09 UTC

Still having the Validate error with these units

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,649,459
RAC: 0
Message 78729 - Posted: 8 Sep 2015, 5:30:18 UTC

We've just had a long weekend in North America so I suspect nothing has changed to remedy this just yet - let's see where we stand by end of day tomorrow. In the meantime, I'm continuing to abort these FFH__xxxxx_abinitoDocking jobs.

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 78738 - Posted: 8 Sep 2015, 16:04:37 UTC

I just switched to DENIS until this is fixed.

krypton
Volunteer moderator
Project developer
Project scientist
Joined: 16 Nov 11
Posts: 108
Credit: 2,164,309
RAC: 0
Message 78739 - Posted: 8 Sep 2015, 21:25:25 UTC

It appears to be an issue with the latest rosetta build (which is being automatically updated on your end, once existing jobs have run). The format of the output changed for the protein-protein docking jobs and the validation script that we use on the server is expecting the old format. We have to either change the validation script or update rosetta output format.

We are working on it now!
Thanks for the feedback!
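
To illustrate the kind of mismatch described above, here is a rough sketch of a validator that accepts both an old and a new output layout. Everything in it is an assumption made for illustration: the two line layouts, the function names, and the acceptance rule are hypothetical and are not the project's actual validation script or Rosetta's real output format.

```python
from typing import Optional, Tuple

Record = Tuple[str, float]  # (model tag, score)


def parse_old(line: str) -> Optional[Record]:
    """Hypothetical old layout: 'tag,score'."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    try:
        return parts[0], float(parts[1])
    except ValueError:
        return None


def parse_new(line: str) -> Optional[Record]:
    """Hypothetical new layout: 'score tag', separated by whitespace."""
    parts = line.split()
    if len(parts) != 2:
        return None
    try:
        return parts[1], float(parts[0])
    except ValueError:
        return None


def validate(lines, parsers=(parse_new, parse_old)) -> bool:
    """Accept a result if it is non-empty and every line fits a known layout."""
    if not lines:
        return False
    return all(any(p(line) is not None for p in parsers) for line in lines)


new_style = ["-12.5 model_1", "-11.0 model_2"]
print(validate(new_style, parsers=(parse_old,)))  # validator that only knows the old layout
print(validate(new_style))                        # validator that knows both layouts
```

A validator that only knows the old layout rejects results that were computed correctly, which is the situation described in this thread; trying each known layout in turn is one way to keep old and new clients valid during a rollout.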

krypton
Volunteer moderator
Project developer
Project scientist
Joined: 16 Nov 11
Posts: 108
Credit: 2,164,309
RAC: 0
Message 78740 - Posted: 8 Sep 2015, 22:43:23 UTC

We've killed all new FFD_* jobs until the error is fixed.

Sorry for the trouble. We'll work on preventing this from happening in the future.

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,649,459
RAC: 0
Message 78741 - Posted: 9 Sep 2015, 0:14:34 UTC - in response to Message 78740.  

We've killed all new FFD_* jobs until the error is fixed.

Sorry for the trouble. We'll work on preventing this from happening in the future.


Thanks for the update Sergey! I spent the day today at work debugging some rather complex code that recently underwent some major refactoring and broke one of our use cases that isn't tested for very thoroughly - akin to finding a needle in a haystack - so I totally understand what you all are going through! Cheers man!

Trotador
Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 0
Message 78745 - Posted: 9 Sep 2015, 16:29:00 UTC - in response to Message 78739.  

It appears to be an issue with the latest rosetta build (which is being automatically updated on your end, once existing jobs have run). The format of the output changed for the protein-protein docking jobs and the validation script that we use on the server is expecting the old format. We have to either change the validation script or update rosetta output format.

We are working on it now!
Thanks for the feedback!


Good to know you have found out the root cause of the problem.

I still have tens of units from previous days showing Validate error and no credit, so the correcting script should have a lot of work/fun ahead :).

Thanks!

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,790,281
RAC: 4,437
Message 78746 - Posted: 9 Sep 2015, 17:48:36 UTC - in response to Message 78740.  

We've killed all new FFD_* jobs until the error is fixed.


Strange. I continue to download FFD_* jobs

Trotador
Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 0
Message 78748 - Posted: 9 Sep 2015, 18:41:13 UTC - in response to Message 78746.  

We've killed all new FFD_* jobs until the error is fixed.


Strange. I continue to download FFD_* jobs


+1

I've just checked it out....

and they continue giving validate error...

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 78750 - Posted: 10 Sep 2015, 15:29:03 UTC - in response to Message 78746.  

We've killed all new FFD_* jobs until the error is fixed.


Strange. I continue to download FFD_* jobs


Often they basically halt the generation of new jobs for such problems, and any existing ones have to churn through to get fully purged. Or there are retries of existing tasks that were previously generated, and they are unable to cancel those. But the number left should be small.
Rosetta Moderator: Mod.Sense

Trotador
Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 0
Message 78752 - Posted: 10 Sep 2015, 16:33:39 UTC

Is it planned to run the routine to give credits to the units with Validate error?

thanks

krypton
Volunteer moderator
Project developer
Project scientist
Joined: 16 Nov 11
Posts: 108
Credit: 2,164,309
RAC: 0
Message 78753 - Posted: 10 Sep 2015, 18:52:58 UTC

The invalid jobs should eventually get credit.

The error(s) from yesterday were my fault. Sorry about that!

We have a two step queue system. I killed all the FFD_* jobs in queue 1, but there were still some in queue 2. I've now launched the command to kill the jobs in queue 2 as well.
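
The two-step queue situation can be pictured with a small simulation. This is only a toy model: the queue names, the job names, and the `purge` helper are hypothetical and do not reflect the project's real server tooling.

```python
from collections import deque


def purge(queue: deque, prefix: str) -> int:
    """Remove jobs whose name starts with prefix; return how many were removed."""
    kept = [job for job in queue if not job.startswith(prefix)]
    removed = len(queue) - len(kept)
    queue.clear()
    queue.extend(kept)
    return removed


# Hypothetical contents: jobs move from queue 1 to queue 2 before being sent out.
queue_1 = deque(["FFD_a", "rb_b", "FFD_c"])
queue_2 = deque(["FFD_d", "rb_e"])

# Step 1: purge only queue 1, as happened at first.
purge(queue_1, "FFD_")
still_sent = [job for job in queue_2 if job.startswith("FFD_")]
print(still_sent)  # FFD_ jobs that clients would keep downloading

# Step 2: purge queue 2 as well.
purge(queue_2, "FFD_")
print(list(queue_1), list(queue_2))
```

Until the second purge runs, anything already sitting in queue 2 is still handed out, which matches what people reported seeing after the first announcement.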

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 78757 - Posted: 10 Sep 2015, 22:13:33 UTC - in response to Message 78752.  

Is it planned to run the routine to give credits to the units with Validate error?

thanks


If you check the task details, you will see the granted credit. For whatever reason, tasks given credit for errors do not show the granted credit on the task summary page.
Rosetta Moderator: Mod.Sense

©2024 University of Washington
https://www.bakerlab.org