Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,398,287
RAC: 19,677
Message 104978 - Posted: 19 Feb 2022, 4:53:24 UTC - in response to Message 104977.  

So it's going to take a while to clear them unless the project . . . puts in a flag with the Scheduler to only allocate them to LINUX systems.

Linux or Mac. It's only Windows that's choking on them.
And Mac OS is LINUX by a different name, or if it's M1 it's Android, by a different name.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,398,287
RAC: 19,677
Message 104979 - Posted: 19 Feb 2022, 4:56:26 UTC - in response to Message 104976.  

SO! MANY! ERRORS!

First off are the known bad movingstub tasks that crash almost instantly.

Then there are the broken rb_ tasks that don't crash as fast.
And in the past when Tasks have crashed out with this error
chi angle must be between -180 and 180: -nan(ind)
you've still gotten Credit for the work done.
For some reason, that's not happening with these.
Grant
Darwin NT

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 104980 - Posted: 19 Feb 2022, 5:10:35 UTC - in response to Message 104979.  

SO! MANY! ERRORS!

First off are the known bad movingstub tasks that crash almost instantly.

Then there are the broken rb_ tasks that don't crash as fast.
And in the past when Tasks have crashed out with this error
chi angle must be between -180 and 180: -nan(ind)
you've still gotten Credit for the work done.
For some reason, that's not happening with these.

That probably depends on whether the verifier program on the server recognizes the error message as enough evidence to declare the workunit faulty.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,398,287
RAC: 19,677
Message 104981 - Posted: 19 Feb 2022, 5:30:39 UTC - in response to Message 104980.  

SO! MANY! ERRORS!

First off are the known bad movingstub tasks that crash almost instantly.

Then there are the broken rb_ tasks that don't crash as fast.
And in the past when Tasks have crashed out with this error
chi angle must be between -180 and 180: -nan(ind)
you've still gotten Credit for the work done.
For some reason, that's not happening with these.

That probably depends on whether the verifier program on the server recognizes the error message as enough evidence to declare the workunit faulty.
Hmm.
Or in the past the reported Task has included a Result file along with the Stderr output, but in these cases it hasn't?


Either way, these errors have been occurring for ages now. The applications should have been updated to handle the error and treat it as the Task finishing early, not as an error after it has already done all that work.
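
To illustrate the point about treating the bad angle as an early finish: a minimal Python sketch, not Rosetta's actual code. The function names `chi_in_range` and `run_decoys` are hypothetical. It shows why a NaN value fails a "between -180 and 180" check, and how a driver could keep the work already completed instead of failing outright.

```python
def chi_in_range(chi):
    """Return True only when chi is a real number within [-180, 180]."""
    # Any comparison involving NaN evaluates to False, so NaN fails this check.
    return -180.0 <= chi <= 180.0


def run_decoys(chi_values):
    """Hypothetical driver: stop early and keep finished work on a bad angle."""
    completed = []
    for chi in chi_values:
        if not chi_in_range(chi):
            # Assumption: report what was finished instead of raising an error.
            return completed, "finished_early"
        completed.append(chi)
    return completed, "finished"
```

Whether the real application could safely return partial results this way depends on how the project validates output, which this sketch does not model.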
Grant
Darwin NT

Tomcat雄猫
Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 104982 - Posted: 19 Feb 2022, 5:32:03 UTC

Welp, 4 hours later, this one also crashed:

rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906555_51_1
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213027_208958_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213027_208958_ab_t000__robetta.zip -frag3 rb_02_16_213027_208958_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213027_208958_ab_t000__robetta.200.9mers.index.gz -fragB rb_02_16_213027_208958_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3484950
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
 ------------------------ Begin developer's backtrace ------------------------- 
BACKTRACE:
 ------------------------- End developer's backtrace -------------------------- 


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>
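
For anyone sorting through a pile of failed tasks, a small Python sketch that checks saved stderr text for the error signature shown in the log above. The regular expression and the function name `find_chi_error` are assumptions for illustration, not part of BOINC or Rosetta.

```python
import re

# Signature copied from the log above; the pattern itself is an assumption.
CHI_ERROR = re.compile(r"chi angle must be between -180 and 180: (\S+)")


def find_chi_error(stderr_text):
    """Return the offending value as a string if the error is present, else None."""
    match = CHI_ERROR.search(stderr_text)
    return match.group(1) if match else None
```

Calling `find_chi_error` on the stderr text of a task returns the reported value (for the log above, the string after the colon) or `None` when the task failed for some other reason.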


Same issue with all the other borked rb_ tasks. I sure hope they catch this and fix it soon.

Wait, it's the weekend...
This is fine, I'm fine, there's nothing infuriating about this at all... Nothing at all.

6dj72cn8
Joined: 18 Apr 06
Posts: 5
Credit: 207,684
RAC: 0
Message 104983 - Posted: 19 Feb 2022, 5:51:36 UTC - in response to Message 104978.  

And Mac OS is LINUX by a different name, or if it's M1 it's Android, by a different name.

Yes, I know. A UNIX kernel. I bothered to mention it because I didn't want the Admins to limit the work units specifically to a LINUX OS after they had read this board.
[That was a joke, in case you also feel the need to explain to me that they don't read here.]

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,398,287
RAC: 19,677
Message 104984 - Posted: 19 Feb 2022, 6:36:26 UTC
Last modified: 19 Feb 2022, 6:43:19 UTC

OK, this LINUX system has popped out plenty of errors with those problem RB Tasks, and they have gotten credit.

And here is a Windows system erroring out the same RB Tasks, and getting Credit for them as well.
Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,529,707
RAC: 10,374
Message 104985 - Posted: 19 Feb 2022, 7:43:56 UTC - in response to Message 104973.  

I haven't got my head around what the specific issue is with rb tasks crashing out, but I do see people reporting problems with those too.
If someone describes the rb issue to me in a way I can pass on, I'll follow up with that too.
Is it much the same as movingstubs tasks or are they crashing on Ubuntu as well?

I've finally found some RB tasks on this PC and, while they haven't finished, they're at a few hours in without crashing yet.
The problem ones you guys are posting are rb_02_14 or rb_02_16 and the ones I have are rb_02_18
If the project was shut down for a while and came back up shortly after, could it be a fix went in? I notice a half million drop in the queued jobs on the front page.
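
To compare batches like the rb_02_14, rb_02_16 and rb_02_18 ones mentioned above, here is a rough Python sketch that groups task names by their leading prefix. It assumes the `rb_MM_DD` start of a task name identifies the batch, which is a guess from the names posted in this thread; `count_by_batch` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumption: the leading rb_MM_DD part of a task name identifies the batch.
BATCH = re.compile(r"^(rb_\d{2}_\d{2})_")


def count_by_batch(task_names):
    """Count task names per rb_MM_DD prefix, ignoring names that do not match."""
    counts = Counter()
    for name in task_names:
        match = BATCH.match(name)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Running it separately over the names of errored and successful tasks would show whether failures cluster in particular batches.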

Or I'm speaking too soon. I'll know in another 5 hours.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 104986 - Posted: 19 Feb 2022, 8:53:59 UTC - in response to Message 104969.  

24 movingstubs were in my queue. Aborted them instantly.
Not going to up my error count for stupidness from their end.
Aborted Tasks count as errors.



Yeah I saw that, but I didn't waste any compute time on them. Mental thing..or more of a blank blank thing really.

I just saw the updated comments, so I'll let those tasks back in.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,783,459
RAC: 5,082
Message 104987 - Posted: 19 Feb 2022, 9:12:27 UTC - in response to Message 104985.  
Last modified: 19 Feb 2022, 9:14:35 UTC

Still getting "movingstub" WUs downloaded today.
Yesterday I tried to contact the Rosetta@Home, RosettaCommons and IPD Twitter accounts.
Today I got an answer from the Rosetta@Home account:
We'll look into this. Thanks for flagging!


Fingers crossed!!

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 104994 - Posted: 19 Feb 2022, 10:46:20 UTC - in response to Message 104987.  

Still getting "movingstub" WUs downloaded today.
Yesterday I tried to contact the Rosetta@Home, RosettaCommons and IPD Twitter accounts.
Today I got an answer from the Rosetta@Home account:
We'll look into this. Thanks for flagging!


Fingers crossed!!

Thanks for alerting them to it. But on what other project are they not monitoring their own work units for failures?
And on what other project are they not monitoring the forums for problems?

I sometimes wonder if they are serious about this at all.

computezrmle
Joined: 9 Dec 11
Posts: 63
Credit: 9,680,103
RAC: 0
Message 104995 - Posted: 19 Feb 2022, 10:57:45 UTC - in response to Message 104994.  

I sometimes wonder if they are serious about this at all.

+1
Even correcting errors without posting a short official comment like "issue xyz is now fixed" is very impolite behaviour.

mrchips
Joined: 11 Nov 09
Posts: 10
Credit: 15,046,470
RAC: 11,288
Message 104996 - Posted: 19 Feb 2022, 12:08:25 UTC

Name movingstub_gzm1_minimize_3CL_AVLstub_0194_21_extract_B_SAVE_ALL_OUT_2908392_402_0
Outcome Computation error

I have been getting these errors for 3 days. No tasks finish, and then I get:
this computer has finished a daily quota of 1 tasks

Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 104998 - Posted: 19 Feb 2022, 13:45:31 UTC - in response to Message 104994.  

Thanks for alerting them to it. But on what other project are they not monitoring their own work units for failures?
And on what other project are they not monitoring the forums for problems?

I sometimes wonder if they are serious about this at all.



One would think that at least the researcher who submitted these work units would be aware of the large numbers of computational errors.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,783,459
RAC: 5,082
Message 105001 - Posted: 19 Feb 2022, 17:34:28 UTC - in response to Message 104998.  

One would think that at least the researcher who submitted these work units would be aware of the large numbers of computational errors.


Seems that no one is stopping the batch....

Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,976,511
RAC: 1,853
Message 105004 - Posted: 19 Feb 2022, 18:13:33 UTC - in response to Message 104936.  

I have had a number of failures on the Rosetta 4.20 in the last couple of days, but all of the "movingstub" ones have worked.
It is probably because I am on Linux (Ubuntu).
https://boinc.bakerlab.org/rosetta/results.php?userid=52455&offset=0&show_names=0&state=4&appid=



Perhaps a solution is to use the 4.21 version, like I wrote here:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14922&postid=105003#105003

BOINC does it on its own.

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105007 - Posted: 19 Feb 2022, 18:38:22 UTC - in response to Message 105004.  

I have set Rosetta to no new tasks, synchronized with the project, reset the project, and still get 4.20 workunits.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105010 - Posted: 19 Feb 2022, 20:03:47 UTC - in response to Message 104994.  

Jim - as I said before, we are guinea pigs and our computers are used for leftovers and stuff they can't run or don't want to run on their AI system.
We used to be the front line and now we are the back line. And the back line they don't monitor, because if they did then Sid would not have to fill the gap between us and the project.
And if they had followed procedure, they would have found the fault in the movingstubs group via Ralph and corrected it before they released it to Rosie.
But this is the way it goes now.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105011 - Posted: 19 Feb 2022, 20:04:17 UTC - in response to Message 105007.  

I have set Rosetta to no new tasks, synchronized with the project, reset the project, and still get 4.20 workunits.



Are you trying to get python work?

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105012 - Posted: 19 Feb 2022, 20:05:05 UTC - in response to Message 105011.  

I was trying to get the 4.21 application.



©2024 University of Washington
https://www.bakerlab.org