Problems and Technical Issues with Rosetta@home




robertmiles

Joined: 16 Jun 08
Posts: 1218
Credit: 13,366,970
RAC: 108
Message 104970 - Posted: 19 Feb 2022, 0:21:33 UTC - in response to Message 104959.  
Last modified: 19 Feb 2022, 0:21:50 UTC

Set cache setting to 10 days and try to trash as many of them as possible :-), until it gets backed off
Yes that is what I have done
see how many I can get rid of
Not that many (compared to how many there are to get through).
For a particular application, every error you return reduces the amount of work you can download per day by 1, until you reach the point where you can only get 1 Task per 24 hours.
Returning Valid work increases the amount of work you can download per day.

For at least some BOINC projects, if every computer that tries a workunit returns a computation error, BOINC decides that the workunit is defective and no longer counts any of the failures against the computers that tried it.
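
The two rules described in this post can be put into a toy model. This is only a sketch of the behaviour as described here, not the BOINC scheduler's actual code: the class name, the ceiling of 100 and the size of the increase step are all assumptions, since the thread only says that valid work "increases" the quota.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HostQuota:
    """Toy per-application daily task quota for one host (hypothetical)."""

    max_per_day: int  # project-side ceiling, value assumed for illustration
    current: int      # tasks this host may download today

    def record_error(self) -> None:
        # Each returned error lowers the quota by 1, but never below 1.
        self.current = max(1, self.current - 1)

    def record_valid(self, step: int = 1) -> None:
        # Valid work raises the quota again; the step size is an assumption.
        self.current = min(self.max_per_day, self.current + step)


def hosts_to_penalise(errored_by_host: dict[str, bool]) -> list[str]:
    """Return the hosts whose failure should count for one workunit.

    errored_by_host maps host name -> True if that host's result errored.
    If every host that tried the workunit failed, the workunit is treated
    as defective and no host is penalised.
    """
    if errored_by_host and all(errored_by_host.values()):
        return []
    return [host for host, errored in errored_by_host.items() if errored]


quota = HostQuota(max_per_day=100, current=100)
for _ in range(500):          # e.g. 500 tasks trashed in one evening
    quota.record_error()
print(quota.current)          # 1
```

Under this model, trashing hundreds of bad tasks bottoms out at 1 task per day very quickly, unless the second rule kicks in and the failures are not counted at all.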
ID: 104970
Sid Celery

Joined: 11 Feb 08
Posts: 1916
Credit: 35,478,949
RAC: 1,621
Message 104972 - Posted: 19 Feb 2022, 0:29:57 UTC - in response to Message 104927.  

I only run on one Linux and one Windows 7 atm and all the Windows error out within 20 seconds. All the Ubuntu ones were good until now. The 14 movingstubs units ran the requested 8 hours on my 3900X. I can see that a new team member successfully completed his on Linux, but all were 3 hours. W3670. CPU and OS related?

I've reported this too. Your W7, my W10 and someone else's W11 all fail while Ubuntu and Linux all run ok.
Hopefully it means something to them.
Tasks haven't been withdrawn yet (that I've noticed) and I've had no direct reply yet either.
ID: 104972
Sid Celery

Joined: 11 Feb 08
Posts: 1916
Credit: 35,478,949
RAC: 1,621
Message 104973 - Posted: 19 Feb 2022, 0:43:28 UTC - in response to Message 104964.  
Last modified: 19 Feb 2022, 0:46:26 UTC

After getting all Computation Errors on all the WU's I set my PC to No New Tasks until you folks can figure out why Windows crashes with the "movingstub" Work units

Unless one of our experts here reaches out to his contact at UW, there is no one that will see these posts.
So "you folks" is a pointless thing.

I'm not 100% on the ball at the moment, Greg, but I am reporting things within 24 hours of seeing a post about them here.
Specifically I mean the movingstub tasks' computation errors, and now the update that they're running OK on Ubuntu/Linux but not on any version of Windows.
I haven't got my head around what the specific issue is with rb tasks crashing out, but I do see people reporting problems with those too.
If someone describes the rb issue to me in a way I can pass on, I'll follow up with that too.
Is it much the same as movingstubs tasks or are they crashing on Ubuntu as well?

Edit
Also in python the aagb-PHE stuff is buggy

Just spotted your mention of this too. I'm away from my one PC running Python tasks for another few days, so I'll try to confirm that issue as well
ID: 104973
.clair.

Joined: 2 Jan 07
Posts: 236
Credit: 24,067,171
RAC: 306
Message 104974 - Posted: 19 Feb 2022, 0:51:20 UTC
Last modified: 19 Feb 2022, 0:57:36 UTC

Well, it's time for bed, so I have undone my movingtargets silliness.
Got 500 of them in the bin.
Though I did have a long-running one; it lasted a full 30 seconds.
ID: 104974
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1386
Credit: 13,693,695
RAC: 1
Message 104975 - Posted: 19 Feb 2022, 2:27:25 UTC

Well, the project was down for a while there, but the movingstub crash and burns are still there since it came back online.
Grant
Darwin NT
ID: 104975
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,004,906
RAC: 109
Message 104976 - Posted: 19 Feb 2022, 3:01:36 UTC - in response to Message 104975.  
Last modified: 19 Feb 2022, 3:02:41 UTC

SO! MANY! ERRORS!

First off are the known bad movingstub tasks that crash almost instantly.

Then there are the broken rb_ tasks that don't crash as fast.

rb_02_16_213031_208962_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_04_2906559_35_1
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213031_208962_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213031_208962_ab_t000__robetta.zip -frag3 rb_02_16_213031_208962_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213031_208962_ab_t000__robetta.200.4mers.index.gz -fragB rb_02_16_213031_208962_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3477109
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
 ------------------------ Begin developer's backtrace ------------------------- 
BACKTRACE:
 ------------------------- End developer's backtrace -------------------------- 


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>



rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_08_2906555_55_1
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213027_208958_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213027_208958_ab_t000__robetta.zip -frag3 rb_02_16_213027_208958_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213027_208958_ab_t000__robetta.200.8mers.index.gz -fragB rb_02_16_213027_208958_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3485351
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
 ------------------------ Begin developer's backtrace ------------------------- 
BACKTRACE:
 ------------------------- End developer's backtrace -------------------------- 


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>



rb_02_16_213026_208957_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906558_22_1
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213026_208957_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213026_208957_ab_t000__robetta.zip -frag3 rb_02_16_213026_208957_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213026_208957_ab_t000__robetta.200.9mers.index.gz -fragB rb_02_16_213026_208957_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3477932
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
 ------------------------ Begin developer's backtrace ------------------------- 
BACKTRACE:
 ------------------------- End developer's backtrace -------------------------- 


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>


Windows 11
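
When passing reports like these on to the project, it may help to boil each stderr dump down to a few fields. The following is a minimal sketch that pulls the application name, random seed, source location and offending value out of a trimmed excerpt of the first log above. The function names and the choice of fields are assumptions for illustration, not part of any Rosetta or BOINC tool.

```python
import re

# Trimmed excerpt of the first stderr dump in this post.
STDERR = """\
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -nstruct 10000 -cpu_run_time 28800 -run::rng mt19937 -constant_seed -jran 3477109

[ ERROR ]: Caught exception:

File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
"""


def first_group(pattern, text):
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def summarise(stderr):
    """Extract the fields most useful in a bug report (hypothetical helper)."""
    seed = first_group(r"-jran (\d+)", stderr)
    line = first_group(r"\w+\.hh:(\d+)", stderr)
    bad_value = first_group(
        r"chi angle must be between -180 and 180: (\S+)", stderr
    )
    return {
        "app": first_group(r"command: \S*/(\S+)", stderr),
        "seed": int(seed) if seed else None,
        "source_file": first_group(r"(\w+\.hh):\d+", stderr),
        "source_line": int(line) if line else None,
        "bad_value": bad_value,
        # "-nan(ind)" is how the Windows build prints a NaN value.
        "is_nan": bad_value is not None and "nan" in bad_value.lower(),
    }


report = summarise(STDERR)
for key, value in report.items():
    print(f"{key}: {value}")
```

Run over the three dumps above, a summary like this would show different seeds and different -fragA files but the same source line and the same NaN value each time.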
ID: 104976
6dj72cn8

Joined: 18 Apr 06
Posts: 5
Credit: 206,941
RAC: 0
Message 104977 - Posted: 19 Feb 2022, 3:55:07 UTC - in response to Message 104955.  

So it's going to take a while to clear them unless the project . . . puts in a flag with the Scheduler to only allocate them to LINUX systems.

Linux or Mac. It's only Windows that's choking on them.
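
To make the distinction concrete: a restriction along these lines would block the known-bad batch on Windows only, rather than allow it on Linux only. This is a hypothetical filter, not how the BOINC scheduler is actually configured. The function name and the prefix list are made up for illustration, and the platform strings follow the BOINC-style naming seen in the Windows executable name elsewhere in this thread.

```python
# Task-name prefixes reported in this thread as failing on Windows only.
WINDOWS_BROKEN_PREFIXES = ("movingstub_",)


def may_send(task_name: str, platform: str) -> bool:
    """Decide whether a task should be sent to a host on this platform.

    Only Windows hosts are excluded, so Linux and Mac hosts both stay
    eligible for the affected batch.
    """
    on_windows = platform.startswith("windows_")
    is_broken_batch = task_name.startswith(WINDOWS_BROKEN_PREFIXES)
    return not (on_windows and is_broken_batch)


task = "movingstub_gzm1_minimize_3CL_AVLstub_0194_21_extract_B_SAVE_ALL_OUT_2908392_402_0"
for platform in ("windows_x86_64", "x86_64-pc-linux-gnu", "x86_64-apple-darwin"):
    print(platform, may_send(task, platform))
```

Writing the rule as "not Windows" instead of "only Linux" is the whole point of the correction above: it keeps Mac hosts in the pool.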
ID: 104977
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1386
Credit: 13,693,695
RAC: 1
Message 104978 - Posted: 19 Feb 2022, 4:53:24 UTC - in response to Message 104977.  

So it's going to take a while to clear them unless the project . . . puts in a flag with the Scheduler to only allocate them to LINUX systems.

Linux or Mac. It's only Windows that's choking on them.
And Mac OS is LINUX by a different name, or if it's M1 it's Android, by a different name.
Grant
Darwin NT
ID: 104978
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1386
Credit: 13,693,695
RAC: 1
Message 104979 - Posted: 19 Feb 2022, 4:56:26 UTC - in response to Message 104976.  

SO! MANY! ERRORS!

First off are the known bad movingstub tasks that crash almost instantly.

Then there are the broken rb_ tasks that don't crash as fast.
And in the past when Tasks have crashed out with this error
chi angle must be between -180 and 180: -nan(ind)
you've still gotten Credit for the work done.
For some reason, that's not happening with these.
Grant
Darwin NT
ID: 104979
robertmiles

Joined: 16 Jun 08
Posts: 1218
Credit: 13,366,970
RAC: 108
Message 104980 - Posted: 19 Feb 2022, 5:10:35 UTC - in response to Message 104979.  

SO! MANY! ERRORS!

First off are the known bad movingstub tasks that crash almost instantly.

Then there are the broken rb_ tasks that don't crash as fast.
And in the past when Tasks have crashed out with this error
chi angle must be between -180 and 180: -nan(ind)
you've still gotten Credit for the work done.
For some reason, that's not happening with these.

That probably depends on whether the verifier program on the server recognizes the error message as enough evidence to declare the workunit faulty.
ID: 104980
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1386
Credit: 13,693,695
RAC: 1
Message 104981 - Posted: 19 Feb 2022, 5:30:39 UTC - in response to Message 104980.  

SO! MANY! ERRORS!

First off are the known bad movingstub tasks that crash almost instantly.

Then there are the broken rb_ tasks that don't crash as fast.
And in the past when Tasks have crashed out with this error
chi angle must be between -180 and 180: -nan(ind)
you've still gotten Credit for the work done.
For some reason, that's not happening with these.

That probably depends on whether the verifier program on the server recognizes the error message as enough evidence to declare the workunit faulty.
Hmm.
Or in the past the reported Task has included a Result file along with the Stderr output, but in these cases it hasn't?


Either way, these errors have been occurring for ages now. The applications should have been updated to handle the error and just treat it as the Task finishing early, not as an error after it has already done all that work.
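
A minimal sketch of what "treat it as finishing early" could look like, assuming a task is a loop over independent models. Nothing here reflects how Rosetta is actually structured: ChiAngleError, check_chi and run_task are hypothetical names, and the exit-status policy is an assumption. It also shows why a NaN trips the range check: NaN fails every comparison.

```python
import math


class ChiAngleError(ValueError):
    """Stand-in for the 'chi angle must be between -180 and 180' exception."""


def check_chi(angle: float) -> float:
    # NaN compares false against everything, so it fails this range check
    # in the same way as a genuinely out-of-range angle.
    if not (-180.0 <= angle <= 180.0):
        raise ChiAngleError(
            f"chi angle must be between -180 and 180: {angle}"
        )
    return angle


def run_task(chi_per_model):
    """Process models in order; stop at a bad angle but keep finished work."""
    completed = []
    for chi in chi_per_model:
        try:
            completed.append(check_chi(chi))
        except ChiAngleError:
            break  # finish early instead of discarding everything
    # Hypothetical policy: success (0) if anything completed, error (1) if not.
    exit_status = 0 if completed else 1
    return completed, exit_status


print(run_task([10.0, -60.0, math.nan, 20.0]))  # ([10.0, -60.0], 0)
print(run_task([math.nan]))                     # ([], 1)
```

With a policy like this, a task that hits the bad value after hours of work would still report the models it had already finished.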
Grant
Darwin NT
ID: 104981
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,004,906
RAC: 109
Message 104982 - Posted: 19 Feb 2022, 5:32:03 UTC

Welp, 4 hours later, this one also crashed:

rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906555_51_1
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213027_208958_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213027_208958_ab_t000__robetta.zip -frag3 rb_02_16_213027_208958_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213027_208958_ab_t000__robetta.200.9mers.index.gz -fragB rb_02_16_213027_208958_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3484950
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
 ------------------------ Begin developer's backtrace ------------------------- 
BACKTRACE:
 ------------------------- End developer's backtrace -------------------------- 


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>


Same issue with all the other borked rb_ tasks. I sure hope they catch this and fix it soon.

Wait, it's the weekend...
This is fine, I'm fine, there's nothing infuriating about this at all... Nothing at all.
ID: 104982
6dj72cn8

Joined: 18 Apr 06
Posts: 5
Credit: 206,941
RAC: 0
Message 104983 - Posted: 19 Feb 2022, 5:51:36 UTC - in response to Message 104978.  

And Mac OS is LINUX by a different name, or if it's M1 it's Android, by a different name.

Yes, I know. A UNIX kernel. I bothered to mention it because I didn't want the Admins to limit the work units specifically to a LINUX OS after they had read this board.
[That was a joke, in case you also feel the need to explain to me that they don't read here.]
ID: 104983
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1386
Credit: 13,693,695
RAC: 1
Message 104984 - Posted: 19 Feb 2022, 6:36:26 UTC
Last modified: 19 Feb 2022, 6:43:19 UTC

OK, this LINUX system has popped out plenty of errors with those problem RB Tasks, and they have gotten credit.

And here is a Windows system erroring out the same RB Tasks, and getting Credit for them as well.
Grant
Darwin NT
ID: 104984
Sid Celery

Joined: 11 Feb 08
Posts: 1916
Credit: 35,478,949
RAC: 1,621
Message 104985 - Posted: 19 Feb 2022, 7:43:56 UTC - in response to Message 104973.  

I haven't got my head around what the specific issue is with rb tasks crashing out, but I do see people reporting problems with those too.
If someone describes the rb issue to me in a way I can pass on, I'll follow up with that too.
Is it much the same as movingstubs tasks or are they crashing on Ubuntu as well?

I've finally found some RB tasks on this PC and, while they haven't finished, they're at a few hours in without crashing yet.
The problem ones you guys are posting are rb_02_14 or rb_02_16 and the ones I have are rb_02_18
If the project was shut down for a while and came back up shortly after, could it be a fix went in? I notice a half million drop in the queued jobs on the front page.

Or I'm speaking too soon. I'll see in another 5 hours.
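
One quick way to check whether failures cluster in particular batches is to group the failed task names by their leading rb_NN_NN prefix. This is a rough sketch using the four failed task names posted earlier in this thread; treating the first three underscore-separated fields as the batch identifier is an assumption about the naming scheme.

```python
import re
from collections import Counter

# Failed rb_ tasks posted earlier in this thread.
FAILED = [
    "rb_02_16_213031_208962_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_04_2906559_35_1",
    "rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_08_2906555_55_1",
    "rb_02_16_213026_208957_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906558_22_1",
    "rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906555_51_1",
]


def batch_of(task_name):
    """Return the leading 'rb_NN_NN' prefix, or None for other task types."""
    match = re.match(r"(rb_\d{2}_\d{2})_", task_name)
    return match.group(1) if match else None


failures_per_batch = Counter(batch_of(name) for name in FAILED)
print(failures_per_batch)  # Counter({'rb_02_16': 4})
```

If rb_02_18 tasks never show up in a tally like this while rb_02_14 and rb_02_16 keep appearing, that would support the idea that a fix went in between batches.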
ID: 104985
Greg_BE

Joined: 30 May 06
Posts: 5652
Credit: 5,622,096
RAC: 0
Message 104986 - Posted: 19 Feb 2022, 8:53:59 UTC - in response to Message 104969.  

24 movingstubs were in my queue. Aborted them instantly.
Not going to up my error count for stupidness from their end.
Aborted Tasks count as errors.



Yeah I saw that, but I didn't waste any compute time on them. Mental thing..or more of a blank blank thing really.

I just saw the updated comments, so I'll let those tasks back in.
ID: 104986
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1686
Credit: 6,629,809
RAC: 240
Message 104987 - Posted: 19 Feb 2022, 9:12:27 UTC - in response to Message 104985.  
Last modified: 19 Feb 2022, 9:14:35 UTC

Still getting "movingstub" WUs downloaded today.
Yesterday I tried to contact the Rosetta@Home, RosettaCommons and IPD Twitter accounts.
Today I got an answer from the Rosetta@Home account:
We'll look into this. Thanks for flagging!


Fingers crossed!!
ID: 104987
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 4
Message 104994 - Posted: 19 Feb 2022, 10:46:20 UTC - in response to Message 104987.  

Still getting "movingstub" WUs downloaded today.
Yesterday I tried to contact the Rosetta@Home, RosettaCommons and IPD Twitter accounts.
Today I got an answer from the Rosetta@Home account:
We'll look into this. Thanks for flagging!


Fingers crossed!!

Thanks for alerting them to it. But on what other project are they not monitoring their own work units for failures?
And on what other project are they not monitoring the forums for problems?

I sometimes wonder if they are serious about this at all.
ID: 104994
computezrmle

Joined: 9 Dec 11
Posts: 62
Credit: 9,680,103
RAC: 0
Message 104995 - Posted: 19 Feb 2022, 10:57:45 UTC - in response to Message 104994.  

I sometimes wonder if they are serious about this at all.

+1
Even correcting errors without posting a short official comment like "issue xyz is now fixed" is very impolite behaviour.
ID: 104995
mrchips

Joined: 11 Nov 09
Posts: 4
Credit: 8,897,994
RAC: 333
Message 104996 - Posted: 19 Feb 2022, 12:08:25 UTC

Name movingstub_gzm1_minimize_3CL_AVLstub_0194_21_extract_B_SAVE_ALL_OUT_2908392_402_0
Outcome Computation error

Have been getting these errors for 3 days, no tasks finish, then I get
this computer has finished a daily quota of 1 tasks
ID: 104996



©2023 University of Washington
https://www.bakerlab.org