Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,397,560 RAC: 19,617
And Mac OS is LINUX by a different name, or if it's M1 it's Android, by a different name. So it's going to take a while to clear them unless the project . . . puts in a flag with the Scheduler to only allocate them to LINUX systems.
Grant
Darwin NT
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,397,560 RAC: 19,617
SO! MANY! ERRORS! And in the past, when Tasks have crashed out with this error: "chi angle must be between -180 and 180: -nan(ind)", you've still gotten Credit for the work done. For some reason, that's not happening with these.
Grant
Darwin NT
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014
"SO! MANY! ERRORS! And in the past when Tasks have crashed out with this error ..."
That probably depends on whether the verifier program on the server recognizes the error message as enough evidence to declare the workunit faulty.
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,397,560 RAC: 19,617
Hmm.
"SO! MANY! ERRORS! And in the past when Tasks have crashed out with this error ..."
Or in the past the reported Task has included a Result file along with the Stderr output, but in these cases it hasn't? Either way, these errors have been occurring for ages now. The applications should have been updated to handle the error and just treat it as the Task finishing early, not as an error after all that work has already been done.
Grant
Darwin NT
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0
Welp, 4 hours later, this one also crashed:
rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906555_51_1
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213027_208958_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213027_208958_ab_t000__robetta.zip -frag3 rb_02_16_213027_208958_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213027_208958_ab_t000__robetta.200.9mers.index.gz -fragB rb_02_16_213027_208958_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3484950
Using database: database_357d5d93529_n_methylminirosetta_database
[ ERROR ]: Caught exception:
File: C:\cygwin64\home\boinc\4.17\Rosetta\main\source\src\core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
------------------------ Begin developer's backtrace -------------------------
BACKTRACE:
------------------------- End developer's backtrace --------------------------
AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.
</stderr_txt>
]]>
Same issue with all the other borked rb_ tasks. I sure hope they catch this and fix it soon. Wait, it's the weekend... This is fine, I'm fine, there's nothing infuriating about this at all... Nothing at all.
6dj72cn8 Joined: 18 Apr 06 Posts: 5 Credit: 207,684 RAC: 0
"And Mac OS is LINUX by a different name, or if it's M1 it's Android, by a different name."
Yes, I know. A UNIX kernel. I bothered to mention it because I didn't want the Admins to limit the work units to specifically a LINUX OS after they had read this board. [That was a joke, in case you also feel the need to explain to me that they don't read here.]
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,397,560 RAC: 19,617
OK, this LINUX system has popped out plenty of errors with those problem RB Tasks, and they have gotten credit. And here is a Windows system erroring out the same RB Tasks, and getting Credit for them as well.
Grant
Darwin NT
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,529,005 RAC: 10,309
I haven't got my head around what the specific issue is with rb tasks crashing out, but I do see people reporting problems with those too.
I've finally found some RB tasks on this PC and, while they haven't finished, they're a few hours in without crashing yet. The problem ones you guys are posting are rb_02_14 or rb_02_16, and the ones I have are rb_02_18. If the project was shut down for a while and came back up shortly after, could it be that a fix went in? I notice a drop of half a million in the queued jobs on the front page. Or I'm speaking too soon. I'll see in 5 more hours.
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0
"24 movingstubs were in my queue. Aborted them instantly."
"Aborted Tasks count as errors."
Yeah, I saw that, but I didn't waste any compute time on them. Mental thing... or more of a blank blank thing, really. I just saw the updated comments, so I'll let those tasks back in.
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,783,459 RAC: 5,082
"movingstub" WUs are still downloading today. Yesterday I tried to contact the Rosetta@Home, RosettaCommons and IPD Twitter accounts. Today the Rosetta@Home account answered: "We'll look into this. Thanks for flagging!" Fingers crossed!!
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0
"Still 'movingstub' wus download today."
Thanks for alerting them to it. But on what other project are they not monitoring their own work units for failures? And on what other project are they not monitoring the forums for problems? I sometimes wonder if they are serious about this at all.
computezrmle Joined: 9 Dec 11 Posts: 63 Credit: 9,680,103 RAC: 0
"I sometimes wonder if they are serious about this at all."
+1
Even correcting errors without posting a short official comment like "issue xyz is now fixed" is very impolite behaviour.
mrchips Joined: 11 Nov 09 Posts: 10 Credit: 15,046,470 RAC: 11,288
Name: movingstub_gzm1_minimize_3CL_AVLstub_0194_21_extract_B_SAVE_ALL_OUT_2908392_402_0
Outcome: Computation error
I have been getting these errors for 3 days. No tasks finish, and then I get "this computer has finished a daily quota of 1 tasks".
Falconet Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 2,018
"Thanks for alerting them to it. But on what other project are they not monitoring their own work units for failures?"
One would think that at least the researcher who submitted these work units would be aware of the large number of computational errors.
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,783,459 RAC: 5,082
"One would think that at least the researcher who submitted these work units would be aware of the large numbers of computational errors."
It seems that no one is stopping the batch...
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,976,229 RAC: 1,856
"I have had a number of failures on the Rosetta 4.20 in the last couple of days, but all of the 'movingstub' ones have worked."
Perhaps a solution is to use the 4.21 version, as I wrote here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14922&postid=105003#105003
BOINC does it on its own.
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334
I have set Rosetta to no new tasks, synchronized with the project, reset the project, and I still get 4.20 workunits.
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0
Jim, as I said before, we are guinea pigs, and our computers are used for leftovers and stuff they can't run or don't want to run on their AI system. We used to be the front line and now we are the back line. And with the back line they don't monitor, because if they did, Sid would not have to fill in the gap between us and the project. And if they had followed procedure, they would have found the fault in the movingstubs group via Ralph and corrected it before they released it to Rosie. But this is the way it goes now.
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0
"I have set rosetta to no new tasks, synchronized with project, resetted project, and still get 4.20 workunits"
Are you trying to get Python work?
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334
I was trying to get the 4.21 application.
©2024 University of Washington
https://www.bakerlab.org