Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
JohnDK Joined: 6 Apr 20 Posts: 33 Credit: 2,390,240 RAC: 0 |
Don't like those python WUs, but with the movingstub problems, I will have to continue with them. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Total queued jobs: 4,144,753. Someone at Rosetta has great faith in our ability to run these, though I don't know why. But I have found that the pythons do not suspend so much if I run only 50% of the cores. Or maybe they have changed them. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,398,287 RAC: 19,677 |
2 million Tasks for LINUX systems, instant crash and burn on Windows. Total queued jobs: 4,144,753. So it's going to take a while to clear them unless the project pulls them & fixes & then re-issues them, or puts in a flag with the Scheduler to only allocate them to LINUX systems. No hope of the second, very slight hope for the first. Grant Darwin NT |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
"movingstub problems" I have two almost identical systems. P5Q Deluxe motherboard with Q9450 CPU, 8GB RAM, Win7: it's a crash test dummy. 1* P5Q Deluxe motherboard with Q9550 CPU, 8GB RAM, Linux Mint: just another day at the office, crunchin' on regardless. Funny old world. ---------------------------------------- 1* So I feel like being a git. Set cache setting to 10 days and try and trash as many of them as possible :-), until it gets backoffd. Yes, that is what I have done, to see how many I can get rid of. But keep an eye on it in case any good work arrives. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,398,287 RAC: 19,677 |
"Set cache setting to 10 days and try and trash as many of them as possible :-), until it gets backoffd" Not that many (compared to how many there are to get through). For a particular application, for every error you return, you have the amount of work you can download reduced by 1 until you get to the point you will only be able to get 1 Task per 24 hours. Returning Valid work increases the amount of work you can download per day. Grant Darwin NT |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
On the Win cruncher all its pythons go zombie, so I don't let it do them; normal R4.2 is OK now. It will only let me have 29 at a time to trash. And I am already getting 4 hours `go away and don't be silly` time. nnnn poot. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,398,287 RAC: 19,677 |
"And I am already getting 4 hours `go away and don't be silly` time." That will happen every time Tasks error out (it could be anything from 3 min to well over 4 hours; the more that error out between Scheduler contacts, the larger the backoff tends to be). Grant Darwin NT |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
After getting all Computation Errors on all the WU's I set my PC to No New Tasks until you folks can figure out why Windows crashes with the "movingstub" Work units. Unless one of our experts here reaches out to his contact at UW, there is no one that will see these posts. So "you folks" is a pointless thing. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
24 movingstubs were in my queue. Aborted them instantly. Not going to up my error count for stupidness from their end. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
.clair., have you done all the different things we as a group have talked about for python? Downgrade BOINC and VBox, and check the virtualization setting on your motherboard? If python still dies on you after all that, that is weird. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Ricky - you're just getting the movingstubs garbage. Watch your queue and abort them as soon as you see them. They do not work. If you want to do VBox stuff, then you can run python tasks. 4.2 is a mishmash of stuff, but movingstubs is trash and rb_02_16_213037 has a bug. Also in python the aagb-PHE stuff is buggy. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I have noticed that in 4.2 the rb_02_16_213037..... has a bug. Also in python the aagb-PHE... stuff is buggy. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,398,287 RAC: 19,677 |
"24 movingstubs were in my queue. Aborted them instantly." Aborted Tasks count as errors. Grant Darwin NT |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
"Set cache setting to 10 days and try and trash as many of them as possible :-), until it gets backoffd" "Not that many (compared to how many there are to get through)." For at least some BOINC projects, if every computer that tries a workunit gives a computation error, BOINC decides that the workunit is defective and no longer counts any of the failures against the computers that tried it. |
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,531,104 RAC: 10,509 |
"I only run on one Linux and one Windows 7 atm and all the Windows error out within 20 seconds. All the Ubuntu ones were good until now. The 14 movingstubs units ran the requested 8 hours on my 3900X. I can see that a new team member successfully completed his on Linux, but all were 3 hours. W3670. CPU and OS related?" I've reported this too. Your W7, my W10 and someone else's W11 all fail while Ubuntu and Linux all run OK. Hopefully it means something to them. Tasks haven't been withdrawn yet (that I've noticed) and I've had no direct reply yet either. |
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,531,104 RAC: 10,509 |
After getting all Computation Errors on all the WU's I set my PC to No New Tasks until you folks can figure out why Windows crashes with the "movingstub" Work units. I'm not 100% on the ball at the moment, Greg, but I am reporting things within 24 hours of seeing a post about them here. Specifically I mean the movingstub tasks computation error, and now the update that they're running OK on Ubuntu/Linux but not on any version of Windows. I haven't got my head around what the specific issue is with rb tasks crashing out, but I do see people reporting problems with those too. If someone describes the rb issue to me in a way I can pass on, I'll follow up with that too. Is it much the same as movingstubs tasks or are they crashing on Ubuntu as well? Edit: "Also in python the aagb-PHE stuff is buggy" Just spotted your mention of this too. I'm away from my one PC running Python tasks for another few days, so I'll try to confirm that issue as well. |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Well, it's time for bed, so I have undone my movingtargets silliness. Got 500 of them in the bin. Though I did have a long-running one, it lasted a full 30 seconds. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,398,287 RAC: 19,677 |
Well, the project was down for a while there, but the movingstub crash and burns are still there since it came back online. Grant Darwin NT |
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
SO! MANY! ERRORS! First off are the known bad movingstub tasks that crash almost instantly. Then there are the broken rb_ tasks that don't crash as fast. rb_02_16_213031_208962_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_04_2906559_35_1 <core_client_version>7.16.20</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213031_208962_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213031_208962_ab_t000__robetta.zip -frag3 rb_02_16_213031_208962_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213031_208962_ab_t000__robetta.200.4mers.index.gz -fragB rb_02_16_213031_208962_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3477109 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> rb_02_16_213027_208958_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_08_2906555_55_1 <core_client_version>7.16.20</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213027_208958_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213027_208958_ab_t000__robetta.zip -frag3 rb_02_16_213027_208958_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213027_208958_ab_t000__robetta.200.8mers.index.gz -fragB rb_02_16_213027_208958_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3485351 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> rb_02_16_213026_208957_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_06_09_2906558_22_1 <core_client_version>7.16.20</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_02_16_213026_208957_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 3 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_02_16_213026_208957_ab_t000__robetta.zip -frag3 rb_02_16_213026_208957_ab_t000__robetta.200.3mers.index.gz -fragA rb_02_16_213026_208957_ab_t000__robetta.200.9mers.index.gz -fragB rb_02_16_213026_208957_ab_t000__robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3477932 Using database: database_357d5d93529_n_methylminirosetta_database [ ERROR ]: Caught exception: File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: -nan(ind) ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. </stderr_txt> ]]> Windows 11 |
6dj72cn8 Joined: 18 Apr 06 Posts: 5 Credit: 207,684 RAC: 0 |
"So it's going to take a while to clear them unless the project . . . puts in a flag with the Scheduler to only allocate them to LINUX systems." Linux or Mac. It's only Windows that's choking on them. |
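
The daily quota behaviour Grant describes in this thread (each error lowers the per-application limit by 1 until it bottoms out at 1 Task per 24 hours, and Valid work raises it again) can be sketched as a toy model. This is not BOINC source code: the class `DailyQuota`, the function `simulate`, the `ceiling` parameter, and in particular the doubling on valid results are all assumptions made for illustration. The thread only says that valid work "increases" the limit.

```python
from dataclasses import dataclass


@dataclass
class DailyQuota:
    """Toy model of a per-application daily task limit (hypothetical, not BOINC code)."""

    limit: int      # tasks currently allowed per day
    ceiling: int    # ASSUMED project-configured maximum
    floor: int = 1  # per the thread: never fewer than 1 Task per 24 hours

    def report_error(self) -> None:
        # Each errored (or aborted) task lowers the limit by one, down to the floor.
        self.limit = max(self.floor, self.limit - 1)

    def report_valid(self) -> None:
        # ASSUMPTION: a valid result doubles the limit, capped at the ceiling.
        # The thread only states that valid work increases it.
        self.limit = min(self.ceiling, self.limit * 2)


def simulate(start: int, ceiling: int, outcomes: str) -> int:
    """Apply outcomes ('E' = error, 'V' = valid) in order and return the final limit."""
    quota = DailyQuota(limit=start, ceiling=ceiling)
    for outcome in outcomes:
        if outcome == "E":
            quota.report_error()
        elif outcome == "V":
            quota.report_valid()
        else:
            raise ValueError(f"unknown outcome {outcome!r}")
    return quota.limit


if __name__ == "__main__":
    # Starting from the 29-task limit mentioned above, trash 500 tasks in a row.
    print(simulate(start=29, ceiling=100, outcomes="E" * 500))
```

Under this model, a host that deliberately errors out hundreds of tasks hits the floor after a few dozen of them, which fits the point made above that one host cannot clear many of the bad tasks this way.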
©2024 University of Washington
https://www.bakerlab.org