Problems with Minirosetta v1.54

Gavin Shaw Joined: 1 Feb 07 Posts: 10 Credit: 506,456 RAC: 0
Message 59228 - Posted: 1 Feb 2009, 23:07:15 UTC
Got this one a day or so ago. Not sure if it is a failure/error. 224812655
Never surrender and never give up. In the darkest hour there is always hope.

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0
Message 59233 - Posted: 2 Feb 2009, 4:19:31 UTC
Hi Mike. I did see somewhere that you said something about large upload file sizes; I think this is one that got away. ;) 99 models in 4 hr 26 min and a result file of 8.32 MB.
_CAPRI17_T39_2_.sjf_br_one_docking.protocol__6483_19318_1
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=205403421
pete.

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0
Message 59236 - Posted: 2 Feb 2009, 5:30:33 UTC Last modified: 2 Feb 2009, 5:35:16 UTC
Peter, the potential for large output files is why Mike changed it to exit after 99 models. That lets the task report back that it's running through models like candy and then they can weigh that before releasing more similar tasks.
Rosetta Moderator: Mod.Sense

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0
Message 59237 - Posted: 2 Feb 2009, 6:00:42 UTC - in response to Message 59236. Last modified: 2 Feb 2009, 6:02:21 UTC
Hi. Just as well it did finish after 99; I would hate to see the file size after 12 or 24 hours! :) I just returned another one the same size.
pete.

LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0
Message 59239 - Posted: 2 Feb 2009, 13:39:00 UTC - in response to Message 59238.
[quote][quote]I've had a couple of these "Validate Errors" recently: Mini 1.47 Task 223871308 and Mini 1.54 Task 224694361. Both ended with exit code: 0 (0x0) and seemed to run successfully, but with no credit given. Is it something I did, a bug or just one of those things?[/quote]
I only checked the 1.54-task. You have a runtime-preference of 3 hours. This one ran for 7 hours, no finished models. I'd say the watchdog, which aborts tasks running longer than intended, cut in. This is one for the long-running tasks thread.[/quote]
Thanks for copying here - I thought it was just a problem with the validator (the error message being the clue). You're right, there's no "Done" section after the first model starts until the boinc_finish, which is odd, but no mention of the watchdog cutting in, even though it does run a long time. But on the 1.47 WU there are 3 models done, so I'm not entirely convinced it's the same thing. Usually long-running jobs get a default credit of 80, don't they? Looks like I missed out all ways. Oh well...

falingtrea Joined: 8 Aug 07 Posts: 1 Credit: 1,981,304 RAC: 0
Message 59240 - Posted: 2 Feb 2009, 16:25:44 UTC
Just got this error trying to perform an update:
2/2/2009 10:05:58 AM|rosetta@home|Sending scheduler request: Requested by user
2/2/2009 10:05:58 AM|rosetta@home|(not requesting new work or reporting completed tasks)
2/2/2009 10:06:03 AM|rosetta@home|Scheduler RPC succeeded
2/2/2009 10:06:03 AM|rosetta@home|Message from server: Server error: can't attach shared memory
2/2/2009 10:06:03 AM|rosetta@home|Deferring communication for 1 hr 0 min 0 sec
2/2/2009 10:06:03 AM|rosetta@home|Reason: project is down
Server is up according to the webpage. One task was updated as complete.

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0
Message 59245 - Posted: 2 Feb 2009, 22:02:36 UTC
There is something odd going on with the graphics of lr5_D_score12_rlbd_2hsh_IGNORE_THE_REST_DECOY_6246_424_0. The plot disappears completely at times, and so does the accepted energy. Then they reappear. It all seems to depend on the energy value of the moment. As far as I know this is not normal.

TeAm Enterprise Joined: 28 Sep 05 Posts: 18 Credit: 27,915,558 RAC: 0
Message 59249 - Posted: 3 Feb 2009, 3:44:56 UTC
I keep running out of work on my Q6600. I think I found out why ... Boinc task manager shows that a WU will take 100+ hours of CPU time instead of the <3 hours it actually takes. Thus the manager asks for 400,000 seconds of work and gets one WU. For some reason when a core finishes a WU and asks for another it gets nothing for a long time, thus sitting idle. Right now 3 cores are running:
Core 1 is 82.4% done at 2 hr 28 min with 1 hr 47 min left
Core 2 is 34.1% done at 1 hr 4 min with 72 hr 3 min left
Core 3 is 15.5% done at 0 hr 34 min with 150 hr 48 min left
Core 4 is idle.

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0
Message 59253 - Posted: 3 Feb 2009, 10:11:54 UTC - in response to Message 59249.
What version of Boinc manager are you using? It looked like you were using 5.10.45, which is quite old.

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 14
Message 59254 - Posted: 3 Feb 2009, 12:41:38 UTC - in response to Message 59249.
Sounds like your Duration Correction Factor is waaaay off. You can wait for Boinc to fix it, and it will, all by itself. Or you can shut down Boinc and manually edit the file client_state.xml. Do this in Notepad or whatever, and go down until you find this project and the line that says:
<duration_correction_factor>0.705998</duration_correction_factor>
That is a copy of my line; yours will have a number like 70.?????????? or whatever. Change it to 1.000000, save the file and then restart Boinc. Boinc will then use the new number to both get new work and recalculate how long it will take to crunch your existing units. If you right-click on the file name, the EDIT option is listed; use that to edit the file. Do NOT change anything else in the file. The line you are looking for is near the top; mine was only 65 lines down from the top.
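
For anyone who would rather script the edit described above than do it by hand, here is a minimal Python sketch. It works on a string, not on the live file, and it assumes each project sits in its own <project> block that contains a <master_url> line; the sample text, the 55.123456 value and the function name reset_dcf are made up for illustration. BOINC should still be shut down before a real client_state.xml is touched, and a backup copy is a good idea.

```python
import re

# Matches the value inside a duration_correction_factor element.
DCF_RE = re.compile(
    r"(<duration_correction_factor>)[^<]*(</duration_correction_factor>)"
)
# Matches one whole <project> ... </project> block (non-greedy).
PROJECT_RE = re.compile(r"<project>.*?</project>", re.DOTALL)


def reset_dcf(state_text, project_marker, new_value=1.0):
    """Return state_text with the DCF reset, but only inside the
    <project> block that contains project_marker (hypothetical helper)."""

    def fix_block(match):
        block = match.group(0)
        if project_marker not in block:
            return block  # leave other projects untouched
        return DCF_RE.sub(
            lambda m: f"{m.group(1)}{new_value:.6f}{m.group(2)}", block
        )

    return PROJECT_RE.sub(fix_block, state_text)


# Made-up sample that imitates the relevant part of client_state.xml.
SAMPLE = """<client_state>
<project>
    <master_url>http://example.org/other/</master_url>
    <duration_correction_factor>0.705998</duration_correction_factor>
</project>
<project>
    <master_url>http://boinc.bakerlab.org/rosetta/</master_url>
    <duration_correction_factor>55.123456</duration_correction_factor>
</project>
</client_state>"""

patched = reset_dcf(SAMPLE, "bakerlab.org/rosetta")
print(patched)
```

Only the block mentioning the chosen project is changed, which matches the advice not to change anything else in the file.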

Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0
Message 59256 - Posted: 3 Feb 2009, 12:49:09 UTC
Compute error, though it looks more like a zip error ...
process exited with code 1 (0x1, -255)
Watchdog active.
Hbond tripped.
ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
Not sure what to make of this error ... happened on the Mac Pro ...

TeAm Enterprise Joined: 28 Sep 05 Posts: 18 Credit: 27,915,558 RAC: 0
Message 59258 - Posted: 3 Feb 2009, 15:06:00 UTC - in response to Message 59253.
Using version 6.4.5, which I downloaded and installed about 6 days ago.

TeAm Enterprise Joined: 28 Sep 05 Posts: 18 Credit: 27,915,558 RAC: 0
Message 59259 - Posted: 3 Feb 2009, 15:11:39 UTC - in response to Message 59254.
That fixed it! Thanks, my duration correction factor was set at 55+.
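
A rough sanity check on those numbers, as a small Python sketch. It assumes the client simply multiplies its raw runtime estimate by the duration correction factor; that is a simplification for illustration, not a statement of the exact scheduler logic, and the function name is made up.

```python
def scaled_estimate_hours(raw_estimate_hours, dcf):
    """Estimated runtime after applying a duration correction factor.
    Assumption: estimate = raw estimate * DCF."""
    return raw_estimate_hours * dcf


actual_hours = 3.0   # "<3 hours it actually takes" (upper bound from the post)
bad_dcf = 55.0       # "set at 55+"

inflated = scaled_estimate_hours(actual_hours, bad_dcf)
print(inflated)                  # 165.0 hours, in line with "100+ hours"
print(round(400_000 / 3600, 1))  # 111.1 hours of work requested

# With the factor reset to 1.0, the estimate matches the real runtime again.
print(scaled_estimate_hours(actual_hours, 1.0))  # 3.0
```

Under that assumption a single task already looks longer than the whole 400,000-second request, which would explain why only one WU was sent at a time.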

cenit Joined: 1 Apr 07 Posts: 13 Credit: 1,630,287 RAC: 0
Message 59261 - Posted: 3 Feb 2009, 17:12:07 UTC - in response to Message 59240.
You have to wait and it will correct itself. Maybe it has been a long time since your last Rosetta WU... During that time the project changed its web address, so Boinc needs to re-fetch the master file. Leave it alone and within 24 hours at most it will re-download it and resume working!

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 14
Message 59266 - Posted: 3 Feb 2009, 23:48:58 UTC - in response to Message 59259.
Yeah, for some reason this has happened A LOT lately.

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0
Message 59273 - Posted: 4 Feb 2009, 9:08:55 UTC Last modified: 4 Feb 2009, 9:09:38 UTC
lr6_E_score12_rlbd_1e6i_IGNORE_THE_REST_DECOY_6254_236_1
ERROR:: Exit from: ..\..\src\protocols\checkpoint\CheckPointer.cc line: 87
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0
Message 59274 - Posted: 4 Feb 2009, 9:11:24 UTC
lr5_D_hybrid_rlbd_1bmg_IGNORE_THE_REST_DECOY_6250_424_0
Initializing options.... ok
ERROR: Option file open failed for: relax_options_lr5_D_hybrid_mtyka

KC0ISW Joined: 28 Sep 05 Posts: 2 Credit: 58,926 RAC: 0
Message 59278 - Posted: 4 Feb 2009, 12:43:32 UTC - in response to Message 59274.
https://boinc.bakerlab.org/rosetta/result.php?resultid=226103545

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 14
Message 59331 - Posted: 5 Feb 2009, 13:40:19 UTC - in response to Message 59180.
[quote][quote]mikey, whatever the problem is, it stands a good chance of clearing itself when the next Rosetta version comes out. The .exe will be a different name after all. So, please monitor the new release thread and give another try at that time.[/quote]
I will, I like the premise of Rosetta and that is what brought me here in the first place. I will certainly try again in the future, probably when you put out a new version as you suggest. Thanks for all your help.[/quote]
I just had a thought...NOT dangerous this time. I am off for a few days here and I have finally figured out how to make Ubuntu Linux work for me and crunch Boinc projects too. I will try switching one of the machines that won't download the Windows app to Linux and see if that works.

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 14
Message 59355 - Posted: 5 Feb 2009, 19:00:02 UTC - in response to Message 59331.
Guess what...............NO PROBLEM, it is crunching just fine. Here is the PC:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1001897
It is on its first unit, so no results yet, but one unit is crunching just fine so far!!