Message boards : Number crunching : Problems with Minirosetta v1.54
Author | Message |
---|---|
Gavin Shaw Joined: 1 Feb 07 Posts: 10 Credit: 506,456 RAC: 0 |
Got this one a day or so ago. Not sure if it is a failure/error. 224812655 Never surrender and never give up. In the darkest hour there is always hope. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi Mike. I did see somewhere that you said something about large upload file sizes; I think this is one that got away. ;) 99 models in 4 hrs 26 min and a result file of 8.32 MB. _CAPRI17_T39_2_.sjf_br_one_docking.protocol__6483_19318_1. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=205403421 pete. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Peter, the potential for large output files is why Mike changed it to exit after 99 models. That lets the task report back that it's running through models like candy and then they can weigh that before releasing more similar tasks. Rosetta Moderator: Mod.Sense |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
[quote]Peter, the potential for large output files is why Mike changed it to exit after 99 models. That lets the task report back that it's running through models like candy and then they can weigh that before releasing more similar tasks.[/quote] Hi. Just as well it did finish after 99; I would hate to see the file size after 12 or 24 hours! :) I just returned another one the same size. pete. |
LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 |
I've had a couple of these "Validate Errors" recently: Thanks for copying here - I thought it was just a problem with the validator (the error message being the clue). You're right, there's no "Done" section after the first model starts until the boinc_finish, which is odd, but no mention of the watchdog cutting in, even though it does run a long time. But on the 1.47 WU there are 3 models done, so I'm not entirely convinced it's the same thing. Usually long-running jobs get a default credit of 80, don't they? Looks like I missed out all ways. Oh well... |
falingtrea Joined: 8 Aug 07 Posts: 1 Credit: 1,981,304 RAC: 0 |
Just got this error trying to perform an update: 2/2/2009 10:05:58 AM|rosetta@home|Sending scheduler request: Requested by user 2/2/2009 10:05:58 AM|rosetta@home|(not requesting new work or reporting completed tasks) 2/2/2009 10:06:03 AM|rosetta@home|Scheduler RPC succeeded 2/2/2009 10:06:03 AM|rosetta@home|Message from server: Server error: can't attach shared memory 2/2/2009 10:06:03 AM|rosetta@home|Deferring communication for 1 hr 0 min 0 sec 2/2/2009 10:06:03 AM|rosetta@home|Reason: project is down Server is up according to the webpage. One task was updated as complete. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
There is something odd going on with the graphics of lr5_D_score12_rlbd_2hsh_IGNORE_THE_REST_DECOY_6246_424_0. The plot disappears completely at times, and so does the accepted energy; then they reappear. It all seems to depend on the energy value of the moment. As far as I know this is not normal. |
TeAm Enterprise Joined: 28 Sep 05 Posts: 18 Credit: 27,911,735 RAC: 42 |
I keep running out of work on my Q6600. I think I found out why ... The BOINC task manager shows that a WU will take 100+ hours of CPU time instead of the <3 hours it actually takes. Thus the manager asks for 400,000 seconds of work and gets one WU. For some reason, when a core finishes a WU and asks for another, it gets nothing for a long time and sits idle. Right now 3 cores are running: Core 1 is 82.4% done at 2 hr 28 min with 1 hr 47 min left; Core 2 is 34.1% done at 1 hr 4 min with 72 hr 3 min left; Core 3 is 15.5% done at 0 hr 34 min with 150 hr 48 min left. Core 4 is idle. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
[quote]I keep running out of work on my Q6600. I think I found out why ... Boinc task manager shows that a WU will take 100+ hours of CPU time instead of the <3 hours it actually takes. Thus the manager asks for 400,000 seconds of work and gets one WU.[/quote] What version of BOINC Manager are you using? It looked like you were using 5.10.45, which is quite old. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,418 RAC: 1,328 |
[quote]I keep running out of work on my Q6600. I think I found out why ... Boinc task manager shows that a WU will take 100+ hours of CPU time instead of the <3 hours it actually takes. Thus the manager asks for 400,000 seconds of work and gets one WU.[/quote] Sounds like your Duration Correction Factor is waaaay off. You can wait for BOINC to fix it, and it will, all by itself. Or you can shut down BOINC and manually edit the file client_state.xml. Do this through Notepad or whatever, and go down until you find this project and the line that says: `<duration_correction_factor>0.705998</duration_correction_factor>` That is a copy of my line; yours will have a number like 70.?????????? or whatever. Change it to 1.000000, save the file and then restart BOINC. BOINC will then use the new number to both get new work and recalculate how long it will take to crunch your existing units. If you right-click on the file name, the EDIT option is listed; use that to edit the file. Do NOT change anything else in the file. The line you are looking for is near the top; mine was only 65 lines down from the top. |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Compute error, though it looks more like a zip error ...
Not sure what to make of this error ... happened on the Mac Pro ... |
TeAm Enterprise Joined: 28 Sep 05 Posts: 18 Credit: 27,911,735 RAC: 42 |
Using version 6.4.5, which I downloaded and installed about 6 days ago. [quote]what version of boinc manager are you using? it looked like you were using 5.10.45 which is quite old.[/quote] |
TeAm Enterprise Joined: 28 Sep 05 Posts: 18 Credit: 27,911,735 RAC: 42 |
That fixed it! Thanks, my duration was set at 55+. [quote]I keep running out of work on my Q6600. I think I found out why ... Boinc task manager shows that a WU will take 100+ hours of CPU time instead of the <3 hours it actually takes. Thus the manager asks for 400,000 seconds of work and gets one WU.[/quote] |
cenit Joined: 1 Apr 07 Posts: 13 Credit: 1,630,287 RAC: 0 |
[quote]Just got this error trying to perform an update:[/quote] You have to wait and it will correct itself. Maybe it has been a long time since your last Rosetta WU... during this time the project changed its web address, so BOINC needs to re-fetch the master file. Leave it alone and within 24 hours at most it will re-download it and resume working! |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,418 RAC: 1,328 |
[quote]That fixed it! Thanks, my duration was set at 55+.[/quote] Yeah, for some reason this has happened A LOT lately. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
lr6_E_score12_rlbd_1e6i_IGNORE_THE_REST_DECOY_6254_236_1 ERROR:: Exit from: `..\..\src\protocols\checkpoint\CheckPointer.cc` line: 87 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
lr5_D_hybrid_rlbd_1bmg_IGNORE_THE_REST_DECOY_6250_424_0 Initializing options.... ok ERROR: Option file open failed for: relax_options_lr5_D_hybrid_mtyka |
KC0ISW Joined: 28 Sep 05 Posts: 2 Credit: 58,926 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=226103545 |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,418 RAC: 1,328 |
[quote]mikey, whatever the problem is, it stands a good chance of clearing itself when the next Rosetta version comes out. The .exe will be a different name afterall. So, please monitor the new release thread and give another try at that time.[/quote] I just had a thought... NOT dangerous this time. I am off for a few days here, and I have finally figured out how to make Ubuntu Linux work for me and crunch BOINC projects too. I will try switching one of the machines that won't download the Windows app to Linux and see if that works. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,418 RAC: 1,328 |
[quote]mikey, whatever the problem is, it stands a good chance of clearing itself when the next Rosetta version comes out. The .exe will be a different name afterall. So, please monitor the new release thread and give another try at that time.[/quote] Guess what... NO PROBLEM, it is crunching just fine. Here is the PC: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1001897 It is on its first unit, so no results yet, but one unit is crunching just fine so far!! |
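
For anyone who would rather script the Duration Correction Factor fix that mikey describes above than edit client_state.xml by hand, here is a minimal Python sketch. It is only an illustration under a few assumptions: that each project sits in a `<project>` block containing a `<master_url>` and a `<duration_correction_factor>` element, and that the file parses as ordinary XML. The function name `reset_dcf` and the sample XML are made up for this example; a real client_state.xml holds far more than this. As with the manual edit, BOINC should be shut down first, and keeping a backup copy of the file is a sensible precaution.

```python
import xml.etree.ElementTree as ET


def reset_dcf(xml_text, project_url_part, new_value=1.0):
    """Reset the duration correction factor for projects whose master URL
    contains project_url_part. Returns (new_xml_text, old_values).

    Hypothetical helper for illustration; not part of BOINC.
    """
    root = ET.fromstring(xml_text)
    old_values = []
    for project in root.iter("project"):
        url = project.findtext("master_url", default="")
        if project_url_part not in url:
            continue  # leave every other project untouched
        dcf = project.find("duration_correction_factor")
        if dcf is None:
            continue  # nothing to reset in this block
        old_values.append(float(dcf.text))
        dcf.text = f"{new_value:.6f}"
    return ET.tostring(root, encoding="unicode"), old_values


# Simplified, made-up stand-in for a client_state.xml (assumed layout).
SAMPLE = """<client_state>
<project>
    <master_url>https://boinc.bakerlab.org/rosetta/</master_url>
    <duration_correction_factor>55.250000</duration_correction_factor>
</project>
<project>
    <master_url>https://example.org/other/</master_url>
    <duration_correction_factor>0.705998</duration_correction_factor>
</project>
</client_state>"""

if __name__ == "__main__":
    new_xml, old = reset_dcf(SAMPLE, "bakerlab.org")
    print("old values:", old)
    print(new_xml)
```

Matching on the master URL keeps the change limited to one project, in line with the advice not to change anything else in the file. Rewriting the real file through an XML library may alter its whitespace, so the hand edit remains the more conservative route.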
©2024 University of Washington
https://www.bakerlab.org