Message boards : Number crunching : minirosetta 2.17
Author | Message |
---|---|
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Mad Max, no, the only other mechanism that might end a task early is if it restarts 5 times without having made any progress (i.e. no checkpoint reached in any of the 5 starts). So if you are starting and ending BOINC, and rebooting your machine several times applying patches etc. then you can see basically all of the work in process end this way. But that number 5 is up high enough that even then it's pretty rarely the cause. I've seen such tasks ending early as well. I've been unable to isolate a cause. Could you ask a few questions of the person reporting the problem? (I'm assuming you are translating... thank you). Are they running more projects than just R@h? Do the tasks seem to end just after work for another project is being started? Looks like 4GB memory there on a 4 CPU machine, is BOINC running on all 4 CPUs? How much memory is BOINC allowed? Is the machine running BOINC 24x7? Or is it rebooted each day or is BOINC's run hours limited by the user preferences? Do the tasks that were only partially completed when BOINC is exited seem to be ending within the first few minutes of BOINC starting again the next time? These are just based on some of what I'm thinking I'm seeing personally. Hoping that with some additional perspective perhaps I can nail down some specific patterns for the Project Team to investigate. The ultimate would be if one could define a series of steps to follow that would CAUSE a task to end prematurely. I've been unable to do it intentionally. Rosetta Moderator: Mod.Sense |
EvoDude Send message Joined: 6 Nov 05 Posts: 21 Credit: 52,425 RAC: 0 |
Just come back to Rosetta to stretch the legs on my new I7 but I'm getting continuous 'Download failed' when BOINC Manager gets work. The message log on the recent one is:
09/01/2011 03:51:11 rosetta@home Sending scheduler request: To fetch work.
09/01/2011 03:51:11 rosetta@home Requesting new tasks for CPU
09/01/2011 03:51:13 rosetta@home Scheduler request completed: got 1 new tasks
09/01/2011 03:51:15 rosetta@home Started download of minirosetta_2.17_windows_x86_64.exe
09/01/2011 03:51:15 rosetta@home Started download of minirosetta_graphics_1.92_windows_x86_64.exe
09/01/2011 03:51:16 rosetta@home Giving up on download of minirosetta_2.17_windows_x86_64.exe: file not found
09/01/2011 03:51:16 rosetta@home Giving up on download of minirosetta_graphics_1.92_windows_x86_64.exe: file not found
09/01/2011 03:51:16 rosetta@home Started download of Helvetica.txf
09/01/2011 03:51:16 rosetta@home Started download of minirosetta_database_rev39052.zip
09/01/2011 03:51:17 rosetta@home Giving up on download of Helvetica.txf: file not found
09/01/2011 03:51:17 rosetta@home Giving up on download of minirosetta_database_rev39052.zip: file not found
09/01/2011 03:51:17 rosetta@home Started download of 1poh.aahelix03_05.200_v1_3.gz
09/01/2011 03:51:17 rosetta@home Started download of 1poh.aahelix09_05.200_v1_3.gz
09/01/2011 03:51:24 rosetta@home Finished download of 1poh.aahelix03_05.200_v1_3.gz
09/01/2011 03:51:24 rosetta@home Started download of 1poh.native.pdb
09/01/2011 03:51:26 rosetta@home Finished download of 1poh.aahelix09_05.200_v1_3.gz
09/01/2011 03:51:26 rosetta@home Finished download of 1poh.native.pdb
09/01/2011 03:51:26 rosetta@home Started download of helix.psipred_ss2.gz
09/01/2011 03:51:27 rosetta@home Finished download of helix.psipred_ss2.gz
Any ideas guys and is anyone else getting this? |
The_Bad_Penguin Send message Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
I have two wu's with "compute errors" that each ran for over 80 hours of cpu time, despite what the logs may otherwise claim... resultid=390949677 resultid=391015050 Defeat Censorship! Wikileaks needs OUR help! Learn how you can help (d/l 'insurance' file), by clicking here. "Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech" B. Franklin |
UBT - Rick Horn Send message Joined: 17 Dec 05 Posts: 7 Credit: 283,961 RAC: 0 |
I'm getting "download failed" messages with all my WUs also. |
FreierFriese Send message Joined: 31 Aug 10 Posts: 1 Credit: 4,159 RAC: 0 |
Hi there, I can't upload my results at the moment, although the status-site tells me everything is ok. After 1-2 seconds the upload stops and BOINC tells me "Project file upload handler is missing". Link to the WU: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=357863331 Anyone knows if it's my BOINC-client or Rosetta who doesn't let me upload things? Greetings from Germany. J. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Anyone knows if it's my BOINC-client or Rosetta who doesn't let me upload things? The Rosetta file server had a major crash the other day and other users are reporting similar problems. |
Randy Proctor Send message Joined: 24 Mar 10 Posts: 4 Credit: 600,055 RAC: 0 |
I have about 30 jobs that need to upload but I keep getting failed uploads....not sure if this is due to the server crash from earlier....any thoughts? Just a sample of the messages....
Sun Jan 9 18:04:23 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 18:05:33 2011 rosetta@home Temporarily failed upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0: HTTP error
Sun Jan 9 18:05:33 2011 rosetta@home Backing off 1 hr 18 min 36 sec on upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0
Sun Jan 9 18:05:33 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0: HTTP error
Sun Jan 9 18:05:33 2011 rosetta@home Backing off 2 hr 5 min 25 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 20:58:59 2011 rosetta@home Started upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0
Sun Jan 9 20:58:59 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 21:00:01 2011 rosetta@home Temporarily failed upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0: HTTP error
Sun Jan 9 21:00:01 2011 rosetta@home Backing off 57 min 8 sec on upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0
Sun Jan 9 21:00:01 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0: HTTP error
Sun Jan 9 21:00:01 2011 rosetta@home Backing off 3 hr 1 min 16 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 21:50:34 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0
Sun Jan 9 21:50:34 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0
Sun Jan 9 21:50:36 2011 rosetta@home [error] Error reported by file upload server: [1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0] locked by file_upload_handler PID=-1
Sun Jan 9 21:50:36 2011 rosetta@home [error] Error reported by file upload server: [1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0] locked by file_upload_handler PID=-1
Sun Jan 9 21:50:36 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0: transient upload error
Sun Jan 9 21:50:36 2011 rosetta@home Backing off 2 hr 39 min 24 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0
Sun Jan 9 21:50:36 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0: transient upload error
Sun Jan 9 21:50:36 2011 rosetta@home Backing off 2 hr 41 min 3 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0 |
The_Bad_Penguin Send message Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
I now have another task that has 54+ hours of cpu time, 22% Progress... t476_boinc_nmr_cm_rnd1_cs_frags_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_22837_909_0 Anyone wanna bet this will also ultimately die with a "compute error" ? I have two wu's with "compute errors" that each ran for over 80 hours of cpu time, despite what the logs may otherwise claim... Defeat Censorship! Wikileaks needs OUR help! Learn how you can help (d/l 'insurance' file), by clicking here. "Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech" B. Franklin |
Darmok Send message Joined: 4 Sep 09 Posts: 6 Credit: 231,572 RAC: 0 |
I have a problem which never occurred to me before. Just prior to the crash, I was rejoining R@H (bad timing) and received a bunch of WU. Boinc Manager miscalculated and I will have many of them which won't make it to the report deadline. Can someone tell me what will happen and what I should do, if anything. Models will start to run in High Priority, which I want to avoid. Thanks |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I have a problem which never occurred to me before. Just prior to the crash, I was rejoining R@H (bad timing) and received a bunch of WU. Boinc Manager miscalculated and I will have many of them which won't make it to the report deadline. Can someone tell me what will happen and what I should do, if anything. Models will start to run in High Priority, which I want to avoid. Thanks If I understand you correctly, you have more work than you can complete before the deadline. Aborting a few of the tasks that have not started yet is generally the best thing to do. Perhaps reduce the number of days of work you configure to have on hand to avoid getting too much again before BOINC learns how long the tasks are taking to process. Rosetta Moderator: Mod.Sense |
Darmok Send message Joined: 4 Sep 09 Posts: 6 Credit: 231,572 RAC: 0 |
That's exactly what happened and I had forgotten to reduce the work buffer prior to connection, which is not an issue with other projects, but is with R@H because of the short completion time provided/required. Too bad for aborting all those WU's though. Thanks Mod. |
Randy Proctor Send message Joined: 24 Mar 10 Posts: 4 Credit: 600,055 RAC: 0 |
Why would I be receiving this message? Mon Jan 10 16:59:55 2011 rosetta@home Message from server: platform 'x86_64-apple-darwin' not found I did do an OS update to Snow Leopard 10.6.6 but would that cause a problem? I have 40 tasks that are done and need uploaded and the deadline is in 2 days. Thanks for any help. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Why would I be receiving this message? The file server crashed last week and the Rosetta staff are having problems rebuilding it. Many users are reporting a series of different error messages. In terms of your uploads and the deadlines it is best not to worry. Normally work is valuable to the scientists even when it comes in late. The best thing you can do is keep an eye on these forums or the home page and wait to see if the Rosetta staff provide further advice. |
Randy Proctor Send message Joined: 24 Mar 10 Posts: 4 Credit: 600,055 RAC: 0 |
I had some things upload and not others....honestly I don't care much about this credit stuff, I'm in it for the science, because if success can be found with influenza maybe we can successfully fight worse things (not that I take influenza lightly cause it's a nasty bug). As long as the scientists get usable results I'm good. My main concern is I can't get any more work to continue to help. |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,143,328 RAC: 1,511 |
Why would I be receiving this message? Certainly not. When I added my iMac while Rosetta was working fine, so was the Mac client (running on a 2007/24", so Intel x86_64 as well). I did the update to 10.6.6 just the other day when the fit had already hit the shan... Now, since the project went belly up the last time, I have a mix of WU showing ready to report and uploading, and going by the stats at least, a couple of WU must have been properly credited. Not only on the Mac, but on various Windows machines as well. In all, this is one great mess, lots of WU just stuck at uploading across the board, some that are being uploaded apparently seem to be stuck in increasing numbers as "credit pending" and some WU just seem to go back and forth just fine. Well, as far as I am concerned, I haven't detached from the project (yet), but WCG gets my priority as far as my resources are concerned... Ralf |
The_Bad_Penguin Send message Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
a bit over 77 hours, expecting it to crap out soon... I now have another task that has 54+ hours of cpu time, 22% Progress... Defeat Censorship! Wikileaks needs OUR help! Learn how you can help (d/l 'insurance' file), by clicking here. "Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech" B. Franklin |
spamhasser Send message Joined: 12 Nov 10 Posts: 2 Credit: 308 RAC: 0 |
Since the last rosetta-crash i can't upload anything to rosetta's server. Mi 12 Jan 2011 02:43:24 CET rosetta@home Project file upload handler is missing The upload is stuck after 2,26% (0,31 of 13,48 kB) Is there anything i can do to get it done? If "waiting for someone solving the problem on rosetta's end" is the solution then i will shut up and wait a bit (or a byte if the duration is higher ;-) ) But if i have to do anything else than wait (e.g. resetting rosetta), let me know :) (It is still winter here and my CPU should not get cold) Thx in advance |
Ian Send message Joined: 22 Apr 09 Posts: 1 Credit: 459,642 RAC: 0 |
I have the same error on all my completed work units "Project file upload handler is missing". What should I do? |
EvoDude Send message Joined: 6 Nov 05 Posts: 21 Credit: 52,425 RAC: 0 |
Isn't it a shame no-one from the project can be bothered responding to this. Makes you feel as if no-one cares. Very disappointed. |
banditwolf Send message Joined: 10 Jan 06 Posts: 28 Credit: 139,737 RAC: 0 |
I have 2 of 3 that say compute error when they don't show any signs of having problems. |
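For anyone trying to picture the early-end rule Mod.Sense describes at the top of this page (a task ends if it is started 5 times without reaching a checkpoint in any of those starts), here is a minimal sketch of that bookkeeping. It is only an illustration of the rule as worded in the post, not the actual BOINC or minirosetta code; the `Task` class, its method names, and the choice to reset the counter on every checkpoint are assumptions made for the example.

```python
from dataclasses import dataclass

# The "5" comes from the moderator's description; everything else is hypothetical.
MAX_STARTS_WITHOUT_CHECKPOINT = 5


@dataclass
class Task:
    """Hypothetical model of one task's restart bookkeeping."""
    name: str
    starts_without_checkpoint: int = 0
    ended_early: bool = False

    def on_start(self) -> None:
        # Called each time the task is (re)started, e.g. after BOINC
        # is exited and launched again or the machine is rebooted.
        if self.ended_early:
            return
        self.starts_without_checkpoint += 1
        if self.starts_without_checkpoint >= MAX_STARTS_WITHOUT_CHECKPOINT:
            # Five starts in a row with no progress: give up on the task.
            self.ended_early = True

    def on_checkpoint(self) -> None:
        # Assumption: reaching a checkpoint counts as progress and
        # clears the count of fruitless starts.
        self.starts_without_checkpoint = 0


if __name__ == "__main__":
    task = Task("example_task")
    for _ in range(4):
        task.on_start()
    print(task.ended_early)  # four fruitless starts: still alive
    task.on_start()
    print(task.ended_early)  # fifth fruitless start: ended early
```

Under these assumptions a task that checkpoints even once between restarts never hits the limit, which fits the remark that this is pretty rarely the cause.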
©2024 University of Washington
https://www.bakerlab.org