minirosetta 2.17

Message boards : Number crunching : minirosetta 2.17



Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68828 - Posted: 17 Dec 2010, 22:43:44 UTC

Mad Max, no. The only other mechanism that might end a task early is if it restarts 5 times without having made any progress (i.e. no checkpoint reached in any of the 5 starts). So if you are starting and stopping BOINC, and rebooting your machine several times while applying patches etc., you can see basically all of the work in progress end this way. But that limit of 5 is high enough that even then it's pretty rarely the cause.
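
As a rough illustration of that rule (a sketch only: the function name, the one-boolean-per-start history, and the assumption that the 5 starts must be consecutive are all mine, not taken from the minirosetta or BOINC source):

```python
def should_end_early(checkpoint_reached: list[bool], limit: int = 5) -> bool:
    """Return True once `limit` consecutive starts ended without a checkpoint.

    checkpoint_reached[i] is True if the i-th start of the task got as far
    as writing a checkpoint before it was stopped (hypothetical bookkeeping).
    """
    streak = 0
    for reached in checkpoint_reached:
        # Any checkpoint counts as progress and resets the streak.
        streak = 0 if reached else streak + 1
        if streak >= limit:
            return True
    return False


# Four fruitless starts are tolerated; the fifth in a row trips the rule.
print(should_end_early([False] * 4))
print(should_end_early([False] * 5))
```

Under these assumptions a single checkpoint anywhere in the run history resets the count, which would explain why the limit is rarely hit outside of repeated reboots.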

I've seen such tasks ending early as well, and I've been unable to isolate a cause. Could you ask the person reporting the problem a few questions? (I'm assuming you are translating... thank you.) Are they running more projects than just R@h? Do the tasks seem to end just after work for another project is started? It looks like 4GB of memory on a 4-CPU machine; is BOINC running on all 4 CPUs? How much memory is BOINC allowed? Is the machine running BOINC 24x7, or is it rebooted each day, or are BOINC's run hours limited by the user preferences? Do the tasks that were only partially completed when BOINC exited seem to end within the first few minutes of BOINC starting again?

These are just based on some of what I'm thinking I'm seeing personally. Hoping that with some additional perspective perhaps I can nail down some specific patterns for the Project Team to investigate. The ultimate would be if one could define a series of steps to follow that would CAUSE a task to end prematurely. I've been unable to do it intentionally.
Rosetta Moderator: Mod.Sense
EvoDude
Joined: 6 Nov 05
Posts: 21
Credit: 52,425
RAC: 0
Message 69054 - Posted: 9 Jan 2011, 3:58:19 UTC

Just come back to Rosetta to stretch the legs on my new i7, but I'm getting continuous 'Download failed' errors when BOINC Manager gets work. The message log for the most recent one is:

09/01/2011 03:51:11 rosetta@home Sending scheduler request: To fetch work.
09/01/2011 03:51:11 rosetta@home Requesting new tasks for CPU
09/01/2011 03:51:13 rosetta@home Scheduler request completed: got 1 new tasks
09/01/2011 03:51:15 rosetta@home Started download of minirosetta_2.17_windows_x86_64.exe
09/01/2011 03:51:15 rosetta@home Started download of minirosetta_graphics_1.92_windows_x86_64.exe
09/01/2011 03:51:16 rosetta@home Giving up on download of minirosetta_2.17_windows_x86_64.exe: file not found
09/01/2011 03:51:16 rosetta@home Giving up on download of minirosetta_graphics_1.92_windows_x86_64.exe: file not found
09/01/2011 03:51:16 rosetta@home Started download of Helvetica.txf
09/01/2011 03:51:16 rosetta@home Started download of minirosetta_database_rev39052.zip
09/01/2011 03:51:17 rosetta@home Giving up on download of Helvetica.txf: file not found
09/01/2011 03:51:17 rosetta@home Giving up on download of minirosetta_database_rev39052.zip: file not found
09/01/2011 03:51:17 rosetta@home Started download of 1poh.aahelix03_05.200_v1_3.gz
09/01/2011 03:51:17 rosetta@home Started download of 1poh.aahelix09_05.200_v1_3.gz
09/01/2011 03:51:24 rosetta@home Finished download of 1poh.aahelix03_05.200_v1_3.gz
09/01/2011 03:51:24 rosetta@home Started download of 1poh.native.pdb
09/01/2011 03:51:26 rosetta@home Finished download of 1poh.aahelix09_05.200_v1_3.gz
09/01/2011 03:51:26 rosetta@home Finished download of 1poh.native.pdb
09/01/2011 03:51:26 rosetta@home Started download of helix.psipred_ss2.gz
09/01/2011 03:51:27 rosetta@home Finished download of helix.psipred_ss2.gz


Any ideas guys and is anyone else getting this?
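
For anyone wanting to pull the failed files out of a longer message log, here is a minimal sketch. It assumes the log lines look exactly like the ones pasted above ("Giving up on download of NAME: REASON"); the function name and regular expression are my own and not part of BOINC.

```python
import re

# Matches e.g. "... Giving up on download of Helvetica.txf: file not found"
GIVE_UP = re.compile(r"Giving up on download of (?P<name>\S+): (?P<reason>.+)$")


def failed_downloads(log_text: str) -> list[tuple[str, str]]:
    """Return (file name, reason) for every abandoned download in the log."""
    failures = []
    for line in log_text.splitlines():
        match = GIVE_UP.search(line)
        if match:
            failures.append((match.group("name"), match.group("reason").strip()))
    return failures


sample = """\
09/01/2011 03:51:15 rosetta@home Started download of minirosetta_2.17_windows_x86_64.exe
09/01/2011 03:51:16 rosetta@home Giving up on download of minirosetta_2.17_windows_x86_64.exe: file not found
09/01/2011 03:51:17 rosetta@home Giving up on download of Helvetica.txf: file not found
09/01/2011 03:51:24 rosetta@home Finished download of 1poh.aahelix03_05.200_v1_3.gz
"""

for name, reason in failed_downloads(sample):
    print(f"{name}: {reason}")
```

In the log above, the files that fail are the application, graphics, font and database files, while the per-task input files download fine.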
The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 69068 - Posted: 9 Jan 2011, 14:41:41 UTC
Last modified: 9 Jan 2011, 14:44:29 UTC

I have two wu's with "compute errors" that each ran for over 80 hours of cpu time, despite what the logs may otherwise claim...


resultid=390949677


resultid=391015050
Defeat Censorship! Wikileaks needs OUR help! Learn how you can help (d/l 'insurance' file), by clicking here. "Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech" B. Franklin
UBT - Rick Horn
Joined: 17 Dec 05
Posts: 7
Credit: 283,961
RAC: 0
Message 69087 - Posted: 9 Jan 2011, 17:57:29 UTC

I'm getting "download failed" messages with all my WUs also.
FreierFriese

Joined: 31 Aug 10
Posts: 1
Credit: 4,159
RAC: 0
Message 69098 - Posted: 9 Jan 2011, 21:18:46 UTC

Hi there,

I can't upload my results at the moment, although the status-site tells me everything is ok.

After 1-2 seconds the upload stops and BOINC tells me "Project file upload handler is missing".

Link to the WU: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=357863331

Does anyone know if it's my BOINC client or Rosetta that won't let me upload things?

Greetings from Germany.

J.

Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 69102 - Posted: 9 Jan 2011, 22:15:08 UTC - in response to Message 69098.  

Does anyone know if it's my BOINC client or Rosetta that won't let me upload things?


The Rosetta file server had a major crash the other day and other users are reporting similar problems.
Randy Proctor

Joined: 24 Mar 10
Posts: 4
Credit: 600,055
RAC: 0
Message 69111 - Posted: 10 Jan 2011, 7:41:56 UTC

I have about 30 jobs that need to upload, but I keep getting failed uploads. Not sure if this is due to the server crash from earlier. Any thoughts?

Just a sample of the messages....

Sun Jan 9 18:04:23 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 18:05:33 2011 rosetta@home Temporarily failed upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0: HTTP error
Sun Jan 9 18:05:33 2011 rosetta@home Backing off 1 hr 18 min 36 sec on upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0
Sun Jan 9 18:05:33 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0: HTTP error
Sun Jan 9 18:05:33 2011 rosetta@home Backing off 2 hr 5 min 25 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 20:58:59 2011 rosetta@home Started upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0
Sun Jan 9 20:58:59 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 21:00:01 2011 rosetta@home Temporarily failed upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0: HTTP error
Sun Jan 9 21:00:01 2011 rosetta@home Backing off 57 min 8 sec on upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0
Sun Jan 9 21:00:01 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0: HTTP error
Sun Jan 9 21:00:01 2011 rosetta@home Backing off 3 hr 1 min 16 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 21:50:34 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0
Sun Jan 9 21:50:34 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0
Sun Jan 9 21:50:36 2011 rosetta@home [error] Error reported by file upload server: [1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0] locked by file_upload_handler PID=-1
Sun Jan 9 21:50:36 2011 rosetta@home [error] Error reported by file upload server: [1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0] locked by file_upload_handler PID=-1
Sun Jan 9 21:50:36 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0: transient upload error
Sun Jan 9 21:50:36 2011 rosetta@home Backing off 2 hr 39 min 24 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0
Sun Jan 9 21:50:36 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0: transient upload error
Sun Jan 9 21:50:36 2011 rosetta@home Backing off 2 hr 41 min 3 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0
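
To see how long the client is actually waiting between retries, the "Backing off" lines can be converted to seconds. This is a sketch under the assumption that the lines always follow the "[H hr] [M min] S sec on upload of NAME" shape seen above; other BOINC versions may word them differently, and the function name is hypothetical.

```python
import re

BACKOFF = re.compile(
    r"Backing off (?:(?P<hr>\d+) hr )?(?:(?P<min>\d+) min )?(?P<sec>\d+) sec"
    r" on upload of (?P<name>\S+)"
)


def backoff_seconds(line: str):
    """Return (file name, delay in seconds), or None if the line is not a backoff."""
    match = BACKOFF.search(line)
    if match is None:
        return None
    # The hour and minute parts are optional in the log text.
    hours = int(match.group("hr") or 0)
    minutes = int(match.group("min") or 0)
    seconds = int(match.group("sec"))
    return match.group("name"), hours * 3600 + minutes * 60 + seconds


line = ("Sun Jan 9 18:05:33 2011 rosetta@home Backing off 1 hr 18 min 36 sec "
        "on upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0")
print(backoff_seconds(line))
```

The delays in the log above vary from under an hour to about three hours per file, so a stuck upload may simply be waiting for its next retry rather than having been abandoned.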
The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 69142 - Posted: 10 Jan 2011, 21:11:46 UTC - in response to Message 69068.  

I now have another task that has 54+ hours of cpu time, 22% Progress...


t476_boinc_nmr_cm_rnd1_cs_frags_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_22837_909_0


Anyone wanna bet this will also ultimately die with a "compute error" ?



I have two wu's with "compute errors" that each ran for over 80 hours of cpu time, despite what the logs may otherwise claim...


resultid=390949677


resultid=391015050


Darmok

Joined: 4 Sep 09
Posts: 6
Credit: 231,572
RAC: 0
Message 69148 - Posted: 10 Jan 2011, 22:36:01 UTC

I have a problem which never occurred to me before. Just prior to the crash, I was rejoining R@H (bad timing) and received a bunch of WU. BOINC Manager miscalculated, and many of them won't make it to the report deadline. Can someone tell me what will happen and what I should do, if anything? Models will start to run in High Priority, which I want to avoid. Thanks
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 69151 - Posted: 10 Jan 2011, 23:22:24 UTC - in response to Message 69148.  

I have a problem which never occurred to me before. Just prior to the crash, I was rejoining R@H (bad timing) and received a bunch of WU. BOINC Manager miscalculated, and many of them won't make it to the report deadline. Can someone tell me what will happen and what I should do, if anything? Models will start to run in High Priority, which I want to avoid. Thanks


If I understand you correctly, you have more work than you can complete before the deadline. Aborting a few of the tasks that have not started yet is generally the best thing to do. You might also reduce the number of days of work you configure to have on hand, to avoid getting too much again before BOINC learns how long the tasks take to process.
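
A back-of-the-envelope way to decide how many tasks to abort, as a sketch only: all the numbers below are made up for illustration, the function names are hypothetical, and the estimate ignores other projects, tasks already in progress, and differing deadlines.

```python
import math


def tasks_that_fit(hours_until_deadline: float, hours_per_task: float,
                   cores: int, hours_per_day_running: float = 24.0) -> int:
    """Estimate how many whole tasks can finish before the deadline."""
    if hours_per_task <= 0:
        raise ValueError("hours_per_task must be positive")
    # Scale wall-clock time by the fraction of each day BOINC is allowed to run.
    usable_hours = hours_until_deadline * (hours_per_day_running / 24.0)
    per_core = math.floor(usable_hours / hours_per_task)
    return per_core * cores


def tasks_to_abort(queued: int, hours_until_deadline: float, hours_per_task: float,
                   cores: int, hours_per_day_running: float = 24.0) -> int:
    """Number of not-yet-started tasks that would miss the deadline."""
    fit = tasks_that_fit(hours_until_deadline, hours_per_task,
                         cores, hours_per_day_running)
    return max(0, queued - fit)


# Example: 40 queued tasks, 2 days to deadline, 3 h per task,
# 4 cores, BOINC running 12 h per day.
print(tasks_to_abort(40, 48, 3, 4, 12))
```

With those example figures 32 tasks fit, so 8 of the unstarted ones would be candidates for aborting.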
Rosetta Moderator: Mod.Sense
Darmok

Joined: 4 Sep 09
Posts: 6
Credit: 231,572
RAC: 0
Message 69154 - Posted: 11 Jan 2011, 0:44:18 UTC - in response to Message 69151.  

That's exactly what happened and I had forgotten to reduce the work buffer prior to connection, which is not an issue with other projects, but is with R@H because of the short completion time provided/required. Too bad for aborting all those WU's though.
Thanks Mod.

Randy Proctor

Joined: 24 Mar 10
Posts: 4
Credit: 600,055
RAC: 0
Message 69156 - Posted: 11 Jan 2011, 1:01:20 UTC

Why would I be receiving this message?

Mon Jan 10 16:59:55 2011 rosetta@home Message from server: platform 'x86_64-apple-darwin' not found


I did do an OS update to Snow Leopard 10.6.6, but would that cause a problem? I have 40 tasks that are done and need to be uploaded, and the deadline is in 2 days.

Thanks for any help.
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 69158 - Posted: 11 Jan 2011, 1:15:27 UTC - in response to Message 69156.  

Why would I be receiving this message?

Mon Jan 10 16:59:55 2011 rosetta@home Message from server: platform 'x86_64-apple-darwin' not found


I did do an OS update to Snow Leopard 10.6.6, but would that cause a problem? I have 40 tasks that are done and need to be uploaded, and the deadline is in 2 days.

Thanks for any help.


The file server crashed last week and the Rosetta staff are having problems rebuilding it. Many users are reporting a series of different error messages.

In terms of your uploads and the deadlines it is best not to worry. Normally work is valuable to the scientists even when it comes in late. The best thing you can do is keep an eye on these forums or the home page and wait to see if the Rosetta staff provide further advice.
Randy Proctor

Joined: 24 Mar 10
Posts: 4
Credit: 600,055
RAC: 0
Message 69161 - Posted: 11 Jan 2011, 1:57:53 UTC
Last modified: 11 Jan 2011, 1:58:56 UTC

I had some things upload and not others. Honestly I don't care much about this credit stuff. I'm in it for the science, because if success can be found with influenza maybe we can successfully fight worse things (not that I take influenza lightly, because it's a nasty bug).

As long as the scientists get usable results I'm good.

My main concern is that I can't get any more work to continue to help.
TPCBF

Joined: 29 Nov 10
Posts: 111
Credit: 5,143,328
RAC: 1,511
Message 69167 - Posted: 11 Jan 2011, 3:51:00 UTC - in response to Message 69156.  

Why would I be receiving this message?

Mon Jan 10 16:59:55 2011 rosetta@home Message from server: platform 'x86_64-apple-darwin' not found


I did do an OS update to Snow Leopard 10.6.6 but would that cause a problem?
Certainly not. When I added my iMac while Rosetta was working fine, so was the Mac client (running on a 2007/24", so Intel x86_64 as well). I did the update to 10.6.6 just the other day when the fit had already hit the shan...

Now, since the project went belly up the last time, I have a mix of WU showing ready to report and uploading and, going by the stats at least, a couple of WU must have been properly credited. Not only on the Mac, but on various Windows machines as well.
In all, this is one great mess: lots of WU just stuck at uploading across the board, some that were uploaded apparently stuck in increasing numbers as "credit pending", and some WU that just seem to go back and forth just fine.

Well, as far as I am concerned, I haven't detached from the project (yet), but WCG gets my priority as far as my resources are concerned...

Ralf
The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 69193 - Posted: 11 Jan 2011, 21:02:02 UTC - in response to Message 69142.  

a bit over 77 hours, expecting it to crap out soon...



I now have another task that has 54+ hours of cpu time, 22% Progress...


t476_boinc_nmr_cm_rnd1_cs_frags_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_22837_909_0


Anyone wanna bet this will also ultimately die with a "compute error" ?



I have two wu's with "compute errors" that each ran for over 80 hours of cpu time, despite what the logs may otherwise claim...


resultid=390949677


resultid=391015050



spamhasser

Joined: 12 Nov 10
Posts: 2
Credit: 308
RAC: 0
Message 69204 - Posted: 12 Jan 2011, 1:54:48 UTC

Since the last rosetta-crash i can't upload anything to rosetta's server.

Mi 12 Jan 2011 02:43:24 CET rosetta@home Project file upload handler is missing

The upload is stuck after 2,26% (0,31 of 13,48 kB)

Is there anything i can do to get it done?

If "waiting for someone solving the problem on rosetta's end" is the solution then i will shut up and wait a bit (or a byte if the duration is higher ;-) )

But if i have to do anything else than wait (e.g. resetting rosetta), let me know :)

(It is still winter here and my CPU should not get cold)

Thx in advance
Ian

Joined: 22 Apr 09
Posts: 1
Credit: 459,642
RAC: 0
Message 69231 - Posted: 12 Jan 2011, 14:37:12 UTC

I have the same error on all my completed work units: "Project file upload handler is missing". What should I do?
EvoDude
Joined: 6 Nov 05
Posts: 21
Credit: 52,425
RAC: 0
Message 69235 - Posted: 12 Jan 2011, 15:13:48 UTC

Isn't it a shame no-one from the project can be bothered responding to this. Makes you feel as if no-one cares. Very disappointed.
banditwolf
Joined: 10 Jan 06
Posts: 28
Credit: 139,737
RAC: 0
Message 69236 - Posted: 12 Jan 2011, 15:15:33 UTC

I have 2 of 3 that say compute error when they don't show any signs of having problems.


