Problems with Rosetta version 5.59

Message boards : Number crunching : Problems with Rosetta version 5.59

ramostol

Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 39305 - Posted: 12 Apr 2007, 12:49:07 UTC

Since the "CPU stops computing, nothing hangs"-problem has reappeared:

I observed today a situation seen once or twice before: Rosetta appears to work quite normally, but uses little or practically no CPU time (0.00 for several minutes, 0.01 for several minutes, etc.). The Activity Monitor shows that
- the Rosetta process is indeed using little or no processor time (0-12 %) (the "Nice" process group is conspicuously absent)
- a third party application (I have observed this behaviour connected with the genealogy program Reunion) uses lots of CPU (40-50 %) without doing anything sensible
- the usually peaceful process kernel_task occupies the rest of the CPU capacity. Visual appearance suggests quite clearly that Rosetta computing is performed in part or occasionally completely inside the kernel_task process, and thus undocumented by the Rosetta information screens.

I do not know what creates this situation, but it is resolved once you exit the located "third party application". The kernel_task process transfers its Rosetta activity (back?) to the Rosetta process and all appears as normal.

I have never been able to connect this behaviour with any errors in Rosetta computing, but I may have been lucky resolving such situations in time.

-- R. A. Mostol (iBook G4 MacOS 10.3.9)
ID: 39305 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 39315 - Posted: 12 Apr 2007, 16:44:06 UTC

Here is one Win Vista user, with a Core2 getting frequent -107 return codes. Are others seeing errors on Vista?
Rosetta Moderator: Mod.Sense
ID: 39315 · Rating: 0
sirslacker

Joined: 18 Sep 06
Posts: 8
Credit: 253,960
RAC: 0
Message 39332 - Posted: 13 Apr 2007, 0:59:14 UTC

Howdy,

I have had a sporadic problem with the graphics locking up on my Intel Pentium D machine. It only happened, say, 2 or 3 times a month [since version 5.54]. It did not seem to be an issue, since I found that if I went to the Task Manager while the graphics screen was frozen, my desktop would appear and I could move the mouse down to the BOINC icon, right-click, and exit BOINC. This stopped the program gracefully without corrupting the data. I could then restart BOINC and all was well for a week or more. However, since version 5.59 has been out, the frequency of the freezes has increased significantly. Now a week, and sometimes only a couple of days, goes by before I find the graphics frozen. The crunching for Rosetta and E@H seems undisturbed. Any ideas?

sirslacker
... all about the work ...
... time to step-up and throw-down ...
ID: 39332 · Rating: 0
AMDave

Joined: 16 Dec 05
Posts: 35
Credit: 12,576,896
RAC: 0
Message 39348 - Posted: 13 Apr 2007, 15:00:39 UTC
Last modified: 13 Apr 2007, 15:01:17 UTC

System: AthlonXP 2700+, 1GB RAM, Win2k sp4

Was running smoothly. I suspended the program, then exited. Logged off as user, logged on as Admin. Downloaded and installed Win security updates (KB 925902, KB 930178, KB 931784, KB 932168), as well as Java SE Runtime Environment 6 Update 1. Defragged the HD, then ran NTREGOPT to optimize the registry. Restarted the system, logged on as user, opened BOINC Mgr (v5.4.11) and lost the WU that had 2+ hours of computing done. Began receiving this message:

"rosetta_5.59_wi.exe has generated errors and will be closed by Windows. You will need to restart the program. An error log is being created."

I noticed that some .xml files (sched_request_boinc.bakerlab.org_rosetta.xml, sched_reply_boinc.bakerlab.org_rosetta.xml, statistics_boinc.bakerlab.org_rosetta.xml, client_state.xml, master_boinc.bakerlab.org_rosetta.xml, and client_state_prev.xml) were modified. Prior to suspension, I had been crunching v5.59 WUs. Now, every time the BOINC Mgr is restarted, WUs are errored out, then new WUs are downloaded and subsequently errored out. I would include snippets of the Tasks and Messages screens, but I could not locate those files.

The last entry in the Messages pane lists the following:
> Message from server: No work sent
> Message from server: (reached daily quota of 45 results)
> No work from project

What gives?

On a side note, my RAC has been sinking like the Titanic for about 2 months. Is this due to advances in the client seeking add'l computation?
ID: 39348 · Rating: 0
PovAddict

Joined: 25 Sep 05
Posts: 8
Credit: 192,053
RAC: 0
Message 39359 - Posted: 14 Apr 2007, 1:06:02 UTC - in response to Message 39348.  

Now, every time the BOINC Mgr is restarted, WUs are errored out, then new WUs are downloaded and subsequently errored out. I would include snippets of the Tasks and Messages screens, but I could not locate those files.

Try resetting the project.
ID: 39359 · Rating: 0
himmelskasper

Joined: 21 Dec 06
Posts: 1
Credit: 43,336
RAC: 0
Message 39363 - Posted: 14 Apr 2007, 9:50:14 UTC

hola...

Hmm, the workunits start over from the beginning after every restart of the computer :( ...but only since this new version... cya

hk
ID: 39363 · Rating: 0
AMDave

Joined: 16 Dec 05
Posts: 35
Credit: 12,576,896
RAC: 0
Message 39367 - Posted: 14 Apr 2007, 15:23:24 UTC
Last modified: 14 Apr 2007, 15:24:43 UTC

Follow-up:

I opened the Projects window yesterday evening and clicked "Reset Project." However, the BOINC Mgr did not download any WUs, b/c it was done within the same day as the initial problem and the Mgr had already downloaded the max 45 WUs. I opened the Mgr today and it downloaded 9 WUs. Currently, a WU has been crunching for 92+ min. It seems to have returned to normal.

Prior to the initial problem, the WUs were 3:55:00 or so in length. Now they are 05:30:35 in length. Is this the new de facto length, or is it simply the length for the present batch of WUs?
ID: 39367 · Rating: 0
Superfluence

Joined: 11 Apr 07
Posts: 2
Credit: 141
RAC: 0
Message 39374 - Posted: 14 Apr 2007, 19:40:02 UTC
Last modified: 14 Apr 2007, 19:44:14 UTC

If you would let me know when this is resolved, I'd like to continue to crunch more work from Rosetta again on unattended computers.


good idea! Me too!

Please keep me posted if Rosetta works, because at the moment it doesn't work at all.
ID: 39374 · Rating: 0
The Wiz

Joined: 22 Jan 07
Posts: 2
Credit: 84,807
RAC: 0
Message 39414 - Posted: 15 Apr 2007, 17:25:19 UTC

What is going on? When I check my transfers I have a dozen or more pending transfers.
They keep retrying to send. My account doesn't indicate any download of files, but I have dozens of messages saying that contact has succeeded but the server must be down. My computers seem to be constantly contacting the site and trying to transfer files.

At the same time I am getting WU's that seem to be working OK.
ID: 39414 · Rating: 0
The Wiz

Joined: 22 Jan 07
Posts: 2
Credit: 84,807
RAC: 0
Message 39415 - Posted: 15 Apr 2007, 17:28:27 UTC - in response to Message 39414.  

What is going on? When I check my transfers I have a dozen or more pending transfers.
They keep retrying to send. My account doesn't indicate any download of files, but I have dozens of messages saying that contact has succeeded but the server must be down. My computers seem to be constantly contacting the site and trying to transfer files.

At the same time I am getting WU's that seem to be working OK.


I should add that I tried resetting the project but it is still doing the same thing.
ID: 39415 · Rating: 0
anders n

Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 39416 - Posted: 15 Apr 2007, 17:48:00 UTC

There are problems with downloading new files.
ID: 39416 · Rating: 0
John McCallum

Joined: 8 Jan 06
Posts: 12
Credit: 6,094,478
RAC: 1,167
Message 39417 - Posted: 15 Apr 2007, 17:48:24 UTC
Last modified: 15 Apr 2007, 18:08:49 UTC

15/04/2007 18:37:51|rosetta@home|[file_xfer] Temporarily failed download of aa1k9kA09_05.200_v1_3.gz: system connect
15/04/2007 18:37:51|rosetta@home|Backing off 2 hr 29 min 33 sec on download of file aa1k9kA09_05.200_v1_3.gz
15/04/2007 18:37:51|rosetta@home|[file_xfer] Temporarily failed download of 1k9k.description.txt: system connect
15/04/2007 18:37:52||Access to reference site succeeded - project servers may be temporarily down.
Checked, and all the servers seem to be up and running. What is happening?
[edit] This has been going on since 1550 BST. Time now 1907 BST.
If you can't take a joke you should never have joined.
ID: 39417 · Rating: 0
senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 39421 - Posted: 15 Apr 2007, 18:14:24 UTC - in response to Message 38870.  

BOINC is not downloading from Rosetta. I have disabled my firewall and restarted my computer, but the usual fixes are not solving the problem. Here is a copy of my log below. I know it is a problem with Rosetta because malariacontrol downloaded without a problem.

4/15/2007 1:00:31 PM||Starting BOINC client version 4.45 for windows_intelx86
4/15/2007 1:00:31 PM||Data directory: F:\Program Files\BOINC
4/15/2007 1:02:00 PM|rosetta@home|Temporarily failed download of rosetta_5.59_windows_intelx86.exe: -106
4/15/2007 1:02:00 PM|rosetta@home|Temporarily failed download of hom001_m2hd_.fasta.gz: -106
4/15/2007 1:02:00 PM|rosetta@home|Started download of hom001_m2hd_.psipred_ss2.gz
4/15/2007 1:02:00 PM|rosetta@home|Started download of boinc_hom001_aam2hd_03_05.200_v1_3.gz
4/15/2007 1:02:02 PM|malariacontrol.net beta|Scheduler request to http://www.malariacontrol.net/malariacontrol_cgi/cgi succeeded
4/15/2007 1:02:03 PM|malariacontrol.net beta|Deferring communication with project for 9 seconds
4/15/2007 1:02:03 PM|malariacontrol.net beta|Started download of malariacontrol_5.45_windows_intelx86
4/15/2007 1:02:03 PM|malariacontrol.net beta|Started download of wu_21_35_36864_0_1064057379
4/15/2007 1:02:05 PM|malariacontrol.net beta|Finished download of wu_21_35_36864_0_1064057379
4/15/2007 1:02:05 PM|malariacontrol.net beta|Throughput 25733 bytes/sec
4/15/2007 1:02:05 PM|malariacontrol.net beta|Started download of densities.csv
4/15/2007 1:02:10 PM|malariacontrol.net beta|Finished download of densities.csv
4/15/2007 1:02:10 PM|malariacontrol.net beta|Throughput 9174 bytes/sec
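
For anyone who would rather tally a log like this than eyeball it, here is a minimal Python sketch of the same comparison (failed Rosetta downloads versus finished malariacontrol downloads). It assumes only the `timestamp|project|message` layout visible in the lines above. The names `LOG` and `summarize_downloads` are made up for illustration and are not part of BOINC.

```python
from collections import Counter

# A few lines copied from the log above. Assumed layout, based only on
# what is visible in this thread: timestamp|project|message
LOG = """\
4/15/2007 1:02:00 PM|rosetta@home|Temporarily failed download of rosetta_5.59_windows_intelx86.exe: -106
4/15/2007 1:02:00 PM|rosetta@home|Temporarily failed download of hom001_m2hd_.fasta.gz: -106
4/15/2007 1:02:05 PM|malariacontrol.net beta|Finished download of wu_21_35_36864_0_1064057379
4/15/2007 1:02:10 PM|malariacontrol.net beta|Finished download of densities.csv
"""


def summarize_downloads(log_text):
    """Count failed and finished downloads per project (hypothetical helper)."""
    failed = Counter()
    finished = Counter()
    for line in log_text.splitlines():
        # Split into at most three fields so a "|" inside the message survives.
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # skip anything that does not match the assumed layout
        _timestamp, project, message = parts
        if "Temporarily failed download of" in message:
            failed[project] += 1
        elif "Finished download of" in message:
            finished[project] += 1
    return failed, finished


if __name__ == "__main__":
    failed, finished = summarize_downloads(LOG)
    for project in sorted(set(failed) | set(finished)):
        print(project, "failed:", failed[project], "finished:", finished[project])
```

If one project only shows failures while another finishes its downloads in the same minute, that points at the project's download server rather than the local firewall or connection.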

ID: 39421 · Rating: 0
MattDavis

Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 39429 - Posted: 15 Apr 2007, 19:26:16 UTC - in response to Message 39421.  

BOINC is not downloading from Rosetta. I have disabled my firewall and restarted my computer, but the usual fixes are not solving the problem. Here is a copy of my log below. I know it is a problem with Rosetta because malariacontrol downloaded without a problem.

4/15/2007 1:00:31 PM||Starting BOINC client version 4.45 for windows_intelx86
4/15/2007 1:00:31 PM||Data directory: F:\Program Files\BOINC
4/15/2007 1:02:00 PM|rosetta@home|Temporarily failed download of rosetta_5.59_windows_intelx86.exe: -106
4/15/2007 1:02:00 PM|rosetta@home|Temporarily failed download of hom001_m2hd_.fasta.gz: -106
4/15/2007 1:02:00 PM|rosetta@home|Started download of hom001_m2hd_.psipred_ss2.gz
4/15/2007 1:02:00 PM|rosetta@home|Started download of boinc_hom001_aam2hd_03_05.200_v1_3.gz
4/15/2007 1:02:02 PM|malariacontrol.net beta|Scheduler request to http://www.malariacontrol.net/malariacontrol_cgi/cgi succeeded
4/15/2007 1:02:03 PM|malariacontrol.net beta|Deferring communication with project for 9 seconds
4/15/2007 1:02:03 PM|malariacontrol.net beta|Started download of malariacontrol_5.45_windows_intelx86
4/15/2007 1:02:03 PM|malariacontrol.net beta|Started download of wu_21_35_36864_0_1064057379
4/15/2007 1:02:05 PM|malariacontrol.net beta|Finished download of wu_21_35_36864_0_1064057379
4/15/2007 1:02:05 PM|malariacontrol.net beta|Throughput 25733 bytes/sec
4/15/2007 1:02:05 PM|malariacontrol.net beta|Started download of densities.csv
4/15/2007 1:02:10 PM|malariacontrol.net beta|Finished download of densities.csv
4/15/2007 1:02:10 PM|malariacontrol.net beta|Throughput 9174 bytes/sec


Have you tried reading the thread called "Failed Download"?
ID: 39429 · Rating: -1
Lada JNet

Joined: 25 Mar 07
Posts: 2
Credit: 1,518
RAC: 0
Message 39437 - Posted: 15 Apr 2007, 20:26:42 UTC - in response to Message 39305.  

Since the "CPU stops computing, nothing hangs"-problem has reappeared:

I observed today a situation seen once or twice before: Rosetta appears to work quite normally, but uses little or practically no CPU time (0.00 for several minutes, 0.01 for several minutes, etc.). The Activity Monitor shows that
- the Rosetta process is indeed using little or no processor time (0-12 %) (the "Nice" process group is conspicuously absent)
- a third party application (I have observed this behaviour connected with the genealogy program Reunion) uses lots of CPU (40-50 %) without doing anything sensible
- the usually peaceful process kernel_task occupies the rest of the CPU capacity. Visual appearance suggests quite clearly that Rosetta computing is performed in part or occasionally completely inside the kernel_task process, and thus undocumented by the Rosetta information screens.

I do not know what creates this situation, but it is resolved once you exit the located "third party application". The kernel_task process transfers its Rosetta activity (back?) to the Rosetta process and all appears as normal.

I have never been able to connect this behaviour with any errors in Rosetta computing, but I may have been lucky resolving such situations in time.

-- R. A. Mostol (iBook G4 MacOS 10.3.9)


I am not sure if this is supposed to be some kind of workaround. However, as I said, I run BOINC on an unattended computer (a Linux router to be precise; all it has to do is route packets, draw interface traffic graphs, pop some SNMP stats, etc.). Any workarounds which require human intervention are unacceptable here.

I am happy to continue to run Rosetta on my desktop PC where I can keep an eye out ;-)

If I am to allow Rosetta back on the server, I have to know that the problem has been resolved first...

I only hope that developers are aware of the problem and are working on it... frankly, it was something I kind of hoped to hear.
ID: 39437 · Rating: 0
Michael.L

Joined: 12 Nov 06
Posts: 67
Credit: 31,295
RAC: 0
Message 39458 - Posted: 15 Apr 2007, 23:11:26 UTC
Last modified: 15 Apr 2007, 23:34:44 UTC

In My Results - Work Unit.
Bench Abralax Save All Out 1 CTF Barcode R10 Filters 1292 8257
Result ID: 71426986 on PC 441502. Returned 15 Apr 19.42.46 - Over - Success. Credit given.
Same WU Result ID 73232908 sent to my PC 416175 15 Apr 17.09.55. Pending. (should complete 16 Apr).
Thinks Rosie has got her knickers twisted amidst all the confusion today.
ID: 39458 · Rating: 0
rochester ny 3

Joined: 17 Mar 07
Posts: 9
Credit: 190,625
RAC: 0
Message 39468 - Posted: 16 Apr 2007, 0:15:13 UTC

I got a lot of errors... what's up?
ID: 39468 · Rating: 0
AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 39479 - Posted: 16 Apr 2007, 3:12:13 UTC - in response to Message 39468.  

I got a lot of errors... what's up?


You need more than 128MB of memory to run Rosetta these days.
ID: 39479 · Rating: 0