Problems with rosetta 5.48

Message boards : Number crunching : Problems with rosetta 5.48

Ingemar

Send message
Joined: 28 Feb 06
Posts: 20
Credit: 1,680
RAC: 0
Message 37315 - Posted: 1 Mar 2007, 23:37:28 UTC

Please report here for problems you have observed with Rosetta version 5.48.
ID: 37315 · Rating: 1 · rate: Rate + / Rate - Report as offensive    Reply Quote
ramostol

Send message
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 37341 - Posted: 3 Mar 2007, 10:20:53 UTC - in response to Message 37315.  

Sorry, but our problems are not over.

On my Mac iBook G4 10.3.9 the very first WU administered by 5.48 had to restart after being unable to find its process.

From my internal notes:

(This WU was not docked.)

File stdoutdae.txt, line no. 12747-:
2007-03-03 00:01:33 [rosetta@home] Starting FRA_t349_IG9_hom001_1_t349_1_model_1o1za.pdb_1586_12_0
2007-03-03 00:01:33 [rosetta@home] Starting task FRA_t349_IG9_hom001_1_t349_1_model_1o1za.pdb_1586_12_0 using rosetta version 548
2007-03-03 01:14:42 [---] Restarting FRA_t349_IG9_hom001_1_t349_1_model_1o1za.pdb_1586_12_0 - message timeout
2007-03-03 01:14:43 [---] [error] Process 5284 not found


At 01:50:00, after 35 minutes of computing, it claims to have used 1:17:00 of CPU time! Progress: 38.902 %, which is quite abnormal: ordinary processes rise from 0.0 % to about 1.5 % and then to 100 %. The "To completion" time is much lower than what is currently displayed for an ordinary process.

Finished at 2:00:21 after merely 45 min. computing - I can't believe it - no WU has ever used less than one hour even when running undisturbed by competing CPU tasks.
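
For anyone wanting to check the timings above, here is a minimal sketch that only does arithmetic on the timestamps quoted in this post. It assumes that the 01:50:00 observation and the 2:00:21 finish both fall on 2007-03-03; the helper name `hms` is made up for illustration. It shows wall-clock intervals only and says nothing about how BOINC accounts CPU time across a restart.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps quoted in the post. The last two are assumed to be on 2007-03-03.
first_start = datetime.strptime("2007-03-03 00:01:33", FMT)
restart = datetime.strptime("2007-03-03 01:14:43", FMT)
observed = datetime.strptime("2007-03-03 01:50:00", FMT)
finished = datetime.strptime("2007-03-03 02:00:21", FMT)


def hms(delta):
    """Format a timedelta as H:MM:SS."""
    total = int(delta.total_seconds())
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


print("first start -> restart :", hms(restart - first_start))
print("restart -> observation :", hms(observed - restart))
print("restart -> finish      :", hms(finished - restart))
print("first start -> finish  :", hms(finished - first_start))
```

The "35 minutes" and "45 min." figures correspond to the intervals measured from the restart, not from the first start of the task.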


Result 65326544:
FRA_t349_IG9_hom001_1_t349_1_model_1o1za.pdb_1586_12_0:

stderr out
<core_client_version>5.8.15</core_client_version>
<![CDATA[
<stderr_txt>
# random seed: 3185788
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
======================================================
DONE ::     1 starting structures built         2 (nstruct) times
This process generated      2 decoys from       2 attempts
                            0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>


For the record: Most often these missing processes are time-consuming and irritating, but harmless. But occasionally they lead to computing errors - all computing errors produced on my iBook originate from second processings of WUs that lost their processes when run the first time.


R. A. Mostol
ID: 37341 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Huge

Send message
Joined: 8 Jan 06
Posts: 1
Credit: 5,034
RAC: 0
Message 37347 - Posted: 3 Mar 2007, 21:34:21 UTC

Hi all,

All of a sudden I get the following message:
3-3-2007 22:26:45|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
3-3-2007 22:26:45|rosetta@home|Reason: To fetch work
3-3-2007 22:26:45|rosetta@home|Requesting 8640 seconds of new work
3-3-2007 22:26:51|rosetta@home|Scheduler request succeeded
3-3-2007 22:26:51|rosetta@home|Message from server: Your computer has 447.48MB of memory, and a job requires 476.84MB
3-3-2007 22:26:51|rosetta@home|Message from server: No work sent
3-3-2007 22:26:51|rosetta@home|Message from server: (there was work but your computer doesn't have enough memory)
3-3-2007 22:26:51|rosetta@home|No work from project

I have changed NOTHING on my computer.
I also run lhcathome, LEIDEN Classical and SETI.


Anyone any ideas?


Best regards,
Huge
ID: 37347 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile netwraith
Avatar

Send message
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 37348 - Posted: 3 Mar 2007, 22:44:08 UTC
Last modified: 3 Mar 2007, 23:22:14 UTC

--

Did 5.48 change the memory requirements... I have a system that was using 80MB of 512mb ... now the jobs are saying 478MB required.. "Your computer does not have enough memory"..... What is up ????


*update* I am showing a few jobs on my larger Linux systems that are taking 390mb in real memory, but, nothing more than that... Average jobs are still in the 100mb range...

The machines showing these jobs have 4GB or more each.


Looking for a team ??? Join BoincSynergy!!


ID: 37348 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
P . P . L .

Send message
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 37349 - Posted: 3 Mar 2007, 23:01:22 UTC

I've got the same this morning; still no work.

3/4/2007 09:50:05|rosetta@home|Sending scheduler request: To fetch work
3/4/2007 09:50:05|rosetta@home|Requesting 5236 seconds of new work
3/4/2007 09:50:10|rosetta@home|Scheduler RPC succeeded [server version 509]
3/4/2007 09:50:10|rosetta@home|Message from server: Your preferences limit memory usage to 460.30MB, and a job requires 476.84MB
3/4/2007 09:50:10|rosetta@home|Message from server: No work sent
3/4/2007 09:50:10|rosetta@home|Message from server: (there was work but your computer doesn't have enough memory)
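
To compare these messages across machines, the two numbers can be pulled out of the log line. The following is a rough sketch, assuming the server message always has one of the two wordings quoted in this thread; the function name `memory_shortfall` and the regular expression are illustrative and not part of BOINC.

```python
import re

# Matches the two server message variants quoted in this thread.
PATTERN = re.compile(
    r"(?:Your computer has|Your preferences limit memory usage to) "
    r"(?P<available>\d+(?:\.\d+)?)MB(?: of memory)?, "
    r"and a job requires (?P<required>\d+(?:\.\d+)?)MB"
)


def memory_shortfall(line):
    """Return (available_mb, required_mb, shortfall_mb), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    available = float(m.group("available"))
    required = float(m.group("required"))
    return available, required, round(required - available, 2)


line = ("3/4/2007 09:50:10|rosetta@home|Message from server: Your preferences "
        "limit memory usage to 460.30MB, and a job requires 476.84MB")
print(memory_shortfall(line))
```

A positive shortfall means the job asks for more memory than the client is allowed to offer.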

P.S. This job is running at the moment.
s036__BOINC_ABRELAX_hom013__1583_1618_0 using rosetta version 548


ID: 37349 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Rob Lilley

Send message
Joined: 11 Jan 06
Posts: 11
Credit: 133,120
RAC: 0
Message 37350 - Posted: 4 Mar 2007, 0:16:41 UTC

My machine has 512mb of memory too, and I had the same message come up when Rosetta 5.48 appeared.

It seems that in General Preferences you can now determine what percentage of memory (actual not virtual) a project can use. I just increased the percentage to use when the computer is idle from 90 to 95% and it downloaded just fine.

Now let's see what happens when the crunching / fun starts!
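
The arithmetic behind the 90% versus 95% difference can be sketched as follows. This assumes that the 460.30MB limit quoted earlier in the thread is 90% of the RAM that BOINC sees on a 512MB machine, which the log messages do not state outright. The 476.84MB figure is what 500,000,000 bytes comes to in units of 1,048,576 bytes, though whether the server value was set that way is a guess.

```python
MIB = 1024 * 1024

# 500,000,000 bytes in binary megabytes matches the quoted requirement.
required_mb = 500_000_000 / MIB      # about 476.84

# Assumption: the quoted 460.30MB limit is 90% of the RAM that BOINC sees.
usable_mb = 460.30 / 0.90            # about 511.44

for percent in (90, 95):
    limit_mb = usable_mb * percent / 100
    verdict = "enough" if limit_mb >= required_mb else "not enough"
    print(f"{percent}% -> {limit_mb:.2f}MB: {verdict}")

# Smallest percentage setting that would cover the requirement.
min_percent = 100 * required_mb / usable_mb
print(f"minimum setting: {min_percent:.1f}%")
```

Under these assumptions a setting a little above 93% should be enough, which fits the 95% result reported here. It does not explain the later report in this thread that 95% still got no work, so other limits may be involved.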
ID: 37350 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile netwraith
Avatar

Send message
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 37356 - Posted: 4 Mar 2007, 0:53:14 UTC

--

Won't help mine... Something meschuga in the PCI/ACPI code.. uses 80MB for kernel space.. I suspect that a bunch of it is wasted, but, can't get it fixed without BIOS code... and the MB manu discontinued the model... no more code to be had... so.. will need to wait for smaller jobs or switch off to other crunching tasks... pity, cuz it's a relatively hot machine... maybe I will add some more RAM next week, but, for now.....




Looking for a team ??? Join BoincSynergy!!


ID: 37356 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
UtahTestLabs

Send message
Joined: 1 Jan 07
Posts: 4
Credit: 164,281
RAC: 0
Message 37358 - Posted: 4 Mar 2007, 1:07:20 UTC

My WU is at 1% and won't go any higher. The CPU time is counting, but the percentage is constantly 1% and the time remaining is counting up.

ID: 37358 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
P . P . L .

Send message
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 37367 - Posted: 4 Mar 2007, 4:51:39 UTC

Well, I have increased both memory usage settings in BOINC to 90% and I still cannot get work.

ID: 37367 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile hedera
Avatar

Send message
Joined: 15 Jul 06
Posts: 76
Credit: 5,139,863
RAC: 905
Message 37368 - Posted: 4 Mar 2007, 6:10:11 UTC

This is interesting; I have not received any of the "you don't have enough memory" messages with 5.48; and I have 1GB memory on my system. I can't imagine why it should need so much that it can't run on a box with 512MB; but I'm having no trouble running 2 tasks (and only Rosetta) on a 1GB system.
--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

ID: 37368 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile River~~
Avatar

Send message
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 37370 - Posted: 4 Mar 2007, 7:00:40 UTC - in response to Message 37368.  
Last modified: 4 Mar 2007, 7:02:06 UTC

This is interesting; I have not received any of the "you don't have enough memory" messages with 5.48; and I have 1GB memory on my system. I can't imagine why it should need so much that it can't run on a box with 512MB; but I'm having no trouble running 2 tasks (and only Rosetta) on a 1GB system.


Yes, and a possibly related effect: 5.48 seems to have a little less impact on other programs on a 2-cpu system.

Detail:

On a 2 cpu (2 separate chips, not one of these new-fangled multicore jobs), I have been tracking the performance of CPDN running on one cpu with other tasks running on the other cpu, averaging over 12 hours or so of timesteps.

Taking the speed of crunching CPDN when the other cpu is idle as 100, then the speed when crunching Rosetta on the other cpu dropped to 95.4 +/- 0.5 with 5.46, but improved to 97.6 +/- 0.6 as soon as 5.48 started to run.

This is a real effect, the timesteps get closer together as soon as Rosetta starts on its new version, and although the timesteps vary in length both before and after there is no overlap at all in the sets of values.

By comparison, when running CPDN alongside another CPDN the speed of each drops to 87.5 +/- 0.4, which is why I usually run just one CPDN and let Rosetta have the other cpu.

This is a severe test of the combination of tasks, as the box has only 256Mb of RAM, far less than either CPDN or Rosetta would like even when running solo. In its favour it does not have a GUI to soak up extra cycles.
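
A comparison of this kind can be reproduced roughly as below. This is only a sketch of one way to get a "speed relative to idle, plus or minus an error" figure from timestep durations; it is not necessarily how the numbers above were computed. The durations are invented for illustration, and the function name `relative_speed` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def relative_speed(baseline_secs, test_secs):
    """Speed of the test run as a percentage of baseline, with a standard error.

    Inputs are per-timestep durations; shorter durations mean higher speed.
    The uncertainty of the baseline mean itself is ignored for simplicity.
    """
    base = mean(baseline_secs)
    speeds = [100.0 * base / t for t in test_secs]
    return mean(speeds), stdev(speeds) / sqrt(len(speeds))


# Hypothetical timestep durations in seconds (not measured data).
idle_other_cpu = [60.0, 60.2, 59.8, 60.1, 59.9]
with_rosetta = [61.4, 61.6, 61.3, 61.7, 61.5]

speed, err = relative_speed(idle_other_cpu, with_rosetta)
print(f"{speed:.1f} +/- {err:.1f}")
```

Two versions can then be compared by checking whether their intervals overlap, which is the kind of test described above.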

Conjecture:

My guess is that Rosetta is playing more friendly in one or both of the following ways.

The new tasks could simply be doing work that is confined in tighter loops. This would mean that the Rosetta core would be able to keep its code in cache for more of the time, and would not be contending with the other core for RAM access to re-load program code.

The new tasks could simply be using less memory overall, meaning that less of both programs' virtual memory is paged out to disk.

In view of hedera's comments, the second possibility seems more likely. Sadly I have not been monitoring swapfile usage, so I can't actually tell.

Question:

If I am right that the new tasks are using less memory, is this simply an artefact of the particular jobs they have been given, or is it down to some re-optimisation of the code in the Rosetta app?

btw, apols for being OT here, as this is not a *problem* ;-) Like hedera I thought that positive feedback might be interesting ...

R~~
ID: 37370 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
rumbach

Send message
Joined: 15 Aug 06
Posts: 1
Credit: 30,180
RAC: 0
Message 37377 - Posted: 4 Mar 2007, 9:05:24 UTC

Have 512mb on a 1.2ghz cpu with w2k. I can no longer get work units after finishing the last one.

3/4/2007 12:16:54 AM|rosetta@home|Message from server: Your preferences limit memory usage to 460.34MB, and a job requires 476.84MB
3/4/2007 12:16:54 AM|rosetta@home|Message from server: No work sent
3/4/2007 12:16:54 AM|rosetta@home|Message from server: (there was work but your computer doesn't have enough memory)

Changed preferences to 95%, changed pagefile to 2gb, still cannot get any work.
ID: 37377 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
William Ostie

Send message
Joined: 6 Feb 07
Posts: 5
Credit: 1,125,655
RAC: 0
Message 37383 - Posted: 4 Mar 2007, 9:42:54 UTC - in response to Message 37347.  

ID: 37383 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
William Ostie

Send message
Joined: 6 Feb 07
Posts: 5
Credit: 1,125,655
RAC: 0
Message 37384 - Posted: 4 Mar 2007, 9:45:45 UTC - in response to Message 37347.  

I am getting the same message (2/4/07, 04:45 AM).
ID: 37384 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
RichardJ

Send message
Joined: 19 Mar 06
Posts: 8
Credit: 73,014
RAC: 0
Message 37386 - Posted: 4 Mar 2007, 9:59:24 UTC - in response to Message 37384.  


ID: 37386 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
RichardJ

Send message
Joined: 19 Mar 06
Posts: 8
Credit: 73,014
RAC: 0
Message 37388 - Posted: 4 Mar 2007, 10:00:50 UTC - in response to Message 37386.  

Me too:
04/03/2007 09:58:06|rosetta@home|Message from server: Your computer has 223.48MB of memory, and a job requires 476.84MB
Been happily chugging away for nearly a year now and not seen this before. Anything I can do?
ID: 37388 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
288VKYUjwsXfAaTXn6SFJC4LVPRf

Send message
Joined: 16 Dec 05
Posts: 31
Credit: 153,110
RAC: 0
Message 37394 - Posted: 4 Mar 2007, 10:58:44 UTC - in response to Message 37358.  

My WU is at 1% and wont go any higher. The CPU time is counting, but the percentage is constantly 1% and the time remaining is counting up.


The same for me. Here is the link to my Failed WU

Paused the WU, resumed it. Closed Boinc and restarted Boinc. Nothing worked. It was just totally frozen after 1 minute and a couple of seconds.
ID: 37394 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
288VKYUjwsXfAaTXn6SFJC4LVPRf

Send message
Joined: 16 Dec 05
Posts: 31
Credit: 153,110
RAC: 0
Message 37395 - Posted: 4 Mar 2007, 11:02:45 UTC

And again :|

This one already stopped after 6 seconds.
ID: 37395 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Rene
Avatar

Send message
Joined: 2 Dec 05
Posts: 10
Credit: 67,269
RAC: 0
Message 37398 - Posted: 4 Mar 2007, 13:07:12 UTC
Last modified: 4 Mar 2007, 13:36:44 UTC

My Ubuntu (6.10) host just "froze" a couple of minutes ago while crunching this wu...

I had to do a "hard reset" because nothing was responding... wu has restarted now.
Problem appeared just before reaching 41%.

;-)

Edit: just happened for the second time... after reaching 41.602%... did a "hard reset" again... wu restarted and cpu time is at approx 1:19:00

Edit 2: and it's repeating.... third reset of the host was needed... % went back to 41.600 and cpu time to 1:15.44

I will give it another try, but will abort it if it occurs again.
ID: 37398 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Rene
Avatar

Send message
Joined: 2 Dec 05
Posts: 10
Credit: 67,269
RAC: 0
Message 37402 - Posted: 4 Mar 2007, 14:05:27 UTC - in response to Message 37398.  

I will give it another try, but will abort it if it occurs again.


Just did... after it "froze" the complete system again the wu kicked back again to 41.600% and it looks like the next stage could not be reached.
Kept an eye on the running processes and just before things got bad, 5.48 went back to 0% of CPU use.
Other Rosetta wu in queue seems to have the same problem... only this one stopped at 1.030%

;-)

ID: 37402 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote



©2024 University of Washington
https://www.bakerlab.org