Problems with version 5.90/5.91

Rhiju
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 49836 - Posted: 20 Dec 2007, 20:41:41 UTC

Thanks for continuing to post bugs. We'd be particularly grateful if users who were noticing memory hog issues with 5.89 could post whether the newer app is better!
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49845 - Posted: 20 Dec 2007, 23:01:56 UTC
Last modified: 20 Dec 2007, 23:11:23 UTC

resultid=128108283

This is just one example of why it appears that 5.90 doesn't run on Linux machines (at least mine). I watched it switch from 5.89 to 5.90. GKrellM shows 100% CPU usage on both cores of my AMD64 X2 6000; however, CPU time and progress indicators DO NOT progress/count up. I have aborted 3 different jobs so far in the last 10 minutes for this reason, and have found none so far that run properly.

I use this BOINC version on all machines and am waiting for the other machines to finish their 5.89 work to see what happens:

5.10.21 X86-64

NOTE: NONE of them ran on my machine. I ended up aborting them all and rebooting into Windows on this machine.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49846 - Posted: 21 Dec 2007, 0:01:10 UTC
Last modified: 21 Dec 2007, 0:03:26 UTC

I'm seeing the exact same thing with my AMD64 2800 and my AMD64 X2 4800 as well. I let them work on the tasks for 15 min and they still only show --- as CPU time. Not even 00:00:00, although I did see the zeros on a couple of the ones from the 6000. Also, after suspending the already running 5.89 tasks, both of them changed to "computation error". The 4800 is the only one that produced the "computation error" after suspension. Looks like I'll be Windows-only after these 5.89s run dry.

NOTE: 5.90 does run on my AMD64 3700, which is using Windows.

Hmmm, after 15 min and before I could abort the ones on my 4800, one of them switched to 19.864% done, but it still shows --- as CPU time.
Rhiju
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 49847 - Posted: 21 Dec 2007, 0:21:31 UTC - in response to Message 49846.  

This seems very odd. Thanks a lot for posting, especially the link to the workunit. I checked here that the %CPU usage is fine for other platforms, so I fear that this is a Linux-specific issue.

Anyone else out there noticing success or failure with Linux?

Astro, do other apps (e.g., SETI) run fine?

Also, do you happen to know what version of BOINC you are using?

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49848 - Posted: 21 Dec 2007, 0:32:27 UTC - in response to Message 49847.  
Last modified: 21 Dec 2007, 0:41:15 UTC

This seems very odd.
Astro, do other apps (e.g., SETI) run fine?

Also, do you happen to know what version of BOINC you are using?


Mandriva spring free 2007, BOINC (official 64-bit version) 5.10.21. I watched it switch from 5.89 to 5.90 on the 6000 machine, so I watched it stop working. I have aborted all 5.90 tasks except the two still running on my AMD64 X2 4800. I watched the progress jump from zero to 19, then later to 26 percent, but it hasn't moved in some time. I thought these jumps might be "checkpoints" where it updates the progress. On the other WU on that machine it took a long time to go from 0 to .010%, and it has just recently switched to .020% (after nearly 40 min of run time, with the CPU run time set to 1 hour). I am trying to see if those two will finish and upload normally. Time will tell LOL. I'm sure they'll report ZERO CPU time, and therefore zero claimed credit. Some users might not care for zero credit. LOL

HMMM, I just looked over and saw that the one that was at 26 percent just switched to 66.929% and shows 10 min remaining, but it still has zero CPU time. Also, To Completion is dropping with each update to the percentage.

Oops, yes, it used to run SETI fine, but I haven't run it since the 26th of November, when I stopped doing SETI.

[edit] both now show 75 and 79% done. I'll get you some links when they're uploaded and reported. Neither shows cpu time.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49849 - Posted: 21 Dec 2007, 1:03:55 UTC

My AMD64 X2 4800 in question is hostid=692483. At some point a minute or so from ending, the cpu time "flashed" 01:03:34 then back to ---. I just watched the 2 mit BOINC SYMM Fold and dock switch from 92% with 00:01:34 remaining to 0.029% with 01:38:05 remaining. As if it just restarted over again from scratch. Should I abort these, or let them run a bit longer???

The second/other wu was at 85% with 2 min remaining, and switched to actually displaying 00:00:00 cpu time, 0.000% done, and 01:38:05 remaining.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49850 - Posted: 21 Dec 2007, 1:09:22 UTC
Last modified: 21 Dec 2007, 1:10:23 UTC

I just aborted them and two 5.89's started right up working normally.

The goofy ones were resultid=128129057 and resultid=128128049.

I'm now free of any 5.90s and will be Windows-only for a while (once I'm out of 5.89s).
sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 49851 - Posted: 21 Dec 2007, 2:01:11 UTC

I just had all of my 5.90 tasks error out one after another

see here:

USER 4578

Tim
Thomas Leibold

Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 49853 - Posted: 21 Dec 2007, 5:54:22 UTC

I currently have both a Ralph 5.90 and a Rosetta 5.90 running on my home system (SuSE Linux 32-bit dual cpu).

The Ralph 5.90 task shows cpu time 00:00:00 and progress 0.000% and 3:59:02 time to completion (I run Ralph with 4 hour workunits). This task seems to have run for over 24 hours and will probably never finish.

The Rosetta 5.90 task shows cpu time --- and progress 0.040% and 7:54:07 time to completion (I run Rosetta with 8 hour workunits). It has only run for about 10 minutes, so there is no telling yet how it will behave. While I'm typing this the progress and time to completion values have jumped up and down a couple of times (by as much as 2.5% progress and 30 minutes of time to completion), but cpu time remains just dashes.
Team Helix
Thomas Leibold

Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 49854 - Posted: 21 Dec 2007, 5:58:09 UTC - in response to Message 49853.  
Last modified: 21 Dec 2007, 6:24:59 UTC




Shortly after posting the above message I saw a brief flash of the cpu time for the Rosetta 5.90 task (which was confirmed by ps), but the display went back to --- immediately afterwards.
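One way to cross-check what BOINC Manager displays, in the spirit of the ps check mentioned above, is to read the CPU time that the Linux kernel has accounted to the process. What follows is only a minimal Python sketch, assuming a Linux /proc filesystem; the helper names and the sample line (including the process name "rosetta 5.90") are made up for illustration and do not come from BOINC or Rosetta.

```python
import os

def parse_cpu_seconds(stat_line, clock_ticks):
    """Return user + system CPU seconds from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / clock_ticks

def process_cpu_seconds(pid):
    """Read the CPU time the kernel has accounted to process `pid` (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_cpu_seconds(handle.read(), os.sysconf("SC_CLK_TCK"))

# Hypothetical stat line for illustration; real values will differ.
sample = ("4242 (rosetta 5.90) R 1 4242 4242 0 -1 4194304 "
          "100 0 0 0 381400 200 0 0 39 19 1 0 12345 0 0")
print(parse_cpu_seconds(sample, 100))
```

On a Linux box, `process_cpu_seconds(pid)` with the PID of the running task gives a number that can be compared against the --- shown in the manager.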

Here are all the lines starting with BOINC in the stdout.txt file for the indefinite Ralph 5.90 task. It seems actual cpu time is stuck at 0.000999 and therefore never approaches 14400 (4 hours). The Watchdog timer isn't kicking in because the client is making progress completing more and more decoys.

BOINC :: [2007-12-19 21:25:55:] :: mode: pose1 :: nstartnum: 1 :: number_of_output: 9999 :: num_decoys: 0 :: pct_complete: 0
BOINC :: [2007-12-19 22:24:24:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 1 :: num_decoys: 1 :: farlx_stage: 0
BOINC :: [2007-12-19 23:17:12:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 2 :: num_decoys: 2 :: farlx_stage: 0
BOINC :: [2007-12-19 23:17:12:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.0004995
BOINC :: [2007-12-20 0: 3:15:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 3 :: num_decoys: 3 :: farlx_stage: 0
BOINC :: [2007-12-20 0: 3:15:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.000333
BOINC :: [2007-12-20 0:51:41:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 4 :: num_decoys: 4 :: farlx_stage: 0
BOINC :: [2007-12-20 0:51:41:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.00024975
BOINC :: [2007-12-20 1:49:10:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 5 :: num_decoys: 5 :: farlx_stage: 0
BOINC :: [2007-12-20 1:49:10:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.0001998
BOINC :: [2007-12-20 2:43:43:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 6 :: num_decoys: 6 :: farlx_stage: 0
BOINC :: [2007-12-20 2:43:43:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.0001665
BOINC :: [2007-12-20 3:34:52:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 7 :: num_decoys: 7 :: farlx_stage: 0
BOINC :: [2007-12-20 3:34:52:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.000142714
BOINC :: [2007-12-20 4:24:11:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 8 :: num_decoys: 8 :: farlx_stage: 0
BOINC :: [2007-12-20 4:24:11:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.000124875
BOINC :: [2007-12-20 5:12:53:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 9 :: num_decoys: 9 :: farlx_stage: 0
BOINC :: [2007-12-20 5:12:53:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.000111
BOINC :: [2007-12-20 5:55:19:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 10 :: num_decoys: 10 :: farlx_stage: 0
BOINC :: [2007-12-20 5:55:19:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 9.99e-05
BOINC :: [2007-12-20 6:41:22:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 11 :: num_decoys: 11 :: farlx_stage: 0
BOINC :: [2007-12-20 6:41:22:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 9.08182e-05
BOINC :: [2007-12-20 7:46: 5:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 12 :: num_decoys: 12 :: farlx_stage: 0
BOINC :: [2007-12-20 7:46: 5:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 8.325e-05
BOINC :: [2007-12-20 8:30:52:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 13 :: num_decoys: 13 :: farlx_stage: 0
BOINC :: [2007-12-20 8:30:52:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 7.68462e-05
BOINC :: [2007-12-20 9:19:13:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 14 :: num_decoys: 14 :: farlx_stage: 0
BOINC :: [2007-12-20 9:19:13:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 7.13571e-05
BOINC :: [2007-12-20 10: 5:29:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 15 :: num_decoys: 15 :: farlx_stage: 0
BOINC :: [2007-12-20 10: 5:29:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 6.66e-05
BOINC :: [2007-12-20 11: 3:21:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 16 :: num_decoys: 16 :: farlx_stage: 0
BOINC :: [2007-12-20 11: 3:21:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 6.24375e-05
BOINC :: [2007-12-20 11:53:49:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 17 :: num_decoys: 17 :: farlx_stage: 0
BOINC :: [2007-12-20 11:53:49:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 5.87647e-05
BOINC :: [2007-12-20 12:40:32:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 18 :: num_decoys: 18 :: farlx_stage: 0
BOINC :: [2007-12-20 12:40:32:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 5.55e-05
BOINC :: [2007-12-20 13:24:42:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 19 :: num_decoys: 19 :: farlx_stage: 0
BOINC :: [2007-12-20 13:24:42:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 5.25789e-05
BOINC :: [2007-12-20 14:14:23:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 20 :: num_decoys: 20 :: farlx_stage: 0
BOINC :: [2007-12-20 14:14:23:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 4.995e-05
BOINC :: [2007-12-20 15: 6:58:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 21 :: num_decoys: 21 :: farlx_stage: 0
BOINC :: [2007-12-20 15: 6:58:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 4.75714e-05
BOINC :: [2007-12-20 15:51: 4:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 22 :: num_decoys: 22 :: farlx_stage: 0
BOINC :: [2007-12-20 15:51: 4:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 4.54091e-05
BOINC :: [2007-12-20 16:35: 6:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 23 :: num_decoys: 23 :: farlx_stage: 0
BOINC :: [2007-12-20 16:35: 6:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 4.34348e-05
BOINC :: [2007-12-20 17:24: 1:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 24 :: num_decoys: 24 :: farlx_stage: 0
BOINC :: [2007-12-20 17:24: 1:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 4.1625e-05
BOINC :: [2007-12-20 18: 7:38:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 25 :: num_decoys: 25 :: farlx_stage: 0
BOINC :: [2007-12-20 18: 7:38:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 3.996e-05
BOINC :: [2007-12-20 19: 1: 9:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 26 :: num_decoys: 26 :: farlx_stage: 0
BOINC :: [2007-12-20 19: 1: 9:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 3.84231e-05
BOINC :: [2007-12-20 19:50:50:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 27 :: num_decoys: 27 :: farlx_stage: 0
BOINC :: [2007-12-20 19:50:50:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 3.7e-05
BOINC :: [2007-12-20 20:35:27:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 28 :: num_decoys: 28 :: farlx_stage: 0
BOINC :: [2007-12-20 20:35:27:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 3.56786e-05
BOINC :: [2007-12-20 21:31:26:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 29 :: num_decoys: 29 :: farlx_stage: 0
BOINC :: [2007-12-20 21:31:26:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 3.44483e-05
BOINC :: [2007-12-20 22:17: 3:] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 30 :: num_decoys: 30 :: farlx_stage: 0
BOINC :: [2007-12-20 22:17: 3:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 3.33e-05
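The pattern in the log above (wall-clock timestamps advancing by almost a day while cpu_time stays at 0.000999) can be checked mechanically. The following is a minimal Python sketch, not part of Rosetta or BOINC; the function names `cpu_samples` and `looks_stalled` and the one-hour threshold are assumptions made for illustration, and the three sample lines are copied from the log.

```python
import re
from datetime import datetime

# Timestamps look like "[2007-12-20 0: 3:15:]"; fields may be space-padded.
STAMP = re.compile(
    r"\[(\d{4})-(\d{1,2})-(\d{1,2})\s+(\d{1,2}):\s*(\d{1,2}):\s*(\d{1,2}):\]"
)
CPU = re.compile(r"cpu_time_pref:\s*(\S+)\s*::\s*cpu_time:\s*(\S+)")

def cpu_samples(lines):
    """Yield (wall_clock, cpu_time_pref, cpu_time) for lines that report cpu_time."""
    for line in lines:
        stamp, cpu = STAMP.search(line), CPU.search(line)
        if stamp and cpu:
            when = datetime(*map(int, stamp.groups()))
            yield when, float(cpu.group(1)), float(cpu.group(2))

def looks_stalled(samples, min_wall_seconds=3600):
    """True if cpu_time never changed while min_wall_seconds of wall time passed."""
    samples = list(samples)
    if len(samples) < 2:
        return False
    wall = (samples[-1][0] - samples[0][0]).total_seconds()
    distinct_cpu_values = {cpu for _, _, cpu in samples}
    return wall >= min_wall_seconds and len(distinct_cpu_values) == 1

log = [
    "BOINC :: [2007-12-19 23:17:12:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.0004995",
    "BOINC :: [2007-12-20 0: 3:15:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 0.000333",
    "BOINC :: [2007-12-20 22:17: 3:] :: cpu_time_pref: 14400 :: cpu_time: 0.000999 :: cpu_time_per_nstruct: 3.33e-05",
]
samples = list(cpu_samples(log))
print(looks_stalled(samples))
```

Because the task stops only when cpu_time reaches cpu_time_pref, a cpu_time that never moves means that stop condition is never met, which matches the indefinite run described above.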
Team Helix
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 49860 - Posted: 21 Dec 2007, 6:51:31 UTC

Hi sslickerson.

I'm no expert, but I had a look at your results, and the ones I saw were all for the 5.89 app.

pete.

Marky-UK

Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 49862 - Posted: 21 Dec 2007, 7:57:34 UTC - in response to Message 49847.  
Last modified: 21 Dec 2007, 7:59:09 UTC

This seems very odd. Thanks a lot for posting, especially the link to the workunit. I checked here that the %cpu usage is fine for other platforms, so I fear that this is a linux-specific issue.

Anyone else out there noticing success or failure with Linux?

Yes, I have two Linux boxes. Both are using 100% CPU time on Rosetta as seen in the process list, but BOINC Manager shows 0 progress and 0 CPU time on the WUs.

If I stop and start the BOINC client, though, the WUs complete and upload OK. One had 9 hrs 50 min of runtime (my preference is 3 hrs).
BitSpit
Joined: 5 Nov 05
Posts: 33
Credit: 4,147,344
RAC: 0
Message 49864 - Posted: 21 Dec 2007, 13:01:10 UTC

I just finished going through the top 1000 computers and checking the results of the Linux systems. It's a very sad state. I only found 4 jobs that ran 100% properly. The rest either:

were marked invalid because the CPU time wasn't recorded
ignored the CPU runtime preference and ran up to four times the preference
reported properly because BOINC was restarted

I've stopped all job requests here on all systems (Windows too). I've doubled my runtime preference to squeeze more out of the 5.89 jobs. When those are done and if there's no fix for the Linux systems, I'm finished with Rosetta.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49865 - Posted: 21 Dec 2007, 13:12:10 UTC - in response to Message 49864.  


5.90 works so far on my Windows machines. It's just Linux that has issues, and as evidenced earlier, Rhiju is watching, responding, and involved in correcting this. I'll just run Windows until a patch can be applied. If you (or anyone) are Linux-only, then increasing the run time is a good solution.
DJStarfox

Joined: 19 Jul 07
Posts: 145
Credit: 1,235,310
RAC: 70
Message 49867 - Posted: 21 Dec 2007, 13:31:24 UTC - in response to Message 49847.  

I am running Fedora 7 Linux x86_64, with 32-bit BOINC 5.8.16.

I presume 5.90 is still beta? All I'm getting from the scheduler is beta 5.90. Anyway, yes, it uses less memory. GOOD. But when I tried to abort one, it became stuck in memory and became a zombie process. I had to kill -9 it. BAD. I will let 3 run to completion and see if they are OK.
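For anyone scripting around a stuck task, the usual pattern is to send SIGTERM first and fall back to SIGKILL (the signal that kill -9 sends) only after a grace period. This is a minimal Python sketch, assuming a child process started by the script itself; a task started by the BOINC client is not a child of your script, so it illustrates the pattern rather than fixing 5.90. The name `stop_process` and the 5-second grace period are arbitrary choices.

```python
import subprocess
import sys

def stop_process(proc, grace_seconds=5.0):
    """Ask `proc` to exit; escalate to a forced kill if it does not."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        # wait() also collects the exit status, so the child is reaped
        # and does not linger in the process table.
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, the equivalent of `kill -9`
        return proc.wait()

# Stand-in for a long-running task: a child that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
print(exit_code)
```

The grace period gives a well-behaved process time to checkpoint or clean up before the forced kill.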
Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,796,424
RAC: 2,430
Message 49869 - Posted: 21 Dec 2007, 13:50:18 UTC - in response to Message 49865.  



Hello Astro,
What you have described for Rosetta, I am getting on Ralph. The WU appears not to be doing anything, but my processors are running at 100% on all 4 cores.
BOINC Manager shows nothing happening except that the WU is running at High Priority.
Stopping and starting BM shows the current state of the WU, but then it won't keep updating.
My WUs ran for 9 to 11 hours on a 6-hour preference and produced 6 to 7 decoys in that time.
Rhiju is aware of it, and my latest 4 WUs are doing the same thing.
I have not noticed it on Rosetta yet, but I have probably not finished all my 5.89 WUs yet.
I am running Linux Fedora Core versions 3 and 6.
DJStarfox

Joined: 19 Jul 07
Posts: 145
Credit: 1,235,310
RAC: 70
Message 49871 - Posted: 21 Dec 2007, 13:54:16 UTC - in response to Message 49869.  
Last modified: 21 Dec 2007, 14:04:06 UTC



Yes, I am seeing the same thing now too. It doesn't show any progress, but CPU is 100%. I assume it's still working somehow?

Correction: It does eventually update the tasks pane/status, but it's taking 10 minutes or so just to update the % done. This may be because the app was compiled with the newest BOINC API (as stated in the release notes thread).
vicel

Joined: 28 Mar 06
Posts: 5
Credit: 957,142
RAC: 0
Message 49875 - Posted: 21 Dec 2007, 16:51:08 UTC

I'm running Ubuntu 7.10 on a Core2Duo.
The progress indicators DO NOT progress/count up here either.
I have aborted 3 jobs. For the first WU I waited three hours: progress stayed at 0, but the CPU was in use.
dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 114,368,944
RAC: 53,093
Message 49876 - Posted: 21 Dec 2007, 17:10:09 UTC

If it's so common, why wasn't the Linux problem picked up on RALPH???
lusvladimir

Joined: 18 Oct 05
Posts: 12
Credit: 1,784,854
RAC: 0
Message 49877 - Posted: 21 Dec 2007, 17:17:48 UTC

Ubuntu 7.10 and Core2Duo.
The progress indicators do not progress; my two WUs show 0% and 0.014%.
I waited 5 hours: progress is frozen, and CPU usage is at 100% for both WUs.