Report Problems with Rosetta Version 5.25

eNDo
Joined: 9 Apr 06
Posts: 9
Credit: 372,288
RAC: 0
Message 21960 - Posted: 7 Aug 2006, 5:22:17 UTC

Still having issues with 5.25. Same problems as I posted before, and apparently many others are reporting them too. All problems with previous versions were minor and few in number. I'm running it on 6 hosts, all with different CPUs and OSes, and all with plenty of RAM. Anxiously awaiting some fixes =)
Welcome all newcomers.


ID: 21960
Tim
Joined: 8 Jul 06
Posts: 2
Credit: 52,584
RAC: 0
Message 21965 - Posted: 7 Aug 2006, 8:42:45 UTC - in response to Message 19790.  

I have a graphical problem with all the workunits. I have a laptop with a widescreen display (1280x800), and when I view the graphics in full screen or in the screensaver, the image is cropped and the bottom line is missing. Here is a screenshot:


It only happens when I suspend a WU and resume it. When a new WU starts it fits the resolution, but if I exit the screensaver and let it start again, the graphics don't fit anymore.


This happens on Mac OS X too. It's related to widescreen displays. It may be related to using the default BOINC graphics code, which was written for normal aspect ratio displays. See http://www.ssl.berkeley.edu/pipermail/boinc_dev/2006-July/006034.html and the following messages in the thread.

Tim
ID: 21965
Tim
Joined: 8 Jul 06
Posts: 2
Credit: 52,584
RAC: 0
Message 21966 - Posted: 7 Aug 2006, 8:56:04 UTC

Rosetta 5.25 on Mac OS X 10.4.7 incorrectly classifies the target t314__CASP7_FOLLOWUP_ABRELAX_SAVE_ALL_OUT_BARCODE_perfectss__1066_42278_0
as unknown, even though the display clearly shows it as known.

Use the link to email me.
ID: 21966
eNDo
Joined: 9 Apr 06
Posts: 9
Credit: 372,288
RAC: 0
Message 22151 - Posted: 9 Aug 2006, 18:58:00 UTC

Well, I've waited as long as I care to wait, spending my $ contributing.

PS.... 5.25 has been the worst release. It has made me quit, when I used to spend most of my time trying to recruit people to Rosetta. I posted months ago on the issue... nothing.

Now I retire Rosetta from my machines, and can enjoy my lower power bill too. 5 machines were eating plenty.

ID: 22151
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 22152 - Posted: 9 Aug 2006, 19:55:05 UTC - in response to Message 22151.  

That is sad. For most people, including me, 5.25 seems to work quite well. I only had this "sit with 0% CPU usage" problem over at Ralph with some pre-5.25 version. Have you ever attached to RALPH to see whether you have the same problems there as here?

I think we'll see new versions soon. Now that CASP is over they will probably resume tweaking the app.
ID: 22152
jjgb10
Joined: 29 Sep 05
Posts: 21
Credit: 6,152,959
RAC: 0
Message 22153 - Posted: 9 Aug 2006, 20:24:31 UTC

I have had ZERO problems with this release of Rosetta. I have Rosetta installed on 7 computers and it runs on every computer just fine with no problems. I am running the BOINC version 5.4.9.
ID: 22153
eNDo
Joined: 9 Apr 06
Posts: 9
Credit: 372,288
RAC: 0
Message 22158 - Posted: 10 Aug 2006, 1:21:51 UTC
Last modified: 10 Aug 2006, 1:27:36 UTC

Perhaps all of you who think 5.25 is flawless need to read this thread. Nothing but posts of issues, the same for every person. I posted over a month ago about it, and nothing was said in any way. I posted again a few weeks ago, still nothing. Now I quit and people notice. I've added somewhere around 40 machines to this project because I think it's a great cause, a great idea, and a new path.
So I lost who knows how many hours with 5.25 stopping at 99-100%, then dumping the WU on reboot. It's just frustrating. If I didn't have so many boxes to check all the time to make sure they were running, it wouldn't bother me. I realize the attention has been focused on CASP, but there are MORE than enough people bringing forth issues since 5.25's debut. And for the record, I had no problems UNTIL 5.25.
ID: 22158
Jack Shaftoe
Joined: 30 Apr 06
Posts: 115
Credit: 1,307,916
RAC: 0
Message 22159 - Posted: 10 Aug 2006, 2:36:30 UTC - in response to Message 22153.  

I have to restart 1 or 2 machines each week. But that's not bad for 22. I check my stats every day to make sure all of them are connecting. When one of them gets late, it means I have to restart the software.

If I didn't have to do that my efficiency would be higher, but not significantly. What can I provide to help figure out what is happening?
Team Starfire World BOINC
ID: 22159
Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 22215 - Posted: 10 Aug 2006, 19:48:41 UTC

I don't know what happened to resultid=32044316 but it failed, sped through 3 waiting work units at the speed of light, and thoroughly mixed up the computer.
ID: 22215
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 22219 - Posted: 10 Aug 2006, 22:21:44 UTC

Sangamon, I would think it would be helpful if you could note any patterns in your farm. Is it the same 2 or 3 boxes getting hung up? Or random? Do they all run the same BOINC version? Same OS updates? Same BOINC preferences and locations?

You might make it easy on yourself: whenever a machine gets hung up, attach it to Ralph for something like a 10% resource share. Then you can just see whether the machine is already on Ralph, and start to get a feel for whether there are perfect machines in your farm or whether all are affected.

Then scrutinize the Ralph work closely. If failures occur on Ralph, more diagnostic data is returned to the project. Post on the boards there the specific WUs you see hanging up or crashing BOINC.

If you have specific Rosetta WUs that caused problems, post their IDs here. Sometimes there are issues with how the WUs are created. Other times it's a specific random number for a model that uncovers a problem.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 22219
Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,818,279
RAC: 888
Message 22248 - Posted: 11 Aug 2006, 5:38:10 UTC

I believe the people who have had no problems have been lucky and are probably running Windows. I have 4 computers at home: 2 run Windows and 2 run Linux. One of the Windows machines has had about 2 or 3 WUs lock up, but that is all. The 2 Linux machines have had numerous WUs lock up (at least two dozen). In all cases the "CPU time" and "To completion" times stop and the host CPU drops back to idle, but the client does not move on from that WU, saying it is running when in fact it is not, even after many hours or days.
Suspending and resuming the WU has no effect, and restarting the BOINC client/manager has no effect; only a reboot will get the unit working again. More often than not, the WU errors out after the restart anyway.
I do not intend to reboot every time a WU stops, so at this stage I have been aborting the WUs in question to keep my computers doing something useful.
It seems to be more a Linux problem than a Windows one, and mostly I get a "Segmentation Error" as the cause of the failure. The time to failure can be anywhere from a few minutes (a few percent) to almost complete (80 to 90% done).
It happens on both Rosetta@home and Ralph@home, with more failures on Ralph than on Rosetta.
ID: 22248
AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 22279 - Posted: 11 Aug 2006, 14:45:37 UTC - in response to Message 22248.  

My Linux crunchers have been going 24/7 with no problems for a long time now. (I had 17 Linux crunchers going 100% on Rosetta during CASP, but have now dropped back to 13.)

I'm using the standard BOINC 5.2.13.
ID: 22279
eNDo
Joined: 9 Apr 06
Posts: 9
Credit: 372,288
RAC: 0
Message 22346 - Posted: 12 Aug 2006, 16:24:35 UTC

I've detached from Rosetta for the time being and moved to a 100% share on Ralph. I don't want to just give up; 5.25 is just making me loopy.

ID: 22346
Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,818,279
RAC: 888
Message 22382 - Posted: 13 Aug 2006, 8:45:31 UTC

Still getting the same problems on my 2 Linux machines; Windows machines OK.
The WU says it is running but nothing is happening: no timers are moving and the CPU is idle. It will not switch to another project as per preferences. My only solution is to abort, as I cannot be rebooting all the time to restart the WU.
Getting "process exited with code 131" and "ERROR:SIGSEGV:segmentation violation":
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28130067
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28130003 (aborted)
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28129983
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28129982
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28129975 (aborted)
Plus these 2 the day before (12th)
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=27385306 (aborted)
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=27385307 (aborted)
Also have been getting the same error/problems on Ralph, again with Linux.
ID: 22382
[AF>Linux]Arnaud
Joined: 17 Sep 05
Posts: 38
Credit: 10,490
RAC: 0
Message 22396 - Posted: 13 Aug 2006, 13:55:15 UTC

process exited with code 131 (0x83) here
BOINC 5.4.9 on SUSE Linux 10.0.
Arnaud
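
The reported number can be inspected with a few lines of Python. This is only a sketch: it confirms that 131 and 0x83 are the same value and compares it with the common shell convention of reporting termination by signal N as 128 + N. Whether BOINC or Rosetta follow that convention is an assumption this sketch does not settle, and the function name `describe_exit_code` is made up for illustration.

```python
import signal

def describe_exit_code(code):
    """Return the hex form of an exit code and, if it is above 128,
    the signal number it would map to under the '128 + N' convention."""
    candidate = code - 128 if code > 128 else None
    return f"0x{code:02X}", candidate

hex_form, candidate = describe_exit_code(131)
print(hex_form, candidate)

# For comparison: the value the same convention would give for SIGSEGV.
print(128 + int(signal.SIGSEGV))
```

Under that convention 131 would point at signal 3 rather than at SIGSEGV, so the code shown in the log may simply be a value the application picks itself when it catches the fault. Treat that reading as a guess, not a diagnosis.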
ID: 22396
Stan Pleban
Joined: 6 Jun 06
Posts: 8
Credit: 5,771
RAC: 0
Message 22398 - Posted: 13 Aug 2006, 14:51:19 UTC - in response to Message 22382.  
Last modified: 13 Aug 2006, 15:01:46 UTC

Hello Conan...would appreciate knowing where you get that download for the core client 5.5.0 that you are using....you are getting the same credits in ROSETTA using that client as I did using 5.3.12, but not 5.4.11....thanks, Stan

user stats
ID: 22398
Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,818,279
RAC: 888
Message 22399 - Posted: 13 Aug 2006, 14:52:39 UTC - in response to Message 22382.  

A follow-up to the above: all of the new WUs that I have processed today have died, all with the same errors as above. Here are another 2:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28130084
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28129985

2 have died at 3741 and 3758 seconds (WUs 28130067 and 28129983), with the others failing between 5347 and 5400 seconds (about 1 1/2 hours). Coincidentally, my switch time between projects is 90 minutes (1 1/2 hours).
The WUs either error out at the times mentioned or hang with nothing happening. This problem has been increasing over the last few days.
The 2 Linux machines having most of the problems are both AMD with dual CPUs, one a 275 with 4 GB RAM and one an 848 with 2 GB RAM, both running Fedora Core 3.
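
As a quick check of the arithmetic above, the following Python sketch converts the reported CPU times to minutes and compares them with the 90-minute switch interval. The function names and the 2-minute tolerance are illustrative assumptions, not anything BOINC defines.

```python
SWITCH_INTERVAL_MIN = 90  # "switch between projects" preference from the post

def minutes(seconds):
    """Convert a CPU time in seconds to minutes."""
    return seconds / 60.0

def near_switch(seconds, tolerance_min=2.0):
    """True if a failure time lies within tolerance of the switch interval.

    The 2-minute tolerance is an arbitrary choice for illustration.
    """
    return abs(minutes(seconds) - SWITCH_INTERVAL_MIN) <= tolerance_min

# Failure times quoted in the post, in seconds of CPU time:
# two early failures, then the bounds of the later range.
reported = [3741, 3758, 5347, 5400]

for s in reported:
    print(f"{s} s is about {minutes(s):.0f} min, near switch: {near_switch(s)}")
```

The two early failures land a little past the one-hour mark, while the 5347 to 5400 second range sits right at the 90-minute boundary, which is consistent with the task switch being involved for that group.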

ID: 22399
NJMHoffmann
Joined: 17 Dec 05
Posts: 45
Credit: 45,891
RAC: 0
Message 22408 - Posted: 13 Aug 2006, 17:17:47 UTC - in response to Message 22399.  

Rosetta 5.25 writes its checkpoint before it checks whether it should stop. With the current betas of BOINC this leads to a funny situation: BOINC waits for a checkpoint before switching tasks, sees the checkpoint, and switches tasks, and Rosetta then sits there at 100% done until the other projects have had their share, before the result is uploaded.

Norbert
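
The ordering problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Rosetta or BOINC code: the function `run_slice`, its parameters, and the rule that the client preempts at the first checkpoint written after the time slice has expired are all hypothetical simplifications.

```python
def run_slice(total_steps, slice_expires_at, check_done_first):
    """Run a toy work unit until it finishes or is preempted.

    Returns (state, steps_done). The client is modelled as preempting
    the task at the first checkpoint written once the slice has expired.
    """
    done = 0
    while True:
        done += 1  # one unit of computation
        if check_done_first and done >= total_steps:
            return ("finished", done)
        # A checkpoint is written here; the client may switch tasks now.
        if done >= slice_expires_at:
            return ("preempted", done)
        if not check_done_first and done >= total_steps:
            return ("finished", done)

# Slice expires exactly as the last step completes.
print(run_slice(10, 10, check_done_first=False))
print(run_slice(10, 10, check_done_first=True))
```

With the checkpoint written first, the toy task is preempted with all 10 steps done, which corresponds to a work unit sitting at 100% without reporting. With the completion check first, the same task finishes and can be uploaded before any switch happens.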
ID: 22408
TCU Computer Science
Joined: 7 Dec 05
Posts: 28
Credit: 12,861,977
RAC: 0
Message 22414 - Posted: 13 Aug 2006, 19:28:06 UTC - in response to Message 20832.  

Another stuck work unit:

2f21X_BOINC_ABRELAX_SAVE_ALL_OUT_BARCODE__1075_31308
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=27912763

It has been running for more than 2 days but accumulated only 6 hours of CPU time and is stuck at 74.4%.

When I stopped BOINC, the Rosetta processes did not terminate.
I rebooted the machine and the WU immediately terminated with the error

ERROR:: Exit at: initialize.cc line:1618
ID: 22414
Tino Ruiz
Joined: 12 Oct 05
Posts: 13
Credit: 397,392
RAC: 0
Message 22529 - Posted: 16 Aug 2006, 11:30:41 UTC
Last modified: 16 Aug 2006, 11:32:04 UTC

Hi,

I'm having the same problems with Rosetta@Home "hanging" (it shows "running" but the CPU is at 0%). It usually occurs between 23% and 26% of the way through the unit. The same thing happens with World Community Grid as well, but I know that's another project. ALL other projects work fine. I'm on a P4C 2.6 GHz with 512 MB RAM, running Xubuntu. Nothing is overclocked. I know the workunits below are "stuck":

FRA_t370_CASPR_hom001_6_t370_4_2a2jA_IGNORE_THE_REST_223_1078_61_0
FRA_t322_CASPR_hom001_6_t322_3_1u1zA_IGNORE_THE_REST_17_1079_65_1

There are a lot more that I've had to abort over the weeks, but my log only goes so far.
ID: 22529

©2024 University of Washington
https://www.bakerlab.org