Report Problems with Rosetta Version 5.16 I

Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 16738 - Posted: 21 May 2006, 6:42:51 UTC
Last modified: 21 May 2006, 6:44:32 UTC

This result exited with code "1" giving the error message:

ERROR:: Exit at: dock_structure.cc line:401

This is a somewhat old Linux-box with just 256 MB memory but usually it runs stable - this is its first error in, I guess, months...
Team betterhumans.com - discuss and celebrate the future - hoelder1in.org
ID: 16738 · Rating: 0
Jphelan

Joined: 7 Apr 06
Posts: 1
Credit: 88,443
RAC: 0
Message 16749 - Posted: 21 May 2006, 12:14:48 UTC

Since Rosetta 5.16 I have had to abort a larger number of work units, each after about a day, because the work unit froze while it was being worked on.
ID: 16749 · Rating: 0
Ian

Joined: 14 Apr 06
Posts: 29
Credit: 364,629
RAC: 500
Message 16752 - Posted: 21 May 2006, 13:30:42 UTC

A couple more errors in the wee small hours (well, where I am anyway :))

https://boinc.bakerlab.org/rosetta/result.php?resultid=21060345

https://boinc.bakerlab.org/rosetta/result.php?resultid=21039948

Eyeballing it, I seem to go through bursts of great stability with no errors and then a brief period of alternating errors and success.
Ian Cundell, St Albans, UK
ID: 16752 · Rating: 0
Seth Aaronson
Joined: 5 Mar 06
Posts: 18
Credit: 3,976
RAC: 0
Message 16756 - Posted: 21 May 2006, 15:34:24 UTC

Moderator9,
Since my errors and freezes seem to be related to the rosetta/BOINC screen saver, can you point me in the right direction to find some answers for the problems with that?
Now that I am not using the BOINC screen saver, rosetta is error free for me.

ID: 16756 · Rating: 0
belldandy from pleiades
Joined: 2 Nov 05
Posts: 6
Credit: 102,731
RAC: 0
Message 16772 - Posted: 21 May 2006, 17:17:30 UTC - in response to Message 16766.  

There are too many errors with version 5.16 in my case.

The messages are similar to this one:-

5/21/2006 6:23:46 PM|rosetta@home|Unrecoverable error for result FRA_t289_hom001_2_LOOPRLX_1yw6A_IGNORE_THE_REST_1958_526_1_0 (Incorrect function. (0x1) - exit code 1 (0x1))

5/21/2006 7:41:39 PM|rosetta@home|Unrecoverable error for result v287__CASP7_ABRELAX_SHORTRELAX_SAVE_ALL_OUT_cterm__522_5897_0 (Incorrect function. (0x1) - exit code 1 (0x1))


Links:-

https://boinc.bakerlab.org/rosetta/result.php?resultid=21070416
https://boinc.bakerlab.org/rosetta/result.php?resultid=21069579
https://boinc.bakerlab.org/rosetta/result.php?resultid=21068953
https://boinc.bakerlab.org/rosetta/result.php?resultid=21068304
https://boinc.bakerlab.org/rosetta/result.php?resultid=21067447
https://boinc.bakerlab.org/rosetta/result.php?resultid=21067020
https://boinc.bakerlab.org/rosetta/result.php?resultid=21063965
https://boinc.bakerlab.org/rosetta/result.php?resultid=21063932
https://boinc.bakerlab.org/rosetta/result.php?resultid=21063913
https://boinc.bakerlab.org/rosetta/result.php?resultid=21062758
https://boinc.bakerlab.org/rosetta/result.php?resultid=21062234
https://boinc.bakerlab.org/rosetta/result.php?resultid=21061228
https://boinc.bakerlab.org/rosetta/result.php?resultid=21061227

There are more of those from my computer that uses the 5.16 client. Almost all of them failed, usually after 3-5 minutes, sometimes longer, with the same error message as above. My other host, which is still on 5.13, has fewer errors; it still has some, but not enough to warrant a complaint here. I dread upgrading that one to 5.16 if this continues. The error rate is above 90% in my case.

Edit: BTW, the problematic computer ID is 217183
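
For anyone who wants to tally failures like these from a saved copy of the BOINC message log, here is a minimal Python sketch. It assumes the log lines have the same `date|project|message` shape as the two lines quoted above; the function names are made up for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Assumed line shape, taken from the two log lines quoted above:
# <date time>|<project>|Unrecoverable error for result <name> (<text> - exit code N (0xN))
LINE_RE = re.compile(
    r"^(?P<when>[^|]+)\|(?P<project>[^|]+)\|"
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\(.*exit code (?P<code>-?\d+) \(0x[0-9a-fA-F]+\)\)$"
)


def parse_error_line(line):
    """Return a dict describing one 'Unrecoverable error' line, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "when": match["when"],
        "project": match["project"],
        "result": match["result"],
        "exit_code": int(match["code"]),
    }


def count_exit_codes(lines):
    """Count how often each exit code appears among the error lines."""
    counts = Counter()
    for line in lines:
        record = parse_error_line(line)
        if record is not None:
            counts[record["exit_code"]] += 1
    return dict(counts)
```

Feeding it the lines of a message log gives a quick per-exit-code count, which makes it easier to say whether one error dominates, as it seems to here.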

...

belldandy from pleiades,

I hope you won't mind if I replace your original post with this one. The image was very large and it stretches the forum page display, requiring people to scroll right and left to read and reply to posts.

I would recommend you try a project reset. There is no problem with the work unit batch, so the problem is local to your machine. I have seen this same error before, and on some systems a reset fixes it, on others a detach/reattach fixes it. If these things do not work, then we will have to dig deeper.

One thing I would recommend is that you upgrade to the BOINC 5.4.9 client. That is the current recommended version of BOINC. It is far more stable, and it works very well with version 5.16 of Rosetta. That alone might solve your problem.


I did use BOINC 5.4.9.
I will try resetting the project tomorrow.
Campeones everywhere!
ID: 16772 · Rating: 0
Laurenu2

Joined: 6 Nov 05
Posts: 57
Credit: 3,818,778
RAC: 0
Message 16783 - Posted: 21 May 2006, 20:35:24 UTC

A lot of my nodes are without work due to reaching their WU quotas. Rosetta should check their system and purge the BAD WUs they just sent out.
If You Want The Best You Must forget The Rest
---------------And Join Free-DC----------------
ID: 16783 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16787 - Posted: 21 May 2006, 21:08:54 UTC

Now this is weird:

I reattached to Rosetta. I got a work unit that is not starting. When I checked the DISK SPACE allotted to Rosetta by the manager, I found that ZERO, bupkis, has been assigned, and that RALPH, which has been assigned 1/11th of my resources, has 27+ gigabytes assigned. There is no way a Rosetta WU can run on zero disk space. Can someone tell me what would drive the manager to do that?

BTW I am attached to RALPH and I am waiting for jobs to run.
ID: 16787 · Rating: 0
Aglarond

Joined: 29 Jan 06
Posts: 26
Credit: 446,212
RAC: 0
Message 16790 - Posted: 21 May 2006, 21:29:01 UTC - in response to Message 16733.  
Last modified: 21 May 2006, 21:30:17 UTC

LINUX problem:
I need help with this problem: while running Rosetta on a Linux server with a Pentium 4 Hyper-Threading processor, Rosetta occasionally hangs in a very strange state: everything is running except Rosetta. BOINC is running. The application on the other thread (Simap@home) is running. Just Rosetta isn't.


I had encountered this particular issue back in Jan/Feb-06 (also under Linux). Overall about 5-6 times.

BOINC log would show that boinc restarted Rosetta, but the Rosetta process would just stay "idle" (ps flags were "SN"=sleep,nice consuming no CPU time) for hours/days, until I manually killed it (I guess nowadays the "watchdog" thread will catch it).


I don't think the watchdog can catch it, because the whole process is sleeping. It was in this state for more than 2 days and the watchdog didn't catch it.


At the time, I thought it was an issue with Rosetta+BOINC interaction, as I think it happened upon resuming a Rosetta WU (with leave-in-mem=yes). At the time, I also suspected some issue with the system's resources, as that PC had only 256MB RAM and I was running 6 BOINC projects and 100+ processes.


I also have leave-in-mem=yes, and it could be something to do with memory, as this is primarily a web server with only 1 GB RAM, so it can be low on RAM from time to time.


It COULD have been a faulty WU, but when I ran that WU with the rosetta command line outside BOINC, it completed fine.


No, it wasn't a faulty WU. After restarting BOINC, both WUs completed successfully.
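
A rough way to spot this kind of hang from outside the process is to sample it twice and check that it stays asleep without using any CPU time. The Python sketch below assumes Linux and the `/proc/<pid>/stat` layout described in proc(5); the function names and the idea of polling by hand are illustrative only and have nothing to do with Rosetta's own watchdog.

```python
def parse_proc_stat(line):
    """Extract (state, cpu_ticks) from one line of /proc/<pid>/stat.

    The command name (field 2) is wrapped in parentheses and may contain
    spaces, so everything is split after the last ')'.
    """
    rest = line[line.rindex(")") + 2:].split()
    state = rest[0]                            # field 3: R, S, D, Z, T, ...
    cpu_ticks = int(rest[11]) + int(rest[12])  # fields 14 + 15: utime + stime
    return state, cpu_ticks


def read_sample(pid):
    """Read one (state, cpu_ticks) sample for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_proc_stat(handle.read())


def looks_stalled(samples):
    """True if every sample is sleeping ('S') and CPU time never advanced.

    `samples` is a list of (state, cpu_ticks) pairs taken some time apart,
    for example several minutes between calls to read_sample().
    """
    if len(samples) < 2:
        return False
    all_sleeping = all(state == "S" for state, _ in samples)
    no_progress = samples[-1][1] == samples[0][1]
    return all_sleeping and no_progress
```

A science application that is supposed to be crunching should accumulate CPU ticks between samples; one that reports `S` with an unchanged tick count over a long interval matches the "SN, consuming no CPU time" picture described above.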
ID: 16790 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16791 - Posted: 21 May 2006, 21:32:23 UTC - in response to Message 16789.  

Now this is weird:

I reattached to Rosetta. I got a work unit that is not starting. When I checked the DISK SPACE allotted to Rosetta by the manager, I found that ZERO, bupkis, has been assigned, and that RALPH, which has been assigned 1/11th of my resources, has 27+ gigabytes assigned. There is no way a Rosetta WU can run on zero disk space. Can someone tell me what would drive the manager to do that?

BTW I am attached to RALPH and I am waiting for jobs to run.

Your resource assignment is different from the disk use settings. In fact, they are not directly related at all. That said, you are correct that Rosetta should be using about 15-20 MB of space. It is possible that it did not actually download, or that it did and BOINC has not noticed the change in disk use yet. Since it is not processing the work unit yet, you might try two things.

1) Make certain that Rosetta has not been suspended in the projects tab
2) Restart BOINC Manager, and see if it wakes up.

I directed Rhiju to your post at Ralph. He is thrilled at the data you provided (see his post to you there). He is contacting Rom (BOINC Development) to discuss your report. Thank you for helping them.


More weirdness: The Rosetta exe and the Ralph Exe files have disappeared from the Task Manager.
ID: 16791 · Rating: 0
Thor[Free-DC]

Joined: 24 Oct 05
Posts: 2
Credit: 354,251
RAC: 0
Message 16794 - Posted: 21 May 2006, 22:29:28 UTC

This is not really a bug, but it is bugging me:

The new work units seem to have only very few "saving points"

Which means you put half an hour or even an hour of crunching in, shut down the computer for some reason, and when you get back to crunching, you have to start over again.

I had this happen at least three times, so I wonder if there is any possibility to put more save spots in the WUs for the crunchers who are not running 24/7 ???

Greets Thor[Free-DC]
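
To make the "saving points" idea concrete, here is a small Python sketch of periodic checkpointing. It is a generic illustration, not how Rosetta or BOINC actually checkpoint: the JSON state file, the `checkpoint_every` parameter and the `stop_after` argument (used only to simulate a shutdown) are all assumptions made for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # readers never see a half-written file


def load_checkpoint(path):
    """Return the last saved state, or a fresh one if nothing was saved."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(path, n_steps, checkpoint_every, stop_after=None):
    """Do n_steps units of toy work, saving every `checkpoint_every` steps.

    `stop_after` simulates the computer being shut down mid-run: the loop
    exits and anything done since the last checkpoint is lost.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            break
        state["total"] += state["step"]  # stand-in for real crunching
        state["step"] += 1
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

The trade-off is visible in `checkpoint_every`: a shutdown loses at most that many steps, so sparse checkpoints mean more repeated work for people who do not run 24/7, while very frequent ones cost extra disk writes.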
ID: 16794 · Rating: 0
Laurenu2

Joined: 6 Nov 05
Posts: 57
Credit: 3,818,778
RAC: 0
Message 16797 - Posted: 21 May 2006, 22:56:35 UTC - in response to Message 16794.  


I too have seen this happen: you reboot a PC that has an hour or more logged on a WU and it starts over at 00:00. The checkpoints are not working on all WUs.

And Mod 9, then you are the lucky one who does not get these errors. But just because you do not get them does not mean we are not getting them.

If You Want The Best You Must forget The Rest
---------------And Join Free-DC----------------
ID: 16797 · Rating: 0
Rhiju
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 16801 - Posted: 21 May 2006, 23:56:14 UTC - in response to Message 16772.  

Hi belldandy: I just took a look at your results too. You're getting the same error every time -- and it's due to a problem reading in a file called bbdep02.May.sortlib.gz. (Not very obvious, huh?) It occurred with some 5.13 workunits also, maybe some old ones that were still running when you also got 5.16 on your system.

I think that file is corrupted on your system. I'm not exactly sure how to fix this -- a BOINC reinstall may trigger your system to re-download it. Alternatively, you could detach from the project, abort current workunits, and completely remove the directory that has this file, then start up BOINC again and reattach to the project.

Thanks for posting -- hope one of those solutions works! It's certainly an error that we haven't seen before.
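
Before deleting anything, it may be worth confirming that the file really is damaged. The Python sketch below simply tries to decompress a `.gz` file from start to finish and reports whether that succeeds. It is a generic check using the standard `gzip` module, not a Rosetta tool, and the location of bbdep02.May.sortlib.gz on any given system is not assumed here: pass in whatever path the file has.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the whole gzip file at `path` decompresses cleanly.

    Reading to the end forces the decompressor through every block and the
    trailing CRC check, so truncation and most corruption show up as errors.
    A missing or unreadable file also returns False.
    """
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

If this returns False for the file, that would be consistent with the corruption theory, and removing it so that BOINC downloads a fresh copy is the natural next step.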




ID: 16801 · Rating: 0
Rhiju
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 16802 - Posted: 22 May 2006, 0:01:40 UTC - in response to Message 16783.  
Last modified: 22 May 2006, 0:01:59 UTC

Hi Laurenu2... can you post the results page for one of your nodes that has this problem? Thanks!

I just looked through the pages for four or five of the nodes that are under your userid -- they all have had perfect success rates for the last three days! We're not aware of any bad WUs being sent out on rosetta@home, and have been checking that the error rates are low. Obviously, we need to know ASAP if there are any bad WUs. (There was a bad batch last week on ralph, but it was a small batch, and has been purged from the system.)



ID: 16802 · Rating: 0
Seth Aaronson
Joined: 5 Mar 06
Posts: 18
Credit: 3,976
RAC: 0
Message 16803 - Posted: 22 May 2006, 0:08:42 UTC - in response to Message 16763.  

Moderator9,
Since my errors and freezes seem to be related to the rosetta/BOINC screen saver, can you point me in the right direction to find some answers for the problems with that?
Now that I am not using the BOINC screen saver, rosetta is error free for me.



Seth,

Yes. Could you please attach to Ralph at this address. The programmers are looking for problem systems to help find this specific error.


What is the recommended way of doing that? Should I suspend rosetta after I've created a RALPH account, attach to RALPH, then start to use the BOINC screen saver? I'm also attached to SETI and Einstein. Please advise.
-Seth

ID: 16803 · Rating: 0
Seth Aaronson
Joined: 5 Mar 06
Posts: 18
Credit: 3,976
RAC: 0
Message 16807 - Posted: 22 May 2006, 2:14:59 UTC - in response to Message 16804.  


You can just treat RALPH like any other project for the most part. The biggest difference is that while credits are awarded on RALPH, there is no effort to restore lost credits. It is a development and diagnostic project. On a brighter note, you will get to see the next versions of RALPH before the rest of the world, and please do provide suggestions there if you think of any.

The link I provided is the URL that BOINC Manager is going to ask you for. Once you are attached, set the project priority low, say a 10-20 percent share of your system. This will ensure that when work is available you will get some, but it will not interfere with other processing too much. As far as running it goes, just treat it as you would rosetta. If you have errors, report them in the threads at RALPH, with a link to the result that had the error.

Thank you for the help.
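
On the "10-20 percent share" point: BOINC resource shares are relative numbers, so a project's fraction of processing time is its share divided by the sum of all attached projects' shares. The Python sketch below works through that arithmetic. The share values are assumptions for illustration (three other projects left at 100 each), and the function names are invented for the example.

```python
def share_fractions(shares):
    """Map each project to its fraction of the total resource share."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total resource share must be positive")
    return {name: value / total for name, value in shares.items()}


def share_for_target(other_shares_total, target_fraction):
    """Resource share that yields `target_fraction` of the total.

    Solves x / (x + others) = f for x, giving x = f * others / (1 - f).
    """
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be between 0 and 1")
    return target_fraction * other_shares_total / (1 - target_fraction)


# Assumed setup: Rosetta, SETI and Einstein each left at a share of 100.
others = {"rosetta": 100, "seti": 100, "einstein": 100}
ralph_share = share_for_target(sum(others.values()), 0.20)  # 75.0
fractions = share_fractions({**others, "ralph": ralph_share})
```

With those assumed numbers, a RALPH share of 75 against 300 for the rest works out to 20% of the total. The same arithmetic explains a "1/11th" figure: a share of 10 next to a single project at 100 is 10/110. None of this affects disk space, which is governed by separate settings.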


Very well. I've attached to ralph and set its resource share to 20%. Thanks for your guidance. I'll be unsubscribing from this thread now.
Peace, year round.
-Seth

ID: 16807 · Rating: 0
Laurenu2

Joined: 6 Nov 05
Posts: 57
Credit: 3,818,778
RAC: 0
Message 16812 - Posted: 22 May 2006, 3:56:45 UTC - in response to Message 16802.  



Yes, that is the problem: I have 60 to 70 PCs, which makes way too many node pages to scan through.
Look here: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=196119
And
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=203528
There was another but it is lost in what I call my network

On this node
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=218017
I found it locked up due to Rosetta eating up all the memory and about 500 MB of swap file. I had to kill Rosetta through Task Manager and rebooted, and it started eating memory again, about 400 MB in just under 3 minutes. I had to abort that WU, and then it worked fine again.

If You Want The Best You Must forget The Rest
---------------And Join Free-DC----------------
ID: 16812 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16816 - Posted: 22 May 2006, 6:51:08 UTC

Just a question: Are any of the people reporting errors of the 107 type using Zone Alarm?
Curious minds want to know.
ID: 16816 · Rating: 0
hawgietonight

Joined: 18 Apr 06
Posts: 3
Credit: 808,621
RAC: 0
Message 16818 - Posted: 22 May 2006, 8:04:42 UTC - in response to Message 16816.  



No ZA here, just XP's own firewall and AVG antivirus.
ID: 16818 · Rating: 0
Stwato

Joined: 11 Jan 06
Posts: 150
Credit: 655,634
RAC: 0
Message 16819 - Posted: 22 May 2006, 8:21:41 UTC
Last modified: 22 May 2006, 8:23:09 UTC

I'm not sure if this is a 5.16 problem or whether it's something to do with my computer, but sometimes when I click 'show graphics' and maximise the graphics window, the very bottom part with Accepted Energy and Accepted RMSD disappears behind/below the taskbar (obviously a Windows machine). For example, just now I displayed the graphics, maximised it and everything was good. Then I closed it, reopened it and remaximised it and the bottom bit was missing. Nothing else on my system changed between opening the windows. Any ideas?

If it helps I have a ATI Radeon 9700 graphics card. The computer is a laptop with a widescreen, could it be a resolution problem?

I've just noticed that the problem happens before maximisation, i.e. the bottom doesn't show in the small window if it's not going to show in the big window and vice versa.

This is not a problem for me, just a little frustrating when trying to see the hidden details.

Stwato
[Edit: too many zeros on graphics card description]
ID: 16819 · Rating: 0
Ian

Joined: 14 Apr 06
Posts: 29
Credit: 364,629
RAC: 500
Message 16823 - Posted: 22 May 2006, 11:13:09 UTC

Another one for you.

https://boinc.bakerlab.org/rosetta/result.php?resultid=21143590
Ian Cundell, St Albans, UK
ID: 16823 · Rating: 0



©2025 University of Washington
https://www.bakerlab.org