Problems with Rosetta version 5.43

Message boards : Number crunching : Problems with Rosetta version 5.43

adrianxw
Joined: 18 Sep 05
Posts: 650
Credit: 11,637,805
RAC: 799
Message 35287 - Posted: 22 Jan 2007, 11:59:25 UTC

I wasn't sure it was a client problem, so I started a thread to highlight the issue. I'm seeing exactly the same problem.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Tiago
Joined: 11 Jul 06
Posts: 55
Credit: 2,538,721
RAC: 0
Message 35315 - Posted: 22 Jan 2007, 17:14:01 UTC

I'm having the same problem here. Something is wrong with that wu.
paulcsteiner
Joined: 15 Oct 05
Posts: 19
Credit: 3,120,898
RAC: 1,006
Message 35323 - Posted: 22 Jan 2007, 19:11:04 UTC

Ditto. I've gotten 15 client errors today alone, all with this message:

<core_client_version>5.4.11</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2035473
ERROR:: Exit at: .\fragments.cc line:459

For this machine: 401324

Lynn
Joined: 13 Jan 07
Posts: 3
Credit: 2,766,009
RAC: 0
Message 35339 - Posted: 22 Jan 2007, 22:20:38 UTC - in response to Message 35323.  
Last modified: 22 Jan 2007, 22:24:32 UTC

Are you all using Windows or Linux? I've had no failures on the 3 Windows XP Pro systems I'm running, but I finally had to detach from Rosetta on *ALL* of my Linux systems (two running Ubuntu 6.06 and one running Ubuntu 6.10). I was seeing nearly 90% failures, and these systems were giving Rosetta 50% of their time. I didn't mind the tasks that ran 10-15 seconds before failing, but one of these systems is a new P4 dual-core, and I had WUs hogging it for 6 to 8 hours (or 4 days for one WU!!!!) before they failed. Better that I donate that CPU power to another project that produces something useful.

Here is the dual-core's result page: 398561

- Lynn
Lee
Joined: 18 Apr 06
Posts: 4
Credit: 36,335
RAC: 0
Message 35340 - Posted: 22 Jan 2007, 22:21:09 UTC

I am getting the following message:

2007/01/23 12:16:22 AM|rosetta@home|Note: not requesting new work or reporting results
2007/01/23 12:16:27 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
2007/01/23 12:16:31 AM|rosetta@home|Started download of PSH_0037_2761.loopfile
2007/01/23 12:16:33 AM|rosetta@home|Temporarily failed download of PSH_0037_2761.loopfile: error 500
2007/01/23 12:16:34 AM|rosetta@home|Started download of PSH_0037_2761.loopfile
2007/01/23 12:16:36 AM|rosetta@home|Temporarily failed download of PSH_0037_2761.loopfile: error 500
2007/01/23 12:16:37 AM|rosetta@home|Started download of PSH_0037_2761.loopfile
2007/01/23 12:16:39 AM|rosetta@home|Temporarily failed download of PSH_0037_2761.loopfile: error 500
2007/01/23 12:16:40 AM|rosetta@home|Started download of PSH_0037_2761.loopfile
2007/01/23 12:16:42 AM|rosetta@home|Temporarily failed download of PSH_0037_2761.loopfile: error 500
2007/01/23 12:16:42 AM|rosetta@home|Backing off 2 hours, 26 minutes, and 33 seconds on download of file PSH_0037_2761.loopfile

The WUs have been done & sitting on my machine for a while...
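For anyone comparing client logs like the one above, here is a minimal Python sketch that tallies the repeated "Temporarily failed download" lines per file and error value, and picks out the final back-off message. It assumes the pipe-delimited `date time|project|message` layout shown in the pasted log; the function and pattern names are hypothetical and not part of BOINC, and the "error 500" value is presumably an HTTP status from the download server.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log pasted above:
#   "<date> <time>|<project>|<message>"
FAILED = re.compile(r"Temporarily failed download of (\S+): error (\d+)")
BACKOFF = re.compile(r"Backing off (.+) on download of file (\S+)")


def summarize_downloads(log_text):
    """Count transient download failures per (file, error) and record any back-off."""
    failures = Counter()
    backoffs = {}
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not an event-log line
        message = parts[2]
        m = FAILED.search(message)
        if m:
            failures[(m.group(1), m.group(2))] += 1
            continue
        m = BACKOFF.search(message)
        if m:
            backoffs[m.group(2)] = m.group(1)
    return failures, backoffs


# Excerpt of the log above (the "Started download" lines are ignored by the patterns).
LOG = """\
2007/01/23 12:16:31 AM|rosetta@home|Started download of PSH_0037_2761.loopfile
2007/01/23 12:16:33 AM|rosetta@home|Temporarily failed download of PSH_0037_2761.loopfile: error 500
2007/01/23 12:16:36 AM|rosetta@home|Temporarily failed download of PSH_0037_2761.loopfile: error 500
2007/01/23 12:16:39 AM|rosetta@home|Temporarily failed download of PSH_0037_2761.loopfile: error 500
2007/01/23 12:16:42 AM|rosetta@home|Temporarily failed download of PSH_0037_2761.loopfile: error 500
2007/01/23 12:16:42 AM|rosetta@home|Backing off 2 hours, 26 minutes, and 33 seconds on download of file PSH_0037_2761.loopfile
"""

if __name__ == "__main__":
    failures, backoffs = summarize_downloads(LOG)
    for (name, err), count in failures.items():
        # Prints: PSH_0037_2761.loopfile: 4 failures with error 500
        print(f"{name}: {count} failures with error {err}")
    for name, delay in backoffs.items():
        print(f"{name}: backing off {delay}")
```

On this excerpt it shows the pattern in the log: four failed attempts, all with error 500, before the client backed off on that file.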

Lee
Joined: 18 Apr 06
Posts: 4
Credit: 36,335
RAC: 0
Message 35341 - Posted: 22 Jan 2007, 22:21:15 UTC
Last modified: 22 Jan 2007, 22:22:48 UTC

Oops, message got duplicated.

Does anyone know why this happens?
netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 35346 - Posted: 22 Jan 2007, 23:12:36 UTC - in response to Message 35282.  

Same with me. I had all 4 PSH_0131_looprlx work units I downloaded fail in a similar manner.


Same here... about 20 WU's total...


Looking for a team ??? Join BoincSynergy!!


Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 35356 - Posted: 23 Jan 2007, 1:30:32 UTC - in response to Message 35340.  

A bad batch (PSH_003?_looprlx...) slipped through, and we were purging it from the database this morning. Because these WUs failed right after starting, we don't expect any to be left out there now, and the impact on RAC should be minimal. However, that is no excuse for making such a mistake, and we are very sorry for the inconvenience caused to all BOINC users. Thank you for reporting this and for your support.

Thomas F. Bates IV
Joined: 10 May 06
Posts: 5
Credit: 2,853,254
RAC: 0
Message 35357 - Posted: 23 Jan 2007, 2:08:07 UTC

I hated to do it, but I had to kill the rosetta process. The last 5.43 WU I had was stuck at 100% CPU usage even though it was marked as 100% complete. Oh well...just uploaded a "computation error"...
Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 35358 - Posted: 23 Jan 2007, 2:29:40 UTC - in response to Message 35357.  

Which one was it? I saw you have four hosts running R@H and if you can point me to the one you killed, it may give us a clue of what went wrong. Thanks.

BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 35359 - Posted: 23 Jan 2007, 3:22:48 UTC - in response to Message 35339.  

Lynn:
While I don't use Linux for crunching, do these machines run Rosetta fine if they're running Rosetta 100% of the time? We've had better luck with the Windows machines when the "keep in memory" setting is set to yes. Perhaps someone can confirm or deny whether that's been a problem on Linux systems as well. (I'm trying to rule out the keep-in-memory problem and possible interactions between Rosetta and whatever other BOINC app or apps take up the other 50%.) Since you've got 3 Linux machines with similar problems, have you tried a different Linux distribution such as Red Hat? (That would rule out versions that can't identify your hardware properly.) Or verified that the 3 haven't been infected?
I'd ask if you'd tested the memory, but that wouldn't show up on 3 machines at once unless you recycled the RAM from dead machines.

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 35360 - Posted: 23 Jan 2007, 4:27:39 UTC
Last modified: 23 Jan 2007, 4:29:36 UTC

Chu, RE: Thomas Bates, it looks like it was this WU: 1ail__BOINC_NOFILTERS_ABRELAX_SAVE_ALL_OUT_NEWRELAXFLAGS_frags83__1505_1221_0

It SAYS the watchdog ended it, but apparently not so.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
hedera
Joined: 15 Jul 06
Posts: 76
Credit: 5,139,863
RAC: 905
Message 35361 - Posted: 23 Jan 2007, 4:34:10 UTC

I almost hate to say it (knock wood!) but my WinXP Pro machine has been happily cranking out successful computations, all day yesterday and again this evening for about an hour and a half. I don't know if this is because the problem didn't hit WinXP, or because my habit of turning the machine off during the day while I'm at work paid off by missing the bad batch of downloads. As I think about it, it's probably because I missed the bad batch...
--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

cwc
Joined: 18 Dec 06
Posts: 1
Credit: 35,128
RAC: 0
Message 35369 - Posted: 23 Jan 2007, 10:44:19 UTC
Last modified: 23 Jan 2007, 11:03:39 UTC

I just discovered that I've got a "computation error" problem.

On two occasions just a short time apart, I opened full-screen graphics on a work unit that had just started. I didn't do anything but watch it for a while. Everything was OK during the initial backbone search, but within a minute or so of starting the all-atom search, the application froze.
(It may have frozen at the very start of the all-atom search. I'm not sure.)

On the first occasion, Windows put up a message saying the application wasn't responding, followed a few seconds later with the standard "send error report" message as the program closed. I don't remember whether I closed the graphics screen or it closed on its own.

By then the BOINC manager was running two other work units, one of them belonging to R@H, and the failed work unit showing a "computation error". I clicked the "update" button to clear it out. I then opened the graphics screen for the just-started unit, and watched to see what would happen.

Again, just after the all-atom search began, I noticed that the application had frozen. And again, by the time the graphics screen closed, the BOINC manager was running another R@H work unit. This time I didn't open the graphics screen, and it ran to completion.

Here are the error messages for the two failed work units:

1/23/2007 01:01:14 AM|rosetta@home|Unrecoverable error for result 1shfA_BOINC_NOFILTERS_ABRELAX_SAVE_ALL_OUT_NEWRELAXFLAGS_frags83__1505_2579_0 ( - exit code 1073807364 (0x40010004))

1/23/2007 01:07:35 AM|rosetta@home|Unrecoverable error for result 1c9oA_BOINC_NOFILTERS_ABRELAX_SAVE_ALL_OUT_NEWRELAXFLAGS_frags83__1505_2891_0 ( - exit code -1073741819 (0xc0000005))
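The decimal and hex exit codes in these two lines are the same 32-bit value printed two ways; the client appears to print the decimal form as a signed integer, so a value with the top bit set shows up negative. A minimal Python sketch of the conversion follows (the helper names are hypothetical, not part of BOINC or Rosetta). The lookup table uses the standard Windows status names for these two values: 0xC0000005 is an access violation and 0x40010004 is DBG_TERMINATE_PROCESS. What caused them in this case is not something the code can tell you.

```python
# Hypothetical helpers, not part of BOINC or Rosetta.
def to_hex32(code):
    """Show an exit code as an unsigned 32-bit hex value, as in the log lines above."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def to_signed32(value):
    """Interpret a 32-bit value as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Standard Windows status codes matching the two values reported above.
WINDOWS_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0x40010004: "DBG_TERMINATE_PROCESS",
}

if __name__ == "__main__":
    for code in (1073807364, -1073741819):
        name = WINDOWS_STATUS.get(code & 0xFFFFFFFF, "unknown")
        # Prints e.g. "-1073741819 0xc0000005 STATUS_ACCESS_VIOLATION"
        print(code, to_hex32(code), name)
```

Masking with 0xFFFFFFFF is what turns the negative decimal into the hex form shown in parentheses in the log.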

Here's my system info:
       Machine name: CWC-05
   Operating System: Windows XP Professional (5.1, Build 2600) 
                     Service Pack 2 (2600.xpsp_sp2_gdr.050301-1519)
           Language: English (Regional Setting: English)
System Manufacturer: INTEL_
       System Model: D945GNT_
               BIOS: Default System BIOS
          Processor: Intel(R) Pentium(R) D CPU 3.00GHz (2 CPUs)
             Memory: 2046MB RAM
          Page File: 947MB used, 2991MB available
        Windows Dir: C:\WINDOWS
    DirectX Version: DirectX 9.0c (4.09.0000.0904)
DX Setup Parameters: Not found
     DxDiag Version: 5.03.2600.2180 32bit Unicode
      Graphics Card: NVIDIA GeForce 7900 GTX
       Screen Saver: Windows Star-Field

cwc
Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 35397 - Posted: 23 Jan 2007, 15:39:46 UTC

I got a compute error after 20 seconds on my home computer, which is running while I'm at work. Check the error message on result # 58499325

Basically, this is the text:

<core_client_version>5.4.11</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2091953
ERROR:: Exit at: .\fragments.cc line:459

</stderr_txt>


zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 2,537
Message 35399 - Posted: 23 Jan 2007, 16:04:42 UTC
Last modified: 23 Jan 2007, 16:18:50 UTC

I am having serious errors across all of my Macs running 5.8.* (some Intel, some PPC). They started a day or two ago. Almost nothing is crunching successfully.

https://boinc.bakerlab.org/rosetta/result.php?resultid=58758493
https://boinc.bakerlab.org/rosetta/results.php?hostid=204292
https://boinc.bakerlab.org/rosetta/results.php?hostid=269065

Edit: made URL's clickable.
Reno, NV
Team: SETI.USA
Stevea
Joined: 19 Dec 05
Posts: 50
Credit: 738,655
RAC: 0
Message 35410 - Posted: 23 Jan 2007, 19:23:08 UTC

This one lasted 16 seconds...

PSH_0144_looprlx_GP120_OD1_115_136_2663_1506_5
BETA = Bahhh

Way too many errors, killing both the credit & RAC.

And I still think the (New and Improved) credit system is not ready for prime time...
zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 2,537
Message 35416 - Posted: 23 Jan 2007, 20:52:53 UTC - in response to Message 35399.  

I noticed that these are all error code -161. What's error code -161?
Reno, NV
Team: SETI.USA
Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 35421 - Posted: 23 Jan 2007, 22:03:48 UTC - in response to Message 35416.  

That is the error code for result files not being transferred correctly, either because the result files were not generated or because the client was unable to send them back to the server. If you have only experienced this problem recently, I would suggest resetting the project on your hosts: the current application has not changed since last December, and these specific WUs are returning valid results from other hosts. It seems like a communication issue between your hosts and the server, but I am not exactly sure what is causing it.

Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 35422 - Posted: 23 Jan 2007, 22:11:39 UTC - in response to Message 35369.  

Thanks for the report. This is a well-known graphics problem with the current Rosetta application, and we are working on a fix right now. Until it is fixed, please leave the graphics off. For details, read here



©2024 University of Washington
https://www.bakerlab.org