Report Problems with Rosetta Version 5.25

Message boards : Number crunching : Report Problems with Rosetta Version 5.25

Bin Qian

Joined: 13 Jul 05
Posts: 33
Credit: 36,897
RAC: 0
Message 19720 - Posted: 2 Jul 2006, 23:07:56 UTC - in response to Message 19591.  


Sharp eye! You are totally right: t103 is not in CASP7. It's the N-terminal domain of T303. The same is true for t130, the N-terminal domain of T330.

I just got this WU. It's named Fra_t103_Casp7... I assume that is a typo and probably means t303, since no t103 exists in CASP7.


ID: 19720 · Rating: 0
_heinz

Joined: 30 Jun 06
Posts: 24
Credit: 38,697
RAC: 0
Message 19726 - Posted: 3 Jul 2006, 9:06:24 UTC

next error:
03.07.2006 10:42:58|rosetta@home|Unrecoverable error for result FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_717_858_6_0 (<file_xfer_error> <file_name>FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_717_858_6_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
26796880
ID: 19726 · Rating: 0
[AF>Linux]Arnaud
Joined: 17 Sep 05
Posts: 38
Credit: 10,490
RAC: 0
Message 19731 - Posted: 3 Jul 2006, 14:28:39 UTC
Last modified: 3 Jul 2006, 14:30:05 UTC

Hi,
I have the "WARNING! attempt to gzip file ./xxt319.out failed" error:
https://boinc.bakerlab.org/rosetta/result.php?resultid=26050268
https://boinc.bakerlab.org/rosetta/result.php?resultid=26345915

I also have several "SIGSEGV: segmentation violation" messages, but the WUs are validated:
https://boinc.bakerlab.org/rosetta/result.php?resultid=26720623

Linux Mandriva and Suse 10, Boinc 5.4.9.

Arnaud
ID: 19731 · Rating: 0
cmds

Joined: 29 Jun 06
Posts: 13
Credit: 41,811
RAC: 0
Message 19733 - Posted: 3 Jul 2006, 15:55:08 UTC - in response to Message 19719.  
Last modified: 3 Jul 2006, 15:59:04 UTC

Thanks for reporting this error! The jobs have been deleted from the queue, but if you still have those queued up on your system, please feel free to delete them. Sorry!

About a third of the 1000 starting structures were not recognized by the Rosetta application for this WU. When that happens, Rosetta finishes within a couple of minutes of starting. Since no .out.gz result file is produced, BOINC reports a "file not found" error. Apparently those bad starting structures were not included in our Ralph test of this WU, where only a small subset of the starting structures was used.

We will remove the bad starting structures and resend the jobs. Thanks again for reporting this!

I also had FRA_t329* WU crash on WinXP. Perhaps the project should delete them from the queue?



On my Linux host, the WU xxx329.xxx produced a valid result!
Linux FedoraCore 5/Boinc 5.4.9/App. 5.25
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=22790026

Chris



ID: 19733 · Rating: 0
senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 19735 - Posted: 3 Jul 2006, 16:34:54 UTC
Last modified: 3 Jul 2006, 16:38:30 UTC

I have never had this problem until now. Whenever I return results, it will not automatically download workunits. Even after I manually update the project, it still says no work...

7/3/2006 11:26:12 AM||request_reschedule_cpus: project op
7/3/2006 11:26:14 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
7/3/2006 11:26:14 AM|rosetta@home|Requesting 8640 seconds of work, returning 0 results
7/3/2006 11:26:15 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
7/3/2006 11:26:15 AM|rosetta@home|Message from server: Not sending work - last request too recent: 20 sec
7/3/2006 11:26:15 AM|rosetta@home|No work from project
7/3/2006 11:26:16 AM|rosetta@home|Deferring communication with project for 4 minutes and 1 seconds

5 Minutes Later

7/3/2006 11:30:32 AM|rosetta@home|Starting result t330__CASP7_ABRELAX_SAVE_ALL_OUT_ncap_hom001__762_2301_0 using rosetta version 5.25


Usually once I return a result it automatically downloads another workunit. Why is there a 5 minute delay?
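
Going only by the log above, the client deferred communication for 4 minutes and 1 second right after the server declined to send work, and that deferral appears to account for most of the wait. A minimal sketch of the arithmetic, assuming the message wording and timestamp format shown in this post (the helper name `deferral_seconds` is made up for illustration and is not part of BOINC):

```python
import re
from datetime import datetime, timedelta

# Matches the wording of the deferral message quoted in the log above.
DEFER_RE = re.compile(r"(\d+) minutes? and (\d+) seconds?")

def deferral_seconds(message):
    """Return the deferral length in seconds, or None if the message does not match."""
    match = DEFER_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)) * 60 + int(match.group(2))

# Timestamp format used by this particular client log (an assumption; other
# hosts in this thread log dates in different formats).
FMT = "%m/%d/%Y %I:%M:%S %p"

deferred_at = datetime.strptime("7/3/2006 11:26:16 AM", FMT)
started_at = datetime.strptime("7/3/2006 11:30:32 AM", FMT)

wait = deferral_seconds(
    "Deferring communication with project for 4 minutes and 1 seconds"
)
earliest_retry = deferred_at + timedelta(seconds=wait)

print("deferral:", wait, "seconds")
print("earliest retry:", earliest_retry.time())
print("slack before new result started:", started_at - earliest_retry)
```

Under these assumptions the earliest retry falls at 11:30:17, only a few seconds before the new result started at 11:30:32. This does not explain why the server considered the previous request "too recent".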
ID: 19735 · Rating: 0
adrianxw

Joined: 18 Sep 05
Posts: 645
Credit: 11,328,825
RAC: 120
Message 19740 - Posted: 3 Jul 2006, 19:24:04 UTC
Last modified: 3 Jul 2006, 19:24:33 UTC

Bombed out quickly for me, and for another user.

227928208
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 19740 · Rating: 0
duanra

Joined: 12 Feb 06
Posts: 8
Credit: 36,223
RAC: 0
Message 19741 - Posted: 3 Jul 2006, 20:10:36 UTC

I got this error message with Rosetta 5.25:

3/07/2006 21:52:07|rosetta@home|Unrecoverable error for result FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_814_858_3_0 (<file_xfer_error> <file_name>FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_814_858_3_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)

If it can help the techs, there it is!

I don't know if the problem comes from the application or from the workunit.

Hoping to help the project.
Duanra
ID: 19741 · Rating: 0
RichardJ

Joined: 19 Mar 06
Posts: 8
Credit: 73,014
RAC: 0
Message 19743 - Posted: 3 Jul 2006, 21:01:35 UTC

I got this after a few seconds:
03/07/2006 19:43:15|rosetta@home|Unrecoverable error for result FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_486_852_8_1 (<file_xfer_error> <file_name>FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_486_852_8_1_0</file_name> <error_code>-161</error_code></file_xfer_error>)
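
The error lines reported in this thread share the same pipe-delimited shape (timestamp, project, message), so they can be compared mechanically across hosts. A minimal sketch, assuming only the line format quoted above; the function name `parse_xfer_error` is hypothetical and not part of BOINC or Rosetta:

```python
import re

# Shape of the client log lines quoted in this thread:
#   <timestamp>|<project>|Unrecoverable error for result <name> (<file_xfer_error> ...)
LINE_RE = re.compile(
    r"^(?P<timestamp>[^|]+)\|(?P<project>[^|]*)\|"
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\(<file_xfer_error>\s*<file_name>(?P<file_name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<error_code>-?\d+)</error_code>\s*</file_xfer_error>\)$"
)

def parse_xfer_error(line):
    """Return the fields of one error line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["error_code"] = int(fields["error_code"])
    return fields

# The line from this post, copied verbatim.
SAMPLE = (
    "03/07/2006 19:43:15|rosetta@home|Unrecoverable error for result "
    "FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_486_852_8_1 "
    "(<file_xfer_error> <file_name>"
    "FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_486_852_8_1_0"
    "</file_name> <error_code>-161</error_code></file_xfer_error>)"
)

info = parse_xfer_error(SAMPLE)
print(info["result"], info["error_code"])
```

Grouping the parsed entries by result-name prefix would show that every report of this error code in the thread so far involves a FRA_t329 work unit.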

ID: 19743 · Rating: 0
TCU Computer Science

Joined: 7 Dec 05
Posts: 28
Credit: 12,861,977
RAC: 0
Message 19744 - Posted: 3 Jul 2006, 22:00:42 UTC

WU ID 22634421

It's been running for three days.
The messages show it being paused and resumed at one-hour intervals.
But the accumulated time was stuck at 1 hr 55 mins.

Stopped and restarted BOINC and the accumulated time is now increasing.

This occurred on a Linux box.
ID: 19744 · Rating: 0
anders n

Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 19767 - Posted: 4 Jul 2006, 15:47:34 UTC

This WU got an error after running benchmarks.

https://boinc.bakerlab.org/rosetta/result.php?resultid=27037409

Anders n
ID: 19767 · Rating: 0
Ian

Joined: 14 Apr 06
Posts: 29
Credit: 25,252
RAC: 0
Message 19785 - Posted: 5 Jul 2006, 0:50:49 UTC
Last modified: 5 Jul 2006, 0:52:33 UTC

ID: 19785 · Rating: 0
_heinz

Joined: 30 Jun 06
Posts: 24
Credit: 38,697
RAC: 0
Message 19789 - Posted: 5 Jul 2006, 8:26:02 UTC

next error:
05.07.2006 09:42:11|rosetta@home|Unrecoverable error for result FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_154_852_12_1 (<file_xfer_error> <file_name>FRA_t329_CASP7_hom001_6_t329_6_2ah5A_IGNORE_THE_REST_154_852_12_1_0</file_name> <error_code>-161</error_code></file_xfer_error>)
26807553
ID: 19789 · Rating: 0
AciDucK

Joined: 22 Mar 06
Posts: 1
Credit: 350,586
RAC: 0
Message 19790 - Posted: 5 Jul 2006, 10:35:30 UTC

I have a graphical problem with all the workunits. I have a laptop with a widescreen display (1280*800), and when I view the graphics in full screen or through the screensaver, the image is cropped and the bottom line is missing. Here is a screenshot:


It only happens when I suspend a WU and resume it. When a new WU starts it fits the resolution, but if I exit the screensaver and let it start again, the graphics don't fit anymore.
ID: 19790 · Rating: 1
dag

Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 19802 - Posted: 5 Jul 2006, 16:35:41 UTC
Last modified: 5 Jul 2006, 16:36:05 UTC

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=22687129 errored out twice.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=22690826 errored out twice.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=22690311 errored out twice.



dag
--Finding aliens is cool, but understanding the structure of proteins is useful.
ID: 19802 · Rating: 0
GooseWingman

Joined: 20 Jun 06
Posts: 3
Credit: 1,798,613
RAC: 0
Message 19838 - Posted: 6 Jul 2006, 17:46:07 UTC

I don't know if this is really related to version 5.25 or not, since I see about 25 WUs on 5.24 and 25 on 5.25. I was gone for the long weekend and my workstation had no internet connection. I had set it to connect every 7 days, thinking it would be fine for me to fix the DSL once I got back.

Now that it's on the Internet, I can get new jobs but can't upload. Tons of work has missed the deadline and I'm really bummed. Here's part of the error when I manually click Update:

7/6/2006 10:18:56 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
7/6/2006 10:18:56 AM|rosetta@home|Reason: Requested by user
7/6/2006 10:18:56 AM|rosetta@home|Reporting 52 tasks
7/6/2006 10:20:44 AM||Project communication failed: attempting access to reference site
7/6/2006 10:20:45 AM||Access to reference site succeeded - project servers may be temporarily down.
7/6/2006 10:20:46 AM|rosetta@home|Scheduler request failed: failed sending data to the peer
7/6/2006 10:20:46 AM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds
7/6/2006 10:22:20 AM||Resuming computation

I looked all over the forums. I can get to the site from this computer, I can ping and tracert with no problem. Windows Firewall and third-party firewall are both off. Rebooted tons of times. Help please!
ID: 19838 · Rating: 0
Fuzzy Hollynoodles

Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 19864 - Posted: 7 Jul 2006, 5:45:21 UTC

This WU crashed because the graphics froze. I opened the graphics and then became busy with something else, and when I got back the graphics had frozen, so I had to close the window through the Windows task list, and then the WU crashed :-(

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=23275677

Result: https://boinc.bakerlab.org/rosetta/result.php?resultid=27318933

They seem a little fragile, those WUs.


"I'm trying to maintain a shred of dignity in this world." - Me

ID: 19864 · Rating: 0
Infrastructure Machines Background Time

Joined: 15 Jun 06
Posts: 1
Credit: 6,050,823
RAC: 0
Message 19866 - Posted: 7 Jul 2006, 6:47:37 UTC

One of my machines picked up t348__CASP7_ABRELAX_SAVE_ALL_OUT_hishom034__864_180_0 on 3-JUL. Has been working (evenings, off hours) ever since. BOINC manager shows no CPU time accumulating, 1.020% complete. I'm going to try aborting/reset.
ID: 19866 · Rating: 0
Andreas

Joined: 23 May 06
Posts: 52
Credit: 2,113,291
RAC: 61
Message 19870 - Posted: 7 Jul 2006, 7:24:06 UTC

This one 23108760 exited with zero status after 3 hours 56 minutes
ID: 19870 · Rating: 0
Sam Miorelli

Joined: 16 Feb 06
Posts: 7
Credit: 1,303,044
RAC: 0
Message 19878 - Posted: 7 Jul 2006, 11:30:13 UTC

I just had 23385112 die on my PowerMac G4 500 MHz machine this morning. It's notable because I almost never see a Rosetta WU die on it. It died while the graphics were running in the screensaver, but didn't do the crazy stuff that happens when the screensaver crashes on Windows PCs. This was the error message given:

Fri Jul 7 07:16:07 2006|rosetta@home|Unrecoverable error for result t335__CASP7_ABRELAX_SAVE_ALL_OUT_nohistag_hom001__778_34490_0 (process exited with code 131 (0x83))

Also, along the lines of the crash Fuzzy Hollynoodles reported, has there been any resolution to the crashing screensaver issue for Windows XP machines? I had a Prescott machine on which almost every Rosetta WU died under versions 5.16 to 5.22, so I stopped crunching Rosetta on it. If this has been fixed, I might reattach that one.

ID: 19878 · Rating: 0
eNDo

Joined: 9 Apr 06
Posts: 9
Credit: 372,288
RAC: 0
Message 19909 - Posted: 8 Jul 2006, 3:17:31 UTC

I've had nothing but problems since 5.25's release. Many WUs will just stop at any given percentage, go to 0 CPU load, and sit there until I do something, not counting wasted hours. This has happened across all my machines, many times. I generally have to reboot the machine, and then the WU errors out. Sometimes restarting the service works. I anxiously await 5.26.
ID: 19909 · Rating: 0



©2023 University of Washington
https://www.bakerlab.org