Problems with Rosetta version 5.45

Message boards : Number crunching : Problems with Rosetta version 5.45

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 36139 - Posted: 5 Feb 2007, 4:36:10 UTC - in response to Message 36138.  
Last modified: 5 Feb 2007, 4:38:01 UTC

What's this mean ?

2/4/2007 11:25:38 PM||[error] Can't create HTTP response output file projects/boinc.bakerlab.org_rosetta/rosetta_5.45_windows_intelx86.exe
2/4/2007 11:25:44 PM||[error] Can't create HTTP response output file projects/boinc.bakerlab.org_rosetta/rosetta_5.45_windows_intelx86.exe
2/4/2007 11:25:50 PM||[error] Can't create HTTP response output file projects/boinc.bakerlab.org_rosetta/rosetta_5.45_windows_intelx86.exe
2/4/2007 11:25:56 PM||[error] Can't create HTTP response output file projects/boinc.bakerlab.org_rosetta/rosetta_5.45_windows_intelx86.exe
2/4/2007 11:25:56 PM|rosetta@home|Backing off 10 min 27 sec on download of file rosetta_5.45_windows_intelx86.exe


It means you have some weird problem with the automatic download of the new Rosetta version. I'd say leave it and see if it all sorts itself out after a retry or three - the software at your end is obviously backing off for 10mins to see if the issue resolves itself, but those who know more than I do may advise differently.
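The "backing off" behaviour mentioned above can be illustrated with a minimal sketch of randomized exponential backoff. This is not BOINC's actual code: the function name, the base delay, the cap and the number of attempts are all assumptions chosen for illustration.

```python
import random

def backoff_delays(base_seconds=60.0, cap_seconds=4 * 3600.0, attempts=6, rng=None):
    """Return one randomized delay (in seconds) per failed attempt.

    Hypothetical parameters: the real client's constants may differ.
    """
    rng = rng or random.Random()
    ceiling = base_seconds
    delays = []
    for _ in range(attempts):
        # Pick a random delay in [ceiling/2, ceiling] so that many clients
        # hitting the same server do not all retry at the same moment.
        delays.append(rng.uniform(ceiling / 2, ceiling))
        # Double the ceiling after each failure, up to the cap.
        ceiling = min(ceiling * 2, cap_seconds)
    return delays

delays = backoff_delays(rng=random.Random(0))
for attempt, delay in enumerate(delays, start=1):
    print(f"retry {attempt}: wait {delay / 60:.1f} min")
```

The random component would explain an odd-looking figure such as "10 min 27 sec" rather than a round number.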

R~~

ps - like your profile - I too am old enuff to remember punch cards
ID: 36139
Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 36162 - Posted: 5 Feb 2007, 17:27:37 UTC - in response to Message 36137.  

Thanks for the report, River. When this happened, did you happen to see whether the CPU run time was still being incremented? I agree with you that it definitely looks like a bug somewhere, but not a graphics-related one. I am wondering whether this only happens on Linux platforms or everywhere.

The 'stuck at 100%' bug has returned with this result here.

The preferred run time had just been cut from 24hrs to 1hr to encourage Rosetta to make way for LHC (which rarely has work and which I therefore give highest priority when it does have some), but instead this result hung having reached its new completion point.

I don't know if I provoked it, or if it would have happened anyway at the end of the original run length. Either way I'd say it is a bug, tho obviously less serious if it only occurs with a shortened run.

For others who see this, the best fix I have found is to stop BOINC and restart it, which then pushes the stuck task to start uploading.

edit add:

BTW - in response to your question in the first posting in this thread, this box has no graphics (not even an X-server) so it is not a gfx bug (unless the bug is that the windup code goes looking for the gfx...)

edit 2 add

and two more examples here and here, all different boxes, all running Linux, all stopped at 100% after run time shortened.

This is clearly relevant as it caused the watchdog message to appear, but what I still say is a bug is that the watchdog seems to make the result stick instead of ending properly.

R~~


ID: 36162
Eric Roellig
Joined: 5 Feb 07
Posts: 1
Credit: 0
RAC: 0
Message 36165 - Posted: 5 Feb 2007, 18:00:45 UTC

Hi all.

I've been looking at this project for a while and finally got a machine new & big enough to run it.

I just installed boinc and rosetta. It has downloaded three items to crunch, but each one "finishes" right away with a status of "Computation error".

I'm running Fedora Core 6 on an HP Media Center m7060n. Both KDE and GNOME are installed. I get the feeling that something in my computing environment isn't up-to-snuff....

A cut-&-paste from the client terminal

2007-02-05 11:49:41 [---] Starting BOINC client version 5.4.11 for i686-pc-linux-gnu
2007-02-05 11:49:41 [---] libcurl/7.16.0 OpenSSL/0.9.8d zlib/1.2.3
2007-02-05 11:49:41 [---] Data directory: /home/rosetta/download/boinc/BOINC
2007-02-05 11:49:41 [---] Processor: 2 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.00GHz
2007-02-05 11:49:41 [---] Memory: 1002.52 MB physical, 3.91 GB virtual
2007-02-05 11:49:41 [---] Disk: 54.34 GB total, 51.15 GB free
2007-02-05 11:49:41 [rosetta@home] URL: https://boinc.bakerlab.org/rosetta/; Computer ID:
s: default
2007-02-05 11:49:41 [---] No general preferences found - using BOINC defaults
2007-02-05 11:49:41 [---] Local control only allowed
2007-02-05 11:49:41 [---] Listening on port 31416
2007-02-05 11:49:41 [rosetta@home] Started download of file rhhsm6splthh6d.ljatr.etable.
2007-02-05 11:49:41 [rosetta@home] Started download of file rhhsm6splthh6d.dljatr.etable
2007-02-05 11:49:43 [rosetta@home] Finished download of file rhhsm6splthh6d.ljatr.etable
2007-02-05 11:49:43 [rosetta@home] Throughput 34212 bytes/sec
2007-02-05 11:49:43 [rosetta@home] Started download of file rhhsm6splthh6d.dljrep.etable
2007-02-05 11:49:56 [rosetta@home] Finished download of file rhhsm6splthh6d.dljatr.etabl
2007-02-05 11:49:56 [rosetta@home] Throughput 33505 bytes/sec
2007-02-05 11:49:56 [rosetta@home] Finished download of file rhhsm6splthh6d.dljrep.etabl
2007-02-05 11:49:56 [rosetta@home] Throughput 61962 bytes/sec
2007-02-05 11:49:56 [rosetta@home] Started download of file rhhsm6splthh6d.solv1.etable.
2007-02-05 11:49:56 [rosetta@home] Started download of file rhhsm6splthh6d.solv2.etable.
2007-02-05 11:50:30 [rosetta@home] Finished download of file rhhsm6splthh6d.solv1.etable.gz
2007-02-05 11:50:30 [rosetta@home] Throughput 47849 bytes/sec
2007-02-05 11:50:30 [rosetta@home] Finished download of file rhhsm6splthh6d.solv2.etable.gz
2007-02-05 11:50:30 [rosetta@home] Throughput 48486 bytes/sec
2007-02-05 11:50:30 [rosetta@home] Started download of file rhhsm6splthh6d.dsolv.etable.gz
2007-02-05 11:50:51 [rosetta@home] Finished download of file rhhsm6splthh6d.dsolv.etable.gz
2007-02-05 11:50:51 [rosetta@home] Throughput 95664 bytes/sec
2007-02-05 11:50:52 [---] Rescheduling CPU: files downloaded
2007-02-05 11:50:52 [rosetta@home] Starting task 1bq9A_BOINC_ABRELAX_NEWRELAXFLAGS_frags83__1521_8771_0 using rosetta version 545
2007-02-05 11:50:53 [rosetta@home] Unrecoverable error for result 1bq9A_BOINC_ABRELAX_NEWRELAXFLAGS_frags83__1521_8771_0 (process exited with code 127 (0x7f))
2007-02-05 11:50:53 [rosetta@home] Unrecoverable error for result 1bq9A_BOINC_ABRELAX_NEWRELAXFLAGS_frags83__1521_8771_0 (process exited with code 127 (0x7f))
2007-02-05 11:50:53 [---] Rescheduling CPU: application exited
2007-02-05 11:50:53 [rosetta@home] Computation for task 1bq9A_BOINC_ABRELAX_NEWRELAXFLAGS_frags83__1521_8771_0 finished


A cut-&-paste from the manager terminal

[rosetta@robin BOINC]$ ./run_manager

Gdk-WARNING **: Missing charsets in FontSet creation


Gdk-WARNING **: JISX0208.1983-0


Gdk-WARNING **: KSC5601.1987-0


Gdk-WARNING **: GB2312.1980-0


Gdk-WARNING **: JISX0201.1976-0

send: -1
send: Bad file descriptor
connect: Operation now in progress

Pointers to where I can RTFM to get going are welcome.

Eric

Eric Roellig
ID: 36165
EdMulock
Joined: 14 Mar 06
Posts: 30
Credit: 2,347,485
RAC: 0
Message 36166 - Posted: 5 Feb 2007, 18:09:38 UTC - in response to Message 36139.  

What's this mean ?

A complete reinstall, including deleting the BOINC directory and re-linking to the project, did the trick. Thanks.

Remember when people with BIG HANDS got all the promotions ?


It means you have some weird problem with the automatic download of the new Rosetta version. I'd say leave it and see if it all sorts itself out after a retry or three - the software at your end is obviously backing off for 10mins to see if the issue resolves itself, but those who know more than I do may advise differently.

R~~

ps - like your profile - I too am old enuff to remember punch cards


ID: 36166
adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 36191 - Posted: 6 Feb 2007, 8:12:07 UTC
Last modified: 6 Feb 2007, 8:17:36 UTC

Lost 2 wu's with the same error. Both giving...
<core_client_version>5.8.8</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2278022
# cpu_run_time_pref: 10800
ERROR:: Exit at: .refold.cc line:337

</stderr_txt>
]]>

... both arrived within a couple of hours and were crunched on different machines. The other commonality is the wu name, both were FRA_z020_STRUCTURAL_GENOMICS... etc wu's.

Both machines are running 5.8.8 core, leave in memory set, no graphics.
I too am old enuff to remember punch cards
Ah, good times huh? Punched cards, paper tape. A program patch was done with scissors and sticky tape. I recall we had a mechanical card sorter, so WHEN you dropped your card deck, the machine would put them back into order determined by the line numbers in the last 8 columns of your Fortran-IV program.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 36191
netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 36243 - Posted: 7 Feb 2007, 14:02:44 UTC
Last modified: 7 Feb 2007, 14:16:48 UTC

--

With my Linux client (standard 5.4.11), I am no longer getting the adjusted granted credit.

For this computer...
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=299336

My claim and grant are identical... (Has not happened before)
https://boinc.bakerlab.org/rosetta/result.php?resultid=61261566
Claimed credit 38.2147647381899
Granted credit 38.2147647381899

https://boinc.bakerlab.org/rosetta/result.php?resultid=61220331
Claimed credit 37.0220763114969
Granted credit 60.7551059618435

This seems to be happening to a lot of my hosts, and only the latest results..

Needless to say, I am not using any form of optimised client and am afraid that this issue, if intentional, might lead to another round of high claiming clients....

*EDIT* I went a bit deeper into my results, and anything reported from about 4:30AM UTC to around 9AM UTC had identical CLAIM/GRANT numbers...

What happened ???

I suppose that there are much more pressing issues than this... Don't spend a bunch of time on my account...
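
For anyone who wants to check their own results for the same symptom, here is a minimal sketch that flags results whose granted credit exactly matches the claim. The two data rows are the figures quoted in this post; the function name and the tuple layout are assumptions made for illustration, not part of any BOINC tool.

```python
from math import isclose

# (result id, claimed credit, granted credit), copied from the post above.
results = [
    (61261566, 38.2147647381899, 38.2147647381899),
    (61220331, 37.0220763114969, 60.7551059618435),
]

def unadjusted(rows, rel_tol=1e-9):
    """Return the ids of results whose granted credit equals the claimed credit."""
    return [rid for rid, claimed, granted in rows
            if isclose(claimed, granted, rel_tol=rel_tol)]

print(unadjusted(results))  # prints [61261566]
```

A claim and a grant agreeing to 13 decimal places is very unlikely to be a coincidence, so an exact match is a reasonable signal that no adjustment was applied.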




Looking for a team ??? Join BoincSynergy!!


ID: 36243
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 36245 - Posted: 7 Feb 2007, 14:32:54 UTC
Last modified: 7 Feb 2007, 14:33:53 UTC

The credit issue with the new server BOINC code has been corrected now. See post
Rosetta Moderator: Mod.Sense
ID: 36245
Christoph
Joined: 10 Dec 05
Posts: 57
Credit: 1,512,386
RAC: 0
Message 36297 - Posted: 8 Feb 2007, 19:30:07 UTC

This one got stuck and has been aborted by the watchdog.
Claimed Credit: 2.67
Granted Credit: 20
:o
ID: 36297
netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 36308 - Posted: 8 Feb 2007, 21:03:41 UTC

--

https://boinc.bakerlab.org/rosetta/result.php?resultid=61387146

This one went to 100%, was still in run mode and stopped accumulating time...

I restarted boinc and it then continued, resetting to 47%...

I am not sure what behavior this indicates, but, figured I would document it anyway...
Looking for a team ??? Join BoincSynergy!!


ID: 36308
netwraith
Joined: 3 Sep 06
Posts: 80
Credit: 13,483,227
RAC: 0
Message 36363 - Posted: 9 Feb 2007, 4:07:45 UTC - in response to Message 36308.  

--

https://boinc.bakerlab.org/rosetta/result.php?resultid=61387146

This one went to 100%, was still in run mode and stopped accumulating time...

I restarted boinc and it then continued, resetting to 47%...

I am not sure what behavior this indicates, but, figured I would document it anyway...


*UPDATE*

Noticing lots of anomalies with W.U.'s with this pattern.... Hangs, SEGV's, and short runs....

DOC_????_fixbb_????


Looking for a team ??? Join BoincSynergy!!


ID: 36363
MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 36365 - Posted: 9 Feb 2007, 4:49:07 UTC - in response to Message 36363.  

--

https://boinc.bakerlab.org/rosetta/result.php?resultid=61387146

This one went to 100%, was still in run mode and stopped accumulating time...

I restarted boinc and it then continued, resetting to 47%...

I am not sure what behavior this indicates, but, figured I would document it anyway...


*UPDATE*

Noticing lots of anomalies with W.U.'s with this pattern.... Hangs, SEGV's, and short runs....

DOC_????_fixbb_????




https://boinc.bakerlab.org/rosetta/forum_thread.php?id=2883
ID: 36365
The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 36400 - Posted: 9 Feb 2007, 18:55:37 UTC - in response to Message 36365.  

Dedicated BOINC machine, no other processes:


Result ID 61735972
Name DOC_1EO8_R070207_pose_u_global_search_fixbb_1549_713_0
Workunit 54954229
Created 9 Feb 2007 7:06:19 UTC
Sent 9 Feb 2007 7:11:54 UTC
Received 9 Feb 2007 17:11:35 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 358920
Report deadline 19 Feb 2007 7:11:54 UTC
CPU time 18121.765625
stderr out <core_client_version>5.4.11</core_client_version>
<stderr_txt>
# random seed: 2083803
# cpu_run_time_pref: 21600
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 0 for 3600 seconds
**********************************************************************
GZIP SILENT FILE: .dd1EO8.out

</stderr_txt>


Validate state Valid
Claimed credit 66.5583529721271
Granted credit 20
application version 5.45
ID: 36400
Trog Dog
Joined: 25 Nov 05
Posts: 129
Credit: 57,345
RAC: 0
Message 36413 - Posted: 10 Feb 2007, 2:26:22 UTC - in response to Message 36165.  

Hi all.

I've been looking at this project for a while and finally got a machine new & big enough to run it.

I just installed boinc and rosetta. It has downloaded three items to crunch, but each one "finishes" right away with a status of "Computation error".

I'm running Fedora Core 6 on an HP Media Center m7060n. Both KDE and GNOME are installed. I get the feeling that something in my computing environment isn't up-to-snuff....

A cut-&-paste from the client terminal

2007-02-05 11:50:53 [rosetta@home] Unrecoverable error for result 1bq9A_BOINC_ABRELAX_NEWRELAXFLAGS_frags83__1521_8771_0 (process exited with code 127 (0x7f))
2007-02-05 11:50:53 [rosetta@home] Unrecoverable error for result 1bq9A_BOINC_ABRELAX_NEWRELAXFLAGS_frags83__1521_8771_0 (process exited with code 127 (0x7f))


Pointers to where I can RTFM to get going are welcome.

Eric


G'day Eric

Exit code 127 is the conventional "command not found" code, which here normally means a missing library or dependency. It could also mean that the rosetta app does not have its executable bit set. From a terminal window, run the ldd command on the rosetta app and then on boincmgr, and note which libraries are reported as missing.

ldd /full/path/to/rosetta_5.45_i686-pc-linux-gnu


and

ldd /full/path/to/boincmgr


substituting the correct path for /full/path/to/
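
If it helps, the two checks above (executable bit, then unresolved libraries) can be scripted. The following is only a sketch: the helper names are made up for illustration, and `libexample.so.1` in the sample text is a hypothetical library name, not something taken from Eric's machine. It assumes the usual ldd output format, where an unresolved library is printed as `name => not found`.

```python
import os
import subprocess

def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # An unresolved entry typically looks like: "libfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing

def diagnose(path):
    """Check existence and the executable bit, then ask ldd for unresolved libraries."""
    if not os.path.exists(path):
        return f"{path}: does not exist"
    if not os.access(path, os.X_OK):
        return f"{path}: executable bit not set (try chmod +x)"
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    missing = missing_libraries(result.stdout)
    if missing:
        return f"{path}: missing {', '.join(missing)}"
    return f"{path}: no unresolved libraries reported"

# Sample ldd output; libexample.so.1 is a hypothetical missing library.
sample = """\
    linux-gate.so.1 =>  (0xffffe000)
    libexample.so.1 => not found
    libc.so.6 => /lib/libc.so.6 (0x00a3b000)
"""
print(missing_libraries(sample))  # prints ['libexample.so.1']
```

Call `diagnose()` with the same full paths you would pass to ldd by hand. Any library it lists then needs to be installed through the distribution's package manager.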
ID: 36413
Michael.L
Joined: 12 Nov 06
Posts: 67
Credit: 31,295
RAC: 0
Message 36421 - Posted: 10 Feb 2007, 9:25:31 UTC

Result ID 61729110
Name DOC_1DQJ_R070207_pose_u_global_search_fixbb_1549_663_0
Workunit 54947927
Created 9 Feb 2007 6:07:35 UTC
Sent 9 Feb 2007 6:12:48 UTC
Received 10 Feb 2007 8:42:19 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 410873
Report deadline 19 Feb 2007 6:12:48 UTC
CPU time 3591.15625
stderr out <core_client_version>5.4.11</core_client_version>
<stderr_txt>
# random seed: 2083863
# cpu_run_time_pref: 14400
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 0 for 3600 seconds
**********************************************************************
GZIP SILENT FILE: .dd1DQJ.out

</stderr_txt>


Validate state Valid
Claimed credit 10.9597211979944
Granted credit 20
application version 5.45

ID: 36421
Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,818,279
RAC: 888
Message 36429 - Posted: 10 Feb 2007, 12:45:17 UTC

> Getting the same problem I had over at Ralph with the stuck for too long at zero.

<core_client_version>5.4.11</core_client_version>
<stderr_txt>
# random seed: 2084050
# cpu_run_time_pref: 21600
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 0 for 3600 seconds
**********************************************************************
GZIP SILENT FILE: .dd1CHO.out

https://boinc.bakerlab.org/rosetta/result.php?resultid=61657043
https://boinc.bakerlab.org/rosetta/result.php?resultid=61657041
https://boinc.bakerlab.org/rosetta/result.php?resultid=61596422
https://boinc.bakerlab.org/rosetta/result.php?resultid=61556930

On Ralph it was my Linux machine that had this problem for 6 WUs; now my Windows machines are having the same problem on Rosetta.
ID: 36429
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 36432 - Posted: 10 Feb 2007, 15:17:38 UTC

There were some problems with a batch of DOC WUs being ended by the watchdog. These WUs have now been removed from the server. Further description and symptoms in the link below:

quoting Chu Feb 9th, "...those [DOC] WUs have been temporarily removed from the queue...To further help us track down the problem, could you please report what kind of platform your host is? It is definitely happening on linux, what about the rest of you?"
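
To spot affected results quickly, the watchdog message can be picked out of a result's stderr text. This is a rough sketch: the regular expressions are written against the stderr excerpts posted in this thread, and the function name is an assumption, not part of Rosetta or BOINC.

```python
import re

# Patterns modelled on the stderr excerpts posted in this thread.
WATCHDOG = re.compile(r"Stuck at score (?P<score>-?\d+(?:\.\d+)?) for (?P<seconds>\d+) seconds")
RUN_PREF = re.compile(r"# cpu_run_time_pref: (?P<pref>\d+)")

def watchdog_summary(stderr_text):
    """Return (score, stuck_seconds, run_time_pref) if the watchdog ended the run, else None."""
    hit = WATCHDOG.search(stderr_text)
    if hit is None:
        return None
    pref = RUN_PREF.search(stderr_text)
    return (float(hit["score"]), int(hit["seconds"]), int(pref["pref"]) if pref else None)

sample = """\
# random seed: 2084050
# cpu_run_time_pref: 21600
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 0 for 3600 seconds
**********************************************************************
"""
print(watchdog_summary(sample))  # prints (0.0, 3600, 21600)
```

When reporting, it is worth including the platform and BOINC version alongside the summary, since that is what the project team has asked for.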
Rosetta Moderator: Mod.Sense
ID: 36432
Michael.L
Joined: 12 Nov 06
Posts: 67
Credit: 31,295
RAC: 0
Message 36433 - Posted: 10 Feb 2007, 15:24:27 UTC
Last modified: 10 Feb 2007, 15:25:27 UTC

Stuck WU. Msg 36421. Windows XP 2 Home. BOINC 5.4.1.
ID: 36433
The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 36440 - Posted: 10 Feb 2007, 16:05:08 UTC - in response to Message 36432.  
Last modified: 10 Feb 2007, 16:06:27 UTC

Compaq sr2030nx. AMD A64 3800+. 1 GB RAM. Win XP MC 2005.
BOINC 5.4.11

Result ID 61735972
Name DOC_1EO8_R070207_pose_u_global_search_fixbb_1549_713_0
Workunit 54954229

**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 0 for 3600 seconds
**********************************************************************

To further help us track down the problem, could you please report what kind of platform your host is? It is definitely happening on linux, what about the rest of you?"

ID: 36440
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 36464 - Posted: 11 Feb 2007, 7:25:16 UTC - in response to Message 36162.  


Sorry, I didn't take note of that... will let you know next time it happens.

I would add that this is not an every-time bug, even on Linux, as I had cleared out the Rosetta work from 6 Linux boxes and 3 win2k using the same technique, and this bug only arose on 2 boxes. It could be coincidence that both failures occurred on Linux.

Thanks for the report, River. When this happened, did you happen to see whether the CPU run time was still being incremented? I agree with you that it definitely looks like a bug somewhere, but not a graphics-related one. I am wondering whether this only happens on Linux platforms or everywhere.

The 'stuck at 100%' bug has returned ...


ID: 36464
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 36465 - Posted: 11 Feb 2007, 7:29:51 UTC

And the 'stuck soon after start' bug has also returned. This result hung for eight hours at 1min 16sec before I noticed it and aborted it. The following result started and has already run for 14min of CPU time and is still counting.

I got the eight hours figure from looking up the 'computation started' time in the messages tab.
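
The same comparison (wall-clock time since 'computation started' versus accumulated CPU time) can be written down as a small check. This is only a sketch with made-up thresholds and a hypothetical start time. A low ratio can also just mean the task was preempted by another project, so treat a hit as a prompt to look, not as proof of a hang.

```python
from datetime import datetime, timedelta

def looks_stalled(started, now, cpu_seconds, min_wall=timedelta(hours=1), min_ratio=0.05):
    """Flag a task whose CPU time is a tiny fraction of the wall-clock time since it started.

    min_wall and min_ratio are arbitrary illustrative thresholds.
    """
    wall = now - started
    if wall < min_wall:
        return False  # too early to judge
    return cpu_seconds / wall.total_seconds() < min_ratio

started = datetime(2007, 2, 10, 23, 0, 0)  # hypothetical 'computation started' time
now = started + timedelta(hours=8)

print(looks_stalled(started, now, cpu_seconds=76))        # 1 min 16 sec of CPU after 8 hours: True
print(looks_stalled(started, now, cpu_seconds=7 * 3600))  # 7 hours of CPU after 8 hours: False
```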

River~~
ID: 36465



©2024 University of Washington
https://www.bakerlab.org