Posts by ramostol

1) Message boards : Number crunching : Success: Credit Awarded 0.00 (Message 68601)
Posted 12 Nov 2010 by ramostol
Post:
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=341721474

On the link above you can see that the work unit has been marked valid but the awarded credit is zero - but only for me!

What is going on??



When I follow the link I find these results:

Task ID    Computer  Sent                      Time reported             Server state  Outcome       Client state   CPU time (sec)  Claimed credit  Granted credit
373924681  985316    23 Oct 2010 15:31:01 UTC  6 Nov 2010 22:45:50 UTC   Over          Success       Done           10,484.89       22.47           31.36
376550365  1360841   2 Nov 2010 15:47:28 UTC   2 Nov 2010 18:05:17 UTC   Over          Client error  Compute error  551.44          4.98            --- [has received 4.98]
376581242  1303781   2 Nov 2010 18:09:27 UTC   11 Nov 2010 20:05:04 UTC  Over          Success       Done           14,181.80       61.28           61.28

So everybody should be happy now — it just takes a little time… :)
2) Message boards : Number crunching : WUs Advancing Together (Message 65543)
Posted 13 Mar 2010 by ramostol
Post:
...

Now I'm thinking that the tasks are not all actually running. ...


Well, he may actually be correct and have two tasks running on the same processor.

I see that he has upgraded to BOINC 6.10.36. A few days after upgrading this happened on my two-core Intel Mac:

I left it running peacefully by itself in the afternoon. The morning after I found it running two Rosetta tasks on one processor (40-something percent each), the Finder occupying the other processor at 100% (doing whatever), and the Kernel_Task process - as usual in various conflicts - being much too active. Never seen the like before.

A cold boot cured that one. But strange things have also occurred on my PPC after upgrading to 6.10.36, so I have this version under strict observation, and 6.10.35 ready for a downgrade (which fixed my PPC problems).

By the way, in Mr. Parlette's messages I observe:

> 3/12/2010 4:08:26 PM suspend work if non-BOINC CPU load exceeds 25 %
> 3/12/2010 4:08:26 PM (to change, visit the web site of an attached project,
> 3/12/2010 4:08:26 PM or click on Preferences)

Is he aware of this new forced default in BOINC versions from 6.10.35 on (at least on Mac), and does he really want BOINC to function in this way?
3) Message boards : Number crunching : minirosetta 2.05 (Message 65480)
Posted 7 Mar 2010 by ramostol
Post:
My first Protein_interface (validation related?) error as far as I know - MacOS 10.5:

tyrsim_3gbn_2esa_Protein_interface_design_01Feb2010_17949_9_2


Outcome Success
Client state Done
Exit status 0 (0x0)

CPU time 21540.8

<core_client_version>6.10.36</core_client_version>
<![CDATA[
<stderr_txt>

[...]

# cpu_run_time_pref: 21600
======================================================
DONE :: 327 starting structures 21540.3 cpu seconds
This process generated 327 decoys from 327 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Workunit error - check skipped


One of two wingmen validated successfully after his deadline, but with far fewer decoys completed.
4) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 63638)
Posted 9 Oct 2009 by ramostol
Post:
This workunit 285247863 failed on Mac OSX 10.6.1

<core_client_version>6.6.36</core_client_version>
<![CDATA[
<stderr_txt>
Rosetta@home Macintosh Stack Size checker.
Original size: 8388608.
Maximum size: 0.
RLIM_INFINITY 67104768
# cpu_run_time_pref: 10800
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
# random seed: 3155889
Rosetta@home Macintosh Stack Size checker.
Original size: 8388608.
Maximum size: 0.
RLIM_INFINITY 67104768

plus similar messages



All Rossmann tasks, successful or not, report these errors, for instance this task run on MacOS 10.5 on a computer working quite undisturbed by human activity:

CPU time 21761.01
stderr out

<core_client_version>6.10.11</core_client_version>
<![CDATA[
<stderr_txt>
Rosetta@home Macintosh Stack Size checker.
Original size: 8388608.
Maximum size: 0.
RLIM_INFINITY 67104768
# cpu_run_time_pref: 21600
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
# random seed: 2994865
======================================================
DONE :: 1 starting structures 21760.5 cpu seconds
This process generated 10 decoys from 10 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>

Validate state Valid
Claimed credit 145.750455203617
Granted credit 74.7162632779638
5) Message boards : Number crunching : OS-X BOINC Client "Freeze" (Message 60717)
Posted 18 Apr 2009 by ramostol
Post:
Paul's description seems to cover most of it.

My additions, as far as my memory goes:


On Apr 17, 2009, at 3:01 PM, Charlie Fenton wrote:

Hello Paul,

We've not been able to reproduce this problem.


Not surprising, it is intermittent. Not sure if it is time related or related to other networking issues; for example, if my external network is more rocky, does this happen more often as the external network errors "pile up" and eventually kill the local TCP/IP connection? Of course I am just speculating ...


I have had this problem without being connected to the internet or to a network, the computer solely computing tasks all by itself. So I doubt that networks are to blame.


If you select a running task, does the "Show Graphics" button work?


Did not think to try this as I never, well almost never use or look at the graphics ... will try to recall this test the next time it happens.


The "Show Graphics" and "Properties" buttons give the same result as Paul describes for the "detach from the client" etc.

I have observed this problem only on my Intel Mac (10.5.6), not on my PPC (10.4).
6) Message boards : Number crunching : BOINC 6.6.20 released to public (Message 60684)
Posted 17 Apr 2009 by ramostol
Post:
I reverted back to 6.2.18 on OS X. The apps kept crunching, but the client kind of froze and stopped updating. Had to force quit a few times..

I have now seen that behavior from 6.5.0 and later ...


Same here for the 6.6.x versions. I have a handful of incidents where the BOINC manager loses contact with localhost in spite of reporting that the connection is still active.

But only on Intel - PPC seems to work without these problems :-)
7) Message boards : Number crunching : No new WUs since Feb 22,09 (Message 60014)
Posted 7 Mar 2009 by ramostol
Post:
Here it is March 7 and I haven't received any WUs in the past 14 days. What's up?


You seem to have a PPC. Tasks that run on a PPC are not supplied on a regular basis any more, see this discussion.
8) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 59990)
Posted 6 Mar 2009 by ramostol
Post:

The particular WU you show suffers from the dreaded bug in the BOINC server code where a result is accepted after the deadline is reached but the task has already been reissued. There has been a BOINC trac item open to fix this for over a year.
http://boinc.berkeley.edu/trac/ticket/276


I am still of the opinion that this is not at all a bug to be dreaded. All parties involved still seem to be compensated with appropriate credits. Interested parties might like to have a look at the following wus (while they last):

2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_8156

2 "Compute error", 1 subsequent success after "Too many error results Too many total results". Which somehow proves a point.


2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_13799

1 "No reply", 1 "Compute error", after "Too many total results" 1 apparent success with no local error messages, but the result file shows the wu erroring out also with the third cruncher:
CPU time 11808.83
stderr out

<core_client_version>6.6.12</core_client_version> [[but computed with 6.6.11]]
<![CDATA[
<stderr_txt>
Rosetta@home Macintosh Stack Size checker.
Original size: 0.
Maximum size: 8388608.
RLIM_INFINITY 0
shell-init: could not get current directory: getcwd: cannot access parent directories: Permission denied [[normal error message in my situation]]
# cpu_run_time_pref: 21600
# random seed: 2821546
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -147.965 for 900 seconds
**********************************************************************
GZIP SILENT FILE: ./xx2p64.out

</stderr_txt>
]]>

Validate state Valid


9) Message boards : Number crunching : Account deletion - Not possible, so... read this (Message 59556)
Posted 14 Feb 2009 by ramostol
Post:
Oh, we have to acknowledge humorous initiatives...

I hope Deleted user at least finishes his one wu. The project deserves that much, doesn't it? ;-)
10) Message boards : Number crunching : Report long-running models here (Message 58608)
Posted 7 Jan 2009 by ramostol
Post:
The WU 1nkuA_BOINC_MPZN_vanilla_abrelax_5901_51691_0 finally finished after more than 66 hours only to get 80 credits... quite disappointing... :(


Well, this is the old problem of the dog that did nothing in the nighttime, as documented in the watchdog message:

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<stderr_txt>
**********************************************************************
Rosetta is going too long. Watchdog is ending the run!
CPU time: 238970 seconds. Greater than 3X preferred time: 10800 seconds
**********************************************************************
called boinc_finish

</stderr_txt>
]]>

3X 10800 seconds = 32400 seconds. Why did the watchdog need 238970 seconds to intervene?
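
The arithmetic can be checked with a short sketch. It is only an illustration: the 3X factor and both figures are taken from the watchdog message above, while the function name `watchdog_overrun` is hypothetical and not part of Rosetta or BOINC.

```python
def watchdog_overrun(cpu_seconds, preferred_seconds, factor=3):
    """Return (limit, overrun) in seconds for a 'factor x preferred time' limit.

    Hypothetical helper for illustration; not an actual Rosetta/BOINC function.
    """
    limit = factor * preferred_seconds
    return limit, cpu_seconds - limit


# Figures quoted in the watchdog message: 238970 s CPU time, 10800 s preferred.
limit, overrun = watchdog_overrun(238970, 10800)
hours = 238970 / 3600

print("limit (s):", limit)
print("overrun (s):", overrun)
print("total CPU time (h):", round(hours, 1))
```

With these inputs the limit is 32400 seconds and the task ran 206570 seconds beyond it, about 66.4 hours of CPU time in total, which matches the "more than 66 hours" reported.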
11) Message boards : Number crunching : Credit (Message 58606)
Posted 7 Jan 2009 by ramostol
Post:
At the same time one must consider the error message given:

failed to create shared mem segment
CreateSemaphore failure! Cannot create semaphore!
# cpu_run_time_pref: 28800
======================================================
DONE :: 1 starting structures 34145 cpu seconds
This process generated 10 decoys from 10 attempts
======================================================

It clearly states that an error is registered. Furthermore it says that with a preferred runtime of 28800 cpu seconds, 10 decoys were computed using 34145 cpu seconds - the ultimate model must have been quite long-winded indeed compared to the others.
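
The figures in that excerpt can be pulled out and compared with a small sketch. It assumes the stderr lines look exactly like the ones quoted above; the function name `summarize` and the regular expressions are illustrative assumptions, not anything supplied by Rosetta or BOINC.

```python
import re

# Lines copied from the excerpt above.
STDERR = """\
# cpu_run_time_pref: 28800
DONE :: 1 starting structures 34145 cpu seconds
This process generated 10 decoys from 10 attempts
"""


def summarize(stderr_text):
    """Extract preferred runtime, CPU time used and decoy count (hypothetical helper)."""
    pref = int(re.search(r"cpu_run_time_pref:\s*(\d+)", stderr_text).group(1))
    used = float(re.search(r"([\d.]+)\s+cpu seconds", stderr_text).group(1))
    decoys = int(re.search(r"generated\s+(\d+)\s+decoys", stderr_text).group(1))
    return {
        "preferred": pref,
        "used": used,
        "over_by": used - pref,           # seconds beyond the preferred runtime
        "avg_per_decoy": used / decoys,   # mean CPU seconds per decoy
    }


summary = summarize(STDERR)
print(summary)
```

For this task the run exceeded the preferred runtime by 5345 seconds, against a mean of 3414.5 seconds per decoy. The mean alone cannot show how long the last model took, so this only puts numbers on the overrun.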

I should say that something happened when computing the last model, and the result was invalid. BOINC behaved as it should, and all received due credits.

As for the “too many results” message: I have twice observed this message being issued not upon the server receiving a result, but when the server issued a task for the third time to a pc. Since the third participants later delivered valid results and got their credits I consider this message as a mere warning, not an explanation of subsequent developments.
12) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58108)
Posted 22 Dec 2008 by ramostol
Post:
My 1.47 cc2_1_8_mammoth-tasks have all crashed on Ralph, now my 1.47 cc2_1_8_native-tasks are crashing on Rosetta.

...


#Aehm - i can't see your RALPH failure for this job. I had one result come back and it was a success..

http://ralph.bakerlab.org/rah_queue_ops/db_action.php?table=result&id=1228006


I believe I am not allowed access to rah_queue_ops ;-) so I cannot check your observation.

However, my Ralph mammoth-failures flourish, the ultimate example:

cc2_1_8_mammoth_fa_cst_hb_t369__IGNORE_THE_REST_1S3QA_7_6585_1_0

That said, I seem to have reconciled with Rosetta by rebooting the computer in question. Why this was suddenly necessary on a computer with no new program installations, no new configurations, no system upgrades, no separate computing on the side, and successfully computing 1.47-tasks 24 hours earlier, I am unable to explain. Even the subsequently installed Boinc 6.5 works like a charm. So I am loaded with tasks for a peaceful Christmas session and hope for the best until reporting time next weekend.


13) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 57927)
Posted 16 Dec 2008 by ramostol
Post:
And now all of today's imported 1.47-tasks for the upcoming week have collapsed, most of them after less than 1 minute of computing; one was manually aborted as potentially everlasting.

It seems that I have to stick to my 5.98-tasks for some days and increase the default runtime.
14) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 57926)
Posted 16 Dec 2008 by ramostol
Post:
My 1.47 cc2_1_8_mammoth-tasks have all crashed on Ralph, now my 1.47 cc2_1_8_native-tasks are crashing on Rosetta.

Example (1 of 2):
cc2_1_8_native_fa_cst_hb_t369__IGNORE_THE_REST_1S3QA_4_5599_36_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.47_i686-apple-darwin(95094,0xa0538fa0) malloc: *** error for object 0x1747df0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
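
The three numbers in the exit message (193, 0xc1, -63) are the same 8-bit value in different notations, as a small sketch can confirm. This is a plain number conversion only; it makes no claim about what BOINC or minirosetta means by the code, and the helper name `as_signed_byte` is hypothetical.

```python
def as_signed_byte(code):
    """Reinterpret an unsigned 8-bit value (0..255) as a signed 8-bit value."""
    return code - 256 if code > 127 else code


code = 193
print(hex(code))             # hexadecimal notation of 193
print(as_signed_byte(code))  # the same byte read as a signed value
```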

15) Message boards : Number crunching : Minirosetta v1.45 bug thread (Message 57924)
Posted 16 Dec 2008 by ramostol
Post:
With 1.47 launched these results are perhaps not too interesting, but to be on the safe side:

This has indeed been a Black Monday, with all 1.45 tasks reserved for the coming week already crashed.

Early crashes with no result file:
8 tasks lr5_score13
2 tasks lr5_score12
1 task cc_3_5_nocst4
1 task 1_irna
1 task cs_vanilla
26 tasks abinitio...

In addition, 2 tasks showing signs of being non-terminators had to be manually aborted:
abinitio_abrelax_nohomfrag_129_B_2acy__5483_2781_0

abinitio_abrelax_nohomfrag_129_B_1r26A_5483_2512_0

16) Message boards : Number crunching : Minirosetta v1.45 bug thread (Message 57892)
Posted 15 Dec 2008 by ramostol
Post:
Deleted as duplicate - unstable internet connection...
17) Message boards : Number crunching : Minirosetta v1.45 bug thread (Message 57891)
Posted 15 Dec 2008 by ramostol
Post:
A number of failed 1.45 abinitio-tasks in the last 24 hours:

These two I had to abort; they had run for more than 20 hours on a 4-hour default and refused to display graphics:

abinitio_abrelax_nohomfrag_129_B_2ccvA_5483_3423_088

abinitio_abrelax_nohomfrag_129_B_1ctf__5483_3423_0

These collapsed quickly:

abinitio_abrelax_nohomfrag_129_B_1dzoA_5483_1560_1

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(47077,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
minirosetta_1.45_i686-apple-darwin(47077,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
SIGBUS: bus error

abinitio_abrelax_nohomfrag_129_B_1npsA_5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(48148,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
SIGBUS: bus error

abinitio_abrelax_nohomfrag_129_B_2chf__5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(48486,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
minirosetta_1.45_i686-apple-darwin(48486,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
minirosetta_1.45_i686-apple-darwin(48486,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
SIGBUS: bus error

abinitio_abrelax_nohomfrag_129_B_1o4wA_5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(58522,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
minirosetta_1.45_i686-apple-darwin(58522,0xb0087000) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_S_create
SIGABRT: abort called

abinitio_abrelax_nohomfrag_129_B_1elwA_5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(47419,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
18) Message boards : Number crunching : Minirosetta v1.45 bug thread (Message 57799)
Posted 11 Dec 2008 by ramostol
Post:
Failed tasks among tasks received 20081207:

Errored out on 2 computers:
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hi0719_olange_5386_25492

cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_30102_1

Failures on one computer:
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_32109_0

loopbuild_reference_hombench_loopbuild_t363__IGNORE_THE_REST_2CU3A_7_5461_31_0
19) Message boards : Number crunching : Report long-running models here (Message 57767)
Posted 10 Dec 2008 by ramostol
Post:
The t060 (beta 5.98) wus usually last 4.5 to nearly 6 hours per model on my PPC. This model (t060_1_NMRREF_1_t060_1_id_model_07IGNORE_THE_REST_idl_5381_1234_0) lasted almost 10 hours.
20) Message boards : Number crunching : MAC PowerPC - There was work for other platforms (Message 57585)
Posted 4 Dec 2008 by ramostol
Post:
Oh, we knew this was bound to happen, so we just have to take it in stride. The project is reworking its software, and it is understandable that they do not put much effort into supporting a "dinosaur" processor. Enough problems abound with the Intel to keep the project developers in shape, don't they?

It is a fact that Boinc still supports PPC - my PPC is currently running 6.3.24, which seems to work just fine. The 6.3.x Boincs have even been more stable on PPC than on MacBook Intel.

And, lest anyone should think that Rosetta has abandoned the PPC: 5.98 wus still arrive, but not as dependably and regularly as before. Usually they merely trickle in, but two days ago a lot of them were dropped in my lap, and at this very moment I believe I can download as many as my computer can take. We are approaching Christmas, I believe.

However, the situation favours manual use of the "Update" button in Boinc to avoid the fatal 24-hour interregnum created by the "work for other platforms" messages. This is of course an ideal situation for us laptop users with irregular internet connections... :-)

My PPC now works for WCG and Rosetta and seems satisfied this way.




