Problems and Technical Issues with Rosetta@home




Aravah

Joined: 12 Apr 20
Posts: 6
Credit: 1,101,172
RAC: 0
Message 99013 - Posted: 15 Sep 2020, 19:39:22 UTC - in response to Message 99006.  

Thanks for your feedback.
For clarification, the two jobs are, as far as I can tell, unrelated but have very different credit scores; I did not mean to imply they were first and second runs of the same task. I know some projects send out jobs to multiple computers, but I did not know if Rosetta was one of these.
ID: 99013 · Rating: 0
Aravah

Joined: 12 Apr 20
Posts: 6
Credit: 1,101,172
RAC: 0
Message 99014 - Posted: 15 Sep 2020, 19:39:24 UTC - in response to Message 99006.  
Last modified: 15 Sep 2020, 19:40:31 UTC

tx
ID: 99014 · Rating: 0
bormolino

Joined: 16 May 13
Posts: 4
Credit: 160,977
RAC: 0
Message 99022 - Posted: 16 Sep 2020, 18:10:43 UTC

I'm still having issues with the graphics on Ubuntu 18.04. It shows "No shared mem".

ID: 99022 · Rating: 0
Falconet

Joined: 9 Mar 09
Posts: 350
Credit: 1,016,963
RAC: 531
Message 99061 - Posted: 20 Sep 2020, 15:43:12 UTC
Last modified: 20 Sep 2020, 15:57:29 UTC

Seeing some tasks with a large log with these lines:

AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



[ ERROR ]: Caught exception:


File: ..\..\..\src\protocols\motif_grafting\movers\MotifGraftMover.cc:537
For this scaffold there are not suitable scaffold grafts within your constrains
------------------------ Begin developer's backtrace -------------------------
BACKTRACE:
------------------------- End developer's backtrace --------------------------


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



https://boinc.bakerlab.org/rosetta/result.php?resultid=1263012644
https://boinc.bakerlab.org/rosetta/result.php?resultid=1263011946

In those examples, one had the number of decoys at the end and the other one didn't.
They are validating but I sure hope this isn't wasted electricity.

epcam_breaker_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_2lt3jd5h_1009432_4_0
pdl1_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_7es6gq8a_1009506_4_0

EDIT: On another PC https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076666
https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076045

Seems limited to epcam_breaker and pdl1_graft tasks from what I can tell.
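
A minimal sketch of how a saved copy of a task's stderr output could be checked for this pattern, assuming the log uses the same marker lines as the excerpts in this thread. The function name `summarize_stderr` and the sample text are hypothetical; nothing here is part of BOINC or Rosetta.

```python
import re

# Marker copied from the log excerpt quoted in this post.
EXCEPTION_MARKER = "[ ERROR ]: Caught exception:"

# Summary line format as seen in other Rosetta stderr outputs in this thread.
DECOY_SUMMARY_RE = re.compile(
    r"This process generated\s+\d+ decoys from\s+\d+ attempts"
)


def summarize_stderr(text):
    """Return (number of caught exceptions, whether a decoy summary is present)."""
    exception_count = text.count(EXCEPTION_MARKER)
    has_decoy_summary = DECOY_SUMMARY_RE.search(text) is not None
    return exception_count, has_decoy_summary


# Hypothetical sample: two caught exceptions and no decoy summary at the end.
sample = (
    "[ ERROR ]: Caught exception:\n"
    "For this scaffold there are not suitable scaffold grafts within your constrains\n"
    "[ ERROR ]: Caught exception:\n"
    "For this scaffold there are not suitable scaffold grafts within your constrains\n"
)

if __name__ == "__main__":
    print(summarize_stderr(sample))
```

A task whose log shows many exceptions and no decoy summary would be the kind worth reporting, although whether such work is actually wasted is something only the project staff can confirm.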
ID: 99061 · Rating: 0
Detto

Joined: 10 Apr 20
Posts: 2
Credit: 788,565
RAC: 0
Message 99062 - Posted: 20 Sep 2020, 17:46:33 UTC

For the third time since April, I only got 3 credits for a work unit:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1130847248

Any insights?
ID: 99062 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1478
Credit: 14,506,175
RAC: 11,673
Message 99063 - Posted: 20 Sep 2020, 18:52:22 UTC - in response to Message 99062.  

For the 3rd time since April I only got 3 credits for a work unit :

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1130847248

any insights?
Nope.
The system completed a Task of exactly the same type without issue.
Is there only 1 instance of BOINC running on that system?

The difference between CPU time and Runtime indicates the system is doing a fair bit of work other than processing BOINC Tasks, but it's nowhere near as big a difference as other systems that aren't having low Credit issues.


<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_struct_profile_layered_design_less_IVYW_wt1_091_c1__0.05_0018_8ffb15d87a6b0ee88cff77a7acba3bea_BJH8LOZG_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2711465
Using database: database_357d5d93529_n_methyl/minirosetta_database
======================================================
DONE ::     1 starting structures  28683.4 cpu seconds
This process generated    133 decoys from     133 attempts
======================================================
BOINC :: WS_max 4.60468e+08
11:45:31 (22004): called boinc_finish(0)

</stderr_txt>
]]>





<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_plus_struct_profile_091_c1_barrel6_3_c0851511e59b6__0.05_0009_8c3dcb16fc078c91ae0a41d4b95a66fc_M0Z4KTXS_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2841497
Using database: database_357d5d93529_n_methyl/minirosetta_database
======================================================
DONE ::     1 starting structures  28729.4 cpu seconds
This process generated    130 decoys from     130 attempts
======================================================
BOINC :: WS_max 4.43654e+08
19:49:09 (1228): called boinc_finish(0)
======================================================
DONE ::     1 starting structures  28916.2 cpu seconds
This process generated      1 decoys from       1 attempts
======================================================
BOINC :: WS_max 2.25206e+08
19:58:00 (3178): called boinc_finish(0)

</stderr_txt>
]]>


It's the usual cause: the Task finished, yet continued to run and produced one more Decoy from another starting structure. That processing time was added to the earlier processing time, but that final Decoy wasn't added to the previous Decoys produced, so Credit was granted based on that one Decoy and none of the previous work.
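
A minimal sketch of how one might spot this pattern in a task's stderr output, assuming the log lines look like the excerpts above. The function names and the heuristic (more than one decoy report, with the last one smaller than the earlier ones combined) are illustrative assumptions, not how the Rosetta@home server actually computes credit.

```python
import re

# Matches the summary line printed at the end of each run in the logs above.
DECOY_RE = re.compile(
    r"This process generated\s+(\d+)\s+decoys from\s+(\d+)\s+attempts"
)


def decoy_reports(stderr_text):
    """Return the decoy count from every summary line, in log order."""
    return [int(m.group(1)) for m in DECOY_RE.finditer(stderr_text)]


def looks_like_restart_case(stderr_text):
    """True when a later, smaller decoy report follows earlier ones."""
    reports = decoy_reports(stderr_text)
    return len(reports) > 1 and reports[-1] < sum(reports[:-1])


# Summary lines copied from the second stderr excerpt above.
second_log = (
    "DONE ::     1 starting structures  28729.4 cpu seconds\n"
    "This process generated    130 decoys from     130 attempts\n"
    "19:49:09 (1228): called boinc_finish(0)\n"
    "DONE ::     1 starting structures  28916.2 cpu seconds\n"
    "This process generated      1 decoys from       1 attempts\n"
    "19:58:00 (3178): called boinc_finish(0)\n"
)

if __name__ == "__main__":
    print(decoy_reports(second_log))
    print(looks_like_restart_case(second_log))
```

Run against the first excerpt, which has a single summary line, the same check would not flag anything.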
Grant
Darwin NT
ID: 99063 · Rating: 0
Falconet

Joined: 9 Mar 09
Posts: 350
Credit: 1,016,963
RAC: 531
Message 99064 - Posted: 20 Sep 2020, 18:56:41 UTC - in response to Message 99062.  
Last modified: 20 Sep 2020, 18:59:36 UTC

DELETED
ID: 99064 · Rating: 0
Falconet

Joined: 9 Mar 09
Posts: 350
Credit: 1,016,963
RAC: 531
Message 99097 - Posted: 22 Sep 2020, 21:20:52 UTC

"Validation error"

No idea what it was.


https://boinc.bakerlab.org/rosetta/result.php?resultid=1263056236
ID: 99097 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,831,954
RAC: 1,109
Message 99112 - Posted: 23 Sep 2020, 0:42:18 UTC - in response to Message 99097.  

"Validation error"

No idea what it was.


https://boinc.bakerlab.org/rosetta/result.php?resultid=1263056236

The other task from the same workunit also failed. Therefore, an error in one or more of the input files is a likely cause, even though the stderr output of the two tasks said nothing very useful about just what the error was.
ID: 99112 · Rating: 0
Falconet

Joined: 9 Mar 09
Posts: 350
Credit: 1,016,963
RAC: 531
Message 99114 - Posted: 23 Sep 2020, 8:36:57 UTC - in response to Message 99112.  

Yes, but it was an Unhandled Exception error at the start. I used to get some of those related to my machine a while ago.


Mine ran for a lot longer with no apparent errors.

Just thought I'd report here.
ID: 99114 · Rating: 0
Falconet

Joined: 9 Mar 09
Posts: 350
Credit: 1,016,963
RAC: 531
Message 99127 - Posted: 25 Sep 2020, 19:23:40 UTC - in response to Message 99097.  

"Validation error"

No idea what it was.


https://boinc.bakerlab.org/rosetta/result.php?resultid=1263056236



Another one https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1131803173

Just "validation error". Nothing seemingly wrong in the log.
ID: 99127 · Rating: 0
sph

Joined: 27 Mar 20
Posts: 7
Credit: 17,359,964
RAC: 0
Message 99234 - Posted: 4 Oct 2020, 2:43:38 UTC

Issue with 50% of all Rosetta tasks on this PC:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4975041
Tasks will run for 50000 seconds yet only yield 6.5 points to 45 points. The other 50% of Rosetta tasks on this PC work as expected.
All other PCs are fine.
I have removed Rosetta from the PC and run other projects, which work as expected.
Re-added Rosetta. Tasks worked well for 4 days, then reverted back to the above failure pattern.
The PC has no detected issues.
As Rosetta is working well on my other PCs, the error evidently only shows up under specific conditions.
ID: 99234 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,831,954
RAC: 1,109
Message 99236 - Posted: 4 Oct 2020, 3:02:37 UTC - in response to Message 99234.  
Last modified: 4 Oct 2020, 3:04:02 UTC

Issue with 50% of all Rosetta tasks on this PC.:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4975041
Tasks will run for 50000 seconds yet only yield 6.5 points to 45 points. The other 50% of Rosetta tasks on this PC work as expected.

[snip]

This looks like most of the points were based on the number of decoys completed, and NOT on the amount of CPU time used.

You might check if this also holds for your other computers.
ID: 99236 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1478
Credit: 14,506,175
RAC: 11,673
Message 99238 - Posted: 4 Oct 2020, 4:15:38 UTC - in response to Message 99234.  

Just had a look at those WUs on my systems, and there are some WUs that pay out considerably less Credit than others, but nowhere near as low as what yours are doing.
And some of those with low Credit have produced many more Decoys than some of those with much higher Credit.
The only difference I can see is that I've processed a lot more of them: more cores & threads in use & using the default processing time.

The benchmarks on that system are OK, and the system isn't losing time to doing non-crunching work, so I can't think of any particular reason for such a variation in Credit granted (although I do recall that someone had a host several months back that was exhibiting similar odd Credit payouts, but I can't remember the result of that particular issue).




The amount of Credit granted depends on the amount of work done- which is the number of Models completed.
2 WUs of the same type running on the same system running for the same length of time may complete a similar number of Models, but one may produce only 1 Decoy, the other may produce hundreds. But both should get similar amounts of Credit as they did similar amounts of work (number of Models completed), even though the number of Decoys produced is different.

Processing a Task for a longer period will result in more Credit for that Task- but the Credit per hour will still be on par with processing it for a much shorter period of time. The only way to get more Credit per hour is more cores & threads, and/or higher clock speed and/or greater IPC (Instructions Per Clock).
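
A small worked example of the Credit-per-hour point, as a sketch only. The 200 and 100 Credit figures are hypothetical numbers chosen for illustration; the 6.5 Credits for 50000 seconds comes from the report being answered. The helper name `credit_per_hour` is an assumption, not anything provided by BOINC.

```python
def credit_per_hour(credit, runtime_seconds):
    """Credit earned per hour of run time for a single Task."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return credit / (runtime_seconds / 3600.0)


if __name__ == "__main__":
    # Hypothetical healthy Tasks: twice the run time, twice the Credit,
    # so the hourly rate is the same.
    print(credit_per_hour(200, 28800))  # 8-hour Task
    print(credit_per_hour(100, 14400))  # 4-hour Task

    # The problem case reported above: 6.5 Credits for 50000 seconds.
    print(round(credit_per_hour(6.5, 50000), 3))
```

Comparing the hourly rate rather than the per-Task total makes it easier to tell a genuinely underpaid Task from one that simply ran for a shorter time.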
Grant
Darwin NT
ID: 99238 · Rating: 0
sph

Joined: 27 Mar 20
Posts: 7
Credit: 17,359,964
RAC: 0
Message 99240 - Posted: 4 Oct 2020, 10:39:41 UTC - in response to Message 99238.  
Last modified: 4 Oct 2020, 10:42:31 UTC


Hi Grant
+1 on all feedback.
This is an older-generation PC that has been able to contribute at the level expected of this generation of PC. The latest optimisation of the WU seems to have introduced issues peculiar to this PC's configuration. It is a Linux VirtualBox VM on a Windows host, whereas an identical PC running Linux on the host is running fine.
If the admins cannot track down the issue, I may just format the host as Linux and be done with it. Some issues don't justify the time required to debug.
Just hoping others may have also seen similar issues.

EDITS: fix typo and format
ID: 99240 · Rating: 0
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99241 - Posted: 4 Oct 2020, 11:30:10 UTC - in response to Message 99234.  

Issue with 50% of all Rosetta tasks on this PC.:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4975041
Tasks will run for 50000 seconds yet only yield 6.5 points to 45 points.
There’s a problem somewhere that’s causing those tasks to get stuck without performing much useful work. The lines in the output like
BOINC:: CPU time: 50422.3s, 36000s + 14400s[2020-10- 4 18:25: 5:] :: BOINC
come from the watchdog ending the tasks 10 hours after their target 4-hour run time. It’s odd that they validate as successful under those circumstances.

That machine is running the 32-bit Rosetta application, which I suspect doesn’t get much testing these days. Perhaps there’s a bug in the application itself, or some compatibility issue with the OS environment, or even something strange going on with the virtualisation. Hard to say.
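
A minimal sketch of how such a line could be picked apart, assuming it always has the shape shown above. Reading the two figures as a 36000-second watchdog allowance plus a 14400-second (4-hour) target run time follows the interpretation given in this post; the function names are hypothetical.

```python
import re

# Matches "CPU time: <cpu>s, <first>s + <second>s" as in the quoted line.
WATCHDOG_RE = re.compile(r"CPU time:\s*([\d.]+)s,\s*(\d+)s\s*\+\s*(\d+)s")


def parse_watchdog_line(line):
    """Return (cpu_seconds, first_limit, second_limit), or None if no match."""
    match = WATCHDOG_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


def ended_by_watchdog(line):
    """True when the CPU time reached the sum of the two limits on the line."""
    parsed = parse_watchdog_line(line)
    if parsed is None:
        return False
    cpu_seconds, first_limit, second_limit = parsed
    return cpu_seconds >= first_limit + second_limit


# Line copied from the task output quoted above.
line = "BOINC:: CPU time: 50422.3s, 36000s + 14400s[2020-10- 4 18:25: 5:] :: BOINC"

if __name__ == "__main__":
    print(parse_watchdog_line(line))
    print(ended_by_watchdog(line))
```

Here 36000 + 14400 gives 50400 seconds, just under the 50422.3 seconds of CPU time reported, which matches the roughly 50000-second run times described in the original report.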
ID: 99241 · Rating: 0
sph

Joined: 27 Mar 20
Posts: 7
Credit: 17,359,964
RAC: 0
Message 99243 - Posted: 4 Oct 2020, 12:18:28 UTC - in response to Message 99241.  
Last modified: 4 Oct 2020, 12:21:28 UTC


Hi Brian
Saw the same message (and unusual error messages in other WUs) and the unexpected successful completion, hence my hunch about an error in the app.
Didn't think of the 32-bit angle, thanks for highlighting this aspect. This may well be a contributing factor.

Looks more and more like a format for this machine, but I won't be able to schedule this for 2-3 weeks, so I will continue to tinker with it till then.
ID: 99243 · Rating: 0
sph

Joined: 27 Mar 20
Posts: 7
Credit: 17,359,964
RAC: 0
Message 99244 - Posted: 5 Oct 2020, 2:56:09 UTC - in response to Message 99234.  


Further information on this issue:
If I abort these tasks after 8 hours, credit is awarded at the level expected for the work completed. I can only assume the aborted tasks would otherwise have resulted in the low credit level, but based on the current trend for this PC, this is a safe assumption.
The credit is not awarded immediately, but it is awarded before the task is completed by another PC.
ID: 99244 · Rating: 0
Ross Parlette

Joined: 10 Nov 05
Posts: 32
Credit: 2,165,044
RAC: 0
Message 99484 - Posted: 1 Nov 2020, 23:21:44 UTC

I've only had a handful of tasks for the last few days, only two today. Am I missing something?

Ross
ID: 99484 · Rating: 0
robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,831,954
RAC: 1,109
Message 99485 - Posted: 1 Nov 2020, 23:27:48 UTC - in response to Message 99484.  

I've only had a handful of tasks for the last few days, only two today. Am I missing something?

Ross

That's been normal for about a week.

The server status page indicates that few tasks are ready to send, but many are in progress.

In other words, the number of user requests for tasks greatly exceeds the number of tasks created.
ID: 99485 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org