Posts by Luigi R.

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 107280)
Posted 12 Oct 2022 by Luigi R.
Post:
Validation is running.

Workunits waiting for validation	2084120
Workunits waiting for assimilation	10578
Workunits waiting for file deletion	2


After about 20 minutes.

Workunits waiting for validation	1970571
Workunits waiting for assimilation	415
Workunits waiting for file deletion	176



Will the validator win against the crunchers? D:
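
As a rough worked example of the two snapshots above, the net drain rate of the validation queue can be estimated as follows. This is only a sketch: the 20-minute interval is approximate, and the projection assumes a constant rate with no new results arriving, which is unlikely to hold on a live project.

```python
# Two snapshots of "Workunits waiting for validation" from the post above.
before = 2_084_120
after = 1_970_571
minutes_between = 20  # "about 20 minutes", so this is approximate

# Net change in the queue (validated minus newly reported results).
drained = before - after
rate_per_minute = drained / minutes_between

# Assumption: the net rate stays constant until the queue is empty.
hours_to_clear = after / rate_per_minute / 60

print(f"net drain: {drained} workunits, {rate_per_minute:.0f} per minute")
print(f"projected time to clear the rest: {hours_to_clear:.1f} hours")
```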
2) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 94982)
Posted 20 Apr 2020 by Luigi R.
Post:
R@h adapts to changing requirements. With these new large protein models coming soon, tagged with a 4GB memory bound, and with models that may take several hours to run, enough that the watchdog has been extended from its normal 4 hours to 10 hours, it seems credit may need some changes as well.

Normally, credit is granted based on the cumulative reported CPU time per model. And so a fast machine with lots of memory computes more models and gets more credit than an older system. But, in the case of these 4GB WUs, they will not even be sent to machines that do not have at least 4GB of memory (and normally BOINC would only be allowed to use less than 100% of that, so I should say where BOINC is allowed to use at least 4GB). So there will be no struggling Pentium 4s reporting any results to reflect the difficulty in the cumulative average.

[...]

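// maxMemoryUsed: peak memory used by the task, in GB
if (maxMemoryUsed > 1) {
	grantedCredits = normalCredits * maxMemoryUsed;  // scale credit by peak memory
}
else {
	grantedCredits = normalCredits;  // tasks at or below 1 GB keep the normal credit
}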


E.g. a host gets 40 cr/h. If it uses up to 3.5 GB of memory, you pay it 40*3.5 = 140 cr/h.
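
The same proposal as a small runnable sketch. The function and parameter names are only illustrative, and peak memory is assumed to be measured in GB, as in the example above.

```python
def granted_credits(normal_credits: float, max_memory_gb: float) -> float:
    """Proposed rule: multiply credit by peak memory (GB) once it exceeds 1 GB."""
    if max_memory_gb > 1:
        return normal_credits * max_memory_gb
    return normal_credits


# The example from the post: 40 cr/h with a 3.5 GB peak.
print(granted_credits(40, 3.5))
# A small task (below 1 GB) keeps its normal credit.
print(granted_credits(40, 0.8))
```

One consequence of this rule is a jump at the threshold: a task peaking just above 1 GB earns barely more than one at 1 GB, while a 4 GB task earns four times as much.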
3) Message boards : Number crunching : Reported a bit late and invalid (Message 94217)
Posted 12 Apr 2020 by Luigi R.
Post:
Is the server marking tasks as invalid because the results were reported some minutes or hours after the deadline? How come?


rb_04_08_20667_20541_ab_t000__h002_robetta_IGNORE_THE_REST_04_08_905771_13_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=1143484478
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.12_x86_64-pc-linux-gnu @rb_04_08_20667_20541_ab_t000__h002_robetta_FLAGS -in::file::fasta t000__h002.fasta -in:file:boinc_wu_zip rb_04_08_20667_20541_ab_t000__h002_robetta.zip -frag3 rb_04_08_20667_20541_ab_t000__h002_robetta.200.3mers.index.gz -fragA rb_04_08_20667_20541_ab_t000__h002_robetta.200.8mers.index.gz -fragB rb_04_08_20667_20541_ab_t000__h002_robetta.200.4mers.index.gz -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1868176
Starting watchdog...
Watchdog active.
======================================================
DONE ::     1 starting structures   9816.2 cpu seconds
This process generated      1 decoys from       1 attempts
======================================================
BOINC :: WS_max 3.79814e+08

BOINC :: Watchdog shutting down...
21:32:46 (18543): called boinc_finish(0)

</stderr_txt>
]]>


rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta_IGNORE_THE_REST_05_11_905767_20_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=1143482427
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.12_x86_64-pc-linux-gnu @rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta_FLAGS -in::file::fasta t000__h002.fasta -in:file:boinc_wu_zip rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.zip -frag3 rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.3mers.index.gz -fragA rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.11mers.index.gz -fragB rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.5mers.index.gz -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1872149
Starting watchdog...
Watchdog active.
======================================================
DONE ::     1 starting structures  25995.4 cpu seconds
This process generated      5 decoys from       5 attempts
======================================================
BOINC :: WS_max 4.65002e+08

BOINC :: Watchdog shutting down...
20:17:00 (16382): called boinc_finish(0)

</stderr_txt>
]]>


rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta_IGNORE_THE_REST_04_08_905767_20_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=1143482437
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.12_x86_64-pc-linux-gnu @rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta_FLAGS -in::file::fasta t000__h002.fasta -in:file:boinc_wu_zip rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.zip -frag3 rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.3mers.index.gz -fragA rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.8mers.index.gz -fragB rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.4mers.index.gz -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1872449
Starting watchdog...
Watchdog active.
======================================================
DONE ::     1 starting structures  24984.6 cpu seconds
This process generated      6 decoys from       6 attempts
======================================================
BOINC :: WS_max 5.80346e+08

BOINC :: Watchdog shutting down...
21:40:25 (16842): called boinc_finish(0)

</stderr_txt>
]]>


rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta_IGNORE_THE_REST_06_05_905767_20_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=1143482499
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.12_x86_64-pc-linux-gnu @rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta_FLAGS -in::file::fasta t000__h002.fasta -in:file:boinc_wu_zip rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.zip -frag3 rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.3mers.index.gz -fragA rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.5mers.index.gz -fragB rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.6mers.index.gz -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1872769
Starting watchdog...
Watchdog active.
======================================================
DONE ::     1 starting structures  28015.9 cpu seconds
This process generated      6 decoys from       6 attempts
======================================================
BOINC :: WS_max 4.64642e+08

BOINC :: Watchdog shutting down...
20:47:18 (16224): called boinc_finish(0)

</stderr_txt>
]]>


rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta_IGNORE_THE_REST_08_05_905767_20_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=1143482500
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.12_x86_64-pc-linux-gnu @rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta_FLAGS -in::file::fasta t000__h002.fasta -in:file:boinc_wu_zip rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.zip -frag3 rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.3mers.index.gz -fragA rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.5mers.index.gz -fragB rb_04_08_20790_20542_abCOVID-19_t000__h002_robetta.200.8mers.index.gz -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1872729
Starting watchdog...
Watchdog active.
======================================================
DONE ::     1 starting structures  24192.6 cpu seconds
This process generated      5 decoys from       5 attempts
======================================================
BOINC :: WS_max 5.64347e+08

BOINC :: Watchdog shutting down...
21:27:31 (16886): called boinc_finish(0)

</stderr_txt>
]]>
4) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93396)
Posted 4 Apr 2020 by Luigi R.
Post:
Not sure what retroactive aspect you are talking about. This is not mentioned in the post you referenced.

Well, there was a thread about this issue and you referred to the admin's post.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13672&postid=92969#92969

Maybe I should have posted there.
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93387)
Posted 4 Apr 2020 by Luigi R.
Post:
Not sure what retroactive aspect you are talking about. This is not mentioned in the post you referenced.

Covid-19 wu
https://boinc.bakerlab.org/rosetta/result.php?resultid=1136817690

Only 2 cr today, although I downloaded it on 29 Mar 2020. I expected no problem with credits, but I was wrong.
6) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 93382)
Posted 4 Apr 2020 by Luigi R.
Post:
@bcov:

Why is low-credit granting retroactive?
I downloaded a bunch of covid-19 tasks on the 29th of March. I'm reporting now and getting 2cr. instead of ~200cr.

I thought the issue would affect only new tasks downloaded after the 31st of March.

My app version is 4.08.
7) Message boards : Number crunching : Stalled downloads (Message 92013)
Posted 16 Mar 2020 by Luigi R.
Post:
I've only encountered stuck download queue behavior on my Linux boxes, where BOINC still thinks there are stalled downloads, despite me having cleared them. A restart of the boincmgr fixes that issue.

Same here. It fixes itself after some minutes or hours too.
8) Message boards : Number crunching : Computation errors: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_xxxx (Message 91893)
Posted 7 Mar 2020 by Luigi R.
Post:
Errors on Ubuntu 18.04 too. BOINC didn't crash though.

rb_02_23_16774_16587_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_03_07_899045_59_1
rb_02_24_16778_16590_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_07_899052_43_1
9) Message boards : Number crunching : Computation errors: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_xxxx (Message 91889)
Posted 7 Mar 2020 by Luigi R.
Post:
On Ubuntu 18.04 there are no problems running *cstwt_5.0* tasks.

Ubuntu 18.04.4 LTS, kernel 4.15.0-88-generic
BOINC v7.9.3
OK

Ubuntu 14.04.6 LTS, kernel 4.4.0-142-generic
BOINC v7.2.42
Dangerous
10) Message boards : Number crunching : Computation errors: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_xxxx (Message 91882)
Posted 6 Mar 2020 by Luigi R.
Post:
I'm going to abort all *cstwt_5.0* tasks with a bash script on Linux to guarantee my contribution to R@H.

Here is my script:
https://pastebin.com/RKdZKhGx
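
For readers who cannot open the link, here is a minimal Python sketch of the same idea. It is not the linked script. It assumes that `boinccmd --get_tasks` prints one `name:` line and one `project URL:` line per task, and it uses `boinccmd --task <url> <name> abort` to abort. The helper names and the sample text are hypothetical.

```python
import fnmatch
import subprocess

PATTERN = "*cstwt_5.0*"


def parse_tasks(text: str) -> list[dict]:
    """Collect task name and project URL from `boinccmd --get_tasks` text.

    Assumes each task block has a 'name:' line followed later by a
    'project URL:' line.
    """
    tasks = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue
        if key == "name":
            tasks.append({"name": value, "project_url": None})
        elif key == "project URL" and tasks:
            tasks[-1]["project_url"] = value
    return tasks


def abort_commands(tasks: list[dict], pattern: str = PATTERN) -> list[list[str]]:
    """Build one boinccmd abort command per task whose name matches pattern."""
    return [
        ["boinccmd", "--task", t["project_url"], t["name"], "abort"]
        for t in tasks
        if t["project_url"] and fnmatch.fnmatchcase(t["name"], pattern)
    ]


def abort_matching(dry_run: bool = True) -> None:
    """Query the local client and abort matching tasks (prints only if dry_run)."""
    out = subprocess.run(["boinccmd", "--get_tasks"],
                         capture_output=True, text=True, check=True).stdout
    for cmd in abort_commands(parse_tasks(out)):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Hypothetical sample of the assumed output format, for illustration only.
SAMPLE = """\
1) -----------
   name: rb_02_23_16774_16587_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_03_07_899045_59_1
   WU name: example_wu_a
   project URL: https://boinc.bakerlab.org/rosetta/
2) -----------
   name: some_other_task_0
   WU name: example_wu_b
   project URL: https://boinc.bakerlab.org/rosetta/
"""
for cmd in abort_commands(parse_tasks(SAMPLE)):
    print(" ".join(cmd))
```

Calling `abort_matching()` only prints the commands it would run; pass `dry_run=False` to actually abort. Running it periodically (for example from cron) would catch newly downloaded tasks.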
11) Message boards : Number crunching : Computation errors: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_xxxx (Message 91863)
Posted 4 Mar 2020 by Luigi R.
Post:
Do you mean Mod.Sense? He is a great guy, but he is NOT an admin.
Admin posts only "Predictor of the day" and "News".
David E.K. - latest post is March 2019.
David Baker - latest post is December 2017.
Here he is.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13510&postid=91696#91696
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13510&postid=91703#91703


If you read the forums, the "cstwt_5.0" WUs have had problems since February 2019.
I see.

I think I have already encountered this issue, but I didn't remember it at all.

The BOINC client stops responding and you can't even kill it. Although my client is standalone and runs as a user process (not a service), you have to kill it as superuser.
12) Message boards : Number crunching : Computation errors: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_xxxx (Message 91860)
Posted 4 Mar 2020 by Luigi R.
Post:
Well, I see that Admin answers on Number Crunching threads.

As a volunteer, I can spend my time putting together a solution to abort all tasks named like "*cstwt*".
13) Message boards : Number crunching : Computation errors: rb_02_25_16883_16706_ab_t000__robetta_cstwt_5.0_xxxx (Message 91857)
Posted 4 Mar 2020 by Luigi R.
Post:
I'm on Linux. BOINC froze because of task rb_02_21_16595_16419_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_05_896595_65_1.

After 8 hours, I found that my host was idle! This is very bad.

Please check it. It's not acceptable that one task blocks crunching on all DC projects.
Sadly, I have set no more work on R@H.
14) Message boards : Number crunching : Rosetta 4.0+ (Message 87939)
Posted 20 Dec 2017 by Luigi R.
Post:
My target CPU run time is 1 hour.
The Rosetta application runs for more than 5 hours, then fails. It also happened a week ago.
It looks like no credit will be granted for that. I mean the credit granted after ~24 hours for errors.
How can I disable receiving Rosetta workunits? I want Rosetta Mini only.

<core_client_version>7.4.22</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.06_x86_64-pc-linux-gnu @cp11v3n3_c.293.9_0001_0001_0001_0001.flags -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1894171
Starting watchdog...
Watchdog active.
BOINC:: CPU time: 18537.5s, 14400s + 3600s[2017-12-20 19:18:14:] :: BOINC 
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE ::     1 starting structures  18537.5 cpu seconds
This process generated      1 decoys from       1 attempts
======================================================
19:18:14 (10793): called boinc_finish(0)

</stderr_txt>
]]>

Link to result: https://boinc.bakerlab.org/result.php?resultid=961026207
15) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 87881)
Posted 10 Dec 2017 by Luigi R.
Post:
Same here.

https://boinc.bakerlab.org/result.php?resultid=958964042
https://boinc.bakerlab.org/result.php?resultid=958964041
16) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81488)
Posted 18 Apr 2017 by Luigi R.
Post:
Another (stuck) task was uploaded during the night.

18-Apr-2017 04:05:28 [rosetta@home] Started upload of rb_03_23_72525_116778__t000__ab_robetta_IGNORE_THE_REST_474917_815_0_0
18-Apr-2017 04:05:46 [rosetta@home] Finished upload of rb_03_23_72525_116778__t000__ab_robetta_IGNORE_THE_REST_474917_815_0_0
18-Apr-2017 04:05:48 [rosetta@home] Sending scheduler request: To report completed tasks.
18-Apr-2017 04:05:48 [rosetta@home] Reporting 1 completed tasks
18-Apr-2017 04:05:48 [rosetta@home] Not requesting tasks: don't need
18-Apr-2017 04:05:51 [rosetta@home] Scheduler request completed
https://boinc.bakerlab.org/rosetta/result.php?resultid=910050184


Why are some tasks uploaded successfully after days?


Now I have 3 remaining (stuck) tasks.
17) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81479)
Posted 17 Apr 2017 by Luigi R.
Post:
By the way, I'm still not convinced it isn't Windows-10-specific. Yes, there was at least one report of a similar problem on Linux, but maybe it was different. I ran one of my Linux boxes all day yesterday without getting a sticker, and my Mac remains sticker free. Firing up a different Linux box today and will let it run all day (but that's including a major upgrade, which confuses all issues).

My host (4 stuck tasks now) has Xubuntu 14.04.5, Linux kernel 3.13.0-116-generic.
18) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81473)
Posted 17 Apr 2017 by Luigi R.
Post:
A little update.

One stuck task was miraculously uploaded yesterday.

16-Apr-2017 17:15:48 [rosetta@home] Started upload of 14dslfv5_14re4np_gb_0037_0001_30_0002_SAVE_ALL_OUT_480050_322_0_0
16-Apr-2017 17:16:02 [rosetta@home] Finished upload of 14dslfv5_14re4np_gb_0037_0001_30_0002_SAVE_ALL_OUT_480050_322_0_0
16-Apr-2017 17:16:06 [rosetta@home] Sending scheduler request: To report completed tasks.
16-Apr-2017 17:16:06 [rosetta@home] Reporting 1 completed tasks
16-Apr-2017 17:16:06 [rosetta@home] Not requesting tasks: don't need
16-Apr-2017 17:16:09 [rosetta@home] Scheduler request completed
https://boinc.bakerlab.org/rosetta/result.php?resultid=910051019


Now I have 4 remaining tasks stuck in the uploading state.
You can see that they break the "8 KB rule".
https://s28.postimg.org/43l918fy5/rosetta_stuck_tasks.png
Maybe 3 tasks stalled 4 times, which would explain why the uploaded size is 32 KB (= 4 * 8 KB) and not 8 KB.
19) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81455)
Posted 16 Apr 2017 by Luigi R.
Post:
I'm running LHC@Home, WCG and NumberFields too. No problem for these projects.
20) Message boards : Number crunching : Stuck on uploading is a new problem? (Message 81441)
Posted 15 Apr 2017 by Luigi R.
Post:
5 tasks stuck on uploading here.



client_state.xml

<file>
    <name>rb_03_23_72525_116778__t000__ab_robetta_IGNORE_THE_REST_474917_815_0_0</name>
    <nbytes>530178.000000</nbytes>
    <max_nbytes>25000000.000000</max_nbytes>
    <md5_cksum>221c7cf702ff15910a96060fed236335</md5_cksum>
    <status>1</status>
    <upload_url>http://srv1.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url>
    <persistent_file_xfer>
        <num_retries>11</num_retries>
        <first_request_time>1492170993.617707</first_request_time>
        <next_request_time>1492253144.919740</next_request_time>
        <time_so_far>2552.406322</time_so_far>
        <last_bytes_xferred>32768.000000</last_bytes_xferred>
        <is_upload>1</is_upload>
    </persistent_file_xfer>
</file>

<file>
    <name>rb_03_23_72525_116778__t000__4_C1_SAVE_ALL_OUT_IGNORE_THE_REST_474917_259_0_0</name>
    <nbytes>887888.000000</nbytes>
    <max_nbytes>25000000.000000</max_nbytes>
    <md5_cksum>33e5d718ef03b6c814fecaa4d08c9b81</md5_cksum>
    <status>1</status>
    <upload_url>http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url>
    <persistent_file_xfer>
        <num_retries>11</num_retries>
        <first_request_time>1492169234.382923</first_request_time>
        <next_request_time>1492257568.171602</next_request_time>
        <time_so_far>2357.694327</time_so_far>
        <last_bytes_xferred>207.000000</last_bytes_xferred>
        <is_upload>1</is_upload>
    </persistent_file_xfer>
</file>

<file>
    <name>UN-NM_C4Yang_000006_2L8HC4-12_DHR32_0019.pdb_C4Yang_17_04_20_47_25_localDocking_9_SAVE_ALL_OUT_479492_23_0_0</name>
    <nbytes>22502.000000</nbytes>
    <max_nbytes>50000000.000000</max_nbytes>
    <md5_cksum>db8309e5c372f565885d1075ac2b9683</md5_cksum>
    <status>1</status>
    <upload_url>http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url>
    <persistent_file_xfer>
        <num_retries>8</num_retries>
        <first_request_time>1492198192.963454</first_request_time>
        <next_request_time>1492258150.352014</next_request_time>
        <time_so_far>2485.533146</time_so_far>
        <last_bytes_xferred>22502.000000</last_bytes_xferred>
        <is_upload>1</is_upload>
    </persistent_file_xfer>
</file>

<file>
    <name>3566f810a5e0096440dc8f17796115d2_eehee_pd1-docking_CancerImmunotherapy_17_04_13_32_36_globalDocking_4_SAVE_ALL_OUT_478149_7_0_0</name>
    <nbytes>99159.000000</nbytes>
    <max_nbytes>50000000.000000</max_nbytes>
    <md5_cksum>a72161808b6852c6bb6f86c8fc85619f</md5_cksum>
    <status>1</status>
    <upload_url>http://srv3.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url>
    <persistent_file_xfer>
        <num_retries>5</num_retries>
        <first_request_time>1492214322.948390</first_request_time>
        <next_request_time>1492248107.442623</next_request_time>
        <time_so_far>2275.530723</time_so_far>
        <last_bytes_xferred>32768.000000</last_bytes_xferred>
        <is_upload>1</is_upload>
    </persistent_file_xfer>
</file>

<file>
    <name>14dslfv5_14re4np_gb_0037_0001_30_0002_SAVE_ALL_OUT_480050_322_0_0</name>
    <nbytes>337741.000000</nbytes>
    <max_nbytes>50000000.000000</max_nbytes>
    <md5_cksum>8b05674e7048a0d3632f82d93a4d9571</md5_cksum>
    <status>1</status>
    <upload_url>http://srv1.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url>
    <persistent_file_xfer>
        <num_retries>5</num_retries>
        <first_request_time>1492243267.332962</first_request_time>
        <next_request_time>1492251204.143131</next_request_time>
        <time_so_far>1553.738969</time_so_far>
        <last_bytes_xferred>32768.000000</last_bytes_xferred>
        <is_upload>1</is_upload>
    </persistent_file_xfer>
</file>
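
The transfer records above can be summarized with a short script. This is a sketch, assuming the pasted `<file>` elements are available as a text fragment and that the "8 KB rule" refers to 8192-byte chunks. The function name and the trimmed sample are illustrative.

```python
import xml.etree.ElementTree as ET

CHUNK = 8 * 1024  # assumed chunk size behind the "8 KB rule"


def upload_progress(xml_fragment: str) -> list[tuple[str, int, int, float]]:
    """Return (name, bytes sent, total bytes, sent / CHUNK) per <file> entry."""
    # Wrap the fragment so that several sibling <file> elements parse as one document.
    root = ET.fromstring(f"<root>{xml_fragment}</root>")
    rows = []
    for f in root.iter("file"):
        xfer = f.find("persistent_file_xfer")
        if xfer is None:
            continue  # no transfer in progress for this file
        sent = float(xfer.findtext("last_bytes_xferred", default="0"))
        total = float(f.findtext("nbytes", default="0"))
        rows.append((f.findtext("name", default=""), int(sent), int(total), sent / CHUNK))
    return rows


# First record from the post, trimmed to the fields used here.
SAMPLE = """
<file>
    <name>rb_03_23_72525_116778__t000__ab_robetta_IGNORE_THE_REST_474917_815_0_0</name>
    <nbytes>530178.000000</nbytes>
    <persistent_file_xfer>
        <num_retries>11</num_retries>
        <last_bytes_xferred>32768.000000</last_bytes_xferred>
    </persistent_file_xfer>
</file>
"""
for name, sent, total, chunks in upload_progress(SAMPLE):
    print(f"{name}: {sent}/{total} bytes sent ({chunks:g} x 8 KiB)")
```

Three of the five records above report 32768 bytes transferred, which is exactly four such chunks. The other two (207 and 22502 bytes) do not fit that pattern.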





©2024 University of Washington
https://www.bakerlab.org