Problems and Technical Issues with Rosetta@home

Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 90198 - Posted: 13 Jan 2019, 16:54:07 UTC - in response to Message 90197.  
Last modified: 13 Jan 2019, 17:38:30 UTC

Instead of being an idiot, be informative and post a link or a quote.

I gave you the answer, straight into your hand. And you still weren't able to figure it out.
(I agree that information is hard to find sometimes - that is why you should look at the topics first. If you don't have time, I don't think you can expect someone else to do it for you.)
ID: 90198 · Rating: 0
Profile Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 90203 - Posted: 14 Jan 2019, 23:16:38 UTC - in response to Message 90198.  
Last modified: 14 Jan 2019, 23:27:53 UTC

What you said was that it has "something" to do with "December 25".
Well, what exactly is "something"? And what does December 25 have to do with anything when this problem goes back to the 18th or even earlier?

If you have time, go find a specific post and put the link here.
I do that for people when I have time and know the specific answer from another post.
Generalization does nothing for me. That's what you offered: "something".

Here is specific detailed info that I see on my single project graph.
All other projects are set to a 100% equal share with Rosetta (the default setting). Rosetta is losing credits and should be trying to make them up, but it does not. Credit stays normal until the 17th, dips for a day, then corrects itself until the 21st, and after that it is all downhill. So how does this fit with your "something on the 25th"? The 17th and the 21st predate the 25th.

[Image: single-project credit graph, not preserved]
ID: 90203 · Rating: 0
Profile Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 90208 - Posted: 15 Jan 2019, 18:28:00 UTC

Amazing!!

1/15/2019 7:24:26 PM | Rosetta@home | Sending scheduler request: To fetch work.
1/15/2019 7:24:26 PM | Rosetta@home | Requesting new tasks for CPU
1/15/2019 7:24:29 PM | Rosetta@home | Scheduler request completed: got 29 new tasks
ID: 90208 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 397
Credit: 12,285,463
RAC: 11,195
Message 90509 - Posted: 14 Mar 2019, 16:01:06 UTC

I am seeing all too many errors from work units at the end of their processing cycle (after 12 hours of processing) and would like some advice as to whether there are any changes I can make to stop them.

Examples can be seen in WUs 1062692421 and 1062687362 but basically they show exit status 139 (unknown error) with signal 11 and a message saying that default.out.gz already exists with size -1.

Any suggestions would be gratefully received.
ID: 90509 · Rating: 0
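For reference, exit status 139 and signal 11 are two views of the same event: by shell convention, a process killed by signal N is reported with status 128 + N, and signal 11 on Linux is SIGSEGV (a segmentation fault). A minimal Python sketch of that decoding follows; `describe_exit_status` is a hypothetical helper written for this note, not part of BOINC or Rosetta.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper, not a BOINC API)."""
    if status > 128:
        # Statuses above 128 conventionally encode "killed by signal (status - 128)".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(describe_exit_status(139))
```

The status alone does not say why the segmentation fault happened; it only confirms that the two numbers in the task report describe one crash.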
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 90511 - Posted: 14 Mar 2019, 21:08:47 UTC - in response to Message 90509.  

You may need more memory. You have 8 GB on your Ryzen, but the Rosetta work units sometimes take up to 1 GB each.
ID: 90511 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 397
Credit: 12,285,463
RAC: 11,195
Message 90512 - Posted: 15 Mar 2019, 0:06:20 UTC - in response to Message 90511.  

You may need more memory. You have 8 GB on your Ryzen, but the Rosetta work units sometimes take up to 1 GB each.


Ouch, I know my free memory sometimes goes down to 2 or 3% but I hadn’t thought of it going negative.

Thanks for the suggestion, I’ll look at getting another 8gb and maybe some more for the FX rig as well, that only has 4gb for the 4 cores.

Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement?
ID: 90512 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 90513 - Posted: 15 Mar 2019, 1:43:25 UTC - in response to Message 90512.  

Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement?

Yes, all the WCG ones that I know of have a pretty small memory requirement. The biggest is MIP, which is around 300 MB.
But you probably aren't always running an equal proportion of Rosetta and WCG. The BOINC scheduler does strange things, and may give you all Rosetta once in a while.
ID: 90513 · Rating: 0
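The memory arithmetic in this exchange can be sketched as a rough worst-case estimate. The per-task figures are the ones quoted above (up to about 1 GB for a Rosetta task, about 300 MB for WCG's MIP); the 12 concurrent tasks and the 2 GB set aside for the operating system are assumptions chosen only for illustration, and `worst_case_memory_gb` is a made-up helper name.

```python
def worst_case_memory_gb(rosetta_tasks: int, wcg_tasks: int,
                         rosetta_gb: float = 1.0, wcg_gb: float = 0.3,
                         os_reserve_gb: float = 2.0) -> float:
    """Rough upper estimate of RAM needed for a given mix of tasks.

    rosetta_gb and wcg_gb come from the figures quoted in the thread;
    os_reserve_gb is an assumed allowance for everything else on the machine.
    """
    return rosetta_tasks * rosetta_gb + wcg_tasks * wcg_gb + os_reserve_gb


installed_gb = 8.0  # the Ryzen's memory as stated in the thread
for label, rosetta, wcg in [("half and half", 6, 6), ("all Rosetta", 12, 0)]:
    need = worst_case_memory_gb(rosetta, wcg)
    verdict = "fits" if need <= installed_gb else "does not fit"
    print(f"{label}: about {need:.1f} GB needed, {verdict} in {installed_gb:.0f} GB")
```

Under these assumptions even the half-and-half mix exceeds 8 GB when every Rosetta task is at its peak, and an all-Rosetta stretch needs considerably more.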
Profile robertmiles

Joined: 16 Jun 08
Posts: 1233
Credit: 14,334,484
RAC: 3,524
Message 90514 - Posted: 15 Mar 2019, 3:44:12 UTC - in response to Message 90509.  

I am seeing all too many errors from work units at the end of their processing cycle (after 12 hours of processing) and would like some advice as to whether there are any changes I can make to stop them.

Examples can be seen in WUs 1062692421 and 1062687362 but basically they show exit status 139 (unknown error) with signal 11 and a message saying that default.out.gz already exists with size -1.

Any suggestions would be gratefully received.

You might check if decreasing the time that workunits can run on your computers to ten hours has any effect on this.
ID: 90514 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 397
Credit: 12,285,463
RAC: 11,195
Message 90515 - Posted: 15 Mar 2019, 8:40:31 UTC - in response to Message 90514.  

I am seeing all too many errors from work units at the end of their processing cycle (after 12 hours of processing) and would like some advice as to whether there are any changes I can make to stop them.

Examples can be seen in WUs 1062692421 and 1062687362 but basically they show exit status 139 (unknown error) with signal 11 and a message saying that default.out.gz already exists with size -1.

Any suggestions would be gratefully received.

You might check if decreasing the time that workunits can run on your computers to ten hours has any effect on this.


The time was set to the default of 8 hours, so I don’t know why these were taking 12 hours anyway, but I reset it to 6 hours yesterday to try to reduce the loss when a WU errors out in this way.
ID: 90515 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 397
Credit: 12,285,463
RAC: 11,195
Message 90516 - Posted: 15 Mar 2019, 8:50:18 UTC - in response to Message 90513.  

Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement?

Yes, all the WCG ones that I know of have a pretty small memory requirement. The biggest is MIP, which is around 300 MB.
But you probably aren't always running an equal proportion of Rosetta and WCG. The BOINC scheduler does strange things, and may give you all Rosetta once in a while.


I monitor them fairly closely and I’m fairly sure the Ryzen was running 6 and 6 at the time. The FX had just come out of a period where it was running all Rosetta for a few days to catch up after running all WCG for a while, but the Ryzen completed 46 WCG WUs that day, which is about normal, and the split was equal every time I looked.
ID: 90516 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 397
Credit: 12,285,463
RAC: 11,195
Message 90517 - Posted: 15 Mar 2019, 10:37:08 UTC

OK, extra memory ordered for both machines so we’ll see if that sorts it.
ID: 90517 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 397
Credit: 12,285,463
RAC: 11,195
Message 90519 - Posted: 15 Mar 2019, 13:32:02 UTC

Whilst I’m here, a silly question if I may.

Is there any way of changing the View Tasks page from sorting by date sent to sorting by date returned?
ID: 90519 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 90520 - Posted: 15 Mar 2019, 14:19:38 UTC - in response to Message 90519.  

Whilst I’m here, a silly question if I may.

Is there any way of changing the View Tasks page from sorting by date sent to sorting by date returned?


I take it you are asking about the website, and not the BOINC Manager tasks page? I am not aware of a way to change how that web page is sorted.
Rosetta Moderator: Mod.Sense
ID: 90520 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 397
Credit: 12,285,463
RAC: 11,195
Message 90521 - Posted: 15 Mar 2019, 16:34:42 UTC - in response to Message 90520.  

Ok, thanks. It was worth asking :-)
ID: 90521 · Rating: 0
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 23,221,036
RAC: 12,268
Message 90522 - Posted: 15 Mar 2019, 18:08:00 UTC - in response to Message 90517.  
Last modified: 15 Mar 2019, 18:28:22 UTC

OK, extra memory ordered for both machines so we’ll see if that sorts it.


It seems like Rosetta gets into a state where it consumes 1 GB+ per WU. I am running 35 WUs and there are always a couple taking over 1 GB.

I watch the difference between CPU and RUN times and swap used. As long as the swap used is very low, you are probably not running into memory problems. I tend to buy more GB of memory than threads. I originally got my 36 thread machine with 32GB and that was not enough. You can see that 19gb of my swap space has been used even though the machine has 64gb installed for the 36 threads. 19gb swap space used is concerning.

Based on over a thousand jobs each, the credit difference between the 64-bit Rosetta WUs and the 32-bit Minirosetta WUs is negligible: 44.0 credits/CPU hr for Rosetta 4.08 and 45.7 credits/CPU hr for Minirosetta.

top ic .... sorted by memory use.

top - 10:55:55 up 1 day, 18:24, 0 users, load average: 40.22, 36.72, 36.27
Tasks: 524 total, 37 running, 487 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 1.4 sy, 96.6 ni, 1.1 id, 0.0 wa, 0.4 hi, 0.1 si, 0.0 st
MiB Mem : 64090.7 total, 1051.2 free, 16283.5 used, 46756.0 buff/cache
MiB Swap: 32112.0 total, 32093.0 free, 19.0 used. 45874.0 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24590 boinc 39 19 1722808 1.5g 75400 R 98.3 2.5 219:01.25 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1798_1948__t000__1_C1+
25349 boinc 39 19 1384300 1.2g 75400 R 99.3 1.9 198:57.60 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1798_1948__t000__1_C1+
22988 boinc 39 19 838204 723668 75400 R 97.7 1.1 259:32.35 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_14_1536_1929__t000__0_C1+
24878 boinc 39 19 706928 592640 75784 R 99.3 0.9 211:30.53 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1805_1950__t000__0_C1+
15222 boinc 39 19 605140 491200 76104 R 99.0 0.7 459:54.12 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1674_1946__t000__0_C1+
20625 boinc 39 19 605492 491108 75400 R 99.3 0.7 319:46.33 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1808_1947__t000__0_C1+
16439 boinc 39 19 583112 468876 75784 R 97.4 0.7 428:23.20 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1674_1946__t000__0_C1+
24082 boinc 39 19 583664 465920 68044 R 99.3 0.7 231:28.63 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @flags_rb_03_15_1805_1950__t000__ab_robetta -in:file:boinc_wu_zip+
17334 boinc 39 19 575680 457680 68620 R 99.3 0.7 404:59.21 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
22280 boinc 39 19 543464 425512 68556 R 99.7 0.6 276:44.21 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
19901 boinc 39 19 533536 415428 68556 R 99.7 0.6 338:55.09 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
22209 boinc 39 19 530860 413260 68236 R 99.3 0.6 278:15.90 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @flags_rb_03_15_1808_1947__t000__ab_robetta -in:file:boinc_wu_zip+
25711 boinc 39 19 523612 408668 70668 R 99.3 0.6 190:02.19 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @foldit_2007571_0001_fold_and_dock_flags -silent_gz -mute all -ou+
21481 boinc 39 19 521072 406132 70604 R 99.3 0.6 297:12.91 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @foldit_2007571_0005_fold_and_dock_flags -silent_gz -mute all -ou+
17873 boinc 39 19 516024 398184 68620 R 99.3 0.6 391:55.17 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
15374 boinc 39 19 511956 394116 68556 R 99.3 0.6 455:50.04 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
30825 boinc 39 19 509260 391232 68620 R 99.3 0.6 78:42.82 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
14998 boinc 39 19 508228 390160 68620 R 98.0 0.6 465:27.01 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
18209 boinc 39 19 503324 385500 68620 R 99.0 0.6 383:28.39 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
31538 boinc 39 19 500516 382744 68620 R 99.3 0.6 60:22.53 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
ID: 90522 · Rating: 0
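A note on reading the RES column in the listing above: procps `top` prints bare numbers in KiB by default and switches to a suffixed form such as `1.5g` when the value gets large, so the first two rows are the tasks over 1 GB. The Python sketch below normalizes a few of those values to MiB; `res_to_mib` is a hypothetical helper, and the KiB default is an assumption about this particular `top` configuration.

```python
# Multipliers for the suffixes top may append to a memory value.
UNIT_TO_MIB = {"k": 1 / 1024, "m": 1.0, "g": 1024.0, "t": 1024.0 * 1024}


def res_to_mib(field: str) -> float:
    """Convert a RES value as printed by top to MiB.

    Assumes bare numbers are KiB (top's default task memory scale).
    """
    suffix = field[-1].lower()
    if suffix in UNIT_TO_MIB:
        return float(field[:-1]) * UNIT_TO_MIB[suffix]
    return float(field) / 1024


# First five RES values copied from the listing in the post.
res_column = ["1.5g", "1.2g", "723668", "592640", "491200"]
for field in res_column:
    print(f"{field:>8} -> {res_to_mib(field):8.1f} MiB")
```

Summing the converted values for all running tasks gives a quick picture of how much resident memory the whole batch is holding.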
Juha

Joined: 28 Mar 16
Posts: 13
Credit: 705,034
RAC: 0
Message 90523 - Posted: 15 Mar 2019, 18:29:56 UTC - in response to Message 90522.  

19gb swap space used is concerning.


19 GB would indeed be a lot of swap in use but haven't you got the unit wrong? It looks like 19 MB to me.
ID: 90523 · Rating: 0
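The unit question can be checked mechanically, because the summary line from `top` carries its own unit prefix (`MiB Swap:`). The following Python sketch parses that line as it appears in the post; `swap_used_mib` and the unit table are illustrative names, and the regular expression is only known to fit this particular line layout.

```python
import re

# Conversion factors from the unit prefix on top's summary line to MiB.
MIB_PER_UNIT = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def swap_used_mib(summary_line: str) -> float:
    """Extract the 'used' swap figure from a top summary line, in MiB."""
    match = re.search(r"(\w+)\s+Swap:.*?(\d+(?:\.\d+)?)\s+used", summary_line)
    if match is None:
        raise ValueError("not a top swap summary line")
    unit, used = match.group(1), float(match.group(2))
    return used * MIB_PER_UNIT[unit]


# The swap line exactly as posted in the thread.
line = "MiB Swap: 32112.0 total, 32093.0 free, 19.0 used. 45874.0 avail Mem"
used = swap_used_mib(line)
print(f"swap used: {used:.1f} MiB ({used / 1024:.3f} GiB)")
```

Read this way, the machine had about 19 MiB of swap in use, roughly 0.02 GiB of the 32112 MiB available.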
Sid Celery

Joined: 11 Feb 08
Posts: 2132
Credit: 41,490,422
RAC: 17,755
Message 90532 - Posted: 18 Mar 2019, 12:38:20 UTC - in response to Message 90512.  

You may need more memory. You have 8 GB on your Ryzen, but the Rosetta work units sometimes take up to 1 GB each.

Ouch, I know my free memory sometimes goes down to 2 or 3% but I hadn’t thought of it going negative.

Thanks for the suggestion, I’ll look at getting another 8gb and maybe some more for the FX rig as well, that only has 4gb for the 4 cores.

Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement?

Sorry to be a bit late on this, but I did notice around 13th March I had a task consuming 2.4Gb and 14Gb of my 16Gb (total) RAM being in use to run 8 tasks.

I can't recall the tasks involved. Right now I'm back to my more usual level of 7.74Gb in use
ID: 90532 · Rating: 0
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 23,221,036
RAC: 12,268
Message 90533 - Posted: 18 Mar 2019, 16:14:06 UTC - in response to Message 90523.  

19gb swap space used is concerning.


19 GB would indeed be a lot of swap in use but haven't you got the unit wrong? It looks like 19 MB to me.


DOH! You are obviously correct. I got units of GB dancing in my head.
ID: 90533 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 397
Credit: 12,285,463
RAC: 11,195
Message 90535 - Posted: 19 Mar 2019, 10:40:22 UTC - in response to Message 90532.  


Sorry to be a bit late on this, but I did notice around 13th March I had a task consuming 2.4Gb and 14Gb of my 16Gb (total) RAM being in use to run 8 tasks.

I can't recall the tasks involved. Right now I'm back to my more usual level of 7.74Gb in use


As of this AM I have 16gb on the ryzen and it's currently showing 81% free memory but that's with no Rosetta as no WUs have come down since early yesterday.

I'll monitor going forward and report back.
ID: 90535 · Rating: 0
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 90536 - Posted: 19 Mar 2019, 11:41:37 UTC - in response to Message 90535.  
Last modified: 19 Mar 2019, 11:45:37 UTC

As of this AM I have 16gb on the ryzen and it's currently showing 81% free memory but that's with no Rosetta as no WUs have come down since early yesterday.

Yes, we are back to 8086 tasks ready to send according to the server status page, which actually means 0 tasks ready to send. Maybe the admins should investigate what those 8086 tasks are and whether they might be causing the issues.
ID: 90536 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org