Problems and Technical Issues with Rosetta@home

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,526,036
RAC: 10,392
Message 92631 - Posted: 30 Mar 2020, 15:20:00 UTC - in response to Message 92629.  

Yeah, workunits aren't being generated, I think.
The queue was well over 1 million just yesterday.

There are still 1.5 million tasks showing as waiting to return, so it's mainly a matter of our task buffers running down, but new users will be met with no tasks to download at all, which can be discouraging.

There are ways to extend the life of existing tasks, but better to wait and see if supply comes back on stream for a while first. Turnaround time remains important.
ID: 92631 · Rating: 0

Germano_0x
Joined: 27 Dec 13
Posts: 3
Credit: 2,493,872
RAC: 0
Message 92632 - Posted: 30 Mar 2020, 15:33:07 UTC

I am not receiving any new work units. What is going on?
ID: 92632 · Rating: 0

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 92633 - Posted: 30 Mar 2020, 15:40:27 UTC

More work will be coming soon. It looks like the queue of work is empty at the moment. I'm sure it won't be empty for long.
Rosetta Moderator: Mod.Sense
ID: 92633 · Rating: 0

Millenium
Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 92635 - Posted: 30 Mar 2020, 16:05:49 UTC

We are crunching a lot of work!
ID: 92635 · Rating: 0

spRocket
Joined: 23 Mar 20
Posts: 22
Credit: 3,008,018
RAC: 0
Message 92640 - Posted: 30 Mar 2020, 17:40:34 UTC

Looks like I found my answer as to why one more machine I managed to add only got a single task. :)
ID: 92640 · Rating: 0

Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 92646 - Posted: 30 Mar 2020, 19:32:38 UTC
Last modified: 30 Mar 2020, 19:34:10 UTC

Work is flowing again.
ID: 92646 · Rating: 0

Millenium
Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 92647 - Posted: 30 Mar 2020, 19:48:49 UTC

Looks like meat's back on the menu, boys!
ID: 92647 · Rating: 0

Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 92648 - Posted: 30 Mar 2020, 19:54:03 UTC - in response to Message 92646.  

Work is flowing again.


Turns out they are just resends due to computation errors and 1 deadline miss, but the last queue update does show 13k tasks.
ID: 92648 · Rating: 0

yoerik
Joined: 24 Mar 20
Posts: 128
Credit: 169,525
RAC: 0
Message 92654 - Posted: 30 Mar 2020, 20:44:40 UTC - in response to Message 92648.  

Work is flowing again.


Turns out they are just resends due to computation errors and 1 deadline miss, but the last queue update does show 13k tasks.


Or it's a ton of verification - not necessarily computation errors.
ID: 92654 · Rating: 0

Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 92655 - Posted: 30 Mar 2020, 20:46:05 UTC - in response to Message 92654.  

The wingman workunits ended with computation errors.
ID: 92655 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 92661 - Posted: 30 Mar 2020, 21:46:14 UTC - in response to Message 92620.  

Getting this again - Just a pre-emptive post
30/03/2020 14:28:05 | Rosetta@home | update requested by user
30/03/2020 14:28:09 | Rosetta@home | Sending scheduler request: Requested by user.
30/03/2020 14:28:09 | Rosetta@home | Requesting new tasks for CPU
30/03/2020 14:28:12 | Rosetta@home | Scheduler request completed: got 0 new tasks
30/03/2020 14:28:12 | Rosetta@home | No tasks sent

Task creation not meeting the very high demand again?


I like it when we can keep up with the workload. I enjoy trying to empty the Milkyway cache with my 5 GPUs. It's a pity I can't run Rosetta on them. 17 Teraflops would be good for Rosetta.
ID: 92661 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 92662 - Posted: 30 Mar 2020, 21:48:01 UTC - in response to Message 92647.  

Looks like meat's back on the menu, boys!


[crosses legs]
ID: 92662 · Rating: 0

nastasache
Joined: 24 Feb 07
Posts: 16
Credit: 171,383
RAC: 0
Message 92678 - Posted: 31 Mar 2020, 0:16:47 UTC

I have 2 tasks that ended with a Computation error:

https://boinc.bakerlab.org/rosetta/result.php?resultid=1136321350
https://boinc.bakerlab.org/rosetta/result.php?resultid=1136320461

Both look like they ran out of memory at some point.
I have 64 GB of RAM; is it because Rosetta@home is 32-bit, or something?

Thanks,
Iulian
ID: 92678 · Rating: 0

karbonade
Joined: 22 Mar 20
Posts: 3
Credit: 1,839,092
RAC: 10
Message 92680 - Posted: 31 Mar 2020, 0:45:42 UTC

Hi,
My PC crashed and I had to install the OS again (VM Ubuntu) and gave it the same system name as the previous install. After that I installed BOINC and set up R@H.
But now I don't get any new tasks. I just keep getting the message: "Scheduler request completed: got 0 new tasks. No tasks sent"
On my other PC I still get new tasks sent.
Is it because the tasks that were there before the crash are still "open"? Well, they will not be processed by me....
I also used the "Reset project" button in BOINC, but that didn't help in any way.
Is there something else I have to do?
ID: 92680 · Rating: 0

yoerik
Joined: 24 Mar 20
Posts: 128
Credit: 169,525
RAC: 0
Message 92683 - Posted: 31 Mar 2020, 0:53:10 UTC - in response to Message 92680.  

Hi,
My PC crashed and I had to install the OS again (VM Ubuntu) and gave it the same system name as the previous install. After that I installed BOINC and set up R@H.
But now I don't get any new tasks. I just keep getting the message: "Scheduler request completed: got 0 new tasks. No tasks sent"
On my other PC I still get new tasks sent.
Is it because the tasks that were there before the crash are still "open"? Well, they will not be processed by me....
I also used the "Reset project" button in BOINC, but that didn't help in any way.
Is there something else I have to do?


It's not your end. Rosetta is out of WUs. The latest updates from a Project Administrator: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13533&postid=92681#92681
So don't worry. Most people aren't getting new WUs right now. The WUs you were working on were likely redeployed; if not, the system will take a few days to realize that you have stopped working on them.
ID: 92683 · Rating: 0

MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 92685 - Posted: 31 Mar 2020, 1:13:51 UTC - in response to Message 92606.  

There are no silly questions. But HT is enabled. Here are the other BIOS options about performance (last word is the current setting):
Intel(R) Turbo Boost Technology Default - Enabled Enabled
ACPI SLIT Default - Enabled Enabled

Node Interleaving Default - Disabled Disabled
Intel NIC DMA Channels (IOAT) Default - Enabled Enabled
HW Prefetcher Default - Enabled Enabled
Adjacent Sector Prefetch Default - Enabled Enabled
DCU Stream Prefetcher Default - Enabled Enabled
DCU IP Prefetcher Default - Enabled Enabled
QPI Snoop Configuration Default - Home Snoop Home Snoop
QPI Home Snoop Optimization Default - Directory + OSB Enabled
QPI Bandwidth Optimization (RTID) Default - Balanced Balanced
Memory Proximity Reporting for I/O Default - Enabled Enabled
I/O Non-posted Prefetching Default - Enabled Enabled
NUMA Group Size Optimization Default - Clustered Clustered
Intel Performance Monitoring Support Default - Disabled Disabled

I would compare these settings, in particular the two below, with one of your other machines that is working properly:
Node Interleaving
NUMA Group Size Optimization

About the only other thing I can think of would be the Windows version. Are you running Home, Pro or Enterprise on it? Is it the same as the other (working) machines?
ID: 92685 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 92687 - Posted: 31 Mar 2020, 1:16:17 UTC - in response to Message 92678.  

I have 2 tasks that ended with a Computation error:

https://boinc.bakerlab.org/rosetta/result.php?resultid=1136321350
https://boinc.bakerlab.org/rosetta/result.php?resultid=1136320461

Both look like they ran out of memory at some point.
I have 64 GB of RAM; is it because Rosetta@home is 32-bit, or something?

Thanks,
Iulian

It looks likely that you have enough memory, but haven't given BOINC permission to use enough of it.

Try this:

If you are using the simple view, click on View near the top line, then Advanced view....

Click on Projects, then Rosetta@home, then Your account.

Under Preferences, click on Computing preferences.

For each of the following sections, if they are present:
Primary (default) preferences
Separate preferences for home
Separate preferences for work
Separate preferences for school

Scroll down to Memory.

If the memory percentages are too low to give each processor at least 2 GB (that is, 64 GB multiplied by the percentage, then divided by the number of processors, 12 in your case), scroll down to Other and click on Edit preferences. Then scroll down to Memory and increase the percentages, and finally scroll down to Other again and click on Update preferences.

Click on the X at the top right corner of the Computing preferences window to close it.

Click on Projects, then Rosetta@home, then Update.

If you want to go back to the Simple view, click on View, then Simple view....
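
The memory check in the steps above can be sketched in a few lines of Python. This is only a rough illustration: the 64 GB, 12 processors and 2 GB per task come from this post, while the function name and the percentages tried are hypothetical and are not BOINC defaults. It also assumes the allowed memory is shared evenly between tasks, which is a simplification.

```python
def memory_per_task_gb(total_ram_gb, percent_allowed, n_tasks):
    """Return the RAM (in GB) available to each task if BOINC may use
    percent_allowed % of total_ram_gb, shared evenly across n_tasks."""
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    return total_ram_gb * (percent_allowed / 100.0) / n_tasks


# Figures from the post: 64 GB of RAM, 12 processors, 2 GB target per task.
TOTAL_RAM_GB = 64
N_PROCESSORS = 12
TARGET_GB = 2.0

# Hypothetical percentage settings, chosen only to illustrate the check.
for percent in (25, 50, 90):
    per_task = memory_per_task_gb(TOTAL_RAM_GB, percent, N_PROCESSORS)
    verdict = "enough" if per_task >= TARGET_GB else "too low"
    print(f"{percent}% -> {per_task:.2f} GB per task ({verdict})")
```

Under these assumptions, the percentage would need to be at least 37.5% (2 GB x 12 / 64 GB) for every processor to get 2 GB.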
ID: 92687 · Rating: 0

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,386,302
RAC: 19,119
Message 92705 - Posted: 31 Mar 2020, 4:41:49 UTC
Last modified: 31 Mar 2020, 4:53:56 UTC

Can't upload anything at present; uploads are going into instant timeout.

31/03/2020 13:56:53 | Rosetta@home | Computation for task rb_03_21_19079_18887_ab_t000__robetta_cstwt_5.0_IGNORE_THE_REST_03_07_902861_8_1 finished
31/03/2020 13:56:54 | Rosetta@home | Starting task 1xl3zs2u_Junior_HalfRoid_design2_COVID-19_SAVE_ALL_OUT_904232_1_0
31/03/2020 13:56:55 | Rosetta@home | Started upload of rb_03_21_19079_18887_ab_t000__robetta_cstwt_5.0_IGNORE_THE_REST_03_07_902861_8_1_r402737099_0
31/03/2020 13:56:57 | Rosetta@home | [error] Error reported by file upload server: can't open log file '../log_bwsrv2/file_upload_handler.log' (errno: 9)
31/03/2020 13:56:57 | Rosetta@home | Temporarily failed upload of rb_03_21_19079_18887_ab_t000__robetta_cstwt_5.0_IGNORE_THE_REST_03_07_902861_8_1_r402737099_0: transient upload error
31/03/2020 13:56:57 | Rosetta@home | Backing off 00:03:13 on upload of rb_03_21_19079_18887_ab_t000__robetta_cstwt_5.0_IGNORE_THE_REST_03_07_902861_8_1_r402737099_0
31/03/2020 13:58:40 | Rosetta@home | Computation for task 0jb7gi3t_jhr_design1_COVID-19_SAVE_ALL_OUT_903456_1_0 finished
31/03/2020 13:58:41 | Rosetta@home | Starting task rb_03_29_19780_19680_ab_t000__robetta_IGNORE_THE_REST_05_08_904234_12_0
31/03/2020 13:58:43 | Rosetta@home | Started upload of 0jb7gi3t_jhr_design1_COVID-19_SAVE_ALL_OUT_903456_1_0_r1714937121_0
31/03/2020 13:58:45 | Rosetta@home | [error] Error reported by file upload server: can't open log file '../log_bwsrv2/file_upload_handler.log' (errno: 9)
31/03/2020 13:58:45 | Rosetta@home | Temporarily failed upload of 0jb7gi3t_jhr_design1_COVID-19_SAVE_ALL_OUT_903456_1_0_r1714937121_0: transient upload error



EDIT-
finally managed to get them to upload.
Grant
Darwin NT
ID: 92705 · Rating: 0

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,386,302
RAC: 19,119
Message 92706 - Posted: 31 Mar 2020, 4:44:18 UTC - in response to Message 92600.  
Last modified: 31 Mar 2020, 4:51:08 UTC

I really don't know what's wrong. I use the exact same setting on all my servers. As said, Gen8 servers, even with 64 cores are fully loaded. Gen9 servers only take half...
Any luck with this problem?
Not sure if you answered one of my earlier questions- "Are all of the threads in use on just the one CPU?"
If you run another CPU intensive programme, does it use one of the other unused threads, or does it end up on the threads presently in use?

Win Server 2016: could it be a licencing issue? Licence is expired/no longer valid, so only 1 socket usable, even though both CPUs are detected & recognised by the OS? (Never had to deal with socket/core/thread licencing myself.)
Grant
Darwin NT
ID: 92706 · Rating: 0

yoerik
Joined: 24 Mar 20
Posts: 128
Credit: 169,525
RAC: 0
Message 92707 - Posted: 31 Mar 2020, 4:52:49 UTC - in response to Message 92705.  

Can't upload anything at present; uploads are going into instant timeout.

31/03/2020 13:56:53 | Rosetta@home | Computation for task rb_03_21_19079_18887_ab_t000__robetta_cstwt_5.0_IGNORE_THE_REST_03_07_902861_8_1 finished
31/03/2020 13:56:54 | Rosetta@home | Starting task 1xl3zs2u_Junior_HalfRoid_design2_COVID-19_SAVE_ALL_OUT_904232_1_0
31/03/2020 13:56:55 | Rosetta@home | Started upload of rb_03_21_19079_18887_ab_t000__robetta_cstwt_5.0_IGNORE_THE_REST_03_07_902861_8_1_r402737099_0
31/03/2020 13:56:57 | Rosetta@home | [error] Error reported by file upload server: can't open log file '../log_bwsrv2/file_upload_handler.log' (errno: 9)
31/03/2020 13:56:57 | Rosetta@home | Temporarily failed upload of rb_03_21_19079_18887_ab_t000__robetta_cstwt_5.0_IGNORE_THE_REST_03_07_902861_8_1_r402737099_0: transient upload error
31/03/2020 13:56:57 | Rosetta@home | Backing off 00:03:13 on upload of rb_03_21_19079_18887_ab_t000__robetta_cstwt_5.0_IGNORE_THE_REST_03_07_902861_8_1_r402737099_0
31/03/2020 13:58:40 | Rosetta@home | Computation for task 0jb7gi3t_jhr_design1_COVID-19_SAVE_ALL_OUT_903456_1_0 finished
31/03/2020 13:58:41 | Rosetta@home | Starting task rb_03_29_19780_19680_ab_t000__robetta_IGNORE_THE_REST_05_08_904234_12_0
31/03/2020 13:58:43 | Rosetta@home | Started upload of 0jb7gi3t_jhr_design1_COVID-19_SAVE_ALL_OUT_903456_1_0_r1714937121_0
31/03/2020 13:58:45 | Rosetta@home | [error] Error reported by file upload server: can't open log file '../log_bwsrv2/file_upload_handler.log' (errno: 9)
31/03/2020 13:58:45 | Rosetta@home | Temporarily failed upload of 0jb7gi3t_jhr_design1_COVID-19_SAVE_ALL_OUT_903456_1_0_r1714937121_0: transient upload error


My PC is having trouble transferring files as well. The server on their end is probably overwhelmed by the number of WUs out in the wild that are trying to upload.
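
For anyone who wants to check how often this is happening on their own machine, here is a minimal Python sketch that counts upload events in event log text copied from the BOINC Manager. It assumes the "timestamp | project | message" layout and the message prefixes seen in the excerpt quoted above; the function name is hypothetical and nothing here is part of BOINC itself.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt: "DD/MM/YYYY HH:MM:SS | project | message"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| ([^|]+?) \| (.*)$"
)

# Message prefixes as they appear in the quoted log (assumed to be stable).
PREFIXES = {
    "Started upload of ": "started",
    "Temporarily failed upload of ": "failed",
    "Backing off ": "backoff",
    "[error] ": "errors",
}


def summarize_uploads(lines):
    """Count upload-related events in an iterable of event log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not in the assumed layout
        message = match.group(3)
        for prefix, key in PREFIXES.items():
            if message.startswith(prefix):
                counts[key] += 1
                break
    return counts


# Three lines copied from the log quoted above.
sample = [
    "31/03/2020 13:58:43 | Rosetta@home | Started upload of 0jb7gi3t_jhr_design1_COVID-19_SAVE_ALL_OUT_903456_1_0_r1714937121_0",
    "31/03/2020 13:58:45 | Rosetta@home | [error] Error reported by file upload server: can't open log file '../log_bwsrv2/file_upload_handler.log' (errno: 9)",
    "31/03/2020 13:58:45 | Rosetta@home | Temporarily failed upload of 0jb7gi3t_jhr_design1_COVID-19_SAVE_ALL_OUT_903456_1_0_r1714937121_0: transient upload error",
]

print(dict(summarize_uploads(sample)))
```

If the failed count keeps pace with the started count, the problem is most likely on the server side rather than with your connection.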
ID: 92707 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org