Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home



.clair.

Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 105041 - Posted: 20 Feb 2022, 1:25:13 UTC

I have given up on manually aborting tasks; it takes longer than just leaving them to it.
And anyway, it doesn't like me any more...
20/02/2022 00:33:22 | Rosetta@home | This computer has finished a daily quota of 37 tasks
and only a little while later
20/02/2022 01:13:43 | Rosetta@home | This computer has finished a daily quota of 1 tasks
So I have left it alone today.
Never mind, I will save on electricity.
Sid Celery

Joined: 11 Feb 08
Posts: 2141
Credit: 41,534,176
RAC: 10,708
Message 105044 - Posted: 20 Feb 2022, 3:51:47 UTC - in response to Message 104985.  

I haven't got my head around what the specific issue is with the rb tasks crashing out, but I do see people reporting problems with those too.
If someone describes the rb issue to me in a way I can pass on, I'll follow up on that too.
Is it much the same as with the movingstub tasks, or are they crashing on Ubuntu as well?

I've finally found some rb tasks on this PC and, while they haven't finished, they're a few hours in without crashing yet.
The problem ones you guys are posting about are rb_02_14 or rb_02_16, and the ones I have are rb_02_18.
If the project was shut down for a while and came back up shortly after, could it be that a fix went in? I notice a half-million drop in the queued jobs on the front page.

Or I'm speaking too soon. I'll see in 5 more hours.

I didn't get a chance to look at them, but the rb_02_18 tasks <seemed> to complete successfully.
Which would be great, except I haven't seen any further rb tasks to run since <sigh>

It looks like these movingstub tasks are going to be allowed to error out, seeing as they take up so little user runtime. It's a project problem more than ours.
Without having had any reply, it may be that they've taken the view that Linux systems will continue to run them successfully, Windows systems will error out, and they get what they get. That's happened before.
In the meantime, the queue that started off at 4.4m (incl. 2.2m Pythons) is now down to 2.8m (incl. 2.2m Pythons), so we're ~75% through them and it won't be too long before they're wiped out anyway.
I know that's not satisfactory, but it may be how this episode turns out.
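For anyone checking the "~75%" figure: taking the queue numbers above at face value, and assuming the 2.2m Python tasks stayed constant between the two readings, only the non-Python part of the queue has drained. A quick Python sketch of that arithmetic (the variable names are illustrative only):

```python
# Queue sizes quoted above, as task counts.
start_total = 4_400_000
now_total = 2_800_000
pythons = 2_200_000  # assumed unchanged between the two readings

# Everything in the queue that is not a Python task.
non_python_start = start_total - pythons  # 2.2m
non_python_now = now_total - pythons      # 0.6m

# Share of the non-Python tasks already sent out.
fraction_done = (non_python_start - non_python_now) / non_python_start
print(f"{fraction_done:.0%} of the non-Python queue processed")  # prints "73% of the non-Python queue processed"
```

That is 1.6m of 2.2m, about 73%, which is close enough to the three-quarters quoted.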
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2002
Credit: 9,785,717
RAC: 5,211
Message 105048 - Posted: 20 Feb 2022, 9:56:06 UTC - in response to Message 105022.  

Warning! RAH is sending out movingstub tasks in bulk now. I just aborted over 130 of them in the last 5 minutes.
Now setting RAH to no new tasks.
I have only 1 Python task running.

UNREAL!


Despite the tweet from the Rosetta@home account, I continue to receive "movingstub" WUs (and they continue to error out).
Kissagogo27

Joined: 31 Mar 20
Posts: 86
Credit: 2,977,216
RAC: 1,905
Message 105049 - Posted: 20 Feb 2022, 10:16:25 UTC

It seems that the BOINC server side has decided on its own to use the 32-bit app with my 64-bit W7 OS because of the huge number of bad tasks. An automatic process to eradicate a bad app?

But 4.20 64-bit and 4.21 32-bit are not bad applications...
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105054 - Posted: 20 Feb 2022, 13:38:38 UTC - in response to Message 105048.  
Last modified: 20 Feb 2022, 13:39:42 UTC

Despite the tweet from the Rosetta@home account, I continue to receive "movingstub" WUs (and they continue to error out).



First, because it's the weekend, and second, as Sid pointed out in an earlier post, it is possible that since the Linux ones run just fine, they will just let the Windows ones go through and error out. After 2 or 3 errors, it's a dead task.
Either they don't care, or they don't have the time or the understanding to purge these tasks.
Sid Celery

Joined: 11 Feb 08
Posts: 2141
Credit: 41,534,176
RAC: 10,708
Message 105056 - Posted: 20 Feb 2022, 15:29:04 UTC - in response to Message 105044.  
Last modified: 20 Feb 2022, 15:31:25 UTC

The problem ones you guys are posting about are rb_02_14 or rb_02_16, and the ones I have are rb_02_18.
If the project was shut down for a while and came back up shortly after, could it be that a fix went in? I notice a half-million drop in the queued jobs on the front page.

Or I'm speaking too soon. I'll see in 5 more hours.

I didn't get a chance to look at them, but the rb_02_18 tasks <seemed> to complete successfully.
Which would be great, except I haven't seen any further rb tasks to run since <sigh>

It seems I forgot which PC they ran on. It was the PC I had just left, and the rb_02_18 tasks <did> run successfully.
Still haven't received any others.

It looks like these movingstub tasks are going to be allowed to error out, seeing as they take up so little user runtime. It's a project problem more than ours.
Without having had any reply, it may be that they've taken the view that Linux systems will continue to run them successfully, Windows systems will error out, and they get what they get. That's happened before.
In the meantime, the queue that started off at 4.4m (incl. 2.2m Pythons) is now down to 2.8m (incl. 2.2m Pythons), so we're ~75% through them and it won't be too long before they're wiped out anyway.

Now down to 2.2m with none unsent, so it looks like those left are all Python tasks. No more to come down, just 300k duff tasks to be returned (except by hosts running Linux).
Until next time...
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105057 - Posted: 20 Feb 2022, 16:23:16 UTC - in response to Message 105056.  
Last modified: 20 Feb 2022, 16:24:47 UTC


I killed the remaining 33 I had on my system. Good riddance!
Now the other projects need to get some work done, so maybe later I'll get Python back.

Oh... now I'm insulted. The damn system knocked me off Python for some reason (too many aborts of 4.2?), so I had to connect again. Can't they do anything right?
Sid Celery

Joined: 11 Feb 08
Posts: 2141
Credit: 41,534,176
RAC: 10,708
Message 105058 - Posted: 20 Feb 2022, 16:26:28 UTC - in response to Message 105054.  

Despite the tweet from the Rosetta@home account, I continue to receive "movingstub" WUs (and they continue to error out).

First, because it's the weekend, and second, as Sid pointed out in an earlier post, it is possible that since the Linux ones run just fine, they will just let the Windows ones go through and error out. After 2 or 3 errors, it's a dead task.
Either they don't care, or they don't have the time or the understanding to purge these tasks.

It doesn't seem to be a case of that, IMO.
It's not that there's anything wrong with the tasks, just that they won't run on Windows machines but will run on Linux.

So how can they select "bad" ones to delete? They can't.

The selection of "bad" tasks is very efficient.
Allow Windows machines to download them and they're all rejected after 20 seconds.
Linux machines run them successfully to completion.
A perfect solution. All the machines that can run them do. All the ones that can't, don't.

The price paid is in the bandwidth of Windows users, that's true, but nothing more.
I don't personally consider people's frustration to be worth a second thought.
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105059 - Posted: 20 Feb 2022, 16:27:22 UTC
Last modified: 20 Feb 2022, 16:27:59 UTC

| Rosetta@home | Started download of AIMNet_minimization_python_project.py
AIMNet - Atoms In Molecules Neural Network Potential

This repository contains a reference AIMNet implementation along with some examples and benchmarks

Accurate and transferable multitask prediction of chemical properties with an atoms-in-molecules neural network

Anyone got details on this?
I'm just having a quick look, not digging around yet.
kotenok2000

Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105060 - Posted: 20 Feb 2022, 17:15:30 UTC - in response to Message 105059.  

Do you have problems with download speed?
My download speed is 595 KB/s.
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1725
Credit: 18,399,907
RAC: 19,807
Message 105063 - Posted: 20 Feb 2022, 18:18:07 UTC

Well, that's the end of that batch, 50% or more of which errored out.
What a waste.
Grant
Darwin NT
entity

Joined: 8 May 18
Posts: 19
Credit: 6,123,514
RAC: 4,804
Message 105064 - Posted: 20 Feb 2022, 19:12:36 UTC

Are the VBox tasks limited in how many can run concurrently? I can only get 17 to run at the same time. All the others are in "Waiting to run" status in BOINC. There is no app config file in the project directory.
kotenok2000

Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105065 - Posted: 20 Feb 2022, 19:18:14 UTC - in response to Message 105064.  
Last modified: 20 Feb 2022, 19:20:34 UTC

Can you try changing the "use at most ... memory" setting in Computing preferences > Disk and memory?
You have hidden your computers, so I can't see them.
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2002
Credit: 9,785,717
RAC: 5,211
Message 105066 - Posted: 20 Feb 2022, 19:21:20 UTC - in response to Message 105058.  

The selection of "bad" tasks is very efficient.

VERY efficient. It's up to the volunteers to kill the bad tasks. VERY efficient.


The price paid is in the bandwidth of Windows users, that's true, but nothing more.
I don't personally consider people's frustration to be worth a second thought.

Yep, 3 days of wasted time and bandwidth WITHOUT ANY APOLOGY from the project, and it's "nothing more".
Thank you for your consideration of volunteers.
Are you speaking on behalf of the project administrators? If so, that's very serious.
entity

Joined: 8 May 18
Posts: 19
Credit: 6,123,514
RAC: 4,804
Message 105067 - Posted: 20 Feb 2022, 19:38:20 UTC - in response to Message 105065.  
Last modified: 20 Feb 2022, 19:41:21 UTC

Can you try changing the "use at most ... memory" setting in Computing preferences > Disk and memory?
You have hidden your computers, so I can't see them.

The "use at most" setting is 99% for memory and 100% for the CPUs. The server has 128 threads and 256 GB of memory, yet only 17 tasks are running. There is no message in the log indicating that BOINC is waiting for any resource. BOINC has copied the VDI file to 88 slots, resulting in about 697 GB of used disk space on a 900 GB disk. BOINC is told to leave 1% free, which is the most restrictive disk parameter.
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105069 - Posted: 20 Feb 2022, 19:48:43 UTC - in response to Message 105067.  

What else is running on your system? How much memory do other programs and the OS require? In theory, with nothing else running, you should be able to run around 100 Pythons, but maybe VBox can't handle that? 88 slots, so you have 88 tasks downloaded but can run only 17?


I'll leave this for the experts to work on... I think, but cannot say for sure, that VBox might be the limiter.
robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 105070 - Posted: 20 Feb 2022, 19:49:23 UTC - in response to Message 105064.  

Are the VBox tasks limited in how many can run concurrently? I can only get 17 to run at the same time. All the others are in "Waiting to run" status in BOINC. There is no app config file in the project directory.

Each of them reserves a large amount of memory, almost 8 GB. Unless that much memory is free, they won't start, even though they would drop to using much less memory once they did start.
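To put numbers on this, here is a rough Python sketch of the memory budget. It uses a deliberately simplified, hypothetical model (not BOINC's actual scheduler code): the budget is total RAM times the "use at most" fraction, memory used by other programs is ignored, and a task starts only if its whole reservation fits. The function name is made up for illustration.

```python
def tasks_that_fit(total_ram_gb: float, use_at_most: float, per_task_gb: float) -> int:
    """Estimate how many tasks can start if each needs its full reservation up front.

    Hypothetical, simplified model: budget = total RAM x "use at most" fraction;
    memory used by the OS and other programs is ignored.
    """
    budget_gb = total_ram_gb * use_at_most
    return int(budget_gb // per_task_gb)  # count whole tasks only

# entity's server: 256 GB of RAM, "use at most" set to 99%
print(tasks_that_fit(256, 0.99, 8))  # ~8 GB per task, as described above; prints 31
print(tasks_that_fit(256, 0.99, 6))  # 6 GB (6144 MB) per task; prints 42
```

Even on this crude model the reservations would allow roughly 31 to 42 tasks, so memory reservation alone would not obviously cap things at 17.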
kotenok2000

Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105071 - Posted: 20 Feb 2022, 19:51:05 UTC - in response to Message 105070.  

My two tasks allocate 6144 MB.
entity

Joined: 8 May 18
Posts: 19
Credit: 6,123,514
RAC: 4,804
Message 105072 - Posted: 20 Feb 2022, 20:11:59 UTC - in response to Message 105070.  

Are the VBox tasks limited in how many can run concurrently? I can only get 17 to run at the same time. All the others are in "Waiting to run" status in BOINC. There is no app config file in the project directory.

Each of them reserves a large amount of memory, almost 8 GB. Unless that much memory is free, they won't start, even though they would drop to using much less memory once they did start.

There is 202 GB of free memory as of this writing. The BOINC client was not behaving correctly, so I restarted it. It took almost 15 minutes for the client to restart the 17 VBoxHeadless processes. During the restart the client runs at 100% busy and BOINC Manager is totally unresponsive. However, once you let the processes complete their startup and the boinc process drops back under 5% utilization, you can start more. Starting 10 processes makes the boinc process jump back to 100% busy for about 10 minutes; once the client drops back to 5%, the tasks show as running. It seems to be related to the interaction between BOINC and VBox. I/O is negligible while the tasks are starting. I think I can babysit this thing and get it to where I want it.

Thanks for the insights.
Sid Celery

Joined: 11 Feb 08
Posts: 2141
Credit: 41,534,176
RAC: 10,708
Message 105074 - Posted: 20 Feb 2022, 20:37:00 UTC - in response to Message 105066.  

The selection of "bad" tasks is very efficient.

VERY efficient. It's up to the volunteers to kill the bad tasks. VERY efficient.

I'm a bit confused. All contributors offer their time and resources to the project.
So if it's up to the users (millions) to sort runnable from non-runnable tasks so the researchers (a handful) don't have to, that conforms perfectly.

The price paid is in the bandwidth of Windows users, that's true, but nothing more.
I don't personally consider people's frustration to be worth a second thought.

Yep, 3 days of wasted time and bandwidth WITHOUT ANY APOLOGY from the project, and it's "nothing more".
Thank you for your consideration of volunteers.
Are you speaking on behalf of the project administrators? If so, that's very serious.


Lol, if I were, I'd have been thrown out years ago.
It's pure pragmatism. If it wasn't for bad news, there wouldn't be any news at all.
Sorry, and thanks for your time and effort.
Bygones.




©2024 University of Washington
https://www.bakerlab.org