Problems and Technical Issues with Rosetta@home

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,787,940
RAC: 5,329
Message 105310 - Posted: 2 Mar 2022, 11:20:54 UTC - in response to Message 105304.  

You don't because they don't answer....We are beginning to feel like leftovers after they got their neural network up and running.
That is when everyone disappeared or shortly after that.


They disappeared before the new VM app.
The project's communications have been gone for YEARS.
ID: 105310
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105311 - Posted: 2 Mar 2022, 20:03:39 UTC - in response to Message 105310.  

You don't because they don't answer....We are beginning to feel like leftovers after they got their neural network up and running.
That is when everyone disappeared or shortly after that.


They disappeared before the new VM app.
The project's communications have been gone for YEARS.



I'm talking neural network start up days.
ID: 105311
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,787,940
RAC: 5,329
Message 105315 - Posted: 3 Mar 2022, 11:43:43 UTC - in response to Message 105311.  

I'm talking neural network start up days.


I know. My point was that it's not only the "neural network" app (and science) they don't answer about; they haven't answered anything since... I don't remember when.
ID: 105315
Ben W
Joined: 2 Mar 22
Posts: 2
Credit: 217
RAC: 0
Message 105316 - Posted: 3 Mar 2022, 17:34:54 UTC

No work units have been sent to my machines. I joined a few days ago. Why?
ID: 105316
kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105317 - Posted: 3 Mar 2022, 17:42:50 UTC - in response to Message 105316.  

There are only work units for the VirtualBox app:
https://boinc.bakerlab.org/rosetta/server_status.php
ID: 105317
tullio
Joined: 10 May 20
Posts: 63
Credit: 630,125
RAC: 0
Message 105318 - Posted: 3 Mar 2022, 17:44:15 UTC - in response to Message 105316.  

Did you install VirtualBox? Rosetta python needs it.
Tullio
ID: 105318
Ben W
Joined: 2 Mar 22
Posts: 2
Credit: 217
RAC: 0
Message 105319 - Posted: 3 Mar 2022, 20:44:03 UTC - in response to Message 105318.  

thanks guys :)
ID: 105319
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105320 - Posted: 3 Mar 2022, 22:16:52 UTC - in response to Message 105315.  

I'm talking neural network start up days.


I know. My point was that it's not only the "neural network" app (and science) they don't answer about; they haven't answered anything since... I don't remember when.


From what I can see, 2019 or 2020, depending on who.
It looks like Dr. B stopped posting in 2019 and Admin stopped in 2020.
ID: 105320
lundare
Joined: 12 Apr 21
Posts: 2
Credit: 822,019
RAC: 0
Message 105330 - Posted: 5 Mar 2022, 17:28:38 UTC

Hi there!
I have been experiencing some issues with all Rosetta@home tasks for the last 10 months or so. They ran fine before that.

Issue 1
The Rosetta tasks run fine and quite fast for the first 85-90% and then slow down completely for the remainder of the task.
They slow down to the extent that my computers never complete them: progress stops at 98-99%, but the CPUs are still pegged at 100% load.
I have waited several days for some to complete, but they never do. This behaviour occupies the CPU cores for a small eternity and prevents other tasks from running.

Issue 2
Many Rosetta tasks get stuck waiting for memory, seemingly forever. This happens even though I have 48 GB in both my computers and the memory is hardly used at all according to the task manager. This behaviour seems to prevent all tasks from other projects from getting access to memory or CPU.
So in the end I may only have 2 tasks running on a 12-core/24-thread system. That I don't like.

This also happens from time to time with other projects, but then all of my memory actually is in use. In that case I can understand that these tasks wait for memory to be freed up when other tasks complete, and when the memory-hogging tasks finish, the tasks that waited for memory run.

I have tried resetting, rebooting, reinstalling, and removing and re-adding the project, without luck.
I have now stopped running Rosetta completely because I can hardly complete any tasks.

Thanks in advance for any help.

Have a nice day!
//Mattias

Specs for both computers:
Mac Pro 5,1
Dual Xeon X5680
48 GB RAM
SSD storage
AMD RX570 in both computers plus an extra RX560 in one
Latest BOINC and VirtualBox, and macOS Catalina with all patches.
ID: 105330
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 105331 - Posted: 5 Mar 2022, 21:33:19 UTC - in response to Message 105330.  

Hi there!
I have been experiencing some issues with all Rosetta@home tasks for the last 10 months or so. They ran fine before that.

If you are running the pythons with VirtualBox, it looks like you are experiencing the "0 CPU" and "Vm job unmanageable" problems, both well known.

Search around on this thread, or related ones.
ID: 105331
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105332 - Posted: 5 Mar 2022, 23:33:42 UTC - in response to Message 105330.  

Let the work run out.
Then run a registry and disk cleaner on your system and make sure you can restart "fresh".
Do a "repair" of BOINC just to make sure you have a "clean" working copy.
Do you shut your systems down at night or run 24/7?
Have you checked your VM manager for any dead tasks and removed them?
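
For anyone who prefers the command line to the VirtualBox GUI, the check for leftover VMs could be sketched roughly as below. This is only an illustration: it assumes `VBoxManage` is on the PATH and that `VBoxManage list vms` and `VBoxManage list runningvms` print one `"name" {uuid}` pair per line. The helper names `parse_vm_list` and `registered_but_not_running` are made up for this example, and the script only lists candidates; it does not remove anything.

```python
import re
import subprocess

# Assumed output format of `VBoxManage list vms`: "name" {uuid}
VM_LINE = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]+)\}$')

def parse_vm_list(output):
    """Turn VBoxManage list output into a list of (name, uuid) tuples."""
    vms = []
    for line in output.splitlines():
        match = VM_LINE.match(line.strip())
        if match:
            vms.append((match.group("name"), match.group("uuid")))
    return vms

def vboxmanage_list(kind):
    """Run `VBoxManage list <kind>` (kind is 'vms' or 'runningvms')."""
    result = subprocess.run(
        ["VBoxManage", "list", kind],
        capture_output=True, text=True, check=True,
    )
    return parse_vm_list(result.stdout)

def registered_but_not_running():
    """VMs that are registered but not currently running: candidates to inspect."""
    running = {uuid for _, uuid in vboxmanage_list("runningvms")}
    return [(name, uuid) for name, uuid in vboxmanage_list("vms")
            if uuid not in running]

if __name__ == "__main__":
    for name, uuid in registered_but_not_running():
        print(f"not running: {name} {uuid}")
```

A VM that shows up here while BOINC has no matching task in progress is worth a closer look in the VirtualBox manager before deleting it by hand.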

These are things I do with my Windows system just to make sure I am on a "clean" machine. Then I try again, and if that does not work, I go hunting for answers.

You might want to install BoincTasks by eFMer so you can see your CPU usage per task and other good info, like physical and virtual memory usage. If you see a task running at, say, 0.10% CPU with a run time over 12 hours, stalled at 97-99%, and the BoincTasks progress figure (shown to 2 decimal places) moves by only something like 0.05 every two update cycles, then you know it's time to kill the task rather than wait days.
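
The rule of thumb above can be written down as a small check. This is a sketch only: the thresholds are the ones quoted in this post, while the `TaskSample` fields and the `looks_stalled` function are hypothetical names for illustration and are not part of BoincTasks or BOINC. You would fill in the numbers by reading them off the task list yourself.

```python
from dataclasses import dataclass

@dataclass
class TaskSample:
    cpu_percent: float       # CPU usage of the task, in percent
    run_time_hours: float    # elapsed run time so far
    progress_percent: float  # progress shown for the task
    progress_delta: float    # progress gained over the last two update cycles

def looks_stalled(sample, max_cpu=0.10, min_hours=12.0,
                  min_progress=97.0, max_delta=0.05):
    """Apply the thresholds from the post; all four conditions must hold."""
    return (sample.cpu_percent <= max_cpu
            and sample.run_time_hours > min_hours
            and sample.progress_percent >= min_progress
            and sample.progress_delta <= max_delta)

# A task idling at 98% after 13 hours matches the pattern.
print(looks_stalled(TaskSample(0.10, 13.0, 98.20, 0.05)))
# A task still using a full core does not.
print(looks_stalled(TaskSample(95.0, 13.0, 98.20, 0.05)))
```

The thresholds are keyword arguments so they can be loosened, for example for the case reported earlier in the thread where the CPU stays pegged while progress stops.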

If you still get stalls after the cleaning, then yeah, you have to look through here or try a Google search that might link to a message. We discussed this sometime last year, so it could be up to 10 pages back.
ID: 105332
kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105333 - Posted: 5 Mar 2022, 23:39:42 UTC - in response to Message 105332.  

https://efmer.com/boinctasks/download-boinctasks/
ID: 105333
lundare
Joined: 12 Apr 21
Posts: 2
Credit: 822,019
RAC: 0
Message 105339 - Posted: 6 Mar 2022, 9:45:26 UTC - in response to Message 105332.  

A clean install doesn't seem to make any difference: one of the computers has a freshly installed macOS and software, and the other install is about a year old.
The result is the same.

The computers are restarted once a week.

I will have to look into BoincTasks by eFMer.
ID: 105339
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105342 - Posted: 6 Mar 2022, 17:16:39 UTC - in response to Message 105339.  
Last modified: 6 Mar 2022, 17:31:05 UTC

A clean install doesn't seem to make any difference: one of the computers has a freshly installed macOS and software, and the other install is about a year old.
The result is the same.

The computers are restarted once a week.

I will have to look into BoincTasks by eFMer.



BoincTasks will not change the way the projects run, but it will help you identify stalled tasks and let you see how the others are doing.

Have you checked your VM manager for dead, unreachable tasks?

Try running one batch of pythons and, for the ones that get stuck, post a copy of the stderr file here. Maybe someone will see something in the text that points to the problem.
ID: 105342
johndad5
Joined: 12 Aug 09
Posts: 7
Credit: 2,729,604
RAC: 0
Message 105358 - Posted: 8 Mar 2022, 5:52:04 UTC

Hello,

I am not getting any jobs at all. I checked my tasks and it says that 4 were not started on time. I looked further into one and it said the job was canceled by the user. I never canceled any tasks.



Task 1475804655
Computer 6163421
Sent 3 Mar 2022, 9:36:09 UTC
Time reported or deadline 6 Mar 2022, 9:36:22 UTC
Status Not started by deadline - canceled
Run time (sec) 0.00
CPU time (sec) 0.00
Credit ---
Application rosetta python projects v1.03 (vbox64) windows_x86_64

Task 1476130321
Computer 3396392
Sent 6 Mar 2022, 12:33:33 UTC
Time reported or deadline 7 Mar 2022, 11:24:39 UTC
Status Error while computing
Run time (sec) 7,812.61
CPU time (sec) 1,325.91
Credit 12.00
Application rosetta python projects v1.03 (vbox64) windows_x86_64

Task 1475804655
Name aagb-ABU_pp-mNMPHE-GPN-ACHC12C_7_2575989_3_0
Workunit 1314564584
Created 2 Mar 2022, 23:58:43 UTC
Sent 3 Mar 2022, 9:36:09 UTC
Report deadline 6 Mar 2022, 9:36:09 UTC
Received 6 Mar 2022, 9:36:22 UTC
Server state Over
Outcome Computation error
Client state Aborted by user
Exit status 200 (0x000000C8) EXIT_UNSTARTED_LATE
Computer ID 6163421
Run time
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 3.53 GFLOPS
Application version rosetta python projects v1.03 (vbox64) windows_x86_64
ID: 105358
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,408,362
RAC: 20,061
Message 105359 - Posted: 8 Mar 2022, 6:21:02 UTC - in response to Message 105358.  

I am not getting any jobs at all. I checked my tasks and it says that 4 were not started on time. I looked further into one and it said the job was canceled by the user. I never canceled any tasks.
With your computers hidden it's pretty much impossible to help without just taking wild guesses.

But the fact is, from the task you posted, you are missing deadlines, and for every error you get, the amount of work you can get per day is reduced until you start to return valid work.
Set your cache to 0.01 days & 0.01 additional days; then, when you get some more work, you should be able to return it before the deadline passes.
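
The cache values are normally set in BOINC Manager's computing preferences or in the project's web preferences. For a machine managed without the GUI, a local override file is another option. The sketch below only builds the XML text; it assumes the override file is called `global_prefs_override.xml` in the BOINC data directory and that the two settings are stored as `work_buf_min_days` and `work_buf_additional_days`, so check both against your client version before relying on it. The function name `build_cache_override` is made up for this example.

```python
import xml.etree.ElementTree as ET

def build_cache_override(min_days=0.01, additional_days=0.01):
    """Return override XML text with the two work-buffer (cache) settings."""
    root = ET.Element("global_preferences")
    # Assumed tag names for "store at least N days" / "up to N additional days".
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:.2f}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:.2f}"
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Print the XML; saving it into the BOINC data directory is left to the user.
    print(build_cache_override())
```

After saving the file, the client has to re-read its preferences (or be restarted) before the smaller cache takes effect.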

Also on your computer's details page down near the bottom there should be a Skip or Accept button for Python work. If it says Accept, you need to click it to get more Python work. Too many errors, and you get blocked from getting more.
Grant
Darwin NT
ID: 105359
Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,534,988
RAC: 10,560
Message 105362 - Posted: 8 Mar 2022, 12:41:24 UTC - in response to Message 105358.  

Hello,
I am not getting any jobs at all. I checked my tasks and it says that 4 were not started on time.
I looked further into one and it said the job was canceled by the user. I never canceled any tasks.

It actually gives 3 reasons for failing, 2 of them contradictory.

1 - Computation error, but no CPU time
2 - Aborted by user, but not aborted
3 - Much more likely (editing your quote for clarity)
1475804655 3 Mar 2022, 9:36:09 UTC 6 Mar 2022, 9:36:22 UTC Not started by deadline - canceled

Task 1475804655
Sent 3 Mar 2022, 9:36:09 UTC
Report deadline 6 Mar 2022, 9:36:09 UTC
Received 6 Mar 2022, 9:36:22 UTC
Exit status 200 (0x000000C8) EXIT_UNSTARTED_LATE

It looks like a task the Server thinks it sent but you never received. Maybe some blip during download.
Nothing you'd know about, nor can you do anything about even if you did know.
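
The timestamps quoted above can be checked with a little date arithmetic. This is just a worked example using the three values from the task page; the variable names are chosen for illustration, and all times are UTC.

```python
from datetime import datetime

FMT = "%d %b %Y, %H:%M:%S"  # matches e.g. "3 Mar 2022, 9:36:09"

def parse(timestamp):
    """Parse a timestamp as shown on the task page (UTC suffix removed)."""
    return datetime.strptime(timestamp, FMT)

sent = parse("3 Mar 2022, 9:36:09")
deadline = parse("6 Mar 2022, 9:36:09")
received = parse("6 Mar 2022, 9:36:22")

window = deadline - sent       # time the host had to start and return the task
overrun = received - deadline  # how long after the deadline the server closed it

print(f"Time allowed: {window.days} days")
print(f"Closed after deadline by: {overrun.total_seconds():.0f} s")
```

So the host had exactly 3 days, and the task was marked as over 13 seconds after the deadline with zero run time, which fits the "not started by deadline" status rather than a real abort by the user.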

If you're not getting tasks, the last of Grant's suggestions looks like the likely solution.
ID: 105362
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105365 - Posted: 8 Mar 2022, 20:05:03 UTC
Last modified: 8 Mar 2022, 20:06:47 UTC

You had 3 days to complete that task.
For some reason (the system is too busy with other projects, or a glitch) you missed the deadline, so the server pulled the task from your system.

From another project:

Outcome Computation error
Client state Aborted by user
Exit status 200 (0x000000C8) EXIT_UNSTARTED_LATE


Your BOINC client thought it was too late to start the task. I guess the system clock glitched on your PC.

As for the "Error while computing" task, we would need to see the stderr output, which you can retrieve from the bottom of that task's web page.

Could be a bug in that task, or a problem with how the task interacts with your system, or who knows what...

I would go with Grant's suggestion of 0.01 days and 0.01 additional days of work, or at most 0.25 and 0.25. See how things work with those settings.

You will either need to make your computers public or post the errors and the stderr text so we can see what's going on.
ID: 105365
johndad5
Joined: 12 Aug 09
Posts: 7
Credit: 2,729,604
RAC: 0
Message 105371 - Posted: 9 Mar 2022, 6:03:03 UTC - in response to Message 105362.  
Last modified: 9 Mar 2022, 6:06:53 UTC

Thanks for your reply.

I unhid my computer. Sorry for the inconvenience.
ID: 105371
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,408,362
RAC: 20,061
Message 105372 - Posted: 9 Mar 2022, 6:39:25 UTC - in response to Message 105371.  

Thanks for your reply.

I unhid my computer. Sorry for the inconvenience.
From the looks of things it's pretty rare for any of your systems to complete Rosetta work in time. Most of it would be missing the deadline.

If you are doing only one project, that has long deadlines, and has frequent server issues or shortages of work then you need a cache. If you're running more than one project, then there's no need for a cache.
0.01 & 0.01 additional days would be best, 0.25 and 0.01 additional days if you really feel the need for some sort of cache.


It would be worth checking on the Details page for each of your systems to see if down the bottom there is an Allow or Skip button for Python tasks. If it says Allow, you need to click it to start getting work again- after setting your cache to something more reasonable.
Grant
Darwin NT
ID: 105372


©2024 University of Washington
https://www.bakerlab.org