Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,785,717 RAC: 5,211 |
"You don't because they don't answer... We are beginning to feel like leftovers after they got their neural network up and running." They disappeared before the new VM app. The project's communication has been gone for YEARS. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"You don't because they don't answer... We are beginning to feel like leftovers after they got their neural network up and running." I'm talking about the neural network start-up days. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,785,717 RAC: 5,211 |
"I'm talking about the neural network start-up days." I know. I said that they don't answer, not only about the "neural network" app (and science), but about anything, since... I don't remember when. |
Ben W Joined: 2 Mar 22 Posts: 2 Credit: 217 RAC: 0 |
No work units are being sent to my machines. I joined a few days ago. Why? |
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
There are only work units for the VirtualBox app: https://boinc.bakerlab.org/rosetta/server_status.php |
tullio Joined: 10 May 20 Posts: 63 Credit: 630,125 RAC: 0 |
Did you install VirtualBox? Rosetta python needs it. Tullio |
Ben W Joined: 2 Mar 22 Posts: 2 Credit: 217 RAC: 0 |
Thanks, guys :) |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"I'm talking about the neural network start-up days." 2019 or 2020, depending on who or what you look at. From what I see, Dr. B stopped in 2019 and the admin stopped in 2020. |
lundare Joined: 12 Apr 21 Posts: 2 Credit: 822,019 RAC: 0 |
Hi there! I have been experiencing issues with all Rosetta@home tasks for the last 10 months or so. They ran fine before that. Issue 1: The Rosetta tasks run fine and quite fast for the first 85-90%, and then slow down dramatically for the remainder of the task. They slow down so much that my computers never complete them: progress stops at 98-99%, but the CPUs are still pegged at 100% load. I have waited several days for some of them, but they never complete. This occupies the CPU cores for a small eternity and prevents other tasks from running. Issue 2: Many Rosetta tasks get stuck waiting for memory, seemingly forever, even though both my computers have 48 GB and, according to the task manager, the memory is hardly used at all. This seems to prevent tasks from other projects from getting access to memory or CPU, so in the end I may have only 2 tasks running on a 12-core/24-thread system, which I don't like. This does happen from time to time with other projects as well, but then all of my memory actually is in use, and I can understand that those tasks wait for memory to be freed as other tasks complete; when the memory-hogging tasks finish, the waiting tasks run. I have tried resetting, rebooting, reinstalling, and removing and re-adding the project, without luck. Now I have stopped running Rosetta completely because I can hardly complete any task. Thanks in advance for any help. Have a nice day! //Mattias Specs for both computers: Mac Pro 5,1; dual Xeon X5680; 48 GB RAM; SSD storage; AMD RX570 in both computers, plus an extra RX560 in one; latest BOINC and VirtualBox; macOS Catalina with all patches. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Hi there!" If you are running the pythons with VirtualBox, it looks like you are experiencing the "0 CPU" and "Vm job unmanageable" problems, both well known. Search around in this thread or related ones. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Let the work run out. Then run a registry and disk cleaner on your system so you can restart "fresh", and do a "repair" of BOINC to make sure you have a "clean" working copy. Do you shut your systems down at night or run 24/7? Have you checked your VM manager for any dead tasks and removed them? These are things I do on my Windows system to make sure I'm on a "clean" machine; then I try again, and if that doesn't work, I go hunting for answers. You might want to install BoincTasks by eFMer so you can see CPU usage per task and other useful info, like physical and virtual memory usage. So if you see a task using, say, 0.10% CPU, its run time is over 12 hours, it's stalled at 97-99%, and the BoincTasks progress (shown to 2 decimal places) only moves by something like 0.05 every two update cycles, then you know it's time to kill the task rather than wait days. If you still get stalls after the cleanup, then yes, you'll have to look through this thread or try a Google search that might link to a message. We discussed this sometime last year, which would be up to 10 pages back. |
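A minimal Python sketch of the rule of thumb above, for anyone who wants to apply it consistently. It assumes you have already read each task's CPU usage, run time, and progress from a monitor such as BoincTasks at two moments about two update cycles apart; the `TaskSample` type, its field names, and `looks_stalled` are hypothetical (not part of BOINC or BoincTasks), and the thresholds are simply the example values from the post.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """One reading of a task's stats (hypothetical structure, e.g. copied from a task monitor)."""
    name: str
    cpu_percent: float       # CPU usage of the task, in percent
    run_time_hours: float    # elapsed run time so far
    progress_percent: float  # progress ("% done"), 0-100


def looks_stalled(earlier: TaskSample, later: TaskSample,
                  max_cpu_percent: float = 0.10,
                  min_run_time_hours: float = 12.0,
                  min_progress_percent: float = 97.0,
                  max_progress_step: float = 0.05) -> bool:
    """Return True if the later sample matches the post's 'time to kill it' rule of thumb.

    `earlier` and `later` are assumed to be taken about two update cycles apart.
    """
    progress_step = later.progress_percent - earlier.progress_percent
    return (later.cpu_percent <= max_cpu_percent                  # near-idle CPU
            and later.run_time_hours > min_run_time_hours         # running for a long time
            and later.progress_percent >= min_progress_percent    # stuck near the end
            and progress_step <= max_progress_step)               # barely moving


if __name__ == "__main__":
    before = TaskSample("python_task", 0.10, 12.5, 98.10)
    after = TaskSample("python_task", 0.08, 12.6, 98.12)
    print(looks_stalled(before, after))  # True
```

Requiring all four conditions together keeps the check conservative, so a slow but still-progressing task is not flagged.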
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
https://efmer.com/boinctasks/download-boinctasks/ |
lundare Joined: 12 Apr 21 Posts: 2 Credit: 822,019 RAC: 0 |
A clean install or not doesn't seem to make any difference: one of the computers has a freshly installed Mac OS and software, and the other install is about a year old. The result is the same. The computers are restarted once a week. I will have to look into BoincTasks by eFMer. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"A clean install or not doesn't seem to make any difference: one of the computers has a freshly installed Mac OS and software, and the other install is about a year old." BoincTasks will not change the way the projects run, but it will help you identify stalled tasks and see how the others are doing. Did you check your VM manager for dead, unreachable tasks? Try running one batch of pythons, and for the ones that get stuck, post a copy of the stderr output here; maybe someone will see something in the text that points to the problem. |
johndad5 Joined: 12 Aug 09 Posts: 7 Credit: 2,729,604 RAC: 0 |
Hello, I am not getting any jobs at all. I checked my tasks, and it says that 4 were not started on time. I looked further into one, and it said the job was canceled by the user. I never canceled any tasks. Task list: Task 1475804655 (computer 6163421): sent 3 Mar 2022, 9:36:09 UTC; time reported or deadline 6 Mar 2022, 9:36:22 UTC; status: Not started by deadline - canceled; run time 0.00 s; CPU time 0.00 s; credit ---; application: rosetta python projects v1.03 (vbox64) windows_x86_64. Task 1476130321 (computer 3396392): sent 6 Mar 2022, 12:33:33 UTC; time reported or deadline 7 Mar 2022, 11:24:39 UTC; status: Error while computing; run time 7,812.61 s; CPU time 1,325.91 s; credit 12.00; application: rosetta python projects v1.03 (vbox64) windows_x86_64. Details for task 1475804655: Name: aagb-ABU_pp-mNMPHE-GPN-ACHC12C_7_2575989_3_0; Workunit: 1314564584; Created: 2 Mar 2022, 23:58:43 UTC; Sent: 3 Mar 2022, 9:36:09 UTC; Report deadline: 6 Mar 2022, 9:36:09 UTC; Received: 6 Mar 2022, 9:36:22 UTC; Server state: Over; Outcome: Computation error; Client state: Aborted by user; Exit status: 200 (0x000000C8) EXIT_UNSTARTED_LATE; Computer ID: 6163421; Run time: (blank); CPU time: (blank); Validate state: Invalid; Credit: 0.00; Device peak FLOPS: 3.53 GFLOPS; Application version: rosetta python projects v1.03 (vbox64) windows_x86_64. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,399,907 RAC: 19,807 |
"I am not getting any jobs at all. I checked my tasks and it says that 4 were not started on time. I looked further into one and it said the job was canceled by the user. I never canceled any tasks." With your computers hidden, it's pretty much impossible to help without taking wild guesses. But from the task you posted, you are clearly missing deadlines, and every error you get reduces the amount of work you can get per day until you start returning valid work. Set your cache to 0.01 days & 0.01 additional days; then, when you get more work, you should be able to return it before the deadline passes. Also, on your computer's details page, near the bottom, there should be a Skip or Accept button for Python work. If it says Accept, you need to click it to get more Python work. Too many errors, and you get blocked from getting more. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,534,176 RAC: 10,708 |
"Hello," It actually gives 3 reasons for failing, 2 of them contradictory: 1. Computation error, but no CPU time; 2. Aborted by user, but not aborted; 3. Much more likely (editing your quote for clarity): "1475804655 3 Mar 2022, 9:36:09 UTC 6 Mar 2022, 9:36:22 UTC Not started by deadline - canceled". It looks like a task the server thinks it sent but you never received, maybe because of some blip during download. That's nothing you'd know about, nor could you do anything about it even if you did know. If you're not getting tasks, the last of Grant's suggestions looks like the likely solution. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
You had 3 days to complete that task. For some reason (the system was too busy with other projects, or a glitch) you missed the deadline, so the server pulled the task from your system. From another project: "Outcome: Computation error; Client state: Aborted by user; Exit status: 200 (0x000000C8) EXIT_UNSTARTED_LATE. Your BOINC client thought it was too late to start the task. I guess the system clock glitched on your PC." As for the "Error while computing" task, we would need to see the STDERR output, which you can retrieve at the bottom of that task's web page. It could be a bug in that task, a problem with how the task interacts with your system, or who knows what... I would go with Grant's suggestion of 0.01 days and 0.01 days of additional work, or at most 0.25 and 0.25. See how things work with those settings. You will need to either make your computers public or post the errors and the STDERR text so we can see what's going on. |
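A rough Python sketch of the deadline arithmetic discussed here, to show why a big cache leads to missed deadlines on short-deadline work. It uses a simplified model that is an assumption, not BOINC's actual scheduler: a newly fetched task may wait in the queue for up to the full cache ("at least" days plus "additional" days) before it starts. The function names and the 8-hour run time are hypothetical example values.

```python
from datetime import datetime, timedelta, timezone


def report_deadline(sent: datetime, window_days: float) -> datetime:
    """Deadline = time sent + the project's deadline window."""
    return sent + timedelta(days=window_days)


def misses_deadline(cache_days: float, additional_days: float,
                    runtime_hours: float, window_days: float = 3.0) -> bool:
    """Simplified worst case: the task waits behind a full cache, then runs.

    Assumption: queue wait is roughly cache_days + additional_days.
    Real BOINC scheduling (multiple cores, other projects, deadline-driven
    priority) is more complex than this.
    """
    worst_case_wait = timedelta(days=cache_days + additional_days)
    return worst_case_wait + timedelta(hours=runtime_hours) > timedelta(days=window_days)


if __name__ == "__main__":
    sent = datetime(2022, 3, 3, 9, 36, 9, tzinfo=timezone.utc)
    print(report_deadline(sent, 3))        # 2022-03-06 09:36:09+00:00
    print(misses_deadline(0.01, 0.01, 8))  # False: about 29 minutes of queue + 8 h
    print(misses_deadline(2.0, 1.5, 8))    # True: 3.5 days of queue alone exceeds 3 days
```

Under this model, the 0.01/0.01 and 0.25/0.25 settings both leave most of the 3-day window for actually running the task.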
johndad5 Joined: 12 Aug 09 Posts: 7 Credit: 2,729,604 RAC: 0 |
Thanks for your reply. I unhid my computer. Sorry for the inconvenience. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,399,907 RAC: 19,807 |
"Thanks for your reply." From the looks of things, it's pretty rare for any of your systems to complete Rosetta work in time; most of it would be missing the deadline. If you are doing only one project, and it has long deadlines and frequent server issues or shortages of work, then you need a cache. If you're running more than one project, there's no need for a cache. 0.01 & 0.01 additional days would be best, or 0.25 & 0.01 additional days if you really feel the need for some sort of cache. It would be worth checking the Details page for each of your systems to see if there is an Allow or Skip button for Python tasks near the bottom. If it says Allow, you need to click it to start getting work again, after setting your cache to something more reasonable. Grant Darwin NT |
©2024 University of Washington
https://www.bakerlab.org