Problems and Technical Issues with Rosetta@home

Author	Message
crystalsys Joined: 11 Aug 09 Posts: 11 Credit: 1,677,398 RAC: 0	Message 94632 - Posted: 16 Apr 2020, 19:58:29 UTC - in response to Message 94628. Last modified: 16 Apr 2020, 19:59:19 UTC
OK, so I decided to take your advice, though I don't have any recent notices of a new version. In BOINC Manager (currently 7.14.2 x64) I clicked 'check for new version'. It came back and told me there wasn't one. I normally don't have the log window open, but I did because I was monitoring the stalled tasks. In THAT window, I got a message in RED saying there was a new version. Did the check again, NOW it says there is a new version. Hopefully they also fixed the bogus 'there is no newer version' message. Thanks!

Keith Myers Joined: 29 Mar 20 Posts: 97 Credit: 343,893 RAC: 0	Message 94635 - Posted: 16 Apr 2020, 20:41:46 UTC - in response to Message 94632.
That feature in the Manager does not work. You can check for the latest BOINC version on the BOINC download page. The latest is 7.16.5.
https://boinc.berkeley.edu/download_all.php
You can also restrict the applications you run by configuring a cc_config.xml file.
https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration

Grant (SSSF) Joined: 28 Mar 20 Posts: 1926 Credit: 18,534,891 RAC: 0	Message 94644 - Posted: 17 Apr 2020, 0:20:04 UTC - in response to Message 94612.
    Quote (Message 94612): A batch of WUs is not cancelled by the project when they have good reason to believe your attempt to crunch it will go better. They set up most WUs to do one additional try after a failure for the reason you mention, maybe the second attempt will go better. But, looking across the whole batch is the only way to make a decision about whether to withdraw the batch, and that has nothing to do with your current state on the WU.
That's just it. As near as i can tell, that batch of WUs wasn't cancelled by the servers (i've actually processed 3 others that were resends, and one initial issue with no problems); that Task was sent out to see if it was dodgy or not. But then the Server cancelled it anyway without giving me a chance to even process it.
    minimum quorum: 1
    initial replication: 1
    max # of error/total/success tasks: 1, 2, 1
    errors: Too many errors (may have bug); Too many total results; WU cancelled
Why would the server cancel that Task? Given the time and effort it takes to produce work, i'd have thought Tasks being cancelled before checking them out would be worth looking into.
Grant Darwin NT

robertmiles Joined: 16 Jun 08 Posts: 1264 Credit: 14,421,737 RAC: 0	Message 94650 - Posted: 17 Apr 2020, 3:04:30 UTC - in response to Message 94644. Last modified: 17 Apr 2020, 3:10:15 UTC
    Quote (Message 94644): [snip]
        minimum quorum: 1
        initial replication: 1
        max # of error/total/success tasks: 1, 2, 1
        errors: Too many errors (may have bug); Too many total results; WU cancelled
    Why would the server cancel that Task? Given the time and effort it takes to produce work, i'd have thought Tasks being cancelled before checking them out would be worth looking in to.
The previous failed attempt WAS enough checking it out. Tasks already downloaded are normally cancelled only if they haven't started.

Grant (SSSF) Joined: 28 Mar 20 Posts: 1926 Credit: 18,534,891 RAC: 0	Message 94652 - Posted: 17 Apr 2020, 3:50:54 UTC - in response to Message 94650. Last modified: 17 Apr 2020, 3:51:54 UTC
    Quote (Message 94650):
        Quote (Message 94644): [snip]
            minimum quorum: 1
            initial replication: 1
            max # of error/total/success tasks: 1, 2, 1
            errors: Too many errors (may have bug); Too many total results; WU cancelled
        Why would the server cancel that Task? Given the time and effort it takes to produce work, i'd have thought Tasks being cancelled before checking them out would be worth looking in to.
    The previous failed attempt WAS enough checking it out. Tasks already downloaded are normally cancelled only if they haven't started.
If that was the case, why resend it? The whole point of resending something is to check it out. If it doesn't need to be checked out, it doesn't need to be resent. And as i posted, i've done 3 others of that type that had errored on other systems, without them being cancelled.
Grant Darwin NT

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 94676 - Posted: 17 Apr 2020, 13:32:27 UTC - in response to Message 94652.
Human intervention is required to make the decision about whether there is a specific problem with one machine, or a more general problem with the WU batch. By the time the human had enough information to make that decision, some WUs of the batch were already out to a second host.
Rosetta Moderator: Mod.Sense

Sid Celery Joined: 11 Feb 08 Posts: 2544 Credit: 47,139,452 RAC: 1,732	Message 94727 - Posted: 18 Apr 2020, 4:02:48 UTC
Not sure if I should report this as a problem, but... On an Android phone I'm running 4 tasks and have another (varying) 3 or 4 waiting to follow. I've been reporting and receiving more tasks regularly. All sounds good. Trouble is, the Server Status page has been reporting no tasks available to download for at least a day. And the number of in progress tasks has been reducing steadily until a few hours ago and currently reads nil. Right now I have 7. I've certainly received and reported tasks since both read nil. Not complaining, obviously. Just reporting.

GoldenHat Joined: 14 Apr 20 Posts: 3 Credit: 122,663 RAC: 0	Message 94754 - Posted: 18 Apr 2020, 11:26:17 UTC
Hello, I'm a newbie to Rosetta and got things set up and running ok. In the last two days I've noticed my laptop running this app in an odd manner. Instead of running at 100% CPU, it fluctuates between 33% and 100%, and my fan turns on and off each time, yet I have the settings set as default - 100% CPU time. I'm concerned because 1) it's slower to process the data, 2) it's wearing out my PC, and I'm inclined to delete the app from the computer if this continues. I have a Toshiba Qosmio i7 Quad-core with 8 logical processors. It runs the CPU, GPU 0 and GPU 1 at full capacity, with CPU speed at 3 GHz with a base speed of 2.4 GHz. Any ideas how I can get it running flat at 100% rather than this fluctuation? When I started it was fine, just in the last two days it's gone funky. Thanks, Richard.

Bryn Mawr Joined: 26 Dec 18 Posts: 440 Credit: 15,220,300 RAC: 1,389	Message 94755 - Posted: 18 Apr 2020, 11:31:57 UTC - in response to Message 94754.
    Quote (Message 94754): Hello, I'm a newbie to Rosetta and got things set up and running ok. In the last two days I've noticed my laptop running this app in an odd manner. Instead of running at 100% CPU, it fluctuates between 33% and 100%, my fan turns on and off each time yet I have the settings set as default - 100% CPU time. I'm concerned because 1) It's slower to process the data, 2) It's wearing out my PC and I'm inclined to delete the app from the computer if this continues. I have a Toshiba Qosmio i7 Quad-core with 8 logical processors. It runs the CPU, GPU 0 and GPU 1 at full capacity, with CPU speed at 3Ghz with a base speed of 2.4Ghz. Any ideas how I can get it running flat at 100% rather than this fluctuation? When I started it was fine, just in the last two days it's gone funky. Thanks, Richard.
What OS are you running? Have you tried running the system monitor to see which processes are taking CPU time, and maybe which processes are cutting in and out to cause the fluctuations?

Grant (SSSF) Joined: 28 Mar 20 Posts: 1926 Credit: 18,534,891 RAC: 0	Message 94756 - Posted: 18 Apr 2020, 11:54:35 UTC - in response to Message 94754.
    Quote (Message 94754): Any ideas how I can get it running flat at 100% rather than this fluctuation? When I started it was fine, just in the last two days it's gone funky.
In the last couple of days you have picked up more work from Seti. Some was run on the iGPU, the rest is on the Nvidia GPU. And the problem with how it was before is that the system was producing nothing but errors here at Rosetta. If you set your Computing preferences to the following, things should settle down. Fewer errors & fewer fans starting up & slowing down continually.
Computing
    Usage limits
        Use at most 100% of the CPUs
        Use at most 100% of CPU time
    When to suspend
        Suspend when computer is on battery (selected)
        Suspend when computer is in use (not selected)
        Suspend GPU computing when computer is in use (not selected)
        'In use' means mouse/keyboard input in last 3 minutes
        Suspend when no mouse/keyboard input in last --- minutes
        Suspend when non-BOINC CPU usage is above --- %
        Compute only between ---
    Other
        Store at least 1 days of work
        Store up to an additional 0.02 days of work
        Switch between tasks every 60 minutes
        Request tasks to checkpoint at most every 60 seconds
Disk
    Use no more than 20 GB
    Leave at least 2 GB free
    Use no more than 60 % of total
Memory
    When computer is in use, use at most 95 %
    When computer is not in use, use at most 95 %
    Leave non-GPU tasks in memory while suspended (not selected)
    Page/swap file: use at most 75 %
Grant Darwin NT

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 94791 - Posted: 18 Apr 2020, 18:01:44 UTC
What is your setting for "Suspend when non-BOINC CPU usage is above --- %"? Perhaps you have other tasks popping in and consuming CPU, which is causing BOINC to snooze. Especially if the value is at the default (25%?), it can be easy for the various other tasks to exceed that (briefly). I would set it to 75% or higher. Don't worry, the BOINC tasks still have low priority. If you are also running work on your GPU, keep in mind that CPU is still required to service the active work on the GPU. I believe many set things to use at most some % of CPUs that leaves one core free to service the GPU work. I don't run GPU work; perhaps someone could reply with details on how to set up that arrangement.
Rosetta Moderator: Mod.Sense
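As a rough illustration of the "leave one core free" arithmetic mentioned above, here is a minimal Python sketch. The function name is hypothetical, and how the BOINC client rounds the resulting percentage to a whole number of CPUs is an open assumption that this sketch does not model.

```python
def cpu_percent_leaving_free(logical_cpus: int, free_cores: int = 1) -> float:
    """Return a value for 'Use at most N% of the CPUs' that leaves
    `free_cores` logical processors unused (hypothetical helper)."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    if not 0 <= free_cores < logical_cpus:
        raise ValueError("free_cores must be between 0 and logical_cpus - 1")
    return 100.0 * (logical_cpus - free_cores) / logical_cpus


if __name__ == "__main__":
    # The laptop described earlier in the thread has 8 logical processors.
    print(cpu_percent_leaving_free(8))
```

For the 8-thread laptop this works out to 7 of 8 processors, so the setting would be somewhere around 87.5%; it may be worth checking in the Manager that the expected number of CPU tasks is actually running afterwards.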

Grant (SSSF) Joined: 28 Mar 20 Posts: 1926 Credit: 18,534,891 RAC: 0	Message 94804 - Posted: 19 Apr 2020, 0:02:39 UTC - in response to Message 94791.
    Quote (Message 94791): I believe many set things to use at most some % of CPUs that leaves one core free to service the GPU work. I don't run GPU work, perhaps someone could reply with details on how to set up that arrangement.
I find it's best to reserve a core to support the GPU. If a GPU Task is running, it gets the CPU support it needs and it doesn't impact on the processing time of CPU Tasks that are running. If there is no GPU work being done, then the CPU core/thread is free to do CPU work.
The app_config.xml file needs to go into the Seti project folder. If installed on the C: drive:
    C:/ProgramData/BOINC/projects/setiathome.berkeley.edu/app_config.xml
Make sure to use Notepad or similar to create or edit the file (NOT Word or Wordpad).
    <app_config>
        <app>
            <name>setiathome_v8</name>
            <gpu_versions>
                <gpu_usage>1.0</gpu_usage>
                <cpu_usage>1.0</cpu_usage>
            </gpu_versions>
        </app>
        <app>
            <name>astropulse_v7</name>
            <gpu_versions>
                <gpu_usage>1.0</gpu_usage>
                <cpu_usage>1.0</cpu_usage>
            </gpu_versions>
        </app>
    </app_config>
Grant Darwin NT
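Since a hand-edited app_config.xml is easy to get subtly wrong, a quick well-formedness check before restarting the client can help. The following is a minimal sketch using only Python's standard library; the helper name `summarize_app_config` is hypothetical, and the script only checks the XML structure shown in the post above, not whether the BOINC client will accept every value.

```python
import xml.etree.ElementTree as ET

# The app_config.xml content from the post above.
APP_CONFIG = """<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>"""


def summarize_app_config(xml_text):
    """Parse the XML and return (app name, gpu_usage, cpu_usage) tuples.

    ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML,
    e.g. an unclosed tag or stray characters added by a word processor.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError("root element should be <app_config>")
    rows = []
    for app in root.findall("app"):
        name = app.findtext("name")
        gpu = app.findtext("gpu_versions/gpu_usage")
        cpu = app.findtext("gpu_versions/cpu_usage")
        rows.append((name,
                     float(gpu) if gpu is not None else None,
                     float(cpu) if cpu is not None else None))
    return rows


if __name__ == "__main__":
    for name, gpu, cpu in summarize_app_config(APP_CONFIG):
        print(f"{name}: gpu_usage={gpu}, cpu_usage={cpu}")
```

To check a file on disk instead of the embedded string, read it first and pass its text to the same function.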

Admin Project administrator Joined: 1 Jul 05 Posts: 5146 Credit: 0 RAC: 0	Message 94807 - Posted: 19 Apr 2020, 0:23:59 UTC - in response to Message 94727.
We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.

Nicholas Hathaway Joined: 20 Nov 14 Posts: 6 Credit: 791,395 RAC: 0	Message 94808 - Posted: 19 Apr 2020, 1:16:49 UTC
Hi, I am repeatedly getting the following in my event log:
    Sat Apr 18 07:39:31 2020 | Rosetta@home | Resetting project
    Sat Apr 18 23:48:22 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
    Sat Apr 18 23:48:22 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
    Sat Apr 18 23:53:46 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
    Sat Apr 18 23:53:46 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
    Sat Apr 18 23:54:40 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
    Sat Apr 18 23:54:40 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
    Sun Apr 19 00:46:30 2020 | Rosetta@home | Project requested delay of 7 seconds
What do I need to do?
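To see how often each task hits this message, the event log can be tallied with a short script. This is a minimal sketch assuming the pipe-separated "timestamp | project | message" layout shown in the post above; the function name `count_unfinished_exits` is hypothetical, and the regular expression is written against the exact wording quoted here, so a different client version may word the message differently.

```python
import re
from collections import Counter

# A few lines copied from the event log in the post above.
LOG = """\
Sat Apr 18 07:39:31 2020 | Rosetta@home | Resetting project
Sat Apr 18 23:48:22 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:53:46 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:54:40 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sun Apr 19 00:46:30 2020 | Rosetta@home | Project requested delay of 7 seconds
"""

# Matches the task name in the "no 'finished' file" message.
UNFINISHED = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")


def count_unfinished_exits(log_text):
    """Count 'exited with zero status but no finished file' events per task."""
    counts = Counter()
    for line in log_text.splitlines():
        # Split into timestamp, project, message; tolerate an escaped pipe (\|).
        parts = re.split(r"\s*\\?\|\s*", line, maxsplit=2)
        if len(parts) != 3:
            continue  # not an event-log line in the expected layout
        match = UNFINISHED.search(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for task, n in count_unfinished_exits(LOG).items():
        print(f"{task}: {n} restart(s) without a 'finished' file")
```

A task that shows up many times in a short window is the one to watch; a single occurrence after a busy spell on the machine is less of a concern.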

Grant (SSSF) Joined: 28 Mar 20 Posts: 1926 Credit: 18,534,891 RAC: 0	Message 94810 - Posted: 19 Apr 2020, 1:38:00 UTC - in response to Message 94808. Last modified: 19 Apr 2020, 1:43:43 UTC
    Quote (Message 94808): Hi I am repeatedly getting the following in my event log:
        Sat Apr 18 07:39:31 2020 | Rosetta@home | Resetting project
        Sat Apr 18 23:48:22 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
        Sat Apr 18 23:48:22 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
        Sat Apr 18 23:53:46 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
        Sat Apr 18 23:53:46 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
        Sat Apr 18 23:54:40 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
        Sat Apr 18 23:54:40 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
        Sun Apr 19 00:46:30 2020 | Rosetta@home | Project requested delay of 7 seconds
    What do I need to do?
How do your Computing preferences compare to these? Particularly the "When to suspend" settings. If they aren't a problem, then as it says in the Event log, you'll probably need to reset the project, but even then it may not fix the problem. It appears it's because the system is busy doing something else & BOINC can't communicate with the science application. Having "Use at most 100% of CPU time" set to less than 100% can cause it on some systems. As long as Tasks don't error out as a result, it's not a problem as such, but it does show some contention for resources on the system. You've also had a lot of Tasks miss the deadline, so a much smaller cache would be a good idea.
Computing
    Usage limits
        Use at most 100% of the CPUs
        Use at most 100% of CPU time
    When to suspend
        Suspend when computer is on battery (not selected)
        Suspend when computer is in use (not selected)
        Suspend GPU computing when computer is in use (not selected)
        'In use' means mouse/keyboard input in last 3 minutes
        Suspend when no mouse/keyboard input in last --- minutes
        Suspend when non-BOINC CPU usage is above --- %
        Compute only between ---
    Other
        Store at least 1 days of work
        Store up to an additional 0.02 days of work
        Switch between tasks every 60 minutes
        Request tasks to checkpoint at most every 60 seconds
Disk
    Use no more than 20 GB
    Leave at least 2 GB free
    Use no more than 60 % of total
Memory
    When computer is in use, use at most 95 %
    When computer is not in use, use at most 95 %
    Leave non-GPU tasks in memory while suspended (not selected)
    Page/swap file: use at most 75 %
Grant Darwin NT

Nicholas Hathaway Joined: 20 Nov 14 Posts: 6 Credit: 791,395 RAC: 0	Message 94812 - Posted: 19 Apr 2020, 1:46:09 UTC - in response to Message 94810.
Computing preferences
These settings apply to all computers using this account except:
    computers where you have set preferences locally using the BOINC Manager
    Android devices
Computing
    Usage limits
        Use at most 100 % of the CPUs
        Use at most 100 % of CPU time
    When to suspend
        Suspend when computer is on battery
        Suspend when computer is in use
        Suspend GPU computing when computer is in use
        'In use' means mouse/keyboard input in last 3 minutes
        Suspend when no mouse/keyboard input in last --- minutes
        Suspend when non-BOINC CPU usage is above --- %
        Compute only between ---
    Other
        Store at least 0.5 days of work
        Store up to an additional 1 days of work
        Switch between tasks every 60 minutes
        Request tasks to checkpoint at most every 60 seconds
Disk
    Use no more than --- GB
    Leave at least --- GB free
    Use no more than 90 % of total
Memory
    When computer is in use, use at most 90 %
    When computer is not in use, use at most 90 %
    Leave non-GPU tasks in memory while suspended
    Page/swap file: use at most 75 %
Network
    Usage limits
        Limit download rate to --- KB/second
        Limit upload rate to --- KB/second
        Limit usage to --- MB every --- days
    When to suspend
        Transfer files only between ---
    Other
        Skip data verification for image files
        Confirm before connecting to Internet
        Disconnect when done
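To make the comparison between the two sets of preferences easier to read, the numeric values from Message 94810 and Message 94812 can be put side by side. The sketch below is only an illustration: the dictionary keys are hypothetical labels (not BOINC's own setting names), unset values (---) and checkbox states are left out because they did not survive in the pasted text, and treating the work cache as roughly "store at least" plus "store up to an additional" is an assumption about how the two settings combine.

```python
# Numeric preferences suggested in Message 94810 (labels are hypothetical).
SUGGESTED = {
    "cpus_pct": 100, "cpu_time_pct": 100,
    "store_at_least_days": 1, "store_additional_days": 0.02,
    "switch_minutes": 60, "checkpoint_seconds": 60,
    "disk_max_pct": 60, "mem_in_use_pct": 95, "mem_idle_pct": 95,
    "swap_pct": 75,
}

# Numeric preferences pasted in Message 94812.
CURRENT = {
    "cpus_pct": 100, "cpu_time_pct": 100,
    "store_at_least_days": 0.5, "store_additional_days": 1,
    "switch_minutes": 60, "checkpoint_seconds": 60,
    "disk_max_pct": 90, "mem_in_use_pct": 90, "mem_idle_pct": 90,
    "swap_pct": 75,
}


def diff_prefs(suggested, current):
    """Return {label: (suggested, current)} for every value that differs."""
    return {key: (suggested[key], current.get(key))
            for key in suggested
            if suggested[key] != current.get(key)}


def cache_days(prefs):
    """Approximate work cache: 'store at least' plus 'additional' days."""
    return prefs["store_at_least_days"] + prefs["store_additional_days"]


if __name__ == "__main__":
    for key, (want, have) in diff_prefs(SUGGESTED, CURRENT).items():
        print(f"{key}: suggested {want}, current {have}")
    print(f"cache: suggested ~{cache_days(SUGGESTED):.2f} days, "
          f"current ~{cache_days(CURRENT):.2f} days")
```

On these numbers the current cache comes to about 1.5 days against about 1.02 days suggested, which fits the earlier remark about tasks missing their deadlines.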

Sid Celery Joined: 11 Feb 08 Posts: 2544 Credit: 47,139,452 RAC: 1,732	Message 94815 - Posted: 19 Apr 2020, 2:03:04 UTC - in response to Message 94727.
    Quote (Message 94727): Not sure if I should report this as a problem, but... On an Android phone I'm running 4 tasks and have another (varying) 3 or 4 waiting to follow. I've been reporting and receiving more tasks regularly. All sounds good. Trouble is, the Server Status page has been reporting no tasks available to download for at least a day. And the number of in progress tasks has been reducing steadily until a few hours ago and currently reads nil. Right now I have 7. I've certainly received and reported tasks since both read nil. Not complaining, obviously. Just reporting
Still reporting tasks and getting more 24hrs after 0 to send and 0 in progress. Am I one of the 7 who reported in the last 24hrs?
    Quote (Message 94807): We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.
Oh, maybe this explains it. Thought it was weird.

MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,292,180 RAC: 4	Message 94837 - Posted: 19 Apr 2020, 7:35:19 UTC - in response to Message 94807.
    Quote (Message 94807): We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.
Well that should reduce the development effort if you only have the one app (but multiple platforms). I still seem to be getting MiniRosetta and haven't cleared my cache (which is only 0.3 days) yet.
BOINC blog

Grant (SSSF) Joined: 28 Mar 20 Posts: 1926 Credit: 18,534,891 RAC: 0	Message 94838 - Posted: 19 Apr 2020, 7:44:48 UTC - in response to Message 94837.
    Quote (Message 94837):
        Quote (Message 94807): We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.
    Well that should reduce the development effort if you only have the one app (but multiple platforms). I still seem to be getting MiniRosetta and haven't cleared my cache (which is only 0.3 days) yet.
I figure there will be a few resends after the last of the initial Tasks have been sent out. But with the short deadlines & low replication, it should all be cleared up well within a week.
Grant Darwin NT

zfp Joined: 22 Mar 20 Posts: 1 Credit: 114,637 RAC: 0	Message 94840 - Posted: 19 Apr 2020, 7:57:36 UTC
Hello, After a kernel update I restarted my system. As a result, two of the tasks that were running at the time of the reboot lost all progress. Then all of the tasks that had been running at the time of the reboot ran for an extremely long time, and all of them exited with:
    Exit status 139 (0x0000008B) Unknown error code
The output shows this:
    WARNING! cannot get file size for default.out.gz: could not open file.
    Output exists: default.out.gz Size: -1
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152628078
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152627950
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152627464
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152627790
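One possible reading of "Exit status 139", offered as an assumption rather than a diagnosis: on Linux, shells report a process killed by signal N as status 128 + N, and 139 - 128 = 11, which is SIGSEGV (segmentation fault) on Linux. The minimal Python sketch below decodes a status that way; the function name is hypothetical, and whether BOINC's reported value follows this convention for these tasks is not confirmed by anything in the thread.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode an exit status using the shell's 128 + signal convention."""
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals maps a number to a name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 139 is the status reported for the failed tasks; 0x8B is the same value.
    print(hex(139), describe_exit_status(139))
```

If that reading is right, the tasks crashed rather than finishing with a bad result, which would be consistent with the missing default.out.gz in the output.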