Problems and Technical Issues with Rosetta@home

Author	Message
iancantwell Joined: 7 Jul 13 Posts: 4 Credit: 398,080 RAC: 0	Message 79984 - Posted: 30 Apr 2016, 14:39:29 UTC According to my event log: Task 06_optimize0001_fold_SAVE_ALL_OUT_344614_3237_2 exited with zero status but no 'finished' file. Another message says that "if this happens repeatedly you may need to reset the project". As I haven't had this problem before, it may be that the task is somehow corrupt and should be withdrawn for analysis.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2204 Credit: 13,720,774 RAC: 8	Message 79989 - Posted: 1 May 2016, 8:04:50 UTC - in response to Message 79979.
30.04.2016 11:19:58 | rosetta@home | Scheduler request completed: got 0 new tasks
30.04.2016 11:19:58 | rosetta@home | No work sent
30.04.2016 11:19:58 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
It's time to update the scheduler (and the server code). But I think that is impossible during CASP.

Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0	Message 79991 - Posted: 1 May 2016, 8:54:28 UTC - in response to Message 79977. Last modified: 1 May 2016, 9:03:01 UTC "What that means for me is that when I know I won't be available to manually push reports at least once during the day, I essentially end up having to shift to another project for those systems. That's what I do during a vacation away from the systems." You could simply increase your cache to 4-6 days, or even better, set up the other project as a backup project. When away on vacation, it's better to have more than one project active, so in case your main project runs out of work, your computer has some other projects to choose from.

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 79996 - Posted: 1 May 2016, 18:34:26 UTC - in response to Message 79991. Thanks for the reply. It is an option -- but there are times (vacations) when I'm away for a week or more, and that would not work well then. The thing is, I believe there is a problem a bit further up the chain than simply at my workstations, since this behavior is Rosetta project specific. Further, my project mix is more balanced (sometimes with more than one CPU project in the mix) than simply running a backup project would resolve, even as a workaround.

Sid Celery Joined: 11 Feb 08 Posts: 2591 Credit: 47,220,881 RAC: 1	Message 79998 - Posted: 2 May 2016, 1:20:40 UTC Looking at something else, I glanced at the number of active users on Rosetta. On March 3rd it was 79,000 users. On May 2nd it's now 142,000. The number of new hosts at the start of April has consistently been over 1,000 and up to 3,900 each day. This is a lot. Rosetta Users Overview

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 80001 - Posted: 2 May 2016, 21:56:42 UTC - in response to Message 79996. I appreciate the suggestions for workarounds -- I do wonder, though, what specifically is going on at the server side that might be causing this Rosetta-specific problem for me.

ThrowerGB Joined: 4 Dec 05 Posts: 3 Credit: 12,259,708 RAC: 0	Message 80017 - Posted: 4 May 2016, 15:28:40 UTC Last modified: 4 May 2016, 15:32:19 UTC I'm not getting new tasks either. I've been getting the same message about Rosetta Mini for Android, yet I'm running on OS X. I've just removed BOINC from my system, including data files, and reloaded BOINC. The log is below. In addition to Rosetta, I'm running SETI@home.
--------------
Wed May 4 11:10:24 2016 | | Starting BOINC client version 7.6.22 for x86_64-apple-darwin
Wed May 4 11:10:24 2016 | | log flags: file_xfer, sched_ops, task
Wed May 4 11:10:24 2016 | | Libraries: libcurl/7.39.0 OpenSSL/1.0.1j zlib/1.2.5 c-ares/1.10.0
Wed May 4 11:10:24 2016 | | Data directory: /Library/Application Support/BOINC Data
Wed May 4 11:10:24 2016 | | CUDA: NVIDIA GPU 0: GeForce GTX 675MX (driver version 7.5.26, CUDA version 7.5, compute capability 3.0, 1024MB, 3MB available, 1933 GFLOPS peak)
Wed May 4 11:10:24 2016 | | OpenCL: NVIDIA GPU 0: GeForce GTX 675MX (driver version 10.10.5.2 310.42.25f01, device version OpenCL 1.2, 1024MB, 3MB available, 1933 GFLOPS peak)
Wed May 4 11:10:24 2016 | | OpenCL CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
Wed May 4 11:10:25 2016 | | Host name: iMac.home
Wed May 4 11:10:25 2016 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz [x86 Family 6 Model 58 Stepping 9]
Wed May 4 11:10:25 2016 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni pclmulqdq dtes64 mon dscpl vmx smx est tm2 ssse3 cx16 tpr pdcm sse4_1 sse4_2 x2apic popcnt aes pcid xsave osxsave tsctmr avx rdrand f16c
Wed May 4 11:10:25 2016 | | OS: Mac OS X 10.11.4 (Darwin 15.4.0)
Wed May 4 11:10:25 2016 | | Memory: 16.00 GB physical, 559.94 GB virtual
Wed May 4 11:10:25 2016 | | Disk: 1.01 TB total, 559.70 GB free
Wed May 4 11:10:25 2016 | | Local time is UTC -4 hours
Wed May 4 11:10:25 2016 | rosetta@home | URL https://boinc.bakerlab.org/rosetta/; Computer ID 1528837; resource share 48
Wed May 4 11:10:25 2016 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6107788; resource share 47
Wed May 4 11:10:25 2016 | rosetta@home | General prefs: from rosetta@home (last modified 01-May-2016 17:00:45)
Wed May 4 11:10:25 2016 | rosetta@home | Computer location: home
Wed May 4 11:10:25 2016 | rosetta@home | General prefs: no separate prefs for home; using your defaults
Wed May 4 11:10:25 2016 | | Reading preferences override file
Wed May 4 11:10:25 2016 | | Preferences:
Wed May 4 11:10:25 2016 | | max memory usage when active: 13107.20MB
Wed May 4 11:10:25 2016 | | max memory usage when idle: 15728.64MB
Wed May 4 11:10:25 2016 | | max disk usage: 100.00GB
Wed May 4 11:10:25 2016 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Wed May 4 11:10:25 2016 | rosetta@home | Sending scheduler request: To fetch work.
Wed May 4 11:10:25 2016 | rosetta@home | Requesting new tasks for CPU and NVIDIA GPU
Wed May 4 11:10:26 2016 | rosetta@home | Scheduler request completed: got 0 new tasks
Wed May 4 11:10:26 2016 | rosetta@home | No work sent
Wed May 4 11:10:26 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

googloo Joined: 15 Sep 06 Posts: 137 Credit: 24,022,414 RAC: 0	Message 80018 - Posted: 4 May 2016, 15:45:48 UTC Two (and maybe more) of my posts have disappeared. What's up? I repeat: the problem with "Rosetta Mini for Android is not available for your type of computer" appears to be the 24-hour back off period that results. Can this be adjusted?

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 80020 - Posted: 4 May 2016, 16:47:17 UTC - in response to Message 80018. Your posts are over in the other thread. It's a BOINC question, not R@h. I am not aware of any means of tailoring the backoff behavior of BOINC Manager. Rosetta Moderator: Mod.Sense

googloo Joined: 15 Sep 06 Posts: 137 Credit: 24,022,414 RAC: 0	Message 80022 - Posted: 4 May 2016, 20:38:54 UTC - in response to Message 80020. Sorry for my confusion, and thanks for the answer.

danosavi Joined: 20 Apr 07 Posts: 1 Credit: 58,586 RAC: 0	Message 80028 - Posted: 6 May 2016, 7:10:40 UTC Hello, this morning I noticed a sudden decrease in the task count for all my connected computers in the computer list of my account. Machine and total credits are correct; it's just the number of tasks that decreased. Is that a bug or something else? Thanks.

Dr. Merkwürdigliebe Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0	Message 80033 - Posted: 6 May 2016, 14:28:07 UTC Occasionally I click on the "Show graphics" button in the BOINC client to see the proteins spin. Sometimes I close the window before the animation starts, and the window is gone, but not from memory.
ps aux | grep defunc
merkwuerdig 4889 0.0 0.0 0 0 ? Z 16:16 0:00 [minirosetta_gra] <defunct>
I can't kill those zombie processes, not even with kill -9.
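
A note on the zombie entries above: a process in state Z has already exited, so no signal (including kill -9) can affect it; the entry only disappears once its parent process collects the exit status or itself exits. The Python sketch below is one way to list zombies together with their parent PIDs. It parses text in the column layout produced by `ps -eo pid,ppid,stat,comm` on Linux; the sample output and the parent PID 4321 are made up for illustration and are not taken from the post.

```python
def find_zombies(ps_output):
    """Return (pid, ppid, command) for every process whose state starts with 'Z'.

    Expects text in the layout of `ps -eo pid,ppid,stat,comm` (header row first).
    """
    zombies = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # pid, ppid, stat, rest of line
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies


# Hypothetical sample; the parent PID 4321 is invented for illustration.
SAMPLE_PS_OUTPUT = """  PID  PPID STAT COMMAND
 4321     1 Ssl  boinc
 4889  4321 Z    minirosetta_gra <defunct>
"""

for pid, ppid, comm in find_zombies(SAMPLE_PS_OUTPUT):
    print(f"zombie {pid} ({comm}) is waiting to be reaped by parent {ppid}")
```

To use it on a live system, the text could come from `subprocess.run(["ps", "-eo", "pid,ppid,stat,comm"], capture_output=True, text=True).stdout`. The parent PID it reports is the process that would have to reap the zombie.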

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 80035 - Posted: 6 May 2016, 17:49:28 UTC - in response to Message 80020. Not directly related to the Android work units, but I do wonder about the answer suggesting that backoff behavior is a BOINC question. The reason for my follow-up is this: when a (perhaps incorrect) 'server not responding' report happens on my workstation while reporting results, other projects go to a progressive backoff cycle (approximately 1 hour, then 2 hours, then 3 hours, then 4 hours, then back to a 1 hour backoff), whereas Rosetta, and ONLY Rosetta, goes immediately to a 24-hour backoff. I'd note further that every time I go to a workstation at any stage of the 24-hour backoff and manually push an update, the update goes through. So I remain a bit bewildered by what, from my experience with a number of workstations, certainly appears to be Rosetta-specific behavior. Since I run multiple projects and have encountered this over the past month or more, I have ended up shifting a bit toward other projects that do not exhibit this behavior, particularly when I anticipate leaving a workstation unattended.

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 80036 - Posted: 6 May 2016, 19:18:09 UTC I see what you mean, Barry. So I did some searching and found this description of Per-processor-type backoff. It seems to indicate that the backoff is determined by the client's tracking of scheduler requests for each project and processor type. From this I conclude that there must be periods of time when client machines can fail to get work on several scheduler requests in a row, and this leads to the 24hr backoff. It is hard to catch a server status snapshot showing the server is out of work. But my theory is that there are points within the 10 minute interval where the jobs that generate the tasks from the queue of work are unable to keep up. The project team has (apparently) increased the cache of work to try to avoid running out. But I think there are still times when there are no tasks available to send. I have been watching the number of available tasks on the server status page quite frequently over the past month or so, and I suspect that the task volumes are now getting so high that the snapshot you see every 10 minutes of the number of available tasks is not reflective of the underlying reality. I mean, if it shows 200,000 tasks are available, that doesn't mean much when the server sends out 400,000 tasks over the next 10 minutes. I have no idea what the actual rates are, but when you look at how the number of outstanding work units changes between 10 minute intervals, it seems the numbers must be very high. Obviously, if the number of outstanding tasks is 1.0 million at one time interval and 1.4 million at the next, the project must have sent out at least 400,000 tasks. But since you don't really know how many tasks were reported in as completed during that time interval, the number could be much higher.
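
The arithmetic in the post above can be written out as a small Python sketch. It assumes the simplest possible bookkeeping: tasks leave the outstanding count only by being reported, so tasks sent equals the change in outstanding tasks plus the tasks reported in the same interval. The function name and the 250,000 reported figure are hypothetical; only the 1.0 million and 1.4 million snapshots come from the post.

```python
def tasks_sent(outstanding_before, outstanding_after, reported=0):
    """Estimate tasks sent during one status-page interval.

    Assumes outstanding_after = outstanding_before + sent - reported,
    which ignores timeouts, aborts and similar effects.
    With reported=0 the result is only a lower bound.
    """
    return (outstanding_after - outstanding_before) + reported


# Lower bound from the two snapshots mentioned in the post.
lower_bound = tasks_sent(1_000_000, 1_400_000)

# Hypothetical: if 250,000 results were also reported in that interval.
with_reports = tasks_sent(1_000_000, 1_400_000, reported=250_000)

print(lower_bound, with_reports)
```

Under these assumptions, every result reported during the interval adds one-for-one to the number of tasks the server must have sent, which is why the snapshot difference alone understates the real rate.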

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 80037 - Posted: 6 May 2016, 20:33:38 UTC - in response to Message 80036. Thanks for your detailed response. One thing I'd note here -- I believe it isn't the request for new work that yields the 24-hour backoff I see; rather, it is the reporting of completed work units. I could be wrong there, but I believe the sequence is to report completed work and then request new work. When I check the workstations, I see between 3 and 12 completed work units which have not been reported. If the problem is a lack of available work, then I would expect the completed work units to be reported along with a 'no new work' message. I do see that sort of report with some other projects which are periodically sparse with work units (GPUGrid or MilkyWay, for example). I will note at this point that Rosetta is getting a lot of new users, which is likely draining down the available work. In any event, should this persist, I figure to shift systems to other projects temporarily when I can't regularly access the systems (say when I am out of town).

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 80038 - Posted: 6 May 2016, 21:18:06 UTC Tasks are reported and requested in the same scheduler request. However, if the task results are still uploading at the time of the scheduler request, they cannot be reported as completed yet. So perhaps the ones that you see as ready to report were just recently completed, or their uploads just recently completed. In that case they became ready to report after the series of work requests caused the backoff to run up to 24hrs. Emailing with DK today, he confirms there are short periods where no work is available, even though you don't typically see that reflected on the status page. New work becomes available soon enough that the status typically shows lots of work at-the-ready. This is why the problem is intermittent.
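
To illustrate how a series of failed work requests could "run up" a backoff to 24 hours, here is a minimal Python sketch of a doubling backoff with a cap. This is not the BOINC client's actual code: the 600-second starting delay and the plain doubling rule are assumptions chosen for illustration, and a real client may use different constants and add randomization. Only the 24-hour ceiling comes from the thread.

```python
def backoff_schedule(failures, initial=600, cap=86_400):
    """Return the delay (in seconds) applied after each consecutive failure.

    Assumed rule: start at `initial`, double after every failed request,
    never exceed `cap` (86,400 s = 24 hours).
    """
    delays = []
    delay = initial
    for _ in range(failures):
        delays.append(min(delay, cap))
        delay *= 2
    return delays


for attempt, seconds in enumerate(backoff_schedule(9), start=1):
    print(f"failure {attempt}: wait {seconds / 3600:.2f} h")
```

With these assumed numbers the cap is reached on the ninth consecutive failure. A scheme like this resets once a request succeeds, which would fit the observation that a manual update during the backoff goes through.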

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 80039 - Posted: 6 May 2016, 23:40:02 UTC - in response to Message 80038. OK -- fair enough -- it is possible that, given the large Rosetta client population and its growth, that is why I see it as a "Rosetta specific" issue. As I noted, the thing which surprises me is that apparently, the instant this happens on a workstation, it goes to a 24-hour backoff -- which I don't see with other projects. Then again, most of the other projects (aside from GPUGrid) are not in a 'no work available' mode all that often. As it is, for me the trade-off is that World Grid is getting happier with me, as the mix of my work units is shifting a bit toward it.

robertmiles Joined: 16 Jun 08 Posts: 1265 Credit: 14,424,358 RAC: 0	Message 80226 - Posted: 23 Jun 2016, 5:21:10 UTC Last modified: 23 Jun 2016, 5:26:24 UTC Rosetta@Home appears to have had some problem that has blocked uploads for the last several hours. This workunit is affected: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=759093541 It appears that downloads and getting new workunits to clients are probably not affected.

Emerald42 Joined: 9 Jun 08 Posts: 9 Credit: 1,675,720 RAC: 0	Message 80227 - Posted: 23 Jun 2016, 5:33:45 UTC Last modified: 23 Jun 2016, 5:34:06 UTC I must beight here. And I can't upload several completed result files to the server.

furukitsune Joined: 19 Mar 16 Posts: 9 Credit: 7,847,298 RAC: 0	Message 80234 - Posted: 23 Jun 2016, 12:42:21 UTC Last modified: 23 Jun 2016, 12:48:23 UTC Uploads stopped at 23 Jun 2016 8:22:07 UTC (for me). Now, at 12:40 UTC, I still can't upload, although the SSP shows all green (!).