Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
seybernetx Send message Joined: 16 Aug 10 Posts: 5 Credit: 1,520 RAC: 0 |
6/9/2013 7:22:26 PM | rosetta@home | update requested by user 6/9/2013 7:22:32 PM | rosetta@home | Sending scheduler request: Requested by user. 6/9/2013 7:22:32 PM | rosetta@home | Not requesting tasks: don't need 6/9/2013 7:22:34 PM | rosetta@home | Scheduler request completed 6/9/2013 7:22:34 PM | rosetta@home | General prefs: from rosetta@home (last modified 07-Jun-2013 13:37:59) 6/9/2013 7:22:34 PM | rosetta@home | Computer location: home 6/9/2013 7:22:34 PM | rosetta@home | General prefs: no separate prefs for home; using your defaults ---------------- Best I can tell, Rosetta thinks my system doesn't need any work. Not clear why. Having all four projects sharing worked fine for more than a month, then all of a sudden, splat. The website page for your host indicates it has not contacted the server (i.e. not requested any work) since the 6th. So, as GregBE suggests, either your machine feels it already has enough work from other projects, or perhaps the Rosetta project is labelled as "no new tasks". See BOINC Manager projects tab status column. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
...and now it says June 10. So, until you hit the update button, the BOINC Manager did not feel it needed to contact the project scheduler. Rosetta Moderator: Mod.Sense |
seybernetx Send message Joined: 16 Aug 10 Posts: 5 Credit: 1,520 RAC: 0 |
HUH?? mod.sense, what on earth are you talking about? Rosetta has no work units at all on my machine. When I force an update, rosetta insists "Not requesting tasks: don't need". Your response is to point out that Rosetta stores time/date info in UTC time, not local time. Quite true, and utterly beside the point, that is. dan PS: FWIW, I'm at UTC-5 or UTC-6, pending Daylight Saving Time status. ...and now it says June 10. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
My point was that June 10 (UTC) was the first time in days that the project had been contacted by your machine. It still didn't request any work, but at least contact was made (that's the timestamp shown on the webpage I mentioned). ...and so this tends to confirm you do not have a network problem, you do not have a configuration problem. BOINC Manager was simply not asking for any Rosetta work and that is why you don't have any. From your stats, it looks like BOINC probably does not get to run very many hours per day. And so it is being pretty conservative about getting work, because it's not sure how many hours to expect to be running on a given day. Rosetta Moderator: Mod.Sense |
seybernetx Send message Joined: 16 Aug 10 Posts: 5 Credit: 1,520 RAC: 0 |
mod.sense, you kind of remind me of The Good Old Days, back when I was a customer of New Jersey Bell. Whenever there was a problem, I would call, and the service people would insist everything was working fine, there was never anything wrong, ever, no matter how much I argued. Then POOF! the problem would magically disappear, typically by the next afternoon. Cheers.... My point was that June 10 (UTC) was the first time in days that the project had been contacted by your machine. It still didn't request any work, but at least contact was made (that's the timestamp shown on the webpage I mentioned). ...and so this tends to confirm you do not have a network problem, you do not have a configuration problem. BOINC Manager was simply not asking for any Rosetta work and that is why you don't have any. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
LOL! I've had the same phone company experience! Let me know if it's all still unclear. Rosetta Moderator: Mod.Sense |
seybernetx Send message Joined: 16 Aug 10 Posts: 5 Credit: 1,520 RAC: 0 |
Nope, mod.sense, things are working fine. Ever since your post insisting that all the problems were on my end, BOINC has been downloading and processing an average of about one Rosetta work unit a day. Thanks to whoever fixed the non-existent problem. cheers... LOL! I've had the same phone company experience! |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Your machine got all of its projects back into balance with regard to your resource shares, and so it will now run a balanced number of tasks from your new project mix. Rosetta Moderator: Mod.Sense |
Cartoonman Send message Joined: 9 Oct 08 Posts: 13 Credit: 7,274,094 RAC: 105 |
This one ran for a very long time, and all I got out of it was 20 credits. :I https://boinc.bakerlab.org/rosetta/result.php?resultid=587016069 Apparently there was a cos/sin out of bounds error, except it didn't error out the WU, it just kept it running on full for 27 hours, with the same error over and over. It would explain why after crunching for nearly 7 hours it still didn't make a checkpoint. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,047 RAC: 1,768 |
I am running the cryo units and this is happening: 25,796.25 186.91 163.79 https://boinc.bakerlab.org/rosetta/result.php?resultid=587401709 other cryo units are doing this: 10,449.38 75.71 167.81 https://boinc.bakerlab.org/rosetta/result.php?resultid=587394445 Twice the run time and NO more credits, these darned cryo units had problems in the past, should I start aborting them AGAIN?!!! |
Thierry Preusser Send message Joined: 5 Aug 13 Posts: 1 Credit: 526,503 RAC: 0 |
Hello, For some reason, Rosetta demands intense disk access, which considerably slows down all BOINC processing. I run BOINC 7.0.64 (x64) on a Windows 7 64-bit platform. I'm running the following projects: SETI@home, Asteroids@home, Cosmology@home, Climateprediction.net, LHC@home 1.0, MindModeling@beta and finally rosetta@home. Each project has a resource share of 10, except SETI, which has 50. I run these projects on three computers, but it is on the most powerful one that I have problems. Here is a brief report of my machine's configuration: Operating System: Microsoft Windows 7 Professional; Version: 6.1.7601 Service Pack 1 Build 7601; System Type: x64-based PC; Processor: Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz, 3334 MHz, 6 core(s), 12 logical processor(s); BIOS Version/Date: American Megatrends Inc. 0602, 5/10/10; SMBIOS Version: 2.5; Physical Memory (RAM): 24.0 GB; Total Physical Memory: 24.0 GB; Available Physical Memory: 16.6 GB; Total Virtual Memory: 27.9 GB; Available Virtual Memory: 19.9 GB; Page File Space: 3.91 GB; Page File: C:\pagefile.sys. BOINC has 40 GB of disk space for its data folder. It can use 100% of the CPUs and 100% of the CPU time. It is connected to the network all the time. Rosetta has downloaded a total of 2.04 GB of data to process, out of 10.51 GB for all the projects. What happens is that when three or six Rosetta tasks are being calculated, CPU activity drops drastically in the Windows Task Manager. It can even go an hour with almost no activity, while 12 tasks are marked "calculating". When I suspend Rosetta in the project manager, the activity of the 12 processors soon returns to 100%. I'll keep watching for a few days to see how Rosetta@home behaves on BOINC. I started crunching for Rosetta on August 5. Do you have any explanation or suggestion? Thank you for your reply. Thierry Preusser Username: ThierryPreusser UserID: 479977 Email Address: preuthier@voila.fr |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Thierry, you provided a lot of info. But I didn't see how much memory you've told BOINC to use. Nor how you determined that disk usage was the bottleneck. If BOINC is configured to not use much of that memory, then BOINC will suspend tasks when it approaches the configured memory limit. So you would see tasks not progressing, but that shouldn't cause disk IO. Just about the only reason running Rosetta tasks would generate a lot of disk activity, to the point that work is bogging down, is if page swapping is occurring. Each task you start must do some loading of standard libraries etc. You have so many CPUs that this may be occurring several times an hour. You could reduce the relative level of overhead per task by running with a longer runtime preference. This is in the Rosetta-specific preferences configured via the project website. Beware, changes to the value will be applied to the tasks you currently have on your machine, so you typically want to reduce the buffer of unfinished work, and change the runtime preference value only gradually. BOINC Manager needs time to see the result and alter its completion time estimates. Rosetta Moderator: Mod.Sense |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Tasks with names starting with 3H22 (sample 597543288 ) are failing immediately with a computation error. Linux Ubuntu/Boinc 7.0.65 Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached ERROR: Illegal value specified for option -run:protocol : abinitio </stderr_txt> ]]> |
TJ Send message Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Tasks with names starting with 3H22 (sample 597543288 ) are failing immediately with a computation error. Linux Ubuntu/Boinc 7.0.65 Same with Windows 7. Greetings, TJ. |
TJ Send message Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
There are enough tasks ready to run according to the server status page, but I am not getting any anymore. Do I need to reset the project again to get new tasks? Guys, when are you updating the obsolete server code of the project? Greetings, TJ. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
You shouldn't have to reset the project to get work. Your host(s) are hidden. Have you had a string of failed work units or some problems downloading? Rosetta Moderator: Mod.Sense |
TJ Send message Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
You shouldn't have to reset the project to get work. Your host(s) are hidden. Have you had a string of failed work units or some problems downloading? No, the answers to both questions are no. This is what it said at Outcome: Client detached. Even the WUs that were still running on the rig were already Client detached. The science done here is important, but sticking to the project becomes harder every time. Greetings, TJ. |
Ananas Send message Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
endo_ae__ results cause (and suffer from) BOINC heartbeat problems, and they do not checkpoint properly on one of my boxes. My guess is that they have very high RAM requirements (this is my internet PC with only 2GB RAM, with Firefox nearly always running, plus one Rosetta task and 3 projects with very low RAM requirements). They should probably be limited to boxes with more than 3GB physical RAM. Unfortunately I could not catch/spy on one just before it crashed, so the RAM thing is only a guess. After the crash the RAM history is lost with the PID so I cannot check the maximum usage. Other result types seem not to be affected. |
Warped Send message Joined: 15 Jan 06 Posts: 48 Credit: 1,788,185 RAC: 0 |
endo_ae__ results cause (and suffer from) BOINC heartbeat problems and they do not checkpoint properly on one of my boxes, my guess is that they have very high RAM requirements (my internet PC with only 2GB RAM, having Firefox nearly always running, one Rosetta task plus 3 projects with very low RAM requirements). They should probably be limited to boxes with more than 3GB physical RAM. Indeed. The endo_ae tasks are terrible: 1. The first checkpoint takes a number of hours. 2. I have at least three which have crashed after a few minutes. 3. The credit from them is poor. In one example over 8 hours for only 20 points. Warped |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Tasks with names starting with vp26_ab_* seem to be causing problems. They don't checkpoint and run until terminated by the watchdog. They validate but only award 20 points. WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. Writing W_0000001 ====================================================== DONE :: 1 starting structures 25408.3 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== called boinc_finish SIGSEGV: segmentation violation Stack trace (2 frames): [0xb2aef87] [0xf77b3400] Exiting... Sample task 605199017 |
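
For anyone who wants to put a number on the "ran for hours, only 20 points" reports above, here is a minimal Python sketch. It assumes the stderr summary always looks like the excerpt in the last post ("DONE :: 1 starting structures 25408.3 cpu seconds" and "This process generated 1 decoys from 1 attempts"); that layout is taken only from this one sample. The function names `summarize_stderr` and `credits_per_cpu_hour` are made up for illustration and are not part of BOINC or Rosetta.

```python
import re

# Patterns are based only on the stderr excerpt quoted in this thread;
# other Rosetta versions may print the summary differently (assumption).
DONE_RE = re.compile(
    r"DONE\s*::\s*(\d+)\s+starting structures\s+([\d.]+)\s+cpu seconds"
)
DECOYS_RE = re.compile(r"generated\s+(\d+)\s+decoys from\s+(\d+)\s+attempts")


def summarize_stderr(text):
    """Pull CPU seconds and decoy counts out of a stderr excerpt.

    Returns a dict, or None if either summary line is missing.
    """
    done = DONE_RE.search(text)
    decoys = DECOYS_RE.search(text)
    if done is None or decoys is None:
        return None
    return {
        "starting_structures": int(done.group(1)),
        "cpu_seconds": float(done.group(2)),
        "decoys": int(decoys.group(1)),
        "attempts": int(decoys.group(2)),
    }


def credits_per_cpu_hour(granted_credit, cpu_seconds):
    """Granted credit divided by CPU hours (hypothetical helper)."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return granted_credit / (cpu_seconds / 3600.0)


if __name__ == "__main__":
    sample = (
        "DONE :: 1 starting structures 25408.3 cpu seconds "
        "This process generated 1 decoys from 1 attempts"
    )
    info = summarize_stderr(sample)
    # 20 is the credit figure reported in the post for this task type.
    print(info, credits_per_cpu_hour(20, info["cpu_seconds"]))
```

Comparing that rate between a suspect task and a normal one on the same host is probably more telling than the raw credit figure, since run time depends on the runtime preference.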
©2024 University of Washington
https://www.bakerlab.org