Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

seybernetx
Joined: 16 Aug 10
Posts: 5
Credit: 1,520
RAC: 0
Message 75731 - Posted: 10 Jun 2013, 0:33:32 UTC - in response to Message 75730.  

6/9/2013 7:22:26 PM | rosetta@home | update requested by user
6/9/2013 7:22:32 PM | rosetta@home | Sending scheduler request: Requested by user.
6/9/2013 7:22:32 PM | rosetta@home | Not requesting tasks: don't need
6/9/2013 7:22:34 PM | rosetta@home | Scheduler request completed
6/9/2013 7:22:34 PM | rosetta@home | General prefs: from rosetta@home (last modified 07-Jun-2013 13:37:59)
6/9/2013 7:22:34 PM | rosetta@home | Computer location: home
6/9/2013 7:22:34 PM | rosetta@home | General prefs: no separate prefs for home; using your defaults

----------------

Best I can tell, Rosetta thinks my system doesn't need any work. Not clear why.

Having all four projects share the machine worked fine for more than a month, then all of a sudden, splat.

The web page for your host indicates it has not contacted the server (i.e., has not requested any work) since the 6th. So, as GregBE suggests, either your machine feels it already has enough work from other projects, or the Rosetta project is set to "no new tasks". Check the Status column on the Projects tab of BOINC Manager.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75733 - Posted: 10 Jun 2013, 1:08:48 UTC

...and now it says June 10.

So, until you hit the update button, the BOINC Manager did not feel it needed to contact the project scheduler.
Rosetta Moderator: Mod.Sense
seybernetx
Joined: 16 Aug 10
Posts: 5
Credit: 1,520
RAC: 0
Message 75740 - Posted: 10 Jun 2013, 17:16:41 UTC - in response to Message 75733.  

HUH??

mod.sense, what on earth are you talking about?

Rosetta has no work units at all on my machine. When I force an update, rosetta insists "Not requesting tasks: don't need".

Your response is to point out that Rosetta stores time/date info in UTC time, not local time. Quite true, and utterly beside the point, that is.

dan

PS: FWIW, I'm at UTC-5 or UTC-6, pending Daylight Saving Time status.
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75742 - Posted: 10 Jun 2013, 20:07:05 UTC

My point was that June 10 (UTC) was the first time in days that your machine had contacted the project. It still didn't request any work, but at least contact was made (that's the timestamp shown on the web page I mentioned). This tends to confirm that you have neither a network problem nor a configuration problem. BOINC Manager was simply not asking for any Rosetta work, and that is why you don't have any.
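To make the timestamp point concrete: the client event log above is in local time, while the host page on the website shows UTC. The minimal Python sketch below converts the manual update logged at 7:22:32 PM on 6/9 into UTC. It assumes the US-style log format shown above and the UTC-5/UTC-6 offsets mentioned later in the thread; the function name `log_time_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def log_time_to_utc(stamp: str, utc_offset_hours: int) -> datetime:
    """Convert a BOINC Manager event-log timestamp (local time) to UTC.

    Assumes the 'M/D/YYYY H:MM:SS AM/PM' format seen in the log above.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p").replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    # Assumed offsets: UTC-5 (daylight saving time) and UTC-6 (standard time).
    for offset in (-5, -6):
        print(offset, log_time_to_utc("6/9/2013 7:22:32 PM", offset).isoformat())
```

With either offset the manual update lands on 10 June 2013 in UTC (00:22:32 or 01:22:32), which matches the date the host page started showing.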

From your stats, it looks like BOINC probably does not get to run very many hours per day. And so it is being pretty conservative about getting work, because it's not sure how many hours to expect to be running on a given day.
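A toy model shows why a machine that runs only part of the day can log "don't need": if the client stretches its queued work by the fraction of the day it expects to be running, a small queue can already look like a full buffer. This is a simplified sketch under that assumption, not the BOINC client's actual work-fetch code. The function name, parameters and host figures are all hypothetical.

```python
def buffer_shortfall_hours(buffer_days: float, n_cpus: int,
                           queued_cpu_hours: float, on_fraction: float) -> float:
    """Toy model: CPU-hours of work still missing from the client's buffer.

    on_fraction is the share of the day BOINC is running (0 < on_fraction <= 1).
    Queued work is stretched by 1/on_fraction because the client expects to get
    only that fraction of each wall-clock hour to process it.
    """
    if not 0 < on_fraction <= 1:
        raise ValueError("on_fraction must be in (0, 1]")
    target = buffer_days * 24 * n_cpus              # buffer size in CPU-hours of wall time
    queued_wall = queued_cpu_hours / on_fraction    # how long the queue is expected to last
    return max(0.0, target - queued_wall)


if __name__ == "__main__":
    # Hypothetical host: 4 CPUs, half-day buffer, 12 CPU-hours queued from other projects.
    for on_fraction in (1.0, 0.25):
        need = buffer_shortfall_hours(0.5, 4, 12.0, on_fraction)
        print(f"on_fraction={on_fraction}: shortfall={need} h ->",
              "request work" if need > 0 else "don't need")
```

With the same 12 CPU-hours queued, the full-time host in this model still wants 36 hours of work, while the host running a quarter of the day considers its buffer full and asks for nothing.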
Rosetta Moderator: Mod.Sense
seybernetx
Joined: 16 Aug 10
Posts: 5
Credit: 1,520
RAC: 0
Message 75744 - Posted: 10 Jun 2013, 23:25:07 UTC - in response to Message 75742.  

mod.sense, you kind of remind me of The Good Old Days, back when I was a customer of New Jersey Bell. Whenever there was a problem, I would call, and the service people would insist everything was working fine, there was never anything wrong, ever, no matter how much I argued. Then POOF! the problem would magically disappear, typically by the next afternoon.

Cheers....
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75751 - Posted: 11 Jun 2013, 21:06:58 UTC

LOL! I've had the same phone company experience!

Let me know if it's all still unclear.
Rosetta Moderator: Mod.Sense
seybernetx
Joined: 16 Aug 10
Posts: 5
Credit: 1,520
RAC: 0
Message 75752 - Posted: 12 Jun 2013, 16:12:18 UTC - in response to Message 75751.  

Nope, mod.sense, things are working fine.

Ever since your post insisting that all the problems were on my end, BOINC has been downloading and processing an average of about one Rosetta work unit a day.

Thanks to whoever fixed the non-existent problem.

cheers...
Mod.Sense
Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75754 - Posted: 12 Jun 2013, 21:25:56 UTC

Your machine got all of its projects back into balance with regard to your resource shares, so it will now run a balanced amount of tasks from your new project mix.
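The idea of projects getting "back into balance" can be illustrated with a toy model: the client favours whichever project is furthest below its resource share, so a project that got more than its share earlier is skipped until the others catch up. This sketch uses a simple share-minus-usage deficit, which is an assumption made for illustration; BOINC 7's actual scheduling calculation is more involved. The three non-Rosetta project names are hypothetical, since the thread does not name them.

```python
def most_owed_project(shares: dict[str, float], recent_usage: dict[str, float]) -> str:
    """Toy model: pick the project furthest below its resource share.

    shares: resource share per project (any units, e.g. 100 each).
    recent_usage: recent CPU time per project (any consistent units).
    """
    total_share = sum(shares.values())
    total_usage = sum(recent_usage.values()) or 1.0  # avoid dividing by zero on a fresh host

    def deficit(name: str) -> float:
        # Positive when a project has recently received less than its share.
        return shares[name] / total_share - recent_usage.get(name, 0.0) / total_usage

    return max(shares, key=deficit)


if __name__ == "__main__":
    shares = {"rosetta": 100, "project_b": 100, "project_c": 100, "project_d": 100}
    # Rosetta ran heavily earlier, so the other projects are owed time first...
    print(most_owed_project(shares, {"rosetta": 60, "project_b": 10, "project_c": 10, "project_d": 10}))
    # ...and once they have caught up, Rosetta is owed time again.
    print(most_owed_project(shares, {"rosetta": 0, "project_b": 30, "project_c": 30, "project_d": 30}))
```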
Rosetta Moderator: Mod.Sense
Cartoonman
Joined: 9 Oct 08
Posts: 13
Credit: 7,220,538
RAC: 0
Message 75761 - Posted: 14 Jun 2013, 14:25:52 UTC

This one ran for a very long time, and all I got out of it was 20 credits. :I

https://boinc.bakerlab.org/rosetta/result.php?resultid=587016069

Apparently there was a cos/sin out-of-bounds error, but instead of erroring out the WU, it kept running at full load for 27 hours, logging the same error over and over. That would explain why, after crunching for nearly 7 hours, it still hadn't made a checkpoint.
mikey
Joined: 5 Jan 06
Posts: 1847
Credit: 5,974,953
RAC: 347
Message 75763 - Posted: 15 Jun 2013, 21:14:15 UTC

I am running the cryo units and this is happening:
25,796.25 186.91 163.79
https://boinc.bakerlab.org/rosetta/result.php?resultid=587401709

Other cryo units are doing this:
10,449.38 75.71 167.81
https://boinc.bakerlab.org/rosetta/result.php?resultid=587394445

Twice the run time and NO more credits. These darned cryo units had problems in the past; should I start aborting them AGAIN?!!!
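Putting the two results on a per-hour basis makes the complaint concrete. This sketch assumes the first figure on each line is CPU time in seconds and the last is granted credit; the post does not label the columns, so that mapping is an assumption. The function name is hypothetical.

```python
def credit_per_cpu_hour(cpu_seconds: float, granted_credit: float) -> float:
    """Granted credit divided by CPU time converted to hours."""
    return granted_credit / (cpu_seconds / 3600.0)


if __name__ == "__main__":
    # Figures copied from the two cryo results above (column meanings assumed).
    tasks = {"587401709": (25_796.25, 163.79), "587394445": (10_449.38, 167.81)}
    for result_id, (cpu_s, credit) in tasks.items():
        print(result_id, f"{credit_per_cpu_hour(cpu_s, credit):.1f} credits/CPU-hour")
```

Under that assumption the longer task earned about 22.9 credits per CPU-hour against about 57.8 for the shorter one, i.e. well under half the rate.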
Thierry Preusser
Joined: 5 Aug 13
Posts: 1
Credit: 526,503
RAC: 0
Message 75911 - Posted: 7 Aug 2013, 21:21:20 UTC

Hello,
For some reason, Rosetta demands intense disk access, which considerably slows down BOINC processing overall. I run BOINC 7.0.64 (x64) on Windows 7 64-bit. I'm running the following projects: SETI@home, Asteroids@home, Cosmology@home, Climateprediction.net, LHC@home 1.0, MindModeling@beta and finally Rosetta@home. Each project has a resource share of 10, except SETI, which has 50. I run these projects on three computers, but only the most powerful one has problems.
Here is a brief summary of my machine's configuration:
Operating System: Microsoft Windows 7 Professional
Version: 6.1.7601 Service Pack 1 Build 7601
System Type: x64-based PC
Processor: Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz, 3334 MHz, 6 core(s), 12 logical processor(s)
BIOS Version/Date: American Megatrends Inc. 0602, 5/10/10
SMBIOS Version: 2.5
Physical Memory (RAM): 24.0 GB
Total Physical Memory: 24.0 GB
Available Physical Memory: 16.6 GB
Total Virtual Memory: 27.9 GB
Available Virtual Memory: 19.9 GB
Page File Space: 3.91 GB
Page File: C:\pagefile.sys
BOINC has 40 GB of disk space for its data folder. It can use 100% of the CPUs and 100% of CPU time. It is permanently connected to the network.
Rosetta has downloaded a total of 2.04 GB of data to process, out of 10.51 GB for all projects combined.
What happens is that when three or six Rosetta tasks are running, CPU activity in Windows Task Manager drops drastically. It can even go an hour with almost no activity while 12 tasks are marked as running. When I suspend Rosetta in BOINC Manager, the activity of all 12 processors soon returns to 100%.
I'll keep watching how Rosetta behaves on my BOINC setup for a few more days. I started computing for Rosetta on August 5.
Do you have an explanation or a suggestion?
Thank you for your reply.
Thierry Preusser
Username: ThierryPreusser
UserID: 479977
Email Address: preuthier@voila.fr
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75924 - Posted: 10 Aug 2013, 0:49:26 UTC

Thierry, you provided a lot of info, but I didn't see how much memory you've told BOINC to use, nor how you determined that disk usage was the bottleneck. If BOINC is configured to not use much of that memory, then BOINC will suspend tasks when it approaches the configured memory limit. So you would see tasks not progressing, but that shouldn't cause disk I/O. Just about the only reason running Rosetta tasks would generate a lot of disk activity, to the point that work is bogging down, is if page swapping is occurring.
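The difference between a memory cap and page swapping can be sketched with a toy model of BOINC's "use at most X% of memory" preference: tasks that would push past the budget are held back rather than started, so they simply stop progressing without generating disk I/O. The greedy first-fit rule, the function name and the 1.1 GB per-task figure below are hypothetical illustrations, not BOINC's actual algorithm or measured Rosetta memory use.

```python
def tasks_that_fit(ram_gb: float, max_used_pct: float,
                   tasks: list[tuple[str, float]]) -> tuple[list[str], list[str]]:
    """Toy model of a memory cap: start tasks in order until the budget is used.

    Returns (running, waiting). Tasks that don't fit are held back instead of
    being started and pushed into the page file.
    """
    budget = ram_gb * max_used_pct / 100.0
    running, waiting, used = [], [], 0.0
    for name, need_gb in tasks:
        if used + need_gb <= budget:
            running.append(name)
            used += need_gb
        else:
            waiting.append(name)
    return running, waiting


if __name__ == "__main__":
    # 24 GB host, hypothetical 50% cap, 12 tasks of a made-up 1.1 GB each.
    running, waiting = tasks_that_fit(24.0, 50.0, [(f"task_{i}", 1.1) for i in range(12)])
    print(len(running), "running,", len(waiting), "held back")
```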

Each task you start must do some loading of standard libraries, etc. You have so many CPUs that this may be occurring several times an hour. You could reduce the relative overhead per task by running with a longer runtime preference. This is in the Rosetta-specific preferences configured via the project website. Beware: changes to the value will be applied to the tasks you currently have on your machine, so you typically want to reduce your buffer of unfinished work and change the runtime preference only gradually. BOINC Manager needs time to see the results and adjust its completion time estimates.
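The arithmetic behind "several times an hour" is simple enough to sketch. If every one of Thierry's 12 logical CPUs ran Rosetta tasks back to back, new tasks would start at a rate of CPUs divided by runtime, and the one-off loading cost would be a fixed slice of each task. The 5-minute startup cost below is a made-up placeholder, not a measured figure, and the function names are hypothetical.

```python
def task_starts_per_hour(n_cpus: int, runtime_hours: float) -> float:
    """With every CPU running back-to-back tasks, how many new tasks start per hour."""
    return n_cpus / runtime_hours


def startup_overhead_fraction(startup_minutes: float, runtime_hours: float) -> float:
    """Share of each task's time spent on one-off loading (libraries, databases)."""
    return (startup_minutes / 60.0) / runtime_hours


if __name__ == "__main__":
    # 12 logical CPUs as in Thierry's report; the 5-minute startup is a placeholder.
    for runtime in (3.0, 6.0, 12.0):
        print(f"{runtime:>4} h runtime: {task_starts_per_hour(12, runtime):.1f} starts/hour, "
              f"overhead {startup_overhead_fraction(5, runtime):.1%}")
```

Doubling the runtime preference halves both the number of task starts per hour and the share of each task spent on startup, which is the effect Mod.Sense describes.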
Rosetta Moderator: Mod.Sense
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 10,492,791
RAC: 4,626
Message 75927 - Posted: 11 Aug 2013, 16:28:09 UTC

Tasks with names starting with 3H22 (sample task 597543288) are failing immediately with a computation error. Linux Ubuntu/BOINC 7.0.65

Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
ERROR: Illegal value specified for option -run:protocol : abinitio

</stderr_txt>
]]>
TJ
Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 75928 - Posted: 12 Aug 2013, 10:21:31 UTC - in response to Message 75927.  

Same with Windows 7.
Greetings,
TJ.
TJ
Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 76029 - Posted: 5 Sep 2013, 17:34:48 UTC

There are enough tasks ready to send according to the server status page, but I am not getting any anymore. Do I need to reset the project again to get new tasks?

Guys, when are you going to update the project's obsolete server code?
Greetings,
TJ.
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 76033 - Posted: 6 Sep 2013, 4:31:41 UTC

You shouldn't have to reset the project to get work. Your host(s) are hidden. Have you had a string of failed work units or some problems downloading?
Rosetta Moderator: Mod.Sense
TJ

Posts: 127
Credit: 4,799,890
RAC: 0
Message 76034 - Posted: 6 Sep 2013, 8:24:13 UTC - in response to Message 76033.  

No, the answer to both questions is no. The Outcome column said: Client detached.

Even the WUs that were still running on the rig were already marked Client detached.
The science done here is important, but sticking to the project becomes harder every time.
Greetings,
TJ.
Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 76053 - Posted: 19 Sep 2013, 8:35:09 UTC
Last modified: 19 Sep 2013, 8:39:44 UTC

endo_ae__ results cause (and suffer from) BOINC heartbeat problems, and they do not checkpoint properly on one of my boxes. My guess is that they have very high RAM requirements (my internet PC has only 2 GB of RAM, with Firefox nearly always running, one Rosetta task, plus 3 projects with very low RAM requirements). They should probably be limited to boxes with more than 3 GB of physical RAM.

Unfortunately I could not catch/spy on one just before it crashed, so the RAM thing is only a guess. After the crash the RAM history is lost with the PID so I cannot check the maximum usage. Other result types seem not to be affected.
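One way around losing the memory history with the PID is to log each new peak to disk while the task is still alive. On Linux, `/proc/<pid>/status` reports the peak resident set size as `VmHWM`; the sketch below polls it and appends every new peak to a file, so the last value survives a crash. It assumes a Linux host (the post does not say which OS the box runs), and the function names, log path and sample status text are hypothetical.

```python
import re
import time
from pathlib import Path
from typing import Optional

VMHWM = re.compile(r"^VmHWM:\s+(\d+)\s+kB", re.MULTILINE)


def peak_rss_kib(status_text: str) -> Optional[int]:
    """Extract the peak resident set size (VmHWM) from /proc/<pid>/status text."""
    match = VMHWM.search(status_text)
    return int(match.group(1)) if match else None


def watch_peak_memory(pid: int, log_path: str, interval_s: float = 30.0) -> int:
    """Append each new memory peak of `pid` to a log file until the process exits.

    Linux only. Every new peak is written to disk immediately, so the last
    value survives even if the task crashes and its PID disappears.
    """
    status_file = Path(f"/proc/{pid}/status")
    peak = 0
    while True:
        try:
            value = peak_rss_kib(status_file.read_text())
        except OSError:  # process has exited (file gone or no such process)
            break
        if value is not None and value > peak:
            peak = value
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} pid={pid} VmHWM={peak} kB\n")
        time.sleep(interval_s)
    return peak
```

Started with the PID of a running Rosetta process, e.g. `watch_peak_memory(12345, "rosetta_peak.log")` (PID hypothetical), the log keeps the highest value reached before the crash.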
Warped
Joined: 15 Jan 06
Posts: 47
Credit: 1,586,400
RAC: 181
Message 76054 - Posted: 19 Sep 2013, 9:19:33 UTC - in response to Message 76053.  

Indeed. The endo_ae tasks are terrible:
1. The first checkpoint takes a number of hours.
2. I have at least three which have crashed after a few minutes.
3. The credit from them is poor. In one example, over 8 hours for only 20 points.
Warped

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 10,492,791
RAC: 4,626
Message 76056 - Posted: 22 Sep 2013, 0:25:56 UTC

Tasks with names starting with vp26_ab_* seem to be causing problems. They don't checkpoint and run until terminated by the watchdog. They validate but only award 20 points.

WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 25408.3 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (2 frames):
[0xb2aef87]
[0xf77b3400]

Exiting...


Sample task 605199017
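The stderr excerpt above already contains the key numbers: one decoy in roughly 7 CPU-hours, a missing output file, and a segfault logged after `boinc_finish`. A small parser can pull those indicators out of any task's stderr for comparison. The regular expressions match only the exact lines shown in this excerpt, and the function name is hypothetical. Reading one decoy as the reason for the low credit assumes credit scales with decoys returned, which the thread does not state.

```python
import re

DONE_RE = re.compile(r"DONE :: (\d+) starting structures ([\d.]+) cpu seconds")
DECOYS_RE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def summarize_stderr(stderr: str) -> dict:
    """Pull a few health indicators out of a Rosetta stderr_txt like the one above."""
    done = DONE_RE.search(stderr)
    decoys = DECOYS_RE.search(stderr)
    finish_at = stderr.find("called boinc_finish")
    segv_at = stderr.find("SIGSEGV")
    return {
        "cpu_hours": float(done.group(2)) / 3600.0 if done else None,
        "decoys": int(decoys.group(1)) if decoys else None,
        # A SIGSEGV logged after boinc_finish suggests a crash during shutdown.
        "segfault_after_finish": finish_at >= 0 and segv_at > finish_at,
        "missing_output": "could not open file" in stderr,
    }


if __name__ == "__main__":
    excerpt = """WARNING! cannot get file size for default.out.gz: could not open file.
DONE :: 1 starting structures 25408.3 cpu seconds
This process generated 1 decoys from 1 attempts
called boinc_finish
SIGSEGV: segmentation violation"""
    print(summarize_stderr(excerpt))
```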

©2021 University of Washington
https://www.bakerlab.org