New Motherboard, old Disk and Rosetta

Winkle

Joined: 22 May 06
Posts: 88
Credit: 1,354,930
RAC: 0
Message 61602 - Posted: 7 Jun 2009, 5:13:11 UTC

Hi,
It has been a long time since I was on here.

I searched the boards for a similar problem but found none.

My Win 2000 Domain Controller (ID 225837) tossed in its motherboard a few weeks ago, so I transferred the hard drive into another machine. With a bit of tweaking and new drivers it all came back properly except BOINC and Rosetta. BOINC would occasionally fail to start with a "dll failed to load" error, and Rosetta would run for days at 100% complete, forcing me to abort the units (I hate doing this).

I ended up uninstalling BOINC 5.10.45 and reinstalling. 5.10.45 was the only one I could find which would run on a domain controller. I am running as a service. It appears to now be OK, and has punched out 2 work units since.

Is there a way of getting a later BOINC version to run on a Domain Controller, and has anyone else noticed this behavior?

Ian

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,380
RAC: 6,544
Message 61604 - Posted: 7 Jun 2009, 12:24:20 UTC - in response to Message 61602.  

I don't know anything about a Domain Controller, but I do know that I am running a Windows Home Server, based on Windows Server 2003, and am using BOINC version 6.4.5 just fine. My server basically gives out IP addresses for my local network and does backups of the machines on it too, and that is all. I am not sure that is relevant to what you do, but the version of BOINC I am using gives me no trouble at all.
Winkle

Joined: 22 May 06
Posts: 88
Credit: 1,354,930
RAC: 0
Message 61705 - Posted: 12 Jun 2009, 5:25:42 UTC

I have been watching the server fairly closely, and this is what I have noticed.
It will do WUs for a few days and then one will crunch to 100%, but instead of stopping at 8 hrs, it stays at 100% while the CPU time keeps increasing.

If I then suspend the project (not the WU) and resume the project, it moves on to the next WU and makes the frozen one "ready to run".

If I stop and restart the BOINC process, the percent complete drops to around 14% and the CPU time drops to 1 hr 40 mins.
The task is lb_thread_all_multi_hb_t305__IGNORE_THE_REST_12722_13_1 using minirosetta version 1.71

Ian
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,380
RAC: 6,544
Message 61707 - Posted: 12 Jun 2009, 9:13:41 UTC - in response to Message 61705.  

I have read about this before but have not personally seen it in a long time. I am not sure there is a fix, as it seems to be a BOINC issue rather than something that happens with only one project.
robertmiles

Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 61801 - Posted: 17 Jun 2009, 3:50:24 UTC - in response to Message 61705.  

With Rosetta@home, the usual behavior for workunits needing more time than the estimate is that they appear to slow down drastically between 10 minutes before the expected time limit and the actual finish time. This is due to problems in estimating the actual time needed to finish.

Workunits sometimes appear to freeze at 100% complete if the actual computations are complete, but the work needed to pack up the results of the computations isn't. In this case, they'll usually finish in a few minutes if you resume them, assuming you chose the "leave in memory" option.

If you restart BOINC, all workunits must restart from their last checkpoint, which can lose a considerable amount of the work done since then.
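To put a rough number on that loss, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and the figures are only an example: they assume the task had reached about 8 hrs of CPU time before the restart and came back at the 1 hr 40 mins checkpoint reported earlier in the thread.

```python
from datetime import timedelta


def work_lost_on_restart(cpu_time_before: timedelta,
                         last_checkpoint: timedelta) -> timedelta:
    """CPU time that has to be redone when a task resumes from its last checkpoint.

    Hypothetical helper for illustration only; it is not part of BOINC.
    """
    if last_checkpoint > cpu_time_before:
        raise ValueError("checkpoint cannot be later than the CPU time already spent")
    return cpu_time_before - last_checkpoint


# Assumed figures: about 8 hrs of CPU time before the restart,
# and 1 hr 40 mins shown after it.
lost = work_lost_on_restart(timedelta(hours=8), timedelta(hours=1, minutes=40))
print(lost)  # 6:20:00
```

In that example, more than six hours of crunching would have to be repeated, which is why suspending and resuming is usually the cheaper thing to try first.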

Also, there's a lockfile problem, most recently described under 1.75, which stops anything being written to the output files, and therefore makes the workunit appear frozen, except for using wall clock time, until the next BOINC restart. I have one of my machines tuned to look for this problem by using only 95% of the CPU time, and the other one tuned to avoid it by using 100% of the CPU time. The problem is not confined to Rosetta@home and RALPH@home, but those are the two BOINC projects that start it the most often. Once started, it then shows up in any BOINC workunits that try to run in the same slot as the one that started it, until BOINC is restarted and cleans up any leftover lockfiles.
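If you want to check for leftover lockfiles yourself, something along these lines might help. It is only a sketch under assumptions: it assumes the BOINC data directory has a "slots" subdirectory with one folder per slot, and that lockfiles have "lock" somewhere in their file name. The function name is made up, and the script only lists files; it does not delete anything.

```python
from pathlib import Path


def find_slot_lockfiles(data_dir, marker="lock"):
    """List files in each slot folder whose name contains `marker`.

    Assumptions (not confirmed in this thread): slot folders live under
    <data_dir>/slots, and lockfile names contain the word "lock".
    """
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []  # no slots folder at this location
    found = []
    for slot in sorted(slots.iterdir()):
        if slot.is_dir():
            found.extend(
                sorted(p for p in slot.iterdir()
                       if p.is_file() and marker in p.name.lower())
            )
    return found
```

Call it with the path to your own BOINC data directory, for example `find_slot_lockfiles(r"C:\path\to\BOINC")` (a placeholder path), and compare what it lists against the slots that are actually running a task.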

I've found that 6.2.28 will run as a service if you install it after telling 5.10.45 to run as a service; something in the install options carries over.

I'm not familiar with running BOINC on a Domain Controller.




©2024 University of Washington
https://www.bakerlab.org