Posts by Stephen Miller

1) Message boards : Number crunching : Multiple Computation Errors (Message 75315)
Posted 4 Apr 2013 by Stephen Miller
Post:
I have a different cause for all tasks ending up as Computation Error. I'm running a Windows 7 64-bit PC using an overclocked i5 3570K. If I overclock over 4.5GHz, all tasks end in Computation Error. At this overclock the PC is otherwise stable and runs other applications without apparent errors.

If I throttle back to 4.4GHz, the Rosetta tasks run normally without errors.

Any ideas about the Computation Errors at the 4.5GHz overclock?


Try running the Prime95 torture test from this website: http://www.mersenne.org/freesoft/

I remember reading on their website that an overclocked computer can appear to be stable while outputting garbage for scientific work. Hence the torture test. If you can't get Prime95 to run flawlessly for hours/days, you are unstable.

Once the torture test has run for hours/days without errors, you have a stable system.

In my testing, running successfully for 4+ hours is a leading indicator of stability. Longer is better, especially if your ambient temperatures vary over time; for example, a system may pass while the room is cool and fail when the room is hot.
2) Message boards : Number crunching : Client errors (Message 75287)
Posted 24 Mar 2013 by Stephen Miller
Post:
I nominate JacobKlein for the Rosetta@home HERO AWARD, or some method on the FRONT PAGE, to acknowledge his persistent effort to get credit where credit is due (double entendre intended).

Crunching since 18 Sep 2005
3) Message boards : Number crunching : 3.43 is causing pop-ups (Message 74483)
Posted 20 Nov 2012 by Stephen Miller
Post:
A BOINC project message would have been a good way to let everyone (who's running a current BOINC, anyway) know about the announcement that's on the main page.

I updated BOINC about the same time Rosetta went into stupid mode and thought it was a new "feature" of BOINC. Today I see that it's a 3.43 issue. Thanks to Microsoft training, we have learned to live with bugs and issues until they are resolved. I'm embarrassed.

Never in the field of human conflict was so much owed by so many to so few - Winston Churchill

BOINC version: Never in the field of distributed computing was so much wasted by so many for so little credit.
4) Message boards : Number crunching : minirosetta_3.17_windows_X86_64.exe fails to download (Message 71728)
Posted 3 Dec 2011 by Stephen Miller
Post:
I recently added a computer to Rosetta and have tried to download two work units.

The units have downloaded, but minirosetta 3.17 fails to finish downloading. Both attempts hung at about 94%.



The problem seems to be with my connection to the ethernet at work. It downloaded fine when I took the laptop home and connected there. It never did finish at work.

The weird part is that I downloaded nearly 2 GB of files without any issues until Rosetta. Such is life.
5) Message boards : Number crunching : minirosetta_3.17_windows_X86_64.exe fails to download (Message 71720)
Posted 2 Dec 2011 by Stephen Miller
Post:

That's odd. You can download it manually here for x64, if that's any use:

http://boinc.bakerlab.org/rosetta/download/minirosetta_3.17_windows_x86_64.exe

I can't find the location of the 32-bit version, but it should be boinc.bakerlab.org/rosetta/download/ and then the file name...

I think that will work if you shut BOINC down (and check "stop running science app..."), save that file to the projects\boinc.bakerlab.org_rosetta folder, and then restart BOINC.


Where should the file be located? I'm not near a computer to hunt for the final file folder.


I used the above link to manually download the file. Two attempts to download the file have failed. Win 7 allows me to resume a manual download. So far, both attempts are "Resuming..." without results.
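
For anyone who would rather script the manual download than rely on the browser's resume button, here is a minimal Python sketch of resuming a partial file with an HTTP Range request. It assumes the server honours range requests, which has not been verified for boinc.bakerlab.org; the function names and the local file path are hypothetical.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a request that asks only for the bytes not yet on disk."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        # Ask the server to start sending at the first missing byte.
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


def resume_download(url, partial_path, chunk_size=64 * 1024):
    """Append the remaining bytes to partial_path, or restart if ranges are ignored."""
    request, _offset = build_resume_request(url, partial_path)
    # Note: if the file on disk is already complete, the server may answer 416
    # and urlopen raises urllib.error.HTTPError; that case is not handled here.
    with urllib.request.urlopen(request, timeout=60) as response:
        # 206 means the Range header was honoured; 200 means the whole file is being resent.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling `resume_download("http://boinc.bakerlab.org/rosetta/download/minirosetta_3.17_windows_x86_64.exe", "minirosetta_3.17_windows_x86_64.exe")` repeatedly would pick up where the previous attempt stopped, provided the connection gets far enough to write anything at all.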
6) Message boards : Number crunching : minirosetta_3.17_windows_X86_64.exe fails to download (Message 71719)
Posted 2 Dec 2011 by Stephen Miller
Post:
BOINC will automatically retry any downloads that "hang".


It hung for 24 hours. Is that normal?
7) Message boards : Number crunching : minirosetta_3.17_windows_X86_64.exe fails to download (Message 71718)
Posted 2 Dec 2011 by Stephen Miller
Post:

That's odd. You can download it manually here for x64, if that's any use:

http://boinc.bakerlab.org/rosetta/download/minirosetta_3.17_windows_x86_64.exe

I can't find the location of the 32-bit version, but it should be boinc.bakerlab.org/rosetta/download/ and then the file name...

I think that will work if you shut BOINC down (and check "stop running science app..."), save that file to the projects\boinc.bakerlab.org_rosetta folder, and then restart BOINC.


Where should the file be located? I'm not near a computer to hunt for the final file folder.
8) Message boards : Number crunching : minirosetta_3.17_windows_X86_64.exe fails to download (Message 71714)
Posted 2 Dec 2011 by Stephen Miller
Post:
I recently added a computer to Rosetta and have tried to download two work units.

The units have downloaded, but minirosetta 3.17 fails to finish downloading. Both attempts hung at about 94%.



9) Message boards : Number crunching : Many Problems (Message 29157)
Posted 11 Oct 2006 by Stephen Miller
Post:
murky

If memtest runs without error I may give Rosetta another try. I'm having no problems on the other system.

memtest86 ran for 21 hours before I shut it down. 0 errors.
This system will go back to F@H for Team Helix and the Pentium 4 box will work on Rosetta.

Thanks again for the input.
murky
[/quote]

In the last few weeks, I upgraded one system with a new CPU and "better" memory. When I first started it up, Einstein was happy but Rosetta would fail. I ran Prime95 and it failed. I changed the memory timings several times and ran Prime95 between tweaks. Prime95 would either fail within an hour or run for several hours. I find that a 4 hour test is usually sufficient. However, when I ran Rosetta, it would complete a few work units successfully, eventually fail, and then succeed again. When I ran Einstein, it wouldn't fail. I even had Prime95 run for 18 hours without failing, then ran Rosetta, which completed some work units and then failed again. Out of desperation, I bought some more memory. It could just as easily have been a bad CPU or a power supply unable to run the new CPU.

The memory was the problem. The "better" memory wasn't better (PC3500 instead of PC3200, both good name brands). The better memory was unstable. I ran the memory with settings so relaxed that I swear I saw some bits stretched out on the sofa. Running the specified tighter memory settings didn't make it more reliable. I didn't try overvoltage since I was running stock speeds.

Prime95 is not PROOF that a system is stable. Prime95 is like the Olympics. The Olympics only proves who is the best athlete in that sport for that day. Prime95 only shows the results for as long as you let it run or it finds an error. However, Prime95 is still a good indicator. But I now look at it with less certainty.

I now consider Rosetta a better indicator of a stable system. If it has errors, the system is unstable. There were no other symptoms of "bad memory".

Oh, and good luck.

Stephen
10) Message boards : Number crunching : Help us solve the 1% bug! (Message 12602)
Posted 24 Mar 2006 by Stephen Miller
Post:


I've got a stuck unit too.

FA_RLXpt_hom004_1ptq_361_27_0 is stuck at 8.63% at 48:41:25 CPU time in BOINC.

Per the instructions at the bottom of this thread, I launched:

rosetta_4.82_windows_intelx86.exe xx 1ptq _ -output_silent_gz -silent -increase_cycles 10 -relax_score_filter -new_centroid_packing -abrelax -output_chi_silent -stringent_relax -vary_omega -omega_weight 0.5 -farlx -ex1 -ex2 -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -no_filters -nstruct 10 -protein_name_prefix hom004_ -frags_name_prefix hom004_ -filter1 -45 -filter2 -55 -termini -cpu_run_time 7200 -constant_seed -jran 2484844

which ran for 19 minutes (started with 18 minutes = 37 minutes total) and stuck at 22.7%, Stage: Full atom relax, Model 2, step 255492. There is no graphic movement and no step changes.

CPU time is now 0 hr 48 min 0 sec.

Hope this helps.

I have a screen shot of the BOINC application if desired.

I am restarting BOINC to see if it will finish.

On this particular computer, Rosetta is the only project being processed.

update - after a reboot, BOINC is continuing to process the unit. It is currently at 20 minutes 27 secs and at model 3 step 67000+. It took only 10 minutes to get to this point.

Stephen



Hi Stephen, so on your computer the identical work unit does not get stuck at the same point when you run it outside BOINC? Are other people seeing this as well? Thanks, David


Correct, it hung at a different place when run outside BOINC. And it hung at a different place within BOINC too.
11) Message boards : Number crunching : Help us solve the 1% bug! (Message 12521)
Posted 22 Mar 2006 by Stephen Miller
Post:
[quote]
I've got a stuck unit too.

FA_RLXpt_hom004_1ptq_361_27_0 is stuck at 8.63% at 7:28:12 CPU time in BOINC.

[/quote]

It hung again at 60.52% on Model 9, step 237186.

It had the same random seed as earlier before I dumped it.

I've aborted it and moved on.

This is the first one that failed to complete after a reboot.

12) Message boards : Number crunching : Help us solve the 1% bug! (Message 12501)
Posted 22 Mar 2006 by Stephen Miller
Post:


as long as the graphics show movement, the calculation is proceeding, so best to stick with it..



I've got a stuck unit too.

FA_RLXpt_hom004_1ptq_361_27_0 is stuck at 8.63% at 48:41:25 CPU time in BOINC.

Per the instructions at the bottom of this thread, I launched:

rosetta_4.82_windows_intelx86.exe xx 1ptq _ -output_silent_gz -silent -increase_cycles 10 -relax_score_filter -new_centroid_packing -abrelax -output_chi_silent -stringent_relax -vary_omega -omega_weight 0.5 -farlx -ex1 -ex2 -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -no_filters -nstruct 10 -protein_name_prefix hom004_ -frags_name_prefix hom004_ -filter1 -45 -filter2 -55 -termini -cpu_run_time 7200 -constant_seed -jran 2484844

which ran for 19 minutes (started with 18 minutes = 37 minutes total) and stuck at 22.7%, Stage: Full atom relax, Model 2, step 255492. There is no graphic movement and no step changes.

CPU time is now 0 hr 48 min 0 sec.

Hope this helps.

I have a screen shot of the BOINC application if desired.

I am restarting BOINC to see if it will finish.

On this particular computer, Rosetta is the only project being processed.

update - after a reboot, BOINC is continuing to process the unit. It is currently at 20 minutes 27 secs and at model 3 step 67000+. It took only 10 minutes to get to this point.

Stephen
13) Message boards : Number crunching : Report stuck & aborted WU here please (Message 11661)
Posted 5 Mar 2006 by Stephen Miller
Post:
I have had two work units stuck at 1% recently.

This one wasted 50+ hours but finished after one BOINC restart:
2/25/2006 1:50:25 AM|rosetta@home|Resuming computation for result PRODUCTION_ABINITIO_DBFLAGS_1aiu__307_294_1 using rosetta version 482

This one wasted 15 hours before I restarted BOINC. Now it is at 45:00 minutes and still at 1% and showing 8:00:00 to complete:
3/4/2006 3:27:35 PM|rosetta@home|Resuming computation for result HB_BARCODE_30_2chf__347_425_0 using rosetta version 482

I plan to reboot and restart to see if it will complete.
Update - It has now passed 1% and I expect it to finish.


Stephen M
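
The two message-log lines quoted above follow a pipe-delimited `timestamp|project|message` layout, so stuck-unit reports like this could be collected with a small script. The following is only a sketch in Python: it assumes US-style month/day/year timestamps (as in the quoted lines) and handles only the "Resuming computation" message shown here; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches only the message type quoted in the post; other messages return None.
RESUME_PATTERN = re.compile(
    r"Resuming computation for result (?P<result>\S+) "
    r"using rosetta version (?P<version>\d+)"
)


def parse_message_line(line):
    """Split a 'timestamp|project|message' line into typed fields."""
    timestamp_text, project, message = line.strip().split("|", 2)
    # Assumed format: month/day/year with a 12-hour clock and AM/PM marker.
    timestamp = datetime.strptime(timestamp_text, "%m/%d/%Y %I:%M:%S %p")
    return timestamp, project, message


def resumed_result(message):
    """Return (result_name, app_version) for a resume message, else None."""
    match = RESUME_PATTERN.search(message)
    if match is None:
        return None
    return match.group("result"), int(match.group("version"))
```

Comparing the timestamps of successive resume messages for the same result name gives a rough idea of how long a unit sat stuck between restarts.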






©2024 University of Washington
https://www.bakerlab.org