Posts by Christian Barrett

1) Message boards : Number crunching : Report stuck & aborted WU here please - II (Message 13515)
Posted 12 Apr 2006 by Christian Barrett
Post:
Here is one that cost me dearly:

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=13792169

10 Apr 2006 22:50:35 UTC | 12 Apr 2006 3:54:52 UTC | Over | Client error | Done | 70,199.00 | 105.31 | ---
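
To put that row in perspective, here is a rough sketch of the arithmetic in Python. It assumes the first two values are the sent and reported times and that 70,199.00 is CPU time in seconds; the row itself carries no column labels, so that reading is a guess. The helper name `format_seconds` is made up for illustration.

```python
from datetime import datetime

def format_seconds(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes and seconds."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"

# Assumption: the 70,199.00 figure is CPU time in seconds.
cpu_time = format_seconds(70199.00)

# Assumption: the two timestamps are "sent" and "reported" (both UTC).
sent = datetime(2006, 4, 10, 22, 50, 35)
reported = datetime(2006, 4, 12, 3, 54, 52)
turnaround = format_seconds((reported - sent).total_seconds())

print("CPU time:  ", cpu_time)
print("Turnaround:", turnaround)
```

Under those assumptions, about 19.5 hours of CPU time went into a result that ended in a client error.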
2) Message boards : Number crunching : Workunit series 1hz6a (Message 3423)
Posted 16 Nov 2005 by Christian Barrett
Post:
Christian,

This is probably due to the increased size of the work units and/or having an older client version. If you are running multiple projects, be sure to keep the application in memory and set the "Switch between applications every" option in your general preferences to at least 2 hours. You may want to try the most recent version of the BOINC client. I have reduced the size of new work units, but there will likely be large work units in the future (larger proteins, longer methods, etc.).


OK, thanks. I am upgrading to the new 5.2.* tonight. I didn't want to upgrade earlier because I was running a spinup for another project and was worried about stability, but they assured me it won't crash. We shall see.
3) Message boards : Number crunching : Workunit series 1hz6a (Message 3254)
Posted 15 Nov 2005 by Christian Barrett
Post:
Yes, I've got one of the 1hz6a work units and it's got to 90% after 3 hours 30 mins. Most of the other work units have taken just over an hour.
I wonder if we could be given some idea how long a WU is going to take in comparison with the original WUs that were given out. The project team must have some idea how complex each different protein WU is.


Mine went for 4 hours and 40 min before I got this:
11/14/2005 5:48:49 PM|rosetta@home|Unrecoverable error for result 1hz6A_abrelaxmode_random_length20_jitter02_omega_sim_aneal_03386_0 ( - exit code -164 (0xffffff5c))

That's 4 failures out of 4 different units. They also failed at various times during the run.

The unit is located here:
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=1680973
4) Message boards : Number crunching : Workunit series 1hz6a (Message 3226)
Posted 14 Nov 2005 by Christian Barrett
Post:
Mine is stuck at 70% for the past 6 hours. Never had any problems before.


Well, I have had 3 fail on me now. The fourth has run twice, if not three times, as long as the previous set but hasn't hit a client error yet. We shall see.
5) Message boards : Number crunching : Workunit series 1hz6a (Message 3165)
Posted 14 Nov 2005 by Christian Barrett
Post:
I just received new workunits for this series and the first two I started crashed during the run. Is this series stable? I haven't had any problems with the other series.

11/13/2005 6:30:43 PM|rosetta@home|Pausing result 1hz6A_abrelaxmode_random_length20_jitter02_omega_00594_0 (removed from memory)
11/13/2005 6:30:45 PM|rosetta@home|Unrecoverable error for result 1hz6A_abrelaxmode_random_length20_jitter02_omega_00594_0 ( - exit code -1073741819 (0xc0000005))
11/13/2005 6:30:45 PM||request_reschedule_cpus: process exited

and

11/14/2005 1:07:15 AM|rosetta@home|Unrecoverable error for result 1hz6A_abrelaxmode_random_length20_jitter02_omega_sim_aneal_01678_0 ( - exit code -1073741819 (0xc0000005))

I know from the FAQ that these are general client errors, but they didn't start until this new series.
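
For anyone puzzled by the two numbers in each log line: the negative exit code and the hex value in parentheses are the same 32-bit value, printed once as a signed integer and once as unsigned hex. A small Python sketch of the conversion (the function name `to_unsigned32_hex` is just for illustration):

```python
def to_unsigned32_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"

# Exit codes copied from the log lines in this thread.
for code in (-1073741819, -164):
    print(code, "->", to_unsigned32_hex(code))
```

0xc0000005 is the Windows access violation status code (STATUS_ACCESS_VIOLATION), which fits the "general client error" description. The sketch makes no claim about what -164 means.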
6) Message boards : Number crunching : Error when running CPU benchmark ?? (Message 512)
Posted 26 Sep 2005 by Christian Barrett
Post:
24/09/2005 17:44:03||Suspending computation and network activity - running CPU benchmarks
24/09/2005 17:44:03|rosetta@home|Pausing result 1pvaA_abrelax_20533_0 (removed from memory)
24/09/2005 17:44:03|rosetta@home|Pausing result 1pvaA_abrelax_23518_0 (removed from memory)
24/09/2005 17:44:04|rosetta@home|Unrecoverable error for result 1pvaA_abrelax_20533_0 ( - exit code -1073741819 (0xc0000005))
24/09/2005 17:44:04|rosetta@home|Unrecoverable error for result 1pvaA_abrelax_23518_0 ( - exit code -1073741819 (0xc0000005))
24/09/2005 17:44:04||request_reschedule_cpus: process exited


Did any of you get this kind of error?


I got this same error but for a different reason. Mine happened when I manually switched to another project during a run. I think the error comes from the same action. Rosetta must have trouble holding its information once it is well into the crunch, maybe 50% or more. I played with switching at around 8% and didn't have a problem with crashes, only later in the runs.

I think this bug is new with 4.77, but we might have to wait until the Rosetta peeps are back from vacation.
7) Message boards : Number crunching : Error report (Message 511)
Posted 26 Sep 2005 by Christian Barrett
Post:
@All
If somebody has a better idea, I'm sure that C. Barrett will be happy to hear it.
He also has 512 MB of RAM.
And if he doesn't try, he will not know whether my idea is a solution, or the solution.


I'll be honest, this is the first time I have had a stop on a switch in my 2 years of crunching, so I had "assumed" that the programs were retained in memory on a switch. After reading your suggestion, I checked and, lo and behold, it wasn't set up that way. Hmmm.

I will switch it and see what happens. Thanks!

A friend got the same error as you (but no problems with the other BOINC projects).
He also has "only" 256 MB of RAM, and all the WUs he ran errored out.
He tried "Leave applications in memory while preempted? yes" and now all is OK.

So when I saw your error message, I had the same idea.


Well, I tried this and it worked, BUT my RAM utilization pegged and I barely had anything left over to navigate the web, among other things: 498 MB used out of 508 available. I shut it back off (retain in memory) and I can operate my PC again.

Since this problem is only with Rosetta, I'll just have to do Rosetta in straight runs as opposed to switching to other projects.

Thanks for the assist.
8) Message boards : Number crunching : Error report (Message 466)
Posted 25 Sep 2005 by Christian Barrett
Post:
@All
If somebody has a better idea, I'm sure that C. Barrett will be happy to hear it.
He also has 512 MB of RAM.
And if he doesn't try, he will not know whether my idea is a solution, or the solution.


I'll be honest, this is the first time I have had a stop on a switch in my 2 years of crunching, so I had "assumed" that the programs were retained in memory on a switch. After reading your suggestion, I checked and, lo and behold, it wasn't set up that way. Hmmm.

I will switch it and see what happens. Thanks!
9) Message boards : Number crunching : Error report (Message 438)
Posted 25 Sep 2005 by Christian Barrett
Post:
9/24/2005 9:26:46 PM|Einstein@Home|Restarting result l1_0992.5__0992.9_0.1_T06_S4lB_0 using einstein version 4.79
9/24/2005 9:26:46 PM|rosetta@home|Pausing result 1pvaA_abrelax_no_cst_1487_0 (removed from memory)
9/24/2005 9:26:48 PM|rosetta@home|Unrecoverable error for result 1pvaA_abrelax_no_cst_1487_0 ( - exit code -1073741819 (0xc0000005))

FYI

I got this when BOINC switched to Einstein. The current run errored out. Running client 4.77.
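
If you want to pull these failures out of a longer message log, here is a minimal Python sketch. It assumes every line has the `timestamp|project|message` shape seen above and that the error text always matches the wording in these posts; other client versions may format things differently. The names `LogEntry`, `parse_line` and `find_error` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional, Tuple

class LogEntry(NamedTuple):
    timestamp: str
    project: str   # empty string for client-wide messages
    message: str

# Pattern based only on the "Unrecoverable error" lines quoted in this thread.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (\S+) "
    r"\( - exit code (-?\d+) \((0x[0-9a-fA-F]+)\)\)"
)

def parse_line(line: str) -> LogEntry:
    """Split one log line into timestamp, project and message."""
    timestamp, project, message = line.rstrip("\n").split("|", 2)
    return LogEntry(timestamp, project, message)

def find_error(entry: LogEntry) -> Optional[Tuple[str, int]]:
    """Return (result name, exit code) for an error line, else None."""
    match = ERROR_RE.search(entry.message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

# Lines copied from the log above.
log = [
    "9/24/2005 9:26:46 PM|rosetta@home|Pausing result 1pvaA_abrelax_no_cst_1487_0 (removed from memory)",
    "9/24/2005 9:26:48 PM|rosetta@home|Unrecoverable error for result 1pvaA_abrelax_no_cst_1487_0 ( - exit code -1073741819 (0xc0000005))",
]

for raw in log:
    entry = parse_line(raw)
    error = find_error(entry)
    if error is not None:
        print(entry.timestamp, entry.project, error)
```

Here the "Pausing result ... (removed from memory)" line comes two seconds before the error, so printing the line preceding each match is an easy way to check whether a failure followed a project switch.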
10) Message boards : Number crunching : If There Is No Screensaver........ (Message 430)
Posted 24 Sep 2005 by Christian Barrett
Post:
Total agreement here -- I would not be using the screensaver ...



A screensaver takes too much RAM and processor time away from crunching. I will also go to a blank screen. The only time I use a BOINC project screensaver is if I'm in class and trying to recruit new crunchers. I have recruited so many to CPDN, LHC, etc. because of the screensaver. It pops up during a lecture and all those peeps behind me can watch what is going on. :-)

11) Message boards : Cafe Rosetta : Where are the BOINC regulars...8) (Message 45)
Posted 17 Sep 2005 by Christian Barrett
Post:
Hiya, a big booya from Nebraska USA!!






©2024 University of Washington
https://www.bakerlab.org