Posts by Marky-UK

1) Message boards : Number crunching : Problems with web site (Message 54928)
Posted 5 Aug 2008 by Marky-UK
Post:
The back-end fileserver had crashed - kernel panic. I've just returned from a quick run to campus to reboot that server...

Stats export doesn't seem to have run since then. Does this need a kick too?
2) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49919)
Posted 21 Dec 2007 by Marky-UK
Post:
OK, just did the update -- this should revert the "cpu run time" and "% complete" behavior to what linux clients are used to! Please let me know if this fixes this issue (looks good locally).

Thanks Rhiju! I haven't had any WUs that have run to completion yet, but at least the CPU time is incrementing :-) I'll check on my clients in the morning.
3) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49882)
Posted 21 Dec 2007 by Marky-UK
Post:
if it's so common, why wasn't the linux problem picked up on RALPH???


Appears work actually completes normally, just the progress indicator not looking right along the way. So you would actually have to watch it run to see any problem.

Work might complete eventually, but it definitely doesn't complete normally. Every WU I've watched has gone past my runtime limit by hours. I suspect the only way the WUs will complete on their own is when Rosetta's internal time limit kicks in (6x the runtime limit, isn't it?). And that's assuming the built-in limit is even working.
4) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49862)
Posted 21 Dec 2007 by Marky-UK
Post:
This seems very odd. Thanks a lot for posting, especially the link to the workunit. I checked here that the %cpu usage is fine for other platforms, so I fear that this is a linux-specific issue.

Anyone else out there noticing success or failure with Linux?

Yes, I have two Linux boxes. Both are using 100% CPU time on Rosetta as seen in the process list, but BOINC Manager shows 0 progress and 0 CPU time on the WUs.

When I stopped and restarted the BOINC client, though, the WUs completed and uploaded OK. One had 9 hrs 50 min runtime (my preference is 3 hrs).
5) Message boards : Number crunching : XML stats not updating (Message 48824)
Posted 19 Nov 2007 by Marky-UK
Post:
Stats are updating again now. Exported at 18:58 UTC.
6) Message boards : Number crunching : XML stats not updating (Message 48753)
Posted 17 Nov 2007 by Marky-UK
Post:
I wonder if the XML stats haven't been updated due to the server update last Thursday.

I think that's when they stopped updating, so the two could be connected.
7) Message boards : Number crunching : XML stats not updating (Message 48737)
Posted 17 Nov 2007 by Marky-UK
Post:
Just a heads-up: The XML stats haven't been updated/exported since 15 Nov 2007.
8) Message boards : Number crunching : Problems with Rosetta version 5.81 (Message 48583)
Posted 12 Nov 2007 by Marky-UK
Post:
This is just a heads up that tons of MFR_SYMM_FOLD_AND_DOCK_RELAX units are erroring out AFTER crunching for a full runtime, which is annoying. It also errored out for people who got the unit after mine errored out.

Just a heads up!

edit: This isn't just one computer of mine but many different computers, not to mention that other people errored when they got the same unit. So it's not just a faulty computer on my end!

I'd agree with that - I haven't had a single MFR_SYMM_FOLD_AND_DOCK_RELAX WU that's worked; they all fail with a -161 error after completion.
9) Message boards : Number crunching : Predictor of the Day stopped again (Message 47881)
Posted 20 Oct 2007 by Marky-UK
Post:
The last two days have had a PotD with no name.
10) Message boards : Number crunching : Problems with Rosetta version 5.80 (Message 47443)
Posted 6 Oct 2007 by Marky-UK
Post:
I'm getting several -161 errors on 5.80 WUs now too.

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_21387_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


Waste of CPU time...
11) Message boards : Number crunching : Comparison of projects (Message 46453)
Posted 17 Sep 2007 by Marky-UK
Post:
Einstein and WCG award more credit than Rosetta. There's a cross-project comparison chart here. For example, on that chart Einstein gives ~30% more than Rosetta.

There may be some CPU differences as well though. Intel Core 2 CPUs seem to do slightly better at Rosetta than AMD CPUs.
12) Message boards : Number crunching : Welcome Back! (Message 45836)
Posted 9 Sep 2007 by Marky-UK
Post:
All but one of my uploads have completed, but one seems stuck with the same error:

09/09/2007 17:10:27 [file_xfer] Started upload of file Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_36090_0_0
09/09/2007 17:10:35 [error] Error on file upload: can't open file
09/09/2007 17:10:35 [file_xfer] Temporarily failed upload of Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_36090_0_0: transient upload error

The file does exist locally.
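
For anyone wanting to spot stuck uploads like this without scrolling through the messages, here's a rough Python sketch that scans log lines in the format shown above and lists the files whose upload was reported as temporarily failed. The function name and the regexes are my own, and it assumes every line looks like `DD/MM/YYYY HH:MM:SS [tag] message`; other client versions may log differently.

```python
import re

# Assumed line layout, based only on the three lines quoted above:
#   DD/MM/YYYY HH:MM:SS [tag] message
LINE_RE = re.compile(r"^(\S+ \S+) \[(\w+)\] (.*)$")
FAILED_RE = re.compile(r"^Temporarily failed upload of (\S+): (.*)$")


def failed_uploads(lines):
    """Return (timestamp, file name, reason) for each failed-upload line."""
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that doesn't fit the assumed layout
        timestamp, tag, message = m.groups()
        if tag != "file_xfer":
            continue
        f = FAILED_RE.match(message)
        if f:
            results.append((timestamp, f.group(1), f.group(2)))
    return results


log = [
    "09/09/2007 17:10:27 [file_xfer] Started upload of file Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_36090_0_0",
    "09/09/2007 17:10:35 [error] Error on file upload: can't open file",
    "09/09/2007 17:10:35 [file_xfer] Temporarily failed upload of Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_36090_0_0: transient upload error",
]

for ts, name, reason in failed_uploads(log):
    print(ts, name, reason)
```

It only reports what the client logged; it doesn't say why the server couldn't open the file.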
13) Message boards : Number crunching : Daily quota (Message 45367)
Posted 25 Aug 2007 by Marky-UK
Post:
You could increase your 'Target CPU run time' so that each WU runs for longer.
14) Message boards : Number crunching : Max # total results question (Message 44777)
Posted 7 Aug 2007 by Marky-UK
Post:
The mistake was in issuing the task to you in the first place, when it was still possible for a completion report to be accepted. I believe there is a BOINC issue open to get this fixed in the server scheduling programs. (someone please post a link if they find it)

I still contend that the mistake is setting the "max # of success results" number to just 1, when it is a known fact that work can get reissued as soon as a deadline is passed. IMHO this should be changed to 2 (and probably change the "max # of total results" to 3 as well).
15) Message boards : Number crunching : Problems with Rosetta version 5.68 and 5.70 (Message 43323)
Posted 6 Jul 2007 by Marky-UK
Post:
Just had this failure, the first one ever on this host I think: 91030536

<core_client_version>5.10.7</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 10800
# random seed: 1086965
No heartbeat from core client for 31 sec - exiting
SIGSEGV: segmentation violation
Stack trace (13 frames):
[0x8cf2edb]
[0x8cedd0c]
[0xffffe420]
[0x8c5d2db]
[0x8b63872]
[0x8c44d10]
[0x849ba9e]
[0x80dae11]
[0x85c8e17]
[0x86f632b]
[0x86f63d6]
[0x8d56dd4]
[0x8048111]

Exiting...

</stderr_txt>
]]>
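
In case the three numbers in "process exited with code 193 (0xc1, -63)" look unrelated: they're the same byte written three ways (decimal, hex, and signed 8-bit). A quick Python check, where `as_signed_byte` is just an illustrative helper of mine and not anything from BOINC:

```python
def as_signed_byte(code):
    """Reinterpret an exit status in 0..255 as a signed 8-bit value."""
    if not 0 <= code <= 255:
        raise ValueError("expected a value in 0..255")
    return code - 256 if code >= 128 else code


code = 193
print(hex(code))             # 0xc1
print(as_signed_byte(code))  # -63
```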
16) Message boards : Number crunching : Problems with Rosetta version 5.68 and 5.70 (Message 42697)
Posted 27 Jun 2007 by Marky-UK
Post:
It's probably going to be a big pain for you to figure out which one was used for which workunit... it's probably best to post issues for both here!
If you can post a link to your workunit we should be able to figure out which application had the problem.

The application version is also shown at the bottom of the Result page.
17) Message boards : Number crunching : Credit not granted for reissued tasks (Message 41377)
Posted 24 May 2007 by Marky-UK
Post:
This issue has come up before and was discussed back in February here; David Kim said they were going to look into it, but the settings still seem to be the same.

18) Message boards : Number crunching : BOINC and electricity (Message 39496)
Posted 16 Apr 2007 by Marky-UK
Post:
For me, the time of year has an effect. During the winter, I'll run some old slow PCs as crunchers, with the added bonus that they heat my house. But they're getting switched off around now as we get into spring.
19) Message boards : Number crunching : Threadmaster: some pointers please? (Message 38188)
Posted 23 Mar 2007 by Marky-UK
Post:
If you want to use ThreadMaster to set a max CPU limit on ALL processes, just put a value in:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ThreadMaster\Parameters]
CPUThresholdPct Type:REG_SZ Value:75
(for 75%)

You can set application specific values by putting values in:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ThreadMaster\Parameters\Applications]

The amounts are whole numbers, from 3-100.

Any applications listed in:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ThreadMaster\Parameters\Exceptions]

are ignored by ThreadMaster - they're free to use 100% CPU if they want and ThreadMaster won't stop them.


If you want to limit or exclude Rosetta you'll have to use the application name (currently rosetta_5.54_windows_intelx86.exe).
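
If you'd rather not type these in by hand, here's a rough Python sketch that builds the text of a .reg file for the keys above. It's my own illustration, not something that ships with ThreadMaster. It assumes per-application limits are stored as REG_SZ values named after the executable, the same way CPUThresholdPct is stored; check your own registry before importing anything.

```python
BASE = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ThreadMaster\Parameters"


def check_pct(pct):
    """Limits are whole numbers from 3 to 100."""
    if not isinstance(pct, int) or not 3 <= pct <= 100:
        raise ValueError(f"CPU limit must be a whole number from 3 to 100, got {pct!r}")
    return pct


def build_reg(global_pct, app_limits=None):
    """Build .reg file text for a global limit and optional per-app limits."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    lines += [f"[{BASE}]", f'"CPUThresholdPct"="{check_pct(global_pct)}"', ""]
    if app_limits:
        lines.append(f"[{BASE}\\Applications]")
        # ASSUMPTION: one REG_SZ value per application, named after the exe.
        for exe, pct in app_limits.items():
            lines.append(f'"{exe}"="{check_pct(pct)}"')
        lines.append("")
    return "\r\n".join(lines)


print(build_reg(75, {"rosetta_5.54_windows_intelx86.exe": 50}))
```

Save the output to a file ending in .reg and look it over before merging it; the range check only enforces the 3-100 rule mentioned above.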
20) Message boards : Number crunching : RSS feeds broken again (Message 37936)
Posted 17 Mar 2007 by Marky-UK
Post:
Excellent, both looking good now. Many thanks.




