Posts by Jimi@0wned.org.uk

1) Message boards : Number crunching : Discussion of the new credit system (Message 24748)
Posted 24 Aug 2006 by Jimi@0wned.org.uk
Post:
Ah well.

On the day my E6700 retail finally turned up, I've pulled my rigs from distributed computing. The x2 4600+ @ 3GHz, 3000+ @ 2.55GHz and 3700+ are having a nice rest and my electricity bills will fall. It has been fun thanks to Xtreme Systems, Free-DC and the Dutch Power Cows, but I think the DC community (and perhaps the technology) needs to mature a little longer. I have other, very expensive things to do now involving horsepower and acceleration... so it's over here for me. :)

Good luck all.
2) Message boards : Number crunching : Report Problems with Rosetta Version 5.22 (Message 18844)
Posted 17 Jun 2006 by Jimi@0wned.org.uk
Post:
First error ever on this machine (31,000 credit):

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=19785224

stderr out

<core_client_version>5.5.0</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 3706611
# cpu_run_time_pref: 14400
# cpu_run_time_pref: 14400
ERROR:: Exit at: .dock_structure.cc line:401

</stderr_txt>

btw [BOINCUK]Tigher, (0xc0000005) is usually a memory error, in my experience.
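For anyone collecting these reports, the "Exit at" line in the stderr output above can be pulled out mechanically. The following is only a rough sketch: the pattern is inferred from the single line format seen in these posts, and the names `EXIT_RE` and `find_exit_points` are made up for illustration, not part of BOINC or Rosetta.

```python
import re

# Pattern inferred from lines such as
# "ERROR:: Exit at: .dock_structure.cc line:401" (an assumption, not a spec).
# The leading dot before the file name is kept exactly as posted.
EXIT_RE = re.compile(r"ERROR:: Exit at: (?P<file>\S+) line:(?P<line>\d+)")


def find_exit_points(stderr_text):
    """Return (file, line) pairs for every 'Exit at' marker in the text."""
    return [
        (m.group("file"), int(m.group("line")))
        for m in EXIT_RE.finditer(stderr_text)
    ]


sample = """# random seed: 3706611
# cpu_run_time_pref: 14400
# cpu_run_time_pref: 14400
ERROR:: Exit at: .dock_structure.cc line:401
"""

print(find_exit_points(sample))
```

Grouping failed results by the (file, line) pair would show at a glance whether several WUs die at the same spot in the code.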

3) Message boards : Number crunching : Can't merge computers (Message 18360)
Posted 10 Jun 2006 by Jimi@0wned.org.uk
Post:
WOOOO00000OOT!!11!!!

Merge and delete works again! Thank you Rosettistas, that is brilliant. :D
4) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 II (Message 18249)
Posted 9 Jun 2006 by Jimi@0wned.org.uk
Post:
My air-cooled x2 3800 is now kaput. Thrashed to death, 78,582.75 Cobblestones since 31st March. A fallen soldier indeed.

I'll have to make up lost ground with Conroe... :D
5) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 II (Message 17993)
Posted 7 Jun 2006 by Jimi@0wned.org.uk
Post:
My air-cooled dual core AMD is slowly dying. Is the data still good in this unit? It bothers me that a bad machine might poison the result. How did the following WU validate with those errors? Was it restarting from an earlier checkpoint?

Result ID 23165576
Name t304__CASP7_ABRELAX_SAVE_ALL_OUT_cterm2_hom001__654_16007_0
Workunit 19503478
Created 7 Jun 2006 11:57:47 UTC
Sent 7 Jun 2006 13:27:18 UTC
Received 7 Jun 2006 19:29:58 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 241725
Report deadline 14 Jun 2006 13:27:18 UTC
CPU time 14133.575291
stderr out

<core_client_version>5.3.6</core_client_version>
<stderr_txt>
# random seed: 1986744
SIGSEGV: segmentation violationStack trace (16 frames):
[0x8836a6b]
[0x884f74c]
[0xffffe500]
[0x88d0170]
[0x88d1a29]
[0x88a0767]
[0x88a2b51]
[0x81eb08b]
[0x87298fc]
[0x87d2f38]
[0x8313d95]
[0x80e49ed]
[0x849682f]
[0x8498c8f]
[0x88aec34]
[0x8048111]

Exiting...
# random seed: 1986744
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violationStack trace (21 frames):
[0x8836a6b]
[0x884f74c]
[0xffffe500]
[0x882e6bc]
[0x8625638]
[0x83671a9]
[0x8361a84]
[0x8729051]
[0x84cea28]
[0x84cedc4]
[0x84cfb67]
[0x84de8b1]
[0x84e06f1]
[0x87d42c3]
[0x86afa6b]
[0x86b2089]
[0x80e5111]
[0x849682f]
[0x8498c8f]
[0x88aec34]
[0x8048111]

Exiting...
# cpu_run_time_pref: 14400
# DONE :: 1 starting structures built 20 (nstruct) times
# This process generated 20 decoys from 20 attempts

</stderr_txt>

Validate state Valid
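
A quick way to see what happened in a result like this is to count the marker lines in the stderr text. This is a hedged sketch only: the function name `summarize_stderr` is hypothetical, the sample is the output above with the stack frames left out, and how these markers map onto actual restarts or checkpoints is not established here.

```python
def summarize_stderr(stderr_text):
    """Count the marker lines visible in a Rosetta stderr dump."""
    lines = [line.strip() for line in stderr_text.splitlines()]
    return {
        # Lines announcing the random seed.
        "seed_lines": sum(1 for l in lines if l.startswith("# random seed:")),
        # Lines reporting a segmentation violation.
        "segfaults": sum(1 for l in lines if l.startswith("SIGSEGV")),
        # Whether the run reported a normal finish.
        "done": any(l.startswith("# DONE") for l in lines),
    }


# Marker lines copied from the result above; stack frames omitted.
sample = """# random seed: 1986744
SIGSEGV: segmentation violationStack trace (16 frames):
Exiting...
# random seed: 1986744
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violationStack trace (21 frames):
Exiting...
# cpu_run_time_pref: 14400
# DONE :: 1 starting structures built 20 (nstruct) times
"""

print(summarize_stderr(sample))
```

For this result the sketch reports two seed lines, two segfaults and a normal finish, which matches reading the output by eye.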
6) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 17073)
Posted 25 May 2006 by Jimi@0wned.org.uk
Post:
No Benny, that's down to whether you have "ACPI Multiprocessor PC" in your devices instead of "ACPI Uniprocessor PC"; it makes switching between single and dual cores a hassle.

The machine eventually fell over and rebooted. The WU thought about crashing (there's the first line of BOINC debug in the result), but it picked itself up and completed normally.
7) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 17055)
Posted 25 May 2006 by Jimi@0wned.org.uk
Post:
My bad: found the missing WUs, they have low ID numbers.

Still got this weird imbalance between CPU usage though - on a dual-core, instead of 50:50 it's 97:3 (as in one WU using 97% of both cores).

On other machines, RAM usage is 100MB or 235MB. This WU grabbing all the CPU is using 770MB! Is that some kind of doomsday machine? This is it -

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=17193170
8) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 17043)
Posted 25 May 2006 by Jimi@0wned.org.uk
Post:
Sorry Rhiju, just saw this. Rosetta is the only BOINC program on the machine and crunching is all it does at the moment. It looks like memory instability; I lost 4 units in quick succession this morning but a reboot seems to have fixed it.

WUs were:

17959274
17975623
17976020
17981072

I've had trouble with this RAM before and managed to tweak it back into life, but it seems to be going sour again. It's Crucial Ballistix DDR500 2x1GB; it tends to do this kind of thing. :(

Jimi, if you happen to be running something beside rosetta@home, have you noticed application crashes with other BOINC apps that have graphics?
9) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 17026)
Posted 24 May 2006 by Jimi@0wned.org.uk
Post:
2 successive application crashes, look like memory errors. Gave the machine a reboot and it seems ok again. WUs:

17905138
17921169
10) Message boards : Number crunching : Improvements to Rosetta@home based on user feedback (Message 16844)
Posted 22 May 2006 by Jimi@0wned.org.uk
Post:
The stats you cite are only adversely affected to the extent that individual machine statistics are the focus of the information desired. The vast majority of the stats are in fact collective, and not dependent on a view of a specific machine. All of the stats that depend on your total credit are still as accurate as ever; even the RAC for a particular machine will reflect the proper contribution if allowed to do so. The only stats significantly affected by this issue are those relating to the total credit for a particular machine.


True, it doesn't affect the project as such. I was primarily interested in the proportions of different CPUs when I was looking at the stats yesterday; it was then that it occurred to me that the numbers were probably out by a large margin. This in turn affects the credit/RAC averages per processor type, which are also interesting.

Just a way of deleting "false start" instances of machines would be a good start. I have several with zero credit; one took 3 false starts before it kicked into life last time, so the credit of one machine is read as that of 4.

It's pointless jeopardising the data in its entirety while trying to fix it - I'll wait until the bugs are out, thanks. :)
11) Message boards : Number crunching : Improvements to Rosetta@home based on user feedback (Message 16820)
Posted 22 May 2006 by Jimi@0wned.org.uk
Post:
A plea: can merging be fixed? I have orphaned computers left, right and centre. It's making a nonsense of the stats as well; Boinc Synergy thinks I have 6 machines when I have 3. Repeat that to some degree for every producer and the stats become completely meaningless.

Is there a db whiz among you?
12) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 16691)
Posted 20 May 2006 by Jimi@0wned.org.uk
Post:
Failed. I accidentally knocked the power off the water pump without noticing and the CPU brewed up. No damage.

Result ID 20840881
Name t283_HOMOLOG_ABRELAX_hom001__515_15191_0
Workunit 17390031
13) Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I (Message 16525)
Posted 18 May 2006 by Jimi@0wned.org.uk
Post:
This one was plainly moribund; couldn't launch graphics either:

Result ID 20606602
Name CASP_HOMOLOG_ABRELAX_hom001_t287__507_26468_0
Workunit 17176092
Created 17 May 2006 12:03:30 UTC
Sent 17 May 2006 13:44:57 UTC
Received 18 May 2006 8:32:18 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -197 (0xffffff3b)
Computer ID 213209
Report deadline 31 May 2006 13:44:57 UTC
CPU time 628.875
stderr out

<core_client_version>5.2.13</core_client_version>
<message>aborted via GUI RPC
</message>
<stderr_txt>
# random seed: 2200513

</stderr_txt>

Validate state Invalid
Claimed credit 5.0269442828032
Granted credit 0
application version 5.16
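
The two forms of the exit status above, -197 and 0xffffff3b, are the same 32-bit value read as signed and unsigned. Here is a minimal sketch of the conversion, assuming a 32-bit two's-complement exit code; the helper name `to_signed32` is just for illustration.

```python
def to_signed32(value):
    """Interpret an unsigned 32-bit value as a two's-complement signed int."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


print(to_signed32(0xFFFFFF3B))  # -197
print(hex(-197 & 0xFFFFFFFF))   # 0xffffff3b
```

Small positive codes such as 0x1 come out unchanged, so the same helper works for both kinds of exit status seen in these reports.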
14) Message boards : Number crunching : Crunching with GPU? (Message 16250)
Posted 14 May 2006 by Jimi@0wned.org.uk
Post:
Well, I've seen it mooted that it is a multi-core processor with a 4x4 matrix and inter-core communication speeds of 2Tb/s. Not that this helps <sigh>. Ageia don't say anything about the architecture.
15) Message boards : Number crunching : Crunching with GPU? (Message 16187)
Posted 13 May 2006 by Jimi@0wned.org.uk
Post:
Do the Ageia PhysX PPUs have this limitation?
16) Message boards : Number crunching : Report Problems with Rosetta Version 5.07 (Message 15557)
Posted 5 May 2006 by Jimi@0wned.org.uk
Post:
2 WUs with coding errors?

WU 15827716

<core_client_version>5.2.13</core_client_version>
<message>Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 1048865
# cpu_run_time_pref: 14400
# cpu_run_time_pref: 14400
ERROR:: Exit at: .hbonds.cc line:293

</stderr_txt>

WU 15757279

<core_client_version>5.2.13</core_client_version>
<message>Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 1767551
# cpu_run_time_pref: 14400
# cpu_run_time_pref: 14400
17) Message boards : Number crunching : Discuss Rosetta Application Errors and Fixes (all Vers) (Message 15311)
Posted 2 May 2006 by Jimi@0wned.org.uk
Post:
Thanks for the input, BennyRop. I detached the project and gave the system a good test again (memtest, dual prime), finally raised the vcore from 1.39v to 1.42v and dual primed for 6 hours; everything is OK once more. Tried 1.39v to check in Rosetta and it locked up in 4 minutes. Why it ran perfectly well for nearly 4 weeks at lower vcore and then turned nasty is a mystery. It has been running fine since yesterday afternoon at the higher voltage.

It is a bare build that only does Rosetta. I'm tempted to go Linux 64-bit on it tbh.
18) Message boards : Number crunching : Discuss Rosetta Application Errors and Fixes (all Vers) (Message 15090)
Posted 30 Apr 2006 by Jimi@0wned.org.uk
Post:
Is there something grossly different about 5.06 and 5.07, or are the current WUs just more demanding? Prime is still running after 90 minutes with lower temps than Rosetta, which seems to tell me that Rosetta has become extremely tough on the hardware. I know it's difficult to quantify, but is this acknowledged by those that know more about it?

NB: this seems to be the wrong thread for my comments, but it's possible WU problems are going to be mistaken for hardware problems if the WUs really have got tougher. Where shall I take this?
19) Message boards : Number crunching : Discuss Rosetta Application Errors and Fixes (all Vers) (Message 15087)
Posted 30 Apr 2006 by Jimi@0wned.org.uk
Post:
I've started getting lockups on my x2 3800+ system over the last week. I think it's a RAM problem, as it has got worse and worse over time; I can barely get 20 minutes loaded out of it now w/o a lockup. Running dual prime seems to stress the system less than Rosetta; certainly CPU temps are 2 or 3 degrees lower. So I've detached the system and am priming again, trying to nail exactly what's wrong.
20) Message boards : Number crunching : Report Errors for Rosetta Version 5.06 (Message 14887)
Posted 28 Apr 2006 by Jimi@0wned.org.uk
Post:
My WUs are set to run 4 hours; generally they finish in around 14,000 seconds of CPU time. However, the first two 5.06 WUs I've had have finished in 3,400 and 7,800 seconds CPU respectively. Is this a bug?
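
To put numbers on that, here is a small sketch comparing the CPU times above with the 4-hour target (4 hours is 14,400 seconds, which matches the cpu_run_time_pref value in the stderr output). The names `TARGET_SECONDS` and `percent_of_target` are made up for this example.

```python
TARGET_SECONDS = 4 * 60 * 60  # 4 hours = 14400 seconds


def percent_of_target(cpu_seconds, target=TARGET_SECONDS):
    """Return CPU time as a percentage of the target run time, to 1 decimal."""
    return round(100 * cpu_seconds / target, 1)


# Typical WU, then the two short 5.06 WUs mentioned in the post.
for seconds in (14000, 3400, 7800):
    print(seconds, percent_of_target(seconds))
```

The usual 14,000-second WU lands at about 97% of the target, while the two 5.06 WUs come in at roughly 24% and 54%.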



©2024 University of Washington
https://www.bakerlab.org