Posts by davidtaille

1) Message boards : Number crunching : Preemption Failures on Linux (Message 47887)
Posted 20 Oct 2007 by davidtaille
Post:
Following up on the stalls I reported... here is some more data.

I set up another machine: a Dell D610 laptop with Ubuntu 7.04. The BOINC messages say:
Starting BOINC client version 5.4.11 for i686-pc-linux-gnu
libcurl/7.16.0 OpenSSL/0.9.8d zlib/1.2.3
...
Processor: 1 GenuineIntel Intel(R) Pentium(R) M processor 2.00GHz
Memory: 1.98 GB physical, 2.50 GB virtual
Disk: 8.86 GB total, 1.43 GB free
The machine is not overclocked.
This machine was "boinc'ed" this morning (default prefs) and attached only to Rosetta.
I let it work on 6 Rosetta WUs: 4 completed successfully and 2 died with computation errors. All 6 WUs ended one way or another and were reported. That is: no stalls.
Then I attached the machine to Seti and let BOINC switch between Seti and Rosetta every hour. 1 Seti WU completed, 1 Rosetta WU completed, and then the second Rosetta WU stalled after a 2-hour run.

Just to make sure the 1st machine I mentioned 2 posts above was not having hardware problems, I paused LHC and Seti to let Rosetta run uninterrupted for some time. It successfully completed a Rosetta WU and seems to be happily crunching on another. This has only occurred once before since Oct 7...
(By the way, this computer underwent a hardware test on Sept 30.)

So it seems that, whether on Pentium M or VIA Esther, Ubuntu 7.04 or CentOS 5:
preempting => stalls.
not preempting => no stalls.
Until you fix the bug, I think I'll turn off project switching and let each WU run to its end without being paused.

David
2) Message boards : Number crunching : Preemption Failures on Linux (Message 47841)
Posted 18 Oct 2007 by davidtaille
Post:
Hi all,
I too am experiencing stalls with Rosetta.

I have a dedicated hosted server that has spent almost all its time on BOINC since Oct 7.

$ uname -a
Linux xx 2.6.18-8.1.14.el5 #1 SMP Thu Sep 27 18:58:54 EDT 2007 i686 i686 i386 GNU/Linux
It runs CentOS 5.0.

The machine is a piece of hardware made by www.dedibox.fr to run their hosting business.
CPU : Centaur VIA Esther processor 2000MHz stepping 09 ; NX bit activated.
Motherboard : VIA-made, chipsets VIA CN700 & VT8237
RAM : 1GB

$ ./boinc -version
5.4.11 i686-pc-linux-gnu

BOINC settings : all default.

I attached to 3 projects: Seti, LHC, Rosetta.
Seti and LHC have no problems, and the machine has successfully processed 410 credits' worth of WUs since Oct 7 (2007).
Rosetta: only one WU ran to successful completion; all the others were reported as errors, and I aborted 3 of them.

The typical situation when boinc thinks rosetta is at work is as follows :
----------------------------------------------
$ top
top - 22:23:36 up 13 days, 18 min, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 68 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1019144k total, 915868k used, 103276k free, 352632k buffers
Swap: 1044216k total, 0k used, 1044216k free, 342212k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
....
9866 seti_at_ 15 0 5836 4124 1828 S 0.0 0.4 0:27.99 boinc
20128 seti_at_ 35 19 28564 22m 12 S 0.0 2.3 0:00.03 rosetta_beta_5.
20304 seti_at_ 35 19 27540 22m 12 S 0.0 2.3 0:00.03 rosetta_beta_5.
...
29430 seti_at_ 34 19 186m 99m 44 S 0.0 10.0 60:01.75 rosetta_beta_5.
29431 seti_at_ 34 19 186m 99m 44 S 0.0 10.0 0:00.00 rosetta_beta_5.
29432 seti_at_ 35 19 186m 99m 44 S 0.0 10.0 0:00.00 rosetta_beta_5.
29433 seti_at_ 34 19 186m 99m 44 S 0.0 10.0 0:00.00 rosetta_beta_5.
----------------------------------------------
Then I can see in the BOINC log that it has gone through hourly suspend/resume cycles for tens of hours, but the process times for Rosetta never change... while WUs for Seti or LHC complete!
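For anyone who wants to check for the same symptom without watching top by hand, here is a minimal sketch in Python. It assumes a Linux /proc filesystem and samples a process's accumulated CPU time (utime + stime, fields 14 and 15 of /proc/<pid>/stat) twice. The function names and the 60-second interval are my own choices for illustration, not anything from BOINC. Note that a task BOINC has deliberately suspended also accrues no CPU time, so the check only means something while BOINC reports the task as running.

```python
import time


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The 2nd field (command name) is wrapped in parentheses and may itself
    # contain spaces, so split on the LAST ')' before tokenizing the rest.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]) + int(rest[12])


def read_cpu_ticks(pid: int) -> int:
    """Read the current CPU tick count of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks_from_stat(f.read())


def looks_stalled(pid: int, interval_s: float = 60.0) -> bool:
    """True if the process used no CPU time at all during the interval."""
    before = read_cpu_ticks(pid)
    time.sleep(interval_s)
    return read_cpu_ticks(pid) == before
```

Calling looks_stalled(29430) while BOINC claims Rosetta is running would be the equivalent of seeing the TIME+ column stuck in top.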
The stderr files in the R@H slot show nasty things:
----------------------------------------
$ cat stderr.txt
Graphics are disabled due to configuration...
# cpu_run_time_pref: 10800
# random seed: 1890337
SIGSEGV: segmentation violation
Stack trace (12 frames):
[0x8d7cf2f]
[0x8d77d1c]
[0xb7f0b420]
[0x8e024c7]
[0x8dd2715]
[0x8dd2481]
[0x83f9b4c]
[0x8de873f]
[0x8d79987]
[0x8d7afa5]
[0x8d73f9d]
[0x8e1487a]

Exiting...
-----------------------
Rosetta applications are rosetta_5.69_i686-pc-linux-gnu and rosetta_beta_5.80_i686-pc-linux-gnu.
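In case it helps whoever looks at the trace: the bracketed frame addresses can be pulled out of stderr.txt and handed to binutils' addr2line. The sketch below only parses the addresses and builds the command line. It assumes the frames are printed one per line as [0x...], as above, and that a build of the Rosetta binary with symbols is available, which I do not have. The helper names are hypothetical.

```python
import re

# One frame per line, e.g. "[0x8d7cf2f]" (format assumed from the trace above).
FRAME_RE = re.compile(r"^\[(0x[0-9a-fA-F]+)\]$")


def extract_frames(stderr_text: str) -> list[int]:
    """Collect the stack-frame addresses found in a stderr.txt dump."""
    frames = []
    for line in stderr_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(int(match.group(1), 16))
    return frames


def addr2line_command(binary_path: str, frames: list[int]) -> list[str]:
    """Build an addr2line invocation (-f: function names, -C: demangle)."""
    return ["addr2line", "-f", "-C", "-e", binary_path] + [hex(a) for a in frames]
```

The resulting list can be passed to subprocess.run. Frames that fall outside the executable itself (possibly 0xb7f0b420 here) would not resolve against the main binary.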

Hope this helps.

David