1)
Message boards :
Number crunching :
Preemption Failures on Linux
(Message 47887)
Posted 20 Oct 2007 by davidtaille
Post:
Following up on the stalls I reported, here is some more data.

I set up another machine: a Dell D610 laptop with Ubuntu 7.04. The BOINC messages say:

Starting BOINC client version 5.4.11 for i686-pc-linux-gnu
libcurl/7.16.0 OpenSSL/0.9.8d zlib/1.2.3
...
Processor: 1 GenuineIntel Intel(R) Pentium(R) M processor 2.00GHz
Memory: 1.98 GB physical, 2.50 GB virtual
Disk: 8.86 GB total, 1.43 GB free

The machine is not overclocked.

This machine was "boinc'ed" this morning (default prefs) and attached only to Rosetta. I let it work on 6 Rosetta WUs: 4 completed successfully and 2 died with computation errors. All 6 WUs ended one way or another and were reported. That is: no stalls.

Then I attached the machine to SETI and let BOINC switch between SETI and Rosetta every hour. 1 SETI WU completed, 1 Rosetta WU completed, and then the second Rosetta WU stalled after a 2-hour run.

Just to make sure the first machine I mentioned two posts above was not having hardware problems, I paused LHC and SETI to let Rosetta run uninterrupted for some time. It successfully completed a Rosetta WU and seems to be happily crunching on another. A successful Rosetta completion had occurred only once before on that machine since Oct 7. (By the way, that computer underwent a hardware test on Sept 30.)

So it seems that, whether on Pentium M or VIA Esther, Ubuntu 7.04 or CentOS 5:
preempting => stalls.
not preempting => no stalls.

Until you fix the bug, I think I'll turn off project switching and let each WU run to its end without being paused.

David
2)
(Message 47841)
Posted 18 Oct 2007 by davidtaille
Post:
Hi all,

I too am experiencing stalls with Rosetta. I have a dedicated hosted server that has spent almost all its time on BOINC since Oct 7.

$ uname -a
Linux xx 2.6.18-8.1.14.el5 #1 SMP Thu Sep 27 18:58:54 EDT 2007 i686 i686 i386 GNU/Linux

It's a CentOS 5.0 system. The machine is a piece of hardware made by www.dedibox.fr to run their hosting business.

CPU: Centaur VIA Esther processor 2000MHz, stepping 09; NX bit activated.
Motherboard: VIA-made, chipsets VIA CN700 & VT8237
RAM: 1GB

$ ./boinc -version
5.4.11 i686-pc-linux-gnu

BOINC settings: all default.

I attached to 3 projects: SETI, LHC, Rosetta. SETI and LHC have no problems, and the machine has successfully processed 410 credits' worth of WUs since Oct 7 (2007).

Rosetta: only one WU got to successful completion; all others have been reported in error, and I aborted 3 of them.

The typical situation when BOINC thinks Rosetta is at work is as follows:
----------------------------------------------
$ top
top - 22:23:36 up 13 days, 18 min,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:  68 total,   1 running,  67 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1019144k total,   915868k used,   103276k free,   352632k buffers
Swap:  1044216k total,        0k used,  1044216k free,   342212k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
....
 9866 seti_at_  15   0  5836 4124 1828 S  0.0  0.4   0:27.99 boinc
20128 seti_at_  35  19 28564  22m   12 S  0.0  2.3   0:00.03 rosetta_beta_5.
20304 seti_at_  35  19 27540  22m   12 S  0.0  2.3   0:00.03 rosetta_beta_5.
...
29430 seti_at_  34  19  186m  99m   44 S  0.0 10.0  60:01.75 rosetta_beta_5.
29431 seti_at_  34  19  186m  99m   44 S  0.0 10.0   0:00.00 rosetta_beta_5.
29432 seti_at_  35  19  186m  99m   44 S  0.0 10.0   0:00.00 rosetta_beta_5.
29433 seti_at_  34  19  186m  99m   44 S  0.0 10.0   0:00.00 rosetta_beta_5.
----------------------------------------------

The BOINC log then shows hourly suspend/resume cycles going on for tens of hours, but the process times for Rosetta never change, while WUs for SETI and LHC complete!

The stderr files in the R@H slot show nasty things:
----------------------------------------
$ cat stderr.txt
Graphics are disabled due to configuration...
# cpu_run_time_pref: 10800
# random seed: 1890337
SIGSEGV: segmentation violation
Stack trace (12 frames):
[0x8d7cf2f]
[0x8d77d1c]
[0xb7f0b420]
[0x8e024c7]
[0x8dd2715]
[0x8dd2481]
[0x83f9b4c]
[0x8de873f]
[0x8d79987]
[0x8d7afa5]
[0x8d73f9d]
[0x8e1487a]
Exiting...
----------------------------------------

The Rosetta applications are rosetta_5.69_i686-pc-linux-gnu and rosetta_beta_5.80_i686-pc-linux-gnu.

Hope this helps.

David
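For anyone who wants to check for the same symptom without watching top by hand, here is a minimal sketch in Python. It takes two saved copies of the top process listing, captured some minutes apart, and reports the Rosetta PIDs whose TIME+ value did not advance. The function names (cpu_time_seconds, parse_top_rows, stalled_pids) and the 0.01 s threshold are my own illustration, not anything from BOINC or Rosetta. It assumes the 12-column layout shown above and only handles TIME+ in the minutes:seconds.hundredths form. Note that a task BOINC has deliberately suspended will also show no progress, and that the extra rosetta entries sitting at 0:00.00 will be listed too, so the result only points at a stall for a task that BOINC reports as running.

```python
def cpu_time_seconds(time_plus):
    """Convert a top TIME+ value such as '60:01.75' to seconds.

    Assumption: the value has the form minutes:seconds.hundredths.
    """
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_top_rows(text):
    """Map PID -> (command, cpu seconds) for each process row in a top listing.

    Assumption: rows have the 12 columns
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
    Header lines, elisions and anything else are skipped.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        rows[int(fields[0])] = (fields[11], cpu_time_seconds(fields[10]))
    return rows


def stalled_pids(earlier, later, name_prefix="rosetta", min_progress=0.01):
    """Return PIDs present in both samples whose CPU time did not advance."""
    before = parse_top_rows(earlier)
    after = parse_top_rows(later)
    stalled = []
    for pid, (command, cpu_after) in after.items():
        if not command.startswith(name_prefix) or pid not in before:
            continue
        if cpu_after - before[pid][1] < min_progress:
            stalled.append(pid)
    return sorted(stalled)
```

To use it, save the output of top in batch mode (top -b -n 1) to two files a few minutes apart, read both files into strings, and pass them to stalled_pids in chronological order.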