Message boards : Number crunching : Problems with Rosetta version 5.93
Author | Message |
---|---|
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
Argh, I'm getting angry. Of the last 9 WUs I had, 7 errored out. 7!!!! That's 77.78%. Today 2 more WUs crashed, but I don't feel like posting links anymore. It's always the same stuff: sin and cosine values that are out of range. When are you guys going to fix this, or at least give me a reply? |
Michael Matthews Joined: 12 Dec 05 Posts: 3 Credit: 37,852 RAC: 0 |
I only have the minimal graphics enabled for BOINC. All that is displayed is a graphic of the BOINC logo, the application that is running (Rosetta@home or SETI@home), the work unit name, and the percentage of the work unit completed so far. None of the 3D graphics is being used. Rosetta@home Beta 5.93 crashed again this afternoon. I'm getting rid of it. -Michael |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
It's been three days without any new watchdog errors. Here's my scoreboard for 5.93. Personally, I wonder what's different between my hosts and those of users like Luuklag who also has an AMD64 host but IS getting computation errors. I haven't seen one computation error yet, so something must be different. |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
(Quoting Astro: "It's been three days without any new watchdog errors.") Well, I guess it's the type of WU I ran: one type, but I only finished 1 successfully out of 7 of them or so. So I guess it's in the type of WU. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
(Quoting Luuklag: "Well, I guess it's the type of WU I ran: one type, but I only finished 1 successfully out of 7 of them or so. So I guess it's in the type of WU.") I took the liberty of running your host with my "Rosetta-Pal". Then I copied and color coded all the work from yours, combined with all the work from my "windows" hosts. Then I sorted by WU name and weeded out work not of the same "job type", so we'd be comparing apples with apples. You had Windows XP, I had WinXP. You had AMD64, I had AMD64. Etc., etc. Anyway, I found 4 instances where we did the same "job type", and you can see them below. I see that for the first job type you had many computation errors, but your host also did one of them successfully. Your hosts are "Blue" when you had an error and "Green" when you successfully completed one. Mine are various colors, so I added descriptions to the first column. My hosts can be discerned from the previous chart, with the exception of my wife's laptop "M3700", which is a "Mobile AMD64 3700" using Win XP (can't put Linux on that one....lol). So, from what I see, it's probably NOT the job type/WUs, or at least my hosts aren't having trouble with them. I wonder what else it could be?? [edit] On the second set of WUs I noticed a very early return date on your WU, so I rechecked, and that computation error was with 5.90, whereas my hosts were using 5.93. Also, that one was not a computation error, but Invalid. Also, look at the 'good' WU you returned (green text): it has the very next consecutive "task ID" and "Work unit ID" number after the previous one, which failed, so your own host managed to do one of a type that it had previously failed to do.[/edit] |
Path7 Joined: 25 Aug 07 Posts: 128 Credit: 61,751 RAC: 0 |
(Quoting Luuklag: "Argh, I'm getting angry") Hi Luuklag, I looked into your tasks and opened the task details of WU 132125634. The Windows Runtime Debugger also shows: ModLoad: 07280000 0000f000 C:\WINDOWS\system32\ATKOGL32.dll (6.14.10.138) (-exported- Symbols Loaded) File Version : 6, 14, 10, 138 Company Name : ASUSTeK COMPUTER INC. Product Name : ASUSTeK Computer Inc. AsusOGL Product Version: 6, 14, 10, 138 ModLoad: 69500000 00574000 C:\WINDOWS\system32\nvoglnt.dll (6.14.10.9147) (-exported- Symbols Loaded) File Version : 6.14.10.9147 Company Name : NVIDIA Corporation Product Name : NVIDIA Compatible OpenGL ICD Product Version: 6.14.10.9147 Those two, ATKOGL32.dll and nvoglnt.dll, both look like graphics card drivers. However, one of the drivers might as well be from some add-on software. Perhaps you changed your graphics card and left an old driver installed? I hope this information is useful to you. Path7. |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,860,059 RAC: 7,494 |
i also had a quick look and this: [01/08/08 21:42:17] TRACE [3172]: Retrieved the required window station [01/08/08 21:42:17] TRACE [3172]: Retrieved the required desktop [01/08/08 21:47:11] TRACE [3172]: Retrieved the required window station [01/08/08 21:47:11] TRACE [3172]: Retrieved the required desktop i would presume is a graphics issue, which would support Path7's detective work ;) |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
(Quoting: "Argh, I'm getting angry") Yes, I got a new card about 2 months ago, from the same manufacturer, because my old card was recalled over cooling issues: it made an enormous noise because the fan bearings broke down. I just installed the new drivers, IMHO just an update of the drivers, so I don't think there is a problem with that, because I can do everything, like play UT3 on high. |
Barraud Denis Joined: 8 May 06 Posts: 6 Credit: 1,258,677 RAC: 0 |
Rosetta failed and completely stopped/blocked BOINC on my Q6600, so I have stopped this project to protect my other WUs running on BOINC. The BOINC manager stays in memory but is not running, and no WU can work. Even with BOINC and all projects completely reinstalled after a reboot, Rosetta failed again and blocked BOINC. The only way I found to recover BOINC was to kill the BOINC manager, restart it, and remove the Rosetta project quickly, before it loaded a new WU. I think Rosetta must be upgraded to separate it better from BOINC when it fails with an error, to prevent BOINC from freezing. Here is the information I have from the Event Viewer. Event Type: Error Event Source: Application Error Event Category: None Event ID: 1000 Date: 13/01/2008 Time: 15:05:54 User: N/A Computer: C2Q1 Description: Faulting application minirosetta_1.03_windows_intelx86.exe, version 0.0.0.0, faulting module minirosetta_1.03_windows_intelx86.exe, version 0.0.0.0, fault address 0x0027e8c2. For more information, see the Help and Support Center at http://go.microsoft.com/fwlink/events.asp. Data: 0000: 41 70 70 6c 69 63 61 74 Applicat 0008: 69 6f 6e 20 46 61 69 6c ion Fail 0010: 75 72 65 20 20 6d 69 6e ure min 0018: 69 72 6f 73 65 74 74 61 irosetta 0020: 5f 31 2e 30 33 5f 77 69 _1.03_wi 0028: 6e 64 6f 77 73 5f 69 6e ndows_in 0030: 74 65 6c 78 38 36 2e 65 telx86.e 0038: 78 65 20 30 2e 30 2e 30 xe 0.0.0 0040: 2e 30 20 69 6e 20 6d 69 .0 in mi 0048: 6e 69 72 6f 73 65 74 74 nirosett 0050: 61 5f 31 2e 30 33 5f 77 a_1.03_w 0058: 69 6e 64 6f 77 73 5f 69 indows_i 0060: 6e 74 65 6c 78 38 36 2e ntelx86. 0068: 65 78 65 20 30 2e 30 2e exe 0.0. 0070: 30 2e 30 20 61 74 20 6f 0.0 at o 0078: 66 66 73 65 74 20 30 30 ffset 00 0080: 32 37 65 38 63 32 0d 0a 27e8c2.. |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
Can anyone please translate this into English? (Quoting Barraud Denis: "roseta failed and stop/block boinc completely my Q6600, so i have stop this project to protect my others WU running on boinc. The boinc manager stay in memory but is not running, no WU could work. Even with BOINC and all projets completely reinstalled after a reboot, roseta bug again and block boinc.") |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
See the English stuff in ( ). (Quoting Luuklag: "anyone please translate it into english...") |
Ingemar Joined: 28 Feb 06 Posts: 20 Credit: 1,680 RAC: 0 |
(Quoting Luuklag: "Argh, I'm getting angry") Hi Luuklag, The overall error rates of the WUs that are crashing for you are much lower than what you observe (around 2-5%). You may be unlucky; on the other hand, the errors are all caused by the same problem (the cosine error), and not only for one type of WU, so we need to fix that. We are looking into this problem to find the bug. |
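As a rough check on the "you may be unlucky" explanation, here is a minimal sketch of how likely 7 failures out of 9 would be by pure chance if each WU failed independently at the quoted 2-5% rate. The independence assumption and the helper name `prob_at_least` are illustrative only and do not come from the project.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: probability of k or more failures in n independent tries."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Quoted overall error rates for the affected WUs: roughly 2% to 5%.
for rate in (0.02, 0.05):
    chance = prob_at_least(7, 9, rate)
    print(f"error rate {rate:.0%}: P(at least 7 of 9 fail) = {chance:.2e}")
```

Under those assumptions the chance stays far below one in a million, which fits the view that something specific to the host or the WUs is involved rather than bad luck alone.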
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Just returned this task. It is marked as valid, but has this in the result file, FYI: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=121084932 5croA_BOINC_ABRELAX_VF_IGNORE_THE_REST-S25-18-S3-11--5croA-vf__2597_848_0 sin_cos_range ERROR: 1.2851869 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: 1.2833332 is outside of [-1,+1] sin and cos value legal range pete. |
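For readers wondering what the sin_cos_range message means: a sine or cosine can never lie outside [-1, +1], so a value such as 1.2851869 points to a bad upstream calculation rather than rounding noise. The sketch below shows one common way to write such a guard. It is not Rosetta's actual code; the function names and the tolerance are assumptions made for illustration.

```python
import math


def check_sin_cos_range(value: float, tolerance: float = 1e-6) -> float:
    """Hypothetical guard: clamp tiny overshoots, reject clearly illegal values."""
    if abs(value) > 1.0 + tolerance:
        # Mirrors the wording of the message quoted in the post above.
        raise ValueError(
            f"{value} is outside of [-1,+1] sin and cos value legal range"
        )
    # Small floating-point overshoots are pulled back into the legal range.
    return max(-1.0, min(1.0, value))


def safe_acos(value: float) -> float:
    """Angle in radians for a cosine value that has passed the range check."""
    return math.acos(check_sin_cos_range(value))
```

With a guard like this, an overshoot of 1.0000001 is clamped to 1.0, while 1.2851869 is reported as an error instead of being passed silently to `math.acos`.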
Yeti Joined: 2 Nov 05 Posts: 45 Credit: 14,945,062 RAC: 0 |
And one word from me: please discuss things like Rosetta versus Ralph in a different thread. I restarted crunching Rosetta with 5.93 and wanted to see whether anything relevant had been found about errors with 5.93, but I had to read through all of your discussion. Yes, the content of this discussion is okay, but for me this thread is definitely the wrong place for it. Supporting BOINC, a great concept ! |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
I finally got a computation error and, strangely enough, I woke to find one WU stuck at 100% while gkrellm showed 0% CPU use for that core. I have suspended and resumed that WU and am now waiting for it to run again. The "stuck one" is 1zpy__BOINC_DEFAULT_SYMM_FOLD_AND_DOCK-1zpy_native_2_2519_22709_0. The one which has already been reported as a computation error is resultid=133308819 1zpy__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK_RELAX-1zpy_-native__2477_294683_0 and shows: <core_client_version>5.10.21</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 10800 # random seed: 3191248 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -66.1132 for 900 seconds ********************************************************************** GZIP SILENT FILE: ./xx1zpy.out SIGSEGV: segmentation violation Stack trace (22 frames): [0x8da3037] [0x8d9de2c] [0xffffe500] [0x89a1824] [0x804c828] [0x8a8ae99] [0x8a8babf] [0x8d0c170] [0x8c12abe] [0x8c14e33] [0x804c7c2] [0x8a835ed] [0x8a8586f] [0x89363de] [0x89380e3] [0x893ba27] [0x898ad7a] [0x85e96d6] [0x87289d2] [0x8728af2] [0x8e07384] [0x8048111] Exiting... So it looks like I'm going to have two computation errors for my AMD64 X2 5200 under Linux. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Too late to edit. The second one, which was stuck, remained stuck after the work scheduler got back around to it. I ended up exiting the manager, opening Konsole, and killing BOINC. I then restarted and opened the manager. The result showed "ready to report", so it must have uploaded before the manager displayed it. Anyway, it was considered "Valid" and was granted credit as if this never even happened. It's resultid=133326615, which shows: <core_client_version>5.10.21</core_client_version> <![CDATA[ <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 10800 # random seed: 3623102 ====================================================== DONE :: 1 starting structures 9911.7 cpu seconds This process generated 6 decoys from 6 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> ]]> Which seems completely uneventful to me, but I know it stuck, leaving my host using only one core for who knows how long. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
oops. linked to the wrong work unit for the stuck one. It was really, resultid=133258619 which showed this. <core_client_version>5.10.21</core_client_version> <![CDATA[ <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 10800 # random seed: 3630287 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -84.1725 for 900 seconds ********************************************************************** GZIP SILENT FILE: ./xx1zpy.out SIGSEGV: segmentation violation Stack trace (21 frames): [0x8da3037] [0x8d9de2c] [0xffffe500] [0x8e2a1b9] [0x8df8727] [0x8dfaba1] [0x8cb4a2c] [0x8c1179b] [0x8c14e33] [0x804c7c2] [0x8a835ed] [0x8a8586f] [0x89363de] [0x893822e] [0x893ba27] [0x898ad7a] [0x85e96d6] [0x87289d2] [0x8728af2] [0x8e07384] [0x8048111] Exiting... No heartbeat from core client for 31 sec - exiting FILE_LOCK::unlock(): close failed.: Bad file descriptor Graphics are disabled due to configuration... # cpu_run_time_pref: 10800 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -82.6613 for 900 seconds ********************************************************************** GZIP SILENT FILE: ./xx1zpy.out SIGSEGV: segmentation violation Stack trace (22 frames): [0x8da3037] [0x8d9de2c] [0xffffe500] [0x89a1824] [0x804c828] [0x8a8ae99] [0x8a8babf] [0x8d0c170] [0x8c12abe] [0x8c14e33] [0x804c7c2] [0x8a835ed] [0x8a8586f] [0x89363de] [0x893822e] [0x893ba27] [0x898ad7a] [0x85e96d6] [0x87289d2] [0x8728af2] [0x8e07384] [0x8048111] Exiting... SIGSEGV: segmentation violation SIGABRT: abort called [insert] about 200 more of the "abort called", but I snipped it for brevity SIGABRT: abort called </stderr_txt> ]]> |
AdeB Joined: 12 Dec 06 Posts: 45 Credit: 4,428,086 RAC: 0 |
resultid 133097235 had some problems, but is valid after all - strange. <core_client_version>5.8.15</core_client_version> <![CDATA[ <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 10800 # random seed: 3031158 SIGSEGV: segmentation violation Stack trace (12 frames): [0x8da3037] [0x8d9de2c] [0xffffe420] [0x8e28653] [0x8df90a1] [0x8dfaac9] [0x83e8c0f] [0x8e0e98f] [0x8d9fab7] [0x8da10d5] [0x8d9a0c5] [0x8e3aa1a] Exiting... SIGSEGV: segmentation violation Stack trace (17 frames): [0x8da3037] [0x8d9de2c] [0xffffe420] [0x881d8ba] [0x881f90a] [0x88263b5] [0x8827d6d] [0x84fcf7a] [0x84fd442] [0x8b3e9c0] [0x8b4134b] [0x80d8efd] [0x85eaa7e] [0x8728a47] [0x8728af2] [0x8e07384] [0x8048111] Exiting... Graphics are disabled due to configuration... # cpu_run_time_pref: 10800 SIGSEGV: segmentation violation Stack trace (19 frames): [0x8da3037] [0x8d9de2c] [0xffffe420] [0x850ea02] [0x8c12f90] [0x876ba6c] [0x876c3fe] [0x87703bb] [0x878176f] [0x8787179] [0x8cf4461] [0x8b3e9dc] [0x8b4134b] [0x80d8efd] [0x85eaa7e] [0x8728a47] [0x8728af2] [0x8e07384] [0x8048111] Exiting... Graphics are disabled due to configuration... # cpu_run_time_pref: 10800 ====================================================== DONE :: 1 starting structures 10809.5 cpu seconds This process generated 8 decoys from 8 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> ]]> |
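Since several of the results above were marked valid despite crashes in their stderr output, it can help to tally the failure markers in a pasted `<stderr_txt>` block rather than rely on the reported status. A minimal sketch follows; the marker strings are copied from the output posted in this thread, while the function name and the shortened sample text are hypothetical.

```python
from collections import Counter

# Marker strings exactly as they appear in the stderr output quoted above.
MARKERS = (
    "SIGSEGV: segmentation violation",
    "SIGABRT: abort called",
    "Watchdog is ending the run!",
)


def count_markers(stderr_text: str) -> Counter:
    """Count how often each known failure marker occurs in a stderr dump."""
    counts = Counter()
    for marker in MARKERS:
        counts[marker] = stderr_text.count(marker)
    return counts


# Shortened, made-up sample in the same shape as the posted output.
sample = (
    "Rosetta score is stuck or going too long. Watchdog is ending the run!\n"
    "SIGSEGV: segmentation violation\n"
    "Exiting...\n"
    "SIGSEGV: segmentation violation\n"
    "SIGABRT: abort called\n"
)

for marker, hits in count_markers(sample).items():
    print(f"{hits:3d}  {marker}")
```

Pasting the real stderr text into `sample` gives a quick per-result summary that can be compared across hosts.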
hedera Joined: 15 Jul 06 Posts: 76 Credit: 5,263,150 RAC: 144 |
5.93 is eating my Windows machine alive. I tried to do something this afternoon and the box was so hung it was barely responding. Here's my system, from the opening log: 01/14/2008 8:11:54 AM||Starting BOINC client version 5.10.20 for windows_intelx86 01/14/2008 8:11:54 AM||log flags: task, file_xfer, sched_ops 01/14/2008 8:11:54 AM||Libraries: libcurl/7.16.4 OpenSSL/0.9.8e zlib/1.2.3 01/14/2008 8:11:54 AM||Data directory: C:\Program Files\BOINC 01/14/2008 8:11:56 AM||Processor: 2 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz [x86 Family 15 Model 4 Stepping 1] 01/14/2008 8:11:56 AM||Processor features: fpu tsc pae nx sse sse2 mmx 01/14/2008 8:11:57 AM||OS: Microsoft Windows XP: Professional Edition, Service Pack 2, (05.01.2600.00) 01/14/2008 8:11:57 AM||Memory: 1022.09 MB physical, 2.40 GB virtual 01/14/2008 8:11:57 AM||Disk: 145.27 GB total, 106.66 GB free 01/14/2008 8:11:57 AM||Local time is UTC -8 hours In mid-afternoon (around 3:30 PM local), first of all I had three WUs running at once, and when I looked at the task manager I saw that they were using a whole lot of memory: 319,896K, 258,352K, and 34,636K. That's 612,884K, just for Rosetta! Add to this the fact that ZoneAlarm Internet Security (which I recently installed to replace Norton) was running some kind of update, and I could barely get the mouse to respond. I suspended Rosetta temporarily so I could post this and let ZA finish whatever it was doing. (I'll be discussing this with them.) I've been running on the assumption that my computing preferences, which are pretty standard, would give me 2 WUs using, between them, 98-100% of CPU but NOT this much memory! Is there some tweak I should do to my settings? Should I expect to be running 3 or even 4 WUs at a time? --hedera Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic. |
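To put the memory figures in the post above in perspective, this small sketch adds up the three Task Manager readings and compares the total with the physical memory from the BOINC log. It assumes that Task Manager's "K" means 1024 bytes and that the log's "MB" means 1024 K; the variable names are made up for the example.

```python
# Task Manager readings for the three Rosetta tasks, in K (assumed 1024 bytes each).
rosetta_kb = [319_896, 258_352, 34_636]

# "Memory: 1022.09 MB physical" from the BOINC startup log.
physical_mb = 1022.09

total_kb = sum(rosetta_kb)        # combined footprint of the three tasks
total_mb = total_kb / 1024        # convert K to MB
share = total_mb / physical_mb    # fraction of physical RAM in use by Rosetta

print(f"{total_kb:,} K is about {total_mb:.1f} MB, or {share:.0%} of physical RAM")
```

The sum matches the 612,884K quoted in the post, and on a machine with about 1 GB of RAM that is well over half of physical memory, which would be consistent with the sluggishness described once other software is running too.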
©2024 University of Washington
https://www.bakerlab.org