Message boards : Number crunching : Discuss Rosetta Application Errors and Fixes (all Vers)
Author | Message |
---|---|
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
Later edit to include all the ones that failed up to 7:20 AM AST. Deja vu, again!!! ALL my 5.06 units processed till now have resulted in computing errors. (ARGHHHH) 1 - https://boinc.bakerlab.org/rosetta/result.php?resultid=18463185 Result ID 18463185 Name AB_CASP6_t216__458_3672_0 Workunit 15244682 Created 28 Apr 2006 0:06:10 UTC Sent 28 Apr 2006 4:29:15 UTC Received 28 Apr 2006 10:29:12 UTC Exit status -1073741819 (0xc0000005) Report deadline 12 May 2006 4:29:15 UTC CPU time 6.140625 stderr out <core_client_version>5.2.13</core_client_version> <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 2 - https://boinc.bakerlab.org/rosetta/result.php?resultid=18460320 Result ID 18460320 Name AB_CASP6_t212__458_3475_0 Workunit 15242120 Created 27 Apr 2006 23:21:55 UTC Sent 28 Apr 2006 3:47:24 UTC Received 28 Apr 2006 10:29:12 UTC Exit status -1073741819 (0xc0000005) Report deadline 12 May 2006 3:47:24 UTC CPU time 2 stderr out <core_client_version>5.2.13</core_client_version> <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 3 - https://boinc.bakerlab.org/rosetta/result.php?resultid=18456375 Name AB_CASP6_t212__456_6140_0 Workunit 15238711 Created 27 Apr 2006 22:26:22 UTC Sent 28 Apr 2006 2:56:19 UTC Received 28 Apr 2006 10:29:12 UTC Exit status -1073741819 (0xc0000005) CPU time 2923.40625 stderr out <core_client_version>5.2.13</core_client_version> <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 2261381 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 4 - https://boinc.bakerlab.org/rosetta/result.php?resultid=18465643 Name AB_CASP6_t216__458_3834_0 Workunit 15246788 Created 28 Apr 2006 0:39:35 UTC Sent 28 Apr 2006 5:06:49 UTC Received 28 Apr 2006 10:29:12 UTC Exit status -1073741819 (0xc0000005) CPU time 23.484375 stderr out <core_client_version>5.2.13</core_client_version> <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 5 - https://boinc.bakerlab.org/rosetta/result.php?resultid=18465644 Name AB_CASP6_t242__458_3834_0 Workunit 15246789 Created 28 Apr 2006 0:39:35 UTC Sent 28 Apr 2006 5:06:49 UTC Received 28 Apr 2006 11:03:34 UTC Exit status -1073741795 (0xc000001d) Report deadline 12 May 2006 5:06:49 UTC CPU time 673.375 stderr out <core_client_version>5.2.13</core_client_version> <message> - exit code -1073741795 (0xc000001d) </message> <stderr_txt> # random seed: 2173667 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 Jose is sad. Very frustrated. My frustration is way past the issue of credits given/granted and team standings. This rate of errors is not acceptable. This is way too high. How can one, in good faith, ask for new volunteers if, when they ask to see your results, the potential volunteers see such a large rate of errors? Large numbers of errors do not attract people to a project, any project. I know I am a small fish, but the last 10 people I have tried to recruit have pointed to what they perceive as an unsatisfactory error rate as the primary reason they won't join. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
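For readers trying to interpret the exit statuses in the results above, here is a minimal Python sketch that converts the signed exit codes to the hexadecimal form shown in parentheses and looks them up in a small table. The two status names are the standard Windows NTSTATUS names for those values, and 126 is the Win32 error ERROR_MOD_NOT_FOUND; the helper names (`to_unsigned32`, `describe_exit`) are made up for illustration and are not part of BOINC or Rosetta. The `LoadLibraryA` lines most likely only mean that the debugger helper DLLs could not be found while the crash was being reported, so they are probably a side effect rather than the cause.

```python
# Only the codes that appear in this thread are listed; anything else is "unknown".
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
}
WIN32_ERROR_NAMES = {126: "ERROR_MOD_NOT_FOUND"}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def describe_exit(code: int) -> str:
    """Return the exit code with its hex form and a name, if the name is known."""
    unsigned = to_unsigned32(code)
    name = NTSTATUS_NAMES.get(unsigned, "unknown")
    return f"{code} = 0x{unsigned:08x} ({name})"


for exit_code in (-1073741819, -1073741795):
    print(describe_exit(exit_code))
print("GetLastError = 126 ->", WIN32_ERROR_NAMES[126])
```

Four of the five results are access violations and one is an illegal instruction; the table does not say why they happened, only what kind of crash was recorded.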
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
My frustration is way past the issue of credits given/granted and team standings. Have you considered the possibility that it's your HARDWARE which is at fault here? I have been crunching on 3xP4 PCs 24/7 for almost FOUR (4) MONTHS now and so far have had TWO (2) stuck WUs. TWO! Do the math about the % of bad WUs! Try joining another project, like Einstein@home, and see if your WUs verify there. Check your computer's memory (usually that's where 90% of problems arise), then CPU heating, and finally overall stability (mobo+CPU+memory) using: http://www.memtest.org/ http://www.mersenne.org/prime.htm (use the "torture test" option for 12-48hrs) Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
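The "do the math" above can be sketched roughly as follows. The post gives the number of machines (3), the period (almost four months) and the number of stuck WUs (2), but not how long each WU ran, so the 120 days and the 2 hours per WU below are assumptions chosen only to make the arithmetic concrete. The function names are hypothetical.

```python
def estimated_wu_count(machines: int, days: float, hours_per_wu: float) -> float:
    """Rough number of WUs crunched, assuming each machine runs one WU at a time, 24/7."""
    return machines * days * 24.0 / hours_per_wu


def failure_percent(failed: int, total: float) -> float:
    """Failed WUs as a percentage of all WUs."""
    return 100.0 * failed / total


ASSUMED_DAYS = 120          # "almost four months" (assumption)
ASSUMED_HOURS_PER_WU = 2.0  # not stated in the post (assumption)

total = estimated_wu_count(3, ASSUMED_DAYS, ASSUMED_HOURS_PER_WU)
print(f"estimated WUs: {total:.0f}, stuck: {failure_percent(2, total):.3f}%")
print(f"five failures out of five results: {failure_percent(5, 5):.0f}%")
```

Whatever WU length is plugged in, two failures over several thousand machine-hours is a very different rate from five failures out of five, which is the point of the comparison.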
Robinski Joined: 7 Mar 06 Posts: 51 Credit: 85,383 RAC: 0 |
My frustration is way past the issue of credits given/granted and team standings. I have got 5 machines running Rosetta and haven't got many errors. I also think it has something to do with your particular hardware setup. This could be either faulty hardware, so try running the tests, or the combination of Rosetta and your hardware, so try running something else. Member of the Dutch Power Cows Trying to get the world on IPv6, do you have it? check here: IPv6.RHarmsen.nl |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
Have you considered the possibility that it's your HARDWARE which is at fault here? I have been crunching on 3xP4 PCs 24/7 for almost FOUR (4) MONTHS now and so far have had TWO (2) stuck WUs. TWO! Do the math about the % of bad WUs! Try joining another project, like Einstein@home, and see if your WUs verify there. Check your computer's memory (usually that's where 90% of problems arise), then CPU heating, and finally overall stability (mobo+CPU+memory) using: http://www.memtest.org/ http://www.mersenne.org/prime.htm (use the "torture test" option for 12-48hrs) I used the torture test and my computer passed it with flying colors. My computer is kept in an air-cooled environment (AC + a 30" cooling fan) that keeps the temperature around my computer at around 50 F. Right now the ambient temperature in my house is hovering around 87 to 100 F, so I do put a lot of effort into keeping my computer cooled. (I cringe at the thought of how my house partner will react when the electricity bill arrives, but I will worry about that when it comes.) Somehow I have the strange feeling that the problem may be due to the type of unit and the targeted CPU time in my preferences. Before I had to detach from RALPH (I was having troubles with the transition between applications that I couldn't solve), my computer processed some AB_CASP6 units at 1 hour without errors. All my Rosetta failures came when I reset my preferences to 4 hours. Right now, the computer is processing another type of WU seemingly without a problem and at a nice clip. Let's see what happens when the next AB_CASP6 type units come along (and there are three such units in line in my work queue). Dimitris: I wish I could convince my friends by showing them your statistics, but the problem is that the people I am trying to recruit work in environments (computing, environmental and technical) similar to mine. So, they want to check up and compare with something similar to theirs. My life-partner has placed my conundrum and hassles in a perspective that at least makes me smile: As long as I become the error magnet for Rosetta, other people are not getting the dastardly units. If that is going to be my contribution to the project, so be it. But, I wanted to do more. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
5.06 so far has not shown any improvement in performance (as in completed/valid WUs). All the 5.06 units I have processed until now have yielded computational errors. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
I used the torture test and my computer passed it with no errors reported. My computer shows: (According to my memory optimizer I have 636 MB of free physical memory out of a total memory of 1007.23 MB and 1889 MB of free space in the paging file.) The following data comes from my computer info in the Rosetta file. Cache: 976.56 KB; Swap space: 2353.72 MB; Total disk space: 76.69 GB; Free disk space: 71 GB; Measured floating point speed: 2012.98 million ops/sec; Measured integer speed: 4045.58 million ops/sec. (I don't know how good these numbers are for a P4 2.26 GHz Northwood chip that records an L1 cache of 8 KB and an L2 cache of 512 KB.) As I told other people before: I know bupkis about hardware/benchmarks. So if anyone wants to tell me what all those numbers mean, please contact me at joseantonio@choicecable.net My computer is kept in an air-cooled environment (AC + a 30" cooling fan) that keeps the temperature around my computer at around 50 F. Right now the ambient temperature in my house is hovering around 87 to 100 F, so I do put a lot of effort into keeping my computer cooled. (I cringe at the thought of how my house partner will react when the electricity bill arrives, but I will worry about that when it comes.) Somehow I have the strange feeling that the problem may be due to the type of unit and the targeted CPU time in my preferences. Before I had to detach from RALPH (I was having troubles with the transition between applications that I couldn't solve), my computer processed some AB_CASP6 units at 1 hour without errors. All my Rosetta failures came when I reset my preferences to 4 hours. Right now, the computer is processing another type of WU seemingly without a problem and at a nice clip. Let's see what happens when the next AB_CASP6 type units come along (and there are three such units in line in my work queue). Dimitris: I wish I could convince my friends by showing them your statistics, but the problem is that the people I am trying to recruit work in environments (computing, environmental and technical) similar to mine. So, they want to check up and compare with something similar to theirs. My life-partner has placed my conundrum and hassles in a perspective that at least makes me smile: As long as I become the error magnet for Rosetta, other people are not getting the dastardly units. If that is going to be my contribution to the project, so be it. But, I wanted to do more. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Robinski Joined: 7 Mar 06 Posts: 51 Credit: 85,383 RAC: 0 |
Interesting idea that it has something to do with the WU length. I've got mine set to 1 hour. I could check whether this is the problem, but right now I am in a competition and I don't want to change the settings. After the weekend I could test it and change my WUs to 4 hours or something. Member of the Dutch Power Cows Trying to get the world on IPv6, do you have it? check here: IPv6.RHarmsen.nl |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
For errors resulting from Rosetta Version 5.01, continue to report here. All the units I have reported in this thread are 5.06. The first 5 WUs are AB_CASP6 and they ALL are 5.06. The 6th one I reported is also a 5.06. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
@Jose Indeed it seems you are the error magnet. ;-) A few recommendations to make your contribution as helpful as possible: 1. Lower your cache size to 0.01 days so that you only have one WU at a time. 2. Set your CPU time to 3600 seconds but don't abort until the WU reaches 12000 seconds. All your errors are the 107 ones (except the ones you aborted), which is a mysterious error code for a variety of failures that are hard to identify. Ignore the 107 errors for the moment and look for any WU going past four times your preference setting. If there are none, 5.06 is already an improvement. As for the 107 errors, be patient until the devs can figure something out from your dumps. It seems you are not participating in other BOINC projects, so we really can't be sure whether it is something about your hardware in general. |
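The check suggested above (look for any WU running past four times the preferred CPU time) can be sketched in a few lines of Python. The names and CPU times are the five results reported at the top of this thread, and 14400 seconds is the `cpu_run_time_pref` shown in one of their stderr outputs; the `Result` class, the `overran` helper and the factor of 4 as a default argument are illustrative assumptions, not anything taken from the BOINC client.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    cpu_seconds: float


def overran(result: Result, pref_seconds: float, factor: float = 4.0) -> bool:
    """True if the WU used more CPU time than `factor` times the preference."""
    return result.cpu_seconds > factor * pref_seconds


# CPU times as reported in the first post of this thread.
reported = [
    Result("AB_CASP6_t216__458_3672_0", 6.140625),
    Result("AB_CASP6_t212__458_3475_0", 2.0),
    Result("AB_CASP6_t212__456_6140_0", 2923.40625),
    Result("AB_CASP6_t216__458_3834_0", 23.484375),
    Result("AB_CASP6_t242__458_3834_0", 673.375),
]

PREF_SECONDS = 14400  # cpu_run_time_pref from the stderr output
stuck = [r.name for r in reported if overran(r, PREF_SECONDS)]
print("WUs past 4x the preference:", stuck)
```

None of the five reported results comes anywhere near the threshold; they all crashed early instead of hanging, which is consistent with treating them as a separate problem from stuck WUs.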
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
@Jose Once I finish the Work Units I have, I will reduce my cache size to 0.01 and the CPU time to 3600 seconds, and I won't abort until 12000 seconds. I participated briefly in RALPH. My experience was frustrating, as I was having problems setting it up, so the transition between applications had me more confused and frustrated than I care to be right now. I was for a brief time a member of other projects: Einstein, Folding@Home and another which I don't recall (blame it on my old age), but I decided to dedicate myself COMPLETELY to Rosetta. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
Help has arrived. Let's see what happens after Friday (today). "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
Help has arrived. The Work Units that error on Friday (today) may not be included in the awards for the same day that they error. If not, they will be in the next Friday's awards (one week later). Moderator9 ROSETTA@home FAQ Moderator Contact |
dag Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
Here's an interesting 5.06 result. Although I have my run-time set to 8 hours and all the jobs are ending around then, this AB_CASP6 job quit all by itself in less than 3.5 hours. If this is an example of the software catching a 'no progress' situation, then it worked. If not, then I'd like to hear speculation. dag --Finding aliens is cool, but understanding the structure of proteins is useful. |
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
Here's an interesting 5.06 result. Although I have my run-time set to 8 hours and all the jobs are ending around then, this AB_CASP6 job quit all by itself in less than 3.5 hours. If this is an example of the software catching a 'no progress' situation, then it worked. If not, then I'd like to hear speculation. I also have 8hr WUs and noticed similar behaviour under WinXP for the last 2 WUs so far, ending at ~15.2k seconds (instead of the expected 28k seconds), both at 15 nstruct, e.g. 18463077 18444075 Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
Here's an interesting 5.06 result. Although I have my run-time set to 8 hours and all the jobs are ending around then, this AB_CASP6 job quit all by itself in less than 3.5 hours. If this is an example of the software catching a 'no progress' situation, then it worked. If not, then I'd like to hear speculation. Dimitris, you talk about hours... look at my logs: the units were terminated with errors in minutes... some in less than a minute. "This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
Here's an interesting 5.06 result. Although I have my run-time set to 8 hours and all the jobs are ending around then, this AB_CASP6 job quit all by itself in less than 3.5 hours. If this is an example of the software catching a 'no progress' situation, then it worked. If not, then I'd like to hear speculation. If you look at the result you will see that it asked for 30 "runs". This text: - 30 (nstruct) times - And it was returned when they were done. Anders n |
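The behaviour described in the last few posts (a WU ending well before the preferred run time because its requested number of structures was finished) can be illustrated with a small simulation. This is only a sketch of the stopping rule as the posts describe it, not the actual Rosetta logic: it assumes a new model is started only while both the nstruct count and the target time have not been reached. The per-model time is derived from the figures quoted above (about 15.2k seconds for 15 nstruct), and 28800 seconds is the 8 hour preference.

```python
def simulate_run(target_seconds: float, nstruct: int, seconds_per_model: float):
    """Return (models_completed, elapsed_seconds) under the assumed stopping rule."""
    elapsed = 0.0
    models = 0
    # Start another model only if neither limit has been reached yet.
    while models < nstruct and elapsed < target_seconds:
        elapsed += seconds_per_model
        models += 1
    return models, elapsed


SECONDS_PER_MODEL = 15200 / 15  # from "~15.2k seconds ... at 15 nstruct"
TARGET = 8 * 3600               # 8 hour run-time preference

# nstruct is the limit that is hit first: the WU ends early.
models, elapsed = simulate_run(TARGET, 15, SECONDS_PER_MODEL)
print(models, round(elapsed))

# With a much larger (hypothetical) nstruct, the time limit ends the run instead.
models_big, elapsed_big = simulate_run(TARGET, 1000, SECONDS_PER_MODEL)
print(models_big, round(elapsed_big))
```

Under these assumptions an early finish at a round nstruct count is a normal completion, not an error or a 'no progress' abort.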
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Jose - how about trying to temporarily unload your memory optimizer for a day, and see how that changes your success rate? |
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
@cMw you will get credit for those in a week or so. These are interesting errors since it seems a lot of models could be calculated before the error struck. I'm sure Rhiju will have a look at those errors. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
my rac has dropped 25 pts how am i supposed to contribute at all when ALL i have gotten so far is computation errors... Your results both show models completed. Therefore credit will be granted. Your RAC is a temporary problem; it is due to these WUs running for a longer time before they completed (apparently you increased your preference for crunch time per WU??). And so once credit is granted, your RAC will probably jump higher than you expected. With the new changes, and increased frequency of checkpointing, your RAC is going to go up! But it takes time to impact your RAC. You've only been with the project 4 days; your RAC will stabilize over time. Your credit will be granted. Your reported results are useful. Even the fact that you encountered an error is useful. Your language is unappreciated, unnecessary and definitely not helping resolve anything. Please retain some composure. And try not to let virtual objects like credits affect your blood pressure, stress level and health. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
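To see why RAC takes time to settle for someone who joined only four days ago, here is a simplified Python sketch of an exponentially decaying average. It is not BOINC's actual code: it assumes a one-week half-life and credit that arrives in equal daily amounts, and the function name `rac_after` and the figure of 100 credits per day are hypothetical.

```python
def rac_after(days: int, daily_credit: float,
              half_life_days: float = 7.0, start: float = 0.0) -> float:
    """Decayed average after `days` daily updates (simplified, assumed model)."""
    decay = 0.5 ** (1.0 / half_life_days)  # fraction of the old average kept per day
    rac = start
    for _ in range(days):
        rac = rac * decay + (1.0 - decay) * daily_credit
    return rac


for d in (4, 7, 14, 28):
    print(d, "days:", round(rac_after(d, 100.0), 1))
```

In this model the average only approaches the true daily rate after several half-lives, and a day on which credit is delayed pulls it down until the credit is actually granted.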
©2024 University of Washington
https://www.bakerlab.org