Message boards : Number crunching : Problems with Rosetta version 5.43
Author | Message |
---|---|
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Thanks, feet1st and Thomas Bates. I don't remember seeing this kind of error before. Apparently the worker thread got caught in an error (incorrect function) but did not terminate itself correctly, and later the watchdog kicked in to end the run. But Thomas reported that he had to kill the run manually. It seems that the BOINC manager lost track of that WU and did not respond to the threads at all. Puzzled... Chu, RE: Thomas Bates, looks like this WU. It is called 1ail__BOINC_NOFILTERS_ABRELAX_SAVE_ALL_OUT_NEWRELAXFLAGS_frags83__1505_1221_0 |
zombie67 [MM] Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
That is the error code for not transferring result files correctly, either because the result files were not generated or because the client was unable to send them back to the server. If you have only experienced this problem recently, I would suggest resetting the project on your hosts, as the current application has not been changed since last December and the specific WUs are returning valid results from other hosts. It seems like a communication issue between your host and the server, but I am not exactly sure what is causing it. Thanks. I'll give that a shot. Also, I noticed that these jobs did not run the full length of time (3 hours). So it's not like these are completing properly and then failing to send back to the server. Two changes on my end about the same time these failures started: (1) upgraded from 5.8.3 to 5.8.4; (2) attached the SZTAKI project. Is it possible for another project to cause these problems? No one is reporting similar problems on the BOINC alpha alias, so I am thinking it is not being caused by the upgrade to 5.8.4. Edit: Resetting did not solve the problem. I am going to try downgrading to 5.8.3 to see if that changes anything. Reno, NV Team: SETI.USA |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
A new iMac participant has WUs failing after 5 minutes, but not with the WU name of the known problem. Their report here. Exit status 131 and some odd messages in the results. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
arminius Joined: 23 Sep 05 Posts: 8 Credit: 805,403 RAC: 0 |
Client error on all PSH_0144_looprlx_... WUs; date: 22.01.2007; Linux 10.2 More: https://boinc.bakerlab.org/rosetta/results.php?hostid=6399 a. |
zombie67 [MM] Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
Edit: Resetting did not solve the problem. I am going to try downgrading to 5.8.3 to see if that changes anything. FYI: Downgrading further to 5.8.2 solved the problem. Obviously I was mistaken about when the problem started (with 5.8.3, not 5.8.4). In any case, I saw this problem only with the Mac version, not the Windows version. Reno, NV Team: SETI.USA |
finch Joined: 23 Nov 05 Posts: 8 Credit: 4,548 RAC: 0 |
Are you all using Windows or Linux? I've had no failures on 3 Windows XP Pro systems I am running, but I finally had to detach from Rosetta on *ALL* of my Linux systems (I have two running Ubuntu 6.06 and one running Ubuntu 6.10) as I was seeing nearly 90% failure and these systems were offering Rosetta 50% of their time. I didn't mind the tasks that ran 10-15 seconds before failing ... I wanted to respond to lynn and bennyrop. I am running Ubuntu 6.06 with only 512 MB of RAM. My situation wasn't exactly the same as lynn's, as the work units didn't end after 10-15 seconds. They would run for a considerable time. I have run a half dozen or so units, and every unit that was preempted and removed from memory while running in tandem with worldgrid has failed. At some point I would catch the work units showing no progress, and CPU time would no longer increment. I would finally have to abort the units. The two times that I suspended worldgrid and ran Rosetta without interruption, I had no problems. This is a small sample, but I am fairly confident that it isn't just a coincidence and that there is a connection. I'm not going to run both projects simultaneously anymore. I'll alternate between them. |
Michael.L Joined: 12 Nov 06 Posts: 67 Credit: 31,295 RAC: 0 |
30/01/2007 19:06:32\|rosetta@home\|Unrecoverable error for result 1eyvA_BOINC_ABRELAX_NEWRELAXFLAGS_frags83__1521_2499_0 ( - exit code 1073807364 (0x40010004)) I think that WU froze for some time at about 75%. Windows XP Home, AMD64 3200+. Maybe there is a connection, in that the BOINC manager froze around the same time. |
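
The exit code in the last post is printed in both decimal and hex. A minimal Python sketch, assuming the value follows the Windows NTSTATUS bit layout (an assumption; nothing in the thread confirms how the code was produced), checks that the two forms agree and splits the value into its fields. The helper name `split_status` is hypothetical.

```python
def split_status(code: int) -> dict:
    """Split a 32-bit value using the NTSTATUS bit layout (assumed here)."""
    return {
        "hex": f"0x{code:08X}",
        "severity": (code >> 30) & 0x3,    # bits 31-30
        "customer": (code >> 29) & 0x1,    # bit 29
        "facility": (code >> 16) & 0xFFF,  # bits 27-16
        "code": code & 0xFFFF,             # bits 15-0
    }


# Decimal exit code copied from the log line in the post above.
fields = split_status(1073807364)
print(fields)
```

Under that assumption the value reads as severity 1 (informational), facility 1, code 4. In the Windows headers this value appears to correspond to DBG_TERMINATE_PROCESS, which would fit a process that was terminated from outside rather than one that crashed on its own. Treat that reading as a guess, not a diagnosis.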
©2024 University of Washington
https://www.bakerlab.org