Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Tom M Send message Joined: 20 Jun 17 Posts: 97 Credit: 16,726,096 RAC: 36,642 |
I had nothing but errors on both the i686 applications on my Ryzen. Gave up on Rosetta and moved to Einstein. Discovered later that you can set a flag in cc_config.xml to ignore alternate platforms. Huh, I didn't notice mine apparently. Tom M Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
Sven Send message Joined: 7 Feb 16 Posts: 8 Credit: 222,005 RAC: 0 |
Hi all, concerning my problems with this issue: **** Rosetta@home | Task xxxx exited with zero status but no 'finished' file Rosetta@home | If this happens repeatedly you may need to reset the project. ***** ... I've found out what to do to avoid this kind of error message: it seems advisable to make sure that the following setting is applied: Use at most 100% of CPU time. All other settings, including max usage of CPUs, don't influence the processing of Rosetta tasks. Sven |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 6,141 |
Concerning my problems with this issue: Very useful information, thanks |
Keith Myers Send message Joined: 29 Mar 20 Posts: 97 Credit: 332,619 RAC: 8 |
Hi all, Interesting. You shouldn't be getting those still now that you've updated to the latest 7.16.5 client which has the revised code fix to stop those errors. Your system would have to be too busy to service the slot cleanup for longer than five minutes to still get those errors. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1735 Credit: 18,532,940 RAC: 14,716 |
You're thinking of the "Finished file present too long" issue. Hi all, But it is very probably some sort of I/O problem: the settings for the systems barely allowed any processing, with extremely frequent suspending & resuming occurring. Grant Darwin NT |
Ian Send message Joined: 12 Oct 07 Posts: 3 Credit: 2,611,432 RAC: 0 |
Hello Rosetta community, I have a fairly persistent issue with uploading work units. What seems to happen is that the upload looks to have proceeded normally, but then sticks at 100% and never gets removed from the transfer queue. The net effect of this is that BOINC eventually runs out of disk space as it is all in use by pending Rosetta uploads. Rosetta is the only BOINC project I have this issue with. I have tried suspending/restarting uploads as suggested in another thread. I have also tried resetting and deleting and re-adding the Rosetta project entirely, but I have the same issue. Running on Windows 10, latest BOINC client. Does anyone have anything I can check or try in addition to the above? Thanks, Ian |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1735 Credit: 18,532,940 RAC: 14,716 |
Hello Rosetta community, Do you use any sort of 3rd party AV/Internet security programme? (just grasping at straws here). Grant Darwin NT |
Sven Send message Joined: 7 Feb 16 Posts: 8 Credit: 222,005 RAC: 0 |
Most of the error messages occurred overnight, when there was no other workload on the system. Usually I would say that a project should be able to handle every kind of client setting. Obviously I'm wrong with this opinion. Due to heavy fan noise I actually adjusted the settings to "suspend when computer is in use", and I still don't get any of the error messages I had before, when I had reduced the percentage of allowed CPU time. Additional information: as I freshly downloaded the newest BOINC client on this new computer, I'm working with version 7.16.5 (x64), which doesn't help against the "Task xxxx exited with zero status but no 'finished' file" error. |
Ian Send message Joined: 12 Oct 07 Posts: 3 Credit: 2,611,432 RAC: 0 |
Hi, the machine has Trend Micro Security agent installed. If I exit this and retry the upload, I see the same behaviour - the upload does not seem to be blocked at all - the progress bar goes up to 100%, it just doesn't ever get removed once the upload is complete. Windows moans about there not being any virus checking active, so the virus checker is seemingly off at this point (well as far as Windows can detect). Same behaviour with the Windows firewall off, just sits at 100% progress. If I leave it for a bit I get a project backoff message, so maybe it is just load on the server end. I have been having this for a while though, before the current interest in COVID-19 work. One other point of note is that I do have the upload rate throttled to 500KBps as it is on a shared network if that makes a difference. Temporarily turning this off does not fix the issue however. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1735 Credit: 18,532,940 RAC: 14,716 |
Hi, the machine has Trend Micro Security agent installed. If I exit this and retry the upload, I see the same behaviour - the upload does not seem to be blocked at all - the progress bar goes up to 100%, it just doesn't ever get removed once the upload is complete. Windows moans about there not being any virus checking active, so the virus checker is seemingly off at this point (well as far as Windows can detect). Same behaviour with the Windows firewall off, just sits at 100% progress. A few weeks back there were some upload issues, but they've been sorted. And no one else has been posting about similar upload issues. Getting to 100%, and then stopping would indicate it's not getting a final acknowledgement for the upload, but no idea why everything else would work bar that final ACK. If all else has failed, I'd re-boot your modem, and re-boot the computer. *fingers crossed* Grant Darwin NT |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
The download issues a few weeks back were due to additional servers near the Rosetta@home end of the connections, running overly aggressive antivirus programs examining everything that went by. Does the shared network at your end of such connections have similar additional servers? If so, you may need to talk to whoever runs those servers, and ask them to set it up so that everything sent to the Rosetta@home upload server is excluded from checking. |
amazoph Send message Joined: 24 Nov 13 Posts: 3 Credit: 2,099,022 RAC: 0 |
Looks like you have many access violations. I am not seeing such errors with other people's problem reports. Have you run memtest? I found the issue, this machine had XMP memory timings enabled in BIOS. Reverted to stock lower speed timing and starting to get WUs completed without errors now. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1735 Credit: 18,532,940 RAC: 14,716 |
rb_04_12_21176_20979__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_912410_4 Sent: 15 Apr 2020, 23:33:22 UTC. Time reported/deadline: 16 Apr 2020, 2:30:44 UTC. Status: Cancelled by server. Cancelled only 3 hours after it was sent. If Rosetta is going to allow a grace period for Tasks that are returned after the deadline, then the next replication of it shouldn't be sent out until after the deadline grace period has passed. Saves having things like this occur. Or do away with the grace period & send the next copy out, and cancel the original Task. Or keep the grace period but still cancel the original Task once the next one is sent. Pretty much everything that host does arrives late. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1735 Credit: 18,532,940 RAC: 14,716 |
rb_04_12_21176_20979__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_912410_4 And then I got hit with this one. hgfp_split2_562_fold_SAVE_ALL_OUT_916305_1 Sent: 15 Apr 2020, 12:54:58 UTC. Time reported/deadline: 16 Apr 2020, 3:41:43 UTC. Status: Cancelled by server. Errors: Too many errors (may have bug); Too many total results; WU cancelled. The server cancelling a bad batch, sure, but to kill off the Task due to an error on the other system- as the problem could be with the system, not the Task (which appears to be the case this time around). Especially when it sent out the new Task after the Error Task had been returned. I've done quite a few resends previously without issue. Grant Darwin NT |
Ian Send message Joined: 12 Oct 07 Posts: 3 Credit: 2,611,432 RAC: 0 |
I'll check, but as far as I am aware, the connection goes straight onto the internet via a firewall. What I don't understand is that Rosetta is the only BOINC project that I have this issue with - I am connected to several. Do they not all share the same upload infrastructure (and acknowledgement message)? I could understand the behaviour if the ACKs were not getting through for any BOINC projects. Is there a particular port that the response comes through that is peculiar to Rosetta? It looks like my contributions for this host are not getting credited looking at the host average page - I see a flat line. Interestingly the contributions for my home machine do get through though, which would seem to point to something blocking the upload response somewhere. |
Raistmer Send message Joined: 7 Apr 20 Posts: 49 Credit: 797,293 RAC: 0 |
"Too many restarts with no progress. Keep application in memory while preempted." https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13811 |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1735 Credit: 18,532,940 RAC: 14,716 |
I'll check, but as far as I am aware, the connection goes straight onto the internet via a firewall. What I don't understand is that Rosetta is the only BOINC project that I have this issue with - I am connected to several. Do they not all share the same upload infrastructure (and acknowledgement message)? I could understand the behaviour if the ACKs were not getting through for any BOINC projects. Each project has its own servers, and all use TCP/IP for connections. Is there a particular port that the response comes through that is peculiar to Rosetta? No idea on that one. My guess is no. It looks like my contributions for this host are not getting credited looking at the host average page - I see a flat line. A Result has to be returned before it can be reported. And it has to be reported for the project to be able to Validate it, then allocate Credit. Interestingly the contributions for my home machine do get through though, which would seem to point to something blocking the upload response somewhere. Yep. It's an issue that only you with that system appear to be experiencing, so it's something to do with that particular system, or its connection to the Rosetta servers. Can you get a cheap USB mobile modem (or know someone with one)? A lot of stuffing around to set up, but if you can get one for $20 just to use that to connect to the internet instead of your existing connection (and still a lot easier than taking the whole computer somewhere else) and that way you can see if it is somehow your system, or it's the internet connection you are using that's causing the issue. Grant Darwin NT |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The server cancelling a bad batch, sure, but to kill off the Task due to an error on the other system- as the problem could be with the system, not the Task (which appears to be the case this time around). Especially when it sent out the new Task after the Error Task had been returned. I think you are taking it a bit too personally. A batch of WUs is not cancelled by the project when they have good reason to believe your attempt to crunch it will go better. They set up most WUs to do one additional try after a failure for the reason you mention, maybe the second attempt will go better. But, looking across the whole batch is the only way to make a decision about whether to withdraw the batch, and that has nothing to do with your current state on the WU. Rosetta Moderator: Mod.Sense |
crystalsys Send message Joined: 11 Aug 09 Posts: 8 Credit: 1,648,888 RAC: 566 |
Version 3.8 (? I'm not sure and log does not show the version) error and tie up I keep getting jobs running on a 3.X (?) application that hang at some point, and then the log shows something like this: 4/16/2020 11:05:45 AM | Rosetta@home | Task hgfp_het2_576_fold_SAVE_ALL_OUT_911081_124_1 exited with zero status but no 'finished' file 4/16/2020 11:05:45 AM | Rosetta@home | If this happens repeatedly you may need to reset the project. Resetting the project is not necessary, but aborting that task seems to be, unless you want to waste more CPU time on it. Is there a way to restrict which app versions you get? I've looked, can't seem to find it. I've not seen this with version 4.15 |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Version 3.8 (? I'm not sure and log does not show the version) error and tie up Upgrading to BOINC 7.16.5 makes that problem much less likely. |
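
The "exited with zero status but no 'finished' file" lines quoted in this thread have a regular shape, so a short script can tally how often each task hits them, which may help when deciding which task to abort. The following is only a sketch in Python. It assumes the event log has been saved as text in the same `date | project | message` layout pasted above (other BOINC log files may be formatted differently), and the names `NO_FINISHED` and `count_no_finished` are made up for this example; they are not part of BOINC.

```python
import re
from collections import Counter

# Matches the event-log line quoted in this thread, e.g.
# 4/16/2020 11:05:45 AM | Rosetta@home | Task <name> exited with zero status but no 'finished' file
# ASSUMPTION: fields are separated by " | " as in the pasted log lines.
NO_FINISHED = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*Task (?P<task>\S+) "
    r"exited with zero status but no 'finished' file"
)


def count_no_finished(lines):
    """Count 'no finished file' messages per (project, task) pair."""
    counts = Counter()
    for line in lines:
        match = NO_FINISHED.search(line)
        if match:
            counts[(match["project"], match["task"])] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the post above; replace with the lines
    # of a saved event log to check a real machine.
    sample = [
        "4/16/2020 11:05:45 AM | Rosetta@home | Task "
        "hgfp_het2_576_fold_SAVE_ALL_OUT_911081_124_1 exited with zero "
        "status but no 'finished' file",
        "4/16/2020 11:05:45 AM | Rosetta@home | If this happens "
        "repeatedly you may need to reset the project.",
    ]
    for (project, task), n in count_no_finished(sample).most_common():
        print(f"{n:3d}  {project}  {task}")
```

A task that shows up many times in the tally is a candidate for aborting by hand, as described above; the script only reads text and does not touch the BOINC client.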
©2025 University of Washington
https://www.bakerlab.org