Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
JP Send message Joined: 20 Mar 20 Posts: 2 Credit: 102,246 RAC: 4 |
Thank you, but there was no need for me to intervene manually. The database was downloaded again automatically, and the new jobs are running without problems. I just wanted to know what happened. |
Areku Send message Joined: 14 Mar 20 Posts: 1 Credit: 89,823 RAC: 0 |
I think I am having an issue with Rosetta handling PC shutdown incorrectly. My OS is Windows 7. I noticed that I started getting "calculation error" for running tasks when I shut down my PC with Rosetta running. An example of such a task is https://boinc.bakerlab.org/rosetta/result.php?resultid=1192146960 If I use the BOINC interface to pause tasks before shutdown, they pause normally and can be resumed normally after boot. I can suspend and resume manually, but maybe this issue still needs to be addressed? |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 2,986 |
Cannot upload my wus (and the upload server seems ok). |
Tukansam Send message Joined: 14 Mar 20 Posts: 1 Credit: 1,103,185 RAC: 0 |
None of my WUs on four computers are able to upload. According to the BOINC log: "Project servers may be temporarily down." |
Curt3g Send message Joined: 30 Mar 20 Posts: 4 Credit: 1,908,126 RAC: 0 |
Upload failing here also. |
JAMES Send message Joined: 5 May 07 Posts: 8 Credit: 275,386 RAC: 0 |
Same here. Four completed WU's stuck "Uploading" |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
It's a certificate expiry or something. See https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14006 |
Erich56 Send message Joined: 11 Jan 16 Posts: 35 Credit: 1,437,503 RAC: 0 |
same problem with LHC |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
Mr P Hucker wrote: "It's a certificate expiry or something. See https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14006" I'm not getting that warning here, but I do note the download server isn't running - no idea if that's related. Server status shows: Download server boinc-files.bakerlab.org Not Running. Never seen that server down before. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
Mr P Hucker wrote: "It's a certificate expiry or something. See https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14006" Someone in another forum mentioned only Windows, not Linux or Mac, is affected. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Sid Celery wrote: "I'm not getting that warning here" You’ll only see it with http_debug selected in your Event Log options. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
Sid Celery wrote: "I'm not getting that warning here" I got it without that selected; I only have the first three ticked: file_xfer, sched_ops, task. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
Sorted - see the end of this thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14006 |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
Sid Celery wrote: "I'm not getting that warning here" I normally have the same options set as you, Peter, and got other errors but nothing about peer certificates etc. But using Brian's setting, the problem was revealed. And copying Toby's replacement crt file from the other thread into the BOINC directory worked. Upload errors solved, downloads solved too - thank you to everyone who investigated. As someone else asked, now how does this get solved for the 99% of people who don't read a very specific forum thread? If people can't automatically upload or download until it's fixed, how does the updated file get pushed out to them? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
I also used Toby's replacement crt file from the other thread. It worked, and I didn't even have to shut down BOINC to make the change. I suspect that the fix for the 99% who don't read a relevant thread will have to be a new version of BOINC that includes the new crt file instead of the old one. No other changes are essential, but it would be nice if it also modified the way this file is used so that instead of failing at an expired entry, it checks the rest of the file to see if there's a replacement entry that hasn't expired yet. |
IBM01902 Send message Joined: 23 Mar 20 Posts: 3 Credit: 43,044 RAC: 0 |
I've seen tasks crash or, more frustrating, get through hours of processing and just start over from 0 in the morning. This particular project doesn't seem to handle checkpointing well, at least for me with my low-memory, older processors of any brand. Anything I've got with an AMD processor, also older, I've surrendered and put on World Community Grid work. At this point I have a few older Intel machines that I start on Saturday morning and let run over the weekend to try to get Rosetta a few runs, but this project just doesn't seem to be friendly to weak computers. It likes short deadlines, and without more frequent checkpointing, mine can't do much during the week. I will say I haven't bothered the project team for help. I just figure if I can't figure it out, I know where they can be useful. I'm hoping you get a good answer I can use :) |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
IBM01902 wrote: "I've seen tasks crash or more frustrating, get through hours of processing and just start over from 0 in the morning." How much memory does each of those older computers have? 2 GB per CPU core is enough to run some of the Rosetta@Home tasks, but not all of them. You have those computers set to be hidden, so I have to ask rather than check. Under Your account, Computing preferences, you can adjust the fraction of the computer's memory BOINC is allowed to use. You can also adjust how often the tasks are allowed to ask to write a checkpoint (if they are at a point where a checkpoint will be useful); you may have this set too high. Of course, it is also possible that the Rosetta tasks have too few places where a checkpoint would be useful. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
IBM01902 wrote: "I've seen tasks crash or more frustrating, get through hours of processing and just start over from 0 in the morning." I'm pretty sure we can't ask a task to checkpoint - we can only ask for tasks not to checkpoint too often. Checkpointing certainly has been an issue in the past, though in the most recent program version it's also certainly been improved. I'm under the impression (but may be wrong) that when RAM is tight or short, restarting at zero happens a lot more often. It's made worse by the significant increase in RAM demands of tasks since CV19 became a priority. I'm not sure anything can be done about that in order to return meaningful results. Certain types of tasks don't need high RAM, while others definitely do, so it's right to say this project is now very demanding and unforgiving of PCs that can no longer be seen as adequate in the modern day. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
IBM01902 wrote: "I've seen tasks crash or more frustrating, get through hours of processing and just start over from 0 in the morning." The worst machine I have is an old Acer laptop I got from freecycle in non-working order (busted hard disk, dodgy charger connection). Quad core Intel i3 M350 CPU, with 8GB RAM (I was given it with 3GB! Totally unusable for anything; the board won't take more than 8 though). 4 Rosettas fill the RAM up. So I set Boinc to use 80% of RAM, and limited Rosetta in its app config to only run 3 at once, allowing the other core to get a Universe task, which is tiny in RAM. It's quite happy with pausing and resuming tasks. You could: (1) Limit how many Rosettas run at once in app config. (2) Limit Boinc's total RAM usage. (3) If it's a computer that's not turned off much, set the "change between applications" to a very high number (it will accept 100000 minutes), so it doesn't pause work units to do others and everything runs to completion in one go. |
Jim Martin Send message Joined: 9 Oct 05 Posts: 23 Credit: 1,443,682 RAC: 240 |
Robert -- I tried to delete the certificates, after turning off BOINC, but was unsuccessful. Would waiting for the BOINC people to address this issue be better? Still no uploading of WUs. It seems it's a BOINC problem to fix, and not mine (I'm one of the 99%). Thanks for your efforts on ca-bundle.crt, however. P.S. I have a Dell E7240 with Windows 7. No problems until now (I have run rosetta@home since 2005). |
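
robertmiles's suggestion in the thread above (skip an expired entry and keep looking for a replacement that is still valid) can be sketched in a few lines of Python. This only illustrates the selection logic; it is not how BOINC or its TLS library actually processes ca-bundle.crt. The `CaEntry` type, the `pick_valid_entry` function, and the sample subject and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class CaEntry:
    """Hypothetical, simplified stand-in for one certificate in a CA bundle."""
    subject: str
    not_after: datetime  # expiry time (timezone-aware)


def pick_valid_entry(entries: Sequence[CaEntry], subject: str,
                     now: datetime) -> Optional[CaEntry]:
    """Return the first unexpired entry for `subject`, or None if there is none.

    Expired entries are skipped instead of being treated as a fatal error,
    so a replacement entry later in the same bundle can still be used.
    """
    for entry in entries:
        if entry.subject == subject and entry.not_after > now:
            return entry
    return None


# Made-up sample data: an expired root followed by its replacement.
bundle = [
    CaEntry("Example Root CA", datetime(2020, 5, 30, tzinfo=timezone.utc)),
    CaEntry("Example Root CA", datetime(2038, 1, 1, tzinfo=timezone.utc)),
]
check_time = datetime(2020, 6, 1, tzinfo=timezone.utc)
chosen = pick_valid_entry(bundle, "Example Root CA", now=check_time)
print(chosen)
```

With the sample data, the expired first entry is passed over and the second entry is returned. A bundle containing only the expired entry returns `None`, which is the case where a replacement crt file is still needed.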
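
The memory advice from robertmiles and Mr P Hucker in the thread above comes down to simple arithmetic: the RAM that BOINC is allowed to use, divided by what one task needs, capped at the number of cores. Here is a rough sketch, assuming about 2 GB per Rosetta task (robertmiles notes that some tasks need more). The function name and parameters are made up for illustration and are not part of BOINC.

```python
import math


def max_concurrent_tasks(total_ram_gb: float, boinc_ram_fraction: float,
                         per_task_gb: float, cpu_cores: int) -> int:
    """Estimate how many tasks fit in the RAM that BOINC may use.

    All inputs are the user's own estimates; per_task_gb in particular
    varies between Rosetta task types.
    """
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    budget_gb = total_ram_gb * boinc_ram_fraction  # RAM BOINC is allowed to use
    by_memory = math.floor(budget_gb / per_task_gb)
    return max(0, min(by_memory, cpu_cores))  # never more tasks than cores


# Mr P Hucker's laptop: 8 GB RAM, BOINC limited to 80%, 4 cores,
# with the assumed 2 GB per task.
print(max_concurrent_tasks(8, 0.80, 2.0, 4))  # prints 3
```

The result matches the limit of 3 Rosetta tasks that Mr P Hucker set in his app config. The same calculation for a 3 GB machine gives 1, which fits his remark that the laptop was unusable as first received.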
©2024 University of Washington
https://www.bakerlab.org