Problems and Technical Issues with Rosetta@home

JP
Joined: 20 Mar 20
Posts: 2
Credit: 102,246
RAC: 6
Message 96748 - Posted: 23 May 2020, 14:46:43 UTC - in response to Message 96747.  
Last modified: 23 May 2020, 14:51:54 UTC

Thank you, but there was no need for me to interfere manually.
The database has been downloaded again automatically, the new jobs are running without problems.
I just wanted to know what happened.
ID: 96748 · Rating: 0
Areku
Joined: 14 Mar 20
Posts: 1
Credit: 89,823
RAC: 0
Message 96841 - Posted: 29 May 2020, 19:41:49 UTC

I think I am having an issue with Rosetta handling PC shutdown incorrectly. My OS is Windows 7.
I noticed that I started getting "calculation error" for running tasks when I shut down my PC with Rosetta running. An example of such a task is
https://boinc.bakerlab.org/rosetta/result.php?resultid=1192146960
If I use the BOINC interface to pause tasks before shutdown, they pause normally and can be resumed normally after boot.

While manually suspending and resuming is something I can do, maybe this issue still needs to be addressed?
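
For anyone who wants to script the workaround described above, here is a rough sketch that suspends computation before shutdown via `boinccmd`, the command-line tool that ships with the BOINC client. It assumes `boinccmd` is on the PATH and that the client accepts local commands; the helper names are made up for illustration, and this only automates the manual pause, it does not fix the underlying shutdown problem.

```python
import shutil
import subprocess


def suspend_command(duration_seconds=0):
    """Build the boinccmd call that sets the run mode to 'never'.

    A duration of 0 leaves the mode in place until it is changed again.
    """
    cmd = ["boinccmd", "--set_run_mode", "never"]
    if duration_seconds > 0:
        cmd.append(str(duration_seconds))
    return cmd


def resume_command():
    """Build the boinccmd call that hands control back to the preferences."""
    return ["boinccmd", "--set_run_mode", "auto"]


def run(cmd):
    """Run the command if boinccmd is available; report otherwise."""
    if shutil.which(cmd[0]) is None:
        # Assumption: boinccmd is on the PATH. If not, just show the command.
        print("boinccmd not found; would have run:", " ".join(cmd))
        return False
    return subprocess.run(cmd, check=False).returncode == 0
```

Calling `run(suspend_command())` from a shutdown script and `run(resume_command())` after boot would mirror the manual pause and resume.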
ID: 96841 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,790,281
RAC: 4,437
Message 96865 - Posted: 30 May 2020, 12:43:49 UTC - in response to Message 96841.  

Cannot upload my WUs (and the upload server seems OK).
ID: 96865 · Rating: 0
Tukansam
Joined: 14 Mar 20
Posts: 1
Credit: 1,103,185
RAC: 0
Message 96867 - Posted: 30 May 2020, 13:08:50 UTC - in response to Message 96865.  

None of my WUs on four computers are able to upload. According to the BOINC log: Project servers may be temporarily down.
ID: 96867 · Rating: 0
Curt3g
Joined: 30 Mar 20
Posts: 4
Credit: 1,908,126
RAC: 0
Message 96869 - Posted: 30 May 2020, 13:21:12 UTC - in response to Message 96867.  

Upload failing here also.
ID: 96869 · Rating: 0
JAMES
Joined: 5 May 07
Posts: 8
Credit: 275,386
RAC: 0
Message 96872 - Posted: 30 May 2020, 13:38:12 UTC

Same here. Four completed WUs stuck "Uploading".
ID: 96872 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 96874 - Posted: 30 May 2020, 13:49:11 UTC

It's a certificate expiry or something. See https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14006
ID: 96874 · Rating: 0
Erich56
Joined: 11 Jan 16
Posts: 35
Credit: 1,437,503
RAC: 0
Message 96878 - Posted: 30 May 2020, 14:45:37 UTC

Same problem with LHC.
ID: 96878 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2145
Credit: 41,555,266
RAC: 8,961
Message 96879 - Posted: 30 May 2020, 14:48:45 UTC - in response to Message 96874.  
Last modified: 30 May 2020, 14:49:44 UTC

Mr P Hucker wrote:
It's a certificate expiry or something. See https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14006

I'm not getting that warning here, but I do note the download server isn't running - no idea if that's related

Download server boinc-files.bakerlab.org Not Running

Never seen that server down before
ID: 96879 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 96880 - Posted: 30 May 2020, 15:04:55 UTC - in response to Message 96879.  

Mr P Hucker wrote:
It's a certificate expiry or something. See https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14006

Sid Celery wrote:
I'm not getting that warning here, but I do note the download server isn't running - no idea if that's related

Download server boinc-files.bakerlab.org Not Running

Never seen that server down before


Someone in another forum mentioned only Windows, not Linux or Mac, is affected.
ID: 96880 · Rating: 0
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 96887 - Posted: 30 May 2020, 16:10:22 UTC - in response to Message 96879.  

Sid Celery wrote:
I'm not getting that warning here

You’ll only see it with http_debug selected in your Event Log options
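
In case it helps anyone who prefers editing files over the Event Log options dialog: those options correspond to the `<log_flags>` section of `cc_config.xml` in the BOINC data directory. Below is a small sketch that builds such a file with `http_debug` switched on. The function name is made up, and the exact set of flags you want is an assumption; treat this as an illustration rather than the official procedure, and back up any existing `cc_config.xml` first.

```python
import xml.etree.ElementTree as ET


def build_cc_config(log_flags):
    """Return cc_config.xml text with the given log flags set to 1 or 0."""
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    for name, enabled in log_flags.items():
        ET.SubElement(flags, name).text = "1" if enabled else "0"
    if hasattr(ET, "indent"):  # pretty-printing, available in Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# The three flags mentioned in this thread as ticked by default, plus http_debug.
config_xml = build_cc_config(
    {"file_xfer": True, "sched_ops": True, "task": True, "http_debug": True}
)
print(config_xml)
```

After saving the output as `cc_config.xml`, the client has to re-read its config files (or be restarted) before the extra HTTP messages show up in the log.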
ID: 96887 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 96888 - Posted: 30 May 2020, 16:16:46 UTC - in response to Message 96887.  

Sid Celery wrote:
I'm not getting that warning here

Brian Nixon wrote:
You’ll only see it with http_debug selected in your Event Log options

I got it without that selected; I only have the first three ticked: file_xfer, sched_ops, task.
ID: 96888 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 96889 - Posted: 30 May 2020, 16:17:32 UTC

ID: 96889 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2145
Credit: 41,555,266
RAC: 8,961
Message 96894 - Posted: 30 May 2020, 17:57:07 UTC - in response to Message 96888.  

Sid Celery wrote:
I'm not getting that warning here

Brian Nixon wrote:
You’ll only see it with http_debug selected in your Event Log options

Mr P Hucker wrote:
I got it without that selected; I only have the first three ticked: file_xfer, sched_ops, task.

I normally have the same options set as you, Peter, and got other errors, but not anything about Peer certificates etc.
But using Brian's setting, the problem was revealed.
And using Toby's replacement crt file from the other thread and copying it across to the Boinc directory worked.
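
For reference, the manual fix amounts to swapping one file. Here is a cautious sketch of that step which keeps a backup of the old `ca-bundle.crt` before overwriting it. The directory locations are assumptions (the demo uses a temporary folder standing in for the BOINC directory), and the function name is made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def replace_ca_bundle(new_bundle: Path, boinc_dir: Path) -> Path:
    """Copy new_bundle over boinc_dir/ca-bundle.crt, keeping a .bak copy."""
    target = boinc_dir / "ca-bundle.crt"
    backup = boinc_dir / "ca-bundle.crt.bak"
    if target.exists():
        shutil.copy2(target, backup)  # keep the old file in case of trouble
    shutil.copy2(new_bundle, target)
    return backup


# Demo in a throwaway folder; a real run would point at the actual
# BOINC directory and the downloaded replacement file instead.
with tempfile.TemporaryDirectory() as tmp:
    boinc_dir = Path(tmp) / "BOINC"
    boinc_dir.mkdir()
    (boinc_dir / "ca-bundle.crt").write_text("old bundle")
    downloaded = Path(tmp) / "ca-bundle.crt"
    downloaded.write_text("new bundle")
    backup = replace_ca_bundle(downloaded, boinc_dir)
    demo_result = ((boinc_dir / "ca-bundle.crt").read_text(), backup.read_text())
```

On a real machine, writing into the BOINC directory may need administrator rights.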

Upload errors solved, downloads solved too - thank you to everyone who investigated.

As someone else asked, now how does this get solved for the 99% of people who don't read a very specific forum thread?
If people can't automatically upload or download until it's fixed, how does the updated file get pushed out to them?
ID: 96894 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 96935 - Posted: 30 May 2020, 23:33:06 UTC
Last modified: 30 May 2020, 23:39:43 UTC

I also used Toby's replacement crt file from the other thread. It worked, and I didn't even have to shut down BOINC to make the change.

I suspect that the fix for the 99% who don't read a relevant thread will have to be a new version of BOINC that includes the new crt file instead of the old one. No other changes are essential, but it would be nice if it also modified the way this file is used so that instead of failing at an expired entry, it checks the rest of the file to see if there's a replacement entry that isn't expired yet.
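
To make the suggested behaviour concrete, here is a toy model of "skip the expired entry and keep looking". It is not how BOINC or its TLS library actually validates certificates; the `CertEntry` type, the subject names, and the dates are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class CertEntry:
    subject: str         # hypothetical: name of the certificate authority
    not_after: datetime  # expiry time of this entry


def pick_valid_entry(
    entries: Iterable[CertEntry], subject: str, now: datetime
) -> Optional[CertEntry]:
    """Return the first entry for `subject` that has not expired yet."""
    for entry in entries:
        if entry.subject != subject:
            continue
        if entry.not_after > now:
            return entry
        # Expired: keep scanning instead of failing at the first match.
    return None


bundle = [
    CertEntry("Example Root CA", datetime(2020, 5, 30, tzinfo=timezone.utc)),
    CertEntry("Example Root CA", datetime(2038, 1, 1, tzinfo=timezone.utc)),
]
now = datetime(2020, 5, 31, tzinfo=timezone.utc)
chosen = pick_valid_entry(bundle, "Example Root CA", now)
```

With the invented data above, the expired first entry is passed over and the later replacement entry is the one returned.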
ID: 96935 · Rating: 0
IBM01902
Joined: 23 Mar 20
Posts: 3
Credit: 43,044
RAC: 0
Message 96936 - Posted: 31 May 2020, 0:33:12 UTC - in response to Message 96841.  

I've seen tasks crash or, more frustratingly, get through hours of processing and just start over from 0 in the morning.
This particular project doesn't seem to handle checkpointing well, at least with me and my low-memory, older processors of any brand.
I've given up on anything I've got with an AMD processor (also older) and put those machines on World Community Grid work.
At this point I have a few older Intel machines that I start on Saturday morning and let run through the weekend to try to get Rosetta a few runs, but this project just doesn't seem to be friendly to weak computers. It likes short deadlines, and without more frequent checkpointing, mine can't do much during the week. I will say I haven't bothered the project team for help. I just figure that if I can't figure it out, I know where they can be useful.
I'm hoping you get a good answer I can use :)
ID: 96936 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 96940 - Posted: 31 May 2020, 0:57:15 UTC - in response to Message 96936.  
Last modified: 31 May 2020, 0:58:15 UTC

IBM01902 wrote:
I've seen tasks crash or, more frustratingly, get through hours of processing and just start over from 0 in the morning.
This particular project doesn't seem to handle checkpointing well, at least with me and my low-memory, older processors of any brand.
I've given up on anything I've got with an AMD processor (also older) and put those machines on World Community Grid work.
At this point I have a few older Intel machines that I start on Saturday morning and let run through the weekend to try to get Rosetta a few runs, but this project just doesn't seem to be friendly to weak computers. It likes short deadlines, and without more frequent checkpointing, mine can't do much during the week. I will say I haven't bothered the project team for help. I just figure that if I can't figure it out, I know where they can be useful.
I'm hoping you get a good answer I can use :)

How much memory does each of those older computers have? 2 GB per CPU core is enough to run some of the Rosetta@Home tasks, but not all of them. You have those computers set to be hidden, so I can't check for myself and have to ask you.

Under Your account, Computing preferences, you can adjust the fraction of the computer's memory BOINC is allowed to use.

You can also adjust how often the tasks are allowed to ask to write a checkpoint (if they are at a point where a checkpoint will be useful); you may have this set too high. Of course, it is also possible that the Rosetta tasks have too few places where a checkpoint would be useful.
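
If you would rather set these two things locally than on the website, the client also reads a `global_prefs_override.xml` file from its data directory. A minimal sketch that generates one is below; the tag names are the ones I believe the client uses for memory limits and the checkpoint interval, and the numbers (50%, 80%, 60 seconds) are placeholder assumptions, not recommendations.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(ram_busy_pct, ram_idle_pct, checkpoint_seconds):
    """Return global_prefs_override.xml text for memory and checkpoint limits."""
    root = ET.Element("global_preferences")
    # Share of RAM BOINC may use while the computer is in use / idle.
    ET.SubElement(root, "ram_max_used_busy_pct").text = f"{ram_busy_pct:.1f}"
    ET.SubElement(root, "ram_max_used_idle_pct").text = f"{ram_idle_pct:.1f}"
    # Tasks are asked to checkpoint at most once per this many seconds.
    ET.SubElement(root, "disk_interval").text = f"{checkpoint_seconds:.0f}"
    return ET.tostring(root, encoding="unicode")


prefs_xml = build_prefs_override(50, 80, 60)
print(prefs_xml)
```

A lower interval lets tasks checkpoint more often, but only at the points where the application itself is able to write one.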
ID: 96940 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2145
Credit: 41,555,266
RAC: 8,961
Message 96971 - Posted: 31 May 2020, 9:10:01 UTC - in response to Message 96940.  

IBM01902 wrote:
I've seen tasks crash or, more frustratingly, get through hours of processing and just start over from 0 in the morning.
This particular project doesn't seem to handle checkpointing well, at least with me and my low-memory, older processors of any brand.
I've given up on anything I've got with an AMD processor (also older) and put those machines on World Community Grid work.
At this point I have a few older Intel machines that I start on Saturday morning and let run through the weekend to try to get Rosetta a few runs, but this project just doesn't seem to be friendly to weak computers. It likes short deadlines, and without more frequent checkpointing, mine can't do much during the week. I will say I haven't bothered the project team for help. I just figure that if I can't figure it out, I know where they can be useful.
I'm hoping you get a good answer I can use :)

robertmiles wrote:
How much memory does each of those older computers have? 2 GB per CPU core is enough to run some of the Rosetta@Home tasks, but not all of them. You have those computers set to be hidden, so I can't check for myself and have to ask you.

Under Your account, Computing preferences, you can adjust the fraction of the computer's memory BOINC is allowed to use.

You can also adjust how often the tasks are allowed to ask to write a checkpoint (if they are at a point where a checkpoint will be useful); you may have this set too high. Of course, it is also possible that the Rosetta tasks have too few places where a checkpoint would be useful.

I'm pretty sure we can't ask a task to checkpoint - we can only ask for tasks not to checkpoint too often.
Checkpointing certainly has been an issue in the past, though in the most recent program version it's also certainly been improved.

I'm under the impression (but may be wrong) that when RAM is tight or short, restarting at zero happens a lot more often. It's made worse by the significant increase in RAM demands of tasks since CV19 became a priority. I'm not sure anything can be done about that in order to return meaningful results. Certain types of tasks don't need high RAM, while others definitely do, so it's right to say this project is now very demanding and unforgiving of PCs that can no longer be seen as adequate in the modern day.
ID: 96971 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 6,010
Message 96980 - Posted: 31 May 2020, 11:42:58 UTC - in response to Message 96936.  

IBM01902 wrote:
I've seen tasks crash or, more frustratingly, get through hours of processing and just start over from 0 in the morning.
This particular project doesn't seem to handle checkpointing well, at least with me and my low-memory, older processors of any brand.
I've given up on anything I've got with an AMD processor (also older) and put those machines on World Community Grid work.
At this point I have a few older Intel machines that I start on Saturday morning and let run through the weekend to try to get Rosetta a few runs, but this project just doesn't seem to be friendly to weak computers. It likes short deadlines, and without more frequent checkpointing, mine can't do much during the week. I will say I haven't bothered the project team for help. I just figure that if I can't figure it out, I know where they can be useful.
I'm hoping you get a good answer I can use :)


The worst machine I have is an old Acer laptop I got from freecycle in non-working order (busted hard disk, dodgy charger connection). Quad core Intel i3 M350 CPU, with 8GB RAM (I was given it with 3GB! Totally unusable for anything, the board won't take more than 8 though).

Four Rosettas fill the RAM up. So I set BOINC to use 80% of RAM, and limited Rosetta in its app config to run only 3 at once, allowing the other core to get a Universe task, which is tiny in RAM. It's quite happy with pausing and resuming tasks.

You could:
Limit how many Rosettas run at once in app config.
Limit Boinc's total RAM usage.
If it's a computer that's not turned off much, set the "change between applications" to a very high number (it will accept 100000 minutes), so it doesn't pause work units to do others; everything runs to completion in one go.
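
As a rough illustration of the first two suggestions, the sketch below works out how many tasks fit in the RAM given to BOINC and writes a matching `app_config.xml`. The 2 GB per task figure is an assumption for the arithmetic (actual Rosetta tasks vary), and the function names are made up; `project_max_concurrent` is the app config setting that caps running tasks for the whole project.

```python
import math
import xml.etree.ElementTree as ET


def max_tasks_that_fit(total_ram_gb, boinc_ram_fraction, ram_per_task_gb):
    """How many tasks fit in the share of RAM that BOINC may use."""
    return math.floor(total_ram_gb * boinc_ram_fraction / ram_per_task_gb)


def build_app_config(max_concurrent):
    """Return app_config.xml text capping concurrently running project tasks."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# 8 GB machine, BOINC allowed 80% of RAM, assumed 2 GB per Rosetta task:
# 8 * 0.80 = 6.4 GB available, which leaves room for 3 tasks.
tasks = max_tasks_that_fit(8, 0.80, 2.0)
app_config_xml = build_app_config(tasks)
print(app_config_xml)
```

The resulting file would go in the Rosetta project folder inside the BOINC data directory, and the client needs to re-read its config files before the limit takes effect.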
ID: 96980 · Rating: 0
Jim Martin
Joined: 9 Oct 05
Posts: 23
Credit: 1,443,682
RAC: 356
Message 96989 - Posted: 31 May 2020, 12:47:28 UTC - in response to Message 96501.  
Last modified: 31 May 2020, 12:48:15 UTC

Robert -- I tried to delete the certificates, after turning off BOINC, but was unsuccessful. Would waiting for the BOINC people to address this issue be better? No WUs are uploading. It seems it's a BOINC problem to fix, and not mine (I'm one of the 99%).

Thanks for your efforts on ca-bundle.crt, however.

P.S. I have a Dell E7240 with Windows 7. No problems until now (I have run Rosetta@home since 2005).
ID: 96989 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org