Problems and Technical Issues with Rosetta@home

James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 95755 - Posted: 2 May 2020, 3:44:08 UTC

Today I started receiving the following message in BOINC Manager (v 7.16.5) Event Log, as well as in BOINC Notices.
5/1/2020 8:08:57 PM | Rosetta@home | This project is using an old URL. When convenient, remove the project, then add https://boinc.bakerlab.org/rosetta/

Is it really necessary to remove the project to change URL? Doing this will remove all my current and pending tasks and I'd have to reload from square-one. Correct? Another way to fix this issue?

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1467
Credit: 14,332,852
RAC: 16,365
Message 95756 - Posted: 2 May 2020, 3:51:01 UTC - in response to Message 95755.  

Today I started receiving the following message in BOINC Manager (v 7.16.5) Event Log, as well as in BOINC Notices.
5/1/2020 8:08:57 PM | Rosetta@home | This project is using an old URL. When convenient, remove the project, then add https://boinc.bakerlab.org/rosetta/
Is it really necessary to remove the project to change URL? Doing this will remove all my current and pending tasks and I'd have to reload from square-one. Correct? Another way to fix this issue?

Set No New Tasks.
When all Tasks have been completed & returned, then Remove & re-attach to the project.
When re-attaching to the project, select the "Existing user" option (or whatever it is actually called).
Grant
Darwin NT

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,824,497
RAC: 2,340
Message 95757 - Posted: 2 May 2020, 3:59:04 UTC - in response to Message 95755.  

Today I started receiving the following message in BOINC Manager (v 7.16.5) Event Log, as well as in BOINC Notices.
5/1/2020 8:08:57 PM | Rosetta@home | This project is using an old URL. When convenient, remove the project, then add https://boinc.bakerlab.org/rosetta/

Is it really necessary to remove the project to change URL? Doing this will remove all my current and pending tasks and I'd have to reload from square-one. Correct? Another way to fix this issue?

You can set No new tasks, wait for all current tasks to finish, return those, THEN follow the above instructions before turning off No new tasks.

I've done this on other BOINC projects, causing no problems other than a few hours with no tasks for the affected projects running.

It MIGHT be a good way to delete a few hundred megabytes of obsolete R@h files.
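
For anyone who prefers the command line, the steps above can be sketched with `boinccmd`, the command-line tool that ships with the BOINC client. This is only a sketch: the `nomorework`, `detach` and `--project_attach` options come from the `boinccmd` option list, so check `boinccmd --help` on your version first. `old_url` and `account_key` are placeholders for the URL your client currently shows and for your own account key; neither value appears in this thread.

```python
import subprocess

# New project URL, as given in the Event Log message quoted above.
NEW_URL = "https://boinc.bakerlab.org/rosetta/"


def drain_commands(old_url):
    """Step 1: stop fetching new work so the existing cache can run down."""
    return [["boinccmd", "--project", old_url, "nomorework"]]


def reattach_commands(old_url, account_key, new_url=NEW_URL):
    """Steps 2 and 3: run these only after every task has been reported."""
    return [
        ["boinccmd", "--project", old_url, "detach"],
        ["boinccmd", "--project_attach", new_url, account_key],
    ]


def run_all(commands, dry_run=True):
    """Print the commands by default; pass dry_run=False to execute them."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run_all(drain_commands(old_url))` only prints the command, which makes it easy to check before running anything for real with `dry_run=False`.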

Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,189,593
RAC: 10,839
Message 95801 - Posted: 2 May 2020, 13:33:35 UTC - in response to Message 95756.  

Today I started receiving the following message in BOINC Manager (v 7.16.5) Event Log, as well as in BOINC Notices.
5/1/2020 8:08:57 PM | Rosetta@home | This project is using an old URL. When convenient, remove the project, then add https://boinc.bakerlab.org/rosetta/
Is it really necessary to remove the project to change URL? Doing this will remove all my current and pending tasks and I'd have to reload from square-one. Correct? Another way to fix this issue?
Set No New Tasks.
When all Tasks have been completed & returned, then Remove & re-attach to the project.
When re-attaching to the project, select the "Existing user" option (or whatever it is actually called).

So I've realised. Thanks.
Of course, it's also possible to abort all non-running Rosetta tasks to make the process of running down the cache much quicker.
I may do that so removing/re-attaching is done at my convenience and not in the middle of the night.
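
Aborting the non-running tasks can also be scripted. The sketch below parses the text printed by `boinccmd --get_tasks` and builds one `boinccmd --task URL name abort` command per Rosetta task that is neither executing nor waiting to be reported. The field names it looks for (`name`, `project URL`, `ready to report`, `active_task_state`) and the `N) -----------` separator are assumptions about that output format, so compare them with what your client actually prints before trusting the result.

```python
def parse_tasks(get_tasks_output):
    """Split `boinccmd --get_tasks` text into one dict per task (assumed layout)."""
    tasks = []
    current = None
    for raw in get_tasks_output.splitlines():
        line = raw.strip()
        if line.endswith("-----------"):
            # A line such as "1) -----------" starts a new task record.
            current = {}
            tasks.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return tasks


def abort_commands(tasks, project_url):
    """Build abort commands for tasks that have not started running."""
    commands = []
    for task in tasks:
        if task.get("project URL") != project_url:
            continue  # belongs to another project
        if task.get("active_task_state") == "EXECUTING":
            continue  # let running tasks finish
        if task.get("ready to report") == "yes":
            continue  # already finished, just needs reporting
        commands.append(["boinccmd", "--task", project_url, task["name"], "abort"])
    return commands
```

The functions only build the command lists; nothing is aborted until you run the commands yourself.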

GoldenHat
Joined: 14 Apr 20
Posts: 3
Credit: 122,663
RAC: 0
Message 95989 - Posted: 4 May 2020, 6:36:32 UTC - in response to Message 95801.  

Thanks, very helpful.
Could you also explain how one cleans out the cache and old files etc?
Thanks.

MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,010,478
RAC: 383
Message 95992 - Posted: 4 May 2020, 7:00:27 UTC - in response to Message 95989.  

Thanks, very helpful.
Could you also explain how one cleans out the cache and old files etc?
Thanks.

That happens when you Reset or Detach (Remove on the BOINC Manager screen) the project.
BOINC blog

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 96002 - Posted: 4 May 2020, 10:17:21 UTC - in response to Message 95755.  

Today I started receiving the following message in BOINC Manager (v 7.16.5) Event Log, as well as in BOINC Notices.
5/1/2020 8:08:57 PM | Rosetta@home | This project is using an old URL. When convenient, remove the project, then add https://boinc.bakerlab.org/rosetta/

Is it really necessary to remove the project to change URL? Doing this will remove all my current and pending tasks and I'd have to reload from square-one. Correct? Another way to fix this issue?

As a follow-up, here is how this process worked for me. Note that I use BOINCstatsBAM as my account manager. I set the Rosetta project to "No new tasks" in BOINC Manager on my host so I could complete the jobs in my cache before removing the project and re-adding it with the current URL. I later noticed that a note had been added next to "no new tasks" in the Project tab, saying that once all tasks completed the project would be removed and ready for replacement (I've paraphrased the exact wording). Sure enough, after the last Rosetta task completed and the host next reported to the account manager, Rosetta was taken out of my project list. The next time my host reported to the account manager, Rosetta was re-added with the correct info and I was given a starter set of jobs for the cache. I was surprised! There was not much I had to do other than make sure the host synchronized with the account manager. Note that I had previously updated to BOINC Manager v7.16.5.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1467
Credit: 14,332,852
RAC: 16,365
Message 96003 - Posted: 4 May 2020, 10:37:59 UTC

Seems to be a problem with the web site/database when checking out Tasks.

When i go to my Account, i can click on View next to Tasks to see all of my Tasks.
But if i go to click on Valid or Error etc., all i get is
Already logged in
You are logged in as Grant (SSSF) . Log out

and the url is
https://boinc.bakerlab.org/rosetta/login_form.php?next_url=%2Fresults.php%3Fuserid%3D2125796%26offset%3D0%26show_names%3D0%26state%3D4%26appid%3D
For some reason it's pulling up the login form.

If on my account page i click on View for Computers on this account, then Tasks for each of the computers i can then see the Valids, Errors etc. However at the top right corner my name is replaced with "Sign Up" and next to it Log out is replaced with Login.
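
Decoding the URL above shows which page the site was trying to send the browser back to after the unnecessary login step. This is a small sketch using only Python's standard `urllib.parse`; the function names are made up for the illustration, and it says nothing about what the server-side bug actually was.

```python
from urllib.parse import parse_qs, urlsplit

# The login-form URL exactly as posted above.
LOGIN_URL = (
    "https://boinc.bakerlab.org/rosetta/login_form.php?next_url="
    "%2Fresults.php%3Fuserid%3D2125796%26offset%3D0%26show_names%3D0"
    "%26state%3D4%26appid%3D"
)


def redirect_target(login_url):
    """Return the percent-decoded value of the next_url parameter."""
    query = parse_qs(urlsplit(login_url).query)
    return query["next_url"][0]


def target_params(login_url):
    """Return the target page path and its own query parameters."""
    target = urlsplit(redirect_target(login_url))
    # keep_blank_values=True keeps the empty "appid=" parameter.
    return target.path, parse_qs(target.query, keep_blank_values=True)
```

`redirect_target(LOGIN_URL)` gives `/results.php?userid=2125796&offset=0&show_names=0&state=4&appid=`, so the filtered task list was the intended destination and the login form was inserted in front of it.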
Grant
Darwin NT

Boiler Paul
Joined: 14 Apr 20
Posts: 4
Credit: 775,245
RAC: 0
Message 96008 - Posted: 4 May 2020, 12:27:53 UTC - in response to Message 96003.  

Seems to be a problem with the web site/database when checking out Tasks.

When i go to my Account, i can click on View next to Tasks to see all of my Tasks.
But if i go to click on Valid or Error etc., all i get is
Already logged in
You are logged in as Grant (SSSF) . Log out

and the url is
https://boinc.bakerlab.org/rosetta/login_form.php?next_url=%2Fresults.php%3Fuserid%3D2125796%26offset%3D0%26show_names%3D0%26state%3D4%26appid%3D
For some reason it's pulling up the login form.

If on my account page i click on View for Computers on this account, then Tasks for each of the computers i can then see the Valids, Errors etc. However at the top right corner my name is replaced with "Sign Up" and next to it Log out is replaced with Login.



Happening to me too. I logged out and logged back in... no change. I even cleared out cookies and rebooted... no change.

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 96029 - Posted: 4 May 2020, 15:16:59 UTC - in response to Message 96003.  

Same situation with me since last night. All I can see is the first screen of "All tasks for James W." If I click on any option, such as going to the next screen or viewing valid tasks, I get the "already logged in" message that Grant mentioned. Apparently it is a web site issue.

Toni Guerrero
Joined: 1 Oct 08
Posts: 1
Credit: 163,278
RAC: 0
Message 96030 - Posted: 4 May 2020, 15:27:15 UTC

Hello everybody.

I get computation errors (exit code 11) in all Junior_HalfRoid_design5_COVID-19 tasks I'm crunching on Android 5.0.2, BOINC 7.4.53, Rosetta v4.20 arm-android-linux-gnu, CPU ARMv7 Processor rev 0 (v7l). Previously, when running Rosetta v4.16, this same device was crunching those tasks with no issues. Has anyone else seen this behaviour?

Thank you.

Admin
Project administrator
Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 96040 - Posted: 4 May 2020, 16:09:16 UTC - in response to Message 96029.  

Sorry about the web site issues. I made some updates that obviously caused a bug. I'll work on a fix.

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 96049 - Posted: 4 May 2020, 16:38:14 UTC - in response to Message 96030.  

Hello everybody.

I get computation errors (exit code 11) in all Junior_HalfRoid_design5_COVID-19 tasks I'm crunching on Android 5.0.2, BOINC 7.4.53, Rosetta v4.20 arm-android-linux-gnu, CPU ARMv7 Processor rev 0 (v7l). Previously, when running Rosetta v4.16, this same device was crunching those tasks with no issues. Has anyone else seen this behaviour?

Thank you.


Those jobs appear to have some issues. People have reported that if the job restarts, it can cause an error. We'll look into this, but since it's somewhat rare, we are continuing to run these jobs.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1467
Credit: 14,332,852
RAC: 16,365
Message 96063 - Posted: 4 May 2020, 18:46:53 UTC - in response to Message 96040.  

Sorry about the web site issues. I made some updates that obviously caused a bug. I'll work on a fix.

Working again.
Thanks.
Grant
Darwin NT

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 96237 - Posted: 7 May 2020, 15:28:46 UTC

This task failed on Ubuntu 19.10

https://boinc.bakerlab.org/rosetta/result.php?resultid=1172834155

<core_client_version>7.16.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu @rb_05_07_20646_23758_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 1 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_05_07_20646_23758_ab_t000__robetta.zip -frag3 rb_05_07_20646_23758_ab_t000__robetta.200.3mers.index.gz -fragA rb_05_07_20646_23758_ab_t000__robetta.200.11mers.index.gz -fragB rb_05_07_20646_23758_ab_t000__robetta.200.5mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2100616
Using database: database_357d5d93529_n_methyl/minirosetta_database

[ ERROR ]: Caught exception:


File: src/core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan
------------------------ Begin developer's backtrace -------------------------
BACKTRACE:
[0x3ce8b7f]
[0x62b4e53]
[0x408ae82]
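
The exception in the log above reports a chi angle of `-nan`, which means the value was already undefined (not a number) before the range check saw it. The Rosetta source for that check is not shown in the log, so the following is only an illustration of why any "between -180 and 180" test rejects NaN: every ordered comparison involving NaN is false. The function names are hypothetical.

```python
import math


def chi_in_range(chi_degrees):
    """Range test of the kind the error message describes."""
    return -180.0 <= chi_degrees <= 180.0


def describe_chi(chi_degrees):
    """Separate 'undefined value' from 'defined but out of range'."""
    if math.isnan(chi_degrees):
        return "nan: an earlier calculation produced an undefined value"
    if not chi_in_range(chi_degrees):
        return "out of range"
    return "ok"
```

Because NaN fails the test no matter what the limits are, the range check is where the problem surfaced, not necessarily where it started; the undefined value would have been produced somewhere earlier in the calculation.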

Folding Proteins
Joined: 27 Mar 20
Posts: 2
Credit: 349,986
RAC: 0
Message 96336 - Posted: 10 May 2020, 13:52:42 UTC
Last modified: 10 May 2020, 13:53:55 UTC

Hello,

since changing the URL in the BOINC client (7.16.5, Win10) to use the HTTPS prefix, my WUs are not saved correctly when I exit BOINC Manager.
I have the "Leave non-GPU tasks in memory while suspended" option checked in computing preferences.
The issue also coincides with the Rosetta 4.20 release, so I am not sure whether the problem comes from the URL change or from how the server handles tasks now.
Before this, WUs were saved and resumed without problems after client or machine shutdowns.

Not sure what exactly is needed but I can attach some log files and settings if requested.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 96338 - Posted: 10 May 2020, 14:59:25 UTC - in response to Message 96336.  

Another recent change was to add checkpointing between completed models in one of the search protocols. When you say the WUs were not saved correctly... what are you looking at to define that? Are you familiar with the CPU time since last checkpoint shown in the task properties? When you "exit" (rather than "close") BOINC Manager, the active tasks are ended, and will revert to their last checkpoints when they restart. The setting for leaving tasks in memory only applies when BOINC is still running, but has suspended the task to run another project, or because the user requested it.
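
To make "revert to their last checkpoints" concrete, here is a generic checkpoint/resume sketch. It is not how Rosetta or BOINC implement checkpointing; the file format, the step counter and the `stop_after` parameter (which simulates exiting the client mid-task) are all invented for the illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash never leaves a half file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"step": 0}


def run(path, total_steps, checkpoint_every, stop_after=None):
    """Do work from the last checkpoint; optionally 'exit' after a few steps."""
    step = load_checkpoint(path)["step"]
    done_this_session = 0
    while step < total_steps:
        if stop_after is not None and done_this_session == stop_after:
            return step  # simulated exit: progress held only in memory is lost
        step += 1  # stand-in for one unit of real work
        done_this_session += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, {"step": step})
    save_checkpoint(path, {"step": step})
    return step
```

If a run that checkpoints every 4 steps is stopped after 6 steps, the next run starts again from step 4, and the 2 steps done after the last checkpoint are repeated. That repeated work is what "CPU time since last checkpoint" in the task properties tells you about.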
Rosetta Moderator: Mod.Sense

Folding Proteins
Joined: 27 Mar 20
Posts: 2
Credit: 349,986
RAC: 0
Message 96342 - Posted: 10 May 2020, 19:58:27 UTC - in response to Message 96338.  
Last modified: 10 May 2020, 19:59:24 UTC

To explain further:
I crunch during the bigger part of the day but then have to shut down the machine overnight (for about 8 hours or so).
My routine is as follows: while running, update the Rosetta project so all work is check-pointed, then simply exit the BOINC manager (with the option to suspend all work on exit enabled), then power off my PC. The next day I just power on the machine with running on startup settings and usually the WUs just continue from where they were before shut down.
What happens now is when I power on the PC, all WUs are gone (marked as failed tasks) and new ones start from scratch.
Since the URL/Rosetta 4.20 change, I have a 60/40 split between completed and failed tasks.

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,824,497
RAC: 2,340
Message 96344 - Posted: 10 May 2020, 20:46:15 UTC - in response to Message 96342.  
Last modified: 10 May 2020, 20:50:41 UTC

To explain further:
I crunch during the bigger part of the day but then have to shut down the machine overnight (for about 8 hours or so).
My routine is as follows: while running, update the Rosetta project so all work is check-pointed, then simply exit the BOINC manager (with the option to suspend all work on exit enabled), then power off my PC. The next day I just power on the machine with running on startup settings and usually the WUs just continue from where they were before shut down.
What happens now is when I power on the PC, all WUs are gone (marked as failed tasks) and new ones start from scratch.
Since the URL/Rosetta 4.20 change, I have a 60/40 split between completed and failed tasks.

Most computers do not have main memory that will retain its contents with the power turned off.

Updating Rosetta@home does NOT automatically checkpoint all work.

You may need to look into telling BOINC to suspend all work, then telling your computer to Sleep instead of Shut down, so it can write the entire contents of its memory to the hard drive, and then write this back into main memory when you turn the computer on again. This lets it resume any programs that were suspended rather than aborted.

I suspect that you previously had your computer set to use sleep instead of shut down, and some change since then (possibly 4.20) has turned off this setting.

It could, however, also mean that 4.20 has timing routines that cannot handle very long delays properly, or a resume-from-checkpoint section that fails to work properly 40% of the time it is used.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 96346 - Posted: 11 May 2020, 1:27:57 UTC

As robertmiles points out, you cannot force checkpoints. Using the "sleep" function (where memory remains powered and everything is preserved) or the "hibernate" function (where memory contents are written to disk and memory is powered off) would be a good way to maximize the work done on your machine. It should also avoid whatever error condition is being encountered here.

You mention that you "...exit the BOINC manager (with the option to suspend all work on exit enabled)". I am not familiar with such an option. What is the wording on the screen for this option? Are you referring to the activity option for when to run? And setting it to suspend?
Rosetta Moderator: Mod.Sense



©2024 University of Washington
https://www.bakerlab.org