Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home



Bryn Mawr

Joined: 26 Dec 18
Posts: 381
Credit: 11,660,029
RAC: 7,195
Message 108928 - Posted: 8 Mar 2024, 2:55:15 UTC - in response to Message 108924.  

I tried again after doing a reset and got computation errors for each task after under a minute. Is there anything else I can do to debug?


You are running a very old version of the Boinc Manager. Try updating to a more current version (7.16.n or 7.20.n).
ID: 108928 · Rating: 0
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1563
Credit: 16,385,979
RAC: 11,289
Message 108931 - Posted: 8 Mar 2024, 5:17:09 UTC - in response to Message 108928.  
Last modified: 8 Mar 2024, 5:21:52 UTC

I tried again after doing a reset and got computation errors for each task after under a minute. Is there anything else I can do to debug?


You are running a very old version of the Boinc Manager. Try updating to a more current version (7.16.n or 7.20.n).
I can't see that having any effect: the BOINC Manager does just that, manage the science applications. It's the science applications that do the work, and they are what are crashing out.




And if resetting the project & excluding the data folders from the AV programme haven't sorted it, I would try the long shot of doing a memory test, just to make sure it's not some sort of memory issue (although, as I said before, Rosetta 4.20 uses much more RAM).
How to do a memory test in Win10


Edit: maybe run some hardware monitoring software & check the temperature of your CPU? Rosetta Beta may be making use of instructions that your other project doesn't, so the other project doesn't push the CPU over the edge where Rosetta Beta does (although I'm grasping at straws here).
Grant
Darwin NT
ID: 108931 · Rating: 0
Raj

Joined: 5 Dec 05
Posts: 7
Credit: 502,940
RAC: 96
Message 108943 - Posted: 8 Mar 2024, 17:04:55 UTC - in response to Message 108928.  

You are running a very old version of the Boinc Manager. Try updating to a more current version (7.16.n or 7.20.n).

Sorry, that was a typo, I'm running 7.24.1
ID: 108943 · Rating: 0
Raj

Joined: 5 Dec 05
Posts: 7
Credit: 502,940
RAC: 96
Message 108944 - Posted: 8 Mar 2024, 17:29:10 UTC - in response to Message 108928.  

I ran the memory test and it reported no errors.
ID: 108944 · Rating: 0
kotenok2000
Joined: 22 Feb 11
Posts: 244
Credit: 441,919
RAC: 531
Message 108945 - Posted: 8 Mar 2024, 18:14:49 UTC

ID: 108945 · Rating: 0
Raj

Joined: 5 Dec 05
Posts: 7
Credit: 502,940
RAC: 96
Message 108972 - Posted: 10 Mar 2024, 19:07:33 UTC - in response to Message 108945.  

I ran that (in stress mode) for about an hour and received no errors or warnings at the end of it.
ID: 108972 · Rating: 0
Bryn Mawr

Joined: 26 Dec 18
Posts: 381
Credit: 11,660,029
RAC: 7,195
Message 108983 - Posted: 14 Mar 2024, 21:25:43 UTC

Just downloaded 4 beta 6.05 tasks, one of which failed immediately (0.02 seconds CPU) with:

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu @7a_hal_c_hal_7aa_12899_d40_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937
Using database: database_0f7f01a1b07/database

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
21:15:06 (176255): called boinc_finish(1)

</stderr_txt>
]]>

Boinc 7.24.1 and Ubuntu 22.04.4
ID: 108983 · Rating: 0
Raj

Joined: 5 Dec 05
Posts: 7
Credit: 502,940
RAC: 96
Message 108984 - Posted: 14 Mar 2024, 23:08:51 UTC - in response to Message 108972.  

Just an update - I've gone back in to look at my tasks on the website, and since yesterday I've had several successful completions, although also a lot of failures that show status "Error while computing". I'm not sure if this is a public URL, but this is what I'm checking: https://boinc.bakerlab.org/rosetta/results.php?hostid=3481412
ID: 108984 · Rating: 0
kotenok2000
Joined: 22 Feb 11
Posts: 244
Credit: 441,919
RAC: 531
Message 108986 - Posted: 14 Mar 2024, 23:32:41 UTC

Someone serverside made incorrect workunits in which residue 1 does not have a LOWER_CONNECT.
ID: 108986 · Rating: 0
BlackPoison357

Joined: 5 Mar 24
Posts: 1
Credit: 1,674,561
RAC: 97
Message 108991 - Posted: 15 Mar 2024, 20:49:43 UTC - in response to Message 108986.  

That must have been why I've had 31 errors within the last 2 days.
ID: 108991 · Rating: 0
Profile Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,847,457
RAC: 1,288
Message 109007 - Posted: 16 Mar 2024, 18:53:20 UTC - in response to Message 108983.  
Last modified: 16 Mar 2024, 18:54:27 UTC

Just got 21 of those.

It is this series: 7a_hal_c_hal_7aa_.................

Just downloaded 4 beta 6.05 tasks, one of which failed immediately (0.02 seconds CPU) with:

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.05_x86_64-pc-linux-gnu @7a_hal_c_hal_7aa_12899_d40_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937
Using database: database_0f7f01a1b07/database

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
21:15:06 (176255): called boinc_finish(1)

</stderr_txt>
]]>

Boinc 7.24.1 and Ubuntu 22.04.4
ID: 109007 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,768,017
RAC: 865
Message 109009 - Posted: 17 Mar 2024, 7:52:58 UTC
Last modified: 17 Mar 2024, 8:44:34 UTC

A flock of work units arrived recently that are behaving oddly; well, all but one of them. They came with a run time of 8 hours. Of those that are running, one has 6 hours elapsed and 6-odd hours remaining; this figure wobbles about for a while, then drops some, then starts wobbling again, and repeats. The others have remaining times of 2 to 6 days, and increasing. The percentage complete figure is increasing, but very slowly.
<edit>
The task is Beta 6.04 running on Windows 10 x64.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 109009 · Rating: 0
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1563
Credit: 16,385,979
RAC: 11,289
Message 109010 - Posted: 17 Mar 2024, 10:29:02 UTC - in response to Message 109009.  

A flock of work units arrived recently that are behaving oddly; well, all but one of them. They came with a run time of 8 hours. Of those that are running, one has 6 hours elapsed and 6-odd hours remaining; this figure wobbles about for a while, then drops some, then starts wobbling again, and repeats. The others have remaining times of 2 to 6 days, and increasing. The percentage complete figure is increasing, but very slowly.
<edit>
The task is Beta 6.04 running on Windows 10 x64.
And the answer is the same as always, but since you have ignored the answer for over 4 years now, there's no point repeating it.
Run time 1 days 20 hours 40 min 36 sec
CPU time 11 hours 59 min 57 sec
Almost 2 days to do 12 hours' work, all because of settings you have made & choose not to fix.
Grant
Darwin NT
ID: 109010 · Rating: 0
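
To put a number on the gap between the run time and CPU time quoted above, the ratio of the two can be worked out directly. This is only a rough sketch: the helper name `to_seconds` is made up for the example, and the two durations are the ones given in the post.

```python
def to_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a days/hours/minutes/seconds duration to whole seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Figures quoted in the post above.
run_time = to_seconds(days=1, hours=20, minutes=40, seconds=36)  # wall-clock time
cpu_time = to_seconds(hours=11, minutes=59, seconds=57)          # time actually spent computing

# Fraction of the wall-clock time during which the task was really on a CPU.
efficiency = cpu_time / run_time
print(f"CPU efficiency: {efficiency:.1%}")
```

On these figures the task was computing for a little over a quarter of the time it was nominally running, which is consistent with the CPU being shared with other work.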
kotenok2000
Joined: 22 Feb 11
Posts: 244
Credit: 441,919
RAC: 531
Message 109011 - Posted: 17 Mar 2024, 11:13:19 UTC - in response to Message 109009.  

What else do you have running?
ID: 109011 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,768,017
RAC: 865
Message 109012 - Posted: 17 Mar 2024, 12:16:16 UTC - in response to Message 109011.  

The only thing running that has any real impact is Folding@Home. This is set to its minimum activity level, but it still grabs quite a lot of resources. The only other things I can see are Firefox and the VPN service I use, which has a small window that normally sits open on the desktop. I've no idea what Grant is talking about (above), but it gives me something to look for. I don't have any weird settings, by the way.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 109012 · Rating: 0
kotenok2000
Joined: 22 Feb 11
Posts: 244
Credit: 441,919
RAC: 531
Message 109013 - Posted: 17 Mar 2024, 12:18:41 UTC

Reduce Folding@home to 4 cores (if running on CPU) and Rosetta to 4 cores.
ID: 109013 · Rating: 0
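
One way to apply a per-project limit like the one suggested above is BOINC's `app_config.xml`, placed in the project's folder (the `projects/boinc.bakerlab.org_rosetta` directory seen in the command lines earlier in this thread). The sketch below only generates the file's contents. The limit of 4 comes from the post, the function name `build_app_config` is made up for illustration, and Folding@home has its own separate settings that this does not touch.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent):
    """Return app_config.xml text limiting how many tasks of one project run at once."""
    root = ET.Element("app_config")
    # project_max_concurrent caps the number of this project's tasks running together.
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


config_text = build_app_config(4)
print(config_text)
```

After saving the output as `app_config.xml` in the project folder, the BOINC client needs to re-read its config files (or be restarted) before the limit takes effect.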
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,959,867
RAC: 1,779
Message 109014 - Posted: 17 Mar 2024, 23:53:01 UTC - in response to Message 80621.  

Please report any issues with work units in this thread.

All recent WUs on at least 3 different hosts got cut short with "Error while computing"... :(
ID: 109014 · Rating: 0
Profile robertmiles

Joined: 16 Jun 08
Posts: 1226
Credit: 14,125,667
RAC: 2,200
Message 109019 - Posted: 18 Mar 2024, 13:15:46 UTC - in response to Message 109014.  

Please report any issues with work units in this thread.

All recent WUs on at least 3 different hosts got cut short with "Error while computing"... :(


"Error while computing" means that an error was detected, but gives no information about WHAT error.

There's generally at least one more line saying something about what error.
ID: 109019 · Rating: 0
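
For anyone wanting to pull out that extra line quickly, here is a minimal sketch that filters a task's Stderr output for the lines beginning with `ERROR`. The function name `error_lines` is made up for the example, and the sample text is copied from the stderr output posted earlier in this thread; other applications may mark their errors differently.

```python
def error_lines(stderr_text):
    """Return the lines of a task's stderr output that start with 'ERROR'."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if line.strip().startswith("ERROR")
    ]


# Sample taken from the stderr output quoted earlier in this thread.
sample = """Using database: database_0f7f01a1b07/database

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
21:15:06 (176255): called boinc_finish(1)
"""

for line in error_lines(sample):
    print(line)
```

Posting just those lines along with the task name is usually enough for others to tell whether they are seeing the same failure.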
Notarick

Joined: 19 Nov 06
Posts: 1
Credit: 2,589,499
RAC: 1,852
Message 109032 - Posted: 25 Mar 2024, 14:20:29 UTC
Last modified: 25 Mar 2024, 14:29:35 UTC

I'm not sure what is going on, but about 1/4 of the work units are reporting "Error while computing" and I am no longer getting new tasks (which I assume is due to the errors). I've been running BOINC for years without issue, but I'm still running some CPU stress tests/diagnostics to see if it is a hardware thing. Anyway:

My CPU is a Ryzen 7 3700X on a Gigabyte X570 UD mobo with 32 GB of RAM and a Radeon RX 570 video card. I'm running Windows 10. The only thing that has changed recently is a UEFI update.

Here is the diagnostic information from one of them (they all have the same error code and similar Stderr output):

Name 7a_hal_d_hal_7aa_13011_d120_0001_SAVE_ALL_OUT_2977649_108_1
Workunit 1383425641
Created 23 Mar 2024, 8:12:33 UTC
Sent 23 Mar 2024, 8:12:35 UTC
Report deadline 26 Mar 2024, 8:12:35 UTC
Received 23 Mar 2024, 8:14:02 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 1 (0x00000001) Unknown error code
Computer ID 6222889
Run time 11 sec
CPU time 2 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 5.15 GFLOPS
Application version Rosetta Beta v6.04 windows_x86_64
Peak working set size 117.73 MB
Peak swap size 88.94 MB
Peak disk usage 0.01 MB

Stderr output

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.04_windows_x86_64.exe @7a_hal_d_hal_7aa_13011_d120_0001.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937
Using database: database_0f7f01a1b07\database

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictpplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 2442
BOINC:: Error reading and gzipping output datafile: default.out
02:12:51 (5004): called boinc_finish(1)

</stderr_txt>
]]>
ID: 109032 · Rating: 0
MJH333

Joined: 29 Jan 21
Posts: 18
Credit: 5,748,861
RAC: 131
Message 109034 - Posted: 25 Mar 2024, 14:48:08 UTC - in response to Message 109032.  
Last modified: 25 Mar 2024, 15:20:39 UTC

I'm not sure what is going on but about 1/4 of the work units are reporting "Error while computing" and I am no longer getting new tasks (which I assume is due to the errors).
I doubt that there is anything wrong with your system. The latest big batch of work units has run out.

You can see this from the Server Status section on the Rosetta home page: https://boinc.bakerlab.org/rosetta/, which currently shows "Total queued jobs" of 0. It can also be seen from the Project status page: https://boinc.bakerlab.org/rosetta/server_status.php, which currently shows "Tasks ready to send" of 0.

You may, of course, pick up the odd resend, or some Robetta tasks, but there will be no steady flow of tasks until another big batch of work units is released.

We may get another batch if/when the work units you identified with the "residue 1 does not have a LOWER_CONNECT" error are corrected and reissued.

When trying to get an idea of how much work is available, I tend to look at the "Total queued jobs" figure on the Rosetta home page, because that shows all the work units available to be crunched, which may be in the millions. The "Tasks ready to send" figure on the Project status page, by contrast, shows just the tasks ready to be distributed, which is usually no more than 5,000.
ID: 109034 · Rating: 0
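
The two figures mentioned above could be read off saved copies of those pages with something like the sketch below. It is illustrative only: the helper name `count_after_label` is made up, the sample strings are invented stand-ins rather than the real page markup, and the approach simply assumes the number is the first one to appear after the label.

```python
import re


def count_after_label(page_text, label):
    """Return the first integer that follows `label` in the text, or None if absent."""
    match = re.search(re.escape(label) + r"\D*?(\d[\d,]*)", page_text)
    if match is None:
        return None
    # Strip thousands separators such as "1,234,567" before converting.
    return int(match.group(1).replace(",", ""))


# Invented stand-ins for the page contents (the real layout is an assumption).
home_page = "Server Status ... Total queued jobs: 0 ..."
status_page = "<td>Tasks ready to send</td><td>0</td>"

queued = count_after_label(home_page, "Total queued jobs")
ready = count_after_label(status_page, "Tasks ready to send")

if queued == 0 and ready == 0:
    print("No work available; expect only resends until a new batch is released.")
```

If the label is not found the helper returns `None`, so a change in page layout shows up as a missing value rather than a wrong number.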



©2024 University of Washington
https://www.bakerlab.org