Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 93810 - Posted: 8 Apr 2020, 3:43:20 UTC
Last modified: 8 Apr 2020, 3:45:49 UTC

Now up to 68 with a few more waiting to run. Can we get these removed from the queue, please?

Looking at my wingmen on a few of them, they too are failing, so it's not just me. I don't want to get my hosts blocked from getting tasks due to what looks like a parameter error with these tasks.
BOINC blog
ID: 93810 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 826
Message 93812 - Posted: 8 Apr 2020, 3:53:24 UTC - in response to Message 93800.  

Got 56 CF_monomer/Rosetta Mini work units that all failed with an instant "Error while computing". Running Linux x64. Other work units seem fine, just not these ones. I ended up aborting the few that hadn't committed suicide.

I notice the scheduler is down now so maybe they are removing them from the queue.

[snip]

I got a few under 64 bit Windows 10. Those hit errors within a few seconds of starting.
ID: 93812 · Rating: 0
SIXER (L.Gammel)
Joined: 6 Apr 20
Posts: 1
Credit: 393,432
RAC: 0
Message 93814 - Posted: 8 Apr 2020, 4:59:46 UTC

I also switched from SETI to Rosetta but am getting a message about disk space. Please help:

Rosetta@home: Notice from server
Rosetta Mini needs 183.28MB more disk space. You currently have 1533.33 MB available and it needs 1716.61 MB.

What do I need to do?
Sixer
ID: 93814 · Rating: 0
Marcos Carot
Joined: 30 Dec 11
Posts: 3
Credit: 301,124
RAC: 0
Message 93815 - Posted: 8 Apr 2020, 5:11:21 UTC - in response to Message 93814.  

Yes, I got the same message... I just had to make space by deleting other stuff in the partition where BOINC stores its data. On Linux, it is not the home directory; I think it was /var.
ID: 93815 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 93817 - Posted: 8 Apr 2020, 5:37:13 UTC - in response to Message 93814.  

I also switched from SETI to Rosetta but am getting a message about disk space. Please help:

Rosetta@home: Notice from server
Rosetta Mini needs 183.28MB more disk space. You currently have 1533.33 MB available and it needs 1716.61 MB.

What do I need to do?
Give it more disk space.
In your account, under Computing preferences, set:
Disk
Use no more than 12GB
Leave at least    2GB free
Use no more than  60% of total

Memory
When computer is in use, use at most     90%
When computer is not in use, use at most 95%
Even so, you will also need to add more RAM, or reduce the number of CPU cores/threads you use (down to 6 or even 5) so you don't run out of RAM (as much as 1.3GB of RAM per task can be required, although present tasks are only using 800MB or less).
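
The server notice and the three disk preferences above can be checked with a little arithmetic. The sketch below is a rough illustration only: it assumes BOINC applies whichever of the three disk limits is most restrictive, and the function names and the example disk sizes are made up for the illustration, not taken from BOINC itself.

```python
def shortfall_mb(needed_mb, available_mb):
    """How much more space is needed, as in the server notice."""
    return max(0.0, needed_mb - available_mb)


def disk_budget_gb(total_gb, free_gb, boinc_used_gb,
                   max_used_gb, min_free_gb, max_used_pct):
    """Estimate the disk space BOINC may use (assumed: tightest limit wins)."""
    limit_absolute = max_used_gb                    # "Use no more than 12GB"
    limit_percent = total_gb * max_used_pct / 100   # "Use no more than 60% of total"
    # "Leave at least 2GB free": space BOINC already holds still counts as usable.
    limit_free = free_gb + boinc_used_gb - min_free_gb
    return max(0.0, min(limit_absolute, limit_percent, limit_free))


# Figures from the notice: 1716.61 MB needed, 1533.33 MB available.
print(round(shortfall_mb(1716.61, 1533.33), 2))

# Hypothetical 100GB disk with 50GB free and 1GB already used by BOINC.
print(disk_budget_gb(100, 50, 1, max_used_gb=12, min_free_gb=2, max_used_pct=60))
```

With the suggested preferences on that hypothetical disk, the 12GB cap is the binding limit, which is comfortably more than the roughly 1.7GB the notice asks for.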
Grant
Darwin NT
ID: 93817 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 93818 - Posted: 8 Apr 2020, 5:48:55 UTC - in response to Message 93809.  

Trying and even eager to be of help, but...

All these short deadline units are troublesome. Is it accomplishing anything if my contributions are just discarded? And discarded for the sake of deadlines that seem quite arbitrary, even silly. Exacerbated by more checkpoint problems, too.

Actually writing from the machine that has the most problems dealing with the deadlines, but even some of my bigger machines clearly have more queued tasks than they can possibly complete within the short deadlines. The obvious workaround (though it's tedious) is to manually abort the tasks that can't be completed, but that causes problems because the flow of tasks has become sporadic again... Plus it's wasting bandwidth at the project end when they send data that is just discarded.

On top of that, some of the machines wind up wasting time because of large batches of tasks with large memory requirements that cause the "Waiting for memory" status on some tasks. Again, selective nuking of tasks can get the CPUs busy again.
Simple.
Set a small cache: 1 day or less, with additional days at 0.02 or so. Make sure your checkpointing request is set for 60 seconds.
Set the number of threads used for crunching to no more than your RAM can support, allowing 1.3GB per task (unless you wish to add more RAM, which also solves the problem), and in your computing preferences allow BOINC to make use of the RAM you have (set it to 90% or higher). Don't abort the Tasks; if they miss the deadline, the project will resend them to another host.

Work gets done, errors reduced (if not eliminated), the Manager will figure out how much work you can & can't do and stop getting too much, and it won't require frequent intervention on your part.
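
As a rough worked example of the RAM rule above, the sketch below estimates how many Tasks a host can run at once. It assumes the 1.3GB-per-task worst case quoted in this thread; the function name and the example hosts are hypothetical, and it ignores whatever the operating system and other programs need.

```python
import math


def max_tasks_for_ram(total_ram_gb, ram_pct_allowed, threads=None, per_task_gb=1.3):
    """Number of concurrent Tasks that fit in the RAM BOINC is allowed to use."""
    usable_gb = total_ram_gb * ram_pct_allowed / 100
    tasks = math.floor(usable_gb / per_task_gb)
    if threads is not None:
        # Never run more Tasks than there are CPU threads.
        tasks = min(tasks, threads)
    return max(0, tasks)


# Hypothetical hosts, each allowing BOINC 90% of RAM.
print(max_tasks_for_ram(8, 90, threads=8))    # 8GB, 8 threads
print(max_tasks_for_ram(32, 90, threads=12))  # 32GB, 12 threads
print(max_tasks_for_ram(4, 90))               # 4GB, thread count not considered
```

On the 8GB host RAM is the limit rather than the thread count, so the CPU usage preference would need to be lowered to match; on the 32GB host every thread can be used.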
Grant
Darwin NT
ID: 93818 · Rating: 0
michelv
Joined: 28 Mar 20
Posts: 8
Credit: 216,762
RAC: 0
Message 93820 - Posted: 8 Apr 2020, 6:07:39 UTC - in response to Message 93812.  

I got a few under 64 bit Windows 10. Those hit errors within a few seconds of starting.

Same here, just a few minutes ago.
ID: 93820 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 93821 - Posted: 8 Apr 2020, 6:14:27 UTC - in response to Message 93812.  
Last modified: 8 Apr 2020, 6:32:10 UTC

Got 56 CF_monomer/Rosetta Mini work units that all failed with an instant "Error while computing". Running Linux x64. Other work units seem fine, just not these ones. I ended up aborting the few that hadn't committed suicide.

I notice the scheduler is down now so maybe they are removing them from the queue.

[snip]

I got a few under 64 bit Windows 10. Those hit errors within a few seconds of starting.

Decided to give mine a try (Win10).
One Task ran with no problem, crunching away using minirosetta_3.78_windows_x86_64.exe.

On the other system, every one was an instant Computation error and they were set to run with Rosetta Mini v3.78 windows_intelx86, and there was one Task using Rosetta Mini v3.78 windows_x86_64 that errored out as well.

Presently processing normally (40 min, so far so good):
CF_monomer_12_fold_SAVE_ALL_OUT_905409_765_0      Rosetta Mini v3.78 windows_x86_64

All failed instantly:
CF_monomer_78_fold_SAVE_ALL_OUT_905546_126_1      Rosetta Mini v3.78 windows_intelx86
CF_monomer_37_fold_SAVE_ALL_OUT_905505_203_0      Rosetta Mini v3.78 windows_intelx86
CF_monomer_103_fold_SAVE_ALL_OUT_905615_59_0      Rosetta Mini v3.78 windows_intelx86
CF_monomer_103_fold_SAVE_ALL_OUT_905620_58_0      Rosetta Mini v3.78 windows_intelx86
CF_monomer_103_relax_SAVE_ALL_OUT_905662_69_0     Rosetta Mini v3.78 windows_x86_64
CF_monomer_103_fold_SAVE_ALL_OUT_905589_56_0      Rosetta Mini v3.78 windows_intelx86
CF_monomer_103_fold_SAVE_ALL_OUT_905646_12_1      Rosetta Mini v3.78 windows_intelx86

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)
</message>
<stderr_txt>
[2020- 4- 8 15:26:42:] :: BOINC:: Initializing ... ok.
[2020- 4- 8 15:26:42:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully. 
command: projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_windows_x86_64.exe -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:native 00001.pdb -corrections::beta_nov16 -silent_gz 1 -frag9 00001.200.9mers -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 15 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip CF_monomer_103_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2047753
Registering options.. 
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
ERROR: Option matching -corrections:beta_nov16 not found in command line top-level context

</stderr_txt>
]]>
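
When many Tasks fail like this, the telling line is easy to miss among the start-up chatter. Here is a minimal sketch for pulling the ERROR lines, and the option they complain about, out of a saved stderr dump. The function name is made up for the illustration, and the patterns are based only on the output shown above, so other failure messages may need different ones.

```python
import re

# Lines that start with "ERROR:" in the stderr text.
ERROR_RE = re.compile(r"^ERROR:\s*(.+)$", re.MULTILINE)
# The specific "option not found" message seen in this thread.
OPTION_RE = re.compile(r"Option matching (\S+) not found")


def summarize_stderr(stderr_text):
    """Return the ERROR messages and any options reported as unknown."""
    errors = [msg.strip() for msg in ERROR_RE.findall(stderr_text)]
    bad_options = [opt for msg in errors for opt in OPTION_RE.findall(msg)]
    return {"errors": errors, "bad_options": bad_options}


sample = """Options::initialize() Check specs.
Options::initialize()  End reached
ERROR: Option matching -corrections:beta_nov16 not found in command line top-level context
"""

print(summarize_stderr(sample)["bad_options"])  # ['-corrections:beta_nov16']
```

Here the reported option is one the application does not recognise, which fits the earlier guess that these Tasks were sent out with a parameter error rather than failing because of anything on the host.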

Grant
Darwin NT
ID: 93821 · Rating: 0
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 93822 - Posted: 8 Apr 2020, 6:14:44 UTC - in response to Message 93818.  
Last modified: 8 Apr 2020, 6:17:45 UTC

<snipped>
Set a small cache: 1 day or less, with additional days at 0.02 or so. Make sure your checkpointing request is set for 60 seconds.
Set the number of threads used for crunching to no more than your RAM can support, allowing 1.3GB per task (unless you wish to add more RAM, which also solves the problem), and in your computing preferences allow BOINC to make use of the RAM you have (set it to 90% or higher). Don't abort the Tasks; if they miss the deadline, the project will resend them to another host.

Running a 0.33 day cache on my 6 core/12 thread machines with 32GB of memory.

On the couple of Pis that I have running Rosetta, I had to create an app_config to limit the number of tasks running at once.
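
For anyone wanting to do the same, here is a minimal sketch that generates such a file. It assumes the BOINC client's app_config.xml format with its project-wide project_max_concurrent element; the limit of 2 and the function name are only illustrative, and this is not necessarily what the Pi hosts above use.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent):
    """Build app_config.xml text that caps running tasks for one project."""
    root = ET.Element("app_config")
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Example: allow at most 2 Rosetta tasks at once (illustrative value).
xml_text = build_app_config(2)
print(xml_text)

# To use it, save the text as app_config.xml in the project's folder under the
# BOINC data directory (projects/boinc.bakerlab.org_rosetta/, as in the command
# lines in this thread), then have the client re-read its config files.
```

A project-wide cap avoids having to know the individual application names; a per-application limit is also possible but needs the exact short names the project uses.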
BOINC blog
ID: 93822 · Rating: 0
jackielan2000
Joined: 5 Sep 06
Posts: 13
Credit: 14,208
RAC: 0
Message 93830 - Posted: 8 Apr 2020, 7:50:16 UTC - in response to Message 93794.  

A Windows XP PC connected to the internet? A Pentium 4? Yeah.... well.... the problem seems self-explanatory: too old...
Windows XP is horribly insecure; at least switch to Linux.


Well. I'm currently living at my dad's place and this is his PC, the only one I can have. It can still run WUs from WCG and Asteroids@home. The problem is not how powerful the PC is. It's why current Rosetta WUs don't work with it. A compatibility issue? A month ago there was no problem; even my older PC, a PIII/WinXP, ran Rosetta WUs pretty well.


. . They have revised the app from 4.07/4.08 to 4.12. This new version asks a bit more from the hardware and seems to be a bit of a memory hog, see the thread on use of memory and Rosetta. I have an i5 with 8GB ram and I have had problems getting it to run reliably. But there may be another cause.

Stephen

?


8G RAM? Well, that means it can only be run in x64 systems. Ok, bye bye Rosetta.
ID: 93830 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 93835 - Posted: 8 Apr 2020, 8:25:33 UTC - in response to Message 93830.  
Last modified: 8 Apr 2020, 8:27:59 UTC

8G RAM? Well, that means it can only be run in x64 systems. Ok, bye bye Rosetta.
You need up to 1.3GB RAM per Task you run.
So even a system with less than 4GB can run 2 Tasks.
Grant
Darwin NT
ID: 93835 · Rating: 0
Rames
Joined: 12 Nov 16
Posts: 5
Credit: 3,000,551
RAC: 0
Message 93838 - Posted: 8 Apr 2020, 8:59:21 UTC

Hi there!

I am having calculation error problems too. I have two older 45nm Opteron servers with 24GB RAM and 12 threads each. One can only run 5 Rosetta threads at the moment; all others fail. Both run Linux on old HDDs. The other server now has 2 running tasks, and I can tell more later when the WCG projects are done, but I've seen some errors already.

32nm Xeon servers with less RAM per thread are running fine with the latest Rosetta version, as are Ryzen-based computers.
ID: 93838 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 93839 - Posted: 8 Apr 2020, 9:04:52 UTC - in response to Message 93838.  

I am having calculation error problems too.
With your systems hidden it's difficult to help, but there are faulty Rosetta Mini Tasks about at present that fail pretty much instantly.
Grant
Darwin NT
ID: 93839 · Rating: 0
Rames
Joined: 12 Nov 16
Posts: 5
Credit: 3,000,551
RAC: 0
Message 93840 - Posted: 8 Apr 2020, 9:11:37 UTC - in response to Message 93839.  
Last modified: 8 Apr 2020, 9:17:37 UTC

I've tried to find the setting for that; where is it? How can I un-hide my computers? lol

I found it!

https://stats.free-dc.org/stats.php?page=hostbycpid&cpid=93e13a8ffd12ebd3a4c1a0b75e8eba76
ID: 93840 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 93841 - Posted: 8 Apr 2020, 9:30:59 UTC - in response to Message 93840.  

I've tried to find the setting for that; where is it? How can I un-hide my computers? lol

I found it!
Still hidden when I looked again.
If you just go to your account, under "Computing and credit" click on Tasks, then click on Error, and it will show all the errored Tasks and the application.
If it was Rosetta Mini v3.XX, then it's due to the dodgy Tasks.



In your account, Rosetta@home preferences, "Should Rosetta@home show your computers on its web site?" needs a tick against it, and "Update preferences" then clicked (some changes do require the BOINC Manager to contact the project before they take effect).
Grant
Darwin NT
ID: 93841 · Rating: 0
Rames
Joined: 12 Nov 16
Posts: 5
Credit: 3,000,551
RAC: 0
Message 93842 - Posted: 8 Apr 2020, 9:39:54 UTC - in response to Message 93841.  

I've done that.

Rosetta v4.12
x86_64-pc-linux-gnu

I checked that the ID matches the Opteron, and yes, it is correct. I just contacted the server and it now has 8 tasks running with more than 5 minutes of runtime. All failures have runtimes from 5 seconds to 2 minutes.
ID: 93842 · Rating: 0
Tomcat雄猫
Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 93843 - Posted: 8 Apr 2020, 9:46:57 UTC - in response to Message 93842.  
Last modified: 8 Apr 2020, 9:48:26 UTC

I'm also having issues with Rosetta Mini 3.78. I'm on Windows 10 64-bit version 1909, with 3 out of 3 tasks generating a computation error almost instantly. It doesn't seem to be localized to my machine, since my wingmen are also getting the same errors.

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)</message>
<stderr_txt>
[2020- 4- 7 22:31:57:] :: BOINC:: Initializing ... ok.
[2020- 4- 7 22:31:57:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully. 
command: projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_windows_x86_64.exe -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:native 00001.pdb -corrections::beta_nov16 -silent_gz 1 -frag9 00001.200.9mers -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 15 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip CF_monomer_103_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2049779
Registering options.. 
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
ERROR: Option matching -corrections:beta_nov16 not found in command line top-level context

</stderr_txt>
]]>
ID: 93843 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 93845 - Posted: 8 Apr 2020, 9:56:25 UTC - in response to Message 93842.  
Last modified: 8 Apr 2020, 9:57:07 UTC

I've done that.

Rosetta v4.12
x86_64-pc-linux-gnu
Tasks run by that application haven't been having issues, just the Rosetta Mini ones.

Managed to find a Linux host with lots of them:
Rosetta Mini v3.78 x86_64-pc-linux-gnu

Stderr output as posted by Tomcat above.
Grant
Darwin NT
ID: 93845 · Rating: 0
Rames
Joined: 12 Nov 16
Posts: 5
Credit: 3,000,551
RAC: 0
Message 93865 - Posted: 8 Apr 2020, 13:46:58 UTC - in response to Message 93845.  

The servers are running 8 and 9 Rosetta tasks now.

3oy2uu2b_Mini_Protein_binds_IL1R_COVID-19_design5_SAVE_ALL_OUT_905407_5
The same kind of tasks seem to run fine too. It's random; could it be hardware related? I'm waiting for a courier to deliver my CPU upgrade from 1.8GHz to 2.6GHz.

It's strange that they fail.
ID: 93865 · Rating: 0
Stephen "Heretic"
Joined: 2 Apr 20
Posts: 21
Credit: 11,028
RAC: 0
Message 93867 - Posted: 8 Apr 2020, 14:00:29 UTC - in response to Message 93835.  

8G RAM? Well, that means it can only be run in x64 systems. Ok, bye bye Rosetta.
You need up to 1.3GB RAM per Task you run.
So even a system with less than 4GB can run 2 Tasks.


. . If you have absolutely nothing else happening you could get away with 4GB, if you have only 2GB, rotsa ruck!

Stephen

. .
ID: 93867 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org