Posts by csbyseti

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101857)
Posted 20 May 2021 by csbyseti
Post:
Bug in Rosetta Windows App.

If I try to run 2 instances of BOINC for Rosetta, every WU crunched in the second instance fails with an error at WU startup.
It's not possible to use 2 active instances at the same time. The instance that was activated later errors out on every WU.

It's not a problem with the instance setup on my systems (it works fine with other projects, except WCG MIP, which uses the same Rosetta app).
The 3900X has 64 GB of RAM for 24 threads, so it should not be a memory problem.

Running multiple BOINC instances on my Linux system works fine with Rosetta. The machines have the same amount of memory per thread.

Example WU: https://boinc.bakerlab.org/rosetta/result.php?resultid=1383239722

It looks like starting Rosetta from 2 different directories at the same time runs into an error on Windows systems.
2) Message boards : News : Coronavirus update from David Baker. Thank you all for your contributions! (Message 99161)
Posted 27 Sep 2020 by csbyseti
Post:
You have to use the blue - (minus) button to minimize the message; then the red delete button will be shown.
3) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 93881)
Posted 8 Apr 2020 by csbyseti
Post:
Machines running 32-bit must be really old, with low-speed CPUs.

Or they’re reasonably recent systems running a 32-bit operating system.


That makes no sense because of the memory limitation. There is no reason to use a 32-bit OS on a modern CPU.
Or you want to use old software which doesn't work on a modern OS. But why must this machine run BOINC?

Perhaps they can count the number of 32-bit systems, add up the TFLOPS they generate, and then decide whether or not to stop the 32-bit app.

Perhaps the Rosetta Admins should think about removing 32-bit apps for x86 cpu's.
Why? They're no slower than the equivalent 64bit application.


The project developers have to support two more app versions that are not really needed anymore.
4) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 93849)
Posted 8 Apr 2020 by csbyseti
Post:
The 64-bit app for Linux works fine on my machine; I think the 32-bit app for Linux has a bug.

Perhaps the Rosetta admins should think about removing the 32-bit apps for x86 CPUs. Machines running 32-bit must be really old, with low-speed CPUs.
It makes no sense to run these machines for BOINC.
5) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 93655)
Posted 6 Apr 2020 by csbyseti
Post:
Thanks

Issue appears to be with Rosetta v4.12 i686-pc-linux-gnu
No issues with Rosetta v4.12 x86_64-pc-linux-gnu

After project reset getting the proper 64bit app tasks, where before would almost exclusively get tasks for the potential faulty 32bit app.


Thanks for this information. I will test it.

I looked at the other machines: all 64-bit machines got the 32-bit app.

Why is Rosetta delivering 32-Bit apps to 64-bit machines?
6) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 93364)
Posted 4 Apr 2020 by csbyseti
Post:
No 4.12 WU works on my Ubuntu Linux system; all of them have the same problem:

Stderr output

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.12_i686-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol predictor_v11_boinc--fuse--covid_spike_design_boinc_v1.xml @flags_jhr_cv -in:file:silent 6cp3nh2c_Junior_HalfRoid_vs_COVID-19_design1_dev.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 6cp3nh2c_Junior_HalfRoid_vs_COVID-19_design1_dev.zip @6cp3nh2c_Junior_HalfRoid_vs_COVID-19_design1_dev.flags -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937
Starting watchdog...
Watchdog active.
BOINC:: CPU time: 43776.6s, 14400s + 28800s[2020- 4- 4 11:33:34:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 43776.6 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
11:33:34 (18264): called boinc_finish(0)

</stderr_txt>
]]>

I stopped all 4.12 WUs on the Linux system and am waiting for a bugfix. I switched this machine to TN-Grid; they also have workunits specifically for the coronavirus problem.
7) Message boards : Number crunching : Rosetta 4.1+ and 4.2+ (Message 93258)
Posted 3 Apr 2020 by csbyseti
Post:
Version 4.12 seems to be faulty under Ubuntu 18.04. First I wondered about the 12 hours of runtime instead of 8 hours, then I saw only 20 credits for the work.

Looking at the stderr output shows:

Stderr output

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.12_i686-pc-linux-gnu -run:protocol jd2_scripting -parser:protocol predictor_v11_boinc--fuse--covid_spike_design_boinc_v1.xml @flags_jhr_cv -in:file:silent 9au6ic2s_Junior_HalfRoid_vs_COVID-19_design1_dev.silent -in:file:silent_struct_type binary -silent_gz -mute all -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip 9au6ic2s_Junior_HalfRoid_vs_COVID-19_design1_dev.zip @9au6ic2s_Junior_HalfRoid_vs_COVID-19_design1_dev.flags -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937
Starting watchdog...
Watchdog active.
BOINC:: CPU time: 43777.2s, 14400s + 28800s[2020- 4- 3 11:11:53:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 43777.2 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
11:11:53 (4726): called boinc_finish(0)

</stderr_txt>
]]>

Only 1 structure was calculated in 12 hours; the Windows counterpart shows over 70 structures in 8 hours.
It seems that the Linux app doesn't really calculate.
All 12 results show this behaviour.
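
To compare results like these without reading every stderr dump by hand, the two summary lines can be pulled out with a small script. This is only a rough sketch: the function names are made up for illustration, and the patterns assume the exact wording of the "DONE ::" and "This process generated" lines shown in the log above.

```python
import re

# Patterns assume the wording seen in the stderr output above.
DONE_RE = re.compile(r"DONE ::\s+\d+ starting structures\s+([\d.]+) cpu seconds")
MADE_RE = re.compile(r"This process generated\s+(\d+) decoys from\s+(\d+) attempts")


def summarize_stderr(stderr_text):
    """Return (cpu_seconds, decoys, attempts), or None if the lines are missing."""
    done = DONE_RE.search(stderr_text)
    made = MADE_RE.search(stderr_text)
    if done is None or made is None:
        return None
    return float(done.group(1)), int(made.group(1)), int(made.group(2))


def decoys_per_hour(cpu_seconds, decoys):
    """Throughput in decoys (structures) per CPU hour."""
    return decoys / (cpu_seconds / 3600.0)


sample = (
    "DONE :: 1 starting structures 43777.2 cpu seconds\n"
    "This process generated 1 decoys from 1 attempts\n"
)

cpu_seconds, decoys, attempts = summarize_stderr(sample)
print(f"Linux 4.12: {decoys_per_hour(cpu_seconds, decoys):.3f} decoys per CPU hour")
# Windows figure from the post: about 70 structures in 8 hours.
print(f"Windows:    {decoys_per_hour(8 * 3600, 70):.3f} decoys per CPU hour")
```

With the numbers from this post, the Linux task produces well under one decoy per CPU hour, while the Windows figure works out to several per hour.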
8) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50959)
Posted 25 Jan 2008 by csbyseti
Post:
Ended by watchdog, and running beyond their runtime target are two rather different things.


I think I mean the same thing as FalconFly.
The 2h4o WUs have a problem. I restarted one WU on the quad: the CPU time jumped down to 1h:xx (last working checkpoint?) and it seemed to be running. It finished with a wrong CPU time of 11337 sec (3h:8), but with a heartbeat error.

http://boinc.bakerlab.org/rosetta/result.php?resultid=135428650

On the X2, the CPU time jumped down to 0h:0x after the restart (from 6h:59). It seems to run but doesn't do any work until the watchdog stops it. This WU would consume 4x3h + 6h:59 = about 19h of CPU time.
If such a WU is stopped and restarted by the scheduler and its CPU time is reset, it becomes a never-ending loop.
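
The 19h estimate can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the watchdog fires at 4 times the preferred run time (3h here) counted from the reset CPU time, so the 6h:59 already spent before the restart is simply lost; the helper names are made up for illustration.

```python
def to_seconds(hours, minutes=0):
    """Convert hours and minutes to seconds."""
    return hours * 3600 + minutes * 60


def fmt(seconds):
    """Format seconds as h:mm, e.g. 18h:59."""
    hours, rest = divmod(seconds, 3600)
    return f"{hours}h:{rest // 60:02d}"


preferred = to_seconds(3)             # preferred run time of 3h
watchdog_limit = 4 * preferred        # assumed cut-off: 4x the preferred time
lost_before_restart = to_seconds(6, 59)

# CPU time counted after the reset, plus the time thrown away by the reset.
total = watchdog_limit + lost_before_restart
print(fmt(total))
```

Every further restart that resets the counter adds the time spent so far to the lost part again, which is why the loop never ends if the scheduler keeps interrupting the WU.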
9) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50948)
Posted 24 Jan 2008 by csbyseti
Post:
See my post above. It's not a problem with the target runtime; I've had 3 cut off by the watchdog, and the fourth is currently running (only a picture in the native window, nothing else).
10) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50936)
Posted 24 Jan 2008 by csbyseti
Post:
2h4o.........
seems to have a problem. I got 3 of them with the same problem.

http://boinc.bakerlab.org/rosetta/result.php?resultid=135428621

'<core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 1755374
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
CPU time: 46787.2 seconds. Greater than 4X preferred time: 10800 seconds
**********************************************************************
GZIP SILENT FILE: .xx2h4o.out

</stderr_txt>'

Shut down by the watchdog because of the long run time.
Should all of the 2h4o WUs be deleted?
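
To spot such results quickly, the watchdog message can be read by a script. A rough sketch, assuming the message always has the wording shown in the stderr above; the function name is made up for illustration.

```python
import re

# Pattern assumes the wording of the watchdog message quoted above.
WATCHDOG_RE = re.compile(
    r"CPU time: ([\d.]+) seconds\. Greater than (\d+)X preferred time: (\d+) seconds"
)


def watchdog_overrun(stderr_text):
    """Return seconds spent beyond the watchdog limit, or None if no message."""
    match = WATCHDOG_RE.search(stderr_text)
    if match is None:
        return None
    cpu_time = float(match.group(1))
    factor = int(match.group(2))
    preferred = int(match.group(3))
    return cpu_time - factor * preferred


line = "CPU time: 46787.2 seconds. Greater than 4X preferred time: 10800 seconds"
overrun = watchdog_overrun(line)
print(f"Watchdog limit exceeded by about {overrun:.0f} seconds")
```

For this WU the limit is 4 x 10800 = 43200 seconds, so the task ran roughly an hour past it before the watchdog ended the run.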
11) Message boards : Number crunching : Why is the Rosetta Version 4.83 not functioning? (Message 13089)
Posted 5 Apr 2006 by csbyseti
Post:
Hello @All,

this is my first thread in English, and also my first at Rosetta, and I don't know if this is the correct forum for my problem.

My problem is that the Rosetta client won't work correctly on my machines. The W98 clients don't work correctly, and the WXP client also produces errors.

http://boinc.bakerlab.org/rosetta/results.php?hostid=166788

This is the W98 client on which I am writing this thread. This machine works. The client seems to be working, but often the CPU counter does not count (opening the graphics screen shows many calculations, but no CPU time). Yet the WU is displayed as completed after 2-3 hours of runtime.
So the WU time is zero and the granted credit is zero.

But the machines are working without errors, and they spend CPU time on WUs that don't work.

Please move my 'question' to the right 'forum'; I am not sure that I picked the right one.

The number of failures I've had in the last week is so large that the only solution will be 'don't do any calculations for Rosetta'.

csbyrosetta