Rosetta@home

Minirosetta 3.62-3.65

Message boards : Number crunching : Minirosetta 3.62-3.65

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 78684 - Posted 3 Sep 2015 21:38:11 UTC

The minirosetta application has been updated to the latest Rosetta source. This update includes a new optimized score function which improves all scientific benchmarks. Please post any bugs/issues related to this update here.

dcdc Profile

Joined: Nov 3 05
Posts: 1596
ID: 8948
Credit: 33,659,946
RAC: 16,006
Message 78687 - Posted 4 Sep 2015 16:08:42 UTC
Last modified: 4 Sep 2015 16:08:57 UTC

Is it possible to say/test what effect SSE is having? Presumably it varies by CPU and the same unit would have to be run with SSE off?
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 78688 - Posted 4 Sep 2015 17:34:31 UTC - in response to Message ID 78687.

Is it possible to say/test what effect SSE is having? Presumably it varies by CPU and the same unit would have to be run with SSE off?


SSE is not being used in these updates.

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 78689 - Posted 4 Sep 2015 17:55:41 UTC

minirosetta_3.62_x86_64-pc-linux-gnu: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, stripped

A bit disappointed here...
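
For anyone who wants to check a binary without the `file` utility, the same facts can be read straight from the ELF header. This is only a minimal sketch, assuming the standard ELF header layout (class byte at offset 4, byte order at offset 5, machine field at offset 18); the function names are made up for illustration and are not part of any Rosetta or BOINC tooling.

```python
import struct

# e_ident[EI_CLASS] values defined by the ELF specification.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
# Two e_machine values from the ELF specification (not an exhaustive list).
ELF_MACHINES = {3: "Intel 80386", 62: "x86-64"}


def describe_elf(header: bytes) -> str:
    """Describe an ELF header; the first 20 bytes of the file are enough."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = ELF_CLASSES.get(header[4], "unknown class")
    # EI_DATA: 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    machine_name = ELF_MACHINES.get(machine, f"machine {machine}")
    return f"ELF {elf_class}, {machine_name}"


def describe_elf_file(path: str) -> str:
    """Convenience wrapper (hypothetical helper) that reads the header from disk."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(20))
```

Calling `describe_elf_file` on the downloaded minirosetta binary should show the class and machine type independently of what the file name suggests.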

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 78690 - Posted 4 Sep 2015 18:05:56 UTC

I see the minirosetta_database got quite a lot larger, from 248MB to 411MB unpacked. Not so good considering this thing is extracted each time a new WU starts. Let's see how my soon 10-year-old laptop and its hard drive will like it...
____________
.

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 78691 - Posted 4 Sep 2015 20:12:55 UTC
Last modified: 4 Sep 2015 20:26:03 UTC

I am showing so far that many 3.62 tasks are encountering 'Validate Errors', though not all of them.

The following tasks are all variations of a project with the title like 'FFD__ xxxxx_abinitioDocking':
Task 755931518 - Validate error
Task 755927819 - Validate error

What sucks is the tasks run for the full runtime (in my case, 8 hours) but end up failing validation

However, it's not all bad news: some other 3.62 tasks, one of which is 'FFD__xxxxx_insulin_fourH2_Thu_xxxx', are finishing successfully,

as are almost all of the 3.62 'rb_' type tasks. This leads me to believe it may be an issue with some of the FFD abinitio docking jobs.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 78692 - Posted 4 Sep 2015 21:07:30 UTC - in response to Message ID 78691.

I am showing so far that many 3.62 tasks are encountering 'Validate Errors', though not all of them.

The following tasks are all variations of a project with the title like 'FFD__ xxxxx_abinitioDocking':
Task 755931518 - Validate error
Task 755927819 - Validate error

What sucks is the tasks run for the full runtime (in my case, 8 hours) but end up failing validation

However, it's not all bad news: some other 3.62 tasks, one of which is 'FFD__xxxxx_insulin_fourH2_Thu_xxxx', are finishing successfully,

as are almost all of the 3.62 'rb_' type tasks. This leads me to believe it may be an issue with some of the FFD abinitio docking jobs.


Thanks for the heads up. We are currently looking into this.

wyxchari Profile

Joined: Nov 27 14
Posts: 11
ID: 1022309
Credit: 40,803
RAC: 0
Message 78693 - Posted 4 Sep 2015 21:52:53 UTC

Since September 3, tasks have not been working for me.

756146726 685379842 4 Sep 2015 21:43:22 UTC 4 Sep 2015 21:47:31 UTC Over Client error Compute error 0.00 0.00 ---
756146379 685379499 4 Sep 2015 21:47:31 UTC 4 Sep 2015 21:51:38 UTC Over Client error Compute error 0.00 0.00 ---
756142269 685191678 4 Sep 2015 21:11:45 UTC 4 Sep 2015 21:43:22 UTC Over Client error Compute error 0.00 0.00 ---
756136139 685070740 4 Sep 2015 20:35:04 UTC 4 Sep 2015 21:11:45 UTC Over Client detached New 0.00 --- ---
756108060 685348417 4 Sep 2015 17:43:43 UTC 4 Sep 2015 20:35:04 UTC Over Client detached New 0.00 --- ---
756107378 685347883 4 Sep 2015 17:37:58 UTC 4 Sep 2015 20:35:04 UTC Over Client detached New 0.00 --- ---
756106387 685347020 4 Sep 2015 17:33:50 UTC 4 Sep 2015 17:37:58 UTC Over Client error Compute error 0.00 0.00 ---
756106282 685345669 4 Sep 2015 17:29:17 UTC 4 Sep 2015 17:33:50 UTC Over Client error Compute error 0.00 0.00 ---
756047933 685297734 4 Sep 2015 11:47:45 UTC 4 Sep 2015 17:29:17 UTC Over Client error Compute error 0.00 0.00 ---
756046735 685296655 4 Sep 2015 11:42:49 UTC 4 Sep 2015 11:43:37 UTC Over Client error Compute error 0.00 0.00 ---
756031213 685283552 4 Sep 2015 10:09:51 UTC 4 Sep 2015 10:13:57 UTC Over Client error Compute error 0.00 0.00 ---
756030152 685282685 4 Sep 2015 10:03:59 UTC 4 Sep 2015 10:05:43 UTC Over Client error Compute error 0.00 0.00 ---
755997565 685255620 4 Sep 2015 6:47:27 UTC 4 Sep 2015 10:03:59 UTC Over Client error Compute error 0.00 0.00 ---
755987746 685246348 4 Sep 2015 10:13:57 UTC 4 Sep 2015 11:42:49 UTC Over Client error Compute error 0.00 0.00 ---
755906084 685172826 3 Sep 2015 20:36:37 UTC 4 Sep 2015 6:51:35 UTC Over Success Done 5,786.07 12.84 8.77
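
A listing like the one above is easier to judge once the outcomes are tallied. The following is a rough sketch, assuming the rows are copied from the task list page as plain text and that the columns are task ID, work unit ID, sent time, reported time, server state and outcome (the column names are an assumption, since the pasted rows carry no header).

```python
import re
from collections import Counter

# Column meaning is assumed from the pasted rows; the header was not included.
ROW = re.compile(
    r"^(?P<task>\d+)\s+(?P<workunit>\d+)\s+"
    r"(?P<sent>.+? UTC)\s+(?P<reported>.+? UTC)\s+"
    r"(?P<server_state>\S+)\s+"
    r"(?P<outcome>Client error|Client detached|Success)\b"
)


def tally_outcomes(text: str) -> Counter:
    """Count how many pasted rows ended with each outcome."""
    counts = Counter()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counts[match.group("outcome")] += 1
    return counts
```

Feeding it the pasted rows gives a count per outcome, which makes it obvious at a glance whether a host is failing everything or only some tasks.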

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 78694 - Posted 4 Sep 2015 23:09:12 UTC
Last modified: 4 Sep 2015 23:09:38 UTC

Just an update, I let a few more tasks finish and I still haven't had a single successful Abinitio docking job that has a title like FFD__xxxx.

I've aborted all FFD__xxxx 'abinitioDocking' type jobs on all of my hosts. It definitely looks to be an issue with this particular job type.

Thanks for your hard work David, wish there was more I could do on my end, don't hesitate to ping me if I can be of any help!

Tim

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 78695 - Posted 5 Sep 2015 0:48:07 UTC

Task ID 756094939 failed under Linux, too.

fseeger
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 3 14
Posts: 1
ID: 1017867
Credit: 75,725
RAC: 0
Message 78696 - Posted 5 Sep 2015 1:22:33 UTC - in response to Message ID 78695.

Task ID 756094939 failed under Linux, too.



Hi all,
These seem to be mine but I am not sure what is going on. I submitted them a week ago and they seemed to have run fine until the update yesterday. Could you provide me with more information about the jobs that are failing?

Thanks,
Franziska

P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 78697 - Posted 5 Sep 2015 3:01:35 UTC

You wanted more, enjoy.

My rigs got validate error on these.( See below )

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=682557581

1da894305e9a610bd5667cc03eeff6fd_hhhcommon_15_08_20_46_44_globalDocking_4_SAVE_ALL_OUT_292589_6_1


# cpu_run_time_pref: 14400
======================================================
DONE :: 710 starting structures 14388.2 cpu seconds
This process generated 710 decoys from 710 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

--------------------------------------------------

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685290795

FFD__368f3629c8dc09f0ae4d4402193bd7e0_abinitioDocking_15_08_21_50_55_globalDocking_5_SAVE_ALL_OUT_301979_6_0

# cpu_run_time_pref: 14400
======================================================
DONE :: 425 starting structures 14368.3 cpu seconds
This process generated 425 decoys from 425 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

--------------------------------------------------------------


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685229735


FFD__368f3629c8dc09f0ae4d4402193bd7e0_abinitioDocking_15_08_21_50_55_globalDocking_5_SAVE_ALL_OUT_301979_6_0


# cpu_run_time_pref: 14400
======================================================
DONE :: 425 starting structures 14368.3 cpu seconds
This process generated 425 decoys from 425 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

----------------------------------------------------------


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685229735


FFD__61553d966fba4a94dcf25c2a24befc61_abinitioDocking_15_08_14_11_53_globalDocking_4_SAVE_ALL_OUT_299060_4_0


# cpu_run_time_pref: 14400
======================================================
DONE :: 253 starting structures 14362.2 cpu seconds
This process generated 253 decoys from 253 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

-----------------------------------------
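
The stderr summaries above all share the same `DONE ::` line, so the time spent per decoy can be pulled out mechanically when comparing tasks. A small sketch, assuming the summary line always has the form shown in these logs; the function name is hypothetical.

```python
import re

# Matches e.g. "DONE :: 710 starting structures 14388.2 cpu seconds".
DONE_LINE = re.compile(
    r"DONE ::\s+(?P<structures>\d+)\s+starting structures\s+"
    r"(?P<seconds>[\d.]+)\s+cpu seconds"
)


def seconds_per_decoy(stderr_text: str) -> float:
    """Return CPU seconds per starting structure from a stderr summary."""
    match = DONE_LINE.search(stderr_text)
    if match is None:
        raise ValueError("no DONE summary line found")
    structures = int(match.group("structures"))
    if structures == 0:
        raise ValueError("summary reports zero structures")
    return float(match.group("seconds")) / structures
```

Note that this only reads the summary; it says nothing about why the validator rejected the result.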

More validate errors.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685188207

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=682334661

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685181282

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685181270

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685181239

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685181197

=============================================================

These are Compute errors.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685185054

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685181238

____________


Chilean Profile
Avatar

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 78698 - Posted 5 Sep 2015 4:41:44 UTC - in response to Message ID 78689.

minirosetta_3.62_x86_64-pc-linux-gnu: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, stripped

A bit disappointed here...


Same.
At least we're now running a more advanced energy function, but still, I find it frustrating that we're not squeezing the best out of our CPUs.
I mean, the extensions on the CPUs are there for a reason! Performance! Haha
____________

wyxchari Profile

Joined: Nov 27 14
Posts: 11
ID: 1022309
Credit: 40,803
RAC: 0
Message 78699 - Posted 5 Sep 2015 17:01:59 UTC

I have set Rosetta to "do not ask for new tasks".
In the meantime I am adding other projects.

P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 78700 - Posted 5 Sep 2015 22:38:49 UTC

More validate errors this morning.


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685225333


FFD__1885c0418f264a32e397ffab4318bbf3_abinitioDocking_15_08_14_27_34_globalDocking_6_SAVE_ALL_OUT_298551_4_0

# cpu_run_time_pref: 14400
======================================================
DONE :: 201 starting structures 14379.4 cpu seconds
This process generated 201 decoys from 201 attempts
======================================================
BOINC :: WS_max 1.35666e-166

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

=================================================


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685381286


FFD__7a0bd05235f76b7c212c91d604e6b896_abinitioDocking_15_08_10_13_49_globalDocking_0_SAVE_ALL_OUT_300974_4_0


# cpu_run_time_pref: 14400
======================================================
DONE :: 261 starting structures 14384 cpu seconds
This process generated 261 decoys from 261 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

=================================================


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685381283


FFD__7a0bd05235f76b7c212c91d604e6b896_abinitioDocking_15_08_10_13_49_localDocking_8_SAVE_ALL_OUT_300974_4_0


# cpu_run_time_pref: 14400
======================================================
DONE :: 273 starting structures 14369.2 cpu seconds
This process generated 273 decoys from 273 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

=========================================================

And more FFD validate errors.


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685299006


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685330215


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685308187


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685291696


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685378382
____________


Chilean Profile
Avatar

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 78701 - Posted 6 Sep 2015 3:57:47 UTC

Validate errors here as well:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685075655
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685375792
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685375707
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685307845
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685333968
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685333966
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=685306097
____________

dkester788

Joined: Oct 22 14
Posts: 2
ID: 1015748
Credit: 15,564
RAC: 0
Message 78707 - Posted 6 Sep 2015 16:33:26 UTC

I'm new to Rosetta@home in the last couple of days so the majority of the WUs that I've completed recently have ended up in a "Validation Failed" state. The bad part about it is the 5-6 hours of processing time for each job and we don't get any credit for the work. If I can provide additional error logs or information to fix the issue please let me know. I'd like to continue supporting this worthwhile project.

Thanks!

http://boinc.bakerlab.org/rosetta/result.php?resultid=756421006
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420997
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420978
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420964
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420963
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420962

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 78708 - Posted 6 Sep 2015 17:05:15 UTC - in response to Message ID 78698.

At least we're now running a more advanced energy function, but still, I find it frustrating that we're not squeezing the best out of our CPUs.
I mean, the extensions on the CPUs are there for a reason! Performance! Haha


+1
But it's better to have a stable version of the app first and optimize afterwards.
They are years behind on optimizations; waiting a little longer will not change much :-)
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 78709 - Posted 6 Sep 2015 18:18:41 UTC

Taking a look at mine, I have validate errors for

FFD__3abb772f509ff20673e3db3ba6bd5b23_abinitioDocking_15_08_09_15_46_globalDocking_2_SAVE_ALL_OUT_298234_6_0
FFD__f39ecdd93693aa6c9f7041bfe2807e70_abinitioDocking_15_08_16_09_28_localDocking_9_SAVE_ALL_OUT_302177_6_0
FFD__14dbe8cafbf291760a429294a8d2b33a_abinitioDocking_15_08_07_29_16_globalDocking_2_SAVE_ALL_OUT_293179_6_0
FFD__ca38249e426d15dd7bcba74daffc6fec_abinitioDocking_15_08_10_30_33_globalDocking_4_SAVE_ALL_OUT_295421_4_0
FFD__ca38249e426d15dd7bcba74daffc6fec_abinitioDocking_15_08_10_30_33_globalDocking_4_SAVE_ALL_OUT_295421_4_0
FFD__4a08875fd86b75d6fb9e75758d3fd2c7_abinitioDocking_15_08_14_56_33_localDocking_8_SAVE_ALL_OUT_295567_4_1
FFD__cc71887ae41e282f8164f40af22faac6_abinitioDocking_15_08_13_27_21_globalDocking_1_SAVE_ALL_OUT_296571_6_0
FFD__cc758423a44ab5206e41382ff85987f2_abinitioDocking_15_08_07_17_34_globalDocking_2_SAVE_ALL_OUT_295398_4_1
FFD__9b9c8c57be964de37d8e6892fab9c249_abinitioDocking_15_08_08_17_54_globalDocking_2_SAVE_ALL_OUT_298082_5_1
FFD__516ef6a3ceed06822bf753aff1b909d2_abinitioDocking_15_08_22_17_49_localDocking_9_SAVE_ALL_OUT_300732_5_0
FFD__1c08f5de7feb7471e5d234979f67806a_abinitioDocking_15_08_04_28_25_globalDocking_0_SAVE_ALL_OUT_293896_6_0
FFD__025d259a867b146b4634e87d80278365_abinitioDocking_15_08_18_23_19_globalDocking_1_SAVE_ALL_OUT_294660_6_1

A pattern forms...
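
For longer lists, the pattern can be made explicit by grouping task names on their leading prefix and on whether they are global or local docking runs. This is just a sketch that assumes the naming scheme visible in the names above (prefix before the first underscore, `globalDocking` or `localDocking` somewhere in the name); nothing here comes from the project's own tools.

```python
import re
from collections import Counter

# "abinitioDocking" does not match, only the global/local mode does.
DOCKING_MODE = re.compile(r"(?:global|local)Docking")


def group_task_names(names):
    """Count task names by (prefix, docking mode)."""
    counts = Counter()
    for name in names:
        prefix = name.split("_", 1)[0]
        mode = DOCKING_MODE.search(name)
        counts[(prefix, mode.group(0) if mode else "other")] += 1
    return counts
```

Running it over a list of failed task names shows how many share the `FFD` prefix and how they split between the two docking modes.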
____________

Trotador

Joined: May 30 09
Posts: 61
ID: 318648
Credit: 39,402,039
RAC: 76,148
Message 78710 - Posted 6 Sep 2015 19:44:03 UTC - in response to Message ID 78696.

Task ID 756094939 failed under Linux, too.



Hi all,
These seem to be mine but I am not sure what is going on. I submitted them a week ago and they seemed to have run fine until the update yesterday. Could you provide me with more information about the jobs that are failing?

Thanks,
Franziska


Any idea, Franziska? I have suspended all FFD_ tasks. I do not want to abort them, because then a new copy would be sent to another cruncher.

Thanks for keeping us informed.

Chilean Profile
Avatar

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 78713 - Posted 7 Sep 2015 0:28:26 UTC - in response to Message ID 78707.

I'm new to Rosetta@home in the last couple of days so the majority of the WUs that I've completed recently have ended up in a "Validation Failed" state. The bad part about it is the 5-6 hours of processing time for each job and we don't get any credit for the work. If I can provide additional error logs or information to fix the issue please let me know. I'd like to continue supporting this worthwhile project.

Thanks!

http://boinc.bakerlab.org/rosetta/result.php?resultid=756421006
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420997
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420978
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420964
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420963
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420962


You can change the running time in your preferences. I set mine to 2-3 hours for exactly this reason.
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 78716 - Posted 7 Sep 2015 1:24:07 UTC - in response to Message ID 78713.

I'm new to Rosetta@home in the last couple of days so the majority of the WUs that I've completed recently have ended up in a "Validation Failed" state. The bad part about it is the 5-6 hours of processing time for each job and we don't get any credit for the work. If I can provide additional error logs or information to fix the issue please let me know. I'd like to continue supporting this worthwhile project.

Thanks!

http://boinc.bakerlab.org/rosetta/result.php?resultid=756421006
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420997
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420978
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420964
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420963
http://boinc.bakerlab.org/rosetta/result.php?resultid=756420962

You can change the running time in your preferences. I set mine to 2-3 hours for exactly this reason.

It's worth remembering that a clean-up job runs once a day that gives credit (equal to claimed credit) for validation errors, so you don't personally lose credit, even if it is delayed by a day.

I also seem to recall it mentioned that the results submitted are still usable to the science bods. It just <looks> really bad - can't deny that.
____________

wyxchari Profile

Joined: Nov 27 14
Posts: 11
ID: 1022309
Credit: 40,803
RAC: 0
Message 78718 - Posted 7 Sep 2015 5:49:32 UTC
Last modified: 7 Sep 2015 5:58:57 UTC

Rosetta should return to 3.61 to fix it.

wyxchari Profile

Joined: Nov 27 14
Posts: 11
ID: 1022309
Credit: 40,803
RAC: 0
Message 78731 - Posted 8 Sep 2015 9:26:28 UTC
Last modified: 8 Sep 2015 9:39:52 UTC

Since the 3.62 update, I cannot run any Rosetta tasks.

These work correctly:
FiND@Home
DENIS@Home
SETI@Home
MilkyWay@home

I am using an AMD Athlon 1700 with Windows XP SP3 and 2 GB RAM.

It downloads "minirosetta_3.62_windows_intelx86.exe". Is that okay if my CPU is AMD?

Features:
- MMX instructions
- Extensions to MMX
- 3DNow! technology
- Extensions to 3DNow!
- SSE / Streaming SIMD Extensions
____________

Ed Johnson Profile

Joined: Jun 9 06
Posts: 9
ID: 93892
Credit: 4,704,918
RAC: 0
Message 78733 - Posted 8 Sep 2015 11:07:09 UTC - in response to Message ID 78696.

Task ID 756094939 failed under Linux, too.



Hi all,
These seem to be mine but I am not sure what is going on. I submitted them a week ago and they seemed to have run fine until the update yesterday. Could you provide me with more information about the jobs that are failing?

Thanks,
Franziska

Hi Franziska,

Any news on the FFD work units?
____________

Chilean Profile
Avatar

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 78737 - Posted 8 Sep 2015 15:57:09 UTC - in response to Message ID 78731.
Last modified: 8 Sep 2015 15:58:12 UTC

Since the 3.62 update, I cannot run any Rosetta tasks.

These work correctly:
FiND@Home
DENIS@Home
SETI@Home
MilkyWay@home

I am using an AMD Athlon 1700 with Windows XP SP3 and 2 GB RAM.

It downloads "minirosetta_3.62_windows_intelx86.exe". Is that okay if my CPU is AMD?

Features:
- MMX instructions
- Extensions to MMX
- 3DNow! technology
- Extensions to 3DNow!
- SSE / Streaming SIMD Extensions


The "intelx86" part just refers to the architecture. With that said, you should look into investing into a cheap i3 computer in the future. It'd be WAY faster and use the same, if not less, energy.

Don't get me wrong but you have one crappy computer.

Crappy CPU
Crappy RAM
Crappy Windows XP

Forget it.


Not crappy, just old. I remember my first Athlon 2200+, they were so much better than the then P4.

Any need for that?


Well, he is running a very inefficient CPU by today's standards. It doesn't matter how slow it is; what matters is how much power it draws for such slow computation. So he could be doing him a favor.
____________

Irgendware

Joined: May 3 15
Posts: 1
ID: 1084000
Credit: 403,876
RAC: 260
Message 78749 - Posted 10 Sep 2015 9:10:02 UTC

I am also using several PCs with AMD CPUs, but more modern ones, like an AMD Turion 64 X2 or an AMD A10-7870K APU. No problems here so far with minirosetta 3.62.

Jim1348

Joined: Jan 19 06
Posts: 65
ID: 52455
Credit: 2,074,859
RAC: 9,392
Message 78751 - Posted 10 Sep 2015 16:19:13 UTC

I just put this i7-4771 machine back on Rosetta since 3.62 came out, and no problems with 21 successes and no failures (Win7 64-bit). It looks like the worst is over.
____________

wyxchari Profile

Joined: Nov 27 14
Posts: 11
ID: 1022309
Credit: 40,803
RAC: 0
Message 78754 - Posted 10 Sep 2015 20:20:24 UTC
Last modified: 10 Sep 2015 20:23:45 UTC

Has anyone tried Rosetta on an AMD Athlon (without SSE2) with Windows XP?
I have ruled out software problems: I installed a clean Windows XP in VMware and the problem persists.
I tend to think that Rosetta 3.62 fails with all Windows XP or Rosetta 3.62 needs SSE2 (like Google Chrome).
The old computer is always on for network use (no specific personal use), so I take the opportunity to run BOINC on it.
For personal use, of course, I use a more powerful one.
____________

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 78755 - Posted 10 Sep 2015 21:06:59 UTC - in response to Message ID 78754.

I tend to think that Rosetta 3.62 fails with all Windows XP or Rosetta 3.62 needs SSE2 (like Google Chrome).

Rosetta 3.62 runs here on WinXP without any issues. According to the application page it's supposed to be a generic i386 application; however, exit code -1073741795 indicates that this is not the case. So they either need to build a generic i386 app again, or update the application info in the system so the scheduler doesn't send it to computers that can't execute it.
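
To show where that reading of the exit code comes from: Windows reports the status as a signed 32-bit number, and reinterpreting it as unsigned gives the usual NTSTATUS hex value. A minimal sketch follows; the lookup table is deliberately tiny and the function names are made up for illustration.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned NTSTATUS hex string."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# A few well-known NTSTATUS values (not exhaustive).
KNOWN_STATUS = {
    "0xC000001D": "STATUS_ILLEGAL_INSTRUCTION",
    "0xC0000005": "STATUS_ACCESS_VIOLATION",
}


def explain_exit_code(exit_code: int) -> str:
    """Return the hex status together with its name, if it is in the table."""
    status = to_ntstatus(exit_code)
    return f"{status} ({KNOWN_STATUS.get(status, 'unknown status')})"
```

`explain_exit_code(-1073741795)` gives `0xC000001D (STATUS_ILLEGAL_INSTRUCTION)`, which is what one would expect when a CPU meets an instruction it does not implement.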
____________
.

wyxchari Profile

Joined: Nov 27 14
Posts: 11
ID: 1022309
Credit: 40,803
RAC: 0
Message 78756 - Posted 10 Sep 2015 21:34:01 UTC
Last modified: 10 Sep 2015 21:36:21 UTC

Thank you. Thank you. A thousand thanks. Finally, a coherent explanation.
The AMD Athlon has SSE, but not SSE2. That must be the problem.
I switched to another project (WCG cancer) and that solved the problem.
Thanks again. Greetings.
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 78759 - Posted 11 Sep 2015 2:00:34 UTC - in response to Message ID 78755.

I tend to think that Rosetta 3.62 fails with all Windows XP or Rosetta 3.62 needs SSE2 (like Google Chrome).

Rosetta 3.62 runs here on WinXP without any issues. According to the application page it's supposed to be a generic i386 application; however, exit code -1073741795 indicates that this is not the case. So they either need to build a generic i386 app again, or update the application info in the system so the scheduler doesn't send it to computers that can't execute it.

I was about to say it does run on such an AMD machine under XP but I noticed I'm using an AMD Athlon 64 X2 Dual Core Processor 3800+ on my backup machine which does have SSE2. That explains why things are fine here. Thanks. Hanging on by my fingertips!

____________

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 78763 - Posted 11 Sep 2015 6:26:20 UTC

Just another example of why it's about time to finally discard support for ancient CPUs.

They don't pack a punch. Everyone cares about some ancient Athlon, nobody (really) cares about the use of modern instruction sets like SSEx that have been around for ages now. For ages...

AVX support not even mentioned. In a few years most people will have a CPU with such capabilities. But hey, why care about things like doubling the performance or so?

Let's rather see how we can cater to people with ancient CPUs, next to nothing RAM and an OS that has already been phased out for good reasons.

Sorry, admins, developers, this is the completely wrong way. Scientists in the HPC sector would laugh about this.

Jim1348

Joined: Jan 19 06
Posts: 65
ID: 52455
Credit: 2,074,859
RAC: 9,392
Message 78766 - Posted 11 Sep 2015 14:53:37 UTC

Also, the present application is 64-bits. Maybe the old CPUs are 32 bits?
____________

Chilean Profile
Avatar

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 78768 - Posted 11 Sep 2015 18:14:44 UTC - in response to Message ID 78755.

I tend to think that Rosetta 3.62 fails with all Windows XP or Rosetta 3.62 needs SSE2 (like Google Chrome).

Rosetta 3.62 runs here on WinXP without any issues. According to the application page it's supposed to be a generic i386 application; however, exit code -1073741795 indicates that this is not the case. So they either need to build a generic i386 app again, or update the application info in the system so the scheduler doesn't send it to computers that can't execute it.


Good god, I just hope the devs don't UNOPTIMIZE the app so it can run properly on 15+ year old CPUs.
Rather, if they really want to support old CPUs (which is good for advertising), they should update the app info so BOINC doesn't send them to these CPUs, just like you stated.
____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 78770 - Posted 11 Sep 2015 18:18:44 UTC - in response to Message ID 78766.

Also, the present application is 64-bits. Maybe the old CPUs are 32 bits?


App is 32 bit native. 64 is a wrapper

____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 78771 - Posted 11 Sep 2015 18:20:11 UTC - in response to Message ID 78763.

Just another example of why it's about time to finally discard support for ancient CPUs.


+1.
But now admins are working on it
____________

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 78778 - Posted 12 Sep 2015 22:09:08 UTC - in response to Message ID 78691.

I am showing so far that many 3.62 tasks are encountering 'Validate Errors', though not all of them.

The following tasks are all variations of a project with the title like 'FFD__ xxxxx_abinitioDocking':
Task 755931518 - Validate error
Task 755927819 - Validate error


Same here. 3 times 48 Hours wasted.
Task 757088886
Task 755915384
Task 755915319

I will stop all FFD_* tasks until there is a true solution.
Thanks.

____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 78780 - Posted 13 Sep 2015 1:05:33 UTC - in response to Message ID 78778.

I am showing so far that many 3.62 tasks are encountering 'Validate Errors', though not all of them.

The following tasks are all variations of a project with the title like 'FFD__ xxxxx_abinitioDocking':
Task 755931518 - Validate error
Task 755927819 - Validate error


Same here. 3 times 48 Hours wasted.
Task 757088886
Task 755915384
Task 755915319

I will stop all FFD_* tasks until there is a true solution.
Thanks.

These jobs have given credit, but limited to a flat 300 when the claimed amount is much more.

Is this related to the run times being set to 2 days? Someone highlighted an issue with this runtime a few weeks ago, but I can't recall what the circumstances were.

Point is, there seem to be a few consistent issues at play being reported in the last few days. Something is going wrong that needs an urgent look before more big crunchers get alienated.
____________

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 78785 - Posted 13 Sep 2015 17:12:23 UTC

I have set the runtime to two days. That has two benefits. First, less network traffic for downloading new WUs. Second, less time wasted finishing one WU and starting another, which means more time for number crunching.
In addition, it reduces the disk space needed for the project.
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 78786 - Posted 13 Sep 2015 18:11:46 UTC - in response to Message ID 78785.

I have set the runtime to two days. That has two benefits. First, less network traffic for downloading new WUs. Second, less time wasted finishing one WU and starting another, which means more time for number crunching.
In addition, it reduces the disk space needed for the project.

I'm not criticising - fair enough if that's your preference - I run above the default run-time for a similar reason, but there's a trade off when problems occur. It's just that there was a discussion of a bug that applied to 2 day runs that didn't apply to any of the lower settings. It may apply to validation errors that aren't in this faulty batch of FFD tasks, which is a legitimate concern.

I notice the option of 2 days has been removed from current settings. 1 day is now the maximum.
____________

Betting Slip

Joined: Sep 26 05
Posts: 71
ID: 1160
Credit: 5,702,246
RAC: 0
Message 78787 - Posted 13 Sep 2015 21:10:49 UTC - in response to Message ID 78785.
Last modified: 13 Sep 2015 21:13:51 UTC

I have set the runtime to two days. That has two benefits. First, less network traffic for downloading new WUs. Second, less time wasted finishing one WU and starting another, which means more time for number crunching.
In addition, it reduces the space on my disc which is needed for the project.


There is no 2 day setting, you must mean 1 day as it is the maximum duration.

Edit to add:

Sorry, Sid already posted that, didn't notice at first.

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 78790 - Posted 14 Sep 2015 15:34:16 UTC - in response to Message ID 78763.

Just another example of why it's about time to finally discard support for ancient CPUs.

There's no need at all to discard support for older CPUs. Different code paths for different instruction sets in the same application are nothing impossible or special. Different applications for different CPUs are also possible with BOINC (but not with a BOINC version as ancient as the one Rosetta is using).
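
To illustrate the "different code paths in the same application" idea, here is a minimal Python sketch. It is not how minirosetta works; the path names ("avx2", "sse2", "generic") and both function names are hypothetical, and it assumes a Linux-style /proc/cpuinfo with a "flags" line.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_code_path(flags):
    """Pick the most capable code path the CPU supports (names are hypothetical)."""
    for feature in ("avx2", "sse2"):
        if feature in flags:
            return feature
    # Fall back to a path that needs no special instruction set.
    return "generic"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or file unreadable: use the generic path
    print(pick_code_path(parse_cpu_flags(text)))
```

A CPU with SSE but no SSE2 would land on the "generic" path here instead of crashing on an unsupported instruction.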
____________
.

Chilean Profile
Avatar

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 78793 - Posted 15 Sep 2015 1:44:20 UTC - in response to Message ID 78790.

Just another example of why it's about time to finally discard support for ancient CPUs.

There's no need at all to discard support for older CPUs. Different code paths for different instruction sets in the same application are nothing impossible or special. Different applications for different CPUs are also possible with BOINC (but not with a BOINC version as ancient as the one Rosetta is using).


I remember that upgrading the server version was a priority. This was back like a year ago tho.
____________

Betting Slip

Joined: Sep 26 05
Posts: 71
ID: 1160
Credit: 5,702,246
RAC: 0
Message 78794 - Posted 15 Sep 2015 9:06:57 UTC - in response to Message ID 78793.
Last modified: 15 Sep 2015 9:09:08 UTC



I remember that upgrading the server version was a priority. This was back like a year ago tho.


Another indication of the "importance" that is placed on this project by the people who own/run it. http://boinc.bakerlab.org/rosetta/forum_thread.php?id=6707&nowrap=true#78764

Even the forum has a very weak pulse, if it were a human we'd be calling in the relatives.

I prefer, when dealing with people from all walks of life to follow this, "Listen to what people do and not what they say"

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 78795 - Posted 15 Sep 2015 10:40:02 UTC

Is this a problem at my end? It's happening on a problematic laptop (hard disk errors)

rb_08_17_58352_103163_ab_stage0_h001___robetta_IGNORE_THE_REST_11_15_303197_48_0

<core_client_version>7.6.9</core_client_version>
<![CDATA[
<message>
couldn't start app: Input file minirosetta_database_f386b3c.zip missing or invalid: RSA key check failed for file
</message>
]]>


YFHP_cst__t000___krypton_SAVE_ALL_OUT_03_09_277585_1936_1
<core_client_version>7.6.9</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>minirosetta_database_f386b3c.zip</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

</message>
]]>

____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 78796 - Posted 15 Sep 2015 16:18:04 UTC

Sid, one possible reason for such errors is anti-virus software modifying the downloads when it triggers a false positive. Another is the HDD actually corrupting the file.
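
One rough way to tell whether a file on disk has been altered is to hash it and compare the result with the hash of a known-good copy (for example, the same file on another machine). The sketch below is only that comparison; it is not BOINC's own signature check, and the function names are made up for illustration.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files hash to the same digest."""
    return file_digest(path_a) == file_digest(path_b)
```

If the digest of the same file changes between two reads on the same machine, the disk itself is a likely suspect.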
____________
Rosetta Moderator: Mod.Sense

Luigi R.

Joined: Feb 7 14
Posts: 19
ID: 493270
Credit: 770,219
RAC: 0
Message 78804 - Posted 16 Sep 2015 15:53:58 UTC

Got many validate errors. :(

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Professor Ray

Joined: Dec 7 05
Posts: 35
ID: 32151
Credit: 40,663
RAC: 43
Message 78805 - Posted 16 Sep 2015 17:35:33 UTC

ALL of my Rosetta WU blow up immediately on my PIII-1400S Win 2003 R2 system.
____________

wyxchari Profile

Joined: Nov 27 14
Posts: 11
ID: 1022309
Credit: 40,803
RAC: 0
Message 78810 - Posted 17 Sep 2015 17:48:50 UTC

Professor Ray:
I use an AMD Athlon XP 1700+ with SSE but without SSE2. Since the Rosetta 3.62 update, nothing works for me. Everything worked for me before.

Your old PIII does not have SSE2, so I recommend leaving Rosetta and switching to WCG cancer, for example, because it is similar and it works.
____________

Professor Ray

Joined: Dec 7 05
Posts: 35
ID: 32151
Credit: 40,663
RAC: 43
Message 78817 - Posted 18 Sep 2015 0:36:34 UTC

I'm very sad to see that my darling Rosetta - of 10 years - has broken up with me - throwing me under the bus - akin to a Twitter / Facebook break-up posting. It is presently the 5th best producing project behind Cosmology, Leiden, SETI & WCG. Over the last decade we've had some good times though; 23379 WU credits worth.

FWIW: Einstein recently did the same thing to me. It STILL is my second best WU cruncher - despite abandoning me in favor of GPU centric cruncher hosts - nearly a 1 1/2 years after their hardware efficiency optimization.

I foresee a day in the near future when I'm going to take all my old electronic equipment - old motherboards, video cards, RAM, power supplies, VCR, TV, et al. - and toss it all into a 55 gal drum supported by cinder-blocks. I'm going to light a charcoal fire underneath, douse the barrel contents with lighter fluid, and cook it for a week. The slag that comes out of that kiln I'll sell to a recycler.

That's what I think of your project 'upgrades' requiring new hardware.

____________

Professor Ray

Joined: Dec 7 05
Posts: 35
ID: 32151
Credit: 40,663
RAC: 43
Message 78818 - Posted 18 Sep 2015 0:43:47 UTC
Last modified: 18 Sep 2015 0:45:23 UTC

For all of you tech snobs out there, keep your condescending, denigrating, snarky, belittling, and superior comments to yourself; contact me via private mail for the particulars on sending me the hardware that meets your minimum standards of acceptance to be cool as a humanitarian aide gesture.

There's a reason that I'm running the hardware platform that I am. Instead of snarking down at me, and people like me, why don't you offer a helping hand and send us all that glorious hardware that meets the requirements to be cool enough to be in the same room as you?
____________

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 78819 - Posted 18 Sep 2015 3:47:58 UTC - in response to Message ID 78818.

@Professor Ray

It sounds like your heart is in the right place man, but given that a) your hardware is easily 13 to 17 years old and b) your electricity has a good chance of coming from coal-fired power given that you live in the US, the cost-benefit of crunching for BOINC vs. the generated carcinogens, pollution, heavy metals, and other cancer-causing air particles spewing out of the power plant that powers the incredibly inefficient rig you're running just doesn't make sense to me when even modestly newer hardware would deliver an order of magnitude more science for the same or less wattage.

Let's be creative and think long-term for a second: If money is the issue (I think that's what you alluded to here), then why not shut off this machine for a year, put the savings from not paying for the power to run it for the year into a savings account every month, and in a year from now come back with a newer machine that will easily close the gap in terms of lost credits in a couple of months, and will keep crunching at a much higher productivity going forward?

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 78820 - Posted 18 Sep 2015 7:23:48 UTC - in response to Message ID 78817.


That's what I think of your project 'upgrades' requiring new hardware.

With all due respect but you have been crunching since Dec 7 05.

Your results so far: Credit: 27,379 / RAC: 15

The moderator is of course invited to throw this posting in the trash.

Jim1348

Joined: Jan 19 06
Posts: 65
ID: 52455
Credit: 2,074,859
RAC: 9,392
Message 78821 - Posted 18 Sep 2015 10:32:15 UTC

I am more interested in the new extensions for my Haswells. Rosetta can keep either the new hardware crunchers or the old ones happy, but probably not both.
____________

sgaboinc

Joined: Apr 2 14
Posts: 169
ID: 498515
Credit: 125,409
RAC: 0
Message 78822 - Posted 18 Sep 2015 11:43:35 UTC

Updated to 3.62 and ran a set of jobs. It seemed quite stable with no errors, but it is a small set. System: openSUSE 13.2, Linux 3.16, x86_64, i7 4771.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 78824 - Posted 19 Sep 2015 3:20:48 UTC - in response to Message ID 78796.

Sid, one possible reason for such errors is anti-virus software modifying the downloads when it triggers a false positive. Another is the HDD actually corrupting the file.

Thanks. I think it's the latter on this occasion. The laptop can barely cope with anything bar one job at a time right now. Even a mouse-click is causing a 5 minute hang... Attempting to salvage what I can :(
____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 78825 - Posted 19 Sep 2015 5:59:46 UTC - in response to Message ID 78820.
Last modified: 19 Sep 2015 6:01:58 UTC

With all due respect but you have been crunching since Dec 7 05.
Your results so far: Credit: 27,379 / RAC: 15
The moderator is of course invited to throw this posting in the trash.


+1
I also have modest hardware, but 1M credit here and 500K on Ralph.
And, if possible, I'll buy new hardware in 2016 (I'm waiting for Zen).
____________

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 78826 - Posted 19 Sep 2015 14:35:11 UTC - in response to Message ID 78817.
Last modified: 19 Sep 2015 14:38:47 UTC

Professor Ray wrote:
That's what I think of your project 'upgrades' requiring new hardware.

New software versions require new hardware in some cases, this is nothing unusual. You can't expect software developers to support old hardware forever. And I say that as someone sitting in front of a soon 10 years old laptop.

However what I do expect, is that the project admins configure the servers properly (or even upgrade them if necessary) and stop sending work to computers, which can't run the science application.



Dr. Merkwürdigliebe wrote:

That's what I think of your project 'upgrades' requiring new hardware.

With all due respect but you have been crunching since Dec 7 05.

Your results so far: Credit: 27,379 / RAC: 15

The moderator is of course invited to throw this posting in the trash.

To be fair: he is also running 4 other projects and has a total RAC of 106.
____________
.

Betting Slip

Joined: Sep 26 05
Posts: 71
ID: 1160
Credit: 5,702,246
RAC: 0
Message 78837 - Posted 20 Sep 2015 22:19:56 UTC - in response to Message ID 78826.


However what I do expect, is that the project admins configure the servers properly (or even upgrade them if necessary) and stop sending work to computers, which can't run the science application.


Looking at past history, "Some Chance". They are waiting to lose all their supporters because this project is of no importance to them. Until you start banging the drum you will never know how "unimportant" you are http://boinc.bakerlab.org/rosetta/forum_thread.php?id=6707&nowrap=true#78764

Jim1348

Joined: Jan 19 06
Posts: 65
ID: 52455
Credit: 2,074,859
RAC: 9,392
Message 78839 - Posted 21 Sep 2015 14:10:56 UTC

The zeroeth-order question (i.e., most important) is "are they doing good science"? The answer seems to be yes, very much so. How fast they do it is important, but if they are happy with the results they are getting, then it is not for me to tell them how to spend their time and resources. I would not choose a new project just because it can crunch faster, though I would prefer a project that uses AVX2 if I can find it.

You can go still faster with GPUs, but that limits the type of code you can write and the type of science you can perform. Whether it is worth that trade off is up to the Rosetta project. There is room in the DC world for both types of projects, and I do both.
____________

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 78840 - Posted 21 Sep 2015 14:20:59 UTC - in response to Message ID 78839.

The zeroeth-order question (i.e., most important) is "are they doing good science"? The answer seems to be yes, very much so. How fast they do it is important, but if they are happy with the results they are getting, then it is not for me to tell them how to spend their time and resources.


+1, EXACTLY!

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 78903 - Posted 13 Oct 2015 3:10:29 UTC

What fixes or features appear in 3.65 please?
____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 78904 - Posted 13 Oct 2015 5:36:45 UTC - in response to Message ID 78903.

What fixes or features appear in 3.65 please?


Here

____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 78905 - Posted 13 Oct 2015 14:16:52 UTC - in response to Message ID 78904.

What fixes or features appear in 3.65 please?


Here

Thanks. I'll be starting on them some time tomorrow when my buffer of 3.62s complete. Fingers crossed.
____________

P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 78906 - Posted 13 Oct 2015 23:28:49 UTC

Hi.

I've had 15 tasks so far error the same, one after the other.

The rig is running Ubuntu 14.04lts x64 with Boinc 7.2.42.

And it has 32bit libs installed for some other projects that need them.

=========================================

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)
</message>
<stderr_txt>
../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.65_x86_64-pc-linux-gnu: error while loading shared libraries: libglut.so.3: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

____________


P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 78907 - Posted 14 Oct 2015 3:39:02 UTC
Last modified: 14 Oct 2015 3:40:17 UTC

Hi.

I hope there is a fix for this error on Ubuntu rigs. I only have a few 3.62 tasks left on my other rigs (about a day's worth). The rig that got all the errors was the first to get the 3.65 app, and I have stopped it getting any new work; no point with all the errors.

I'm not going to let my other rigs update to a new app until a fix is out for linux/ubuntu.
____________


rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 78908 - Posted 14 Oct 2015 5:01:28 UTC - in response to Message ID 78907.

Hi.

I hope there is a fix for this error on Ubuntu rigs. I only have a few 3.62 tasks left on my other rigs (about a day's worth). The rig that got all the errors was the first to get the 3.65 app, and I have stopped it getting any new work; no point with all the errors.

I'm not going to let my other rigs update to a new app until a fix is out for linux/ubuntu.


I just experienced the same error on 3.65 and had to install libglut.

Running Fedora 22 it was:
"dnf install freeglut" Fedora 22
"yum install freeglut" pre-Fed22

One way you can tell if you are missing the libraries is to go to the BOINC Rosetta project directory and run "ldd" on the minirosetta_*gnu files.

If you are missing the libglut.so library, then you see the "not found" in the ldd print out.
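
The same check can be scripted. This is a small Python sketch, assuming "ldd" is on the PATH and prints missing libraries in its usual "name => not found" form; the function names are mine, not part of BOINC or Rosetta.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary and return the list of missing libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)
```

Calling check_binary() on the minirosetta binary in the project directory should give an empty list once everything it needs is installed.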




I suspect the

P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 78909 - Posted 14 Oct 2015 5:26:00 UTC

Hi.

Well it was supposed to be fixed on Ralph already, guess not!

http://ralph.bakerlab.org/forum_thread.php?id=567&nowrap=true#5898

____________


rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 78910 - Posted 14 Oct 2015 14:09:50 UTC - in response to Message ID 78909.

Hi.

Well it was supposed to be fixed on Ralph already, guess not!

http://ralph.bakerlab.org/forum_thread.php?id=567&nowrap=true#5898


I agree.
I had seen that discussion before and was surprised to see the problem surface on my machine. It was an easy fix for me and I wanted to point out the issue here for those encountering it.



It took 4 hours for Rosetta to start sending jobs again after the hiccup, but it will take some time to fill the pipe on the 16 CPU machine. Resetting and removing/reattaching the project made no difference. Rosetta said ... "See you again in 4 hours" ... 8-)
14 Oct 2015 7:14:54 UTC ..... next new job sent by Rosetta
14 Oct 2015 3:14:16 UTC ..... last job with the library error






Gray Handcock

Joined: Sep 26 05
Posts: 14
ID: 1255
Credit: 1,242,864
RAC: 5
Message 78911 - Posted 14 Oct 2015 21:49:22 UTC - in response to Message ID 78908.

Hi.

I hope there is a fix for this error on Ubuntu rigs. I only have a few 3.62 tasks left on my other rigs (about a day's worth). The rig that got all the errors was the first to get the 3.65 app, and I have stopped it getting any new work; no point with all the errors.

I'm not going to let my other rigs update to a new app until a fix is out for linux/ubuntu.


I just experienced the same error on 3.65 and had to install libglut.

Running Fedora 22 it was:
"dnf install freeglut" Fedora 22
"yum install freeglut" pre-Fed22

One way you can tell if you are missing the libraries is to go to the BOINC Rosetta project directory and run "ldd" on the minirosetta_*gnu files.

If you are missing the libglut.so library, then you see the "not found" in the ldd print out.




I suspect the


Hello

I did run ldd and found that libGLU was missing. I have had a HUGE number of failed units prior to this. Hopefully this helps?

Cheers

____________

dcdc Profile

Joined: Nov 3 05
Posts: 1596
ID: 8948
Credit: 33,659,946
RAC: 16,006
Message 78912 - Posted 14 Oct 2015 22:52:09 UTC

I just happened to set up a PC with Dotsch-UX today and thought I'd done something wrong as it kept erroring out with libGLU errors. I reinstalled and the same happened - now I know why! If I'd done it on any other day it would have been fine...

Never mind - hopefully it will work tomorrow!
____________

ChriChri

Joined: Sep 29 15
Posts: 1
ID: 1161161
Credit: 62,206
RAC: 0
Message 78914 - Posted 15 Oct 2015 8:20:03 UTC - in response to Message ID 78906.

Hi,


<stderr_txt>
../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.65_x86_64-pc-linux-gnu: error while loading shared libraries: libglut.so.3: cannot open shared object file: No such file or directory
</stderr_txt>


had the same problem and after installing freeglut3 'sudo apt-get install freeglut3' in Ubuntu 14.04 it worked for me again.

Cheers, Chris

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 78915 - Posted 15 Oct 2015 17:44:31 UTC

There's obviously an issue with the linux app. I'm looking into it now.

Gray Handcock

Joined: Sep 26 05
Posts: 14
ID: 1255
Credit: 1,242,864
RAC: 5
Message 78917 - Posted 15 Oct 2015 21:55:41 UTC - in response to Message ID 78914.
Last modified: 15 Oct 2015 21:56:25 UTC

Hi,


<stderr_txt>
../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.65_x86_64-pc-linux-gnu: error while loading shared libraries: libglut.so.3: cannot open shared object file: No such file or directory
</stderr_txt>


had the same problem and after installing freeglut3 'sudo apt-get install freeglut3' in Ubuntu 14.04 it worked for me again.

Cheers, Chris


Hello Guys

I am on Debian 8 and used the following files to fix the problem (I have now had several work units complete successfully)

apt-get install freeglut3 libglu1-mesa

cheers
____________

P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 78918 - Posted 16 Oct 2015 2:28:21 UTC

I had to do "sudo apt-get install freeglut3" on three of my Ubuntu rigs to get them working here again.

Not good.

____________


Chilean Profile
Avatar

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 78925 - Posted 16 Oct 2015 13:38:52 UTC
Last modified: 16 Oct 2015 13:53:00 UTC

Installed libglut on my machine, let's see if it works.

At least now we have a true 64-bit Linux binary. It is so exclusive, you have to install a non-default library lol.

EDIT: How does one make it so that this missing library is sent by R@H so as to avoid having EVERYONE running Linux do the apt-get command to get things running smoothly?
____________

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 78928 - Posted 16 Oct 2015 14:42:07 UTC - in response to Message ID 78925.

EDIT: How does one make it so that this missing library is sent by R@H so as to avoid having EVERYONE running Linux do the apt-get command to get things running smoothly?


How about removing that nonsensical eye candy altogether? Who needs it?

What if I want to set up a cruncher with a minimum footprint? Maybe start Linux, BOINC and Rosetta from a USB thumb drive or possibly boot via PXE so my computer doesn't have to have an HDD?

These GUI libraries pull in a lot of dependencies. What for?

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 78929 - Posted 16 Oct 2015 15:02:02 UTC - in response to Message ID 78928.

EDIT: How does one make it so that this missing library is sent by R@H so as to avoid having EVERYONE running Linux do the apt-get command to get things running smoothly?


How about removing that nonsensical eye candy altogether? Who needs it?

What if I want to set up a cruncher with a minimum footprint? Maybe start Linux, BOINC and Rosetta from a USB thumb drive or possibly boot via PXE so my computer doesn't have to have an HDD?

These GUI libraries pull in a lot of dependencies. What for?



The libraries are there exactly for the screen saver mode eye-candy. 8-) Many people like the screensavers.

Rosetta can WEAK link the libraries and avoid the load errors and successfully execute in cruncher mode.

The pointer to the weakly linked symbol gets loaded with a NULL pointer. The code checks for a NULL pointer and prints the error. The number-cruncher versions never call that code, so it is only ever executed in eye-candy mode.

They could even use this NULL pointer to detect missing libraries and to display a message in your log that says "can you please load libglut for your screen saver?".
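
Weak linking itself is a C/C++ linker feature, so the following is only a Python analogy of the same "check for the library, fall back if it is absent" pattern, using optional runtime loading through ctypes. The function names and the message text are hypothetical; nothing here is taken from the Rosetta source.

```python
import ctypes


def load_optional(soname):
    """Try to load a shared library; return None instead of failing."""
    try:
        return ctypes.CDLL(soname)
    except OSError:
        return None


def start(graphics_soname="libglut.so.3"):
    """Decide at runtime whether the graphics code can be used."""
    lib = load_optional(graphics_soname)
    if lib is None:
        # Comparable to finding a NULL pointer for a weak symbol:
        # report it and carry on crunching without graphics.
        return "compute-only: %s not available" % graphics_soname
    return "graphics available"
```

The point of the design is that a missing screensaver library turns into a log message rather than exit code 127.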


Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 78932 - Posted 16 Oct 2015 17:00:46 UTC - in response to Message ID 78929.


The libraries are there exactly for the screen saver mode eye-candy. 8-) Many people like the screensavers.

That thing can't even spell my name. Reason enough to drop it. :-p



Seriously, who needs that tumbling, spinning, colorful ****?

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 78936 - Posted 16 Oct 2015 20:38:38 UTC

I just updated the linux app to version 3.66. Let me know if there are any other issues.

It's My Island [SFmWnT6y1ghzTn1hFpD69exBiz5bFtRiam] Profile

Joined: Sep 22 12
Posts: 3
ID: 458853
Credit: 32,199,546
RAC: 35,431
Message 78939 - Posted 17 Oct 2015 18:50:01 UTC - in response to Message ID 78917.

Hi,


<stderr_txt>
../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.65_x86_64-pc-linux-gnu: error while loading shared libraries: libglut.so.3: cannot open shared object file: No such file or directory
</stderr_txt>


had the same problem and after installing freeglut3 'sudo apt-get install freeglut3' in Ubuntu 14.04 it worked for me again.

Cheers, Chris


Hello Guys

I am on Debian 8 and used the following files to fix the problem (I have now had several work units complete successfully)

apt-get install freeglut3 libglu1-mesa

cheers



Yeah. Sorry for the 500 or so bad WUs rosetta admins. I was running boinc headless and hit this same issue with 3 days of units queued on two fairly powerful boxes.

On my Ubuntu box: apt-get install freeglut3
On my Arch box: pacman -Sy freeglut

That gives me the freeglut library. Can't check to see if it fixes things or not since my machine's account is limited now:

17-Oct-2015 14:26:37 [rosetta@home] Sending scheduler request: To fetch work.
17-Oct-2015 14:26:37 [rosetta@home] Requesting new tasks for CPU
17-Oct-2015 14:26:40 [rosetta@home] Scheduler request completed: got 0 new tasks
17-Oct-2015 14:26:40 [rosetta@home] No work sent
17-Oct-2015 14:26:40 [rosetta@home] (reached daily quota of 8 results)

Guess we are waiting until tomorrow.....

P . P . L .
Avatar

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 78941 - Posted 18 Oct 2015 0:52:13 UTC

Hi.

The first WU with the 3.66 app has run and returned with no problems on my Xeon. My other rigs have a backlog of 3.65 WUs; as I run a two-day cache, they will be on it by tonight my time.

____________


It's My Island [SFmWnT6y1ghzTn1hFpD69exBiz5bFtRiam] Profile

Joined: Sep 22 12
Posts: 3
ID: 458853
Credit: 32,199,546
RAC: 35,431
Message 78944 - Posted 18 Oct 2015 11:04:46 UTC - in response to Message ID 78939.




Yeah. Sorry for the 500 or so bad WUs rosetta admins. I was running boinc headless and hit this same issue with 3 days of units queued on two fairly powerful boxes.

On my Ubuntu box: apt-get install freeglut3
On my Arch box: pacman -Sy freeglut

That gives me the freeglut library. Can't check to see if it fixes things or not since my machine's account is limited now:

17-Oct-2015 14:26:37 [rosetta@home] Sending scheduler request: To fetch work.
17-Oct-2015 14:26:37 [rosetta@home] Requesting new tasks for CPU
17-Oct-2015 14:26:40 [rosetta@home] Scheduler request completed: got 0 new tasks
17-Oct-2015 14:26:40 [rosetta@home] No work sent
17-Oct-2015 14:26:40 [rosetta@home] (reached daily quota of 8 results)

Guess we are waiting until tomorrow.....



Was also missing libGLU.so.1

<![CDATA[
<message>
process exited with code 127 (0x7f, -129)
</message>
<stderr_txt>
../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.66_x86_64-pc-linux-gnu: error while loading shared libraries: libGLU.so.1: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

To install the correct libraries that are not pulled in by package management, this is what I'm trying:

On my Ubuntu box: apt-get install freeglut3 libglu1-mesa
On my Arch box: pacman -Sy freeglut glu
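
For anyone scripting this across several boxes, here is a small Python sketch that turns the package names reported in this thread into an install command. The table only contains what posters here said worked for them (nobody reported a Fedora package for libGLU, so that entry is left out), and the function and table names are my own.

```python
# Library -> package, per distro, as reported by posters in this thread.
PACKAGES = {
    "libglut.so.3": {"ubuntu": "freeglut3", "debian": "freeglut3",
                     "fedora": "freeglut", "arch": "freeglut"},
    "libGLU.so.1": {"ubuntu": "libglu1-mesa", "debian": "libglu1-mesa",
                    "arch": "glu"},
}

# Install command prefix per distro, again as used in this thread.
INSTALL = {"ubuntu": "apt-get install", "debian": "apt-get install",
           "fedora": "dnf install", "arch": "pacman -Sy"}


def install_command(distro, sonames):
    """Build an install command for the missing libraries, or None if unknown."""
    pkgs = []
    for soname in sonames:
        pkg = PACKAGES.get(soname, {}).get(distro)
        if pkg and pkg not in pkgs:
            pkgs.append(pkg)
    if not pkgs:
        return None
    return INSTALL[distro] + " " + " ".join(pkgs)
```

It returns None rather than guessing when the thread has no known package for that distro.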

Waiting yet another day as I dwell in the pit of quota confinement.

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 78946 - Posted 18 Oct 2015 13:07:12 UTC - in response to Message ID 78944.




Yeah. Sorry for the 500 or so bad WUs rosetta admins. I was running boinc headless and hit this same issue with 3 days of units queued on two fairly powerful boxes.

On my Ubuntu box: apt-get install freeglut3
On my Arch box: pacman -Sy freeglut

That gives me the freeglut library. Can't check to see if it fixes things or not since my machine's account is limited now:

17-Oct-2015 14:26:37 [rosetta@home] Sending scheduler request: To fetch work.
17-Oct-2015 14:26:37 [rosetta@home] Requesting new tasks for CPU
17-Oct-2015 14:26:40 [rosetta@home] Scheduler request completed: got 0 new tasks
17-Oct-2015 14:26:40 [rosetta@home] No work sent
17-Oct-2015 14:26:40 [rosetta@home] (reached daily quota of 8 results)

Guess we are waiting until tomorrow.....



Was also missing libGLU.so.1

<![CDATA[
<message>
process exited with code 127 (0x7f, -129)
</message>
<stderr_txt>
../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.66_x86_64-pc-linux-gnu: error while loading shared libraries: libGLU.so.1: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

To install the correct libraries that are not pulled in by package management, this is what I'm trying:

On my Ubuntu box: apt-get install freeglut3 libglu1-mesa
On my Arch box: pacman -Sy freeglut glu

Waiting yet another day as I dwell in the pit of quota confinement.



Good detective work and concise instructions! Nice.

I think the needed libraries vary by (possibly) distribution and version of distribution AND possibly by what graphics board drivers you have installed.

There are several tools to check that all the needed libraries on a dynamically linked executable binary exist and there should not be a "missing library" error. I usually use (out of habit) "ldd".

Rosetta@home will have to clear up the dynamic linking problem, because Baker Labs will definitely not want statically linked Rosetta binaries distributed once they understand that the LGPL license will require them to freely distribute COMPLETE Rosetta source code or linkable object files. Statically linking LGPL libraries creates an LGPL "COMBINED WORK" and would probably cost more, legally and distribution-wise, than just dynamic linking.


LGPL v3.0

LGPL license v3.0 section 4.d.0 is the relevant section describing combined works and requirements.

"For the purpose of complying with the LGPL (any extant
version: v2, v2.1 or v3):

(1) If you statically link against an LGPL'd library, you must also provide your application in an object (not necessarily source) format, so that a user has the opportunity to modify the library and relink the application.
"



Gray Handcock

Joined: Sep 26 05
Posts: 14
ID: 1255
Credit: 1,242,864
RAC: 5
Message 78955 - Posted 20 Oct 2015 21:06:50 UTC - in response to Message ID 78917.

Hi,


<stderr_txt>
../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.65_x86_64-pc-linux-gnu: error while loading shared libraries: libglut.so.3: cannot open shared object file: No such file or directory
</stderr_txt>


had the same problem and after installing freeglut3 'sudo apt-get install freeglut3' in Ubuntu 14.04 it worked for me again.

Cheers, Chris


Hello Guys

I am on Debian 8 and used the following files to fix the problem (I have now had several work units complete successfully)

apt-get install freeglut3 libglu1-mesa

cheers


Hi Guys

Just a follow-up - units processing normally now from the time of the adding of
freeglut3 and libglu1-mesa - barring one isolated wobble amongst 60-odd successful ones.

Thanks

____________

aguiar@carrier.com.br

Joined: Feb 19 06
Posts: 6
ID: 59950
Credit: 357,496
RAC: 2
Message 79007 - Posted 30 Oct 2015 10:32:52 UTC

Hi, all!

I have two 3.65 tasks stuck at 100%. Elapsed times are 32:16:03 and 14:30:21.

Please, should I let them crunch or abort?

Many thanks,
Valter Aguiar.
____________

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79008 - Posted 30 Oct 2015 11:24:15 UTC - in response to Message ID 79007.

Hi, all!

I have two 3.65 tasks stuck at 100%. Elapsed times are 32:16:03 and 14:30:21.

Please, should I let them crunch or abort?

You should restart BOINC.
____________
.

dcdc Profile

Joined: Nov 3 05
Posts: 1596
ID: 8948
Credit: 33,659,946
RAC: 16,006
Message 79009 - Posted 30 Oct 2015 13:45:47 UTC - in response to Message ID 79008.

Hi, all!

I have two 3.65 tasks stuck at 100%. Elapsed times are 32:16:03 and 14:30:21.

Please, should I let them crunch or abort?

You should restart BOINC.

And make sure "Stop running tasks when exiting BOINC manager" is selected in the popup when you exit BOINC.
____________

aguiar@carrier.com.br

Joined: Feb 19 06
Posts: 6
ID: 59950
Credit: 357,496
RAC: 2
Message 79011 - Posted 31 Oct 2015 9:15:06 UTC - in response to Message ID 79009.

Hi, all!

I have two 3.65 tasks stuck at 100%. Elapsed times are 32:16:03 and 14:30:21.

Please, should I let them crunch or abort?

You should restart BOINC.

And make sure "Stop running tasks when exiting BOINC manager" is selected in the popup when you exit BOINC.



Restarted, and both WUs came to the end.

Many thanks,
Valter.
____________

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 79028 - Posted 10 Nov 2015 10:44:34 UTC

Outstanding!

Yet another statically linked binary. One step forward, two steps back.

We wouldn't want the users to forgo their incredibly important screen savers, right?

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79032 - Posted 10 Nov 2015 18:56:48 UTC

771051688

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00C76BA6 read attempt to address 0x12364000
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79033 - Posted 10 Nov 2015 19:15:55 UTC - in response to Message ID 79028.

Outstanding!

Yet another statically linked binary. One step forward, two steps back.

We wouldn't want the users to forgo their incredibly important screen savers, right?


Can't please everyone.....

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 79035 - Posted 10 Nov 2015 20:47:58 UTC - in response to Message ID 79033.

Can't please everyone.....


...just ditch the project? Hmm, sure. At least I can.

Your problem no. 1: You don't seem to have a ginormous supercomputer.
Your problem no. 2: So you have to rely on volunteers through BOINC.

Therefore, sometimes, you'll have to deal with our "input" or get yourself said ginormous super computer with no annoying users.

The lists in the "Statistics" part of this website are littered with corpses of inactive users and ancient computers - last logon in 2008.

You want to make sure that Rosetta will run on any host, no matter how old? OK, but you are wasting a lot of potential and in your drive to cater to all users, you alienate those who are willing to invest in some high-end equipment to help get the work done more quickly.

When it comes to the highly important screen saver part: rjs5 has offered help. Maybe ask him?

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79036 - Posted 10 Nov 2015 22:01:21 UTC - in response to Message ID 79035.

Not saying you should ditch projects and we are not trying to alienate anyone. We'll continue to focus on our research objectives.

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79037 - Posted 11 Nov 2015 3:34:47 UTC - in response to Message ID 79033.

Outstanding!

Yet another statically linked binary. One step forward, two steps back.

We wouldn't want the users to forgo their incredibly important screen savers, right?


Can't please everyone.....


Just a "heads up" ....

3.67 Rosetta jobs just hit my Xeon 1540D machine running Fedora22 and ..... and .... after 40 minutes compute time ..... VALIDATE ERRORS. I RESET the project and will see if that clears it up.


Task ID 771281114
Name from_phil_model20_relax_SAVE_ALL_OUT_311211_4_0
Workunit 699094937
Created 11 Nov 2015 2:24:58 UTC
Sent 11 Nov 2015 2:35:38 UTC
Received 11 Nov 2015 3:21:12 UTC
Server state Over
Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 2366448
Report deadline 25 Nov 2015 2:35:38 UTC
CPU time 2343.913
stderr out <core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
[2015-11-10 18:40:33:] :: BOINC:: Initializing ... ok.
[2015-11-10 18:40:33:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.67_x86_64-pc-linux-gnu -out:file:silent default.out -in:file:s 00001.pdb -frag3 00001.200.3mers -in:file:native 00001.pdb -frag9 00001.200.9mers -silent_gz 1 -ex2aro 1 -relax::default_repeats 15 -in:file:fullatom 1 -run:protocol relax -ex1 1 -in:file:boinc_wu_zip from_phil_model20_data.zip -out:file:silent default.out -silent_gz -mute all -in:file:native 00001.pdb -in:file:fullatom -in:file:s 00001.pdb -nstruct 10000 -cpu_run_time 21600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2097500
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_b7c7d78.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/from_phil_model20_data.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 99 starting structures 2343.59 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 6.24573687673079
Granted credit 0
application version 3.67

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79038 - Posted 11 Nov 2015 7:48:50 UTC - in response to Message ID 79036.
Last modified: 11 Nov 2015 7:51:06 UTC

We'll continue to focus on our research objectives.


And THIS is very important, but....
The lists in the "Statistics" part of this website are littered with corpses of inactive users and ancient computers - last logon in 2008.

It's a simple command in your db to clear these zombies, so why not?
I think the "loyalty" of your crunchers is VERY important (I've been here since 2005), and some actions, like updating the server software, clearing the old accounts, creating an optimized app, etc., would help the community.
I hope you will consider these actions in 2016.
____________

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79039 - Posted 11 Nov 2015 13:11:06 UTC - in response to Message ID 79037.

The machine is completing jobs after resetting.



David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79043 - Posted 12 Nov 2015 18:19:26 UTC - in response to Message ID 79038.

We'll continue to focus on our research objectives.


And THIS is very important, but....
The lists in the "Statistics" part of this website are littered with corpses of inactive users and ancient computers - last logon in 2008.

It's a simple command in your db to clear these zombies, so why not?
I think the "loyalty" of your crunchers is VERY important (I've been here since 2005), and some actions, like updating the server software, clearing the old accounts, creating an optimized app, etc., would help the community.
I hope you will consider these actions in 2016.


I can definitely do a simple house cleaning, the others are a bit more involved.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79044 - Posted 12 Nov 2015 18:25:21 UTC - in response to Message ID 79039.

The machine is completing jobs after resetting.






Looks like that particular job was creating more models than allowed. If this happens again, you don't have to do anything, but letting us know does help so I can tell the researcher who is running the job.
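
For volunteers who want to spot this pattern themselves, the model count is printed near the end of the task's stderr. The sketch below pulls the numbers out of the two summary lines seen in the log quoted earlier in this thread. The function names are illustrative, and the `max_models` threshold is an invented placeholder: the thread does not say what the real limit is.

```python
import re

# Patterns for the two summary lines that minirosetta prints at the end of a run.
DONE_RE = re.compile(r"DONE :: +(\d+) starting structures +([\d.]+) cpu seconds")
DECOY_RE = re.compile(r"This process generated +(\d+) decoys from +(\d+) attempts")


def summarize_stderr(text):
    """Extract run statistics from stderr text, or return None if absent."""
    done = DONE_RE.search(text)
    decoys = DECOY_RE.search(text)
    if not (done and decoys):
        return None
    return {
        "starting_structures": int(done.group(1)),
        "cpu_seconds": float(done.group(2)),
        "decoys": int(decoys.group(1)),
        "attempts": int(decoys.group(2)),
    }


def exceeds_limit(summary, max_models):
    """True if the run produced more models than max_models (hypothetical cap)."""
    return summary["decoys"] > max_models


# Summary lines copied from the task log posted above.
sample = """DONE :: 99 starting structures 2343.59 cpu seconds
This process generated 99 decoys from 99 attempts"""

info = summarize_stderr(sample)
print(info)
print(exceeds_limit(info, max_models=50))  # 50 is an invented threshold
```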

martyn_2010

Joined: Oct 5 10
Posts: 1
ID: 396816
Credit: 100,468
RAC: 0
Message 79118 - Posted 26 Nov 2015 18:30:25 UTC

An annoying aspect of this project is the 'Validate Error'.

From the log it appears that the workunit has processed successfully, yet when the results are uploaded and checked there appears to be a Validate Error.

This happened to me 3 times for large workunits in September 2015. Have avoided this project until now.

Today I have had 2 more Validate Errors -
rb_11_24_60863_105242__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_312475_210
rb_11_24_60883_105260__t000__4_C1_SAVE_ALL_OUT_IGNORE_THE_REST_312527_72

Please explain why this error occurs and why I should lose credit for what appears to be work finishing successfully.

dcdc Profile

Joined: Nov 3 05
Posts: 1596
ID: 8948
Credit: 33,659,946
RAC: 16,006
Message 79123 - Posted 28 Nov 2015 13:10:52 UTC - in response to Message ID 79118.

I don't know why it happens, but sometimes things go wrong when you're running a science project. The tasks are credited though - it doesn't show up in the "Tasks for computer" page but does under the task:

http://boinc.bakerlab.org/rosetta/result.php?resultid=774133344

An annoying aspect of this project is the 'Validate Error'.

From the log it appears that the workunit has processed successfully, yet when the results are uploaded and checked there appears to be a Validate Error.

This happened to me 3 times for large workunits in September 2015. Have avoided this project until now.

Today I have had 2 more Validate Errors -
rb_11_24_60863_105242__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_312475_210
rb_11_24_60883_105260__t000__4_C1_SAVE_ALL_OUT_IGNORE_THE_REST_312527_72

Please explain why this error occurs and why I should lose credit for what appears to be work finishing successfully.


____________

Betting Slip

Joined: Sep 26 05
Posts: 71
ID: 1160
Credit: 5,702,246
RAC: 0
Message 79124 - Posted 28 Nov 2015 13:50:09 UTC - in response to Message ID 79123.
Last modified: 28 Nov 2015 14:12:24 UTC

I don't know why it happens, but sometimes things go wrong when you're running a science project. The tasks are credited though - it doesn't show up in the "Tasks for computer" page but does under the task:


Only up to a MAX of 300

If you run tasks for longer periods like myself you will lose.

This was supposed to have been fixed a long, long time ago but, as with everything else on this project, nothing: nothing to be heard, nothing to be seen and certainly nothing to be done.

EDIT TO ADD

There have been more reported sightings of "Elvis" in London alone over the last 5 years than scientists on this project.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79150 - Posted 3 Dec 2015 3:29:33 UTC - in response to Message ID 79118.

An annoying aspect of this project is the 'Validate Error'.

From the log it appears that the workunit has processed successfully, yet when the results are uploaded and checked there appears to be a Validate Error.

This happened to me 3 times for large workunits in September 2015. Have avoided this project until now.

Today I have had 2 more Validate Errors -
rb_11_24_60863_105242__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_312475_210
rb_11_24_60883_105260__t000__4_C1_SAVE_ALL_OUT_IGNORE_THE_REST_312527_72

Please explain why this error occurs and why I should lose credit for what appears to be work finishing successfully.


I'm not sure what happened with these two results. The logs show:

[CRITICAL] check_set: init_result([RESULT#773910587 rb_11_24_60863_105242__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_312475_210_0]) failed: -1

but the result files have already been cleaned. The files were possibly corrupt but if this continues to happen, let us know. Credit was granted.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79151 - Posted 3 Dec 2015 3:39:54 UTC - in response to Message ID 79124.

I don't know why it happens, but sometimes things go wrong when you're running a science project. The tasks are credited though - it doesn't show up in the "Tasks for computer" page but does under the task:


Only up to a MAX of 300

If you run tasks for longer periods like myself you will lose.

This was supposed to have been fixed a long, long time ago but, as with everything else on this project, nothing: nothing to be heard, nothing to be seen and certainly nothing to be done.

EDIT TO ADD

There have been more reported sightings of "Elvis" in London alone over the last 5 years than scientists on this project.



What was supposed to be fixed: the validator or the credit limit? I can increase the credit limit to something more reasonable for those who run long jobs like yourself. I'm not sure what exactly causes the validation errors; it can be computer-specific, but I'll take a closer look.

Betting Slip

Joined: Sep 26 05
Posts: 71
ID: 1160
Credit: 5,702,246
RAC: 0
Message 79154 - Posted 4 Dec 2015 1:45:28 UTC - in response to Message ID 79151.
Last modified: 4 Dec 2015 1:45:57 UTC



What was supposed to be fixed: the validator or the credit limit? I can increase the credit limit to something more reasonable for those who run long jobs like yourself. I'm not sure what exactly causes the validation errors; it can be computer-specific, but I'll take a closer look.


It's hard to find a post that would prove it conclusively but here is one from October http://boinc.bakerlab.org/rosetta/forum_thread.php?id=6716&nowrap=true#78892

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79155 - Posted 5 Dec 2015 10:24:43 UTC
Last modified: 5 Dec 2015 10:26:45 UTC

Validate errors on rb_11_2* tasks:

Task 774759576

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<stderr_txt>
[2015-11-30 17:36:44:] :: BOINC:: Initializing ... ok.
[2015-11-30 17:36:44:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: projects/boinc.bakerlab.org_rosetta/minirosetta_3.65_windows_intelx86.exe @rb_11_28_60979_105347_ab_stage0_h003___robetta_FLAGS -psipred_ss2 h003_.psipred_ss2 -in::file::fasta h003_.fasta -kill_hairpins h003_.nobuformat.psipred_ss2 -in:file:boinc_wu_zip rb_11_28_60979_105347_ab_stage0_h003___robetta.zip -frag3 rb_11_28_60979_105347_ab_stage0_h003___robetta_h003_.200.3mers.index.gz -fragA rb_11_28_60979_105347_ab_stage0_h003___robetta_h003_.200.11mers.index.gz -fragB rb_11_28_60979_105347_ab_stage0_h003___robetta_h003_.200.12mers.index.gz -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2611634
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_b7c7d78.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/rb_11_28_60979_105347_ab_stage0_h003___robetta.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
# cpu_run_time_pref: 86400
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004
Starting work on structure: _00005
Starting work on structure: _00006
Starting work on structure: _00007
Starting work on structure: _00008
Starting work on structure: _00009
Starting work on structure: _00010
Starting work on structure: _00011
Starting work on structure: _00012
Starting work on structure: _00013
Starting work on structure: _00014
Starting work on structure: _00015
Starting work on structure: _00016
Starting work on structure: _00017
Starting work on structure: _00018
Starting work on structure: _00019
Starting work on structure: _00020
Starting work on structure: _00021
Starting work on structure: _00022
Starting work on structure: _00023
Starting work on structure: _00024
Starting work on structure: _00025
Starting work on structure: _00026
Starting work on structure: _00027
Starting work on structure: _00028
Starting work on structure: _00029
Starting work on structure: _00030
Starting work on structure: _00031
Starting work on structure: _00032
Starting work on structure: _00033
Starting work on structure: _00034
Starting work on structure: _00035
Starting work on structure: _00036
Starting work on structure: _00037
Starting work on structure: _00038
Starting work on structure: _00039
Starting work on structure: _00040
Starting work on structure: _00041
Starting work on structure: _00042
Starting work on structure: _00043
Starting work on structure: _00044
Starting work on structure: _00045
Starting work on structure: _00046
Starting work on structure: _00047
Starting work on structure: _00048
Starting work on structure: _00049
Starting work on structure: _00050
Starting work on structure: _00051
Starting work on structure: _00052
Starting work on structure: _00053
Starting work on structure: _00054
Starting work on structure: _00055
Starting work on structure: _00056
Starting work on structure: _00057
Starting work on structure: _00058
Starting work on structure: _00059
Starting work on structure: _00060
Starting work on structure: _00061
Starting work on structure: _00062
Starting work on structure: _00063
Starting work on structure: _00064
Starting work on structure: _00065
Starting work on structure: _00066
Starting work on structure: _00067
Starting work on structure: _00068
Starting work on structure: _00069
Starting work on structure: _00070
Starting work on structure: _00071
Starting work on structure: _00072
Starting work on structure: _00073
Starting work on structure: _00074
Starting work on structure: _00075
Starting work on structure: _00076
Starting work on structure: _00077
Starting work on structure: _00078
Starting work on structure: _00079
Starting work on structure: _00080
Starting work on structure: _00081
Starting work on structure: _00082
Starting work on structure: _00083
Starting work on structure: _00084
Starting work on structure: _00085
Starting work on structure: _00086
Starting work on structure: _00087
Starting work on structure: _00088
Starting work on structure: _00089
Starting work on structure: _00090
Starting work on structure: _00091
Starting work on structure: _00092
Starting work on structure: _00093
Starting work on structure: _00094
Starting work on structure: _00095
Starting work on structure: _00096
Starting work on structure: _00097
Starting work on structure: _00098
Starting work on structure: _00099
======================================================
DONE :: 1 starting structures 58190 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 2.28844e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>




Task 775179895

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<stderr_txt>
[2015-12- 2 11:44:41:] :: BOINC:: Initializing ... ok.
[2015-12- 2 11:44:41:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: projects/boinc.bakerlab.org_rosetta/minirosetta_3.65_windows_intelx86.exe @rb_11_29_60395_105396_ab_stage0_t000___robetta_FLAGS -psipred_ss2 t000_.psipred_ss2 -in::file::fasta t000_.fasta -kill_hairpins t000_.nobuformat.psipred_ss2 -in:file:boinc_wu_zip rb_11_29_60395_105396_ab_stage0_t000___robetta.zip -frag3 rb_11_29_60395_105396_ab_stage0_t000___robetta_t000_.200.3mers.index.gz -fragA rb_11_29_60395_105396_ab_stage0_t000___robetta_t000_.200.17mers.index.gz -fragB rb_11_29_60395_105396_ab_stage0_t000___robetta_t000_.200.6mers.index.gz -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2057108
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_b7c7d78.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/rb_11_29_60395_105396_ab_stage0_t000___robetta.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
# cpu_run_time_pref: 86400
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004
Starting work on structure: _00005
Starting work on structure: _00006
Starting work on structure: _00007
Starting work on structure: _00008
Starting work on structure: _00009
Starting work on structure: _00010
Starting work on structure: _00011
Starting work on structure: _00012
Starting work on structure: _00013
Starting work on structure: _00014
Starting work on structure: _00015
Starting work on structure: _00016
Starting work on structure: _00017
Starting work on structure: _00018
Starting work on structure: _00019
Starting work on structure: _00020
Starting work on structure: _00021
Starting work on structure: _00022
Starting work on structure: _00023
Starting work on structure: _00024
Starting work on structure: _00025
Starting work on structure: _00026
Starting work on structure: _00027
Starting work on structure: _00028
Starting work on structure: _00029
Starting work on structure: _00030
Starting work on structure: _00031
Starting work on structure: _00032
Starting work on structure: _00033
Starting work on structure: _00034
Starting work on structure: _00035
Starting work on structure: _00036
Starting work on structure: _00037
Starting work on structure: _00038
Starting work on structure: _00039
Starting work on structure: _00040
Starting work on structure: _00041
Starting work on structure: _00042
Starting work on structure: _00043
Starting work on structure: _00044
======================================================
DONE :: 1 starting structures 86100.8 cpu seconds
This process generated 44 decoys from 44 attempts
======================================================
BOINC :: WS_max 2.76714e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>




Both workunits were canceled. How about sending a kill command to the clients in order to avoid wasting resources?
____________
.

David Ball

Joined: Nov 25 05
Posts: 25
ID: 19653
Credit: 1,270,528
RAC: 0
Message 79162 - Posted 7 Dec 2015 21:39:56 UTC - in response to Message ID 79155.

Validate errors on rb_11_2* tasks


Both workunits were canceled. How about sending a kill command to the clients in order to avoid wasting resources?


I'm also getting validate errors on some workunits that say they completed OK on the client but get a validate error on the server with the workunit details saying they were cancelled. I've started aborting any WUs that start with "rb_" since yours were in the "rb_11_2*" range and mine were in the "rb_12_06_61173_105565_ab_stage0_t000*" range. I also noticed that some in that range were completing and validating.

workunits:

failed: rb_12_06_61173_105565_ab_stage0_t000___robetta_IGNORE_THE_REST_04_10_313585_76_0
failed: rb_12_06_61173_105562_ab_stage0_t000___robetta_IGNORE_THE_REST_07_12_313580_207_0

passed: rb_12_06_61173_105565_ab_stage0_t000___robetta_IGNORE_THE_REST_05_10_313585_82_0
passed: rb_12_06_61173_105562_ab_stage0_t000___robetta_IGNORE_THE_REST_07_12_313580_47_0

If there's a way to cancel WUs without them running for many hours on the client, then I really wish Rosetta would use it.

Thanks,

David

____________
Have you read a good Science Fiction book lately?

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79164 - Posted 8 Dec 2015 14:02:20 UTC - in response to Message ID 79162.

I've started aborting any WUs that start with "rb_" since yours were in the "rb_11_2*" range and mine were in the "rb_12_06_61173_105565_ab_stage0_t000*" range. I also noticed that some in that range were completing and validating.


This is a bit of a bazooka-to-kill-a-housefly type of a solution. I'd encourage anyone not to abort WUs based on something as broad as starting with 'rb_' as 'rb_' work units are part of the Robetta prediction server and serve an incredibly wide range of research projects.

Secondly, I'll note that the two 'failed' WUs you listed above, David, actually DID grant you full credit. Click on the WU link you posted and scroll to the bottom: you'll see credit was granted, and even though it doesn't show in the summary of your WUs, it does count towards your total.

Lastly, looking at some of the WUs you've aborted, most of them (like this one, and this one, and this one, for example) were successfully completed by other users after being aborted on your end :).
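
To illustrate why a name prefix is a poor abort criterion, the four task names reported above can be grouped by batch. This is only a sketch: treating the first five underscore-separated fields as a "batch" is an assumption about the naming scheme, not something the project has documented in this thread.

```python
from collections import defaultdict

# Task names and outcomes exactly as reported earlier in this thread.
REPORTED = [
    ("rb_12_06_61173_105565_ab_stage0_t000___robetta_IGNORE_THE_REST_04_10_313585_76_0", "failed"),
    ("rb_12_06_61173_105562_ab_stage0_t000___robetta_IGNORE_THE_REST_07_12_313580_207_0", "failed"),
    ("rb_12_06_61173_105565_ab_stage0_t000___robetta_IGNORE_THE_REST_05_10_313585_82_0", "passed"),
    ("rb_12_06_61173_105562_ab_stage0_t000___robetta_IGNORE_THE_REST_07_12_313580_47_0", "passed"),
]


def batch_key(task_name, fields=5):
    """Assumed batch identifier: the first `fields` underscore-separated parts."""
    return "_".join(task_name.split("_")[:fields])


def outcomes_by_batch(results):
    """Count outcomes per assumed batch."""
    grouped = defaultdict(lambda: defaultdict(int))
    for name, outcome in results:
        grouped[batch_key(name)][outcome] += 1
    return {batch: dict(counts) for batch, counts in grouped.items()}


for batch, counts in sorted(outcomes_by_batch(REPORTED).items()):
    print(batch, counts)
```

Under that assumption, each batch in the sample contains one validated task and one failed task, so the name alone does not predict the outcome.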

PanicMan

Joined: Jan 31 10
Posts: 7
ID: 368585
Credit: 276,651
RAC: 0
Message 79165 - Posted 8 Dec 2015 20:08:27 UTC

I was just looking around a bit and found this issue too. All have been rb-12 or rb-11 workunits: a total of 6 of them in the last few days, and only 1 before that as far as my history on the site shows, and that was an rb-11 task. I did get credit for 5/7 of them, but it isn't nice to see 14k computation seconds thrown out the window twice in the last week. From what I have read in this thread, it seems obvious that the issue is with the rb tasks. Admittedly I have no idea what all these numbers mean, but I assume someone does and could do something to check these units before sending them? I'm not sure how many others this has happened to, as the site only goes back to 11/24 for me.

David Ball

Joined: Nov 25 05
Posts: 25
ID: 19653
Credit: 1,270,528
RAC: 0
Message 79166 - Posted 8 Dec 2015 20:14:52 UTC - in response to Message ID 79164.

I've started aborting any WUs that start with "rb_" since yours were in the "rb_11_2*" range and mine were in the "rb_12_06_61173_105565_ab_stage0_t000*" range. I also noticed that some in that range were completing and validating.


This is a bit of a bazooka-to-kill-a-housefly type of a solution. I'd encourage anyone not to abort WUs based on something as broad as starting with 'rb_' as 'rb_' work units are part of the Robetta prediction server and serve an incredibly wide range of research projects.

Secondly, I'll note that the two 'failed' WUs you listed above, David, actually DID grant you full credit. Click on the WU link you posted and scroll to the bottom: you'll see credit was granted, and even though it doesn't show in the summary of your WUs, it does count towards your total.

Lastly, looking at some of the WUs you've aborted, most of them (like this one, and this one, and this one, for example) were successfully completed by other users after being aborted on your end :).


Thanks for the info. Basically I was waiting for more information and will start letting them run. BTW, I could be remembering wrong but when I checked the failed WUs shortly after they failed, ISTR that the granted credit on the workunit details was something like "-----".

Anyway, I'll let them process from now on.

-- David
____________
Have you read a good Science Fiction book lately?

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79167 - Posted 8 Dec 2015 21:06:59 UTC

Unfortunately, there is no easy way to efficiently abort the tasks mid process and keep the results/work. I'll likely change the delay bound for these Robetta jobs so they do not get sent to computers that can't finish them in time. This change may prevent some computers from getting work if there are only Robetta jobs in the queue which would be rare. Or maybe there is a better fix?
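
As a rough illustration of what the delay bound controls, here is a simplified feasibility check. It is not BOINC's actual scheduler logic: the function name, the `queued_s` backlog and the `on_fraction` availability factor are illustrative assumptions. The dates are the "Sent" and "Report deadline" values from the task details posted earlier in this thread, and 21600 is the `-cpu_run_time` from that task's command line.

```python
from datetime import datetime


def can_finish_in_time(delay_bound_s, queued_s, task_s, on_fraction=1.0):
    """Rough check: does queued work plus this task fit inside the delay bound?

    on_fraction is the assumed share of wall-clock time the host spends computing.
    """
    if not 0 < on_fraction <= 1:
        raise ValueError("on_fraction must be in (0, 1]")
    return (queued_s + task_s) / on_fraction <= delay_bound_s


# "Sent" and "Report deadline" as shown in the task details quoted above.
sent = datetime(2015, 11, 11, 2, 35, 38)
deadline = datetime(2015, 11, 25, 2, 35, 38)
delay_bound = (deadline - sent).total_seconds()  # 14 days, in seconds

print(delay_bound)
print(can_finish_in_time(delay_bound, queued_s=0, task_s=21600))
```

A shorter delay bound makes this check fail for slow or rarely-on hosts, which is the trade-off described above: fewer late results, but some computers may get no work.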

wiueiwue

Joined: Sep 7 15
Posts: 1
ID: 1152449
Credit: 0
RAC: 0
Message 79183 - Posted 11 Dec 2015 6:15:53 UTC

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=704600829

zibochen.helix.151129ZCwh4start2_fold_and_dock_SAVE_ALL_OUT_312981_3717

777217222 2201387 10 Dec 2015 7:08:56 UTC 10 Dec 2015 7:29:33 UTC Over Client error Compute error 848.38 1.96 ---

777220523 1729478 10 Dec 2015 7:30:20 UTC 11 Dec 2015 3:02:33 UTC Over Client error Compute error 4,093.50 41.12 ---

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79193 - Posted 12 Dec 2015 10:08:55 UTC - in response to Message ID 79167.

Unfortunately, there is no easy way to efficiently abort the tasks mid process and keep the results/work.

No, but you could at least abort them on the next scheduler request if they have not started yet.

Or, if you still want the results back, simply do not generate new WUs for that job, but do not cancel the already sent out ones (should be possible I think).



I'll likely change the delay bound for these Robetta jobs so they do not get sent to computers that can't finish them in time. This change may prevent some computers from getting work if there are only Robetta jobs in the queue which would be rare. Or maybe there is a better fix?

Not sure what you mean, the "canceled" tasks were finished before deadline...
____________
.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79198 - Posted 12 Dec 2015 20:54:12 UTC - in response to Message ID 79193.

Unfortunately, there is no easy way to efficiently abort the tasks mid process and keep the results/work.

No, but you could at least abort them on the next scheduler request if they have not started yet.

Or, if you still want the results back, simply do not generate new WUs for that job, but do not cancel the already sent out ones (should be possible I think).



I'll likely change the delay bound for these Robetta jobs so they do not get sent to computers that can't finish them in time. This change may prevent some computers from getting work if there are only Robetta jobs in the queue which would be rare. Or maybe there is a better fix?

Not sure what you mean, the "canceled" tasks were finished before deadline...


The Robetta jobs often finished before the deadline but I updated the deadline and Robetta no longer cancels jobs so hopefully this will help. I may have to adjust the delay bound to find an optimal value.

sgaboinc

Joined: Apr 2 14
Posts: 169
ID: 498515
Credit: 125,409
RAC: 0
Message 79201 - Posted 13 Dec 2015 5:36:12 UTC - in response to Message ID 79167.
Last modified: 13 Dec 2015 6:26:41 UTC

Unfortunately, there is no easy way to efficiently abort the tasks mid process and keep the results/work. I'll likely change the delay bound for these Robetta jobs so they do not get sent to computers that can't finish them in time. This change may prevent some computers from getting work if there are only Robetta jobs in the queue which would be rare. Or maybe there is a better fix?


i'd guess users hitting those issues may need to tune their computing preferences so that the boinc client does not download too many tasks

parameters like Computing preferences > Maintain enough work for an additional n days should be limited to a manageable number, e.g. i used 0.5 days

changing parameters on the server side unfortunately would run into various dilemmas as those normally affect everyone. e.g. having too far an expiry date would lead to the researchers waiting too long for results to be turned around. and if there are orphaned tasks that'd be worse, as the boinc server only comes to know that they are orphaned after waiting till expiry, and those get re-assigned only after, say, a 2 week expiry period https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6747

having too short an expiry period risks tasks getting cancelled/aborted by the server before they are complete

the other thing is for users to practice a quick turn around rather than to download a large cache of jobs, e.g. i normally get sufficient number of tasks for the current session which i intend to run and set 'no new tasks' once there are adequate tasks, so that those tasks are completed and returned to the server promptly

i'd guess that helps me keep a short Average turnaround time of 0.24 days

Timo Profile

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79202 - Posted 13 Dec 2015 6:51:35 UTC - in response to Message ID 79198.
Last modified: 13 Dec 2015 7:01:13 UTC

I updated the deadline and Robetta no longer cancels jobs so hopefully this will help. I may have to adjust the delay bound to find an optimal value.


.. Wondering if that part of things - Robetta canceling jobs - was my fault (as per this thread which resulted in David adding logic to cancel tasks once a Robetta job completes, to solve a separate issue of jobs being sent out for already-completed Robetta runs). Really hard to please everyone now isn't it :)

Link

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79210 - Posted 13 Dec 2015 15:21:14 UTC - in response to Message ID 79201.

i'd guess users hitting those issues may need to tune their computing preferences so that the boinc client does not download too many tasks

The issue here is not that someone has too large a cache, the work units were done long before deadline. The issue is that jobs are canceled on the server and this information is not passed to the client.
____________
.

sgaboinc

Joined: Apr 2 14
Posts: 169
ID: 498515
Credit: 125,409
RAC: 0
Message 79214 - Posted 13 Dec 2015 18:27:32 UTC - in response to Message ID 79210.
Last modified: 13 Dec 2015 19:04:05 UTC


The issue here is not that someone has too large a cache, the work units were done long before deadline. The issue is that jobs are canceled on the server and this information is not passed to the client.


Thanks link i see your point, there are various/many limitations with boinc i'd guess in part due to the protocol design. For the most part it works well, then in the real world we have the extremes which fall out of the 'normal' design ranges of boinc i'd guess. i read that boinc is based on a 'one way polling' design where all network requests are initiated by the client, this limits 'push' notifications from being possible as a solution.

there are probably ways to resolve that e.g. using a 2 phase closure state design (e.g. when the batch/jobs are cancelled on the server, jobs which have been downloaded become 1/2 closed, when the client completes and submits the results, the server can then assign credits and mark the job closed) but that may make boinc codes more complicated & it'd take effort to do so.
i'd guess it may be good to post some of these circumstances/issues in the boinc message boards http://boinc.berkeley.edu/dev/ or log bug reports https://github.com/BOINC/boinc/issues so that the developers could consider them as boinc codes are enhanced.
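
A rough sketch of what that 2 phase closure could look like as a tiny state machine. This is only an illustration of the idea: the `Job` class, the state names and the methods are all made up here, and nothing in it is taken from the BOINC server code.

```python
from enum import Enum


class JobState(Enum):
    OPEN = 1         # sent out, results still wanted
    HALF_CLOSED = 2  # cancelled on the server, client copy still running
    CLOSED = 3       # result reported and credited, nothing more to do


class Job:
    """Hypothetical server-side record for one downloaded job."""

    def __init__(self):
        self.state = JobState.OPEN
        self.credited = False

    def cancel_on_server(self):
        """Batch cancelled: keep the job around until the client reports."""
        if self.state is JobState.OPEN:
            self.state = JobState.HALF_CLOSED

    def report_result(self):
        """Client returned a result: grant credit, then close for good."""
        if self.state is JobState.CLOSED:
            return False  # already closed, ignore duplicate reports
        self.credited = True
        self.state = JobState.CLOSED
        return True
```

The point of the middle state is that a cancelled job which was already downloaded still gets credit when its result comes back, instead of being thrown away.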

actually there are many pressing issues such as that servers tend to be overloaded due to design, e.g. heavy 'polling' by clients. those again are complicated protocol design issues which may take significant effort to enhance the codes, test and to install/update servers and distribute updated clients to all participants.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79216 - Posted 14 Dec 2015 5:40:46 UTC - in response to Message ID 79202.

I updated the deadline and Robetta no longer cancels jobs so hopefully this will help. I may have to adjust the delay bound to find an optimal value.


.. Wondering if that part of things - Robetta canceling jobs - was my fault (as per this thread which resulted in David adding logic to cancel tasks once a Robetta job completes, to solve a separate issue of jobs being sent out for already-completed Robetta runs). Really hard to please everyone now isn't it :)


Yep, but I think reducing the delay bound was necessary so bringing the issue(s) to light was great. It may need further adjusting but it's better than before IMO. Thanks!

Link

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79225 - Posted 15 Dec 2015 19:28:15 UTC - in response to Message ID 79214.
Last modified: 15 Dec 2015 19:31:26 UTC

there are various/many limitations with boinc i'd guess in part due to the protocol design. For the most part it works well, then in the real world we have the extremes which fall out of the 'normal' design ranges of boinc i'd guess. i read that boinc is based on a 'one way polling' design where all network requests are initiated by the client, this limits 'push' notifications from being possible as a solution.

No need to "push" anything, the BOINC server has the ability to tell such things to the client simply on the next scheduler request. It's even possible to choose whether a work unit should be aborted even if it has already started or only if it has not started yet. So nothing "extreme", just something that was implemented a long time ago and is already in use on other projects when needed.

Of course I can't say whether the ancient version of BOINC that Rosetta is using is able to send such messages, but if not, then that's just one more point on the long, long list of reasons why they need to upgrade.
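
To make the two choices concrete, here is a small sketch of the decision as it could look on each scheduler request. All names (`Action`, `abort_action`, `client_should_abort`) are invented for illustration, and the policy shown (cancelled means abort right away, already-finished means abort only if not started) is just one plausible way to set it up, not a description of what the BOINC server actually does.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    ABORT_IF_NOT_STARTED = "abort_if_not_started"
    ABORT_NOW = "abort_now"


def abort_action(wu_cancelled, wu_already_done):
    """Server side: pick the answer for one task the client still holds.

    wu_cancelled: the work unit was cancelled, no result is wanted.
    wu_already_done: enough results are in, a late one is not needed.
    """
    if wu_cancelled:
        return Action.ABORT_NOW
    if wu_already_done:
        return Action.ABORT_IF_NOT_STARTED
    return Action.KEEP


def client_should_abort(action, task_started):
    """Client side: apply the server's answer to the task's local state."""
    if action is Action.ABORT_NOW:
        return True
    if action is Action.ABORT_IF_NOT_STARTED:
        return not task_started
    return False
```

Since the answer rides along with a request the client makes anyway, nothing has to be pushed.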
____________
.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79234 - Posted 16 Dec 2015 23:21:35 UTC - in response to Message ID 79225.

there are various/many limitations with boinc i'd guess in part due to the protocol design. For the most part it works well, then in the real world we have the extremes which fall out of the 'normal' design ranges of boinc i'd guess. i read that boinc is based on a 'one way polling' design where all network requests are initiated by the client, this limits 'push' notifications from being possible as a solution.

No need to "push" anything, the BOINC server has the ability to tell such things to the client simply on the next scheduler request. It's even possible to choose whether a work unit should be aborted even if it has already started or only if it has not started yet. So nothing "extreme", just something that was implemented a long time ago and is already in use on other projects when needed.

Of course I can't say whether the ancient version of BOINC that Rosetta is using is able to send such messages, but if not, then that's just one more point on the long, long list of reasons why they need to upgrade.


I think there are some client side and server side settings that are available, even for our old version, that would increase the communication with our servers but would put more pressure on our database server. Other options would have to be developed within our app and server (regardless of server version). Unless I'm missing something.

Link

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79237 - Posted 17 Dec 2015 18:25:48 UTC - in response to Message ID 79234.

I think there are some client side and server side settings that are available, even for our old version, that would increase the communication with our servers but would put more pressure on our database server. Other options would have to be developed within our app and server (regardless of server version). Unless I'm missing something.

There's also the possibility to abort WUs that have already been sent out to a client on the next scheduler request of this client. No extra communication. WUs like that then appear as "Aborted by server" (or project, not sure) in the task list.
____________
.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79241 - Posted 17 Dec 2015 21:46:44 UTC - in response to Message ID 79237.

I think there are some client side and server side settings that are available, even for our old version, that would increase the communication with our servers but would put more pressure on our database server. Other options would have to be developed within our app and server (regardless of server version). Unless I'm missing something.

There's also the possibility to abort WUs that have already been sent out to a client on the next scheduler request of this client. No extra communication. WUs like that then appear as "Aborted by server" (or project, not sure) in the task list.


I'm not aware of an abort option other than canceling jobs.

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79242 - Posted 18 Dec 2015 3:40:50 UTC - in response to Message ID 79241.

I think there are some client side and server side settings that are available, even for our old version, that would increase the communication with our servers but would put more pressure on our database server. Other options would have to be developed within our app and server (regardless of server version). Unless I'm missing something.

There's also the possibility to abort WUs that have already been sent out to a client on the next scheduler request of this client. No extra communication. WUs like that then appear as "Aborted by server" (or project, not sure) in the task list.


I'm not aware of an abort option other than canceling jobs.


I think this logic is expected to be in the client.

EXIT_UNSTARTED_LATE 200
Task was aborted due to it having not started and already past the deadline.

http://boincfaq.mundayweb.com/index.php?viewCat=3&sessionID=c5a9905b2172d67bb1c1ff12eedd0b6c

Timo Profile

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79243 - Posted 18 Dec 2015 4:10:40 UTC - in response to Message ID 79242.
Last modified: 18 Dec 2015 4:17:40 UTC


I'm not aware of an abort option other than canceling jobs.


I think this logic is expected to be in the client.

EXIT_UNSTARTED_LATE 200
Task was aborted due to it having not started and already past the deadline.

http://boincfaq.mundayweb.com/index.php?viewCat=3&sessionID=c5a9905b2172d67bb1c1ff12eedd0b6c


... Definitely, some sort of 'remote abort' command that can be issued from the server to all clients holding a certain job would be ideal.. in the link rjs5 shared above, there's one code EXIT_ABORTED_BY_PROJECT 202 that sounds like it could be it... but it still doesn't seem to be something that 'pulls the plug' remotely, rather it's just a classification of how to handle WUs returned for a job that was cancelled by the server (said Work Units still run their full course from the look of it, sadly) - more reading found here: http://boinc.berkeley.edu/dev/forum_thread.php?id=7704&sort=5 - I also took a look around the BOINC dev docs and didn't see anything like this, which is sad as it sounds like a very practical and useful function. I'll continue looking though...

Ideally, would want some way to actually tell clients to stop crunching a certain task and bail out early (move on to the next useful task) rather than having their cycles spinning for something that is cancelled.

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79247 - Posted 18 Dec 2015 16:29:25 UTC - in response to Message ID 79243.


I'm not aware of an abort option other than canceling jobs.


I think this logic is expected to be in the client.

EXIT_UNSTARTED_LATE 200
Task was aborted due to it having not started and already past the deadline.

http://boincfaq.mundayweb.com/index.php?viewCat=3&sessionID=c5a9905b2172d67bb1c1ff12eedd0b6c


... Definitely, some sort of 'remote abort' command that can be issued from the server to all clients holding a certain job would be ideal.. in the link rjs5 shared above, there's one code EXIT_ABORTED_BY_PROJECT 202 that sounds like it could be it... but it still doesn't seem to be something that 'pulls the plug' remotely, rather it's just a classification of how to handle WUs returned for a job that was cancelled by the server (said Work Units still run their full course from the look of it, sadly) - more reading found here: http://boinc.berkeley.edu/dev/forum_thread.php?id=7704&sort=5 - I also took a look around the BOINC dev docs and didn't see anything like this, which is sad as it sounds like a very practical and useful function. I'll continue looking though...

Ideally, would want some way to actually tell clients to stop crunching a certain task and bail out early (move on to the next useful task) rather than having their cycles spinning for something that is cancelled.




Just pass the DEADLINE date to the client job as a command line parameter and have the client periodically check the current date against the deadline. When the deadline is passed, abort.

It would be pretty easy to add the "if ( date > deadline ) abort;" to the code to determine whether to continue or not. I would probably place it just before the checkpointing code so the client could skip the checkpointing if deadline is passed.
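
A rough sketch of that check, written in Python only for brevity (the real app is not Python). The names `run_task`, `deadline_passed` and `write_checkpoint` are made up, and the deadline is assumed to arrive as a Unix timestamp; none of this is actual minirosetta or BOINC code.

```python
import time


def deadline_passed(deadline_ts, now=None):
    """Return True once the current time is later than the deadline."""
    if now is None:
        now = time.time()
    return now > deadline_ts


def write_checkpoint(completed):
    """Placeholder for the app's real checkpoint routine."""
    pass


def run_task(steps, deadline_ts, clock=time.time):
    """Run work steps, giving up before a checkpoint if the deadline passed.

    `steps` is a list of callables standing in for units of work; the
    return value is (status, completed_step_count).
    """
    completed = 0
    for step in steps:
        step()
        completed += 1
        # Check just before checkpointing, so a late task skips the write.
        if deadline_passed(deadline_ts, clock()):
            return ("aborted", completed)
        write_checkpoint(completed)
    return ("finished", completed)
```

The `clock` argument is only there so the check can be exercised with a fake time source; in normal use it would be left at its default.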

sgaboinc

Joined: Apr 2 14
Posts: 169
ID: 498515
Credit: 125,409
RAC: 0
Message 79249 - Posted 18 Dec 2015 17:47:34 UTC - in response to Message ID 79247.




Just pass the DEADLINE date to the client job as a command line parameter and have the client periodically check the current date against the deadline. When the deadline is passed, abort.

It would be pretty easy to add the "if ( date > deadline ) abort;" to the code to determine whether to continue or not. I would probably place it just before the checkpointing code so the client could skip the checkpointing if deadline is passed.




actually i'm wondering a little if it may be fun to have a utility that works like ifttt https://ifttt.com/ or tasker http://www.androidcentral.com/tasker-review-thing-you-need-do-all-things for boinc, then one can literally create all kind of wiz bang scheduling & mini automation that one prefers :D lol

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79253 - Posted 19 Dec 2015 23:37:19 UTC

So if anyone is aware of an actual abort option to push that info onto clients that may be holding or running a job that we'd like to abort in a timely manner, please let me know. A possible option that I am aware of is using a trickle message and coding it into our application and of course we'd also need to code up the server side logic. But I'm not sure there's an abort option that pushes that info out to clients currently in BOINC.

sgaboinc

Joined: Apr 2 14
Posts: 169
ID: 498515
Credit: 125,409
RAC: 0
Message 79255 - Posted 20 Dec 2015 3:41:52 UTC - in response to Message ID 79253.
Last modified: 20 Dec 2015 3:47:18 UTC

So if anyone is aware of an actual abort option to push that info onto clients that may be holding or running a job that we'd like to abort in a timely manner, please let me know. A possible option that I am aware of is using a trickle message and coding it into our application and of course we'd also need to code up the server side logic. But I'm not sure there's an abort option that pushes that info out to clients currently in BOINC.


yeah, it seems trickle messages are possibly the 'only' way, and it'd seem there would need to be enhancements in both the server codes as well as the r@h app codes
http://boinc.berkeley.edu/trac/wiki/TrickleApi
e.g. when the r@h app receives a trickle message, it interprets it as an 'early end' command, wraps up the job and submits that to the server as a completed task

i'm not too sure if there could be client dependencies, e.g. certain client versions may implement the trickle feature differently or not have it at all
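
a toy sketch of that 'early end' idea, in Python just to show the control flow. `crunch`, `poll_message` and `make_decoy` are made-up names, and `poll_message` only stands in for whatever the trickle API would really provide for receiving a message; this is not r@h or boinc code.

```python
def crunch(decoys_wanted, poll_message, make_decoy):
    """Build decoys until done or until an 'early_end' message shows up.

    poll_message: callable returning the latest message text ('' if none).
    make_decoy: callable producing one finished decoy.
    """
    decoys = []
    while len(decoys) < decoys_wanted:
        # Look for a stop request between decoys, never in the middle of one.
        if poll_message() == "early_end":
            break
        decoys.append(make_decoy())
    # Whatever was finished so far is returned as the completed result.
    return decoys
```

the decoys finished before the message arrived are kept, so the work already done is not lost.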

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 79317 - Posted 28 Dec 2015 5:26:05 UTC

Bit of a weird error in this task. It validated ok and only finished because it reached 99 decoys but...

14h2ld2203_fold_and_dock_SAVE_ALL_OUT_319506_998_0

bad torsion type for JumpAtom: 1
ERROR:: Exit from: ..\..\..\src\core\kinematics\tree\JumpAtom.cc line: 94
bad torsion type for JumpAtom: 1
ERROR:: Exit from: ..\..\..\src\core\kinematics\tree\JumpAtom.cc line: 94
bad torsion type for JumpAtom: 1
ERROR:: Exit from: ..\..\..\src\core\kinematics\tree\JumpAtom.cc line: 94

-> -multiple lines of the above

...

ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6
ERROR:: Exit from: ..\..\..\src\core\pose\symmetry\util.cc line: 894
No heartbeat from core client for 30 sec - exiting

...

ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6
ERROR:: Exit from: ..\..\..\src\core\pose\symmetry\util.cc line: 894
bad torsion type for JumpAtom: 1
ERROR:: Exit from: ..\..\..\src\core\kinematics\tree\JumpAtom.cc line: 94
bad torsion type for JumpAtom: 1
ERROR:: Exit from: ..\..\..\src\core\kinematics\tree\JumpAtom.cc line: 94
bad torsion type for JumpAtom: 1
ERROR:: Exit from: ..\..\..\src\core\kinematics\tree\JumpAtom.cc line: 94

-> -multiple lines of the above again


____________

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 79367 - Posted 7 Jan 2016 14:42:54 UTC

I got a lot of errors with long running *krypton* WUs.
Seems that all of them stop with the same error:

exceeded elapsed time limit 141525.53 (500000.00G/3.53G)


some examples:
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463729
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463704
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463682
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463677
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463675
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463670
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463662
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463660
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463658
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463656
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463652
https://boinc.bakerlab.org/rosetta/result.php?resultid=781463647

I am not sure if other WUs which run up to the time limit are successful or not.
____________

krypton
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 16 11
Posts: 105
ID: 436004
Credit: 1,796,601
RAC: 1,887
Message 79373 - Posted 7 Jan 2016 19:37:06 UTC

Sorry about that! I'm investigating.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79374 - Posted 7 Jan 2016 20:20:11 UTC

I've increased our default fpops limit so hopefully this will prevent such errors in the future but we can't update the limit for the jobs that have already been submitted. Thanks for the heads up on this error.

Jim1348

Joined: Jan 19 06
Posts: 65
ID: 52455
Credit: 2,074,859
RAC: 9,392
Message 79376 - Posted 7 Jan 2016 21:56:26 UTC - in response to Message ID 79243.


... Definitely, some sort of 'remote abort' command that can be issued from the server to all clients holding a certain job would be ideal.. in the link rjs5 shared above, there's one code EXIT_ABORTED_BY_PROJECT 202 that sounds like it could be it... but it still doesn't seem to be something that 'pulls the plug' remotely, rather it's just a classification of how to handle WUs returned for a job that was cancelled by the server (said Work Units still run their full course from the look of it, sadly) - more reading found here: http://boinc.berkeley.edu/dev/forum_thread.php?id=7704&sort=5 - I also took a look around the BOINC dev docs and didn't see anything like this, which is sad as it sounds like a very practical and useful function. I'll continue looking though...

Ideally, would want some way to actually tell clients to stop crunching a certain task and bail out early (move on to the next useful task) rather than having their cycles spinning for something that is cancelled.


The Abort (202) command is used all the time on World Community Grid to cancel jobs in your buffer before they start, both on Clean Energy Phase 2 (CEP2) and Mapping Cancer Markers (MCM). I have several in my logs now, and see them all the time. I am not a software developer and don't know how it is implemented, but it works something like this: In those projects where a quorum of two is required to validate results, the two work units are sent out simultaneously to two different users under a given time limit (say 7 days). Suppose the first one comes back in a couple of days, but the second one is delayed. After 5 days or so the server sends out another copy to a "trusted computer" (or whatever it is called). I am one of them, because I leave my machines on 24/7 and have a low error rate. So it sits in my buffer for a few hours, but before I can get to it, the second machine returns its results. Therefore, they don't need me to work on it, and they send me the "Server Abort" command for that work unit.

There are plenty of people on WCG who can explain it in detail; try SekeRob first.

____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79381 - Posted 8 Jan 2016 19:35:21 UTC

This sounds like a scheduler function for redundant jobs which makes sense but this is not a general abort option that we can use without having to modify server and/or application code to my knowledge.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 79382 - Posted 8 Jan 2016 20:31:28 UTC

I believe it is a check for host-specific messages during the scheduler requests. Big hit to database.

See <msg_to_host/> tag
https://boinc.berkeley.edu/trac/wiki/ProjectOptions
____________
Rosetta Moderator: Mod.Sense

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 79384 - Posted 9 Jan 2016 9:56:16 UTC - in response to Message ID 79374.

I've increased our default fpops limit so hopefully this will prevent such errors in the future but we can't update the limit for the jobs that have already been submitted. Thanks for the heads up on this error.


Thanks.
The "500000G" in the error message, what does that mean? How can I calculate the number of seconds the task can run from this value?
Is that around 39 hrs? If yes, why? My task runtime is set to two days. (That saves a lot of network traffic and increases the crunching efficiency.)
____________

Timo Profile

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79385 - Posted 9 Jan 2016 17:53:52 UTC - in response to Message ID 79384.

My task runtime is set to two days. (That saves a lot of network traffic and increases the crunching efficiency.)


.. Well there might be the underlying issue, the 2 day runtime was deprecated sometime last year after some type of issue with it (I forget the details at this point, but it wasn't the error you're seeing but another one) with certain protocols or job types. I suggest scaling back your target runtime to one of the currently available values (max is 1 day). Cheers!

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 79386 - Posted 9 Jan 2016 18:49:02 UTC

I remember the other runtime bug. The validation on the server side was failing for some kinds of WUs when the WU was finished.

The current bug happens only on my new (very fast) machine. My other machine runs well with all kinds of tasks (two days runtime).
For the new machine I use a different Rosetta@home preferences set, which now has a 1 day runtime. We'll see if that solves the problem.

____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79387 - Posted 9 Jan 2016 20:43:37 UTC

The limit is set based on the BOINC client benchmark results and our server side fpops limit configuration setting (rsc_fpops_bound) and since your new computer is very fast and you have a relatively long run time setting, it was hitting the limit. This is explained well here:

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=5512

which I found googling the topic. This also explains how to fix it on the client side as a temporary fix. Since I've updated our rsc_fpops_bound value on our server, new jobs should have a value that allows your fast computer to run the 2 day run time.
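
As a worked example of that arithmetic, using only the numbers from the error message above (500000.00G/3.53G): the time limit appears to be the fpops bound divided by the host's benchmarked speed. The function names are mine and this is a sketch of the relationship, not the client's actual code.

```python
def elapsed_time_limit(rsc_fpops_bound, host_flops):
    """Seconds a task may run before the client gives up on it."""
    return rsc_fpops_bound / host_flops


def min_fpops_bound(target_runtime_s, host_flops):
    """Smallest bound that lets a host of this speed reach the target runtime."""
    return target_runtime_s * host_flops


GIGA = 1e9

# Values taken from the error message: bound 500000G, host speed 3.53G.
limit_s = elapsed_time_limit(500000 * GIGA, 3.53 * GIGA)
print(f"limit: {limit_s:.0f} s = {limit_s / 3600:.1f} h")

# What the bound would have to be for a 2 day run on the same host.
two_days_s = 2 * 24 * 3600
needed = min_fpops_bound(two_days_s, 3.53 * GIGA)
print(f"bound needed for 2 days: {needed / GIGA:.0f}G")
```

This comes out at roughly 39.3 hours, which is short of a 2 day (172800 s) run time; the small difference from the logged 141525.53 s is presumably because the 3.53G in the message is rounded. On the same host a 2 day run would need a bound of about 610000G.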

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 79395 - Posted 11 Jan 2016 16:50:39 UTC
Last modified: 11 Jan 2016 16:52:11 UTC

Great news! Thank you very much.
I will switch back to two days once all my current work is done.
____________

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 79434 - Posted 23 Jan 2016 5:40:43 UTC

Hi.

Since there is no thread for app 3.67, I'll put this here. This task errored out after 48 min this morning; it was one of the last I had left for the old app.


P75901_PF03239_1-266_REDO_cst_v02_t000__krypton_SAVE_ALL_OUT_03_09_320949_609_2

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=710119997

Starting work on structure: _00001
# cpu_run_time_pref: 14400

ERROR: unknown atom_name: VRT N
ERROR:: Exit from: src/core/chemical/ResidueType.cc line: 3187
[0xd0a3b23]
[0xcaa8dc9]
[0xa2e271f]
[0xa2e49d2]
[0xa398686]
[0xa2a7765]
[0xa2aafdd]
[0xabc1534]
[0xabb9c72]
[0x89dcccf]
[0x89e0bfd]
[0x89f950a]
[0x89a4dbe]
[0x89aeab5]
[0x8049c59]
[0xd27d8f8]
[0x8048131]
*** glibc detected *** ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.67_i686-pc-linux-gnu: munmap_chunk(): invalid pointer: 0x12d13f20 ***
======= Backtrace: =========
[0xd2aac45]
[0xd26bfe1]
[0xc9a1813]
[0xc967b46]
[0xc036fe7]
[0xc037be8]
[0x89a7898]
[0x89aeab5]
[0x8049c59]
[0xd27d8f8]
[0x8048131]
======= Memory map: ========
08048000-0dff8000 r-xp 00000000 08:01 133486 /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/minirosetta_3.67_i686-pc-linux-gnu
0dff8000-0e001000 rw-p 05faf000 08:01 133486 /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/minirosetta_3.67_i686-pc-linux-gnu
0e001000-0e304000 rw-p 00000000 00:00 0
0f98d000-131e1000 rw-p 00000000 00:00 0 [heap]
b4500000-b45b5000 rw-p 00000000 00:00 0
b45b5000-b4600000 ---p 00000000 00:00 0
b4600000-b46b6000 rw-p 00000000 00:00 0
b46b6000-b4700000 ---p 00000000 00:00 0
b4700000-b4800000 rw-p 00000000 00:00 0
b4900000-b49d7000 rw-p 00000000 00:00 0
b49d7000-b4a00000 ---p 00000000 00:00 0
b4a00000-b4b00000 rw-p 00000000 00:00 0
b4b00000-b4bfb000 rw-p 00000000 00:00 0
b4bfb000-b4c00000 ---p 00000000 00:00 0
b4c00000-b4d00000 rw-p 00000000 00:00 0
b4d00000-b4e00000 rw-p 00000000 00:00 0
b4f00000-b5000000 rw-p 00000000 00:00 0
b505f000-b5100000 rw-p 00000000 00:00 0
b5100000-b5200000 rw-p 00000000 00:00 0
b5300000-b5400000 rw-p 00000000 00:00 0
b549d000-b5741000 rw-p 00000000 00:00 0
b5741000-b5742000 ---p 00000000 00:00 0
b5742000-b6344000 rw-p 00000000 00:00 0 [stack:15704]
b6344000-b7658000 rw-s 00000000 08:01 134834 /var/lib/boinc-client/slots/6/boinc_minirosetta_6
b7658000-b7659000 ---p 00000000 00:00 0
b7659000-b765c000 rw-p 00000000 00:00 0 [stack:15686]
b765c000-b765e000 rw-s 00000000 08:01 134716 /var/lib/boinc-client/slots/6/boinc_mmap_file
b765e000-b775b000 rw-p 00000000 00:00 0
b775b000-b775c000 r-xp 00000000 00:00 0 [vdso]
bfac0000-bfaf1000 rw-p 00000000 00:00 0 [stack]
SIGABRT: abort called
Stack trace (16 frames):
[0xd1d9dff]
[0xb775b404]
[0xd2c86a0]
[0xd285880]
[0xd29f2db]
[0xd2aac45]
[0xd26bfe1]
[0xc9a1813]
[0xc967b46]
[0xc036fe7]
[0xc037be8]
[0x89a7898]
[0x89aeab5]
[0x8049c59]
[0xd27d8f8]
[0x8048131]

Exiting...

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 19.6384448376039
Granted credit 0
application version 3.67

____________



Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC