Rosetta@home

Minirosetta 3.73

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79425 - Posted 20 Jan 2016 22:42:50 UTC

Please post any issues/bug reports regarding minirosetta and minirosetta_android 3.71 in this thread.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 79426 - Posted 21 Jan 2016 1:32:37 UTC
Last modified: 21 Jan 2016 1:38:58 UTC

New database coming down at 188 MB in size. One of my uploads is struggling. I anticipate everyone struggling to upload and download for a few days as the servers take a hit from everyone.

David: what are the fixes/improvements in this version, please?

Edit: uploading 2 tasks and downloading four 3.71 tasks plus the full new database took just 10 minutes here. Hopefully it won't be too bad for everyone, but some patience is still required, obviously.
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79427 - Posted 21 Jan 2016 4:01:04 UTC

This version uses the latest Rosetta source code which includes an improved score function, new protocols (for example a new cyclic peptide modeling protocol), and some modifications to the graphics application.

James W

Joined: Nov 25 12
Posts: 11
ID: 463505
Credit: 230,556
RAC: 326
Message 79428 - Posted 21 Jan 2016 6:20:51 UTC

One of my hosts is an Android 5 Tablet with 4 CPUs. Just a few moments ago I did an update and BOINC Event Log entry from Rosetta says: Rosetta Mini is not available for your type of computer. Is your server not yet recognizing the qualifications for 3.71? I know this message would be appropriate for earlier Rosetta Mini versions.

Is the delay in sending 3.71 to my device due to an overworked server or another reason? Thank you.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79429 - Posted 21 Jan 2016 18:49:28 UTC

We currently are not sending out the android work units yet. We will soon.

Chilean Profile

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 79430 - Posted 22 Jan 2016 1:53:31 UTC

Was the blue background intentional? I'd prefer an all black background like the previous version.
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79431 - Posted 22 Jan 2016 4:36:59 UTC

Yes it is intentional. Someone in the lab wanted to make it look better but I guess it's all a matter of opinion. He also added a nice light source for a spacefill representation which you will see soon once he starts submitting his cyclic peptide modeling jobs.

Timo Profile

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79432 - Posted 22 Jan 2016 5:52:00 UTC - in response to Message ID 79431.

Yes it is intentional. Someone in the lab wanted to make it look better but I guess it's all a matter of opinion. He also added a nice light source for a spacefill representation which you will see soon once he starts submitting his cyclic peptide modeling jobs.


Personally I like the new background coloring, though either way is fine really. Looks great in full screen. Will be interesting to see the cyclic peptide jobs.
____________
**38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research

Professor Ray

Joined: Dec 7 05
Posts: 35
ID: 32151
Credit: 40,663
RAC: 43
Message 79435 - Posted 24 Jan 2016 0:12:17 UTC - in response to Message ID 79427.

This version uses the latest Rosetta source code which includes an improved score function, new protocols (for example a new cyclic peptide modeling protocol), and some modifications to the graphics application.


This insinuated itself whilst a v3.65 WU being crunched was retained in memory. Currently my machine is crunching WCG WUs; both the v3.65 and v3.71 Rosetta WUs are held in memory, probably in swap space that's been striped across two U160 15k SCSI HDDs.

It's a pain in the neck, but I do it: when that v3.65 WU is done, can I get rid of the old DB zip, and only THEN make symlinks to the newest DB?

I've mentioned this before: I'd like better control over the DB extract process - so as to automate via script the symlink creation; but I was shouted down, belittled and trivialized by those superior to myself. I understand; that's the consequence of progress. :(

That notwithstanding, once I can participate with 'progress' I'm certain I'll be a happy camper. ;) Imagine crunching on the video card, or even surpassing the 3.5GB memory constraint because of the hardware platform my O/S runs on!

Oh, the future is such halcyon days!

____________

James W

Joined: Nov 25 12
Posts: 11
ID: 463505
Credit: 230,556
RAC: 326
Message 79436 - Posted 24 Jan 2016 6:44:18 UTC - in response to Message ID 79429.

We currently are not sending out the android work units yet. We will soon.


Will we be notified, either by post or BOINC notice, when the WUs will be ready for sending? No use keeping space open on devices until work actually can begin flowing. Thanks.

sgaboinc

Joined: Apr 2 14
Posts: 169
ID: 498515
Credit: 125,409
RAC: 0
Message 79437 - Posted 24 Jan 2016 15:49:10 UTC
Last modified: 24 Jan 2016 16:03:37 UTC

Updating: I actually tried this out on Ralph, and 3.71 on Ralph seemed to run OK on Linux. The blue background looks nicer but it seems somewhat stressful on the eye. I'd guess preference varies between individuals.

sgaboinc

Joined: Apr 2 14
Posts: 169
ID: 498515
Credit: 125,409
RAC: 0
Message 79438 - Posted 25 Jan 2016 0:36:57 UTC

no errors for 8 tasks

Link

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79439 - Posted 25 Jan 2016 10:19:47 UTC - in response to Message ID 79437.

Updating: I actually tried this out on Ralph, and 3.71 on Ralph seemed to run OK on Linux. The blue background looks nicer but it seems somewhat stressful on the eye. I'd guess preference varies between individuals.

Maybe they could make it configurable through Rosetta@Home Preferences; on SETI, for example, you can configure the screensaver that way.
____________
.

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 79441 - Posted 25 Jan 2016 19:12:38 UTC

Krypton's WUs have worked well since your changes, with two days of runtime.
Now I got another WU with the same error ("exceeded elapsed time limit 141525.53 (500000.00G/3.53G)"):
https://boinc.bakerlab.org/rosetta/result.php?resultid=788117823

I have lost about 1300 credits to this error, and two days of crunching on one core.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 79442 - Posted 25 Jan 2016 21:11:08 UTC - in response to Message ID 79441.

I have lost about 1300 credits to this error, and two days of crunching on one core.


Credit for tasks that have errors is granted via a nightly script and, for whatever reason, is not shown on the task summary web page, only the task details.

____________
Rosetta Moderator: Mod.Sense

Timo Profile

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79444 - Posted 25 Jan 2016 23:15:04 UTC - in response to Message ID 79441.

Krypton's WUs have worked well since your changes, with two days of runtime.
Now I got another WU with the same error ("exceeded elapsed time limit 141525.53 (500000.00G/3.53G)"):
https://boinc.bakerlab.org/rosetta/result.php?resultid=788117823

I have lost about 1300 credits to this error, and two days of crunching on one core.


Target runtimes of 2 days have been deprecated for a while. To avoid issues, I suggest moving your target runtime down to 1 day or less; this will leave enough buffer before any hard limits like the one your WU hit.
____________
**38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research
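
As a rough check of the numbers in the quoted error message, here is a minimal sketch. It assumes the two figures in parentheses are the work unit's floating-point bound (in GFLOPs) and the host's estimated speed (in GFLOPS), and that the elapsed time limit is simply the first divided by the second. The function names are made up for illustration, and because the displayed speed is rounded the result only approximates the 141525.53 seconds in the message.

```python
SECONDS_PER_DAY = 86_400


def elapsed_time_limit(fpops_bound_gflop: float, host_speed_gflops: float) -> float:
    """Seconds a task may run before it is aborted (assumed: bound / speed)."""
    return fpops_bound_gflop / host_speed_gflops


def fits(target_runtime_days: float, limit_seconds: float) -> bool:
    """True if the target runtime stays below the hard limit."""
    return target_runtime_days * SECONDS_PER_DAY < limit_seconds


# Figures taken from the error message; 3.53 is a rounded display value.
limit = elapsed_time_limit(500_000.0, 3.53)

print(f"limit: {limit:.0f} s = {limit / SECONDS_PER_DAY:.2f} days")
print("2-day target fits:", fits(2, limit))
print("1-day target fits:", fits(1, limit))
```

Under these assumptions the limit on this host is well under two days, which would be consistent with a 2-day target runtime hitting it while a 1-day target does not.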

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 79446 - Posted 26 Jan 2016 20:02:15 UTC - in response to Message ID 79442.
Last modified: 26 Jan 2016 20:02:41 UTC

Credit for tasks that have errors is granted via a nightly script and, for whatever reason, is not shown on the task summary web page, only the task details.


Oh, good to know! Thanks.

Target runtimes of 2 days have been deprecated for a while. To avoid issues, I suggest moving your target runtime down to 1 day or less; this will leave enough buffer before any hard limits like the one your WU hit.


No, I will not change it. I have a weak and unstable internet connection, and I do not want to waste internet traffic and disk space on a shorter runtime.

This bug comes from BOINC, not from the server bug that we had before. It seems to need only a change in the parameters for a WU. I had the same bug before with *Krypton* WUs; since a change on the server side, it no longer occurs.
By the way, two days gives a high chance of getting 99 results per WU, and wastes less crunching time on finalizing one WU and starting another.
____________

Link

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79448 - Posted 27 Jan 2016 8:10:33 UTC - in response to Message ID 79446.

Credit for tasks that have errors is granted via a nightly script and, for whatever reason, is not shown on the task summary web page, only the task details.


Oh, good to know! Thanks.

Yes, but the maximum is 300 credits as you can see at the bottom of the result page.



By the way, two days gives a high chance of getting 99 results per WU, and wastes less crunching time on finalizing one WU and starting another.

This is true; however, the chance that the WU errors out and all that work is lost increases too.
____________
.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79449 - Posted 27 Jan 2016 11:23:21 UTC - in response to Message ID 79429.

We currently are not sending out the android work units yet. We will soon.


Will there be some limitation, for example requiring an Android version more recent than 4.4?

____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 79451 - Posted 27 Jan 2016 18:32:02 UTC - in response to Message ID 79446.


No, I will not change it. I have a weak and unstable internet connection, and I do not want to waste internet traffic and disk space on a shorter runtime.


@sinspin, I would just point out that if files already exist on your host as you receive new work, then they do not have to be downloaded. However, if the host completes all tasks that need a file, the file is deleted. Any future tasks that might need it would have to download again. So, very difficult to predict your internet bandwidth requirement with much precision. Depends on the mix of tasks you get. But there are cases where having a larger pool of tasks on your host can improve the odds that you already have the needed files downloaded.

If bandwidth and stability are chief concerns, you might consider installing a caching proxy such as Squid. It will stash a copy of downloaded files. Then if a file is ever needed again, it already has a copy. Of course that would need some disk space to operate as well. But it (along with longer runtimes) is the most effective way to reduce network usage.
____________
Rosetta Moderator: Mod.Sense

B.Rothbaecher

Joined: Jun 2 12
Posts: 7
ID: 452192
Credit: 1,101,835
RAC: 0
Message 79454 - Posted 28 Jan 2016 21:45:05 UTC

I've several WUs which end with error "Error reading and gzipping output datafile: default.out".

https://boinc.bakerlab.org/rosetta/result.php?resultid=789266189
https://boinc.bakerlab.org/rosetta/result.php?resultid=789264562
https://boinc.bakerlab.org/rosetta/result.php?resultid=789264561
https://boinc.bakerlab.org/rosetta/result.php?resultid=789175516
https://boinc.bakerlab.org/rosetta/result.php?resultid=789173393

Trotador

Joined: May 30 09
Posts: 61
ID: 318648
Credit: 39,402,039
RAC: 76,148
Message 79455 - Posted 28 Jan 2016 23:05:36 UTC - in response to Message ID 79454.

I've several WUs which end with error "Error reading and gzipping output datafile: default.out".

https://boinc.bakerlab.org/rosetta/result.php?resultid=789266189
https://boinc.bakerlab.org/rosetta/result.php?resultid=789264562
https://boinc.bakerlab.org/rosetta/result.php?resultid=789264561
https://boinc.bakerlab.org/rosetta/result.php?resultid=789175516
https://boinc.bakerlab.org/rosetta/result.php?resultid=789173393


This is occurring with the new WUs starting with 2wcd_; I do not know if any of them is finishing without this error. Moreover, these WUs have very large memory usage: this morning I had to abort one that was over 20 GB, although all the rest are below 1 GB so far.

Trotador

Joined: May 30 09
Posts: 61
ID: 318648
Credit: 39,402,039
RAC: 76,148
Message 79457 - Posted 29 Jan 2016 6:45:08 UTC - in response to Message ID 79455.

I've several WUs which end with error "Error reading and gzipping output datafile: default.out".

https://boinc.bakerlab.org/rosetta/result.php?resultid=789266189
https://boinc.bakerlab.org/rosetta/result.php?resultid=789264562
https://boinc.bakerlab.org/rosetta/result.php?resultid=789264561
https://boinc.bakerlab.org/rosetta/result.php?resultid=789175516
https://boinc.bakerlab.org/rosetta/result.php?resultid=789173393


This is occurring with the new WUs starting with 2wcd_; I do not know if any of them is finishing without this error. Moreover, these WUs have very large memory usage: this morning I had to abort one that was over 20 GB, although all the rest are below 1 GB so far.


These units are definitely causing problems; cancelling them.

krypton
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 16 11
Posts: 105
ID: 436004
Credit: 1,796,601
RAC: 1,887
Message 79458 - Posted 29 Jan 2016 8:23:52 UTC

Thank you Trotador and B.Rothbaecher, I've informed the group about the error!

kipnis
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 29 16
Posts: 1
ID: 1226108
Credit: 45,501
RAC: 0
Message 79459 - Posted 29 Jan 2016 17:19:28 UTC

Jobs starting with 2wcd_ are intended to perform "fold and dock" simulations of a native protein forming large oligomeric complexes. These simulations should allow identification of key features responsible for oligomerization and the modifications necessary to convert the 12mer to higher-order oligomeric states. It looks like the monomer is too big for the protocol to be run without any modification. Sorry, Trotador, for any trouble.

Trotador

Joined: May 30 09
Posts: 61
ID: 318648
Credit: 39,402,039
RAC: 76,148
Message 79460 - Posted 29 Jan 2016 19:25:33 UTC - in response to Message ID 79459.

Jobs starting with 2wcd_ are intended to perform "fold and dock" simulations of a native protein forming large oligomeric complexes. These simulations should allow identification of key features responsible for oligomerization and the modifications necessary to convert the 12mer to higher-order oligomeric states. It looks like the monomer is too big for the protocol to be run without any modification. Sorry, Trotador, for any trouble.


Thanks for the explanation; finding problems while crunching is no issue. The computation error occurred on all my hosts that received these units (and on all the wingmen), but I see the memory problem on only one of them, so maybe it is host-related and not caused by these WUs. Will you rebuild the WUs for further tests? It seems an interesting simulation.


robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79461 - Posted 30 Jan 2016 0:32:12 UTC
Last modified: 30 Jan 2016 0:37:43 UTC

Oddball results from this task after restarting from a checkpoint - both Outcome: Success, and error messages at the end of the log file:

http://boinc.bakerlab.org/rosetta/result.php?resultid=789657365

Could you check whether 3.71 restarts from checkpoints correctly, and whether it reports task success correctly?

Natalie de Clare Profile

Joined: Feb 27 12
Posts: 4
ID: 444722
Credit: 669,442
RAC: 0
Message 79462 - Posted 30 Jan 2016 11:54:37 UTC

The blue background is stupid and I do not like it at all. If you are going to change that then have an option for blue and black otherwise I am just not going to use it anymore.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79463 - Posted 30 Jan 2016 22:36:22 UTC - in response to Message ID 79462.

The blue background is stupid and I do not like it at all.


Please, use other terms.
____________

Jonathan Jeckell Profile

Joined: Dec 17 05
Posts: 7
ID: 39083
Credit: 3,091,577
RAC: 1,583
Message 79470 - Posted 3 Feb 2016 0:55:22 UTC

Not sure if this is directly due to the application upgrade or not, but seems to correlate to when I started having some issues.

I'm running a few laptops using Linux on USB sticks, and in the last few days BOINC has been taking an insufferably long time to start while it tells me "please wait." I took a look tonight and top showed the client and Rosetta were barely using the processor. Once in a while one of the Rosetta apps would bounce up to 100% for a bit and then subside. I have the client set to max out the processor and memory using all cores without restriction. No other processes are coming close to using a significant portion of the processor, and nothing indicates the USB bandwidth is the issue.
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79471 - Posted 3 Feb 2016 5:30:56 UTC - in response to Message ID 79451.


No, I will not change it. I have a weak and unstable internet connection, and I do not want to waste internet traffic and disk space on a shorter runtime.


@sinspin, I would just point out that if files already exist on your host as you receive new work, then they do not have to be downloaded. However, if the host completes all tasks that need a file, the file is deleted.


Mod.Sense, that appears to be due to a new BOINC feature that I had not noticed before. When I checked several months ago, anything downloaded to the project directory was almost always kept for YEARS, such as versions of minirosetta that had been superseded years ago, and doing a Reset project only downloaded another copy of each of the files previously in the project directory.

Also, it appears that not all BOINC projects are set up to use this feature yet; some of the other projects my computers participate in still have files in their project directories that were superseded years ago.

It's good to see this feature used; more BOINC projects should use it.


Wiktor Jezioro | lakewik.pl Profile

Joined: Feb 22 15
Posts: 4
ID: 1048080
Credit: 17,369,574
RAC: 13,953
Message 79472 - Posted 3 Feb 2016 18:26:03 UTC

Hello!

I am getting errors processing units of the series: if32_4_0044_r1_partial_fold_and_dock_nat_cst

Example: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=716899628

The work units in this batch run for over 5 hours and have no checkpoints.

Thanks

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 79477 - Posted 4 Feb 2016 15:55:32 UTC - in response to Message ID 79471.


Mod.Sense, that appears to be due to a new BOINC feature that I had not noticed before.


The project files get an additional attribute (sticky) which indicates they should be kept regardless of the current tasks. This is used for files that all tasks use. That way if all tasks for a given project are completed at some point in time, you won't have to download the always (or often) needed files with the next tasks.

My comments were more about other files that are more task specific. If you get another task later that requires the same file, a cache in a proxy or having another task on-board already that uses the same file will avoid another download.

____________
Rosetta Moderator: Mod.Sense
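
The difference between sticky files and task-specific files described above can be pictured with a toy model. This is only an illustrative sketch, not BOINC's actual implementation: the class, its methods, and the file names are all invented here. It assumes a file is deleted once no task on the host needs it, unless it is marked sticky.

```python
class ProjectFiles:
    """Toy model of a project directory; not BOINC's actual implementation."""

    def __init__(self):
        self.files = {}      # file name -> {"sticky": bool, "tasks": set of task names}
        self.downloads = 0   # how many files had to be fetched from the server

    def add_task(self, task, needed, sticky=()):
        """Register a task, downloading only the files not already present."""
        for name in needed:
            if name not in self.files:
                self.files[name] = {"sticky": name in sticky, "tasks": set()}
                self.downloads += 1
            self.files[name]["tasks"].add(task)

    def finish_task(self, task):
        """Drop the task; delete non-sticky files that no task needs any more."""
        for name in list(self.files):
            entry = self.files[name]
            entry["tasks"].discard(task)
            if not entry["tasks"] and not entry["sticky"]:
                del self.files[name]


store = ProjectFiles()
store.add_task("t1", ["database.zip", "inputs_a.zip"], sticky={"database.zip"})
store.add_task("t2", ["database.zip", "inputs_a.zip"])  # nothing new to fetch
store.finish_task("t1")
store.finish_task("t2")                                  # inputs_a.zip is deleted here
store.add_task("t3", ["database.zip", "inputs_a.zip"])  # inputs_a.zip is fetched again
print(store.downloads, sorted(store.files))
```

In this model the sticky database is downloaded once, while the task-specific input is downloaded again because the host ran out of tasks that needed it. A caching proxy, or a larger pool of tasks on the host, is what would avoid that second fetch.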

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 79478 - Posted 4 Feb 2016 15:56:25 UTC - in response to Message ID 79470.

Not sure if this is directly due to the application upgrade or not, but seems to correlate to when I started having some issues.

I'm running a few laptops using Linux on USB sticks, and in the last few days BOINC has been taking an insufferably long time to start while it tells me "please wait." I took a look tonight and top showed the client and Rosetta were barely using the processor. Once in a while one of the Rosetta apps would bounce up to 100% for a bit and then subside. I have the client set to max out the processor and memory using all cores without restriction. No other processes are coming close to using a significant portion of the processor, and nothing indicates the USB bandwidth is the issue.


Is it possible the tasks used more memory than normal and the machine is swapping to death? Do you have swap space somewhere other than the USB?
____________
Rosetta Moderator: Mod.Sense

Jonathan Jeckell Profile

Joined: Dec 17 05
Posts: 7
ID: 39083
Credit: 3,091,577
RAC: 1,583
Message 79488 - Posted 5 Feb 2016 22:16:04 UTC - in response to Message ID 79478.

Not sure if this is directly due to the application upgrade or not, but seems to correlate to when I started having some issues.

I'm running a few laptops using Linux on USB sticks, and in the last few days BOINC has been taking an insufferably long time to start while it tells me "please wait." I took a look tonight and top showed the client and Rosetta were barely using the processor. Once in a while one of the Rosetta apps would bounce up to 100% for a bit and then subside. I have the client set to max out the processor and memory using all cores without restriction. No other processes are coming close to using a significant portion of the processor, and nothing indicates the USB bandwidth is the issue.


Is it possible the tasks used more memory than normal and the machine is swapping to death? Do you have swap space somewhere other than the USB?


No, it's all on USB because of the transitory nature of my possession of these machines. But I will look into getting some hard drives for this purpose instead of the USB sticks. Since nobody else jumped in on the problem, I must be the only one experiencing this (vs a general bug) and this must be the issue. Thanks!
____________

Natalie de Clare Profile

Joined: Feb 27 12
Posts: 4
ID: 444722
Credit: 669,442
RAC: 0
Message 79489 - Posted 6 Feb 2016 2:20:42 UTC - in response to Message ID 79463.

The blue background is stupid and I do not like it at all.


Please, use other terms.


I will use whatever terms I desire to use.

Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 440,912
RAC: 548
Message 79490 - Posted 6 Feb 2016 3:31:39 UTC

I can confirm that work unit 717556481 crashes due to out of memory errors in both results, and I suspect that work units 717556482 and 717556497 will also fail due to the same reason once the computers assigned these work units return their results because my computer also had the same errors in these work units. Their names are of the form 02_2016_(one numeral and then three letters)_backrub_design_(six numerals)_(one or more numerals).
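
For anyone who wants to pick these tasks out of a task list, the name format described above can be written as a regular expression. This is a sketch built only from that description; the example names are invented, and real task names may carry extra suffixes that this pattern does not allow for.

```python
import re

# Pattern built from the description in the post above; the names tested
# below are invented examples, not real task names.
BACKRUB_NAME = re.compile(r"^02_2016_\d[A-Za-z]{3}_backrub_design_\d{6}_\d+$")


def is_backrub_design(name: str) -> bool:
    """True if the name follows the described backrub_design format."""
    return BACKRUB_NAME.match(name) is not None


print(is_backrub_design("02_2016_1abc_backrub_design_123456_7"))
print(is_backrub_design("2wcd_fold_and_dock_123456_7"))
```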

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79491 - Posted 6 Feb 2016 3:54:54 UTC - in response to Message ID 79489.

The blue background is stupid and I do not like it at all.


Please, use other terms.


I will use whatever terms I desire to use.


Fine if you then let everyone else use those same terms to describe you.

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 79492 - Posted 6 Feb 2016 22:44:00 UTC - in response to Message ID 79491.

The blue background is stupid and I do not like it at all.


Please, use other terms.


I will use whatever terms I desire to use.


Fine if you then let everyone else use those same terms to describe you.


I am sure stupid is not the right description. But for what reason was it necessary to change to blue?
A screensaver background should be as dark as possible. There is no way to say this wtf blue is dark enough. Only black is!

Please change back to black or give us an option to change it ourselves.
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79493 - Posted 7 Feb 2016 2:20:20 UTC

If I was still using screensavers, I'd want an option to choose some main color other than blue. I've read that seeing blue tends to keep people awake for at least half an hour after they see much of it, and I don't find much to do in the half hour between the time I turn the monitors of my computers off and the time I go to bed.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79494 - Posted 7 Feb 2016 7:50:20 UTC - in response to Message ID 79492.

I am sure stupid is not the right description. But for what reason was it necessary to change to blue?


There is a scientific reason:
The graphics application was also updated to include new colors and a light source for spacefill rendering used for the new cyclic peptide modeling protocol. Spacefill rendering is only used as default for this protocol since the additional graphics load is minimal due to the small size of the proteins to be modeled.

____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79495 - Posted 7 Feb 2016 14:14:27 UTC - in response to Message ID 79494.

I am sure stupid is not the right description. But for what reason was it necessary to change to blue?


There is a scientific reason:
The graphics application was also updated to include new colors and a light source for spacefill rendering used for the new cyclic peptide modeling protocol. Spacefill rendering is only used as default for this protocol since the additional graphics load is minimal due to the small size of the proteins to be modeled.


That looks like an adequate reason for making some change, but is there any good reason for making the new colors and the light source mostly blue with no option for any other color instead?

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79497 - Posted 7 Feb 2016 19:08:15 UTC - in response to Message ID 79495.

That looks like an adequate reason for making some change, but is there any good reason for making the new colors and the light source mostly blue with no option for any other color instead?


Good question, but
- no gpu app
- no 64 bit native app (for windows)
- no optimized app
- no updated server
- no android app (waiting new work)

So the screensaver may not be a priority :-)
____________

Natalie de Clare Profile

Joined: Feb 27 12
Posts: 4
ID: 444722
Credit: 669,442
RAC: 0
Message 79499 - Posted 7 Feb 2016 22:43:42 UTC - in response to Message ID 79491.
Last modified: 7 Feb 2016 22:54:43 UTC

The blue background is stupid and I do not like it at all.


Please, use other terms.


I will use whatever terms I desire to use.


Fine if you then let everyone else use those same terms to describe you.


Apparently the fact that words can have different meanings depending on context escapes you. STUPID can also mean "ANNOYING" or "TROUBLESOME" and even "INANE" or "IRRITATING", but I suspect people like you with your limited rational cognitive processes will always have a problem with seeing anything outside your own limited vernacular expression. Thus, keep your droll comments to yourself. I was merely "EXPRESSING" how "IRRITATING" the color change is, and per the comment about there being a "scientific reason" behind the color change...what a banal reason. Black will work just fine. Regardless, I am done with this thread and this topic. The blue is stupid. That is all I have to say; we are done here.

Chilean Profile

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 79500 - Posted 8 Feb 2016 16:25:09 UTC - in response to Message ID 79493.
Last modified: 8 Feb 2016 16:35:36 UTC

It's high school all over again! Hahaha

If I was still using screensavers, I'd want an option to choose some main color other than blue. I've read that seeing blue tends to keep people awake for at least half an hour after they see much of it, and I don't find much to do in the half hour between the time I turn the monitors of my computers off and the time I go to bed.


Try f.lux (https://justgetflux.com/)
____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79502 - Posted 8 Feb 2016 18:54:53 UTC - in response to Message ID 79499.

Apparently the fact that words can have different meanings depending on context escapes you. STUPID can also mean "ANNOYING" or "TROUBLESOME" and even "INANE" or "IRRITATING", but I suspect people like you with your limited rational cognitive processes will always have a problem with seeing anything outside your own limited vernacular expression.

I'm Italian, so my English is very basic and "stupid" means "stupid".
My rational cognitive processes, on the other hand, are not so bad.

and per the comment about there being a "scientific reason" behind the color change...what a banal reason. Black will work just fine.

Are you a scientist? Do you work with Rosetta code? No? So: shut up.

Regardless, I am done with this thread and this topic. The blue is stupid. That is all I have to say; we are done here.

Bye bye, you won't be missed.
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79503 - Posted 8 Feb 2016 20:53:19 UTC

We'll try to put a color option in the next update. It should be simple.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79505 - Posted 9 Feb 2016 6:00:42 UTC

Two workunits each gave Compute error after about 48 minutes.

http://boinc.bakerlab.org/rosetta/result.php?resultid=792069698
http://boinc.bakerlab.org/rosetta/result.php?resultid=792073059

Both reported out of memory. Is that accurate enough that I'll have to set No new tasks? The motherboard won't take any more memory.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79507 - Posted 9 Feb 2016 19:39:46 UTC - in response to Message ID 79505.

Two workunits each gave Compute error after about 48 minutes.

http://boinc.bakerlab.org/rosetta/result.php?resultid=792069698
http://boinc.bakerlab.org/rosetta/result.php?resultid=792073059

Both reported out of memory. Is that accurate enough that I'll have to set No new tasks? The motherboard won't take any more memory.


There is a batch of workunits that may contain some high memory jobs. These should be done soon so I wouldn't change anything.

Eric_Kaiser Profile

Joined: Sep 22 13
Posts: 3
ID: 483320
Credit: 341,141
RAC: 0
Message 79510 - Posted 11 Feb 2016 9:54:37 UTC

Had the same issue with https://boinc.bakerlab.org/rosetta/workunit.php?wuid=718197115

What is the problem with high memory wu? I suppose 64 GB RAM should be sufficient.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 79513 - Posted 11 Feb 2016 15:45:00 UTC - in response to Message ID 79510.
Last modified: 11 Feb 2016 15:45:26 UTC

Had the same issue with https://boinc.bakerlab.org/rosetta/workunit.php?wuid=718197115

What is the problem with high memory wu? I suppose 64 GB RAM should be sufficient.


It always depends on the BOINC Manager settings as to how much of that memory BOINC is allowed to use, and how much memory is available per CPU core that is running BOINC tasks. But yes, the primary area where high memory tasks will likely turn up issues is on machines with relatively little memory per CPU core.
____________
Rosetta Moderator: Mod.Sense

Eric_Kaiser Profile

Joined: Sep 22 13
Posts: 3
ID: 483320
Credit: 341,141
RAC: 0
Message 79523 - Posted 12 Feb 2016 11:41:24 UTC - in response to Message ID 79513.
Last modified: 12 Feb 2016 11:44:07 UTC


It always depends on the BOINC Manager settings as to how much of that memory BOINC is allowed to use, and how much memory is available per CPU core that is running BOINC tasks. But yes, the primary area where high memory tasks will likely turn up issues is on machines with relatively little memory per CPU core.

BOINC is allowed to use up to 90% of memory independent of usage. Both settings have 90% applied. HDD usage is set to 500 GB. I'm running 10 simultaneous tasks.

CPU type GenuineIntel
Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz [Family 6 Model 45 Stepping 7]
Number of CPUs 12
Operating System Microsoft Windows 10
Professional x64 Edition, (10.00.10586.00)
Memory 65449.89 MB
Cache 256 KB
Swap space 65465.89 MB
Total disk space 3725.9 GB
Free Disk Space 1974.17 GB
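
As a rough check of the figures above, here is a minimal sketch of the per-task memory budget. The even split across tasks is an assumption made only for illustration (as far as I know, BOINC does not reserve a fixed slice per task), and the function name is hypothetical.

```python
# Rough per-task memory budget, ASSUMING the BOINC allowance is shared
# evenly between running tasks. This is an estimate, not how BOINC
# actually schedules memory.
def memory_per_task_mb(total_mb: float, boinc_fraction: float, tasks: int) -> float:
    if tasks <= 0:
        raise ValueError("tasks must be positive")
    return total_mb * boinc_fraction / tasks


# Values taken from the host details above: 65449.89 MB RAM,
# 90% allowed for BOINC, 10 simultaneous tasks.
budget = memory_per_task_mb(65449.89, 0.90, 10)
print(f"{budget:.1f} MB per task")
```

Under that assumption each task has several gigabytes to work with, so the BOINC memory settings alone should not be the limiting factor on this host.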

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 79526 - Posted 12 Feb 2016 14:58:11 UTC

@Eric_Kaiser
I'd say that essentially proves that memory is not the only cause of these tasks ending abnormally. Nice rig!
____________
Rosetta Moderator: Mod.Sense

hsdecalc

Joined: Jan 31 15
Posts: 1
ID: 1040785
Credit: 622,402
RAC: 1,017
Message 79527 - Posted 12 Feb 2016 21:52:12 UTC

Today on my Win10 machine with 16 GB (6 GB in use) I had 5 WUs which terminated successfully, but 6 WUs with "Compute error":

Task-ID: 792931235 + 792931236 + 792931184
02_2016_3mfj_backrub_design_327089_231_1
02_2016_3ork_backrub_design_327089_310_1
02_2016_1q06_backrub_design_327089_220_1

Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x763DD928
---------

Task-ID: 792839120 Name: 02_2016_1vhs_backrub_design_327089_236_1
Task-ID: 792839114 Name: 02_2016_3bl5_backrub_design_327089_216_1

Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x74F95B68
---------

Task-ID: 792931217
02_2016_2ozz_backrub_design_327089_321_0

Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x015BF87E write attempt to address 0x00000024

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79548 - Posted 15 Feb 2016 9:28:33 UTC - in response to Message ID 79527.

Today on my Win10 machine with 16 GB (6 GB in use) I had 5 WUs which terminated successfully, but 6 WUs with "Compute error":

Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x763DD928
---------


Same here with 16 GB and Win10...
____________

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79549 - Posted 15 Feb 2016 15:37:47 UTC - in response to Message ID 79526.

@Eric_Kaiser
I'd say that essentially proves that memory is not the only cause of these tasks ending abnormally. Nice rig!


Since Eric's rig is Windows, even though it's the 64-bit OS, the Windows Rosetta app is still only 32-bit and thus can only address up to 4 GB per instance (if memory serves). Still, he definitely has enough to allow the entire 4 GB allocation to be filled. With that said, I work with database applications on a daily basis that take many more gigabytes of memory per instance, so it's in the realm of possibility that this particular Rosetta task has a bit more of an appetite... to put it lightly :D
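
For reference, the 4 GB ceiling follows directly from the pointer width. A minimal sketch (the function name is hypothetical, and this is only the theoretical limit; the usable space for a 32-bit process can be lower depending on the OS and how the binary was built):

```python
# Theoretical size of a flat address space for a given pointer width.
def address_space_gib(pointer_bits: int) -> float:
    return 2 ** pointer_bits / 2 ** 30  # bytes converted to GiB


print(address_space_gib(32))  # 32-bit pointers
```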
____________
**38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79550 - Posted 15 Feb 2016 22:24:52 UTC - in response to Message ID 79549.

@Eric_Kaiser
I'd say that essentially proves that memory is not the only cause of these tasks ending abnormally. Nice rig!


Since Eric's rig is Windows, even though it's the 64-bit OS, the Windows Rosetta app is still only 32-bit and thus can only address up to 4 GB per instance (if memory serves). Still, he definitely has enough to allow the entire 4 GB allocation to be filled. With that said, I work with database applications on a daily basis that take many more gigabytes of memory per instance, so it's in the realm of possibility that this particular Rosetta task has a bit more of an appetite... to put it lightly :D


Does this mean that minirosetta needs to be recompiled in 64-bit mode so it can handle workunits that need more than 4 GB of memory?

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79552 - Posted 16 Feb 2016 4:19:00 UTC - in response to Message ID 79550.
Last modified: 16 Feb 2016 4:19:08 UTC

@Eric_Kaiser
I'd say that essentially proves that memory is not the only cause of these tasks ending abnormally. Nice rig!


Since Eric's rig is Windows, even though it's the 64-bit OS, the Windows Rosetta app is still only 32-bit and thus can only address up to 4 GB per instance (if memory serves). Still, he definitely has enough to allow the entire 4 GB allocation to be filled. With that said, I work with database applications on a daily basis that take many more gigabytes of memory per instance, so it's in the realm of possibility that this particular Rosetta task has a bit more of an appetite... to put it lightly :D


Does this mean that minirosetta needs to be recompiled in 64-bit mode so it can handle workunits that need more than 4 GB of memory?


Bingo!

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79553 - Posted 16 Feb 2016 5:22:17 UTC - in response to Message ID 79552.

@Eric_Kaiser
I'd say that essentially proves that memory is not the only cause of these tasks ending abnormally. Nice rig!


Since Eric's rig is Windows, even though it's the 64-bit OS, the Windows Rosetta app is still only 32-bit and thus can only address up to 4 GB per instance (if memory serves). Still, he definitely has enough to allow the entire 4 GB allocation to be filled. With that said, I work with database applications on a daily basis that take many more gigabytes of memory per instance, so it's in the realm of possibility that this particular Rosetta task has a bit more of an appetite... to put it lightly :D


Does this mean that minirosetta needs to be recompiled in 64-bit mode so it can handle workunits that need more than 4 GB of memory?


Bingo!


I think that the Windows version should be 64-bit BUT it will not fix this problem. IMO, it is a memory leak bug. Code is allocating memory but not freeing it.

Rosetta typically takes about 400 MB to 450 MB of memory to run. There is no way that a new Rosetta version should require 10x as much memory. It has to be a bug. It is a memory leak or a problem with the code that scans command line arguments for proper combinations.

If 4 GB is not enough memory to run Rosetta, then how much memory should a system have to successfully run 1 copy of Rosetta? My 8-core/16-thread system has 32 GB of memory. Is 2 GB per Rosetta workload enough? So far it has been.

When a system runs low on memory, it will start paging and performance will slow by 100x and choke your machine.


I compiled Rosetta source to build the default gcc 64-bit Linux version AND THEN added the -m32 option to build an identical 32-bit Linux version. The 64-bit version was 10% to 15% faster than the 32-bit version.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79554 - Posted 16 Feb 2016 10:34:47 UTC - in response to Message ID 79553.

I compiled Rosetta source to build the default gcc 64-bit Linux version AND THEN added the -m32 option to build an identical 32-bit Linux version. The 64-bit version was 10% to 15% faster than the 32-bit version.


In Italy, we say "buttali via" (means approximately "better than nothing"). :-P
____________

Michael H.W. Weber Profile
Avatar

Joined: Sep 18 05
Posts: 8
ID: 394
Credit: 3,327,910
RAC: 11,636
Message 79555 - Posted 16 Feb 2016 15:06:58 UTC

On my systems and those of other team members, all WUs carrying the phrase "backrub" are breaking down with computation errors. Often after having consumed quite some CPU time.

@Baker Lab:
Please take a look at this WU series.
Thanks.

Michael.
____________
Michael H.W. Weber
Chairman and scientific advisor of Rechenkraft.net e.V.

http://www.rechenkraft.net - The world's first and largest distributed computing association. We make those things possible that supercomputers don't.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79558 - Posted 16 Feb 2016 17:59:14 UTC - in response to Message ID 79555.

On my systems and those of other team members, all WUs carrying the phrase "backrub" are breaking down with computation errors. Often after having consumed quite some CPU time.

@Baker Lab:
Please take a look at this WU series.
Thanks.

Michael.


These are my jobs and I do realize that many of them are failing with memory issues on some platforms. I will definitely look into this. The batch is almost complete so I'm going to let them continue since they are producing results which I'm very interested in. Credit should still be granted for the jobs that fail.

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79566 - Posted 20 Feb 2016 17:06:46 UTC
Last modified: 20 Feb 2016 17:11:40 UTC

Two of my systems have started intermittently falling into 'project backoff' for 10-40 hour periods after getting this message in the logs. (If I go and do a manual 'request new tasks' they successfully get more tasks, but I noticed because their work queues dry out.)


2/20/2016 2:07:54 AM | rosetta@home | Reporting 5 completed tasks
2/20/2016 2:07:54 AM | rosetta@home | Requesting new tasks for CPU
2/20/2016 2:07:57 AM | rosetta@home | Scheduler request completed: got 0 new tasks
2/20/2016 2:07:57 AM | rosetta@home | No work sent
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


Is this perhaps a result of higher 'memory requirements' attached to some of those jobs? If so, no worries, I'll just keep an eye on it until that batch finishes :)

.. a side note though, the backrub type jobs seem to be completing successfully on my boxes - maybe it's something to do with my target runtime being short (4 hours) and it not getting a chance to chew through so much memory? (Speculation ftw!) If that's the case maybe jobs like this should be limited to a shorter target runtime?
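
For anyone who wants to spot these refusals without reading the whole Event Log, here is a minimal sketch that pulls the two numbers out of the message. The regular expression is written against the exact wording quoted above and is an assumption; other BOINC versions may phrase the message differently. The function name is hypothetical.

```python
import re

# Matches the scheduler message quoted above, e.g.
# "Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use."
MEMORY_REFUSAL = re.compile(
    r"needs (?P<needed>\d+(?:\.\d+)?) MB RAM "
    r"but only (?P<available>\d+(?:\.\d+)?) MB is available"
)


def parse_memory_refusal(line: str):
    """Return (needed_mb, available_mb), or None if the line does not match."""
    match = MEMORY_REFUSAL.search(line)
    if match is None:
        return None
    return float(match.group("needed")), float(match.group("available"))


line = (
    "2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM "
    "but only 6842.83 MB is available for use."
)
result = parse_memory_refusal(line)
print(result)
```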

Chilean Profile
Avatar

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 79567 - Posted 20 Feb 2016 18:25:12 UTC - in response to Message ID 79566.

Two of my systems have started intermittently falling into 'project backoff' for 10-40 hour periods after getting this message in the logs. (If I go and do a manual 'request new tasks' they successfully get more tasks, but I noticed because their work queues dry out.)


2/20/2016 2:07:54 AM | rosetta@home | Reporting 5 completed tasks
2/20/2016 2:07:54 AM | rosetta@home | Requesting new tasks for CPU
2/20/2016 2:07:57 AM | rosetta@home | Scheduler request completed: got 0 new tasks
2/20/2016 2:07:57 AM | rosetta@home | No work sent
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


Is this perhaps a result of higher 'memory requirements' attached to some of those jobs? If so, no worries, I'll just keep an eye on it until that batch finishes :)

.. a side note though, the backrub type jobs seem to be completing successfully on my boxes - maybe it's something to do with my target runtime being short (4 hours) and it not getting a chance to chew through so much memory? (Speculation ftw!) If that's the case maybe jobs like this should be limited to a shorter target runtime?


Or you could just... you know... buy 60 gigs of RAM lol

____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79569 - Posted 20 Feb 2016 19:23:21 UTC - in response to Message ID 79567.

Two of my systems have started intermittently falling into 'project backoff' for 10-40 hour periods after getting this message in the logs. (If I go and do a manual 'request new tasks' they successfully get more tasks, but I noticed because their work queues dry out.)


2/20/2016 2:07:54 AM | rosetta@home | Reporting 5 completed tasks
2/20/2016 2:07:54 AM | rosetta@home | Requesting new tasks for CPU
2/20/2016 2:07:57 AM | rosetta@home | Scheduler request completed: got 0 new tasks
2/20/2016 2:07:57 AM | rosetta@home | No work sent
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


Is this perhaps a result of higher 'memory requirements' attached to some of those jobs? If so, no worries, I'll just keep an eye on it until that batch finishes :)

.. a side note though, the backrub type jobs seem to be completing successfully on my boxes - maybe it's something to do with my target runtime being short (4 hours) and it not getting a chance to chew through so much memory? (Speculation ftw!) If that's the case maybe jobs like this should be limited to a shorter target runtime?


Or you could just... you know... buy 60 gigs of RAM lol


I'd do just that for both of my computers if their motherboards could handle more memory. They can't.

fractal

Joined: Dec 12 08
Posts: 2
ID: 292315
Credit: 1,000,245
RAC: 0
Message 79570 - Posted 20 Feb 2016 19:32:45 UTC - in response to Message ID 79566.

Two of my systems have started intermittently falling into 'project backoff' for 10-40 hour periods after getting this message in the logs. (If I go and do a manual 'request new tasks' they successfully get more tasks, but I noticed because their work queues dry out.)


2/20/2016 2:07:54 AM | rosetta@home | Reporting 5 completed tasks
2/20/2016 2:07:54 AM | rosetta@home | Requesting new tasks for CPU
2/20/2016 2:07:57 AM | rosetta@home | Scheduler request completed: got 0 new tasks
2/20/2016 2:07:57 AM | rosetta@home | No work sent
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


Is this perhaps a result of higher 'memory requirements' attached to some of those jobs? If so, no worries, I'll just keep an eye on it until that batch finishes :)


I found two of my machines in that state this morning and several yesterday.

2/19/2016 5:54:25 PM | rosetta@home | Computation for task rb_11_07_60457_104894__t000__0_C1_beta_nov15_cart_fa_wt_0.40_SAVE_ALL_OUT_IGNORE_THE_REST_327108_852_1 finished
2/19/2016 5:54:25 PM | rosetta@home | Starting task rb_02_18_60756_107222_ab_stage0_t000___robetta_IGNORE_THE_REST_10_15_329934_9_0
2/19/2016 5:54:28 PM | rosetta@home | Started upload of rb_11_07_60457_104894__t000__0_C1_beta_nov15_cart_fa_wt_0.40_SAVE_ALL_OUT_IGNORE_THE_REST_327108_852_1_0
2/19/2016 5:54:33 PM | rosetta@home | Finished upload of rb_11_07_60457_104894__t000__0_C1_beta_nov15_cart_fa_wt_0.40_SAVE_ALL_OUT_IGNORE_THE_REST_327108_852_1_0
2/19/2016 5:56:48 PM | rosetta@home | Sending scheduler request: To report completed tasks.
2/19/2016 5:56:48 PM | rosetta@home | Reporting 1 completed tasks
2/19/2016 5:56:48 PM | rosetta@home | Requesting new tasks for CPU
2/19/2016 5:56:50 PM | rosetta@home | Scheduler request completed: got 0 new tasks
2/19/2016 5:56:50 PM | rosetta@home | No work sent
2/19/2016 5:56:50 PM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6922.61 MB is available for use.
2/19/2016 5:56:50 PM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
2/19/2016 5:56:50 PM | rosetta@home | Rosetta Mini needs 9536.74 MB RAM but only 6922.61 MB is available for use.

That machine had 18 hours of backoff when I found it this morning. It still had one work unit running out of four cores.
2/20/2016 3:04:19 AM | rosetta@home | Computation for task foldit_2001101_s003_fold_and_dock_SAVE_ALL_OUT_328024_8728_0 finished
2/20/2016 3:04:19 AM | rosetta@home | Starting task TL_QTS_S_nuc_elbow_0072_0328_0047_0006_0487_0021_0997_0006_1669_0001_1503_0001_fold_SAVE_ALL_OUT_326891_2874_0
2/20/2016 3:04:21 AM | rosetta@home | Started upload of foldit_2001101_s003_fold_and_dock_SAVE_ALL_OUT_328024_8728_0_0
2/20/2016 3:04:26 AM | rosetta@home | Finished upload of foldit_2001101_s003_fold_and_dock_SAVE_ALL_OUT_328024_8728_0_0
2/20/2016 3:24:47 AM | rosetta@home | Computation for task rb_02_17_62203_107217_ab_stage0_t000___robetta_IGNORE_THE_REST_03_09_329939_184_0 finished
2/20/2016 3:24:47 AM | rosetta@home | Starting task FFD__adba9af95181d2f6c2e74c99f922bf95_abinitioDocking_16_02_12_21_37_globalDocking_7_SAVE_ALL_OUT_330008_6_0
2/20/2016 3:24:50 AM | rosetta@home | Started upload of rb_02_17_62203_107217_ab_stage0_t000___robetta_IGNORE_THE_REST_03_09_329939_184_0_0
2/20/2016 3:24:57 AM | rosetta@home | Finished upload of rb_02_17_62203_107217_ab_stage0_t000___robetta_IGNORE_THE_REST_03_09_329939_184_0_0
2/20/2016 4:05:08 AM | rosetta@home | Sending scheduler request: To report completed tasks.
2/20/2016 4:05:08 AM | rosetta@home | Reporting 2 completed tasks
2/20/2016 4:05:08 AM | rosetta@home | Requesting new tasks for CPU
2/20/2016 4:05:11 AM | rosetta@home | Scheduler request completed: got 0 new tasks
2/20/2016 4:05:11 AM | rosetta@home | No work sent
2/20/2016 4:05:11 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
2/20/2016 4:05:11 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 5670.64 MB is available for use.

This machine was completely out of work when I found it at the same time with over 24 hours of backoff. It got work as soon as I manually refreshed the project. My priority 0 backup project was not getting work either, but that never seems to work..
2/20/2016 7:10:56 AM | Universe@Home | Sending scheduler request: To report completed tasks.
2/20/2016 7:10:56 AM | Universe@Home | Reporting 1 completed tasks
2/20/2016 7:10:56 AM | Universe@Home | Not requesting tasks: don't need (job cache full)
2/20/2016 7:10:59 AM | Universe@Home | Scheduler request completed


I don't mind not getting a work unit that needs 60 GiB of RAM but please don't refuse to give my meager machine more bite sized work just because of that.
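
A side note on the two figures in those messages: converting them back to bytes gives suspiciously round numbers. The sketch below assumes the "MB" in the log means 1024 x 1024 bytes; if so, the limits work out to about 60e9 and 10e9 bytes, which would suggest round decimal memory bounds were set on that batch. That is an inference from the numbers, not something confirmed in this thread.

```python
# ASSUMPTION: the "MB" shown in the BOINC log is 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def reported_mb_to_decimal_gb(mb: float) -> float:
    """Convert a logged MB figure to decimal gigabytes (1e9 bytes)."""
    return mb * BYTES_PER_MB / 1e9


for reported in (57220.46, 9536.74):
    print(reported, "MB is about", round(reported_mb_to_decimal_gb(reported), 2), "GB")
```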

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79576 - Posted 21 Feb 2016 12:22:21 UTC - in response to Message ID 79566.

2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


Maybe it's time to remove "mini" from the app name... ;-)

On the serious side, considering that most PCs are still sold with 8GB or less, maybe creating another app name for this type of work would indeed be a good idea, so that only people who have much RAM can activate it in their profile while others won't be stopped from getting work (if that can't be solved in another way).
____________
.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79577 - Posted 21 Feb 2016 15:32:47 UTC - in response to Message ID 79576.

2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


Maybe it's time to remove "mini" from the app name... ;-)

On the serious side, considering that most PCs are still sold with 8GB or less, maybe creating another app name for this type of work would indeed be a good idea, so that only people who have much RAM can activate it in their profile while others won't be stopped from getting work (if that can't be solved in another way).


I decided to buy another of my favorite brand of computers yesterday. They didn't offer any with more than 32 GB that fit my other requirements.

fractal

Joined: Dec 12 08
Posts: 2
ID: 292315
Credit: 1,000,245
RAC: 0
Message 79578 - Posted 21 Feb 2016 23:07:15 UTC - in response to Message ID 79577.
Last modified: 21 Feb 2016 23:08:46 UTC

2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


Maybe it's time to remove "mini" from the app name... ;-)

On the serious side, considering that most PCs are still sold with 8GB or less, maybe creating another app name for this type of work would indeed be a good idea, so that only people who have much RAM can activate it in their profile while others won't be stopped from getting work (if that can't be solved in another way).


I decided to buy another of my favorite brand of computers yesterday. They didn't offer any with more than 32 GB that fit my other requirements.

You generally need server class hardware to get more than 32 GiB of memory. <begin wry humor>And, since the project shuts you down if you fail for ANY work unit, you need 60 GiB of RAM per core. That's 240 GiB for a quad core. You can get that with AMD Opterons or Intel Xeons using registered ECC RDIMMs. This is not a viable approach for most volunteers.<end wry humor>

That aside, I had to manually update 8 stuck machines yesterday. I was about to say that I didn't have to restart any today but just found one on a 20 hour backoff. Fortunately I increased my buffer from a half a day to a full day to give me time to find them before they run dry.

Oh, and why is it called "mini rosetta?" See https://www.rosettacommons.org/content/what-minirosetta

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79579 - Posted 22 Feb 2016 0:03:42 UTC - in response to Message ID 79578.

2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


Maybe it's time to remove "mini" from the app name... ;-)

On the serious side, considering that most PCs are still sold with 8GB or less, maybe creating another app name for this type of work would indeed be a good idea, so that only people who have much RAM can activate it in their profile while others won't be stopped from getting work (if that can't be solved in another way).


I decided to buy another of my favorite brand of computers yesterday. They didn't offer any with more than 32 GB that fit my other requirements.

You generally need server class hardware to get more than 32 GiB of memory. <begin wry humor>And, since the project shuts you down if you fail for ANY work unit, you need 60 GiB of RAM per core. That's 240 GiB for a quad core. You can get that with AMD Opterons or Intel Xeons using registered ECC RDIMMs. This is not a viable approach for most volunteers.<end wry humor>

That aside, I had to manually update 8 stuck machines yesterday. I was about to say that I didn't have to restart any today but just found one on a 20 hour backoff. Fortunately I increased my buffer from a half a day to a full day to give me time to find them before they run dry.

Oh, and why is it called "mini rosetta?" See https://www.rosettacommons.org/content/what-minirosetta


I might be able to afford server class hardware, but I don't feel like learning a server operating system - I've already learned enough operating systems. Also, I have rather strong electrical power limitations here.

As for removing mini from minirosetta, it looks like someone doesn't know enough of the history of Rosetta@home to remember that the main application was rosetta a few years ago. Do they want the renamed application to be easily confused with the application of a few years ago?

jjch

Joined: Nov 10 13
Posts: 6
ID: 486414
Credit: 149,279,071
RAC: 429,128
Message 79582 - Posted 22 Feb 2016 5:44:59 UTC

It looks like there are two different things going on here but they may be related.

I have a number of servers and workstations that are being used for CPU and GPU computing. These were recently set primarily to run rosetta for CPU work to help out that project.

The rosetta Task status shows Ready to report but the Project Status goes to Communication Deferred for multiple hours (ex. 18 hrs) and the server runs dry.

What I am seeing is that the project happily goes along for a while Requesting new tasks for CPU and gets the Scheduler request completed: got 1 task message.

Then after a few hours it gets the Scheduler request completed: got 0 tasks. No work sent. Rosetta Mini for Android is not available for your type of computer.

Finally, the message Rosetta Mini needs 57220.46 MB RAM but only 7363.62 MB is available for use. After that it stops updating. Remaining tasks will continue to upload until it runs out.

Rosetta does not automatically download any more tasks or report any that were finished. You can manually update and get it to reset and start again however it will just run through to the same result in a few hours.

I'm not going to babysit all of these servers every day to keep running rosetta. Also, these were purposefully only populated with 8 GB memory to save on power and cooling requirements. CPU and GPU computing, remember.

Please look into this and provide a resolution soon or I will have to move on to other projects. Let me know if I can be of assistance or provide any more detailed information.

Thanks.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79584 - Posted 22 Feb 2016 15:34:07 UTC - in response to Message ID 79582.

It looks like there are two different things going on here but they may be related.

I have a number of servers and workstations that are being used for CPU and GPU computing. These were recently set primarily to run rosetta for CPU work to help out that project.

The rosetta Task status shows Ready to report but the Project Status goes to Communication Deferred for multiple hours (ex. 18 hrs) and the server runs dry.

What I am seeing is that the project happily goes along for a while Requesting new tasks for CPU and gets the Scheduler request completed: got 1 task message.

Then after a few hours it gets the Scheduler request completed: got 0 tasks. No work sent. Rosetta Mini for Android is not available for your type of computer.

Finally, the message Rosetta Mini needs 57220.46 MB RAM but only 7363.62 MB is available for use. After that it stops updating. Remaining tasks will continue to upload until it runs out.

Rosetta does not automatically download any more tasks or report any that were finished. You can manually update and get it to reset and start again however it will just run through to the same result in a few hours.

I'm not going to babysit all of these servers every day to keep running rosetta. Also, these were purposefully only populated with 8 GB memory to save on power and cooling requirements. CPU and GPU computing, remember.

Please look into this and provide a resolution soon or I will have to move on to other projects. Let me know if I can be of assistance or provide any more detailed information.

Thanks.






It looks like all of your computers run some version of Windows and none of them run Android.

jjch

Joined: Nov 10 13
Posts: 6
ID: 486414
Credit: 149,279,071
RAC: 429,128
Message 79593 - Posted 22 Feb 2016 19:50:13 UTC

All of the systems are running Windows, either 2012/R2, 7 or 8.1. None of them has an Android emulator either. Had to give up my Linux servers.

There were a couple of these that were left with more than 8GB memory. I am going to check if those also have the same problem.

I will also check if one might already have 64 GB memory or upgrade it and see if it makes any difference.

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 79594 - Posted 22 Feb 2016 20:38:19 UTC - in response to Message ID 79593.

All of the systems are running Windows, either 2012/R2, 7 or 8.1. None of them has an Android emulator either. Had to give up my Linux servers.

There were a couple of these that were left with more than 8GB memory. I am going to check if those also have the same problem.

I will also check if one might already have 64 GB memory or upgrade it and see if it makes any difference.


I think your (very impressive) fleet of servers is being affected by the same memory allocation messages I posted about (seen as follows in my logs):


2/20/2016 2:07:54 AM | rosetta@home | Reporting 5 completed tasks
2/20/2016 2:07:54 AM | rosetta@home | Requesting new tasks for CPU
2/20/2016 2:07:57 AM | rosetta@home | Scheduler request completed: got 0 new tasks
2/20/2016 2:07:57 AM | rosetta@home | No work sent
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


The above causes the box to head into 'project backoff' for 20-40 hours. Hoping David sees this thread and can take a peek sooner rather than later :).

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79595 - Posted 22 Feb 2016 21:00:04 UTC - in response to Message ID 79594.

All of the systems are running Windows, either 2012/R2, 7 or 8.1. None of them has an Android emulator either. Had to give up my Linux servers.

There were a couple of these that were left with more than 8GB memory. I am going to check if those also have the same problem.

I will also check if one might already have 64 GB memory or upgrade it and see if it makes any difference.


I think your (very impressive) fleet of servers is being affected by the same memory allocation messages I posted about (seen as follows in my logs):


2/20/2016 2:07:54 AM | rosetta@home | Reporting 5 completed tasks
2/20/2016 2:07:54 AM | rosetta@home | Requesting new tasks for CPU
2/20/2016 2:07:57 AM | rosetta@home | Scheduler request completed: got 0 new tasks
2/20/2016 2:07:57 AM | rosetta@home | No work sent
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
2/20/2016 2:07:57 AM | rosetta@home | Rosetta Mini needs 57220.46 MB RAM but only 6842.83 MB is available for use.


The above causes the box to head into 'project backoff' for 20-40 hours. Hoping David sees this thread and can take a peek sooner rather than later :).


Thanks for the heads up. I'll track this down and try to fix it on our end.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79596 - Posted 22 Feb 2016 21:01:41 UTC - in response to Message ID 79593.

All of the systems are running Windows, either 2012/R2, 7 or 8.1. None of them has an Android emulator either. Had to give up my Linux servers.

There were a couple of these that were left with more than 8GB memory. I am going to check if those also have the same problem.

I will also check if one might already have 64 GB memory or upgrade it and see if it makes any difference.


Something that MIGHT be worth trying: See if your account settings allow you to turn off Android workunits, since none of your computers run Android instead of Windows.

jjch

Joined: Nov 10 13
Posts: 6
ID: 486414
Credit: 149,279,071
RAC: 429,128
Message 79597 - Posted 22 Feb 2016 23:18:20 UTC

I'm not seeing an option to change that setting in Rosetta. It is available on a few other BOINC projects though.

jjch

Joined: Nov 10 13
Posts: 6
ID: 486414
Credit: 149,279,071
RAC: 429,128
Message 79600 - Posted 23 Feb 2016 6:38:51 UTC

Update - Several of the servers that had 0 work left yesterday started up again today and began processing Rosetta tasks. Probably after the communication deferred timer ran out.

Seems that if you manually update the project it triggers the loop but if you leave it alone it might sort it out by itself. There are a few that still are stuck so I can check on those tomorrow.

Several servers already have 32GB memory so those are reporting a similar message with slightly different memory size available.

Also, there are three servers one each with 64, 128 and 256GB of memory. They need patching and BOINC updates to 7.6.22 anyway. When I restart them I will watch how they behave.

Chilean Profile

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 79607 - Posted 24 Feb 2016 2:02:49 UTC - in response to Message ID 79600.

Update - Several of the servers that had 0 work left yesterday started up again today and began processing Rosetta tasks. Probably after the communication deferred timer ran out.

Seems that if you manually update the project it triggers the loop but if you leave it alone it might sort it out by itself. There are a few that still are stuck so I can check on those tomorrow.

Several servers already have 32GB memory so those are reporting a similar message with slightly different memory size available.

Also, there are three servers one each with 64, 128 and 256GB of memory. They need patching and BOINC updates to 7.6.22 anyway. When I restart them I will watch how they behave.


Not to be nosy, but how do you handle the heat from the servers?
You're pulling over a quarter million of credit per day, that's very impressive!
____________

jjch

Joined: Nov 10 13
Posts: 6
ID: 486414
Credit: 149,279,071
RAC: 429,128
Message 79608 - Posted 24 Feb 2016 2:24:58 UTC

The servers are all in a lab room that has an AC cooling unit but I'm actually close to the limit it will handle. Works pretty well during the winter and cooler months but when the weather gets hot outside I have to throttle them back during the day and only run at night.

If it gets past 90 F I have had to just let them run out of work units and idle. If we get to 100+ F I have had to shut them off and let the weather cool down a bit before starting back up again. Gives me a chance to update things and reset them anyway.

sgaboinc

Joined: Apr 2 14
Posts: 169
ID: 498515
Credit: 125,409
RAC: 0
Message 79609 - Posted 24 Feb 2016 5:22:40 UTC - in response to Message ID 79582.
Last modified: 24 Feb 2016 5:24:08 UTC


What I am seeing is that the project happily goes along for a while Requesting new tasks for CPU and gets the Scheduler request completed: got 1 task message.

Then after a few hours it gets the Scheduler request completed: got 0 tasks. No work sent. Rosetta Mini for Android is not available for your type of computer.

Finally, the message Rosetta Mini needs 57220.46 MB RAM but only 7363.62 MB is available for use. After that it stops updating. Remaining tasks will continue to upload until it runs out.

Rosetta does not automatically download any more tasks or report any that were finished. You can manually update and get it to reset and start again however it will just run through to the same result in a few hours.



actually, i'm wondering if limiting the number of concurrent tasks may help.
for r@h, i normally see the number of tasks running as one task/thread per core. hence it nicely uses all 8 cores with 8 tasks/threads (incl HT cores) of my i7 4771 cpu. i'm running on 16 GB of ram in linux.

i've yet to encounter the 'needs xxx MB of RAM' with r@h, but with a different project (atlas@home from cern), the memory requirements are quite huge and i often see only 4 threads / tasks running and hit the memory limit.

coming to think about ram, i think linux and windows o/s are able to utilize swap for virtual memory hence disk space as swap memory if you have allocated sufficient space for that. but for atlas@home, i think the use of virtualbox probably limits what could be swappable.

you may like to see if disk swap spaces may be somewhat tunable in that respect.

the other thing i think has to do with the boinc client itself, i'm thinking an updated or more recent boinc client may possibly resolve some of these issues as what you are seeing is probably a behavior of boinc client rather than r@h

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79615 - Posted 24 Feb 2016 11:53:23 UTC - in response to Message ID 79609.

[snip]
coming to think about ram, i think linux and windows o/s are able to utilize swap for virtual memory hence disk space as swap memory if you have allocated sufficient space for that. but for atlas@home, i think the use of virtualbox probably limits what could be swappable.

you may like to see if disk swap spaces may be somewhat tunable in that respects.


BOINC tasks usually have swapping turned off, in an effort to make them run faster. This means that there is often no effort to make the applications able to stand the address changes caused by swapping something out of memory, and then swapping it back in at a different address because the original address is still in use by some other program.

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79617 - Posted 24 Feb 2016 14:13:35 UTC - in response to Message ID 79615.

[snip]
coming to think about ram, i think linux and windows o/s are able to utilize swap for virtual memory hence disk space as swap memory if you have allocated sufficient space for that. but for atlas@home, i think the use of virtualbox probably limits what could be swappable.

you may like to see if disk swap spaces may be somewhat tunable in that respects.


BOINC tasks usually have swapping turned off, in an effort to make them run faster. This means that there is often no effort to make the applications able to stand the address changes caused by swapping something out of memory, and then swapping it back in at a different address because the original address is still in use by some other program.


The OS (Windows, all variants of Linux, MACOS, ... ) provides the program with VIRTUAL memory. The virtual memory is translated into PHYSICAL memory using the page tables, with the TLB caching recent translations. A virtual page of memory can get swapped to disk and then be relocated into a different PHYSICAL memory location by setting the page table entry properly. The executing program does not even know if the page has been swapped out to disk.

The last time I looked, Windows allocated a disk swap file the same size as memory ( C:\pagefile.sys ). You can explicitly set the size of this file, even to 0 bytes .... but when you run low on memory, the OS will kill stuff "Out of Memory".

Virtualbox is just a program in memory that runs on top of your OS and you set the memory size that virtualbox is allowed to use. I usually set virtualbox to be able to use about 50% of my physical memory BUT I have 16gb or more on my systems.
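
To illustrate the point about virtual versus physical addresses, here is a toy Python model. It is only a sketch: the class, its method names and the 4096-byte page size are assumptions made for the example, and a real OS does this in hardware and kernel code rather than with a dictionary.

```python
PAGE_SIZE = 4096  # bytes; a common page size, assumed here for illustration

class ToyPageTable:
    """Maps virtual page numbers to physical frames, or marks them swapped out."""

    def __init__(self):
        self.frames = {}      # virtual page number -> physical frame number
        self.swapped = set()  # virtual page numbers currently on disk

    def map_page(self, vpn, frame):
        self.frames[vpn] = frame
        self.swapped.discard(vpn)

    def swap_out(self, vpn):
        del self.frames[vpn]
        self.swapped.add(vpn)

    def translate(self, vaddr, free_frame):
        """Translate a virtual address; swap the page back in if needed."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.swapped:
            # Page fault: the OS reloads the page into whatever frame is free.
            self.map_page(vpn, free_frame)
        return self.frames[vpn] * PAGE_SIZE + offset

table = ToyPageTable()
table.map_page(vpn=2, frame=7)
vaddr = 2 * PAGE_SIZE + 100
before = table.translate(vaddr, free_frame=9)
table.swap_out(2)
after = table.translate(vaddr, free_frame=9)
print(before, after)  # same virtual address, two different physical addresses
```

The program only ever sees vaddr, which never changes; only the mapping behind it does.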

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79624 - Posted 25 Feb 2016 2:44:08 UTC - in response to Message ID 79617.

[snip]
coming to think about ram, i think linux and windows o/s are able to utilize swap for virtual memory hence disk space as swap memory if you have allocated sufficient space for that. but for atlas@home, i think the use of virtualbox probably limits what could be swappable.

you may like to see if disk swap spaces may be somewhat tunable in that respects.


BOINC tasks usually have swapping turned off, in an effort to make them run faster. This means that there is often no effort to make the applications able to stand the address changes caused by swapping something out of memory, and then swapping it back in at a different address because the original address is still in use by some other program.


The OS (Windows, all variants of Linux, MACOS, ... ) provides the program with VIRTUAL memory. The virtual memory is translated into PHYSICAL memory using the TLB translations. A virtual page of memory can get swapped to disk and then be relocated into a different PHYSICAL memory location by setting the TLB entry properly. The executing program does not even know if the page has been swapped out to disk.

The last time I looked, Windows allocated a disk swap file the same size as memory ( C:\pagefile.sys ). You can explicitly set the size of this file, even to 0 bytes .... but when you run low on memory, the OS will kill stuff "Out of Memory".

Virtualbox is just a program in memory that runs on top of your OS and you set the memory size that virtualbox is allowed to use. I usually set virtualbox to be able to use about 50% of my physical memory BUT I have 16gb or more on my systems.



It's hard to get Virtualbox working correctly - once the versions of BOINC available so far have detected that virtualization is not enabled in the BIOS or the UEFI, they will remember this forever and prevent the test of whether it is enabled from being run again.

Also, all of the Virtualbox workunits I've seen much about so far seize 4 GB of physical memory, and won't allow any of it to be paged. I'm hoping that a new version of Virtualbox will remove this restriction.

As far as I know, Virtualbox can handle 32-bit workunits, but not 64-bit workunits.

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79630 - Posted 25 Feb 2016 15:00:13 UTC - in response to Message ID 79624.

[snip]
coming to think about ram, i think linux and windows o/s are able to utilize swap for virtual memory hence disk space as swap memory if you have allocated sufficient space for that. but for atlas@home, i think the use of virtualbox probably limits what could be swappable.

you may like to see if disk swap spaces may be somewhat tunable in that respects.


BOINC tasks usually have swapping turned off, in an effort to make them run faster. This means that there is often no effort to make the applications able to stand the address changes caused by swapping something out of memory, and then swapping it back in at a different address because the original address is still in use by some other program.


The OS (Windows, all variants of Linux, MACOS, ... ) provides the program with VIRTUAL memory. The virtual memory is translated into PHYSICAL memory using the TLB translations. A virtual page of memory can get swapped to disk and then be relocated into a different PHYSICAL memory location by setting the TLB entry properly. The executing program does not even know if the page has been swapped out to disk.

The last time I looked, Windows allocated a disk swap file the same size as memory ( C:\pagefile.sys ). You can explicitly set the size of this file, even to 0 bytes .... but when you run low on memory, the OS will kill stuff "Out of Memory".

Virtualbox is just a program in memory that runs on top of your OS and you set the memory size that virtualbox is allowed to use. I usually set virtualbox to be able to use about 50% of my physical memory BUT I have 16gb or more on my systems.



It's hard to get Virtualbox working correctly - once the versions of BOINC available so far have detected that virtualization is not enabled in the BIOS or the UEFI, they will remember this forever and prevent the test of whether it is enabled from being run again.

Also, all of the Virtualbox workunits I've seen much about so far seize 4 GB of physical memory, and won't allow any of it to be paged. I'm hoping that a new version of Virtualbox will remove this restriction.

As far as I know, Virtualbox can handle 32-bit workunits, but not 64-bit workunits.


I regularly use Virtualbox to build Linux images on machines and none of my comments were about the pre-configured BOINC VIRTUALBOX implementation. I have no experience with BOINC packaged Virtualbox.

I imagine that BOINC projects choose to use the BOINC Virtualbox so they can control the execution environment and quality of data generated very closely. 32-bit only probably makes sense for BOINC Virtualbox in that case.






[FI] OIKARINEN

Joined: Nov 16 13
Posts: 6
ID: 486809
Credit: 131,480
RAC: 0
Message 79669 - Posted 1 Mar 2016 14:15:41 UTC

I've been running the 3.71 version of Rosetta for 2 days, and I just noticed a lot of crashing workunits running on different computers. All of those WUs have this attached:

ERROR: unrecognized residue AX1
ERROR:: Exit from: ..\..\..\src\core\io\pdb\file_data.cc line: 2077
BOINC:: Error reading and gzipping output datafile: default.out
____________
Life is too short to live concerned about its mysteries.

Jim1348

Joined: Jan 19 06
Posts: 65
ID: 52455
Credit: 2,074,859
RAC: 9,392
Message 79674 - Posted 1 Mar 2016 19:15:19 UTC - in response to Message ID 79624.
Last modified: 1 Mar 2016 19:16:18 UTC

It's hard to get Virtualbox working correctly - once the versions of BOINC available so far have detected that virtualization is not enabled in the BIOS or the UEFI, they will remember this forever and prevent the test of whether it is enabled from being run again.

They have a solution to that problem in the Cosmology FAQs:
I enabled VT-x/AMD-v but jobs say “Scheduler wait: Please upgrade BOINC”

Also, all of the Virtualbox workunits I've seen much about so far seize 4 GB of physical memory, and won't allow any of it to be paged. I'm hoping that a new version of Virtualbox will remove this restriction.

I think that just depends on the application. ATLAS and vLHC take a lot of memory, but Cosmology does not that I recall.

I have had some problems with VirtualBox interfering with some other programs (both CPU and GPU, even non-BOINC ones), but not with the VBox programs themselves. I just use the pre-packaged versions on the CERN projects and Cosmology, but they all went easily enough, though you do need to watch the memory. If VBox would be of any use for Rosetta, I would be willing to try it here.
____________

Snagletooth

Joined: Feb 22 07
Posts: 191
ID: 149031
Credit: 1,392,338
RAC: 1,462
Message 79704 - Posted 7 Mar 2016 16:42:12 UTC
Last modified: 7 Mar 2016 16:55:38 UTC

Both of my computers received 24-hour back-offs after a single request for work resulted in this reply:

Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.

*****Wild Speculation Alert*****
If the amount of Android tasks exceeds the number of available devices by too great a number and/or fail at too high a rate then the new tasks/resends could be clogging the queue. As long as there are in fact plenty of cpu tasks to crunch, a 24 hour back-off would seem excessive.


I should add that I only became concerned because I recently reduced my preferred cpu runtime and my cache and set other projects to no new tasks (preparing for a possible imminent shut down of computers for an indeterminate period of time) so this 24 hour back-off actually led to no tasks crunching at all. Otherwise I might have noticed but not been concerned enough to explore the possible causes or to comment.

I only comment now on the possibility that this back-off interval could be changed to something shorter. I know the project doesn't want a bunch of computers asking every 5 minutes while there's a clog but if it is a predictable clog and you can see how long it typically lasts perhaps you could adjust the back-off accordingly. Would anything longer than the 6 hour default target runtime really be necessary? Although not a big deal in the overall scheme of things, would things run somewhat smoother, on both sides of the connection, if crunchers weren't left to go idle unnecessarily for such a long time?

Best,
Snags

edit: I just saw additional posts in this thread that suggest rosie really did run out of cpu tasks. Ah, well. I suppose I should see if I can find BOINC documentation on the back-off settings (documentation that I could actually understand, that is) : /

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79708 - Posted 7 Mar 2016 20:29:32 UTC - in response to Message ID 79704.

Both of my computers received 24-hour back-offs after a single request for work resulted in this reply:
Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.

*****Wild Speculation Alert*****
If the amount of Android tasks exceeds the number of available devices by too great a number and/or fail at too high a rate then the new tasks/resends could be clogging the queue. As long as there are in fact plenty of cpu tasks to crunch, a 24 hour back-off would seem excessive.


I should add that I only became concerned because I recently reduced my preferred cpu runtime and my cache and set other projects to no new tasks (preparing for a possible imminent shut down of computers for an indeterminate period of time) so this 24 hour back-off actually lead to no tasks crunching at all. Otherwise I might have noticed but not been concerned enough to explore the possible causes or to comment.

I only comment now in the possibility that this back-off interval could be changed to something shorter. I know the project doesn't want a bunch of computer asking every 5 minutes while there's a clog but if it is a predictable clog and you can see how long it typically lasts perhaps you could adjust the back-off accordingly. Would anything longer than the 6 hour default target runtime really be necessary? Although not a big deal in the overall scheme of things, would things run somewhat smoother, on both sides of the connection, if crunchers weren't left to go idle unnecessarily for such a long

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79709 - Posted 7 Mar 2016 20:31:33 UTC - in response to Message ID 79704.

Both of my computers received 24-hour back-offs after a single request for work resulted in this reply:
Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.

*****Wild Speculation Alert*****
If the amount of Android tasks exceeds the number of available devices by too great a number and/or fail at too high a rate then the new tasks/resends could be clogging the queue. As long as there are in fact plenty of cpu tasks to crunch, a 24 hour back-off would seem excessive.


I should add that I only became concerned because I recently reduced my preferred cpu runtime and my cache and set other projects to no new tasks (preparing for a possible imminent shut down of computers for an indeterminate period of time) so this 24 hour back-off actually lead to no tasks crunching at all. Otherwise I might have noticed but not been concerned enough to explore the possible causes or to comment.

I only comment now in the possibility that this back-off interval could be changed to something shorter. I know the project doesn't want a bunch of computer asking every 5 minutes while there's a clog but if it is a predictable clog and you can see how long it typically lasts perhaps you could adjust the back-off accordingly. Would anything longer than the 6 hour default target runtime really be necessary? Although not a big deal in the overall scheme of things, would things run somewhat smoother, on both sides of the connection, if crunchers weren't left to go idle unnecessarily for such a long

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79710 - Posted 7 Mar 2016 20:31:57 UTC - in response to Message ID 79704.

Both of my computers received 24-hour back-offs after a single request for work resulted in this reply:
Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.
[snip]


I've seen a similar problem twice. I have an Android device in addition to my Windows devices, but so far I have BOINC installed only on the Windows devices.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79711 - Posted 7 Mar 2016 20:33:33 UTC - in response to Message ID 79704.

Both of my computers received 24-hour back-offs after a single request for work resulted in this reply:
Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.

*****Wild Speculation Alert*****
If the amount of Android tasks exceeds the number of available devices by too great a number and/or fail at too high a rate then the new tasks/resends could be clogging the queue. As long as there are in fact plenty of cpu tasks to crunch, a 24 hour back-off would seem excessive.


I should add that I only became concerned because I recently reduced my preferred cpu runtime and my cache and set other projects to no new tasks (preparing for a possible imminent shut down of computers for an indeterminate period of time) so this 24 hour back-off actually lead to no tasks crunching at all. Otherwise I might have noticed but not been concerned enough to explore the possible causes or to comment.

I only comment now in the possibility that this back-off interval could be changed to something shorter. I know the project doesn't want a bunch of computer asking every 5 minutes while there's a clog but if it is a predictable clog and you can see how long it typically lasts perhaps you could adjust the back-off accordingly. Would anything longer than the 6 hour default target runtime really be necessary? Although not a big deal in the overall scheme of things, would things run somewhat smoother, on both sides of the connection, if crunchers weren't left to go idle unnecessarily for such a long

iriemon

Joined: Jan 16 16
Posts: 4
ID: 1218845
Credit: 73,745
RAC: 0
Message 79715 - Posted 8 Mar 2016 15:26:32 UTC

Any news on getting the communication problem fixed? Been sitting here for 2 days without any new work being available.....

iriemon

Joined: Jan 16 16
Posts: 4
ID: 1218845
Credit: 73,745
RAC: 0
Message 79717 - Posted 8 Mar 2016 15:34:11 UTC - in response to Message ID 79715.

Any news on getting the communication problem fixed? Been sitting here for 2 days without any new work being available.....



For some reason, I decided to clear my IE cache and then tried to dl a new work unit and to my surprise IT WORKED! Happily crunching......

Dr. Merkwürdigliebe Profile

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 79721 - Posted 8 Mar 2016 17:10:06 UTC
Last modified: 8 Mar 2016 17:11:04 UTC

Failed task

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79733 - Posted 8 Mar 2016 21:24:02 UTC - in response to Message ID 79717.

Any news on getting the communication problem fixed? Been sitting here for 2 days without any new work being available.....



For some reason, I decided to clear my IE cache and then tried to dl a new work unit and to my surprise IT WORKED! Happily crunching......


I decided to try that on my Windows 10 computer. Surprise - if Windows 10 even includes IE, it is very well hidden.

I told BOINC Manager to update for Rosetta@home anyway - it downloaded a workunit.

It looks likely that the problem is fixed on the server and IE is not involved.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79756 - Posted 15 Mar 2016 20:51:49 UTC

801194890

Starting work on structure: _00002
[2016- 3-15 20:35:13:] :: BOINC:: Initializing ... ok.
[2016- 3-15 20:35:13:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
failed to create shared mem segment: minirosetta Size: 25001672


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0085EEB0 write attempt to address 0x017D7EC1

____________

ArcSedna

Joined: Oct 23 11
Posts: 6
ID: 434280
Credit: 12,613,518
RAC: 71
Message 79792 - Posted 23 Mar 2016 21:23:21 UTC

Some workunits hang up for long hours until manual termination.

They have strings like
EN_MAP_hyb_cst
EN_MAP_cst
RE_MAP_hyb_cst
RE_MAP_cst
in the middle of the name.

Sample (Already aborted)

Their behavior is 'do nothing for a long time'. Looks like this:
Elapsed real time : 32 hours
Elapsed cpu time : 15 minutes

This is happening on my Mac computers. Windows and Linux seem to be OK.

OS : Mac OS X 10.11.3
Boinc : 7.2.42
Memory : 8GB to 16GB

Thanks.

James Adrian

Joined: Apr 27 12
Posts: 3
ID: 449796
Credit: 827,915
RAC: 581
Message 79799 - Posted 26 Mar 2016 17:17:12 UTC

Has anyone else gotten work units for Minirosetta 3.71 that are estimated to run 14 days? I'm running on an old (2009) Mac with 8GB of memory and lately I've gotten these here and there.

Thanks

Boinc 7.6.22
Mac OS 10.11.4
____________

James Adrian

Joined: Apr 27 12
Posts: 3
ID: 449796
Credit: 827,915
RAC: 581
Message 79800 - Posted 26 Mar 2016 17:51:16 UTC - in response to Message ID 79792.

ArcSedna,

I just saw your post, once I sorted to see newest first. My problem seems slightly different but like you I see the problem with work units named as in your post. One other observation: I have a newer Mac laptop but so far I have not seen the problem with the work units on it, just on my older iMac.
____________

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79801 - Posted 27 Mar 2016 0:21:12 UTC - in response to Message ID 79799.

Has anyone else gotten work units for Minirosetta 3.71 that are estimated to run 14 days? I'm running on an old (2009) Mac with 8GB of memory and lately I've gotten these here and there.

Thanks

Boinc 7.6.22
Mac OS 10.11.4


Rosetta appears OK.

I just set my Rosetta PREFERENCES: CPU TARGET RUNTIME = 14 hours, enabled Rosetta computing on one of my Linux 64-bit systems and Rosetta downloaded 50 14-hour jobs. I think the only difference in a default 6-hour job and a 14-hour job is what Rosetta sets in the "-cpu_run_time 21600" as a command line option. I don't think Rosetta jobs care what system they execute on .... MACOS, Windows, Linux86/64.

I think ALL Rosetta jobs are set up to just try 99 decoys. The cpu_run_time limit set by the user is checked after every decoy. If that time is exceeded, Rosetta wraps up the job and it is completed early, before the 99 decoy limit is reached.

NOTE: THIS is one reason why it is very, very tough to compare system performances. Jobs will only stop before the preference time IF and ONLY IF they complete the 99 decoys.

I would guess that your event log is showing some problem with DISK SPACE available for Rosetta. BOINC has 3 possible limits on disk and I always seem to hit them accidentally:

1. maximum amount used
2. amount to leave free
3. maximum % of disk
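
As a rough illustration of how those three limits could interact, here is a small Python sketch. It assumes the most restrictive limit wins; the function and parameter names are made up for the example and are not taken from the BOINC client source.

```python
def allowed_disk_gb(total_gb, free_gb, boinc_used_gb,
                    max_used_gb, min_free_gb, max_used_pct):
    """Return how much disk BOINC may use in total under the three limits."""
    # 1. maximum amount used
    limit_absolute = max_used_gb
    # 2. amount to leave free: what BOINC already holds plus the free space
    #    that remains after the reserve is set aside
    limit_free = boinc_used_gb + free_gb - min_free_gb
    # 3. maximum % of disk
    limit_percent = total_gb * max_used_pct / 100.0
    # Assumption: the smallest of the three is the one that applies.
    return max(0.0, min(limit_absolute, limit_free, limit_percent))

# A 500 GB disk with only 20 GB free: the "leave free" limit is the one that bites.
print(allowed_disk_gb(total_gb=500, free_gb=20, boinc_used_gb=5,
                      max_used_gb=100, min_free_gb=15, max_used_pct=90))
```

In this example the generous "maximum amount used" and "maximum %" settings do not help, because the nearly full disk leaves only 10 GB within the "leave free" rule.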


A SAMPLE command line follows. It will differ only in the leading OS-specific executable name; the command line is added by the Rosetta server when it dispatches the job to a system.

command: minirosetta_3.71_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 15 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip NTF2_215_N90N92K61_4_9_1_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000
-cpu_run_time 21600
-checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3415526
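
To check which run time a given task was dispatched with, the value can be pulled out of a command line like the one above. This is a minimal Python sketch; option_value is a hypothetical helper, and the command string is shortened from the sample for readability.

```python
import shlex

def option_value(command, flag):
    """Return the token that follows flag in command, or None if absent."""
    tokens = shlex.split(command)
    for i, tok in enumerate(tokens[:-1]):
        if tok == flag:
            return tokens[i + 1]
    return None

# Shortened version of the sample command line.
command = ("minirosetta_3.71_x86_64-apple-darwin -nstruct 10000 "
           "-cpu_run_time 21600 -checkpoint_interval 120")
seconds = int(option_value(command, "-cpu_run_time"))
print(seconds / 3600, "hours")
```

21600 seconds is the 6-hour default target run time; a 14-hour preference would show up as 50400.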


James Adrian

Joined: Apr 27 12
Posts: 3
ID: 449796
Credit: 827,915
RAC: 581
Message 79805 - Posted 27 Mar 2016 16:27:07 UTC - in response to Message ID 79801.

rjs5, thanks for all the info!

I checked the logs and didn't find any errors and prefs show 6 hours as you mentioned below. If it happens again I'll wait for the 6 hour mark, just so I can see what happens. (:-)


Rosetta appears OK.

I just set my Rosetta PREFERENCES: CPU TARGET RUNTIME = 14 hours, enabled Rosetta computing on one of my Linux 64-bit systems and Rosetta downloaded 50 14-hour jobs. I think the only difference in a default 6-hour job and a 14-hour job is what Rosetta sets in the "-cpu_run_time 21600" as a command line option. I don't think Rosetta jobs care what system they execute on .... MACOS, Windows, Linux86/64.

I think ALL Rosetta jobs are set up to just try 99 decoys. The cpu_run_time limit set by the user is checked after every decoy. If that time is exceeded, Rosetta wraps up the job and it is completed early, before the 99 decoy limit is reached.

NOTE: THIS is one reason why it is very, very tough to compare system performances. Jobs will only stop before the preference time IF and ONLY IF they complete the 99 decoys.

I would guess that your event log is showing some problem with DISK SPACE available for Rosetta. BOINC has 3 possible limits on disk and I always seem to hit them accidentally:

1. maximum amount used
2. amount to leave free
3. maximum % of disk


A SAMPLE command line follows. It will differ only in the leading OS-specific executable name; the command line is added by the Rosetta server when it dispatches the job to a system.

command: minirosetta_3.71_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 15 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip NTF2_215_N90N92K61_4_9_1_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000
-cpu_run_time 21600
-checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3415526




____________

Snagletooth

Joined: Feb 22 07
Posts: 191
ID: 149031
Credit: 1,392,338
RAC: 1,462
Message 79806 - Posted 27 Mar 2016 17:48:48 UTC - in response to Message ID 79801.



I think ALL Rosetta jobs are set up to just try 99 decoys. The cpu_run_time limit set by the user is checked after every decoy. If that time is exceeded, Rosetta wraps up the job and it is completed early, before the 99 decoy limit is reached.

... Jobs will only stop before the preference time IF and ONLY IF they complete the 99 decoys.

Not all. If you check any FFD_ tasks in your list you will see they generate many hundreds of models (I have several with over 1000 models generated).

If memory serves, the 99 model limit was enacted when some tasks created output files too large to be uploaded. The limit only applies to a particular type of task. Others use the preferred cpu time plus 4 method to determine when to end things. When a model is completed the task calculates whether it has time left to complete another model. If the answer is no then the task wraps things up despite there appearing (to the cruncher) to be hours left. If the answer is yes the task will begin another model. All models aren't equal however, even within the same task, so some will take longer than predicted. To ensure that otherwise good models aren't cut short just before completing (and to increase the odds that the task will complete at least one model) the task will continue past the preferred cpu time. At some point though, you gotta cut your losses and so at preferred cpu time plus 4 hours the watchdog cuts bait and the task goes home. (I'm curious about the average overtime; my totally uninformed guess is that it's less than an hour.)
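
The stopping rule described above can be sketched roughly as follows. This is a toy simulation, not Rosetta's code: the function name is hypothetical, and predicting the next model from the average of the completed ones is purely my assumption about how "time left for another model" might be estimated.

```python
WATCHDOG_GRACE_S = 4 * 3600  # watchdog fires at preferred cpu time plus 4 hours


def run_task(model_times_s, cpu_run_time_pref_s):
    """Simulate one task; return (models_completed, cpu_seconds, reason)."""
    elapsed = 0.0
    completed = 0
    deadline = cpu_run_time_pref_s + WATCHDOG_GRACE_S
    for actual in model_times_s:
        if completed:
            # Assumed predictor: the next model takes the average time so far.
            predicted = elapsed / completed
            if elapsed + predicted > cpu_run_time_pref_s:
                return completed, elapsed, "no time for another model"
        if elapsed + actual > deadline:
            # The model ran far longer than predicted; the watchdog ends the task.
            return completed, deadline, "watchdog"
        elapsed += actual
        completed += 1
    return completed, elapsed, "ran out of models"


if __name__ == "__main__":
    # One-hour models with a 6-hour preference: stops cleanly after 6 models.
    print(run_task([3600] * 10, 6 * 3600))
    # Third model is a monster: the watchdog cuts it off at 3 + 4 = 7 hours.
    print(run_task([3600, 3600, 30 * 3600], 3 * 3600))
```

The second case is the one that looks alarming from the cruncher's side: hours of cpu time past the preference and only two models to show for it.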

There are other types of tasks in which filters are employed to cut off models early. If the model passes the filter it will continue working on that one task to the end. This results in dramatically disparate counts, with one task generating hundreds of models while another task from the same batch generates only one, two, five, etc. Recently on ralph a filter was used to remove models, resulting in a file transfer error upon upload. The stderr output listed 13 models from 2 attempts but since the models had been erased the file meant to contain them didn't exist. I'm guessing, based on DEK's post, which I may well have misinterpreted, that the server, possibly as part of a validation check, automatically gives the file transfer error (client error, compute error) when this particular file isn't part of the upload.

All these different strategies result, from the cruncher's point of view, in varied behavior which we struggle to interpret. Is it a problem with my computer or a problem with rosetta? Is it a problem at all? BOINC is complicated enough for the computer savvy, much more so for the majority of crunchers who just want to maximize their participation in rosetta and end up massively tangled up in the BOINC settings. The variety of legitimate behaviors exhibited by rosetta tasks trips up the volunteers trying to help them become untangled. From the researcher's point of view everything may look fine, working as expected, and any issues a lone cruncher is having are most likely due to their particular setup. And they probably are, but the lack of information leaves the volunteers flailing.

I have long wished for a reference, a database of tasks, in which the tasks are divided into broad categories of strategies employed (as above, with some info on how they "look" to the crunchers) and what, in a most basic way, is being asked (how does this particular protein fold, how do these two proteins interact, can we create a new protein to do x, etc.)

Best,
Snags

krypton
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 16 11
Posts: 105
ID: 436004
Credit: 1,796,601
RAC: 1,887
Message 79807 - Posted 28 Mar 2016 0:55:42 UTC

Thanks for the report you guys!

I'm responsible for the *MAP* jobs. I'm getting 90% success, which is "normal", but if it turns out that part of the 10% that fail are coming from mac(s), we could fix this!

I'll do some local tests on my mac.

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 79809 - Posted 28 Mar 2016 8:32:49 UTC

Hi krypton.

I've had 9 of your tasks fail on one rig just today so far, all with the same error like so.

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1622019

P76481_PF12034_90-575_300-486_EN_MAP_hyb_cst_v02_i01_t000__krypton_SAVE_ALL_OUT_03_09_341621_123_0

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>

(EDITED OUT THE REST)

Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_f513f38.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/P76481_PF12034_90-575_300-486_EN_MAP_hyb_cst_v02_i01_t000__krypton.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
Stack trace (22 frames):
[0xd98d38f]
[0xb7766404]
[0xb6d9849]
[0xb8ef314]
[0xb8f1a90]
[0xb8f4b33]
[0xb90ae55]
[0xb7ecda9]
[0xb8cebea]
[0xc2ff844]
[0xc31427f]
[0xabe3c0b]
[0x8d92b93]
[0xb04b065]
[0xb05021c]
[0xb0f6a35]
[0xb0f959e]
[0xb1b8bc3]
[0xb1b524d]
[0x8057071]
[0xda24988]
[0x8048131]

Exiting...

</stderr_txt>
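
A small aside on reading the "process exited with code" line: judging only from the two examples in this thread (193 and 1), the first number in parentheses is the exit code in hex and the second looks like the code minus 256. The sketch below just checks that pattern; the helper name is made up and the "minus 256" reading is an inference from those two examples, not documented behavior.

```python
import re

EXIT_RE = re.compile(
    r"process exited with code (\d+) \(0x([0-9a-fA-F]+), (-?\d+)\)"
)


def parse_exit_line(line):
    """Parse a BOINC 'process exited with code' message, or return None."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return {
        "code": code,
        # Does the hex field equal the decimal code?
        "hex_matches": int(m.group(2), 16) == code,
        # Does the last field equal code - 256 (assumed pattern)?
        "minus_256_matches": int(m.group(3)) == code - 256,
    }


if __name__ == "__main__":
    print(parse_exit_line("process exited with code 193 (0xc1, -63)"))
```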

____________


Cap

Joined: Aug 29 11
Posts: 3
ID: 428675
Credit: 3,619,613
RAC: 3,284
Message 79812 - Posted 28 Mar 2016 18:47:57 UTC

I have had several tasks fail and some of them fail to exit leaving a boinc slot in use but no processing being done. Those I had to force quit. They all have an error message from malloc saying that a free was done on a block that was not allocated or a block was corrupted after being freed. Seems that the app is using a block after it has been freed.

I don't know why boinc isn't cleaning up after some of these.

krypton
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 16 11
Posts: 105
ID: 436004
Credit: 1,796,601
RAC: 1,887
Message 79814 - Posted 28 Mar 2016 20:13:13 UTC
Last modified: 28 Mar 2016 21:44:12 UTC

The error turned out to be related to a new rotamer library we are using, which I happened to enable for the *MAP* jobs. I confirmed it on my mac; it appears to happen only on macs (and some older linux machines).

I currently have no more jobs in the queue. For all future jobs I'll be reverting to the older rotamer library until the error is fixed! Thanks for the examples, they were helpful in debugging.

Update:
I just submitted a new batch of jobs *REDO_MAP*, if you get any errors from these, please report!

Thanks,
-krypton

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79826 - Posted 31 Mar 2016 18:16:25 UTC

I just updated the minirosetta app to 3.73. This version includes new protocols, including the remodel protocol for design, and various bug fixes. It uses the latest Rosetta source.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79828 - Posted 1 Apr 2016 5:44:19 UTC

Seems that the memory problem of 3.72 is not completely resolved:
806102354
806102335
____________

Chilean Profile

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 79840 - Posted 4 Apr 2016 16:21:15 UTC

Maybe instead of the blue, make the graphics "fit" the more common widescreen configuration. That way there won't be any "blank space" on the right of modern monitors.
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 79846 - Posted 5 Apr 2016 6:32:44 UTC

Forgot to mention, I added a project specific option for the black screen (like before).

Chilean Profile

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 79863 - Posted 9 Apr 2016 12:14:38 UTC - in response to Message ID 79846.

Forgot to mention, I added a project specific option for the black screen (like before).


Oh. Thanks!
____________

Mark Kramer

Joined: Jun 25 10
Posts: 5
ID: 384934
Credit: 74,534
RAC: 0
Message 79918 - Posted 24 Apr 2016 1:15:02 UTC
Last modified: 24 Apr 2016 1:16:16 UTC

Mine seems to be locking up since Friday 4/22/16. I run both SETI and Rosetta but each time it's locked up, it's been stuck on Rosetta. It's running on an XP machine so I don't know if that's the issue. System logs don't show anything other than events stop happening. I've already uninstalled as well as completely wiped the BOINC folder, then reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself when I've been using the computer despite my preferences.

Since I haven't seen any other entries on the BOINC/SETI/Rosetta board, I presume that it is this computer or XP. As it stands, I'm going to remove BOINC for a while since the issues with it just turning itself on while the computer is in use as well as it locking up have become too much to ignore. However, if this is happening to anyone else and they worked around it, feel free to post. I also wasn't sure which thread to post this in so I posted it in the top 2 threads concerning technical problems.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79919 - Posted 24 Apr 2016 2:45:37 UTC - in response to Message ID 79918.

Mine seems to be locking up since Friday 4/22/16. I run both SETI and Rosetta but each time it's locked up, it's been stuck on Rosetta. It's running on an XP machine so I don't know if that's the issue. System logs don't show anything other than events stop happening. I've already uninstalled as well as completely wiped the BOINC folder, then reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself when I've been using the computer despite my preferences.

Since I haven't seen any other entries on the BOINC/SETI/Rosetta board, I presume that it is this computer or XP. As it stands, I'm going to remove BOINC for a while since the issues with it just turning itself on while the computer is in use as well as it locking up have become too much to ignore. However, if this is happening to anyone else and they worked around it, feel free to post. I also wasn't sure which thread to post this in so I posted it in the top 2 threads concerning technical problems.


Does the SETI@HOME work use a graphics card with an Nvidia GPU? If so, the 364.* series of drivers for Nvidia GPUs has problems, so you might want to check whether going back to the 362.00 driver fixes any problems for you.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,056,972
RAC: 5,479
Message 79931 - Posted 25 Apr 2016 16:11:48 UTC

I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.

Rosetta Mini for Android is not available for your type of computer.

Do Network Communication successfully reports the task.

I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer

LarryMajor

Joined: Apr 1 16
Posts: 4
ID: 1278313
Credit: 1,616,161
RAC: 7,072
Message 79941 - Posted 26 Apr 2016 19:50:58 UTC

Same problem still exists today.
It happened to two of my machines, running 32 and 64 bit Linux 3.16.0-4. Forcing an update reports/fetches jobs and clears the 24 hour wait time, and reports normally for about 12 hours or so, when the cycle repeats.

Mark Kramer

Joined: Jun 25 10
Posts: 5
ID: 384934
Credit: 74,534
RAC: 0
Message 79942 - Posted 26 Apr 2016 20:20:39 UTC - in response to Message ID 79919.

Mine seems to be locking up since Friday 4/22/16. I run both SETI and Rosetta but each time it's locked up, it's been stuck on Rosetta. It's running on an XP machine so I don't know if that's the issue. System logs don't show anything other than events stop happening. I've already uninstalled as well as completely wiped the BOINC folder, then reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself when I've been using the computer despite my preferences.

Since I haven't seen any other entries on the BOINC/SETI/Rosetta board, I presume that it is this computer or XP. As it stands, I'm going to remove BOINC for a while since the issues with it just turning itself on while the computer is in use as well as it locking up have become too much to ignore. However, if this is happening to anyone else and they worked around it, feel free to post. I also wasn't sure which thread to post this in so I posted it in the top 2 threads concerning technical problems.


Does the SETI@HOME work use a graphics card with an Nvidia GPU? If so, the 364.* series of drivers for Nvidia GPUs has problems, so you might want to check whether going back to the 362.00 driver fixes any problems for you.


It does but, as it's a 9600GT, the highest driver is a 340.22. As I was running 337.88, I updated and then tried again. It has now outright crashed twice when I was running other programs. (Starcraft 2 when I was redoing graphics settings and SWTOR just now.) Because both crashes completely locked up the system to the point of needing a reboot, I couldn't check task manager to see if Minirosetta had started itself again. Reviewing the logs under admin tools didn't show me anything either.

I've uninstalled BOINC again and I'm just going to run this computer as normal for the next two days. If it crashes again during that time, then I'll know that it's something else wrong with the computer. If it doesn't, then I'm probably going to lean towards it being an XP/older graphics card conflict with BOINC.

Mark Kramer

Joined: Jun 25 10
Posts: 5
ID: 384934
Credit: 74,534
RAC: 0
Message 79944 - Posted 27 Apr 2016 0:04:20 UTC - in response to Message ID 79942.

Follow-up: The outright crashes seem to have been caused by the driver upgrade so I reverted it back to 337.88. That doesn't resolve the problem with minirosetta locking up or starting despite preferences but I know it wasn't causing outright crashes.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,056,972
RAC: 5,479
Message 79950 - Posted 27 Apr 2016 16:43:43 UTC

I'm seeing a lot of robetta tasks crash immediately : the relevant line in the task log seems to be:

ERROR: Unable to open weights/patch file. None of (./)beta_cart or (./)beta_cart.wts or minirosetta_database/scoring/weights/beta_cart or minirosetta_database/scoring/weights/beta_cart.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 2884
[0x4485e82]

Sample tasks:

815715361
815591894

Boinc 7.2.42
Ubuntu 14.04
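
For anyone who wants to check their own slot directory, the error message above lists four places the weights file is looked for. The sketch below simply rebuilds that list and reports which candidate, if any, exists. The helper names are mine, and the search order is taken from the error text only, not from the Rosetta source.

```python
import os


def weights_candidates(name, database="minirosetta_database"):
    """Rebuild the four candidate paths named in the error message."""
    weights_dir = os.path.join(database, "scoring", "weights")
    return [
        name,                                      # (./)beta_cart
        name + ".wts",                             # (./)beta_cart.wts
        os.path.join(weights_dir, name),           # database copy
        os.path.join(weights_dir, name + ".wts"),  # database copy with .wts
    ]


def find_weights(name, workdir="."):
    """Return the first candidate that exists under workdir, else None."""
    for rel in weights_candidates(name):
        if os.path.isfile(os.path.join(workdir, rel)):
            return rel
    return None


if __name__ == "__main__":
    found = find_weights("beta_cart")
    print(found if found else "beta_cart weights not found in any candidate path")
```

Run it from inside the slot directory of a failing task (or pass that directory as workdir). If none of the four paths exist, the fault would seem to lie with the database or the job setup rather than the local machine.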

tortuga1

Joined: Oct 16 08
Posts: 1
ID: 284174
Credit: 734,150
RAC: 0
Message 79951 - Posted 27 Apr 2016 21:39:36 UTC - in response to Message ID 79950.

I'm seeing a lot of robetta tasks crash immediately : the relevant line in the task log seems to be:

ERROR: Unable to open weights/patch file. None of (./)beta_cart or (./)beta_cart.wts or minirosetta_database/scoring/weights/beta_cart or minirosetta_database/scoring/weights/beta_cart.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 2884
[0x4485e82]

Sample tasks:

815715361
815591894

Boinc 7.2.42
Ubuntu 14.04

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79953 - Posted 27 Apr 2016 23:19:20 UTC - in response to Message ID 79950.

I'm seeing a lot of robetta tasks crash immediately : the relevant line in the task log seems to be:

ERROR: Unable to open weights/patch file. None of (./)beta_cart or (./)beta_cart.wts or minirosetta_database/scoring/weights/beta_cart or minirosetta_database/scoring/weights/beta_cart.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 2884
[0x4485e82]

Sample tasks:

815715361
815591894

Boinc 7.2.42
Ubuntu 14.04


I think the error happens earlier than where you have pointed. The actual error is happening right at startup where Boinc is initializing the "slot" directory.

The source code seems to point to a missing *.xml file that it is expecting which has the CDATA string and Rosetta then prints the "message".

My guess would be .... out of disk space or a malformed rb* job.

You might do an "ldd" on the rosetta graphics binary "minirosetta_graphics_3.73_x86_64-pc-linux-gnu" and make sure that it finds all the dynamic libraries.

A "not found" would be a problem.
sh-4.3$ ldd minirosetta_graphics_3.73_x86_64-pc-linux-gnu
linux-vdso.so.1 (0x00007ffd343c0000)
libGLU.so.1 => /lib64/libGLU.so.1 (0x00007fdcfa617000)
libGL.so.1 => /lib64/libGL.so.1 (0x00007fdcfa37f000)
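
If you would rather not eyeball the ldd output, a few lines of Python can pick out the "not found" entries. This is only a sketch: the function names are mine, and it assumes ldd prints missing libraries in the usual "name => not found" form.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "not found" in line:
            # Lines look like: "libGLU.so.1 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary and return its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    sample = (
        "\tlinux-vdso.so.1 (0x00007ffd343c0000)\n"
        "\tlibGLU.so.1 => not found\n"
        "\tlibGL.so.1 => /lib64/libGL.so.1 (0x00007fdcfa37f000)\n"
    )
    print(missing_libraries(sample))
```

An empty list means every dynamic library was resolved, so the graphics binary is not the problem.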




YOUR JOB....

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255) <<<<<<<<<<<<<<<<<<<<<< THE ACTUAL ERROR
</message>
<stderr_txt>
[2016- 4-27 8: 5:32:] :: BOINC:: Initializing ... ok.
[2016- 4-27 8: 5:32:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.73_x86_64-pc-linux-gnu @flags_rb_04_26_64750_109092__t000__ab_robetta -in:file:boinc_wu_zip input_rb_04_26_64750_109092__t000__ab_robetta.zip -in:file:fasta t000_.fasta -frag3 t000_.200.3mers.index.gz -fragB t000_.200.3mers.index.gz -fragA t000_.200.9mers.index.gz -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1109089
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok

ERROR: Unable to open weights/patch file. None of (./)beta_cart or (./)beta_cart.wts or minirosetta_database/scoring/weights/beta_cart or minirosetta_database/scoring/weights/beta_cart.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 2884
[0x4485e82]
[0x3457456]
[0x34579d8]
[0x346a47c]




An rb* from one of my systems.

Task ID 814246795
Name rb_04_23_64912_108999__t000__ab_robetta_IGNORE_THE_REST_347209_9056_0



<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
[2016- 4-24 13:38:42:] :: BOINC:: Initializing ... ok.
[2016- 4-24 13:38:42:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.73_x86_64-pc-linux-gnu @rb_04_22_64328_108946_ab_stage0_t000___robetta_FLAGS -psipred_ss2 t000_.psipred_ss2 -in::file::fasta t000_.fasta -kill_hairpins t000_.nobuformat.psipred_ss2 -in:file:boinc_wu_zip rb_04_22_64328_108946_ab_stage0_t000___robetta.zip -frag3 rb_04_22_64328_108946_ab_stage0_t000___robetta_t000_.200.3mers.index.gz -fragA rb_04_22_64328_108946_ab_stage0_t000___robetta_t000_.200.5mers.index.gz -fragB rb_04_22_64328_108946_ab_stage0_t000___robetta_t000_.200.3mers.index.gz -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2103181
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_d0bf94b.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/rb_04_22_64328_108946_ab_stage0_t000___robetta.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
# cpu_run_time_pref: 86400
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,056,972
RAC: 5,479
Message 79954 - Posted 28 Apr 2016 0:28:32 UTC - in response to Message ID 79953.

Thanks for the suggestions.

I don't think it's disk space: I'm using 1.5GB out of 100GB available to Boinc.

In fact I don't think it's anything local to my machine at all since those tasks were given out again to a couple of wingmen: they failed in what seems to be the same fashion.

(I tried the ldd command you suggested: output below)

svincent@svincent-desktop:~/BOINC/projects/boinc.bakerlab.org_rosetta$ ldd minirosetta_graphics_3.73_x86_64-pc-linux-gnu
linux-vdso.so.1 => (0x00007ffc460a5000)
libGLU.so.1 => /usr/lib/x86_64-linux-gnu/libGLU.so.1 (0x00007f102c488000)
libGL.so.1 => /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1 (0x00007f102c222000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f102bf1e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f102bc18000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f102ba02000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f102b63d000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f102b308000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f102b0ea000)
libglapi.so.0 => /usr/lib/x86_64-linux-gnu/libglapi.so.0 (0x00007f102aec3000)
libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f102acb1000)
libXdamage.so.1 => /usr/lib/x86_64-linux-gnu/libXdamage.so.1 (0x00007f102aaae000)
libXfixes.so.3 => /usr/lib/x86_64-linux-gnu/libXfixes.so.3 (0x00007f102a8a8000)
libX11-xcb.so.1 => /usr/lib/x86_64-linux-gnu/libX11-xcb.so.1 (0x00007f102a6a6000)
libxcb-glx.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-glx.so.0 (0x00007f102a48f000)
libxcb-dri2.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-dri2.so.0 (0x00007f102a28a000)
libxcb-dri3.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-dri3.so.0 (0x00007f102a087000)
libxcb-present.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-present.so.0 (0x00007f1029e84000)
libxcb-sync.so.1 => /usr/lib/x86_64-linux-gnu/libxcb-sync.so.1 (0x00007f1029c7e000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f1029a5f000)
libxshmfence.so.1 => /usr/lib/x86_64-linux-gnu/libxshmfence.so.1 (0x00007f102985d000)
libXxf86vm.so.1 => /usr/lib/x86_64-linux-gnu/libXxf86vm.so.1 (0x00007f1029657000)
libdrm.so.2 => /usr/lib/x86_64-linux-gnu/libdrm.so.2 (0x00007f102944a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1029246000)
/lib64/ld-linux-x86-64.so.2 (0x00007f102c6f6000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f1029042000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f1028e3c000)

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 79955 - Posted 28 Apr 2016 2:31:45 UTC - in response to Message ID 79954.


Looks fine. I think you are probably correct.

1StepO Profile

Joined: Dec 23 14
Posts: 2
ID: 1030169
Credit: 182,248
RAC: 139
Message 79957 - Posted 28 Apr 2016 11:38:16 UTC - in response to Message ID 79931.

I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.

Rosetta Mini for Android is not available for your type of computer.

Do Network Communication successfully reports the task.

I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer

I just discovered the same thing in a machine I just assembled. I have a hunch while I am waiting for the Volunteer at BOINC to get back to me: My box has only one 'new' component: the SSD. I purchased an economy SSD at Amazon which works perfectly otherwise. This is just a hunch. Could the memory chips in the SSD somehow be marked as for use in a cellphone? Forgive me if that is stupid. All my other components have crunched Rosetta before (all the other parts I used in this build.) I also have NO gripes pending on this SSD.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79958 - Posted 28 Apr 2016 12:26:50 UTC - in response to Message ID 79957.

I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.

Rosetta Mini for Android is not available for your type of computer.

Do Network Communication successfully reports the task.

I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer

I just discovered the same thing in a machine I just assembled. I have a hunch while I am waiting for the Volunteer at BOINC to get back to me: My box has only one 'new' component: the SSD. I purchased an economy SSD at Amazon which works perfectly otherwise. This is just a hunch. Could the memory chips in the SSD somehow be marked as for use in a cellphone? Forgive me if that is stupid. All my other components have crunched Rosetta before (all the other parts I used in this build.) I also have NO gripes pending on this SSD.


What operating system did you see this under? Windows, Linux, Android, or something else?

I've had many Rosetta Mini workunits complete on one of my computers, which came with an SSD as its main drive. It uses Windows 10.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 79975 - Posted 30 Apr 2016 3:01:11 UTC - in response to Message ID 79958.

I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.

Rosetta Mini for Android is not available for your type of computer.

Do Network Communication successfully reports the task.

I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer

I just discovered the same thing in a machine I just assembled. I have a hunch while I am waiting for the Volunteer at BOINC to get back to me: My box has only one 'new' component: the SSD. I purchased an economy SSD at Amazon which works perfectly otherwise. This is just a hunch. Could the memory chips in the SSD somehow be marked as for use in a cellphone? Forgive me if that is stupid. All my other components have crunched Rosetta before (all the other parts I used in this build.) I also have NO gripes pending on this SSD.


What operating system did you see this under? Windows, Linux, Android, or something else?

I've had many Rosetta Mini workunits complete on one of my computers, which came with an SSD as its main drive. It uses Windows 10.

It's happening everywhere, Win7 and no SSD here. Manually updating often fixes it, but in the meantime Boinc puts in a 24hr delay before retrying, which is very annoying if you don't notice.

____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79976 - Posted 30 Apr 2016 3:48:15 UTC - in response to Message ID 79975.

I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.

Rosetta Mini for Android is not available for your type of computer.

Do Network Communication successfully reports the task.

I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer

I just discovered the same thing in a machine I just assembled. I have a hunch while I am waiting for the Volunteer at BOINC to get back to me: My box has only one 'new' component: the SSD. I purchased an economy SSD at Amazon which works perfectly otherwise. This is just a hunch. Could the memory chips in the SSD somehow be marked as for use in a cellphone? Forgive me if that is stupid. All my other components have crunched Rosetta before (all the other parts I used in this build.) I also have NO gripes pending on this SSD.


What operating system did you see this under? Windows, Linux, Android, or something else?

I've had many Rosetta Mini workunits complete on one of my computers, which came with an SSD as its main drive. It uses Windows 10.

It's happening everywhere, Win7 and no SSD here. Manually updating often fixes it, but in the meantime Boinc puts in a 24hr delay before retrying, which is very annoying if you don't notice.


Something you might want to try: Check if restarting Win7, with no updates, fixes the problem, with a possible exception for the 24hr delay.

Also, you might want to check for a problem I've seen in my last three completed workunits: For two of them, the application apparently completed with no problem seen. But then the upload of the output files gave a compute error of the upload error type. For the third one, a wingmate had this upload error, but then my computer got the workunit, completed it properly, uploaded the output properly, and ended up with one more validated workunit. The computer appears to be about 5 hours into a 24hr delay; the other 19 hours should be more than enough to finish the other three Rosetta@home tasks it currently has.

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 79980 - Posted 30 Apr 2016 7:46:24 UTC

I have the same problem here on my Win7 Ultimate, no updates / changes from my side. Everything works fine up to now.

30.04.2016 11:19:55 | rosetta@home | Sending scheduler request: To fetch work.
30.04.2016 11:19:55 | rosetta@home | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU
30.04.2016 11:19:58 | rosetta@home | Scheduler request completed: got 0 new tasks
30.04.2016 11:19:58 | rosetta@home | No work sent
30.04.2016 11:19:58 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

Wtf, since when do I have an Android? What comes next? An alien? A zombie?
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 79982 - Posted 30 Apr 2016 12:05:29 UTC - in response to Message ID 79980.

I have the same problem here on my Win7 Ultimate, no updates / changes from my side. Everything works fine up to now.

30.04.2016 11:19:55 | rosetta@home | Sending scheduler request: To fetch work.
30.04.2016 11:19:55 | rosetta@home | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU
30.04.2016 11:19:58 | rosetta@home | Scheduler request completed: got 0 new tasks
30.04.2016 11:19:58 | rosetta@home | No work sent
30.04.2016 11:19:58 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

Wtf, since when have i a Android? What comes next? A Alien? A Zombie?


Looks like someone with access to the source code for the Windows application should add a check at the end for whether the operating system setting still indicates Windows.

sinspin

Joined: Jan 30 06
Posts: 25
ID: 55456
Credit: 5,865,793
RAC: 714
Message 79983 - Posted 30 Apr 2016 13:51:52 UTC

There are no changes at my side. No new Boinc version, etc.
I think they got something wrong during the last server downtime. Maybe some check of which kinds of tasks can be sent to a certain client.

My computers list at my profile shows the right information about my machines.
____________

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 79985 - Posted 30 Apr 2016 22:45:49 UTC - in response to Message ID 79975.


I just discovered the same thing in a machine I just assembled. I have a hunch while I am waiting for the Volunteer at BOINC to get back to me: My box has only one 'new' component: the SSD. I purchased an economy SSD at Amazon which works perfectly otherwise. This is just a hunch. Could the memory chips in the SSD somehow be marked as for use in a cellphone? Forgive me if that is stupid. All my other components have crunched Rosetta before (all the other parts I used in this build.) I also have NO gripes pending on this SSD.


What operating system did you see this under? Windows, Linux, Android, or something else?

I've had many Rosetta Mini workunits complete on one of my computers, which came with an SSD as its main drive. It uses Windows 10.

It's happening everywhere, Win7 and no SSD here. Manually updating often fixes it, but in the meantime Boinc puts in a 24hr delay before retrying, which is very annoying if you don't notice.

And here it has already happened a few times on WinXP (and HDD). It must be a server side thing, since it's the server that's considering the possibility of sending Android stuff to a Windows PC.

And no, it can't have anything to do with the chips on the SSD. I strongly doubt even Windows would know if they were originally produced for smartphones (which I doubt anyway, since the requirements are completely different). The OS just sees an SSD or HDD; it knows pretty much nothing about what's inside the drive, since it does not need to know it.

And even if we consider this highly unlikely possibility, that you have an SSD with smartphone chips AND Windows knows it, this information is for sure not passed to the servers (the sched_request xml is human readable, so you can see there what information about your system is passed to the server). All the server gets is <platform_name>windows_intelx86</platform_name> and based on that it should send the application.
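
If anyone wants to check what platform their client reports, here is a minimal Python sketch. It assumes the request file parses as well-formed XML; the SAMPLE text below is a hypothetical, heavily trimmed stand-in, since a real sched_request file carries many more fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down request; only <platform_name> comes from the post above.
SAMPLE = """<scheduler_request>
    <platform_name>windows_intelx86</platform_name>
</scheduler_request>"""


def platform_name(xml_text):
    """Return the text of the first <platform_name> element, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(".//platform_name")
    if node is None or not node.text:
        return None
    return node.text.strip()


print(platform_name(SAMPLE))
```

To run it against a real file, read the file's text and pass it to platform_name() instead of SAMPLE.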
____________
.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 79987 - Posted 1 May 2016 4:34:23 UTC - in response to Message ID 79976.

I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.

Rosetta Mini for Android is not available for your type of computer.

Do Network Communication successfully reports the task.

I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer

I just discovered the same thing in a machine I just assembled. I have a hunch while I am waiting for the Volunteer at BOINC to get back to me: My box has only one 'new' component: the SSD. I purchased an economy SSD at Amazon which works perfectly otherwise. This is just a hunch. Could the memory chips in the SSD somehow be marked as for use in a cellphone? Forgive me if that is stupid. All my other components have crunched Rosetta before (all the other parts I used in this build.) I also have NO gripes pending on this SSD.


What operating system did you see this under? Windows, Linux, Android, or something else?

I've had many Rosetta Mini workunits complete on one of my computers, which came with an SSD as its main drive. It uses Windows 10.

It's happening everywhere, Win7 and no SSD here. Manually updating often fixes it, but in the meantime Boinc puts in a 24hr delay before retrying, which is very annoying if you don't notice.


Something you might want to try: Check if restarting Win7, with no updates, fixes the problem, with a possible exception for the 24hr delay.

If I'm on-site I'll usually notice at the time. Trouble is, I'm away for half of every week, so there's no reliable way to give it a kick or know if my whole system goes down
____________

LarryMajor

Joined: Apr 1 16
Posts: 4
ID: 1278313
Credit: 1,616,161
RAC: 7,072
Message 79988 - Posted 1 May 2016 6:35:49 UTC

Getting this on both of my Linux 3.16.0-4 machines:

Sun 01 May 2016 02:21:40 AM EDT | rosetta@home | Sending scheduler request: To report completed tasks.
Sun 01 May 2016 02:21:40 AM EDT | rosetta@home | Reporting 12 completed tasks
Sun 01 May 2016 02:21:40 AM EDT | rosetta@home | Requesting new tasks for CPU
Sun 01 May 2016 02:21:47 AM EDT | rosetta@home | Scheduler request completed: got 0 new tasks
Sun 01 May 2016 02:21:47 AM EDT | rosetta@home | No work sent
Sun 01 May 2016 02:21:47 AM EDT | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

Jim1348

Joined: Jan 19 06
Posts: 65
ID: 52455
Credit: 2,074,859
RAC: 9,392
Message 79992 - Posted 1 May 2016 12:25:16 UTC - in response to Message ID 79987.
Last modified: 1 May 2016 12:44:37 UTC

If I'm on-site I'll usually notice at the time. Trouble is, I'm away for half of every week, so there's no reliable way to give it a kick or know if my whole system goes down

I have been gone since 20 April, and my first (of two) Haswell machines running Rosetta, both on Win7 64-bit, went down that very day, including the GPU running Einstein. The other machine went down on 29 April, which was running POEM on the GPUs. I like Rosetta for its science, but they have a lot of experimenters doing various types of work, and it is therefore not the ultimate in reliability, if I may say so.
____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 79994 - Posted 1 May 2016 13:04:01 UTC

As I said in another thread, if admins don't update the server...
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 79999 - Posted 2 May 2016 1:44:05 UTC - in response to Message ID 79992.
Last modified: 2 May 2016 1:58:37 UTC

If I'm on-site I'll usually notice at the time. Trouble is, I'm away for half of every week, so there's no reliable way to give it a kick or know if my whole system goes down

I have been gone since 20 April, and my first (of two) Haswell machines running Rosetta, both on Win7 64-bit, went down that very day, including the GPU running Einstein. The other machine went down on 29 April, which was running POEM on the GPUs. I like Rosetta for its science, but they have a lot of experimenters doing various types of work, and it is therefore not the ultimate in reliability, if I may say so.

If only I was concerned about Rosetta's stability. After claiming 6 months ago I was going to stop fiddling with my PC's overclock, I've been at it again, adding a further 165MHz. I /think/ I'm stable enough to keep crunching throughout my half-week absences, but never quite know for sure until I get back home.

My previous efforts
Now at 18.0 Multiplier x 243.8MHz FSB = 4379.6MHz compared to 4214.6 back then
____________

Jim1348

Joined: Jan 19 06
Posts: 65
ID: 52455
Credit: 2,074,859
RAC: 9,392
Message 80000 - Posted 2 May 2016 12:50:02 UTC - in response to Message ID 79999.

If only I was concerned about Rosetta's stability. After claiming 6 months ago I was going to stop fiddling with my PC's overclock, I've been at it again, adding a further 165MHz. I /think/ I'm stable enough to keep crunching throughout my half-week absences, but never quite know for sure until I get back home.

I don't overclock either the CPUs or GPUs, and I am speculating a bit that the problem is Rosetta. But my other Haswell machine, and three Ivy Bridge machines, have no problems. They are not running Rosetta either, only Einstein, WCG, CPDN and Folding. Normally all my machines can run for months without problems. I have noticed anomalies with Rosetta before, but never quite had a smoking gun, but I think this is pretty much it.

____________

googloo
Avatar

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 5,900,527
RAC: 7,817
Message 80003 - Posted 3 May 2016 10:42:07 UTC

Happening again on my Windows 7 Professional computer. The biggest problem is that BOINC manager puts a 24-hour "Communication deferred" on my computer, and I run out of Rosetta tasks if I don't manually update. Please do something about this. Nothing has changed on my computer.

5/3/2016 6:15:47 AM | rosetta@home | Sending scheduler request: To fetch work.
5/3/2016 6:15:47 AM | rosetta@home | Requesting new tasks for CPU and Intel GPU
5/3/2016 6:15:49 AM | rosetta@home | Scheduler request completed: got 0 new tasks
5/3/2016 6:15:49 AM | rosetta@home | No work sent
5/3/2016 6:15:49 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
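
For anyone who is away from their machine and wants to spot this afterwards, a small Python sketch along these lines could pick the message out of a saved copy of the event log. It assumes the "timestamp | project | message" layout shown above; the function name is just for illustration.

```python
ANDROID_MSG = "Rosetta Mini for Android is not available for your type of computer."


def android_refusals(lines):
    """Return the timestamps of event-log lines that carry the Android message."""
    hits = []
    for line in lines:
        # Split into at most three fields: timestamp, project, message.
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) == 3 and parts[2] == ANDROID_MSG:
            hits.append(parts[0])
    return hits


LOG = [
    "5/3/2016 6:15:47 AM | rosetta@home | Sending scheduler request: To fetch work.",
    "5/3/2016 6:15:49 AM | rosetta@home | Scheduler request completed: got 0 new tasks",
    "5/3/2016 6:15:49 AM | rosetta@home | No work sent",
    "5/3/2016 6:15:49 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.",
]

print(android_refusals(LOG))
```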

____________

AMDave

Joined: Dec 16 05
Posts: 32
ID: 38208
Credit: 3,006,038
RAC: 6,051
Message 80004 - Posted 3 May 2016 14:24:35 UTC - in response to Message ID 80003.

Happening again on my Windows 7 Professional computer. The biggest problem is that BOINC manager puts a 24-hour "Communication deferred" on my computer, and I run out of Rosetta tasks if I don't manually update. Please do something about this. Nothing has changed on my computer.

5/3/2016 6:15:47 AM | rosetta@home | Sending scheduler request: To fetch work.
5/3/2016 6:15:47 AM | rosetta@home | Requesting new tasks for CPU and Intel GPU
5/3/2016 6:15:49 AM | rosetta@home | Scheduler request completed: got 0 new tasks
5/3/2016 6:15:49 AM | rosetta@home | No work sent
5/3/2016 6:15:49 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

Is there an ETA for the resolution of this issue?

Chilean Profile
Avatar

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,201,445
RAC: 4,663
Message 80005 - Posted 3 May 2016 14:49:16 UTC

+1 on the Android thing.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 80007 - Posted 3 May 2016 16:19:20 UTC - in response to Message ID 80004.


Is there an ETA for the resolution of this issue?


As I understand it, the issue is that the project briefly exhausted its supply of work available for non-Android devices. When only Android tasks are found in the scheduler, and you are coming in with a non-Android host, it reports the message "Rosetta Mini for Android is not available for your type of computer."

This problem is resolved in ebbs and flows as large numbers of new hosts come into the system. The tasks that make new work available are running continuously, but seem to hit periods where they are still barely keeping up with the incoming requests for work.
____________
Rosetta Moderator: Mod.Sense

AMDave

Joined: Dec 16 05
Posts: 32
ID: 38208
Credit: 3,006,038
RAC: 6,051
Message 80008 - Posted 3 May 2016 17:02:47 UTC - in response to Message ID 80007.
Last modified: 3 May 2016 17:06:14 UTC


Is there an ETA for the resolution of this issue?


As I understand it, the issue is that the project briefly exhausted its supply of work available for non-Android devices. When only Android tasks are found in the scheduler, and you are coming in with a non-Android host, it reports the message "Rosetta Mini for Android is not available for your type of computer."

This problem is resolved in ebbs and flows as large numbers of new hosts come into the system. The tasks that make new work available are running continuously, but seem to hit periods where they are still barely keeping up with the incoming requests for work.

Ok. I was concerned that it was a software or hardware malfunction somewhere in the pipeline (Rosetta's end or crunchers' end). How frequently is the Server Status page updated? Presently, there are 434,384 results listed as "Ready to send," and according to here, there are 147,921 Active Users. What is the default back off time for communicating with Rosetta's servers in such cases? It appears to be 24hrs.

Going forward, is it possible to have some notice indicating when such an occurrence takes place (ex. Rosetta's homepage, BOINC Notices tab)? When this happens with other projects, the following lines appear in the BOINC Event Log:

Sending scheduler request
Requesting new tasks for CPU
Scheduler request completed: no new tasks available

googloo
Avatar

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 5,900,527
RAC: 7,817
Message 80010 - Posted 3 May 2016 19:48:46 UTC - in response to Message ID 80008.


Is there an ETA for the resolution of this issue?


As I understand it, the issue is that the project briefly exhausted its supply of work available for non-Android devices. When only Android tasks are found in the scheduler, and you are coming in with a non-Android host, it reports the message "Rosetta Mini for Android is not available for your type of computer."

This problem is resolved in ebbs and flows as large numbers of new hosts come into the system. The tasks that make new work available are running continuously, but seem to hit periods where they are still barely keeping up with the incoming requests for work.

Ok. I was concerned that it was a software or hardware malfunction somewhere in the pipeline (Rosetta's end or crunchers' end). How frequently is the Server Status page updated? Presently, there are 434,384 results listed as "Ready to send," and according to here, there are 147,921 Active Users. What is the default back off time for communicating with Rosetta's servers in such cases? It appears to be 24hrs.

Going forward, is it possible to have some notice indicating when such an occurrence takes place (ex. Rosetta's homepage, BOINC Notices tab)? When this happens with other projects, the following lines appear in the BOINC Event Log:

Sending scheduler request
Requesting new tasks for CPU
Scheduler request completed: no new tasks available


Indeed, it is the 24-hour back off time that is the problem, then. Can this be adjusted?

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 80011 - Posted 3 May 2016 20:22:24 UTC - in response to Message ID 80007.


Is there an ETA for the resolution of this issue?


As I understand it, the issue is that the project briefly exhausted its supply of work available for non-Android devices. When only Android tasks are found in the scheduler, and you are coming in with a non-Android host, it reports the message "Rosetta Mini for Android is not available for your type of computer."

This problem is resolved in ebbs and flows as large numbers of new hosts come into the system. The tasks that make new work available are running continuously, but seem to hit periods where they are still barely keeping up with the incoming requests for work.


Could the number of tasks ready to send be divided by what host types they are suitable for, so that users can easily tell when only Android tasks are found in the scheduler?

Also, could things be adjusted so that when no tasks are available for the type of computer requesting them, but many are available for other types of computers, the delay is set much below 24 hours?

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 80012 - Posted 3 May 2016 22:12:54 UTC - in response to Message ID 80011.
Last modified: 3 May 2016 22:13:58 UTC


Is there an ETA for the resolution of this issue?


As I understand it, the issue is that the project briefly exhausted its supply of work available for non-Android devices. When only Android tasks are found in the scheduler, and you are coming in with a non-Android host, it reports the message "Rosetta Mini for Android is not available for your type of computer."

This problem is resolved in ebbs and flows as large numbers of new hosts come into the system. The tasks that make new work available are running continuously, but seem to hit periods where they are still barely keeping up with the incoming requests for work.


Could the number of tasks ready to send be divided by what host types they are suitable for, so that users can easily tell when only Android tasks are found in the scheduler?

Also, could things be adjusted so that when no tasks are available for the type of computer requesting them, but many are available for other types of computers, the delay is set much below 24 hours?




I think that a task can be sent to any machine and the "Android" message is a bogus message from the Rosetta servers. I think that the Android message really means that there are temporarily no tasks ready to be sent to any client.


The researcher submits some text files that contain parameters and they submit the run time COMMAND LINE parameters. This information is wrapped up with the current database and passed to ANY Rosetta cruncher.

Sample list of Rosetta files containing the personality data:

051207_1a19A.fasta: ASCII text
051207_1a19A.psipred_ss2: ASCII text
051207_1a19.pdb: ASCII text
051207_cc1a19A03_05.200_v1_3: ASCII text
051207_cc1a19A09_05.200_v1_3: ASCII text


first couple lines of each file ....

head 051207_*
==> 051207_1a19A.fasta <==
>1a19A
KKAVINGEQIRSISDLHQTLKKELALPEYYGENLDALWDCLTGWVEYPLVLEWRQFEQSKQLTENGAESVLQVFREAKAEGADITIILS

==> 051207_1a19A.psipred_ss2 <==
# PSIPRED VFORMAT (PSIPRED V2.5)

1 K C 0.997 0.000 0.026
2 K E 0.032 0.004 0.928
3 A E 0.010 0.007 0.979
4 V E 0.005 0.009 0.950
5 I E 0.012 0.008 0.957
6 N E 0.053 0.006 0.941
7 G C 0.563 0.347 0.097
8 E H 0.346 0.641 0.060

==> 051207_1a19.pdb <==
ATOM 1 N LYS A 1 99.864 52.581 -5.099 1.00 52.69 N
ATOM 2 CA LYS A 1 98.880 51.736 -5.841 1.00 51.62 C
ATOM 3 C LYS A 1 97.862 51.097 -4.890 1.00 49.92 C
ATOM 4 O LYS A 1 96.658 51.274 -5.048 1.00 49.38 O
ATOM 5 CB LYS A 1 99.614 50.652 -6.636 1.00 52.27 C
ATOM 6 CG LYS A 1 99.215 50.600 -8.104 1.00 53.15 C
ATOM 7 CD LYS A 1 98.997 49.163 -8.582 1.00 52.28 C
ATOM 8 CE LYS A 1 97.824 48.483 -7.860 1.00 53.06 C
ATOM 9 NZ LYS A 1 96.666 48.171 -8.765 1.00 49.66 N
ATOM 10 N LYS A 2 98.344 50.359 -3.898 1.00 50.17 N

==> 051207_cc1a19A03_05.200_v1_3 <==
position: 1 neighbors: 200

1j8r A 68 K L -88.240 -13.689 178.802
1j8r A 69 K E -147.474 144.764 177.366
1j8r A 70 V E -141.217 139.561 179.273

1u2c A 190 K L -112.104 -21.212 176.747
1u2c A 191 K L -135.377 132.197 177.515
1u2c A 192 V L -92.662 112.751 -175.484


==> 051207_cc1a19A09_05.200_v1_3 <==
position: 1 neighbors: 200

1ikp A 13 K L -107.435 -59.471 178.170
1ikp A 14 A E -163.795 139.246 178.105
1ikp A 15 C E -147.647 156.673 175.685
1ikp A 16 V E -121.410 108.831 -175.851
1ikp A 17 L E -92.091 125.443 175.803
1ikp A 18 D E -79.170 121.221 -177.139
1ikp A 19 L L -109.923 1.413 -175.419
1ikp A 20 K L -72.749 -23.385 -175.155

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 80015 - Posted 4 May 2016 2:14:10 UTC - in response to Message ID 80000.

If only I was concerned about Rosetta's stability. After claiming 6 months ago I was going to stop fiddling with my PC's overclock, I've been at it again, adding a further 165MHz. I /think/ I'm stable enough to keep crunching throughout my half-week absences, but never quite know for sure until I get back home.

I don't overclock either the CPUs or GPUs, and I am speculating a bit that the problem is Rosetta. But my other Haswell machine, and three Ivy Bridge machines, have no problems. They are not running Rosetta either, only Einstein, WCG, CPDN and Folding. Normally all my machines can run for months without problems. I have noticed anomalies with Rosetta before, but never quite had a smoking gun, but I think this is pretty much it.

I run AMD rather than Intel - no idea if that makes a difference. When I ran only a mild overclock I would run for months at a time without a reboot. It's only when I've gone to the most extreme levels that I sometimes get lockups.
____________

1StepO Profile

Joined: Dec 23 14
Posts: 2
ID: 1030169
Credit: 182,248
RAC: 139
Message 80019 - Posted 4 May 2016 16:08:33 UTC - in response to Message ID 79931.

I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.

Rosetta Mini for Android is not available for your type of computer.

Do Network Communication successfully reports the task.

I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer


I just discovered: ANDROID is a 32 bit operating system. Is it possible that Rosetta@Home cannot provide for a 32bit operating system? My computer is 32bit Linux -- I could easily load in the 64bit and try again. What does one think? :^)

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 80021 - Posted 4 May 2016 17:21:25 UTC - in response to Message ID 80019.

I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log.

Rosetta Mini for Android is not available for your type of computer.

Do Network Communication successfully reports the task.

I'm running 2 R@h workunits/Ubuntu 14.04 LTS/Boinc 7.2.42/No workbuffer


I just discovered: ANDROID is a 32 bit operating system. Is it possible that Rosetta@Home cannot provide for a 32bit operating system? My computer is 32bit Linux -- I could easily load in the 64bit and try again. What does one think? :^)


I've been seeing this problem under 64 bit Windows, so don't expect 64 bits alone to be a cure.

Link
Avatar

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 335,263
RAC: 146
Message 80024 - Posted 5 May 2016 15:42:40 UTC - in response to Message ID 80010.
Last modified: 5 May 2016 15:43:38 UTC

Indeed, it is the 24-hour back off time that is the problem, then. Can this be adjusted?

Well, you can adjust your cache size to something like 4-6 days, then you should not run out of work during those 24 hours.
____________
.

googloo
Avatar

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 5,900,527
RAC: 7,817
Message 80026 - Posted 5 May 2016 21:36:10 UTC - in response to Message ID 80024.

Indeed, it is the 24-hour back off time that is the problem, then. Can this be adjusted?

Well, you can adjust your cache size to something like 4-6 days, then you should not run out of work during those 24 hours.


I don't run out of work since I support two other BOINC projects. I run out of Rosetta tasks unless I manually update Rosetta. It's not disastrous, as Rosetta will eventually catch up; it's just inefficient and annoying. That is, once I start getting Rosetta tasks again, the other two projects' tasks get suspended while Rosetta catches up, using memory and storage for those suspended tasks.
____________

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 80040 - Posted 7 May 2016 23:16:32 UTC

Is my CPU not strong enough for the current tasks that have been running since the middle of April? My granted credit is running 50 or so points under the claimed credit. My average credit has dropped something like 500 points since the middle of April.

Can someone have a look at my CPU info and my stats and tell me what's going on?
Rosie used to be nice to me, but now she is being mean.

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 80045 - Posted 8 May 2016 18:04:32 UTC - in response to Message ID 80040.
Last modified: 8 May 2016 18:14:54 UTC

Is my CPU not strong enough for the current tasks that have been running since the middle of April? My granted credit is running 50 or so points under the claimed credit. My average credit has dropped something like 500 points since the middle of April.

Can someone have a look at my cpu info and my stats and tell me whats going on?
Rosie used to be nice to me, but now she is being mean.



Your CPU is fine. If you are running other projects too, it might be related to the interaction between Rosetta and that job. Rosetta is compiled with aggressive inlining and has a big code footprint. If other work has a big code footprint too, they fight each other for CODE cache, and the cache misses make the CPU run LESS EFFICIENTLY than it did on the Whetstone benchmark executed at start up.

I have noticed that Rosetta is EXTREMELY sensitive to what else is executing while Rosetta is crunching jobs. I was running my test binaries 5 times sequentially while the machine was running standard Rosetta and WorldGrid jobs. My Rosetta binaries generated consistent runtimes.

I added PrimeGrid tasks and my Rosetta test binary execution times tripled. When I turned off PrimeGrid, my Rosetta run times returned to the expected values.


I am seeing similar results. Rosetta grades on a "curve" and is a "tough teacher". 8-)

The result for one of YOUR recent jobs shows that it got 63% of the requested credit.
Validate state Valid
Claimed credit 168.277828702931
Granted credit 106.485820803471
application version 3.73


A recent result from MY Broadwell 8C/16T microserver: Xeon(R) CPU D-1540 @ 2.00GHz got about 61% of requested credit.
Validate state Valid
Claimed credit 635.933767006863
Granted credit 385.737768644956
application version 3.73

A recent result from MY IvyBridge i7: i7-3770K CPU @ 3.50GHz got 46% of requested credit.
Validate state Valid
Claimed credit 773.073241462645
Granted credit 355.890423420893
application version 3.73
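
The granted-to-claimed ratios can be rechecked with a few lines of Python. This is only a quick sketch; the claimed and granted figures are the ones listed above, while the labels and the function name are mine.

```python
# (claimed credit, granted credit) pairs copied from the three results above.
RESULTS = {
    "quoted job": (168.277828702931, 106.485820803471),
    "Xeon D-1540": (635.933767006863, 385.737768644956),
    "i7-3770K": (773.073241462645, 355.890423420893),
}


def granted_fraction(claimed, granted):
    """Fraction of the claimed credit that was actually granted."""
    return granted / claimed


for name, (claimed, granted) in RESULTS.items():
    print(f"{name}: {granted_fraction(claimed, granted):.1%}")
```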



icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2725.38
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2685.47
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2747.27
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2654.61
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2828.59

icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2809.78
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2563.33
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2718.20
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2725.10
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2709.36

icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3434.77
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3477.16
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3519.45
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3369.62
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3458.30

icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 3354.62
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 4174.92
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 5313.84 <<< I STARTED running primegrid
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 10563.18 <<< Rosetta runtime triples!
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 11973.17

icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 9171.31
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 8060.61 <<< I STOPPED running primegrid
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 3013.09 <<<< NORMAL runtimes
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 2881.54
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 2883.69

icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2832.93
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2707.10
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2800.13
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2970.67
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 3067.31
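
To compare builds without eyeballing the list, something like the following Python sketch averages the user times per build directory. It assumes the "directory/nohup.out:user seconds" layout shown above and ignores any trailing "<<<" notes; the function name and the short SAMPLE are only for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches "<build dir>/nohup.out:user <seconds>", ignoring anything after the number.
LINE = re.compile(r"^(\S+)/nohup\.out:user\s+(\d+(?:\.\d+)?)")


def mean_user_time(lines):
    """Group user times by build directory and return the mean for each build."""
    runs = defaultdict(list)
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            runs[match.group(1)].append(float(match.group(2)))
    return {build: mean(times) for build, times in runs.items()}


SAMPLE = [
    "icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2832.93",
    "icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2707.10",
    "icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 10563.18 <<< Rosetta runtime triples!",
]

for build, avg in mean_user_time(SAMPLE).items():
    print(f"{build}: {avg:.2f} s")
```

Keep in mind that a plain mean mixes the runs taken with and without PrimeGrid, so the disturbed runs should be split out before comparing compiler options.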

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 80047 - Posted 8 May 2016 22:36:16 UTC - in response to Message ID 80045.

You think that VHC could be interfering? They both seem stuck on low average credit and VHC runs on 24 time slots. You can not alter the run time on that project.

Since I have been on Rosetta longer than VHC, I may have to drop VHC.
I was trying it because I wanted to see how virtual box worked.

Is my CPU not strong enough for the current tasks that have been running since the middle of April? My granted credit is running 50 or so points under the claimed credit. My average credit has dropped something like 500 points since the middle of April.

Can someone have a look at my cpu info and my stats and tell me whats going on?
Rosie used to be nice to me, but now she is being mean.



Your CPU is fine. If you are running other projects too, it might be related to the interaction between Rosetta and that job. Rosetta is compiled with aggressive inlining and has a big code footprint. If other work has a big code footprint too, they fight each other for CODE cache and cache misses make the CPU run LESS EFFICIENT than the Whetstone benchmark executed at start up.

I have noticed that Rosetta is EXTREMELY sensitive to what is executing while Rosetta is crunching jobs. I was running my test binaries 5 times sequentially while the machine was running standard Rosetta and WorldGrid jobs. My Rosetta binaries generated consistent runtimes.

I added PrimeGrid tasks and my Rosetta test binary execution times tripled. When I turned off PrimeGrid, my Rosetta run times returned to the expected values.


I am seeing similar results. Rosetta grades on a "curve" and is a "tough teacher". 8-)

The result for one of YOUR recent jobs shows that it got 63% of claimed credit.
Validate state Valid
Claimed credit 168.277828702931
Granted credit 106.485820803471
application version 3.73


A recent result from MY Broadwell 8C/16T microserver: Xeon(R) CPU D-1540 @ 2.00GHz got about 61% of claimed credit.
Validate state Valid
Claimed credit 635.933767006863
Granted credit 385.737768644956
application version 3.73

A recent result from MY IvyBridge i7: i7-3770K CPU @ 3.50GHz got about 46% of claimed credit.
Validate state Valid
Claimed credit 773.073241462645
Granted credit 355.890423420893
application version 3.73
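
For anyone who wants to check these percentages on their own tasks, here is a minimal sketch. It assumes the percentage is simply granted credit divided by claimed credit; the function name and the host labels are mine, and the numbers are the three results listed above.

```python
def granted_fraction(claimed: float, granted: float) -> float:
    """Return granted credit as a fraction of claimed credit."""
    if claimed <= 0:
        raise ValueError("claimed credit must be positive")
    return granted / claimed


# (claimed, granted) pairs copied from the task results above.
# The labels are only for display and are not BOINC host names.
results = {
    "your recent job": (168.277828702931, 106.485820803471),
    "Xeon D-1540": (635.933767006863, 385.737768644956),
    "i7-3770K": (773.073241462645, 355.890423420893),
}

for label, (claimed, granted) in results.items():
    print(f"{label}: {granted_fraction(claimed, granted):.0%} of claimed credit")
```

Paste in the claimed and granted values from any task page to compare your own hosts the same way.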



icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2725.38
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2685.47
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2747.27
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2654.61
icc.o3.mtune.axcoreavx2.m32.50.12345/nohup.out:user 2828.59

icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2809.78
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2563.33
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2718.20
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2725.10
icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2709.36

icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3434.77
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3477.16
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3519.45
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3369.62
icc.o3.mtune.axcoreavxi.m32.50.12345/nohup.out:user 3458.30

icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 3354.62
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 4174.92
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 5313.84 <<< I STARTED running primegrid
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 10563.18 <<< Rosetta runtime triples!
icc.o3.mtune.axcoreavxi.m64.50.12345/nohup.out:user 11973.17

icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 9171.31
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 8060.61 <<< I STOPPED running primegrid
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 3013.09 <<<< NORMAL runtimes
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 2881.54
icc.o3.mtune.axsse42.m32.50.12345/nohup.out:user 2883.69

icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2832.93
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2707.10
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2800.13
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2970.67
icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 3067.31
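
To compare builds without eyeballing the list, the user times can be averaged per build directory. This is only a sketch: it assumes every line has the "directory/nohup.out:user SECONDS" shape shown above, and the function name is hypothetical. Trailing notes such as "<<< I STOPPED running primegrid" are ignored.

```python
from collections import defaultdict
from statistics import mean


def mean_user_time(lines):
    """Average the 'user' seconds per build directory.

    Assumes lines shaped like 'BUILD/nohup.out:user SECONDS [notes]'.
    Lines without ':user ' are skipped.
    """
    times = defaultdict(list)
    for line in lines:
        path, sep, rest = line.partition(":user ")
        fields = rest.split()
        if not sep or not fields:
            continue
        build = path.split("/")[0]
        times[build].append(float(fields[0]))
    return {build: mean(values) for build, values in times.items()}


# Two of the undisturbed groups from the listing above.
sample = [
    "icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2809.78",
    "icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2563.33",
    "icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2718.20",
    "icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2725.10",
    "icc.o3.mtune.axcoreavx2.m64.50.12345/nohup.out:user 2709.36",
    "icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2832.93",
    "icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2707.10",
    "icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2800.13",
    "icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 2970.67",
    "icc.o3.mtune.axsse42.m64.50.12345/nohup.out:user 3067.31",
]

for build, avg in mean_user_time(sample).items():
    print(f"{build}: {avg:.1f} s average user time")
```

Groups that straddle the PrimeGrid start and stop should be averaged separately, otherwise the disturbed runs hide the real difference between builds.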

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 80048 - Posted 9 May 2016 1:30:43 UTC - in response to Message ID 80047.
Last modified: 9 May 2016 1:36:44 UTC

I am not sure about VHC or its control knobs. I looked at the SixTrack source code many years ago and had a SixTrack (LHC@Home) account, but they could not generate work to crunch, so I gave up. I have also never run a VirtualBox version of any project, so I have no experience there either.

If the app runs under BOINC control, you can set the PROJECT->NO NEW TASKS and let the tasks drain out or simply suspend the VLHC project application for a period and see if it makes a difference in the Rosetta results.

A quick look at the Windows 10 Task Manager should tell you a lot. Open this screen:

TASK MANAGER -> MORE DETAILS -> PROCESSES

The CPU column should total close to 100% if you allow all CPUs to be busy.
SORT BY CPU by clicking on the CPU column.

Each Rosetta job should be consuming 1/6 (one of your 6 CPUs) or 16.6% of the machine. If a job is consuming noticeably less than 16.6%, it is not running 100% of the time, and the Rosetta code and data are being evicted from the L1/.../Lx CPU caches. It takes a few cycles for the CPU to get data from those near caches. If the CPU has to go to main memory for evicted code/data, it takes roughly 10x that long, and Rosetta will run but VERY inefficiently while it waits for code/data to warm the caches again. Rosetta works hard but is waiting on code/data from memory.
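
The 16.6% figure is just 100% divided by the number of CPUs. Here is a rough sketch of that arithmetic, plus the fraction of time a job was actually on a CPU for a given Task Manager reading. Both function names are made up for illustration, and the 8% reading is only an example value.

```python
def expected_share(ncpus: int) -> float:
    """Percent of the whole machine that one fully busy single-threaded job uses."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return 100.0 / ncpus


def running_fraction(observed_pct: float, ncpus: int) -> float:
    """Fraction of the sample interval the job was actually running.

    observed_pct is the Task Manager CPU column value for one job.
    Capped at 1.0 because readings can jitter slightly above the ideal share.
    """
    return min(1.0, observed_pct / expected_share(ncpus))


ncpus = 6  # the 6-CPU machine discussed above
print(f"expected share per job: {expected_share(ncpus):.1f}%")
for reading in (16.6, 8.0):  # example readings, not measurements
    print(f"{reading}% observed -> running {running_fraction(reading, ncpus):.0%} of the time")
```

A job that shows about half its expected share is being kept off the CPU about half the time, which is when its code and data can get evicted from the caches.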

It is worth your time to run a couple of experiments on your machine to see if anything is affecting progress.





Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 80049 - Posted 9 May 2016 11:32:25 UTC - in response to Message ID 80048.

It might be POEM. Even though it is mainly a GPU app and nominally grabs .263% of the CPU, in the process list it takes 17% of the CPU, and Rosetta jumps around between 16% and 8%.


rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 80051 - Posted 9 May 2016 12:46:32 UTC - in response to Message ID 80050.
Last modified: 9 May 2016 13:26:47 UTC

If the Rosetta job is bouncing between 16% and 8%, the CPU caches are getting cleared out during the 8% time that Rosetta is being idled by other programs executing on your system. You cannot tell how many times Rosetta is getting/losing control during that 1 second sample but it is probably a large number of times.

This is a very good indication that CPU cache thrashing (two or more jobs wanting to have their code/data in the CPU caches) is a problem. The BOINC Whetstone benchmark ran at full speed on your machine and on other users' machines. When Rosetta bounces between 16% and 8% and cache contents are evicted, your machine makes less Rosetta compute progress because it is waiting for code/data to be retrieved again from slower main memory. Compared with other machines, your ratio of Rosetta progress to Whetstone score appears lower, so they are getting a higher % of claimed credits.

It is hard to estimate the exact impact based on these high-level numbers, but if you saw 8% on Rosetta, that is not good and is likely part of the problem.

I have seen the GPU job load on the CPU vary as a function of the SYSTEM and as a function of the GPU, CPU and memory bandwidth. POEM is taking 100% of a CPU on my i7-3770k/Nvidia 970 GPU.

The newer OpenCL GPU apps do seem to take a good chunk of a CPU. They take more CPU than their CUDA counterparts. On machines where I run POEM or similar OpenCL GPU projects, I set:

BOINC -> COMPUTER PREFERENCES -> USAGE LIMITS -> % of CPUs = 99%

to keep 1 CPU available for the GPU jobs AND for reasonable response on the system.




Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 80052 - Posted 9 May 2016 13:26:47 UTC
Last modified: 9 May 2016 13:30:31 UTC

Never mind. I read your email too fast, Mod.
Thanks for the clearing out of the double post.

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 80053 - Posted 9 May 2016 13:34:52 UTC - in response to Message ID 80051.
Last modified: 9 May 2016 13:47:29 UTC

Ok, I will lower my overall BOINC CPU load to 98% and see if that helps.
And what you see on POEM is the same for me: 100% GPU and grabbing a significant percentage of a CPU. So it could be like you said, Rosetta getting bounced.
- Lowered both levels of processor usage to 96%. Will let things run and see if that helps Rosie catch back up. Thanks for the help. I will let you know later if that solves the issue.


Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 80057 - Posted 9 May 2016 21:50:58 UTC - in response to Message ID 80053.

96% seems to be a sweet spot for the machine. Percentages are holding around 16% on average now. No dropouts. Thanks for the help.


rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 80063 - Posted 10 May 2016 13:15:08 UTC - in response to Message ID 80053.

There is no difference between 99% and 96% of CPUs in the computing configuration of your machine. Any minor change was likely due to background churning of other jobs ... either normal system tasks or other Boinc compute jobs.

There are two BOINC COMPUTING PREFERENCES -> COMPUTING controls for the CPU.
The first is "% of CPUs", which controls the number of CPUs that are active.
The second is "% of CPU time", which intentionally inserts idle time into the compute time.

Use "% of CPUs" and AVOID "% of CPU time" like the plague. Inserting non-BOINC time into the project execution is like what you saw with Rosetta running at 8%. Your 8% was like setting "% of CPU time" to 50%.

The "% of CPUs" setting deals in whole CPUs.
"% of CPUs" set to 99% will allow 5 of your 6 CPUs to run CPU-only jobs.
You can drop "% of CPUs" down to 100% - 1/6, or about 83.4%, and it should still allow 5 of your CPUs to run. If you set "% of CPUs" to 83%, then BOINC will idle a second CPU and only 4 will run.

EXAMPLE:
On my i7 with 8 CPUs, setting "% of CPUs" to 99% disables 1 CPU and displays the following message in the EVENT LOG:

5/10/2016 6:00:32 AM | | Number of usable CPUs has changed from 8 to 7.
5/10/2016 6:00:32 AM | | max CPUs used: 7

Setting "% of CPUs" to 88% yields the same message.
Setting "% of CPUs" to 87% drops another CPU with the EVENT LOG message:

5/10/2016 6:02:32 AM | | Number of usable CPUs has changed from 7 to 6.
5/10/2016 6:02:32 AM | | max CPUs used: 6
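
The pattern in those event log messages is consistent with rounding down to a whole number of CPUs. The sketch below is inferred from the observations above only, not taken from the BOINC source. The function name is made up, and the minimum of 1 CPU is my assumption.

```python
import math


def usable_cpus(ncpus: int, pct_of_cpus: float) -> int:
    """Estimate how many whole CPUs a given "% of CPUs" setting leaves usable.

    Assumption: the product is rounded down, which matches the event log
    examples above. The floor of 1 CPU is also an assumption.
    """
    return max(1, math.floor(ncpus * pct_of_cpus / 100.0))


# 8-CPU i7 from the example above
for pct in (99, 88, 87):
    print(f"8 CPUs at {pct}% -> {usable_cpus(8, pct)} usable")

# 6-CPU machine discussed earlier in the thread
for pct in (99, 96, 83.4, 83):
    print(f"6 CPUs at {pct}% -> {usable_cpus(6, pct)} usable")
```

Under this model any setting from 84% through 99% leaves 5 of 6 CPUs usable, which is why 99% and 96% behave the same on a 6-CPU machine.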



Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 80064 - Posted 10 May 2016 13:28:39 UTC - in response to Message ID 80063.

Now at 99% of total CPUs: 5 dedicated CPUs and 1 to work with the GPU and other stuff. "% of CPU time" is back to 100%.


[quote]There is no difference between 99% and 96% of CPUs in the computing configuration of your machine. Any minor change was likely due to background churning of other jobs ... either normal system tasks or other Boinc compute jobs.

There are two BOINC COMPUTING PREFERENCES -> COMPUTING controls for the CPU.
One is "% of CPUs" which controls the number of CPUs that are active.
Second is "% of CPU time" which intentionally inserts idle into the compute time.

Use "% of CPUs" and AVOID the "% of CPU time" like the plague. Inserting non-BOINC time into the project execution is like what you saw with Rosetta running at 8%. Your 8% was like setting the "% of CPU time" at 50%.

The "% of CPUs" deals in whole CPUs.
"% of CPUs" set to 99% will allow 5 of your 6 CPUs to run CPU-only jobs.
You can drop "% of CPUs" down to 83.4% (100% - 1/6, rounded up) and it should still allow 5 of your CPUs to run. If you set "% of CPUs" to 83%, then BOINC will idle a second CPU and only 4 would run.

EXAMPLE:
On my i7 with 8 CPUs, setting "% of CPUs" to 99% disables 1 CPU ... and displays the following message in the EVENT LOG:

5/10/2016 6:00:32 AM | | Number of usable CPUs has changed from 8 to 7.
5/10/2016 6:00:32 AM | | max CPUs used: 7

Setting "% of CPUs" to 88% yields the same message.
Setting "% of CPUs" to 87% drops another CPU with the EVENT LOG message:

5/10/2016 6:02:32 AM | | Number of usable CPUs has changed from 7 to 6.
5/10/2016 6:02:32 AM | | max CPUs used: 6
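
As a rough sketch of the whole-CPU arithmetic described above, the Python below assumes BOINC truncates (number of CPUs x "% of CPUs" / 100) to a whole number and never drops below 1 CPU. That assumption is consistent with the event log messages quoted above but is not taken from the BOINC source, and the function name usable_cpus is made up for this illustration.

```python
def usable_cpus(total_cpus: int, pct_of_cpus: float) -> int:
    """Estimate how many CPUs BOINC will use for a given "% of CPUs" setting.

    Assumption: the product is truncated to a whole number of CPUs,
    with a minimum of one CPU.
    """
    return max(1, int(total_cpus * pct_of_cpus / 100))


if __name__ == "__main__":
    # 8-CPU i7 from the example above
    for pct in (100, 99, 88, 87):
        print(f"8 CPUs at {pct}% -> {usable_cpus(8, pct)} usable")
    # 6-CPU machine discussed in this thread
    for pct in (99, 83.4, 83):
        print(f"6 CPUs at {pct}% -> {usable_cpus(6, pct)} usable")
```

Under that assumption, 83.4% on a 6-CPU machine still gives 5 CPUs while 83% gives 4, which matches the description above.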


[quote]Ok, I will lower my overall Boinc CPU load to 98% and see if that helps.
And what you see on POEM is the same with me. 100% GPU and grabbing a significant percent of CPU. So it could be like you said, Rosetta getting bounced.
- Lowered both levels of processor usage to 96%. Will let things run and see if that helps Rosie catch back up. Thanks for the help. Let you know later if that solves the issue.

[quote]If the Rosetta job is bouncing between 16% and 8%, the CPU caches are getting cleared out during the 8% time that Rosetta is being idled by other programs executing on your system. You cannot tell how many times Rosetta is getting/losing control during that 1 second sample but it is probably a large number of times.

This is a very good indication that CPU cache thrashing (two or more jobs wanting to have their code/data in CPU caches) is a problem. Since the Boinc Whetstone benchmark ran full speed on your machine and other user machines, when Rosetta bounces between 16%-8% and cache contents are evicted, your machine is not making as much Rosetta compute progress because it is waiting for code/data to be retrieved again from slower main memory. When compared to the other machine ratios of Rosetta/Whetstone, their ratio is higher than yours appears and they are getting a higher % of claimed credits.

It is hard to estimate the exact impact based on these high level numbers but if you saw 8% on Rosetta, that is not good and likely part of the problem.

I have seen the GPU job load on the CPU vary as a function of the SYSTEM and as a function of the GPU, CPU and memory bandwidth. POEM is taking 100% of a CPU on my i7-3770k/Nvidia 970 GPU.

The newer OpenCL GPU apps do seem to take a good chunk of a CPU. They take more CPU than their CUDA counterparts. On machines where I run POEM or similar OpenCL GPU projects, I set:

BOINC -> COMPUTER PREFERENCES -> USAGE LIMITS -> % of CPUs = 99%

to keep 1 CPU available for the GPU jobs AND for reasonable response on the system.



[quote]It might be Poem. Even though it is GPU mainly it grabs .263% of the CPU but when looking at processes it takes 17% of the CPU and Rosetta jumps around between 16 and 8%.

[quote]I am not sure about VHC or its control knobs. I looked at the SixTrack source code many years ago and had a SixTrack (LHC@Home) account, but they could not generate work to crunch, so I gave up. I have also never run a VirtualBox version of any project, so have no experience there either.

If the app runs under BOINC control, you can set the PROJECT->NO NEW TASKS and let the tasks drain out or simply suspend the VLHC project application for a period and see if it makes a difference in the Rosetta results.

A quick examination of the Windows 10 task manager might tell:

TASK MANAGER -> MORE DETAILS -> PROCESSES

screen should tell you a lot.

The CPU column should total close to 100% if you allow all CPUs to be busy.
SORT BY CPU by clicking on the CPU column.

The Rosetta jobs should each be consuming 1/6 (one of your 6 CPUs), or about 16.7%, of the machine. If they are consuming noticeably less than that, the Rosetta job is not running 100% of the time and its code and data are being evicted from the L1/.../Lx CPU caches. It takes a few cycles for the CPU to get data from those near caches. If the CPU has to go to main memory for evicted code/data, it takes 10x that long, and Rosetta will run but VERY inefficiently while it waits for code/data to warm the caches again. Rosetta works hard but is waiting on code/data from memory.

It is worth your time to run a couple of experiments on your machine to see if anything is affecting progress.
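
The Task Manager check above comes down to simple arithmetic, sketched here in Python. The function names and the 2-point tolerance are made up for illustration; the tolerance is only an arbitrary allowance for normal sampling jitter, not a BOINC or Windows value.

```python
def expected_share_pct(total_cpus: int) -> float:
    """Share of the whole machine that one fully busy single-threaded task should show."""
    return 100.0 / total_cpus


def looks_starved(observed_pct: float, total_cpus: int, tolerance_pct: float = 2.0) -> bool:
    """True when a task shows noticeably less CPU than one full CPU's share.

    tolerance_pct is a hypothetical allowance for sampling jitter.
    """
    return observed_pct < expected_share_pct(total_cpus) - tolerance_pct


if __name__ == "__main__":
    print(f"expected share on 6 CPUs: {expected_share_pct(6):.1f}%")
    for observed in (16.0, 8.0):
        print(f"{observed}% -> starved: {looks_starved(observed, 6)}")
```

A reading near 16% on a 6-CPU machine passes this check, while the 8% readings mentioned in this thread would be flagged.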




[quote]You think that VHC could be interfering? They both seem stuck on low average credit and VHC runs on 24 time slots. You can not alter the run time on that project.

Since I have been on Rosetta longer than VHC, I may have to drop VHC.
I was trying it because I wanted to see how virtual box worked.

[quote][quote]Is my CPU not strong enough for the current tasks that have been running

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 80066 - Posted 10 May 2016 15:07:28 UTC - in response to Message ID 80064.

@Greg_BE: You might want to learn how those nasty quote tags work...

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 80068 - Posted 10 May 2016 17:56:13 UTC - in response to Message ID 80066.
Last modified: 10 May 2016 17:57:15 UTC

I think that is because I am writing above the previous post instead of below like here. The computer for the forum can't read backwards.

I haven't posted on here in years so I have forgotten how this works.

rjs5

Joined: Nov 22 10
Posts: 125
ID: 402637
Credit: 3,223,734
RAC: 2,383
Message 80069 - Posted 10 May 2016 19:24:32 UTC - in response to Message ID 80068.

You can put messages above the old message .... AND I thought that was a clever idea since it worked for me.

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 80070 - Posted 10 May 2016 20:01:27 UTC - in response to Message ID 80069.
Last modified: 10 May 2016 20:02:18 UTC

We are going way off topic now, so time to end this.


[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 80103 - Posted 19 May 2016 12:22:54 UTC

825724761

ERROR: in::file::boinc_wu_zip 5H2LD-13_tj58_5_054307_0014_I_0001_data.zip does not exist!
ERROR:: Exit from: ..\..\..\src\apps\public\boinc\minirosetta.cc line: 226
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

____________

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,056,972
RAC: 5,479
Message 80107 - Posted 19 May 2016 23:04:49 UTC - in response to Message ID 80103.

I see the same error: seems to happen with all tasks named yh_*. Boinc 7.2.42/Ubuntu 14.04

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 80109 - Posted 20 May 2016 19:29:30 UTC

Yes, all of the yh* jobs are failing on my computer, too.

krypton
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 16 11
Posts: 105
ID: 436004
Credit: 1,796,601
RAC: 1,887
Message 80110 - Posted 20 May 2016 19:59:46 UTC - in response to Message ID 80109.

Thanks for the report! I've contacted the authors of these jobs!

yhsia

Joined: May 21 16
Posts: 1
ID: 1414737
Credit: 92,233
RAC: 4
Message 80111 - Posted 21 May 2016 1:03:20 UTC - in response to Message ID 80110.

Sorry, those were my jobs! Apologies for the wasted run times. I'm figuring out what went wrong :(

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 80113 - Posted 21 May 2016 4:22:55 UTC
Last modified: 21 May 2016 4:30:31 UTC

I got several of the above, so no need to report them, but another isolated one came up:

4hi0_B_16_BEN_SUP_hyb_cst_v02_i00_t000__krypton_SAVE_ALL_OUT_03_09_358432_163_1

ERROR: Cannot open file "i11.pdb"
ERROR:: Exit from: ..\..\..\src\core\import_pose\import_pose.cc line: 255
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Exited after just 50 seconds, so no harm done at my end

Oh, also another odd one that seemed to run OK and claimed credit, yet was granted 0, but without a validate error:

rb_05_17_65554_109652__t000__ab_robetta_IGNORE_THE_REST_358733_4959_1
======================================================
DONE :: 1 starting structures 28567 cpu seconds
This process generated 47 decoys from 47 attempts
======================================================
BOINC :: WS_max 2.60878e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>


Validate state Workunit error - check skipped
Claimed credit 200.579495343863
Granted credit 0
application version 3.73

____________

krypton
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 16 11
Posts: 105
ID: 436004
Credit: 1,796,601
RAC: 1,887
Message 80116 - Posted 21 May 2016 19:26:01 UTC - in response to Message ID 80113.

Thanks Sid! I've fixed the issue, but unfortunately some units already got sent out =[

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 80117 - Posted 22 May 2016 1:31:27 UTC - in response to Message ID 80116.

Good stuff. The one with the credit problem seems to have got cleaned up in the meantime and granted credit equal to claimed credit, so all's well there too.

A few more failed tasks but only of the type already reported, so all in hand as they work their way out of the queue.

____________

Dr. Merkwürdigliebe Profile
Avatar

Joined: Dec 5 10
Posts: 74
ID: 404270
Credit: 2,417,348
RAC: 3,829
Message 80132 - Posted 28 May 2016 14:29:32 UTC

Compute error

krypton
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 16 11
Posts: 105
ID: 436004
Credit: 1,796,601
RAC: 1,887
Message 80133 - Posted 28 May 2016 21:10:45 UTC - in response to Message ID 80132.

Thanks! I've informed the author of the job.

wyxchari Profile

Joined: Nov 27 14
Posts: 11
ID: 1022309
Credit: 40,803
RAC: 0
Message 80149 - Posted 2 Jun 2016 11:08:27 UTC

I'm tired of Rosetta compute errors. Many tasks fail at the end and then do not receive credit. I prefer to use my computer time on other projects, such as WCG Cancer, which never gives me errors. Goodbye forever.
____________

Timo Profile
Avatar

Joined: Jan 9 12
Posts: 171
ID: 440120
Credit: 10,317,960
RAC: 15,380
Message 80150 - Posted 2 Jun 2016 12:43:43 UTC - in response to Message ID 80149.

Actually, your errored task received full credit (see the bottom of the page here: http://boinc.bakerlab.org/rosetta/result.php?resultid=824626167). Invalid tasks don't get credit right away; as with most of them, a job that runs once a day grants the credit, and that granted credit only shows on the result summary page.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 80160 - Posted 6 Jun 2016 0:55:36 UTC

A new error report for yhsia to look at - cuts in at 30-45 minutes into the tasks for some reason:

yh160603_5H2LD-13-R_tj59_5_043651_0011_E_0001_SAVE_ALL_OUT_377694_12_1

<message>
(unknown error) - exit code -529697949 (0xe06d7363)
</message>

yh160603_5H2LD-13-R_tj58_5_000001_0002_C_0001_SAVE_ALL_OUT_377666_89_0
<message>
(unknown error) - exit code -529697949 (0xe06d7363)
</message>

____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 80161 - Posted 6 Jun 2016 6:53:40 UTC - in response to Message ID 80160.

Also yh160603_5H2LD-13-R_tj59_5_000001_0001_E_0001_SAVE_ALL_OUT_377690_131_1

<message>
(unknown error) - exit code -1073741819 (0xc0000005)
</message>
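
For anyone comparing these reports, the negative exit codes and the hex values in brackets are the same 32-bit number shown two ways. A small Python sketch of the conversion follows. The labels are the ones used in the unhandled exception reports elsewhere in this thread (0xc0000005 is the Windows access violation status, and 0xe06d7363 is the code Microsoft's C++ runtime uses for a thrown C++ exception); the function names are made up for this example.

```python
# Codes named in this thread's own crash reports; anything else is left unlabeled.
KNOWN_CODES = {
    0xC0000005: "Access Violation",
    0xE06D7363: "C++ exception",
}


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format an exit code the way the task reports show it, plus a label if known."""
    value = to_unsigned32(exit_code)
    label = KNOWN_CODES.get(value, "unknown")
    return f"{exit_code} = 0x{value:08x} ({label})"


if __name__ == "__main__":
    for code in (-529697949, -1073741819):
        print(describe(code))
```

The C++ exception code on its own does not say what was thrown, so it only narrows the cause down when the stderr output names the exception.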


____________

krypton
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Nov 16 11
Posts: 105
ID: 436004
Credit: 1,796,601
RAC: 1,887
Message 80162 - Posted 6 Jun 2016 21:18:16 UTC

Hi Sid,

Thanks for the alert! It looks like these jobs require lots of memory. We have a way to specify how much memory to use. It will be corrected in the next round of submissions!

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 80163 - Posted 6 Jun 2016 23:59:00 UTC - in response to Message ID 80162.

That would kind of explain why the task runs for a reasonable while before crashing out, and I've seen occasional tasks using 1.2GB, but I'm running with just short of 10GB free of 16GB total.

I set Boinc to run 60% of memory when the computer is in use (90% when not in use). Do people routinely allocate more than that? What can I safely adjust that setting to, or is it just trial and error?
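
To put rough numbers on those percentages, here is a back-of-envelope Python sketch. It only does the arithmetic and says nothing about why these particular jobs crashed. The function names are made up, 1 GB is taken as 1000 MB to keep the figures round, and the 1.2 GB per-task figure is just the occasional peak mentioned above, not a documented Rosetta requirement.

```python
def memory_budget_mb(total_mb: int, pct_allowed: int) -> int:
    """Memory BOINC may use under a given memory percentage setting."""
    return total_mb * pct_allowed // 100


def tasks_that_fit(total_mb: int, pct_allowed: int, per_task_mb: int) -> int:
    """How many tasks of a given size fit inside that budget."""
    return memory_budget_mb(total_mb, pct_allowed) // per_task_mb


if __name__ == "__main__":
    TOTAL_MB = 16_000     # 16 GB, assuming 1 GB = 1000 MB
    PER_TASK_MB = 1_200   # the occasional 1.2 GB task mentioned above
    for pct in (60, 90):
        print(f"{pct}% -> {memory_budget_mb(TOTAL_MB, pct)} MB budget, "
              f"room for {tasks_that_fit(TOTAL_MB, pct, PER_TASK_MB)} such tasks")
```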
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 80164 - Posted 7 Jun 2016 1:00:02 UTC - in response to Message ID 80163.

I've found that 64-bit Windows Vista is rather inefficient at handling memory for running 32-bit applications, so I set that computer to use 30% to 40% of the memory for BOINC out of 8 GB. 64-bit Windows 7 and Windows 10 are more efficient, so I set that computer to use 70% out of 16 GB. 64-bit BOINC is not very good at giving up memory when the computer is in use, so these settings are the same for when the computer is in use as when not in use.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 80167 - Posted 8 Jun 2016 18:37:32 UTC - in response to Message ID 80164.

Useful, thanks. I'll tweak my Min 60% Max 90% to 65% & 85% on both my Win7 machines and see how it goes for now
____________

Andy_Taximan

Joined: Jan 20 14
Posts: 1
ID: 492014
Credit: 236,604
RAC: 0
Message 80176 - Posted 14 Jun 2016 18:17:21 UTC

Not much of a problem, but 3 hours to download minirosetta_database_d0bf94b.zip really is a pain! lol, and no, it's not my internet speed.

David Fickes

Joined: Jul 12 15
Posts: 1
ID: 1124419
Credit: 764,911
RAC: 1,566
Message 80196 - Posted 19 Jun 2016 5:42:15 UTC

I've been having communication problems with the rosetta@home servers since moving to El Capitan. I had to update the BOINC software, but other projects are still running. The log follows:

Sat Jun 18 22:38:32 2016 | rosetta@home | Requesting new tasks for CPU and AMD/ATI GPU and Intel GPU
Sat Jun 18 22:39:03 2016 | | Project communication failed: attempting access to reference site
Sat Jun 18 22:39:03 2016 | rosetta@home | Scheduler request failed: Server returned nothing (no headers, no data)
Sat Jun 18 22:39:04 2016 | | Internet access OK - project servers may be temporarily down.
Sat Jun 18 22:40:08 2016 | World Community Grid | Sending scheduler request: To fetch work.
Sat Jun 18 22:40:08 2016 | World Community Grid | Requesting new tasks for CPU and Intel GPU
Sat Jun 18 22:40:10 2016 | World Community Grid | Scheduler request completed: got 2 new tasks
Sat Jun 18 22:40:12 2016 | World Community Grid | Started download of fahb.FAH2_avx40811-ls_000076-in1.dms
Sat Jun 18 22:40:12 2016 | World Community Grid | Started download of fahb.FAH2_avx40811-ls_000076-in2.dms

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 80431 - Posted 25 Jul 2016 17:51:27 UTC

Not sure what's happening with this task atm

000096_C5_0052_0004_fragments_relax_SAVE_ALL_OUT_402757_2_1

CPU time at last checkpoint 07:09:30
CPU time 07:26:30
Elapsed time 07:59:48

62.135% complete (of 8 hour runtime - lagging behind what it should be)

Only at Model 1 Step 10

Getting full CPU time according to Task Manager - heading for the watchdog at that rate

It looks very complicated when I show graphics. Is all well with it?
____________

anarchic teapot

Joined: Mar 25 06
Posts: 2
ID: 67816
Credit: 147,788
RAC: 250
Message 80488 - Posted 5 Aug 2016 14:39:44 UTC

Rosetta Mini 3.73 is running well past the time it's supposed to take on my computer. One task has been running for over 2 days, is shown as being less than 50% done, but the remaining estimated time is blank.

From my logs, I see I've already had trouble with a different Rosetta module this morning: it ended with an error message 05/08/2016 11:12:31 | rosetta@home | Aborting task fEbH1149_fold_SAVE_ALL_OUT_402410_390_0; not started and deadline has passed

There's also this on my account:

851723397 | 769539264 | 22 Jul 2016 9:12:30 UTC | 5 Aug 2016 9:12:30 UTC | Over | No reply | New | 0.00 | --- | ---
851723358 | 769539231 | 22 Jul 2016 9:12:30 UTC | 5 Aug 2016 9:13:03 UTC | Over | Client error | Aborted by user | 0.00 | 0.00 | ---
851723337 | 769539210 | 22 Jul 2016 9:12:30 UTC | 5 Aug 2016 9:13:03 UTC | Over | Client error | Aborted by user | 0.00 | 0.00 | ---
851723269 | 769539146 | 22 Jul 2016 9:12:30 UTC | 5 Aug 2016 9:12:30 UTC | Over | No reply | New | 0.00 | --- | ---
851718775 | 769535427 | 22 Jul 2016 8:58:34 UTC | 5 Aug 2016 8:58:34 UTC | Over | No reply | New | 0.00 | --- | ---

No, I haven't (yet) aborted any tasks, so I don't know why that message appears. It does look as if Mini 3.73 tasks are overrunning to the extent of being rejected by the server.

I'm going to terminate all 8 Mini 3.73 tasks currently in my queue & turn Rosetta off for a bit, to give the devs time to fix the problem.
____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 80489 - Posted 5 Aug 2016 15:05:30 UTC

Error on 857376630 task

Exit status: 194 (0xc2)
<message>
finish file present too long
</message>

____________

nanoprobe

Joined: Apr 5 09
Posts: 8
ID: 309801
Credit: 381,804
RAC: 0
Message 80522 - Posted 9 Aug 2016 18:48:49 UTC

I installed Android 5.1.1 on a Pine64 device and attached to Rosetta. I received 1 task, which completed and validated. I'm not receiving any more tasks, and the event log says "Minirosetta is not available for your type of computer" every time I try to update. What's up with that?

nanoprobe

Joined: Apr 5 09
Posts: 8
ID: 309801
Credit: 381,804
RAC: 0
Message 80524 - Posted 9 Aug 2016 23:58:51 UTC

Looking again there was an upload error.

<message>
upload failure: <file_xfer_error>
<file_name>db_pred12_7mer_android_7res_t1c.2.86_0001_SAVE_ALL_OUT_344206_6803_3_0</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>

</message>

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 80732 - Posted 11 Oct 2016 8:34:49 UTC

880278687

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0116CAE0 write attempt to address 0x017D7EC1

____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 80766 - Posted 21 Oct 2016 6:51:55 UTC

Some of these...
881841371
881841206
etc

ERROR: unrecognized residue TIP
ERROR:: Exit from: ..\..\..\src\core\io\pose_from_sfr\PoseFromSFRBuilder.cc line: 1030
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 542
ID: 25524
Credit: 1,501,541
RAC: 1,427
Message 80768 - Posted 24 Oct 2016 5:49:15 UTC

Some errors after over 3h of calc (my default runtime is 2h)

882253156
882249925

And this after 6h :-(
882253123

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x75CFA6F2


____________


Copyright © 2017 University of Washington
