Posts by adrianxw

1) Message boards : Number crunching : Computation errors (Message 91330)
Posted 8 days ago by adrianxw
Post:
I have had four units crash out in recent days. One failed with "Aborted by Server", so I discount that one. The other three failed with "Out of Memory". I think this is because I was sent "Rosetta v4.07 windows_intelx86" to run the job, and not "Rosetta v4.07 windows_x86_64". Of the wingmen on the failing jobs, the others have crashed with the same error, except one, who completed the unit but was running Rosetta v4.07 windows_x86_64. Obviously, a 64 bit system can access a much greater memory range than a 32 bit one. The question that arises, though, is why was I sent x86 and not x86_64? My system runs 64 bit Windows and has more memory installed and available to BOINC than the chap that completed the job without error.
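To put rough numbers on the 32 bit versus 64 bit point, here is a minimal sketch that assumes only that the addressable range scales with pointer width. The function name is made up for illustration.

```python
GIB = 2**30  # bytes in one gibibyte

def address_space_gib(pointer_bits: int) -> float:
    """Theoretical addressable range, in GiB, for a given pointer width."""
    return 2**pointer_bits / GIB

print(address_space_gib(32))  # theoretical ceiling for a 32 bit process
print(address_space_gib(64))  # theoretical ceiling for a 64 bit process
```

In practice a 32 bit Windows process gets less than the theoretical figure (2 GiB of user address space by default), however much RAM is installed. Whether that limit is what actually produced the "Out of Memory" errors here is not confirmed.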
2) Message boards : Number crunching : Computation errors (Message 91157)
Posted 24 Sep 2019 by adrianxw
Post:
I've also had two work units crash out today, one with this...

Exit status 1 (0x00000001) Unknown error code

... the other with this...

Exit status -529697949 (0xE06D7363) Unknown error code

No new tasks set for now.
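
The two forms of each exit status above are the same 32 bit value, read once as signed and once as unsigned. A minimal sketch of the conversion follows; the function names are hypothetical and not part of BOINC.

```python
def to_unsigned_hex(exit_status: int) -> str:
    """Format a signed 32 bit exit status as unsigned hexadecimal."""
    return f"0x{exit_status & 0xFFFFFFFF:08X}"

def low_bytes_as_ascii(exit_status: int) -> str:
    """Decode the low three bytes of the status as ASCII text."""
    return (exit_status & 0xFFFFFF).to_bytes(3, "big").decode("ascii")

print(to_unsigned_hex(1))
print(to_unsigned_hex(-529697949))
print(low_bytes_as_ascii(-529697949))
```

0xE06D7363 is the exception code the Microsoft C++ runtime uses for a thrown C++ exception, and its low three bytes spell "msc" in ASCII. That only says an exception went unhandled, not what caused it.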
3) Message boards : Number crunching : Strange work unit. (Message 91131)
Posted 19 Sep 2019 by adrianxw
Post:
It is a little unfortunate! I have a task running now that is 98.965% complete and VERY slowly increasing, with 10 minutes odd shown to complete, but it has now run for 16:40:50, so it is more than 4 hours over my 12:00:00 run time. A few minutes ago it showed 98.963%, and after finishing this post it says 98.968%, so it IS doing something.

<edit>
Okay, I managed to up the run time to 14:00:00 before it got the chop so I hope it will get there. Shows 98.971% right now. Interestingly, the time remaining is not decreasing, it has been 00:10:27 since I started.
</edit>

<edit again>
Yes! It suddenly jumped to 100% after 16:50:47.
</edit>

<edit again>
The task is:

rb_08_27_7614_7823_ab_t000_robetta_cstwt_5.0_FT_IGNORE_THE_REST_08_06_857976_594

Hope it is a good one. I'll leave the time at 14:00:00 in case there are others like this one.
</edit>

<edit AGAIN>

1093987561 985420727 3117659 18 Sep 2019, 5:10:20 UTC 19 Sep 2019, 6:59:05 UTC Completed and validated 44,963.77 43,178.52 569.88 Rosetta Mini v3.78 windows_x86_64
1093972969 984034477 3117659 18 Sep 2019, 3:52:35 UTC 19 Sep 2019, 8:51:40 UTC Completed and validated 60,647.23 57,949.45 398.76 Rosetta v4.07 windows_x86_64
1093968124 985403294 3117659 18 Sep 2019, 2:51:36 UTC 19 Sep 2019, 4:16:12 UTC Completed and validated 44,333.80 43,112.95 625.53 Rosetta Mini v3.78 windows_x86_64
1093967112 985402459 3161065 18 Sep 2019, 2:36:42 UTC 19 Sep 2019, 3:42:42 UTC Completed and validated 43,181.07 43,133.70 512.24 Rosetta Mini v3.78 windows_intelx86
1093967259 985402586 3161065 18 Sep 2019, 2:36:42 UTC 19 Sep 2019, 1:43:24 UTC Completed and validated 43,139.44 43,080.78 590.64 Rosetta Mini v3.78 windows_intelx86

Credit column is interesting. Mini looks to be maxy.
</edit>
4) Message boards : Number crunching : Strange work unit. (Message 91130)
Posted 18 Sep 2019 by adrianxw
Post:
Good, that is what I expected. Thanks.
5) Message boards : Number crunching : Strange work unit. (Message 91127)
Posted 18 Sep 2019 by adrianxw
Post:
And another. I presume that there is a safety kill mechanism which will abort a task if it exceeds some threshold time value. I ask because I have Rosetta in the portfolio of a couple of machines I do not see every day.
6) Message boards : Number crunching : Credit (Message 91062)
Posted 25 Aug 2019 by adrianxw
Post:
The credit granted for jobs varies wildly. Examples from my recent work units:

1089381143 981231766 3117659 24 Aug 2019, 10:48:07 UTC 25 Aug 2019, 7:08:17 UTC Completed and validated 44,313.18 42,995.45 485.51
1089383131 980512536 3117659 24 Aug 2019, 10:34:53 UTC 25 Aug 2019, 4:10:56 UTC Completed and validated 44,485.02 43,186.50 570.90
1089341047 981195046 3117659 24 Aug 2019, 05:11:45 UTC 25 Aug 2019, 1:50:31 UTC Completed and validated 42,232.57 41,267.66 630.21

Credit is basically useless in any case.
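
To make the variation concrete, here is a rough sketch that works out credit per CPU hour for the three rows above. It assumes the last three columns are run time in seconds, CPU time in seconds, and credit, which is how the results table is usually laid out. The function name is just for illustration.

```python
# (task id, run time in s, CPU time in s, credit), copied from the rows above
rows = [
    (1089381143, 44313.18, 42995.45, 485.51),
    (1089383131, 44485.02, 43186.50, 570.90),
    (1089341047, 42232.57, 41267.66, 630.21),
]

def credit_per_cpu_hour(cpu_seconds: float, credit: float) -> float:
    """Credit granted per hour of CPU time."""
    return credit / cpu_seconds * 3600

rates = {task: credit_per_cpu_hour(cpu, credit) for task, _, cpu, credit in rows}

for task, rate in rates.items():
    print(task, round(rate, 2))

# Spread between the best and worst paid task
print(round(max(rates.values()) / min(rates.values()), 2))
```

On these three tasks the rate runs from roughly 41 to roughly 55 credits per CPU hour, so the best paid task earned about a third more than the worst for near identical CPU time.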
7) Message boards : Number crunching : Rosetta 4.0+ (Message 91034)
Posted 16 Aug 2019 by adrianxw
Post:
Still nothing here. My machines are crunching, exporting, end of story. Windows 8.1 x64. Something strange is happening.
8) Message boards : Number crunching : Rosetta 4.0+ (Message 90995)
Posted 7 Aug 2019 by adrianxw
Post:
My post implies there are no errors or bugs in the application, as I crunch quite a few work units. Your error lines indicate a data error in the work unit: an unrecognised residue (monomer) in a polypeptide. You don't report the work unit, but it is quite possible that the project can find that out from looking at your results, which I, of course, cannot do. The final BOINC error is probably because the job failed to start and therefore did not create an output file.
9) Message boards : Number crunching : Rosetta 4.0+ (Message 90992)
Posted 6 Aug 2019 by adrianxw
Post:
What is the problem people are seeing? I just looked at my results and have nothing odd, no errors.
10) Message boards : Number crunching : Rosetta 4.0+ (Message 90897)
Posted 8 Jul 2019 by adrianxw
Post:
Got this overnight:

-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION

... after it had crunched for 8 and a bit hours.
11) Message boards : Number crunching : Strange work unit. (Message 90594)
Posted 30 Mar 2019 by adrianxw
Post:
I've just had another of these, this one:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=959097968
12) Message boards : Number crunching : Strange work unit. (Message 90490)
Posted 6 Mar 2019 by adrianxw
Post:
I kept half an eye on it after it restarted, but it appeared to run completely normally, finished and uploaded. I agree that if it restarts with the same random number, then my theory above is incorrect. I am not sufficiently familiar with the code to comment further really. It ran to a normal completion. Something upset it: cosmic ray, neutrino interaction, could be anything I suppose. I have not changed anything here, so the project continues to run as it always has. Forget it.
13) Message boards : Number crunching : Strange work unit. (Message 90461)
Posted 2 Mar 2019 by adrianxw
Post:
I have a work unit on here, this one...

http://boinc.bakerlab.org/rosetta/result.php?resultid=1059492200

... which has behaved in an unusual fashion. I have the run time set to 12 hours, and understand the task runs several times in the time allowed, starting with a new random number each time. I saw the above unit had run for over two days and was only a little over 8% complete. I suspended it and then released it; it dropped back to the start, is running again now, and has passed the point where it stopped before. Indeed, it is showing 13.757% progress.

I infer from that that the task is sensitive to certain random numbers, which is a little odd, indeed worrying. I have Rosetta running as one of the projects on machines that I do not see every day.
14) Message boards : Rosetta@home Science : Run time. (Message 90027)
Posted 19 Dec 2018 by adrianxw
Post:
If a job is out in the wild, they do seem to have ways of stopping it, so I don't know why they did not do that. I still don't know why my job could not write its output when all the others can. A couple of goofs in quick succession.
15) Message boards : Rosetta@home Science : Run time. (Message 90015)
Posted 17 Dec 2018 by adrianxw
Post:
I raised the issue with the research team, I got this back...

>>>
Do you happen to know the name(s) of these jobs? There was a problematic batch that was sent out by a researcher in the lab that had '..' in the name which the BOINC client did not like. These jobs would fail and may be causing the odd behavior. These jobs also had very long names.
<<<

... which seems to apply to your record. Looking at the names of my failures, the .. appears.
16) Message boards : Rosetta@home Science : Run time. (Message 89999)
Posted 14 Dec 2018 by adrianxw
Post:
And now there is something seriously wrong going on here. Looking at my "errors" page, most of the older ones that had values before are now showing "Timed out - no response". They did NOT show that before.

1045594884 942000780 3117659 6 Dec 2018, 12:54:10 UTC 14 Dec 2018, 12:54:10 UTC Timed out - no response 0.00 0.00 --- Rosetta Mini v3.78 windows_intelx86

Secure copies made.
17) Message boards : Rosetta@home Science : Run time. (Message 89996)
Posted 13 Dec 2018 by adrianxw
Post:
Err, I'm confused now. What I said is...

>>>
Given what you have said here, if a unit has a "computation error" like this one... (highlight added)

https://boinc.bakerlab.org/result.php?resultid=1046340233
<<<

... I quite understand why tasks can be cancelled by server, and quite agree with the function. However, I did not say that the task was cancelled by server.

I did say I was searching my other active projects for "errors", but found none, only here.

A cancelled by server work unit is this one...

http://boinc.bakerlab.org/workunit.php?wuid=941131930
18) Message boards : Rosetta@home Science : Run time. (Message 89987)
Posted 12 Dec 2018 by adrianxw
Post:
Deleted.
19) Message boards : Rosetta@home Science : Run time. (Message 89982)
Posted 11 Dec 2018 by adrianxw
Post:
I hear what you say, but, having spent some time checking the other projects totals, I have only one, that has a single error, and that is a "cancelled by server" which is not an error, and I don't know why it flagged as such. So Einstein, Milkyway, Seti, Yoyo, and Acoustics are not having any problems, just here, and just in the last week. Other projects have not sent work for a while but none I looked at had any errors.

Nothing has changed with Windows or Avast (anti virus), and there is over 20 Gig free on the SSD.
20) Message boards : Rosetta@home Science : Run time. (Message 89980)
Posted 10 Dec 2018 by adrianxw
Post:
Given what you have said here, if a unit has a "computation error" like this one...

https://boinc.bakerlab.org/result.php?resultid=1046340233

... it has run the job many times. I would expect a "computation error" in an early cycle to crash the work unit, yet that one ran for the time limit I have set, using the same protein simply with a different random number start point. This implies that the job has run normally for many start points. I have noticed a number of errors in the last few days actually; the one I highlight is just the worst.





©2019 University of Washington
http://www.bakerlab.org