Minirosetta v1.45 bug thread

Message boards : Number crunching : Minirosetta v1.45 bug thread

A Few Good Men
Joined: 25 Mar 07
Posts: 14
Credit: 2,031,382
RAC: 0
Message 57889 - Posted: 15 Dec 2008, 14:54:55 UTC

Greg,
I did a full hard drive format, so there are no files left, project or otherwise, and then reinstalled XP. The slots are cleaned up.
After reinstalling BOINC and adding Rosetta as a project, I let it manage itself for 24 hours and had all the same errors as before: lockfile, not releasing, and as always no credit. I have been running the services from a non-system disk; I will reinstall on the system disk and see how that works.
Thanks for taking time to help out.
ID: 57889 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,518,559
RAC: 10,612
Message 57890 - Posted: 15 Dec 2008, 14:57:57 UTC - in response to Message 57703.  

And now 16 more successes out of 18 making 40 out of 50. Most errors came early, so I'm now confident enough to up my run-time from 2 to 3 hours again.

Update on this. In the last week, 116 MiniRosetta 1.45 tasks, 3hr runtime:
64 Success (55%)
52 Failure (45%)

Of failures:
19 manually aborted
28 "Can't acquire lockfile" errors
5 Exit code -1073741819 (0xc0000005)

26 Rosetta Beta 5.98 tasks, 3hr runtime - 100% success.

So, better than last time I ran with 3hr runtimes (was 43%) but still some way to go. I think the figure for 2hr run times was 73% (up to 80% with v1.45 on above figures).
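The percentages above can be rechecked with a few lines of Python. This is only a sketch using the counts quoted in this post; the last figure, which leaves manual aborts out of the denominator, is an assumption about how one might read the data and is not a number given in the post.

```python
# Figures copied from the post above (116 MiniRosetta 1.45 tasks, 3 hr runtime).
total = 116
success = 64
failures = {
    "manually aborted": 19,
    "can't acquire lockfile": 28,
    "exit code 0xc0000005": 5,
}

failed = sum(failures.values())
assert success + failed == total  # the breakdown accounts for every task

def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

print(f"success: {pct(success, total)}%  failure: {pct(failed, total)}%")

# Assumption (not in the post): treat manual aborts as user actions rather
# than application failures and drop them from the denominator.
attempted = total - failures["manually aborted"]
print(f"success excluding manual aborts: {pct(success, attempted)}%")
```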
ID: 57890 · Rating: 0
ramostol
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 57891 - Posted: 15 Dec 2008, 15:09:48 UTC

A number of failed 1.45 abinitio-tasks in the last 24 hours:

These two I had to abort. They had run for more than 20 hours against the 4-hour default and refused to display graphics:

abinitio_abrelax_nohomfrag_129_B_2ccvA_5483_3423_088

abinitio_abrelax_nohomfrag_129_B_1ctf__5483_3423_0

These collapsed quickly:

abinitio_abrelax_nohomfrag_129_B_1dzoA_5483_1560_1

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(47077,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
minirosetta_1.45_i686-apple-darwin(47077,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
SIGBUS: bus error

abinitio_abrelax_nohomfrag_129_B_1npsA_5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(48148,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
SIGBUS: bus error

abinitio_abrelax_nohomfrag_129_B_2chf__5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(48486,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
minirosetta_1.45_i686-apple-darwin(48486,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
minirosetta_1.45_i686-apple-darwin(48486,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
SIGBUS: bus error

abinitio_abrelax_nohomfrag_129_B_1o4wA_5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(58522,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
minirosetta_1.45_i686-apple-darwin(58522,0xb0087000) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_S_create
SIGABRT: abort called

abinitio_abrelax_nohomfrag_129_B_1elwA_5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(47419,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
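
For readers comparing these reports: the three numbers in "code 193 (0xc1, -63)" appear to be the same 8-bit value written as decimal, hex, and signed byte. The sketch below checks that and tallies the final stderr line of the five quickly failing tasks. The short keys are abbreviations of the task names chosen for this example, and `as_signed_byte` is a hypothetical helper, not part of BOINC.

```python
from collections import Counter

def as_signed_byte(code: int) -> int:
    """Reinterpret an 8-bit exit status (0..255) as a signed value."""
    return code - 256 if code >= 128 else code

code = 193
print(code, hex(code), as_signed_byte(code))  # the three forms in the log line

# Last stderr line of each quickly failing task, copied from the reports above.
# Keys are shortened task names (the protein ID part only).
final_lines = {
    "1dzoA": "SIGBUS: bus error",
    "1npsA": "SIGBUS: bus error",
    "2chf_": "SIGBUS: bus error",
    "1o4wA": "SIGABRT: abort called",
    "1elwA": "SIGSEGV: segmentation violation",
}
signals = Counter(line.split(":")[0] for line in final_lines.values())
print(signals.most_common())
```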
ID: 57891 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 57897 - Posted: 15 Dec 2008, 18:50:28 UTC
Last modified: 15 Dec 2008, 19:15:38 UTC

Come on guys, you're killing me.
2 compute errors in 8 hours today, and then 4 out of 6 compute errors on the 11th that can be put down to bad tasks. What is with tasks getting halfway and then crashing with no credit? You should make Rosie grant the claimed credit on these errors, since it is computing the credit; then we are not wasting our CPU time and electricity on 0 points. I could have got 101 points for the second crash. This month I have lost 18.5 hrs to bad tasks that died halfway, and I could have got 508.4 credits if credit were granted for crashes. The crash rate this month so far has been 6% on my system.

Here are the latest tasks that died.

1ef4A_ZNMP_ABRELAX_tetraL_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1ef4A-_5478_9411_0 Exit status -1073741819 (0xc0000005), CPU time 12223.05
stderr out

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600

</stderr_txt>
]]>
-----------

abinitio_abrelax_nohomfrag_129_B_1vie__5483_167_0
CPU time 14489.83
Exit status -1073741819 (0xc0000005)
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0046D054 write attempt to address 0x085A5FFC

Engaging BOINC Windows Runtime Debugger...
ID: 57897 · Rating: 0
ramostol
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 57924 - Posted: 16 Dec 2008, 10:17:21 UTC

With 1.47 launched these results are perhaps not too interesting, but to be on the safe side:

This has indeed been a Black Monday, with all 1.45 tasks reserved for the coming week already crashed.

Early crashes with no result file:
8 tasks lr5_score13
2 tasks lr5_score12
1 task cc_3_5_nocst4
1 task 1_irna
1 task cs_vanilla
26 tasks abinitio...

In addition, 2 tasks had to be manually aborted after showing signs of being non-terminators:
abinitio_abrelax_nohomfrag_129_B_2acy__5483_2781_0

abinitio_abrelax_nohomfrag_129_B_1r26A_5483_2512_0

ID: 57924 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 57931 - Posted: 16 Dec 2008, 12:51:58 UTC - in response to Message 57924.  

You seem to be inserting a few extra characters when you create links, probably quote marks, which prevents me from following the links.
ID: 57931 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 57933 - Posted: 16 Dec 2008, 13:08:41 UTC - in response to Message 57931.  
Last modified: 16 Dec 2008, 13:10:53 UTC

**Note** I edited his URL lines to get rid of the " ". Things should be OK now.
There is nothing to see at either link: both tasks were user aborts, which cancels the run time information.
ID: 57933 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 57940 - Posted: 16 Dec 2008, 19:25:13 UTC
Last modified: 16 Dec 2008, 19:31:44 UTC

abinitio_abrelax_nohomfrag_129_B_1opd__5483_1764_0

Now you guys are starting to irritate me badly!
CPU time 21344.08 vs. 21600 run time.
stderr out

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00427BEA write attempt to address 0x08787FFC


What the heck is this error now? Did your task go bad at the last minute?
Can someone from the team explain what the heck the 0xc blah blah error means?

I was giving you 6hr run times but now I have dropped to 4. Too many credit losses lately. If the 1.47s crash I will be reducing my resource share as well, until you guys figure out what the heck is going on.
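
On the question of what the 0xc code means: the negative exit code and the hex value in these logs are the same 32-bit number, and 0xC0000005 is the Windows status code for an access violation, which matches the "Access Violation ... write attempt to address" line in the exception record. A minimal sketch of the conversion follows; `to_unsigned32` and `describe` are hypothetical helper names, and the lookup table holds only the one code seen in this thread.

```python
# Only the one code seen in this thread; the name is the standard Windows
# NTSTATUS constant for it.
KNOWN_STATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as unsigned."""
    return code & 0xFFFFFFFF

def describe(code: int) -> str:
    """Format an exit code as 'signed = hex (name)'."""
    value = to_unsigned32(code)
    name = KNOWN_STATUS.get(value, "unknown")
    return f"{code} = {value:#010x} ({name})"

print(describe(-1073741819))
```

This only explains what the number means. It does not say why the application touched memory it should not have, which is something the project team would have to determine from the full debugger output.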
ID: 57940 · Rating: 0
Bas
Joined: 18 Dec 08
Posts: 1
Credit: 2,316
RAC: 0
Message 58526 - Posted: 5 Jan 2009, 16:49:48 UTC

I'm new here, so I don't know much about this program. But for a couple of days I haven't been getting new tasks. This is what I see in the messages window:

5-1-2009 17:28:28|rosetta@home|Sending scheduler request: To fetch work. Requesting 21601 seconds of work, reporting 0 completed tasks
5-1-2009 17:28:33|rosetta@home|Scheduler request completed: got 0 new tasks

Can I do something about this, or are there just not any new tasks at the moment?
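
For anyone reading such log lines, here is a rough sketch of pulling the numbers out of them. It assumes each line has three pipe-separated fields (timestamp, project, message), which is inferred only from the two lines quoted here; `parse_line` is a hypothetical helper and not a BOINC function.

```python
import re

log = """\
5-1-2009 17:28:28|rosetta@home|Sending scheduler request: To fetch work. Requesting 21601 seconds of work, reporting 0 completed tasks
5-1-2009 17:28:33|rosetta@home|Scheduler request completed: got 0 new tasks
"""

def parse_line(line):
    """Split 'timestamp|project|message' into its three fields."""
    timestamp, project, message = line.split("|", 2)
    return timestamp, project, message

requested = None
received = None
for line in log.splitlines():
    _, _, message = parse_line(line)
    m = re.search(r"Requesting (\d+) seconds of work", message)
    if m:
        requested = int(m.group(1))
    m = re.search(r"got (\d+) new tasks", message)
    if m:
        received = int(m.group(1))

print(f"requested {requested} s (~{requested / 3600:.1f} h), received {received} tasks")
```

Read this way, the client asked for about six hours of work and the scheduler replied with no tasks, so the request itself went through.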
ID: 58526 · Rating: 0
Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 58527 - Posted: 5 Jan 2009, 17:13:23 UTC

It would appear that there are no new tasks at the moment. Be patient and all will be revealed. There hasn't been any announcement from the team, so either there is a problem at their end or they are getting some new work units ready.
ID: 58527 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org