Minirosetta v1.47 bug thread.

Mike Tyka
Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 57902 - Posted: 15 Dec 2008, 22:08:36 UTC

HoHo kids!

We've got a new minirosetta version, with - you've guessed it - more bug fixes! Woo!

Please report remaining issues here - that would be grand :)
http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

Stephen
Joined: 26 Apr 08
Posts: 32
Credit: 429,286
RAC: 0
Message 57903 - Posted: 15 Dec 2008, 22:21:18 UTC - in response to Message 57902.  

Are there any new changes to the science?

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 57905 - Posted: 15 Dec 2008, 23:25:25 UTC

......sooooo which bugs do you feel you've fixed?

Which of the many users that have abandoned the project due to problems should feel it is safe to reenter the waters?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Mike Tyka
Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 57906 - Posted: 16 Dec 2008, 0:04:21 UTC - in response to Message 57905.  

......sooooo which bugs do you feel you've fixed?

Which of the many users that have abandoned the project due to problems should feel it is safe to reenter the waters?


Amongst a bunch of minor things, one major bug that was fixed was causing jobs to crash when they entered the full-atom stage but had a fullatom energy > 0. This usually occurs rarely, which would explain the random errors seen with the cs_vanilla jobs. The bug was due to a wrongly initialized variable.
This bug was also causing the majority of the ccc_1_8_* jobs to fail on RALPH (we didn't move these over to BOINC of course, since we noticed the bug there).
The reason those failed more frequently was that they have constraints built in, and those cause the energy to be offset to higher values, increasing the frequency of the problem to more like 70%.
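
To make that failure mode concrete, here is a minimal sketch of how a wrongly initialized variable can stay hidden until every energy is positive. This is not the actual Minirosetta code: the function names, the energies and the +20 offset standing in for constraints are all made up for illustration.

```python
import math

def pick_lowest_buggy(energies):
    """Return the index of the lowest energy (buggy version)."""
    best_energy = 0.0   # wrong initial value: silently assumes some energy is < 0
    best_index = None
    for i, e in enumerate(energies):
        if e < best_energy:
            best_energy, best_index = e, i
    return best_index   # stays None when every energy is > 0

def pick_lowest_fixed(energies):
    """Same search, but starting from +infinity so any energy can win."""
    best_energy = math.inf
    best_index = None
    for i, e in enumerate(energies):
        if e < best_energy:
            best_energy, best_index = e, i
    return best_index

# Hypothetical energies; the offset plays the role of built-in constraints.
unconstrained = [-12.0, -3.5, -7.25]
constrained = [e + 20.0 for e in unconstrained]

print(pick_lowest_buggy(unconstrained))  # 0
print(pick_lowest_buggy(constrained))    # None, and later code would trip over it
print(pick_lowest_fixed(constrained))    # 0
```

With mostly negative energies the buggy version works almost every time, which fits a bug that only shows up rarely until something pushes the energies upward.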

Looking at the RALPH results, I think we've fixed most of the easily reproducible errors. I recently ran close to 10000 WUs on our local compute cluster resulting in.. well.. 0 errors. This is where it gets tricky really, if stuff is only failing on other platforms or due to machine-dependent issues or *god knows what*. I will propose that the lab acquire a small farm of Windows machines to do extensive bug testing & hunting on, to get a grip on these errors.. but believe us, these are difficult grounds.

Thanks for bearing with us,

Mike
http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

LizzieBarry
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 57910 - Posted: 16 Dec 2008, 1:24:57 UTC

Sorry Mike, not a good start...

1483407
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0049162C read attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...
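
The two numbers in the <message> line above are the same 32-bit value read two ways: -1073741819 is the signed reading and 0xc0000005 is the unsigned one, which is the Windows status code for an access violation. A small sketch of the conversion (the helper names are only for illustration):

```python
def to_unsigned32(code):
    """Reinterpret a (possibly negative) exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF

def to_signed32(code):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code & 0x80000000 else code

print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
print(to_signed32(0xC0000005))          # -1073741819
```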

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 57911 - Posted: 16 Dec 2008, 1:39:50 UTC - in response to Message 57906.  

......sooooo which bugs do you feel you've fixed?

Which of the many users that have abandoned the project due to problems should feel it is safe to reenter the waters?


Amongst a bunch of minor things, one major bug that was fixed was causing jobs to crash when they entered the full-atom stage but had a fullatom energy > 0. This usually occurs rarely, which would explain the random errors seen with the cs_vanilla jobs. The bug was due to a wrongly initialized variable.
This bug was also causing the majority of the ccc_1_8_* jobs to fail on RALPH (we didn't move these over to BOINC of course, since we noticed the bug there).
The reason those failed more frequently was that they have constraints built in, and those cause the energy to be offset to higher values, increasing the frequency of the problem to more like 70%.

Looking at the RALPH results, I think we've fixed most of the easily reproducible errors. I recently ran close to 10000 WUs on our local compute cluster resulting in.. well.. 0 errors. This is where it gets tricky really, if stuff is only failing on other platforms or due to machine-dependent issues or *god knows what*. I will propose that the lab acquire a small farm of Windows machines to do extensive bug testing & hunting on, to get a grip on these errors.. but believe us, these are difficult grounds.

Thanks for bearing with us,

Mike


I can't even imagine the loads of code you (guys) went thru.

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 57913 - Posted: 16 Dec 2008, 1:56:14 UTC

The one bug that comes to mind that would not be easy to observe by counting successfully completed results, on a farm of Linux machines all running only a single project, would be where the tasks were not suspending properly. Someone mentioned a BOINC API compatibility problem might be the cause?

What would reasonable memory expectations be now? Are all the 1.47 tasks tagged as needing 512MB minimum? Or is there a mix? And, of that 512MB, what should one expect to see a task actually using when running normally?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Mike Tyka
Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 57914 - Posted: 16 Dec 2008, 2:12:00 UTC - in response to Message 57913.  

The one bug that comes to mind that would not be easy to observe by counting successfully completed results, on a farm of Linux machines all running only a single project, would be where the tasks were not suspending properly. Someone mentioned a BOINC API compatibility problem might be the cause?

You're right. However I believe David Kim has updated and fixed this problem, at 1.45. If you guys *still* see problems with suspension of jobs then do let us know. We also hope that this lockfile problem should be largely fixed. We'll have to wait for the error statistics to come in before we know if the API fix has worked.



What would reasonable memory expectations be now? Are all the 1.47 tasks tagged as needing 512MB minimum? Or is there a mix? And, of that 512MB, what should one expect to see a task actually using when running normally?


I can't speak for the enzyme design guys but to give you an idea:

The jobs named "*_rlbd_*" and "*_rlbn_*" should take no more than 160 MB or so.
The jobs named "cc2_*" or "*_chunk_*" should take between 150 and 320 MB or so (they are much larger proteins).

I'm not aware of any jobs that require more than 400 MB; that would definitely point to a problem. Although the enzyme design guys may well have higher requirements.
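
For anyone who wants to check their own tasks against those rough figures, here is a small sketch. The name patterns and MB values are the ones quoted above; the function names, the verdict wording and the sample task names are assumptions for illustration, and enzyme design jobs may legitimately need more.

```python
from fnmatch import fnmatchcase

# (task name pattern, rough upper bound in MB), from the figures quoted above
ROUGH_MAX_MB = [
    ("*_rlbd_*", 160),
    ("*_rlbn_*", 160),
    ("cc2_*", 320),
    ("*_chunk_*", 320),
]
PROBLEM_MB = 400  # above this, something is probably wrong

def rough_max_mb(task_name):
    """Return the rough expected maximum for a task name, or None if unknown."""
    for pattern, limit in ROUGH_MAX_MB:
        if fnmatchcase(task_name, pattern):
            return limit
    return None

def memory_verdict(task_name, observed_mb):
    """Classify an observed memory footprint against the rough expectations."""
    if observed_mb > PROBLEM_MB:
        return "likely a problem"
    limit = rough_max_mb(task_name)
    if limit is not None and observed_mb > limit:
        return "higher than expected"
    return "looks normal"

print(memory_verdict("example_rlbd_0001", 140))  # looks normal
print(memory_verdict("example_rlbd_0001", 200))  # higher than expected
print(memory_verdict("cc2_example_task", 450))   # likely a problem
```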

Mike


http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

Mike Tyka
Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 57915 - Posted: 16 Dec 2008, 2:14:15 UTC - in response to Message 57910.  

Sorry Mike, not a good start...


Yes, I know. I'm not saying the app is perfect - just that we found a bunch of definite bugs that are now fixed. No doubt there are still issues - we're working on it :)

Mike

http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

Mike Tyka
Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 57916 - Posted: 16 Dec 2008, 2:16:13 UTC - in response to Message 57911.  


I can't even imagine the loads of code you (guys) went thru.


Well.. to give you an idea.. Minirosetta has more than 200000 (yes, two hundred thousand) lines of code. Each day there are maybe around 20 additions to the code, with around 40 people working on the code at any given time.

But we'll get there, I'm optimistic that with time we'll find the problems.
http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,633,150
RAC: 945
Message 57923 - Posted: 16 Dec 2008, 8:53:03 UTC
Last modified: 16 Dec 2008, 8:54:42 UTC

i hope you guys get a small farm of windows machines to double check problems against your linux machines. windows is what the majority of us crunchers use and certain error types may or may not show up on linux.

for instance, how does one tell the difference between a machine error and an application error when the task dies with a (0xc0000005) error? is this something that shows up on your linux machines? or is that a specific windows error code?

also in another thread you mentioned aborting tasks that are using lower than 1.47. would these tasks be reissued using 1.47 or would they use the same mini that they originated with?

ramostol
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 57926 - Posted: 16 Dec 2008, 10:50:42 UTC

My 1.47 cc2_1_8_mammoth-tasks have all crashed on Ralph, now my 1.47 cc2_1_8_native-tasks are crashing on Rosetta.

Example (1 of 2):
cc2_1_8_native_fa_cst_hb_t369__IGNORE_THE_REST_1S3QA_4_5599_36_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.47_i686-apple-darwin(95094,0xa0538fa0) malloc: *** error for object 0x1747df0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation


ramostol
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 57927 - Posted: 16 Dec 2008, 11:59:02 UTC

And now all today's imported 1.47-tasks for the upcoming week have collapsed, most of them after less than 1 minute of computing, one was manually aborted as potentially ever-lasting.

It seems that I have to stick to my 5.98-tasks for some days and increase the default runtime.

jjwhalen
Joined: 20 Dec 06
Posts: 4
Credit: 399,398
RAC: 0
Message 57928 - Posted: 16 Dec 2008, 12:17:35 UTC

Minirosetta apparently "looks like" malware, whether it actually is or not. This applies to all versions I've run, thru v1.47.

I run BOINC on two WinVista (God help me) boxes: one a 32 bit Sony with ZoneAlarm Pro|ESET NOD32 for security; the other a 64 bit Sony with Kaspersky Internet Security 2009.

On the first machine, NOD32 Antivirus thinks the Minirosetta .exe either contains a viral signature or looks bad heuristically (their UI doesn't say which). I have to add an exclusion to get the thing out of quarantine, every time a new version is released. Interestingly, ZoneAlarm Pro's application module hasn't had a problem with it.

On the 64 bit machine, Kaspersky's Application Control module gives Minirosetta's executable a Threat Rating of "Potentially Dangerous" with a heuristic Danger Index score of 82. I have to manually override Kaspersky and move Minirosetta out of the "Untrusted Application" zone, to allow it to execute. (By comparison, Rosetta Beta 5.98 has a DI of 12, as does SETI's recently released Astropulse 5.0. SETI's regular Enhanced v6.03 has a DI of zero.)

I realize that heuristic analysis is as much art as science, but both ESET and Kaspersky are rated at or near the top of their field. Of 10 project hosts I subscribe to, with over 25 project executables, Minirosetta is the ONLY one that has ever sent up a red flag to my security suite(s). Since most folks leave their security suite (if any) on autopilot, there are potentially many testers who never get to run Minirosetta because the .exe goes immediately into a black hole. Somewhere in those 200,000 lines of code, something apparently looks funky.

Best wishes:)


Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 57929 - Posted: 16 Dec 2008, 12:32:05 UTC
Last modified: 16 Dec 2008, 12:34:34 UTC

After a 1 week hiatus I downloaded v1.47 and 4 tasks. The first task showed a completion time of 12 hours which corresponds to my chosen runtime. The other 3 tasks, all _rlbd_ tasks, showed completion times of only 1 hour. What's up with that? It suggests that the staff provided an estimated task runtime of something like 45 minutes instead of the customary 8 hours.

Because of the 1-hour runtimes BOINC also downloaded additional tasks to fill the cache. Not good.
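
A rough sketch of why a short estimate pulls in extra work: if the client simply divides the amount of work it wants cached by the estimated runtime per task, a 1-hour estimate asks for far more tasks than a 12-hour one. This is a deliberate simplification and not BOINC's actual scheduling logic, which weighs other factors too; the 24-hour cache size and the function name are assumptions.

```python
import math

def tasks_to_fill_cache(cache_hours, estimated_hours_per_task):
    """Number of tasks needed to cover the cache at the estimated runtime."""
    return math.ceil(cache_hours / estimated_hours_per_task)

CACHE_HOURS = 24  # assumed cache setting, for illustration only

print(tasks_to_fill_cache(CACHE_HOURS, 12))  # 2
print(tasks_to_fill_cache(CACHE_HOURS, 1))   # 24
```

If those 24 tasks then each run for the chosen 12 hours, the cache holds far more work than intended.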

funkydude
Joined: 15 Jun 08
Posts: 28
Credit: 397,934
RAC: 0
Message 57935 - Posted: 16 Dec 2008, 13:17:00 UTC - in response to Message 57928.  


On the first machine, NOD32 Antivirus thinks the Minirosetta .exe either contains a viral signature or looks bad heuristically (their UI doesn't say which). I have to add an exclusion to get the thing out of quarantine, every time a new version is released.


Hello, I've been using both NOD32 and Rosetta for years now and I've never had NOD32 detect Rosetta as anything malicious. Make sure you are updated: v3.0.672.0, DB 3695 as of writing.

LizzieBarry
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 57936 - Posted: 16 Dec 2008, 13:29:41 UTC - in response to Message 57915.  

Sorry Mike, not a good start...


Yes, I know. I'm not saying the app is perfect - just that we found a bunch of definite bugs that are now fixed. No doubt there are still issues - we're working on it :)


That's ok. Just that I'm trying to get more active here again after some computer problems and the first 1.47 task crashed out quickly. The next 4 have run with no problems though. Hopefully that continues. Usually all the problems are mine, not yours.

Good to see a more active presence from you in this forum. Your feedback on issues makes a big difference, even if it's just to say you're working on it without a solution yet. That matters too.

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,633,150
RAC: 945
Message 57939 - Posted: 16 Dec 2008, 14:42:37 UTC - in response to Message 57936.  

Sorry Mike, not a good start...


Yes, I know. I'm not saying the app is perfect - just that we found a bunch of definite bugs that are now fixed. No doubt there are still issues - we're working on it :)


That's ok. Just that I'm trying to get more active here again after some computer problems and the first 1.47 task crashed out quickly. The next 4 have run with no problems though. Hopefully that continues. Usually all the problems are mine, not yours.

Good to see a more active presence from you in this forum. Your feedback on issues makes a big difference, even if it's just to say you're working on it without a solution yet. That matters too.



Just to expand on the point of this person....Thanks for taking the time to tell us what is going on. We like to know and the silence has been deafening lately.
Thanks again for breaking it. We hope for more news as time goes along.

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 57941 - Posted: 16 Dec 2008, 21:26:31 UTC

Hi.

I found a problem with the graphics on Ubuntu 8.04. Mini 1.45 worked fine, but now when I click the show graphics button all I get is the outline of the graphics window; it looked transparent. I could not close it normally, I had to go to processes and kill it from there. Also, it was showing that the graphics was using mini 1.40 for some reason. I'm sure that mini 1.45 was using the graphics 1.45. Not a biggie but still.

pete.


RodrigoPS
Joined: 28 Nov 08
Posts: 3
Credit: 1,286,066
RAC: 130
Message 57944 - Posted: 16 Dec 2008, 23:14:36 UTC - in response to Message 57941.  
Last modified: 16 Dec 2008, 23:16:20 UTC

Hi.

I found a problem with the graphics on Ubuntu 8.04. Mini 1.45 worked fine, but now when I click the show graphics button all I get is the outline of the graphics window; it looked transparent. I could not close it normally, I had to go to processes and kill it from there. Also, it was showing that the graphics was using mini 1.40 for some reason. I'm sure that mini 1.45 was using the graphics 1.45. Not a biggie but still.

pete.



I'm having the same problem, but on XP 32-bit, on one of my hosts after the installation of mini 1.47.

©2024 University of Washington
https://www.bakerlab.org