Problems with minirosetta version 1.+

Profile banditwolf

Joined: 10 Jan 06
Posts: 28
Credit: 139,737
RAC: 0
Message 51301 - Posted: 10 Feb 2008, 14:46:19 UTC

They are freezing not crashing. The wu's are doing ~15 min of work and then going to the next unit instead of finishing it.
ID: 51301 · Rating: 0
j2satx

Joined: 17 Sep 05
Posts: 97
Credit: 3,670,592
RAC: 0
Message 51302 - Posted: 10 Feb 2008, 15:21:06 UTC - in response to Message 51291.  

I got 9 mini 6 hour wu's today. I have had 5 of them 'freeze' around 12-18 min and then go on to the next wu. I have not had this with Rosetta before today, but I have had some do this on MilkyWay. Is there any info I can leave to help? I am using Boinc 5.10.13.


You can help by joining the Ralph@home project for alpha testing if you haven't already.

http://ralph.bakerlab.org


Give us a few thousand WUs over there to work on.
ID: 51302 · Rating: 0
Profile Angus

Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 51305 - Posted: 10 Feb 2008, 20:37:15 UTC - in response to Message 51291.  

I attached to Ralph.

There is no work to use for testing.

Nice try.

I got 9 mini 6 hour wu's today. I have had 5 of them 'freeze' around 12-18 min and then go on to the next wu. I have not had this with Rosetta before today, but I have had some do this on MilkyWay. Is there any info I can leave to help? I am using Boinc 5.10.13.


You can help by joining the Ralph@home project for alpha testing if you haven't already.

http://ralph.bakerlab.org


Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)
ID: 51305 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 51306 - Posted: 10 Feb 2008, 20:54:07 UTC

I want to wait a little bit to give users who want to help out a chance to download the symbols file. We'll send out more test jobs later tonight or tomorrow.
ID: 51306 · Rating: 0
Profile Greg_BE

Joined: 30 May 06
Posts: 5659
Credit: 5,691,837
RAC: 1,806
Message 51311 - Posted: 10 Feb 2008, 22:01:33 UTC
Last modified: 10 Feb 2008, 22:02:07 UTC

so far on my system at a 4 hr setting, 1 ran 3hrs 50 or so and quit
another i ran only went to 3hrs 15 min or so and quit.
first one came in 5 credits under claimed, the second one came in at 8 under claimed. kinda hurting my average with these.
ID: 51311 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 51312 - Posted: 11 Feb 2008, 0:21:30 UTC - in response to Message 51311.  

so far on my system at a 4 hr setting, 1 ran 3hrs 50 or so and quit
another i ran only went to 3hrs 15 min or so and quit.
first one came in 5 credits under claimed, the second one came in at 8 under claimed. kinda hurting my average with these.


I don't know why it would be much different compared to the standard rosetta app with our credit averaging system. It does use more memory but we are working on reducing it.
ID: 51312 · Rating: 0
buren

Joined: 18 Nov 07
Posts: 21
Credit: 132,158
RAC: 0
Message 51313 - Posted: 11 Feb 2008, 1:56:49 UTC - in response to Message 51282.  
Last modified: 11 Feb 2008, 2:04:11 UTC

Yes, it's taking more RAM. I'm running one of each right now. Peak for 5.93 is 114,764K and for mini is 185,424. Current usage is 103M for 5.93 and 174M for mini.

Same here, 167MB (172 max) for Mini vs 154MB (155 max) for standard.

But Mini only takes 166MB of virtual memory, while the standard Rosetta takes 230MB. I don't know if you can just add up those two numbers to get the total memory usage of 334MB vs 384MB.

But this probably means anyway that Mini actually takes less RAM.
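
A rough sketch of the arithmetic behind that comparison, using the rounded current-usage figures quoted above. Whether resident and virtual memory can simply be added depends on how the monitoring tool defines each column, so the function name `naive_total` and the result are illustrative assumptions only, not a measurement of real memory use.

```python
# Figures quoted in the post above, in MB (current RAM usage, virtual memory).
usage_mb = {
    "minirosetta": {"resident": 167, "virtual": 166},
    "rosetta 5.93": {"resident": 154, "virtual": 230},
}


def naive_total(entry):
    """Add the two figures; only meaningful if the columns do not overlap."""
    return entry["resident"] + entry["virtual"]


totals = {app: naive_total(entry) for app, entry in usage_mb.items()}

for app, total in totals.items():
    print(f"{app}: {total} MB")
```

On these rounded numbers Mini comes out lower in the naive sum, but only because of the virtual memory column; its resident figure is still the higher of the two.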

first one came in 5 credits under claimed, the second one came in at 8 under claimed. kinda hurting my average with these.
I almost always get a little less credit than claimed. 8 credits out of 350 is only about 2% anyway, so it shouldn't matter much.
ID: 51313 · Rating: 0
Brook

Joined: 28 Mar 07
Posts: 2
Credit: 154,669
RAC: 0
Message 51314 - Posted: 11 Feb 2008, 2:04:57 UTC

Just a quick note on the NOD32 issue.
I had the same false-positive on my system and after having a quick look and a scan it appears that it is only IMON that detects Mini as a virus. IMON is NOD32's internet scanner, it actually scans files as they are downloading by watching HTTP and POP3 traffic to stop viruses making it onto a system in the first place. NOD32's realtime scanner does not detect Mini as a virus and neither does the on demand scanner.
ID: 51314 · Rating: 0
Profile Jack Shaftoe

Joined: 30 Apr 06
Posts: 115
Credit: 1,307,916
RAC: 0
Message 51317 - Posted: 11 Feb 2008, 3:23:16 UTC - in response to Message 51283.  
Last modified: 11 Feb 2008, 3:42:02 UTC

I checked the result summary and the minirosetta jobs have a 90% success rate on R@h which is lower than the old rosetta app but it's not bad.


From a Rosetta user's perspective, I would consider 90% to be the minimum acceptable success rate on RALPH. Anything less than 97-98% here on Rosetta is unacceptable. It sucks to realize that you got a work-unit from a non-alpha project that causes BOINC to lock up and crash.

Please be more diligent with your alpha testing. What's the rush to get it here? If we want to run beta/unstable workunits, we will run them on RALPH (which I recently had to suspend on all 3 of my workstations due to mini-rosetta consistently crashing BOINC completely).
ID: 51317 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 51318 - Posted: 11 Feb 2008, 4:43:19 UTC - in response to Message 51317.  

I checked the result summary and the minirosetta jobs have a 90% success rate on R@h which is lower than the old rosetta app but it's not bad.


From a Rosetta user's perspective, I would consider 90% to be the minimum acceptable success rate on RALPH. Anything less than 97-98% here on Rosetta is unacceptable. It sucks to realize that you got a work-unit from a non-alpha project that causes BOINC to lock up and crash.

Please be more diligent with your alpha testing. What's the rush to get it here? If we want to run beta/unstable workunits, we will run them on RALPH (which I recently had to suspend on all 3 of my workstations due to mini-rosetta consistently crashing BOINC completely).



I am not aware of a work unit on R@h from mini that causes BOINC to lock up and crash. There is no real rush except for the interest of science and getting results, that is why we are slowly adding mini work units to R@h. We do however need to get mini running for CASP which is coming up this summer. Ralph does not have the same diversity of computers and active users as R@h so we have to eventually start running jobs, particularly since ralph jobs are having a similar success rate as R@h. I realize some people will have computers that don't like some mini work units as there are some computers that do not like rosetta tasks.
ID: 51318 · Rating: 0
j2satx

Joined: 17 Sep 05
Posts: 97
Credit: 3,670,592
RAC: 0
Message 51319 - Posted: 11 Feb 2008, 5:25:02 UTC - in response to Message 51318.  

I checked the result summary and the minirosetta jobs have a 90% success rate on R@h which is lower than the old rosetta app but it's not bad.


From a Rosetta user's perspective, I would consider 90% to be the minimum acceptable success rate on RALPH. Anything less than 97-98% here on Rosetta is unacceptable. It sucks to realize that you got a work-unit from a non-alpha project that causes BOINC to lock up and crash.

Please be more diligent with your alpha testing. What's the rush to get it here? If we want to run beta/unstable workunits, we will run them on RALPH (which I recently had to suspend on all 3 of my workstations due to mini-rosetta consistently crashing BOINC completely).



I am not aware of a work unit on R@h from mini that causes BOINC to lock up and crash. There is no real rush except for the interest of science and getting results, that is why we are slowly adding mini work units to R@h. We do however need to get mini running for CASP which is coming up this summer. Ralph does not have the same diversity of computers and active users as R@h so we have to eventually start running jobs, particularly since ralph jobs are having a similar success rate as R@h. I realize some people will have computers that don't like some mini work units as there are some computers that do not like rosetta tasks.


The list of CPUs on Ralph looks very much like the list of CPUs on Rosetta, just fewer of them.........of course it doesn't matter there are fewer CPUs on Ralph, because you don't give us enough WUs to keep those CPUs busy anyway.

What does the diversity of active users have to do with anything?

If you aren't going to use Ralph as a "real" test site, why don't you shut it down, conserve those resources and continue to test on Rosetta like you do now.
ID: 51319 · Rating: 0
Profile Angus

Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 51321 - Posted: 11 Feb 2008, 6:31:49 UTC - in response to Message 51319.  
Last modified: 11 Feb 2008, 6:33:37 UTC

I checked the result summary and the minirosetta jobs have a 90% success rate on R@h which is lower than the old rosetta app but it's not bad.


From a Rosetta user's perspective, I would consider 90% to be the minimum acceptable success rate on RALPH. Anything less than 97-98% here on Rosetta is unacceptable. It sucks to realize that you got a work-unit from a non-alpha project that causes BOINC to lock up and crash.

Please be more diligent with your alpha testing. What's the rush to get it here? If we want to run beta/unstable workunits, we will run them on RALPH (which I recently had to suspend on all 3 of my workstations due to mini-rosetta consistently crashing BOINC completely).



I am not aware of a work unit on R@h from mini that causes BOINC to lock up and crash. There is no real rush except for the interest of science and getting results, that is why we are slowly adding mini work units to R@h. We do however need to get mini running for CASP which is coming up this summer. Ralph does not have the same diversity of computers and active users as R@h so we have to eventually start running jobs, particularly since ralph jobs are having a similar success rate as R@h. I realize some people will have computers that don't like some mini work units as there are some computers that do not like rosetta tasks.


The list of CPUs on Ralph looks very much like the list of CPUs on Rosetta, just fewer of them.........of course it doesn't matter there are fewer CPUs on Ralph, because you don't give us enough WUs to keep those CPUs busy anyway.

What does the diversity of active users have to do with anything?

If you aren't going to use Ralph as a "real" test site, why don't you shut it down, conserve those resources and continue to test on Rosetta like you do now.

Agreed.

90% is a horrible success rate. 1 of every 10 WUs fail? How are you going to get 97-98% on Rosetta if you can only get 90% on Ralph?

I would think until you can get 99.x% on Ralph, it should never see Rosetta.

The whole point of Ralph was to get the bugs out and provide a clean debugged application and WU mix to Rosetta.
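
To put the percentages being argued about in concrete terms, here is a small sketch. The batch size of 1,000 work units is an arbitrary number picked for illustration, not a project figure, and `expected_failures` is a made-up helper name.

```python
def expected_failures(work_units, success_pct):
    """Work units expected to fail at a given whole-number success percentage."""
    return work_units * (100 - success_pct) // 100


BATCH = 1000  # hypothetical batch size, chosen only for illustration

for pct in (90, 97, 98, 99):
    failures = expected_failures(BATCH, pct)
    print(f"{pct}% success -> {failures} failures per {BATCH} WUs")
```

Going from 90% to 98% cuts the failures in such a batch from 100 to 20.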
Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)
ID: 51321 · Rating: 0
Profile Jack Shaftoe

Joined: 30 Apr 06
Posts: 115
Credit: 1,307,916
RAC: 0
Message 51323 - Posted: 11 Feb 2008, 6:34:49 UTC - in response to Message 51318.  
Last modified: 11 Feb 2008, 6:39:36 UTC

Ralph does not have the same diversity of computers and active users as R@h so we have to eventually start running jobs


This is something that should be published on the front page of the project, as it is a significantly different approach than almost all other BOINC projects and conflicts with what you are telling us we are doing. To quote a user from a few weeks ago:

"I joined because I want to cure people, not because I want to test software. I understand that with better software, the results might improve - however I would like to stick to the real rosetta for the time being."

It seems like there is no "real rosetta" per your comment above. If you need more diversity on RALPH, perhaps you should be asking for it from your contributors (us). The fact that we are here, attached, and reading the forum means that if you need specific help over at RALPH - i.e. specific OS or hardware tested - just ask. We might be able to help. Lots of people connected to Rosetta don't check the message board. If WU's start failing or acting up, they disconnect from the project and find another one.

According to David E K, there was a memory issue that got through the alpha testing without being identified.

There is a SAV false positive that got through alpha without being identified.

There are several other quirky issues in this thread.

How long was 1.07 tested on Ralph? Doesn't it seem apparent it should have seen more time? I'm not trying to be a jerk about this, I just think that you would help us immensely by being more thorough in alpha.
ID: 51323 · Rating: 0
Profile Angus

Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 51325 - Posted: 11 Feb 2008, 7:39:36 UTC
Last modified: 11 Feb 2008, 7:39:48 UTC

How long was 1.07 tested on Ralph? Doesn't it seem apparent it should have seen more time?

Your answer was posted previously:
with Mini Rosetta 1.07 there is a new record for "test period": 33 minutes and 14 seconds between the time it was released on Ralph and Rosetta.

Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)
ID: 51325 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 51329 - Posted: 11 Feb 2008, 19:37:31 UTC

This is in response to angus, j2satx and jack's comments

The only difference between 1.06, which has been tested thoroughly IMO, and 1.07 is a minor and trivial bug fix. That is why the "test period" was short. There isn't a significant difference in the success rates between minirosetta and old rosetta (see below). I understand people's opinions and frustrations about the testing, acceptable success rates, and errors. That is why we stopped issuing mini tasks for the time being. We will, however, slowly issue tasks again for experiments that we are interested in and that are important. We are not running tests on R@h; they are all production jobs. I don't think it is possible for Ralph to be like R@h. It's a lot to ask from users to run a test site and expect the same turnout as a production site and we wouldn't want to crash thousands of computers during a testing phase. Also, I don't like to waste cpu time by flooding ralph with tasks unless we are able to get useful debugging feedback. Issues are likely to surface on R@h that do not on Ralph (and vice versa) due to differences in the computers and users, not just the computers that are listed but the active computers that do work and the active users that provide feedback. The diversity may look similar from the site, but I see differences in practice. Looking at the status of both applications, it's hard to say that more could be done with mini compared to old rosetta. There's room for improvement for both apps.

The remaining issues that we'd like to address with mini are:

1. screen saver. currently under dev using the minirosetta game framework, may even be an optional game where people can compete with computers and each other to design/fold proteins.
2. larger memory footprint. being worked on and will likely be smaller than old rosetta eventually
3. weird NOD32 virus detection with NOD32's IMON internet scanner. Scanning the application manually does not detect a virus and obviously it is not a virus, but this is a hassle for NOD32 IMON users so we are looking into it.
4. a rare access violation error
5. a rare validation error

Here is a comparison of the status reports (note, we are not planning to support PPC macs with mini):

minirosetta 1.07 (status report image not preserved)

rosetta 5.93 (status report image not preserved)

I hope this helps. I'm trying to respond to everyone's comments and provide useful info. I know it's a big jump for some users to run a new app and I know some users will have issues with it. So if you have any worries or questions, don't hesitate to ask or make comments.

thanks!
ID: 51329 · Rating: 0
M.L.

Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 51334 - Posted: 11 Feb 2008, 19:51:14 UTC

Can see that i need to get an 'unknown' PC and never have a failure.

Many thanks for your reply to all the problems.
ID: 51334 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 51336 - Posted: 11 Feb 2008, 20:55:43 UTC

"If you aren't going to use Ralph as a "real" test site, why don't you shut it down, conserve those resources and continue to test on Rosetta like you do now."

Ralph is invaluable. We catch many bugs and errors with ralph that would otherwise happen on R@h.
ID: 51336 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 51342 - Posted: 11 Feb 2008, 23:03:22 UTC

j2satx,

Did you download the 1.07 mini app symbols file into your ralph project directory? I see an access violation error from one of your computers but do not see any trace information.
ID: 51342 · Rating: 0
Profile Greg_BE

Joined: 30 May 06
Posts: 5659
Credit: 5,691,837
RAC: 1,806
Message 51343 - Posted: 11 Feb 2008, 23:05:05 UTC - in response to Message 51336.  

I think the general flavor of things here at R@H is that longer more exhaustive testing should be done at Ralph before the program is released to R@H.
The feeling is that "beta" type work should be done at Ralph and then a fresh named release without "beta" be released on R@H. The feeling as I see it, is that work units and programs are not tested enough before being released here. minirosetta for instance should be deeply tested on Ralph and when it is "complete" and work for that program has been tested deeply and when it is 99% tested with work, then release it here. Don't halfway test it on Ralph and then release it here and then pull it back because there are bugs in it.

"If you aren't going to use Ralph as a "real" test site, why don't you shut it down, conserve those resources and continue to test on Rosetta like you do now."

Ralph is invaluable. We catch many bugs and errors with ralph that would otherwise happen on R@h.


ID: 51343 · Rating: 0
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 51344 - Posted: 12 Feb 2008, 0:44:42 UTC - in response to Message 51343.  

I didn't pull back the mini tasks because there are bugs in it. I did it in response to users' complaints and the fact that the minirosetta jobs were filling the queue more than expected. Are you talking about the beta in the rosetta application name? We should take that out of the name since it isn't a beta app and is misleading.

I think the general flavor of things here at R@H is that longer more exhaustive testing should be done at Ralph before the program is released to R@H.
The feeling is that "beta" type work should be done at Ralph and then a fresh named release without "beta" be released on R@H. The feeling as I see it, is that work units and programs are not tested enough before being released here. minirosetta for instance should be deeply tested on Ralph and when it is "complete" and work for that program has been tested deeply and when it is 99% tested with work, then release it here. Don't halfway test it on Ralph and then release it here and then pull it back because there are bugs in it.

"If you aren't going to use Ralph as a "real" test site, why don't you shut it down, conserve those resources and continue to test on Rosetta like you do now."

Ralph is invaluable. We catch many bugs and errors with ralph that would otherwise happen on R@h.


ID: 51344 · Rating: 0
©2024 University of Washington
https://www.bakerlab.org