Minirosetta v1.32 bug thread

Message boards : Number crunching : Minirosetta v1.32 bug thread

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 55655 - Posted: 9 Sep 2008, 21:11:31 UTC

Sid - your message threads make mention of the error, but no one has answered it.
ID: 55655 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,188,338
RAC: 11,005
Message 55656 - Posted: 9 Sep 2008, 22:25:00 UTC - in response to Message 55648.  
Last modified: 9 Sep 2008, 22:26:34 UTC

Peculiar thing is, as soon as I had a little moan about most WUs falling over I just had a great little run of successes, including one in excess of 3 hours.

And the successes continue. The only change I made was to "Leave applications in memory while suspended?" I thought I had this as 'Yes' in my Boinc Manager settings, but it was marked as 'No' online. Hmm. And now 5.98 WUs are coming through too.

That 10 people may represent 10,000 who are having trouble, getting disillusioned and detaching.

It might do. Is there any evidence of that? The home page shows more users and more hosts each day and 239k successes in the last 24 hours (up from 235k the previous time I mentioned it). These graphs support that. What's the basis of your assertion?

09/09/08 20:18:35||Starting BOINC client version 6.2.18 for windows_x86_64
[...]
09/09/08 20:18:35||OS: Microsoft Windows Vista: Ultimate x64 Editon, Service Pack 1, (06.00.6001.00)
[...]
PS : no problem with the 5.98 rosetta beta

I was about to highlight this for being another Vista64 issue, then I glanced at the error messages being given and I'm staying clear. Way out of my depth on that one!

Except I noticed all WUs succeeded prior to 7 Sept, which makes me wonder if my whole issue has been about leaving applications suspended in memory or not. I'll keep an eye on my progress now (with WU run time set to default 3 hours again).
ID: 55656 · Rating: 0
mitrichr
Joined: 23 May 07
Posts: 44
Credit: 1,005,660
RAC: 0
Message 55657 - Posted: 9 Sep 2008, 23:31:37 UTC - in response to Message 55656.  

Not clear on "Leave applications in memory while suspended?". Mine are Yes, should I switch to no and try again?

>>RSM
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


ID: 55657 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 55658 - Posted: 9 Sep 2008, 23:44:18 UTC - in response to Message 55657.  

Not clear on "Leave applications in memory while suspended?". Mine are Yes, should I switch to no and try again?

>>RSM


...only if you wish to test whether undoing what apparently improved Sid's situation makes things worse for you, since that would put you on the settings that Sid thinks may have contributed to his problems.

In theory the setting will not affect whether a task runs properly or not. Sid may be building evidence that there is a flaw making the theory not match the reality.

In practice, you want to leave tasks in memory (virtual memory is where they really are) while suspended to preserve all the work possible. Otherwise you are shutting the task down (every hour by default) and it may not have had a chance to save a checkpoint for the work it has done, so the work is lost, and done again when it starts again later.
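
To make that trade-off concrete, here is a minimal Python sketch of how much work gets redone when a task is removed from memory between checkpoints. The function name and the timings are hypothetical, chosen only for illustration; this is not how BOINC or Rosetta actually tracks checkpoints.

```python
def work_lost_on_unload(elapsed_min: float, checkpoint_times_min: list[float]) -> float:
    """Minutes of computation redone if the task is unloaded at elapsed_min.

    checkpoint_times_min lists the (hypothetical) moments at which the task
    saved a checkpoint. Anything computed after the last checkpoint is lost.
    """
    earlier = [t for t in checkpoint_times_min if t <= elapsed_min]
    last_checkpoint = max(earlier, default=0.0)
    return elapsed_min - last_checkpoint


# Task switched out after 60 minutes, last checkpoint written at 35 minutes:
print(work_lost_on_unload(60, [10, 35]))  # 25 minutes are computed again

# No checkpoint reached yet: the whole hour is repeated on restart.
print(work_lost_on_unload(60, []))
```

With "Leave applications in memory while suspended?" set to Yes, the task is only paused, so in this simplified picture nothing has to be recomputed when it resumes.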
Rosetta Moderator: Mod.Sense
ID: 55658 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,188,338
RAC: 11,005
Message 55661 - Posted: 10 Sep 2008, 0:26:27 UTC - in response to Message 55657.  

Mine are Yes, should I switch to no and try again?

Trawling back through other threads for clues, the advice seems to be to set it as 'Yes'. But your symptoms seem very different to mine so I can't help, unfortunately.

I was commenting as much on the fact that the online setting was different to what I had in my Boinc Manager. I thought they synchronised on each Update.

In theory the setting will not affect whether a task runs properly or not. Sid may be building evidence that there is a flaw making the theory not match the reality.

Don't think for a minute I have any idea what I'm doing or that I have a plan - I don't! But until something else changes I could hardly make things worse than they were!

Maybe I've stumbled on some oversensitivity to one setting. I don't know. A couple of 2 hour WUs are going through now and 1 of 3 hours. Fingers crossed. I like to think I'm making a difference, even if I'm just deluding myself. (Probably the latter...)
ID: 55661 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,188,338
RAC: 11,005
Message 55672 - Posted: 10 Sep 2008, 14:18:35 UTC - in response to Message 55661.  

A couple of 2 hour WUs are going through now and 1 of 3 hours. Fingers crossed.

All those went through ok, but 2 further ones failed overnight. Now it's all 5.98 WUs and 100% successes as usual.
ID: 55672 · Rating: 0
leonari
Joined: 11 Dec 05
Posts: 8
Credit: 4,021,196
RAC: 643
Message 55699 - Posted: 11 Sep 2008, 22:43:32 UTC

I noticed at 09:30 BST today, the 11th of September, 2008, that the Work Unit (WU) Rosetta Mini 1.32 abinitio_homfrag_71A_1jfvA_4443_45274_0 had 10 minutes to run at 95% complete (after circa five and a half hours CPU run time). About 30 minutes later the WU had only completed another 1%, which could have been because of business work I was doing, but because of the problems with Rosetta that I have had in the recent past, I suspended everything else to allow it to finish, and to see what would happen.
About two hours later, the CPU run time had increased to eight hours forty-seven minutes but the WU had not finished, at 98.139% complete, yet the SETI application, which should run 75% of the time, had not moved. At that point I suspended the Rosetta application to allow SETI@home Enhanced 6.03 to run.

With the time now at 22:54 BST, SETI has not apparently done anything since (that is, no progress and no increase in CPU time). However, on checking Windows 2000 Task Manager, the Rosetta Mini 1.32 is running at circa 30 to 90% CPU utilisation (even though it is suspended - allegedly), and SETI is running at 0%. I also checked the graphics for both SETI and Rosetta; neither worked.
Question I asked myself: what is at fault here: Rosetta Mini 1.32, SETI 6.03 or BOINC 5.10.45?
To see what happens next I have reset Rosetta Mini. SETI has started, the SETI graphics now work, and Windows 2000 Task Manager shows SETI at 90 to 95% utilisation.
Comments, please?
By the way, am I so unlucky with Rosetta, or is this a common occurrence?
Also, by the way, I filed another problem with Rosetta Mini on this thread a few days ago but, although I am sure it appeared in the "Thread Record", it has since disappeared. Reasons?
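
For reference, a figure like "75% of the time" comes from BOINC resource shares: each project's long-term target is its share divided by the sum of all shares. A small Python sketch, assuming hypothetical share values of 300 for SETI and 100 for Rosetta (the actual settings are not given in this post):

```python
def expected_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Return each project's target fraction of CPU time from its resource share."""
    total = sum(shares.values())
    return {project: share / total for project, share in shares.items()}


# Hypothetical resource shares, not taken from the post.
shares = {"SETI@home": 300, "rosetta@home": 100}
for project, fraction in expected_fractions(shares).items():
    print(f"{project}: {fraction:.0%}")
```

These fractions are long-term targets rather than a guarantee for any given hour, so a short stretch of one project running alone is not in itself a fault; many hours with no SETI progress at all is a different matter.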
ID: 55699 · Rating: 0
leonari
Joined: 11 Dec 05
Posts: 8
Credit: 4,021,196
RAC: 643
Message 55700 - Posted: 11 Sep 2008, 23:05:30 UTC - in response to Message 55699.  


Stupid person that I am, I now see that some messages are hidden so my last question on why my previous message had disappeared is of no consequence.
My Laptop is a Dell 2.2 GHz C640i running Windows 2000 5.00.2195 SP4 - fairly old but has no problems running SETI or Ralph (strangely)!

ID: 55700 · Rating: 0
BrnmccO1
Joined: 26 Jun 07
Posts: 17
Credit: 578,825
RAC: 0
Message 55724 - Posted: 12 Sep 2008, 18:09:16 UTC

Well, on this comp I've had quite a few 1.32 errors and some 1.28 errors as well. Like other people's machines, it runs 5.98s at 100%.

191481586 is a typical example of the usual "Unhandled Exception Error" that bombs out the WU.

Hopefully 1.34 will be better! In any case, I for one won't be missing 1.32 RIP.
ID: 55724 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,188,338
RAC: 11,005
Message 55744 - Posted: 14 Sep 2008, 1:43:08 UTC
Last modified: 14 Sep 2008, 1:43:49 UTC

Task ID 191460060
<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00475A96 read attempt to address 0x00000044

Engaging BOINC Windows Runtime Debugger...

Someone else took on this WU and it didn't fare any better either.
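
As a side note on the numbers in that report: the decimal exit code and the hex value in brackets are the same 32-bit value, once read as a signed integer and once as unsigned. A short Python sketch of the conversion (the helper name is made up for illustration):

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


status = to_unsigned32(-1073741819)
print(hex(status))  # 0xc0000005
```

0xC0000005 is the Windows status code for an access violation, which matches the "Reason" line. The read address 0x00000044 is very low, which often points to a null pointer being dereferenced at a small offset, though that is only a guess from the outside.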
ID: 55744 · Rating: 0
Roger L. Cousins
Joined: 5 Nov 05
Posts: 1
Credit: 19,555,040
RAC: 8,566
Message 55822 - Posted: 16 Sep 2008, 23:55:20 UTC

MiniRosetta seems to be spawning multiple threads. I have run out of Page file several times. I see eighteen threads in process right now, and many of them are using up to 170 Meg. What's up with that? How do I terminate them, short of using Task Manager to stop them one by one?

R Cousins
ID: 55822 · Rating: 0
Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 55830 - Posted: 17 Sep 2008, 12:35:42 UTC - in response to Message 55822.  

MiniRosetta seems to be spawning multiple threads. I have run out of Page file several times. I see eighteen threads in process right now, and many of them are using up to 170 Meg.

Do you see it on your WinXP host 361486? Using which application? Single threads of any Windows process do not have their 'own' allocated memory (in the context described here), memory is allocated (and accessible) 'per process'. What's the total physical/virtual memory usage of the Minirosetta process? Your pagefile size?

What's up with that? How do I terminate them, short of using Task Manager to stop them one by one?

Task manager does not support terminating single threads. Are you sure you are seeing threads, not processes?
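
To illustrate the thread/process distinction, here is a minimal Python sketch (not related to Minirosetta's own code) in which eighteen threads all write into one list that belongs to the process. The function name is hypothetical.

```python
import threading


def run_workers(count: int) -> list[int]:
    """Start `count` threads that all append to one list owned by the process."""
    shared: list[int] = []  # a single object in the process's address space

    def worker(n: int) -> None:
        shared.append(n)  # every thread writes to the same list

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


result = run_workers(18)
print(len(result))  # 18: all threads wrote into the same memory
```

Because the threads share the process's memory, a figure such as 170 MB applies to the process as a whole. Eighteen entries each showing their own memory usage would suggest eighteen separate processes rather than threads.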

Peter
ID: 55830 · Rating: 0
mitrichr
Joined: 23 May 07
Posts: 44
Credit: 1,005,660
RAC: 0
Message 55869 - Posted: 18 Sep 2008, 18:42:44 UTC

It looks as if I may have solved my particular problems with Rosetta by giving up BOINC screen savers. I switched to standard Windows screen savers and re-attached the three computers which I had been forced to detach from Rosetta.

The problem had been that something in Rosetta was rendering the three machines totally useless, pinning the CPU at 100% and generally making me miserable. I would get the machines back by using Task Manager and shutting down the errant application.

Once I got rid of the BOINC screen saver, everything seemed to go back to normal.

As I said, I re-attached to Rosetta, now about 36 hours ago. No machine has had any problems and I believe that I have results now in all four machines.

I do not remember seeing any discussion here of screen savers. Maybe I just missed something.

Let me say that I know that there are different philosophical positions on using the screen savers. I favor using them, many of them let me know what is going on in the 10 or so projects to which I am attached with a quick glance at the monitor.

Any comments?

>>RSM
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


ID: 55869 · Rating: 0
Mike Tyka
Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 55985 - Posted: 23 Sep 2008, 19:23:51 UTC - in response to Message 55869.  

Weird - we'll look into this. Has anyone else experienced problems like this?


Once I got rid of the BOINC screen saver, everything seemed to go back to normal.
....
Any comments?

>>RSM


http://beautifulproteins.blogspot.com/
http://www.miketyka.com/
ID: 55985 · Rating: 0
mitrichr
Joined: 23 May 07
Posts: 44
Credit: 1,005,660
RAC: 0
Message 55986 - Posted: 23 Sep 2008, 19:41:04 UTC

Mike-

Just to let you know, things are still going quite well on all four machines, I suppose you can look at my results.

I have just yesterday detached the two PIII's, to make room for another project which they can handle; but the two Core 2 Duos, which are really about 90% of my crunching ability, are of course still running Rosetta. I mean, the PIII's only achieve what they do running 24/7, whereas the other two do not.

You guys have a major responsibility in that Rosetta because of its originating software may be the most important project running on BOINC software. At least, I believe it is Proteome at WCG which uses your software.

Best ever always.

>>RSM
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


ID: 55986 · Rating: 0
leonari
Joined: 11 Dec 05
Posts: 8
Credit: 4,021,196
RAC: 643
Message 56375 - Posted: 15 Oct 2008, 11:56:09 UTC

This is about problems with Minirosetta v1.34 as there does not appear to be a "thread" for it!
As can be seen from the three incidents below, Rosetta sometimes continues to run regardless of the rules on how long it is allowed to run (maybe a BOINC Manager problem?). It then "locks up" and continues to run at 100%, stopping anything else from running!
Note: all of the message sequences below are sequential messages extracted from the "Messages" tab in BOINC.

Incident 1

05/10/2008 12:50:17|rosetta@home|Starting abinitio_nohomfrag_70_A_1ynvA_4466_27265_0
05/10/2008 12:50:35|rosetta@home|Starting task abinitio_nohomfrag_70_A_1ynvA_4466_27265_0 using minirosetta version 134

Rosetta locked up running at 100% - presumably for one and a half days!

Aborted at 09:34 07/10/2008
07/10/2008 09:34:17|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
07/10/2008 09:34:22|rosetta@home|Scheduler request succeeded: got 0 new tasks
07/10/2008 09:34:50|SETI@home|Resuming task 22au08ac.21313.9479.16.8.9_1 using setiathome_enhanced version 603
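
The length of the lock-up in Incident 1 can be checked from the timestamps themselves. The following Python sketch assumes the `date time|project|message` layout seen in these lines and day-first dates (as the other dates in this post suggest); the function name is made up for illustration.

```python
from datetime import datetime


def parse_message(line: str) -> tuple[datetime, str, str]:
    """Split one line from the BOINC "Messages" tab into (timestamp, project, text)."""
    stamp, project, text = line.split("|", 2)
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, text


started, _, _ = parse_message(
    "05/10/2008 12:50:35|rosetta@home|Starting task "
    "abinitio_nohomfrag_70_A_1ynvA_4466_27265_0 using minirosetta version 134"
)
aborted, _, _ = parse_message(
    "07/10/2008 09:34:17|rosetta@home|Sending scheduler request: Requested by user. "
    "Requesting 0 seconds of work, reporting 0 completed tasks"
)
print(aborted - started)  # 1 day, 20:43:42
```

So just under 45 hours passed between the task starting and the manual abort, somewhat longer than the "one and a half days" estimated above.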


Incident 2

10/10/2008 11:29:03||Starting BOINC client version 5.10.45 for windows_intelx86
10/10/2008 11:29:03||log flags: task, file_xfer, sched_ops
10/10/2008 11:29:03||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
10/10/2008 11:29:03||Data directory: C:\Program Files\BOINC
10/10/2008 11:29:07||Processor: 1 GenuineIntel Mobile Intel(R) Pentium(R) 4 - M CPU 2.20GHz [x86 Family 15 Model 2 Stepping 7]
10/10/2008 11:29:07||Processor features: fpu tsc sse mmx
10/10/2008 11:29:07||OS: Microsoft Windows 2000: Professional Edition, Service Pack 4, (05.00.2195.00)
10/10/2008 11:29:07||Memory: 511.43 MB physical, 1.21 GB virtual
10/10/2008 11:29:07||Disk: 17.70 GB total, 2.26 GB free
10/10/2008 11:29:07||Local time is UTC +1 hours
10/10/2008 11:29:11|rosetta@home|URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 97037; location: home; project prefs: default
10/10/2008 11:29:11|ralph@home|URL: http://ralph.bakerlab.org/; Computer ID: 1760; location: home; project prefs: default
10/10/2008 11:29:11|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 1960189; location: work; project prefs: default
10/10/2008 11:29:11||General prefs: from http://setiathome.ssl.berkeley.edu/ (last modified 08-Jun-2006 10:33:55)
10/10/2008 11:29:11||Host location: work
10/10/2008 11:29:11||General prefs: no separate prefs for work; using your defaults
10/10/2008 11:29:11||Reading preferences override file
10/10/2008 11:29:11||Preferences limit memory usage when active to 255.71MB
10/10/2008 11:29:11||Preferences limit memory usage when idle to 460.29MB
10/10/2008 11:29:11||Preferences limit disk usage to 2.26GB
10/10/2008 11:29:18|SETI@home|Restarting task 19au08ab.15460.9479.6.8.46_1 using setiathome_enhanced version 603
10/10/2008 11:33:21|SETI@home|Sending scheduler request: Requested by user. Requesting 36 seconds of work, reporting 1 completed tasks
10/10/2008 11:33:24|SETI@home|Scheduler request succeeded: got 1 new tasks
10/10/2008 11:33:27|SETI@home|Started download of 26au08ad.24455.4162.7.8.218
10/10/2008 11:33:38|SETI@home|Finished download of 26au08ad.24455.4162.7.8.218
10/10/2008 12:16:18|SETI@home|Computation for task 19au08ab.15460.9479.6.8.46_1 finished
10/10/2008 12:16:18|SETI@home|Starting 26au08ad.15112.2526.6.8.181_1
10/10/2008 12:16:18|SETI@home|Starting task 26au08ad.15112.2526.6.8.181_1 using setiathome_enhanced version 603
10/10/2008 12:16:20|SETI@home|Started upload of 19au08ab.15460.9479.6.8.46_1_0
10/10/2008 12:16:28|SETI@home|Finished upload of 19au08ab.15460.9479.6.8.46_1_0
10/10/2008 14:14:37|rosetta@home|Restarting task abinitio_nohomfrag_70_A_1unrA_4466_47644_0 using minirosetta version 134
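
A quick check of the memory lines in the log above: the two "Preferences limit memory usage" values work out to about 50% (computer in use) and 90% (idle) of the 511.43 MB of physical memory. Those percentages are inferred from the numbers; the log does not state them. A minimal Python sketch of the arithmetic:

```python
def limit_fraction(limit_mb: float, physical_mb: float) -> float:
    """Fraction of physical memory that a logged memory limit corresponds to."""
    return limit_mb / physical_mb


physical_mb = 511.43  # from the "Memory:" line
for label, limit_mb in [("active", 255.71), ("idle", 460.29)]:
    print(f"{label}: {limit_fraction(limit_mb, physical_mb):.1%}")
```

On a machine with about 512 MB this leaves roughly 256 MB for BOINC tasks while the laptop is in use, which may be tight if Rosetta and SETI tasks are both held in memory.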

17:31 on the 10/10/2008 - Because Rosetta was still going at circa 85% but with no increase in either of the two SETI tasks (SETI should run 75% of the time), Rosetta was suspended at 17:31 on the 10/10/2008.

10/10/2008 17:31:02|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
10/10/2008 17:31:07|SETI@home|Scheduler request succeeded: got 0 new tasks
10/10/2008 17:31:42|SETI@home|Resuming task 26au08ad.15112.2526.6.8.181_1 using setiathome_enhanced version 603

At 21:54 on the 11/10/08, it was observed that Rosetta had increased to 100% complete even though it was still suspended. By the way, there was no message to report that it had restarted. Aborted Rosetta.

11/10/2008 21:54:48|SETI@home|Starting 26au08ad.24455.4162.7.8.218_0
11/10/2008 21:54:51|SETI@home|Starting task 26au08ad.24455.4162.7.8.218_0 using setiathome_enhanced version 603

21:58 on the 11/10/08 - Rosetta still going, even though it had been aborted, but SETI was still not – "Screen capture" available. Terminated Rosetta task. SETI then started


Incident 3

14/10/2008 11:53:21|rosetta@home|Restarting task abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 using minirosetta version 134
14/10/2008 12:38:32|SETI@home|Started download of 25au08af.7275.890.10.8.52
(Note: First SET
14/10/2008 12:38:53|SETI@home|Finished download of 25au08af.7275.890.10.8.52
14/10/2008 12:41:15|ralph@home|Finished download of looprelax_tex_cst_oneparam.looprelax_tex_cst.t328_.tex.boinc_files.zip
14/10/2008 12:47:32|rosetta@home|Finished download of foldcst_simple.foldcst_simple.t313_.mtyka.boinc_files.zip

15/10/08 - Aborted “abinitio_nohomfrag_70_A_1zd0A_4466_59245_0” after the task was running at 100% for over twelve hours and stopping anything else from working – “Screen print" available.
I also suspect that after this task first started, sometime on the 14th, no other task was allowed to start.

Note that Rosetta was still taking processing power before the “abort” – “Screen print" available.

15/10/2008 10:10:01|SETI@home|Starting 25au08af.7275.890.10.8.52_1
15/10/2008 10:10:04|SETI@home|Starting task 25au08af.7275.890.10.8.52_1 using setiathome_enhanced version 603
15/10/2008 10:10:07|rosetta@home|Computation for task abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 finished

Everything now working as expected.

ID: 56375 · Rating: 0
Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 56376 - Posted: 15 Oct 2008, 12:11:10 UTC - in response to Message 56375.  

This is about problems with Minirosetta v1.34 as there does not appear to be a "thread" for it!

Sure there is one ;-) --> Minirosetta v1.34 bug thread

Peter
ID: 56376 · Rating: 0
mitrichr
Joined: 23 May 07
Posts: 44
Credit: 1,005,660
RAC: 0
Message 56383 - Posted: 15 Oct 2008, 16:55:22 UTC

Latest results are not good. 1.32 and 1.34 I had to abort WU's, but, still, I think that at least my problem relates to the screen saver locking everything up. If I use a different screen saver, I seem to have no problems.

>>RSM
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


ID: 56383 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org