Report Problems with Rosetta Version 5.24

Message boards : Number crunching : Report Problems with Rosetta Version 5.24

Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 19350 - Posted: 27 Jun 2006, 13:09:11 UTC - in response to Message 19254.  


Ideally, of course, the BOINC server should be smarter and, noticing that your PC can't handle the ultra-big WU, send you one of the regular, smaller jobs. Unfortunately, this feature is not yet available in the BOINC server code.


Actually it is. There is only enough space in the feeder queue for 1,000 workunits. When the scheduler connects up to the feeder queue to get work it cycles through all 1,000 slots looking for available work. When all 1,000 queue slots are filled up with large jobs that is what the server returns.

Splitting the queue up equally is supported with different applications. If this is really a big problem we could set things up in such a way that the project believes it has more than one application and 50% of the queue is saved for each application.


Rom, thanks for the feedback. Although I don't know how the current system works (what are the "groups" of jobs sent, e.g. jobs needing 256 MB, 512 MB, 768 MB, or 1 GB of memory?), it seems it would help to split the queue as you suggest, to make sure there are always small jobs available.

Apparently many people get the message "there was work, but your PC has less RAM than needed"; see e.g. the posts by Carlos (and only a very small percentage of users post here).
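The queue behaviour discussed above can be sketched roughly as follows. This is only an illustration, not BOINC server code: the names `Job`, `pick_job`, and `fill_slots`, the "small"/"large" application labels, and the memory figures are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    app: str      # hypothetical application label, e.g. "small" or "large"
    ram_mb: int   # memory the job needs, in MB (illustrative values)


def pick_job(slots: List[Optional[Job]], host_ram_mb: int) -> Optional[Job]:
    """Scan every slot once and hand out the first job the host can run."""
    for i, job in enumerate(slots):
        if job is not None and job.ram_mb <= host_ram_mb:
            slots[i] = None  # the slot is now free to be refilled
            return job
    return None  # every filled slot holds a job that is too big for this host


def fill_slots(pending: Dict[str, List[Job]], n_slots: int) -> List[Optional[Job]]:
    """Fill the queue, reserving an equal share of slots for each application."""
    share = n_slots // len(pending)
    slots: List[Optional[Job]] = []
    for jobs in pending.values():
        slots.extend(jobs[:share])
    slots.extend([None] * (n_slots - len(slots)))
    return slots


large = [Job(f"big{i}", "large", 1024) for i in range(1000)]
small = [Job(f"small{i}", "small", 256) for i in range(1000)]

# Queue filled only with large jobs: a 256 MB host gets nothing.
print(pick_job(list(large), host_ram_mb=256))

# Queue split 50/50 between two "applications": the same host gets a small job.
split = fill_slots({"large": large, "small": small}, n_slots=1000)
print(pick_job(split, host_ram_mb=256))
```

With the split queue, half of the 1,000 slots always hold small jobs, so a low-memory host finds work even when many large jobs are pending.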

Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 19352 - Posted: 27 Jun 2006, 14:00:05 UTC

This WU crashed on restart after being preempted.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=21898739

Result: https://boinc.bakerlab.org/rosetta/result.php?resultid=25804159

It almost gave me a heart attack; I thought my hard disk had crashed! It sounded like that for about 5 minutes until I exited the BOINC Manager, and then I realized the hard disk was safe. PHEW!!!!

I had a hard disk crash a little more than a year ago, and I'll never forget the sound it makes when this happens! So please don't do this to me again! :-(
"I'm trying to maintain a shred of dignity in this world." - Me

Frisch
Joined: 5 Apr 06
Posts: 4
Credit: 133,315
RAC: 0
Message 19360 - Posted: 27 Jun 2006, 17:12:06 UTC

I just reported some finished jobs, but one of them didn't get any credit; it said "too many jobs reported", which I have never seen before. It was reported as a success, but 0 credit was granted.
Result ID 25727160
Name t307__CASP7_ABRELAX_SAVE_ALL_OUT_BARCODE_hom001__714_30318_2
Workunit 20880206
Created 25 Jun 2006 7:45:32 UTC
Sent 25 Jun 2006 14:39:07 UTC
Received 27 Jun 2006 14:21:50 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 212157
Report deadline 2 Jul 2006 14:39:07 UTC
CPU time 2944.1875
stderr out

<core_client_version>5.5.0</core_client_version>
<stderr_txt>
# random seed: 1551958
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 2 (nstruct) times
# This process generated 2 decoys from 2 attempts


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>

Validate state Workunit error - check skipped
Claimed credit 26.2700330744348
Granted credit 0
application version 5.24


Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 19364 - Posted: 27 Jun 2006, 18:17:50 UTC - in response to Message 19360.  

Validate state Workunit error - check skipped
Claimed credit 26.2700330744348
Granted credit 0
application version 5.24


This WU errored out for some reason. Your computers are hidden, so I can't look it up. You should get credit when they run the daily script. The credit won't show on the "Results" page, but will show on the "Result ID" page.

hope this helps

tony

Divide Overflow
Joined: 17 Sep 05
Posts: 82
Credit: 921,382
RAC: 0
Message 19377 - Posted: 27 Jun 2006, 21:49:02 UTC

Is a .pdb file available for download for the 5.24 version of the Rosetta application? I’m only able to find an older version posted online.

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 19386 - Posted: 28 Jun 2006, 1:17:44 UTC - in response to Message 19377.  

Is a .pdb file available for download for the 5.24 version of the Rosetta application? I’m only able to find an older version posted online.


I think it comes with the WU's now.



"I'm trying to maintain a shred of dignity in this world." - Me

Brian B
Joined: 11 Dec 05
Posts: 3
Credit: 10,681
RAC: 0
Message 19387 - Posted: 28 Jun 2006, 2:19:51 UTC - in response to Message 19319.  

Welcome, Brian Bowles. System Idle Process of zero is a cruncher's goal. A project that runs nicely in the background the way Rosetta does is a dream.

Here's a shot of a gadget available with Windows Vista's sidebar and I keep it on my desktop. I don't want to waste any CPU cycles. :-)

image

If you have issues with running, post them in a new topic and someone will help you get them sorted.


Hi Brian,

As cureseekers has pointed out, letting Rosetta run even when you are working on your computer should not decrease performance; at least I can't notice any difference. However, if you want to preempt BOINC when you use your computer, keep in mind that changing the preferences on the web pages does not by itself change anything on your computer. After every change you need to "Update" the project and verify that the general preferences were synced with your local host. Then Rosetta should not use cycles if you set your preference to "No work when computer is in use".

You should leave "Leave application in memory" set to yes, not only because of the errors (which should be fixed) but because checkpointing still occurs quite infrequently. You may lose over an hour of work if you remove the apps while preempted.

If you really can't get Rosetta to pause, try suspending the project inside BOINC. If even this does not work, I don't know.

P.S.: Your computer is hidden, so it is hard to help you out without knowing the specs and OS.


Hmm, that's not correct. If the BOINC Manager is set to "Run based on Preferences" and the general preferences have "Do work while computer is in use?" set to "No", then when typing in Word or doing any activity with the mouse/keyboard, all BOINC applications should be preempted and therefore using 0% CPU. After stopping the activity, the general preference "Do work only after computer is idle for" controls when BOINC will 'un-preempt' the applications.

I can confirm this is the correct behavior. If I have Task Manager open, I can see several BOINC project applications on the Process tab, but using 0% CPU. (I have "Leave applications in memory while preempted?" set to "Yes").

(Normally, I force BOINC Manager to "Run Always", but wanted to confirm the behavior.)
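The preference logic described above can be written down as a small decision function. This is a simplified model of the behaviour as described in this thread, not the BOINC client's actual code; the function name, the mode labels ("always", "prefs", "suspended"), and the parameter names are assumptions for illustration.

```python
def should_run(run_mode: str,
               work_while_in_use: bool,
               idle_seconds: float,
               idle_threshold_seconds: float) -> bool:
    """Return True if BOINC applications should be computing right now.

    run_mode is a hypothetical label for the BOINC Manager setting:
      "always"    -> "Run always"
      "prefs"     -> "Run based on preferences"
      "suspended" -> computation suspended by the user
    """
    if run_mode == "always":
        return True
    if run_mode == "suspended":
        return False
    # "Run based on preferences": honour the general preferences.
    if work_while_in_use:
        return True
    # "Do work while computer is in use?" is "No": wait until the machine
    # has been idle for at least "Do work only after computer is idle for".
    return idle_seconds >= idle_threshold_seconds


# Typing in Word (idle for 2 s) with a 3-minute idle threshold: preempted.
print(should_run("prefs", False, idle_seconds=2, idle_threshold_seconds=180))
# No mouse/keyboard activity for 5 minutes: crunching resumes.
print(should_run("prefs", False, idle_seconds=300, idle_threshold_seconds=180))
```

A preempted application that is left in memory should still show up in Task Manager, but at 0% CPU.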


@Brian Bowles:
That's normal. With the normal settings, BOINC (Rosetta) uses the free CPU cycles. When you are typing a letter in Word, or surfing the internet, it uses less than 5% of your CPU power. The project (Rosetta) uses the free CPU cycles.
This is the way the DC program works.

The application runs at the lowest priority. This means that when another application asks more CPU power (example: you start up graphical software), this application gets this immidiatly. So for example your normal work asks at that moment 75% of your CPU, Rosetta only gets the rest (25%).
A few moments later, your normal work asks 15%, boinc gets 85%.
So Rosetta has the lowest priority: it only gets what isn't being used by other applications.
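The arithmetic in that example, as a tiny sketch. It assumes an idealized single CPU where the lowest-priority process receives exactly what the other applications leave unused; the function name is made up for the illustration.

```python
def lowest_priority_share(other_load_percent: float) -> float:
    """CPU share (in %) left over for a lowest-priority process."""
    return max(0.0, 100.0 - other_load_percent)


print(lowest_priority_share(75))  # 25.0
print(lowest_priority_share(15))  # 85.0
```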



Thanks Cureseekers, skutnar, tralala, and Vester for the feedback. My system was running correctly up until the recent batch of releases. I noticed recently that when R@H is running it would not release the processor for local work. I will try and answer the above questions.

  • I have boinc locally set to 'Run always' (right click on the BOINC icon in the system tray). When I use the computer I right click on the icon and change it to 'Run based on preferences', which I have set to not do work while the computer is in use (project setting). When I am finished I will right click on the icon and change it back to 'Run always'. Also, yes, when I make changes to BOINC in a project I do click on 'Update' for that project.

  • I will change "Leave application in memory" back to yes.

  • I have a GenuineIntel Mobile Intel(R) Pentium(R) 4 - M CPU 2.00GHz running Microsoft Windows 2000 Professional Edition, Service Pack 4 (05.00.2195.00). I am currently running BOINC Manager version 5.2.13, with a few projects other than Rosetta.



What I wanted to do was post a possible problem with R@H not allowing itself to be preempted. When I change to 'Run based on preferences', the processor is still being used at 100% even though R@H is showing as preempted in the BOINC work tab (which of course means the computer is verrrrry slooooowwwww :-( ). The only way I have found to free the processor for things other than R@H is to go to Task Manager and stop the rosetta_5.24_wi process; the processor then drops to around 0% and other things can be done on the computer. This issue didn't show up until one of the recent releases (sorry, I'm not sure which one); before these it would exit normally.

Thanks again for the help!


Brian B
Joined: 11 Dec 05
Posts: 3
Credit: 10,681
RAC: 0
Message 19388 - Posted: 28 Jun 2006, 2:20:34 UTC - in response to Message 19387.  

Sorry for the long post....

BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 19390 - Posted: 28 Jun 2006, 2:36:56 UTC

Brian:
If you use Boinc 5.4.9, does it operate as it used to?


Nightbird
Joined: 17 Sep 05
Posts: 70
Credit: 32,418
RAC: 0
Message 19393 - Posted: 28 Jun 2006, 5:38:30 UTC

wu : t312__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom009__711_8478

estimated completion time : + 41 days

Though the WU is suspended, it continues to run in the background (slowly).
cpu time : 9h 55 min xx sec. (always increasing)
% done : 1 %



Frisch
Joined: 5 Apr 06
Posts: 4
Credit: 133,315
RAC: 0
Message 19420 - Posted: 28 Jun 2006, 15:44:23 UTC
Last modified: 28 Jun 2006, 16:01:58 UTC

It happened again; it occurs when I report more than one job.
It doesn't matter whether I send 2 or 12: one is left reported without credit, with the reason "too many total results" shown under the work unit ID.
Link to PC
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=212157
Link to site with job listed
https://boinc.bakerlab.org/rosetta/results.php?hostid=212157&offset=20
25941992
Name t312__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom006__711_5945_2
Workunit 21099883
Created 26 Jun 2006 21:22:35 UTC
Sent 27 Jun 2006 0:03:28 UTC
Received 28 Jun 2006 0:00:59 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 212157
Report deadline 4 Jul 2006 0:03:28 UTC
CPU time 2651.671875
stderr out
<core_client_version>5.5.0</core_client_version>
<stderr_txt>
# random seed: 2136331
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 2 (nstruct) times
# This process generated 2 decoys from 2 attempts
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
</stderr_txt>
Validate state Workunit error - check skipped
Claimed credit 23.6600107360005
Granted credit 0
application version 5.24

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 8
Message 19424 - Posted: 28 Jun 2006, 16:31:38 UTC
Last modified: 28 Jun 2006, 16:32:24 UTC

Nightbird, I'd say you need to restart BOINC on that PC. If the crunching threads don't stop, then I'd reboot it. If the WU runs again for 2 hrs without progressing beyond 1%, I'd abort that WU.

Also, I note you are running an older version of BOINC. I had similar issues where I'd suspend in BOINC Manager but the crunching thread wouldn't respond. But they seem to have been resolved by the current BOINC releases. You can reference info. on how to get the new release in this QA to check your release, and this QA with download info.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Keith Akins
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 19428 - Posted: 28 Jun 2006, 17:33:37 UTC

Error Result For WU ID:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=22091481

Keith Akins
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 19429 - Posted: 28 Jun 2006, 17:37:09 UTC

UPDATE:

Result ID 26029052
Name t312__CASP7_ABINITIO_SAVE_ALL_OUT_BARCODE_hom007__812_931_1
Workunit 22091481
Created 27 Jun 2006 12:37:22 UTC
Sent 27 Jun 2006 14:56:35 UTC
Received 28 Jun 2006 17:25:04 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741819 (0xc0000005)
Computer ID 253124
Report deadline 4 Jul 2006 14:56:35 UTC
CPU time 23627.59375
stderr out

<core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 2908212


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x008C0DD1 write attempt to address 0x25707280

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 5.5.0

Ubaida
Joined: 9 Jun 06
Posts: 3
Credit: 206,886
RAC: 0
Message 19538 - Posted: 30 Jun 2006, 7:29:20 UTC

<core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# random seed: 2957681
# cpu_run_time_pref: 10800
# cpu_run_time_pref: 10800
# cpu_run_time_pref: 10800


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0060AAA8 read attempt to address 0x7D7427F9

https://boinc.bakerlab.org/rosetta/result.php?resultid=26281886

Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 19548 - Posted: 30 Jun 2006, 10:36:55 UTC

One interesting thing to note here is that both of these errors are caused by code trying to read/write addresses that aren't valid memory addresses, and they have something else in common too: they consist of ASCII text... Perhaps some function is not keeping its text strings within their correct bounds?

Of course, it could be completely random that they look like text, but in my experience that is NOT the case.

Ubaida's address is:
"?'t}" - the question mark is an unknown letter - because the code is most likely accessing an offset away from the address that got overwritten by the string.

Keith's address is:
"?rp%", again, the first (lowest byte) is unknown, as it's most likely an offset.

Theoretically, the offset may be bigger than a byte, so the letters further in could also be affected. One likely scenario is that the "'" in Ubaida's text is actually a percent character, if the code is attempting to go 0x200+ bytes into a struct. That isn't entirely unlikely: a 512+ byte struct is plausible, though of course many common data structures are smaller than this...
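The byte-by-byte reading above can be reproduced with a short script. It is a minimal sketch, assuming a 32-bit little-endian (x86) address; the function name is made up for the illustration. Here "?" marks any byte that is not printable ASCII, which for both addresses happens to be the lowest byte.

```python
import struct


def address_bytes_as_text(address: int) -> str:
    """Show a 32-bit address as the ASCII text it would be in memory."""
    raw = struct.pack("<I", address)  # little-endian: lowest byte first
    return "".join(chr(b) if 32 <= b < 127 else "?" for b in raw)


print(address_bytes_as_text(0x7D7427F9))  # Ubaida's address -> ?'t}
print(address_bytes_as_text(0x25707280))  # Keith's address  -> ?rp%

# The 0x200 offset scenario: subtracting 0x200 from Ubaida's address
# turns the "'" (0x27) into "%" (0x25).
print(address_bytes_as_text(0x7D7427F9 - 0x200))  # -> ?%t}
```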

--
Mats

Thalus
Joined: 1 Jun 06
Posts: 1
Credit: 1,893
RAC: 0
Message 19551 - Posted: 30 Jun 2006, 11:22:53 UTC

<core_client_version>5.3.6</core_client_version>
<message> - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# random seed: 2950187
# cpu_run_time_pref: 10800
# cpu_run_time_pref: 10800
# cpu_run_time_pref: 10800
# cpu_run_time_pref: 10800
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0060AAA8 read attempt to address 0x757D40C9
Engaging BOINC Windows Runtime Debugger...
********************
BOINC Windows Runtime Debugger Version 5.5.0
____________________________________________________________________

4 times the same failure at the moment...

Ubaida
Joined: 9 Jun 06
Posts: 3
Credit: 206,886
RAC: 0
Message 19570 - Posted: 30 Jun 2006, 14:18:16 UTC

got another two of those errors

https://boinc.bakerlab.org/rosetta/result.php?resultid=26296697

https://boinc.bakerlab.org/rosetta/result.php?resultid=26297155

anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 19628 - Posted: 1 Jul 2006, 13:01:53 UTC


https://boinc.bakerlab.org/rosetta/result.php?resultid=25777930

- exit code -1073741819 (0xc0000005)
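For anyone wondering how the two numbers on that line relate: the negative exit code is the same 32-bit value as the hex code, just read as a signed integer. 0xC0000005 is the Windows access violation status, which matches the "Access Violation" text in the reports above. A minimal sketch of the conversion (the function names are made up for the illustration):

```python
def exit_status_to_hex(status: int) -> str:
    """Reinterpret a signed 32-bit exit status as unsigned hex."""
    return f"0x{status & 0xFFFFFFFF:08x}"


def hex_to_exit_status(code: int) -> int:
    """Reinterpret an unsigned 32-bit code as a signed exit status."""
    return code - 0x100000000 if code >= 0x80000000 else code


print(exit_status_to_hex(-1073741819))  # 0xc0000005
print(hex_to_exit_status(0xC0000005))   # -1073741819
```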

Anders n




©2022 University of Washington
https://www.bakerlab.org