Minirosetta 3.73-3.78

Message boards : Number crunching : Minirosetta 3.73-3.78

sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 79609 - Posted: 24 Feb 2016, 5:22:40 UTC - in response to Message 79582.  
Last modified: 24 Feb 2016, 5:24:08 UTC


What I am seeing is that the project happily goes along for a while Requesting new tasks for CPU and gets the Scheduler request completed: got 1 task message.

Then after a few hours it gets the Scheduler request completed: got 0 tasks. No work sent. Rosetta Mini for Android is not available for your type of computer.

Finally, the message Rosetta Mini needs 57220.46 MB RAM but only 7363.62 MB is available for use. After that it stops updating. Remaining tasks will continue to upload until it runs out.

Rosetta does not automatically download any more tasks or report any that were finished. You can manually update and get it to reset and start again however it will just run through to the same result in a few hours.



actually, i'm wondering if limiting the number of concurrent tasks may help.
for r@h, i normally see one task/thread running per core. hence it nicely uses all 8 cores with 8 tasks/threads (incl HT cores) of my i7 4771 cpu. i'm running on 16 GB of ram in linux.

i've yet to encounter the 'needs xxx MB of RAM' with r@h, but with a different project (atlas@home from cern), the memory requirements are quite huge and i often see only 4 threads / tasks running and hit the memory limit.
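
As a rough illustration of why a memory-hungry project ends up running fewer tasks than there are threads, here is a minimal Python sketch. The function name, the per-task memory figures and the 90% usable fraction are made-up assumptions for illustration, not values taken from BOINC or from either project.

```python
def max_concurrent_tasks(threads, total_ram_mb, ram_per_task_mb, usable_fraction=0.9):
    """Return how many tasks fit at once, limited by both threads and RAM.

    usable_fraction is a hypothetical stand-in for a "use at most X% of
    memory" preference; it is not read from any real BOINC setting here.
    """
    usable_mb = total_ram_mb * usable_fraction
    by_memory = int(usable_mb // ram_per_task_mb)
    return max(0, min(threads, by_memory))


# 8 threads and 16 GB of RAM, with small tasks (assumed 512 MB each):
print(max_concurrent_tasks(8, 16 * 1024, 512))
# Same machine, with large tasks (assumed 3500 MB each):
print(max_concurrent_tasks(8, 16 * 1024, 3500))
```

With the assumed figures the first call is limited by the thread count and the second by memory, which matches the pattern of seeing all 8 threads busy on one project and only 4 on another.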

coming to think about ram, i think linux and windows o/s are able to utilize swap for virtual memory hence disk space as swap memory if you have allocated sufficient space for that. but for atlas@home, i think the use of virtualbox probably limits what could be swappable.

you may like to see if disk swap spaces may be somewhat tunable in that respects.

the other thing i think has to do with the boinc client itself, i'm thinking an updated or more recent boinc client may possibly resolve some of these issues as what you are seeing is probably a behavior of boinc client rather than r@h
ID: 79609 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,848,401
RAC: 2,043
Message 79615 - Posted: 24 Feb 2016, 11:53:23 UTC - in response to Message 79609.  

[snip]
coming to think about ram, i think linux and windows o/s are able to utilize swap for virtual memory hence disk space as swap memory if you have allocated sufficient space for that. but for atlas@home, i think the use of virtualbox probably limits what could be swappable.

you may like to see if disk swap spaces may be somewhat tunable in that respects.


BOINC tasks usually have swapping turned off, in an effort to make them run faster. This means that there is often no effort to make the applications able to stand the address changes caused by swapping something out of memory, and then swapping it back in at a different address because the original address is still in use by some other program.
ID: 79615 · Rating: 0
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 21,375,924
RAC: 16,385
Message 79617 - Posted: 24 Feb 2016, 14:13:35 UTC - in response to Message 79615.  

[snip]
BOINC tasks usually have swapping turned off, in an effort to make them run faster. This means that there is often no effort to make the applications able to stand the address changes caused by swapping something out of memory, and then swapping it back in at a different address because the original address is still in use by some other program.


The OS (Windows, all variants of Linux, Mac OS, ...) provides the program with VIRTUAL memory. Virtual addresses are translated into PHYSICAL addresses through the page tables, with the TLB caching recent translations. A virtual page of memory can get swapped to disk and then be brought back into a different PHYSICAL memory location by updating its page-table entry. The executing program does not even know that the page has been swapped out to disk.
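
To make that concrete, here is a toy Python model of the idea, not of any real OS. The class name `TinyPager` and all of its methods are invented for this sketch, and it deliberately has no eviction policy: it only shows that a page can leave memory, return in a different physical frame, and still be read at the same virtual page number.

```python
class TinyPager:
    """Toy demand-paging model: virtual pages map to frames or sit in swap."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.page_table = {}  # virtual page -> physical frame (resident only)
        self.frames = {}      # physical frame -> page contents
        self.swap = {}        # virtual page -> contents saved on "disk"

    def write(self, vpage, data):
        self.frames[self._frame_for(vpage)] = data

    def read(self, vpage):
        return self.frames[self._frame_for(vpage)]

    def swap_out(self, vpage):
        """Move a resident page to swap and release its frame."""
        frame = self.page_table.pop(vpage)
        self.swap[vpage] = self.frames.pop(frame)
        self.free_frames.append(frame)

    def _frame_for(self, vpage):
        if vpage not in self.page_table:      # simulated page fault
            frame = self.free_frames.pop()    # no eviction: assumes a free frame
            self.page_table[vpage] = frame
            self.frames[frame] = self.swap.pop(vpage, None)
        return self.page_table[vpage]


pager = TinyPager(num_frames=2)
pager.write(7, "hello")
frame_before = pager.page_table[7]
pager.swap_out(7)            # page 7 goes to "disk"
pager.write(9, "other")      # another page takes over the freed frame
print(pager.read(7))         # the program still reads its data at page 7
print(frame_before, pager.page_table[7])  # but the physical frame has changed
```

The caller only ever uses the virtual page number 7; the move to a different frame is invisible to it, which is the point being made above.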

The last time I looked, Windows allocated a disk swap file the same size as memory (C:\pagefile.sys). You can explicitly set the size of this file, even to 0 bytes ... but then when you run low on memory, the OS will kill stuff with "Out of Memory".

VirtualBox is just a program in memory that runs on top of your OS, and you set the amount of memory that VirtualBox is allowed to use. I usually let VirtualBox use about 50% of my physical memory, BUT I have 16 GB or more on my systems.

ID: 79617 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,848,401
RAC: 2,043
Message 79624 - Posted: 25 Feb 2016, 2:44:08 UTC - in response to Message 79617.  

[snip]
Virtualbox is just a program in memory that runs on top of your OS and you set the memory size that virtualbox is allowed to use. I usually set virtualbox to be able to use about 50% of my physical memory BUT I have 16gb or more on my systems.



It's hard to get Virtualbox working correctly - once the versions of BOINC available so far have detected that virtualization is not enabled in the BIOS or the UEFI, they will remember this forever and prevent the test of whether it is enabled from being run again.

Also, all of the Virtualbox workunits I've seen much about so far seize 4 GB of physical memory, and won't allow any of it to be paged. I'm hoping that a new version of Virtualbox will remove this restriction.

As far as I know, Virtualbox can handle 32-bit workunits, but not 64-bit workunits.
ID: 79624 · Rating: 0
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 21,375,924
RAC: 16,385
Message 79630 - Posted: 25 Feb 2016, 15:00:13 UTC - in response to Message 79624.  

[snip]
It's hard to get Virtualbox working correctly - once the versions of BOINC available so far have detected that virtualization is not enabled in the BIOS or the UEFI, they will remember this forever and prevent the test of whether it is enabled from being run again.

Also, all of the Virtualbox workunits I've seen much about so far seize 4 GB of physical memory, and won't allow any of it to be paged. I'm hoping that a new version of Virtualbox will remove this restriction.

As far as I know, Virtualbox can handle 32-bit workunits, but not 64-bit workunits.


I regularly use VirtualBox to build Linux images on my machines, and none of my comments were about the pre-configured BOINC VirtualBox implementation. I have no experience with the BOINC-packaged VirtualBox.

I imagine that BOINC projects choose to use the BOINC VirtualBox so they can control the execution environment and the quality of the data generated very closely. 32-bit only probably makes sense for BOINC VirtualBox in that case.






ID: 79630 · Rating: 0
[FI] OIKARINEN
Joined: 16 Nov 13
Posts: 6
Credit: 131,483
RAC: 0
Message 79669 - Posted: 1 Mar 2016, 14:15:41 UTC

I've been running the 3.71 version of Rosetta for 2 days, and I just noticed a lot of crashing workunits running on different computers. All of those WUs have this attached:

ERROR: unrecognized residue AX1
ERROR:: Exit from: ..\..\..\src\core\io\pdb\file_data.cc line: 2077
BOINC:: Error reading and gzipping output datafile: default.out
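
For anyone wanting to check many failed results for the same cause, a small Python sketch along these lines could pull the residue name out of the task's stderr text. The function name and the sample text are assumptions for illustration; only the "unrecognized residue" line format is taken from the log above.

```python
import re

# Matches lines such as "ERROR: unrecognized residue AX1".
RESIDUE_ERROR = re.compile(r"ERROR: unrecognized residue (\S+)")


def unrecognized_residues(log_text):
    """Return the sorted, de-duplicated residue names reported in a log."""
    return sorted(set(RESIDUE_ERROR.findall(log_text)))


sample_log = (
    "ERROR: unrecognized residue AX1\n"
    "BOINC:: Error reading and gzipping output datafile: default.out\n"
)
print(unrecognized_residues(sample_log))
```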
Life is too short to live concerned about its mysteries.
ID: 79669 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 79674 - Posted: 1 Mar 2016, 19:15:19 UTC - in response to Message 79624.  
Last modified: 1 Mar 2016, 19:16:18 UTC

It's hard to get Virtualbox working correctly - once the versions of BOINC available so far have detected that virtualization is not enabled in the BIOS or the UEFI, they will remember this forever and prevent the test of whether it is enabled from being run again.

They have a solution to that problem in the Cosmology FAQs:
I enabled VT-x/AMD-v but jobs say “Scheduler wait: Please upgrade BOINC”

Also, all of the Virtualbox workunits I've seen much about so far seize 4 GB of physical memory, and won't allow any of it to be paged. I'm hoping that a new version of Virtualbox will remove this restriction.

I think that just depends on the application. ATLAS and vLHC take a lot of memory, but Cosmology does not, as far as I recall.

I have had some problems with VirtualBox interfering with some other programs (both CPU and GPU, even non-BOINC ones), but not with the VBox programs themselves. I just use the pre-packaged versions on the CERN projects and Cosmology, but they all went easily enough, though you do need to watch the memory. If VBox would be of any use for Rosetta, I would be willing to try it here.
ID: 79674 · Rating: 0
Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,811,598
RAC: 764
Message 79704 - Posted: 7 Mar 2016, 16:42:12 UTC
Last modified: 7 Mar 2016, 16:55:38 UTC

Both of my computers received 24-hour back-offs after a single request for work resulted in this reply:
Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.

*****Wild Speculation Alert*****
If the number of Android tasks exceeds the number of available devices by too great a margin and/or they fail at too high a rate then the new tasks/resends could be clogging the queue. As long as there are in fact plenty of cpu tasks to crunch, a 24-hour back-off would seem excessive.


I should add that I only became concerned because I recently reduced my preferred cpu runtime and my cache and set other projects to no new tasks (preparing for a possible imminent shutdown of computers for an indeterminate period of time), so this 24-hour back-off actually led to no tasks crunching at all. Otherwise I might have noticed but not been concerned enough to explore the possible causes or to comment.

I only comment now on the possibility that this back-off interval could be changed to something shorter. I know the project doesn't want a bunch of computers asking every 5 minutes while there's a clog, but if it is a predictable clog and you can see how long it typically lasts, perhaps you could adjust the back-off accordingly. Would anything longer than the 6 hour default target runtime really be necessary? Although not a big deal in the overall scheme of things, would things run somewhat smoother, on both sides of the connection, if crunchers weren't left to go idle unnecessarily for such a long time?

Best,
Snags

edit: I just saw additional posts in this thread that suggest rosie really did run out of cpu tasks. Ah, well. I suppose I should see if I can find BOINC documentation on the back-off settings (documentation that I could actually understand, that is) : /
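
For readers unfamiliar with the idea, a capped exponential back-off can be sketched in a few lines of Python. This is a generic illustration of the technique, not BOINC's actual scheduler logic; the function name, the 60-second base and the 6-hour cap are assumptions chosen only to mirror the question above.

```python
def backoff_seconds(failures, base=60, cap=6 * 3600):
    """Delay before the next request after `failures` consecutive empty replies.

    The delay doubles with each failure and never exceeds `cap`.
    """
    if failures <= 0:
        return 0
    return min(cap, base * 2 ** (failures - 1))


for n in range(1, 12):
    print(n, backoff_seconds(n))
```

With these assumed values the delay grows from one minute and flattens out at six hours, so a host would never sit idle for a full day after a single empty reply.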
ID: 79704 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,848,401
RAC: 2,043
Message 79708 - Posted: 7 Mar 2016, 20:29:32 UTC - in response to Message 79704.  

Both of my computers received 24-hour back-offs after a single request for work resulted in this reply:
Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.

*****Wild Speculation Alert*****
If the amount of Android tasks exceeds the number of available devices by too great a number and/or fail at too high a rate then the new tasks/resends could be clogging the queue. As long as there are in fact plenty of cpu tasks to crunch, a 24 hour back-off would seem excessive.


I should add that I only became concerned because I recently reduced my preferred cpu runtime and my cache and set other projects to no new tasks (preparing for a possible imminent shut down of computers for an indeterminate period of time) so this 24 hour back-off actually lead to no tasks crunching at all. Otherwise I might have noticed but not been concerned enough to explore the possible causes or to comment.

I only comment now in the possibility that this back-off interval could be changed to something shorter. I know the project doesn't want a bunch of computer asking every 5 minutes while there's a clog but if it is a predictable clog and you can see how long it typically lasts perhaps you could adjust the back-off accordingly. Would anything longer than the 6 hour default target runtime really be necessary? Although not a big deal in the overall scheme of things, would things run somewhat smoother, on both sides of the connection, if crunchers weren't left to go idle unnecessarily for such a long
ID: 79708 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,848,401
RAC: 2,043
Message 79710 - Posted: 7 Mar 2016, 20:31:57 UTC - in response to Message 79704.  

Both of my computers received 24-hour back-offs after a single request for work resulted in this reply:
Sun Mar 6 03:51:29 2016 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

When I noticed the problem several hours after the back-off began I simply hit the update button and successfully retrieved new tasks.
[snip]


I've seen a similar problem twice. I have an Android device in addition to my Windows devices, but so far I have BOINC installed only on the Windows devices.
ID: 79710 · Rating: 0
iriemon

Joined: 16 Jan 16
Posts: 6
Credit: 741,509
RAC: 71
Message 79715 - Posted: 8 Mar 2016, 15:26:32 UTC

Any news on getting the communication problem fixed? Been sitting here for 2 days without any new work being available.....
ID: 79715 · Rating: 0
iriemon

Joined: 16 Jan 16
Posts: 6
Credit: 741,509
RAC: 71
Message 79717 - Posted: 8 Mar 2016, 15:34:11 UTC - in response to Message 79715.  

Any news on getting the communication problem fixed? Been sitting here for 2 days without any new work being available.....



For some reason, I decided to clear my IE cache and then tried to dl a new work unit and to my surprise IT WORKED! Happily crunching......
ID: 79717 · Rating: 0
Dr. Merkwürdigliebe
Joined: 5 Dec 10
Posts: 81
Credit: 2,657,273
RAC: 0
Message 79721 - Posted: 8 Mar 2016, 17:10:06 UTC
Last modified: 8 Mar 2016, 17:11:04 UTC

ID: 79721 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,848,401
RAC: 2,043
Message 79733 - Posted: 8 Mar 2016, 21:24:02 UTC - in response to Message 79717.  

Any news on getting the communication problem fixed? Been sitting here for 2 days without any new work being available.....



For some reason, I decided to clear my IE cache and then tried to dl a new work unit and to my surprise IT WORKED! Happily crunching......


I decided to try that on my Windows 10 computer. Surprise - if Windows 10 even includes IE, it is very well hidden.

I told BOINC Manager to update for Rosetta@home anyway - it downloaded a workunit.

It looks likely that the problem is fixed on the server and IE is not involved.
ID: 79733 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1861
Credit: 8,160,940
RAC: 8,128
Message 79756 - Posted: 15 Mar 2016, 20:51:49 UTC

801194890

Starting work on structure: _00002
[2016- 3-15 20:35:13:] :: BOINC:: Initializing ... ok.
[2016- 3-15 20:35:13:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
failed to create shared mem segment: minirosetta Size: 25001672


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0085EEB0 write attempt to address 0x017D7EC1

ID: 79756 · Rating: 0
ArcSedna

Joined: 23 Oct 11
Posts: 14
Credit: 60,982,832
RAC: 77,015
Message 79792 - Posted: 23 Mar 2016, 21:23:21 UTC

Some workunits hang for many hours until they are terminated manually.

They have strings like
EN_MAP_hyb_cst
EN_MAP_cst
RE_MAP_hyb_cst
RE_MAP_cst
in the middle of the name.
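
If it helps anyone screen their task list, the name check can be sketched in Python as below. The tag list is copied from the strings above; the function name and the example workunit names are made up for illustration.

```python
# Substrings reported above as appearing in the affected workunit names.
SUSPECT_TAGS = ("EN_MAP_hyb_cst", "EN_MAP_cst", "RE_MAP_hyb_cst", "RE_MAP_cst")


def is_suspect(workunit_name):
    """Return True if the name contains one of the reported substrings."""
    return any(tag in workunit_name for tag in SUSPECT_TAGS)


# Hypothetical names, not real workunits:
print(is_suspect("example_EN_MAP_hyb_cst_0001"))
print(is_suspect("example_relax_0001"))
```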

Sample (Already aborted)

Their behavior is 'do nothing for a long time'. Looks like this:
Elapsed real time : 32 hours
Elapsed cpu time : 15 minutes
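
A simple way to spot tasks in this state is to compare CPU time with wall-clock time. The Python sketch below does that; the function names, the one-hour minimum and the 5% threshold are arbitrary assumptions for illustration, not BOINC settings.

```python
def cpu_ratio(cpu_seconds, wall_seconds):
    """Fraction of elapsed wall-clock time actually spent on the CPU."""
    return cpu_seconds / wall_seconds if wall_seconds > 0 else 0.0


def looks_stalled(cpu_seconds, wall_seconds, min_wall=3600, threshold=0.05):
    """Flag a task that has run a while but used almost no CPU time."""
    return wall_seconds >= min_wall and cpu_ratio(cpu_seconds, wall_seconds) < threshold


# The figures above: 15 minutes of CPU time in 32 hours of real time.
cpu = 15 * 60
wall = 32 * 3600
print(cpu_ratio(cpu, wall))       # well under 1% of the elapsed time
print(looks_stalled(cpu, wall))
```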

This is happening on my Mac computers. Windows and Linux seem to be OK.

OS : Mac OS X 10.11.3
Boinc : 7.2.42
Memory : 8GB to 16GB

Thanks.
ID: 79792 · Rating: 0
James Adrian
Joined: 27 Apr 12
Posts: 5
Credit: 1,801,535
RAC: 0
Message 79799 - Posted: 26 Mar 2016, 17:17:12 UTC

Has anyone else gotten work units for Minirosetta 3.71 that are estimated to run 14 days? I'm running on an old (2009) Mac with 8GB of memory and lately I've gotten these here and there.

Thanks

Boinc 7.6.22
Mac OS 10.11.4
ID: 79799 · Rating: 0
James Adrian
Joined: 27 Apr 12
Posts: 5
Credit: 1,801,535
RAC: 0
Message 79800 - Posted: 26 Mar 2016, 17:51:16 UTC - in response to Message 79792.  

ArcSedna,

I just saw your post, once I sorted to see newest first. My problem seems slightly different but like you I see the problem with work units named as in your post. One other observation: I have a newer Mac laptop but so far I have not seen the problem with the work units on it, just on my older iMac.
ID: 79800 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org