Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,312,750
RAC: 3,043
Message 75485 - Posted: 26 Apr 2013, 14:24:29 UTC - in response to Message 75482.  

Saying 'sorry' doesn't mean a darned thing IF NOTHING CHANGES!!!

I have to second this sentiment. About 35% of all work units I've returned in the last week have been aborted due to 'out of memory' errors. If this appalling record doesn't soon change, ROSETTA is history for me.


I've recently sent some comments to boinc_dev about a problem with the way BOINC keeps track of the amount of memory in use, especially under Windows Vista. For 32-bit workunits, it does not count the SYSWOW64 modules needed to run those workunits under 64-bit Windows.

A few possible ways to handle this, at least partially:

Wait for a future version of BOINC that does count them, and offers separate memory limits for 32-bit memory space and for the entire 64-bit memory space BOINC uses for all workunits.

Set each of your computers to subscribe only to BOINC projects that offer only 32-bit workunits, or only 64-bit workunits, but not both on the same computer.

Upgrade your Windows Vista computers to Windows 7, where the SYSWOW64 modules are much smaller. I don't know how this applies to 64-bit Windows XP or 64-bit Windows 8. Windows Vista uses roughly the same amount of memory for the 32-bit workunits and for the SYSWOW64 modules needed to run them.

Persuade all BOINC projects to either offer a true 64-bit version of each of their applications (even if it won't run any faster than the 32-bit version), or double the estimates of required memory for all 32-bit workunits sent to 64-bit versions of BOINC. 64-bit applications don't use any SYSWOW64 modules when they run, and therefore don't need any memory space to load them.
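The last suggestion above amounts to a simple adjustment rule. Here is a minimal sketch. It assumes robertmiles' estimate that the SYSWOW64 overhead for a 32-bit task on 64-bit Vista is about equal to the task's own footprint, which is an overhead ratio of about 1.0. The function name and the ratio parameter are hypothetical and are not part of the BOINC scheduler.

```python
def adjusted_memory_estimate_mb(estimate_mb: float, app_is_32bit: bool,
                                host_is_64bit: bool, overhead_ratio: float = 1.0) -> float:
    """Return a memory estimate that also covers 32-bit-on-64-bit overhead.

    overhead_ratio is an ASSUMED multiple of the task's own footprint needed
    for the SYSWOW64 support code (about 1.0 on Vista, per the post above).
    """
    if estimate_mb < 0 or overhead_ratio < 0:
        raise ValueError("estimate and overhead ratio must be non-negative")
    if app_is_32bit and host_is_64bit:
        # 32-bit task on 64-bit Windows: add the assumed overhead on top.
        return estimate_mb * (1.0 + overhead_ratio)
    # Native 64-bit task, or 32-bit task on 32-bit Windows: no extra overhead.
    return estimate_mb


print(adjusted_memory_estimate_mb(500, app_is_32bit=True, host_is_64bit=True))   # 1000.0
print(adjusted_memory_estimate_mb(500, app_is_32bit=False, host_is_64bit=True))  # 500
```

With the default ratio of 1.0, this reproduces the "double the estimate" rule. A smaller ratio would model an OS with lighter SYSWOW64 modules.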


I am using 64-bit Win7 Ultimate on all of my Rosetta machines, so that isn't really an issue for me, and I have still never crunched a cryo unit successfully!


It's still an issue for Win7, although less than for WinVista, as long as the Rosetta@Home server keeps sending 32-bit workunits to 64-bit versions of Windows.

You might check if your motherboard is able to hold any more memory, and if so, try installing more memory. With 16 GB, my Win7 computer is at least fast at failing cryo workunits.
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,202,732
RAC: 3,230
Message 75491 - Posted: 26 Apr 2013, 19:48:56 UTC - in response to Message 75485.  

I am about maxed out on each machine, meaning that less than 1 GB per machine could be added, at least for those with less than 16 GB. I am still using some older motherboards that can only support 4 GB max; most already have that amount, but a couple only have 3 GB. They were upgraded from a 32-bit OS, and going from 3 to 4 GB on a BOINC-only machine is pretty pointless.
Yifan Song
Volunteer moderator
Project developer
Project scientist
Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 75495 - Posted: 27 Apr 2013, 1:21:15 UTC

The memory problem turned out to be tricky. At least in my local tests, it wasn't running out of memory or leaking memory. It turned out there was a problem with the potential gradient calculation in the electron density energy function, which made the reference frame drift away and eventually crash.
I just updated the code and restarted the jobs. If you still see the errors after updating the application, please post here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6222 I'll be monitoring that thread closely.
Yifan
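Yifan's diagnosis points to a wrong gradient rather than a memory leak. A common generic way to catch this class of bug is a finite-difference gradient check. It compares the analytic gradient with a numerical estimate of the energy's slope at the same point. The sketch below is illustrative only. The toy quadratic energy and all names are hypothetical and are not Rosetta's electron density code.

```python
from typing import Callable, List

Vector = List[float]


def max_gradient_error(energy: Callable[[Vector], float],
                       gradient: Callable[[Vector], Vector],
                       x: Vector, h: float = 1e-6) -> float:
    """Largest absolute gap between an analytic gradient and a central-difference estimate."""
    analytic = gradient(x)
    worst = 0.0
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        # Central difference: (E(x + h*e_i) - E(x - h*e_i)) / (2h) approximates dE/dx_i.
        numeric = (energy(plus) - energy(minus)) / (2.0 * h)
        worst = max(worst, abs(analytic[i] - numeric))
    return worst


# Toy quadratic "energy" E(x) = sum_i (x_i - c_i)^2, chosen only for illustration.
CENTER = [1.0, -2.0, 0.5]


def toy_energy(x: Vector) -> float:
    return sum((xi - ci) ** 2 for xi, ci in zip(x, CENTER))


def correct_gradient(x: Vector) -> Vector:
    return [2.0 * (xi - ci) for xi, ci in zip(x, CENTER)]


def buggy_gradient(x: Vector) -> Vector:
    # Deliberate bug: the factor of 2 is missing, so the gradient is half its true value.
    return [(xi - ci) for xi, ci in zip(x, CENTER)]


point = [0.3, 0.7, -1.1]
print(max_gradient_error(toy_energy, correct_gradient, point))  # close to zero
print(max_gradient_error(toy_energy, buggy_gradient, point))    # about 2.7
```

A correct gradient leaves only rounding noise. A buggy one shows a gap comparable to the gradient itself, long before a minimizer drifts far enough to crash.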
Brian Priebe
Joined: 27 Nov 09
Posts: 16
Credit: 33,020,247
RAC: 0
Message 75497 - Posted: 27 Apr 2013, 6:19:07 UTC - in response to Message 75495.  

Thank you for trying to fix this...
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1997
Credit: 9,727,255
RAC: 10,571
Message 75498 - Posted: 27 Apr 2013, 8:38:21 UTC - in response to Message 75460.  

Docking@home is getting my cpu cycles.


Mmmm, not the most communicative project...
TJ
Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 75500 - Posted: 27 Apr 2013, 12:05:34 UTC - in response to Message 75485.  

I have plenty of memory in my systems, but I never see more than half of it used by BOINC, even when all cores are crunching. So I guess adding a lot of memory is not needed for happy crunching; the apps need to be good.
And, as you can read, the Rosie app has been updated to a new version.
Greetings,
TJ.
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,312,750
RAC: 3,043
Message 75504 - Posted: 27 Apr 2013, 14:56:17 UTC - in response to Message 75500.  

Note that if you're using 64-bit Windows Vista, no more than about half the memory CAN be listed as used by BOINC when you're running only 32-bit workunits. An approximately equal amount of memory is needed for the SYSWOW64 modules that run the 32-bit workunits, but those modules are not counted toward the memory used by BOINC.

To see these modules listed in Windows Task Manager, you must enable "Show processes from all users", then look for images listed as svchost.exe.

If you're using 64-bit Windows 7 instead, a higher fraction of the memory can be listed as in use by BOINC because it provides much smaller SYSWOW64 modules.
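robertmiles' reasoning can be written as a one-line ratio. Here is a minimal sketch. Like him, it assumes that every running task is 32-bit and that the uncounted SYSWOW64 memory is a fixed multiple r of each task's own memory. BOINC then counts m out of m + r·m, so it can report at most 1/(1 + r) of the memory in use. A value of r ≈ 1 matches his Vista estimate. The 0.25 below is an arbitrary illustrative value, not a measurement of Windows 7, and the function name is hypothetical.

```python
def max_visible_boinc_fraction(overhead_ratio: float) -> float:
    """Largest share of the memory actually in use that BOINC can report.

    ASSUMES every running task is 32-bit and needs an extra
    overhead_ratio * (its own memory) for uncounted SYSWOW64 support code.
    """
    if overhead_ratio < 0:
        raise ValueError("overhead_ratio must be non-negative")
    # Counted memory m, uncounted r*m, so the visible share is m / (m + r*m).
    return 1.0 / (1.0 + overhead_ratio)


print(max_visible_boinc_fraction(1.0))   # 0.5  (roughly the Vista case described above)
print(max_visible_boinc_fraction(0.25))  # 0.8  (illustrative smaller overhead)
```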
Marcin
Joined: 26 May 13
Posts: 8
Credit: 764,523
RAC: 0
Message 75674 - Posted: 26 May 2013, 23:00:18 UTC
Last modified: 26 May 2013, 23:02:28 UTC

I'd like to report another batch of crashed tasks, listed below:


Task 583777491, workunit 529945440: sent 26 May 2013 21:01:02 UTC, reported 26 May 2013 22:39:54 UTC; Over / Client error / Compute error
    Output: 0 0x010aa64b, 1 0x0034b437, 2 0x99efb5b7, 3 0x99ee5d4e, interleaved with four "SIGPIPE: write on a pipe with no reader" messages
Task 583760239, workunit 530074706: sent 26 May 2013 18:55:14 UTC, reported 26 May 2013 20:38:07 UTC; Over / Validate error / Done
Task 583723990, workunit 530093498: sent 26 May 2013 14:37:51 UTC, reported 26 May 2013 14:50:29 UTC; Over / Client error / Compute error
    Output: error:sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range
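The last task's error says that a value which should have been a sine or cosine became NaN (not a number). Under IEEE 754 arithmetic, NaN arises from operations such as 0/0 or inf - inf, and it makes every ordered comparison false. A range guard therefore has to test for it explicitly. The sketch below shows one way to write such a guard. The function name and tolerance are hypothetical, and this is not Rosetta's actual sin_cos_range check.

```python
import math


def checked_unit_range(value: float, tol: float = 1e-9) -> float:
    """Validate a value meant to be a sine or cosine before using it (e.g. in acos).

    Tiny floating-point overshoots (|value| <= 1 + tol) are clamped into [-1, 1];
    NaN, or anything further out of range, raises ValueError.
    """
    # NaN makes every ordered comparison False, so a check such as
    # "value < -1 or value > 1" would silently let it through; test for it first.
    if math.isnan(value):
        raise ValueError("nan is outside of [-1,+1] sin and cos value legal range")
    if abs(value) > 1.0 + tol:
        raise ValueError(f"{value} is outside of [-1,+1] sin and cos value legal range")
    return max(-1.0, min(1.0, value))


print(checked_unit_range(1.0 + 1e-12))      # 1.0
print(math.acos(checked_unit_range(-0.5)))  # about 2.0944, i.e. 2*pi/3
```

Clamping only handles rounding noise. A NaN means something upstream has already gone wrong, which is why the guard reports it instead of hiding it.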
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 75682 - Posted: 29 May 2013, 23:07:18 UTC

RB_05 tasks are killing me. I am getting a lot of pop-up memory error messages now. What is going on?
Leigh Duffill
Joined: 2 Feb 11
Posts: 2
Credit: 15,176,837
RAC: 0
Message 75710 - Posted: 6 Jun 2013, 9:44:40 UTC

I have a number of clients running, but I recently added a new machine and reactivated some old machines that I had re-imaged. Those clients are not downloading or computing tasks; everything results in client errors.

The client IDs I'm having issues with are 1619183, 1618973 and 1618951.
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,796,856
RAC: 15,683
Message 75711 - Posted: 6 Jun 2013, 10:03:28 UTC - in response to Message 75710.  
Last modified: 6 Jun 2013, 10:04:02 UTC

Hi

You're getting <error_code>-200</error_code> (you can see that if you click on any of the Task ID links for the failed work units). Googling that leads to this old thread, which suggests the cause might be a firewall blocking downloads of .exe files, or a lack of disk space. Could that be it?

If you want to try dropping them into the folder manually, I believe you just need to put the Rosetta exe (minirosetta_3.46_windows_intelx86.exe) and the graphics exe (minirosetta_graphics_3.43_windows_intelx86.exe) into your projects folder (the default is something like "C:\ProgramData\BOINC\projects").

You can download those files manually from here:
https://boinc.bakerlab.org/rosetta/download/ (scroll to the bottom for the .exe files.)

HTH
Danny
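Before copying files in by hand, it may help to confirm which application files are actually absent. Here is a minimal sketch. The directory assumes the default Windows Vista/7 data folder plus a Rosetta project subfolder named boinc.bakerlab.org_rosetta; verify both on your own machine. The helper name is hypothetical. The file names are the ones listed above.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(project_dir: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected file names that are not present as regular files in project_dir."""
    return [name for name in expected if not (project_dir / name).is_file()]


# ASSUMED location: default Windows Vista/7 data directory plus the Rosetta
# project subfolder; adjust this for your own installation.
ROSETTA_DIR = Path(r"C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta")
EXPECTED = [
    "minirosetta_3.46_windows_intelx86.exe",
    "minirosetta_graphics_3.43_windows_intelx86.exe",
]

if __name__ == "__main__":
    for name in missing_files(ROSETTA_DIR, EXPECTED):
        print("missing:", name)
```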
Leigh Duffill
Joined: 2 Feb 11
Posts: 2
Credit: 15,176,837
RAC: 0
Message 75714 - Posted: 6 Jun 2013, 11:38:04 UTC - in response to Message 75711.  

Thanks for the info

I was in fact missing the .exe files on the affected machines, and our proxy is supposed to block .exe files, so that probably explains why. However, I had started MilkyWay to see if that worked, and it did; it even managed to download its .exe files.

I'll leave the stations a while and see if they pick up any jobs.

Cheers
Ameobea
Joined: 5 Apr 12
Posts: 3
Credit: 451,318
RAC: 0
Message 75721 - Posted: 7 Jun 2013, 21:07:31 UTC

Hello! I'm new to these forums and am not sure if this is the right place to post this question. Anyway, I recently got a new computer and installed BOINC on it. Since Rosetta@home is my favorite project, it was the first one I installed.

However, as far as I can see, every workunit has failed with an error code of -200. I looked it up and found little information on the topic. The exit code is -186 (0xffffffffffffff46). I'm not sure whether the errors are the result of an unstable processor, hardware incompatibilities, or some other issue I'm not aware of.

I'd appreciate any help you guys could give me! Here's a link to one of the failed workunits:

https://boinc.bakerlab.org/result.php?resultid=585870618
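The two numbers in the exit code line are the same value written two ways: -186 in decimal, and its 64-bit two's-complement bit pattern in hexadecimal. The short Python check below converts in both directions. The helper names are only for illustration.

```python
def as_unsigned64(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 64-bit value (two's complement)."""
    return code & 0xFFFFFFFFFFFFFFFF


def as_signed64(value: int) -> int:
    """Inverse: interpret an unsigned 64-bit value as a signed integer."""
    return value - (1 << 64) if value >= (1 << 63) else value


print(hex(as_unsigned64(-186)))         # 0xffffffffffffff46
print(as_signed64(0xffffffffffffff46))  # -186
```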
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,796,856
RAC: 15,683
Message 75722 - Posted: 7 Jun 2013, 22:37:12 UTC - in response to Message 75721.  

Hi Ameobea, my first guess would be disk space - is there enough on the partition that the BOINCData folder is on?

Otherwise, it could be a download error for the file that's mentioned there...

Danny
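Checking free space on the partition that holds the BOINC data folder can be scripted. Here is a minimal sketch using the standard library's shutil.disk_usage. The path is an assumption (the usual default on Windows Vista/7) and should be changed to match your installation. The helper name is hypothetical.

```python
import shutil
from pathlib import Path


def free_space_gb(path: Path) -> float:
    """Free space, in GB (10**9 bytes), on the partition that contains path."""
    return shutil.disk_usage(path).free / 1e9


# ASSUMED default BOINC data directory on Windows Vista/7; change as needed.
BOINC_DATA = Path(r"C:\ProgramData\BOINC")

if __name__ == "__main__":
    if BOINC_DATA.exists():
        print(f"{free_space_gb(BOINC_DATA):.1f} GB free on the BOINC data partition")
    else:
        print("BOINC data directory not found at", BOINC_DATA)
```

This measures the whole partition. The disk limit set in BOINC's own computing preferences is a separate cap.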
Ameobea
Joined: 5 Apr 12
Posts: 3
Credit: 451,318
RAC: 0
Message 75724 - Posted: 7 Jun 2013, 23:36:03 UTC - in response to Message 75722.  

Yes, BOINC has plenty of spare space. I gave it 10 GB, and it's only using just over 2 GB. Also, I've been getting that error on almost every workunit so far, so I don't think it's a download error. Maybe my antivirus is getting in the way. If it keeps up, I'll try disabling the firewall. I've already set it to exclude the BOINC program files and program data directories.
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,796,856
RAC: 15,683
Message 75725 - Posted: 8 Jun 2013, 0:01:33 UTC - in response to Message 75724.  
Last modified: 8 Jun 2013, 0:02:12 UTC

Then maybe try running memtest and/or Prime95 to check the hardware. It might be faulty RAM... or the PSU...
Ameobea
Joined: 5 Apr 12
Posts: 3
Credit: 451,318
RAC: 0
Message 75726 - Posted: 8 Jun 2013, 2:14:26 UTC - in response to Message 75725.  

Well, I guess I should have checked my overclock better :p

It was still just a bit unstable, as demonstrated by a blue screen partway into a Prime95 run. That shows my newbieness when it comes to overclocking.

Anyway, I nerfed my processor a bit and hope that works!

Thanks for the suggestion :)
seybernetx
Joined: 16 Aug 10
Posts: 5
Credit: 1,520
RAC: 0
Message 75728 - Posted: 9 Jun 2013, 16:45:43 UTC

Are you folks having problems? I had been running Rosetta happily for quite a while. Then I stopped getting work units.

No error messages (that I noticed, anyway), just no work units.

Did I miss an update or something? Any suggestions?
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 75729 - Posted: 9 Jun 2013, 18:18:29 UTC - in response to Message 75728.  
Last modified: 9 Jun 2013, 18:22:45 UTC

Since you're running other projects and just joined a new project, BOINC is taking time from Rosie to get that project up to speed. If you want to do more Rosie work, go to your Rosetta account page and change your resource share to a higher number. Then do an update of Rosetta in your BOINC Manager so that it reads the new resource share and adjusts things accordingly. Check how the resource share is configured for each project on the BOINC Manager Projects tab, then go to each project's page and adjust it according to how you want to run each project.

You're running the same kind of system I am: a quad-core XP system with SP3. Besides the above, the only other thing to check is that no firewall on your system is blocking the program used by the tasks currently being run. Sometimes a program for a new protein is not recognized, or a new version of the Rosetta program is released and the firewall does not know what to do with it.
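Over the long run, BOINC aims to give each attached project a share of the host's processing in proportion to its resource share divided by the sum of all shares. Short-term scheduling can deviate from this, for example while a newly added project catches up. Here is a sketch of that arithmetic. The project names and share values are made up for illustration, and the function name is hypothetical.

```python
from typing import Dict


def share_fractions(resource_shares: Dict[str, float]) -> Dict[str, float]:
    """Long-run fraction of processing each project should get, proportional to its share."""
    total = sum(resource_shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {name: share / total for name, share in resource_shares.items()}


# Illustrative numbers only: Rosetta raised to 300 while two other projects stay at 100.
print(share_fractions({"Rosetta@home": 300, "ProjectA": 100, "ProjectB": 100}))
# {'Rosetta@home': 0.6, 'ProjectA': 0.2, 'ProjectB': 0.2}
```

Raising one project's share only increases its fraction relative to the others. That is why the update step matters: the client has to fetch the new value before its scheduling changes.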
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75730 - Posted: 9 Jun 2013, 23:57:21 UTC

The website page for your host indicates it has not contacted the server (i.e. not requested any work) since the 6th. So, as Greg_BE suggests, either your machine feels it already has enough work from other projects, or perhaps the Rosetta project is set to "No new tasks". Check the Status column on the Projects tab in BOINC Manager.
Rosetta Moderator: Mod.Sense

©2024 University of Washington
https://www.bakerlab.org