Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 73537 - Posted: 26 Jul 2012, 19:22:07 UTC - in response to Message 73533.  
Last modified: 26 Jul 2012, 19:23:21 UTC

I get an instant 'compute error' on all my work units for the last few days now. No problems with other projects from WCG.

I assume you mean this computer. See this thread. In short: you might need to downgrade to BOINC v6.12.34, which you use on your other computers (and as you see they don't have such issues).
.
ID: 73537 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,811,598
RAC: 764
Message 73538 - Posted: 26 Jul 2012, 19:58:09 UTC - in response to Message 73537.  

I get an instant 'compute error' on all my work units for the last few days now. No problems with other projects from WCG.

I assume you mean this computer. See this thread. In short: you might need to downgrade to BOINC v6.12.34, which you use on your other computers (and as you see they don't have such issues).


There might be a simpler solution than downgrading. He's getting -185 errors with "couldn't start Input file minirosetta_3.31_windows_intelx86.exe missing or invalid: -123: -123". Perhaps simply rebooting the computer and/or resetting the Rosetta project will do the trick.

Best,
Snags
ID: 73538 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 73539 - Posted: 26 Jul 2012, 20:03:21 UTC - in response to Message 73538.  

I get an instant 'compute error' on all my work units for the last few days now. No problems with other projects from WCG.

I assume you mean this computer. See this thread. In short: you might need to downgrade to BOINC v6.12.34, which you use on your other computers (and as you see they don't have such issues).


There might be a simpler solution than downgrading. He's getting -185 errors with "couldn't start Input file minirosetta_3.31_windows_intelx86.exe missing or invalid: -123: -123". Perhaps simply rebooting the computer and/or resetting the Rosetta project will do the trick.

Best,
Snags


I agree. Maybe the application became corrupted somehow, or the download failed? Try resetting the project.
ID: 73539 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73540 - Posted: 26 Jul 2012, 22:26:00 UTC

More likely that an anti-virus or firewall has interfered with the executable in some way, or the permissions for the BOINC user are not sufficient to allow it to run.
Rosetta Moderator: Mod.Sense
ID: 73540 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile ex_brit
Joined: 12 Dec 09
Posts: 15
Credit: 100,070
RAC: 0
Message 73541 - Posted: 26 Jul 2012, 22:45:17 UTC - in response to Message 73540.  

More likely that an anti-virus or firewall has interfered with the executable in some way, or the permissions for the BOINC user are not sufficient to allow it to run.


Why would that suddenly be a problem though? I admit I'm only observing this, as I haven't had any problems at all with Rosetta lately, but I just had to ditch Poem@Home because every single WU I ever got from them failed with a computation error, and all they seem to want to blame is BOINC, or me.

Is anyone working with the BOINC people, if it is indeed a BOINC problem?

Peter.
Toronto, Canada
ID: 73541 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73549 - Posted: 29 Jul 2012, 13:57:32 UTC

Each BOINC project periodically sends out new versions of its application. This means that there is a new executable file that must pass through a/v or firewall protections. It also means a new allowed exception might need to be defined under the new file name, depending on how specifically the exceptions are named.

So, that might be a reason why it would be working one day and not the next, and why I suggested it as something to check into.
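
To illustrate the point about how specifically exceptions are named, here is a minimal Python sketch. It assumes a security product that matches exception rules against file names; the rule strings and the older version number 3.30 are hypothetical, and only the 3.31 executable name comes from the error message quoted earlier in the thread.

```python
from fnmatch import fnmatch

# Hypothetical exception rules a user might have configured in security software.
exact_rule = "minirosetta_3.30_windows_intelx86.exe"   # assumed older version name
wildcard_rule = "minirosetta_*_windows_intelx86.exe"   # assumed pattern-style rule

# Executable name taken from the error message quoted earlier in the thread.
new_executable = "minirosetta_3.31_windows_intelx86.exe"


def is_allowed(filename: str, rule: str) -> bool:
    """Return True if the file name matches the exception rule."""
    return fnmatch(filename, rule)


exact_ok = is_allowed(new_executable, exact_rule)
wildcard_ok = is_allowed(new_executable, wildcard_rule)
print(exact_ok, wildcard_ok)
```

Under these assumptions, a rule tied to the exact old file name stops matching as soon as a new application version arrives, while a pattern-style rule keeps matching.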
Rosetta Moderator: Mod.Sense
ID: 73549 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile ex_brit
Joined: 12 Dec 09
Posts: 15
Credit: 100,070
RAC: 0
Message 73550 - Posted: 29 Jul 2012, 14:03:20 UTC - in response to Message 73549.  

Each BOINC project periodically sends out new versions of its application. This means that there is a new executable file that must pass through a/v or firewall protections. It also means a new allowed exception might need to be defined under the new file name, depending on how specifically the exceptions are named.

So, that might be a reason why it would be working one day and not the next, and why I suggested it as something to check into.


Understood. In the case of my security software, I'd get a popup requesting permission, which would have to be OK'd. I'm wondering whether all security software acts similarly, or whether people are ignoring the popups.

Peter.
Toronto, Canada
ID: 73550 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Mad_Max

Joined: 31 Dec 09
Posts: 207
Credit: 23,330,965
RAC: 11,867
Message 73650 - Posted: 19 Aug 2012, 1:52:40 UTC

Hi all.
Can you pass a wish (proposal) on optimizing the minirosetta app along to the project's programmers?
One of the minor drawbacks of the Rosetta@home project compared to other distributed computing projects (in addition to very high RAM usage) is the heavy load it places on the hard disk at startup.
When you run 2 or 4 computational threads this is usually not a serious problem, but with a higher number of threads (e.g. our team has a few machines with 24 threads and 1 machine with 48 threads), having BOINC start and load all these R@H threads is a problem: loading may take a few tens of minutes when a large number of threads compete for disk access at the same time.

I analyzed how the application works with the disk and found that almost all of the load comes from decompressing the minirosetta_database_rev48292.zip archive into a working folder (...boincData\slots...), because the archive currently contains 1517 files (while the number of all other files needed to process one WU is typically less than 100).
In addition, this constant unpacking (writing) and removal (deleting) of ~1500 files for each WU (for example, ~64 times per day with a standard Intel i7 CPU and the default runtime) results in rapid file system fragmentation and slows the disk down further.
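
As a rough check of the ~64 times per day figure, here is a small Python calculation. The 8 threads and the 3-hour default runtime are assumptions chosen to be consistent with that figure, not values stated in the post; the 1517 files per archive is the number given above.

```python
threads = 8              # assumed: a 4-core Intel i7 with Hyper-Threading
runtime_hours = 3        # assumed default target runtime per work unit
files_in_archive = 1517  # file count of the database archive, from the post

# Each thread finishes 24 / runtime_hours work units per day.
work_units_per_day = threads * 24 // runtime_hours

# Every work unit unpacks the full archive into its slot folder.
files_written_per_day = work_units_per_day * files_in_archive

print(work_units_per_day, files_written_per_day)
```

Under these assumptions, a single machine creates and then deletes close to 100,000 small files per day just for the database.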

If I understand correctly (correct me if I'm wrong), this file is a complete archive of the main minirosetta database, and processing a specific WU uses it read-only (no writing) and requires only a (relatively small?) part of the files in it.
Now, how to optimize the disk work.
I do not know whether the BOINC architecture permits access to files outside of the slots folder. If so, the best solution would be to extract and store the database in a single instance (for example in the ...boincData\projects\boinc.bakerlab.org_rosetta folder) without unpacking it to a slots folder at every startup of each WU.
If this is not possible (as I suspect), then there is the option of copying the archive to the slots folder without unpacking it and reading only the necessary files directly from the zip/gzip archive. The relevant functions should be included in the standard libraries of many programming languages. I have only a little programming experience, but I have used these features, and the implementation is relatively simple (usually just a few extra lines of code compared to reading from flat files).

The difference in volume is not so great (62 MB vs 147 MB), but the difference in the number of files is huge (1 vs 1517). So the speedup on SSDs will not be very significant, but on conventional HDDs it could be orders of magnitude (as they do not cope well with processing a large number of small files).
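
The idea of reading only the necessary files directly from the archive can be sketched in Python with the standard `zipfile` module. This is only an illustration of the technique, not how minirosetta works: the archive is a small in-memory stand-in, and the member names (`database/params_N.txt`) are hypothetical.

```python
import io
import zipfile


def build_demo_archive() -> bytes:
    """Create a small in-memory zip that stands in for the database archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for i in range(5):
            # Hypothetical member names and contents, for illustration only.
            archive.writestr(f"database/params_{i}.txt", f"value {i}\n")
    return buffer.getvalue()


def read_member(archive_bytes: bytes, member: str) -> str:
    """Read one member straight from the archive without extracting it to disk."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        with archive.open(member) as handle:
            return handle.read().decode("utf-8")


archive_bytes = build_demo_archive()
content = read_member(archive_bytes, "database/params_3.txt")
print(content)
```

Only the requested member is decompressed, and nothing is written to the file system, which is the property the proposal relies on.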
ID: 73650 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Trotador

Joined: 30 May 09
Posts: 108
Credit: 272,283,990
RAC: 212
Message 73687 - Posted: 24 Aug 2012, 8:04:11 UTC - in response to Message 73650.  

I second this proposal/wish. As an owner of several 24- and 32-thread machines, I've suffered from this effect. It is also a blocking issue for anyone who wants to set up a system based on a USB stick with no HDD, since the stick can't keep up with this movement of data.

Hi all.
Can you pass a wish (proposal) on optimizing the minirosetta app along to the project's programmers?
One of the minor drawbacks of the Rosetta@home project compared to other distributed computing projects (in addition to very high RAM usage) is the heavy load it places on the hard disk at startup.
When you run 2 or 4 computational threads this is usually not a serious problem, but with a higher number of threads (e.g. our team has a few machines with 24 threads and 1 machine with 48 threads), having BOINC start and load all these R@H threads is a problem: loading may take a few tens of minutes when a large number of threads compete for disk access at the same time.

I analyzed how the application works with the disk and found that almost all of the load comes from decompressing the minirosetta_database_rev48292.zip archive into a working folder (...boincData\slots...), because the archive currently contains 1517 files (while the number of all other files needed to process one WU is typically less than 100).
In addition, this constant unpacking (writing) and removal (deleting) of ~1500 files for each WU (for example, ~64 times per day with a standard Intel i7 CPU and the default runtime) results in rapid file system fragmentation and slows the disk down further.

If I understand correctly (correct me if I'm wrong), this file is a complete archive of the main minirosetta database, and processing a specific WU uses it read-only (no writing) and requires only a (relatively small?) part of the files in it.
Now, how to optimize the disk work.
I do not know whether the BOINC architecture permits access to files outside of the slots folder. If so, the best solution would be to extract and store the database in a single instance (for example in the ...boincData\projects\boinc.bakerlab.org_rosetta folder) without unpacking it to a slots folder at every startup of each WU.
If this is not possible (as I suspect), then there is the option of copying the archive to the slots folder without unpacking it and reading only the necessary files directly from the zip/gzip archive. The relevant functions should be included in the standard libraries of many programming languages. I have only a little programming experience, but I have used these features, and the implementation is relatively simple (usually just a few extra lines of code compared to reading from flat files).

The difference in volume is not so great (62 MB vs 147 MB), but the difference in the number of files is huge (1 vs 1517). So the speedup on SSDs will not be very significant, but on conventional HDDs it could be orders of magnitude (as they do not cope well with processing a large number of small files).

ID: 73687 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Clark Williams

Joined: 25 Nov 09
Posts: 3
Credit: 271,975
RAC: 0
Message 73699 - Posted: 27 Aug 2012, 0:40:19 UTC

I've been having trouble getting work downloaded from Rosetta for more than a week.
Something happening?
C. Williams
ID: 73699 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 73703 - Posted: 27 Aug 2012, 8:17:59 UTC

The tasks that are labeled hyb_ac_bench_3rdeD_10_SAVE_ALL_OUT_IGNORE_THE_REST (and all the rest) are giving errors and giving me only 20 credits.

OINC:: CPU time: 36569.4s, 14400s + 21600s[2012- 8-26 6: 0:30:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 36569.4 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
]]>


What's this all about? More badly written code?
ID: 73703 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73721 - Posted: 30 Aug 2012, 14:34:22 UTC - in response to Message 73699.  

I've been having trouble getting work downloaded from Rosetta for more than a week.
Something happening?
C. Williams


Nothing looks wrong on your host profile. The quota per day is 100. What are you seeing for BOINC messages when you update the project?
Rosetta Moderator: Mod.Sense
ID: 73721 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Sid Celery

Joined: 11 Feb 08
Posts: 1982
Credit: 38,451,347
RAC: 14,322
Message 73738 - Posted: 3 Sep 2012, 3:44:30 UTC - in response to Message 73703.  

The tasks that are labeled hyb_ac_bench_3rdeD_10_SAVE_ALL_OUT_IGNORE_THE_REST (and all the rest) are giving errors and giving me only 20 credits.

OINC:: CPU time: 36569.4s, 14400s + 21600s[2012- 8-26 6: 0:30:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 36569.4 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
]]>

What's this all about? More badly written code?

Sorry for the late report, but we're getting this intermittently too:

On this computer, these tasks:
hyb_ab_bench_3rojD_SAVE_ALL_OUT_IGNORE_THE_REST_53909_1171_0
hybrid_ac_bench_4dg9A_SAVE_ALL_OUT_IGNORE_THE_REST_53491_67_0
hyb_ad_bench_T0528_SAVE_ALL_OUT_IGNORE_THE_REST_56539_32_1
hyb_ag_bench_2yeqB_SAVE_ALL_OUT_IGNORE_THE_REST_57261_513_0

These were all cut short by the watchdog and didn't complete 1 decoy
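
A minimal Python sketch of how the watchdog cutoff can be read off the log line quoted above. It assumes that the two values after the CPU time are the target runtime and the watchdog's extra allowance, which is an interpretation of the log format rather than something stated in this thread; the variable names are illustrative.

```python
import re

# Log line copied from the stderr output quoted above.
log_line = "OINC:: CPU time: 36569.4s, 14400s + 21600s[2012- 8-26 6: 0:30:] :: BOINC"

match = re.search(r"CPU time: ([\d.]+)s, (\d+)s \+ (\d+)s", log_line)
cpu_time = float(match.group(1))       # CPU seconds actually used
target_runtime = int(match.group(2))   # assumed: target runtime in seconds
watchdog_grace = int(match.group(3))   # assumed: extra time the watchdog allows

limit = target_runtime + watchdog_grace
exceeded = cpu_time > limit

print(limit, exceeded)
```

Under that reading, the task ran past the combined limit of 36000 s, which fits a watchdog shutdown before the first decoy was written out.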

That said, that computer did complete several other tasks of this type without problem as shown here:

Full task list

ID: 73738 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Mad_Max

Joined: 31 Dec 09
Posts: 207
Credit: 23,330,965
RAC: 11,867
Message 73742 - Posted: 3 Sep 2012, 11:32:13 UTC - in response to Message 73738.  

These were all cut short by the watchdog and didn't complete 1 decoy

I encountered this bug too. I reported it in another thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6055&nowrap=true#73741
ID: 73742 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Sid Celery

Joined: 11 Feb 08
Posts: 1982
Credit: 38,451,347
RAC: 14,322
Message 73743 - Posted: 3 Sep 2012, 13:54:13 UTC - in response to Message 73742.  

These were all cut short by the watchdog and didn't complete 1 decoy

I encountered this bug too. I reported it in another thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6055&nowrap=true#73741

Yup, the same. I think I should've posted in the thread you did too as it's not a rosetta@home issue
ID: 73743 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile ex_brit
Joined: 12 Dec 09
Posts: 15
Credit: 100,070
RAC: 0
Message 73744 - Posted: 3 Sep 2012, 13:59:51 UTC

The latest versions of BOINC are causing lots of different issues with many projects.
Peter.
Toronto, Canada
ID: 73744 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Keith Jillings

Joined: 26 Sep 06
Posts: 7
Credit: 536,631
RAC: 0
Message 73748 - Posted: 3 Sep 2012, 16:50:59 UTC - in response to Message 73699.  

I've been having trouble getting work downloaded from Rosetta for more than a week.
Something happening?
C. Williams



Likewise.

I was having the same problem with SETI, which has just (this afternoon) downloaded several WUs after weeks of silence. I think it may be something to do with the latest BOINC update. The PC is busy crunching other stuff, so I'm not too bothered.

I can't find a way to "force" it to download new work.
ID: 73748 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 73749 - Posted: 3 Sep 2012, 17:23:20 UTC - in response to Message 73748.  
Last modified: 3 Sep 2012, 17:24:44 UTC

I've been having trouble getting work downloaded from Rosetta for more than a week.
Something happening?
C. Williams



Likewise.

I was having the same problem with SETI, which has just (this afternoon) downloaded several WUs after weeks of silence. I think it may be something to do with the latest BOINC update. The PC is busy crunching other stuff, so I'm not too bothered.

I can't find a way to "force" it to download new work.



It shows... the TFLOPS estimate is down over 10% for the whole Rosetta project.
ID: 73749 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Clark Williams

Joined: 25 Nov 09
Posts: 3
Credit: 271,975
RAC: 0
Message 73757 - Posted: 4 Sep 2012, 1:39:16 UTC - in response to Message 73721.  

I've been having trouble getting work downloaded from Rosetta for more than a week.
Something happening?
C. Williams


Nothing looks wrong on your host profile. The quota per day is 100. What are you seeing for BOINC messages when you update the project?


Scheduler Request Pending, followed by communications deferred for about 4 minutes followed by nothing under tasks. My statistics are flatlined and have been since 2012 Aug 19.
C. Williams
ID: 73757 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile robertmiles

Joined: 16 Jun 08
Posts: 1224
Credit: 13,848,401
RAC: 2,043
Message 73758 - Posted: 4 Sep 2012, 3:51:50 UTC - in response to Message 73757.  

I've been having trouble getting work downloaded from Rosetta for more than a week.
Something happening?
C. Williams


Nothing looks wrong on your host profile. The quota per day is 100. What are you seeing for BOINC messages when you update the project?


Scheduler Request Pending, followed by communications deferred for about 4 minutes followed by nothing under tasks. My statistics are flatlined and have been since 2012 Aug 19.
C. Williams


Do you happen to have a graphics board in your computer that BOINC can use for GPU workunits? For my computers with such boards, I've found that BOINC will not download any CPU workunits (such as those for Rosetta@Home) until it has at least one GPU workunit.
ID: 73758 · Rating: 0 · rate: Rate + / Rate - Report as offensive



©2024 University of Washington
https://www.bakerlab.org