Posts by Robert Gammon

1) Message boards : Number crunching : Rosetta@Home version 3.31 (Message 73117)
Posted 20 May 2012 by Robert Gammon
Post:
I'm sorry that you are so upset about this that you felt it was necessary to post twice about your personal choice on handling the situation.

Please don't let credit get in the way of the big picture here. CASP 10 (just like CASP 9, CASP 8, CASP 7, and CASP 6) will show the significance of what you are contributing to here. The methods developed in BakerLab are consistently proven to be among the best and most applicable across the world's entire scientific community.

I know the issues behind the credit will be addressed; indeed, they always are. Failures happen, and in recognition of that, and because your continued crunching is important to helping resolve them, the project awards credit for reported failures with a nightly run that has been in place for many years now. When this script grants credit, you have to display the WU details to see the awarded credit.


Here is one of my failed work units from Rosetta 3.31 running on Ubuntu 12.04 64-bit under BOINC 7.0.28. Are you saying that this WU was granted credit in spite of the Client Error status???


Task ID 507128901
Name ab_11_29__optpps_T5441_optpps_03_09_35686_298319_0
Workunit 462102711
Created 19 May 2012 13:50:48 UTC
Sent 19 May 2012 13:51:46 UTC
Received 19 May 2012 22:42:03 UTC
Server state Over
Outcome Client error
Client state New
Exit status 0 (0x0)
Computer ID 1543142
Report deadline 29 May 2012 13:51:46 UTC
CPU time 10030.05
stderr out
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>
[2012- 5-19 13:46:14:] :: BOINC:: Initializing ... ok.
[2012- 5-19 13:46:14:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
Starting work on structure: _00002
Starting work on structure: _00003
Starting work on structure: _00004
Starting work on structure: _00005
Starting work on structure: _00006
Starting work on structure: _00007
Starting work on structure: _00008
Starting work on structure: _00009
======================================================
DONE :: 1 starting structures 10029.7 cpu seconds
This process generated 9 decoys from 9 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 87.855182957171
Granted credit 87.855182957171
application version ---

2) Message boards : Number crunching : Rosetta@Home version 3.31 (Message 73112)
Posted 20 May 2012 by Robert Gammon
Post:
A couple of failures on W7, both work units crashing after 10 hours on a 6 hour preference



I see a similar problem with 3.31 running on Ubuntu Linux 12.04 64 bit.

Looks like all WUs must be abandoned as there is a clear issue with v3.31

My WUs are 4-hour WUs that run in the expected amount of time. All 3.31 WUs have been returned as client errors.
3) Questions and Answers : Unix/Linux : Rosetta@home wont run on ubuntu (Message 73111)
Posted 20 May 2012 by Robert Gammon
Post:
After upgrading this machine to Ubuntu 12.04 64 bit, Rosetta downloaded the Rosetta Mini 3.31 application.

The workunits download fine, the workunits run in the expected amount of time, but all returned units are seen as client error.

Guess I must abandon all workunits as there will not be any credit granted.

I will have to go back to check if Rosetta Mini 3.30 workunits were granted any credit. Uncertain about this point.
4) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 72697)
Posted 8 Apr 2012 by Robert Gammon
Post:
Internet access OK - project servers may be temporarily down.

What is happening?


Whenever we release a new version of the application, the servers get hammered with everyone automatically downloading it. This results in intermittent failures in uploading.

Don't worry, BOINC should keep trying to send the results, and it will get through shortly when the load on the servers finally goes down.


This does not appear to be my current issue. Last night and this morning, I turned in 6-8 completed workunits. My list of tasks shows the reporting of the uploads and credit granted. However, total credit and average credit figures are still stuck at yesterday afternoon figures.
5) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 55091)
Posted 15 Aug 2008 by Robert Gammon
Post:
BOINC 5.10.45 on XP SP2
Rosetta Beta 5.28

Most of the time, the workunit progresses normally, executing 0.05% or so per tick (on my machine 3-5 seconds) UNTIL WE GET TO ABOUT 90%. Work then slows WAY down, executing 0.001% per tick (same schedule as before at about 3-5 seconds per tick).

This behavior is work unit specific as some complete in as little as 1.5 hours, with no hangups in the 90+% range, while others take 3+ to 4+ hours. The last one, which I will upload tomorrow is M624_BOINC_MFR_ABRELAX_PICKED_4395_9734_0. It took an exceptionally long time to complete, some 4:43:07.
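
As a rough check on the figures above, here is a small sketch of the arithmetic. The rates are the approximate ones quoted in this post (0.05% per tick, 0.001% per tick, 3-5 seconds per tick); the function name is made up for illustration and nothing here comes from BOINC or Rosetta itself.

```python
def hours_for(percent, percent_per_tick, seconds_per_tick):
    """Hours needed to advance `percent` points at a given progress rate."""
    ticks = percent / percent_per_tick
    return ticks * seconds_per_tick / 3600.0


# Rates are the rough figures from the post, not measured values.
for seconds_per_tick in (3, 5):
    fast = hours_for(90, 0.05, seconds_per_tick)   # first 90% at the normal rate
    slow = hours_for(10, 0.001, seconds_per_tick)  # last 10% at the slow rate
    print(f"{seconds_per_tick} s/tick: first 90% = {fast:.2f} h, last 10% = {slow:.2f} h")
```

At the normal rate the first 90% works out to roughly 1.5 to 2.5 hours, which fits the quick work units. If the slow rate held for the entire last 10%, that stretch alone would take over 8 hours, which is more than the 4:43:07 reported above, so either the slowdown covers only part of the 90+% range or the per-tick figures are rougher than they look.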

6) Message boards : Number crunching : Rosetta Checks (Message 54935)
Posted 5 Aug 2008 by Robert Gammon
Post:
So the message REALLY is, DO NOT SHUT DOWN A COMPUTER THAT IS WORKING ON ROSETTA. LOSE POWER IN ANY FORM OR FASHION, AND THE LIKELIHOOD IS VERY VERY HIGH THAT YOU WILL LOSE ALL WORK ON THAT WORKUNIT!!!


I think that is being a little too dramatic.
Each time I shut down the computer for whatever reason, I am always surprised how little time I have lost. With the units I have been given the lost time is usually less than 10 minutes.

To get around your key problem, plug in an old keyboard. It will solve your problem and also make typing easier.


===========================================
My losses run to as much as 4 hours; 1/6th of an hour would be great if I could get that.

old keyoard is ot availale ad i have o trasport or cah to uy oe
7) Message boards : Number crunching : Rosetta Checks (Message 54819)
Posted 2 Aug 2008 by Robert Gammon
Post:
Different types of tasks have different abilities to take checkpoints. Over time, they are always working towards adding more checkpoints to tasks. Especially to tasks that do not presently have any. However, they are also adding over time more new types of tasks. So, each type actually has to endure and mature a little, before it becomes clear whether it is going to show promising results. Only then can you know if it is worth further refinement and investment to add further coding to take checkpoints.


So the message REALLY is, DO NOT SHUT DOWN A COMPUTER THAT IS WORKING ON ROSETTA. LOSE POWER IN ANY FORM OR FASHION, AND THE LIKELIHOOD IS VERY VERY HIGH THAT YOU WILL LOSE ALL WORK ON THAT WORKUNIT!!!

The laptop in question has no battery. In this respect, it behaves more like a desktop computer that does not run 24x7. Other BOINC apps may or may not have the same characteristic. Seti work units are roughly 10x bigger, that is, they take roughly 10x longer to process on the same hardware than Rosetta work units, and they have always been long work units to process. Checkpointing keeps us from losing Seti work, but not Rosetta work.

And since the laptop cannot type 'n' or 'b', creating a file named cc_config.xml is quite impossible.

Should be able to get a new one tomorrow night.
8) Message boards : Number crunching : Rosetta Checks (Message 54752)
Posted 30 Jul 2008 by Robert Gammon
Post:
Cut paste assumes that I have a file here that will have the lost keys

I do ot have such a file
9) Message boards : Number crunching : Rosetta Checks (Message 54751)
Posted 30 Jul 2008 by Robert Gammon
Post:
Robert, you might try copy/paste for the letters you are unable to type.

What Robert is asking about is how exiting BOINC and restarting sometimes causes a task to begin again from the start. And, based on his prior post, he has noticed that if he suspends a task, rather than exiting BOINC, it does not start over again.

Robert, any time Rosetta is ended (not just suspended, but ended), it will have to restart from its last checkpoint. Checkpoints are saved periodically, but different tasks are able to checkpoint more or less frequently than others. In your case you are seeing as much as an hour or two lost. For tasks with very long-running models and infrequent checkpoints, this is normal. And yes, the Project Team is aware that valuable work is being lost. They are always working to add more checkpoints over time to the tasks that presently do not checkpoint frequently or, in some cases, only checkpoint after each completed model.

Rosetta doesn't want to grind your hard disk away by writing all the time. That takes time away from crunching. So there is a fine line to walk here between checkpointing too frequently and not frequently enough.

All Rosetta tasks checkpoint when a model is completed. You can see this on the website by looking at your results and seeing the number of "decoys" produced. As Rosetta is running on your machine, you can see in the graphic, the current model you are working on and get a feel for how frequently a new model is started.
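
The "checkpoint when a model is completed" behaviour described above can be sketched roughly as follows. This is only an illustration of the general idea, not Rosetta's or BOINC's actual code: the file format, the function names, and the `stop_after_steps` parameter (which simulates the program being ended partway through) are all assumptions made for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if not os.path.exists(path):
        return {"models_done": 0}
    with open(path) as handle:
        return json.load(handle)


def run_task(path, total_models, steps_per_model, stop_after_steps=None):
    """Run models, checkpointing only when a whole model completes.

    stop_after_steps (hypothetical) simulates the process being ended early.
    Returns the number of models that were completed and checkpointed.
    """
    state = load_checkpoint(path)
    steps_run = 0
    while state["models_done"] < total_models:
        for _ in range(steps_per_model):
            if stop_after_steps is not None and steps_run >= stop_after_steps:
                return state["models_done"]  # work on the partial model is lost
            steps_run += 1
        state["models_done"] += 1
        save_checkpoint(path, state)
    return state["models_done"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint = os.path.join(workdir, "checkpoint.json")
        # Ended after 25 steps: two 10-step models saved, 5 steps lost.
        print(run_task(checkpoint, 3, 10, stop_after_steps=25))
        # Restart resumes from the last checkpoint and finishes model 3.
        print(run_task(checkpoint, 3, 10))
```

In this sketch, a task whose first model is still running when the program ends has nothing saved at all, so it restarts from zero. That would be consistent with long-running models showing large losses, though whether it explains any particular work unit is a guess.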



I have RAC of over 7, so Rosetta has some experiece with how much time is required. My losses are usually more like 3 hours, vs the 1 to 2 hours quoted
here.

I disagree somewhat with the moderator's commets. If I do a orderly shutdow, Suspedig Rosetta, the Exitig OIC, the Shutdow XP, powerup at a ew locatio, restart OIC, Resume Rosetta, this SHOULD restart AT or close to the exit, i.e. if we are at 92% complete, if should restart at 92% or very very close to that. Most of the time, it goes to 0.0%
10) Message boards : Number crunching : Rosetta Checkpointing (Message 54742)
Posted 29 Jul 2008 by Robert Gammon
Post:
I have an XP laptop that runs BOINC. It's old and somewhat unreliable, but I cannot afford to replace it now.

The power cord is frayed and MUST stay in a PARTICULAR position in order for the machine to stay powered up. This means if the machine gets bumped or moved, we get a sudden, unexpected power failure. This is no different than someone experiencing power failure due to lightning, just LOTS more frequent. In addition, the laptop has to be moved to get to a location with internet access (wireless access only).

On the intentional power down events, I have done an orderly shutdown of BOINC prior to shutdown. In both cases (orderly shutdown and unexpected), Rosetta REPEATS the WU from 0.0% almost regardless of how far along the WU is.

It was suggested by the moderator that Suspending the project before intentional power downs would go a long way to solving the problem.

Well, I just Suspended the project, told BOINC to shutdown, then did a PowerDown of the computer using XP's Shutdown command. I moved the computer to get wireless inet access, checked a few things, and did an XP Shutdown again.

When I powered back up, restarted BOINC, and Resumed Rosetta, the Rosetta 5.98 WU that was at 92+% complete, reset back to 0.0%. This is very frustrating!!!
11) Message boards : Number crunching : Rosetta Checks (Message 54702)
Posted 28 Jul 2008 by Robert Gammon
Post:
Please forgive my awkward spellig here as this computer has keyoard prolems. The two letters over the space ar will <> show up.

If OIC (see prolems agai) termiates aormally (more prolems), Rosetta losses track of where it was i the WU ad frequetly restarts from zero, occassioally repeatig half the work already doe.

Seti does ot appear to have the same issue. Power fails, XP restart, etc do ot appear to casue ay prolems with the Seti app. It restarts from the last work as expected 99+% of the time.

The Rosetta moderator says "Susped the app efore exitig OIC to avoid this kow prolem" Power failure ad XP lockup o this laptop make that almost impossile to do.

12) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 54696)
Posted 27 Jul 2008 by Robert Gammon
Post:
BOINC Client 5.10.45
Rosetta 5.98
WinXP SP2 - intermittently connected to Internet

[snip]
In a few moments, 0.00% complete, 3:55:20 to completion!!!

On computers with more than one project active, if this is NOT unique to my laptop, switching to other projects from Rosetta, then back to Rosetta, should show the same characteristic. Note that this is a configuration item on all project Account Info pages, interval between switching tasks. Mine is set to 3 hours.

I cannot do this as I only have access to a single computer.


I tried again, putting the project on Suspend, waiting 30 minutes while I did some other work, then did a Resume, and EUREKA, it WORKED: execution continued from the spot where it left off when the Suspend was issued.

So this makes it seem like the signal BOINC issues when the user EXITS the application leaves the Rosetta work unit in an unstable state, same as an abort due to power fail on the computer. SUSPEND appears to act differently and Rosetta does an orderly pause of the work.

13) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 54691)
Posted 27 Jul 2008 by Robert Gammon
Post:
BOINC Client 5.10.45
Rosetta 5.98
WinXP SP2 - intermittently connected to Internet

[snip]
The WU gets reprocessed, redoing the work of 2-4 hours compute time.


I just duplicated this again. I did an orderly shutdown to move the laptop. Rosetta was at 95.583% complete.

When BOINC restarted, Setiathome was the selected task. I let that run for about 5 minutes, then suspended Seti and allowed Rosetta to restart.

In a few moments, 0.00% complete, 3:55:20 to completion!!!

On computers with more than one project active, if this is NOT unique to my laptop, switching to other projects from Rosetta, then back to Rosetta, should show the same characteristic. Note that this is a configuration item on all project Account Info pages, interval between switching tasks. Mine is set to 3 hours.

I cannot do this as I only have access to a single computer.
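
For a sense of scale, the lost work in the restart above can be estimated from the two numbers in this post: 95.583% complete before the shutdown, and 3:55:20 to completion after the reset to 0.00%. The sketch assumes that the "to completion" estimate shown at 0% is a fair stand-in for the full run time, which may not hold exactly; the helper names are made up for the example.

```python
def hms_to_seconds(text):
    """Convert an 'H:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total):
    """Convert seconds (rounded to the nearest second) to 'H:MM:SS'."""
    total = int(round(total))
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def lost_work_seconds(percent_done, full_run_hms):
    """Estimate compute time thrown away when a task resets to 0%."""
    return percent_done / 100.0 * hms_to_seconds(full_run_hms)


# Figures from the post; the full run time is an assumption (see above).
lost = lost_work_seconds(95.583, "3:55:20")
print(seconds_to_hms(lost))  # prints 3:44:56
```

Under that assumption, the reset threw away roughly 3 hours 45 minutes of compute time.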
14) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 54690)
Posted 27 Jul 2008 by Robert Gammon
Post:
BOINC Client 5.10.45
Rosetta 5.98
WinXP SP2 - intermittently connected to Internet

This problem duplicates with almost any WU.

Scenario is that the laptop is connected to internet long enough to upload completed results and to request/download new work. The laptop then disconnects from Internet to begin number crunching.

Rosetta processes the file a variable amount (I have seen 55%, 72%, 88%, and 97% completion), then for one reason or another, BOINC shuts down (XP locks up and needs a reboot, power fails, or it's time to shut down for the night).

Note that some of these BOINC shutdowns are orderly, others are not. The result is the same, regardless of how we got there. Rosetta RESTARTS AT ZERO!! The WU gets reprocessed, redoing the work of 2-4 hours compute time.






©2021 University of Washington
https://www.bakerlab.org