Only 20 credits for 25,000 seconds

Francois Racine
Joined: 17 Nov 09
Posts: 9
Credit: 3,658,771
RAC: 0
Message 74960 - Posted: 24 Jan 2013, 11:37:12 UTC

Hello,

For a while now I have only been getting 20 credits for tasks that ran for 25,000 seconds. You can see this for task 557922201 (WU 506964570). I have tasks that ran for 20,000 seconds that obtained 90 credits. It seems there is a magic limit that, when hit, makes the credits drop. I have 4 machines running Rosetta and they all face the same problem.
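
To put the two figures side by side, here is a rough credits-per-CPU-hour comparison. It uses only the numbers quoted above; the function name is made up for illustration and is not part of BOINC or Rosetta.

```python
def credits_per_hour(credits: float, cpu_seconds: float) -> float:
    """Convert granted credit and CPU time into credits per CPU hour."""
    return credits / cpu_seconds * 3600.0


slow = credits_per_hour(20, 25_000)    # the 25,000-second tasks
normal = credits_per_hour(90, 20_000)  # the 20,000-second tasks

print(f"{slow:.2f} vs {normal:.2f} credits per CPU hour")
print(f"ratio: {normal / slow:.1f}x")
```

By this measure the long tasks earn under 3 credits per CPU hour, while the 20,000-second tasks earn about 16.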

Thank you for reading this thread.

François

Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 74963 - Posted: 24 Jan 2013, 14:26:05 UTC

Here are the key lines from the stderr output of one of your tasks:

BOINC:: CPU time: 25230.3s, 14400s + 10800s[2013- 1-23 18:46:33:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 25230.3 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

I can't say I've seen it as often as you have, but I have seen it very rarely on some tasks (hybrid ones, I believe: the tasks whose names start with hyb, which also sometimes have increased run times, as you can see above).

Just some ideas... Hardware/software issues? An unstable overclock? Frequent reboots?
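
As a side note on the first log line: the two terms after the CPU time add up to 25,200 s, which is close to the 25,000-second mark where the credits drop. A small sketch that pulls the numbers out of that line follows. The regular expression and function name are illustrative assumptions, and reading the two terms as a time limit is a guess, not something the log states.

```python
import re

# Line copied from the stderr excerpt above.
line = "BOINC:: CPU time: 25230.3s, 14400s + 10800s[2013- 1-23 18:46:33:] :: BOINC"

# Capture the reported CPU time and the two terms that follow it.
pattern = re.compile(r"CPU time:\s*([\d.]+)s,\s*(\d+)s\s*\+\s*(\d+)s")


def parse_cpu_line(text):
    """Return (cpu_time, first_term, second_term) in seconds, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2)), int(m.group(3))


cpu_time, term_a, term_b = parse_cpu_line(line)
limit = term_a + term_b  # assumed to act as an upper bound on run time
print(f"CPU time {cpu_time:.1f}s, sum of terms {limit}s, "
      f"overshoot {cpu_time - limit:.1f}s")
```

If that reading is right, the task ran about 30 s past the summed limit before it was ended.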

Mad_Max
Joined: 31 Dec 09
Posts: 207
Credit: 23,318,765
RAC: 11,495
Message 74964 - Posted: 24 Jan 2013, 14:34:49 UTC

It is just another long-known buggy WU series. Almost all of them have names like hyb_.._bench_...
As I understand it, 20 credits means no useful work was done in the WU.

Francois Racine
Joined: 17 Nov 09
Posts: 9
Credit: 3,658,771
RAC: 0
Message 74965 - Posted: 24 Jan 2013, 14:35:51 UTC - in response to Message 74963.  

Polian,

These 4 computers are all different, with no BIOS options forced (no overclocking, for example). All are running Ubuntu Server 12.04 with the latest BOINC version for Linux (7.0.28). These computers are rebooted once every two to four weeks, when an Ubuntu upgrade requires it. I mention these 4 computers because they all show the problem after running a 25,000-second task.

Up to now I did not believe the problem could come from the computer or OS. I thought it could come from Rosetta or the BOINC program itself. I'm surprised I'm the only one seeing the problem, since I have multiple, all different computers. The only common points between these computers are the BOINC software version and the OS.

Please let me know if you have more specific questions I could answer. Thank you for your help on this.

Francois

trigggl
Joined: 20 Apr 09
Posts: 4
Credit: 102,177
RAC: 0
Message 74972 - Posted: 24 Jan 2013, 22:10:21 UTC - in response to Message 74965.  

These 4 computers are all different, with no BIOS options forced (no overclocking, for example). All are running Ubuntu Server 12.04 with the latest BOINC version for Linux (7.0.28). These computers are rebooted once every two to four weeks, when an Ubuntu upgrade requires it. I mention these 4 computers because they all show the problem after running a 25,000-second task.

Up to now I did not believe the problem could come from the computer or OS. I thought it could come from Rosetta or the BOINC program itself. I'm surprised I'm the only one seeing the problem, since I have multiple, all different computers. The only common points between these computers are the BOINC software version and the OS.


Actually, I was getting this problem on my one host that was still validating. I use Gentoo and 7.0.29 on all of mine.

Francois Racine
Joined: 17 Nov 09
Posts: 9
Credit: 3,658,771
RAC: 0
Message 74973 - Posted: 25 Jan 2013, 0:23:27 UTC

Unfortunately I'm getting more and more of these tasks. If the problem does not get solved I will have to work for another project. It is sad because Rosetta is my favourite but I still want to be appreciated for the effort.

Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 74981 - Posted: 25 Jan 2013, 16:06:33 UTC

Yeah, I just checked one of my boxes. The frequency of failures has definitely increased on the hyb tasks; I got several in a row. I don't think there's anything that can be done on the user end (aside from aborting them).

Francois Racine
Joined: 17 Nov 09
Posts: 9
Credit: 3,658,771
RAC: 0
Message 74986 - Posted: 26 Jan 2013, 16:35:28 UTC

I do not know if the problematic tasks have been removed but all completed tasks of the last 24-48 hours are fine.

Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 75030 - Posted: 2 Feb 2013, 18:28:36 UTC
Last modified: 2 Feb 2013, 18:29:54 UTC

Problem not solved, they are still being sent out. I guess I'll abort all "bench*IGNORE_THE_REST" tasks before they start.
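
A minimal sketch of that kind of name filter, assuming the task names are already available as a list of strings. The helper and the example names are hypothetical; the pattern is the one quoted above, with wildcards added at both ends so it matches anywhere in the name.

```python
from fnmatch import fnmatchcase

# The quoted pattern, wrapped in wildcards so it matches anywhere in the name.
SUSPECT_PATTERN = "*bench*IGNORE_THE_REST*"


def suspect_tasks(task_names):
    """Return the task names that match the suspect pattern (case-sensitive)."""
    return [name for name in task_names if fnmatchcase(name, SUSPECT_PATTERN)]


# Hypothetical task names, for illustration only.
queued = [
    "hyb_example_bench_SAVE_ALL_OUT_IGNORE_THE_REST_00001_1_0",
    "some_other_series_00002_1_0",
]
print(suspect_tasks(queued))
```

A filter like this works on names alone, so it would also discard any healthy tasks in the same series.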

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 75031 - Posted: 3 Feb 2013, 11:59:54 UTC
Last modified: 3 Feb 2013, 12:03:55 UTC

I got one too:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=508740380

On the workunits webpage it says:
errors Too many error results

Two of us tried to crunch it and it failed. This is NOT a user problem!!
This has happened on several, BUT NOT NEARLY ALL, of my units, and I am using BOINC 7.0.45.

Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 75032 - Posted: 3 Feb 2013, 22:31:12 UTC - in response to Message 75031.  

I got one too:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=508740380

This is one of those "1201 cpu seconds" WUs and both computers got exactly the credit they asked for (and not 20, so wrong thread I guess).

googloo
Joined: 15 Sep 06
Posts: 133
Credit: 21,682,818
RAC: 5,115
Message 75033 - Posted: 4 Feb 2013, 1:06:39 UTC

Getting really tired of long-running tasks that award only 20 credits.

Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 75034 - Posted: 4 Feb 2013, 8:38:36 UTC - in response to Message 75033.  
Last modified: 4 Feb 2013, 8:43:59 UTC

...

Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 75035 - Posted: 4 Feb 2013, 8:39:04 UTC - in response to Message 75034.  
Last modified: 4 Feb 2013, 8:58:27 UTC

same here :-(

Not only the bench*ignore... tasks are affected; other series have the problem too:

rb_02_02_36194_68641__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_73531_121_0

The common part is always "Stream information inconsistent." in stderr and only one decoy in the result, plus this warning:

"WARNING! cannot get file size for default.out.gz: could not open file."

So it has probably generated nothing at all.

Unfortunately all those facts occur after the time has already been wasted so they cannot be used to abort the task before the calculation starts.
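
For checking finished tasks after the fact, the three signs listed above can be tested in a few lines. This is only a sketch: it assumes the stderr text has already been saved as a string, and the function name is made up for illustration.

```python
import re

# The two messages quoted above.
MARKERS = (
    "Stream information inconsistent.",
    "cannot get file size for default.out.gz",
)
# Matches the summary line "This process generated N decoys from M attempts".
DECOY_RE = re.compile(r"generated (\d+) decoys")


def looks_damaged(stderr_text: str) -> bool:
    """True if both messages are present and exactly one decoy is reported."""
    has_markers = all(marker in stderr_text for marker in MARKERS)
    match = DECOY_RE.search(stderr_text)
    single_decoy = match is not None and int(match.group(1)) == 1
    return has_markers and single_decoy


sample = """WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
Stream information inconsistent.
This process generated 1 decoys from 1 attempts
"""
print(looks_damaged(sample))
print(looks_damaged("Watchdog active.\n"))
```

This only labels results that have already finished; it cannot spare the CPU time.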

There is one thing that might help though :

OK:
Watchdog active.
Starting work on structure: _00001 <= *** difference ***
# cpu_run_time_pref: 28800


damaged:

Watchdog active.
# cpu_run_time_pref: 28800


So the "Starting work on structure" line is missing entirely, and it would normally appear quite close to the start.

Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 75040 - Posted: 4 Feb 2013, 22:43:04 UTC - in response to Message 75035.  

So the "Starting work on structure" line is missing entirely, and it would normally appear quite close to the start.

It is also missing on many WUs not affected by this issue (nothing uncommon actually). Here is one example from your tasks: 559976267.

Or from my tasks: 558392756, 558605706, 558900949, 559432297, 559647034 and 559856339.
That's 6 out of 14 tasks currently available in my list, none of them is affected by the 20 credits issue.

googloo
Joined: 15 Sep 06
Posts: 133
Credit: 21,682,818
RAC: 5,115
Message 75157 - Posted: 24 Feb 2013, 6:45:31 UTC

Yet another one: task 564377097.

TPCBF
Joined: 29 Nov 10
Posts: 109
Credit: 4,681,767
RAC: 2,630
Message 75167 - Posted: 26 Feb 2013, 18:15:01 UTC - in response to Message 75157.  

Even worse :(

Whiskey Tango Foxtrot???
0.77 credits for 11k seconds (and apparently 73 decoys detected only to reset itself...)

Anyone care to explain?

Ralf

TPCBF
Joined: 29 Nov 10
Posts: 109
Credit: 4,681,767
RAC: 2,630
Message 75192 - Posted: 2 Mar 2013, 22:59:00 UTC - in response to Message 75167.  

Even worse :(

Whiskey Tango Foxtrot???
0.77 credits for 11k seconds (and apparently 73 decoys detected only to reset itself...)

Anyone care to explain?

Ralf
And another one.
First crunching happily and generating 59/59 decoys, then starts over to report another, single one for a mere 0.69 credits...

It would really be nice if someone had a reasonable explanation for this or, even better, would try to fix it. This is really getting old... :(

Ralf

Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 75197 - Posted: 4 Mar 2013, 14:49:32 UTC

This appears to be unrelated to the original problem of this thread. That said, I don't remember an issue like the one your stderr output shows being reported before. Try updating your BOINC core client. Many users are running 7.0.52 (myself included) and it appears stable.

I'm wondering whether the watchdog senses that the task has hung and restarts it.

Francois Racine
Joined: 17 Nov 09
Posts: 9
Credit: 3,658,771
RAC: 0
Message 75292 - Posted: 27 Mar 2013, 10:31:04 UTC

I'm back. The problem didn't show up for a few weeks but it has returned. I now see 3 tasks that ran for 25,000 seconds and obtained 20 credits. One of these is task 571259532. Thank you for looking at this.

Regards,

François