Progress at 100% for many hours

Message boards : Number crunching : Progress at 100% for many hours

Asgardh

Joined: 23 Oct 05
Posts: 3
Credit: 22,780
RAC: 0
Message 1899 - Posted: 29 Oct 2005, 6:06:24 UTC

Hello,
I'm attached to five projects. The system is a Pentium III laptop with 256 MB of RAM; the hard disk has a 6 GB partition for the Windows XP system (the partition is used only for BOINC V5.2.2).
Application: Rosetta 4.78
Job "1btn__abrelax_no_cst_02135" reached 100% progress at about 3:00:00 of CPU time.
At this moment the same job still shows 100% progress, with CPU time at 07:17:54.
Any idea what is going on? Must I abort the work unit?
Is it stuck in a loop?
Asgardh
ID: 1899 · Rating: 0
KSMarksPsych
Joined: 15 Oct 05
Posts: 199
Credit: 22,337
RAC: 0
Message 1901 - Posted: 29 Oct 2005, 8:28:36 UTC - in response to Message 1899.  

Hi, I'm fairly new at this too, but could you post the BOINC logs for this WU?

Rosetta was having a problem with WUs getting stuck at 1% and some other percentages.

I had one stuck at the 1% mark for 7 hours. I did a total reboot of my system, and it went back and crunched the WU happily.

Just a thought (although I haven't heard of WUs sticking at 100%).


Kathryn
ID: 1901 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 1904 - Posted: 29 Oct 2005, 9:32:45 UTC

If the work unit is in "preempted" status it can be at "100%" and not be doing anything. One of my greater annoyances is that I cannot make a setting to say "if at 90% or greater just finish the darn thing ..."

So, though it is showing 100%, it may only be 99.9999999999% done, and when it switches back it will finish that last bit and almost immediately upload ...
ID: 1904 · Rating: 0
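
The point in message 1904, that a displayed "100%" can hide a value slightly below completion, can be illustrated with ordinary rounding. This is only a minimal sketch: it assumes the progress column is formatted to two decimal places, which the thread does not confirm, and `shown_progress` and `is_really_finished` are hypothetical helpers, not BOINC functions.

```python
def shown_progress(fraction_done: float, decimals: int = 2) -> str:
    """Format a 0..1 completion fraction the way a GUI column might.

    Hypothetical helper: the two-decimal format is an assumption,
    not something taken from the BOINC source.
    """
    return f"{fraction_done * 100:.{decimals}f}%"


def is_really_finished(fraction_done: float) -> bool:
    """In this toy model only a fraction of 1.0 or more counts as finished."""
    return fraction_done >= 1.0


almost_done = 0.999999
# The display rounds up to 100.00%, yet the task is not finished.
print(shown_progress(almost_done), is_really_finished(almost_done))
print(shown_progress(1.0), is_really_finished(1.0))
```

Under that assumption, a preempted task sitting at 0.999999 would look complete in the progress column while still having a little work left.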
scsimodo
Joined: 17 Sep 05
Posts: 93
Credit: 946,359
RAC: 0
Message 1905 - Posted: 29 Oct 2005, 9:37:59 UTC - in response to Message 1899.  

I had such a WU a few days ago; even after a couple of restarts it wouldn't finish. The stderr.txt in the slots directory said something about an exception. I think it was the error mentioned in this thread.

scsimodo
ID: 1905 · Rating: 0
Asgardh
Joined: 23 Oct 05
Posts: 3
Credit: 22,780
RAC: 0
Message 1906 - Posted: 29 Oct 2005, 12:52:46 UTC - in response to Message 1905.  

Hi, there are no strange messages in stderr.txt.
This is an extract from that file:
# =====================================
# random seed: 1161341
# =====================================
No heartbeat from core client for 31 sec - exiting
# =====================================
# random seed: 248921
# =====================================
# =====================================
# random seed: 550881
# =====================================
# =====================================
# random seed: 918821
# =====================================

The system was rebooted twice.
BOINC is running as a service, and the service was stopped and restarted many times. No change was seen.
The only other thing I remember is that the internet connection was unavailable for some hours. Nothing else.


Asgardh
ID: 1906 · Rating: 0
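
The stderr.txt extract in message 1906 can be summarized mechanically. The sketch below counts the "random seed" banners and the heartbeat exits in that extract. It assumes that each banner is written once per application start, which is an interpretation and is not confirmed anywhere in the thread; `summarize_stderr` is a hypothetical helper name.

```python
import re

# Extract copied from the stderr.txt lines quoted in message 1906.
STDERR_EXTRACT = """\
# =====================================
# random seed: 1161341
# =====================================
No heartbeat from core client for 31 sec - exiting
# =====================================
# random seed: 248921
# =====================================
# =====================================
# random seed: 550881
# =====================================
# =====================================
# random seed: 918821
# =====================================
"""

SEED_RE = re.compile(r"^# random seed: (\d+)$")
HEARTBEAT_RE = re.compile(r"^No heartbeat from core client for (\d+) sec")


def summarize_stderr(text: str) -> dict:
    """Count seed banners (assumed: one per start) and heartbeat exits."""
    seeds = []
    heartbeat_waits = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        seed_match = SEED_RE.match(line)
        if seed_match:
            seeds.append(int(seed_match.group(1)))
            continue
        heartbeat_match = HEARTBEAT_RE.match(line)
        if heartbeat_match:
            heartbeat_waits.append(int(heartbeat_match.group(1)))
    return {
        "starts": len(seeds),
        "seeds": seeds,
        "heartbeat_exits": len(heartbeat_waits),
    }


print(summarize_stderr(STDERR_EXTRACT))
```

If the assumption holds, four banners would mean the application was started four times on this work unit, each time with a different seed, and one of those runs ended because it lost contact with the core client.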
Don Joslyn

Joined: 22 Oct 05
Posts: 2
Credit: 187,235
RAC: 0
Message 1907 - Posted: 29 Oct 2005, 12:59:15 UTC

This week I had a Rosetta work unit do the same thing; it was stuck at 100% for a few days, using more and more CPU time but not finishing. I finally decided to suspend all other work units in my queue (SETI, Einstein, Rosetta) to see if it would finish if not preempted. That did the trick; it finally finished.

Don
ID: 1907 · Rating: 0
Andrew
Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 1915 - Posted: 29 Oct 2005, 14:18:55 UTC
Last modified: 29 Oct 2005, 14:19:10 UTC

Asgardh, what is your "Switch between applications every" setting on the General preferences page?

Since your machine is a P3, it might be too slow to finish the Rosetta WU before the BOINC client switches to another project.

If your setting is the default, 60 minutes, I'd suggest increasing it to 120 or even 180 minutes for your P3.

Note: this setting is a maximum. It will affect your other projects too, but if a project (even Rosetta) doesn't need the entire interval, the BOINC client will switch to another project sooner.


ID: 1915 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 1918 - Posted: 29 Oct 2005, 16:46:57 UTC

Can anyone who experiences this problem email me the stdout.txt file? dekim at u.washington.edu
ID: 1918 · Rating: 0
Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 1933 - Posted: 30 Oct 2005, 3:19:38 UTC

I had a WU at 100% for about 25 minutes, still running, and I actually had my eyes on it when it suddenly started to upload.

So I think that some Rosetta WUs require a lot of computing power for the final calculations.

I'm thinking that your computer may not be powerful enough for some of the Rosetta WUs? What do you say, Paul "The WIKI-man"? Just a thought...


"I'm trying to maintain a shred of dignity in this world." - Me

ID: 1933 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 1961 - Posted: 31 Oct 2005, 0:20:35 UTC

I will be honest, I really don't have a good "feel" for the operational pattern for the Rosetta@Home work. This may be because in the explorations we are doing we are really doing more of a LHC@Home pattern of work than a SETI@Home ...

SETI@Home work units are relatively "stable" and take about the same time to run, each time, every time. The notable exceptions are the -9 work units, which end in seconds to minutes.

LHC@Home work, on the other hand, has 3 "sizes", but since they all can end in instability, the run time does not even fit into nice time "bins" (I know, I looked at a BUNCH of them ... and I had a fairly even distribution) ...

Worse, I have not been doing much in the computer room these last couple weeks ... so ... :(

My Fuzzy Hollynoodles gives me way too much credit for perspicacity ...
ID: 1961 · Rating: 0
Martin Johnson
Joined: 18 Oct 05
Posts: 19
Credit: 171,164
RAC: 0
Message 1968 - Posted: 31 Oct 2005, 2:17:12 UTC

Perhaps the answer to this problem is the same as I found on 2 other current threads - on switching projects, LEAVE IN MEMORY! If you don't, and you switch, when you come back, it restarts at some previous point, and re-crunches, again and again, never quite getting there.
ID: 1968 · Rating: 1
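
The behaviour described in message 1968 (and the time-slice advice in message 1915) can be sketched as a toy model. It assumes that a task removed from memory resumes from its last saved checkpoint and loses any work done since then; whether Rosetta 4.78 checkpoints this way is not stated in the thread. The function name, parameters and numbers are all hypothetical and chosen only for illustration.

```python
def cpu_minutes_to_finish(total_work, checkpoints, time_slice,
                          leave_in_memory, max_slices=100):
    """Toy model of a task that is preempted every `time_slice` minutes.

    total_work      -- CPU minutes the task needs from start to finish
    checkpoints     -- progress points (in minutes) where state is saved
    leave_in_memory -- if False, preemption rolls back to the last checkpoint
    Returns the CPU minutes spent, or None if the task is still unfinished
    after `max_slices` preemptions.
    """
    progress = 0
    cpu_used = 0
    for _ in range(max_slices):
        remaining = total_work - progress
        if remaining <= time_slice:
            # The task completes within this slice.
            return cpu_used + remaining
        # The slice ends first: the client switches to another project.
        cpu_used += time_slice
        progress += time_slice
        if not leave_in_memory:
            # Work done since the last checkpoint is lost.
            progress = max((c for c in checkpoints if c <= progress), default=0)
    return None


# Last checkpoint at 120 min, but 80 min of work still follows it.
print(cpu_minutes_to_finish(200, [60, 120], 60, leave_in_memory=False))
print(cpu_minutes_to_finish(200, [60, 120], 60, leave_in_memory=True))
print(cpu_minutes_to_finish(200, [60, 120], 120, leave_in_memory=False))
```

In this model the first case never finishes, because each 60-minute slice ends before the 80 minutes after the last checkpoint are done, while CPU time keeps accumulating. Leaving the task in memory or lengthening the slice lets it complete. Suspending every other project, as reported in messages 1907 and 2651, corresponds to an effectively unlimited slice.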
Asgardh

Joined: 23 Oct 05
Posts: 3
Credit: 22,780
RAC: 0
Message 2651 - Posted: 8 Nov 2005, 19:38:33 UTC

Thanks for your comments.
The job finished when I suspended all the other projects' work and let this one run to completion.

The problem has not occurred again.

Thanks, I've learned something new.

Regards
Asgardh
ID: 2651 · Rating: 0
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 2672 - Posted: 8 Nov 2005, 22:25:26 UTC - in response to Message 1968.  

Thanks, I had forgotten this! I have now updated my BOINC general preferences. Appreciate it! :)

Regards,
Bob P.
ID: 2672 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org