Minirosetta v1.47 bug thread.

Matthias Lehmkuhl
Joined: 20 Nov 05
Posts: 10
Credit: 2,115,357
RAC: 361
Message 58467 - Posted: 4 Jan 2009, 14:34:14 UTC

I also got one lr5_score12... WU that failed with this error:

<message>
- exit code -1073741819 (0xc0000005)
</message>
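
For reference, the negative exit code and the hex value in the message are the same 32-bit number written two ways. On Windows, 0xc0000005 is the access violation status code. A minimal Python sketch of the conversion (the function name is made up for illustration):

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex value."""
    # Masking with 0xFFFFFFFF maps a negative value onto its
    # two's-complement 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_unsigned_hex(-1073741819))  # 0xc0000005
```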


wuid=198929431
Matthias

ID: 58467 · Rating: 0
sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 58480 - Posted: 4 Jan 2009, 18:06:45 UTC
Last modified: 4 Jan 2009, 18:08:37 UTC

@ greb_be & robertmiles

Thanks for looking into this. I let Rosetta run last night with increased runtimes and left the application in memory, but I see that 1 WU did fail: 218380754, for the same reason as before.

Also of note, 20 WUs failed because of a client error while downloading (couldn't get input files, MD5 check failed): 218580846, for instance.
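
An "MD5 check failed" message means the downloaded file's checksum did not match the one the server announced. The sketch below shows the general idea of such a check in Python. It is not BOINC's actual code, and the function name and parameters are assumptions made for illustration.

```python
import hashlib


def md5_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the expected MD5 digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large input files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

A mismatch usually points to a truncated or corrupted download, so fetching the file again is the usual remedy.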

On this computer, I have set Rosetta to no new work and had to abort the remaining WUs. I really want to attach here, but the problems are far too severe at the moment. Perhaps I'll try again in 6 months, but I must say, this is getting a bit old...



ID: 58480 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58484 - Posted: 4 Jan 2009, 18:19:08 UTC

The runtime should not directly affect the success of a task. But, since it will run more models, a longer runtime increases the odds of you hitting a long-running model. So, running 5 models on 5 different 1 hour tasks should give you the same result as running 5 models on a single 5 hour task. But if 20% of the models are long-running (say, one in every five), you would say that 100% of your 5hr tasks "fail", and only 20% of your 1hr tasks do.
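
The 100% figure treats the long-running models as evenly spread, one in every five. If each model is instead assumed to be long-running independently with probability 20% (an assumption, not something stated in the thread), the share of affected tasks can be sketched as follows. The function name is hypothetical.

```python
def p_task_hits_long_model(p_long: float, models_per_task: int) -> float:
    """Probability that a task contains at least one long-running model,
    assuming each model is long-running independently with p_long."""
    return 1.0 - (1.0 - p_long) ** models_per_task


for n in (1, 5):
    print(n, round(p_task_hits_long_model(0.20, n), 3))
# 1 0.2
# 5 0.672
```

Under that assumption roughly two thirds of the 5-model tasks would be affected, which is still far more than the 20% of 1-model tasks.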

But, with a 1 hour runtime preference, the watchdog will kick in much sooner. If the watchdog is set to 3 times the normal runtime, it would only allow a task to run for 3 hours, whereas with the 5 hour runtime above, a task could run for up to 15 hours in total before the watchdog ends it.
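
The arithmetic behind those two limits, as a small Python sketch. The multiplier of 3 is the hypothetical value used in this post, and the real watchdog rule in the application may differ. The function name is made up.

```python
def watchdog_limit_hours(runtime_pref_hours: float, multiplier: float = 3.0) -> float:
    """Longest time a task may run before the watchdog ends it,
    assuming the limit is simply multiplier * runtime preference."""
    return runtime_pref_hours * multiplier


print(watchdog_limit_hours(1))  # 3.0
print(watchdog_limit_hours(5))  # 15.0
```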
Rosetta Moderator: Mod.Sense
ID: 58484 · Rating: 0
sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 58486 - Posted: 4 Jan 2009, 18:21:13 UTC

Just for the fun of it, I checked my desktop (AMD 4200+) for errors; this one has typically been rock solid for years. Lo and behold, there was one error in the past few hours, the same error as on my Vista laptop. So the error is not machine- or CPU-specific (AMD vs Intel, XP vs Vista); it has happened on each (in my setup, at least).

AMD 4200:218361409
ID: 58486 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,188,338
RAC: 11,005
Message 58491 - Posted: 4 Jan 2009, 18:36:53 UTC - in response to Message 58462.  
Last modified: 4 Jan 2009, 18:38:04 UTC

It seems like the 4 or 5 times that I have come back to Rosetta with this setup (64bit Vista) everything works well until the runtime is increased to greater than 1 hour. Perhaps I will increase the runtime but switch to "leave app in memory" to see if there is any change...

[...]

BOINC 6.4.5 is now available, which suggests that a few people found problems in BOINC 6.4.0 and more recent. I notice that all three of those workunits were the lr5_score12 type, which a few other people have been reporting having problems with. Note that some other threads indicate that Rosetta@home is likely to have problems supplying all the workunits that are requested for at least a few more hours, though.

@sslickerson

I had loads of problems (can't acquire lockfile) with Vista64 until BOINC 6.4.5, at which point they disappeared completely. With earlier versions I also reduced my runtime to 2 hours for greater success. With 6.4.5 the problems seem to have gone. An upgrade may help you too.

That said, it hasn't solved any issues with exception errors, which I still get to a small extent (1 out of 93 when I investigated). All your problems seem to be of that type (and you have many more than I do), so it may not solve your problems.

For what it's worth, I kept applications in memory, which I understand to be the best advice. Maybe you should try that too. Hope it helps you to some degree.
ID: 58491 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 58493 - Posted: 4 Jan 2009, 18:42:07 UTC

Also check to see if processor usage is set to 100% ...

I saw a note on EaH that on Windows, with processor usage not set to 100%, this is a common error. Since this killed about 20 models here for me ... I am interested in whether this is really the case ... I know Rosetta runs well on OS-X, in that I have not had any failures there ...

On Win XP I got 10 failures out of about 20 tries ... which is when *I* gave up again on RaH ...

I had set usage to 99% to give me a little more head room and that may have been enough to farble things up ...

Anyone up for the test?

This is addressed to the "Can't acquire lock-file" problem only ...
ID: 58493 · Rating: 0
sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 58499 - Posted: 4 Jan 2009, 19:56:07 UTC - in response to Message 58493.  

Thanks Paul and everyone else. I'll give these suggestions a try sometime this week (besides Rosie is out of work for the time being).

Also check to see if processor usage is set to 100% ...

I saw a note on EaH that on Windows, with processor usage not set to 100%, this is a common error. Since this killed about 20 models here for me ... I am interested in whether this is really the case ... I know Rosetta runs well on OS-X, in that I have not had any failures there ...

On Win XP I got 10 failures out of about 20 tries ... which is when *I* gave up again on RaH ...

I had set usage to 99% to give me a little more head room and that may have been enough to farble things up ...

Anyone up for the test?

This is addressed to the "Can't acquire lock-file" problem only ...





ID: 58499 · Rating: 0
LizzieBarry
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 58507 - Posted: 4 Jan 2009, 22:51:22 UTC - in response to Message 58499.  

Thanks Paul and everyone else. I'll give these suggestions a try sometime this week (besides Rosie is out of work for the time being).

Not to forget, it's an ideal time to reset the project too.
ID: 58507 · Rating: 0
rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 58540 - Posted: 5 Jan 2009, 20:45:17 UTC

ID: 58540 · Rating: 0
Ian_D
Joined: 21 Sep 05
Posts: 55
Credit: 4,216,173
RAC: 0
Message 58562 - Posted: 6 Jan 2009, 12:17:15 UTC
Last modified: 6 Jan 2009, 12:27:00 UTC

Duplicate post


ID: 58562 · Rating: 0
Ian_D
Joined: 21 Sep 05
Posts: 55
Credit: 4,216,173
RAC: 0
Message 58564 - Posted: 6 Jan 2009, 12:21:06 UTC
Last modified: 6 Jan 2009, 12:27:28 UTC

Duplicate post - wow it's slow.....


ID: 58564 · Rating: 0
Ian_D
Joined: 21 Sep 05
Posts: 55
Credit: 4,216,173
RAC: 0
Message 58565 - Posted: 6 Jan 2009, 12:23:46 UTC

<core_client_version>6.2.12</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800

ERROR: phil how did we get here-2?
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 1378
called boinc_finish

</stderr_txt>
]]>


You're having a laugh, right ?

https://boinc.bakerlab.org/rosetta/result.php?resultid=218842260


ID: 58565 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 58572 - Posted: 6 Jan 2009, 16:21:34 UTC

Both I and the second cruncher got compute errors on lr5_score12_rlbd_1ubi_IGNORE_THE_REST_DECOY_5559_1100_1

The combined task summary is at https://boinc.bakerlab.org/rosetta/workunit.php?wuid=198993154

The error on my machine is:
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0049162C read attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...

It ran for 1089.156 seconds of CPU time out of 14400 on my machine,

and on the other system:

Computer ID 593083
Report deadline 13 Jan 2009 17:11:30 UTC
CPU time 526.051
stderr out

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0049162C read attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...
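
Both machines report the same faulting address (0x0049162C) and a read from address 0x00000000, which is consistent with a null pointer being dereferenced at the same place in the code. A rough Python sketch for pulling those fields out of a stderr excerpt like the ones above; the regular expression and the function name are assumptions based only on the two records shown here.

```python
import re

# Pattern written against the "Reason:" lines quoted in this post.
RECORD = re.compile(
    r"Reason: (?P<reason>.+?) \((?P<code>0x[0-9a-fA-F]+)\) at address "
    r"(?P<fault_address>0x[0-9a-fA-F]+) (?P<access>read|write) attempt to address "
    r"(?P<target>0x[0-9a-fA-F]+)"
)


def parse_exception_record(stderr_text: str):
    """Return the fields of the first exception record, or None if absent."""
    match = RECORD.search(stderr_text)
    if match is None:
        return None
    info = match.groupdict()
    # A target address of zero suggests a null pointer dereference.
    info["target_is_null"] = int(info["target"], 16) == 0
    return info


line = ("Reason: Access Violation (0xc0000005) at address 0x0049162C "
        "read attempt to address 0x00000000")
print(parse_exception_record(line))
```

Comparing the `fault_address` field across failed results is a quick way to tell whether several crashes share one cause.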

ID: 58572 · Rating: 0
Jim Leatherman
Joined: 15 Jun 08
Posts: 2
Credit: 987,127
RAC: 0
Message 58573 - Posted: 6 Jan 2009, 17:07:16 UTC

After upgrading to 6.4.5 BOINC doesn't seem to be downloading any tasks now for Rosetta@Home. Same message all the time:

01/06/09 12:00:56|rosetta@home|Fetching scheduler list
01/06/09 12:01:01|rosetta@home|Master file download succeeded
01/06/09 12:01:06|rosetta@home|Sending scheduler request: To fetch work. Requesting 172801 seconds of work, reporting 0 completed tasks
01/06/09 12:01:11|rosetta@home|Scheduler request completed: got 0 new tasks
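
The log lines above are pipe-delimited (timestamp, project, message), and 172801 seconds is 2 days plus 1 second of requested work. A small Python sketch for reading such lines, assuming the month/day/year timestamp layout seen here; the function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta


def parse_log_line(line: str):
    """Split one 'timestamp|project|message' line into typed fields."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
    return when, project, message


def requested_work(message: str):
    """Return the amount of work requested as a timedelta, or None."""
    match = re.search(r"Requesting (\d+) seconds of work", message)
    return timedelta(seconds=int(match.group(1))) if match else None


line = ("01/06/09 12:01:06|rosetta@home|Sending scheduler request: To fetch work. "
        "Requesting 172801 seconds of work, reporting 0 completed tasks")
when, project, message = parse_log_line(line)
print(when.isoformat(), project, requested_work(message))
```

A request that completes with "got 0 new tasks" means the scheduler answered but had nothing to hand out, which points at the server side rather than the client.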

I have reset the project, but still no downloads -- was working fine prior to 6.4.5.

Any ideas?
ID: 58573 · Rating: 0
Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 58577 - Posted: 6 Jan 2009, 17:35:21 UTC

The problem is not at your end. If you have similar problems in the future always check the server status. Right now there are problems on the other end as you will see by the prominent red boxes.
ID: 58577 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 58579 - Posted: 6 Jan 2009, 18:39:58 UTC - in response to Message 58577.  

The problem is not at your end. If you have similar problems in the future always check the server status. Right now there are problems on the other end as you will see by the prominent red boxes.


The servers that generate work have been offline for quite some time today (European time). No news from the team as to what is causing this outage. Keep an eye on the server status page to see when they come back online.
ID: 58579 · Rating: 0
Rifleman
Joined: 19 Nov 08
Posts: 17
Credit: 139,408
RAC: 0
Message 58592 - Posted: 7 Jan 2009, 7:25:45 UTC

I have 3 finished WUs that don't seem to upload to the server---is that because of the problems today?
ID: 58592 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 58593 - Posted: 7 Jan 2009, 8:20:08 UTC - in response to Message 58592.  

See the server problems thread. Apparently a connection to the outside world got pulled while they were working on the rack. They expect things to be up and running today. But it is midnight Pacific time at the moment, so don't expect anything to happen for at least 8 hours. If you go into BOINC Manager and then go to the Projects tab, you can set RAH to 'accept no new tasks', and that will stop it from requesting new work. This will cut back on your status messages. Turn it back on later tonight (European time).


I have 3 finished WUs that don't seem to upload to the server---is that because of the problems today?

ID: 58593 · Rating: 0


©2024 University of Washington
https://www.bakerlab.org