Report Problems with Rosetta Version 5.16 I

Message boards : Number crunching : Report Problems with Rosetta Version 5.16 I

Nightbird

Joined: 17 Sep 05
Posts: 70
Credit: 32,418
RAC: 0
Message 16663 - Posted: 19 May 2006, 21:21:51 UTC - in response to Message 16656.  
Last modified: 19 May 2006, 21:24:00 UTC




Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16676 - Posted: 20 May 2006, 4:22:58 UTC

Psssst... Jose, are you in here? How's it going?
Moderator9
ROSETTA@home FAQ
Moderator Contact
Seth Aaronson
Joined: 5 Mar 06
Posts: 18
Credit: 3,976
RAC: 0
Message 16677 - Posted: 20 May 2006, 4:35:38 UTC

No Screen Saver, no errors.
If anyone is interested, here are the running processes from the HijackThis log:

Running processes:
C:\WINDOWS\System32\smss.exe
C:\WINDOWS\system32\winlogon.exe
C:\WINDOWS\system32\services.exe
C:\WINDOWS\system32\lsass.exe
C:\WINDOWS\system32\svchost.exe
C:\WINDOWS\System32\svchost.exe
C:\Program Files\Ahead\InCD\InCDsrv.exe
C:\WINDOWS\Explorer.EXE
C:\WINDOWS\system32\spoolsv.exe
C:\WINDOWS\zHotkey.exe
C:\Program Files\Ahead\InCD\InCD.exe
C:\Program Files\eMachines Bay Reader\shwiconem.exe
C:\Program Files\Common Files\Microsoft Shared\Works Shared\WkUFind.exe
C:\PROGRA~1\Grisoft\AVGFRE~1\avgcc.exe
C:\PROGRA~1\Grisoft\AVGFRE~1\avgemc.exe
C:\Program Files\HP\hpcoretech\hpcmpmgr.exe
C:\Program Files\iTunes\iTunesHelper.exe
C:\Program Files\QuickTime7\qttask.exe
C:\Program Files\QUICKENW\QAGENT.EXE
C:\WINDOWS\system32\mrtMngr.EXE
C:\Program Files\Microsoft SQL Server\80\Tools\Binn\sqlmangr.exe
C:\Program Files\palmOne\HOTSYNC.EXE
C:\Program Files\HP\hpcoretech\comp\hptskmgr.exe
C:\PROGRA~1\Grisoft\AVGFRE~1\avgamsvr.exe
C:\PROGRA~1\Grisoft\AVGFRE~1\avgupsvc.exe
C:\CFusionMX7\runtime\bin\jrunsvc.exe
C:\CFusionMX7\db\slserver54\bin\swagent.exe
C:\CFusionMX7\runtime\bin\jrun.exe
C:\CFusionMX7\db\slserver54\bin\swstrtr.exe
C:\CFusionMX7\db\slserver54\bin\swsoc.exe
C:\CFusionMX7\verity\k2\_nti40\bin\k2admin.exe
C:\Program Files\Cisco Systems\VPN Client\cvpnd.exe
C:\WINDOWS\LogWatNT.exe
C:\CFusionMX7\verity\k2\_nti40\bin\k2server.exe
C:\CFusionMX7\verity\k2\_nti40\bin\k2index.exe
C:\Program Files\MySQL\MySQL Server 4.1.1.2a\bin\mysqld-nt.exe
C:\WINDOWS\system32\nvsvc32.exe
C:\Program Files\Microsoft SQL Server\90\Shared\sqlbrowser.exe
C:\WINDOWS\System32\svchost.exe
C:\WINDOWS\system32\UAService7.exe
C:\Program Files\VMware\VMware Player\vmware-authd.exe
C:\Program Files\Common Files\VMware\VMware Virtual Image Editing\vmount2.exe
C:\WINDOWS\system32\vmnat.exe
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\msftesql.exe
C:\WINDOWS\system32\vmnetdhcp.exe
C:\Program Files\iPod\bin\iPodService.exe
C:\Program Files\BOINC\boincmgr.exe
C:\Program Files\BOINC\boinc.exe
C:\Program Files\BOINC\projects\setiathome.berkeley.edu\setiathome_5.15_windows_intelx86.exe
C:\Program Files\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_5.16_windows_intelx86.exe
C:\Program Files\Mozilla Thunderbird\thunderbird.exe
C:\PROGRA~1\MOZILL~4\FIREFOX.EXE
K1100LTSE
Joined: 28 Feb 06
Posts: 7
Credit: 192,387
RAC: 0
Message 16690 - Posted: 20 May 2006, 11:38:22 UTC

Result ID 20893015
Name t283_HOMOLOG_ABRELAX_hom001__515_20431_0
Workunit 17437191
Created 19 May 2006 20:14:40 UTC
Sent 19 May 2006 22:20:09 UTC
Received 20 May 2006 10:53:20 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 193286
Report deadline 2 Jun 2006 22:20:09 UTC
CPU time 2431.078125
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 3329570
# cpu_run_time_pref: 10800

</stderr_txt>


Validate state Invalid
Claimed credit 3.69662235638505
Granted credit 0
application version 5.16

Jimi@0wned.org.uk

Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 16691 - Posted: 20 May 2006, 11:55:27 UTC

Failed. I accidentally knocked off the power to the water pump without noticing, and the CPU brewed up. No damage.

Result ID 20840881
Name t283_HOMOLOG_ABRELAX_hom001__515_15191_0
Workunit 17390031
pieface

Joined: 20 Sep 05
Posts: 17
Credit: 797,661
RAC: 0
Message 16692 - Posted: 20 May 2006, 12:46:05 UTC

This is probably the same 0xc0000005 problem as I reported on 5.13 earlier here, but on a different machine: still Windows XP, but this one is a Pentium M.
The unit died overnight, i.e. no one was messing with the screensaver or anything, and then the security package tied things up with a dialog box because Rosetta was trying to access a DNS server. The output is in the result.
I guess this means that with the 'new' debugger code, all of the executing programs have to be identified to the security software in case they need to go out looking for symbols for a dump or something?
webmaster777

Joined: 15 Apr 06
Posts: 1
Credit: 13,500
RAC: 0
Message 16695 - Posted: 20 May 2006, 13:33:54 UTC

2006-05-19 16:51:00 [rosetta@home] Unrecoverable error for result T0283_FACONTACTS_hom003_508_18704_0 ( - exit code 1073807364 (0x40010004))
2006-05-20 14:52:55 [rosetta@home] Unrecoverable error for result t287_HOMOLOG_ABRELAX_hom001__513_16762_0 ( - exit code 1073807364 (0x40010004))

Both were ended automatically and claimed about 15 credits each.
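The decimal and hexadecimal exit codes in these reports are the same numbers written two ways: the decimal form is the Windows exit status read as a signed 32-bit integer. Below is a minimal Python sketch of the conversion. The small name table is an addition of mine, based on Microsoft's ntstatus.h as I understand it; it only labels a code and says nothing about what caused it.

```python
def exit_code_to_hex(code: int) -> str:
    """Reinterpret a (possibly negative) 32-bit exit code as unsigned hex."""
    # Masking with 0xFFFFFFFF yields the two's-complement unsigned value.
    return f"0x{code & 0xFFFFFFFF:08x}"


# A few NTSTATUS values, for illustration only (not an exhaustive table).
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
    0x40010004: "DBG_TERMINATE_PROCESS",
}


def describe(code: int) -> str:
    """Format a decimal exit code with its hex form and a symbolic name if known."""
    name = KNOWN_CODES.get(code & 0xFFFFFFFF, "unknown")
    return f"{code} -> {exit_code_to_hex(code)} ({name})"


if __name__ == "__main__":
    # The two decimal codes reported in this thread.
    for code in (-1073741811, 1073807364):
        print(describe(code))
```

The same masking trick works for any code that BOINC prints as a negative number.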
truckpuller

Joined: 5 Nov 05
Posts: 40
Credit: 229,134
RAC: 0
Message 16696 - Posted: 20 May 2006, 13:56:43 UTC
Last modified: 20 May 2006, 13:58:09 UTC

I keep getting the message "If this happens again you may need to reset the project". I reset the project about a week ago and now I'm getting this message again. I have also noticed that my RAC on the same machine has dropped from 250 to about 220.
Visit us at Christianboards.org
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16697 - Posted: 20 May 2006, 14:14:01 UTC - in response to Message 16696.  
Last modified: 20 May 2006, 14:18:28 UTC

I keep getting the message "If this happens again you may need to reset the project". I reset the project about a week ago and now I'm getting this message again. I have also noticed that my RAC on the same machine has dropped from 250 to about 220.

This error is usually accompanied by a mention of a missing file, and it can usually be ignored. See here.
Moderator9
ROSETTA@home FAQ
Moderator Contact
truckpuller

Joined: 5 Nov 05
Posts: 40
Credit: 229,134
RAC: 0
Message 16698 - Posted: 20 May 2006, 14:37:10 UTC - in response to Message 16697.  

I keep getting the message "If this happens again you may need to reset the project". I reset the project about a week ago and now I'm getting this message again. I have also noticed that my RAC on the same machine has dropped from 250 to about 220.

This error is usually accompanied by a mention of a missing file, and it can usually be ignored. See here.


OK, I see where it said "exited with zero status but no finished files", and I have several of these. Does this mean I get no credit for these jobs?

Thanks for the reply.
Visit us at Christianboards.org
anders n

Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 16700 - Posted: 20 May 2006, 14:43:14 UTC - in response to Message 16698.  


OK, I see where it said "exited with zero status but no finished files", and I have several of these. Does this mean I get no credit for these jobs?

Thanks for the reply.


As you can see on this page, you do get credit for the work done: https://boinc.bakerlab.org/rosetta/result.php?resultid=19607341 :)

Anders n

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16704 - Posted: 20 May 2006, 15:01:52 UTC - in response to Message 16698.  
Last modified: 20 May 2006, 15:07:39 UTC

I keep getting the message "If this happens again you may need to reset the project". I reset the project about a week ago and now I'm getting this message again. I have also noticed that my RAC on the same machine has dropped from 250 to about 220.

This error is usually accompanied by a mention of a missing file, and it can usually be ignored. See here.


OK, I see where it said "exited with zero status but no finished files", and I have several of these. Does this mean I get no credit for these jobs?

Thanks for the reply.

Let me apologize to everyone for what I am about to say.

This is the single most Frequently Asked Question (FAQ) on these forums, so, logically, the answer might be found in a FAQ. The FAQs take a lot of time to prepare and maintain, and I am beginning to think they are not helping people very much.

If you can't find the answer to your question there, I need to know that so I can add it if it would help a lot of people to see it.

Moderator9
ROSETTA@home FAQ
Moderator Contact
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16717 - Posted: 20 May 2006, 17:02:47 UTC - in response to Message 16676.  

Psssst... Jose, are you in here? How's it going?


When I returned, I discovered that the unit was stuck. Task Manager did not show the Rosetta EXE at all (it was not there), yet the BOINC files were there and 99% of the CPU was idle. Two attempts at reattaching had to be dumped, because the EXE files disappeared once they were in and had started to work. Result: 2 phantom files.

Not a great day for me, Rosetta-wise or life-wise. I am so angry I came close to torching the computer.

I have tried reattaching now, and it is reportedly working. I am going back to bed. My head hurts. If the machine fails again, it is torching time.
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16718 - Posted: 20 May 2006, 18:00:06 UTC - in response to Message 16717.  

Psssst... Jose, are you in here? How's it going?


When I returned, I discovered that the unit was stuck. Task Manager did not show the Rosetta EXE at all (it was not there), yet the BOINC files were there and 99% of the CPU was idle. Two attempts at reattaching had to be dumped, because the EXE files disappeared once they were in and had started to work. Result: 2 phantom files.

Not a great day for me, Rosetta-wise or life-wise. I am so angry I came close to torching the computer.

I have tried reattaching now, and it is reportedly working. I am going back to bed. My head hurts. If the machine fails again, it is torching time.

Well, at least this is different. Keep us posted. I have no idea why the EXE would vanish.
Moderator9
ROSETTA@home FAQ
Moderator Contact
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16726 - Posted: 20 May 2006, 22:00:33 UTC - in response to Message 16718.  

Keep us posted.


There is no way I can recover from this fiasco. I am frustrated as hell.

So I think I must ask for a special favor. In my results page there are three WUs that are basically lost to me. There is no way they will be processed until they are removed from my account and sent to other computers. These are the units in question:

https://boinc.bakerlab.org/rosetta/result.php?resultid=20874518

https://boinc.bakerlab.org/rosetta/result.php?resultid=20830429

https://boinc.bakerlab.org/rosetta/result.php?resultid=20525150

Please cancel them from my computer and send them to others to be processed. It seems that is going to be the only thing I can do to advance the project.

I do apologize for all the time and resources I have wasted. Good luck to all.

I will keep reading the boards, but I don't think I should try downloading units. These are CASP units, and all my errors and problems are but a drag.

I am not a happy camper.
Ian

Joined: 14 Apr 06
Posts: 29
Credit: 326,863
RAC: 637
Message 16727 - Posted: 20 May 2006, 22:13:46 UTC

Couple of errors from yesterday:

https://boinc.bakerlab.org/rosetta/result.php?resultid=20850737

https://boinc.bakerlab.org/rosetta/result.php?resultid=20846087
Ian Cundell, St Albans, UK
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16728 - Posted: 20 May 2006, 22:20:04 UTC - in response to Message 16726.  

...Please cancel them from my computer and send them to others to be processed. It seems that is going to be the only thing I can do to advance the project.

I do apologize for all the time and resources I have wasted. Good luck to all.

I will keep reading the boards, but I don't think I should try downloading units. These are CASP units, and all my errors and problems are but a drag.

I am not a happy camper.

Jose,

First, all Work Units with the same name are the same; only the random number used to initiate processing changes. So do not be concerned about them being lost to the project; they aren't. Second, you are not wasting anyone's time. If you were, we would have abandoned you to marinate in BBQ sauce and tequila long ago.
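To illustrate the point about seeds with a toy model: this is not Rosetta's code, and the random walk and the name `toy_work_unit` are made up for the example. Every copy gets the same "input" and differs only in its seed, so the same seed always reproduces the same run, and a different seed gives a different sample of the same job. 3329570 is the seed printed in the stderr output earlier in this thread.

```python
import random


def toy_work_unit(seed: int, steps: int = 5) -> list[float]:
    """Hypothetical stand-in for one work unit: a seeded random walk.

    Every copy shares the same step rule and step count;
    only the seed differs between copies.
    """
    rng = random.Random(seed)  # private generator, so the result depends only on seed
    position = 0.0
    path = []
    for _ in range(steps):
        position += rng.uniform(-1.0, 1.0)
        path.append(position)
    return path


if __name__ == "__main__":
    first = toy_work_unit(3329570)
    again = toy_work_unit(3329570)  # same seed: identical result
    other = toy_work_unit(3329571)  # different seed: a different sample
    print(first == again, first == other)
```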

However, we would like it very much if you could keep processing for RALPH. You are providing critical information that the project is using to find and kill the 107 error bug. The programmers are particularly interested in why turning on EDP helped.

Anyone who is still having errors should attach to and run RALPH. The error rate on RALPH is very low, but we think this is because the people having errors on Rosetta are not running RALPH. These are the very systems we need to have over there. So please, ANY OF YOU STILL HAVING ERRORS, ATTACH TO RALPH! WE NEED YOU.

Please see Dr. Baker's journal for a message from the program team on this point.

Moderator9
ROSETTA@home FAQ
Moderator Contact
Aglarond

Joined: 29 Jan 06
Posts: 26
Credit: 446,212
RAC: 0
Message 16730 - Posted: 21 May 2006, 0:29:28 UTC

LINUX problem:
I need help with this problem: while running Rosetta on a Linux server with a Pentium IV HyperThreading processor, Rosetta occasionally hangs in a very strange state: everything is running except Rosetta. BOINC is running. The application on the other thread (Simap@home) is running. Just Rosetta isn't. The processor shows 50% idle. After some time BOINC decides to switch apps, Rosetta is preempted, and there are 2 Simaps happily running. After some more time BOINC switches apps again, and there are 2 Rosettas hanging and the processor is 100% idle. And so on.
After 2 days I stopped BOINC and ran it again, and both Rosettas started to work normally. This is not the first time it has happened. I even tried attaching to RALPH some time ago, but it never occurred there. It happens about once every 2-3 weeks.
Here are the results (both are valid, but have something in stderr):
20587857
20574470
Do you have any idea what could be wrong? Is it a bug in Rosetta, or is there some problem with the server where I run it? (BTW, yes, I have permission from the server's admin to run BOINC there.)
Dimitris Hatzopoulos

Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 16733 - Posted: 21 May 2006, 1:09:10 UTC - in response to Message 16730.  
Last modified: 21 May 2006, 1:16:23 UTC

LINUX problem:
I need help with this problem: while running Rosetta on a Linux server with a Pentium IV HyperThreading processor, Rosetta occasionally hangs in a very strange state: everything is running except Rosetta. BOINC is running. The application on the other thread (Simap@home) is running. Just Rosetta isn't.


I had encountered this particular issue back in Jan/Feb-06 (also under Linux), about 5-6 times overall.

The BOINC log would show that BOINC restarted Rosetta, but the Rosetta process would just stay "idle" (ps flags were "SN" = sleeping, niced, consuming no CPU time) for hours or days, until I manually killed it (I guess nowadays the "watchdog" thread would catch it).

At the time, I thought it was an issue with the Rosetta+BOINC interaction, as I think it happened upon resuming a Rosetta WU (with leave-in-mem=yes). I also suspected some issue with the system's resources, as that PC had only 256MB RAM and I was running 6 BOINC projects and 100+ processes.

It COULD have been a faulty WU, but when I ran that WU from the Rosetta command line outside BOINC, it completed fine.

The things in common with your setup are BOINC 5.2.14 (optimised) and a 2.4.x kernel (mine was Debian Sarge).

Trying to solve the problem, I reduced the number of BOINC projects to 4 (Rosetta, Ralph, Simap and LHC) and have not experienced any problems in the past 3+ months (since Feb-06), crunching 24/7:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
boinc 2120 0.0 0.5 7452 3760 ? S Apr27 0:28 ./boinc_client
boinc 25448 7.8 5.2 62208 38776 ? SN May19 240:32 sixtrack_4.66_i68
boinc 1078 12.5 18.4 191428 136776 ? SN May20 59:42 rosetta_beta_5.16
boinc 1079 0.0 18.4 191428 136776 ? SN May20 0:00 rosetta_beta_5.16
boinc 1080 0.0 18.4 191428 136776 ? SN May20 0:00 rosetta_beta_5.16
boinc 1081 0.0 18.4 191428 136776 ? SN May20 0:00 rosetta_beta_5.16
boinc 5254 59.9 0.9 11700 7260 ? SN 02:15 59:51 simap_5.07_i686-p
boinc 5255 0.0 0.9 11700 7260 ? SN 02:15 0:00 simap_5.07_i686-p
boinc 5256 0.0 0.9 11700 7260 ? SN 02:15 0:00 simap_5.07_i686-p
boinc 5828 99.5 8.8 94972 65656 ? RN 03:16 38:55 rosetta_5.16_i686
boinc 5829 0.0 8.8 94972 65656 ? SN 03:16 0:00 rosetta_5.16_i686
boinc 5830 0.0 8.8 94972 65656 ? SN 03:16 0:00 rosetta_5.16_i686
boinc 5831 0.0 8.8 94972 65656 ? SN 03:16 0:00 rosetta_5.16_i686
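A rough way to spot the "idle but not running" state described here without waiting days: take two `ps aux` snapshots a few minutes apart and check whether the summed CPU TIME of each Rosetta command has advanced. Summing per command matters because, on a 2.4 kernel, each Rosetta thread shows up as its own PID and the helper threads sit at 0:00 anyway. This is a hypothetical Python sketch, not a BOINC or Rosetta tool. It assumes procps `ps aux` column order with TIME as M:SS or H:MM:SS, and a task that BOINC has legitimately preempted will also show no progress, so only treat a hit as a hang if BOINC reports that task as running. The second snapshot below is made-up data.

```python
from collections import defaultdict


def parse_cpu_time(field: str) -> int:
    """Convert a ps TIME field such as '240:32' (M:SS) or '1:02:03' (H:MM:SS) to seconds."""
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def cpu_by_command(ps_lines: list[str], pattern: str) -> dict[str, int]:
    """Sum CPU seconds per COMMAND over `ps aux`-style lines whose COMMAND contains pattern."""
    totals: dict[str, int] = defaultdict(int)
    for line in ps_lines:
        # Split into at most 11 columns so a COMMAND containing spaces stays intact.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":  # skip header and short lines
            continue
        time_field, command = fields[9], fields[10]
        if pattern in command:
            totals[command] += parse_cpu_time(time_field)
    return dict(totals)


def stalled_commands(before: list[str], after: list[str], pattern: str = "rosetta") -> list[str]:
    """Return matching commands whose summed CPU time did not advance between snapshots."""
    t0 = cpu_by_command(before, pattern)
    t1 = cpu_by_command(after, pattern)
    # A command absent from the first snapshot is new, so it is never reported as stalled.
    return sorted(cmd for cmd, secs in t1.items() if cmd in t0 and secs <= t0[cmd])


# First snapshot: lines taken from the listing above.
BEFORE = [
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND",
    "boinc 1078 12.5 18.4 191428 136776 ? SN May20 59:42 rosetta_beta_5.16",
    "boinc 1079 0.0 18.4 191428 136776 ? SN May20 0:00 rosetta_beta_5.16",
    "boinc 5828 99.5 8.8 94972 65656 ? RN 03:16 38:55 rosetta_5.16_i686",
    "boinc 5829 0.0 8.8 94972 65656 ? SN 03:16 0:00 rosetta_5.16_i686",
]

# Second snapshot: hypothetical values a few minutes later.
AFTER = [
    "boinc 1078 0.0 18.4 191428 136776 ? SN May20 59:42 rosetta_beta_5.16",
    "boinc 1079 0.0 18.4 191428 136776 ? SN May20 0:00 rosetta_beta_5.16",
    "boinc 5828 99.5 8.8 94972 65656 ? RN 03:16 40:10 rosetta_5.16_i686",
    "boinc 5829 0.0 8.8 94972 65656 ? SN 03:16 0:00 rosetta_5.16_i686",
]

if __name__ == "__main__":
    print(stalled_commands(BEFORE, AFTER))
```

On a live box the snapshots could come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.splitlines()`, taken a few minutes apart. In this made-up example the beta task gets flagged; whether it is hung or simply preempted still has to be checked against what BOINC itself reports.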

PS: I believe there was a SIGSEGV (segmentation violation) signal in my case too. You can search my post history for the details.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 16738 - Posted: 21 May 2006, 6:42:51 UTC
Last modified: 21 May 2006, 6:44:32 UTC

This result exited with code 1, giving the error message:

ERROR:: Exit at: dock_structure.cc line:401

This is a somewhat old Linux box with just 256 MB of memory, but it usually runs stably; this is its first error in, I guess, months...
Team betterhumans.com - discuss and celebrate the future - hoelder1in.org



©2024 University of Washington
https://www.bakerlab.org