Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Andrew Sanchez Joined: 25 Nov 14 Posts: 2 Credit: 4,008 RAC: 0 |
I have 3 tasks right now that are kinda weird. Well, they look like they ran successfully. I guess the date thing is just normal and the tasks were resends or something. |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
I've noticed a couple of my boxes have gone idle, apparently at random, recently... it appears this is the cause: I have noticed this too, since last week. Once I entered the room and there was no noise of running fans. Checking showed all Rosetta WUs ready and the project in back-off for more than 16 hours. After manually updating, everything ran smoothly again. This has now happened 3 times on two different rigs. Greetings, TJ. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Well, they look like they ran successfully. I guess the date thing is just normal and the tasks were resends or something. Yes, the website shows all pending tasks with date and time in UTC. Your local BOINC Manager shows them in the local time zone of your machine. Rosetta Moderator: Mod.Sense |
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
Task ID 703335720 Created 29 Nov 2014 21:35:01 UTC Outcome Client error Client state Compute error Exit status -1073740940 (0xffffffffc0000374) Workunit: 636829405 636829405 stderr out <core_client_version>7.4.27</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1073740940 (0xc0000374) </message> <stderr_txt> [2014-11-30 10:31:59:] :: BOINC:: Initializing ... ok. [2014-11-30 10:31:59:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. command: projects/boinc.bakerlab.org_rosetta/minirosetta_3.52_windows_x86_64.exe @141112.2.5L_8H_C4_5_fold_and_dock_flags -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip fold_and_dock_141112.2.5L_8H_C4_5_data.zip -nstruct 10000 -cpu_run_time 21600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1325984 Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_3d2618f.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/fold_and_dock_141112.2.5L_8H_C4_5_data.zip Setting database description ... 
Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. </stderr_txt> ]]> Validate state Invalid |
Sid Celery Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 16,102 |
rb_11_26_50854_96408__t000__3_C1_SAVE_ALL_OUT_IGNORE_THE_REST_227069_1754_0 [quote]ERROR: Fullatom mismatch in checkpointer. ERROR:: Exit from: ......srcprotocolscheckpointCheckPointer.cc line: 379 [/quote] I shouldn't really complain. I was granted 50% more than claimed credit, but obviously something's gone wrong there... |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1997 Credit: 9,747,451 RAC: 10,562 |
On my AMD FX 6300 (Win7), today: 03/12/2014 07:54:02 | rosetta@home | Requesting new tasks for CPU and AMD/ATI GPU |
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
Task ID: 704019018 Name: 1L-14H-3L-6E-3L-14H-2L-6E-1L_1-2.P.0_0003_fold_SAVE_ALL_OUT_226560_148_0 Workunit: 637464884 Outcome Client error Client state Compute error Exit status -1073741819 (0xffffffffc0000005) Details: Details 637464884 |
Sid Celery Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 16,102 |
Two errors within tasks - one on behalf of someone else. Error 1: 141111.4.3L_6H_C2_4_fold_and_dock_SAVE_ALL_OUT_224959_4892_0 ERROR: total_residue() != 0 Error 2: 11.19.14.11.BL_tetramer_tyr_net3_fold_and_dock_SAVE_ALL_OUT_225832_4188_0 ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6 That 2nd one is mine. It ran until the watchdog cut in after running without checkpointing for around 6 hours. It only seemed to give credit for 6 hours' work instead of 8 (usual runtime) or 12 (when watchdog cut in). |
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
Today I've got 16 files in a row with issues. |
Usuario1_S Joined: 24 Mar 14 Posts: 92 Credit: 3,059,705 RAC: 0 |
12/29/14 00:24:07 | rosetta@home | [error] Signature verification failed for minirosetta_database_3d2618f.zip I get this after a few hours or a few days, then all the current WUs that were crunching stop, along with the rest of the waiting WUs in the queue, with a 'Computation Error' message. It's pretty random. I've gone back to BOINC 6.13.12 and the errors, everything works fine on my PC. I have checked my PC: I have checked my RAM with Win7 Memory Diagnostic: Extended Test, which includes no CPU Cache testing, 2 passes, and my 8GB of DDR3-1333 RAM is OK. I checked the Event Log and there are no hard drive failures. I have the pagefile.sys on another partition; I moved the file to another partition temporarily, filled its previous partition with zeros, then did a full format and returned it to this partition, and ran chkdsk on all my partitions twice. I even defragmented C: with PerfectDisk to check the read/write, and there were no errors. I have no overclock on the CPU, only on the GPU, but I only use the CPU for BOINC, so I'm pretty sure it's not hardware; otherwise the SMART system at boot time and the Windows event log would show an HDD or some other hardware-related error. So it has to be your WUs. I'm getting tired of this. I never had this when I was on World Community Grid; if this keeps happening I'm going back to it, where at least I'll be helping do useful research. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
So it has to be your WUs. That is possible, but I've not seen reports of such problems from others. Since the downloads are the same for everyone, any task requiring that file would fail the same way for everyone. It sounds like the symptoms that can result from an anti-virus application modifying the file, or a firewall modifying the download. Which AV do you use? Have you whitelisted R@h? Rosetta Moderator: Mod.Sense |
Polian Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
Hash failures on a downloaded file? Does your internet service compress the data (used to be typical on cellular internet)? Are you having any internet or router problems that would cause a relatively large file to fail with a connection reset therefore truncating the file? Any other atypical internet connection possibilities, like perhaps a proxy server? Just brainstorming here. |
Ron Joined: 25 Dec 14 Posts: 1 Credit: 876,941 RAC: 0 |
I'm a newbie here. The last few days I've gotten a notice that says, "Message from account manager: Can't resolve host name." Yet the program seems to be running normally. What does this mean, and what should I do? |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
You may also want to check that your date, time, and timezone are set correctly on your PC and even on any routers you may be using, as this can cause issues with certain security protocols, SSL, etc. Also just brainstorming! |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
I'm a newbie here. The last few days I've gotten a notice that says, "Message from account manager: Can't resolve host name." Sounds like a DNS issue. You may want to flush your DNS cache. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I'm a newbie here. The last few days I've gotten a notice that says, "Message from account manager: Can't resolve host name." It can also mean that the BOINC Manager attempted to contact the project at a time when you were disconnected from the internet. It automatically retries until it gets through. You can establish preferences that define the hours of the day that the BOINC Manager should perform its internet access if there are times when access is more expensive or more likely to slow other things you would like to do with your internet connection. You can also set up bandwidth limits and caps so BOINC is less likely to noticeably slow your other usage of the internet. Rosetta Moderator: Mod.Sense |
Oldman Joined: 17 Oct 06 Posts: 4 Credit: 1,706,631 RAC: 0 |
2/10/2015 10:03:51 AM | | [error] Can't create HTTP response output file projects/boinc.bakerlab.org_rosetta/minirosetta_3.52_windows_x86_64.exe I'm having download problems with rosetta only. |
Questor Joined: 3 Mar 06 Posts: 5 Credit: 1,100,880 RAC: 0 |
I have been running Rosetta@home since '06, but I think I am done with this project for the foreseeable future. I just bought a new PC 3 months ago, and loaded it up with all the programs I usually run, including BOINC. It started out with the occasional BSOD, but I thought "It's Windows, it happens". But it got worse, and worse and worse. It got to the point where the PC would not run for more than 15 minutes, if I could even get into Windows. I spent a lot of time trying to diagnose the problem: testing hardware, changing drivers, installing, upgrading and trying different firewall/antivirus programs, trying to find out what was killing my PC. I even once wiped everything and started over from scratch. Errors kept pointing to hard drive failure, but repeated tests kept showing the drives were perfectly fine. Finally, at a loss, I asked for help and someone pointed out to me that while I had tested the drives for physical/mechanical failure, I had not tested the actual data integrity. So I did, and what an eye-opener it was. Hundreds and hundreds of file sorting errors, incorrect index entries and orphaned file fragments showed up. And every bloody single one of them was part of a Rosetta@home work unit. Every Rosetta work unit I processed was like a round of grapeshot fired through my hard drive's data. No wonder I was having so many problems. I have never had a moment's problem with Seti@home, so I am not certain why Rosetta@home is such a continual source of user aggravation. So, until I can be reasonably certain that processing future Rosetta@home work units is not the equivalent of intentionally giving oneself a virus to slowly corrupt one's own hard drive data, I think I am unlinking the project. I am running Windows 7 Home edition 64 bit, on an Athlon II X4 630 Socket AM3, Foxconn motherboard with an AMD 760G chipset and 6 gigs of memory. BOINC is set to 'Run Always' and 'Suspend GPU' but is disabled from auto-starting. Version 7.4.36 (x64). Projects are Rosetta@home and Seti@home, 50/50 split. Hopefully this will help someone in diagnosing the problem, but I would not be holding my breath for someone from Rosetta@home to take the time to actually look into this either. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I had tested the drives for physical/mechanical failure, I had not tested the actual data integrity. What did you do to test the data integrity? Rosetta Moderator: Mod.Sense |
Questor Joined: 3 Mar 06 Posts: 5 Credit: 1,100,880 RAC: 0 |
I had used the Windows 7 built-in test, Western Digital's Data Lifeguard, and Seagate's SeaTools to check the drives for physical defects/failure. I used multiple tests as most of my BSOD errors kept pointing to the drives as the reason for the system crashing. I had also run 'SFC /scannow' under CMD to test the Windows files for damage. It was only when someone suggested running 'CHKDSK /R' under CMD (which examines the drive for physical errors as well as checking the data contents) that the indexing errors, orphaned file fragments, et al. showed up. Since the drive's data integrity was cleaned up/repaired by CHKDSK, the system has been stable. There were, as I mentioned in the previous post, quite literally hundreds of entries needing to be repaired/removed, and all were part of Rosetta@home work units. There were multiple entries for each work unit, as they seemed to be unzipped/expanded into multiple similarly named files with different extensions such as .SYM .WTS .CSV .PSSM .PARAM .LIN .OUT .PAP .TAB and many others I was unable to catch as the error data was scrolling constantly across the screen. |
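
For anyone trying to read the exit statuses posted earlier in this thread (-1073740940 and -1073741819): the task pages show the Windows exit code as a signed 32-bit number, and reinterpreting it as unsigned gives the familiar 0xC000... NTSTATUS value. Below is a minimal Python sketch of that conversion, using only the standard language features. The helper name `decode_exit_status` and the small `KNOWN_NTSTATUS` table are hypothetical, made up for illustration, and are not part of BOINC or Rosetta; the two names in the table are the standard Windows meanings of those codes.

```python
# Hypothetical lookup table for illustration; only the two codes seen in this thread.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def decode_exit_status(status):
    """Return (hex string, name) for a signed 32-bit Windows exit status."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value as unsigned 32-bit.
    unsigned = status & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return "0x{:08X}".format(unsigned), name


if __name__ == "__main__":
    for status in (-1073740940, -1073741819):
        print(status, *decode_exit_status(status))
    # -1073740940 0xC0000374 STATUS_HEAP_CORRUPTION
    # -1073741819 0xC0000005 STATUS_ACCESS_VIOLATION
```

Neither code says by itself whether the fault lies in the application or in the host; heap corruption and access violations can come from either side, so this only helps with looking the status up.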
©2024 University of Washington
https://www.bakerlab.org