Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
fcbrants Joined: 25 Mar 13 Posts: 13 Credit: 3,933,177 RAC: 0 |
Thanks, but after looking at the affected tasks, it looks like the result was discarded & no credit was granted. That said, it's looking more & more like this was a problem with my Dell PERC H710P RAID card. The machine was sluggish as hell with the disk cache write-back enabled & everything really went south (machine became unbootable) after I tried a backup. Fiddled with it for days, then finally pulled the backup battery off the card (which disabled the cache) & let it sit overnight. Next morning I reinstalled the card, and it was back up and running. Jacked my "use at most" CPUs setting back up to 100% & the machine is still snappy. Back to Munching & Crunching ;) Thanks for looking this up for me; if I run into problems again, I will try increasing this timeout. Franko The error message is displayed by the BOINC Client. |
fcbrants Joined: 25 Mar 13 Posts: 13 Credit: 3,933,177 RAC: 0 |
Dang it, I'm still getting the same error. I tried to find the file app_control.cpp, but couldn't find it - is this a file I can edit? Thanks!! Franko The error message is displayed by the BOINC Client. |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,014 |
Dang it, I'm still getting the same error. [snip] Files with the .cpp extension are usually C++ source files, which can be edited. However, doing so is not useful unless: 1. You have a copy of the file. Most BOINC downloads do not include the source files - you have to know where to find the source files and download the entire package of source files. 2. You know enough C++ to make useful edits. 3. You have all of the compilers installed to compile the entire program for your operating system. 4. You have the instructions to compile all source files needed, and then link them into a new version of the program. 5. You know how to substitute the new version of the program for the old version. |
fcbrants Joined: 25 Mar 13 Posts: 13 Credit: 3,933,177 RAC: 0 |
Got it, thanks!! I spent some more time with this machine running at 100% (32 Rosetta tasks + 1 SETI task on the GPU) & it DID hang occasionally, which would explain this error. As this is also my daily driver, I backed the "Use at most ... % of the CPUs" option down to 93.75% (30 of 32 threads) & I haven't seen the problem since. Problem resolved. Thanks!! Franko Dang it, I'm still getting the same error. |
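For anyone wanting to reproduce the 93.75% figure above for a different core count, here is a minimal sketch of the arithmetic. The function name `cpu_percent_for_threads` is made up for illustration and is not part of BOINC; the result is simply the value you would type into the "use at most" CPU preference.

```python
def cpu_percent_for_threads(threads_to_use: int, total_threads: int) -> float:
    """Percentage that leaves (total_threads - threads_to_use) threads free.

    Hypothetical helper for illustration only; BOINC itself just takes
    the resulting percentage as a preference value.
    """
    if total_threads <= 0:
        raise ValueError("total_threads must be positive")
    if not 0 <= threads_to_use <= total_threads:
        raise ValueError("threads_to_use must be between 0 and total_threads")
    return 100.0 * threads_to_use / total_threads


# The case from the post: keep 2 of 32 threads free for desktop use.
print(cpu_percent_for_threads(30, 32))  # 93.75
```

On a machine with a different thread count, the same call gives the setting that frees however many threads you want to keep for interactive work.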
anklab Joined: 1 Jun 10 Posts: 1 Credit: 9,599,886 RAC: 127 |
Hi! Recently I have noticed that WUs that run for a long time are granted about the same credit as WUs that run for a short time. For example, my computers are an Intel Core2Duo E8500 and an Intel Core i5-2500. The E8500 gets WUs with 4 hours of crunching, the i5-2500 with 24 hours. It is strange that different tasks with different amounts of work are granted almost equal credit: Core i5-2500 // 24 hours // granted 160.33; E8500 // 4 hours // granted 152.93. Much earlier, the i5-2500 received approximately 800~850 credits for each completed WU. What can I do? |
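To put the two results above on the same scale, one rough approach is to divide granted credit by run time. The sketch below does only that arithmetic with the numbers from the post; the helper name `credit_per_hour` is an assumption for illustration, and it says nothing about how the project actually computes credit.

```python
def credit_per_hour(granted_credit: float, run_hours: float) -> float:
    """Granted credit divided by run time in hours (illustrative helper)."""
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    return granted_credit / run_hours


# Figures quoted in the post: (granted credit, hours of crunching).
results = {
    "Core i5-2500": (160.33, 24),
    "E8500": (152.93, 4),
}

for host, (credit, hours) in results.items():
    print(f"{host}: {credit_per_hour(credit, hours):.2f} credits/hour")
# Core i5-2500: 6.68 credits/hour
# E8500: 38.23 credits/hour
```

Seen this way, the 24-hour task earned roughly a sixth of the hourly rate of the 4-hour task, which is the discrepancy being reported.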
LarryMajor Joined: 1 Apr 16 Posts: 22 Credit: 31,533,212 RAC: 0 |
Much earlier, i5-2500 received for each completed WU approximately 800~850 credits. I'd do nothing for a few days. It appears to have been the recent WUs/scoring that caused a big drop. Mine started to look more typical in the past 24 hours. |
[AF>Le_Pommier] Jerome_C2005 Joined: 22 Aug 06 Posts: 44 Credit: 1,258,039 RAC: 0 |
Hi, I have tasks erroring after 10 hours of calculation: <core_client_version>7.14.2</core_client_version> A few from the same batch did succeed after the same amount of calculation time: <core_client_version>7.14.2</core_client_version> |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,014 |
I am getting a message of "Abandoned by Project" on too many workunits. With 8 hour workunits this is unacceptable and since I compute in the Gridcoin pool I cannot change my settings. Could this mean that your computer is so slow that two other computers have finished the workunit before yours does? Does your computer finish workunits before their deadlines? |
Arnav Sood Joined: 20 Aug 18 Posts: 2 Credit: 11,782,086 RAC: 0 |
Have been unable to upload work units since yesterday (two have timed out). Keeps telling me "project backoff." I'm on an iMac Pro 2017 running macOS 10.14 Mojave and BOINC 7.12 |
fcbrants Joined: 25 Mar 13 Posts: 13 Credit: 3,933,177 RAC: 0 |
I just checked my logs back to 12/10 15:00 CST & it looks like I've been uploading continuously, uninterrupted. Win64 Boinc 7.12.1. Have been unable to upload work units since yesterday (two have timed out). Keeps telling me "project backoff." |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I was away from home (of course), and Rosetta took out my i7-4770. Everything was frozen up. I have never seen that before for Rosetta. Apparently it was this work unit: https://boinc.bakerlab.org/result.php?resultid=1046921926 <core_client_version>7.12.0</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255)</message> <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.07_i686-pc-linux-gnu @foldit_2006238_0004_fold_and_dock_flags -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip fold_and_dock_foldit_2006238_0004_data.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2498717 Starting watchdog... Watchdog active. Starting watchdog... Watchdog active. ERROR: Unable to open database file for dun10 rotamer library: minirosetta_database/rotamer/shapovalov/StpDwn_0-0-0/cys.bbdep.rotamers.lib ERROR:: Exit from: src/core/pack/dunbrack/RotamerLibrary.cc line: 1085 BACKTRACE: [0xe8ca514] [0xca17443] [0xca178ce] [0xca92145] [0xc90133c] [0xc9ef641] [0xd019a4b] [0xd3e6e18] [0xd3eb9ce] [0xc96b2d1] [0xc963eb2] [0xb7fef3f] [0xac8f844] [0x9404246] [0x9299a6c] [0xc232777] [0xc234a84] [0xc2f46c0] [0xc2f323b] [0x929e531] [0x8054670] [0xedcf791] [0xedcf98d] [0x8266087] BOINC:: Error reading and gzipping output datafile: default.out 14:21:38 (2187): called boinc_finish(1) </stderr_txt> Rosetta is the only project I have running on that machine (limited to six cores, with two cores free); I don't even have a GPU installed. It probably won't happen again, but once is enough. EDIT: I updated Ubuntu 16.04, and upon reboot, picked up this in my BOINC log. I have never seen it before, and have no idea what it means. 
6 Rosetta@home 12/14/2018 2:51:39 PM [error] App version has unsupported platform i686-pc-linux-gnu; changing to x86_64-pc-linux-gnu 7 Rosetta@home 12/14/2018 2:51:39 PM [error] State file error: duplicate app version: minirosetta x86_64-pc-linux-gnu 378 8 Rosetta@home 12/14/2018 2:51:39 PM [error] App version has unsupported platform i686-pc-linux-gnu; changing to x86_64-pc-linux-gnu But everything appears to be back to normal, and Rosetta is running OK now. |
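If you want to pull just the `[error]` entries out of a saved copy of the event log, something along these lines may help. This is a sketch under assumptions: it expects one entry per line in the layout quoted above (sequence number, project name, US-style date and 12-hour time, then the message), which may not match every BOINC version or locale. The names `LINE_RE` and `parse_line` are made up for this example.

```python
import re
from datetime import datetime

# Assumed layout: "<seq> <project> <M/D/YYYY h:mm:ss AM|PM> <message>"
LINE_RE = re.compile(
    r"^(?P<seq>\d+)\s+(?P<project>\S+)\s+"
    r"(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<text>.*)$"
)


def parse_line(line: str):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "seq": int(m["seq"]),
        "project": m["project"],
        "time": datetime.strptime(m["stamp"], "%m/%d/%Y %I:%M:%S %p"),
        "is_error": m["text"].startswith("[error]"),
        "text": m["text"],
    }


# The three entries quoted in the post, split one per line.
sample = [
    "6 Rosetta@home 12/14/2018 2:51:39 PM [error] App version has unsupported platform i686-pc-linux-gnu; changing to x86_64-pc-linux-gnu",
    "7 Rosetta@home 12/14/2018 2:51:39 PM [error] State file error: duplicate app version: minirosetta x86_64-pc-linux-gnu 378",
    "8 Rosetta@home 12/14/2018 2:51:39 PM [error] App version has unsupported platform i686-pc-linux-gnu; changing to x86_64-pc-linux-gnu",
]

errors = [e for e in map(parse_line, sample) if e and e["is_error"]]
for e in errors:
    print(e["seq"], e["time"].isoformat(), e["text"])
```

Lines that do not fit the assumed layout are skipped rather than guessed at, so a different log format simply yields an empty list.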
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
To my surprise, I see 24 tasks in my profile shown as sent to my PC. In reality I have 10 in my BOINC Manager. What's going on there? Application Rosetta 4.07 Name foldit_2006238_0005_fold_and_dock_SAVE_ALL_OUT_707998_5433 Status Suspended by user Received Application Rosetta 4.07 Name foldit_2006238_0002_fold_and_dock_SAVE_ALL_OUT_707992_5434 Status Suspended by user Received Application Rosetta 4.07 Name foldit_2006254_0004_fold_and_dock_SAVE_ALL_OUT_708044_5432 Status Suspended by user Received slots/2 Application Rosetta 4.07 Name foldit_2006238_0003_fold_and_dock_SAVE_ALL_OUT_707994_5434 Status Suspended by user Received slots/7 Application Rosetta 4.07 Name foldit_2006238_1059_fold_and_dock_SAVE_ALL_OUT_708020_5431 Status Suspended by user Received slots/5 Application Rosetta 4.07 Name foldit_2006238_1059_fold_and_dock_SAVE_ALL_OUT_708020_4988 Status Suspended by user Received slots/4 Application Rosetta 4.07 Name foldit_2006254_0002_fold_and_dock_SAVE_ALL_OUT_708040_5432 Status Suspended by user Received slots/3 Application Rosetta 4.07 Name foldit_2006254_0003_fold_and_dock_SAVE_ALL_OUT_708042_5432 Status Active Received slots/6 Application Rosetta 4.07 Name foldit_2006238_0004_fold_and_dock_SAVE_ALL_OUT_707996_5434 Status Active Received slots/11 Application Rosetta 4.07 Name foldit_2006238_0005_fold_and_dock_SAVE_ALL_OUT_707998_5434 Status Active Received slots/13 |
jjch Joined: 10 Nov 13 Posts: 14 Credit: 441,128,699 RAC: 19,444 |
I think I may be experiencing a similar issue. Recently I noted the work in progress value appeared to be approximately double the normal number of work units I have running at a time. In order to troubleshoot this I set Rosetta to no new tasks and let them run out. Checking Boincstats, I no longer have any work left on any host. According to Rosetta I currently have a total of 1709 tasks in progress. For example, host 1770544 is not running any Rosetta tasks, yet its In progress count is 216. https://boinc.bakerlab.org/rosetta/results.php?hostid=1770544&offset=0&show_names=0&state=1&appid= I did try resetting the project on that host but it didn't make any difference. My impression is that there is a problem on the Rosetta server side and it isn't updating the task status properly. I think we need the Rosetta programming team to look into this further. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
According to Rosetta I currently have a total of 1709 tasks in progress. For example host 1770544 it is not running any Rosetta tasks but yet the In progress count is 216. This is interesting. I have 8 in progress, but Rosetta "In progress" shows 11. https://boinc.bakerlab.org/rosetta/results.php?hostid=3510039&offset=0&show_names=0&state=1&appid= It is the oldest three that are missing. That isn't a big difference, so I thought I would take a look in the BOINC log. I see the following curious entry for the oldest one (but it is the only one I see): 43 Rosetta@home 12/14/2018 3:52:05 PM [error] Can't parse file info in scheduler reply: file name is empty or has '..' 44 Rosetta@home 12/14/2018 3:52:05 PM [error] Can't parse file info in scheduler reply: file name is empty or has '..' 46 Rosetta@home 12/14/2018 3:52:05 PM [error] State file error: missing file r1_r1_ems_3hC_984_0002_000000007_0001_0001_0001_23_41_H_.._EHEE_10482_0001_0001_0001_0001_15_38_H_.._DHR70_DHR15_l2_t3_t2_D20_D25_ct21_nTerm_3x_r8_0001_0001_0001_0001_0002_0001_0001_0001_0001_fragments_data.zip 47 Rosetta@home 12/14/2018 3:52:05 PM [error] State file error: missing input file r1_r1_ems_3hC_984_0002_000000007_0001_0001_0001_23_41_H_.._EHEE_10482_0001_0001_0001_0001_15_38_H_.._DHR70_DHR15_l2_t3_t2_D20_D25_ct21_nTerm_3x_r8_0001_0001_0001_0001_0002_0001_0001_0001_0001_fragments_data.zip 48 Rosetta@home 12/14/2018 3:52:05 PM [error] Can't handle task r1_r1_ems_3hC_984_0002_000000007_0001_0001_0001_23_41_H_.._EHEE_10482_0001_0001_0001_0001_15_38_H_.._DHR70_DHR15_l2_t3_t2_D20_D25_ct21_nTerm_3x_r8_0001_0001_0001_0001_0002_0001_0001_0001_0001_fragment_706193_213 in scheduler repl 49 Rosetta@home 12/14/2018 3:52:05 PM [error] State file error: missing task r1_r1_ems_3hC_984_0002_000000007_0001_0001_0001_23_41_H_.._EHEE_10482_0001_0001_0001_0001_15_38_H_.._DHR70_DHR15_l2_t3_t2_D20_D25_ct21_nTerm_3x_r8_0001_0001_0001_0001_0002_0001_0001_0001_0001_fragment_706193_213 50 Rosetta@home 12/14/2018 3:52:05 PM 
[error] Can't handle task r1_r1_ems_3hC_984_0002_000000007_0001_0001_0001_23_41_H_.._EHEE_10482_0001_0001_0001_0001_15_38_H_.._DHR70_DHR15_l2_t3_t2_D20_D25_ct21_nTerm_3x_r8_0001_0001_0001_0001_0002_0001_0001_0001_0001_fragment_706193_213_1 in scheduler re Maybe someone can figure it out. |
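The mismatch described above (11 tasks "In progress" on the website versus 8 in the local client) boils down to a list difference. Here is a minimal sketch, assuming you have copied both lists of task names into plain Python lists by hand. The task names below are placeholders, not real Rosetta task names, and `missing_locally` is a hypothetical helper, not a BOINC tool.

```python
def missing_locally(server_tasks, local_tasks):
    """Tasks the server lists as in progress that the client does not have.

    Keeps the server's ordering so the oldest entries stay first.
    """
    local = set(local_tasks)
    return [name for name in server_tasks if name not in local]


# Placeholder names standing in for the 11 tasks shown on the website,
# oldest first, and the 8 that are actually present in the client.
server_in_progress = [f"task_{i:02d}" for i in range(1, 12)]
client_tasks = server_in_progress[3:]

ghosts = missing_locally(server_in_progress, client_tasks)
print(len(server_in_progress), len(client_tasks), ghosts)
# 11 8 ['task_01', 'task_02', 'task_03']
```

If the leftover names line up with the tasks named in the `[error]` log entries, that would at least be consistent with the server still counting work the client never accepted.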
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
I'm scared. I see 27 tasks with status "given up". They are all from December 14th. |
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
Sorry guys, this is my time, my money and my costs, so I will stop Rosetta now. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I don't see a problem with your completion rate. Everything looks pretty good. You may just see a status problem. |
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
I don't see a problem with your completion rate. Everything looks pretty good. Sorry, but this is not my problem. |
Sid Celery Joined: 11 Feb 08 Posts: 2140 Credit: 41,518,559 RAC: 10,612 |
Sorry Guys I've got a similar problem - just posted somewhere else. Having evaluated what's happened, no time was involved, no download took place and no costs were incurred. Maybe 7 seconds of processing time were affected per download - once every few hours - but I'm not sure it was in place of anything else. The only problem for users seems to be a mismatch between the online list of your tasks and what shows in your offline task list. I suspect you wasted more energy clicking reply, typing 17 words and clicking Post reply. |
jjch Joined: 10 Nov 13 Posts: 14 Credit: 441,128,699 RAC: 19,444 |
From what I can tell, these work units were cancelled but the status remained In progress. If you check the Workunit under errors you will see WU cancelled. For example: https://boinc.bakerlab.org/workunit.php?wuid=942284714 I don't think there is anything major to worry about, just an annoyance. It's not likely you lost any compute cycles either. The Rosetta programming team should clean this up if possible; however, I think they will disappear after the deadline expires. For now I have stopped all Rosetta computing until after Dec 23rd to see if this is true. FYI, I am giving WCG cycles in the meantime. |
©2024 University of Washington
https://www.bakerlab.org