Message boards : Number crunching : Minirosetta 2.00
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
Can I be clear on something? The problems are with WUs named lr8_combine_smooth_torsion_it00_rama*. New WUs are coming down named lr5_combine_smooth_torsion_it00_redo*. Are these new ones OK? I think I aborted one by accident. That was wrong, wasn't it? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
(Replying to Sid Celery: "Can I be clear on something:") I've received no specific word, but it sounds very likely, yes. No biggie. [edit] Yifan posted here confirming the rama batch had a problem in how it was created. Rosetta Moderator: Mod.Sense |
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
That's what I eventually realised before going into an abort frenzy. OK, I think I'm clear and fully restocked now. |
bruce Joined: 15 Sep 07 Posts: 10 Credit: 839,797 RAC: 0 |
I'm also seeing a considerable number of WUs with errors similar to those posted by others recently. Here is an example of the messages on the client: 11/25/2009 1:10:59 PM rosetta@home Starting sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1 11/25/2009 1:11:00 PM rosetta@home Starting task sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1 using minirosetta version 200 11/25/2009 1:12:43 PM rosetta@home Computation for task sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1 finished 11/25/2009 1:12:43 PM rosetta@home Output file sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1_0 for task sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1 absent. Here are some examples from the results: 299750820, 299623182, 299623178, 299623176, 299623173, 299623157, 299621562, 299618069, 299540989, 299540964, 299524862, 299522937, 299509005, 299508999 ... and 11 more WUs downloaded today (Nov 25) errored with similar results, as did 24 downloaded yesterday (Nov 24). I'll monitor these boards for updates; until then I've suspended further WU downloads. |
darkpella Joined: 27 Sep 05 Posts: 13 Credit: 66,840 RAC: 0 |
(Replying to bruce: "I'm also seeing a considerable number of WUs with errors similar to those posted by others recently.") Similar here, with the following WUs: 299885240, 299643164, 299547442, 298811740. The stderr is slightly different, though. The stderr from my WUs looks like: <core_client_version>6.6.38</core_client_version> while that from some of bruce's WUs looks like: <core_client_version>6.10.18</core_client_version> |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
darkpella, all four of the tasks you linked are instances of the known problem described here, with "...rama..." in the name. These tasks were later corrected and reissued with "...redo..." in the name. Rosetta Moderator: Mod.Sense |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Two tasks failing on Windows 7: 300429025 (sel_core_1.5_low200_beta_low200_nostart_hb_t297__IGNORE_THE_REST_15865_1407_1) and 300429024 (resa_sel_core_1.5_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16299_98_1), both with the error: ERROR: res1 != res2 ERROR:: Exit from: `..\..\src\core\kinematics\FoldTree.cc` line: 2342 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
|
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
(Replying to svincent: "Two tasks failing on Windows 7") My W7 laptop is error-free, but my Vista desktop had a few of the same errors: sel_core_1.5_low200_beta_low200_nostart_hb_t297__IGNORE_THE_REST_15865_1075_0, sel_core_1.5_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_15870_2182_1 and sel_core_1.5_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_15870_9716_1. All other WUs are fine. |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Some more failures on Win 7, all with the res1 != res2 error: resa_sel_core_1.5_low200_beta_low200_nostart_hb_t331__IGNORE_THE_REST_16303_113_1, rsel_core_1.5_low200_beta_low200_nostart_hb_t297__IGNORE_THE_REST_15865_7931_1 and rsel_core_1.5_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_15870_8110_1. There is also this one, sel_core_1.5_low200_beta_low200_nostart_hb_t297__IGNORE_THE_REST_15865_7946_0, which gave the same res1 != res2 error but ran for half an hour and returned an error status of success. Again, it seems to be the tasks with t331 and t297 in their names that are causing problems. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Mod.Sense, here you go. These are the only ones still in my list. Credit was about normal; I mostly get less than claimed anyway. I don't think there were any "double headers", as you call them, though some may have restarted. This one did 135 models (CC_101.22 / GC_83.32): https://boinc.bakerlab.org/rosetta/workunit.php?wuid=275075567 --- This one did 112 (CC_101.69 / GC_81.83): https://boinc.bakerlab.org/rosetta/workunit.php?wuid=274576093 --- This one did 153 (CC_102.90 / GC_86.03): https://boinc.bakerlab.org/rosetta/workunit.php?wuid=273886738 --- This one did 116 (CC_103.40 / GC_85.25): https://boinc.bakerlab.org/rosetta/workunit.php?wuid=273350890 --- All with mini 2.00; I've had some with older versions too. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
A few recent minirosetta 2.00 workunits that went beyond the usual 100-decoy limit: https://boinc.bakerlab.org/rosetta/result.php?resultid=301662056 https://boinc.bakerlab.org/rosetta/result.php?resultid=301552794 https://boinc.bakerlab.org/rosetta/result.php?resultid=301231750 https://boinc.bakerlab.org/rosetta/result.php?resultid=301041709 https://boinc.bakerlab.org/rosetta/result.php?resultid=300975639 https://boinc.bakerlab.org/rosetta/result.php?resultid=300923679 https://boinc.bakerlab.org/rosetta/result.php?resultid=300923678 https://boinc.bakerlab.org/rosetta/result.php?resultid=300745181 https://boinc.bakerlab.org/rosetta/result.php?resultid=300695511 https://boinc.bakerlab.org/rosetta/result.php?resultid=300688521 https://boinc.bakerlab.org/rosetta/result.php?resultid=300574451 https://boinc.bakerlab.org/rosetta/result.php?resultid=300412255 https://boinc.bakerlab.org/rosetta/result.php?resultid=300278675 https://boinc.bakerlab.org/rosetta/result.php?resultid=300272462 No definite problem: those that got less credit than usual also used less CPU time than usual. |
aguiar@carrier.com.br Joined: 19 Feb 06 Posts: 6 Credit: 367,089 RAC: 0 |
Good morning! I have WU 3gbm_3g0l_0264_revert.php_dock_rmsd.xml__16270_181_1, which now shows 13:25:10 elapsed with 0.789% progress. Should I let it run or delete it? Thanks, Valter Aguiar, Brazil. |
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
(Replying to Valter: "I have WU 3gbm_3g0l_0264_revert.php_dock_rmsd.xml__16270_181_1 now elapsed 13:25:10 with 0.789% progress. Should I let it go or delete it?") With a 3-hour default runtime, the watchdog ought to have shut it down already, but if you click Properties on that WU I expect you'll find the CPU time is minimal, so something seems to have stalled with that one. I'd abort it and hope the next person who picks it up has more success with it. |
aguiar@carrier.com.br Joined: 19 Feb 06 Posts: 6 Credit: 367,089 RAC: 0 |
Done, thanks. You were right: only 3 minutes of CPU time. Valter. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
...and so the question becomes whether your machine has something else running at a higher priority that is keeping BOINC from getting any CPU time, or whether there is a problem with BOINC or the task. All other things being equal, a newly started task would also be affected by other activity on the system (assuming that activity is still running). Is your next task running normally? (That is, check Properties or Task Manager and see how many actual CPU seconds it has used so far.) [edit] I don't see this task in your results, and offhand the naming doesn't look like a Rosetta task. Can you post a link? Rosetta Moderator: Mod.Sense |
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
(Replying to Mod.Sense: "...and so it becomes a question of whether your machine has something else going on at a higher priority that is causing BOINC not to get any CPU time? Or is there a problem with BOINC or the task?") It appears to be this one: 3gbm_3g0l_0264_revert.pdb_dock_rmsd.xml__16270_181_1. I've seen this kind of thing very occasionally, even while other WUs appear to be running fine. In this case Valter appears to have been the wingman, and the original cruncher failed on it as well. |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I've had a couple of tasks with names like 3a9bB* fail on Windows 7. In both cases I had to abort them because no progress was being made, and they weren't getting any CPU time either. My wingman in both cases successfully completed the tasks, one on Mac OS X and the other on Win XP. The first one is reported above; the second is 271436170. |
AdeB Joined: 12 Dec 06 Posts: 45 Credit: 4,428,086 RAC: 0 |
Validate errors in workunits named mix_score13_hb_rlbd_1ttz__IGNORE_THE_RESTlr13_DECOY_16324_*: (1) Task: 303144429, Workunit: mix_score13_hb_rlbd_1ttz__IGNORE_THE_RESTlr13_DECOY_16324_936_0, CPU time: 85.64598, stderr out: ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Fullatom mode .. # cpu_run_time_pref: 43200 ====================================================== DONE :: 1 starting structures 1201 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish (2) Task: 302775198, Workunit: mix_score13_hb_rlbd_1ttz__IGNORE_THE_RESTlr13_DECOY_16324_508_1, CPU time: 75.6415, stderr out: ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Fullatom mode .. # cpu_run_time_pref: 43200 ====================================================== DONE :: 1 starting structures 1201 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish. AdeB |
Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0 |
(Replying to AdeB: "Validate errors in workunits with the name: mix_score13_hb_rlbd_1ttz__IGNORE_THE_RESTlr13_DECOY_16324_*") Thanks! There was a bug when we combined lr5, 8, 10 and 13 to make a large test. As a result, a few of the lr13 ones ended up with too small an input file and ran too fast for the validation server. This should be fixed soon. |
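
The stderr excerpts AdeB posted already show the symptom Yifan describes: a 43200-second run-time preference, but only one decoy and 1201 reported CPU seconds before the task finished. The following is a minimal sketch for spotting such short runs in a pasted stderr, assuming the minirosetta 2.00 wording quoted in this thread stays the same. The helper names (`parse_stderr`, `looks_too_short`) and the 10% threshold are hypothetical illustrations, not the project's actual validator rule.

```python
import re
from dataclasses import dataclass

# Assumption: these patterns match the stderr wording quoted in this thread
# (minirosetta 2.00); other app versions may print the summary differently.
PREF_RE = re.compile(r"cpu_run_time_pref:\s*(\d+)")
DONE_RE = re.compile(r"DONE ::\s*(\d+) starting structures\s+(\d+) cpu seconds")
DECOYS_RE = re.compile(r"generated\s+(\d+) decoys from\s+(\d+) attempts")


@dataclass
class RunSummary:
    run_time_pref: int  # requested run time in seconds (cpu_run_time_pref)
    cpu_seconds: int    # CPU seconds reported on the DONE line
    decoys: int         # decoys generated
    attempts: int       # attempts made

    @property
    def fraction_of_pref(self) -> float:
        # Share of the requested run time the task actually used.
        return self.cpu_seconds / self.run_time_pref


def parse_stderr(text: str) -> RunSummary:
    """Hypothetical helper: pull the run summary out of a stderr dump."""
    pref = PREF_RE.search(text)
    done = DONE_RE.search(text)
    decoys = DECOYS_RE.search(text)
    if not (pref and done and decoys):
        raise ValueError("stderr does not contain the expected summary lines")
    return RunSummary(
        run_time_pref=int(pref.group(1)),
        cpu_seconds=int(done.group(2)),
        decoys=int(decoys.group(1)),
        attempts=int(decoys.group(2)),
    )


def looks_too_short(summary: RunSummary, min_fraction: float = 0.1) -> bool:
    # Hypothetical heuristic: flag runs that stopped after using less than
    # min_fraction of the requested run time.
    return summary.fraction_of_pref < min_fraction


# Summary lines copied from AdeB's post above.
sample = """
# cpu_run_time_pref: 43200
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
"""

s = parse_stderr(sample)
print(s.cpu_seconds, s.decoys, round(s.fraction_of_pref, 3), looks_too_short(s))
# prints: 1201 1 0.028 True
```

A fraction of the requested run time is used rather than a fixed number of seconds because users set different run-time preferences; whatever check the validation server really applies may differ.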
©2024 University of Washington
https://www.bakerlab.org