1)
Message boards :
Number crunching :
Stalled WU
(Message 105907)
Posted 11 Apr 2022 by Post: Which leads one to assume there might be something peculiar with the Rosetta VBox tasks themselves...? Hi, yup - it certainly seems like that :-( You'd have thought that a "tech admin" would be overseeing the results returned, would have recognised that a certain percentage were taking far too long to be reported, and would be actively figuring out what the problem was and fixing it. Instead, the situation seems to be that volunteers' computers are wasting time, money and electricity spinning their wheels, due to Rosetta's poor and inefficient management of the tasks it makes available. :-(
2)
Message boards :
Number crunching :
Stalled WU
(Message 105901)
Posted 10 Apr 2022 by Post: Look at the difference between CPU time and elapsed time: either there is something serious running alongside Boinc or, far more likely, those tasks are dead. Hi, thanks for the feedback. :-) I've seen this sort of behaviour before with other non-VBox projects, and usually the rule of thumb is to "leave them be" and they will (eventually) complete... But I've not had this happen with Rosetta's VBox tasks before. Indeed, I have one other host with the same OS (Win 7 Pro), the same VBox version and the same version of BOINC Manager, and that one has been fairly rattling through the tasks. Both hosts have plenty of installed, working RAM, and no other significant non-BOINC tasks are running at the same time. E.g. one VBoxHeadless.exe is using 71 MB, the other 39 MB, and VirtualBox.exe is using 18.5 MB - minute amounts of RAM in the grand scheme of things... So it might be that the old CPU in this one host is "past it" - maybe the required CPU "core functions" are not up to the mark... but it works fine with LHC and QuChem VBox tasks... Which leads one to assume there might be something peculiar with the Rosetta VBox tasks themselves...?
3)
Message boards :
Number crunching :
Stalled WU
(Message 105898)
Posted 10 Apr 2022 by Post: Hi all, I don't think I have "stalled" tasks, as the percentage of work done is still increasing, but they are taking AGES to complete...

task #1
Application - rosetta python projects 1.03 (vbox64)
Name - aagb-NMPHE_pp-NMVAL-GGLY-mACPenC12C_pp_7_2674773_4
State - Running
Received - 08/04/2022 00:41:54
Report deadline - 11/04/2022 00:41:56
Estimated computation size - 80,000 GFLOPs
CPU time - 00:34:13
CPU time since checkpoint - 00:00:06
Elapsed time - 1d 20:46:22
Estimated time remaining - 01:06:58
Fraction done - 97.568%
Virtual memory size - 101.57 MB
Working set size - 2.79 GB
Directory - slots/3
Process ID - 5000
Progress rate - 2.160% per hour
Executable - vboxwrapper_26203_windows_x86_64.exe
=========
task #2
Application - rosetta python projects 1.03 (vbox64)
Name - aagb-mAZE-mPHE-GPN-mB3PHG_pp_9_2612326_4
State - Running
Received - 08/04/2022 00:41:11
Report deadline - 11/04/2022 00:41:13
Estimated computation size - 80,000 GFLOPs
CPU time - 00:37:56
CPU time since checkpoint - 00:00:06
Elapsed time - 2d 02:10:06
Estimated time remaining - 00:47:31
Fraction done - 98.446%
Virtual memory size - 101.04 MB
Working set size - 2.79 GB
Directory - slots/1
Process ID - 7280
Progress rate - 1.800% per hour
Executable - vboxwrapper_26203_windows_x86_64.exe

And from Task Manager I see that CPU usage fluctuates between 0% and maybe 1%. This is very much a waste of computing time if the tasks are not actually doing much... but I don't want to abort them if the task is going to complete and the "result" file is of benefit... Maybe an admin can provide clearer answers as to why this is happening, as others seem to have reported similar issues with what appear to be "zombie" tasks...
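The gap the reply pointed at (CPU time versus elapsed time) can be put into numbers using the two task listings above. A minimal Python sketch, assuming durations are displayed as `HH:MM:SS` or `Nd HH:MM:SS` as in BOINC Manager's task properties; `to_seconds` and `cpu_share` are hypothetical helper names, not part of BOINC:

```python
# Minimal sketch (assumption: durations are shown as "HH:MM:SS" or "Nd HH:MM:SS",
# as in BOINC Manager's task properties). Helper names are hypothetical.

def to_seconds(duration: str) -> int:
    """Convert '00:34:13' or '1d 20:46:22' into a number of seconds."""
    days = 0
    if "d " in duration:
        day_part, duration = duration.split("d ", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(x) for x in duration.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_share(cpu_time: str, elapsed: str) -> float:
    """Fraction of wall-clock (elapsed) time the task actually spent on the CPU."""
    return to_seconds(cpu_time) / to_seconds(elapsed)


# Values copied from the two task listings in the post.
tasks = {
    "task #1": ("00:34:13", "1d 20:46:22"),
    "task #2": ("00:37:56", "2d 02:10:06"),
}

for name, (cpu, elapsed) in tasks.items():
    print(f"{name}: {cpu_share(cpu, elapsed):.2%} of elapsed time spent on the CPU")
```

Both tasks come out at roughly 1.3%, which is in line with the 0% to 1% CPU usage seen in Task Manager.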
4)
Message boards :
Number crunching :
Turn off Virtualbox task from host details
(Message 105857)
Posted 7 Apr 2022 by Post: Hi all, if you do not want to crunch VBox tasks on Rosetta, this might help: you can STOP VBox tasks from being sent to your hosts, so you can just crunch the standard Rosetta tasks. Go to your account on the Rosetta website and click on: Computers on this account > "View". Then, for each computer (host) listed, click on "Details". At the bottom of the list it says "VirtualBox VM jobs"; change it to "Don't allow" and the page will say: "Host updated. This host will no longer receive new VirtualBox VM jobs." Do this for each computer. That's it!! (Oh, and select the "Rosetta" project and click "Update" in BOINC Manager too, so it picks up your new settings.)
5)
Message boards :
Cafe Rosetta :
Kings Distributed Systems - Alpha Registration
(Message 87450)
Posted 3 Oct 2017 by Post: B8Ub8XjZgn9u5QtbrjST0E3V3hcVG1Bq
6)
Message boards :
Cafe Rosetta :
Kings Distributed Systems - Alpha Registration
(Message 87387)
Posted 26 Sep 2017 by Post: n4WcRh6jghkjaD87YPFqh2SAnNZdq-8T
7)
Message boards :
Cafe Rosetta :
Kings Distributed Systems - Alpha Registration
(Message 87358)
Posted 24 Sep 2017 by Post: mN49nzCRR4tf1HNUIM0BOBG6zh-vf1y1
8)
Message boards :
Number crunching :
Miscellaneous Work Unit Errors
(Message 12802)
Posted 29 Mar 2006 by Post: And I've just had the following errors: Hi, thanks for the reply. So, in this day and age of customer service, why doesn't the error message say that (instead of "exit code -529697949")? 2nd: from this page: The minimum spec is: Windows XP, CPU: 500 MHz or higher, HDD space: 200 MB, Memory: 512 MB. I think they need to "tweak" this to state "PER PROCESS". In the meantime, I will go back to crunching for other projects. (edit) All the other projects I crunch for don't have any issues with only having 512 MB of memory...! Oh well..... regards, Tim (Unless someone's got a spare stick of 512 MB PC2700 memory lying around they might want to donate?)
9)
Message boards :
Number crunching :
Miscellaneous Work Unit Errors
(Message 12793)
Posted 29 Mar 2006 by Post: [quote]Report all Work Unit errors on this thread that are NOT "1% Hang", "Max Time Exceeded" or other "stuck" or "hung" work units[/quote]
10)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12461)
Posted 21 Mar 2006 by Post: timbo, above you wrote you changed your prefs to 4 days. If that was the target CPU run time in your RALPH@home preferences, then the slow movement of the percentage and the increasing time to completion are perfectly normal, because it will run for 4 days with that setting (BOINC doesn't know about that project-specific option yet, so it can't include it in its prediction; it has to finish some units first to make the prediction more accurate, and it will be far off again if you change the target CPU time). OK - thanks for that info. I had assumed that the option to change prefs meant that the PROJECT ran for 4 days straight, not the actual work unit itself. And besides, I would have thought that if you allow the WU to have "direct control" over what BOINC is supposed to be doing (for these 4 days), then that must impact other WUs you will be crunching. So, will BOINC get in a "tizz" if you work on 4-day-long Rosetta WUs and you have other WUs from other projects waiting and getting close to or past their deadlines.....? It's nice for the project to give users that amount of control, but I think it's a bit too much....! BTW: Didn't the problem of these 1% WUs start around the time Rosetta allowed users to change these exact preferences...? I've crunched quite a few Rosetta WUs and never really had a problem until recently. regards, Tim
11)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12441)
Posted 21 Mar 2006 by Post: This is getting stranger. OK - so the 1st WU is now at 4 hr 27 min of CPU time and Progress is now at 4.56%. Completion time was around 8 hr 30 min, but now reads 12 hr 24 min!!! The 2nd WU is now at 4 hr 47 min and 4.75%, with a completion time of 12 hr 25 min (was about 8 hr 30 min). In both cases, the graphic in the "Searching..." box *is* moving: with both WUs, the graphics seem to "settle down" for a bit (with the shapes in both boxes being "similar"), and the bottom-right numbers change slowly. After a short while, the graphic in the "Searching..." box starts moving more rapidly, which corresponds to a faster rate of change of the numbers in the bottom right. Will let them continue and see what happens over the next 24 hours...! regards, Tim (edit) typo
12)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12420)
Posted 21 Mar 2006 by Post: This is getting stranger. After about 14 minutes of total crunching time, the 1st WU (HB_BARCODE_30_1bk2__352_137_0 using rosetta_beta version 493) has now changed to 0.178% progress (on the graphics screen) and is now stuck again. After 34 minutes of crunching time, the 2nd WU (HB_BARCODE_30_5croA_352_136_0 using rosetta_beta version 493) is still at 2.35%. Will let these carry on for an hour or so and report back then. regards, Tim (edit) added WU names
13)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12419)
Posted 21 Mar 2006 by Post: Have now shut down BOINC and am going to "play" a bit with my "project prefs". OK - changed my project prefs from default to max: 50, 50 and 4 days. Also set my BOINC prefs to "pre-empted". Have also set the computer to "visible", if it helps. Restarted BOINC. RALPH WUs are the only ones I have running. Immediately, when BOINC restarted, the very 1st WU reset its crunched time to zero, but was still showing 1% progress. Did a manual update of the project. Still the same. The 2nd WU is now at 2.35% (was 2.34%), but hasn't moved at all from there for the last 5 minutes. In "desperation mode", I've tried to suspend/resume various WUs in the hope of either causing a "computation error" or at least getting a WU to move off 1%. So far, nothing has changed.....! In both cases, the CPU time (for RALPH WUs) is continuing to increase - it's just the "Progress" that stays stuck. If it weren't for that, you'd think all was well!! regards, Tim
PS: System is:
CPU: Pentium 4 with HT @ 3.06 GHz (not overclocked)
Memory: 512 MB
OS: Windows XP + SP2
HDD: 24 GB free space
Graphics: Radeon 9500 Pro
BOINC: v5.2.13 (standard, not optimised)
All other projects crunch OK. (edit) added BOINC version
14)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12418)
Posted 21 Mar 2006 by Post: Have started some RALPH units. Having just written the last msg, I thought what the heck!! Need to experiment to help you guys. So, I went back to BOINC and, sure enough, only one of the 2 WUs was still at 1%; the other one had jumped up to 2.34%. But it's got stuck again. So, I suspended the 1% one and allowed BOINC to switch to the next RALPH WU. Upon starting, it immediately went to 1%.... and stuck! So, I suspended that one and allowed a 4th WU to start. That went straight to 1% and stuck. Same with the 5th and now the 6th. Have now shut down BOINC and am going to "play" a bit with my "project prefs". regards, Tim
15)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12417)
Posted 21 Mar 2006 by Post: Please--if you have frequent occurrences of the 1% bug--it would help us enormously to solve it if you could sign up for RALPH@home. Rom can then identify the exact lines of code where the problem is occurring, and it will be easy to fix from there. The problem is that many machines don't have this problem, and they can't help us to track it down and solve it. OK David, I have started some RALPH units. And what's happening, you ask??? The first two (I have a P4/HT) have both got "stuck" at 1%. I checked the graphics (having re-installed BOINC as a single user): the time is increasing nicely, as it should, the pictures are real pretty and crunching seems to be taking place, but the 1% is not moving...! What do I do now? Abort these 2 and see what happens with the next couple of WUs? Suspend them and see what happens with the next 2? Give up? regards, Tim
16)
Message boards :
Number crunching :
Help us solve the 1% bug!
(Message 12254)
Posted 19 Mar 2006 by Post: But there are more users that are not farmers and that is why I suggest people use the Display function to look at the graphic. But if, like me, you've installed BOINC as a service, the display option is NOT available. I've had to re-install BOINC as a single user in order to figure out why Rosetta was messing around and failing to complete WUs. (Luckily, I'm very PC-literate, so this wasn't a problem - but for some newbies who have joined this project and THINK they are doing useful work, this could be a real deal breaker if the project doesn't sort itself out - although with Rom doing his bit now, I have much greater faith that this will be resolved soon.) In the meantime, like others, I've lost faith in any new work that I might download, and I have now suspended Rosetta and am crunching more for other projects as a result, as I'm not keen on wasting the processing power at my disposal. It's not a lot, but the reason for joining BOINC was to make my PC do work while the CPU was idle, and having it run Rosetta without generating useful results is a worse scenario than not having BOINC installed in the first place...! I am also going to have to call off next weekend's "Weekend Crunch" for Rosetta, and we'll have to switch our crunching power over to another project, as I cannot accept responsibility for my team crunching for a project that cannot provide work units that can consistently be returned. We'll be back supporting you when you have a solution (which I'm sure will happen soon, but maybe not in time for 25th-26th March!) regards, Tim
17)
Message boards :
Number crunching :
Report stuck & aborted WU here please
(Message 12207)
Posted 18 Mar 2006 by Post: The exciting news is that the Boinc consultant we have hired, Rom, has made an improvement in how Hi David, Well, that ties in with an observation I've made once or twice. I have noticed that Rosetta work units "fail" when my 3 GHz P4/HT switches from one project to another, so there seem to be issues when the Rosetta process is "suspended" by BOINC as it switches over to another project. (I'm running BBC CCE as a second simultaneous BOINC project on the same PC; this "switch-over problem" tends to occur when one "CPU" switches from working on a Rosetta WU to the CCE WU.) Maybe this is a help - but it seems Rom is on the right trail. regards Tim (edit) typo
18)
Message boards :
Number crunching :
Report stuck & aborted WU here please
(Message 12154)
Posted 17 Mar 2006 by Post: This project used to be bullet-proof - what's changed....? GREAT NEWS - Perhaps it might be an idea to let people know there is a problem and to stop making work available until it's fixed - that'll take the pressure off you guys. What sort of percentage of the work returned to you is being trashed by this bug? I would imagine it's fairly high - although, if it were an epidemic failure, I would assume you would have stopped sending out work before now. But surely you must be going into damage-limitation mode by now. Can you afford to lose lots of crunchers? regards and good luck. Tim
19)
Message boards :
Number crunching :
Report stuck & aborted WU here please
(Message 12113)
Posted 16 Mar 2006 by Post: This WU was stuck at 1% for a day, then started "on its own", got to about 70% done, and then BOINC switched over to one of the other projects I'm running (BBC CCE); immediately I got a "computation error" and the percentage went to 100%.
16/03/2006 23:19:42|rosetta@home|Unrecoverable error for result FA_RLXdh_hom025_1dhn__360_62_0 ( - exit code -164 (0xffffff5c))
So, that's more wasted CPU cycles. This project used to be bullet-proof - what's changed....? regards, Tim PS - Our team, with 430+ overall members (and at least 130 already joined up to Rosetta), was going to be concentrating on Rosetta for a "Crunching Weekend" on 25th-26th March; see here: http://www.ukboincteam.org.uk/uk-boinc-team.html If this project doesn't get sorted REAL QUICK, we'll be forced to switch our attention to a different project....!
20)
Message boards :
Number crunching :
Miscellaneous Work Unit Errors
(Message 12052)
Posted 15 Mar 2006 by Post: Report all Work Unit errors on this thread that are NOT -
15/03/2006 00:31:28|rosetta@home|Unrecoverable error for result FA_RLXcc_hom003_1cc8A_359_158_0 ( - exit code -164 (0xffffff5c))
15/03/2006 03:04:02|rosetta@home|Unrecoverable error for result FA_RLXbq_hom005_1bq9A_359_158_0 ( - exit code -1073741819 (0xc0000005))
15/03/2006 11:26:32|rosetta@home|Unrecoverable error for result FA_RLXbk_hom002_1bk2__359_459_0 ( - exit code -164 (0xffffff5c))
Not too happy about getting these errors, but grateful to the project if they can fix it so that all WUs are good and can return useful results. regards, Tim
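The hex value printed after each negative exit code in these log lines is the same number viewed as an unsigned 32-bit pattern (two's complement). A minimal Python sketch, where `exit_code_hex` and `KNOWN` are hypothetical names used only for illustration; it also decodes the exit code -529697949 quoted in the 29 Mar 2006 post. Only two standard Windows codes are labelled; no particular meaning is assumed for -164:

```python
# Minimal sketch: map the negative exit codes in the BOINC log to the hex
# values printed next to them (the 32-bit two's-complement bit pattern).
# Function and table names are hypothetical.

def exit_code_hex(code: int) -> str:
    """Return the unsigned 32-bit hex form of a (possibly negative) exit code."""
    return f"0x{code & 0xFFFFFFFF:08x}"


# Well-known Windows codes; anything else is left unlabelled on purpose.
KNOWN = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xE06D7363: "Microsoft Visual C++ exception ('msc')",
}

for code in (-164, -1073741819, -529697949):
    unsigned = code & 0xFFFFFFFF
    print(code, exit_code_hex(code), KNOWN.get(unsigned, "(no well-known meaning assumed)"))
```

Read this way, -1073741819 is a Windows access violation and -529697949 is an unhandled C++ exception, which is more informative than the bare negative number.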
©2023 University of Washington
https://www.bakerlab.org