Message boards : Number crunching : 3.43 is causing pop-ups
Author | Message |
---|---|
KWSN - Roger the Shrubber Joined: 16 Sep 07 Posts: 2 Credit: 9,134,942 RAC: 0 |
Doing it to me, too. |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,904,339 RAC: 11,276 |
It seems a standard way to do this exists: http://boinc.berkeley.edu/trac/wiki/PhysicalFileManagement http://boinc.berkeley.edu/trac/wiki/BoincFiles BOINC allows apps to write and read data directly in the project folder, and this is recommended in situations like the R@H database ("Your application uses a large number of files, and you supply these as a single archive that is unpacked by your application"). So you need to find a way to unpack it into the project folder only once (at each app update) and delete it after the app version (and the corresponding database revision) becomes obsolete, since the BOINC client does not manage such files automatically. The deletion is preferable but optional. http://boinc.berkeley.edu/trac/wiki/DeleteFile is the standard way to delete old files on the BOINC client by a command from the server. If you cannot find an "elegant" way to unpack the database only once per app version, a simple workaround is to unpack the DB at the start of each WU (as it works now) but into the project folder instead of the slot folder (where files are deleted automatically when the WU finishes), and to check before unpacking whether the database folder already exists. If it exists, skip the unpack. The database folder would need to be renamed to include the DB revision (like minirosetta_database_rev52077) to avoid mixing up files at an app version change (when two app versions and database revisions are stored in the same folder for a short time). |
Jonathan Brier Joined: 1 Dec 05 Posts: 12 Credit: 2,732,333 RAC: 0 |
"Could you guys possibly send abort commands, from the server, for 3.43 jobs? There are many computers that aren't monitored 24x7 by people." I agree with KSMooney about the server task abort. Setting up a process that ensures only quality work units are in the pipeline is essential. Removal of bad work units and applications should occur upon their detection. Letting units just "flush out" is not something a nontechnical audience handles well, and not removing these issues causes the perception of volunteer computing to suffer. At GridRepublic and Charity Engine we receive complaints and many support tickets when issues are left to flush through projects. Taking advantage of the server cancel mechanism is important for guaranteeing the user experience once a problem is detected. Leaving issues in place makes our brands suffer and lowers the number of devices potentially available to Rosetta@home and other projects, as it can damage the perception of the overall system. Misbehaving work units are one reason why Charity Engine removed Rosetta@home as a back-fill research project. We need projects that ensure only high-quality units run on the volunteer machines and that any misbehaving units are removed at the first sign of an issue. There were instances where work units were not exiting properly on machines and were taking up too much RAM... we never did figure out the source of the issue. GridRepublic - bringing BOINC mainstream: http://www.gridrepublic.org GridRepublic Fan Page: http://www.facebook.com/GridRepublic Progress Thru Processors Facebook: http://www.facebook.com/progressthruprocessors |
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
Got up this morning to multiple windows on the desktop. I repaired BOINC and reset the project. Now I can't get work. Houston....we have a problem! |
Wayne Mattox Joined: 12 Apr 09 Posts: 1 Credit: 4,880,139 RAC: 0 |
Since no one has stated it: after deleting the 3.43 jobs, 3.45 seems to be running fine. Thanks |
Jacob Klein Joined: 3 Jul 07 Posts: 15 Credit: 7,098,747 RAC: 0 |
David, could you please look into why Ralph is still sending out busted 3.43 tasks? http://ralph.bakerlab.org/forum_thread.php?id=539#5615 I've lost days' worth of crunching because of this. Also, please consider implementing and executing a server-side abort. People are uninstalling BOINC, repairing BOINC, writing bug reports, wasting tons of CPU cycles across all projects, all for a problem with your projects that you guys could have already cleaned up if you really wanted to. This is ridiculous. |
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,070,625 RAC: 1,219 |
"Why not abort these tasks as has been reported as the fix on the front page and all over these forums?" Because that isn't really a "solution". You simply forget that not all machines crunching for Rosetta@Home are easily accessible. While I could do just that, aborting the batch of faulty WUs on one of my own hosts, it created a mess on two remote systems on which I had permission to run it in the past. Not any longer, as those users felt interrupted and I had to remove BOINC/R@H from those systems once I got on-site... Sorry, but if for whatever reason such faulty WUs make it out "into the wild", there needs to be a way to do a server-side abort on them that does not require user interaction. WCG can do this just fine... Ralf |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,904,339 RAC: 11,276 |
"David," I am not David, but I can explain. Ralph is not actually sending any NEW WUs for the 3.43 app. These are resends after a failure on another computer. (Up to 500 such WUs are left.) I also received a batch of WUs for version 3.43. Checking them, I was convinced that all of them are resubmitted WUs that hit a computer with Windows x64 as the first wingman. |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,904,339 RAC: 11,276 |
And I join the request for remote (server-side) removal of the WUs for version 3.43. In our team alone, more than 50 machines are connected to R@H with no access (direct or remote) to the managing client. And the errors with pop-up windows continue as the corresponding WUs move up the queue. For some computers (with a large task cache in the BOINC settings) the pop-up problem is even still in the future, as they are only now completing the last WUs for version 3.41. I do not know whether there is a regular way to cancel, remotely (server-side), WUs already issued to clients. But even if there is not, you can just replace the file that is causing the problems (https://boinc.bakerlab.org/rosetta/download/minirosetta_3.43_windows_x86_64.exe) with a correct app file and instruct the BOINC client (all clients connected to R@H, or just those with a Windows x64 OS) to delete the local copy (http://boinc.berkeley.edu/trac/wiki/DeleteFile). That will force BOINC to re-download the correct app file. |
Bart Joined: 8 Oct 11 Posts: 2 Credit: 476,125 RAC: 0 |
So is 3.4.5 I have suspended and no new jobs. |
Stephen Miller Joined: 18 Sep 05 Posts: 13 Credit: 16,294,215 RAC: 0 |
A BOINC project message would have been a good way to let everyone (who's running a current BOINC, anyway) know about the announcement that's on the main page. I updated BOINC about the same time Rosetta went into stupid mode and thought it was a new "feature" of BOINC. Today I see that it's a 3.43 issue. Thanks to Microsoft training, we have learned to live with bugs and issues until they are resolved. I'm embarrassed. Never in the field of human conflict was so much owed by so many to so few - Winston Churchill BOINC version: Never in the field of distributed computing was so much wasted by so many for so few credit. |
Doug_Hood Joined: 15 Dec 05 Posts: 2 Credit: 3,416,526 RAC: 0 |
I am just going to say this to any of the moderators/Rosetta staff who might still be listening: If anything like this 3.43 fiasco happens again, I will drop Rosetta. Here is a big F'N hint: BOINC has a message feature. USE IT. If I don't see a message in BOINC the next time that something is wrong with Rosetta, I will delete it immediately from my project list and move on. Period. |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 25,904,339 RAC: 11,276 |
Stephen Miller, Doug_Hood: if you mean the Messages tab in the BOINC client (Notices), that would be nice. But they (the R@H staff) cannot use this feature, because the version of the BOINC server software currently used in the R@H project does not support it. They would first have to update (replace) all the server software with a new version, and the project team does not want to do this (as it is quite difficult and could potentially cause other problems). |
Gary Charpentier Joined: 2 Oct 07 Posts: 3 Credit: 6,474,275 RAC: 171 |
"If you mean the Messages tab in the BOINC client (Notices), that would be nice. But they (the R@H staff) cannot use this feature, because the version of the BOINC server software currently used in the R@H project does not support it. They would first have to update (replace) all the server software with a new version, and the project team does not want to do this (as it is quite difficult and could potentially cause other problems)." What are they, undergrads? Too busy with coffee and donuts? Do they like security holes? Do they want people mad at them? Can't afford backup storage? |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,536,330 RAC: 6,139 |
"If you mean the Messages tab in the BOINC client (Notices), that would be nice. But they (the R@H staff) cannot use this feature, because the version of the BOINC server software currently used in the R@H project does not support it. They would first have to update (replace) all the server software with a new version, and the project team does not want to do this (as it is quite difficult and could potentially cause other problems)." Are you suggesting that you think the server upgrade is trivial? If so, what are you basing that on? |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
Indeed, the BOINC server code at Rosie is outdated. You can see that in several things. You cannot see all your different PCs for the project. You cannot sort per application or per type of task. The link in BOINC that should bring you to your "home account" doesn't do that. And indeed the message tab is not working. For active crunchers Rosetta can indeed be a pain, with not the best support. However, I'm here for the science it does. I have a lot of criticism of the project as well, but I'll stick with it. Greetings, TJ. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"You cannot see all your different PCs for the project." Hmm... I can see all my PCs when I click here, should work for everyone. |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
"You cannot see all your different PCs for the project." Of course we can, but that is not what I meant. With the new BOINC server code you can see, on the "all tasks" page, all the PCs running for the project. Greetings, TJ. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"You cannot see all your different PCs for the project." THAT page you mean... yeah, there I don't see them either. |
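
The workaround Mad_Max describes earlier in the thread (unpack the database into the project folder, in a folder named after the DB revision, and skip the unpack when that folder already exists) could be sketched roughly as below. This is only an illustration in Python, not the project's code: the function name `ensure_database`, the zip archive format, and the folder layout are assumptions, and the unpack-to-a-temporary-folder-then-rename step is an addition of this sketch (so that a half-finished unpack is never mistaken for a complete database), not something proposed in the post.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def ensure_database(project_dir: Path, archive: Path, revision: str):
    """Unpack `archive` once per DB revision; return (folder, unpacked_now).

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    project_dir.mkdir(parents=True, exist_ok=True)
    # Folder name includes the DB revision, so two app versions can coexist.
    target = project_dir / f"minirosetta_database_rev{revision}"
    if target.is_dir():
        # An earlier work unit already unpacked this revision: skip.
        return target, False

    # Assumption of this sketch: unpack into a temporary sibling folder and
    # rename it when done, so an interrupted unpack never looks complete.
    staging = Path(tempfile.mkdtemp(prefix=target.name + ".tmp", dir=project_dir))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(staging)
    try:
        staging.rename(target)
    except OSError:
        # Another task may have finished the same unpack first.
        shutil.rmtree(staging, ignore_errors=True)
        if not target.is_dir():
            raise
        return target, False
    return target, True
```

Every work unit would call the helper at startup; only the first call for a given revision pays the cost of unpacking, and folders for obsolete revisions would still have to be removed separately, as the post notes.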
©2024 University of Washington
https://www.bakerlab.org