Message boards : Number crunching : Please abort WUs with
Author | Message |
---|---|
R/B Joined: 8 Dec 05 Posts: 195 Credit: 28,095 RAC: 0 |
None of you understood what I meant. I said the posting on the HOME THREAD alerting people to the problems with the 205 work units, not my own postings! Geez. I know how to edit those. Go to the Rosetta home page and see how it is written. It should say in ALL CAPS not to abort DEFAULT units other than the 205 ones.
(edit) This is the posting I was talking about, from the HOME PAGE:
***************************************************
News, December 20, 2005: A bad batch of work units was created that can be identified by work unit names that start with "DEFAULT_xxxxx_205_". If you are running one of these work units, please abort it. We will grant credit to those who have run and aborted these work units. Details about this error and recent changes can be found on our Technical News page.
Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
"None of you understood what I meant. I said the posting on the HOME THREAD alerting people to the problems with the 205 work units.."
We understood what you said, but then the comment "Only mods can re-write existing postings." was made, and I spoke up to say that no, as far as I know, nobody can modify existing _postings_ (or thread titles, as THIS THREAD's title originally referred to "205"s) except the person who wrote them (and only for an hour), and that only STAFF can change the home page, not a mere mod. The home page does say "start with DEFAULT_xxxxx_205_" and not "start with DEFAULT", but I agree that it can be confusing: it can be read as "starting with 205 and continuing with other numbers" instead of "starting with this exact string of characters, including specifically the '205'", so it should be expanded on. However, by the time someone is there to change the wording, they'll also be there to delete the WUs, so there's no point... |
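To illustrate the literal-prefix reading Tern describes (only names beginning with DEFAULT, then one ID field, then exactly "_205_"), here is a minimal Python sketch. It assumes the "xxxxx" field never contains an underscore, as with the protein codes seen in this thread (1n0u, 1ogw); the function name `is_bad_205_unit` and the sample name `DEFAULT_1n0u_205_344_8` are hypothetical, made up for illustration.

```python
import re

# Assumption: "xxxxx" in the home-page notice is a single field with no
# underscores (e.g. a protein code like "1n0u"), so it is matched as one
# or more non-underscore characters. "_205_" must appear literally, so
# batch numbers such as 2050 or 218 do not match.
BAD_BATCH = re.compile(r"^DEFAULT_[^_]+_205_")


def is_bad_205_unit(wu_name: str) -> bool:
    """Return True only for names that start with DEFAULT_<id>_205_."""
    return BAD_BATCH.match(wu_name) is not None


if __name__ == "__main__":
    names = [
        "DEFAULT_1n0u_218_344_8",           # DEFAULT, but batch 218: keep
        "DEFAULT_1n0u_205_344_8",           # hypothetical 205 name: abort
        "INCREASE_CYCLES_10_1ogw_226_937",  # not a DEFAULT unit: keep
    ]
    for name in names:
        print(name, "abort" if is_bad_205_unit(name) else "keep")
```

Under this reading, a DEFAULT unit from any other batch (like the 218 one) is left alone.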
Ingleside Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
"I don't think there is anyone physically _at_ Rosetta today that knows how to kill them; I think that probably would require getting info from someone at SETI that has had to do it a few times. If there's anyone _there_... U.S. colleges are all on break."
Well, since BOINC is open source, it took under a minute to locate http://setiathome2.ssl.berkeley.edu/cgi-bin/cvsweb.cgi/boinc/html/ops/
From a quick look, if the project has set up the administrator pages, it's just a matter of logging in there, selecting "Cancel workunits" from among the many options, entering the first and last WU ID, and letting the server do its job of cancelling the WUs...
As for SETI@Home, well, they've just had their routine backup outage, so they're definitely still working, and BOINC check-ins can happen even on a holiday... |
rbpeake Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
Certainly true, but I think two factors contributed to no one checking in:
1. The project is fairly new for them, and I'm guessing they did not fully realize the TLC (tender loving care) it might require;
2. They worked really intensely up to the holidays, and I am guessing that DB told everyone to take a real vacation/total break and not even think about the project.
At any rate, it is only a relatively short period until they are back again next week. :)
Regards,
Bob P. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
"At any rate, it is only a relatively short period until they are back again next week. :)"
They haven't completely abandoned ship for the week; I've seen a couple of postings made... I think it's just a matter of not having dealt with all the different crises SETI has dealt with, and being hesitant to do anything without being sure it's not going to make matters worse. I wasn't aware that the server-side stuff could even _be_ web-controlled; I sure wouldn't want to do it for the first time from home or a hotel room or something, without being able to refer to all the notes, and maybe do a backup first... |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
"At any rate, it is only a relatively short period until they are back again next week. :)"
Too right. We are all grateful to the team for taking time out from their break to turn up on these boards. It is kinda like phoning into the office when you are on vacation: effort beyond the call of duty. But just like phoning in, there is only so much you can do remotely. When they are back in the office I am sure we will see real progress on all the outstanding issues. I am hoping the first thing the server admins do is to pull the files belonging to all the bad jobs it keeps recycling off the server, so the server can't send them out again even if it tries to.
River~~ |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
Anyone with a "suspended" DEFAULT_xxxxx_205 please check the webpage for your results, and look at that one - if the "errors" line at the top says "Cancelled", you can unsuspend it and abort it. That will let it get back to the server and be finished. Thanks! |
[B@H] Ray Joined: 20 Sep 05 Posts: 118 Credit: 100,251 RAC: 0 |
226 also has some bad ones; check this one, it runs 5 to 8 hours before erroring out.
INCREASE_CYCLES_10_1ogw_226_937
Reason: Access Violation (0xc0000005) at address 0x006047A8 write attempt to address 0x08567DA4
Exiting...
Cheers
Ray
Pizza@Home Rays Place Rays place Forums |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
"226 also has some bad ones; check this one, it runs 5 to 8 hours before erroring out."
That one does have _some_ problem... but it's not the same as the DEFAULT_xxxx_205's. They error out because of maximum_cpu_time_exceeded. And I've had a number of those "INCREASE_CYCLES" WUs that completed just fine, although on my PC they ran about 4 hours instead of 2. |
rbpeake Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
"That one does have _some_ problem... but it's not the same as the DEFAULT_xxxx_205's. They error out because of maximum_cpu_time_exceeded. And I've had a number of those "INCREASE_CYCLES" WUs that completed just fine, although on my PC they ran about 4 hours instead of 2."
Here is one I got: NO_BARCODE_FRAGS_1ogw_227_2815
<core_client_version>5.2.15</core_client_version>
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
No heartbeat from core client for 31 sec - exiting
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C911BF4 write attempt to address 0x00000000
Exiting...
Plus no credit received for slightly over 5 hours of work.
Regards,
Bob P. |
[B@H] Ray Joined: 20 Sep 05 Posts: 118 Credit: 100,251 RAC: 0 |
"226 also has some bad ones; check this one, it runs 5 to 8 hours before erroring out."
I have had other "INCREASE_CYCLES" units that ran fine too, but as you say they took a lot longer, between 3 and 6 hours. I don't mind them running longer as long as they finish.
Ray
Pizza@Home Rays Place Rays place Forums |
kb7rzf Joined: 7 Oct 05 Posts: 16 Credit: 35,427 RAC: 0 |
I also just got an error for result MORE_FRAGS_1di2_222_4350_0. stderr out:
<core_client_version>5.2.13</core_client_version>
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
No heartbeat from core client for 31 sec - exiting
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C911E58 read attempt to address 0xBE02E900
Exiting...
</stderr_txt>
Here's the link to the WU: Here
(edit) Here's the message from my Messages tab:
1/2/2006 9:08:32 AM|rosetta@home|Resuming result MORE_FRAGS_1di2_222_4350_0 using rosetta version 481
1/2/2006 9:08:32 AM|SETI@home|Pausing result 03no03aa.21211.32001.292318.1.183_3 (left in memory)
1/2/2006 9:29:12 AM|rosetta@home|Unrecoverable error for result MORE_FRAGS_1di2_222_4350_0 ( - exit code -1073741819 (0xc0000005))
1/2/2006 9:29:13 AM|SETI@home|Result 03no03aa.21211.32001.292318.1.183_3 exited with zero status but no 'finished' file
1/2/2006 9:29:13 AM|SETI@home|If this happens repeatedly you may need to reset the project.
1/2/2006 9:29:13 AM||request_reschedule_cpus: process exited
1/2/2006 9:29:13 AM|rosetta@home|Computation for result MORE_FRAGS_1di2_222_4350_0 finished |
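For anyone puzzled by the two numbers in these logs: -1073741819 and 0xc0000005 are the same 32-bit value, printed once as a signed integer and once in hex. 0xC0000005 is the Windows access-violation status, which matches the "Access Violation" line in the stderr output. A minimal Python sketch of the conversion; the helper names `to_hex_status` and `to_signed_exit_code` are hypothetical:

```python
# Hypothetical helpers: convert between the signed exit code the client
# prints and the hex status code shown next to it in the same log line.

MASK_32 = 0xFFFFFFFF


def to_hex_status(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status."""
    return f"0x{exit_code & MASK_32:08x}"


def to_signed_exit_code(status: int) -> int:
    """Reinterpret an unsigned 32-bit status as a signed exit code."""
    status &= MASK_32
    # If the top bit is set, the value is negative in two's complement.
    return status - (1 << 32) if status & 0x80000000 else status


if __name__ == "__main__":
    print(to_hex_status(-1073741819))       # 0xc0000005
    print(to_signed_exit_code(0xC0000005))  # -1073741819
```

So every "exit code -1073741819" in this thread is the same access-violation crash, not a separate kind of error.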
bartsob5&alicjam Joined: 17 Sep 05 Posts: 6 Credit: 183,280 RAC: 0 |
And I'd like to ask: what about WUs from the NEW_SOFT_CENTROID_PACKING_2reb_225 series? After 9 hours mine is still at 1%!!!! The full name of the workunit is NEW_SOFT_CENTROID_PACKING_2reb_225_3842 |
kb7rzf Joined: 7 Oct 05 Posts: 16 Credit: 35,427 RAC: 0 |
"And I'd like to ask: what about WUs from the NEW_SOFT_CENTROID_PACKING_2reb_225 series? After 9 hours mine is still at 1%!!!! The full name of the workunit is NEW_SOFT_CENTROID_PACKING_2reb_225_3842"
I have a WU called NEW_SOFT_CENTROID_PACKING_2reb_225_4338_0; it has run for 29 minutes and is at 10% done. But then my computer switched to crunching more SETI WUs, so I don't know yet whether it's a bad WU. I will keep an eye on it and see if anything strange happens.
Jeremy |
bartsob5&alicjam Joined: 17 Sep 05 Posts: 6 Credit: 183,280 RAC: 0 |
OK... it went back to normal after rebooting the PC, so it was only a false alarm. Now, after 1 hour, it is at 30%. |
bartsob5&alicjam Joined: 17 Sep 05 Posts: 6 Credit: 183,280 RAC: 0 |
And again... I had a WU named MORE_FRAGS_1ogw_222_4890, and it hit an error during computation after almost 2 hours. As I see, another user also had a problem with this WU. Is it another bad series? |
Padanian Joined: 27 Sep 05 Posts: 14 Credit: 15,190 RAC: 0 |
Have a look at this: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3833695 It seems like a recursive computing error. |
O&O Joined: 11 Dec 05 Posts: 25 Credit: 66,900 RAC: 0 |
I aborted DEFAULT_1n0u_218_344_8... for the reason in bold. Did I do the right thing?
O&O |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
"Did I do the right thing?"
Your computers are hidden, so we can't look at the WU, so I have no idea... I _think_ so, but I would have to look at the web page to be sure. |
O&O Joined: 11 Dec 05 Posts: 25 Credit: 66,900 RAC: 0 |
|
©2024 University of Washington
https://www.bakerlab.org