Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Andrii Muliar Send message Joined: 10 Nov 05 Posts: 12 Credit: 7,655,243 RAC: 0 |
Is anyone having problems uploading results? Not one of my computers will upload. My Internet access is ok. I noticed that the server status page says that everything is ok. Still nothing will upload. I have had this problem today on two of my computers, which use different internet providers. Other projects are working fine; only Rosetta@home is affected. |
rochester new york Send message Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0 |
Is anyone having problems uploading results? Not one of my computers will upload. My Internet access is ok. I noticed that the server status page says that everything is ok. Still nothing will upload. Plus, the TFLOPS estimate went from over 100 down to 30. |
TJ Send message Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
While, on occasion, this project has encountered connectivity problems (perhaps a few times a year), it is relatively rare for this project in the distributed processing world. My experience with the communication from the admins of the Rosie project is not good, BarryAZ. Even private messages don't get an answer, or at times get one only after a week or so. However, I find this project very useful and will stick with it, but communication with the crunching community could be a lot better. Greetings, TJ. |
dcdc Send message Joined: 3 Nov 05 Posts: 1832 Credit: 119,860,059 RAC: 7,494 |
As Polian has just posted in another thread: Network Outages: As part of the UW's continuing datacenter consolidation, the network topology upon which Rosetta@home is run was changed yesterday. Since that time we've been shaking out the various hiccups that result from changing things in such a busy system. We, the IT crew, apologize for the troubles and will try to get them ironed out as soon as we can. We appreciate your patience and your continued contributions to our research efforts. -KEL From the front page |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Maybe you should try to play a little bit with settings? Go to Preferences > Disk and memory usage. Then check/uncheck "Leave applications in memory while suspended" and try to decrease time for "Tasks checkpoints to disk every:" (I put here 600 seconds). (...) ...and so it sounds like you are of the understanding that suspending a work unit at any random point in time will force a checkpoint to be preserved on disk. It doesn't work that way. Sorta like trying to force a pregnant woman to give birth, best to wait for when the baby is ready. BOINC applications have to write specific and complex logic into their code to take checkpoints and to be able to properly reestablish themselves from them. Some types of Rosetta work units checkpoint more frequently than others. No setting can force a checkpoint; the setting can only prevent checkpoints from occurring too frequently (based on the interval you provide for how often you want to permit writes to disk). So if an application were trying to checkpoint every 30 seconds, you might set that to 10 minutes or so to skip most of those checkpoints and help your machine run smoother for the way you use it. Rosetta Moderator: Mod.Sense |
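To illustrate the point about checkpoints being taken by the application rather than forced by a setting, here is a minimal Python sketch. It does not use the real BOINC API; `CheckpointingTask`, `maybe_checkpoint`, and `min_interval_s` are hypothetical names invented for this example. The task reaches "safe points" on its own schedule, and the user's interval can only veto a write that would come too soon after the previous one.

```python
import json
import os
import time


class CheckpointingTask:
    """Toy long-running task (hypothetical, not BOINC code)."""

    def __init__(self, path, min_interval_s, clock=time.monotonic):
        self.path = path
        # Stands in for the "checkpoint to disk at most every N seconds" preference.
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last_write = clock()
        self.step = 0
        self.total = 0

    def restore(self):
        """Resume from the last saved state, if any was ever written."""
        if os.path.exists(self.path):
            with open(self.path) as f:
                state = json.load(f)
            self.step, self.total = state["step"], state["total"]

    def maybe_checkpoint(self):
        """Called by the task at a safe point; the setting can only skip the write."""
        now = self.clock()
        if now - self.last_write < self.min_interval_s:
            return False  # too soon since the last write, so skip this one
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"step": self.step, "total": self.total}, f)
        os.replace(tmp, self.path)  # swap in the new file in one step
        self.last_write = now
        return True

    def run(self, n_steps, safe_point_every):
        """Do the work; only the task decides where safe points are."""
        while self.step < n_steps:
            self.total += self.step
            self.step += 1
            if self.step % safe_point_every == 0:
                self.maybe_checkpoint()
        return self.total
```

In this sketch, a task that never calls `maybe_checkpoint` never saves anything, no matter how small `min_interval_s` is, which is why an interruption sends such a work unit back to the start.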
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Maybe you should try to play a little bit with settings? Go to Preferences > Disk and memory usage. Then check/uncheck "Leave applications in memory while suspended" and try to decrease time for "Tasks checkpoints to disk every:" (I put here 600 seconds). (...) I believe that's dependent on which model of computer you are using, and whether the workunits are set up to recover from long delays during timeout checks. For my computers, hibernate/sleep while BOINC is suspended usually allows the workunits to resume properly when I'm ready to resume. I'm not sure I've tried it with Rosetta@Home workunits, though. Turning the computer off, by any means other than sleep/hibernate, removes the possibility of recovering from such a sleep/hibernate. So does anything that removes BOINC and the workunits from memory. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Is anyone having problems uploading results? Not one of my computers will upload. My Internet access is ok. I noticed that the server status page says that everything is ok. Still nothing will upload. Same here, on both my desktops. At least I am connected to several other BOINC projects as well, so I can get enough workunits from them. Do the people running the Rosetta@Home servers WANT any more workunits run during the next few days? It looks like they don't. |
Cutchet Salvador Send message Joined: 1 Feb 10 Posts: 17 Credit: 10,690,439 RAC: 0 |
J.S., I don't believe this is a fault of your N150. For some time there have been WUs that do not generate checkpoints, for example hyb_xx, ebolanator_xx, rb_xx and others. So whenever you restart your N150, the work will begin from 0, not because it was not kept in memory but because no checkpoint was saved, and in that way we are losing hours of completed work. Today in Barcelona (Catalunya, Catalonia) it is raining heavily and there have been several brief power cuts, and as a result several WUs have restarted from 0 each time. I have lost 23 hours of work so far. Greetings and patience, Salvador |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
J.S., I don't believe this is a fault of your N150. For some time there have been WUs that do not generate checkpoints, for example hyb_xx, ebolanator_xx, rb_xx and others.... You might want to check if your computer has the sleep/hibernate feature. Mine do, and I have added a UPS (uninterruptible power supply) for each, so that they can run from a battery for a short time and then copy the entire contents of the memory to the hard drive. Later, the computer can resume from the saved copy of the memory, instead of a normal reboot. If the workunits were designed properly, they can resume from the point of interruption rather than from the beginning. |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
This won't help you, but I have read somewhere, in connection with another project, that hibernating a PC (under Windows) will result in strange behavior of BOINC and that tasks error out eventually. (...) Unless something requires a reboot, I only hibernate both my computers (the two that are not running 24/7), and I have had no issues with any of the projects I crunch for. The only thing I learned recently, from one timed-out WU, is that you have to make sure BOINC is not requesting work at the moment you hibernate the computer; Rosetta apparently does not have the "resend lost tasks" feature active. |
dima Send message Joined: 26 Nov 09 Posts: 2 Credit: 2,689,921 RAC: 0 |
hyb_al_09_bench_3rj8A_SAVE_ALL_OUT_IGNORE_THE_REST_61028_1254 reaches 100% and doesn't complete, and the CPU is not being used. If I restart the boinc-client service, it runs from 0% to 100% again and then nothing else happens. LA - null. I aborted this task. |
dima Send message Joined: 26 Nov 09 Posts: 2 Credit: 2,689,921 RAC: 0 |
hyb_al_09_bench_3rj8A_SAVE_ALL_OUT_IGNORE_THE_REST_61028_1254 The problem persists with hyb_al_02_bench_2yeqB_SAVE_ALL_OUT_IGNORE_THE_REST_60648_3065_0 |
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
hyb_ai_bench_4adyB_SAVE_ALL_OUT_IGNORE_THE_REST_58035_47 My mac (BOINC 6.12.33) ended with Outcome: Success; Client state: Done; Exit status: 0(0x0) but the following in the stderr out: BOINC:: CPU time: 36269.7s, 14400s + 21600s[2012-10-29 7:18:59:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. Writing W_0000001 The watchdog ended it and I received the default one model/20 credits. On my wingman's windows machine the workunit ended with a client error within a few seconds of starting though it should be noted that all 40 of his most recent tasks have failed so his failure might not be related to the workunit. Best, Snags |
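One possible reading of the `CPU time: 36269.7s, 14400s + 21600s` line in the stderr output above, offered as an assumption since the log does not label its figures: the two values are a target run time and an extra allowance, and the watchdog steps in once CPU time passes their sum. The function and parameter names below are hypothetical; the sketch only checks the arithmetic.

```python
def exceeded_limit(cpu_time_s, target_s, allowance_s):
    """Return True if CPU time is past the assumed target + allowance limit."""
    return cpu_time_s > target_s + allowance_s


# Figures copied from the stderr line; their meaning is an assumption.
limit_s = 14400 + 21600          # 4 hours + 6 hours = 36000 seconds
print(limit_s)
print(exceeded_limit(36269.7, 14400, 21600))
```

Under that reading, 36269.7 s is about 270 s past the 36000 s limit, which would be consistent with the watchdog ending the task.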
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 26,468,240 RAC: 18,058 |
I have tons of errors with the hyb_.._bench_ series of WUs, the same as described above or in my message from 1.5 months ago: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6055&nowrap=true#73741 I abort all WUs from this series in my queue. |
Andrea [E.R.] Send message Joined: 4 Jul 11 Posts: 3 Credit: 180,074 RAC: 0 |
Hi!!! A member of the Boinc.Italy Team reported this error: boinc.bakerlab.org/rosetta/workunit.php?wuid=484405926 (too old, but with the same problem as the following) boinc.bakerlab.org/rosetta/workunit.php?wuid=489955188 The WU was re-sent close to another cruncher's deadline, but it wasn't validated after my teammate completed it. Is it a bug??? Thanks!!! |
dcdc Send message Joined: 3 Nov 05 Posts: 1832 Credit: 119,860,059 RAC: 7,494 |
Hi!!! I think there's a script that's run daily to pick up these tasks that were handed out close to the deadline and assign credit to them. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
hyb_ai_bench_4adyB_SAVE_ALL_OUT_IGNORE_THE_REST_58035_47 I've been getting that crap on and off in my tasks as well. They come out to just 20 credits when the claim is for 197 or so credits. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I'm starting to get a little fed up with this project's stupid errors and other odds and ends of incomplete data files and gzip errors, and with it giving me only 20 credits for something that should be 150+ credits just because they don't want to upgrade their code to fit the new BOINC Manager program. I also don't understand the lack of communication from this team. They must be hibernating under their desks somewhere or don't know how to write. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I would take the existence of this thread to be contrary to your assertions. Please don't attempt to characterize people you've never met. Rosetta Moderator: Mod.Sense |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,786 RAC: 1,319 |
I would take the existence of this thread to be contrary to your assertions. Please don't attempt to characterize people you've never met. True, but you must agree that these errors have been happening on a regular basis for some people for a LONG time! It is getting frustrating, and even when the Project Admins say they will look into it, they say "in a couple of weeks when I have more time". We put our time and energy and MONEY into running our PCs FOR Rosetta and get little to NO help in return when we have problems! You and I had a conversation a while back about how Rosetta is happy the way things are; things haven't changed, and yet we users are STILL hoping for a change. Some of us BELIEVE in the idea of Rosetta, some are here for the credits and some are here for other reasons, but whatever the reason, when the software 'just works' everywhere else yet works SOOOOO badly here, it is VERY FRUSTRATING for some of us!!! |
©2024 University of Washington
https://www.bakerlab.org