Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Diogo Azevedo Send message Joined: 30 Mar 09 Posts: 7 Credit: 11,321 RAC: 0 |
All we can get is a static black screen showing the scientific sub-frames involved ('showing', 'accepted', 'low energy', etc.) but... with no active figures at all. Hello again. And thanks again too. :-) It seems that "ab_11_29_optpps_T6241_optpps_03_09_35686_237513_0" is working fine now... Hmm... let's see if it stays that way... :-) D_Azevedo |
Diogo Azevedo Send message Joined: 30 Mar 09 Posts: 7 Credit: 11,321 RAC: 0 |
...and 'voilà'... ab_11_29_optpps_T6241_optpps_03_09_35686_237513_0 just crashed the OS window again... :-( :-( D_Azevedo |
Zoness Send message Joined: 24 Dec 08 Posts: 1 Credit: 2,049,191 RAC: 62 |
1/22/2012 3:03:06 AM | rosetta@home | Requesting new tasks for CPU Any idea of when this will be fixed? No firewall interference or IP blocking on my end and according to the server status page there is plenty of work. Thoughts? |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 26,465,104 RAC: 18,145 |
Yes, in the past few days I have also seen periodic problems with uploading and downloading files. But they usually resolve on their own within ~1 day without my intervention... |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 26,465,104 RAC: 18,145 |
Why are the data files ECH13_hisremstart_lrP .... 3mers and ECH13_hisremstart_lrP_ ..... 9mers in WU series like prednorlx_ECH13_hisremstart... not compressed as usual? They are big: ~8 + 24 = 32 MB per WU. Because of these WUs, Internet traffic has increased greatly over the last week (up to 500-1000 MB per day on CPUs with a large number of threads). This is a great inconvenience for participants who do not have an unlimited Internet connection; some have had to stop crunching (or switch to other projects). Standard zip compression shrinks these files roughly 3 times (from ~32 MB to ~11 MB). Good compression (like rar) shrinks them roughly 5 times (from ~32 MB to ~6 MB). |
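As a rough check of the figures in the post above, here is a minimal Python sketch of the arithmetic. The file sizes (8 + 24 MB raw, ~11 MB zipped, ~6 MB with rar) come from the post; the number of work units per day is a hypothetical input, and the calculation assumes the worst case in which every work unit downloads its own copy of the files.

```python
# Sizes quoted in the post, in megabytes.
RAW_MB = 8 + 24   # 3mers file + 9mers file, uncompressed
ZIP_MB = 11       # approximate size after standard zip compression
RAR_MB = 6        # approximate size after stronger (rar-like) compression


def daily_traffic_mb(wus_per_day: int, mb_per_wu: float) -> float:
    """Worst-case download volume per day: every WU fetches its own files."""
    return wus_per_day * mb_per_wu


def compression_ratio(raw_mb: float, compressed_mb: float) -> float:
    """How many times smaller the compressed file is than the raw one."""
    return raw_mb / compressed_mb


if __name__ == "__main__":
    wus_per_day = 24  # hypothetical: a many-threaded host finishing 24 WUs a day
    for label, size in (("raw", RAW_MB), ("zip", ZIP_MB), ("rar", RAR_MB)):
        traffic = daily_traffic_mb(wus_per_day, size)
        ratio = compression_ratio(RAW_MB, size)
        print(f"{label}: {traffic:.0f} MB/day ({ratio:.1f}x smaller than raw)")
```

With the hypothetical 24 work units per day, the uncompressed case lands inside the 500-1000 MB per day range reported in the post.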
Flosopher Send message Joined: 3 Jun 06 Posts: 1 Credit: 20,125 RAC: 0 |
Hey Mad Max, these are my jobs; I apologize for the inconvenience. I've just recently started running some of my calculations on R@H and it looks like I forgot this in my setup scripts. I'll make sure to fix it in future calculations of this type. Again, my apologies, and thank you so much for donating your CPU cycles to help out with our research. regards, florian Why in the WUs series like prednorlx_ECH13_hisremstart... |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Mad Max, thank you for pointing out the impact those large files are having. Since you have many threads running, it tends to mean you have more WUs on board at a time. The way BOINC works, if ANY of your current tasks needs a file, then the file stays around and is not deleted. This avoids having to download it again, and helps in situations where people have lower bandwidth or limited hours available. So I just wanted to offer a suggestion that may improve things for you in the short term, and even longer term, because a compressed 10 MB file is still a lot to download. If you increase the amount of work that your machine keeps on hand, then you will tend to improve the odds that some ONE of your tasks needs the large file, and so BOINC will keep it around. To do that, increase the number of days of additional work that you have set up in your BOINC Network preferences. You still have to download new files. And it is not uncommon for a very similar file to be used by later work units. But the above is a simple way to improve your odds of avoiding downloading the same file more than once. ...another way to achieve the result would be to set up a caching proxy server, but that is more involved. It would, however, keep the file locally even longer. So if BOINC finds itself with no such work units and removes the file, then later gets more of those tasks, it would go through the proxy and find the file there instead of doing a new download. Rosetta Moderator: Mod.Sense |
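The retention rule described in the post above can be illustrated with a toy model in Python. This is only a sketch of the behaviour as the moderator describes it (a file is kept while any task on the host still needs it), not BOINC's actual implementation; the class `FileCache`, its methods, and the task and file names are all hypothetical.

```python
class FileCache:
    """Toy model: a file stays on disk while at least one task still needs it."""

    def __init__(self):
        self.tasks = {}      # task name -> set of file names it needs
        self.files = set()   # files currently on disk
        self.downloads = 0   # number of downloads performed so far

    def add_task(self, task, needed_files):
        # Download only the files that are not already on disk.
        for name in needed_files:
            if name not in self.files:
                self.files.add(name)
                self.downloads += 1
        self.tasks[task] = set(needed_files)

    def finish_task(self, task):
        # Drop the task, then delete files no remaining task refers to.
        self.tasks.pop(task)
        still_needed = set().union(*self.tasks.values())
        self.files &= still_needed


# Small work queue: the second task arrives after the first has finished,
# so the large file has already been deleted and is downloaded twice.
small = FileCache()
small.add_task("wu_1", ["big_fragments"])
small.finish_task("wu_1")
small.add_task("wu_2", ["big_fragments"])

# Larger work queue: the tasks overlap, so the file is downloaded once.
large = FileCache()
large.add_task("wu_1", ["big_fragments"])
large.add_task("wu_2", ["big_fragments"])
large.finish_task("wu_1")

print(small.downloads, large.downloads)
```

In this model a deeper queue helps only when tasks that share a file overlap in time, which matches the "improve your odds" wording of the suggestion.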
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 26,465,104 RAC: 18,145 |
Actually, it does not cause any problems for me (my personal computer has only 2 cores and I have an unlimited Internet connection). Several people on my team (who use limited or GSM/3G Internet connections) ran into the problem; we then started to investigate the reason for the increased traffic and found these uncompressed files. (I'm a kind of coordinator for the Rosetta@Home project in my team.) P.S. I know all the traffic-saving methods that you described, and they are already covered in our (team) FAQ. :) |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Scott posted: I haven't gotten any work units to crunch in a long time (weeks). I am now running Social Docking on Facebook (http://socialdocking.appspot.com/) until I get work units again. I have repeatedly tried "reset project" and nothing happens. I thought what happened was Christmas break but that should now be long over. If someone has an answer to this, please email me at: xxxxxxxxxxxxx I removed his original post because it contained an EMail address. Robots scan these boards constantly hoping to find new EMails for SPAM, so please do not post EMail addresses. Scott, as you can see from the project teraflops numbers and status indications on the homepage, the project is very much alive and active. Since the problem seems unique to your machine, please open a new thread and describe more about what BOINC is telling you about the situation. I'm sure we can get it figured out. But there must be some additional clues somewhere that point to the problem. Rosetta Moderator: Mod.Sense |
Scott Jensen Send message Joined: 29 Oct 11 Posts: 9 Credit: 1,264,175 RAC: 0 |
Hi Mod.Sense, It seems to have fixed itself. It is now running as normal. Wasn't for weeks, but now is. If there is some log that I can tap and send you that tracks this sort of thing, let me know. Scott |
Daniel Kohn Send message Joined: 30 Dec 05 Posts: 18 Credit: 2,899,939 RAC: 0 |
I keep on getting project backoffs from my uploads for a couple days now, and I can't download any new workunits for the same reason. Is it working for anyone else? I am having the exact same problem. I have no idea why or how to fix it. Too bad because I have 40 completed work units awaiting upload. |
Rocco Moretti Send message Joined: 18 May 10 Posts: 66 Credit: 585,745 RAC: 0 |
It looks like a number of people are running into connection issues with the Rosetta@home servers. Our sysadmins are aware of it and keeping an eye on things, but as the issues are intermittent and only affecting a subset of people, it will likely be challenging to track down. |
Panne Send message Joined: 28 Jan 12 Posts: 1 Credit: 20,731 RAC: 0 |
Is this coming through, although it does not look like it? My slow server, running the 6.12.34 version is working without a glitch. My fast server, running the 7.0.8 beta version gets "Client error" on every job, and no "granted credits". However, it still has credits. How is that possible? Note the "application version ---" at the bottom :s I'm a BOINC/Rosetta@home newbie, btw. Please help me out here... Slow host jobs: 1514568 Fast host jobs: 1514679 Task ID 480398531 Name Hsp27_fnd_noe_rdc_IGNORE_THE_REST_37091_15783_1 Workunit 436086579 Created 30 Jan 2012 13:08:51 UTC Sent 30 Jan 2012 13:09:21 UTC Received 1 Feb 2012 22:05:56 UTC Server state Over Outcome Client error Client state New Exit status 0 (0x0) Computer ID 1514679 Report deadline 9 Feb 2012 13:09:21 UTC CPU time 10535.74 stderr out <core_client_version>7.0.8</core_client_version> <![CDATA[ <stderr_txt> [2012- 1-30 23:42:22:] :: BOINC:: Initializing ... ok. [2012- 1-30 23:42:22:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev46858.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Hsp27_fnd_noe_rdc.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. [2012- 1-31 9:39: 8:] :: BOINC:: Initializing ... ok. 
[2012- 1-31 9:39: 8:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev46858.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Hsp27_fnd_noe_rdc.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage1 ... success! Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage2 ... success! Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage3 ... success! Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage4_kk_1 ... success! Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage4_kk_2 ... success! Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage4_kk_3 ... success! [2012- 1-31 17:39:22:] :: BOINC:: Initializing ... ok. [2012- 1-31 17:39:22:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... 
Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev46858.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Hsp27_fnd_noe_rdc.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. ====================================================== DONE :: 8 starting structures 10535.6 cpu seconds This process generated 8 decoys from 8 attempts ====================================================== BOINC :: WS_max 4.57671e-246 BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state Invalid Claimed credit 76.8495048945849 Granted credit 0 application version --- |
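For readers who want to pull the key facts out of a stderr dump like the one above, here is a minimal Python sketch. It uses only strings that appear in the quoted log (`<core_client_version>`, `boinc_init()`, the "generated N decoys from M attempts" line, and "cpu seconds"). The function name `summarize_stderr` and the dictionary keys are hypothetical, and treating each `boinc_init()` line as one start or restart of the task is an interpretation of the log, not something the project documents here.

```python
import re

# Abbreviated excerpt of the stderr output quoted above (most lines omitted).
SAMPLE = """<core_client_version>7.0.8</core_client_version>
[2012- 1-30 23:42:22:] :: BOINC :: boinc_init()
[2012- 1-31 9:39: 8:] :: BOINC :: boinc_init()
[2012- 1-31 17:39:22:] :: BOINC :: boinc_init()
DONE :: 8 starting structures 10535.6 cpu seconds
This process generated 8 decoys from 8 attempts
"""


def summarize_stderr(text: str) -> dict:
    """Extract client version, start count, decoy counts and CPU time."""
    version = re.search(r"<core_client_version>([^<]+)</core_client_version>", text)
    decoys = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    cpu = re.search(r"([\d.]+) cpu seconds", text)
    return {
        "client_version": version.group(1) if version else None,
        # Assumption: one boinc_init() line per start or restart of the task.
        "starts": text.count("boinc_init()"),
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        "cpu_seconds": float(cpu.group(1)) if cpu else None,
    }


if __name__ == "__main__":
    print(summarize_stderr(SAMPLE))
```

On this task the summary shows a run that produced all of its decoys and shut down cleanly, which is why the "Client error" outcome is surprising.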
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 26,465,104 RAC: 18,145 |
It looks like the 7.0.8 version of BOINC is not compatible with Rosetta (or is just very buggy?). Here is a machine with the same BOINC version: https://boinc.bakerlab.org/rosetta/results.php?hostid=1513429 It also shows a great many errors, all "Compute error". The usual error message in the logs is "Incorrect function. (0x1) - exit code 1 (0x1)" |
xnaldax Send message Joined: 7 Feb 12 Posts: 1 Credit: 11,216 RAC: 0 |
I have a problem with this computer - https://boinc.bakerlab.org/rosetta/results.php?hostid=1516973 - Validate state: Invalid. Other projects such as Milkyway work fine. Where is the problem? (BOINC version: 6.12.34 for Windows 64-bit). |
Rocco Moretti Send message Joined: 18 May 10 Posts: 66 Credit: 585,745 RAC: 0 |
I have problem with this computer - https://boinc.bakerlab.org/rosetta/results.php?hostid=1516973 - Validate state: Invalid. Others project as Milkyway etc. works fine. Where is the problem? (Bionic version: 6.12.34 for Windows 64-bit). I'd suggest completely uninstalling *all* of your boinc clients, making sure you've removed everything, and then reinstalling. On the same computer, it looks like you're getting results tagged with both client version 6.12.34 and with version 6.10.58. I'm guessing in the dark here, but it might be that you have two boinc client versions installed on the same machine (or had a botched upgrade) and the conflict between the two is causing issues with your Rosetta@home runs. By the way, the "Validate state: Invalid" is a bit of a red herring. The more relevant line is "Outcome: Client error". If it was really a validation issue, it would be "Outcome: Validate error". (Validate state is listed as invalid because things never got to the point where it could switch to being valid.) |
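The reading rule in the post above (look at "Outcome" first, and treat "Validate state: Invalid" as a side effect when the outcome is a client error) can be written down as a small Python sketch. The outcome strings are the ones quoted in the post; the function names, the return labels, and the handling of any other outcome value are hypothetical. The second helper flags a host that reports results under more than one client version, which is the situation the post guesses at.

```python
def triage(outcome: str, validate_state: str) -> str:
    """Classify a result by its Outcome field before its Validate state."""
    if outcome == "Client error":
        # Validate state "Invalid" here only means validation never ran.
        return "client"
    if outcome == "Validate error":
        return "validator"
    # Other outcome values are not covered by the post.
    return "other"


def mixed_client_versions(versions):
    """Return the distinct client versions if a host reports more than one."""
    distinct = sorted(set(versions))
    return distinct if len(distinct) > 1 else []


if __name__ == "__main__":
    print(triage("Client error", "Invalid"))
    print(mixed_client_versions(["6.12.34", "6.10.58", "6.12.34"]))
```

An empty list from `mixed_client_versions` means every result on the host was tagged with the same client version.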
Daniel Kohn Send message Joined: 30 Dec 05 Posts: 18 Credit: 2,899,939 RAC: 0 |
I have problem with this computer - https://boinc.bakerlab.org/rosetta/results.php?hostid=1516973 - Validate state: Invalid. Others project as Milkyway etc. works fine. Where is the problem? (Bionic version: 6.12.34 for Windows 64-bit). I detached from Rosetta and then re-attached. I was able to upgrade to the new Rosetta version and download more work. Almost done crunching 7 tasks. If the upload works when they are complete, I'll be a happy camper. ***update*** My computer finished crunching and is now stuck again trying to upload completed work. It gets to 16Kb uploaded and then stops. Is there something else I can try? |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Daniel, there are some problems right now with some of the upload servers. Nothing on your end. BOINC Manager will take care of doing retries and getting the uploads completed for you. Rosetta Moderator: Mod.Sense |
dondrusco Send message Joined: 2 Jan 07 Posts: 3 Credit: 4,772,623 RAC: 0 |
Hello, I found some tasks with compute error status. All of these tasks start with the CASP9_ prefix. Computer: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1486022 It is a Core i5-2500. I don't know exactly what's wrong, because another PC finished the task successfully. No OC applied. dD |
LigH Send message Joined: 7 Sep 09 Posts: 25 Credit: 9,241,214 RAC: 0 |
Hello, Here too; not all of the CASP9 series, but several. AMD Phenom II X4 945. Fun and success! Jobs: holzon + 12angebote Hobbies: doom9/Gleitz + PlaneShift |
©2024 University of Washington
https://www.bakerlab.org