Problems and Technical Issues with Rosetta@home

Diogo Azevedo

Joined: 30 Mar 09
Posts: 7
Credit: 11,321
RAC: 0
Message 72198 - Posted: 21 Jan 2012, 13:23:52 UTC - in response to Message 72195.  

All we can get is a static black screen showing scientific sub-frames involved ('showing', 'accepted', 'low energy', etc.) but...with no active figures at all.


Interesting ... Do you know what work units you are working on when you see visible panels but no content? The intermittency that JohnB indicates points to a possible per-work-unit issue (you get the blank screens for some work units, but not for others). As we need to insert hooks into protocols to get the screensaver to update, some of the workunits may be using a new protocol which might not have been properly updated with those hooks, and so the screensaver runs but doesn't show any data.


Hello again. And thanks again too. :-)
It seems that "ab_11_29_optpps_T6241_optpps_03_09_35686_237513_0" is working fine now.... Hummmm...let's see if it stays that way... :-)
D_Azevedo
ID: 72198
Diogo Azevedo
Joined: 30 Mar 09
Posts: 7
Credit: 11,321
RAC: 0
Message 72199 - Posted: 21 Jan 2012, 15:43:41 UTC

...and 'voilà'... ab_11_29_optpps_T6241_optpps_03_09_35686_237513_0 just crashed the OS window again.... :-( :-(

D_Azevedo
ID: 72199
Zoness
Joined: 24 Dec 08
Posts: 1
Credit: 1,980,334
RAC: 804
Message 72203 - Posted: 22 Jan 2012, 9:16:25 UTC

1/22/2012 3:03:06 AM | rosetta@home | Requesting new tasks for CPU
1/22/2012 3:08:13 AM | rosetta@home | Scheduler request failed: Timeout was reached
1/22/2012 3:08:16 AM | | Project communication failed: attempting access to reference site
1/22/2012 3:08:17 AM | | Internet access OK - project servers may be temporarily down.


Any idea of when this will be fixed? No firewall interference or IP blocking on my end and according to the server status page there is plenty of work. Thoughts?
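
For anyone who wants to measure how long the scheduler request waited before timing out, here is a minimal Python sketch that splits event log lines like the ones above into their fields. It assumes the `timestamp | project | message` layout shown in this post; the function name `parse_event_line` is made up for illustration and is not part of BOINC.

```python
from datetime import datetime

def parse_event_line(line):
    """Split one 'timestamp | project | message' line into its three fields."""
    # Split on the first two pipes only, so the message text stays intact.
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message

# The four lines quoted in the post above.
LOG = """\
1/22/2012 3:03:06 AM | rosetta@home | Requesting new tasks for CPU
1/22/2012 3:08:13 AM | rosetta@home | Scheduler request failed: Timeout was reached
1/22/2012 3:08:16 AM | | Project communication failed: attempting access to reference site
1/22/2012 3:08:17 AM | | Internet access OK - project servers may be temporarily down.
"""

events = [parse_event_line(line) for line in LOG.splitlines()]

# Time between the work request and the reported timeout.
request_time = events[0][0]
failure_time = events[1][0]
timeout_seconds = (failure_time - request_time).total_seconds()
print(f"Scheduler request timed out after {timeout_seconds:.0f} s")
```

Lines with an empty project column (the last two) come back with an empty string as the project name, which makes it easy to separate client-wide messages from project-specific ones.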
ID: 72203
Mad_Max
Joined: 31 Dec 09
Posts: 207
Credit: 23,411,864
RAC: 12,136
Message 72205 - Posted: 23 Jan 2012, 12:46:29 UTC

Yes, in the past few days I have also seen periodic problems with uploading and downloading files. But they usually resolve on their own within ~1 day without my intervention...
ID: 72205
Mad_Max
Joined: 31 Dec 09
Posts: 207
Credit: 23,411,864
RAC: 12,136
Message 72217 - Posted: 25 Jan 2012, 9:55:12 UTC
Last modified: 25 Jan 2012, 9:56:47 UTC

Why, in WU series like prednorlx_ECH13_hisremstart...,
are the data files ECH13_hisremstart_lrP .... 3mers
and ECH13_hisremstart_lrP_ ..... 9mers
not archived as usual?
They are big: ~8 + 24 = 32 MB per WU.
Because of these WUs, Internet traffic has increased greatly over the last week (up to 500-1000 MB per day on CPUs with a large number of threads).
This is a great inconvenience for participants who do not have an unlimited Internet connection. Some have had to stop crunching (or switch to other projects).

Standard zip compression reduces the size of these files by roughly 3 times (from ~32 MB to ~11 MB).
Good compression (like rar) reduces it by roughly 5 times (from ~32 MB to ~6 MB).
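
The arithmetic behind these figures can be checked with a short Python sketch. The file sizes and compression ratios are the approximate values quoted above; the names `FILES_MB` and `compressed_size` are only illustrative.

```python
# Approximate uncompressed sizes per work unit, as quoted in the post (MB).
FILES_MB = {"3mers": 8, "9mers": 24}

def compressed_size(size_mb, ratio):
    """Size after compression, for a given compression ratio (e.g. 3 means 3x smaller)."""
    return size_mb / ratio

per_wu = sum(FILES_MB.values())      # 8 + 24 = 32 MB
zip_mb = compressed_size(per_wu, 3)  # roughly 11 MB with zip-like compression
rar_mb = compressed_size(per_wu, 5)  # roughly 6 MB with rar-like compression

print(f"uncompressed: {per_wu} MB, zip: {zip_mb:.1f} MB, rar: {rar_mb:.1f} MB")
```

The same function applied to the reported 500-1000 MB of daily traffic suggests roughly 170-330 MB per day at a 3x ratio, assuming all of that traffic comes from these files.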
ID: 72217
Flosopher
Joined: 3 Jun 06
Posts: 1
Credit: 20,125
RAC: 0
Message 72222 - Posted: 25 Jan 2012, 21:21:21 UTC - in response to Message 72217.  

Hey Mad Max,

These are my jobs; I apologize for the inconvenience. I've only recently started running some of my calculations on R@H, and it looks like I forgot to archive these files in my setup scripts. I'll make sure to fix it in future calculations of this type. Again, my apologies, and thank you so much for donating your CPU cycles to help out with our research.

regards,
florian

Why, in WU series like prednorlx_ECH13_hisremstart...,
are the data files ECH13_hisremstart_lrP .... 3mers
and ECH13_hisremstart_lrP_ ..... 9mers
not archived as usual?
They are big: ~8 + 24 = 32 MB per WU.
Because of these WUs, Internet traffic has increased greatly over the last week (up to 500-1000 MB per day on CPUs with a large number of threads).
This is a great inconvenience for participants who do not have an unlimited Internet connection. Some have had to stop crunching (or switch to other projects).

Standard zip compression reduces the size of these files by roughly 3 times (from ~32 MB to ~11 MB).
Good compression (like rar) reduces it by roughly 5 times (from ~32 MB to ~6 MB).


ID: 72222
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 72235 - Posted: 29 Jan 2012, 14:31:18 UTC

Mad Max, thank you for pointing out the impact those large files are having. Since you have many threads running, it tends to mean you have more WUs on-board at a time. The way BOINC works, if ANY of your current tasks needs a file, then the file stays around and is not deleted. This avoids having to download them again, and helps situations where people have lower bandwidth or hours available.

So I wanted to offer a suggestion that may improve things for you in the short term, and even in the longer term, because a compressed 10 MB file is still a lot to download. If you increase the amount of work that your machine keeps on hand, you improve the odds that at least one of your queued tasks still needs the large file, and so BOINC will keep it around. You can do this by increasing the number of days of additional work set in your BOINC Network preferences.

You still have to download new files. And it is not uncommon for a very similar file to be used by later work units. But the above is a simple way to improve your odds of avoiding downloading the same file more than once.

...another way to achieve the result would be to set up a caching proxy server, but that is more involved. It would, however, keep the file locally even longer: if BOINC finds itself with no such work units and removes the file, and later gets more of those tasks, it would go through the proxy and find the file there instead of doing a new download.
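
As a rough illustration of the retention rule described above (a file stays on disk while any queued task needs it), here is a toy Python model. It is only a sketch of the idea, not how the BOINC client is actually implemented; the class `FileCache` and the file name `big_9mers` are invented for the example.

```python
class FileCache:
    """Toy model: a file is kept while at least one queued task references it."""

    def __init__(self):
        self.users = {}      # file name -> set of task names that still need it
        self.downloads = 0   # how many times any file had to be fetched

    def add_task(self, task, file):
        if file not in self.users:
            self.downloads += 1          # file is not on disk, so fetch it
            self.users[file] = set()
        self.users[file].add(task)

    def finish_task(self, task, file):
        self.users[file].discard(task)
        if not self.users[file]:
            del self.users[file]         # no task needs it any more, so delete it

# Small work buffer: each task finishes before the next one arrives.
small = FileCache()
for task in ["wu1", "wu2", "wu3"]:
    small.add_task(task, "big_9mers")
    small.finish_task(task, "big_9mers")

# Larger work buffer: the next task is already queued before the previous one finishes.
large = FileCache()
large.add_task("wu1", "big_9mers")
large.add_task("wu2", "big_9mers")
large.finish_task("wu1", "big_9mers")
large.add_task("wu3", "big_9mers")
large.finish_task("wu2", "big_9mers")
large.finish_task("wu3", "big_9mers")

print(small.downloads, large.downloads)
```

In this model the small buffer fetches the file once per task, while the overlapping queue fetches it only once in total, which is the effect the larger work buffer is meant to achieve.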
Rosetta Moderator: Mod.Sense
ID: 72235
Mad_Max
Joined: 31 Dec 09
Posts: 207
Credit: 23,411,864
RAC: 12,136
Message 72247 - Posted: 31 Jan 2012, 1:15:58 UTC

Actually, for me it does not pose any problems (my personal computer has only 2 cores and I have an unlimited Internet connection). The problems were encountered by several people on my team (who use limited or GSM/3G Internet connections), so we started to investigate the reason for the increased traffic and found these uncompressed files. (I'm a kind of coordinator for the Rosetta@home project in my team.)

P.S.
I know all the traffic-saving methods that you described. And they are already described in our (team) FAQ. :)
ID: 72247
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 72253 - Posted: 1 Feb 2012, 13:10:51 UTC

Scott posted:

I haven't gotten any work units to crunch in a long time (weeks). I am now running Social Docking on Facebook (http://socialdocking.appspot.com/) until I get work units again. I have repeatedly tried "reset project" and nothing happens. I thought what happened was Christmas break but that should now be long over. If someone has an answer to this, please email me at: xxxxxxxxxxxxx

If this project is now over, I would appreciate an email telling me it is so I can move to help another distributed computing project. Thanks.


I removed his original post because it contained an email address. Robots scan these boards constantly hoping to find new email addresses for spam, so please do not post email addresses.

Scott, as you can see from the project teraflops numbers and status indications on the homepage, the project is very much alive and active. Since the problem seems unique to your machine, please open a new thread and describe more about what BOINC is telling you about the situation. I'm sure we can get it figured out. But there must be some additional clues somewhere that point to the problem.
Rosetta Moderator: Mod.Sense
ID: 72253
Scott Jensen
Joined: 29 Oct 11
Posts: 9
Credit: 1,264,175
RAC: 0
Message 72254 - Posted: 1 Feb 2012, 16:58:55 UTC - in response to Message 72253.  

Hi Mod.Sense,

It seems to have fixed itself. It is now running as normal. Wasn't for weeks, but now is. If there is some log that I can tap and send you that tracks this sort of thing, let me know.

Scott
ID: 72254
Daniel Kohn
Joined: 30 Dec 05
Posts: 18
Credit: 2,899,939
RAC: 0
Message 72255 - Posted: 1 Feb 2012, 18:21:49 UTC - in response to Message 72170.  

I keep on getting project backoffs from my uploads for a couple days now, and I can't download any new workunits for the same reason. Is it working for anyone else?



I am having the exact same problem. I have no idea why or how to fix it. Too bad because I have 40 completed work units awaiting upload.
ID: 72255
Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72257 - Posted: 1 Feb 2012, 19:04:29 UTC

It looks like a number of people are running into connection issues with the Rosetta@home servers. Our sysadmins are aware of it and keeping an eye on things, but as the issues are intermittent and only affecting a subset of people, it will likely be challenging to track down.
ID: 72257
Panne
Joined: 28 Jan 12
Posts: 1
Credit: 20,731
RAC: 0
Message 72258 - Posted: 1 Feb 2012, 22:39:27 UTC

Is this coming through, although it does not look like it?

My slow server, running the 6.12.34 version is working without a glitch.
My fast server, running the 7.0.8 beta version gets "Client error" on every job,
and no "granted credits". However, it still has credits. How is that possible?

Note the "application version ---" at the bottom :s

I'm a BOINC/Rosetta@home newbie, btw.

Please help me out here...

Slow host jobs: 1514568
Fast host jobs: 1514679

Task ID 480398531
Name Hsp27_fnd_noe_rdc_IGNORE_THE_REST_37091_15783_1
Workunit 436086579
Created 30 Jan 2012 13:08:51 UTC
Sent 30 Jan 2012 13:09:21 UTC
Received 1 Feb 2012 22:05:56 UTC
Server state Over
Outcome Client error
Client state New
Exit status 0 (0x0)
Computer ID 1514679
Report deadline 9 Feb 2012 13:09:21 UTC
CPU time 10535.74
stderr out
<core_client_version>7.0.8</core_client_version>
<![CDATA[
<stderr_txt>
[2012- 1-30 23:42:22:] :: BOINC:: Initializing ... ok.
[2012- 1-30 23:42:22:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev46858.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Hsp27_fnd_noe_rdc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
[2012- 1-31 9:39: 8:] :: BOINC:: Initializing ... ok.
[2012- 1-31 9:39: 8:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev46858.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Hsp27_fnd_noe_rdc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage4_kk_1 ... success!
Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage4_kk_2 ... success!
Continuing computation from checkpoint: chk_S_00002_FragmentSampler__stage4_kk_3 ... success!
[2012- 1-31 17:39:22:] :: BOINC:: Initializing ... ok.
[2012- 1-31 17:39:22:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev46858.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Hsp27_fnd_noe_rdc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 8 starting structures 10535.6 cpu seconds
This process generated 8 decoys from 8 attempts
======================================================
BOINC :: WS_max 4.57671e-246

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 76.8495048945849
Granted credit 0
application version ---
ID: 72258
Mad_Max
Joined: 31 Dec 09
Posts: 207
Credit: 23,411,864
RAC: 12,136
Message 72267 - Posted: 3 Feb 2012, 22:27:16 UTC
Last modified: 3 Feb 2012, 22:39:03 UTC

It looks like version 7.0.8 of BOINC is not compatible with Rosetta (or is just very buggy?). Here is another machine with the same BOINC version: https://boinc.bakerlab.org/rosetta/results.php?hostid=1513429
It also shows a great many errors (Compute error).
The usual error message in the logs is "Incorrect function. (0x1) - exit code 1 (0x1)".
ID: 72267
xnaldax
Joined: 7 Feb 12
Posts: 1
Credit: 11,216
RAC: 0
Message 72274 - Posted: 8 Feb 2012, 10:05:39 UTC

I have a problem with this computer - https://boinc.bakerlab.org/rosetta/results.php?hostid=1516973 - Validate state: Invalid. Other projects such as Milkyway work fine. Where is the problem? (BOINC version: 6.12.34 for Windows 64-bit).
ID: 72274
Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72275 - Posted: 8 Feb 2012, 18:59:04 UTC - in response to Message 72274.  

I have a problem with this computer - https://boinc.bakerlab.org/rosetta/results.php?hostid=1516973 - Validate state: Invalid. Other projects such as Milkyway work fine. Where is the problem? (BOINC version: 6.12.34 for Windows 64-bit).


I'd suggest completely uninstalling *all* of your boinc clients, making sure you've removed everything, and then reinstalling. On the same computer, it looks like you're getting results tagged with both client version 6.12.34 and with version 6.10.58. I'm guessing in the dark here, but it might be that you have two boinc client versions installed on the same machine (or had a botched upgrade) and the conflict between the two is causing issues with your Rosetta@home runs.

By the way, the "Validate state: Invalid" is a bit of a red herring. The more relevant line is "Outcome: Client error". If it was really a validation issue, it would be "Outcome: Validate error". (Validate state is listed as invalid because things never got to the point where it could switch to being valid.)
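
To make the distinction above concrete, here is a small Python sketch of the two checks: reading the "Outcome" field before the "Validate state" field, and collecting the client versions reported in the stderr of several results. The field values and the `<core_client_version>` tag are taken from task pages quoted in this thread; the function names `triage` and `client_versions` are hypothetical, and the rule set is a simplification rather than the project's actual logic.

```python
import re

def triage(outcome, validate_state):
    """Decide which field matters, following the explanation above."""
    if outcome == "Client error":
        # The run failed on the client, so validation never really happened.
        return "client-side failure; validate state is not meaningful"
    if outcome == "Validate error":
        return "validation problem"
    return "no error indicated by outcome; validate state: " + validate_state

def client_versions(stderr_texts):
    """Collect every distinct core client version seen across stderr outputs."""
    pattern = re.compile(r"<core_client_version>([^<]+)</core_client_version>")
    found = set()
    for text in stderr_texts:
        found.update(pattern.findall(text))
    return found

# Two made-up stderr snippets carrying the versions mentioned in this post.
samples = [
    "<core_client_version>6.12.34</core_client_version>",
    "<core_client_version>6.10.58</core_client_version>",
]
versions = client_versions(samples)
print(triage("Client error", "Invalid"))
if len(versions) > 1:
    print("More than one client version on this host:", sorted(versions))
```

Seeing more than one version for the same host is the hint that led to the reinstall suggestion; it does not prove that two clients are installed, only that it is worth checking.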
ID: 72275
Daniel Kohn
Joined: 30 Dec 05
Posts: 18
Credit: 2,899,939
RAC: 0
Message 72306 - Posted: 14 Feb 2012, 16:40:10 UTC - in response to Message 72275.  
Last modified: 14 Feb 2012, 17:27:56 UTC

I have a problem with this computer - https://boinc.bakerlab.org/rosetta/results.php?hostid=1516973 - Validate state: Invalid. Other projects such as Milkyway work fine. Where is the problem? (BOINC version: 6.12.34 for Windows 64-bit).


I'd suggest completely uninstalling *all* of your boinc clients, making sure you've removed everything, and then reinstalling. On the same computer, it looks like you're getting results tagged with both client version 6.12.34 and with version 6.10.58. I'm guessing in the dark here, but it might be that you have two boinc client versions installed on the same machine (or had a botched upgrade) and the conflict between the two is causing issues with your Rosetta@home runs.

By the way, the "Validate state: Invalid" is a bit of a red herring. The more relevant line is "Outcome: Client error". If it was really a validation issue, it would be "Outcome: Validate error". (Validate state is listed as invalid because things never got to the point where it could switch to being valid.)

I detached from Rosetta and then re-attached. I was able to upgrade to the new Rosetta version and download more work. Almost done crunching 7 tasks. If the upload works when they are complete, I'll be a happy camper.

***update*** My computer finished crunching and is now stuck again trying to upload completed work. It gets to 16Kb uploaded and then stops. Is there something else I can try?
ID: 72306
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 72308 - Posted: 14 Feb 2012, 22:00:15 UTC

Daniel, there are some problems right now with some of the upload servers. Nothing on your end. BOINC Manager will take care of doing retries and getting the uploads completed for you.
Rosetta Moderator: Mod.Sense
ID: 72308
dondrusco
Joined: 2 Jan 07
Posts: 3
Credit: 4,772,623
RAC: 0
Message 72381 - Posted: 21 Feb 2012, 7:02:34 UTC
Last modified: 21 Feb 2012, 7:13:36 UTC

Hello,

I found some tasks with compute error status. All tasks start with CASP9_ prefix.

Computer:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1486022

It is a Core i5-2500. I don't know exactly what's wrong, because another PC finished the task successfully. No overclocking applied.

dD
ID: 72381
LigH
Joined: 7 Sep 09
Posts: 23
Credit: 8,877,275
RAC: 3,712
Message 72386 - Posted: 22 Feb 2012, 6:27:30 UTC - in response to Message 72381.  

Hello,

I found some tasks with compute error status. All tasks start with CASP9_ prefix.


Here too; not all of the CASP9 series, but several. AMD Phenom II X4 945.



Fun and success!

Jobs: holzon + 12angebote
Hobbies: doom9/Gleitz + PlaneShift
ID: 72386



©2024 University of Washington
https://www.bakerlab.org