Minirosetta 3.14

Message boards : Number crunching : Minirosetta 3.14

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 70554 - Posted: 15 Jun 2011, 23:51:42 UTC

This update includes a number of new and updated protocols. Please report bugs and issues here.

Details about the new protocols will be posted in separate threads as we start submitting jobs.
ID: 70554
Kenny Frew

Joined: 16 May 08
Posts: 2
Credit: 98,306
RAC: 0
Message 70556 - Posted: 16 Jun 2011, 1:32:47 UTC - in response to Message 70554.  
Last modified: 16 Jun 2011, 1:33:35 UTC

> This update includes a number of new and updated protocols. Please report bugs and issues here.
>
> Details about the new protocols will be posted in separate threads as we start submitting jobs.



Completed work units will not upload.
ID: 70556
cnick6

Joined: 30 May 06
Posts: 29
Credit: 12,597,623
RAC: 0
Message 70557 - Posted: 16 Jun 2011, 1:36:01 UTC

Yup, me too. Servers are down.
ID: 70557
Kenny Frew

Joined: 16 May 08
Posts: 2
Credit: 98,306
RAC: 0
Message 70558 - Posted: 16 Jun 2011, 1:47:48 UTC

OK, it uploaded manually and it reported. The next one is downloading slowly. Thanks.
ID: 70558
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 70559 - Posted: 16 Jun 2011, 1:59:54 UTC

The servers are going to be stressed for a bit as people try to download the new application. It should settle with time.
ID: 70559
Chilean

Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 70561 - Posted: 16 Jun 2011, 5:35:33 UTC

Nice to see some progress being done :)

Sorta off-topic: are you guys planning to upgrade y'all's (my Southern has stuck with me for life) BOINC version?
ID: 70561
Samson

Joined: 23 May 11
Posts: 8
Credit: 257,870
RAC: 0
Message 70562 - Posted: 16 Jun 2011, 7:44:51 UTC

Inquiring minds (me) would like to know what advantages Pi brings us over 2.17?

Is there a list somewhere ?

Also, who's the genius that dubbed this 3.14 ?

:)
ID: 70562
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 70563 - Posted: 16 Jun 2011, 15:17:04 UTC

Chilean, we are not planning to upgrade the BOINC version anytime soon, but we will be upgrading the hardware in a few weeks or sooner.

Samson, it is version 3.14 due to many iterations of testing on our testing project, Ralph@home. This update includes a number of new protocols and also makes more methods (we call them movers) available in our scripting protocol. We'll explain them in separate threads as they start getting used.

ID: 70563
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 70564 - Posted: 16 Jun 2011, 23:02:16 UTC

Hi.

I just rejoined and I have no graphics for the new app; with a task running, the button is greyed out.

Other projects that have graphics are showing O.K., and when I ran here in the past the graphics worked fine on all rigs.

Rig is Ubuntu 10.04 LTS x64 and BOINC is 6.10.58 x64.

ID: 70564
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 70565 - Posted: 16 Jun 2011, 23:12:53 UTC

P.P.L.,

Unfortunately, we omitted the graphics app for the Linux platform in this update because of time constraints. It has been so long since our last graphics app update that the build machine set up for building the graphics no longer works with the current version of minirosetta. We'll try to bring it back when we have more time to look into it.
ID: 70565
Michael Gould

Joined: 3 Feb 10
Posts: 39
Credit: 14,612,887
RAC: 6,283
Message 70579 - Posted: 18 Jun 2011, 23:40:45 UTC

Transition was seamless here, looking forward to hearing about new capabilities. Nice job, all.
ID: 70579
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,609,497
RAC: 1,614
Message 70580 - Posted: 18 Jun 2011, 23:55:00 UTC

Since the 3.14 version, I get far more randomly "hanging" WUs than usual, and validation errors with no granted credit... :-(

Ralf
ID: 70580
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 70581 - Posted: 19 Jun 2011, 4:04:08 UTC

Can you post the names of the workunits that are randomly hanging? The validation errors should eventually get credit.
ID: 70581
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,609,497
RAC: 1,614
Message 70583 - Posted: 19 Jun 2011, 7:18:30 UTC - in response to Message 70581.  

> Can you post the names of the workunits that are randomly hanging? The validation errors should eventually get credit.
Have to go through the logs of different machines tomorrow morning, seems to be more than one batch.
The worst is that the WU completely locks the BOINC manager out, the jobs don't "release" and switch to another project after the default of 2h as they usually do, blocking the whole system on single core CPUs when I don't notice it... :-(

Here's the worst one of today: hung at about 14%, killed it after more than 17h of runtime, while "normal" jobs run 3-7h on that machine...

casd_sgr145_boinc_3busA_23.nonlocal.pctid_0.11.tmscore_0.70096._nonlocal_tex_IGNORE_THE_REST_27533_890_0

Ralf
ID: 70583
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,609,497
RAC: 1,614
Message 70586 - Posted: 19 Jun 2011, 16:27:09 UTC - in response to Message 70583.  

> > Can you post the names of the workunits that are randomly hanging? The validation errors should eventually get credit.
> Have to go through the logs of different machines tomorrow morning, seems to be more than one batch.
> The worst is that the WU completely locks the BOINC manager out, the jobs don't "release" and switch to another project after the default of 2h as they usually do, blocking the whole system on single core CPUs when I don't notice it... :-(
>
> Here's the worst one of today, hung at about 14%, killed it after +17h runtime, as "normal" jobs run 3-7h on that machine...
>
> casd_sgr145_boinc_3busA_23.nonlocal.pctid_0.11.tmscore_0.70096._nonlocal_tex_IGNORE_THE_REST_27533_890_0
And this one I just killed this morning, stuck at 46% after 14h39m over night...
ilv_hr41_all_boinc_3h8kA_73.nonlocal.pctid_0.18.tmscore_0.49037._nonlocal_tex_IGNORE_THE_REST_27535_3084
(BOINC has been the only running app for days now, on a 2GB RAM machine, with no indication of RAM issues)

And this one looks hung too, 15.44% after 5h41m
casd_sr10_boinc_3e0mC_3.nonlocal.pctid_0.52.tmscore_0.68362._nonlocal_tex_IGNORE_THE_REST_27537_2047
(also XP SP3, with 3GB RAM and a couple of apps open (email, web browser) overnight, as usual)

All those jobs showed a "time to completion" of about 3h when downloaded, which usually is within 10% high/low of the actual runtime...

Ralf
ID: 70586
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,609,497
RAC: 1,614
Message 70587 - Posted: 19 Jun 2011, 17:41:35 UTC - in response to Message 70586.  

> And this one looks hung too, 15.44% after 5h41m
> casd_sr10_boinc_3e0mC_3.nonlocal.pctid_0.52.tmscore_0.68362._nonlocal_tex_IGNORE_THE_REST_27537_2047
An hour later, the WU hasn't progressed 1/1000 of a %; only the "time to completion" has increased, now to 14h, far away from the 3:05h runtime estimate when downloaded.
The process minirosetta_3.14_windows_intelx86.exe sits in the Task Manager using 295108K of RAM and 0% CPU, with over 1GB of physical RAM available and a WCG WU running on the second core; average CPU usage is about 40% while I edit this in a Firefox browser tab...

I haven't looked through all the tasks that hung in the last 3 days or so, but the ones I have referred to so far are 3 within a few hours (overnight), while it used to be maybe one within a week/10 days before...

Ralf
ID: 70587
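
The symptom reported above (no progress for an hour while the process sits at 0% CPU) can be written down as a simple check. What follows is only a minimal sketch in Python, not anything from the project or from BOINC itself: `Sample`, `looks_hung`, and the thresholds are hypothetical names and values chosen for illustration, and the readings are assumed to be collected by hand (or by some other tool) in time order.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One manual observation of a running task (hypothetical structure)."""
    elapsed_s: float     # runtime of the task so far, in seconds
    progress_pct: float  # progress shown by the client, 0-100
    cpu_pct: float       # CPU usage of the task's process, 0-100


def looks_hung(samples: Sequence[Sample],
               window_s: float = 3600.0,
               min_progress_pct: float = 0.001,
               max_cpu_pct: float = 1.0) -> bool:
    """Return True if the task neither progressed nor used CPU over the window.

    Assumes `samples` is sorted by elapsed_s. The defaults mirror the report
    above: one hour, less than 1/1000 of a percent gained, roughly 0% CPU.
    """
    if not samples:
        return False
    latest = samples[-1]
    cutoff = latest.elapsed_s - window_s
    # A baseline reading at least `window_s` old is needed for comparison.
    older = [s for s in samples if s.elapsed_s <= cutoff]
    if not older:
        return False
    baseline = older[-1]
    recent = [s for s in samples if s.elapsed_s > cutoff]
    stalled = (latest.progress_pct - baseline.progress_pct) < min_progress_pct
    idle = all(s.cpu_pct <= max_cpu_pct for s in recent)
    return stalled and idle


# Readings modelled on the report: 15.44% at 5h41m, unchanged an hour later.
stuck = [Sample(5 * 3600 + 41 * 60, 15.44, 0.0),
         Sample(6 * 3600 + 41 * 60, 15.44, 0.0)]
print(looks_hung(stuck))
```

Requiring both conditions is a deliberate choice in this sketch: a task that is slow but still burning CPU is not flagged, which matches the distinction drawn above between a long-running job and one that has stopped doing anything.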
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 70591 - Posted: 20 Jun 2011, 0:23:30 UTC

We'll look into this further, of course. In the meantime, I'd recommend suspending the project and aborting the work units that are stuck. Since it seems consistent on your machine, it would help us with debugging if you joined our Ralph@home project. We'll likely post an update on Ralph soon. Sorry for all the trouble.
ID: 70591
robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,806,125
RAC: 3,336
Message 70594 - Posted: 20 Jun 2011, 1:23:05 UTC

Have you thought of creating a test application specifically to gather more information on the computer environment it is running on, then sending one such workunit to each machine known to have a problem with workunits freezing? No objection if it then goes on to attempt to run a normal workunit afterwards, possibly with more debugging output than usual enabled.

You'd probably also want to send such workunits to a variety of other computers, to gather outputs for comparison to those from the problem computers.

For example, if it is able to capture the line from the BOINC manager log file describing the CPU capabilities, and perhaps those describing GPU capabilities, just those lines should offer a good starting point in deciding what to look for, if it happens to be something related to matching up properly to the CPU type.
ID: 70594
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,609,497
RAC: 1,614
Message 70596 - Posted: 20 Jun 2011, 1:46:11 UTC - in response to Message 70591.  

> We'll look into this further, of course. In the meantime, I'd recommend suspending the project and aborting the work units that are stuck. Since it seems consistent on your machine, it would help us with debugging if you joined our Ralph@home project. We'll likely post an update on Ralph soon. Sorry for all the trouble.
Had two more, "freezing" at 1.6% after 1:36h/1:40h; aborted both. Suspended Rosetta@Home on this machine and joined Ralph@Home, but so far got only

>6/19/2011 6:37:53 PM ralph@home Message from server: No work sent

What's next?

Ralf

ID: 70596
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,609,497
RAC: 1,614
Message 70597 - Posted: 20 Jun 2011, 1:55:43 UTC - in response to Message 70594.  

> Have you thought of creating a test application specifically to gather more information on the computer environment it is running on, then sending one such workunit to each machine known to have a problem with workunits freezing? No objection if it then goes on to attempt to run a normal workunit afterwards, possibly with more debugging output than usual enabled.
>
> You'd probably also want to send such workunits to a variety of other computers, to gather outputs for comparison to those from the problem computers.
>
> For example, if it is able to capture the line from the BOINC manager log file describing the CPU capabilities, and perhaps those describing GPU capabilities, just those lines should offer a good starting point in deciding what to look for, if it happens to be something related to matching up properly to the CPU type.
I don't see anything about the specs of the machines that would give a direct indication. It happened most recently on 3 different ones:
- P4 2.8GHz (single core), 2GB RAM, just sitting idle most of the time as it is my Windows 2008 test server
- Vaio notebook, also sitting idle most of the time as the built-in keyboard is reluctant to work and I need to use an external one, Pentium M760 2GHz, 2GB RAM
- my main work computer, Core 2 Duo 6300@1.866GHz, 4GB RAM (3GB avail under XP SP3)

The last one has the most "freezes", but always plenty of CPU and RAM to spare (average 1GB physical RAM free).

Ralf
ID: 70597



©2024 University of Washington
https://www.bakerlab.org