Problems with Minirosetta Version 1.67

Message boards : Number crunching : Problems with Minirosetta Version 1.67

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 61139 - Posted: 12 May 2009, 19:21:25 UTC

I spoke too soon when thinking that updating BOINC to the latest version might have fixed the termination problem: another task 250685314 has just failed in the same way.

Crashed executable name: minirosetta_1.67_i686-apple-darwin
built using BOINC library version 6.5.0
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.4.11 build 8S2167
Tue May 12 09:37:40 2009

Thread 0 Crashed:
0 ...etta_1.67_i686-apple-darwin 0x00efc8bd __ZN7utility7signals9SignalHubIvN4core12conformation7signals16DestructionEventEE11send_signalES5_ + 1701
1 ...etta_1.67_i686-apple-darwin 0x0002a4a7 __ZN4core12conformation12ConformationD1Ev + 7373
2 ...etta_1.67_i686-apple-darwin 0x000910d0 __ZN4core4pose4PoseD1Ev + 4652
3 ...etta_1.67_i686-apple-darwin 0x00518bdc __ZN9protocols3jd214JobDistributor2goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE + 1730
4 ...etta_1.67_i686-apple-darwin 0x00b59c20 __ZN9protocols3jd219BOINCJobDistributor2goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE + 42
5 ...etta_1.67_i686-apple-darwin 0x0013b068 __ZN9protocols8abinitio24Loopbuild_Threading_mainEv + 720
6 ...etta_1.67_i686-apple-darwin 0x00005db8 _main + 7640
7 ...etta_1.67_i686-apple-darwin 0x0000292e __start + 216
8 ...etta_1.67_i686-apple-darwin 0x00002855 start + 41
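
The mangled names in the trace above can be made a little more readable without extra tools. What follows is only a minimal Python sketch, assuming the frame layout shown in this post. It decodes just the leading run of length-prefixed name components of an Itanium-ABI mangled symbol and ignores template arguments, destructor markers and parameter types. `parse_frame` and `qualified_name` are hypothetical helper names, not part of BOINC or Rosetta.

```python
import re

# Assumed frame layout, taken from the trace in this post:
#   <index> <image> <address> <symbol> + <offset>
FRAME_RE = re.compile(r"^(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\S+)\s+\+\s+(\d+)$")


def parse_frame(line):
    """Split one stack-frame line into its fields, or return None if it is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    index, image, address, symbol, offset = m.groups()
    return {
        "index": int(index),
        "image": image,
        "address": int(address, 16),
        "symbol": symbol,
        "offset": int(offset),
    }


def qualified_name(symbol):
    """Decode only the leading nested-name components of a mangled C++ symbol.

    Mac OS X symbols carry one extra leading underscore, so all leading
    underscores are stripped before looking for the "ZN" nested-name prefix.
    Symbols that are not mangled this way are returned unchanged.
    """
    s = symbol.lstrip("_")
    if not s.startswith("ZN"):
        return symbol
    pos = 2
    parts = []
    # Each component is written as <decimal length><name>.
    while pos < len(s) and s[pos].isdigit():
        start = pos
        while pos < len(s) and s[pos].isdigit():
            pos += 1
        length = int(s[start:pos])
        parts.append(s[pos:pos + length])
        pos += length
    return "::".join(parts)


if __name__ == "__main__":
    frame = parse_frame(
        "2 ...etta_1.67_i686-apple-darwin 0x000910d0 __ZN4core4pose4PoseD1Ev + 4652"
    )
    print(frame["index"], qualified_name(frame["symbol"]))
```

Applied to frames 1 and 2, this gives `core::conformation::Conformation` and `core::pose::Pose`. The `D1Ev` suffix on both marks a destructor, so the trace suggests the crash happened while a Pose was being destroyed, though that reading is an inference from the symbol names alone.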


TomaszPawel
Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 61142 - Posted: 12 May 2009, 20:40:05 UTC - in response to Message 61139.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=250780868

iRp40_S_3d85_ip40_1zzo.pdb.gz_00000010_fa_dock.xml_score12_pert38_DOCK_11891_476_1


"The validator error has been found: Our data format was changed and the validator was not updated. We're doing that now. "

More details please?
WWW of Polish National Team - Join! Crunch! Win!

Greg_BE
Joined: 30 May 06
Posts: 5659
Credit: 5,694,012
RAC: 1,911
Message 61143 - Posted: 12 May 2009, 20:57:18 UTC

download issues with parts of a task

5/12/2009 8:10:43 PM rosetta@home Requesting new tasks
5/12/2009 8:10:48 PM rosetta@home Scheduler request completed: got 1 new tasks
5/12/2009 8:10:50 PM rosetta@home Started download of threading_lb_reference.loopbuild.t360_.mtyka.flags.boinc
5/12/2009 8:10:50 PM rosetta@home Started download of threading_lb_reference.loopbuild.t360_.mtyka.boinc_files.zip
5/12/2009 8:11:00 PM Project communication failed: attempting access to reference site
5/12/2009 8:11:00 PM rosetta@home Temporarily failed download of threading_lb_reference.loopbuild.t360_.mtyka.boinc_files.zip: HTTP error
5/12/2009 8:11:02 PM Internet access OK - project servers may be temporarily down.
5/12/2009 8:11:02 PM rosetta@home [error] File threading_lb_reference.loopbuild.t360_.mtyka.boinc_files.zip has wrong size: expected 6045722, got 0
5/12/2009 8:11:02 PM rosetta@home Started download of threading_lb_reference.loopbuild.t360_.mtyka.boinc_files.zip
5/12/2009 8:12:11 PM rosetta@home Finished download of threading_lb_reference.loopbuild.t360_.mtyka.boinc_files.zip
5/12/2009 8:15:53 PM Project communication failed: attempting access to reference site
5/12/2009 8:15:53 PM rosetta@home Temporarily failed download of threading_lb_reference.loopbuild.t360_.mtyka.flags.boinc: HTTP error
5/12/2009 8:15:54 PM Internet access OK - project servers may be temporarily down.
5/12/2009 8:15:54 PM rosetta@home [error] File threading_lb_reference.loopbuild.t360_.mtyka.flags.boinc has wrong size: expected 706, got 0
5/12/2009 8:15:54 PM rosetta@home Started download of threading_lb_reference.loopbuild.t360_.mtyka.flags.boinc
5/12/2009 8:15:55 PM rosetta@home Finished download of threading_lb_reference.loopbuild.t360_.mtyka.flags.boinc
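
For anyone who wants to check their own message log for the same pattern, here is a minimal Python sketch. It assumes the line wording shown in the log above ("has wrong size: expected N, got M" and "Finished download of NAME"); `wrong_size_errors` and `recovered` are hypothetical helper names, not BOINC functions.

```python
import re

# Wording copied from the log lines in this post; other client versions may differ.
WRONG_SIZE_RE = re.compile(r"File (\S+) has wrong size: expected (\d+), got (\d+)")


def wrong_size_errors(log_lines):
    """Return (filename, expected_bytes, got_bytes) for every wrong-size error."""
    errors = []
    for line in log_lines:
        m = WRONG_SIZE_RE.search(line)
        if m:
            errors.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return errors


def recovered(log_lines, filename):
    """True if the last event logged for this file is a finished download."""
    ok = False
    for line in log_lines:
        m = WRONG_SIZE_RE.search(line)
        if m and m.group(1) == filename:
            ok = False  # a later error cancels an earlier success
        elif line.rstrip().endswith("Finished download of " + filename):
            ok = True
    return ok


if __name__ == "__main__":
    log = [
        "5/12/2009 8:11:02 PM rosetta@home [error] File threading_lb_reference.loopbuild.t360_.mtyka.boinc_files.zip has wrong size: expected 6045722, got 0",
        "5/12/2009 8:12:11 PM rosetta@home Finished download of threading_lb_reference.loopbuild.t360_.mtyka.boinc_files.zip",
    ]
    for name, expected, got in wrong_size_errors(log):
        print(name, expected, got, "recovered" if recovered(log, name) else "still failing")
```

In the log above, both files report a size of 0 after an HTTP error and then finish downloading on the retry, so both would count as recovered.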

LizzieBarry
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 61146 - Posted: 12 May 2009, 22:32:22 UTC - in response to Message 61142.  

"The validator error has been found: Our data format was changed and the validator was not updated. We're doing that now."

More details please?

The server status page shows the validator running, but there have been some delays in awarding credit over the last couple of hours. They must be working on it right now.

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,808,144
RAC: 627
Message 61147 - Posted: 12 May 2009, 22:44:15 UTC

Another absent output file: lb_alnmatrix_threading_hb_t313__IGNORE_THE_REST_11904_223_0

exit code 193 (0xc1) -63

Snags

Alien
Joined: 10 Nov 05
Posts: 5
Credit: 117,597
RAC: 0
Message 61155 - Posted: 13 May 2009, 9:05:59 UTC
Last modified: 13 May 2009, 10:05:46 UTC

Hi all,
I have several of these tasks, all starting with "iRp40_S_3d85_ip40_xxxx", that give client errors with the outcome "Validate error".
I have even aborted several of them to avoid the error.

One Example:
250695394


The second problem is that I have 6 results (so far) that are finished and uploaded but still show pending credit, as seen here:
Pending credit

Alan

Mike Tyka
Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 61163 - Posted: 13 May 2009, 16:44:18 UTC - in response to Message 61142.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=250780868

iRp40_S_3d85_ip40_1zzo.pdb.gz_00000010_fa_dock.xml_score12_pert38_DOCK_11891_476_1


"The validator error has been found: Our data format was changed and the validator was not updated. We're doing that now. "

More details please?


Lol - sure if you want the gory details, here they are :)

Here's the content of one of our result files:
SEQUENCE: IWELKKDVYVVELDWYPDAPGEMVVLTCDTPEEDGITWTLDQSSEVLGSGKTLTIQVKEFGDAGQYTCHKGGEVLSHSLLLLHKKEDGIWSTDILKDQKNKTFLRCEAKNYSGRFTCWWL
TTISTDLTFSVKSSRGSSDPQGVTCGAATLSAERVRNKEYEYSVECQEDSACPAAEESLPIEVMVDAVHKLKYEQYTSSFFIRDIIKPDPPKNLQLKPLQVEVSWEYPDTWSTPHSYFSLTFCVQVQGKD
RVFTDKTSATVICRKNASISVRAQDRYYSSSWSEWASVPCAHPLENAWTFWFDNPQGKSRQRDWGSTIHPIHTFSTVEDFWGLYNNIHNPSKLNVGADFHCFKNKIEPKWEDPISANGGKWTISCGRGKS
DTFWLHTLLAMIGEQFDFGDEICGAVVSVRQKQERVAIWTKNAANEAAQISIGKQWKEFLDYKDSIGFIVHEDAKRSDKGPKNRYTV
SCORE: score fa_atr fa_rep fa_sol hack_elec hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_ss
_dst dslf_cs_ang dslf_ss_dih dslf_ca_dih fa_dun ref rms description
REMARK BINARY_SILENTFILE
SCORE: -575.033 -705.385 16.172 295.133 -5.204 -24.046 -60.222 -11.093 -8.141 -11
.137 -8.813 -2.536 -2.322 19.600 -67.040 6.010 S_S_3d85_ip40_2idv.pdb.gz_00000009.pdb.gz_00000001
FOLD_TREE EDGE 1 117 -1 EDGE 117 290 -1 EDGE 117 405 1 EDGE 405 291 -1 EDGE 405 467 -1 S_S_3d85_ip40_2idv.pdb.gz_00000009.pd
b.gz_00000001
RT 0.635059 0.770564 -0.0541478 -0.64237 0.565739 0.51701 0.429023 -0.293549 0.854265 -20.5199 35.9508 31.0298 S_S_3d85_ip40_2id
v.pdb.gz_00000009.pdb.gz_00000001
ANNOTATED_SEQUENCE: I[ILE_p:NtermProteinFull]WELKKDVYVVELDWYPDAPGEMVVLTCDTPEEDGITWTLDQSSEVLGSGKTLTIQVKEFGDAGQYTCHKGGEVLSHSLLLLHKKE
DGIWSTDILKDQKNKTFLRCEAKNYSGRFTCWWLTTISTDLTFSVKSSRGSSDPQGVTCGAATLSAERVRNKEYEYSVECQEDSACPAAEESLPIEVMVDAVHKLKYEQYTSSFFIRDIIKPDPPKNLQL
KPLQVEVSWEYPDTWSTPHSYFSLTFCVQVQGKDRVFTDKTSATVICRKNASISVRAQDRYYSSSWSEWASVPC[CYS_p:CtermProteinFull]A[ALA_p:NtermProteinFull]HPLENAW
TFWFDNPQGKSRQRDWGSTIH[HIS_D]PIHTFSTVEDFWGLYNNIHNPSKLNVGADFHCFKNKIEPKWEDPISANGGKWTISCGRGKSDTFWLHTLLAMIGEQFDFGDEICGAVVSVRQKQERVAIWTK
NAANEAAQISIGKQWKEFLDYKDSIGFIVH[HIS_D]EDAKRSDKGPKNRYTV[VAL_p:CtermProteinFull] S_S_3d85_ip40_2idv.pdb.gz_00000009.pdb.gz_00000001
CHAIN_ENDINGS 290 S_S_3d85_ip40_2idv.pdb.gz_00000009.pdb.gz_00000001
LWOksBxKHvH0xLCoQ5QbqBJbn1HUO0GoQR2+oB13P8HErc8nQ++JqBZkt+HEv0rnQO0irBZQg8HkSMQoQ99TpBVNeAI0KHXoQf/UtBZ5wCIUMIMoQEYVqBBr8CIUgVhoQD
IduB1lIwHUqmEoQk8gsB5SJnHUh4CoQ+/6sBJ6YvHEho0nQUo3oBdqiwHEfMKoQ+TtsBpFZ3HkQgVoQlYQoBNcdDIEEYSoQjs9nBBBW6H057YoQueEuBFYFFIU14SoQKxA
vBFYFBI0GvHoQGZLsB9SXFIkR2GoQy2poBtdeEIk++loQNeZrBtz3/HEAAmoQDCsrB5lOGIUKcfoQ S_S_3d85_ip40_2idv.pdb.gz_00000009.pdb.gz_00000001
LBWZmBBrc+HkWk/nQ++5kBx0tCIEW5wnQqGviBFYlFIUep9nQ8nKiBtyBEIUy2HoQ46xjB9na/HEZ7cnQVj3hBd8S2HkoFhnQKxgiBhU4rH0gAlnQ2O/dB9S32HEsyhnQt
dOgB557lHEAAonQOJGcBZQgsHUGEmnQRLSaBdSM/H0IbfnQwfqWBhrHqHE+TnnQQ14UBVO08HU6mgnQhrHTBRgVyH04lknQcSclBFXP7H0NJGoQXsPmBVTtFI02GqnQIwq
iBlsdCI0GvSnQXPalBtz37HEsyTnQfUokBNy2pH00NlnQXPKgBlBBeHkrHrnQZ7cbBdnvDIEFucnQyLdVBhU4hHkmZqnQFDCSBlupBIUjXenQpw1OBJFuwHEShlnQ S_S_
3d85_ip40_2idv.pdb.gz_00000009.pdb.gz_00000001
LZ7shB9RBKIk8S1nQFDifBx0NNIECsAoQgqxZBRK8LIkSM8nQT3EYBd9oMI025pnQiBRgBRhLTIESh9nQBr8cB5k4WI03PFoQWOUeBB/pcI0bSDoQamZhBB/JeIU5QFoQT
iBbBlsdfIk789nQnvPiBhVOLIUxgmnQlB8fBh9YMIUzNJoQzhWiBZ78TIkqxAoQam5fB125TIUxgsnQcSsYBlYQWIULyDoQuz3dB1h2VIkQgNoQ S_S_3d85_ip40_2idv
.pdb.gz_00000009.pdb.gz_00000001
LgVuWBxJRKIkSMGoQ67HRB1h2II0yhEoQZ78NBZ78NIE7RDoQrcILB9TtOI0Jx2nQKc9OBpvfFIkR2NoQc9IJBNepDI0fqMoQx1jIBVMIAIULyCoQPKcHBd9oAIUhrWoQM
dTYB989JIkZmNoQx+uQBVEiGI0eY6nQmupRBtHFCIE7RNoQ2jiPB5OfHIUxgVoQhArGBR2OHIkEDMoQjXaEBxfq9HkBBCoQ+TtJBpFZCIUHa3nQEtILBJaR5HU3kDoQzMT
DBxfq+H0P1VoQKxAKBh/U6HEsyXoQ46xHBxJRDIU6mdoQ S_S_3d85_ip40_2idv.pdb.gz_00000009.pdb.gz_00000001
LtIbOB55bRIk8SLoQXktLBJxgWI0eULoQR2OPBR2OaIUIwRoQCBWTBFDiYIEAAWoQOJGGBpZGWIEqGQoQ789FBhUYUIEsyboQ9oQABpvfTI0cofoQepb6A13vYIUKcgoQ6
mEvAlX6XIU6mjoQgqxQBxLdQIkWkRoQ0rULBBN/XIEaJDoQ03PEBFDCaI0eUPoQ3k4DBx0NTI0bSLoQR2OIBhArQI04lcoQCB2HBpvfXI04lgoQFue8AdR2QIk++ZoQBWZ
ABNJmRIUNenoQ3kY+ANIQbIUjXmoQIFu6AZktaIE/pYoQueUrA5OfbI0jCkoQamZrA14lVIUJGeoQCsyuANJGWIktzqoQ S_S_3d85_ip40_2idv.pdb.gz_00000009.p
db.gz_00000001

There's a line there starting with CHAIN_ENDINGS; that was added by one of our developers. We didn't realize, so the validator was not updated and rejected the results that contained that new line.

It should be fixed now.
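
The failure mode described here, a checker that only knows a fixed set of record tags, can be illustrated with a minimal Python sketch. This is not the project's validator, whose code is not shown in this thread. The tag list is simply read off the result file above, and `unknown_tags`, `OLD_TAGS` and `TAG_RE` are hypothetical names. The sketch also assumes a tag is an all-uppercase first token, which the encoded coordinate lines in the example do not match.

```python
import re

# Assumption: a record tag is an uppercase first token, optionally ending in ":".
# The encoded coordinate lines in the example contain lowercase letters and
# digits in their first token, so they are not treated as tags.
TAG_RE = re.compile(r"^[A-Z][A-Z_]*:?$")

# Tags visible in the result file before the new line was introduced.
OLD_TAGS = {"SEQUENCE:", "SCORE:", "REMARK", "FOLD_TREE", "RT", "ANNOTATED_SEQUENCE:"}


def unknown_tags(lines, known_tags):
    """Return the record tags, in order of first appearance, that are not in known_tags."""
    unknown = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        first = tokens[0]
        if TAG_RE.match(first) and first not in known_tags and first not in unknown:
            unknown.append(first)
    return unknown


if __name__ == "__main__":
    sample = [
        "SEQUENCE: IWELKKDVYVV",
        "REMARK BINARY_SILENTFILE",
        "CHAIN_ENDINGS 290 S_S_3d85_ip40_2idv.pdb.gz_00000009.pdb.gz_00000001",
        "LWOksBxKHvH0xLCoQ5QbqB S_S_3d85_ip40_2idv.pdb.gz_00000009.pdb.gz_00000001",
    ]
    # A strict checker would reject the file if this list is non-empty.
    print(unknown_tags(sample, OLD_TAGS))
    # After the tag is added to the known set, nothing is flagged.
    print(unknown_tags(sample, OLD_TAGS | {"CHAIN_ENDINGS"}))
```

Whether an unknown tag should reject a result or merely be logged is a design choice; rejecting is safer against corrupt output but breaks whenever the application adds a field, which is what happened with this batch.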


http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 61165 - Posted: 13 May 2009, 17:23:16 UTC

This one shows max memory used over 530 MB. I believe that is more than is expected. But I have no way to know if this task was marked to go only to high memory hosts. Task name is abinitio_norelax_homfrag__plus_native_9rnt_0001A__SAVE_ALL_OUT_11817_1957
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Alien
Joined: 10 Nov 05
Posts: 5
Credit: 117,597
RAC: 0
Message 61166 - Posted: 13 May 2009, 18:04:00 UTC - in response to Message 61155.  
Last modified: 13 May 2009, 18:05:33 UTC

Hi all,
I have several of these tasks, all starting with "iRp40_S_3d85_ip40_xxxx", that give client errors with the outcome "Validate error".
I have even aborted several of them to avoid the error.

One Example:
250695394


The second problem is that I have 6 results (so far) that are finished and uploaded but still show pending credit, as seen here:
Pending credit

Alan


Replying to my own message....

I now have 8 tasks, and counting, waiting for their credit to be granted.
Without receiving the credit, and counting the other invalid tasks I had, this is a waste of power.
If I receive a few more tasks without the credit being granted, I may turn Rosetta@Home off and look for some other project to take part in.

Pending credit: 341.13

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,834,938
RAC: 1,233
Message 61168 - Posted: 13 May 2009, 19:52:15 UTC

So far, most of the iRp40_S workunits I've had ran rather fast - 99 decoys in less than an hour - but then had validate errors. Could you run the new version of your validator software on these workunits, already completed, which the previous version did not accept? In at least one case, a wingman needs this in addition to me or instead of me.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228683054

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228690029

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228717383

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228737287

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228721480

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228701305

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228826391

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228809090

Greg_BE
Joined: 30 May 06
Posts: 5659
Credit: 5,694,012
RAC: 1,911
Message 61170 - Posted: 13 May 2009, 21:05:50 UTC - in response to Message 61168.  

So far, most of the iRp40_S workunits I've had ran rather fast - 99 decoys in less than an hour - but then had validate errors. Could you run the new version of your validator software on these workunits, already completed, which the previous version did not accept? In at least one case, a wingman needs this in addition to me or instead of me.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228683054

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228690029

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228717383

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228737287

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228721480

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228701305

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228826391

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228809090



add me to the list with 1 so far:

iRp40_S_3d85_ip40_1lu4.pdb.gz_00000001_fa_dock.xml_score12_pert38_DOCK_11891_469_0
full run to the 99 decoy limit and then validate error.

tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 61171 - Posted: 13 May 2009, 21:36:20 UTC

Obviously a widespread problem with iRp40_S workunits:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=228722758

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,808,144
RAC: 627
Message 61173 - Posted: 13 May 2009, 22:23:54 UTC - in response to Message 61168.  

So far, most of the iRp40_S workunits I've had ran rather fast - 99 decoys in less than an hour - but then had validate errors. Could you run the new version of your validator software on these workunits, already completed, which the previous version did not accept? In at least one case, a wingman needs this in addition to me or instead of me.



If you click on the task ID (far left column on both the workunit details page and the tasks for computer page), you should see that at least some of these have in fact been given credit. It appears that if both crunchers reported before the fix, they have both now been granted credit. If the second reports after the fix, it goes through the validator in the ordinary order and the first cruncher is left without credit. I know this is a bit annoying, but as the validator appears to be struggling to keep up with its work at the moment, perhaps it's for the best. I believe the two workunits are identical (they were begun from the same seed), so only one copy is really useful. If they run those first copies through the validator again or run a script to grant credit, it would be more to please us than to benefit the research. Not necessarily a bad thing (pleasing us), but not at the cost of causing more trouble. And folks are complaining about their pending credit.


Snags

Alien
Joined: 10 Nov 05
Posts: 5
Credit: 117,597
RAC: 0
Message 61176 - Posted: 13 May 2009, 23:16:24 UTC - in response to Message 61173.  
Last modified: 13 May 2009, 23:29:47 UTC

And folks are complaining about their pending credit.



Excuse my complaining about my pending credit.

I'm not doing this for science reasons only.
It's all about competing within a team and globally, and the fun involved with it.
I like to see the results for the tasks I let my computer run, and it's running 24/7. More will follow, if it works without all of these problems.

Without receiving credits there's just no fun left, and it's a waste of my money, my energy and my time.

So, I believe it's perfectly OK to complain about pending credits.


Alan

robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,834,938
RAC: 1,233
Message 61178 - Posted: 14 May 2009, 0:05:18 UTC - in response to Message 61173.  

So far, most of the iRp40_S workunits I've had ran rather fast - 99 decoys in less than an hour - but then had validate errors. Could you run the new version of your validator software on these workunits, already completed, which the previous version did not accept? In at least one case, a wingman needs this in addition to me or instead of me.



If you click on the task ID (far left column on both the workunit details page and the tasks for computer page), you should see that at least some of these have in fact been given credit. It appears that if both crunchers reported before the fix, they have both now been granted credit. If the second reports after the fix, it goes through the validator in the ordinary order and the first cruncher is left without credit. I know this is a bit annoying, but as the validator appears to be struggling to keep up with its work at the moment, perhaps it's for the best. I believe the two workunits are identical (they were begun from the same seed), so only one copy is really useful. If they run those first copies through the validator again or run a script to grant credit, it would be more to please us than to benefit the research. Not necessarily a bad thing (pleasing us), but not at the cost of causing more trouble. And folks are complaining about their pending credit.


Snags


Most of those workunits hadn't been finished by anyone else the last time I looked. One of them hadn't even been assigned to anyone else.

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,808,144
RAC: 627
Message 61182 - Posted: 14 May 2009, 3:02:18 UTC - in response to Message 61176.  

And folks are complaining about their pending credit.



Excuse my complaining about my pending credit.

I'm not doing this for science reasons only.
It's all about competing within a team and globally, and the fun involved with it.
I like to see the results for the tasks I let my computer run, and it's running 24/7. More will follow, if it works without all of these problems.

Without receiving credits there's just no fun left, and it's a waste of my money, my energy and my time.

So, I believe it's perfectly OK to complain about pending credits.


Alan


I too believe it's perfectly okay to complain about pending credits and certainly didn't intend to imply otherwise. The lag between reporting and getting credit has been vanishingly small on Rosetta, so the existence of pendings, not to mention their increase, is definitely worth noting. Maybe I should have said, "Now folks are noticing their pending credits". Let me try getting at my (wildly speculative) point from another angle:

We now have pendings. This might be because two copies of a bunch of iRp40 tasks have been run through the validator a second time, lengthening the lines and slowing things down for all the other WUs. If this is what they did, and if this caused the growth in pendings, perhaps they could have mitigated the issue by running only one copy through the validator a second time instead of both. But this would have left the other cruncher without credit for probably valid though unnecessary work. In cases in which the second cruncher reported his WU after the fix was in, the first cruncher has not been granted credit. Maybe they are holding off because they've got validator issues, as evidenced by the pendings. If they have already received, or expect to receive, the data in the WU through the ordinary procedures, the science does not require them to do anything with that first task. And maybe doing something about those first tasks just to please us could cause the project more difficulties, like longer lines at the validator, which would increase the pendings, which would, TA DA, displease us. (Just for the record, maybe the pendings have nothing to do with the iRp40 tasks. I'm speculating, wildly, irresponsibly even. In fact I am so regretting making that first post right now. For the other record, insomnia is worse than drugs. Seriously).

A few final points:
I would never want to discourage anyone from reporting any observations, noting any concerns or even complaining on these boards.

I couldn't care less why you crunch: none of my business, the more the merrier, glad you are having fun, etc.

Everything I've said about iRp40 tasks, pendings, the validator, and what the project may or may not be doing about them is merely speculation by someone who has no more information and probably less knowledge than many if not most of the readers here.


Snags

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,808,144
RAC: 627
Message 61183 - Posted: 14 May 2009, 3:02:24 UTC - in response to Message 61178.  

Most of those workunits hadn't been finished by anyone else the last time I looked. One of them hadn't even been assigned to anyone else.

Now that the fix for those tasks is in the validator, the resends won't require additional intervention by staff. All of my tasks hit the 99 model mark in just about an hour, so the turnaround time was probably pretty quick on average. Doing something with those first tasks will require intervention. I didn't see any WUs without a second cruncher and I can only speculate about any delay in resending. Forgive me, robert, but I'm going to cease speculating at least until I've had some sleep. My brain is grinding away, mind you, but something, perhaps what little shred of good sense I have left, is telling me to stop typing.

Snags

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 61191 - Posted: 14 May 2009, 18:26:32 UTC

You'll notice that the invalid jobs have been granted credit if you click on the specific workunit. Looks like the validator has caught up now also.

This has been one of those rare occasions where we sent out a batch of workunits that had a change in output format with the latest app update and thus got rejected by our validator. It required us to rebuild our validator to fix the issue, and the system had a little bit of catching up to do, possibly slowed down by the invalid batch.

Terrasapiens
Joined: 25 Apr 08
Posts: 15
Credit: 368,919
RAC: 0
Message 61197 - Posted: 15 May 2009, 4:26:51 UTC

I recently built a new PC and figured all the problems I was having with Rosetta Mini on my old machine would go away, however I'm still seeing most of the mini WUs crashing, at least all the crashed ones checked were mini WUs. Yet with the new machine the reason for the crashes is different. Could someone please look at my results and provide some feedback as to what the problem may be? The errors checked all showed a lot of "Can't acquire lockfile - exiting" messages. I'm running the same OS (Win XP Pro, SP3) and antivirus (Kaspersky v6.0) as on the old machine. Could the antivirus or some other application be causing these errors? BTW, as with the old machine the Rosetta Beta WUs run just fine.

If this problem persists, is there a way to block mini WUs and only allow processing of beta WUs? This would allow my machine to do a lot more useful work and not waste time on WUs that will continue to fail.

My tasks: https://boinc.bakerlab.org/rosetta/results.php?userid=254884

Thank you!

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 61200 - Posted: 15 May 2009, 11:25:04 UTC - in response to Message 61191.  

You'll notice that the invalid jobs have been granted credit if you click on the specific workunit. Looks like the validator has caught up now also.

This has been one of those rare occasions where we sent out a batch of workunits that had a change in output format with the latest app update and thus got rejected by our validator. It required us to rebuild our validator to fix the issue, and the system had a little bit of catching up to do, possibly slowed down by the invalid batch.


It's a nightmare dealing with the public isn't it :)



©2024 University of Washington
https://www.bakerlab.org