Problems with Rosetta version 5.93

AdeB
Joined: 12 Dec 06
Posts: 45
Credit: 4,187,319
RAC: 433
Message 50992 - Posted: 26 Jan 2008, 15:19:40 UTC - in response to Message 50991.  

Strange. hedera received 88 of the 98 credits he claimed for his watchdog-ended task (resultid=135513724). I wonder what the difference was?


And I received 92 of the 94 claimed for resultid 135481414.
I hope Astro gets more than 20 credits for his job, but it probably won't be 400+.
AdeB
Joined: 12 Dec 06
Posts: 45
Credit: 4,187,319
RAC: 433
Message 50995 - Posted: 26 Jan 2008, 15:24:42 UTC

Sorry for the triple post. I had some problems with my connection.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 50997 - Posted: 26 Jan 2008, 15:47:14 UTC - in response to Message 50995.  
Last modified: 26 Jan 2008, 16:32:51 UTC

Sorry for the triple post. I had some problems with my connection.

Up to 461 credits now. LOL

Say, you do know that you can "edit" your posted messages as long as you do so within 60 minutes of the original post? You should see an "edit" box on each of your previous posts. You could (only if you wanna) delete everything and just put "deleted" or some other message into all but the intended one. At that point a nice moderator might come along and hide those extra posts. Anyway, just wanted you to know. Hope you enjoy the rest of the weekend.

32:49:08 CPU time, 99.494% complete with 00:09:57 remaining.

[edit] Made a progress chart. Given the curve, I doubt it'll ever finish.

transient
Joined: 30 Sep 06
Posts: 376
Credit: 10,098,254
RAC: 6,814
Message 50998 - Posted: 26 Jan 2008, 17:15:04 UTC - in response to Message 50992.  

Strange. hedera received 88 of the 98 credits he claimed for his watchdog-ended task (resultid=135513724). I wonder what the difference was?


And I received 92 of the 94 claimed for resultid 135481414.
I hope Astro gets more than 20 credits for his job, but it probably won't be 400+.


In my case, maybe not even one decoy was finished. I'm just guessing.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 51000 - Posted: 26 Jan 2008, 18:49:20 UTC

resultid=135831728

CPU time 127601.71875 (35.44 HOURS)
Claimed credit 501.989998586804
Granted credit 20

Mod Sense. I'm pretty sure there's something wrong here. Anyone else spot the problem???? It's not like this issue wasn't posted about early enough on Friday for someone at the project to comment upon it.
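Astro's figures can be checked with a little arithmetic: 127601.71875 s of CPU time is about 35.44 h, so the 20 granted credits come to well under 1 credit per CPU hour, against roughly 14 per hour claimed. A minimal Python sketch of that calculation; the function names are hypothetical helpers for illustration, not part of BOINC or Rosetta.

```python
# Hypothetical helpers (not BOINC/Rosetta code): convert reported CPU seconds
# to hours and express claimed vs. granted credit per CPU hour.

SECONDS_PER_HOUR = 3600.0


def cpu_hours(cpu_seconds: float) -> float:
    """Convert CPU time in seconds to hours."""
    return cpu_seconds / SECONDS_PER_HOUR


def credit_per_hour(credit: float, cpu_seconds: float) -> float:
    """Credit per CPU hour for a single result."""
    return credit / cpu_hours(cpu_seconds)


if __name__ == "__main__":
    # Figures quoted in the post for resultid=135831728.
    cpu = 127601.71875
    claimed = 501.989998586804
    granted = 20.0
    print(f"{cpu_hours(cpu):.2f} h")                              # 35.44 h
    print(f"claimed: {credit_per_hour(claimed, cpu):.2f} cr/h")   # claimed: 14.16 cr/h
    print(f"granted: {credit_per_hour(granted, cpu):.2f} cr/h")   # granted: 0.56 cr/h
```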
JEklund
Joined: 24 Sep 06
Posts: 7
Credit: 105,447
RAC: 0
Message 51001 - Posted: 26 Jan 2008, 19:37:32 UTC - in response to Message 51000.  

resultid=135831728

CPU time 127601.71875 (35.44 HOURS)
Claimed credit 501.989998586804
Granted credit 20

Mod Sense. I'm pretty sure there's something wrong here. Anyone else spot the problem???? It's not like this issue wasn't posted about early enough on Friday for someone at the project to comment upon it.


Based on the info in the log, it seems that it was stuck and the watchdog killed it (and valued your work at 20 credits, which is not fair for 35 hours of work IMHO).

No clue what is wrong with that work unit, though.

-- Lundi --

mhhall
Joined: 28 Mar 06
Posts: 7
Credit: 9,911,107
RAC: 1,111
Message 51007 - Posted: 26 Jan 2008, 22:03:39 UTC - in response to Message 50335.  

Please post problems and/or bugs with Rosetta 5.93. Thanks for your support!

My slower computer (ID #187636, an older Linspire Linux box) is set to accept jobs of approx. 14 hours. I have a job on the machine at this time which says it is 99.67% completed with 50:16:19 of CPU time. For the time being, I've suspended the job. Its name starts with "2h4o_BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK" (work unit 123162090).

I don't know if this is a Rosetta issue or a problem with this specific job. I know that I have another job with the same name in my queue (135883853).

Just wondering if anyone else has seen a similar issue.

Hope this helps!!

AdeB
Joined: 12 Dec 06
Posts: 45
Credit: 4,187,319
RAC: 433
Message 51008 - Posted: 26 Jan 2008, 22:58:53 UTC - in response to Message 51000.  

resultid=135831728

CPU time 127601.71875 (35.44 HOURS)
Claimed credit 501.989998586804
Granted credit 20

Mod Sense. I'm pretty sure there's something wrong here. Anyone else spot the problem???? It's not like this issue wasn't posted about early enough on Friday for someone at the project to comment upon it.


Oh no, you did get 20. You should have got at least an extra 100 for all the effort you put into it.
dcdc
Joined: 3 Nov 05
Posts: 1663
Credit: 78,309,808
RAC: 36,293
Message 51009 - Posted: 26 Jan 2008, 23:51:20 UTC

I've got one here:

https://boinc.bakerlab.org/rosetta/result.php?resultid=135314464

Rosetta score is stuck or going too long. Watchdog is ending the run!
CPU time: 58569.2 seconds. Greater than 4X preferred time: 14400 seconds

Claimed credit 211.010587329225
Granted credit 80
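The watchdog line in dcdc's log reads most naturally as "end the run once CPU time exceeds 4 × the preferred time", with 14400 s (4 h) as the preferred time: 4 × 14400 = 57600 s, just below the reported 58569.2 s. The minimal Python sketch below assumes exactly that reading; the multiplier, the interpretation of 14400 s, and the function names are assumptions inferred from the message text, not Rosetta's actual watchdog code.

```python
# Hypothetical sketch of the watchdog rule implied by the log line.
# ASSUMPTIONS: the multiplier is exactly 4 and 14400 s is the user's preferred
# run time (not the already-multiplied limit). Not taken from Rosetta's source.

WATCHDOG_MULTIPLIER = 4


def watchdog_limit(preferred_seconds: float, multiplier: float = WATCHDOG_MULTIPLIER) -> float:
    """CPU-time ceiling (seconds) past which a run is treated as stuck."""
    return multiplier * preferred_seconds


def watchdog_should_end(cpu_seconds: float, preferred_seconds: float) -> bool:
    """True when accumulated CPU time exceeds the watchdog ceiling."""
    return cpu_seconds > watchdog_limit(preferred_seconds)


if __name__ == "__main__":
    preferred = 14400.0  # preferred time as printed in the log line
    cpu = 58569.2        # CPU time reported when the watchdog fired
    print(f"limit = {watchdog_limit(preferred):.0f} s")       # limit = 57600 s
    print(f"end run? {watchdog_should_end(cpu, preferred)}")  # end run? True
```

Under this reading, a task stuck on a single decoy can burn four times the preferred run time before being ended, which matches the 16+ hours dcdc reports against a 4-hour preference.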
[AF>France>TDM>Centre]Jeannot Le Tazon
Joined: 8 Dec 05
Posts: 6
Credit: 145,020
RAC: 0
Message 51013 - Posted: 27 Jan 2008, 7:51:31 UTC

I've aborted this one https://boinc.bakerlab.org/rosetta/result.php?resultid=135287253 after 11 h (preferences set to 12 h).
After 11 h of crunching, the CPU benchmark ran, and then the task went back to 10% complete. :(
It seemed to do nothing interesting after maybe 1 h and 1 decoy
(Model 1, Step 27091, Accepted RMSD 9124, Accepted energy 6.65805).
Nothing was displayed under "Searching" or "Accepted", and nothing moved on "RMSD" or "Accepted Energy" after the first decoy.
Paul
Joined: 29 Oct 05
Posts: 190
Credit: 60,852,826
RAC: 5,483
Message 51019 - Posted: 27 Jan 2008, 20:25:18 UTC

I started getting lots of computation errors today. I did make 1 change to the system but it should not have caused this problem. Most of the time the CPU cranks on the WU for 50+ min. before the error.

Is there a problem with some of the WUs in the 5.93 beta? I just installed the newest BOINC Client (5.10.30) and I guess it could be at fault as well.

Any insight is greatly appreciated.

Paul
Thx!

Paul

Greg_BE
Joined: 30 May 06
Posts: 4872
Credit: 4,184,447
RAC: 1,772
Message 51028 - Posted: 27 Jan 2008, 21:19:40 UTC - in response to Message 51019.  

Paul, do the group a favor and tell us which of your many computers is having fits and which work units are affected, as you have a lot of different computers and lots of work units in the queue. It's not the BOINC program that has the errors, but rather the project work units themselves. You probably notice that you have errors on RAH but not on the other projects you are working on. If it were a BOINC program error, you would have errors on all your projects.

I started getting lots of computation errors today. I did make 1 change to the system but it should not have caused this problem. Most of the time the CPU cranks on the WU for 50+ min. before the error.

Is there a problem with some of the WUs in the 5.93 beta? I just installed the newest BOINC Client (5.10.30) and I guess it could be at fault as well.

Any insight is greatly appreciated.

Paul


PieBandit
Joined: 17 Apr 07
Posts: 6
Credit: 228,220
RAC: 0
Message 51029 - Posted: 28 Jan 2008, 0:08:43 UTC

Several of my WUs are also failing with computation errors:

Result ID 136334535
Result ID 136319412
Result ID 136308989
Result ID 136258153
Result ID 135343580
Result ID 135260720
Result ID 134993972

Since January 21st, I've had about a 50% success rate.
Paul
Joined: 29 Oct 05
Posts: 190
Credit: 60,852,826
RAC: 5,483
Message 51030 - Posted: 28 Jan 2008, 0:31:32 UTC - in response to Message 51028.  

Paul, do the group a favor and tell us which of your many computers is having fits and which work units are affected, as you have a lot of different computers and lots of work units in the queue. It's not the BOINC program that has the errors, but rather the project work units themselves. You probably notice that you have errors on RAH but not on the other projects you are working on. If it were a BOINC program error, you would have errors on all your projects.

I started getting lots of computation errors today. I did make 1 change to the system but it should not have caused this problem. Most of the time the CPU cranks on the WU for 50+ min. before the error.

Is there a problem with some of the WUs in the 5.93 beta? I just installed the newest BOINC Client (5.10.30) and I guess it could be at fault as well.

Any insight is greatly appreciated.

Paul



Greg:

Thanks for the note. I do have lots of WUs checked out and it takes a long time to find the issues.

The computer is 591177 and it has more compute errors than successes. I will keep fighting with the hardware but I think it is OK now. All of my temps are well in spec and I don't have any other issues.

I run 100% R@H, so I cannot compare these WUs to anything else. I did notice that none of my other systems have the same issues, so one BIOS upgrade later, I think we may have some stability.

Thx

Paul

Thx!

Paul

Conan
Joined: 11 Oct 05
Posts: 146
Credit: 3,250,729
RAC: 161
Message 51039 - Posted: 28 Jan 2008, 11:06:18 UTC

The problems I was getting over at Ralph appear to have carried over to Rosetta.

The WUs starting with "2h4o" were causing problems on Ralph, so I was surprised to see them over here on Rosetta.

They have a habit of running well past the preference time (up to 21 hours with a preference time of 6 hours).
They all seem to get to just over 97% complete with 9 minutes 59 seconds to go and then just sit there for hours.
They can show 100% completed while still listed as "Waiting to Run" in BOINC Manager.
They often give computation errors after the extra-long run time (this was mainly on Ralph).
If one does complete after the extra-long run time, it gives only a very poor amount of credit, because usually only 1 decoy has been produced in all that time.

I have just aborted two of these WUs:
WU 135437069 ran for over 3 1/2 hours and got to 100%, but was still waiting to run in BM; after aborting, the results show zero (0) time taken on the job.
WU 135437323 was already over an hour past my preference time of 6 hours and still grinding away with 9 minutes 59 seconds to go at 97% completed; it had been this way for quite some time.
WU 135372094 completed after more than 21 hours, returning just 2.5 cr/h.

If I see any more WUs of this type, I will abort them.
Thomas Leibold
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 51046 - Posted: 28 Jan 2008, 18:22:36 UTC - in response to Message 51039.  



The WUs starting with "2h4o" were causing problems on Ralph, so I was surprised to see them over here on Rosetta.



I'm seeing the same problems as Conan on a number of my servers. The troublesome work units are 2h4o and 1zpy, and all require manual aborting. Restarting BOINC just resets the amount of time already spent on them and starts them again.

The 2h4o units in particular tend to stay at 100% completed but state "Running" with no increase in the amount of CPU time spent. Looking at the stdout.txt/stderr.txt files shows that there was an attempt by the watchdog to shut down the client (and as far as I know, that has never worked properly for Rosetta on Linux).
Team Helix
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 51052 - Posted: 28 Jan 2008, 21:06:46 UTC

I aborted them all as well. Still waiting on my 480 missing credits, too...

I wonder when the staff get in to work? These have really got to be affecting the total rate of return (i.e., work done).
FalconFly
Joined: 11 Jan 08
Posts: 23
Credit: 2,163,056
RAC: 0
Message 51070 - Posted: 29 Jan 2008, 8:16:33 UTC - in response to Message 51052.  

Same here; I had to abort the last 2h4o model.
One of my faster hosts effectively stopped working, as the hourly rotation of the last 2h4o__BOINC_TWIST_RINGS work unit apparently reset the CPU time over and over while making zero progress.

As a side effect, the Rosetta long-term debt of the affected clients rocketed up to -90000 s (lots of work done but almost no progress).
©2020 University of Washington
https://www.bakerlab.org