Posts by Chris Holvenstot

21) Message boards : Number crunching : "On multiprocessors, use at most n processors" (Message 70817)
Posted 30 Jul 2011 by Chris Holvenstot
Post:
I am curious - where are you specifying the number of CPUs to utilize - on the computing preferences web page on the Rosetta site or on the "preferences -> processor usage" frame which is a part of the BOINC manager running locally on your system?

If you are specifying it on the web page, try going to the BOINC manager preferences panel and setting it there - it is my understanding that the local preferences set on the BOINC manager itself override the global preferences set via the web site.

With my account I tried setting the number of processors to 1 on the web page and 100% on the BOINC manager, and it is using all cores.

This may be your issue.
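
A rough sketch of the precedence described above, in Python. This only models my understanding that a local setting, when present, replaces the web setting entirely. The function name, its parameters, and the 8-CPU machine are made up for illustration and are not part of BOINC.

```python
import math
from typing import Optional


def effective_cpu_count(total_cpus: int,
                        web_max_cpus: Optional[int],
                        local_pct: Optional[float]) -> int:
    """Hypothetical helper: models 'local overrides web' and nothing else."""
    if local_pct is not None:
        # A local preference exists, so the web value is ignored completely.
        wanted = math.floor(total_cpus * local_pct / 100)
        return max(1, min(total_cpus, wanted))
    if web_max_cpus is not None:
        # No local preference: fall back to the web "at most n processors".
        return max(1, min(total_cpus, web_max_cpus))
    # Neither is set: use everything.
    return total_cpus


# The case from the post: web page says 1 processor, BOINC manager says 100%.
print(effective_cpu_count(8, web_max_cpus=1, local_pct=100))
# Same web setting with no local preference at all.
print(effective_cpu_count(8, web_max_cpus=1, local_pct=None))
```

If the first call reports every CPU and the second reports one, that matches what I saw on my account.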
22) Message boards : Number crunching : "On multiprocessors, use at most n processors" (Message 70807)
Posted 29 Jul 2011 by Chris Holvenstot
Post:
Just speculation, but your CPU actually has only 4 cores - acting as eight - via the magic of Hyper-Threading.

It would appear that maybe when you allow BOINC to use five cores, it takes that as five real cores - thus you are still running on all four real cores.

Try dropping it down to three cores - if my guess is correct then you should see six tasks running.

Just a shot in the dark
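
The arithmetic behind that guess, as a small Python sketch. It assumes the setting is counted in real cores and that each real core runs two threads. That is my speculation only, not documented BOINC behavior, and the function name is invented for the example.

```python
def tasks_if_setting_counts_real_cores(physical_cores: int,
                                       threads_per_core: int,
                                       cores_allowed: int) -> int:
    """Tasks we would expect to see IF the setting means physical cores."""
    logical_cpus = physical_cores * threads_per_core
    # Each allowed core contributes all of its threads, capped by the hardware.
    return min(cores_allowed * threads_per_core, logical_cpus)


# 4 real cores presenting as 8: allowing "five cores" still fills all 8.
print(tasks_if_setting_counts_real_cores(4, 2, 5))
# Dropping to three should show six running tasks if the guess is right.
print(tasks_if_setting_counts_real_cores(4, 2, 3))
```

If you see some number other than six after dropping to three, the guess is wrong.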

23) Message boards : Number crunching : Mod.Sense (Message 70095)
Posted 23 Apr 2011 by Chris Holvenstot
Post:
Greg / Mike -

After withdrawing from the project I looked at other BOINC projects but nothing caught my eye.

1. I would prefer medical research. I am not sure I see the value to society in finding new prime numbers or cracking an encryption system - but that's a personal preference.

2. I wanted a CPU project - my systems run "headless" over a VNC connection and their graphics adapters are small integrated ATI chips. I already have problems with the heat generated by 12 systems.

3. I wanted a project with an active, dynamic community. Due to a degenerative neuromuscular disease I am becoming more and more of a shut-in - and I enjoy the technical banter and team spirit so much more than something like slashdot.

4. I wanted a project whose "news" section was kept up to date - unfortunately many of the projects are even worse than Rosetta in this area.

5. A fairly steady supply of work units.

6. I would have preferred that the project be US based - my only reason for this is that it is so much easier to be confident that the results are "in the public domain" - I may be fooling myself but I don't know the rules and regulations dealing with this when you select a project based overseas.

When I find something I will likely jump in - if not I will go out and purchase some monitors and donate the systems to some charity.

However, I will say that you guys have been a pleasure to crunch with over the past year and will be missed.

24) Message boards : Number crunching : Mod.Sense (Message 70094)
Posted 23 Apr 2011 by Chris Holvenstot
Post:
Greg - Mod.Sense is still hanging around - and from my perspective he is a man of honor and dedication (gender assumed) who puts out a real effort.

I would not want to be in his shoes - been there, done that. It is difficult to try and tap dance and communicate status when you are getting no support from the back-end organization.

At the other end of the totem pole there is Dr Dave - and while his posts have been limited over the past few months I have the impression that he still makes an attempt to keep us up to date on the scientific direction of the project.

Dr Dave is a scientist and not an IT specialist - and as such cannot really be expected to be down in the nuts and bolts of the system.

I see the problem as being at the feet of the developers and the sysadmins.

I am especially frustrated by the error opening cs_frags.9mers.gz - that problem is so consistent and frequent that it should have been easy to nail - heck, anyone who has played the software game for any length of time knows that often the biggest challenge is being able to recreate the problem.

Before I withdrew from the project I spent a couple of hours trying to locate the local job queue where jobs are stored while they are in the ready to start status - the plan was to write a script to scan and purge the problem tasks before they started execution.

Unfortunately, nothing jumped out at me.
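
For what it's worth, the kind of script I had in mind would look roughly like the Python sketch below. It does not touch the job queue directly. Instead it assumes you already have a list of (project URL, task name) pairs from somewhere, and it only builds `boinccmd --task <url> <name> abort` command lines for names starting with a T0xxx_ style prefix, without running them. The name pattern, the placeholder URL, and the sample task names are assumptions for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: problem tasks can be recognized by a "T0" + digits + "_" prefix.
PROBLEM_NAME = re.compile(r"^T0\d+_")


def abort_commands(tasks: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Build (but do not run) one abort command per suspect task."""
    return [
        ["boinccmd", "--task", url, name, "abort"]
        for url, name in tasks
        if PROBLEM_NAME.match(name)
    ]


# Placeholder URL and illustrative task names, not real queue contents.
PROJECT_URL = "http://example.invalid/project/"
queued = [
    (PROJECT_URL, "T0590_boinc_nmr_homology_max10_loopbuild_threading_cst_relax_tex"),
    (PROJECT_URL, "ProteinG_abinitio_SAVE_ALL_OUT_design_relax"),
]

for command in abort_commands(queued):
    # Print for review; hand each list to subprocess.run() only once you
    # are sure the selection is right.
    print(" ".join(command))
```

Printing the commands first keeps a bad pattern from throwing away good work.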
25) Message boards : Number crunching : Compute error (Message 70093)
Posted 23 Apr 2011 by Chris Holvenstot
Post:
Greg - this is one of the errors people have been trying to get the sysadmins to address for several weeks. As you noted, there has been no status given by the project.
26) Message boards : Number crunching : Mod.Sense (Message 70085)
Posted 21 Apr 2011 by Chris Holvenstot
Post:
Could you please PM me when the system admins / devs decide to address all of the bad tasks which have been flowing into the system of late - such as validate errors after 10 minutes and compute errors behind the cs_frags.9mers.gz file (all with matching wingman results)?

It has been over two weeks since the user community started reporting these problems and there has been no response from the project yet.

If they don't care, why should I?

Let me know when they have addressed the problem, until then I guess I'll just shut down the systems and save on my electric bill.

Thanks

27) Message boards : Number crunching : could not open file cs_frags.9mers.gz (Message 70082)
Posted 20 Apr 2011 by Chris Holvenstot
Post:
Who dropped the ball? I was hoping that you were going to step up and take the blame <sarcastic grin>

People have been complaining for over two weeks now about this and a few other "wingman included" errors in both the "Compute Error" and the "Minirosetta 2.17" threads, and the issue has not been resolved, nor have we received even an acknowledgement from any of the developers that there is an issue.

Further, this is not just a case of having a few jobs polluting the system and just having to wait until they are worked off the queue - as of today (20 April) these tasks are still being generated.

If you take a look at the front page you will see that the "estimated teraflops" for the project is down under 110 - where just a few short months ago it was up around 150 - and this is with the recent addition of the two mega-computers run by the Microsoft Windows Azure group and the Russian "2e" group - each with a RAC of well over 100K.

I wonder what the project's "teraflops" would be without these two groups? I don't wonder why so many seem to have left the ranks of active participation.
28) Message boards : Number crunching : Problems with web site (Message 70067)
Posted 18 Apr 2011 by Chris Holvenstot
Post:
Any chance of getting the "results" links on the Home page and the Account page fixed - for the past few weeks all I have been getting is the message "Sorry, the data requested does not exist"

I had previously posted this issue in the thread "User results" but maybe that was not the right place to do so.
29) Message boards : Number crunching : minirosetta 2.17 (Message 70066)
Posted 18 Apr 2011 by Chris Holvenstot
Post:
Tex1954 said: Well, I suggest they give us a storage fee in the form of double normal points to house their flawed software on our systems...


There are even a few additional types of tasks getting the validate errors with matching wingman results, but there are fewer of them, so I'm just going to sit back and see what they do with these before sorting through more of the chaff.

It sure would be nice if they would update their server software so that we could pull a task list by Server State / Outcome like some of the other projects have. It would make digging through the results a bunch easier.

30) Message boards : Number crunching : Compute error (Message 70065)
Posted 18 Apr 2011 by Chris Holvenstot
Post:
svincint said ... As the first post in that thread states: that's where bugs should be reported. There's no need to have a separate one here: it just gets confusing.


OK - I posted fresh examples of my previously reported errors to the minirosetta thread. Is there really hope that it will cut down on the number of weeks it takes to get a response / resolution?
31) Message boards : Number crunching : minirosetta 2.17 (Message 70063)
Posted 18 Apr 2011 by Chris Holvenstot
Post:
I am seeing validate errors (with matching wingman results) on tasks whose name has the form of:

T0590_boinc_nmr_homology_max10_loopbuild_threading_cst_relax_tex

A few samples would be:

414980981
414994609
414957506
414950332
415065606
32) Message boards : Number crunching : minirosetta 2.17 (Message 70062)
Posted 18 Apr 2011 by Chris Holvenstot
Post:
You can add to Mad Max's list of failing tasks (with matching wingman results) whose name is in the form of:

ProteinG_abinitio_SAVE_ALL_OUT_design_relax

415571046
414989706
414921629
415131368
415102869

415091934
414802441
415091930
415171797
415008017

This is not an exhaustive list of this type of error found on my systems – these were all “fresh” tasks with creation dates between 16 April and 18 April.
33) Message boards : Number crunching : minirosetta 2.17 (Message 70061)
Posted 18 Apr 2011 by Chris Holvenstot
Post:
Tex1954 said: Any way to know WHEN this group of tasks was generated on the server? Could it be it's an old batch and we just need to work through it?


Good question - I just looked at a few of them on my previous list and the task creation dates were 16 April and 18 April.

So this is NOT a case of just letting "old" jobs work their way through the system.
34) Message boards : Number crunching : minirosetta 2.17 (Message 70059)
Posted 18 Apr 2011 by Chris Holvenstot
Post:
Few last days I got big pack of "Compute error" on tasks starting from "T0xxx_". This tasks ends with errors few seconds after start. Some examples:


To his list of those ending "in only a few seconds" with the error message "file cs_frags.9mers.gz" you can add:

415450587
415451112
415451774
415525056
415086428

415038519
415008687
415001340
415384421

415376541
415367010
415339775
415326968
415043715

415337799
415300558
415299126
415289036
415068958

415020984
414880550
415302737
415245580
415216583

415210228
415605253
415563904
415554009
415542857

415540472
415533975
415533394
415519909
415503828

415487923
415485024
415473737
415472523
415466575

415465650
415068765
415064287
415062582
415062402

415053058
415343211
415335164
415333379
415278070

This does NOT represent a COMPLETE listing of what I have seen on my systems - I just listed the FIRST 50 or so that have FAILED with this error so far TODAY. And it is still early.

These errors have been going on for AT LEAST 2 WEEKS and have been the topic of discussion in another thread on this board. These are all "fresh" tasks having been issued by the Rosetta server in the last day or two.
35) Message boards : Number crunching : Compute error (Message 70043)
Posted 16 Apr 2011 by Chris Holvenstot
Post:
How about it mod.sense - any word about the T0xxx type jobs which fail after only a few seconds? I am still getting a trickle of them. Matching wingman results of course. Any chance of getting the problem fixed or are they just going to let the series of jobs burn out and die a natural death?

I am also starting to see a new type of task with errors flow into my systems - with matching wingman results. Tasks have the prefix of "dck_rhoA_rhoA" and fail after zero seconds with the error message:

ERROR: Option matching -docking:no_filters not found in command line top-level context


Sample tasks would include:

414237964
414202168
414127519
414134968
414192201
414167204

Additionally, I had previously mentioned in this thread a series of tasks which run for a while (half an hour and up), end with 100 decoys generated, and then fail with a validate error - matching wingman results here too.

The names for these tasks seem to all be prefixed with "ProteinG_abinitio_SAVE_ALL_OUT"

Sample tasks would include:

414750044
414771827
414705765
414691538
413762471
413761980

Thanks in advance for any information you can squeeze out of the admins on these issues - it has been almost two weeks since they were first reported and the crunchers have been provided with no feedback yet.

The admins seem to be asleep at the wheel - are they studying to be air traffic controllers when they grow up?
36) Message boards : Number crunching : User results (Message 70010)
Posted 11 Apr 2011 by Chris Holvenstot
Post:
How about it mod.sense - you have any insight into what is going on here?

Thanks
37) Message boards : Number crunching : Compute error (Message 70009)
Posted 11 Apr 2011 by Chris Holvenstot
Post:
Kirby said ...

Hopefully Dr. Baker will remember his promise to communicate the project's status to us better using Twitter and Facebook.


You know, I would not look to or expect the good doctor to get involved with giving status on the technical glitches we have been seeing - rather, I would like to hear from him some sort of short blurb on a regular basis concerning the project's current direction and accomplishments.

In simple "layman's terms"

I think that it is more in the realm of the sysadmins to provide us some sort of status when things start going wrong. Or heck, enslave a few grad students to be responsible for that - after all from my time on various campuses it would appear that the average grad student is somewhere between an indentured servant and chattel.

Maybe what we all need to do is just quietly hit the suspend button for a couple days and get their attention.

But that's just frustration talking.
38) Message boards : Number crunching : Compute error (Message 70004)
Posted 10 Apr 2011 by Chris Holvenstot
Post:
Hank said:

I suppose that when there are problems, the folks whose job it is to solve them spend time on that rather than getting into potentially endless conversations on the forum. ;)


I agree - they don't need to spend their workday on endless conversations - I think that 90% of us would be thrilled to see just a simple entry in the "News" section of the home page.

If you've never read it take a moment and check it out. I understand if you actually read it they will give you 3 credit hours in ANCIENT AMERICAN HISTORY.

39) Message boards : Number crunching : Compute error (Message 70001)
Posted 9 Apr 2011 by Chris Holvenstot
Post:
@fatbozz - you are correct - the "Client Error / Fail Almost Right Away" issue seems to be limited to work units whose name starts out with the prefix T0xxx. However, I am also getting "Validate Errors" (with matching wingman results thank you) on ** some ** of the work units starting out with ilv_ and IF3_.

@mod.sense - we know and appreciate that you are an unpaid volunteer. And while I don't know about the other crunchers here, I for one also understand just how much fun it is to try and tap dance around a problem when you are getting little or no input from those responsible for the platform that is struggling.

A couple of samples of work units with the Validate Errors I mentioned would be:

377197127
377200299
374781916

In response to your comment that the "failure rate is not extreme" - well, at least the facts seem to be converging with your statement. I just did a quick "eyeball survey" of my system and see about 5% of the jobs getting the Validate type of error and 10% to 15% getting the Client Error failure.

At the high point I was seeing about 40% of my work units failing with the Client Error issue.

The Final Four is over and done with - gently put, I did not even see WU mentioned. So tell the staff to get back to work! ;)





40) Message boards : Number crunching : User results (Message 69997)
Posted 9 Apr 2011 by Chris Holvenstot
Post:
I have noticed that for the past few days (at least) when I click on the "results" link on the home page I no longer see the results of the work I've done - instead I get a message saying "Sorry, the data requested does not exist"

Am I "special" or is every one getting this message?






©2024 University of Washington
https://www.bakerlab.org