Odd work unit.

Message boards : Number crunching : Odd work unit.

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 78262 - Posted: 4 Jun 2015, 8:03:42 UTC

I have this...

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=668308462

... work unit running on one of my machines. I say running, but I don't think it is: when it is in the "Running" state, my Windows XP task monitor CPU usage drops from ~100% to ~75% (quad-core machine). The job had racked up 47:49:12 of elapsed time when I suspended it.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 78266 - Posted: 4 Jun 2015, 21:59:38 UTC

Sometimes BOINC seems to lose track of things and records the task as running but doesn't actually give it any CPU. The properties of the task will show the actual CPU time used; the number on the task list is the "elapsed time", not CPU time.

Typically, suspending the task and resuming it again is enough to straighten things out, and the task will complete normally.
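The difference between elapsed time and CPU time described above can be turned into a rough stall check. The following is only a sketch under stated assumptions: the function names, the one-hour grace period, and the 0.5 ratio threshold are all made up for illustration, and the CPU figure in the usage lines is hypothetical (only the 47:49:12 elapsed figure comes from this thread). It does not call BOINC in any way; you would feed it the two numbers read from the task's properties.

```python
def hms_to_seconds(text):
    """Convert an 'H:MM:SS' string, as shown in the task list, to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_stalled(elapsed_s, cpu_s, min_elapsed_s=3600.0, min_ratio=0.5):
    """Guess whether a task is marked as running but getting little CPU.

    min_elapsed_s and min_ratio are arbitrary illustrative thresholds,
    not values taken from BOINC or Rosetta.
    """
    if elapsed_s < min_elapsed_s:
        return False  # too early to judge
    return cpu_s / elapsed_s < min_ratio


elapsed = hms_to_seconds("47:49:12")  # elapsed time reported in this thread
cpu = hms_to_seconds("6:40:00")       # hypothetical CPU time, for illustration only
print(looks_stalled(elapsed, cpu))    # True
```

A check like this only flags a candidate; suspending and resuming the task remains the actual remedy suggested above.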
Rosetta Moderator: Mod.Sense

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 78269 - Posted: 5 Jun 2015, 4:49:02 UTC

I enabled the job again and it started running; the elapsed time dropped back to 6:41:22. Is David Baker aware of this issue? I have BOINC running on a couple of remote systems that I rarely visit; they could be sitting idle for all I know.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 78457 - Posted: 16 Jul 2015, 19:01:15 UTC
Last modified: 16 Jul 2015, 19:01:55 UTC

I had another of these today, 50+ hours. It is clear that I cannot rely on the project at remote stations. No new tasks set. It is interesting that you say it is a BOINC issue: Rosetta is the only project I am connected to that is showing this, and this machine is, for example, connected to 23 projects.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 78591 - Posted: 25 Aug 2015, 20:03:55 UTC
Last modified: 25 Aug 2015, 20:40:40 UTC

And today, another of these. Clearly, there is an issue here, affecting Rosetta, and not any of my other projects. No new tasks set AGAIN, and will remain set until I get some information about this, what is being done about it, and when it will be resolved.

The wu this time is:

foldit_2000977_0007_fold_and_dock_SAVE_ALL_OUT_285373_5459_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=683127995

The first report I made in June was the first time I had seen this, so something must have happened or changed before then, but not long before.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,641,936
RAC: 44
Message 78592 - Posted: 26 Aug 2015, 1:33:52 UTC

I only see one active PC on your account, so it appears that your other PCs are not active on Rosetta anymore.

With that said, I have had 4 machines of various configurations crunching Rosetta 24/7 for more than a year now and have never encountered the issue you're describing. Thus, my guess is that some other software present on the machine is causing an issue. Similar threads have uncovered known issues with certain anti-virus programs that prevent components of Rosetta from running properly, incorrect OS date and timezone settings (someone's computer had the wrong timezone and thought the year was 2024 instead of 2014, so their WUs were all failing), and other strange issues that generally turn out to be machine-specific.

That's the downside of hetero-genius computing though, right! XD

It also appears that the task you linked to (https://boinc.bakerlab.org/rosetta/workunit.php?wuid=683127995) completed successfully in the end after just ~6 hours. That's good news, how did that happen?

Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,424,259
RAC: 13,236
Message 78594 - Posted: 26 Aug 2015, 4:17:24 UTC - in response to Message 78591.  

>> And today, another of these. Clearly, there is an issue here, affecting Rosetta, and not any of my other projects. No new tasks set AGAIN, and will remain set until I get some information about this, what is being done about it, and when it will be resolved.
>>
>> The wu this time is:
>>
>> foldit_2000977_0007_fold_and_dock_SAVE_ALL_OUT_285373_5459_0
>>
>> https://boinc.bakerlab.org/rosetta/workunit.php?wuid=683127995
>>
>> The first report I made in June was the first time I had seen this, so something must have happened or changed before then, but not long before.


The error being thrown out a few times is:

ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6
ERROR:: Exit from: ..\..\..\src\core\pose\symmetry\util.cc line: 887

Also a few

No heartbeat from core client for 30 sec - exiting
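
For readers wondering what the first error is checking: the logged condition `std::abs( coordsys_rot.det() - 1.0 ) < 1e-6` requires the determinant of a rotation matrix to be 1 within a small tolerance, which holds for a proper rotation and fails for a reflection or a scaled matrix. Below is a minimal sketch of that same determinant condition. The helper names are hypothetical, and this is not Rosetta's code; it only mirrors the inequality printed in the log.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as three rows of three numbers."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def passes_det_check(m, tol=1e-6):
    """Same inequality as the logged assertion: |det(m) - 1| < tol."""
    return abs(det3(m) - 1.0) < tol


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [[cos_t, -sin_t, 0.0], [sin_t, cos_t, 0.0], [0.0, 0.0, 1.0]]


mirror = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]  # reflection, det = -1

print(passes_det_check(rot_z(0.7)))  # True
print(passes_det_check(mirror))      # False
```

Note that a determinant of 1 alone does not prove a matrix is a rotation; the sketch reproduces only the condition shown in the error text.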


adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 78595 - Posted: 26 Aug 2015, 5:06:00 UTC

>>> how did that happen?

Read the third message in this thread. It happens because I noticed it, suspended it, then resumed it. I cannot notice it when it happens at sites I do not visit regularly. Indeed, I have allowed these tasks to run for days on this machine because I didn't notice the problem was there.

The problem must occur right at the end of processing the work unit: the Remaining column shows ---. When the task is re-enabled, the elapsed time drops to a sensible value and the remaining time to a reasonable figure, and it runs to completion.

My machines all have Avast and/or AVG antivirus software, something I will not be changing as they work for me and cause me no problems. If Rosetta cannot work on machines with these tools on them, then Rosetta has a problem, simple as that. Many people run these.

As I have said, this is a Rosetta-specific issue; I have not found this with any other projects.


Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,641,936
RAC: 44
Message 78602 - Posted: 29 Aug 2015, 4:24:24 UTC

>> My machines all have Avast and/or AVG anti virus software

I can attest that Avast doesn't appear to cause any issues with Rosetta (3 of my 4 machines have Avast installed).

My ears perked up when I read the word 'and' in your reply. I can't speak to AVG, but I can say from my former life in tech support for a certain PC manufacturer that having more than one anti-malware program (especially multiple programs set to do real-time monitoring/active protection) is asking for lots of strange types of trouble. The choice is yours, but in tech support we actively dissuaded people from installing multiple antivirus solutions on the same machine unless one of them was only set to do system scans and not be in any kind of 'active protection' role.

**Note that by default, AVG and Avast will both attempt to run as 'active protection'/'realtime scanning' solutions. I suggest choosing only one of them for this role.

Personally, I have Avast as my active antimalware solution, and I have Malwarebytes as a periodic system scanner, i.e. I do not have realtime monitoring via Malwarebytes but use it to perform full system scans every few days.

(Anecdotally, I am biased to say that Malwarebytes is superior to either AVG or Avast in terms of which viruses it can successfully detect and remove.)

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 78603 - Posted: 29 Aug 2015, 8:43:52 UTC

My machines typically have AVG or Avast; they have whatever was my favourite at the time they were set up. This machine does, indeed, have both on it. I don't need both, but as this is the machine I use, it was useful to see if any of the others needed updates or actions. I can disable one of them. I have not found any trouble having them both there, however; they have been there for years. I'll have a look at Malwarebytes, always interested.

My Rosetta work unit queue is now exhausted, so I am no longer crunching for Rosetta. I had hoped someone might have some information that could correct this behaviour. Rather sad.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,641,936
RAC: 44
Message 78605 - Posted: 29 Aug 2015, 15:58:52 UTC

It is entirely conceivable that we have just identified the root cause of your issues: having both AVG and Avast as realtime protection on that one machine may have been causing the issues with your Rosetta workunits.

If this were a broader issue with Rosetta@Home itself, we'd find that the workunits that failed for you were failing for everyone, and they aren't. Thus it is logical to conclude that it is something different about how that particular system is set up.

As I mentioned, having multiple AV programs set up for realtime protection can cause issues. Just because it hasn't caused issues with anything else you've noticed in the years you've had this setup doesn't mean it can't start causing issues with Rosetta.

Remember, although it runs under 'BOINC' along with other BOINC apps that may run without any issues, Rosetta (or any BOINC app for that matter) is in fact an entirely independent process/application, and antivirus applications look at processes on a case-by-case basis...

For example, say AVG locked the watchdog thread of R@H in order to scan it because it simply didn't understand why this process was consistently pinging the other PIDs on your system, and, well, that looks kind of suspicious. Then Avast, which was okay with the excessive PID pings, noticed that the watchdog thread had become locked by this other process called AVG and was starting to "look funny", and decided to block any interrupts to that thread. Suddenly AVG can't unlock the thread it locked and the whole thing gets stuck... This is an entirely hypothetical, overly simplified example, but it's this sort of scenario that I'd imagine.
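
The general shape of that hypothetical, two parties each holding something the other is waiting for, can be sketched as a classic lock-ordering deadlock. To be clear about the assumptions: this is plain Python threading, not a model of how AVG, Avast, or Rosetta actually behave, and every name in it is invented for illustration. Timeouts are used so the demonstration ends instead of hanging.

```python
import threading


def simulate_two_scanners(timeout_s=0.2):
    """Two hypothetical scanners take the same two locks in opposite order.

    Returns, per scanner, whether it managed to get its second lock
    before the timeout expired.
    """
    thread_lock = threading.Lock()  # stands in for "the watchdog thread"
    scan_lock = threading.Lock()    # stands in for "the other scanner's hold"
    sync = threading.Barrier(2)
    got_second = {}

    def scanner(name, first, second):
        with first:
            sync.wait()  # both scanners now hold their first lock
            acquired = second.acquire(timeout=timeout_s)
            got_second[name] = acquired
            if acquired:
                second.release()
            sync.wait()  # keep holding `first` until both attempts are over

    workers = [
        threading.Thread(target=scanner, args=("A", thread_lock, scan_lock)),
        threading.Thread(target=scanner, args=("B", scan_lock, thread_lock)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return got_second


print(simulate_two_scanners())  # neither scanner gets its second lock
```

Without the timeout, both threads would wait forever, which is the "whole thing gets stuck" outcome described above.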

If you're going to remove one of the AV solutions, I'd vote to remove AVG and keep Avast.

Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,424,259
RAC: 13,236
Message 78614 - Posted: 30 Aug 2015, 13:42:47 UTC - in response to Message 78602.  

>> Personally, I have Avast as my active antimalware solution, and I have Malwarebytes as a periodic system scanner, i.e. I do not have realtime monitoring via Malwarebytes but use it to perform full system scans every few days.

I do the same with Malwarebytes.

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 78621 - Posted: 30 Aug 2015, 20:02:18 UTC

I have uninstalled AVG on here and re-enabled Rosetta. As mentioned, I only had both here as a convenience. The fact that this issue only affected Rosetta, and only in the last few months, unavoidably leaves a certain level of doubt in my mind.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,641,936
RAC: 44
Message 78633 - Posted: 31 Aug 2015, 13:16:22 UTC

Rosetta is having server issues at the moment. But when that's cleared up, it will be interesting to see if this has any effect on the issue you were encountering.

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 78645 - Posted: 31 Aug 2015, 19:57:40 UTC

Indeed.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,276
Message 78685 - Posted: 4 Sep 2015, 16:00:25 UTC
Last modified: 4 Sep 2015, 16:01:21 UTC

Sort of off topic, but I came close to removing Avast and putting AVG back up. I got fed up with the frequent spam messages Avast issues suggesting my protection is not complete and to activate it, when actually it is simply a marketing ploy to get people to pay for upgrades.

There is a quiet gaming mode you can switch to which prevents these from appearing (so they say...). There is an active complaints thread about this behaviour on Avast's web site.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.



©2024 University of Washington
https://www.bakerlab.org