Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home



Sid Celery

Joined: 11 Feb 08
Posts: 2105
Credit: 40,926,259
RAC: 18,158
Message 93079 - Posted: 2 Apr 2020, 14:40:50 UTC - in response to Message 93077.  

See below: there are no 4.07 tasks left showing. There were 9000 yesterday and only 400 today. The mini was taking around an hour, but it gives an idea. The 4.07 tasks were averaging a 40 min runtime, at a rate of 1 credit per 11.5 secs of runtime on average. 3600/11.5 = 313 credits per hour.

The last 4.12 is running at 1 credit per 59.95 seconds of runtime. 4.7x slower

https://boinc.bakerlab.org/rosetta/results.php?hostid=3800945&offset=340&show_names=0&state=4&appid=

I didn't look back that far earlier. What I notice now is that starting today, 2-Apr, the scoring for mini-Rosetta has plunged to 75/hr, down from 300/hr, and 4.12 tasks are at 300 per 4 hours, which is 75/hr too.

It looks like something has happened to <all> scoring from today - a step change down - but consistent between the two on validation. Very odd.
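
The rates quoted in this post can be cross-checked with a few lines of arithmetic. The Python sketch below is only illustrative: the function names are made up, and the inputs are just the figures quoted above. With those figures the 4.12 slowdown comes out nearer 5.2 than the 4.7 factor quoted, so the quoted ratio may have been based on a different sample of tasks.

```python
def credits_per_hour(seconds_per_credit: float) -> float:
    """Convert 'seconds of runtime per credit' into credits per hour."""
    return 3600.0 / seconds_per_credit


def slowdown(old_seconds_per_credit: float, new_seconds_per_credit: float) -> float:
    """How many times longer it now takes to earn one credit."""
    return new_seconds_per_credit / old_seconds_per_credit


if __name__ == "__main__":
    # Figures quoted in the thread: 11.5 s/credit (4.07) and 59.95 s/credit (4.12).
    print(f"4.07: {credits_per_hour(11.5):.0f} credits/hr")
    print(f"4.12: {credits_per_hour(59.95):.0f} credits/hr")
    print(f"slowdown: {slowdown(11.5, 59.95):.1f}x")
    # 300 credits over a 4 hour task is the same hourly rate as 75 per hour.
    print(f"300 per 4 hr = {300 / 4:.0f} credits/hr")
```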

Oh, you're not going to like this...
I've just checked my own PC to see how my dribble of tasks have performed on a mere FX8370
1 Apr - Mini & 4.12 tasks around 45/hr, 280-340/8hr task. Better than I usually get tbh
2 Apr - Mini only (4.12 not reported yet) 110-120/hr, 890-950/8hr task. Lol

Nothing I can say to that...
ID: 93079 · Rating: 0
entity

Joined: 8 May 18
Posts: 19
Credit: 5,744,699
RAC: 12,807
Message 93080 - Posted: 2 Apr 2020, 15:11:15 UTC - in response to Message 93072.  
Last modified: 2 Apr 2020, 15:13:23 UTC

Oh you're right.
I just looked at my task list.
Time per WU has jumped from 8 hours to 16 hours!
The cores are running cooler than the last version too, suggests a bottleneck.
Note 2, I just noticed that the most recent few are fast again.
Maybe there was just a run of WU for a harder problem.

This is a known problem in Rosetta that the developers have acknowledged but probably haven't fixed yet. They indicated that it would take a major rewrite of the code. The L3 cache tends to become overutilized, and the CPU then waits for data to make the trip from main memory, hence the CPU runs cooler (more waiting). A post by a developer in another project suggested limiting the number of tasks run concurrently, and indicated that each task uses about 4MB of L3 cache. Concerning the run time, I noticed that the run parameters include something like cpu_seconds=57500. That is about 16 hours, so they are ignoring the Target CPU runtime setting.
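
As a rough check of the two numbers in this post, here is a small Python sketch. It assumes the "about 4MB of L3 cache per task" figure from the post is taken at face value; the helper names and the 16 MB cache size in the example are hypothetical and only for illustration.

```python
def cpu_seconds_to_hours(cpu_seconds: float) -> float:
    """Convert a cpu_seconds run parameter into hours."""
    return cpu_seconds / 3600.0


def max_tasks_for_l3(l3_cache_mb: float, per_task_mb: float = 4.0) -> int:
    """Rough upper bound on concurrent tasks before the L3 cache is oversubscribed.

    per_task_mb defaults to the ~4 MB per task mentioned in the post;
    the real working set will vary by task.
    """
    return max(1, int(l3_cache_mb // per_task_mb))


if __name__ == "__main__":
    print(f"cpu_seconds=57500 is {cpu_seconds_to_hours(57500):.2f} hours")
    # Hypothetical CPU with 16 MB of L3 cache.
    print(f"16 MB L3: at most {max_tasks_for_l3(16)} tasks")
```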
ID: 93080 · Rating: 0
Stephen "Heretic"

Joined: 2 Apr 20
Posts: 21
Credit: 11,028
RAC: 0
Message 93081 - Posted: 2 Apr 2020, 15:27:06 UTC - in response to Message 93040.  

Hello, I have just joined this project but it seems there is no work to do at the moment. Is this a common state of affairs or have I struck a bad moment to join??
Work being done has increased by 500% over the last 2 and a bit weeks, so there's not much work available as demand is far exceeding supply.
More work is meant to be coming, but apparently it takes quite a while to prepare it for release, so it will take a while before work production comes close to matching the present demand.


. . I'm guessing fellow refugees from S@H ... oh well, I'll just have to be patient ...

Stephen

:(
ID: 93081 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93083 - Posted: 2 Apr 2020, 15:37:56 UTC

I've tried to summarize the new work unit runtimes in a new thread. Please post concerns about the "performance" of the new v4.12, or about estimated time to completion, over there.
Rosetta Moderator: Mod.Sense
ID: 93083 · Rating: 0
BetelgeuseFive

Joined: 10 Aug 10
Posts: 4
Credit: 1,421,973
RAC: 621
Message 93084 - Posted: 2 Apr 2020, 16:23:02 UTC

I'm having a problem with 4.12 on Linux (CentOS 7). Found out my computer was doing nothing while there were plenty of tasks "Ready to start".
First rebooted the system, but this did not change anything.
Enabled cpu_sched_debug in the event log and messages indicated it was trying to start v4.12 tasks, but nothing actually started.
Suspended the v4.12 tasks and other v4.08 tasks started immediately without any problems.

Any clues?

Thanks,

Tom
ID: 93084 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93086 - Posted: 2 Apr 2020, 16:42:05 UTC - in response to Message 93084.  
Last modified: 2 Apr 2020, 16:51:56 UTC

How much memory have you allowed BOINC to use, when active? when idle?
Rosetta Moderator: Mod.Sense
ID: 93086 · Rating: 0
BetelgeuseFive

Joined: 10 Aug 10
Posts: 4
Credit: 1,421,973
RAC: 621
Message 93087 - Posted: 2 Apr 2020, 17:00:28 UTC - in response to Message 93086.  

How much memory have you allowed BOINC to use, when active? when idle?


System has 6 GB configured (running inside VM).
Just checked settings, it has:

When in use, use at most 50%
When not in use, use at most 90%

Should have been plenty to start at least one task.
ID: 93087 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 93090 - Posted: 2 Apr 2020, 17:12:16 UTC - in response to Message 93087.  

System has 6 GB configured (running inside VM).
Just checked settings, it has:

When in use, use at most 50%
When not in use, use at most 90%

Should have been plenty to start at least one task.

That means you have only 3 GB available. If you have "leave applications in memory" enabled, any suspended task will be taking up memory too.
It is a memory problem.
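
The 3 GB figure follows directly from the two settings quoted above. A minimal Python sketch of that arithmetic is below; the function name is made up, and it ignores anything else the VM itself needs, so treat it as an estimate only.

```python
def ram_budget_gb(total_gb: float, max_used_pct: float) -> float:
    """Memory BOINC may use, given total RAM and a 'use at most N%' setting."""
    return total_gb * max_used_pct / 100.0


if __name__ == "__main__":
    total = 6.0  # GB configured in the VM, as stated in the post
    print(f"In use (50%):     {ram_budget_gb(total, 50):.1f} GB")
    print(f"Not in use (90%): {ram_budget_gb(total, 90):.1f} GB")
    # Any suspended task kept in memory ("leave applications in memory")
    # would have to be subtracted from these budgets.
```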
ID: 93090 · Rating: 0
BetelgeuseFive

Joined: 10 Aug 10
Posts: 4
Credit: 1,421,973
RAC: 621
Message 93096 - Posted: 2 Apr 2020, 17:58:38 UTC - in response to Message 93090.  

System has 6 GB configured (running inside VM).
Just checked settings, it has:

When in use, use at most 50%
When not in use, use at most 90%

Should have been plenty to start at least one task.

That means you have only 3 GB available. If you have "leave applications in memory" enabled, any suspended task will be taking up memory too.
It is a memory problem.


There were no other (suspended) tasks active, and it didn't want to start even a single new v4.12 task, while the system had been running v4.08 tasks for several days without any problems. I changed the memory settings so it can always use 90% and enabled the v4.12 tasks again, so I will find out if it helps.
Did anything change in v4.12 that would cause tasks to not even start?

Thanks for your feedback, it is appreciated.

Tom
ID: 93096 · Rating: 0
vowelmarauder

Joined: 22 Mar 20
Posts: 2
Credit: 2,114,237
RAC: 0
Message 93105 - Posted: 2 Apr 2020, 19:51:31 UTC - in response to Message 92985.  
Last modified: 2 Apr 2020, 20:30:31 UTC

I just noticed that my tasks are taking almost twice as long as the ETA says. The time is either standing still with 1-2 seconds either way or counting *up*... I don't think I've tinkered with any settings and boinc is using all its cores fully. Is this normal? What's going on?

https://i.imgur.com/3uwyfAU.jpg

Sure enough all the new tasks are running like this as well (~16 hours) and I saw others report the same?

they're all "conducting_fiber_XXXX_fold_and_dock_XXX"

As suggested above, is this a different batch and nothing to worry about?

edit: thank you for the explanation 🙏🏻
I will reply only here so others can see your post
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=93107#93107
ID: 93105 · Rating: 0
JoshuaScholar

Joined: 26 Mar 20
Posts: 18
Credit: 232,183
RAC: 0
Message 93106 - Posted: 2 Apr 2020, 20:03:30 UTC
Last modified: 2 Apr 2020, 20:06:51 UTC

Bitdefender thinks that rosetta_4.12_windows_intelx86.exe "exhibits ransomware behavior".
It thinks that it encrypted
boinc_checkpoint_count.txt
boinc_init_count.txt
chk_S_00000023_ClassicAbinito_stage4_kk_1.rng.state.gz
[and a bunch of similar files]
I'm guessing that rng means random number generator and that it reinitialized a bunch of random number files. The antivirus detected the maximum entropy and assumed that the files were encrypted.

I can make that program an exception, but I don't know what's ruined because the damn program restored some of the files to their previous state.
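
To illustrate the entropy guess above: compressed or random data has a byte distribution close to uniform, which is also what encrypted data looks like. The Python sketch below measures Shannon entropy in bits per byte. It is not how Bitdefender actually works (that is not known from this thread), and the sample data is synthetic rather than a real Rosetta checkpoint.

```python
import gzip
import math
import random
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a random number generator state dump.
    state = bytes(rng.getrandbits(8) for _ in range(4096))
    packed = gzip.compress(state)
    text = b"boinc_checkpoint_count " * 200

    print(f"plain text:      {shannon_entropy(text):.2f} bits/byte")
    print(f"rng state:       {shannon_entropy(state):.2f} bits/byte")
    print(f"gzip(rng state): {shannon_entropy(packed):.2f} bits/byte")
```

A heuristic that flags "files rewritten with near-maximal entropy" would plausibly trip on files like `*.rng.state.gz`, which is consistent with the guess in the post.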
ID: 93106 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93107 - Posted: 2 Apr 2020, 20:14:44 UTC - in response to Message 93105.  

ID: 93107 · Rating: 0
Admin
Project administrator

Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 93109 - Posted: 2 Apr 2020, 21:13:49 UTC - in response to Message 93107.  

Sorry, we changed that back to 8 hours
ID: 93109 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1663
Credit: 17,328,604
RAC: 24,398
Message 93163 - Posted: 3 Apr 2020, 6:13:57 UTC - in response to Message 93059.  

Hi especially @Grant (SSSF)

Where am I wrong?
I need 2x more time to finish the tasks and get only 50% of the GFLOPS on a similar i7-8700K CPU

Compare:
- https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=3933928
- https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=3914491

Thanks in advance.

Could it be the Tasks in question?
On my system all Tasks are running to the Target time (other than the odd one that bails out early), and apart from a glitch with some Tasks a few days back that paid out bugger all Credit (and the few early exits), Credit has generally been in line with Runtime.
Grant
Darwin NT
ID: 93163 · Rating: 0
JoshuaScholar

Joined: 26 Mar 20
Posts: 18
Credit: 232,183
RAC: 0
Message 93167 - Posted: 3 Apr 2020, 6:42:53 UTC - in response to Message 93106.  

What do I do to clean my system since the damn antivirus program "restored" some of rosetta's files to a previous state, assuming that Rosetta 4.12 is a ransomware program?

I tried aborting the WUs currently being calculated, but one finished.
ID: 93167 · Rating: 0
Stephen "Heretic"

Joined: 2 Apr 20
Posts: 21
Credit: 11,028
RAC: 0
Message 93169 - Posted: 3 Apr 2020, 6:50:56 UTC
Last modified: 3 Apr 2020, 6:57:07 UTC

. . OK, I am totally new to this project. I started cautiously, giving it one core of my i5-6400 with the other 3 cores idle as backup and support for E@H on the GPU. One task ran and was looking good, pretty much on target (8 hours) after 6 hours of runtime, with CPU utilisation remaining under 50% on all 4 cores. To try to improve CPU usage I increased it to 2 cores, but it remained at one task running. I then increased commitment to 3 cores and it started a 2nd task, but it soon crashed BOINC, requiring me to go to Task Manager to kill all Rosetta processes and E@H before I could get BOINC to launch again.

. . I reduced CPU commitment back to 1 core and left it running, but upon returning to this machine about 8 hours later it had crashed the boinc-client several times, and despite trying to kill off the still-active app components I could not get BOINC to restart, so I had to reboot the machine. I suspended the idle Rosetta tasks, but now the one running task has gone to "waiting to run". This machine has 8GB RAM. If I cannot get Rosetta to play nice with E@H it may have to go.

. . I increased CPU commitment back to 2 cores and the stalled task has resumed, but I am now waiting for the other shoe to drop. Will it crash BOINC yet again?

Stephen

? ?
ID: 93169 · Rating: 0
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1663
Credit: 17,328,604
RAC: 24,398
Message 93171 - Posted: 3 Apr 2020, 7:03:45 UTC - in response to Message 93169.  
Last modified: 3 Apr 2020, 7:05:11 UTC

. . I increased CPU commitment back to 2 cores and the stalled task has resumed, but I am now waiting for the other shoe to drop. Will it crash BOINC yet again?
Settings that are working for me (keep in mind this is a 6c/12t system with 32GB of RAM).

Other
     Store at least                 1    days of work
     Store up to an additional      0.02 days of work

Disk
     Use no more than  12 GB
     Leave at least    2 GB free
     Use no more than  40% of total

Memory
     When computer is in use, use at most          95 %
     When computer is not in use, use at most      95 %
     Leave non-GPU tasks in memory while suspended (not selected)
     Page/swap file: use at most                   75 %


If you are running more than one project, I'd suggest setting "Store at least x days of work" to 0.5 or less.
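
For anyone who prefers a file over the GUI, the settings listed above can in principle be written to a local override file (global_prefs_override.xml in the BOINC data directory). The Python sketch below generates such a file body. The XML tag names are given as recalled from the BOINC client documentation and should be checked against it before use; the function name is made up.

```python
import xml.etree.ElementTree as ET

# Values from the post above; tag names are assumed from BOINC's docs.
PREFS = {
    "work_buf_min_days": "1",
    "work_buf_additional_days": "0.02",
    "disk_max_used_gb": "12",
    "disk_min_free_gb": "2",
    "disk_max_used_pct": "40",
    "ram_max_used_busy_pct": "95",
    "ram_max_used_idle_pct": "95",
    "leave_apps_in_memory": "0",
    "vm_max_used_pct": "75",
}


def build_override_xml(prefs: dict) -> str:
    """Build the text of a preferences override file from a tag/value mapping."""
    root = ET.Element("global_preferences")
    for tag, value in prefs.items():
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print instead of writing, so nothing is overwritten by accident.
    print(build_override_xml(PREFS))
```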
Grant
Darwin NT
ID: 93171 · Rating: 0
MarkJ

Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 93197 - Posted: 3 Apr 2020, 9:33:28 UTC - in response to Message 93106.  

Bitdefender thinks that rosetta_4.12_windows_intelx86.exe "exhibits ransomware behavior"

<snip>

I can make that program an exception, but I don't know what's ruined because the damn program restored some of the files to their previous state.

Set Rosetta to No New Tasks in BOINC. Make the BOINC folders and program an exception in Bitdefender. On the Projects tab in BOINC, reset the project and then set it to Allow New Tasks. That will clean out the project folder and download the apps again. It will also get rid of any running tasks (if you have any).
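
The BOINC side of these steps can also be done from the command line with boinccmd. The Python sketch below only builds and prints the commands; it assumes boinccmd is on the PATH and that the project was attached with the URL shown, which is inferred from the links in this thread and may differ on your machine. The Bitdefender exception still has to be added by hand between the first and second command.

```python
import subprocess

# Assumed project URL; use the one shown in your own BOINC Manager.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def reset_commands(project_url: str) -> list:
    """boinccmd invocations for: No New Tasks, reset project, Allow New Tasks."""
    return [
        ["boinccmd", "--project", project_url, op]
        for op in ("nomorework", "reset", "allowmorework")
    ]


def run_all(commands: list) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: show the commands rather than executing them.
    for cmd in reset_commands(PROJECT_URL):
        print(" ".join(cmd))
```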
BOINC blog
ID: 93197 · Rating: 0
RT

Joined: 14 Mar 20
Posts: 6
Credit: 1,155,031
RAC: 0
Message 93198 - Posted: 3 Apr 2020, 9:50:55 UTC

For some reason, since v4.12 was released, one of my machines has failed computation on every Rosetta task. My other machines seem to be fine at the moment, but the following host seems to fail every task it gets after 2-3 seconds of computation:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3849302
ID: 93198 · Rating: 0
JoshuaScholar

Joined: 26 Mar 20
Posts: 18
Credit: 232,183
RAC: 0
Message 93200 - Posted: 3 Apr 2020, 9:55:28 UTC - in response to Message 93197.  

Sadly, exceptions for ransomware are by program, not by folder. It seems my choices are:
1) turn off ransomware protection altogether
or
2) make an exception for Rosetta_4.12_windows_intelx86.exe and know that I'm going to go through the same sh_tshow next time you update the client.
ID: 93200 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org