Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

rlpm

Joined: 23 Mar 20
Posts: 13
Credit: 84
RAC: 0
Message 92938 - Posted: 1 Apr 2020, 15:48:39 UTC - in response to Message 92931.  

Signal 11 is SEGV (segmentation fault). This is typically due to a programming bug. Per stderr, looks like a few double frees as well, perhaps related. Anyone know how to report this to the boffins that write the software? Moderators?
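For anyone who wants to check the signal number themselves, here is a minimal Python sketch. It assumes a POSIX-style system, where a child process killed by signal N is reported with return code -N. The helper name `describe_exit` is made up for illustration and is not part of BOINC or Rosetta.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Describe a subprocess return code, naming the signal if one killed it."""
    if returncode < 0:
        # On POSIX, Python's subprocess module reports "killed by signal N" as -N.
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {returncode}"


print(describe_exit(-11))
print(describe_exit(0))
```

On Linux the first line names SIGSEGV, which matches the "Signal 11" in stderr.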
ID: 92938 · Rating: 0
GLadi
Joined: 21 Jan 07
Posts: 3
Credit: 303,172
RAC: 0
Message 92941 - Posted: 1 Apr 2020, 15:52:23 UTC - in response to Message 92871.  

-3 tasks in progress about 50% progress each
-suspend project, turn off computer
-next day turn it on, run boinc manager, resume project
-1 task magically is 100% and uploading, other 2 running normal

Is this jump from 50% to 100% normal ????

It happened to me a few days ago. For some WUs, progress jumped from, say, 50% to 100% immediately after resuming (even when BOINC Manager was only switching between tasks from other projects). I haven't noticed it again.

BTW, I cannot see some of my tasks from 24-28 Mar 2020 in my account.
ID: 92941 · Rating: 0
koetjesreep
Joined: 24 Mar 20
Posts: 5
Credit: 495,994
RAC: 0
Message 92942 - Posted: 1 Apr 2020, 15:54:44 UTC - in response to Message 92938.  
Last modified: 1 Apr 2020, 15:56:17 UTC

Signal 11 is SEGV (segmentation fault). This is typically due to a programming bug. Per stderr, looks like a few double frees as well, perhaps related. Anyone know how to report this to the boffins that write the software? Moderators?

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=92847#92847 says this thread is for problem reports, hence I posted it here :-)
ID: 92942 · Rating: 0
rlpm
Joined: 23 Mar 20
Posts: 13
Credit: 84
RAC: 0
Message 92943 - Posted: 1 Apr 2020, 16:02:42 UTC - in response to Message 92942.  

Yep, good call. I'm also wondering if anyone on this thread has access to the code, or knows anyone who does and can hunt down this bug.
ID: 92943 · Rating: 0
Keith Myers
Joined: 29 Mar 20
Posts: 95
Credit: 289,903
RAC: 0
Message 92945 - Posted: 1 Apr 2020, 16:08:58 UTC - in response to Message 92943.  

Yep, good call. I'm also wondering if anyone on this thread has access to the code, or knows anyone who does and can hunt down this bug.

This thread seems to have the best explanation of the errors.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13658
It seems they are not parsing the CPU features correctly and are attempting to run instructions that the CPU does not support.
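To illustrate what checking CPU features before using them looks like, here is a rough Python sketch. It assumes the Linux /proc/cpuinfo layout with a "flags" line. The sample text, the function names and the choice of "avx" as the required feature are all invented for the example; the linked thread is the place to look for which instructions are actually involved.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(flags: set, required: list) -> list:
    """List the required features that the CPU does not advertise."""
    return sorted(set(required) - flags)


# Hypothetical sample; on a real Linux host you would read /proc/cpuinfo.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3\n"

flags = parse_cpu_flags(SAMPLE)
print(missing_features(flags, ["sse2", "avx"]))
```

A program that skips a check like this and runs an unsupported instruction anyway would normally die with an illegal-instruction error rather than carry on.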
ID: 92945 · Rating: 0
vowelmarauder
Joined: 22 Mar 20
Posts: 2
Credit: 2,114,237
RAC: 0
Message 92985 - Posted: 1 Apr 2020, 22:20:51 UTC

I just noticed that my tasks are taking almost twice as long as the ETA says. The time is either standing still with 1-2 seconds either way or counting *up*... I don't think I've tinkered with any settings and boinc is using all its cores fully. Is this normal? What's going on?

https://i.imgur.com/3uwyfAU.jpg
ID: 92985 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 92986 - Posted: 1 Apr 2020, 22:30:11 UTC - in response to Message 92985.  
Last modified: 1 Apr 2020, 22:31:29 UTC

We'll have to see one of 'em report back in to see for sure, but it sounds like you may have changed the Preference for the workunit runtime from the 8 hour default up to 12 or 24 hours. The watchdog will keep an eye on them for you if they run too long. I suggest letting them run to completion.
Rosetta Moderator: Mod.Sense
ID: 92986 · Rating: 0
aad
Joined: 5 Jan 06
Posts: 9
Credit: 193,297,166
RAC: 19,613
Message 92987 - Posted: 1 Apr 2020, 22:33:16 UTC - in response to Message 92917.  
Last modified: 1 Apr 2020, 22:33:40 UTC

So what's up with the credits for the tasks? I think I read about it somewhere here but can't find it.

The latest tasks are only credited 6-8 points. Older tasks were 100-300....

Is this something being looked into? Any comments on this? Wondering when this will be solved...

TIA

I see the same with my machines.
It's only with the COVID WUs.
Maybe it's a virus ;-))

I'm still running them though.....
ID: 92987 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,834,938
RAC: 1,233
Message 92994 - Posted: 1 Apr 2020, 23:20:43 UTC - in response to Message 92985.  
Last modified: 1 Apr 2020, 23:26:20 UTC

I just noticed that my tasks are taking almost twice as long as the ETA says. The time is either standing still with 1-2 seconds either way or counting *up*... I don't think I've tinkered with any settings and boinc is using all its cores fully. Is this normal? What's going on?

https://i.imgur.com/3uwyfAU.jpg

You are a rather new user here. I've noticed that for each new version of any of the applications, about the first ten tasks on a computer using that version are likely to show a large mismatch between the time the task is expected to run and the time it actually runs. If all of your computers were connected since the last version change of each application, all of the versions in use are either new to your computers or recently have been.

If the actual time is much larger than the initial expected time, it is normal for the expected time to completion to be going up instead of down.
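A toy example of why the remaining time can climb: suppose the estimate is simply projected from elapsed time and the fraction done. This is only a generic sketch, not the formula the BOINC client actually uses, and the function name is made up.

```python
def remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Project the time left from elapsed time and reported progress."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours * (1 - fraction_done) / fraction_done


# Progress is stuck at 50% while the clock keeps running.
for elapsed in (2.0, 3.0, 4.0):
    print(elapsed, remaining_hours(elapsed, 0.5))
```

With progress stuck at 50%, each extra hour of run time adds an hour to the projected remainder, so the displayed estimate counts up instead of down.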
ID: 92994 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1480
Credit: 14,534,479
RAC: 12,518
Message 93022 - Posted: 2 Apr 2020, 5:57:44 UTC - in response to Message 92986.  
Last modified: 2 Apr 2020, 5:59:37 UTC

We'll have to see one of 'em report back in to see for sure, but it sounds like you may have changed the Preference for the workunit runtime from the 8 hour default up to 12 or 24 hours. The watchdog will keep an eye on them for you if they run too long. I suggest letting them run to completion.
I've got the same thing occurring with my present Rosetta Mini v3.78 Tasks.
I checked my preferences, and "Target CPU run time" is still "not selected." The current group of Tasks have been going for 12hr 20min with 3hr 45min estimated time to completion.


Has the project's default "Target CPU run time" been changed with the newly released applications? The very few Rosetta v4.12 windows_x86_64 and Rosetta v4.12 windows_intelx86 Tasks I managed to pick up ran for the desired 8hrs, so this seems to be affecting just the Rosetta Mini 3.78 application; previous work I did with it ran to Target time OK.
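For reference, the arithmetic on the figures above, as a quick Python sketch. The only inputs are the 12hr 20min elapsed, the 3hr 45min estimated remainder, and the 8hr default target; the variable names are just for illustration.

```python
from datetime import timedelta

elapsed = timedelta(hours=12, minutes=20)    # run time so far
remaining = timedelta(hours=3, minutes=45)   # client's estimate of time left
target = timedelta(hours=8)                  # default Target CPU run time

projected = elapsed + remaining              # projected total run time
ratio = projected / target                   # timedelta / timedelta gives a float

print(projected, round(ratio, 2))
```

That projects to 16hr 05min in total, or roughly twice the 8hr target.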
Grant
Darwin NT
ID: 93022 · Rating: 0
strongboes
Joined: 3 Mar 20
Posts: 27
Credit: 5,394,270
RAC: 0
Message 93025 - Posted: 2 Apr 2020, 6:31:15 UTC
Last modified: 2 Apr 2020, 6:34:23 UTC

I've run a few 4.12 tasks overnight (around 80). First impressions are that it runs each decoy approx 4x as long. My preference is set for an hour and my Threadripper is quite quick, but it's taking 3-4 hours to run one decoy if the task name starts with rb; under 4.07 I would hit 1 decoy in under an hour 99% of the time. The design task has run at the same speed but only given 2 points credit per decoy.

4.12 is not looking productive from my end as an end user. Going by results so far, it would take the slower processors in my laptops etc. nearly 8 hours or more for 1 decoy.

I've just been sent a further 60 tasks with the rb prefix, so I'll see how they run. These are a different batch.
ID: 93025 · Rating: 0
pritpalb
Joined: 21 Mar 20
Posts: 2
Credit: 767,576
RAC: 0
Message 93026 - Posted: 2 Apr 2020, 7:01:38 UTC

https://imgur.com/a/gr9UJlr

I have been getting continuous "Scheduler request failed: HTTP gateway timeout" errors for the last week. This is on my Windows 10 machine, while funnily enough my home iMac is happily crunching and reporting tasks.
I have tried clicking 'update' and 'reset' under the projects tab, and I have even removed the project and BOINC and reinstalled, but the error persists.
I am also seeing more 'computation error' entries under the tasks tab.

I know the scheduler is getting hammered, but that doesn't explain why my other machine is working well through all this.

Normal web browsing, reset project and downloading tasks works fine on the Windows 10 machine, but it just errors when reporting or requesting new tasks.

Any ideas?
ID: 93026 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1480
Credit: 14,534,479
RAC: 12,518
Message 93028 - Posted: 2 Apr 2020, 7:03:52 UTC - in response to Message 93025.  
Last modified: 2 Apr 2020, 7:04:21 UTC

I've just been sent a further 60 tasks with the rb prefix, so I'll see how they run. These are a different batch.
No new work here for over 12hrs now.

My Rosetta Mini Tasks are on track to run twice as long as the Target time (16hrs instead of the default 8hrs).
Grant
Darwin NT
ID: 93028 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1480
Credit: 14,534,479
RAC: 12,518
Message 93029 - Posted: 2 Apr 2020, 7:07:04 UTC - in response to Message 93026.  

https://imgur.com/a/gr9UJlr

I have been getting continuous "Scheduler request failed: HTTP gateway timeout" errors for the last week. This is on my Windows 10 machine, while funnily enough my home iMac is happily crunching and reporting tasks.
I have tried clicking 'update' and 'reset' under the projects tab, and I have even removed the project and BOINC and reinstalled, but the error persists.
I am also seeing more 'computation error' entries under the tasks tab.

I know the scheduler is getting hammered, but that doesn't explain why my other machine is working well through all this.

Normal web browsing, reset project and downloading tasks works fine on the Windows 10 machine, but it just errors when reporting or requesting new tasks.

Any ideas?
Are you running any 3rd party AV/ Anti-malware software? It wouldn't be the first time such a programme has taken exception to BOINC and the programmes that make use of it.
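One way to narrow it down might be to turn on the client's extra logging and look at what the scheduler requests actually do. Below is a small Python sketch that builds a cc_config.xml fragment for that. It assumes the `http_debug` and `sched_op_debug` log flag names from the BOINC client configuration documentation, so check them against your client version; the helper name `build_cc_config` is made up.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags: list) -> str:
    """Build a cc_config.xml string that enables the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each enabled flag is an element whose text is "1".
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


# Assumed flag names; verify against the BOINC client configuration docs.
config_xml = build_cc_config(["http_debug", "sched_op_debug"])
print(config_xml)
```

The resulting text would go in cc_config.xml in the BOINC data directory, and the client has to re-read its config files (or be restarted) before the extra messages show up in the event log.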
Grant
Darwin NT
ID: 93029 · Rating: 0
pritpalb
Joined: 21 Mar 20
Posts: 2
Credit: 767,576
RAC: 0
Message 93031 - Posted: 2 Apr 2020, 7:27:53 UTC - in response to Message 93029.  

Are you running any 3rd party AV/ Anti-malware software? It wouldn't be the first time such a programme has taken exception to BOINC and the programmes that make use of it.

Good thought. But wouldn't it be unusual for it to allow some communication on port 80 yet block reporting and requesting of new tasks?

I am running Webroot, but it didn't cause a problem 2 weeks ago when I first started working on Rosetta@home. This timeout only started 1 week ago. I managed about 20,000 credit before the PC stopped reporting.
The AV software is centrally managed by the sysadmin, so I can't place an exclusion on BOINC.

¯_(ツ)_/¯
ID: 93031 · Rating: 0
Stephen "Heretic"
Joined: 2 Apr 20
Posts: 21
Credit: 11,028
RAC: 0
Message 93036 - Posted: 2 Apr 2020, 8:46:45 UTC

Hello, I have just joined this project but it seems there is no work to do at the moment. Is this a common state of affairs or have I struck a bad moment to join?

Stephen

?
ID: 93036 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,376,162
RAC: 11,035
Message 93037 - Posted: 2 Apr 2020, 8:53:45 UTC - in response to Message 93025.  

I've run a few 4.12 tasks overnight (around 80). First impressions are that it runs each decoy approx 4x as long. My preference is set for an hour and my Threadripper is quite quick, but it's taking 3-4 hours to run one decoy if the task name starts with rb; under 4.07 I would hit 1 decoy in under an hour 99% of the time. The design task has run at the same speed but only given 2 points credit per decoy.

4.12 is not looking productive from my end as an end user. Going by results so far, it would take the slower processors in my laptops etc. nearly 8 hours or more for 1 decoy.

I've just been sent a further 60 tasks with the rb prefix, so I'll see how they run. These are a different batch.

You've just run 80 tasks overnight and received 60 more.
You have 2500 tasks available to run on your Threadripper, which has 64 cores and 132GB RAM and is running 1hr tasks.

What is it about 4.12 that isn't looking productive from your pov?
Because after several days with very few tasks available at all I'd kill for any of the problems you're currently having
Quite astonishing.
ID: 93037 · Rating: 0
strongboes
Joined: 3 Mar 20
Posts: 27
Credit: 5,394,270
RAC: 0
Message 93039 - Posted: 2 Apr 2020, 9:08:12 UTC - in response to Message 93037.  

I've run a few 4.12 tasks overnight (around 80). First impressions are that it runs each decoy approx 4x as long. My preference is set for an hour and my Threadripper is quite quick, but it's taking 3-4 hours to run one decoy if the task name starts with rb; under 4.07 I would hit 1 decoy in under an hour 99% of the time. The design task has run at the same speed but only given 2 points credit per decoy.

4.12 is not looking productive from my end as an end user. Going by results so far, it would take the slower processors in my laptops etc. nearly 8 hours or more for 1 decoy.

I've just been sent a further 60 tasks with the rb prefix, so I'll see how they run. These are a different batch.

You've just run 80 tasks overnight and received 60 more.
You have 2500 tasks available to run on your Threadripper, which has 64 cores and 132GB RAM and is running 1hr tasks.

What is it about 4.12 that isn't looking productive from your pov?
Because after several days with very few tasks available at all I'd kill for any of the problems you're currently having
Quite astonishing.



The tasks-in-progress count is incorrect. I reset the project twice this week due to multiple downloads failing, so those tasks aren't really there, as discussed in a different thread.

I'm saying it doesn't look productive because the decoys are taking approximately 4 to 6 times longer to process. If you watch the graphics, it gets to a certain number of steps and then almost stops, taking 30-60 minutes for each additional step.

Last night, before I went to bed, half of them stopped at step 24600, then took 30 mins to do step 24601, etc.

So that's what I mean: it appears to be taking 4-6 times longer to process the same work.

The latest batch, which are rb 04 01 20235 19963 ab t000 robetta cstwt..., are currently at 2 hours 49 minutes and 56% on the first decoy. Looks like 5hrs to run. 4.07 was running very similar tasks in under an hour.
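Working that last estimate through, as a rough Python sketch: it assumes progress on the decoy is roughly linear, which may well not hold given the stalls described above, and it treats "under an hour" on 4.07 as exactly 60 minutes, so the slowdown figure is a lower bound.

```python
elapsed_min = 2 * 60 + 49        # 2 hours 49 minutes so far
fraction_done = 0.56             # 56% through the first decoy

# Linear projection of the total run time for this decoy.
projected_total_min = elapsed_min / fraction_done

# 4.07 ran similar tasks in under an hour; 60 min is the assumed baseline.
slowdown_vs_407 = projected_total_min / 60

print(round(projected_total_min), round(slowdown_vs_407, 1))
```

That comes out at about 302 minutes, i.e. roughly 5 hours and at least 5 times slower than 4.07, in line with the 4-6 times I'm seeing.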
ID: 93039 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1480
Credit: 14,534,479
RAC: 12,518
Message 93040 - Posted: 2 Apr 2020, 9:19:30 UTC - in response to Message 93036.  

Hello, I have just joined this project but it seems there is no work to do at the moment. Is this a common state of affairs or have I struck a bad moment to join??
Work being done has increased by 500% over the last 2 and a bit weeks, so there's not much work available as demand is far exceeding supply.
More work is meant to be coming, but apparently it takes quite a while to prepare it for release, so it will take a while before work production comes close to matching the present demand.
Grant
Darwin NT
ID: 93040 · Rating: 0
strongboes
Joined: 3 Mar 20
Posts: 27
Credit: 5,394,270
RAC: 0
Message 93041 - Posted: 2 Apr 2020, 9:23:09 UTC
Last modified: 2 Apr 2020, 9:26:23 UTC

I can give a little further info too. My CPU is currently 99% utilised: BOINC is running on 60 cores, 2 are feeding GPUs for folding, and 2 are spare for overhead. Normally, when BOINC is loading all of those cores, the clock speed is approx 3.2GHz and the CPU will pull as many watts as I let it (doubling the power with an overclock only gets me to 3.55GHz). At the moment it's pulling 15% less power and the clock speed is up at 4.2GHz on all cores. If each core were being worked hard, it would be impossible for it to run at this speed; this is the speed it normally reaches with, say, 3 or 4 cores loaded.

IMO 4.12 is not making proper use of the CPU. It's taking 4-6 times longer to complete a decoy, which ties in with the fact that my CPU is running at a very high clock speed, indicating the cores are doing very little work.
ID: 93041 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org