Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home


Previous · 1 . . . 45 · 46 · 47 · 48 · 49 · 50 · 51 . . . 309 · Next

ww

Joined: 17 Mar 20
Posts: 3
Credit: 455,936
RAC: 0
Message 94847 - Posted: 19 Apr 2020, 8:51:04 UTC
Last modified: 19 Apr 2020, 9:04:20 UTC

Maybe a memory leak

rb_04_16_21806_21365_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_08_918009_366

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1037241400

The first attempt (Windows 32-bit) failed at 12 hours of CPU time, with an RSS of 354 MB.

I have the second attempt on Linux 64-bit. If it actually needs this much memory, 32-bit wouldn't have been able to run it at all. RSS was at 3.09 GB. It seems to be able to free some memory though.


(Or some of it was swapped out. Don't post tired, kids.)

RSS has been climbing steadily; it started at 1.8 GB. It is now at 3.5 hours, and completion is on pace for an 11.9-hour run time. It appears to be checkpointing.

Application: Rosetta 4.15
Name: rb_04_16_21392_21290__t000__4_C1_SAVE_ALL_OUT_IGNORE_THE_REST_917949_249
State: Running
Received: Sat 18 Apr 2020 04:59:33 PM EDT
Report deadline: Tue 21 Apr 2020 04:59:33 PM EDT
Estimated computation size: 80,000 GFLOPs
CPU time: 03:38:02
CPU time since checkpoint: 00:02:38
Elapsed time: 03:38:38
Estimated time remaining: 04:56:41
Fraction done: 30.282%
Virtual memory size: 3.09 GB
Working set size: 2.89 GB
Directory: slots/5
Process ID: 24116
Progress rate: 8.280% per hour
Executable: rosetta_4.15_x86_64-pc-linux-gnu
ID: 94847
Tom M

Joined: 20 Jun 17
Posts: 92
Credit: 16,330,873
RAC: 48,882
Message 94867 - Posted: 19 Apr 2020, 12:39:18 UTC - in response to Message 94754.  

Hello, I'm a newbie to Rosetta and got things set up and running ok. In the last two days I've noticed my laptop running this app in an odd manner. Instead of running at 100% CPU, it fluctuates between 33% and 100%,


If you have SETI@home mostly idling, I would go to the S@H website and uncheck the "intel igpu" box.

Generally, running any crunching task on that part of the Intel CPU slows the entire system down significantly.

This is usually true for now. If/when Intel delivers the planned iGPU upgrades, it should start behaving more like AMD's iGPU, but not yet.

Tom M
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....
ID: 94867
Maslo55

Joined: 3 Mar 08
Posts: 1
Credit: 1,029,280
RAC: 0
Message 95417 - Posted: 27 Apr 2020, 10:15:53 UTC

I have some random crashes: once every few days, I find my crunching computer rebooted when I return to it. I also run Folding@home, which I thought was responsible, but in Windows Event Viewer the faulting application seems to be Rosetta:

Faulting application name: rosetta_4.15_windows_x86_64.exe, version: 0.0.0.0, time stamp: 0x5e856ed2
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x0000000000000000
Faulting process id: 0x2f68
Faulting application start time: 0x01d61c558f1205cc
Faulting application path: C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.15_windows_x86_64.exe
Faulting module path: unknown
Report Id: e3f4316a-3112-476b-9f13-f2fcc13a42e3
Faulting package full name:
Faulting package-relative application ID:
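The exception code in that log is a Windows NTSTATUS value, and its bit fields can be decoded directly. This sketch uses the documented NTSTATUS layout (2 severity bits, a customer bit, a reserved bit, a 12-bit facility and a 16-bit code); the lookup table is only a small illustrative subset, and the function name is hypothetical.

```python
# A few well-known NTSTATUS values (illustrative subset, not a complete table).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}

SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITY[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),      # bit 29
        "facility": (value >> 16) & 0xFFF,          # bits 27-16
        "code": value & 0xFFFF,                     # bits 15-0
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


print(decode_ntstatus(int("0xc0000005", 16)))
# {'severity': 'error', 'customer': False, 'facility': 0, 'code': 5, 'name': 'STATUS_ACCESS_VIOLATION'}
```

An access violation means the process touched memory it was not allowed to access. It does not by itself tell a software bug apart from unstable (for example, overclocked) RAM, and on its own it terminates the program rather than rebooting the machine.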

I have a Ryzen 3600 with slightly overclocked RAM; I will probably try default settings, or increase the voltage. All testing programs show no errors. I get some computation errors, but very infrequently.

Rosetta seems to be a better RAM tester than Memtest for me.
ID: 95417
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 95418 - Posted: 27 Apr 2020, 10:19:30 UTC - in response to Message 95417.  

I have Ryzen 3600 with slightly overclocked RAM, would probably try default, or increasing voltage. All testing programs show no errors. I get some computation errors, but very infrequently.
Or better yet, go back to default clocks and voltage to see if that sorts out the problem.
Grant
Darwin NT
ID: 95418
Profile robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 95423 - Posted: 27 Apr 2020, 14:39:39 UTC - in response to Message 95417.  

I have some random crashes, once every few days, I find my crunching computer rebooted when I return to it. I also run Folding@home which I thought was responsible, but in Windows Event viewer the faulting application seems to be rosetta:

Faulting application name: rosetta_4.15_windows_x86_64.exe, version: 0.0.0.0, time stamp: 0x5e856ed2
Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000005.

[snip]

I'm running Rosetta, some other BOINC projects, and Folding@home on my computer at the same time. Only BOINC currently gets the GPU, since I'm trying to avoid changes in how loud the computer's fan is, and Folding@home doesn't always have GPU WUs available.

This often crashes my browser, but not Windows itself.

It tends to make Rosetta tasks take about twice as much wall-clock time to finish, though.

I'm still trying to find out how many virtual CPU cores Folding@home uses at once, and how to control that; the slowdown appears to come from more background tasks competing for CPU time than there are virtual CPU cores to supply it.
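The "about twice as long" observation fits a simple oversubscription estimate: when more CPU-bound threads are runnable than there are logical cores, each one gets only a fraction of a core. This is a rough sketch assuming fair time-sharing and always-runnable threads; the core and thread counts are hypothetical, and real contention (cache, memory bandwidth, context switches) only makes it worse.

```python
def slowdown_factor(runnable_threads: int, logical_cores: int) -> float:
    """Rough wall-clock slowdown when CPU-bound threads share cores evenly.

    Assumes every thread is always runnable and the OS shares time fairly.
    Ignores cache, memory-bandwidth and context-switch costs.
    """
    if runnable_threads <= 0 or logical_cores <= 0:
        raise ValueError("counts must be positive")
    return max(1.0, runnable_threads / logical_cores)


# Hypothetical machine: 8 logical cores, 8 BOINC tasks plus an 8-thread Folding@home slot.
print(slowdown_factor(8 + 8, 8))  # 2.0
# Same machine, split 4 + 4: no oversubscription.
print(slowdown_factor(4 + 4, 8))  # 1.0
```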

There needs to be a discussion somewhere of how to make BOINC and Folding@home share a computer; I haven't found one at Folding@home.

Changing the Folding@home power setting to Light helped reduce the crashes, but has not done much for the slowdown.
ID: 95423
Millenium

Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 95430 - Posted: 27 Apr 2020, 16:58:05 UTC

Are you running CPU tasks for BOINC and Folding at the same time? If so, that makes no sense. You just slow them all down, since they use more memory (and memory bandwidth, and CPU context switches, and so on) for no benefit. Use the CPU either for Folding or for BOINC. There is no way around that: you can't keep adding CPU-bound threads and expect none of them to become inefficient.

GPU is a different matter, of course.
ID: 95430
Profile robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 95439 - Posted: 27 Apr 2020, 19:02:51 UTC - in response to Message 95430.  

Are you running together CPU tasks on BOINC and Folding? If yes then it's nonsense. You just slow them all as they use more memory (and memory bandwidth, and CPU context changes and whatever) without need. Either use the CPU for Folding or for BOINC. There is no way to fix that, you can't just add threads that require CPU usage and expect them all to not be inefficient.

GPU is a different thing of course.

The Folding@home method for finishing the current WU and then stopping doesn't work, and I don't want to let a Folding@home workunit time out instead of finishing.

I've already found a way to limit the number of threads BOINC is using. If I can find a similar method for Folding@home, I should be able to stop their contention for virtual cores, but let both continue to run CPU work.
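The post doesn't say which method was used to limit BOINC's threads. One documented BOINC option is an app_config.xml file in the project's folder under the BOINC data directory, containing a <project_max_concurrent> element; BOINC Manager picks it up via Options > Read config files. The sketch below only builds the file's text (the helper name build_app_config is hypothetical, and the limit of 4 is an arbitrary example). The "Use at most N% of the CPUs" computing preference is another route.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent: int) -> str:
    """Return app_config.xml text capping how many tasks of one project run at once."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


print(build_app_config(4))
# <app_config><project_max_concurrent>4</project_max_concurrent></app_config>
```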
ID: 95439
Profile robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 95443 - Posted: 27 Apr 2020, 20:32:00 UTC - in response to Message 95417.  

Maslo55,

Error code 0xc0000005 under Windows 7 or 10 is an access violation; it is often reported when a program fails to start.

You should check if the total amount of memory in use is approaching the total amount of memory that your computer has.

If so, the problem is not specific to Rosetta@home; it comes from trying to run too many memory-demanding programs at once.

You can either add more memory to your computer, or reduce the amount of work your computer is trying to do at once.
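The check suggested above amounts to simple arithmetic: add up the working sets of the running tasks and other programs and compare against installed RAM. This sketch keeps a safety reserve for the OS; the 2 GB reserve, the helper name, and the example numbers are all hypothetical (tasks in this thread ranged from a few hundred MB to about 3 GB).

```python
def memory_headroom_gb(task_working_sets_gb, other_usage_gb, installed_gb, reserve_gb=2.0):
    """RAM left after running tasks, other programs and a safety reserve.

    A negative result means the planned workload likely does not fit in physical memory.
    """
    in_use = sum(task_working_sets_gb) + other_usage_gb
    return installed_gb - reserve_gb - in_use


# Hypothetical 16 GB machine, 4 GB used by Folding@home and a browser.
print(memory_headroom_gb([1.5] * 6, 4.0, 16.0))  # 1.0
print(memory_headroom_gb([1.5] * 8, 4.0, 16.0))  # -2.0
```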
ID: 95443
strongboes

Joined: 3 Mar 20
Posts: 27
Credit: 5,394,270
RAC: 0
Message 95445 - Posted: 27 Apr 2020, 20:50:49 UTC - in response to Message 95443.  

In Folding@home, you can set the number of cores in the CPU slot: change it from -1 to the value you want.

I have found there is no optimal setting for running both simultaneously.
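When picking a value for that CPU slot, one simple approach is to give Folding@home whatever is left after BOINC's tasks, so the total never exceeds the logical core count. This is a sketch assuming each Rosetta task is single-threaded and occupies one core; the helper name, the "keep_free" option, and the 12-thread example are hypothetical.

```python
def fah_slot_cpus(logical_cores: int, boinc_tasks: int, keep_free: int = 0) -> int:
    """Cores left for a Folding@home CPU slot after single-threaded BOINC tasks
    and an optional number of cores kept free for the desktop."""
    remaining = logical_cores - boinc_tasks - keep_free
    if remaining < 1:
        raise ValueError("no cores left for the Folding@home slot")
    return remaining


# Hypothetical 12-thread CPU: 6 BOINC tasks, 1 thread kept free for the desktop.
print(fah_slot_cpus(12, 6, keep_free=1))  # 5
```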
ID: 95445
Brummit

Joined: 14 Jul 14
Posts: 2
Credit: 30,582
RAC: 0
Message 95454 - Posted: 28 Apr 2020, 1:01:06 UTC

Is there any way you could set up an option to download smaller work units?

My average stats for Rosetta are 159 completed and 78 failed. Optimistically, that's 1/3 of downloaded work deemed invalid due to running out of time, requiring someone else to (re)process the data; pessimistically, just under half the data fails the deadline. I run the PC 12-15 hours per day on average.
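The "optimistic" and "pessimistic" figures come from two different denominators: failed tasks as a share of everything downloaded, versus failed tasks relative to completed ones. A quick check with the reported totals:

```python
completed, failed = 159, 78  # totals reported in the post

share_of_all = failed / (completed + failed)  # failed as a share of all downloaded tasks
per_completed = failed / completed            # failed relative to completed tasks

print(f"{share_of_all:.1%}")   # 32.9%
print(f"{per_completed:.1%}")  # 49.1%
```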

A waste of processing time for all.

My PC, though not the latest super-duper 1000-core gamer extravaganza, was custom-built two years ago and is still pretty good.

Thank you,
'Brummit'.
ID: 95454
Profile robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 95455 - Posted: 28 Apr 2020, 1:23:51 UTC - in response to Message 95454.  

Brummit,

Under the advanced interface, Your account, Rosetta@home preferences, you can try reducing the Target CPU run time by about a third of its current value. But note that there's a minimum value you're not allowed to go below.

This should give you workunits that run for shorter times, but need about the same amount of memory.

Does this fit your idea of smaller work units?
ID: 95455
Sid Celery

Joined: 11 Feb 08
Posts: 2141
Credit: 41,518,559
RAC: 10,612
Message 95457 - Posted: 28 Apr 2020, 1:53:24 UTC - in response to Message 95454.  

Is there any way you could set up an option to download smaller work units?

My average stats for Rosetta are - 159 completed, and 78 failed. Optimistically that's 1/3 of download work deemed invalid due to running out of time, requiring someone else has to (re) process the data,
and pessimistically, just under half the data fails the deadline. I run the PC 12-15 hours per day average.

A waste of processing time for all.

My PC, though not the latest super duper 1000 core gamer extravaganza, is custom built two years ago, and still pretty good.

Have you only recently returned to this project? It looks like you've had a few days off after receiving tasks and had to abort some.
If your PC is on 12-15 hours a day, you should be able to complete 8-hour tasks OK within a 3-day deadline. Try to let them run to completion; things should improve the more tasks you complete and return.
It should settle down after a few days. Give it another try.
ID: 95457
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 95462 - Posted: 28 Apr 2020, 5:24:30 UTC - in response to Message 95454.  
Last modified: 28 Apr 2020, 5:40:08 UTC

Is there any way you could set up an option to download smaller work units?
Just set a smaller cache. If you reduce the Target CPU run time (at this point), there's a high risk that you'll just end up with even more tasks downloading, missing their deadlines and then erroring out than is already happening.
On your account page: Preferences > When and how BOINC uses your computer > Computing preferences > Other:
    Store at least 0.6 days of work
    Store up to an additional 0.02 days of work
Works for me.
It takes an extremely long time for the Estimated completion times to get reasonably close to the actual time (Target CPU Run time).


And even so, it will take a while for BOINC to determine how many hours a day your computer is on, and how much of that time it is able to process work (the default settings can mean that just browsing sites with heavy graphics or scripts will stop BOINC from processing work).
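Why a small cache helps can be shown with a deliberately simplified model: a core's queue of equal-length tasks must fit into the hours the machine is actually on before the deadline. BOINC's real work-fetch logic is more involved and also tries to factor in measured availability; the helper names, the "+1 for the running task" rule, and the 12 hours/day figure are assumptions for illustration.

```python
import math


def tasks_per_core(cache_days: float, task_hours: float) -> int:
    """Approximate tasks per core: enough to fill the cache, plus the one running now."""
    return math.ceil(cache_days * 24 / task_hours) + 1


def finishes_before_deadline(task_hours, n_tasks, hours_on_per_day, deadline_days=3.0):
    """Can one core finish n_tasks equal-length tasks before the deadline,
    given how many hours per day the machine is on?"""
    needed = task_hours * n_tasks
    available = hours_on_per_day * deadline_days
    return needed <= available


# 8-hour tasks on a machine that is on about 12 hours a day.
print(finishes_before_deadline(8, tasks_per_core(0.62, 8), 12))  # True  (3 tasks, 24 h needed)
print(finishes_before_deadline(8, tasks_per_core(2.0, 8), 12))   # False (7 tasks, 56 h needed)
```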
Grant
Darwin NT
ID: 95462
Profile GoldenHat

Joined: 14 Apr 20
Posts: 3
Credit: 122,663
RAC: 0
Message 95463 - Posted: 28 Apr 2020, 6:48:23 UTC - in response to Message 94755.  

I'm running Windows 10 64-bit. No, I haven't checked the system monitor yet, but I will. Since my post, the problem seems to have disappeared. I rebooted the PC and it's been fine. I notice it sometimes does it for a short period but then settles again. I'm not a big techie, so I'm not going to spend ages fault-finding or trying to understand how it works. It's running, so I'll leave it.

Thanks very much for your desire to assist; I do appreciate you taking the time to reply. I'll keep an eye on the monitor in case it goes funky again.

Richard.
ID: 95463
Michael E.@ team Carl Sagan

Joined: 5 Apr 08
Posts: 16
Credit: 1,947,553
RAC: 210
Message 95513 - Posted: 28 Apr 2020, 23:18:56 UTC

This post asks about fixing the estimated Remaining time on long Rosetta tasks. I tend to be pretty direct so here goes...

I was using long 36-hour Rosetta tasks and cut them down to 24 hours, but I still have the same issue. This project-specific preference is set in the web interface: Your account > Rosetta@home preferences > Target CPU run time.

On the computer, under Advanced View > Options > Computing Preferences, I set Store at least to 1 day of work, but I still get jobs that do not complete in time and have to be aborted.

With 24+ hour tasks, the estimated Remaining time says about 6 hours until about 6-7 hours elapsed time, when the more accurate time gets calculated, such as 17 hours left.
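The underestimate explains the aborts. In a simplified model where the client fetches enough tasks to fill the cache at the *estimated* length, but each task then runs for its *actual* length, a 1-day cache with a 6-hour estimate for 24-hour tasks commits a core to about 4 days of work, beyond a 3-day deadline. The function name and the model are assumptions for illustration, not BOINC's actual work-fetch algorithm.

```python
import math


def overcommit_days(cache_days: float, estimated_hours: float, actual_hours: float) -> float:
    """Days of real CPU work per core that a cache fills when task length is misestimated.

    Simplified: tasks are fetched to cover cache_days at the estimated length,
    then each one runs for the actual length.
    """
    tasks = math.ceil(cache_days * 24 / estimated_hours)
    return tasks * actual_hours / 24


print(overcommit_days(1.0, 6, 24))   # 4.0  (estimate 6 h, actual 24 h)
print(overcommit_days(1.0, 12, 24))  # 2.0  (estimate doubled)
print(overcommit_days(1.0, 24, 24))  # 1.0  (accurate estimate)
```

In this model, doubling the estimate (as suggested below) would roughly halve the overcommitment, while an accurate estimate removes it.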

Questions/strong suggestion:
+ Could the estimated Remaining time for such 24+ hour tasks be doubled to prevent the need to abort so many tasks?
+ Could there be a limit on the number of downloaded tasks (maybe just long tasks) at a time to 2?

Could the option for long tasks > 10 or 12 hours be disabled until the estimated Remaining time can be fixed? I do not think it is a good practice for people to abort tasks.

Does it matter to the Rosetta@home research if we use 8-10 hour tasks rather than 24+ hour tasks?

Mike
ID: 95513
CIA

Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 95515 - Posted: 29 Apr 2020, 0:09:43 UTC - in response to Message 95513.  


Does it matter to the Rosetta@home research if we use 8-10 hour tasks rather than 24+ hour tasks?


Honestly if you are running 24 hour tasks why even have a cache? Are you running another project besides Rosetta? As long as you have a good internet connection the downtime from finishing a task and getting a fresh one is next to nothing and you are only hitting the server once per day for a new task (vs 3 times a day for 8 hour tasks). All my machines that are set to 24 hour tasks run 0 cache without issue.
ID: 95515
Brummit

Joined: 14 Jul 14
Posts: 2
Credit: 30,582
RAC: 0
Message 95517 - Posted: 29 Apr 2020, 1:14:49 UTC - in response to Message 95457.  

Shall do. Step 1 - complete the tasks I have now. Then download more if successful.
Thanks.
ID: 95517
Michael E.@ team Carl Sagan

Joined: 5 Apr 08
Posts: 16
Credit: 1,947,553
RAC: 210
Message 95519 - Posted: 29 Apr 2020, 1:50:02 UTC - in response to Message 95515.  

My original question: Does it matter to the Rosetta@home research if we use 8-10 hour tasks rather than 24+ hour tasks?


Honestly if you are running 24 hour tasks why even have a cache? Are you running another project besides Rosetta? As long as you have a good internet connection the downtime from finishing a task and getting a fresh one is next to nothing and you are only hitting the server once per day for a new task (vs 3 times a day for 8 hour tasks). All my machines that are set to 24 hour tasks run 0 cache without issue.


So you are telling me that the same type of tasks are sent regardless of task length? That is, they get split up so there can be smaller tasks?

I want to understand the needs of the researchers. For example, if longer tasks do different types of calculations than small tasks and few people process them, I can do the long tasks.
ID: 95519
Profile robertmiles

Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 95523 - Posted: 29 Apr 2020, 2:55:31 UTC - in response to Message 95519.  
Last modified: 29 Apr 2020, 2:57:21 UTC

[snip]
So you are telling me that the same type of tasks are sent regardless of task length? That is, they get split up so there can be smaller tasks?

I want to understand the needs of the researchers. For example, if longer tasks do different types of calculations than small tasks and few people process them, I can do the long tasks.

The tasks are sent out as batches of calculations, sometimes with one starting point and sometimes with a list of starting points. This part is the same between short and long tasks. There are often 100 steps per batch.

A target time set by the user is sent along with them. This controls how many steps of the batch are calculated.
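The description above (a batch of up to about 100 steps, with the user's target time deciding how many get done) can be pictured as a time-budgeted loop. This is a sketch, not Rosetta's actual code: it assumes every decoy takes the same time, and the function name and the 24-minute decoy time are hypothetical. Integer minutes are used to avoid floating-point drift.

```python
def decoys_within_target(target_minutes: int, decoy_minutes: int, max_decoys: int = 100) -> int:
    """Keep starting decoys while the next one is expected to finish within the
    target run time, up to the batch size."""
    completed = 0
    elapsed = 0
    while completed < max_decoys and elapsed + decoy_minutes <= target_minutes:
        elapsed += decoy_minutes  # stand-in for actually computing one decoy
        completed += 1
    return completed


print(decoys_within_target(8 * 60, 24))   # 20  (8-hour target)
print(decoys_within_target(24 * 60, 24))  # 60  (24-hour target)
print(decoys_within_target(48 * 60, 24))  # 100 (capped by the batch size)
```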

There has been no clear statement on how two tasks from the same workunit are compared if they haven't done an equal number of steps.

Sometimes, the quality of the last step can be calculated rapidly; if so, this calculation is often used in place of an additional task per workunit to allow comparison.

There has been talk, but not yet action, about a new class of workunits that can use up to 4 gigabytes of memory each, rather than the usual up to 2 gigabytes. This is intended to allow work on larger proteins, which will probably also require larger target times.
ID: 95523
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 95524 - Posted: 29 Apr 2020, 3:07:34 UTC - in response to Message 95519.  

I want to understand the needs of the researchers. For example, if longer tasks do different types of calculations than small tasks and few people process them, I can do the long tasks.


The long tasks do the same calculations as the short tasks. They just do more of them. Check the number of "decoys" in your completed results. What the researchers need is thousands of completed decoys. Long tasks might complete 100 of them, short tasks might complete 20 of them. Combine a machine running long ones with a machine running short ones and you get 120 completed decoys.

...and once you successfully complete and report about a dozen tasks of the same runtime preference, BOINC Manager will have a much better guess on the runtime to expect for future tasks. Once the estimated runtime of an unstarted task approaches your current runtime preference, you will stop getting more work than you can complete before the deadline (assuming your cache size is less than the 3 day deadlines). To help things in the meantime, a smaller cache size helps avoid getting more work than you can complete within the 3 day deadline.
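The way the runtime estimate improves as tasks complete can be illustrated with an exponentially weighted average that moves toward each observed run time. This is an illustration only, not BOINC's actual estimation formula; the smoothing factor 0.2, the 6-hour initial guess, and the 8-hour actual run time are hypothetical.

```python
def update_estimate(current_estimate: float, observed: float, alpha: float = 0.2) -> float:
    """Move the estimate a fraction alpha toward the latest observed run time."""
    return current_estimate + alpha * (observed - current_estimate)


estimate = 6.0  # hypothetical initial guess, in hours
for _ in range(12):  # about a dozen completed tasks, each of which actually ran 8 hours
    estimate = update_estimate(estimate, 8.0)
print(round(estimate, 2))  # 7.86
```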
Rosetta Moderator: Mod.Sense
ID: 95524




©2024 University of Washington
https://www.bakerlab.org