Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
ww Send message Joined: 17 Mar 20 Posts: 3 Credit: 455,936 RAC: 0 |
Maybe a memory leak: rb_04_16_21806_21365_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_08_918009_366 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1037241400 The first attempt (Windows 32-bit) failed at 12 hours of CPU time, RSS 354MB. I have the second attempt on Linux 64-bit. If it actually needs this much memory, 32-bit wouldn't have been able to run it at all. RSS was at 3.09 GB. (Or some swapped out. Don't post tired, kids.) RSS has been steadily climbing; it started at 1.8 GB. Now at 3.5 hours. Completion is on pace for an 11.9 hour run-time. It appears to be checkpointing. Task properties: Application: Rosetta 4.15; Name: rb_04_16_21392_21290__t000__4_C1_SAVE_ALL_OUT_IGNORE_THE_REST_917949_249; State: Running; Received: Sat 18 Apr 2020 04:59:33 PM EDT; Report deadline: Tue 21 Apr 2020 04:59:33 PM EDT; Estimated computation size: 80,000 GFLOPs; CPU time: 03:38:02; CPU time since checkpoint: 00:02:38; Elapsed time: 03:38:38; Estimated time remaining: 04:56:41; Fraction done: 30.282%; Virtual memory size: 3.09 GB; Working set size: 2.89 GB; Directory: slots/5; Process ID: 24116; Progress rate: 8.280% per hour; Executable: rosetta_4.15_x86_64-pc-linux-gnu |
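Editor's sketch: one way to check whether RSS really is climbing steadily on Linux is to sample `VmRSS` from `/proc/<pid>/status` at intervals and compute the growth rate. The helper names below are hypothetical and are not part of BOINC or Rosetta; the sketch assumes a Linux host and that you know the task's process ID (24116 in the post above).

```python
import re
from pathlib import Path


def rss_kib_from_status(status_text: str) -> int:
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS not found (the process may have exited)")
    return int(match.group(1))


def read_rss_kib(pid: int) -> int:
    """Read the current resident set size of a running process (Linux only)."""
    return rss_kib_from_status(Path(f"/proc/{pid}/status").read_text())


def growth_mib_per_hour(samples):
    """Average RSS growth between the first and last sample.

    samples is a list of (elapsed_seconds, rss_kib) pairs in time order.
    """
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing time")
    return (r1 - r0) / 1024 / ((t1 - t0) / 3600)
```

Calling `read_rss_kib(24116)` every few minutes and feeding the pairs to `growth_mib_per_hour` gives a number to quote in a report. A rate that stays positive for hours is consistent with a leak, but it does not prove one, since a task may legitimately need more memory as it progresses.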
Tom M Send message Joined: 20 Jun 17 Posts: 92 Credit: 16,330,873 RAC: 48,882 |
Hello, I'm a newbie to Rosetta and got things set up and running ok. In the last two days I've noticed my laptop running this app in an odd manner. Instead of running at 100% CPU, it fluctuates between 33% and 100%, [snip] If you have Seti@Home mostly idling I would go to the S@H website and disable the "intel igpu" check box. Generally, running any crunching task on that part of the Intel CPU chip slows the entire system down significantly. This is usually true for now; if/when Intel delivers on the planned upgrades to the iGPU, it will start behaving more like AMD's iGPU, but not yet. Tom M Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
Maslo55 Send message Joined: 3 Mar 08 Posts: 1 Credit: 1,029,280 RAC: 0 |
I have some random crashes: once every few days, I find my crunching computer rebooted when I return to it. I also run Folding@home, which I thought was responsible, but in Windows Event Viewer the faulting application seems to be rosetta: Faulting application name: rosetta_4.15_windows_x86_64.exe, version: 0.0.0.0, time stamp: 0x5e856ed2; Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000; Exception code: 0xc0000005; Fault offset: 0x0000000000000000; Faulting process id: 0x2f68; Faulting application start time: 0x01d61c558f1205cc; Faulting application path: C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\rosetta_4.15_windows_x86_64.exe; Faulting module path: unknown; Report Id: e3f4316a-3112-476b-9f13-f2fcc13a42e3; Faulting package full name: (blank); Faulting package-relative application ID: (blank). I have a Ryzen 3600 with slightly overclocked RAM; I will probably try default settings, or increasing the voltage. All testing programs show no errors. I get some computation errors, but very infrequently. Rosetta seems to be a better RAM tester than Memtest for me. |
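Editor's sketch: the `Exception code` field in a Windows application-error event is an NTSTATUS value, and 0xc0000005 is `STATUS_ACCESS_VIOLATION` (the process touched memory it was not allowed to access). The small lookup below covers only a handful of common codes and is an illustration, not a complete table; the function name is hypothetical.

```python
# A few well-known NTSTATUS exception codes (not exhaustive).
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exception_code(text: str) -> str:
    """Map the hex string from the Event Viewer to a symbolic name."""
    code = int(text, 16)  # accepts an optional "0x" prefix
    return KNOWN_CODES.get(code, f"unknown (0x{code:08X})")


print(describe_exception_code("0xc0000005"))
```

An access violation with a fault offset of zero and an unknown module is compatible with unstable memory, which is why testing at default RAM clocks is a reasonable first step, but the code alone does not identify the cause.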
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
I have Ryzen 3600 with slightly overclocked RAM, would probably try default, or increasing voltage. All testing programs show no errors. I get some computation errors, but very infrequently. Or better yet, default clocks & voltage to see if that sorts out the problem. Grant Darwin NT |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
I have some random crashes, once every few days, I find my crunching computer rebooted when I return to it. I also run Folding@home which I thought was responsible, but in Windows Event viewer the faulting application seems to be rosetta: [snip] I'm running Rosetta, some other BOINC projects, and Folding@home on my computer at the same time. Only BOINC currently gets the GPU, since I'm trying to avoid changes in how loud the computer's fan is, and Folding@home doesn't always have GPU WUs available. This often causes crashes of my browser, but not of Windows. It tends to make Rosetta tasks take about twice as much clock time to finish, though. I'm still trying to find out how many virtual CPU cores Folding@home uses at the same time, and how to control this. It appears that the slowdown is due to more background tasks trying to grab CPU time than there are virtual CPU cores to provide it. There needs to be a discussion somewhere of how to make BOINC and Folding@home share a computer; I haven't found one at Folding@home. Changing the Folding@home power setting to light helped reduce the crashes, but has not done much for the slowdown problem. |
Millenium Send message Joined: 20 Sep 05 Posts: 68 Credit: 184,283 RAC: 0 |
Are you running together CPU tasks on BOINC and Folding? If yes then it's nonsense. You just slow them all as they use more memory (and memory bandwidth, and CPU context changes and whatever) without need. Either use the CPU for Folding or for BOINC. There is no way to fix that, you can't just add threads that require CPU usage and expect them all to not be inefficient. GPU is a different thing of course. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Are you running together CPU tasks on BOINC and Folding? If yes then it's nonsense. You just slow them all as they use more memory (and memory bandwidth, and CPU context changes and whatever) without need. Either use the CPU for Folding or for BOINC. There is no way to fix that, you can't just add threads that require CPU usage and expect them all to not be inefficient. The Folding@home method for finishing the current WU and then stopping doesn't work, and I don't want to let a Folding@home workunit time out instead of finishing. I've already found a way to limit the number of threads BOINC is using. If I can find a similar method for Folding@home, I should be able to stop their contention for virtual cores, but let both continue to run CPU work. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Maslo55, Error code 0xc0000005 under Windows 7 or 10 is an access violation (the program touched memory it was not allowed to use); it often shows up when a program fails to start. You should check if the total amount of memory in use is approaching the total amount of memory that your computer has. If so, the problem is not specific to Rosetta@home, but a problem with trying to run too many memory-demanding programs at once. You can either add more memory to your computer, or reduce the amount of work your computer is trying to do at once. |
strongboes Send message Joined: 3 Mar 20 Posts: 27 Credit: 5,394,270 RAC: 0 |
In Folding@home you can set the number of cores in the CPU slot: change it from -1 to the value you want. I have found there is no optimal setting for running both simultaneously. |
Brummit Send message Joined: 14 Jul 14 Posts: 2 Credit: 30,582 RAC: 0 |
Is there any way you could set up an option to download smaller work units? My average stats for Rosetta are 159 completed and 78 failed. Optimistically, that's 1/3 of downloaded work deemed invalid due to running out of time, requiring someone else to (re)process the data; pessimistically, just under half the data fails the deadline. I run the PC 12-15 hours per day on average. A waste of processing time for all. My PC, though not the latest super duper 1000 core gamer extravaganza, was custom built two years ago and is still pretty good. Thank you, 'Brummit'. |
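Editor's sketch: the two readings of the numbers above ("1/3" versus "just under half") come from dividing the failures by different totals. The arithmetic, using only the 159 and 78 quoted in the post:

```python
def failure_share(completed: int, failed: int) -> float:
    """Failed tasks as a fraction of all tasks received."""
    return failed / (completed + failed)


def failure_ratio(completed: int, failed: int) -> float:
    """Failed tasks relative to completed tasks only."""
    return failed / completed


completed, failed = 159, 78
print(f"share of all tasks: {failure_share(completed, failed):.1%}")
print(f"relative to completed: {failure_ratio(completed, failed):.1%}")
```

The first figure (about a third) is the usual failure rate, since it counts every task that was sent out.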
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Brummit, Under the advanced interface, Your account, Rosetta@home preferences, you can try reducing the Target CPU run time by about a third of its current value. But note that there's a minimum value you're not allowed to go below. This should give you workunits that run for shorter times, but need about the same amount of memory. Does this fit your idea of smaller work units? |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
Is there any way you could set up an option to download smaller work units? Have you only recently returned to this project? It looks like you've had a few days off after receiving tasks and had to abort some. If you're online 12-15hrs/day you should be able to complete 8hr tasks ok when they have a 3-day deadline. Try to let them run and complete and it should improve the more tasks you complete and return. It should settle down after a few days. Give it another try. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
Is there any way you could set up an option to download smaller work units? Just set a smaller cache. Reducing the Target CPU run time (at this point) carries a high risk that you'll just end up with even more tasks downloading, missing their deadlines and then erroring out than is already happening. On your account page: Preferences, When and how BOINC uses your computer, Computing preferences, Other: Store at least 0.6 days of work; Store up to an additional 0.02 days of work. Works for me. It takes an extremely long time for the estimated completion times to get reasonably close to the actual time (Target CPU run time). And even so, it will take a while for BOINC to determine how many hours a day your computer is on, and how much time it is able to process work while it is on (the default settings can mean just browsing sites with heavy graphics/scripts will stop BOINC from processing work). Grant Darwin NT |
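Editor's sketch: the same two cache values can also be set locally through BOINC's `global_prefs_override.xml`, which uses the `work_buf_min_days` and `work_buf_additional_days` elements. The post above sets them on the website, so this is an alternative route, not what the poster did. The helper name is hypothetical, and where the file lives depends on your BOINC data directory, so the sketch only builds the XML text.

```python
import xml.etree.ElementTree as ET


def build_cache_override(min_days: float, additional_days: float) -> str:
    """Return XML text for a local preferences override with a small work cache."""
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:g}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:g}"
    return ET.tostring(root, encoding="unicode")


# Values quoted in the post: 0.6 days plus 0.02 additional days.
print(build_cache_override(0.6, 0.02))
```

After saving the output into the override file, the client has to re-read its local preferences (or be restarted) before the new cache size takes effect.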
GoldenHat Send message Joined: 14 Apr 20 Posts: 3 Credit: 122,663 RAC: 0 |
I'm running Windows 10 64-bit. I haven't checked the system monitor, no, but I will. Since this post it seems to have disappeared. I rebooted the PC and it's been fine. I notice sometimes it does it for a short period of time but settles again. I'm not a big techie so I'm not going to spend ages faultfinding or trying to understand how it works. It's running so I'll leave it. Thanks very much for your desire to assist, I do appreciate you taking the time to reply. I'll keep my eye on the monitor if it goes funky again. Richard. |
Michael E.@ team Carl Sagan Send message Joined: 5 Apr 08 Posts: 16 Credit: 1,947,553 RAC: 210 |
This post asks about fixing the estimated Remaining time on long Rosetta tasks. I tend to be pretty direct so here goes... I was using long 36-hour Rosetta tasks and cut them down to 24 hours, but still have the same issue. This project-specific preference is set under the web interface: Your account > Rosetta@home preferences > Target CPU run time. On the computer under Advanced View > Options > Computing Preferences, I set "Store at least" to 1 day of work, but I still get jobs that do not complete and have to be aborted. With 24+ hour tasks, the estimated Remaining time says about 6 hours until about 6-7 hours of elapsed time, when a more accurate time gets calculated, such as 17 hours left. Questions/strong suggestions: + Could the estimated Remaining time for such 24+ hour tasks be doubled to prevent the need to abort so many tasks? + Could there be a limit of 2 on the number of tasks (maybe just long tasks) downloaded at a time? + Could the option for long tasks > 10 or 12 hours be disabled until the estimated Remaining time can be fixed? I do not think it is a good practice for people to abort tasks. Does it matter to the Rosetta@home research if we use 8-10 hour tasks rather than 24+ hour tasks? Mike |
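Editor's sketch: the over-fetch described above follows from simple arithmetic if the client fills its cache by the estimated runtime rather than the actual one. The model below is a deliberate simplification and an assumption, not BOINC's real work-fetch logic: one core, the machine on 24 hours a day, and the 3-day deadline mentioned elsewhere in this thread. All function names are hypothetical.

```python
import math


def tasks_fetched_per_core(cache_days: float, estimated_hours: float) -> int:
    """Tasks queued per core to cover the cache, judged by the estimate."""
    return math.ceil(cache_days * 24 / estimated_hours)


def hours_to_clear_queue(task_count: int, actual_hours: float) -> float:
    """Real time one core needs to finish the queued tasks back to back."""
    return task_count * actual_hours


def misses_deadline(cache_days, estimated_hours, actual_hours, deadline_days=3):
    """True if the queue cannot be finished before the deadline."""
    queued = tasks_fetched_per_core(cache_days, estimated_hours)
    return hours_to_clear_queue(queued, actual_hours) > deadline_days * 24


# 1-day cache, 6-hour estimate, 24-hour actual runtime (figures from the post).
print(tasks_fetched_per_core(1, 6), hours_to_clear_queue(4, 24), misses_deadline(1, 6, 24))
```

Under these assumptions a 1-day cache with a 6-hour estimate queues four tasks that really need 96 hours, which is past a 72-hour deadline; once the estimate matches the 24-hour runtime, the same cache holds a single task and the problem disappears.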
CIA Send message Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
Honestly if you are running 24 hour tasks why even have a cache? Are you running another project besides Rosetta? As long as you have a good internet connection the downtime from finishing a task and getting a fresh one is next to nothing and you are only hitting the server once per day for a new task (vs 3 times a day for 8 hour tasks). All my machines that are set to 24 hour tasks run 0 cache without issue. |
Brummit Send message Joined: 14 Jul 14 Posts: 2 Credit: 30,582 RAC: 0 |
Shall do. Step 1 - complete the tasks I have now. Then download more if successful. Thanks. |
Michael E.@ team Carl Sagan Send message Joined: 5 Apr 08 Posts: 16 Credit: 1,947,553 RAC: 210 |
My original question: Does it matter to the Rosetta@home research if we use 8-10 hour tasks rather than 24+ hour tasks?
So you are telling me that the same type of tasks are sent regardless of task length? That is, they get split up so there can be smaller tasks? I want to understand the needs of the researchers. For example, if longer tasks do different types of calculations than small tasks and few people process them, I can do the long tasks. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
[snip] So you are telling me that the same type of tasks are sent regardless of task length? That is, they get split up so there can be smaller tasks? The tasks are sent out as batches of calculations, sometimes with one starting point and sometimes with a list of starting points. This part is the same between short and long tasks. There are often 100 steps per batch. A target time set by the user is sent along with them. This controls how many steps of the batch are calculated. There has been no clear statement on how two tasks from the same workunit are compared if they haven't done an equal number of steps. Sometimes, the quality of the last step can be calculated rapidly; if so, this calculation is often used in place of an additional task per workunit to allow comparison. There has been talk, but not yet action, about a new class of workunits that can use up to 4 gigabytes of memory each, rather than the usual up to 2 gigabytes. This is intended to allow work on larger proteins, which will probably also require larger target times. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I want to understand the needs of the researchers. For example, if longer tasks do different types of calculations than small tasks and few people process them, I can do the long tasks. The long tasks do the same calculations as the short tasks. They just do more of them. Check the number of "decoys" in your completed results. What the researchers need is thousands of completed decoys. Long tasks might complete 100 of them, short tasks might complete 20 of them. Combine a machine running long ones with a machine running short ones and you get 120 completed decoys. ...and once you successfully complete and report about a dozen tasks of the same runtime preference, BOINC Manager will have a much better guess on the runtime to expect for future tasks. Once the estimated runtime of an unstarted task approaches your current runtime preference, you will stop getting more work than you can complete before the deadline (assuming your cache size is less than the 3 day deadlines). To help things in the meantime, a smaller cache size helps avoid getting more work than you can complete within the 3 day deadline. Rosetta Moderator: Mod.Sense |
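Editor's sketch: the point that long and short tasks do the same work, just more of it, can be illustrated with a toy model. It assumes a task keeps starting new decoys until the target run time is reached and that every decoy takes the same time; both are assumptions for illustration, and the thread does not describe Rosetta's actual stopping rule. The per-decoy time of 24 minutes is invented for the example.

```python
def decoys_completed(target_minutes: int, minutes_per_decoy: int) -> int:
    """Count decoys finished when a new one starts only before the target time."""
    elapsed = 0
    count = 0
    while elapsed < target_minutes:
        elapsed += minutes_per_decoy  # finish the decoy that was just started
        count += 1
    return count


short_task = decoys_completed(8 * 60, 24)   # 8-hour target run time
long_task = decoys_completed(24 * 60, 24)   # 24-hour target run time
print(short_task, long_task, short_task + long_task)
```

Whatever the real per-decoy time is, the total across machines is what counts for the researchers, so the choice of target run time mainly affects how often a host contacts the server and how much work is at risk if a task fails.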
©2024 University of Washington
https://www.bakerlab.org