Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

crystalsys
Joined: 11 Aug 09
Posts: 8
Credit: 1,648,888
RAC: 566
Message 94632 - Posted: 16 Apr 2020, 19:58:29 UTC - in response to Message 94628.  
Last modified: 16 Apr 2020, 19:59:19 UTC

OK, so I decided to take your advice, though I don't have any recent notices of a new version.

In BOINC Manager (currently 7.14.2 x64) I clicked 'check for new version'. It came back and told me there wasn't one. I normally don't have the log window open, but I did because I was monitoring the stalled tasks. In THAT window, I got a message in RED saying there was a new version. Did the check again, NOW it says there is a new version.

Hopefully they also fixed the bogus 'there is no newer version' message.

Thanks!
ID: 94632 · Rating: 0
Keith Myers
Joined: 29 Mar 20
Posts: 97
Credit: 332,619
RAC: 8
Message 94635 - Posted: 16 Apr 2020, 20:41:46 UTC - in response to Message 94632.  

That feature in the Manager does not work. You can check for the latest BOINC version on the BOINC download page. The latest is 7.16.5.

https://boinc.berkeley.edu/download_all.php

You can also restrict the applications you run by configuring a cc_config.xml file.

https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
ID: 94635 · Rating: 0
Profile Grant (SSSF)
Joined: 28 Mar 20
Posts: 1735
Credit: 18,532,940
RAC: 14,716
Message 94644 - Posted: 17 Apr 2020, 0:20:04 UTC - in response to Message 94612.  

A batch of WUs is not cancelled by the project when they have good reason to believe your attempt to crunch it will go better. They set up most WUs to do one additional try after a failure for the reason you mention, maybe the second attempt will go better. But, looking across the whole batch is the only way to make a decision about whether to withdraw the batch, and that has nothing to do with your current state on the WU.
That's just it.
As near as i can tell, that batch of WUs wasn't cancelled by the servers (i've actually processed 3 others that were resends, and one initial issue with no problems), that Task was sent out to see if it was dodgy or not. But then the Server cancelled it anyway without giving me a chance to even process it.

                    minimum quorum 1
               initial replication 1
max # of error/total/success tasks 1, 2, 1
errors Too many errors (may have bug) Too many total results WU cancelled
Why would the server cancel that Task? Given the time and effort it takes to produce work, i'd have thought Tasks being cancelled before checking them out would be worth looking in to.
Grant
Darwin NT
ID: 94644 · Rating: 0
Profile robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 677
Message 94650 - Posted: 17 Apr 2020, 3:04:30 UTC - in response to Message 94644.  
Last modified: 17 Apr 2020, 3:10:15 UTC

[snip]


                    minimum quorum 1
               initial replication 1
max # of error/total/success tasks 1, 2, 1
errors Too many errors (may have bug) Too many total results WU cancelled
Why would the server cancel that Task? Given the time and effort it takes to produce work, i'd have thought Tasks being cancelled before checking them out would be worth looking in to.

The previous failed attempt WAS enough checking it out. Tasks already downloaded are normally cancelled only if they haven't started.
ID: 94650 · Rating: 0
Profile Grant (SSSF)
Joined: 28 Mar 20
Posts: 1735
Credit: 18,532,940
RAC: 14,716
Message 94652 - Posted: 17 Apr 2020, 3:50:54 UTC - in response to Message 94650.  
Last modified: 17 Apr 2020, 3:51:54 UTC

[snip]


                    minimum quorum 1
               initial replication 1
max # of error/total/success tasks 1, 2, 1
errors Too many errors (may have bug) Too many total results WU cancelled
Why would the server cancel that Task? Given the time and effort it takes to produce work, i'd have thought Tasks being cancelled before checking them out would be worth looking in to.
The previous failed attempt WAS enough checking it out. Tasks already downloaded are normally cancelled only if they haven't started.
If that was the case, why resend it? The whole point of resending something, is to check it out. If it doesn't need to be checked out, it doesn't need to be resent.
And as i posted, i've done 3 others of that type that had errored on other systems, without them being cancelled.
Grant
Darwin NT
ID: 94652 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 94676 - Posted: 17 Apr 2020, 13:32:27 UTC - in response to Message 94652.  

Human intervention is required to make the decision about whether there is a specific problem with one machine, or a more general problem with the WU batch. By the time the human had enough information to make that decision, some WUs of the batch were already out to a second host.
Rosetta Moderator: Mod.Sense
ID: 94676 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2146
Credit: 41,570,180
RAC: 6,141
Message 94727 - Posted: 18 Apr 2020, 4:02:48 UTC

Not sure if I should report this as a problem, but...

On an Android phone I'm running 4 tasks and have another (varying) 3 or 4 waiting to follow.
I've been reporting and receiving more tasks regularly. All sounds good.

Trouble is, the Server Status page has been reporting no tasks available to download for at least a day.
And the number of in progress tasks has been reducing steadily until a few hours ago and currently reads nil. Right now I have 7.

I've certainly received and reported tasks since both read nil.

Not complaining, obviously. Just reporting
ID: 94727 · Rating: 0
Profile GoldenHat
Joined: 14 Apr 20
Posts: 3
Credit: 122,663
RAC: 0
Message 94754 - Posted: 18 Apr 2020, 11:26:17 UTC

Hello, I'm a newbie to Rosetta and got things set up and running ok. In the last two days I've noticed my laptop running this app in an odd manner. Instead of running at 100% CPU, it fluctuates between 33% and 100%, my fan turns on and off each time yet I have the settings set as default - 100% CPU time. I'm concerned because 1) It's slower to process the data, 2) It's wearing out my PC and I'm inclined to delete the app from the computer if this continues. I have a Toshiba Qosmio i7 Quad-core with 8 logical processors. It runs the CPU, GPU 0 and GPU 1 at full capacity, with CPU speed at 3Ghz with a base speed of 2.4Ghz.
Any ideas how I can get it running flat at 100% rather than this fluctuation? When I started it was fine, just in the last two days it's gone funky.

Thanks,
Richard.
ID: 94754 · Rating: 0
Bryn Mawr
Joined: 26 Dec 18
Posts: 404
Credit: 12,294,748
RAC: 2,092
Message 94755 - Posted: 18 Apr 2020, 11:31:57 UTC - in response to Message 94754.  

Hello, I'm a newbie to Rosetta and got things set up and running ok. In the last two days I've noticed my laptop running this app in an odd manner. Instead of running at 100% CPU, it fluctuates between 33% and 100%, my fan turns on and off each time yet I have the settings set as default - 100% CPU time. I'm concerned because 1) It's slower to process the data, 2) It's wearing out my PC and I'm inclined to delete the app from the computer if this continues. I have a Toshiba Qosmio i7 Quad-core with 8 logical processors. It runs the CPU, GPU 0 and GPU 1 at full capacity, with CPU speed at 3Ghz with a base speed of 2.4Ghz.
Any ideas how I can get it running flat at 100% rather than this fluctuation? When I started it was fine, just in the last two days it's gone funky.

Thanks,
Richard.


What os are you running?

Have you tried running the system monitor to see what processes are taking cpu time and maybe which processes are cutting in and out to cause the fluctuations?
ID: 94755 · Rating: 0
Profile Grant (SSSF)
Joined: 28 Mar 20
Posts: 1735
Credit: 18,532,940
RAC: 14,716
Message 94756 - Posted: 18 Apr 2020, 11:54:35 UTC - in response to Message 94754.  

Any ideas how I can get it running flat at 100% rather than this fluctuation? When I started it was fine, just in the last two days it's gone funky.
In the last couple of days you have picked up more work from Seti. Some was run on the iGPU, the rest is on the Nvidia GPU.
And the problem with how it was before is that the system was producing nothing but errors here at Rosetta. If you set your Computing preferences to the following, things should settle down: fewer errors & fewer fans starting up & slowing down continually.
Computing
   Usage limits	
                                   Use at most 100% of the CPUs
                                   Use at most 100% of CPU time

   When to suspend	
           Suspend when computer is on battery (selected)
               Suspend when computer is in use (not selected)
 Suspend GPU computing when computer is in use (not selected)
   'In use' means mouse/keyboard input in last 3 minutes
  Suspend when no mouse/keyboard input in last --- minutes
     Suspend when non-BOINC CPU usage is above --- %
                          Compute only between ---

   Other	
                                Store at least 1 days of work
                     Store up to an additional 0.02 days of work
                    Switch between tasks every 60 minutes
     Request tasks to checkpoint at most every 60 seconds

   Disk
                              Use no more than 20 GB
                                Leave at least 2 GB free
                              Use no more than 60 % of total

   Memory
          When computer is in use, use at most 95 %
      When computer is not in use, use at most 95 %
 Leave non-GPU tasks in memory while suspended (not selected)
                   Page/swap file: use at most 75 %

Grant
Darwin NT
ID: 94756 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 94791 - Posted: 18 Apr 2020, 18:01:44 UTC

What is your setting for "Suspend when non-BOINC CPU usage is above --- %"? Perhaps you have other tasks popping in and consuming CPU, which is causing BOINC to snooze. Especially if the value is at the default (25%?) it can be easy for the various other tasks to exceed that (briefly). I would set it to 75% or higher. Don't worry, the BOINC tasks still have low priority.

If you are also running work on your GPU, keep in mind that CPU is still required to service the active work on the GPU. I believe many set things to use at most some % of CPUs that leaves one core free to service the GPU work. I don't run GPU work, perhaps someone could reply with details on how to set up that arrangement.
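As a rough illustration of the "leave one core free" idea, the percentage to enter for "Use at most N% of the CPUs" can be worked out from the number of logical processors. This is only a minimal sketch: the function name is made up for the example, and how the BOINC client rounds a fractional result is not covered here.

```python
def cpu_percent_leaving_free(logical_cpus: int, free_cpus: int = 1) -> float:
    """Return the 'Use at most N% of the CPUs' value that keeps free_cpus idle.

    Hypothetical helper for illustration; not part of BOINC.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    if not 0 <= free_cpus < logical_cpus:
        raise ValueError("free_cpus must be between 0 and logical_cpus - 1")
    return 100.0 * (logical_cpus - free_cpus) / logical_cpus


# Example: a quad-core CPU with 8 logical processors, one thread kept free.
print(cpu_percent_leaving_free(8))
```

For the 8-thread laptop mentioned earlier in the thread, this gives 7 of 8 threads, so a value a little under 90% would be the one to try.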
Rosetta Moderator: Mod.Sense
ID: 94791 · Rating: 0
Profile Grant (SSSF)
Joined: 28 Mar 20
Posts: 1735
Credit: 18,532,940
RAC: 14,716
Message 94804 - Posted: 19 Apr 2020, 0:02:39 UTC - in response to Message 94791.  

I believe many set things to use at most some % of CPUs that leaves one core free to service the GPU work. I don't run GPU work, perhaps someone could reply with details on how to set up that arrangement.
I find it's best to reserve a core to support the GPU. If a GPU Task is running, it gets the CPU support it needs and it doesn't impact on the processing time of CPU Tasks that are running. If there is no GPU work being done, then the CPU core/thread is free to do CPU work.
The app_config.xml file needs to go in to the Seti project folder.

If installed on the C: drive
C:/ProgramData/BOINC/projects/setiathome.berkeley.edu/app_config.xml

Make sure to use Notepad or similar to create or edit the file (NOT Word or Wordpad)
<app_config>
 <app>
  <name>setiathome_v8</name>
  <gpu_versions>
   <gpu_usage>1.0</gpu_usage>
   <cpu_usage>1.0</cpu_usage>
  </gpu_versions>
 </app>
 <app>
  <name>astropulse_v7</name>
  <gpu_versions>
   <gpu_usage>1.0</gpu_usage>
   <cpu_usage>1.0</cpu_usage>
  </gpu_versions>
 </app>
</app_config>
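
Since a stray character from the wrong editor can leave the file unreadable, it may help to check that the XML is well-formed before restarting the client. The following is a minimal Python sketch using only the standard library; the function name and the sample text are illustrative assumptions, and it only checks structure, not whether BOINC accepts the values.

```python
import xml.etree.ElementTree as ET


def summarize_app_config(xml_text: str) -> dict:
    """Parse app_config.xml text and return {app name: (gpu_usage, cpu_usage)}.

    Hypothetical helper for illustration; raises ET.ParseError if the
    text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError(f"unexpected root element: {root.tag}")
    summary = {}
    for app in root.findall("app"):
        name = app.findtext("name", default="").strip()
        gpu = app.findtext("gpu_versions/gpu_usage")
        cpu = app.findtext("gpu_versions/cpu_usage")
        summary[name] = (
            float(gpu) if gpu else None,
            float(cpu) if cpu else None,
        )
    return summary


# Sample text mirroring one <app> entry from the file above.
SAMPLE = """<app_config>
 <app>
  <name>setiathome_v8</name>
  <gpu_versions>
   <gpu_usage>1.0</gpu_usage>
   <cpu_usage>1.0</cpu_usage>
  </gpu_versions>
 </app>
</app_config>"""

print(summarize_app_config(SAMPLE))
```

To check the real file, read its contents from wherever the project folder lives on your system and pass the text to the function.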

Grant
Darwin NT
ID: 94804 · Rating: 0
Admin
Project administrator
Joined: 1 Jul 05
Posts: 5144
Credit: 0
RAC: 0
Message 94807 - Posted: 19 Apr 2020, 0:23:59 UTC - in response to Message 94727.  

We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.
ID: 94807 · Rating: 0
Nicholas Hathaway
Joined: 20 Nov 14
Posts: 6
Credit: 791,395
RAC: 0
Message 94808 - Posted: 19 Apr 2020, 1:16:49 UTC

Hi I am repeatedly getting the following in my event log:

Sat Apr 18 07:39:31 2020 | Rosetta@home | Resetting project
Sat Apr 18 23:48:22 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:48:22 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Apr 18 23:53:46 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:53:46 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Apr 18 23:54:40 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:54:40 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sun Apr 19 00:46:30 2020 | Rosetta@home | Project requested delay of 7 seconds


What do I need to do?
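
To see how often the message in the log above repeats for each task, the event log text can be tallied with a short script. This is a minimal Python sketch that assumes the "timestamp | project | message" layout shown in the post; the function and variable names are made up for the example.

```python
import re
from collections import Counter

# Matches event log lines of the form shown above:
# "<timestamp> | <project> | Task <name> exited with zero status but no 'finished' file"
NO_FINISHED = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*Task (?P<task>\S+) "
    r"exited with zero status but no 'finished' file"
)


def count_no_finished(log_text: str) -> Counter:
    """Count 'no finished file' messages per task name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NO_FINISHED.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts


SAMPLE_LOG = """\
Sat Apr 18 23:48:22 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:48:22 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Apr 18 23:53:46 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:54:40 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
"""

for task, count in count_no_finished(SAMPLE_LOG).items():
    print(task, count)
```

A single task restarting several times within a few minutes, as here, points at that task or at contention on the host rather than at every task in the cache.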
ID: 94808 · Rating: 0
Profile Grant (SSSF)
Joined: 28 Mar 20
Posts: 1735
Credit: 18,532,940
RAC: 14,716
Message 94810 - Posted: 19 Apr 2020, 1:38:00 UTC - in response to Message 94808.  
Last modified: 19 Apr 2020, 1:43:43 UTC

Hi I am repeatedly getting the following in my event log:

Sat Apr 18 07:39:31 2020 | Rosetta@home | Resetting project
Sat Apr 18 23:48:22 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:48:22 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Apr 18 23:53:46 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:53:46 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sat Apr 18 23:54:40 2020 | Rosetta@home | Task hgfpsplit2_148_fold_SAVE_ALL_OUT_916496_89_0 exited with zero status but no 'finished' file
Sat Apr 18 23:54:40 2020 | Rosetta@home | If this happens repeatedly you may need to reset the project.
Sun Apr 19 00:46:30 2020 | Rosetta@home | Project requested delay of 7 seconds


What do I need to do?
How do your Computing preferences compare to these? Particularly the "When to suspend" settings. If they aren't a problem, then as it says in the Event log, you'll probably need to reset the project, but even then it may not fix the problem.
It appears it's because the system is busy doing something else & BOINC can't communicate with the science application. Having "Use at most 100% of CPU time" less than 100% can cause it on some systems.
As long as Tasks don't error out as a result, it's not a problem as such, but it does show some contention for resources on the system.


You've also had a lot of Tasks miss the deadline, so a much smaller cache would be a good idea.
Computing
   Usage limits	
                                   Use at most 100% of the CPUs
                                   Use at most 100% of CPU time

   When to suspend	
           Suspend when computer is on battery (not selected)
               Suspend when computer is in use (not selected)
 Suspend GPU computing when computer is in use (not selected)
   'In use' means mouse/keyboard input in last 3 minutes
  Suspend when no mouse/keyboard input in last --- minutes
     Suspend when non-BOINC CPU usage is above --- %
                          Compute only between ---

   Other	
                                Store at least 1 days of work
                     Store up to an additional 0.02 days of work
                    Switch between tasks every 60 minutes
     Request tasks to checkpoint at most every 60 seconds

   Disk
                              Use no more than 20 GB
                                Leave at least 2 GB free
                              Use no more than 60 % of total

   Memory
          When computer is in use, use at most 95 %
      When computer is not in use, use at most 95 %
 Leave non-GPU tasks in memory while suspended (not selected)
                   Page/swap file: use at most 75 %

Grant
Darwin NT
ID: 94810 · Rating: 0
Nicholas Hathaway
Joined: 20 Nov 14
Posts: 6
Credit: 791,395
RAC: 0
Message 94812 - Posted: 19 Apr 2020, 1:46:09 UTC - in response to Message 94810.  

Computing preferences
These settings apply to all computers using this account except
computers where you have set preferences locally using the BOINC Manager
Android devices
Computing
Usage limits
Use at most 100 % of the CPUs
Use at most 100 % of CPU time
When to suspend
Suspend when computer is on battery
Suspend when computer is in use
Suspend GPU computing when computer is in use
'In use' means mouse/keyboard input in last 3 minutes
Suspend when no mouse/keyboard input in last --- minutes
Suspend when non-BOINC CPU usage is above --- %
Compute only between ---
Other
Store at least 0.5 days of work
Store up to an additional 1 days of work
Switch between tasks every 60 minutes
Request tasks to checkpoint at most every 60 seconds
Disk
Use no more than --- GB
Leave at least --- GB free
Use no more than 90 % of total
Memory
When computer is in use, use at most 90 %
When computer is not in use, use at most 90 %
Leave non-GPU tasks in memory while suspended
Page/swap file: use at most 75 %
Network
Usage limits
Limit download rate to --- KB/second
Limit upload rate to --- KB/second
Limit usage to --- MB every --- days
When to suspend
Transfer files only between ---
Other
Skip data verification for image files
Confirm before connecting to Internet
Disconnect when done

Edit preferences
ID: 94812 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2146
Credit: 41,570,180
RAC: 6,141
Message 94815 - Posted: 19 Apr 2020, 2:03:04 UTC - in response to Message 94727.  

Not sure if I should report this as a problem, but...

On an Android phone I'm running 4 tasks and have another (varying) 3 or 4 waiting to follow.
I've been reporting and receiving more tasks regularly. All sounds good.

Trouble is, the Server Status page has been reporting no tasks available to download for at least a day.
And the number of in progress tasks has been reducing steadily until a few hours ago and currently reads nil. Right now I have 7.

I've certainly received and reported tasks since both read nil.

Not complaining, obviously. Just reporting

Still reporting tasks and getting more 24hrs after 0 to send and 0 in progress. Am I one of the 7 who reported in the last 24hrs?

We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.

Oh, maybe this explains it. Thought it was weird.
ID: 94815 · Rating: 0
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 94837 - Posted: 19 Apr 2020, 7:35:19 UTC - in response to Message 94807.  

We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.

Well that should reduce the development effort if you only have the one app (but multiple platforms). I still seem to be getting MiniRosetta and haven't cleared my cache (which is only 0.3 days) yet.
BOINC blog
ID: 94837 · Rating: 0
Profile Grant (SSSF)
Joined: 28 Mar 20
Posts: 1735
Credit: 18,532,940
RAC: 14,716
Message 94838 - Posted: 19 Apr 2020, 7:44:48 UTC - in response to Message 94837.  

We are deprecating the 'rosetta_for_devices' app. The arm platforms have been added to the 'rosetta' application group. We will also be deprecating the minirosetta app and will soon have just the rosetta app. There are still some minirosetta jobs in our queue.
Well that should reduce the development effort if you only have the one app (but multiple platforms). I still seem to be getting MiniRosetta and haven't cleared my cache (which is only 0.3 days) yet.
I figure there will be a few resends after the last of the initial Tasks have been sent out. But with the short deadlines & low replication, it should all be cleared up well within a week.
Grant
Darwin NT
ID: 94838 · Rating: 0
zfp
Joined: 22 Mar 20
Posts: 1
Credit: 114,637
RAC: 0
Message 94840 - Posted: 19 Apr 2020, 7:57:36 UTC

Hello,

After a kernel update I restarted my system. As a result:
    two of the tasks running at the time of the reboot lost all progress after the reboot,
    then all tasks that were running at the time of the reboot ran for an extremely long time,
    and all of them exited with: Exit status 139 (0x0000008B) Unknown error code



The output shows this:

WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1

https://boinc.bakerlab.org/rosetta/result.php?resultid=1152628078
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152627950
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152627464
https://boinc.bakerlab.org/rosetta/result.php?resultid=1152627790
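
One possible reading of "Exit status 139", offered as an assumption rather than a diagnosis: on Unix-like systems a status above 128 commonly means the process was ended by signal number (status - 128). The minimal Python sketch below decodes a status under that convention; whether the BOINC client reports these tasks the same way is not confirmed by anything in this thread, and the function name is made up for the example.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status using the common 128 + signal-number convention.

    Hypothetical helper for illustration; assumes the shell-style convention.
    """
    if status > 128:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"plain exit code {status}"


# 139 is 0x8B in hex, matching the value shown in the task output.
print(hex(139), describe_exit_status(139))
```

Under that convention 139 maps to signal 11, a segmentation fault, which would fit tasks crashing after being restored from a bad checkpoint, though the result pages linked above are the place to confirm it.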

ID: 94840 · Rating: 0

©2025 University of Washington
https://www.bakerlab.org