Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 1
Message 109692 - Posted: 29 Aug 2024, 22:08:12 UTC

It's back, but with 12,000 active users against 19,000 tasks, they were gone in a heartbeat.
That's 1.6 tasks per system on average.
ID: 109692
Bill Swisher
Joined: 10 Jun 13
Posts: 32
Credit: 31,619,624
RAC: 31,118
Message 109693 - Posted: 30 Aug 2024, 4:30:59 UTC - in response to Message 109692.  

It's back, but with 12,000 active users against 19,000 tasks, they were gone in a heartbeat.
That's 1.6 tasks per system on average.


Only if you're running an Intel GPU according to the Event Log. I've no Intel computers (to speak of). So I have no Rosetta work, Denis is still on summer break, leaving only WCG. Yet again.
ID: 109693
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1652
Credit: 17,225,462
RAC: 19,959
Message 109694 - Posted: 30 Aug 2024, 6:34:32 UTC - in response to Message 109693.  

Only if you're running an Intel GPU according to the Event Log.
???
There is no GPU application here at Rosetta, CPU only.
Grant
Darwin NT
ID: 109694
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1652
Credit: 17,225,462
RAC: 19,959
Message 109695 - Posted: 30 Aug 2024, 6:36:13 UTC - in response to Message 109692.  
Last modified: 30 Aug 2024, 6:39:42 UTC

It's back, but with 12,000 active users
According to the Server Status page, that's 1,200.
And the last batch of work that was sent out was not even 5,000 Tasks.

So it's still a matter of 20 minutes or less before they are all gone.
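
As a rough check of the two sets of figures in this exchange, here is a minimal Python sketch of the arithmetic. It assumes tasks are spread evenly and treats each active user as one system, which neither post actually claims; `tasks_per_user` is just an illustrative helper name.

```python
def tasks_per_user(tasks, active_users):
    """Average number of tasks available per active user."""
    return tasks / active_users


# Figures from the first post: 19,000 tasks against 12,000 active users.
first_estimate = tasks_per_user(19_000, 12_000)

# Figures from the correction: 1,200 users and "not even 5,000" tasks,
# so this second value is an upper bound rather than an exact number.
corrected_upper_bound = tasks_per_user(5_000, 1_200)

print(f"{first_estimate:.2f}")         # 1.58, quoted as 1.6 in the thread
print(f"{corrected_upper_bound:.2f}")  # 4.17
```

Either way it is only a handful of tasks per user, which fits with the work disappearing within minutes.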
Grant
Darwin NT
ID: 109695
Bill Swisher
Joined: 10 Jun 13
Posts: 32
Credit: 31,619,624
RAC: 31,118
Message 109697 - Posted: 30 Aug 2024, 14:25:39 UTC - in response to Message 109694.  

Yer right! My eyes went wonky for a bit and I misread the task name. It was an entry for WCG.
ID: 109697
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1985
Credit: 9,362,147
RAC: 7,841
Message 109704 - Posted: 4 Sep 2024, 14:56:39 UTC - in response to Message 109684.  

boinc-process is down again, so there's a Validation backlog once again that continues to grow.


Also today, like an old friend....
ID: 109704
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1985
Credit: 9,362,147
RAC: 7,841
Message 109717 - Posted: 9 Sep 2024, 19:53:56 UTC

I said that I hoped for a lot of work when summer finished,
because in September universities, research centers, etc. reopen.

And....no work :-(
ID: 109717
mrchips
Joined: 11 Nov 09
Posts: 10
Credit: 14,262,774
RAC: 19,613
Message 109723 - Posted: 11 Sep 2024, 11:30:10 UTC

System is down again!!!
ID: 109723
Sid Celery
Joined: 11 Feb 08
Posts: 2098
Credit: 40,822,968
RAC: 11,520
Message 109733 - Posted: 17 Sep 2024, 0:54:47 UTC

I'm only dipping in and out recently, but all servers are running atm and my system grabbed tasks 30 mins ago.
Server status page updated 7 minutes ago and more Rosetta Beta tasks seem to be available.
No idea how many until the front page updates.
Fingers crossed.
ID: 109733
Sid Celery
Joined: 11 Feb 08
Posts: 2098
Credit: 40,822,968
RAC: 11,520
Message 109734 - Posted: 17 Sep 2024, 4:12:23 UTC - in response to Message 109733.  

Looks like half a million
ID: 109734
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1652
Credit: 17,225,462
RAC: 19,959
Message 109735 - Posted: 17 Sep 2024, 5:05:18 UTC
Last modified: 17 Sep 2024, 5:06:14 UTC

Anyone getting errors with these Tasks within a minute or so, with this in the Stderr output?

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.06_windows_x86_64.exe @srmpnn12_10_hallucinated_127_36_dldesign_0_cycle0.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3384584
Extracting in slot directory: minirosetta_database.zip
Using database: minirosetta_database
Cannot find database: minirosetta_database

</stderr_txt>
]]>


Try resetting the Project.
Once again, there is a mismatch between where the application expects its files to be and where your existing installation actually has them (or not).
One of my systems started processing with no problems; the other produced nothing but errors until resetting the project sorted it out.
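
For anyone wanting to check a pile of failed tasks for this particular failure, here is a minimal Python sketch (not project-supplied code). It looks for the `Cannot find database:` line from the Stderr output above and builds, without running it, a `boinccmd` reset call. The helper names are made up for illustration, and the project URL is an assumption inferred from the `projects/boinc.bakerlab.org_rosetta` directory name; check the URL your own client shows.

```python
import re

# Marker line copied from the Stderr output quoted above.
MISSING_DB = re.compile(r"^Cannot find database: (\S+)", re.MULTILINE)


def missing_database(stderr_text):
    """Return the database name the task could not find, or None."""
    match = MISSING_DB.search(stderr_text)
    return match.group(1) if match else None


def reset_command(project_url):
    """Build (but do not run) the boinccmd call that resets one project."""
    return ["boinccmd", "--project", project_url, "reset"]


sample = """Extracting in slot directory: minirosetta_database.zip
Using database: minirosetta_database
Cannot find database: minirosetta_database
"""

if missing_database(sample):
    # Assumed URL, inferred from the project directory name in the log.
    print(" ".join(reset_command("https://boinc.bakerlab.org/rosetta/")))
```

Bear in mind that resetting a project throws away any tasks in progress for that project, so it is worth letting working tasks finish and report first.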
Grant
Darwin NT
ID: 109735
Sid Celery
Joined: 11 Feb 08
Posts: 2098
Credit: 40,822,968
RAC: 11,520
Message 109736 - Posted: 17 Sep 2024, 16:26:37 UTC - in response to Message 109735.  

Anyone getting errors with these Tasks within a minute or so, with this in the Stderr output?

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_beta_6.06_windows_x86_64.exe @srmpnn12_10_hallucinated_127_36_dldesign_0_cycle0.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3384584
Extracting in slot directory: minirosetta_database.zip
Using database: minirosetta_database
Cannot find database: minirosetta_database

</stderr_txt>
]]>


Try resetting the Project.
Once again, there is a mismatch between where the application expects its files to be and where your existing installation actually has them (or not).
One of my systems started processing with no problems; the other produced nothing but errors until resetting the project sorted it out.

No. And I think the fact that one of your systems works fine and the other doesn't backs that up.
Why it should be happening with one and not the other, I have no idea.
ID: 109736
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1652
Credit: 17,225,462
RAC: 19,959
Message 109739 - Posted: 17 Sep 2024, 22:33:36 UTC - in response to Message 109736.  

Why it should be happening with one and not the other, I have no idea.
Neither do I, but it has been a recurring problem over at Ralph (when it's working, which it isn't again, and when it has work).
Several times it's been necessary to reset the project to stop errors occurring because the updated application doesn't have all the files it needs, or it's looking for them in the wrong place.

Both systems have the same hardware (CPU, motherboard), similar GPUs (RTX 2060 & RTX 2060 Super), same video driver, same AV software, same OS & updates, same version of BOINC, same projects, same configuration settings.
They are the same. Yet weirdness continues to occur.
Grant
Darwin NT
ID: 109739
kotenok2000
Joined: 22 Feb 11
Posts: 257
Credit: 483,503
RAC: 590
Message 109740 - Posted: 17 Sep 2024, 22:38:00 UTC - in response to Message 109739.  

Compare project directories then.
Copy both to a USB HDD and then compare them with WinMerge.
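
If WinMerge isn't to hand, the same comparison can be roughed out with Python's standard-library `filecmp` module. This is only a sketch: `diff_trees` is a made-up helper name and the paths in the usage comment are placeholders. Note that `dircmp` treats two files as equal without reading them when their type, size and modification time all match, and only compares contents otherwise.

```python
import filecmp
import os


def diff_trees(left, right, prefix=""):
    """Recursively list files that exist on one side only or that differ."""
    cmp = filecmp.dircmp(left, right)
    report = {
        "left_only": [os.path.join(prefix, n) for n in sorted(cmp.left_only)],
        "right_only": [os.path.join(prefix, n) for n in sorted(cmp.right_only)],
        "differing": [os.path.join(prefix, n) for n in sorted(cmp.diff_files)],
    }
    # Descend into sub-directories present on both sides.
    for name in sorted(cmp.common_dirs):
        sub = diff_trees(
            os.path.join(left, name),
            os.path.join(right, name),
            os.path.join(prefix, name),
        )
        for key in report:
            report[key].extend(sub[key])
    return report


# Example usage with placeholder paths for the two copied project folders:
# result = diff_trees(r"E:\pc1\boinc.bakerlab.org_rosetta",
#                     r"E:\pc2\boinc.bakerlab.org_rosetta")
# for kind, names in result.items():
#     print(kind, names)
```

Anything listed under `left_only` or `right_only` would be a candidate for the missing or misplaced files discussed above.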
ID: 109740
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1652
Credit: 17,225,462
RAC: 19,959
Message 109741 - Posted: 18 Sep 2024, 0:23:30 UTC - in response to Message 109740.  

Compare project directories then.
Copy both to a USB HDD and then compare them with WinMerge.
Too late now, but something to think about if it occurs again on one system and not the other.
Grant
Darwin NT
ID: 109741
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1985
Credit: 9,362,147
RAC: 7,841
Message 109746 - Posted: 18 Sep 2024, 12:32:51 UTC - in response to Message 109684.  

Oh, what a surprise.
boinc-process is down again, so there's a Validation backlog once again that continues to grow.


Still.
And already 70k WUs pending validation...
ID: 109746
kasdashdfjsah
Joined: 15 Jan 24
Posts: 4
Credit: 0
RAC: 0
Message 109747 - Posted: 18 Sep 2024, 16:47:36 UTC - in response to Message 109746.  

Yeah, but resetting the project worked for me at least, despite the server status page saying that no tasks are available. Clicking the Update button over and over again didn't work, so this is very likely the only fix right now, and it only works for some people.
ID: 109747
TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,022,510
RAC: 2,008
Message 109748 - Posted: 18 Sep 2024, 18:23:15 UTC - in response to Message 80630.  



I'm not sure why you aren't getting work units. The system seems ok now and clients should be getting jobs. My desktops are crunching and were able to get jobs recently. Can you try to detach and reattach and see if that helps?

Still getting new WUs (at least during last night, it seems), and on Monday at least, some got validated, but at least since yesterday, all crunched WUs just end up "Validation pending", and a ton of servers on the server status page are shown as not running... :(

Ralf
ID: 109748
Sid Celery
Joined: 11 Feb 08
Posts: 2098
Credit: 40,822,968
RAC: 11,520
Message 109749 - Posted: 18 Sep 2024, 18:37:34 UTC - in response to Message 109739.  

Why it should be happening with one and not the other, I have no idea.
Neither do I, but it has been a recurring problem over at Ralph (when it's working, which it isn't again, and when it has work).
Several times it's been necessary to reset the project to stop errors occurring because the updated application doesn't have all the files it needs, or it's looking for them in the wrong place.

Both systems have the same hardware (CPU, motherboard), similar GPUs (RTX 2060 & RTX 2060 Super), same video driver, same AV software, same OS & updates, same version of BOINC, same projects, same configuration settings.
They are the same. Yet weirdness continues to occur.

Tbf, I was looking at one of my other PCs a short time ago, which is offsite from where I am atm, and all its tasks crashed within about 300 seconds, so it does seem to be a bit of pot luck.
ID: 109749
Sid Celery
Joined: 11 Feb 08
Posts: 2098
Credit: 40,822,968
RAC: 11,520
Message 109750 - Posted: 18 Sep 2024, 18:40:40 UTC - in response to Message 109746.  

Oh, what a surprise.
boinc-process is down again, so there's a Validation backlog once again that continues to grow.


Still.
And already 70k WUs pending validation...

Just back home and looking to load up with tasks before they run out and I'm too late.
Then discovered what you have about boinc-process going down again. 139k awaiting validation now
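
To put a rough number on how fast the backlog is growing, here is a small Python sketch using the two figures reported in this thread (70k at 12:32:51 UTC and 139k at 18:40:40 UTC on 18 Sep 2024). It assumes each figure was accurate at the moment its post was made and that growth was steady in between; both are assumptions, and the counts are rounded as posted.

```python
from datetime import datetime, timezone


def growth_per_hour(t0, n0, t1, n1):
    """Average increase per hour between two (time, count) observations."""
    hours = (t1 - t0).total_seconds() / 3600
    return (n1 - n0) / hours


# Post timestamps and backlog figures as they appear in the thread.
earlier = datetime(2024, 9, 18, 12, 32, 51, tzinfo=timezone.utc)
later = datetime(2024, 9, 18, 18, 40, 40, tzinfo=timezone.utc)

rate = growth_per_hour(earlier, 70_000, later, 139_000)
print(f"about {rate:,.0f} results per hour")  # roughly 11,000 per hour
```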

Just one easy day is all I ask. Will seemingly never happen...
ID: 109750



©2024 University of Washington
https://www.bakerlab.org