Problems and Technical Issues with Rosetta@home

Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 10,031,642
RAC: 10,721
Message 104798 - Posted: 11 Feb 2022, 19:44:40 UTC - in response to Message 104796.  

Why can't message text be included in email notification?
I'll guess either server load or stupidity.
ID: 104798 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Rachael Lines
Avatar

Send message
Joined: 11 Feb 22
Posts: 2
Credit: 2,865
RAC: 0
Message 104803 - Posted: 12 Feb 2022, 10:18:48 UTC

Hey everyone

I started crunching yesterday, but four of my tasks are showing a computation error. Is this normal? Or do I need to suspend each one before shutting down my computer? Is turning off my PC what caused this? Thanks in advance.
ID: 104803 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
kotenok2000
Avatar

Send message
Joined: 22 Feb 11
Posts: 238
Credit: 403,345
RAC: 3,749
Message 104804 - Posted: 12 Feb 2022, 10:25:30 UTC - in response to Message 104803.  

You need to enable SVM in the BIOS to run VirtualBox apps.
Then reset the switch that automatically disables VirtualBox apps when a host fails to compute them: at
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=6177189
press Allow.
ID: 104804 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Grant (SSSF)

Send message
Joined: 28 Mar 20
Posts: 1533
Credit: 15,580,457
RAC: 23,807
Message 104805 - Posted: 12 Feb 2022, 10:41:46 UTC - in response to Message 104803.  
Last modified: 12 Feb 2022, 10:42:49 UTC

Hey everyone

I started crunching yesterday, but four of my tasks are showing a computation error. Is this normal? Or do I need to suspend each one before shutting down my computer? Is turning off my PC what caused this? Thanks in advance.
It has tried to process Python Tasks, and they require VirtualBox in order to run.
Your system has VirtualBox, but is having problems running it.

Waiting for VM "boinc_722c34e89dac8a69" to power on...
VBoxManage.exe: error: Not in a hypervisor partition (HVP=0) (VERR_NEM_NOT_AVAILABLE).
VBoxManage.exe: error: AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)
VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component ConsoleWrap, interface IConsole

2022-02-11 18:14:31 (7376): VM failed to start.
2022-02-11 18:14:31 (7376): Could not start 
2022-02-11 18:14:31 (7376): ERROR: VM failed to start
2022-02-11 18:14:31 (7376): Powering off VM.
2022-02-11 18:14:31 (7376): Deregistering VM. (boinc_722c34e89dac8a69, slot#4)
2022-02-11 18:14:31 (7376): Removing network bandwidth throttle group from VM.
2022-02-11 18:14:31 (7376): Removing VM from VirtualBox.

I'd suggest checking your BIOS to make sure virtualisation is enabled, and then making sure that Hyper-V isn't enabled (under Windows Features).
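
As a rough illustration of reading that stderr output, here is a minimal Python sketch that looks for the two VirtualBox error codes quoted above and prints a matching check. The `HINTS` table and the `diagnose` function are hypothetical names made up for this example, and the hint texts only paraphrase the suggestions in this thread; they are not an official VirtualBox or BOINC diagnosis.

```python
# Hypothetical helper: map the VirtualBox error codes seen in the task's
# stderr to the checks suggested in this thread. Hint texts are paraphrases.
HINTS = {
    "VERR_SVM_DISABLED": "AMD-V (SVM) looks disabled: enable virtualisation in the BIOS.",
    "VERR_NEM_NOT_AVAILABLE": "VirtualBox could not use the Windows hypervisor interface: "
                              "review the Hyper-V setting under Windows Features.",
}

def diagnose(stderr_text):
    """Return the hints whose error code appears in stderr_text, in HINTS order."""
    return [hint for code, hint in HINTS.items() if code in stderr_text]

# The two error lines quoted in the post above.
sample = (
    "VBoxManage.exe: error: Not in a hypervisor partition (HVP=0) (VERR_NEM_NOT_AVAILABLE).\n"
    "VBoxManage.exe: error: AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)\n"
)

for hint in diagnose(sample):
    print(hint)
```

Any other error code simply produces no hint, so an empty result only means that neither of these two codes was found.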

This may be of use. (Similar hardware & OS and what they had to do to get VirtualBox to work).

If you can get it working, you will be limited in the number of Python Tasks you can process by the amount of RAM you have: from memory, at least 3GB of RAM is required per Task to start processing (even though they actually use much less).
You will also probably need to increase the default amount of disk space BOINC can use to process more than a few Python Tasks: just under 8GB of disk space is required per Task being processed.
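
To put numbers on that, here is a small Python sketch of the arithmetic, assuming the approximate per-Task figures given above (3GB of RAM and about 8GB of disk). The function name `max_python_tasks` and the example values are made up for illustration, and the estimate ignores RAM used by the operating system and other programs.

```python
# Approximate figures quoted in the post above ("from memory").
RAM_PER_TASK_GB = 3.0    # RAM that must be available before a Python Task will start
DISK_PER_TASK_GB = 8.0   # "just under 8GB" of disk per running Task, rounded up

def max_python_tasks(ram_gb, boinc_disk_gb):
    """Upper bound on concurrent Python Tasks given RAM and BOINC's disk allowance."""
    by_ram = int(ram_gb // RAM_PER_TASK_GB)
    by_disk = int(boinc_disk_gb // DISK_PER_TASK_GB)
    return min(by_ram, by_disk)

# Hypothetical host: 16 GB of RAM, 40 GB of disk allowed for BOINC.
print(max_python_tasks(16, 40))  # RAM allows 5, disk allows 5
```

Whichever of the two limits is smaller decides the result, which is why raising the disk allowance alone may not help on a machine with little RAM.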


You shouldn't have any issues processing Rosetta 4.20 Tasks (unless we get some that do require more RAM than the present ones), but until the last couple of days they have been pretty much non-existent for the last few months.
I would also suggest running the BOINC Manager benchmarks- they are used to determine the amount of Credit you get for doing work, and your system is showing just the default values.
Grant
Darwin NT
ID: 104805 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Rachael Lines
Avatar

Send message
Joined: 11 Feb 22
Posts: 2
Credit: 2,865
RAC: 0
Message 104806 - Posted: 12 Feb 2022, 10:50:20 UTC

They were Rosetta Python ones that were showing the error, I have some of the Rosetta 4.20 working now.
I will have a better look at it this afternoon.

Thanks for the info
ID: 104806 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1900
Credit: 8,619,538
RAC: 11,909
Message 104821 - Posted: 15 Feb 2022, 6:30:30 UTC
Last modified: 15 Feb 2022, 6:31:17 UTC

All PcrV10AA_PcrV_HYF_ tasks fail after a few seconds:

<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @PcrV10AA_PcrV_HYF_10298_000081_extract_A.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3753509
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF64AC79D28

ID: 104821 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Grant (SSSF)

Send message
Joined: 28 Mar 20
Posts: 1533
Credit: 15,580,457
RAC: 23,807
Message 104823 - Posted: 15 Feb 2022, 8:58:10 UTC - in response to Message 104821.  

All PcrV10AA_PcrV_HYF_ tasks fail after a few seconds:
I've got plenty of _PcrV_ Tasks that have been processed and Validated, but it must be around 50% of them crashed and burned within seconds of starting.
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF7DA118316 read attempt to address 0xFFFFFFFF

Grant
Darwin NT
ID: 104823 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1900
Credit: 8,619,538
RAC: 11,909
Message 104826 - Posted: 15 Feb 2022, 17:01:06 UTC - in response to Message 104823.  

I've got plenty of _PcrV_ Tasks that have been processed and Validated, but it must be around 50% of them crashed and burned within seconds of starting.


+1. Now some of these wus are running...
ID: 104826 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5664
Credit: 5,749,688
RAC: 2,261
Message 104828 - Posted: 15 Feb 2022, 18:33:16 UTC
Last modified: 15 Feb 2022, 18:33:46 UTC

Everything from that protein died and my wingmen had the same errors.
Very good of them to dump untested tasks on the server.
Thought they dumped them on RALPH first and if he liked them then they came to Rosie.
ID: 104828 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1900
Credit: 8,619,538
RAC: 11,909
Message 104829 - Posted: 15 Feb 2022, 20:47:40 UTC - in response to Message 104828.  

Thought they dumped them on RALPH first and if he liked them then they came to Rosie.


Completely agree with you.
Ralph is VERY underused
ID: 104829 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery

Send message
Joined: 11 Feb 08
Posts: 2006
Credit: 39,458,705
RAC: 24,140
Message 104830 - Posted: 15 Feb 2022, 23:23:54 UTC

Finished the latest batch of Rosetta 4.20 tasks, so flicked back to WCG tasks automatically...

Yeah, I didn't read WCG's recent announcement properly.
I thought it was going to be down from 14th to 28th February.
Not stop sending tasks that will complete in that period and then the whole project be down until April 22nd
Already completed everything ffs...

I'm going to have to install Virtual Box and give that another try, aren't I
God help me
ID: 104830 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery

Send message
Joined: 11 Feb 08
Posts: 2006
Credit: 39,458,705
RAC: 24,140
Message 104831 - Posted: 16 Feb 2022, 2:47:33 UTC - in response to Message 104830.  

Finished the latest batch of Rosetta 4.20 tasks, so flicked back to WCG tasks automatically...

Yeah, I didn't read WCG's recent announcement properly.
I thought it was going to be down from 14th to 28th February.
Not stop sending tasks that will complete in that period and then the whole project be down until April 22nd
Already completed everything ffs...

I'm going to have to install Virtual Box and give that another try, aren't I
God help me

Was just about to say I completed my first tasks (which I did) when something crashed and all my remaining tasks errored out
16/02/2022 2:41:27 | Rosetta@home | [error] MD5 check failed for AIMNet_vm_v2.vdi
16/02/2022 2:41:27 | Rosetta@home | [error] expected 61fef19456bb58ec941845ef08d8c5ef, got c846bd7ee0a3dedc8eedbe8fbc36eda8

Still, better than my previous attempts.
I had up to 9 tasks running at a time within 32 GB of RAM on my 8C/16T machine
ID: 104831 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1900
Credit: 8,619,538
RAC: 11,909
Message 104832 - Posted: 16 Feb 2022, 6:14:22 UTC - in response to Message 104830.  
Last modified: 16 Feb 2022, 6:16:33 UTC

I'm going to have to install Virtual Box and give that another try, aren't I
God help me


Tn-Grid??
Sidock?
ID: 104832 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
computezrmle

Send message
Joined: 9 Dec 11
Posts: 63
Credit: 9,680,103
RAC: 0
Message 104833 - Posted: 16 Feb 2022, 7:34:10 UTC - in response to Message 104831.  

16/02/2022 2:41:27 | Rosetta@home | [error] MD5 check failed for AIMNet_vm_v2.vdi
16/02/2022 2:41:27 | Rosetta@home | [error] expected 61fef19456bb58ec941845ef08d8c5ef, got c846bd7ee0a3dedc8eedbe8fbc36eda8

Looks like the vdi image got damaged and needs to be refreshed.

Best would be to
- Shut down BOINC
- Delete AIMNet_vm_v2.vdi from the projects directory
- Restart BOINC

This will initiate a fresh download of the compressed image (~2 GB) which will then be expanded to 6.9 GB.
Whenever a fresh task starts AIMNet_vm_v2.vdi will be forced through the checksum calculator (MD5 check) and the result will be compared to the checksum sent by the project.
Only if the check succeeds is the image copied to a slots directory and renamed vm_image.vdi, which is then used for the task.
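
For anyone who wants to check the image by hand before deleting it, the comparison described above can be sketched in Python as follows. This is only an illustration of an MD5 file check, not the BOINC client's own code; the function names are made up, and the path in the usage comment is a placeholder for wherever your projects directory is.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def image_is_intact(path, expected_md5):
    """True if the file's MD5 matches the checksum the project expects."""
    return md5_of_file(path) == expected_md5.lower()

# Usage (placeholder path; the expected value is the one from the log above):
# image_is_intact("<projects directory>/AIMNet_vm_v2.vdi",
#                 "61fef19456bb58ec941845ef08d8c5ef")
```

Reading in chunks keeps memory use small, which matters for an image of several GB.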
ID: 104833 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery

Send message
Joined: 11 Feb 08
Posts: 2006
Credit: 39,458,705
RAC: 24,140
Message 104840 - Posted: 16 Feb 2022, 14:30:32 UTC - in response to Message 104832.  

I'm going to have to install Virtual Box and give that another try, aren't I
God help me

Tn-Grid??
Sidock?

I couldn't even access the home page of Sidock. TN-Grid may be something, but I'm not sure what.
I'll persist here for a while longer
ID: 104840 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery

Send message
Joined: 11 Feb 08
Posts: 2006
Credit: 39,458,705
RAC: 24,140
Message 104841 - Posted: 16 Feb 2022, 14:39:40 UTC - in response to Message 104833.  
Last modified: 16 Feb 2022, 14:45:49 UTC

16/02/2022 2:41:27 | Rosetta@home | [error] MD5 check failed for AIMNet_vm_v2.vdi
16/02/2022 2:41:27 | Rosetta@home | [error] expected 61fef19456bb58ec941845ef08d8c5ef, got c846bd7ee0a3dedc8eedbe8fbc36eda8

Looks like the vdi image got damaged and needs to be refreshed.

Best would be to
- Shut down BOINC
- Delete AIMNet_vm_v2.vdi from the projects directory
- Restart BOINC

This will initiate a fresh download of the compressed image (~2 GB) which will then be expanded to 6.9 GB.
Whenever a fresh task starts AIMNet_vm_v2.vdi will be forced through the checksum calculator (MD5 check) and the result will be compared to the checksum sent by the project.
Only if the check succeeds is the image copied to a slots directory and renamed vm_image.vdi, which is then used for the task.

A new version of AIMNet_vm_v2.vdi comes down after updating, with all the attributes you mention but without a shutdown and restart.
I've had a few Rosetta 4.20 and WCG tasks dribble through too.
I overclock my PC and sometimes these checksum errors have been associated with overclocking, so I'm wary of that factor.
At the same time, VBox tasks seem slightly less demanding than Rosetta tasks and I'm running a lot cooler with VBox, so maybe not.
And I'm sure that only being able to run 8 or 9 tasks at a time rather than 16 plays into that too.

I've completed all the Rosetta and WCG tasks I got, but now I only have 2 VBox tasks and none further will download. Is it normal for VBox tasks to only be available intermittently?

I'm getting what I'm getting and it's not the complete failure it was when I first tried. I'll give it a few more days

Edit: I've had to click "Allow" on my PC's profile. I guess all the crashed tasks tripped it to restrict downloads
Yup, that's done it
ID: 104841 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile [VENETO] boboviz

Send message
Joined: 1 Dec 05
Posts: 1900
Credit: 8,619,538
RAC: 11,909
Message 104842 - Posted: 16 Feb 2022, 15:38:02 UTC - in response to Message 104840.  

I couldn't even access the home page of Sidock.

Sidock is under maintenance today, but will be back online soon


TN-Grid may be something, but I'm not sure what.

It's a long-running BOINC project about gene networks...
http://gene.disi.unitn.it/test/
ID: 104842 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Killersocke@rosetta

Send message
Joined: 13 Nov 06
Posts: 29
Credit: 2,579,125
RAC: 0
Message 104843 - Posted: 16 Feb 2022, 18:25:37 UTC - in response to Message 104842.  

World Community Grid: WCG Data Transfer Underway, Stress Test of New Infrastructure Scheduled For Feb 28th
We have started to transfer data for all active WCG projects to the Krembil Research Institute. We are gearing up to start testing the whole system on February 28, 2022.
09.02.2022 20:41:43 · read more...

--------------------------------------------------------------------------------
SiDock@home: Technical maintenance on February 15th
Hello! Additional server maintenance planned on February 15th, for several hours.
14.02.2022 22:57:41 · read more...
ID: 104843 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
BoredEEdude

Send message
Joined: 11 Apr 12
Posts: 11
Credit: 38,853,966
RAC: 1,932
Message 104844 - Posted: 16 Feb 2022, 18:28:44 UTC

I have been running Rosetta on multiple computers for years, and it has been a mostly hands-off background task requiring minimal supervision.

For the past few months, Rosetta work units have been unavailable for days on end. No errors are shown, just "got 0 new tasks".

2/16/2022 11:45:16 AM | Rosetta@home | update requested by user
2/16/2022 11:45:20 AM | Rosetta@home | Sending scheduler request: Requested by user.
2/16/2022 11:45:20 AM | Rosetta@home | Requesting new tasks for CPU
2/16/2022 11:45:22 AM | Rosetta@home | Scheduler request completed: got 0 new tasks
2/16/2022 11:45:22 AM | Rosetta@home | No tasks sent
2/16/2022 11:45:22 AM | Rosetta@home | Project requested delay of 31 seconds
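
To see how often this happens over a longer stretch, the event log lines can be scanned with a short script. Below is a minimal Python sketch that assumes the line format shown above ("... | project | Scheduler request completed: got N new tasks"); the pattern and the function name `tasks_received` are my own and may need adjusting for other client versions or languages.

```python
import re

# Matches lines such as:
# 2/16/2022 11:45:22 AM | Rosetta@home | Scheduler request completed: got 0 new tasks
PATTERN = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*Scheduler request completed: got (?P<count>\d+) new tasks"
)

def tasks_received(log_text):
    """Return a list of (project, task_count) for each completed scheduler request."""
    results = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            results.append((match.group("project"), int(match.group("count"))))
    return results

sample = (
    "2/16/2022 11:45:20 AM | Rosetta@home | Requesting new tasks for CPU\n"
    "2/16/2022 11:45:22 AM | Rosetta@home | Scheduler request completed: got 0 new tasks\n"
    "2/16/2022 11:45:22 AM | Rosetta@home | No tasks sent\n"
)
print(tasks_received(sample))
```

Counting the entries with a task count of 0 against the total gives a rough picture of how often requests come back empty.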


When this happens, the online website server status shows approximately 5000 tasks are ready to send, with some large number (~100k) of tasks in progress, and little server side processing occurring.

Computing status
Work
Tasks ready to send 4992
Tasks in progress 115529
Workunits waiting for validation 0
Workunits waiting for assimilation 1
Workunits waiting for file deletion 1
Tasks waiting for file deletion 1
Transitioner backlog (hours) 0.00


It seems that whenever the available number of tasks gets down to around 5000, all work units are considered sent, and the server backend is now just waiting for completed work to be returned. I don't recall ever seeing the available tasks go down to zero.

When I do eventually get some tasks, everything runs as expected locally until all tasks are finished. Then I go idle for days waiting for more tasks to become available.

It seems to me that the project is just not generating as much work for all of its users these days. I don't know if that is because the number of work units is down, or there are many more users available to process the same number of generally available units, or if the type of work has changed and I am unaware of what my system is lacking so it can be sent some of these "new" type of tasks now being made available.

Is there a checklist somewhere that I can use to verify my system is setup correctly? Because my BOINC Manager currently thinks everything is running just fine.

I used to run Rosetta work exclusively. But to keep my computers occupied (non-idle) I have since added other projects so I can pick up other tasks when no Rosetta tasks are available. The downside is that when Rosetta tasks are available, these other projects dilute the amount of resources I can devote to Rosetta in the hands-off processing approach I prefer, as all projects now have to share the available CPU time.

If many Rosetta users are running out of work, but there are still 10s or 100s of thousands of tasks in progress, can Rosetta start limiting the number of tasks sent to individual users (even if they are willing to backlog a large number of tasks locally)?

I have seen other projects where tasks were only generated in large bursts, and the users knew to backlog days or weeks worth of tasks since the server would quickly run out of new tasks to send out. The result was that if you didn't stockpile tasks during the initial big release, you would virtually never see any tasks unless BOINC happened to check in during a new big release of tasks days or weeks in the future.

Limiting the size of individual user backlogs would spread the available work out across all the available users. That would help retain more users, since everyone would feel like they are contributing to the project. At this point, I feel like I'm getting sidelined with no work, while others are sitting on a lot of work units they cannot run immediately. And the rate of results back to Rosetta will be delayed unnecessarily as they wait for the return of backlogged tasks from a few users instead of sending them to idle machines.
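
The effect of such a limit can be shown with a toy example. The Python sketch below is purely illustrative: it is not how the BOINC scheduler actually hands out work, and the function `distribute`, the request sizes, and the cap of 500 are all made-up assumptions.

```python
def distribute(tasks_available, requests, cap=None):
    """Hand out tasks in request order; each host gets at most `cap` tasks if a cap is set."""
    granted = []
    for wanted in requests:
        allowed = wanted if cap is None else min(wanted, cap)
        given = min(allowed, tasks_available)
        granted.append(given)
        tasks_available -= given
    return granted

# Hypothetical requests: two hosts with large backlogs, three with small ones.
requests = [3000, 2000, 200, 200, 200]
print(distribute(5000, requests))           # [3000, 2000, 0, 0, 0]
print(distribute(5000, requests, cap=500))  # [500, 500, 200, 200, 200]
```

Without a cap the first two hosts take everything; with the cap every host gets some work and tasks are left over for later requests.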

My Rosetta@home Statistics graph clearly shows 3 bursts of activity over a total of 8 days within the past 30 days. That leaves me sitting idle for 22 days (or about 75% of that time). My main PC (which the graph comes from) is capable of running 16 concurrent tasks in 32 GB of RAM at ~3.5 GHz CPU speed, so while I can normally complete many concurrent tasks in about 8 hours, 75% of the month Rosetta gets ZERO results from me for lack of tasks to run.

https://drive.google.com/file/d/1X5aBWy0xj2wgV7DpF9tqjrRg8i8E-XEY/view
ID: 104844 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5664
Credit: 5,749,688
RAC: 2,261
Message 104845 - Posted: 16 Feb 2022, 19:22:32 UTC - in response to Message 104842.  

I couldn't even access the home page of Sidock.

Sidock is under maintenance today, but will be back online soon


TN-Grid may be something, but I'm not sure what.

It's a long-running BOINC project about gene networks...
http://gene.disi.unitn.it/test/



QuChem has been offline for 3 days now...must have blown something up to be offline this long.
No webserver, no project server....dead.
ID: 104845 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote



©2024 University of Washington
https://www.bakerlab.org