Problems and Technical Issues with Rosetta@home

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 626
Message 105291 - Posted: 28 Feb 2022, 18:56:00 UTC - in response to Message 105272.  

It's not about points to me. It's about advancing the science, in the hope they can use it to fight whatever they are looking at. You don't get anything for points; they're just a status symbol among BOINC users.

Certainly. But it is a little more subtle on Folding, since both GPUs and CPUs do the same sort of thing, molecular dynamics (MD). So it is often thought that since the GPUs are getting more points, they are more valuable. But that is not quite how it works, since they are doing different types of work on different types of molecules. It helps to see the task descriptions, which the researcher provides for every work unit at Folding. That is quite different than here, to make an obvious point.



The task descriptions are very nice.
You understand better what you are crunching.
RAH could learn from that.
In the old days a grad student on occasion would write about what they were putting on the system.
That died a long time ago when everyone disappeared.
ID: 105291
Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 626
Message 105292 - Posted: 28 Feb 2022, 19:00:24 UTC - in response to Message 105277.  

But is the queue shown on the page the real queue?
Yes.


I would love to see the breakdown by task type vs some number.
Computing, Server status, Tasks by application. Unsent numbers.

But as far as the main page Total queued jobs is concerned, you just have to remember roughly what that number was before any Rosetta 4.20 are sent out to have an idea of just how many Rosetta 4.20 Tasks there are queued up to go out (when there are some to actually go out).



If they have a million then are they mostly python or a mix with 4.2 and then why isn't 4.2 kept filled up?
Python is down to 4,xxx but there are a million in queue. So it really doesn't make any sense.
ID: 105292
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1508
Credit: 14,999,138
RAC: 22,099
Message 105300 - Posted: 1 Mar 2022, 7:41:07 UTC - in response to Message 105292.  

If they have a million then are they mostly python or a mix with 4.2
Like I said, you have to keep track of how many Pythons there are (Total queued jobs) when there are no Rosetta 4.20 Tasks available (Unsent number in Tasks by application).
If there are no Unsent Rosetta 4.20 Tasks, then all of the Total queued jobs are Python. If the Total queued jobs number suddenly increases, and there is now Unsent Rosetta 4.20 work, then the increase is the number of Rosetta 4.20 jobs in the Total queued jobs.

But if there aren't any Rosetta 4.20 Tasks in the Unsent numbers, then an increase in Total queued jobs means more Python Tasks. You need to remember that new number so that, the next time the total increases because of a batch of Rosetta 4.20 Tasks rather than more Python Tasks, you can figure out how many Rosetta 4.20 Tasks are in the Total queued jobs number.
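
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: it assumes the two numbers (Total queued jobs and the Rosetta 4.20 Unsent count) are noted down by hand from the server status page, and the function name and sample figures are made up. Python Tasks keep draining in the meantime, so the result is a rough estimate at best.

```python
def estimate_rosetta_queued(python_baseline, total_queued, rosetta_unsent):
    """Estimate how many Rosetta 4.20 jobs are in Total queued jobs.

    python_baseline: Total queued jobs noted while Rosetta 4.20 Unsent was 0
    total_queued:    Total queued jobs as shown now
    rosetta_unsent:  Rosetta 4.20 Unsent as shown now (Tasks by application)

    Returns (rosetta_estimate, python_baseline_to_remember).
    """
    if rosetta_unsent == 0:
        # No 4.20 work around: everything queued is Python,
        # so the current total becomes the new baseline.
        return 0, total_queued
    # 4.20 work is present: attribute the growth over the baseline to it.
    return max(total_queued - python_baseline, 0), python_baseline


# Hypothetical readings, not real server numbers.
print(estimate_rosetta_queued(1_000_000, 1_200_000, 27_000))
print(estimate_rosetta_queued(1_000_000, 1_100_000, 0))
```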



and then why isn't 4.2 kept filled up?
You need to ask the project why they aren't sending out any Rosetta 4.20 work, and why so little on the extremely rare occasions when they do.



Python is down to 4,xxx but there are a million in queue. So it really doesn't make any sense.
Think of the Unsent number as what is queued up in the feeder, ready to go out. The Total queued jobs is that number plus all the other Tasks ready to go that won't fit in the feeder.
That's why you can have 20 million in the Total queued jobs, and only a few thousand in the Unsent.
Grant
Darwin NT
ID: 105300
Bryn Mawr
Joined: 26 Dec 18
Posts: 376
Credit: 10,866,935
RAC: 9,158
Message 105301 - Posted: 1 Mar 2022, 10:58:55 UTC - in response to Message 105300.  




Python is down to 4,xxx but there are a million in queue. So it really doesn't make any sense.
Think of the Unsent number as what is queued up in the feeder, ready to go out. The Total queued jobs is that number plus all the other Tasks ready to go that won't fit in the feeder.
That's why you can have 20 million in the Total queued jobs, and only a few thousand in the Unsent.


Or think of it the other way: the work generators dump the tasks into the total queue, and a couple of daemons check whether the unsent queue each is responsible for is full (5,000 for Python, 27,000 for 4.20), topping it up as necessary. There could be many reasons why they choose to separate the storage used by the work generators from that used by the task delivery function.
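
As a toy model of that top-up idea (not the project's actual server code), here is a minimal Python sketch. Only the two buffer sizes come from the numbers above; the dictionary layout, function names, and sample backlog are assumptions made for illustration.

```python
# Buffer sizes quoted above; everything else in this sketch is hypothetical.
UNSENT_TARGET = {"python": 5_000, "rosetta_4.20": 27_000}


def top_up(backlog, unsent):
    """Move tasks from the large backlog into each app's Unsent buffer until it is full."""
    for app, target in UNSENT_TARGET.items():
        room = target - unsent.get(app, 0)
        moved = min(room, backlog.get(app, 0))
        if moved > 0:
            backlog[app] -= moved
            unsent[app] = unsent.get(app, 0) + moved
    return backlog, unsent


def total_queued(backlog, unsent):
    """Total queued jobs = what sits in the Unsent buffers plus everything still waiting."""
    return sum(backlog.values()) + sum(unsent.values())


backlog = {"python": 1_000_000, "rosetta_4.20": 0}
unsent = {"python": 4_000, "rosetta_4.20": 0}
top_up(backlog, unsent)
print(unsent, total_queued(backlog, unsent))
```

In this model the Python buffer refills to 5,000 while the 4.20 buffer stays empty, because nothing for 4.20 is waiting in the backlog. That would match a large total sitting alongside a small Unsent count.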
ID: 105301
robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 13,908,692
RAC: 3,097
Message 105303 - Posted: 1 Mar 2022, 15:44:46 UTC - in response to Message 105300.  

[snip]

and then why isn't 4.2 kept filled up?
You need to ask the project why they aren't sending out any Rosetta 4.20 work, and why so little on the extremely rare occasions when they do.

[snip]

But how do we ask the project anything and get an answer?
ID: 105303
Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 626
Message 105304 - Posted: 1 Mar 2022, 20:17:38 UTC - in response to Message 105303.  

[snip]

and then why isn't 4.2 kept filled up?
You need to ask the project why they aren't sending out any Rosetta 4.20 work, and why so little on the extremely rare occasions when they do.

[snip]

But how do we ask the project anything and get an answer?


You don't, because they don't answer.
Only one person here has an "in" with the project, and it seems they sometimes ignore that person as well.
The project is what it is...we have to come up with the answers to any problems.
We are beginning to feel like leftovers after they got their neural network up and running.
That is when everyone disappeared or shortly after that.
ID: 105304
Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 626
Message 105305 - Posted: 1 Mar 2022, 20:21:01 UTC

Found out I can't run 15 Pythons on a full-throttle system and expect any sort of reasonable response time.
I had never used "Run based on preferences" until now. It takes back some resources when I am using the computer.
ID: 105305
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1876
Credit: 8,335,951
RAC: 10,406
Message 105310 - Posted: 2 Mar 2022, 11:20:54 UTC - in response to Message 105304.  

You don't because they don't answer....We are beginning to feel like leftovers after they got their neural network up and running.
That is when everyone disappeared or shortly after that.


They disappeared before the new VM app.
The project's communications have been gone for YEARS.
ID: 105310
Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 626
Message 105311 - Posted: 2 Mar 2022, 20:03:39 UTC - in response to Message 105310.  

You don't because they don't answer....We are beginning to feel like leftovers after they got their neural network up and running.
That is when everyone disappeared or shortly after that.


They disappeared before the new VM app.
The project's communications have been gone for YEARS.



I'm talking about the neural network start-up days.
ID: 105311
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1876
Credit: 8,335,951
RAC: 10,406
Message 105315 - Posted: 3 Mar 2022, 11:43:43 UTC - in response to Message 105311.  

I'm talking neural network start up days.


I know. I meant that it's not only about the "neural network" app (and science) that they don't answer; they haven't answered anything since... I don't remember when.
ID: 105315
Ben W
Joined: 2 Mar 22
Posts: 2
Credit: 217
RAC: 0
Message 105316 - Posted: 3 Mar 2022, 17:34:54 UTC

No work units have been sent to my machines. I joined a few days ago. Why?
ID: 105316
kotenok2000
Joined: 22 Feb 11
Posts: 234
Credit: 336,960
RAC: 997
Message 105317 - Posted: 3 Mar 2022, 17:42:50 UTC - in response to Message 105316.  

There are only workunits for the VirtualBox app:
https://boinc.bakerlab.org/rosetta/server_status.php
ID: 105317
tullio
Joined: 10 May 20
Posts: 63
Credit: 630,125
RAC: 0
Message 105318 - Posted: 3 Mar 2022, 17:44:15 UTC - in response to Message 105316.  

Did you install VirtualBox? Rosetta Python needs it.
Tullio
ID: 105318
Ben W
Joined: 2 Mar 22
Posts: 2
Credit: 217
RAC: 0
Message 105319 - Posted: 3 Mar 2022, 20:44:03 UTC - in response to Message 105318.  

Thanks guys :)
ID: 105319
Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 626
Message 105320 - Posted: 3 Mar 2022, 22:16:52 UTC - in response to Message 105315.  

I'm talking neural network start up days.


I know. I meant that it's not only about the "neural network" app (and science) that they don't answer; they haven't answered anything since... I don't remember when.


2019 or 2020, depending on who, from what I can see.
It looks like Dr. B stopped in 2019 and Admin stopped in 2020.
ID: 105320
lundare
Joined: 12 Apr 21
Posts: 2
Credit: 822,019
RAC: 0
Message 105330 - Posted: 5 Mar 2022, 17:28:38 UTC

Hi there!
I have been experiencing some issues with all tasks from Rosetta@home for the last 10 months or so. They ran fine before that.

Issue 1
The tasks from Rosetta run fine and quite fast for the first 85-90%, and then slow down completely for the remainder of the task.
They slow down to the extent that my computers never complete them: progress stops at 98-99%, but the CPUs are still pegged at 100% load.
I have waited several days for some to complete, but they never do. This behaviour occupies the CPU cores for a small eternity and prevents other tasks from running.

Issue 2
Many tasks from Rosetta get stuck waiting for memory, seemingly forever. This is despite both my computers having 48 GB, with the memory hardly used at all according to the task manager. This behaviour seems to prevent all tasks from other projects from getting access to memory or CPU.
So in the end I may have only 2 tasks running on a 12-core/24-thread system, which I don't like.

This does happen from time to time with other projects as well, but then all of my memory actually is in use. I can understand those tasks waiting for memory to be freed up as other tasks complete, and when the memory-hogging tasks finish, the tasks that waited for memory run.

I have tried resetting, rebooting, reinstalling, and removing and re-adding the project, without luck.
Now I have stopped running Rosetta completely because I can hardly complete any task.

Thanks in advance for any help.

Have a nice day!
//Mattias

Specs for both computers:
Mac Pro 5,1
Dual Xeon X5680
48 GB RAM
SSD storage
AMD RX570 in both computers plus an extra RX560 in one
Latest BOINC and VirtualBox, and macOS Catalina with all patches.
ID: 105330
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 105331 - Posted: 5 Mar 2022, 21:33:19 UTC - in response to Message 105330.  

Hi there!
I have been experiencing some issues with all tasks from Rosetta@home for the last 10 months or so. They ran fine before that.

If you are running the pythons with VirtualBox, it looks like you are experiencing the "0 CPU" and "Vm job unmanageable" problems, both well known.

Search around on this thread, or related ones.
ID: 105331
Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 626
Message 105332 - Posted: 5 Mar 2022, 23:33:42 UTC - in response to Message 105330.  

Let the work run out.
Then run a registry and disk cleaner on your system and make sure you can restart "fresh".
Do a "repair" of BOINC just to make sure you have a "clean" working copy.
Do you shut your systems down at night or run 24/7?
Have you checked your VM manager for any dead tasks and removed them?
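
For the last check, a rough Python sketch along these lines could list VMs that are registered in VirtualBox but not currently running. It assumes `VBoxManage` is on the PATH and that `VBoxManage list vms` and `VBoxManage list runningvms` print one `"name" {uuid}` entry per line. The helper names are made up, and a registered VM that is not running is not necessarily a dead task, so compare against the BOINC task list before removing anything.

```python
import re
import subprocess

# One entry per line, e.g.: "some_vm_name" {01234567-89ab-cdef-0123-456789abcdef}
ENTRY = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]+)\}$')


def parse_vm_list(text):
    """Parse `VBoxManage list vms` style output into {uuid: name}."""
    vms = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            vms[match.group("uuid")] = match.group("name")
    return vms


def not_running(all_text, running_text):
    """Names of registered VMs that do not appear in the running list."""
    registered = parse_vm_list(all_text)
    running = parse_vm_list(running_text)
    return sorted(name for uuid, name in registered.items() if uuid not in running)


def vboxmanage(*args):
    """Run VBoxManage and return its standard output as text."""
    result = subprocess.run(["VBoxManage", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Usage on a machine with VirtualBox installed:
# print(not_running(vboxmanage("list", "vms"), vboxmanage("list", "runningvms")))
```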

These are things I do with my Windows system just to make sure I am on a "clean" machine. Then I try again, and if that does not work, I go hunting for answers.

You might want to install BoincTasks by eFMer so you can see your CPU usage per task and other good info, like physical and virtual memory usage. If you see a task running at, say, 0.10% CPU, the run time is over 12 hours, you're stalled at 97-99%, and the amount done shown in BoincTasks (to 2 decimal places) advances by something like 0.05 every two update cycles, then you know it's time to kill the task rather than wait days.
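
That rule of thumb can be written down as a small check. This is just a sketch in Python: the thresholds are the rough figures from this post, the function and parameter names are made up, and the values would have to be read off BoincTasks by hand.

```python
def looks_stalled(cpu_percent, run_hours, progress_percent, progress_step,
                  max_cpu=0.10, min_hours=12, min_progress=97.0, max_step=0.05):
    """Return True when a task matches the 'time to kill it' pattern.

    cpu_percent:      CPU usage of the task, in percent
    run_hours:        elapsed run time, in hours
    progress_percent: amount done, in percent
    progress_step:    how far the amount done moved over two update cycles
    """
    return (cpu_percent <= max_cpu
            and run_hours > min_hours
            and progress_percent >= min_progress
            and progress_step <= max_step)


print(looks_stalled(cpu_percent=0.10, run_hours=13, progress_percent=98.2, progress_step=0.05))
print(looks_stalled(cpu_percent=95.0, run_hours=13, progress_percent=98.2, progress_step=0.05))
```

The defaults are deliberately keyword arguments, so anyone who finds different cut-offs more reliable on their own machine can change them.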

If you still get stalls after the cleaning, then yeah, you have to look through here or try a Google search that might link to a message. We discussed this sometime last year, which would be up to 10 pages back.
ID: 105332
kotenok2000
Joined: 22 Feb 11
Posts: 234
Credit: 336,960
RAC: 997
Message 105333 - Posted: 5 Mar 2022, 23:39:42 UTC - in response to Message 105332.  

https://efmer.com/boinctasks/download-boinctasks/
ID: 105333
lundare
Joined: 12 Apr 21
Posts: 2
Credit: 822,019
RAC: 0
Message 105339 - Posted: 6 Mar 2022, 9:45:26 UTC - in response to Message 105332.  

A clean install does not seem to make any difference: one of the computers has a freshly installed macOS and software, and the other install is about a year old.
Result is the same.

The computers are restarted once a week.

I will have to look into BoincTasks by eFMer.
ID: 105339



©2024 University of Washington
https://www.bakerlab.org