horns named project files causing pgtables out of memory errors in boinc all others run fine

Message boards : Number crunching : horns named project files causing pgtables out of memory errors in boinc all others run fine

at90systems
Joined: 19 Apr 20
Posts: 7
Credit: 700,368
RAC: 0
Message 99732 - Posted: 21 Nov 2020, 13:18:41 UTC

I have 2 Raspberry Pis running Ubuntu. All other project files (named other things) run fine. All files I get that are named horns_.....xxxxxx etc. start but eventually cause multiple errors, such as pgtables out of memory, and I am having to abort them to clear the issues. Rebooting the Pi does not solve the problem, and it is happening on both units. It is just the horns-named files. Has anyone noticed similar problems or found a solution? I hate to keep just aborting a particular strand. TIA

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99733 - Posted: 21 Nov 2020, 13:29:52 UTC - in response to Message 99732.  

Many of the horns tasks require a large amount of memory and are likely to fail on smaller machines in the way you have seen. You are probably best off aborting any that you receive; there are plenty of other task types that require less RAM and will run fine on a Pi. There’s some related discussion in another thread, starting here.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1860
Credit: 8,160,158
RAC: 8,440
Message 99734 - Posted: 21 Nov 2020, 14:13:19 UTC - in response to Message 99733.  

You are probably best off aborting any that you receive

As I wrote, it would be better to let users choose apps in their profile.

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99735 - Posted: 21 Nov 2020, 15:17:51 UTC - in response to Message 99734.  

better to create the possibility to choose apps in the user's profile.
As a way of allowing users to select which strands of research they contribute to, certainly. It would require substantially clearer categorisation of tasks than we have at present, though.

But avoiding performance issues like this one is a separate matter. It would be better for the server to have some knowledge of the characteristics of each task type, so it could automatically refrain from sending large tasks to small hosts that stand little chance of completing them.
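
To make the idea concrete, here is a minimal sketch of the kind of check a server could apply before sending a task. It is only an illustration and not how the BOINC scheduler is actually written: the function names, the `usable_fraction` default, and the memory figures are all assumptions made up for the example.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def per_task_budget(host_ram_bytes, concurrent_tasks, usable_fraction=0.9):
    """RAM available to each task if the host runs several at once.

    usable_fraction is a hypothetical allowance for the OS and other
    processes; it is not a value taken from any real project.
    """
    return host_ram_bytes * usable_fraction / concurrent_tasks


def should_send(task_mem_bytes, host_ram_bytes, concurrent_tasks,
                usable_fraction=0.9):
    """Return True if the task's estimated memory fits the per-task budget."""
    budget = per_task_budget(host_ram_bytes, concurrent_tasks, usable_fraction)
    return task_mem_bytes <= budget


# Hypothetical figures: a 3 GiB task on a 4 GiB host running 4 tasks at once.
print(should_send(3 * GIB, 4 * GIB, concurrent_tasks=4))  # False
# The same task fits if that host runs only one task at a time.
print(should_send(3 * GIB, 4 * GIB, concurrent_tasks=1))  # True
```

The point of the sketch is that the decision needs two inputs the server would have to know: a memory estimate per task type and the RAM each host can actually spare per running task.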

at90systems
Joined: 19 Apr 20
Posts: 7
Credit: 700,368
RAC: 0
Message 99739 - Posted: 21 Nov 2020, 21:32:35 UTC

I kind of figured that was the problem to start with; thanks for the pointer to the other thread. I'm surprised to see it, though, since both of my units are 4GB Pi 4s. Not to sound crazy, but doesn't the need for large amounts of memory to process files like that kind of defeat the purpose of the distributed computing model? I know it's tough to find platforms that support Pi equipment (trust me, there are only a handful); it's just strange that only this one type (the horns) requires so much. It seems like an easy fix, though: split the files down into smaller subsets, or limit them to a particular set of BOINC users. The BOINC registration should be able to supply some of that information, since if you try to register a Pi with a project that doesn't support it, it tells you so from the get-go. It just gets repetitive having to go to my units (which are headless for the most part) and abort one single file every day or every other day when the other subclasses work fine; that was all I was concerned about. I'm wondering whether using an 8GB Pi would solve the issue, with the 4GB units kept on the machine learning project.
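
For headless hosts, the repeated manual aborting could in principle be scripted around `boinccmd`. The following is a rough sketch, not a tested tool: it assumes that `boinccmd --get_tasks` prints a `name:` line followed later by a `project URL:` line for each task, and that the unwanted tasks all start with `horns_`. The function names are made up for this example.

```python
import subprocess


def parse_tasks(text):
    """Return (task_name, project_url) pairs from `boinccmd --get_tasks` text.

    Assumes each task block has a 'name:' line followed by a 'project URL:'
    line; 'WU name:' lines are ignored because they do not start with 'name:'.
    """
    tasks = []
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            name = line[len("name:"):].strip()
        elif line.startswith("project URL:") and name is not None:
            tasks.append((name, line[len("project URL:"):].strip()))
            name = None
    return tasks


def abort_commands(tasks, prefix="horns_"):
    """Build one `boinccmd --task URL NAME abort` command per matching task."""
    return [["boinccmd", "--task", url, name, "abort"]
            for name, url in tasks if name.startswith(prefix)]


def main(apply=False):
    """Print the abort commands; run them only when apply is True."""
    listing = subprocess.run(["boinccmd", "--get_tasks"],
                             capture_output=True, text=True, check=True).stdout
    for cmd in abort_commands(parse_tasks(listing)):
        print(" ".join(cmd))
        if apply:
            subprocess.run(cmd, check=True)
```

Calling `main()` only prints what would be aborted, which is worth checking by hand before trying `main(apply=True)` from something like a cron job.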

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 99740 - Posted: 21 Nov 2020, 21:55:07 UTC - in response to Message 99739.  

I know it's tough to find platforms that support Pi equipment (trust me, there are only a handful); it's just strange that only this one type (the horns) requires so much. It seems like an easy fix, though: split the files down into smaller subsets, or limit them to a particular set of BOINC users. The BOINC registration should be able to supply some of that information, since if you try to register a Pi with a project that doesn't support it, it tells you so from the get-go.

No, the models usually start out large. They then get smaller with development. I don't know whether it is feasible to separate them at the large stage.
The real question is why do they allow a Pi at all? It is bound to fail at some point.

jeff_b
Joined: 8 Apr 20
Posts: 3
Credit: 11,702,313
RAC: 4,211
Message 99744 - Posted: 22 Nov 2020, 19:07:54 UTC - in response to Message 99739.  

So far my 8GB Pi is working OK with horns project files, but my 4GB Pi cluster doesn't like them, so I have to delete them.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1860
Credit: 8,160,158
RAC: 8,440
Message 99745 - Posted: 22 Nov 2020, 20:27:55 UTC - in response to Message 99740.  

The real question is why do they allow a Pi at all? It is bound to fail at some point.

Not only Raspberry Pi, but also smartphones and PCs with less than 4GB per core...

wolfman1360
Joined: 18 Feb 17
Posts: 72
Credit: 18,450,036
RAC: 0
Message 99776 - Posted: 26 Nov 2020, 9:53:44 UTC - in response to Message 99745.  

The real question is why do they allow a Pi at all? It is bound to fail at some point.

Not only Raspberry Pi, but also smartphones and PCs with less than 4GB per core...

Because most WUs take 1GB per core, if not a bit less.
I wish there was a way to separate out these larger requirements for WUs on this project. I have quite a variety of machines and it's frustrating to have super huge WUs pop up unexpectedly. RPI or not, smartphone or not, everything should be able to contribute. Especially now.

at90systems
Joined: 19 Apr 20
Posts: 7
Credit: 700,368
RAC: 0
Message 99777 - Posted: 26 Nov 2020, 12:22:51 UTC - in response to Message 99744.  

Good to know. I was going to ask that question; thanks for the response.

at90systems
Joined: 19 Apr 20
Posts: 7
Credit: 700,368
RAC: 0
Message 99778 - Posted: 26 Nov 2020, 12:27:18 UTC

Well, shout out to whoever fixed the problem; I hope it wasn't simply by stopping the horns work units. My Pi units have been working flawlessly for quite a few days now without errors. As for why Pi units should be allowed to run: they are more robust than they used to be, and with the 4GB and 8GB versions they can contribute a lot to the program. Sure, they don't do as much as a computer cluster, but they do contribute. The entire idea behind distributed computing is to allow anything to contribute and take the workload off a much larger supercomputer, so in this day and age I say yes, more BOINC projects need to support them. On top of that, their small footprint, low power consumption, and low cost (compared with buying new hardware rather than reusing old hardware) are great.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1860
Credit: 8,160,158
RAC: 8,440
Message 99803 - Posted: 28 Nov 2020, 11:41:25 UTC - in response to Message 99776.  

I wish there was a way to separate out these larger requirements for WUs on this project. I have quite a variety of machines and it's frustrating to have super huge WUs pop up unexpectedly. RPI or not, smartphone or not, everything should be able to contribute. Especially now.

As I wrote in another thread, there is App_Plan in the BOINC server scheduler...

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 99807 - Posted: 28 Nov 2020, 14:04:03 UTC - in response to Message 99803.  

I wish there was a way to separate out these larger requirements for WUs on this project. I have quite a variety of machines and it's frustrating to have super huge WUs pop up unexpectedly. RPI or not, smartphone or not, everything should be able to contribute. Especially now.


As I wrote in another thread, there is App_Plan in the BOINC server scheduler...


There's no way they will test that here at Rosetta; it would happen first at the beta project, Ralph@home, and right now they have zero tasks to crunch.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1860
Credit: 8,160,158
RAC: 8,440
Message 99808 - Posted: 28 Nov 2020, 15:01:43 UTC - in response to Message 99807.  
Last modified: 28 Nov 2020, 15:02:11 UTC

There's no way they will test that here at Rosetta; it would happen first at the beta project, Ralph@home, and right now they have zero tasks to crunch.


Ralph@home is often, very often, underused.

©2024 University of Washington
https://www.bakerlab.org