Problems and Technical Issues with Rosetta@home

Killersocke@rosetta

Joined: 13 Nov 06
Posts: 29
Credit: 2,579,125
RAC: 0
Message 90186 - Posted: 11 Jan 2019, 1:45:21 UTC

Stopped by server?
What's that? I have never seen this message before.

9 Tasks, all beginning with RK000004-A

Task | WU | Computer | Sent | Reported | Status | Run time (s) | CPU time (s) | Credit | Application
1050725434 | 946459659 | 1664252 | 9 Jan 2019, 6:21:44 UTC | 9 Jan 2019, 14:53:57 UTC | Aborted by server | 30,416.41 | 28,451.39 | --- | Rosetta Mini v3.78
1050725435 | 946459661 | 1664252 | 9 Jan 2019, 6:21:44 UTC | 9 Jan 2019, 14:53:57 UTC | Aborted by server | 30,412.33 | 28,233.52 | --- | Rosetta Mini v3.78
1050726076 | 946459658 | 1664252 | 9 Jan 2019, 6:21:44 UTC | 9 Jan 2019, 14:53:57 UTC | Aborted by server | 30,417.52 | 28,197.50 | --- | Rosetta Mini v3.78
1050726077 | 946459660 | 1664252 | 9 Jan 2019, 6:21:44 UTC | 9 Jan 2019, 14:53:57 UTC | Aborted by server | 30,407.97 | 28,503.64 | --- | Rosetta Mini v3.78
1050726078 | 946459662 | 1664252 | 9 Jan 2019, 6:21:44 UTC | 9 Jan 2019, 14:53:57 UTC | Aborted by server | 30,416.41 | 28,207.77 | --- | Rosetta Mini v3.78
1050725358 | 946459507 | 1664252 | 9 Jan 2019, 6:21:44 UTC | 9 Jan 2019, 14:53:57 UTC | Aborted by server | 30,416.41 | 28,246.91 | --- | Rosetta Mini v3.78
1050725875 | 946459256 | 1664252 | 9 Jan 2019, 6:21:44 UTC | 9 Jan 2019, 14:53:57 UTC | Aborted by server | 30,416.41 | 28,516.28 | --- | Rosetta Mini v3.78
1050491949 | 946251647 | 1664252 | 7 Jan 2019, 19:20:45 UTC | 8 Jan 2019, 1:53:34 UTC | Error while computing | 20,863.31 | 19,961.66 | --- | Rosetta v4.07
1050491998 | 946251712 | 1664252 | 7 Jan 2019, 19:20:45 UTC | 8 Jan 2019, 1:49:31 UTC | Error while computing | 23,107.00 | 20,452.11 | --- | Rosetta v4.07 windows_intelx86

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 90190 - Posted: 11 Jan 2019, 17:14:30 UTC - in response to Message 90186.  
Last modified: 12 Jan 2019, 6:02:16 UTC

It basically means that problems were found with that batch of work, so the server sends a kill request to spare your system from running into the same problem.
Rosetta Moderator: Mod.Sense

jay
Joined: 12 Jan 08
Posts: 20
Credit: 195,801
RAC: 0
Message 90191 - Posted: 11 Jan 2019, 20:34:06 UTC - in response to Message 90190.  

Thanks for your update! I was wondering why a WU was taking more than 19 hours.
Jay

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 90194 - Posted: 13 Jan 2019, 14:27:40 UTC - in response to Message 90185.  

Instead of being an idiot, be informative and post a link or a quote.

Information is scattered all over different sections of the message boards or buried in a past post. How many posts have been made here since December 25? Who says I have the time to go look through all the threads on the boards to find the answer?

Geez... I work 11-hour days and am gone from home 13 hours a day, and during work I don't have time to hunt for a post.

I also did look at a couple of different logical places for information and did not find the answer.

So stop assuming you know how my life is and how much time I have available to look for information.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 90197 - Posted: 13 Jan 2019, 14:30:35 UTC - in response to Message 90185.  

Instead of being an idiot, be informative and post a link or a quote.

Information is scattered all over different sections of the message boards or buried in a past post. How many posts have been made here since December 25? Who says I have the time to go look through all the threads on the boards to find the answer?

Geez... I work 11-hour days and am gone from home 13 hours a day, and during work I don't have time to hunt for a post.

I also did look at a couple of different logical places for information and did not find the answer.

So stop assuming you know how my life is and how much time I have available to look for information.

Also, if you had bothered to look at my other posts (wherever they might be) about this issue, you would know it predates the Dec. 25 issue by a week.

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 90198 - Posted: 13 Jan 2019, 16:54:07 UTC - in response to Message 90197.  
Last modified: 13 Jan 2019, 17:38:30 UTC

> Instead of being an idiot, be informative and post a link or a quote.

I gave you the answer, straight into your hand. And you still weren't able to figure it out.
(I agree that information is hard to find sometimes - that is why you should look at the topics first. If you don't have time, I don't think you can expect someone else to do it for you.)

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 90203 - Posted: 14 Jan 2019, 23:16:38 UTC - in response to Message 90198.  
Last modified: 14 Jan 2019, 23:27:53 UTC

What you said was that it has "something" to do with "December 25".
Well, what exactly is "something"? And what does December 25 have to do with anything when this problem goes back to the 18th or even earlier?

If you have time, go find a specific post and put the link here.
I do that for people when I have time and know the specific answer from another post.
Generalization does nothing for me. That's what you offered: "something".

Here is the specific, detailed info that I see on my single-project graph.
All other projects have a 100% equal share with Rosetta (the default setting). Rosetta is losing credits and should be trying to make them up, but does not. Credit stays normal until the 17th, dips for a day, then corrects itself until the 21st, and after that it is all downhill. So how does this fit with your "something on the 25th"? The 17th and 21st predate the 25th.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 90208 - Posted: 15 Jan 2019, 18:28:00 UTC

Amazing!!

1/15/2019 7:24:26 PM | Rosetta@home | Sending scheduler request: To fetch work.
1/15/2019 7:24:26 PM | Rosetta@home | Requesting new tasks for CPU
1/15/2019 7:24:29 PM | Rosetta@home | Scheduler request completed: got 29 new tasks

Bryn Mawr
Joined: 26 Dec 18
Posts: 395
Credit: 12,235,242
RAC: 10,719
Message 90509 - Posted: 14 Mar 2019, 16:01:06 UTC

I am seeing all too many errors from work units at the end of their processing cycle (after 12 hours of processing) and would like some advice as to whether there are any changes I can make to stop them.

Examples can be seen in WUs 1062692421 and 1062687362 but basically they show exit status 139 (unknown error) with signal 11 and a message saying that default.out.gz already exists with size -1.

Any suggestions would be gratefully received.
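
For readers decoding the error above: on Unix-like systems, an exit status above 128 conventionally means the process was terminated by a signal, and the signal number is the status minus 128. So 139 corresponds to signal 11, which is SIGSEGV (a segmentation fault), consistent with the "signal 11" in the task output. A minimal Python sketch of that arithmetic follows. It assumes the task ran on a Unix-like host, and the function name `describe_exit_status` is made up for illustration; it is not part of BOINC or Rosetta.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status, where 128 + N means 'killed by signal N'."""
    if status > 128:
        signum = status - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with code {status}"


print(describe_exit_status(139))
```

This only explains what the status means. It does not say why the task crashed, which is what the replies below try to narrow down.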

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 90511 - Posted: 14 Mar 2019, 21:08:47 UTC - in response to Message 90509.  

You may need more memory. You have 8 GB on your Ryzen, but the Rosetta work units sometimes take up to 1 GB each.

Bryn Mawr
Joined: 26 Dec 18
Posts: 395
Credit: 12,235,242
RAC: 10,719
Message 90512 - Posted: 15 Mar 2019, 0:06:20 UTC - in response to Message 90511.  

> You may need more memory. You have 8 GB on your Ryzen, but the Rosetta work units sometimes take up to 1 GB each.

Ouch, I know my free memory sometimes goes down to 2 or 3%, but I hadn’t thought of it going negative.

Thanks for the suggestion. I’ll look at getting another 8 GB, and maybe some more for the FX rig as well, since that only has 4 GB for its 4 cores.

Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement?

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 90513 - Posted: 15 Mar 2019, 1:43:25 UTC - in response to Message 90512.  

> Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement?

Yes, all the WCG ones that I know of have a pretty small memory requirement. The biggest is MIP, which is around 300 MB.
But you probably aren't always running an equal proportion of Rosetta and WCG. The BOINC scheduler does strange things, and may give you all Rosetta once in a while.

robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,317,813
RAC: 3,316
Message 90514 - Posted: 15 Mar 2019, 3:44:12 UTC - in response to Message 90509.  

> I am seeing all too many errors from work units at the end of their processing cycle (after 12 hours of processing) and would like some advice as to whether there are any changes I can make to stop them.
>
> Examples can be seen in WUs 1062692421 and 1062687362 but basically they show exit status 139 (unknown error) with signal 11 and a message saying that default.out.gz already exists with size -1.
>
> Any suggestions would be gratefully received.

You might check if decreasing the time that workunits can run on your computers to ten hours has any effect on this.

Bryn Mawr
Joined: 26 Dec 18
Posts: 395
Credit: 12,235,242
RAC: 10,719
Message 90515 - Posted: 15 Mar 2019, 8:40:31 UTC - in response to Message 90514.  

>> I am seeing all too many errors from work units at the end of their processing cycle (after 12 hours of processing) and would like some advice as to whether there are any changes I can make to stop them.
>>
>> Examples can be seen in WUs 1062692421 and 1062687362 but basically they show exit status 139 (unknown error) with signal 11 and a message saying that default.out.gz already exists with size -1.
>>
>> Any suggestions would be gratefully received.
>
> You might check if decreasing the time that workunits can run on your computers to ten hours has any effect on this.

The time was set to the default of 8 hours, so I don’t know why these were taking 12 hours anyway, but I reset it to 6 hours yesterday to try to reduce the loss when a WU errors out in this way.

Bryn Mawr
Joined: 26 Dec 18
Posts: 395
Credit: 12,235,242
RAC: 10,719
Message 90516 - Posted: 15 Mar 2019, 8:50:18 UTC - in response to Message 90513.  

>> Hmm, that raises a thought. They’re both running half and half between Rosetta and WCG which, I think, has a lower memory requirement?
>
> Yes, all the WCG ones that I know of have a pretty small memory requirement. The biggest is MIP, which is around 300 MB.
> But you probably aren't always running an equal proportion of Rosetta and WCG. The BOINC scheduler does strange things, and may give you all Rosetta once in a while.

I monitor them fairly closely, and I’m fairly sure the Ryzen was running 6 and 6 at the time. The FX had just come out of a period where it was running all Rosetta for a few days, to catch up after running all WCG for a while, but the Ryzen completed 46 WCG WUs that day, which is about normal, and the split was equal every time I looked.
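
As a rough check on the memory suggestion, the figures quoted in this thread can be added up: up to about 1 GB per Rosetta task, about 300 MB for the largest WCG task (MIP), 8 GB installed on the Ryzen, and a 6 + 6 split of tasks. The Python sketch below is only a back-of-the-envelope estimate. It assumes 12 concurrent tasks, treats the quoted figures as worst-case peaks, and ignores memory used by the operating system and other programs. The function name `worst_case_mb` is hypothetical.

```python
# Worst-case memory estimate using the figures quoted in this thread.
# Assumptions (not measurements): 1 GB peak per Rosetta task, 300 MB peak
# per WCG task, 12 tasks running at once, OS overhead ignored.
ROSETTA_PEAK_MB = 1024
WCG_PEAK_MB = 300
INSTALLED_MB = 8 * 1024


def worst_case_mb(rosetta_tasks: int, wcg_tasks: int) -> int:
    """Return the combined peak memory (MB) if every task hits its peak at once."""
    return rosetta_tasks * ROSETTA_PEAK_MB + wcg_tasks * WCG_PEAK_MB


for label, rosetta, wcg in [("6 Rosetta + 6 WCG", 6, 6), ("12 Rosetta", 12, 0)]:
    need = worst_case_mb(rosetta, wcg)
    print(f"{label}: {need} MB needed, {INSTALLED_MB - need} MB headroom")
```

Under these assumptions the even split needs 7944 MB and leaves only 248 MB of headroom on 8 GB, while an all-Rosetta spell would need 12288 MB and would not fit at all.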

Bryn Mawr
Joined: 26 Dec 18
Posts: 395
Credit: 12,235,242
RAC: 10,719
Message 90517 - Posted: 15 Mar 2019, 10:37:08 UTC

OK, extra memory ordered for both machines so we’ll see if that sorts it.

Bryn Mawr
Joined: 26 Dec 18
Posts: 395
Credit: 12,235,242
RAC: 10,719
Message 90519 - Posted: 15 Mar 2019, 13:32:02 UTC

Whilst I’m here, a silly question if I may.

Is there any way of changing the View Tasks page from sorting by date sent to sorting by date returned?

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 90520 - Posted: 15 Mar 2019, 14:19:38 UTC - in response to Message 90519.  

> Whilst I’m here, a silly question if I may.
>
> Is there any way of changing the View Tasks page from sorting by date sent to sorting by date returned?

I take it you are asking about the website, and not the BOINC Manager tasks page? I am not aware of a way to change how that web page is presented.
Rosetta Moderator: Mod.Sense


©2024 University of Washington
https://www.bakerlab.org