Stuck on Uploading

Message boards : Number crunching : Stuck on Uploading

Previous · 1 · 2 · 3 · 4 · Next

Luigi R.

Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 80272 - Posted: 24 Jun 2016, 18:45:53 UTC - in response to Message 80269.  

Changing hosts.txt is a stupid idea! There should be another way.

No, it's not stupid. It worked fine for me. I uploaded 14 four-hour tasks and got them validated; that's 56 hours of computing.

On Linux it is /etc/hosts.
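A minimal Python sketch of the hosts-file edit being debated here, assuming the mapping given later in this thread (128.95.160.145 for srv1.bakerlab.org). It works on the file's text so the result can be inspected before anything is written back; the function name is hypothetical. On Windows the file is normally C:\Windows\System32\drivers\etc\hosts (no .txt extension), and writing to it or to /etc/hosts needs administrator or root rights.

```python
# Sketch only: adds the srv1 -> srv4 override discussed in this thread.
OVERRIDE_IP = "128.95.160.145"      # srv4.bakerlab.org, per this thread
OVERRIDE_HOST = "srv1.bakerlab.org"


def add_hosts_override(hosts_text: str,
                       ip: str = OVERRIDE_IP,
                       host: str = OVERRIDE_HOST) -> str:
    """Return hosts_text with an 'ip    host' line appended, unless an
    active (uncommented) entry for that host name already exists."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()   # strip comments, split on whitespace
        names = [f.lower() for f in fields[1:]]  # host names are case-insensitive
        if host.lower() in names:
            return hosts_text                    # already mapped: leave the text alone
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"                       # keep the new entry on its own line
    return hosts_text + f"{ip}    {host}\n"


if __name__ == "__main__":
    print(add_hosts_override("127.0.0.1    localhost\n"), end="")
```

Calling it twice is harmless: the second call finds the active entry and returns the text unchanged.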

Replacing "srv1.bakerlab.org" in client_state.xml is useless. Whenever I restart the BOINC Manager, it is there again. I can't find where BOINC restores it from.
I also replaced it in client_state_prev.xml.

Yep, that is what I was worried about.

Thanks to you lazy guys, I have now lost two WUs! 24 hours of work for nothing.

See above.
ID: 80272
Luigi R.

Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 80273 - Posted: 24 Jun 2016, 18:54:48 UTC - in response to Message 80271.  
Last modified: 24 Jun 2016, 18:58:52 UTC

..., we're all on the same team here man :)

No! That is wrong: they need us, we don't need them.
They get money for the job. I spend money on hardware and power. I spend my spare time maintaining my systems to keep them crunching all the time.
Later, in a hopefully not-so-distant future, if the first results come to market, I pay again to get the medicine or whatever.

I share this point, but, you know, BOINC has plenty of medical projects. If Rosetta@home is not always reliable or doesn't meet your demands, you can choose another project to run alongside it or to replace it.

For example, I don't run Rosetta@home tasks very much because of inefficiency, but that is off-topic. Anyway, the 3.73 app seems more CPU-intensive to me.
ID: 80273
sinspin

Joined: 30 Jan 06
Posts: 29
Credit: 6,574,585
RAC: 0
Message 80274 - Posted: 24 Jun 2016, 18:57:45 UTC

It is not our problem that srv1 is not responding, and it is not up to us to implement a solution for that, even more so when the solution is so damn stupid. 99% of users will forget that they made this change and will later have other problems caused only by this solution!

I think it is much better to change the server redirection on your side. Then you can spend all the time you need to find out what is wrong, especially since the weekend is at the doorstep.
ID: 80274
Luigi R.

Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 80275 - Posted: 24 Jun 2016, 19:06:21 UTC - in response to Message 80274.  
Last modified: 24 Jun 2016, 19:08:37 UTC

99% of users will forget that they made this change and will later have other problems caused only by this solution!

Maybe not. Both servers will be online, so there may be no difference in the future. For forgetful users it could be fine to edit the hosts file just for the upload and to comment that line out right after the upload process.

I agree we should not need any special configuration. We should only have to run BOINC.

Not our problem, but we want our work to get validated too.
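For the "comment that line out right after the upload" step, a sketch of a hypothetical helper that disables the srv1.bakerlab.org override while leaving every other hosts line untouched. As with any hosts edit, writing the result back to the real file assumes administrator or root rights.

```python
# Sketch only: undoes the srv1 hosts override once uploads have gone through.
def comment_out_override(hosts_text: str, host: str = "srv1.bakerlab.org") -> str:
    """Prefix '# ' to every active hosts line that maps `host`;
    all other lines (and existing comments) are returned unchanged."""
    out = []
    for line in hosts_text.splitlines(keepends=True):
        fields = line.split("#", 1)[0].split()        # ignore anything after '#'
        names = [f.lower() for f in fields[1:]]       # names after the IP address
        out.append("# " + line if host.lower() in names else line)
    return "".join(out)


if __name__ == "__main__":
    before = "127.0.0.1    localhost\n128.95.160.145    srv1.bakerlab.org\n"
    print(comment_out_override(before), end="")
```

Running it on an already commented-out file changes nothing, so it is safe to apply more than once.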
ID: 80275
vb

Joined: 31 Dec 14
Posts: 1
Credit: 1,587,957
RAC: 0
Message 80276 - Posted: 24 Jun 2016, 20:18:58 UTC

Temporary fix in place (modified hosts file). Thanks, whoever proposed it!
ID: 80276
AMDave

Joined: 16 Dec 05
Posts: 35
Credit: 12,576,896
RAC: 0
Message 80277 - Posted: 24 Jun 2016, 20:54:12 UTC - in response to Message 80261.  

My suspicions are confirmed:

  • srv1.bakerlab.org at 128.95.160.142 is not working / not accepting uploads
  • srv4.bakerlab.org at 128.95.160.145 is accepting uploads

All tasks trying to upload to the first server are failing, while any tasks sent to srv4.bakerlab.org upload successfully.

No go for me.

Here is the line I added, in accordance with these instructions:
128.95.160.145    srv1.bakerlab.org

Here is the Event log:
6/24/2016 4:48:10 PM | rosetta@home | update requested by user
6/24/2016 4:48:15 PM | rosetta@home | Sending scheduler request: Requested by user.
6/24/2016 4:48:15 PM | rosetta@home | Not requesting tasks: too many uploads in progress
6/24/2016 4:48:17 PM | rosetta@home | Scheduler request completed
ID: 80277
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,641,936
RAC: 44
Message 80278 - Posted: 24 Jun 2016, 21:05:52 UTC - in response to Message 80277.  

My suspicions are confirmed:

  • srv1.bakerlab.org at 128.95.160.142 is not working / not accepting uploads
  • srv4.bakerlab.org at 128.95.160.145 is accepting uploads

All tasks trying to upload to the first server are failing, while any tasks sent to srv4.bakerlab.org upload successfully.

No go for me.

Here is the line I added, in accordance with these instructions:
128.95.160.145    srv1.bakerlab.org

Here is the Event log:
6/24/2016 4:48:10 PM | rosetta@home | update requested by user
6/24/2016 4:48:15 PM | rosetta@home | Sending scheduler request: Requested by user.
6/24/2016 4:48:15 PM | rosetta@home | Not requesting tasks: too many uploads in progress
6/24/2016 4:48:17 PM | rosetta@home | Scheduler request completed


This fix is not meant to help with requesting new tasks, but rather to get your uploads going again. Judging from the snippet of your log file that you shared, you were attempting to fetch new work.

Go to your Transfers tab, select one of your uploads that is stuck on 'Retry in ....', and click Retry Now; it should work.

Also, if you want to check that your hosts modification worked, you can visit http://srv1.bakerlab.org/, which should bring you to the Rosetta@Home homepage :) Alternatively, you can ping srv1.bakerlab.org and check that the replies come from 128.95.160.145.
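The same check can be scripted. This sketch assumes that the operating system's resolver consults the hosts file (the usual default on Windows, and on Linux with a standard nsswitch.conf) and that a working override makes srv1.bakerlab.org resolve to 128.95.160.145. The function name is hypothetical, and the resolver is passed in as a parameter so the logic can be tested without network access.

```python
import socket
from typing import Callable

EXPECTED_IP = "128.95.160.145"  # srv4.bakerlab.org's address, per this thread


def override_active(host: str = "srv1.bakerlab.org",
                    expected_ip: str = EXPECTED_IP,
                    resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if `host` currently resolves to `expected_ip`, False otherwise
    (including when the name cannot be resolved at all)."""
    try:
        return resolve(host) == expected_ip
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

Calling override_active() right after editing the hosts file should return True if the edit took effect; False means srv1.bakerlab.org still points elsewhere (or could not be resolved).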

Cheers
38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research
ID: 80278
AMDave

Joined: 16 Dec 05
Posts: 35
Credit: 12,576,896
RAC: 0
Message 80280 - Posted: 24 Jun 2016, 21:50:57 UTC - in response to Message 80278.  

This fix is not meant to help with requesting new tasks, but rather to get your uploads going again. Judging from the snippet of your log file that you shared, you were attempting to fetch new work.

Go to your Transfers tab, select one of your uploads that is stuck on 'Retry in ....', and click Retry Now; it should work.

Also, if you want to check that your hosts modification worked, you can visit http://srv1.bakerlab.org/, which should bring you to the Rosetta@Home homepage :) Alternatively, you can ping srv1.bakerlab.org and check that the replies come from 128.95.160.145.

Cheers


I went to the Tasks tab and clicked Update. I thought that was an all-inclusive request. Retry Now worked, and all the "uploading" WUs are gone. I just need to remember to undo the hosts file modification once the problem is fixed.

Thank you Timo
ID: 80280
Oscar Järkvik

Joined: 16 Jun 16
Posts: 1
Credit: 1,651,223
RAC: 0
Message 80283 - Posted: 24 Jun 2016, 22:58:00 UTC

I am running BOINC headless on Linux machines. After adding 128.95.160.145 srv1.bakerlab.org to /etc/hosts, they still weren't able to upload recent work. I control BOINC through boinctui but couldn't find a way to force an upload retry. The solution I found was to restart BOINC after editing the hosts file; the uploads were then retried and went through. On a recent Debian system or derivative (Ubuntu and others), BOINC can be restarted from the terminal with sudo systemctl restart boinc.
ID: 80283
hjdghjdghjghjjggh

Joined: 10 May 16
Posts: 7
Credit: 9,749
RAC: 0
Message 80284 - Posted: 24 Jun 2016, 23:37:20 UTC

Without making any changes to any hosts files and simply waiting (while still crunching, at least), all my completed tasks finally uploaded and I got credit for them. Looks like they fixed it.
ID: 80284
Luigi R.

Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 80290 - Posted: 25 Jun 2016, 6:53:24 UTC - in response to Message 80284.  
Last modified: 25 Jun 2016, 7:08:40 UTC

I am running BOINC headless on Linux machines. After adding 128.95.160.145 srv1.bakerlab.org to /etc/hosts, they still weren't able to upload recent work. I control BOINC through boinctui but couldn't find a way to force an upload retry. The solution I found was to restart BOINC after editing the hosts file; the uploads were then retried and went through. On a recent Debian system or derivative (Ubuntu and others), BOINC can be restarted from the terminal with sudo systemctl restart boinc.

Yes. On Xubuntu I didn't need to restart BOINC for the hosts modification to take effect, and I'm not running BOINC as a service.

You could have tried
$boinccmd --get_file_transfers

Then
$boinccmd --file_transfer https://boinc.bakerlab.org/rosetta $filename retry

where $boinccmd is your boinccmd command, including its path, and $filename is the name of the result file you want to retry uploading.

Or simply
$boinccmd --set_network_mode never
$boinccmd --set_network_mode auto

and retry should be automatic.

See boinccmd.
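Both boinccmd approaches above can be wrapped in a small Python helper. This is a sketch, not an official tool: the helper names are hypothetical, the project URL is the one used in the post, and the file names have to come from the output of `boinccmd --get_file_transfers` (or the Event Log). By default it only prints the commands instead of running them.

```python
import subprocess

# URL taken from the boinccmd example above; adjust if your client uses another.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta"


def retry_commands(filenames, boinccmd="boinccmd", project_url=PROJECT_URL):
    """One 'boinccmd --file_transfer URL FILE retry' command per stuck upload."""
    return [[boinccmd, "--file_transfer", project_url, name, "retry"]
            for name in filenames]


def toggle_network_commands(boinccmd="boinccmd"):
    """The alternative above: set network mode to 'never', then back to 'auto'."""
    return [[boinccmd, "--set_network_mode", mode] for mode in ("never", "auto")]


def run_all(commands, dry_run=True):
    """Print the commands (default) or run them, stopping at the first failure."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # non-zero exit raises CalledProcessError


if __name__ == "__main__":
    run_all(retry_commands(
        ["gr062216_EEHEE_rd3_1211_fragments_fold_SAVE_ALL_OUT_384881_10_0_0"]))
    run_all(toggle_network_commands())
```

Pass boinccmd="/path/to/boinccmd" if it is not on your PATH, and dry_run=False once the printed commands look right.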


Without making any changes to any hosts files and simply waiting (while still crunching, at least), all my completed tasks finally uploaded and I got credit for them. Looks like they fixed it.

That solution was recommended for people having near-deadline tasks. Some robetta (rb_*) tasks were expiring on 24/06.
ID: 80290
sinspin

Joined: 30 Jan 06
Posts: 29
Credit: 6,574,585
RAC: 0
Message 80293 - Posted: 25 Jun 2016, 16:23:15 UTC

It seems that the problem is fixed. Thank you guys very much!
ID: 80293
entigy

Joined: 2 Nov 05
Posts: 5
Credit: 990,830
RAC: 0
Message 80308 - Posted: 28 Jun 2016, 6:49:43 UTC

This. Again.

28/06/2016 07:45:22 | rosetta@home | Computation for task gr062216_EEHEE_rd3_1211_fragments_fold_SAVE_ALL_OUT_384881_10_0 finished
28/06/2016 07:45:24 | rosetta@home | Started upload of gr062216_EEHEE_rd3_1211_fragments_fold_SAVE_ALL_OUT_384881_10_0_0
28/06/2016 07:46:07 | rosetta@home | Temporarily failed upload of gr062216_EEHEE_rd3_1211_fragments_fold_SAVE_ALL_OUT_384881_10_0_0: transient HTTP error
28/06/2016 07:46:07 | rosetta@home | Backing off 00:02:21 on upload of gr062216_EEHEE_rd3_1211_fragments_fold_SAVE_ALL_OUT_384881_10_0_0
28/06/2016 07:47:26 | rosetta@home | Started upload of gr062216_EEHEE_rd3_1211_fragments_fold_SAVE_ALL_OUT_384881_10_0_0
28/06/2016 07:48:13 | rosetta@home | Temporarily failed upload of gr062216_EEHEE_rd3_1211_fragments_fold_SAVE_ALL_OUT_384881_10_0_0: transient HTTP error
28/06/2016 07:48:13 | rosetta@home | Backing off 00:04:05 on upload of gr062216_EEHEE_rd3_1211_fragments_fold_SAVE_ALL_OUT_384881_10_0_0
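The growing gaps between attempts are BOINC's automatic retry backoff. A small sketch that pulls the backoff durations out of Event Log lines like these can make the pattern easier to see; the regular expression assumes the exact "Backing off HH:MM:SS on upload of NAME" wording shown in this log, and the function name is hypothetical.

```python
import re

# Matches the Event Log wording shown above, e.g.
# "... | rosetta@home | Backing off 00:02:21 on upload of <file>"
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def parse_backoffs(log_lines):
    """Return (upload_file, seconds) for each 'Backing off HH:MM:SS' line."""
    results = []
    for line in log_lines:
        m = BACKOFF_RE.search(line)
        if m:
            hours, minutes, seconds, name = m.groups()
            results.append((name, int(hours) * 3600 + int(minutes) * 60 + int(seconds)))
    return results


if __name__ == "__main__":
    name = "gr062216_EEHEE_rd3_1211_fragments_fold_SAVE_ALL_OUT_384881_10_0_0"
    log = [
        f"28/06/2016 07:46:07 | rosetta@home | Backing off 00:02:21 on upload of {name}",
        f"28/06/2016 07:47:26 | rosetta@home | Started upload of {name}",
        f"28/06/2016 07:48:13 | rosetta@home | Backing off 00:04:05 on upload of {name}",
    ]
    for upload, secs in parse_backoffs(log):
        print(upload, secs)
```

For the two backoff lines above it returns 141 and then 245 seconds.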

ID: 80308
entigy

Joined: 2 Nov 05
Posts: 5
Credit: 990,830
RAC: 0
Message 80481 - Posted: 4 Aug 2016, 7:17:46 UTC

And again.

04/08/2016 08:14:54 | rosetta@home | Started upload of FFD__300b59433e13d029e07395f92f38637f_abinitioDocking_16_08_09_53_15_globalDocking_7_SAVE_ALL_OUT_406840_18_0_0
04/08/2016 08:15:16 | rosetta@home | Temporarily failed upload of FFD__300b59433e13d029e07395f92f38637f_abinitioDocking_16_08_09_53_15_globalDocking_7_SAVE_ALL_OUT_406840_18_0_0: transient HTTP error
04/08/2016 08:15:16 | rosetta@home | Backing off 00:41:38 on upload of FFD__300b59433e13d029e07395f92f38637f_abinitioDocking_16_08_09_53_15_globalDocking_7_SAVE_ALL_OUT_406840_18_0_0

ID: 80481
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 80484 - Posted: 4 Aug 2016, 19:29:13 UTC - in response to Message 80481.  

And again.

04/08/2016 08:14:54 | rosetta@home | Started upload of FFD__300b59433e13d029e07395f92f38637f_abinitioDocking_16_08_09_53_15_globalDocking_7_SAVE_ALL_OUT_406840_18_0_0
04/08/2016 08:15:16 | rosetta@home | Temporarily failed upload of FFD__300b59433e13d029e07395f92f38637f_abinitioDocking_16_08_09_53_15_globalDocking_7_SAVE_ALL_OUT_406840_18_0_0: transient HTTP error
04/08/2016 08:15:16 | rosetta@home | Backing off 00:41:38 on upload of FFD__300b59433e13d029e07395f92f38637f_abinitioDocking_16_08_09_53_15_globalDocking_7_SAVE_ALL_OUT_406840_18_0_0



Thanks for alerting us to this. I'll see what's going on.
ID: 80484
shanen

Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 80485 - Posted: 4 Aug 2016, 23:05:16 UTC

Why revive this thread? Anyway, the server status page shows everything as disabled except for one process. Not sure why the "Disabled" boxes are green, since that obviously isn't an all-systems-go status, but...

Seems like rosetta@home is a good project if you don't care much. Obviously I've given up caring, but I still wonder if the shadow extends to the results. I'd be a bit troubled if someone asked me to review a research paper under the condition of not caring much about the calculated results.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
ID: 80485
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 80490 - Posted: 5 Aug 2016, 17:09:06 UTC - in response to Message 80485.  
Last modified: 5 Aug 2016, 18:50:46 UTC

Why revive this thread? Anyway, the server status page shows everything as disabled except for one process. Not sure why the "Disabled" boxes are green, since that obviously isn't an all-systems-go status, but...

Seems like rosetta@home is a good project if you don't care much. Obviously I've given up caring, but I still wonder if the shadow extends to the results. I'd be a bit troubled if someone asked me to review a research paper under the condition of not caring much about the calculated results.


Coincidentally, one of our servers had a very high load, so we looked into it and decided to reboot the server, which is why the project was disabled momentarily during the reboot. It seems OK now, but I have to check further. This coincided with a researcher submitting over 20,000 unique jobs all at once, which was the likely culprit for the load because of the enormous number of files associated with so many unique jobs.
ID: 80490
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 80491 - Posted: 5 Aug 2016, 17:15:35 UTC - in response to Message 80490.  
Last modified: 5 Aug 2016, 18:51:06 UTC

Why revive this thread? Anyway, the server status page shows everything as disabled except for one process. Not sure why the "Disabled" boxes are green, since that obviously isn't an all-systems-go status, but...

Seems like rosetta@home is a good project if you don't care much. Obviously I've given up caring, but I still wonder if the shadow extends to the results. I'd be a bit troubled if someone asked me to review a research paper under the condition of not caring much about the calculated results.


Coincidentally, one of our servers had a very high load, so we looked into it and decided to reboot the server, which is why the project was disabled momentarily during the reboot. It seems OK now, but I have to check further. This coincided with a researcher submitting over 20,000 unique jobs all at once, which was the likely culprit for the load because of the enormous number of files associated with so many unique jobs.


I'll ask the researcher to describe her work here on the forum. As more of our research may involve this kind of huge number of unique jobs for protein design, we will have to look at hardware options that can support this in the upgrade we are planning.
ID: 80491
Sid Celery

Joined: 11 Feb 08
Posts: 1981
Credit: 38,413,287
RAC: 12,777
Message 80496 - Posted: 8 Aug 2016, 2:41:41 UTC - in response to Message 80490.  

Why revive this thread? Anyway, the server status page shows everything as disabled except for one process. Not sure why the "Disabled" boxes are green, since that obviously isn't an all-systems-go status, but...

Seems like rosetta@home is a good project if you don't care much. Obviously I've given up caring, but I still wonder if the shadow extends to the results. I'd be a bit troubled if someone asked me to review a research paper under the condition of not caring much about the calculated results.

Coincidentally, one of our servers had a very high load, so we looked into it and decided to reboot the server, which is why the project was disabled momentarily during the reboot. It seems OK now, but I have to check further. This coincided with a researcher submitting over 20,000 unique jobs all at once, which was the likely culprit for the load because of the enormous number of files associated with so many unique jobs.

This issue has returned for just two of my tasks. Later tasks are uploading fine, but two persist in being unable to upload.
ID: 80496
googloo

Joined: 15 Sep 06
Posts: 133
Credit: 21,676,514
RAC: 4,938
Message 80497 - Posted: 8 Aug 2016, 3:45:46 UTC

Three tasks stuck.
ID: 80497

©2024 University of Washington
https://www.bakerlab.org