Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home



robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 13,911,303
RAC: 2,681
Message 99952 - Posted: 7 Dec 2020, 13:15:50 UTC - in response to Message 99949.  

A task running for over 12 hours so far, even though I've selected a run length of 8 hours:

3stub_cyc_target_1cwa_01152_14_extract_B_SAVE_ALL_OUT_1044879_311

The estimated time remaining is INCREASING, not decreasing.

It is doing checkpoints WITHOUT ending the task. 26 seconds since the last one.

Is something wrong with this task? Should I abort it?

Have you tried exiting BOINC and reopening it, or restarting your computer/laptop? If after the restart it resumes at, for example, 10 hours, let it run and see whether it finishes within the 12 hours. If it doesn't and keeps running past 13 hours, feel free to abort it.

I let it run overnight; it finally finished after 15.5 hours.
ID: 99952
Joe
Joined: 24 Nov 17
Posts: 1
Credit: 3,737,245
RAC: 1,065
Message 100051 - Posted: 16 Dec 2020, 4:40:18 UTC

I've been having an issue with my FreeBSD machine: BOINC, installed on it, always fails to compute jobs: https://kitsunehosting.net/nextcloud/index.php/s/rysi6tY6TE33oZr/preview
Now that I'm looking at it, I'm pretty sure it has never completed a job.
Is there anything I should look into, such as logs, to find out what's failing and whether I can fix it?

Thanks so much for reading.
ID: 100051
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100055 - Posted: 16 Dec 2020, 10:51:45 UTC - in response to Message 100051.  
Last modified: 16 Dec 2020, 11:09:36 UTC

Your machine does have some credit, so it’s obviously succeeded in running something at some point – just not recently.

I assume the “Exec format error” failures occur because the system is unable to run Rosetta’s Linux application. (Note that it was trying to run the 32-bit application, which may not be appropriate for a 64-bit system. It’s also possible that older application versions were able to run, but recent updates have broken something. I don’t know enough about BSD’s Linux compatibility layer to diagnose further. Rosetta@home does not provide a native BSD application. One user did recently report success running the 64-bit Rosetta Linux application on FreeBSD.)
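If you want to confirm whether the executable BOINC tried to launch is a 32-bit or 64-bit Linux binary, the ELF header tells you: byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. The `file` utility reports the same thing. Below is a minimal Python sketch of that check; the function names are my own, and the paths you pass in are whatever executables you find in your BOINC project directory (I haven't assumed any particular file name).

```python
from pathlib import Path
import sys

ELF_MAGIC = b"\x7fELF"

def elf_class_from_bytes(header: bytes) -> str:
    """Classify a file header by its ELF EI_CLASS byte (offset 4)."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not an ELF file"
    # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown ELF class")

def elf_class(path: str) -> str:
    """Read the first five bytes of a file and classify it."""
    with Path(path).open("rb") as f:
        return elf_class_from_bytes(f.read(5))

if __name__ == "__main__":
    # Usage: python elf_class.py <executable> [<executable> ...]
    for p in sys.argv[1:]:
        print(p, "->", elf_class(p))
```

A "32-bit" result on a 64-bit host is only a hint, not a diagnosis: whether that binary can run still depends on how the host's Linux compatibility support is set up.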

The others failed to download some of their input files. Could something be blocking downloads?
ID: 100055
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100081 - Posted: 20 Dec 2020, 22:18:07 UTC
Last modified: 20 Dec 2020, 22:21:55 UTC

Several of a new batch of horns5 tasks failing with access violations shortly after startup
ID: 100081
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 100082 - Posted: 20 Dec 2020, 23:30:58 UTC - in response to Message 100081.  
Last modified: 20 Dec 2020, 23:35:56 UTC

Several of a new batch of horns5 tasks failing with access violations shortly after startup

Maybe limited to Windows? I am running seven now (1 to 6 hours) on Ubuntu 18.04.5 (Ryzen 3900X) without a problem.

PS - The sizes are quite reasonable, at less than 500 MB. That indicates they are not a new project but a continuation of horns4. It is interesting to speculate what that might be...
ID: 100082
robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 13,911,303
RAC: 2,681
Message 100083 - Posted: 20 Dec 2020, 23:47:33 UTC - in response to Message 100081.  
Last modified: 21 Dec 2020, 0:03:04 UTC

Several of a new batch of horns5 tasks failing with access violations shortly after startup

I looked at the stderr log for several of your failed tasks. About two thirds of them failed while trying to access location 0, and I can't read the dump well enough to tell what instruction was trying to access that location. I'll have to leave the problem to someone who can read dumps better than I can.

I did notice that you are using Windows 7, rather than the newer Windows 10.

The only recent horns5 task I spotted for my Windows 10 computer completed and validated.

Also, I noticed that all of your computers run BOINC 7.16.5; my computer runs 7.16.11.

If no one else helps, you could try updating BOINC on one of your computers showing the problem, and Windows on another, to see if either of these older versions causes the problem.
ID: 100083
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100084 - Posted: 21 Dec 2020, 0:12:43 UTC

Also noticed the graphics app does not work (either disappears immediately or hangs) with those horns5 tasks that do manage to run
ID: 100084
Falconet
Joined: 9 Mar 09
Posts: 350
Credit: 1,063,870
RAC: 2,945
Message 100088 - Posted: 21 Dec 2020, 12:09:27 UTC
Last modified: 21 Dec 2020, 12:20:47 UTC

I've only got one horns5 but it's running fine under Ubuntu 18.04 at Google Colab.
ID: 100088
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100093 - Posted: 21 Dec 2020, 18:18:42 UTC

The strangest part is: some work units that have failed on my machines have succeeded elsewhere (example), and some that have failed elsewhere have succeeded on mine (example).
ID: 100093
Bryn Mawr
Joined: 26 Dec 18
Posts: 376
Credit: 10,891,045
RAC: 10,357
Message 100094 - Posted: 21 Dec 2020, 18:34:50 UTC

No problem this end, two running at the moment and no failures.
ID: 100094
robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 13,911,303
RAC: 2,681
Message 100095 - Posted: 21 Dec 2020, 19:54:08 UTC

I just had one horns5 fail on my computer after about 20 minutes while another is past that point and about half finished.

The error message for the one that failed looks likely to mean a problem in an input file.
ID: 100095
robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 13,911,303
RAC: 2,681
Message 100096 - Posted: 21 Dec 2020, 20:06:44 UTC - in response to Message 100093.  

The strangest part is: some work units that have failed on my machines have succeeded elsewhere (example), and some that have failed elsewhere have succeeded on mine (example).

This could mean that the application is picking up a random number from somewhere and using it as part of its input.

If this is not deliberate, it could be the application using the contents of some memory location without first setting it to a known value.
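Python can't read uninitialised memory, so as a rough analogy only, here is a minimal sketch of the first possibility: a task whose outcome depends on a random draw that is not fixed by its input. `run_task`, the work-unit name and the 25% failure condition are all hypothetical. Unseeded, the same work unit can pass on one host and fail on another. With a fixed seed, the outcome becomes reproducible, which is what makes this kind of failure debuggable.

```python
import random
from typing import Optional

def run_task(work_unit: str, seed: Optional[int] = None) -> bool:
    """Hypothetical task: returns True on success, False on a 'crash'."""
    if seed is None:
        # Seeded from OS entropy: a different starting point on every run.
        rng = random.Random()
    else:
        # String seeds are hashed deterministically, so the same
        # (work_unit, seed) pair always gives the same starting point.
        rng = random.Random(f"{work_unit}:{seed}")
    start = rng.random()
    # Hypothetical failure condition: some starting points hit a bad path.
    return start >= 0.25

# Unseeded: the same work unit succeeds on some runs and fails on others.
outcomes = [run_task("wu_example") for _ in range(20)]
print(outcomes.count(True), "of 20 runs succeeded")

# Seeded: the result is repeatable for a given seed.
print(run_task("wu_example", seed=42) == run_task("wu_example", seed=42))
```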
ID: 100096
robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 13,911,303
RAC: 2,681
Message 100097 - Posted: 21 Dec 2020, 20:18:16 UTC - in response to Message 100095.  
Last modified: 21 Dec 2020, 20:18:38 UTC

I just had one horns5 fail on my computer after about 20 minutes while another is past that point and about half finished.

The error message for the one that failed looks likely to mean a problem in an input file.

I now have three horns5 tasks running at once on my computer, all of them well past 20 minutes.
ID: 100097
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100098 - Posted: 21 Dec 2020, 20:50:15 UTC - in response to Message 100095.  

I just had one horns5 fail
That’s the same error Grant reported this morning. Interesting that those ones detected a problem and exited, while the others have just fallen over in a heap. Of course if, as you suggest, they’re using uninitialised data somewhere, anything could happen…
ID: 100098
Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 38,911,415
RAC: 23,730
Message 100101 - Posted: 22 Dec 2020, 3:33:11 UTC - in response to Message 100094.  

No problem this end, two running at the moment and no failures.

Same
ID: 100101
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1508
Credit: 15,028,167
RAC: 22,256
Message 100102 - Posted: 22 Dec 2020, 6:18:56 UTC

At least this time around I've had more valid horns5 tasks than errors. Last time, easily 90% were errors.
Grant
Darwin NT
ID: 100102
tom
Joined: 29 Nov 08
Posts: 10
Credit: 6,044,733
RAC: 0
Message 100118 - Posted: 23 Dec 2020, 23:29:15 UTC - in response to Message 99767.  

For 5 months I have been producing exactly one error-free task a day. It's hard to do more than that when there's a limit, wouldn't you think?

As for why I'm supposedly "producing errors", I'm running the same software the project provided, on the same computer that has run it for years. And, just coincidentally, these "errors" only started with the switchover to secure HTTP.

Not interested anymore.
ID: 100118
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100119 - Posted: 24 Dec 2020, 4:24:50 UTC - in response to Message 100118.  

As we have tried to explain to you before: the limit is there to protect the project from hosts that fail to perform useful work, and if the problem were related to SSL you wouldn’t be able to download any tasks in the first place. It genuinely is coincidence that your trouble started around the same time as the switch.

You’re not alone in finding that application version 4.20 doesn’t work on older versions of Mac OS, though the only resolution seems to be “try a different project”.
ID: 100119
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1878
Credit: 8,342,355
RAC: 10,131
Message 100123 - Posted: 24 Dec 2020, 14:24:56 UTC - in response to Message 100119.  

You’re not alone in finding that application version 4.20 doesn’t work on older versions of Mac OS, though the only resolution seems to be “try a different project”.

Or wait for a bug-fixed version for Mac...
ID: 100123
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1508
Credit: 15,028,167
RAC: 22,256
Message 100127 - Posted: 25 Dec 2020, 10:27:18 UTC
Last modified: 25 Dec 2020, 10:30:24 UTC

Plenty of WUs are ready to go (11 million queued jobs), but all I get is "No tasks sent" when requesting new work to replace returned work ("Ready to send" is zero).
In progress has fallen from 550k down to 400k.

Someone needs to give the servers a kick.
Grant
Darwin NT
ID: 100127




©2024 University of Washington
https://www.bakerlab.org