Problems and Technical Issues with Rosetta@home

Profile robertmiles

Joined: 16 Jun 08
Posts: 1235
Credit: 14,341,506
RAC: 433
Message 97967 - Posted: 8 Jul 2020, 20:58:08 UTC - in response to Message 97961.  

Recently, I started seeing a lot of jobs completing with a status of "aborted by project". They were completed prior to the deadline, but it doesn't appear that I get any credit for them either.
Any ideas/thoughts on this?

That is usually done only if your computer has downloaded the tasks but not started on them yet, but it can also be done to tasks that have been started or completed but not yet returned.

You may need to try harder to return completed tasks.
ID: 97967
Profile robertmiles

Joined: 16 Jun 08
Posts: 1235
Credit: 14,341,506
RAC: 433
Message 97968 - Posted: 8 Jul 2020, 21:01:44 UTC - in response to Message 97964.  

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1092259599


Both tasks errored out after just a few seconds. Slightly different error codes but the same "upload failure":

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>TFSCAFFOLD0001_6_SAVE_ALL_OUT_IGNORE_THE_REST_0ub6wd0j_953357_1_1_r1180454695_0</file_name>
<error_code>-240(stat() failed)</error_code>
</file_xfer_error>
</message>
]]>

Very fast failures usually mean that not all of the expected output files were produced, and therefore those files were not available to upload.
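
As a rough illustration of why a missing output file shows up as "stat() failed" (a sketch only, assuming the client checks each expected output file before trying to upload it; the function name and file names are made up, and -240 is simply copied from the message above):

```python
import os
import tempfile

STAT_FAILED = -240  # copied from the error message above, not looked up in BOINC


def find_missing_outputs(slot_dir, expected_files):
    """Map each expected output file that cannot be stat()ed to the error code."""
    errors = {}
    for name in expected_files:
        try:
            os.stat(os.path.join(slot_dir, name))
        except OSError:
            # The file was never written, so there is nothing to upload.
            errors[name] = STAT_FAILED
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as slot_dir:
        # Hypothetical task that wrote only one of its two expected outputs.
        open(os.path.join(slot_dir, "result_0"), "w").close()
        print(find_missing_outputs(slot_dir, ["result_0", "result_1"]))
```

A task that crashes in its first few seconds never writes its result file, so a check like this fails for that file even though no upload was ever attempted.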
ID: 97968
Falconet

Joined: 9 Mar 09
Posts: 354
Credit: 1,281,132
RAC: 489
Message 97969 - Posted: 8 Jul 2020, 22:35:52 UTC - in response to Message 97968.  

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1092259599


Both tasks errored out after just a few seconds. Slightly different error codes but the same "upload failure":

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>TFSCAFFOLD0001_6_SAVE_ALL_OUT_IGNORE_THE_REST_0ub6wd0j_953357_1_1_r1180454695_0</file_name>
<error_code>-240(stat() failed)</error_code>
</file_xfer_error>
</message>
]]>

Very fast failures usually mean that not all of the expected output files were produced, and therefore those files were not available to upload.


Well, that's the 3rd TFSCAFFOLD task that errors out. Good thing they fail quickly, whatever is causing it.
ID: 97969
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1740
Credit: 18,534,891
RAC: 4,618
Message 97975 - Posted: 9 Jul 2020, 4:44:51 UTC - in response to Message 97961.  

Recently, I started seeing a lot of jobs completing with a status of "aborted by project". They were completed prior to the deadline, but it doesn't appear that I get any credit for them either.
Any ideas/thoughts on this?
The only similar errors I could find were "Cancelled by server", and none of them were cancelled before your system started to process them.
No work done, no Credit.
Grant
Darwin NT
ID: 97975
Mr P Hucker

Joined: 12 Aug 06
Posts: 1600
Credit: 12,121,817
RAC: 1,355
Message 97986 - Posted: 9 Jul 2020, 20:01:36 UTC - in response to Message 97968.  
Last modified: 9 Jul 2020, 20:03:46 UTC

Very fast failures usually mean that not all of the expected output files were produced, and therefore those files were not available to upload.


How can it have got to the uploading stage if it's only just started?

Well, that's the 3rd TFSCAFFOLD task that errors out. Good thing they fail quickly, whatever is causing it.


Hopefully the server gives up and only tries sending them to several people before putting them in a "fix this" box for the programmers. I have also noticed my Boinc client backing off and not trying to get Rosetta tasks if it's just had a few failures. Universe and LHC tasks coming in more often just now.
ID: 97986
Profile robertmiles

Joined: 16 Jun 08
Posts: 1235
Credit: 14,341,506
RAC: 433
Message 97989 - Posted: 9 Jul 2020, 22:19:51 UTC - in response to Message 97986.  
Last modified: 9 Jul 2020, 22:22:24 UTC

Very fast failures usually mean that not all of the expected output files were produced, and therefore those files were not available to upload.


How can it have got to the uploading stage if it's only just started?

I'd expect that only if an error occurred at some point where the error output wasn't going to either of the output log files the users are able to see, which seems to be what happened to most of the TFSCAFFOLD tasks my computer tried to run.

Well, that's the 3rd TFSCAFFOLD task that errors out. Good thing they fail quickly, whatever is causing it.


Hopefully the server gives up and only tries sending them to several people before putting them in a "fix this" box for the programmers. I have also noticed my Boinc client backing off and not trying to get Rosetta tasks if it's just had a few failures. Universe and LHC tasks coming in more often just now.

I've found a thread for a moderator's attention, and asked the moderator to check this thread.

Those of this type that I've looked at were set to have the server give up on the workunit after two failed tasks.
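
To make that rule concrete, here is a minimal sketch of a "give up after N failed tasks" check. It only models what is described above; the function name, the outcome strings and the `max_failed_tasks` parameter are hypothetical and are not taken from the project's server code:

```python
def workunit_given_up(task_outcomes, max_failed_tasks=2):
    """Return True once the number of failed tasks reaches the limit."""
    failures = sum(1 for outcome in task_outcomes if outcome == "error")
    return failures >= max_failed_tasks


if __name__ == "__main__":
    # One failure: the workunit would still be sent to another host.
    print(workunit_given_up(["error"]))
    # Two failures: the server stops issuing further tasks for it.
    print(workunit_given_up(["error", "error"]))
```

With a limit of two, a bad batch costs each workunit at most two short failed runs before it stops being sent out.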
ID: 97989
Falconet

Joined: 9 Mar 09
Posts: 354
Credit: 1,281,132
RAC: 489
Message 97991 - Posted: 9 Jul 2020, 22:36:29 UTC - in response to Message 97989.  

Very fast failures usually mean that not all of the expected output files were produced, and therefore those files were not available to upload.


How can it have got to the uploading stage if it's only just started?

I'd expect that only if an error occurred at some point where the error output wasn't going to either of the output log files the users are able to see, which seems to be what happened to most of the TFSCAFFOLD tasks my computer tried to run.

Well, that's the 3rd TFSCAFFOLD task that errors out. Good thing they fail quickly, whatever is causing it.


Hopefully the server gives up and only tries sending them to several people before putting them in a "fix this" box for the programmers. I have also noticed my Boinc client backing off and not trying to get Rosetta tasks if it's just had a few failures. Universe and LHC tasks coming in more often just now.

I've found a thread for a moderator's attention, and asked the moderator to check this thread.

Those of this type that I've looked at were set to have the server give up on the workunit after two failed tasks.



I saw, thanks. Had 1 or 2 more fail but am now running 1 just fine, as others have reported.
ID: 97991
Stevie G

Joined: 15 Dec 18
Posts: 110
Credit: 879,133
RAC: 1,030
Message 98037 - Posted: 13 Jul 2020, 3:34:16 UTC - in response to Message 97955.  
Last modified: 13 Jul 2020, 3:36:16 UTC


For some reason, the computer shut down and was unresponsive for 48 hours. No action from the power button, hard drive, etc. Nada, nichts, zip.


After a week of working interspersed with total shut-downs I finally solved the problem.

It was a faulty power supply. I installed a new heftier (600W) power supply and a more powerful fan.

The machine has been crunching non-stop for three days. YAAAYY!

The exhaust air is much cooler. According to the CoreTemp utility, the CPU is running between 42 and 54 degrees C. It's also quieter and apparently happier.

Now if Rosetta would send me some WUs, that would complete my week.

Thanks for your support and patience.

Cheers,
Steven Gaber
Oldsmar, FL
ID: 98037
Sid Celery

Joined: 11 Feb 08
Posts: 2163
Credit: 41,606,783
RAC: 4,182
Message 98063 - Posted: 14 Jul 2020, 2:37:18 UTC - in response to Message 98037.  


For some reason, the computer shut down and was unresponsive for 48 hours. No action from the power button, hard drive, etc. Nada, nichts, zip.


After a week of working interspersed with total shut-downs I finally solved the problem.

It was a faulty power supply. I installed a new heftier (600W) power supply and a more powerful fan.

The machine has been crunching non-stop for three days. YAAAYY!

The exhaust air is much cooler. According to the CoreTemp utility, the CPU is running between 42 and 54 degrees C. It's also quieter and apparently happier.

Now if Rosetta would send me some WUs, that would complete my week.

Thanks for your support and patience.

Cheers,
Steven Gaber
Oldsmar, FL

Good news - it could've easily been something much more expensive
ID: 98063
Stevie G

Joined: 15 Dec 18
Posts: 110
Credit: 879,133
RAC: 1,030
Message 98066 - Posted: 14 Jul 2020, 6:20:41 UTC - in response to Message 98063.  


Good news - it could've easily been something much more expensive


Yes, but I think at that point, say a defective motherboard or CPU, I would have just gotten another computer. It's like trying to keep an old car running for another year.

This one was a barebones box that I filled with the parts for around $550.

Now that it's working again, I think I will put another 8 GB of RAM in it.

There are some really inexpensive refurbished Dell and HP computers out there, starting at $200.

Anybody ever try one of those?

Steven Gaber
Oldsmar, FL
ID: 98066
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98072 - Posted: 14 Jul 2020, 12:46:58 UTC

rgmjp tasks running way longer than 8 hours: 1220528042 · 1220528132 · 1220528339

I’ve got another couple still running after nearly 16 hours, and a few more in the pipeline…
ID: 98072
Profile Ray Murray

Joined: 22 Apr 20
Posts: 17
Credit: 270,864
RAC: 0
Message 98080 - Posted: 14 Jul 2020, 19:37:35 UTC - in response to Message 98072.  

This one was just a smidgeon over 23hrs. Not a problem for me as my hosts run 24/7 and I have "Switch between" set beyond 2 days (to allow an occasional long LHC virtual task to run to completion without interruption) but I don't know how it would have fared on a machine that only runs 8hrs a day or if it was switched out too many times.
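
Rough arithmetic for the part-time case, as a sketch only: it assumes the task needs 23 hours of run time, that the machine crunches a fixed number of hours per day, and that nothing is lost to restarts or checkpoints. The function name is made up for illustration:

```python
import math


def calendar_days_needed(run_hours, crunch_hours_per_day):
    """Whole calendar days needed to accumulate run_hours of crunching."""
    return math.ceil(run_hours / crunch_hours_per_day)


if __name__ == "__main__":
    print(calendar_days_needed(23, 24))  # host running 24/7
    print(calendar_days_needed(23, 8))   # host running 8 hours a day
```

So a task that finishes within a day on a 24/7 host needs about three calendar days on an 8-hours-a-day machine, before counting any work lost when it is switched out.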
ID: 98080
Profile robertmiles

Joined: 16 Jun 08
Posts: 1235
Credit: 14,341,506
RAC: 433
Message 98087 - Posted: 14 Jul 2020, 21:08:06 UTC

The rgmjp tasks appear to complete only one decoy. The first decoy is usually only a quick check to make sure that your computer is running properly, so does this mean that the usual first decoy is skipped for these, or does it mean that more decoys are done but without adding them to the decoy count?
ID: 98087
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98093 - Posted: 14 Jul 2020, 23:27:53 UTC - in response to Message 98067.  

Stevie G wrote:
I think I will put another 8 GB of RAM in it.
Depends what else you’re using the machine for, but it might not be worth it. Rosetta won’t benefit; it rarely needs more than 1 GB per task.


There are some really inexpensive refurbished Dell and HP computers out there, starting at $200.

Anybody ever try one of those?
You can certainly get a lot for your money buying used workstations and servers. Bear in mind that cheap to buy can mean expensive to run – the older the machine, the slower and less energy-efficient. Unless you want to use the computers for heating, the very cheapest machines are likely to be a false economy. I’m sure people here will be happy to give specific buying advice; probably best to start a new thread.
ID: 98093
Sid Celery

Joined: 11 Feb 08
Posts: 2163
Credit: 41,606,783
RAC: 4,182
Message 98095 - Posted: 15 Jul 2020, 0:19:11 UTC - in response to Message 98067.  

Good news - it could've easily been something much more expensive

Yes, but I think at that point, say a defective motherboard or CPU, I would have just gotten another computer. It's like trying to keep an old car running for another year.

This one was a barebones box that I filled with the parts for around $550. Maybe even spring for an SSD.

Now that it's working again, I think I will put another 8 GB of RAM in it.

There are some really inexpensive refurbished Dell and HP computers out there, starting at $200.

Anybody ever try one of those?

Over time, I've upgraded components one at a time, though not necessarily in the right order, I have to admit, often through necessity.
I started with a power supply too and made sure it was more than enough at the time (750W modular EVGA 80 Gold+). Then I messed around unsuccessfully with cooling fans until I realised I needed a modern case that could handle 140mm fans, both for a CPU cooler and as case fans, because they can shift more heat/air at slower (i.e. quieter) speeds. Cases that can handle 2x140mm coolers often handle up to 3x120mm coolers too, so that's the case I've got now.
After that, upgrades are motherboard/CPU combos, sometimes including RAM.
Doing it piecemeal like this makes it easier for me to finance and has provided a lot more flexibility over time.
I don't have an SSD on my main PC yet, but I've installed a 2.5" one on our work PC and one of those M.2 drives on another, both of which are very much worth the cost.
ID: 98095
mikey

Joined: 5 Jan 06
Posts: 1896
Credit: 9,262,433
RAC: 3,901
Message 98096 - Posted: 15 Jul 2020, 1:53:40 UTC - in response to Message 98067.  
Last modified: 15 Jul 2020, 1:58:32 UTC



There are some really inexpensive refurbished Dell and HP computers out there, starting at $200.

Anybody ever try one of those?

Steven Gaber
Oldsmar, FL


I have 5 of them running!! All of mine have dual quad-core Xeon CPUs, meaning 16 logical CPU cores through Hyper-Threading for each PC. As others have said, unless they come with a big enough PSU they can't really run a GPU too, as replacement PSUs are VERY expensive and proprietary. Mine all run at about 2.5 GHz, some more, some less, but they flat bang out CPU workunits over the course of 24 hours. Mine are not here at Rosetta; they are at TN-Grid right now, and the units there take about 4 to 6 hours each depending on the machine.

I bought them on eBay without hard drives or GPUs, but they did come with about 20 GB of memory in each one. I put Win7 on each one and then auto-upgraded to Win10, but some of the Win10 upgrades do not like the older PCs very much; some work just great but some updates hang. I have all of them running Linux Mint at the moment but still have the Win10 hard drives sitting on a shelf. They boot up VERY slowly on a standard SATA drive, so I put 240 GB SSDs in each one and they boot and run just fine.

I have no paid software on any of them as they are strictly BOINC machines and all they do is crunch 24/7. That means periodic updates for a/v software and Windows updates, which is partly why they run Linux right now. When I bought mine they were about $150 to $250 each plus shipping through eBay, and they are BIG BOXES!! Look up HP Z600 cases!!! I just looked on eBay and they are in the $350 US and up range now.
ID: 98096
Profile robertmiles

Joined: 16 Jun 08
Posts: 1235
Credit: 14,341,506
RAC: 433
Message 98104 - Posted: 15 Jul 2020, 12:34:37 UTC

The moderator appears to have been absent in the moderator contact thread for almost 3 months.

The TFSCAFFOLD0001 tasks are still often failing, but now have an extra line near what's probably the point of failure:

BOINC :: WS_max 0

No obvious meaning to me, but hopefully it's more meaningful to the person who set up these tasks.
ID: 98104
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98106 - Posted: 15 Jul 2020, 15:32:15 UTC - in response to Message 98104.  

‘WS’ might be working set
ID: 98106
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 98117 - Posted: 15 Jul 2020, 21:01:00 UTC - in response to Message 98087.  

The rgmjp tasks appear to complete only one decoy. The first decoy is usually only a quick check to make sure that your computer is running properly, so does this mean that the usual first decoy is skipped for these, or does it mean that more decoys are done but without adding them to the decoy count?


You are mistaken about the first decoy. The first decoy is a legit, full model of the protein, not a simple test of the environment.
Rosetta Moderator: Mod.Sense
ID: 98117
Mr P Hucker

Joined: 12 Aug 06
Posts: 1600
Credit: 12,121,817
RAC: 1,355
Message 98126 - Posted: 16 Jul 2020, 17:13:21 UTC - in response to Message 98096.  
Last modified: 16 Jul 2020, 17:15:19 UTC

I have 5 of them running!! All of mine have dual quad-core Xeon CPUs, meaning 16 logical CPU cores through Hyper-Threading for each PC. As others have said, unless they come with a big enough PSU they can't really run a GPU too, as replacement PSUs are VERY expensive and proprietary.


Actually, I have two dual Xeon machines, the PSUs were only £20 each. Genuine Dell supplies, 2nd hand on Ebay. I could have used a normal ATX supply, but I couldn't find the weird pinouts for the non-standardly wired ATX connectors. I didn't plug a normal supply in when I noticed all the yellow wires were at one end, instead of randomly scattered like a normal ATX plug. It makes more sense to have all of each voltage together for the tracks on the motherboard, but I guess ATX plugs have been added to over the years.

You can certainly get a lot for your money buying used workstations and servers. Bear in mind that cheap to buy can mean expensive to run – the older the machine, the slower and less energy-efficient. Unless you want to use the computers for heating, the very cheapest machines are likely to be a false economy.


The newer ones certainly use less power, but they cost a lot more. I guess the best thing is to add up the electricity cost and the parts cost and see what's cheapest per FLOP over the next few years.
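
Something like this would do the sum. It is only a sketch: the function name is made up, every number in the example is invented for illustration (none comes from a real machine), and it assumes the box draws constant power 24/7 with throughput expressed in GFLOPS:

```python
def cost_per_gflops(purchase_cost, watts, price_per_kwh, years, gflops):
    """Total cost of ownership (purchase + electricity) per GFLOPS of throughput."""
    hours = years * 365 * 24
    energy_cost = watts / 1000 * hours * price_per_kwh
    return (purchase_cost + energy_cost) / gflops


if __name__ == "__main__":
    # Hypothetical old workstation: cheap to buy, power hungry, slower.
    old = cost_per_gflops(purchase_cost=200, watts=300, price_per_kwh=0.15,
                          years=3, gflops=100)
    # Hypothetical newer machine: dearer to buy, frugal, faster.
    new = cost_per_gflops(purchase_cost=800, watts=100, price_per_kwh=0.15,
                          years=3, gflops=150)
    print(f"old: {old:.2f} per GFLOPS, new: {new:.2f} per GFLOPS")
```

Which machine wins depends heavily on the local electricity price and on how many years the box is kept running, so it is worth rerunning the sum with your own figures.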
ID: 98126

©2025 University of Washington
https://www.bakerlab.org