Posts by planetclown

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 101860)
Posted 20 May 2021 by planetclown
Post:
I'm seeing some compute errors that report "Unable to open constraints file":

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_5zl5yc4k.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_5zl5yc4k.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_5zl5yc4k.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1240084
Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: [ERROR] Unable to open constraints file: f506a88e740dc1433a9792f2e819aa3f_0001.MSAcst
ERROR:: Exit from: ......srccorescoringconstraintsConstraintIO.cc line: 457
BOINC:: Error reading and gzipping output datafile: default.out
05:40:41 (2928): called boinc_finish(1)

</stderr_txt>
]]>

A few examples:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1235909939
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1235751056
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1235940406
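
One way to narrow this down locally is to check whether the constraints file named in the error is present in the work unit's zip archive at all. The sketch below is only a diagnostic idea, not anything the project provides: the helper name missing_from_zip is made up for illustration, and it is an assumption on my part that the .MSAcst file is supposed to ship inside the work unit zip listed on the command line.

```python
import zipfile
from pathlib import PurePosixPath


def missing_from_zip(zip_path, expected_names):
    """Return the expected file names that are not present in the zip archive.

    Hypothetical helper for illustration; not part of BOINC or Rosetta.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Compare on base names so files stored in subfolders still match.
        present = {PurePosixPath(name).name for name in archive.namelist()}
    return [name for name in expected_names if name not in present]


# Example call (the location of the zip on disk is an assumption; adjust it
# to wherever your BOINC client keeps the project files):
#
# missing_from_zip(
#     "pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_5zl5yc4k.zip",
#     ["f506a88e740dc1433a9792f2e819aa3f_0001.MSAcst"],
# )
```

If the returned list is non-empty, the file was never delivered with the task, which would point at the work unit itself rather than at the host.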
2) Message boards : Number crunching : Minirosetta 3.73-3.78 (Message 87976)
Posted 30 Dec 2017 by planetclown
Post:
Ryzen is crappy? Are you a troll?
Yes. No. You don't seem to own a Ryzen. I do.

Let me give some brief information about my current computers. One Ryzen 7 1700, right now showing 516 valid tasks and 60 errors. And one FX-8320E, 67 valid and 1 error. I can assure you the Ryzen behaves exactly as planetclown describes. Either the application crashes outright with a segmentation fault, or the C library kills it because it detected an invalid pointer, this way preventing a possible segfault. If you think about it there must also be cases where an invalid pointer goes unnoticed but doesn't cause a segfault. The result could be anything. I wouldn't rely on a Ryzen for something important, let's hope this project's validator is good. If you dig through the project's host list you'll find more Ryzens showing these symptoms, the most obvious running Linux, but also some Windows hosts with a high number of access violations that could be related.

Also as planetclown describes, the errors don't seem to happen with the new Rosetta application and not at other projects, so you could be tempted to dismiss this as an application error in Rosetta Mini. But there's at least one other example of spontaneous segfaults on Ryzens. Search for "kill_ryzen" or "marginality error" and you'll find many reports on Ryzens segfaulting in a particular use case: massive parallel compiler runs on Linux. An extreme scenario, but not unrealistic, and there's no excuse for simply crashing. People there claim you're safe if you don't do that kind of thing, but without arguments, and Rosetta proves them wrong.

So there's at least two completely unrelated cases of several Ryzens segfaulting out of the blue and no valid reason to assume that's all. In other words, those things can unpredictably crash for unknown reasons and if they don't crash you still can't trust the results. Crap.

Just want to reply with an updated status on my SEGFAULT issues with the Ryzen 7 1700. I was able to reproduce the segmentation faults using the “kill ryzen” test. I also got a replacement Ryzen through AMD’s RMA process. It took about a week from when I mailed the CPU back to when I received the replacement.

My original CPU had a manufacture date in the 21st week of the year, the replacement in the 39th week (it’s believed that chips produced in the 25th week or earlier may have issues). I have now completed 97 Rosetta Mini v3.78 tasks on Linux without a single error. It appears RMAing the Ryzen was the solution. Thank you floyd for providing information on the issues that people have been having with the Ryzen chips!

Results from my desktop with the latest Ryzen:
https://boinc.bakerlab.org/results.php?hostid=3297625&offset=0&show_names=0&state=0&appid=4
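
As a rough sanity check on whether 97 clean tasks could be luck, here is a back-of-the-envelope sketch. It borrows the Ryzen 7 1700 counts quoted above (516 valid, 60 errors) as a stand-in for a faulty chip's error rate and assumes tasks fail independently. Both are assumptions: those counts come from a different host, and the function name is made up for illustration.

```python
def prob_all_clean(valid, errors, n_tasks):
    """Chance of n_tasks consecutive clean tasks at the observed error rate.

    Assumes every task fails independently with the same probability.
    """
    total = valid + errors
    success_rate = valid / total
    return success_rate ** n_tasks


# Counts quoted above for a Ryzen 7 1700 that shows the problem.
p = prob_all_clean(valid=516, errors=60, n_tasks=97)
print(f"Chance of 97 clean tasks in a row at that error rate: {p:.2e}")
```

Under those assumptions the chance comes out well below one in ten thousand, so the clean run is hard to explain as coincidence.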
3) Message boards : Number crunching : Minirosetta 3.73-3.78 (Message 87776)
Posted 30 Nov 2017 by planetclown
Post:
Hello, I'm occasionally seeing two different errors on the following apps:

    Rosetta Mini v3.78 x86_64-pc-linux-gnu
    Rosetta Mini v3.78 i686-pc-linux-gnu


I've seen it on Lubuntu and Linux Mint (both Ubuntu 16.04/Xenial) along with BOINC 7.6.31. Link to computer.

The first error is glibc detected with free(): invalid pointer

BOINC:: Worker startup. 
Starting watchdog...
Watchdog active.
*** glibc detected *** ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu: free(): invalid pointer: 0x13867fb8 ***
======= Backtrace: =========
[0xdf36941]
[0xdf3a45b]
[0xede768c]
[0xdeffb51]
[0x81630ad]
[0xd45eb92]
[0xd45ebcb]
[0xd465336]
[0xd46ca67]
[0xd46feef]
[0xd474232]
[0xd400a01]
[0xd40c69a]
[0xc9ac83d]
[0xc9ad47f]
[0xca8b53f]
[0xb08de97]
[0xb265920]
[0xb2a83b6]
[0xb29f4d2]
[0x8aaae73]
[0x8aae71d]
[0x8ab361b]
[0x8a925f9]
[0x8a65a47]
[0xb371855]
[0xb3743be]
[0xb434b13]
[0xb43119d]
[0x8a6fa23]
[0x8056303]
[0xdf0cfd8]
[0x8048131]
======= Memory map: ========
08048000-0ede4000 r-xp 00000000 08:05 1183736                            /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu
0ede4000-0edec000 rw-p 06d9c000 08:05 1183736                            /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu
0edec000-0f115000 rw-p 00000000 00:00 0 
10d45000-17e18000 rw-p 00000000 00:00 0                                  [heap]
ebd2d000-f2cd4000 rw-p 00000000 00:00 0 
f305c000-f3d64000 rw-p 00000000 00:00 0 
f4200000-f4221000 rw-p 00000000 00:00 0 
f4221000-f4300000 ---p 00000000 00:00 0 
f517e000-f517f000 ---p 00000000 00:00 0 
f517f000-f5e8f000 rw-p 00000000 00:00 0 
f5e8f000-f7667000 rw-s 00000000 08:05 1581177                            /var/lib/boinc-client/slots/11/boinc_minirosetta_11
f7667000-f7668000 ---p 00000000 00:00 0 
f7668000-f766b000 rw-p 00000000 00:00 0 
f766b000-f766d000 rw-s 00000000 08:05 1581173                            /var/lib/boinc-client/slots/11/boinc_mmap_file
f766d000-f776a000 rw-p 00000000 00:00 0 
f776a000-f776c000 r--p 00000000 00:00 0                                  [vvar]
f776c000-f776e000 r-xp 00000000 00:00 0                                  [vdso]
ffc6c000-ffc8e000 rw-p 00000000 00:00 0                                  [stack]

</stderr_txt>
]]>


The second error is SIGSEGV: segmentation violation
BOINC:: Worker startup. 
Starting watchdog...
Watchdog active.
SIGSEGV: segmentation violation
Stack trace (4 frames):
[0xde75dcf]
[0xf77ceca0]
[0xdf36358]
[0xeffb51ff]

Exiting...

</stderr_txt>
]]>


I haven't seen any errors while running Rosetta v4.06 app or other BOINC projects. Any help would be appreciated. Thank you!
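
For anyone who wants to count how often each of the two failure types shows up across saved task outputs, something like the following could work. It is a minimal sketch: the labels and the function name are my own, and the patterns are taken only from the two stderr excerpts above, so other variants of these messages would land in "other".

```python
import re

# Patterns based on the two stderr excerpts in this post.
PATTERNS = {
    "glibc_invalid_pointer": re.compile(
        r"\*\*\* glibc detected \*\*\*.*free\(\): invalid pointer"
    ),
    "sigsegv": re.compile(r"SIGSEGV: segmentation violation"),
}


def classify_stderr(text):
    """Return a label for the first known failure pattern found in text."""
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            return label
    return "other"


sample = "Watchdog active.\nSIGSEGV: segmentation violation\nStack trace (4 frames):"
print(classify_stderr(sample))
```

Feeding each task's stderr text through classify_stderr and tallying the labels would show whether one failure type dominates on a given host.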
4) Questions and Answers : Web site : RSS - NEWS feed problems (Message 76502)
Posted 3 Mar 2014 by planetclown
Post:
Thank you for fixing this minor issue!
5) Questions and Answers : Web site : RSS - NEWS feed problems (Message 76495)
Posted 27 Feb 2014 by planetclown
Post:
The RSS feed for project news is currently invalid.

The culprit appears to be the unescaped ampersand in the text "-KEL & DOVA". In XML, the ampersand is reserved as the start of an entity reference, so a literal one has to be written as &amp;.
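
The problem is easy to reproduce with Python's standard library. This is only an illustration: the item/title wrapper below is a made-up fragment, not the project's actual feed, and I don't know how the feed is generated on the server.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_title = "-KEL & DOVA"


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical feed fragment with the title inserted as-is, then escaped.
broken = f"<item><title>{raw_title}</title></item>"
fixed = f"<item><title>{escape(raw_title)}</title></item>"

print(is_well_formed(broken))
print(is_well_formed(fixed))
print(fixed)
```

The raw version fails to parse, while the escaped one parses and still reads back as "-KEL & DOVA".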

©2024 University of Washington
https://www.bakerlab.org