Message boards : Number crunching : Rosetta 4.0+
Author | Message |
---|---|
Admin Project administrator Joined: 1 Jul 05 Posts: 5144 Credit: 0 RAC: 0 |
I've talked to the researcher who submitted these jobs and I've also updated the validator to hopefully address this issue. Let us know if this continues. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
After 6h... 1062813667 <stderr_txt> |
Juha Joined: 28 Mar 16 Posts: 13 Credit: 705,034 RAC: 0 |
@David E Kim I'll look into this. Could you also take a look at the Linux 4.08 x86_64 version? I don't think I'm exaggerating much when I say it's crashing two thirds of the tasks on my machine. I have an older Linux machine and the previous 4.07 x86_64 version ran just fine. Failed tasks were rare with it. What's curious is that if a task that failed with 4.08 gets a second try on a Windows machine, it always succeeds. That suggests a bug in the app rather than in the tasks. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I have an older Linux machine and the previous 4.07 x86_64 version ran just fine. Failed tasks were rare with it. What's curious is that if a task that failed with 4.08 gets a second try on a Windows machine, it always succeeds. That suggests a bug in the app rather than in the tasks." Try a later Linux kernel. I had problems with the earlier ones on my Ryzens too, but now they work fine. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
rb_03_17_1833_1986__t000__0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_821983_78_0 Peak working set size: 1,044.39 MB Peak swap size: 1,403.04 MB Error while computing, stderr output: <core_client_version>7.12.1</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1073741819 (0xc0000005)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.07_windows_intelx86.exe -run:protocol jd2_scripting @flags_rb_03_17_1833_1986__t000__0_C2_robetta -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip input_rb_03_17_1833_1986__t000__0_C2_robetta.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1235712 Starting watchdog... Watchdog active. Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00000000 Engaging BOINC Windows Runtime Debugger... 
******************** BOINC Windows Runtime Debugger Version 7.9.0 Dump Timestamp : 03/18/19 08:48:30 Install Directory : Data Directory : C:\ProgramData\BOINC Project Symstore : https://boinc.bakerlab.org/rosetta/symstore LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126 Loaded Library : dbghelp.dll LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126 Loaded Library : version.dll SymInitialize(): GetLastError = 8 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 73154, Write: 0, Other 12643 - I/O Transfers Counters - Read: 0, Write: 198584, Other 0 - Paged Pool Usage - QuotaPagedPoolUsage: 247488, QuotaPeakPagedPoolUsage: 247616 QuotaNonPagedPoolUsage: 33104, QuotaPeakNonPagedPoolUsage: 33104 - Virtual Memory Usage - VirtualSize: 2120523776, PeakVirtualSize: 2125643776 - Pagefile Usage - PagefileUsage: 1222176768, PeakPagefileUsage: 1477279744 - Working Set Size - WorkingSetSize: 582373376, PeakWorkingSetSize: 1095122944, PageFaultCount: 14848221 *** Dump of thread ID 5800 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 726718720.000000, User Time: 132472659968.000000, Wait Time: 21575628.000000 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00000000 *** Dump of thread ID 9824 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 2031250.000000, User Time: 625000.000000, Wait Time: 21575628.000000 *** Dump of thread ID 428 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 312500.000000, User Time: 0.000000, Wait Time: 21575524.000000 *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> . |
Admin Project administrator Joined: 1 Jul 05 Posts: 5144 Credit: 0 RAC: 0 |
This was quite a large protein. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
Some WUs fail with this error (e.g. 1064625564): -529697949 (0xE06D7363) Unknown error code |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
Other WUs fail after 80 to 90 minutes: (0x1) - exit code 1 (0x1)</message> |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
Again, memory errors on some WUs: (unknown error) - exit code -529697949 (0xe06d7363)</message> Could any admins or developers please debug this app? |
Admin Project administrator Joined: 1 Jul 05 Posts: 5144 Credit: 0 RAC: 0 |
These memory errors are due to large proteins that are being submitted to our structure prediction server, Robetta. I increased the rsc_memory_bound for these jobs depending on the sequence length but it looks like I should increase the bound further. Sorry for any inconvenience. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
"Sorry for any inconvenience." No problem, thanks for the answer. P.S. Do you plan to release a new version of the app, with updated protocols and functions? |
sgaboinc Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
This task failed: https://boinc.bakerlab.org/rosetta/result.php?resultid=1066550803 std::cerr: Exception was thrown: File: src/core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306 chi angle must be between -180 and 180: nan Rosetta v4.07 on 64-bit Linux. I'm not sure whether it is related to R@h or to the model itself. |
Admin Project administrator Joined: 1 Jul 05 Posts: 5144 Credit: 0 RAC: 0 |
There are no immediate plans for a new version release. But researchers are working on new methods that will eventually get put into production on R@h. I'm not sure about the timeline though. |
Terrible T Joined: 29 Dec 16 Posts: 4 Credit: 1,333,030 RAC: 0 |
Apparently these big proteins still need more memory? I have now had several tasks of this series fail after 20,000 seconds of computing. FYI: WU 960589108 <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> (unknown error) - exit code -529697949 (0xe06d7363)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.07_windows_intelx86.exe @rb_04_03_2468_2618_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -psipred_ss2 t000_.spider3_ss2 -kill_hairpins t000_.nobuformat.spider3_ss2 -abinitio::use_filters true -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_04_03_2468_2618_ab_t000__robetta.zip -frag3 rb_04_03_2468_2618_ab_t000__robetta.200.3mers.index.gz -fragA rb_04_03_2468_2618_ab_t000__robetta.200.17mers.index.gz -fragB rb_04_03_2468_2618_ab_t000__robetta.200.9mers.index.gz -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2189964 Starting watchdog... Watchdog active. Unhandled Exception Detected... - Unhandled Exception Record - Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x74C345A2 Engaging BOINC Windows Runtime Debugger... |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
1066755948, after 6 hours of calculation: WARNING! cannot get file size for default.out.gz: could not open file. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
"But researchers are working on new methods that will eventually get put into production on R@h. I'm not sure about the timeline though." Come on guys, we are ready for a lot of new science... :-)) |
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 0 |
These errors are back again in many units. The default processing time (8 hours) gets extended and the units fail after 12 hours of processing, which is frustrating. Are the crunching results used, or are they wasted? All units starting with rb_04_06_2593_2728_ab_t000 seem to be affected on Linux. <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> process got signal 11</message> <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @rb_04_06_2593_2728_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -psipred_ss2 t000_.spider3_ss2 -kill_hairpins t000_.nobuformat.spider3_ss2 -jumps:pairing_file t000_.fasta.bbcontacts.jumps -abinitio::use_filters false -skip_convergence_check -jumps:overlap_chainbreak -seq_sep_stages 1 1 1 -ramp_chainbreaks -sep_switch_accelerate 0.8 -jumps:random_sheets 7 2 1 1 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_04_06_2593_2728_ab_t000__robetta.zip -frag3 rb_04_06_2593_2728_ab_t000__robetta.200.3mers.index.gz -fragA rb_04_06_2593_2728_ab_t000__robetta.200.6mers.index.gz -fragB rb_04_06_2593_2728_ab_t000__robetta.200.4mers.index.gz -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1914528 Starting watchdog... Watchdog active. BOINC:: CPU time: 43619.4s, 14400s + 28800s[2019- 4- 8 15:26:19:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. 
Writing W_0000001 ====================================================== DONE :: 1 starting structures 43619.4 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== 15:26:19 (15379): called boinc_finish(0) </stderr_txt> ]]> |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
rb_03_27_2191_2381__t000__4_C1_SAVE_ALL_OUT_IGNORE_THE_REST_827088_870 Both results ended with "incorrect function". <core_client_version>7.12.1</core_client_version> <![CDATA[ <message> Unzulässige Funktion. (0x1) - exit code 1 (0x1)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.07_windows_intelx86.exe -run:protocol jd2_scripting @flags_rb_03_27_2191_2381__t000__4_C1_robetta -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip input_rb_03_27_2191_2381__t000__4_C1_robetta.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2995742 Starting watchdog... Watchdog active. </stderr_txt> ]]> . |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Application version: Rosetta v4.07 windows_intelx86 Device: 1759960, Task: 1066832102, and WU 960999090. Name: rb_04_04_2540_2669__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_827813_1998_0 Status: Error while computing Exit status: -529697949 (0xE06D7363) Unknown error code <core_client_version>7.14.2</core_client_version> |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 3,640 |
Again, a lot of "C++ out of memory" errors: 1068437282, 1068437281, 1068437279, 1068437273, 1068437317, etc. Please fix it.
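
For anyone comparing the exit codes posted in this thread: the negative decimal numbers and the hex values in parentheses are the same 32-bit value, shown once as signed and once as unsigned. The Python sketch below does the conversion. The labels in `KNOWN_CODES` are copied from the debugger output quoted in the posts above; the table and the function names are illustrative only, not an official BOINC or Rosetta list.

```python
# Labels are taken from the stderr excerpts quoted in this thread.
# This table is illustrative, not an exhaustive or official list.
KNOWN_CODES = {
    0xC0000005: "Access Violation",
    0xE06D7363: "C++ Exception (reported as Out Of Memory in this thread)",
}


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format an exit code the way the task pages do, plus a label if known."""
    code = to_unsigned32(exit_code)
    label = KNOWN_CODES.get(code, "unknown")
    return f"{exit_code} (0x{code:08X}): {label}"


if __name__ == "__main__":
    for value in (-1073741819, -529697949, 1):
        print(describe(value))
```

So -1073741819 and 0xc0000005 describe one failure type, and -529697949 and 0xe06d7363 describe another, which is why the "Unknown error code" results and the "Out Of Memory (C++ Exception)" results look like the same problem.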
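
On the "chi angle must be between -180 and 180: nan" failure reported above: a NaN fails any range check because every ordered comparison with NaN is false. The Python sketch below only illustrates that floating-point behaviour. It is not Rosetta's code, and `chi_in_range` and `validate_chi` are hypothetical names.

```python
import math


def chi_in_range(chi: float) -> bool:
    """Plain range check; returns False for NaN because NaN comparisons are False."""
    return -180.0 <= chi <= 180.0


def validate_chi(chi: float) -> float:
    """Raise a descriptive error, distinguishing NaN from a merely out-of-range angle."""
    if math.isnan(chi):
        raise ValueError("chi angle is NaN (an upstream calculation failed)")
    if not chi_in_range(chi):
        raise ValueError(f"chi angle must be between -180 and 180: {chi}")
    return chi


if __name__ == "__main__":
    print(chi_in_range(179.5))         # in range
    print(chi_in_range(float("nan")))  # NaN is never in range
```

In other words, the message says that an earlier calculation produced a non-number, not that an angle was slightly out of bounds. The thread does not settle whether the cause is in the app or in the model.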
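
The "8 hours extended to 12 hours" observation can be checked against the quoted log line `BOINC:: CPU time: 43619.4s, 14400s + 28800s`. The Python sketch below parses that line and adds the two figures. It assumes, without confirmation from this thread, that 28800s is the target runtime (it matches `-cpu_run_time 28800` in the command line) and that 14400s is an extra allowance before the task is ended. `parse_cpu_time` is a hypothetical helper.

```python
import re

# Log line copied from the stderr output quoted in this thread.
LINE = "BOINC:: CPU time: 43619.4s, 14400s + 28800s"

PATTERN = re.compile(r"CPU time:\s*([\d.]+)s,\s*(\d+)s\s*\+\s*(\d+)s")


def parse_cpu_time(line: str):
    """Return used CPU time and the summed limit, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    used = float(match.group(1))
    extra = int(match.group(2))   # assumed: allowance on top of the target runtime
    target = int(match.group(3))  # assumed: target runtime (-cpu_run_time)
    limit = extra + target
    return {
        "used_s": used,
        "limit_s": limit,
        "limit_h": limit / 3600,
        "over_limit": used > limit,
    }


if __name__ == "__main__":
    print(parse_cpu_time(LINE))
```

Under those assumptions the limit is 43,200 seconds, exactly 12 hours, and the task had used slightly more than that when it was stopped, which matches the behaviour described in the post.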
©2024 University of Washington
https://www.bakerlab.org