Message boards : Number crunching : Minirosetta 3.52
Author | Message |
---|---|
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,611,656 RAC: 8,920 |
I'll look into the server upgrade. It will be a long process since there is a lot of R@h specific code. Priorities for now are first to release our android app and then to add a replica DB and upgrade the server code. The latter may require significant down time so we need to plan this with the ongoing research projects in the lab. We also have to look into hardware upgrades. That's great!! P.S. Please try to optimize the code for Android (memory footprint, for example) |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,611,656 RAC: 8,920 |
[25675] Add feature for specifying plan classes in an XML file This is an old list (updated to June 2012). After my request, DA has updated it. These are other changes to server code.
15-18 Aug 2014 Add support for per-app credit
8 Aug 2014 Convey user CPID to client (for BoincTasks?)
29 Jul 2014 version.xml can specify API version (for compressed apps)
25 Jul 2014 partial support in scheduler for generic coprocessors (e.g. ASICs)
16 Jul 2014 scheduler support for client "brand"; store in DB
14 Jul 2014 add <maintenance_delay> config option
8 Jul 2014 matchmaker (score-based) scheduling is now the default
3 Jul 2014 fix bugs in changing code signing key
3 Jul 2014 scheduler: fix bugs if project has both NCI and regular apps
10 Jun 2014 add "delete_spammers.php" for removing various types of spam accounts
6 Jun 2014 app versions (as well as apps) can be marked as "beta"
4 Jun 2014 support CPU OpenCL apps in plan class spec
27 May 2014 fully implement targeted jobs
18 May 2014 include badges in XML stats export
8 May 2014 send notices w/ video or images only to 7.3+ clients
6 May 2014 file_deleter: delete .gz versions also
6 May 2014 add web page showing top CPU models and their stats
4 May 2014 apps can be marked as "exact fraction done" (base completion time est only on FD)
30 Apr 2014 generalize interface to PHPMailer
20 Apr 2014 support remote input files in create_work
18 Apr 2014 let projects disable forums and/or teams
10 Apr 2014 support efficient bulk job creation in create_work
2 Apr 2014 store job peak mem/disk usage in DB
26 Mar 2014 support gzipped input files
21 Mar 2014 use mysqli PHP functions if available
18 Mar 2014 add validator that checks for string in stderr
8 Mar 2014 enforce GPU job limits separately for each GPU type
6 Mar 2014 store gpu_active_frac, and use it in runtime estimation
5-20 Dec 2013 add generic support for badges
23 May 2013 parse client "product name" (e.g. phone model) and store in DB
9 May 2013 use HTTPS for forms containing password
25 Apr 2013 add support for multi-size apps
9 Apr 2013 add new score-based scheduling
27 Aug 2012 add support for limited locality scheduling
17 Aug 2012 add support for volunteer data archival
11 Jul 2012 pagination in forums
25 Jun 2012 scheduler: support Intel GPUs |
Orgil Send message Joined: 11 Dec 05 Posts: 82 Credit: 169,751 RAC: 0 |
Finished wu's have not been validating for 1 full day. I checked the server status and everything is looking green. What happened?! |
Miklos M Send message Joined: 8 Dec 13 Posts: 29 Credit: 5,277,251 RAC: 0 |
Are we getting longer wu's effective 8/31/14? Their estimated time to completion seems to be 40 hours or so. My preferences have not changed and are still set to a max of 1 day to get a wu done per cpu. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
The jobs should still run based on the target cpu run time preference. The estimate is likely off because the workunit estimated FLOPS value has been doubled. The client should make better estimates as more jobs get processed but if the problem persists or if the job is actually running significantly longer than your target run time, let us know. Thanks. |
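A rough sketch of why a doubled FLOPS estimate shows up directly in the displayed time, assuming the client's first guess is simply the estimated floating-point operations divided by the host's benchmarked speed. The real client applies further corrections as jobs complete, and every number below is hypothetical, chosen only for illustration:

```python
def estimated_runtime_seconds(fpops_est: float, host_flops: float) -> float:
    """Naive first-guess runtime: estimated operations divided by host speed."""
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return fpops_est / host_flops


# Hypothetical values, not taken from any real workunit or host.
host_flops = 2.0e9                  # assumed benchmarked speed of one core (FLOPS)
old_fpops_est = 4.0e13              # assumed previous workunit estimate
new_fpops_est = 2 * old_fpops_est   # the estimate after being doubled

old_hours = estimated_runtime_seconds(old_fpops_est, host_flops) / 3600
new_hours = estimated_runtime_seconds(new_fpops_est, host_flops) / 3600

print(f"old estimate: {old_hours:.1f} h, new estimate: {new_hours:.1f} h")
```

Under this assumption the displayed estimate doubles even though the actual run time, which follows the target CPU run time preference, does not change.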
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,611,656 RAC: 8,920 |
I'll look into the server upgrade. It will be a long process since there is a lot of R@h specific code. Do Rosetta@Home and Ralph@Home run on the same version of the server code? If not, you could try updating Ralph first and see what happens before updating Rosetta... |
Orgil Send message Joined: 11 Dec 05 Posts: 82 Credit: 169,751 RAC: 0 |
My wu's are waiting for 48hrs to validate or still in upload state. Houston we have a problem?! |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
i'd guess server side resource constraints could be part of the reason for some of these bottlenecks. i'd guess there are 'other solutions', e.g. an even more 'distributed' computing paradigm / design, or partnering with 'mirror' servers at a willing partner / institution, which may help alleviate some of the issues. but i'd think that those software changes would possibly affect the design of boinc itself and could take considerable effort to diagnose, develop and integrate with rosetta. hence, i'd guess for the immediate term a somewhat longer default run time is a *practical* way to alleviate some of the issues. nevertheless, i'm attempting to make do with a somewhat longer self-defined run time (4 hrs) as a compromise. i do agree that running long jobs does not coincide with, say, an average 'normal' usage pattern of a desktop or even notebook computer, as for various reasons users would want to shut down their PC/notebook. a simple example could be that a computer could be running with a rather loud fan, and that'd be simply annoying at night in a bedroom and the (naive?) user could simply decide to abort the jobs and shut down. i used to run a PC that had a fan which almost ran like a jet engine (*noisy*), mainly due to an old graphics card lol |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
i'd guess other possible 'designs/paradigms' such as a qos (quality of service) design can also be used to alleviate some of the high server load issues. an example is that when the server is busy it can 'announce' qos controls and issue tokens with a number and a waiting period. this is to issue 'queue numbers' to the participating hosts and to request the hosts to back off and wait for the pre-determined period before retrying. however, in the same way, these could involve various changes to boinc (both client and server) and integration with rosetta, and could require a rather large effort to develop. qos has similar limitations to a lengthened run time; however, a big difference is that the participant host computer is *idle* while waiting for re-contact with the server. this could alleviate cases where, for instance, the jobs run with a noisy pc fan, as the fan would likely wind down and run at lower speeds, hence less noise |
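The queue-number idea above could be sketched roughly as follows. This is only an illustration: `BusyServer`, `Token`, and the one-minute slot length are hypothetical names and values, not part of BOINC or Rosetta@home.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical 'queue number' handed to a host that must back off."""
    queue_number: int
    wait_seconds: int


class BusyServer:
    """Hypothetical server that accepts hosts up to a capacity, then issues tokens."""

    def __init__(self, capacity: int, seconds_per_slot: int = 60) -> None:
        self.capacity = capacity
        self.seconds_per_slot = seconds_per_slot
        self.active = 0    # hosts currently being served
        self.waiting = 0   # hosts that were told to back off

    def request(self) -> Optional[Token]:
        """Return None if served immediately, else a Token with a waiting period."""
        if self.active < self.capacity:
            self.active += 1
            return None
        self.waiting += 1
        # Later arrivals get a higher queue number and a longer wait.
        return Token(self.waiting, self.waiting * self.seconds_per_slot)


server = BusyServer(capacity=1)
first = server.request()    # served at once
second = server.request()   # asked to wait one slot
third = server.request()    # asked to wait two slots
```

A host holding a token would stay idle for `wait_seconds` before contacting the server again, which is the difference from a lengthened run time noted in the post.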
Norman Send message Joined: 3 Oct 06 Posts: 3 Credit: 1,968,998 RAC: 1,029 |
I have discovered a serious memory leak in Rosetta Mini 3.52 on a Mac OS X version 10.9.4 system on a Macbook Pro with 8 GB of physical memory. I watched as my system slowed to a crawl and then hung over several hours while my system was otherwise idle. On another occasion I watched with the Memory panel of the Activity Monitor as my system slowed down, virtual memory grew to 49 GB and the swap file grew to 13 GB. Each of the three Rosetta processes that were running was using about 1 GB, but none were growing. I interpret this as the swap file filling the available disk space. Mac OS X apparently does not cope well with a full disk because there were also many weird error messages in the Console log. When I suspended the Rosetta project in BOINC Manager, my system returned to normal and has been running smoothly all day. I will not resume running Rosetta until you tell me this bug is fixed. |
Orgil Send message Joined: 11 Dec 05 Posts: 82 Credit: 169,751 RAC: 0 |
I have a few completed wu results stuck in upload state for 4 days. And why is no one from the project answering my questions!! These results are not my property, nor the project staff's property; they are scientific property. It is shocking that the project server status is showing a false green light while a cruncher cannot upload results. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Not sure why your client isn't uploading results. Is anyone else having this issue? Is there any useful info in your client log? Norman, that's a pretty serious bug/bad workunit. Any specifics? WU id? |
Orgil Send message Joined: 11 Dec 05 Posts: 82 Credit: 169,751 RAC: 0 |
The status says: Upload pending, project backoff .. (counting time)
WU id's:
1 application Rosetta Mini created 18 Aug 2014 14:04:30 UTC name tj_8_7_ordered_X_25_h20_BAB_20_BAB_wD_fragments_abinitio_SAVE_ALL_OUT_185149_3629 minimum quorum 1 initial replication 1 max # of error/total/success tasks 1, 2, 1
2 application Rosetta Mini created 6 Aug 2014 9:32:18 UTC name 1L-18H-2L-8E-4L-8E-1L_1-2.A.0.rsmn_0060_2_fold_SAVE_ALL_OUT_183385_151 minimum quorum 1 initial replication 1 max # of error/total/success tasks 1, 2, 1
3 application Rosetta Mini created 16 Aug 2014 8:56:24 UTC name flu.c05g_3_input_0244_0001_ss1_1_ss2_2_ss3_2_ss4_2_ss5_2_0001_0001_0001.B_fragments_fold_188798_181 minimum quorum 1 initial replication 1 max # of error/total/success tasks 1,
4 application Rosetta Mini created 16 Aug 2014 6:23:12 UTC name db_triangle104B_fold_SAVE_ALL_OUT_189886_7480 minimum quorum 1 initial replication 1 max # of error/total/success tasks 1, 2, 1 |
Norman Send message Joined: 3 Oct 06 Posts: 3 Credit: 1,968,998 RAC: 1,029 |
For Mac OS X memory leak, three task names: 5htube05_relax_SAVE_ALL_OUT_189789_1457_0 batch2_pdb16_relax_SAVE_ALL_OUT_189866_5873_0 1L-7E-2L-11H-3L-7E-2L-11H-1L_1-2.P.0_0002_fold_SAVE_ALL_OUT_190736_101_0 |
Miklos M Send message Joined: 8 Dec 13 Posts: 29 Credit: 5,277,251 RAC: 0 |
Errors in the new long tasks. |
Chunfu Xu Send message Joined: 2 Oct 13 Posts: 2 Credit: 8,816 RAC: 0 |
For Mac OS X memory leak, three task names: The 5htube* work unit was submitted by me. I am sorry that it caused a problem for your computer. I have identified the problem and will avoid it in the future. Sorry about that. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
The "extremely long file name that goes over the Windows character limit" issue is back: 684142940, 684308973 WARNING! attempt to create gzipped file ../../projects/boinc.bakerlab.org_rosetta As Windows has a path limit of 256 characters and the above path is 228 characters (excluding the file extension and higher levels of the path) you are bound to generate errors on a regular basis. This issue has come up before but I guess that some of the scientists missed the memo. Can you put in place a character limit for scientists submitting work? I guess there will be a small inconvenience for the scientists in not being as descriptive as they want to be, but at least you don't scare the crunchers away with swathes of compute errors. |
Norman Send message Joined: 3 Oct 06 Posts: 3 Credit: 1,968,998 RAC: 1,029 |
"For Mac OS X memory leak, three task names: 5htube05_relax_SAVE_ALL_OUT_189789_1457_0 batch2_pdb16_relax_SAVE_ALL_OUT_189866_5873_0 1L-7E-2L-11H-3L-7E-2L-11H-1L_1-2.P.0_0002_fold_SAVE_ALL_OUT_190736_101_0" "The 5htube* work unit was submitted by me. I am sorry that it caused a problem to your computer. I have identified the problem and will avoid it in the future. Sorry about that." "Avoid it" in the future is not enough. You have described changing the input data for the work unit, but since I am a retired software engineer, I know that the root cause of this problem is probably a software bug. If something can go wrong in the future, then it will. This software bug caused me to lose a week of work tracking it down. I will not use Minirosetta again until someone tells me that this bug is fixed. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
The "extremely long file name that goes over the Windows character limit" issue is back Thanks for catching this. Yes, there is a character limit imposed but this job somehow slipped through. I'll have to reduce the max characters allowed so this doesn't happen again. edit- I see now how it slipped through and have fixed our submission code. thanks! |
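A minimal sketch of the kind of submission-side check discussed here, assuming the 256-character figure quoted earlier in the thread. The function names, the 60-character directory prefix, and the `.out.gz` suffix are hypothetical and are not taken from the project's actual submission code:

```python
WINDOWS_PATH_LIMIT = 256  # figure quoted in the thread; treated here as an assumption


def max_job_name_length(prefix_length: int, suffix: str = ".out.gz") -> int:
    """Characters left for a job name after the directory prefix and file suffix."""
    return WINDOWS_PATH_LIMIT - prefix_length - len(suffix)


def validate_job_name(name: str, prefix_length: int, suffix: str = ".out.gz") -> None:
    """Raise ValueError if the full path built from this job name would be too long."""
    limit = max_job_name_length(prefix_length, suffix)
    if len(name) > limit:
        raise ValueError(f"job name is {len(name)} characters; limit is {limit}")


# Hypothetical usage: assume the client-side directory prefix takes 60 characters.
validate_job_name("db_triangle104B_fold_SAVE_ALL_OUT_189886_7480", prefix_length=60)
```

Budgeting for the longest expected directory prefix and file suffix, rather than checking the job name alone, is what keeps a long name from slipping through once the client adds its own path components.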
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
We'll definitely track this bug down and make sure it's fixed in the next app update. |
©2024 University of Washington
https://www.bakerlab.org