Posts by BeemerBiker

1) Message boards : Number crunching : all wu's error on 1 system, but OK on another (Message 74914)
Posted 16 Jan 2013 by BeemerBiker
Post:
I have these reversed as I was not paying attention to which web page was on the screen. Rosetta does not show GPU info so I pulled up another project that shows the GPU version but got the two systems reversed. I have a (small) boinc farm and it is easy to get systems mixed up when they are not in front of me.

The K8NDRE, which fails all tasks, is running 306.97 and the one that seems to be working just fine, s2877, has a later version, 310.70, so a rollback would not solve the problem here. If anything I need to advance 306.97 to the latest drivers.

It would be nice if the "show_host_detail" web page here showed the GPU and driver version even if the project does not use a gpu. Likewise, it would be nice to be able to select "valid", "invalid", or "error" task results, etc. to get a quick count of problems.



2) Message boards : Number crunching : Need help debugging your software to find the problem (Message 74913)
Posted 16 Jan 2013 by BeemerBiker
Post:
The possibility exists that some side effect of nvidia's driver is causing problems with Rosetta.

However, I suspect there may be a problem with the gunzip mechanism. I assume that Rosetta's program has code that unzips or shells out to gunzip (I see both .zip and .gz files). The project "Constellation" had all errors for "trackjack" and it turned out to be a problem with their wrapper as described here. Malwarebytes support asked me to provide the "command line" that caused their antivirus to report a problem but I was unable to duplicate the problem manually. They needed the files and a test procedure before they could certify it as a false positive.

This system that is failing was a quick build that I didn't bother activating (win7x64) as it was only to be used for a short time. Unaccountably, windows defender is disabled and I cannot enable it. I know I installed MSE and did an update so it should have been running. The symptom I found when Constellation failed was that windows 7 hung with multiple copies of the Malwarebytes dialog box on screen. I had to take a photograph of the screen as the screen print did not work. Possibly, something in windows defender got hung up when gunzip was shelled out (or however it was implemented).

I am no longer processing any Constellation WU's as they didn't fix the problem or provide me with a workaround other than excluding any trackjack or tasks that needed to be unzipped.

I think the other system is working because it was activated and had winzip 11 installed. (yes, 11 is old, but it is paid for)

I may look into this later to see if that nvidia driver somehow caused the problem.
3) Message boards : Number crunching : all wu's error on 1 system, but OK on another (Message 74912)
Posted 16 Jan 2013 by BeemerBiker
Post:
The Rosetta work units are all CPU tasks. I am using the GPUs for the PrimeGrid challenge. I don't see how processing PrimeGrid GPU tasks can cause all the Rosetta CPU tasks to fail. However, after 30 years of programming, I know that one can only be 99.999999...% certain that software will behave as designed, i.e. one cannot rule out side effects, so there is a (slim) chance you might be correct.

K8NDRE is using 306.97 and except for the tasks I aborted, it seems to have validated tasks.

S2877 is using 310.70 and all are failing ...hmm...

After the PrimeGrid challenge completes, I will roll back the driver. I am in 13th place in the challenge and it would be unlucky to roll it back now.
4) Message boards : Number crunching : Need help debugging your software to find the problem (Message 74900)
Posted 15 Jan 2013 by BeemerBiker
Post:
I have windows defender disabled on the system that is generating all the client errors. I am not sure why this AMD system has all errors and another similar AMD system does not.

I found that the project "Constellation" was unable to un-gzip its data files because malwarebytes does not allow unzipping of programs it cannot scan. This project also has .gz and .zip files. I do not have malwarebytes installed on this new & clean win7x64 system.

Is there a command I can execute from a command prompt in the rosetta directory so I can capture errors or observe warnings? I have done this before for seti but they documented how to do this.

Even though I am retired I do not have a lot of time to waste doing this so if I cannot easily debug the problem I will just switch to another project.

Thanks for looking!
5) Message boards : Number crunching : 100% error rate on work units since Christmas (Message 74899)
Posted 15 Jan 2013 by BeemerBiker
Post:
I have a similar problem but all my tasks are CPU: Opteron-290. There is a gpu but I am using it for primegrid. Looking at "messages" in BoincTasks I see there are no output files.

I notice that there are both ".gz" and ".zip" files and I assume that rosetta has its own unzip program and does not rely on win7 to work with gz and zip. I use 7z but there is no file association for 7z as I didn't want one. However, my other win7 system runs rosetta fine for cpu tasks and both are amd opteron systems.
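
For anyone wanting to test the unzip theory by hand, one option is to try reading the archives outside of BOINC. The following is only a minimal Python sketch, not anything taken from the Rosetta application: `check_archives` is a name made up for this example, and the folder to scan is whatever the project directory happens to be on the system in question. It shows only whether the files are readable with Python's standard library and says nothing about how Rosetta itself unpacks them.

```python
import gzip
import zipfile
import zlib
from pathlib import Path


def check_archives(folder):
    """Try to read every .gz and .zip file under folder.

    Returns a dict mapping each archive path to None (readable)
    or to a short description of the error that was raised.
    """
    results = {}
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        try:
            if suffix == ".gz":
                # Stream through the whole file so the trailing CRC is checked.
                with gzip.open(path, "rb") as fh:
                    while fh.read(1 << 20):
                        pass
            elif suffix == ".zip":
                with zipfile.ZipFile(path) as zf:
                    bad = zf.testzip()  # first corrupt member name, or None
                    if bad is not None:
                        raise zipfile.BadZipFile(f"corrupt member: {bad}")
            else:
                continue
            results[str(path)] = None
        except (OSError, EOFError, zlib.error, zipfile.BadZipFile) as exc:
            results[str(path)] = repr(exc)
    return results


if __name__ == "__main__":
    import sys

    # Folder defaults to the current directory; pass another one as argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, err in check_archives(target).items():
        print("OK  " if err is None else "FAIL", name, err or "")
```

If every archive reads cleanly on the failing machine, the files themselves are probably fine and the problem lies elsewhere (for example in whatever intercepts the unpacking step).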
6) Message boards : Number crunching : all wu's error on 1 system, but OK on another (Message 74898)
Posted 15 Jan 2013 by BeemerBiker
Post:
This system had nothing but errors. Opteron290 (fastest) with 2gb of memory.

This one is slower opteron275 with 4gb memory. Almost all WU\'s are good.

Looking at stderr_txt I don't see anything exceptional except all errors on the k8ndre-1.

I don't see what is causing the problem, maybe someone else can. I could try adding more memory.

Both run the same version of windows 7, but the failing mobo is an asus with a gtx650ti and the other is a tyan with a pair of gts250. All of the gts540ti are completing their primegrid tasks with valid results and the gts250 generate valid results too, so what gives?
7) Message boards : Number crunching : Anyone else having trouble uploading results? (Message 62882)
Posted 11 Aug 2009 by BeemerBiker
Post:
same here. I thought it was just me and I had a single stuck upload and aborted it, and have just noticed another system with several stuck uploads.

There is another thread about ralph not working. AFAIK ralph is the swimming pig at a nearby water attraction. If ralph is the upload server it would be nice if it was named "upload server", but that is probably too much to ask of a linux sysop.
8) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 59810)
Posted 26 Feb 2009 by BeemerBiker
Post:
2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_17767_0
got 5% done before faulting. I aborted it as it was hung after the app faulted.

Faulting application rosetta_beta_5.98_windows_intelx86.exe, version 0.0.0.0, faulting module rosetta_beta_5.98_windows_intelx86.exe, version 0.0.0.0, fault address 0x0084fcf3.

====

I monitor my farm using boincview and when it shows a yellow background I know there is a problem. When I logged in remotely, I was greeted with a fault report and was asked if I wanted to report it to microsoft.

There have been 3 separate rosetta app faults since Feb 21. My event log goes back to 10/23/08 and there are no other app faults. Milkyway and Poem are the other 2 boinc apps besides rosetta.

Windows Xp, sp3
BM 6.2.18
Dual MP 2800
9) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 59702)
Posted 21 Feb 2009 by BeemerBiker
Post:
bunch of faults, maybe 5, windows xp, vista 64, etc. Started about 3 days ago. These faults are the type that show up over the desktop and ask if microsoft should be informed.

from xp event log
Faulting application rosetta_beta_5.98_windows_intelx86.exe, version 0.0.0.0, faulting module rosetta_beta_5.98_windows_intelx86.exe, version 0.0.0.0, fault address 0x0084fcf3.
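
When there are several of these event log entries to compare, it can help to pull out the fields and check whether the fault address is the same each time. The snippet below is a rough Python sketch that assumes the entries look exactly like the line quoted above; `parse_fault` and the field names are made up for this example and are not part of any Windows or BOINC tool.

```python
import re

# Matches the "Faulting application ..." text as it appears in the XP event log
# entry quoted in this post. Other Windows versions may word it differently.
FAULT_RE = re.compile(
    r"Faulting application (?P<app>\S+), version (?P<app_version>[\d.]+), "
    r"faulting module (?P<module>\S+), version (?P<module_version>[\d.]+), "
    r"fault address (?P<address>0x[0-9a-fA-F]+)"
)


def parse_fault(line):
    """Return the fields of one fault entry as a dict, or None if no match."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["address"] = int(fields["address"], 16)  # hex string -> integer
    return fields


if __name__ == "__main__":
    entry = (
        "Faulting application rosetta_beta_5.98_windows_intelx86.exe, "
        "version 0.0.0.0, faulting module rosetta_beta_5.98_windows_intelx86.exe, "
        "version 0.0.0.0, fault address 0x0084fcf3."
    )
    info = parse_fault(entry)
    print(info["app"], info["module"], hex(info["address"]))
```

The same fault address showing up in the same module across machines would suggest one repeatable bug in the application rather than flaky hardware, though that is an inference and not something the event log states.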


10) Message boards : Number crunching : why do some tasks show two results? (Message 58955)
Posted 21 Jan 2009 by BeemerBiker
Post:
Possibly, and this is really a guess, this bug is caused by not running the CPU at 100% utilization. I have been reading over at einstein that they identified a problem in checkpointing. When the task was descheduled by the < 100% rule and later resumed, it could not find the checkpoint (there was none) and exited. Possibly something like this happened on rosetta and the task simply started from scratch.

I am pretty sure that I had been setting my systems to 95% utilization. I have since changed back to 100% after reading the warning at einstein.

I posted this problem. You can also look for the thread "solved" there, which relates to the 100% requirement.
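
To make the guess concrete, here is a toy Python sketch of how a missing checkpoint turns a resume into a restart. It is not Rosetta's or Einstein's code; `run_task`, `load_checkpoint`, the JSON checkpoint file, and the step counts are all invented for illustration.

```python
import json
import os


def load_checkpoint(path):
    """Return the last saved step, or 0 when no checkpoint file exists."""
    if not os.path.exists(path):
        return 0  # nothing saved yet: the task has to start from scratch
    with open(path) as fh:
        return json.load(fh)["step"]


def run_task(total_steps, path, checkpoint_every, suspend_at=None):
    """Run a toy task; return (step resumed from, step reached)."""
    start = load_checkpoint(path)
    for step in range(start + 1, total_steps + 1):
        # ... one unit of real work would happen here ...
        if step % checkpoint_every == 0:
            with open(path, "w") as fh:
                json.dump({"step": step}, fh)
        if step == suspend_at:
            return start, step  # simulate the client suspending the task
    return start, total_steps
```

In this sketch, a task that is suspended before its first checkpoint is written resumes from step 0 and repeats all the earlier work. That would be one way to end up with two "DONE" sections for a single task, if the real application behaves similarly.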
11) Message boards : Number crunching : why do some tasks show two results? (Message 58750)
Posted 12 Jan 2009 by BeemerBiker
Post:
For example, this has two "DONE" sections: the first one is 18200 cpu seconds and the second has 30749 cpu seconds. Only the 30749 shows up. Also, why is the claimed credit so high compared to the granted? The ratio of 73.9 to 7.9 is almost an order of magnitude.
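
As a quick check on that ratio, using only the two credit figures quoted above (the variable names are just for this example):

```python
import math

claimed = 73.9  # claimed credit, as quoted in this post
granted = 7.9   # granted credit, as quoted in this post

ratio = claimed / granted
print(f"claimed / granted = {ratio:.2f}")
print(f"log10(ratio)      = {math.log10(ratio):.2f}")
```

The ratio works out to a little under 9.4, so the claimed credit is just short of ten times the granted credit.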

thanks for looking
12) Message boards : Number crunching : Minirosetta v1.32 bug thread (Message 55261)
Posted 24 Aug 2008 by BeemerBiker
Post:
Seeing segfaults periodically but all tasks seem to finish OK. How is that?

All results returned seem valid except some messages about not having a filter
http://boinc.bakerlab.org/rosetta/results.php?hostid=874157

I am running boinc 6.2.15 amd64 on ubuntu 8.04.1 and looking in the logs I see a bunch of segfaults:

jstateson@jyslinux3:/var/log$ grep -i segfault kern.log
kern.log:Aug 22 09:33:52 jyslinux3 kernel: [53889.135133] minirosetta_1.3[7041]: segfault at ff3fbff8 rip 89bd380 rsp ff3fbed8 error 6
kern.log:Aug 22 17:41:43 jyslinux3 kernel: [83133.993543] minirosetta_1.3[7372]: segfault at ff3fbff8 rip 89bd380 rsp ff3fbed8 error 6
kern.log:Aug 23 18:24:55 jyslinux3 kernel: [75871.459737] minirosetta_1.3[20077]: segfault at ff5fbff8 rip 89bd380 rsp ff5fbed8 error 6
kern.log:Aug 23 18:24:55 jyslinux3 kernel: [75871.559667] minirosetta_1.3[19621]: segfault at ff5fbff8 rip 89bd380 rsp ff5fbed8 error 6

I switched to boinc 6.2.15 from 5.15.45 after result time 21 Aug 2008 22:30:58
and those 4 segfaults occurred afterwards. Since all results were returned and all were valid I am unsure what effect the segfaults had.
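
With a longer log it can be easier to tally the segfaults than to read them one by one. This is a small Python sketch that assumes kernel log lines in the same format as the four shown above; `summarize_segfaults` and the regular expression are made up for this example, and other kernel versions may format the message differently.

```python
import re
from collections import Counter

# Matches lines like:
#   minirosetta_1.3[7041]: segfault at ff3fbff8 rip 89bd380 rsp ff3fbed8 error 6
SEGFAULT_RE = re.compile(
    r"(?P<process>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<error>\d+)"
)


def summarize_segfaults(lines):
    """Count segfaults per (process name, instruction pointer) pair."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[(match["process"], match["rip"])] += 1
    return counts


if __name__ == "__main__":
    # Log path as used in this post; adjust for other systems.
    with open("/var/log/kern.log", errors="replace") as log:
        for (process, rip), n in summarize_segfaults(log).most_common():
            print(f"{n:4d}  {process}  rip={rip}")
```

All four entries above share the same rip value (89bd380), so a tally like this would group them together, which hints that the crashes happen at the same place in the program each time.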


13) Message boards : Number crunching : any credit for confirmed computational errors? (Message 55087)
Posted 15 Aug 2008 by BeemerBiker
Post:
I noticed a computational error and when I checked the work id there was another participant that got the same error.

http://tinyurl.com/5s9zqx

Neither system was granted credit. However, when I selected the task id both participants were shown as being "granted credit" (scroll to the bottom of the detail page).

Question: If the task details for each task show "granted credit" then why is the credit not actually granted? Neither participant shows granted credit for that task in their host id page.
14) Message boards : Number crunching : vista graphics wants permission to show all the time (Message 54716)
Posted 29 Jul 2008 by BeemerBiker
Post:
I am running boinc as a service on vista home premium with sp1. Clicking on graphics causes a microsoft popup "interactive services dialog" to appear and I have to grant permission to show the graphics. The graphics work fine, but the desktop is not available and I have to close that "interactive services dialog" to get back to the desktop, and that closes the graphics display. Every few minutes the popup appears and reminds me. This is annoying so I disabled that ISD service under admintools-services but then the graphics are never shown. That popup says that the Rosetta program has a compatibility issue involving security and it would appear there is no way to bypass this problem.

Any ideas? I want to run boinc as a service, but when logging in (and running the boincmgr) I want to see the graphics.

..thanks..






©2019 University of Washington
http://www.bakerlab.org