rosetta python projects (vbox64)

Profile Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,725,166
RAC: 1,161
Message 102754 - Posted: 20 Sep 2021, 10:57:45 UTC - in response to Message 102753.  

ATLAS has been up since 2014 and they have been doing VM for how long?
You can't compare RAH and these guys. The difference is too great.
RAH is an infant when it comes to VM.


Ok, it's no problem if they are "an infant" in the VM field.
But, after 6 days, they could give us a sign of life.
Today, still a download error.



My Italian friend, have you not read what has been posted here just a little bit ago?
We will be lucky if SID gets through to the team.
If so, it is now just Monday approaching 4am (- 9 hrs from here) so the earliest someone might see his email is 5-6 hours from now. And then to delegate it to the right person, at least the rest of the day if at all.

If you read my post and SID's post, we both stated that the project has very little monitoring from the lab.
DEK was a tech person who moved on years ago and watched things here in the forum.
We then had MOD. SENSE who was an interface to the team, but he has moved on.
These were probably Graduate Students who completed their research and left the project.
Since MOD. SENSE moved on we don't have anyone that monitors the forums for the project.
The only time something is done about a bug is when the results come back with tons of errors, then someone fixes it without saying anything and the project goes on.

As I have suggested before, the best thing to do is just stop the project for now and wait until the bug is fixed or go research how to write a command to isolate Python and keep doing 4.20 work.
Those are the only options.

So for now, sit back and wait and watch this area for more details. But there is nothing we can do for now. This project's lab is a Monday-through-Friday operation. No one reads emails on the weekend.

I understand your frustration, but again, there is nothing we as users/volunteers, as they call us, can do to solve your problem. We have told you everything we can find on the web and some have told you more advanced things. Since nothing works, you're out of luck for a few days or the rest of the week or until the scheduler at the University gives up on your machine and sends you only the 4.20 work instead. There is nothing you can change in your preferences either. You get what you get. That's it.

Sorry my friend, but that is just the way it goes here.

Profile [VENETO] boboviz
Joined: 1 Dec 05
Posts: 1893
Credit: 8,523,115
RAC: 10,990
Message 102758 - Posted: 20 Sep 2021, 12:25:14 UTC - in response to Message 102754.  

My Italian friend, have you not read what has been posted here just a little bit ago?

Yes, I read it :-P


So for now, sit back and wait and watch this area for more details. But there is nothing we can do for now. This project's lab is a Monday-through-Friday operation. No one reads emails on the weekend.

In the meantime I'm crunching Tn-Grid.


I understand your frustration, but again, there is nothing we as users/volunteers, as they call us, can do to solve your problem. We have told you everything we can find on the web and some have told you more advanced things. Since nothing works, you're out of luck for a few days or the rest of the week or until the scheduler at the University gives up on your machine and sends you only the 4.20 work instead. There is nothing you can change in your preferences either. You get what you get. That's it.

Sorry my friend, but that is just the way it goes here.

I know, I know.
You and SID are great!
(And I'm crunching this project on a notebook that doesn't download the VM tasks, only 4.20.)

Profile dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 117,614,181
RAC: 43,587
Message 102759 - Posted: 20 Sep 2021, 17:24:55 UTC
Last modified: 20 Sep 2021, 17:25:46 UTC

Does anyone know/have any idea what is causing the MD5 checksum error?

Some of my machines are running VBox tasks without issue, and others are racking up hundreds of gigabytes of failed downloads. The Bakerlab servers must have pushed out petabytes of failed tasks over the weekend by now!

Is it that a bad MD5 hash was downloaded early on, and is being repeatedly used? Or is there a file that contains a bad version of the MD5 algorithm that needs deleting and re-downloading? If it were just a server-side issue then surely all PCs would either download the tasks or would all fail?

Profile dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 117,614,181
RAC: 43,587
Message 102760 - Posted: 20 Sep 2021, 17:38:34 UTC
Last modified: 20 Sep 2021, 17:42:00 UTC

Ah, so this is in my log on a machine that fails to download the vdi file:

20/09/2021 18:23:27 |  | [http_xfer] [ID#7] HTTP: wrote 16384 bytes
20/09/2021 18:23:27 |  | [http_xfer] [ID#7] HTTP: wrote 16384 bytes
20/09/2021 18:23:27 |  | [http_xfer] [ID#7] HTTP: wrote 15050 bytes
20/09/2021 18:23:28 | Rosetta@home | Finished download of AIMNet_vm_v2.vdi
20/09/2021 18:23:28 |  | [statefile] set dirty: pers_file_xfer_set poll
20/09/2021 18:24:09 |  | [slot] removed file projects/boinc.bakerlab.org_rosetta/AIMNet_vm_v2.vdi.gz
20/09/2021 18:24:09 | Rosetta@home | [error] MD5 check failed for AIMNet_vm_v2.vdi
20/09/2021 18:24:09 | Rosetta@home | [error] expected d41d8cd98f00b204e9800998ecf8427e, got 61fef19456bb58ec941845ef08d8c5ef
20/09/2021 18:24:09 | Rosetta@home | [error] Checksum or signature error for AIMNet_vm_v2.vdi
20/09/2021 18:24:09 |  | [statefile] Writing state file
20/09/2021 18:24:09 |  | [statefile] Done writing state file


And the MD5 hash d41d8cd98f00b204e9800998ecf8427e is the hash for an empty file. So is the file deleted before it is hashed? Maybe by my antivirus (Avast)? Will try that next.

It is suspicious that it says it has removed the file and then that the MD5 fails... Any idea why BOINC would remove that file?
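
For anyone who wants to confirm the empty-file observation on their own machine, here is a minimal Python sketch. It only uses the standard `hashlib` module. The helper name `md5_of_file` and the temporary file are made up for illustration, and this is not how the BOINC client itself computes its checksums.

```python
import hashlib
import os
import tempfile

# MD5 of zero bytes of input; this is the "expected" value in the log above.
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hashing no data at all reproduces the "expected" value from the log.
    print(hashlib.md5(b"").hexdigest() == EMPTY_MD5)

    # Same result for an empty file on disk (hypothetical temp file, not the real .vdi).
    with tempfile.TemporaryDirectory() as tmp:
        empty_path = os.path.join(tmp, "empty.bin")
        open(empty_path, "wb").close()
        print(md5_of_file(empty_path) == EMPTY_MD5)
```

Note that in the log it is the expected value that matches the empty-input hash, while the "got" value is something else. Running `md5_of_file` on a downloaded copy would show what the client actually hashed.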

Admin
Project administrator
Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 102764 - Posted: 20 Sep 2021, 22:43:03 UTC

The MD5 hash values were indeed incorrect, possibly due to a filesystem issue on our end when the jobs were created. I've fixed the MD5 values in our database so these errors should no longer be an issue for newly issued jobs.
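
For volunteers who want to check whether a host was hit by this particular problem, the following is a rough Python sketch that scans event log lines for MD5 mismatches and flags the ones where the expected value is the empty-input hash. It assumes the log lines look like the excerpt posted earlier in this thread ("[error] expected ..., got ..."). The function name `find_md5_mismatches` is hypothetical and nothing here is part of BOINC.

```python
import re

# MD5 of empty input, as seen in the "expected" field of the failing downloads.
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"

# Assumed line shape, taken from the log excerpt in this thread.
MD5_LINE = re.compile(
    r"\[error\] expected (?P<expected>[0-9a-f]{32}), got (?P<got>[0-9a-f]{32})"
)


def find_md5_mismatches(lines):
    """Return (expected, got, expected_is_empty_hash) for each mismatch line."""
    results = []
    for line in lines:
        match = MD5_LINE.search(line)
        if match:
            expected = match.group("expected")
            got = match.group("got")
            results.append((expected, got, expected == EMPTY_MD5))
    return results


if __name__ == "__main__":
    sample = [
        "20/09/2021 18:24:09 | Rosetta@home | [error] MD5 check failed for AIMNet_vm_v2.vdi",
        "20/09/2021 18:24:09 | Rosetta@home | [error] expected "
        "d41d8cd98f00b204e9800998ecf8427e, got 61fef19456bb58ec941845ef08d8c5ef",
    ]
    for expected, got, is_empty in find_md5_mismatches(sample):
        print(expected, got, is_empty)
```

A mismatch whose expected value is the empty-input hash suggests the stored reference value was bad rather than the download, which fits the explanation above.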

Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 39,271,073
RAC: 22,970
Message 102767 - Posted: 21 Sep 2021, 0:59:19 UTC - in response to Message 102754.  

If you read my post and SID's post, we both stated that the project has very little monitoring from the lab.
DEK was a tech person who moved on years ago and watched things here in the forum.
We then had MOD. SENSE who was an interface to the team, but he has moved on.
These were probably Graduate Students who completed their research and left the project.
Since MOD. SENSE moved on we don't have anyone that monitors the forums for the project.
The only time something is done about a bug is when the results come back with tons of errors, then someone fixes it without saying anything and the project goes on.

Just to clarify, DEK is very much still around at the project, just not in the forums.
When we started on all the Covid tasks 18 months ago he posted quite a lot in the forums for a few weeks, but aiui the Project team is very small and being here killed their productivity just when stacks of results were coming back to them, so he had to back off completely.
The way I remember it, when an issue came up that needed someone's attention, Mod.Sense wrote a brief summary of the problem and where the solution may lie, and it seemed to me like he must've emailed a link to that message so the Admins didn't have to wade through pages of whining to find what the issue was.
And then Mod.Sense disappeared and that link got lost. And here we now are.

On this VBox problem, I have so little idea what people are talking about I had to send a link to the whole thread rather than a specific summary message, so I haven't been able to be as clear as I'd like.

But I now see a post from Admin just above, so if they've fixed that "MD5 hash value" that should make a big difference.
Feed back on how new downloads are going and whether tasks now run ok or not

Profile [VENETO] boboviz
Joined: 1 Dec 05
Posts: 1893
Credit: 8,523,115
RAC: 10,990
Message 102770 - Posted: 21 Sep 2021, 6:38:49 UTC - in response to Message 102767.  

But I now see a post from Admin just above, so if they've fixed that "MD5 hash value" that should make a big difference.
Feed back on how new downloads are going and whether tasks now run ok or not


It works!!
No more "download error"

Profile [VENETO] boboviz
Joined: 1 Dec 05
Posts: 1893
Credit: 8,523,115
RAC: 10,990
Message 102772 - Posted: 21 Sep 2021, 7:19:19 UTC

My first correct VM WU!!

Profile Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,725,166
RAC: 1,161
Message 102776 - Posted: 21 Sep 2021, 7:44:53 UTC - in response to Message 102772.  

My first correct VM WU!!


Congrats! Told you we just needed to find the right person.

Profile Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,725,166
RAC: 1,161
Message 102777 - Posted: 21 Sep 2021, 7:48:03 UTC - in response to Message 102767.  

If you read my post and SID's post, we both stated that the project has very little monitoring from the lab.
DEK was a tech person who moved on years ago and watched things here in the forum.
We then had MOD. SENSE who was an interface to the team, but he has moved on.
These were probably Graduate Students who completed their research and left the project.
Since MOD. SENSE moved on we don't have anyone that monitors the forums for the project.
The only time something is done about a bug is when the results come back with tons of errors, then someone fixes it without saying anything and the project goes on.

Just to clarify, DEK is very much still around at the project, just not in the forums.
When we started on all the Covid tasks 18 months ago he posted quite a lot in the forums for a few weeks, but aiui the Project team is very small and being here killed their productivity just when stacks of results were coming back to them, so he had to back off completely.
The way I remember it, when an issue came up that needed someone's attention, Mod.Sense wrote a brief summary of the problem and where the solution may lie, and it seemed to me like he must've emailed a link to that message so the Admins didn't have to wade through pages of whining to find what the issue was.
And then Mod.Sense disappeared and that link got lost. And here we now are.

On this VBox problem, I have so little idea what people are talking about I had to send a link to the whole thread rather than a specific summary message, so I haven't been able to be as clear as I'd like.

But I now see a post from Admin just above, so if they've fixed that "MD5 hash value" that should make a big difference.
Feed back on how new downloads are going and whether tasks now run ok or not


DEK is still here? This problem should have gone to him, but ADMIN seems to be more in tune with the specifics. But they should pay attention to the forums every now and then or at least monitor the results and if a huge amount comes back with errors, then look and see what the problem is.

Did they test Python on RALPH or is it just a rumor that it was dumped here with no testing?

Profile dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 117,614,181
RAC: 43,587
Message 102779 - Posted: 21 Sep 2021, 8:12:50 UTC

Why do you think it should have gone to DEK?

And yes - some vbox tasks were tested on Ralph, but I don't believe any were from this batch.

Profile [VENETO] boboviz
Joined: 1 Dec 05
Posts: 1893
Credit: 8,523,115
RAC: 10,990
Message 102780 - Posted: 21 Sep 2021, 9:36:06 UTC - in response to Message 102777.  

Did they test Python on RALPH or is it just a rumor that it was dumped here with no testing?

Not tested on Ralph.
And on Ralph the app version is 0.21, while here it is 1.03, so I don't know if it is the same app....

Profile Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,725,166
RAC: 1,161
Message 102782 - Posted: 21 Sep 2021, 10:44:07 UTC - in response to Message 102780.  
Last modified: 21 Sep 2021, 10:44:22 UTC

Did they test Python on RALPH or is it just a rumor that it was dumped here with no testing?

Not tested on Ralph.
And on Ralph the app version is 0.21, while here it is 1.03, so I don't know if it is the same app....


That's unusual

Profile [VENETO] boboviz
Joined: 1 Dec 05
Posts: 1893
Credit: 8,523,115
RAC: 10,990
Message 102785 - Posted: 21 Sep 2021, 12:31:56 UTC - in response to Message 102782.  
Last modified: 21 Sep 2021, 13:03:25 UTC

And on Ralph the app version is 0.21, while here it is 1.03, so I don't know if it is the same app....


That's unusual


Here and here you can see.
It's strange to me too, but maybe the code is the same and the only difference is the numbering.
Or maybe they are different.
The usual lack of communication about the project.

Admin
Project administrator
Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 102790 - Posted: 21 Sep 2021, 17:17:12 UTC - in response to Message 102785.  

The specific VM image was not tested on Ralph, but there is little change between them. The jobs were tested on Ralph. Unfortunately, the Ralph test would not have caught the MD5 checksum error. Coincidentally, UW IT moved some of our hardware while the big batch was submitted, and this may have caused the error. - DEK

Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 39,271,073
RAC: 22,970
Message 102791 - Posted: 21 Sep 2021, 17:28:55 UTC - in response to Message 102777.  
Last modified: 21 Sep 2021, 17:36:23 UTC

DEK is still here? This problem should have gone to him, but ADMIN seems to be more in tune with the specifics

Iirc he used to post here as dekim but just as likely he posts as Admin sometimes, because, why not?

I have no idea about testing or Ralph as I'm not using either

Edit: And there you are
Edit2: My main PC has been crashing and blue-screening for ages, so I don't even have any idea if normal tasks are going through ok atm, but I updated my BIOS and other low-level stuff last night and I may finally be stable enough to find out what the hell's been going on

Profile Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,725,166
RAC: 1,161
Message 102792 - Posted: 21 Sep 2021, 18:36:27 UTC - in response to Message 102790.  

The specific VM image was not tested on Ralph, but there is little change between them. The jobs were tested on Ralph. Unfortunately, the Ralph test would not have caught the MD5 checksum error. Coincidentally, UW IT moved some of our hardware while the big batch was submitted, and this may have caused the error. - DEK



Hi DEK, nice to see you back here on the forums!

Profile Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,725,166
RAC: 1,161
Message 102793 - Posted: 21 Sep 2021, 18:38:30 UTC - in response to Message 102791.  

DEK is still here? This problem should have gone to him, but ADMIN seems to be more in tune with the specifics

Iirc he used to post here as dekim but just as likely he posts as Admin sometimes, because, why not?

I have no idea about testing or Ralph as I'm not using either

Edit: And there you are
Edit2: My main PC has been crashing and blue-screening for ages, so I don't even have any idea if normal tasks are going through ok atm, but I updated my BIOS and other low-level stuff last night and I may finally be stable enough to find out what the hell's been going on


Have you looked for chipset updates and run a Windows cleaner lately to clear the registry and the drive?
Have you done CHKDSK at all?

Just a few more things to check.

Profile [VENETO] boboviz
Joined: 1 Dec 05
Posts: 1893
Credit: 8,523,115
RAC: 10,990
Message 102794 - Posted: 21 Sep 2021, 19:14:33 UTC - in response to Message 102790.  

The specific VM image was not tested on Ralph, but there is little change between them. The jobs were tested on Ralph. Unfortunately, the Ralph test would not have caught the MD5 checksum error. Coincidentally, UW IT moved some of our hardware while the big batch was submitted, and this may have caused the error. - DEK


Thanks for the information!

P.S.
Maybe some info about WHAT we are crunching with this new app....

Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 39,271,073
RAC: 22,970
Message 102797 - Posted: 22 Sep 2021, 0:58:22 UTC - in response to Message 102793.  

Edit2: My main PC has been crashing and blue-screening for ages, so I don't even have any idea if normal tasks are going through ok atm, but I updated my BIOS and other low-level stuff last night and I may finally be stable enough to find out what the hell's been going on

Have you looked for chipset updates and run a Windows cleaner lately to clear the registry and the drive?
Have you done CHKDSK at all?

Just a few more things to check.

It's the relatively new Ryzen 5800X I got last December.
I updated everything in March with new chipset drivers, which haven't had a new version issued since, but there have been lots of new BIOS versions with stability and performance fixes, so it seems like I'm not alone with issues.
Updated VGA & Audio drivers too while I was there and it's looking good in the first 24hrs. Running a touch faster, a lot cooler and no crashes, which I was experiencing daily if not more often.
I'll give it until the weekend as I'll be away again from tomorrow until Sunday - fingers crossed.



©2024 University of Washington
https://www.bakerlab.org