If I want to validate data copy operations using commands like cmp, dircmp,
etc, how do I know I'm actually comparing data on disk, given that the data
might be in a buffer or disk cache?
sinister wrote:
> If I want to validate data copy operations using commands like cmp, dircmp,
> etc, how do I know I'm actually comparing data on disk, given that the data
> might be in a buffer or disk cache?
Run sync first - just an idea, I don't know it is necessarily right.
Although that might cause the OS to write the data to the "disk", I'm
not sure how you can be certain the data is not buffered on the disk
hardware, entirely independent of the operating system.
Interesting question.
--
Dave K
http://www.southminster-branch-line.org.uk/
Please note my email address changes periodically to avoid spam.
It is always of the form: month-year@domain. Hitting reply will work
for a couple of months only. Later set it manually. The month is
always written in 3 letters (e.g. Jan, not January etc)
sinister wrote:
>If I want to validate data copy operations using commands like cmp, dircmp,
>etc, how do I know I'm actually comparing data on disk, given that the data
>might be in a buffer or disk cache?
Well, you could follow your copy operation with a couple of "sync" commands.
If you are in that much doubt, however, you might want to consider using
an operating system like OpenVMS that actually commits data to disk
before reporting the I/O as complete; it may be slower but there's no
doubt about whether or not your data is on disk!
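On Unix-like systems, the closest per-file equivalent of "commit before reporting completion" is probably to call fsync() on the destination before treating the copy as done. The following is only a minimal sketch under that assumption; the function name copy_and_fsync is hypothetical, and, as noted elsewhere in this thread, fsync() only covers what the OS controls, not any write cache on the drive itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst and return 0 only after fsync()
 * on the destination has succeeded. Returns -1 on any error. */
int copy_and_fsync(const char *src, const char *dst)
{
    char buf[65536];
    ssize_t n = 0;
    int in, out, rc = -1;

    assert(src != NULL && dst != NULL);

    if ((in = open(src, O_RDONLY)) == -1)
        return -1;
    if ((out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                   /* write() may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w == -1)
                goto done;
            off += w;
        }
    }
    if (n == -1)                            /* read error */
        goto done;

    /* Ask the OS to push this file's data and metadata to the device. */
    if (fsync(out) == 0)
        rc = 0;

done:
    close(in);
    if (close(out) == -1)
        rc = -1;
    return rc;
}
```

Unlike the sync command, which schedules a flush of everything, this waits for one specific file, so a zero return is a somewhat stronger statement about that file.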
In article <NIidnVubXOnFQDvenZ2dnUVZ_tmdnZ2d@comcast.com>, Richard B. Gilbert <rgilbert88@comcast.net> wrote:
> sinister wrote:
>>If I want to validate data copy operations using commands like cmp, dircmp,
>>etc, how do I know I'm actually comparing data on disk, given that the data
>>might be in a buffer or disk cache?
> Well, you could follow your copy operation with a couple of "sync" commands.
>
> If you are in that much doubt, however, you might want to consider using
> an operating system like OpenVMS that actually commits data to disk
> before reporting the I/O as complete; it may be slower but there's no
> doubt about whether or not your data is on disk!
As high an opinion as I have of OpenVMS's default setting, I should
point out that it's only half the story.
That approach only tells you the data is on disk, not whether it was
written out correctly.
There was a recent thread on that particular point, and I eventually
conceded the good points raised there: ultimately, once the data has been
written out to disk, one has to trust (essentially just hope) that it is
the exact same data.
(Places where data can get corrupted: in memory prior to the write or
cabling problems.)
The problem is that if hardware issues are causing silent corruption,
how do you reliably detect it (aside from admin monitoring of
/var/adm/messages or fmdump), given that any data comparison would
involve the same busted hardware components?
Hence, there isn't really much you can do in that particular situation
(from the perspective of the writer), except to cross fingers and just
really hope data was correctly written out and matches the source data. :)
ZFS, possibly in Solaris 10 Update 2, alleviates that to some degree
with its checksummed data, as does careful monitoring of fmdump in
Solaris 10 or /var/adm/messages in earlier OS releases for ECC errors or
signs of cabling issues.
-Dan
Dan Foster wrote:
>In article <NIidnVubXOnFQDvenZ2dnUVZ_tmdnZ2d@comcast.com>, Richard B. Gilbert <rgilbert88@comcast.net> wrote:
>
>>sinister wrote:
>>
>>>If I want to validate data copy operations using commands like cmp, dircmp,
>>>etc, how do I know I'm actually comparing data on disk, given that the data
>>>might be in a buffer or disk cache?
>>
>>Well, you could follow your copy operation with a couple of "sync" commands.
>>
>>If you are in that much doubt, however, you might want to consider using
>>an operating system like OpenVMS that actually commits data to disk
>>before reporting the I/O as complete; it may be slower but there's no
>>doubt about whether or not your data is on disk!
>
>As high of an opinion of OpenVMS's default setting as I have, I should
>point out that it's only half the story, though.
>
>That approach only says it's on disk, but not if it's correctly written out.
But the OP merely wanted to be certain that the data he was comparing
was being read back from disk rather than from a buffer in memory.
In article <43a6c604@212.67.96.135>,
Dave <INVALID-see-signature-for-how-to-determine@southminster-branch-line.org.uk> writes:
> sinister wrote:
>> If I want to validate data copy operations using commands like cmp, dircmp,
>> etc, how do I know I'm actually comparing data on disk, given that the data
>> might be in a buffer or disk cache?
>
> Run sync first - just an idea, I don't know it is necessarily right.
>
> Although that might cause the OS to write the data to the "disk", I'm
> not sure how you can be certain the data is not buffered on the disk
> hardware, entirely independent of the operating system.
>
> Interesting question.
Here's something I've used sometimes if I wanted to know that data was
being read from disk rather than from cached copies; "memtool"
(playground.sun.com, I think) seems to confirm that it does purge it.
(at least on Solaris 9 on UltraSPARC...)
As to whether it works on any other platforms, it really depends on
whether or not MADV_DONTNEED is available and works similarly.
/*
 * freemap.c
 *
 * If nothing else is using a file that happens to be cached in memory,
 * this should cause it to be freed entirely rather than merely left on
 * the freelist to be reclaimed.
 *
 */

#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <unistd.h>

/* Like perror(), but with up to two extra "prefix: " strings in front. */
void perror3(const char *s1, const char *s2, const char *s3)
{
    static const char colon_space[]={ ':',' '};

    if (s1!=NULL && *s1!='\0') {
        write(2,s1,strlen(s1));
        write(2,colon_space,sizeof colon_space);
    }
    if (s2!=NULL && *s2!='\0') {
        write(2,s2,strlen(s2));
        write(2,colon_space,sizeof colon_space);
    }
    perror(s3);
}

int main(int argc, char **argv)
{
    int fd, x;
    struct stat s;
    caddr_t rval;

    /* Each argument is a file whose cached pages should be released. */
    for (x=1;x<argc;x++) {
        if ((fd=open(argv[x],O_RDONLY)) == -1) {
            perror3(argv[0],argv[x],"can't open file");
            continue;
        }
        if (fstat(fd,&s) == -1) {
            perror3(argv[0],argv[x],"can't obtain file attributes");
            close(fd);
            continue;
        }
        if (!S_ISREG(s.st_mode)) {
            fprintf(stderr, "%s: %s: not a regular file\n", argv[0],argv[x]);
            close(fd);
            continue;
        }
        /* Empty files have nothing to map (and mmap of length 0 fails). */
        if (s.st_size>0) {
            if ((rval=mmap(NULL,s.st_size,PROT_READ,MAP_SHARED,fd,(off_t)0))
                    == MAP_FAILED) {
                perror3(argv[0],argv[x],"can't map file into virtual memory");
                close(fd);
                continue;
            }
            else {
                /* The mapping stays valid after the descriptor is closed. */
                close(fd);
                /* Tell the VM system the pages are no longer needed. */
                if (madvise(rval,s.st_size,MADV_DONTNEED) == -1)
                    perror3(argv[0],argv[x],"madvise(MADV_DONTNEED) failed");
                munmap(rval,s.st_size);
            }
        }
        else
            close(fd);
    }
    return 0;
}
--
mailto:rlhamil@smart.net http://www.smart.net/~rlhamil
Lasik/PRK theme music:
"In the Hall of the Mountain King", from "Peer Gynt"
sinister wrote:
> If I want to validate data copy operations using commands like cmp, dircmp,
> etc, how do I know I'm actually comparing data on disk, given that the data
> might be in a buffer or disk cache?
It's not documented that it does this, but my own experience leads me to
believe that "lockfs -fa" will invalidate the cache. I'm not sure of this,
but based on performance, it seems that lots of things have to be loaded
after a "lockfs -fa" that would've been in cache otherwise.
By the way, I'm not really sure that "sync" will be sufficient. It writes
everything to disk, but that does not ensure that the cached stuff in
RAM is invalidated. And if it's not invalidated, then when you go to
read it again, you will be reading it from cache rather than disk, so
you aren't verifying that the data can really be read from disk.
- Logan
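The two concerns raised in this thread (getting the data written out, and then not reading it back from RAM) could be combined roughly as below. This is a sketch only, and it swaps in a different call from the ones discussed above: posix_fadvise() with POSIX_FADV_DONTNEED, which is part of POSIX but purely advisory, so a system may ignore it or implement it as a no-op. The function name flush_and_drop_cache is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L     /* for posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write back any dirty pages of `path`, then advise
 * the kernel that its cached pages are not needed. Returns 0 if both
 * calls succeed, -1 otherwise. Success does NOT prove the pages were
 * actually evicted; the advice is only a hint. */
int flush_and_drop_cache(const char *path)
{
    int fd, rc;

    assert(path != NULL);

    if ((fd = open(path, O_RDONLY)) == -1)
        return -1;

    /* Dirty pages generally cannot be dropped, so flush first.
     * Assumption: fsync() is accepted on a read-only descriptor, which
     * holds on Linux but may not hold everywhere. */
    if (fsync(fd) == -1) {
        close(fd);
        return -1;
    }

    /* Offset 0 with length 0 means "the whole file". Note that
     * posix_fadvise() returns an error number, not -1 with errno. */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    return rc == 0 ? 0 : -1;
}
```

One would call this on both the source and the copy and then run cmp on them. Whether the subsequent reads really come from the disk still depends on the platform, and none of this touches a cache inside the drive hardware.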