inserting/deleting into/from the middle of large files? - Linux
inserting/deleting into/from the middle of large files?
'lo there, =)
having a DVB-S receiver running Linux (PPC) I recently found
myself wondering how to delete data from the middle of a
large file (stripping a recording of ads, for example).
Currently, the common way of doing this seems to be by
copying the file (leaving a part behind) and then deleting
the original. Of course, on a large file (say 12 GB) this
can take an eternity; also you'll run into trouble if the
filesystem is nearly full...
Is it me, or doesn't that make any sense?
Having a block-oriented filesystem, operations like this
should only take an instant.
So basically I'm looking for functions to:
- insert a chunk into a file
- delete a chunk from a file
- move a chunk from one file into another
All of the above would be very useful when dealing with
large data, such as DVB-recordings (Linux being the nr.1 OS
on those receivers, naturally:-).
Since I couldn't find any system calls providing such
functionality, I am now asking You Gurus whether I was just
too stupid to find them, or if there should indeed be a
standard (Posix?) for providing such functionality. (One
call could be to determine if the filesystem supports those
operations fast - it could return a version for instance, 0
meaning that the operations, although provided, will be
slow.)
Thank you for any help! :-)
LC (myLC@gmx.de)
-
Re: inserting/deleting into/from the middle of large files?
> how to delete data from the middle of a large file?
Overwrite the data in its current place, turning the bytes
into a "comment" or "skip this record."
> Currently, the common way of doing this seems to be by
> copying the file (leaving a part behind) and then deleting
> the original.
Yes.
> So basically I'm looking for functions to:
> - insert a chunk into a file
> - delete a chunk from a file
> - move a chunk from one file into another
Most existing file systems for Linux do not have any such functionality.
Common methods of accomplishing the task are:
1) Do it the obvious and slow way.
2) Generate smaller pieces in the first place; 'cat' them
together for ordinary usage.
3) Run a process which "serves" and "delivers" the data on demand.
Use an index, btree, etc. to skip the "deleted" parts.
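Method 2 could look roughly like the sketch below. It is only an illustration, not how any particular receiver stores its recordings: the segment naming (`rec.0000.ts`, `rec.0001.ts`, ...) and all function names are made up for the example. Dropping an ad then means unlinking whole segment files, which needs no copying, and playback concatenates whatever is left, the way 'cat' would.

```python
import os
import shutil

def list_segments(directory, prefix="rec."):
    """Return the segment files of one recording in playback order."""
    names = sorted(n for n in os.listdir(directory) if n.startswith(prefix))
    return [os.path.join(directory, n) for n in names]

def drop_segments(segments, first, last):
    """Delete segments[first..last] (inclusive); no other data is touched."""
    for path in segments[first:last + 1]:
        os.remove(path)

def concatenate(segments, out):
    """Write the given segments to `out` back to back, like 'cat'."""
    for path in segments:
        with open(path, "rb") as src:
            shutil.copyfileobj(src, out)
```

The price is granularity: a cut can only fall on a segment boundary, so smaller segments give finer cuts but more files.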
--
-
Re: inserting/deleting into/from the middle of large files?
In article <1183636719.088746.206070@w5g2000hsg.googlegroups.com>,
LC wrote:
> 'lo there, =)
>
> having a DVB-S receiver running Linux (PPC) I recently found
> myself wondering how to delete data from the middle of a
> large file (stripping a recording of ads, for example).
> Currently, the common way of doing this seems to be by
> copying the file (leaving a part behind) and then deleting
> the original. Of course, on a large file (say 12 GB) this
> can take an eternity; also you'll run into trouble if the
> filesystem is nearly full...
> Is it me, or doesn't that make any sense?
> Having a block-oriented filesystem, operations like this
> should only take an instance.
>
> So basically I'm looking for functions to:
> - insert a chunk into a file
> - delete a chunk from a file
> - move a chunk from one file into another
>
> All of the above would be very useful when dealing with
> large data, such as DVB-recordings (Linux being the nr.1 OS
> on those receivers, naturally:-).
I think you could do it by mmap()ping the file and then using memmove()
to shift the part of the file after the chunk being inserted or deleted.
However, with a 12 GB file this would only work in a 64-bit OS.
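A minimal sketch of that idea for the deletion case, written in Python, whose `mmap.move()` behaves like memmove() on the mapping. The function name is hypothetical, and the whole file is mapped at once, which is exactly why a 12 GB file needs a 64-bit address space. The data still gets copied; only the temporary second file is avoided.

```python
import mmap
import os

def delete_range_mmap(path, offset, length):
    """Delete `length` bytes at `offset` by shifting the tail down inside a mapping."""
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        if offset < 0 or length < 0 or offset + length > size:
            raise ValueError("range lies outside the file")
        tail = size - (offset + length)  # bytes that follow the deleted chunk
        if length > 0 and tail > 0:
            # Map the entire file; move() copes with overlapping regions.
            with mmap.mmap(f.fileno(), 0) as m:
                m.move(offset, offset + length, tail)
                m.flush()
        # Cut off the now-duplicated bytes at the end.
        f.truncate(size - length)
```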
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
-
Re: inserting/deleting into/from the middle of large files?
LC writes:
> 'lo there, =)
>
> having a DVB-S receiver running Linux (PPC) I recently found
> myself wondering how to delete data from the middle of a
> large file (stripping a recording of ads, for example).
> Currently, the common way of doing this seems to be by
> copying the file (leaving a part behind) and then deleting
> the original. Of course, on a large file (say 12 GB) this
> can take an eternity; also you'll run into trouble if the
> filesystem is nearly full...
> Is it me, or doesn't that make any sense?
> Having a block-oriented filesystem, operations like this
> should only take an instance.
>
> So basically I'm looking for functions to:
> - insert a chunk into a file
> - delete a chunk from a file
> - move a chunk from one file into another
>
> All of the above would be very useful when dealing with
> large data, such as DVB-recordings (Linux being the nr.1 OS
> on those receivers, naturally:-).
>
> Since I couldn't find any system calls providing such
> functionality, I am now asking You Gurus whether I was just
> to stupid to find them, or if there should indeed be a
> standard (Posix?) for providing such functionality. (One
> call could be to determine if the filesystem supports those
> operations fast - it could return a version for instance, 0
> meaning that the operations, although provided, will be
> slow.)
>
>
> Thank you for any help! :-)
You're asking that on unix newsgroups.
The philosophy of unix is to be simple.
For files, unix offers only a simple "sequence of bytes"
abstraction, and lets applications implement anything more complex
they want on top of this simple layer.
Some older OSes offered more complex types of files, like sequential
record files, indexed files, variable-length records, fixed-length
records, etc. When the main peripherals were the card reader and the
card punch, organizing files as sequences of 80-byte records looked
logical... But this kind of filesystem didn't survive, in
part because it made both the system and the applications more
complex.
But this doesn't prevent us from implementing these kinds of files in
unix if they're needed. For example, libdb (see dbopen(3)) implements
various kinds of indexed record files. With this kind of file, you
could more easily insert or remove blocks in the middle of the file.
And of course, if libdb doesn't offer the exact features you need,
there are several other such libraries, and you can implement your
own.
Of course, the problem then is to make the applications use these
libraries, to structure their files in a meaningful way.
The main problem in your question is that these DVB-S files probably
have a structure, and if this structure doesn't correspond to the
blocks of the file system, the ability to insert or move chunks
underneath won't help you. You have to know the file format.
In particular, even if the file is structured in some kind of blocks,
there's no reason why the transitions from movie to ad or from ad to
movie should fall exactly on block boundaries. And there is no reason
why the file should still be consistent after some blocks have been
removed from the middle: the file format may specify some offsets or
indices in the file, and removing some blocks would invalidate these
offsets, rendering the whole file unusable.
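To make the boundary problem concrete, here is a small sketch that computes which part of an arbitrary byte range consists of whole filesystem blocks. The 4096-byte block size and the function name are assumptions for illustration; real block sizes vary by filesystem.

```python
def whole_blocks_inside(offset, length, block_size=4096):
    """Return (start, end) of the largest block-aligned span inside the
    byte range [offset, offset + length), or None if no whole block fits."""
    end = offset + length
    first = -(-offset // block_size) * block_size  # round offset up to a boundary
    last = (end // block_size) * block_size        # round end down to a boundary
    if first >= last:
        return None
    return first, last
```

For a 10000-byte ad starting at byte 5000, `whole_blocks_inside(5000, 10000)` gives `(8192, 12288)`: only one 4096-byte block could be dropped as a block, while 3192 bytes before it and 2712 bytes after it sit in blocks shared with data that has to stay.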
--
__Pascal Bourguignon__ http://www.informatimago.com/
NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.
-
Re: inserting/deleting into/from the middle of large files?
>having a DVB-S receiver running Linux (PPC) I recently found
>myself wondering how to delete data from the middle of a
>large file (stripping a recording of ads, for example).
>Currently, the common way of doing this seems to be by
>copying the file (leaving a part behind) and then deleting
>the original. Of course, on a large file (say 12 GB) this
>can take an eternity; also you'll run into trouble if the
>filesystem is nearly full...
>Is it me, or doesn't that make any sense?
>Having a block-oriented filesystem, operations like this
>should only take an instance.
>
>So basically I'm looking for functions to:
>- insert a chunk into a file
>- delete a chunk from a file
>- move a chunk from one file into another
These are easy to do if you're willing to accept that they are slow.
insert a chunk of size N into a file:
Copy everything from the insertion point to the end of the file to a
position N bytes further on (work from the end of the file backwards, so
that the overlapping copy doesn't destroy data that hasn't been moved
yet). Then write the new data starting at the insertion point.
delete a chunk of size N from a file:
Copy everything from the first byte to be kept after the deleted segment,
up to the end of the file, down to the start of the deleted segment (work
from the front forwards, again to avoid destructive overlap). Then
ftruncate() N bytes off the end of the file.
move a chunk from one file into another:
I think this is an insertion followed by a deletion.
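The two recipes above can be sketched as follows. This is a minimal illustration in Python using os.pread()/os.pwrite() (Unix only); the function names and the 1 MiB default buffer are my own choices, not anything from a standard API. Both functions still copy the whole tail of the file, so they are exactly as slow as described.

```python
import os

def _pread_exact(fd, count, pos):
    """Read exactly `count` bytes at `pos`, or raise if the file ends early."""
    parts = []
    while count > 0:
        block = os.pread(fd, count, pos)
        if not block:
            raise EOFError("file is shorter than expected")
        parts.append(block)
        pos += len(block)
        count -= len(block)
    return b"".join(parts)

def _pwrite_all(fd, data, pos):
    """Write all of `data` at `pos`, retrying after short writes."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, pos)
        view = view[written:]
        pos += written

def delete_range(path, offset, length, bufsize=1 << 20):
    """Remove `length` bytes at `offset` by shifting the tail towards the front."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        if offset < 0 or length < 0 or offset + length > size:
            raise ValueError("range lies outside the file")
        src, dst = offset + length, offset
        # Front to back: dst never passes src, so unread data is not overwritten.
        while src < size:
            block = _pread_exact(fd, min(bufsize, size - src), src)
            _pwrite_all(fd, block, dst)
            src += len(block)
            dst += len(block)
        os.ftruncate(fd, size - length)
    finally:
        os.close(fd)

def insert_range(path, offset, data, bufsize=1 << 20):
    """Insert `data` at `offset` by shifting the tail towards the end."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        if not 0 <= offset <= size:
            raise ValueError("offset lies outside the file")
        end = size  # end of the part of the tail that has not been moved yet
        # Back to front: each block is moved up before anything lands on it.
        while end > offset:
            chunk = min(bufsize, end - offset)
            block = _pread_exact(fd, chunk, end - chunk)
            _pwrite_all(fd, block, end - chunk + len(data))
            end -= chunk
        _pwrite_all(fd, data, offset)
    finally:
        os.close(fd)
```

Neither function is safe against a crash or power failure halfway through; the file is left partly shifted in that case.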
-
Re: inserting/deleting into/from the middle of large files?
On 5 jul, 13:58, LC wrote:
> 'lo there, =)
>
> having a DVB-S receiver running Linux (PPC) I recently found
> myself wondering how to delete data from the middle of a
> large file (stripping a recording of ads, for example).
When removing _multiple_ sections from a video recording,
the video editor just makes a list with the start and end points of these
sections.
In preview, the list is executed (playback follows the sections
specified).
As there can be _many_ sections, this has some huge advantages.
There are other issues, such as finding the exact MPEG-2 frame boundary
when selecting a splice point; this is transparent to the user.
A typical example that works this way is 'lve' (Linux Video Editor);
it just creates an edit list.
Only when all edits have been done (fast in a GUI) is the actual
final output file created, made up of all the pieces you selected.
So your problem is no problem.
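An edit list of this kind can be sketched in a few lines. This is not how 'lve' is implemented; the names are hypothetical, and the cuts are plain byte offsets, whereas a real editor would snap them to MPEG-2 frame boundaries. Editing only changes the list; the expensive copy happens once, in render().

```python
import os

def kept_ranges(file_size, cuts):
    """Turn (start, end) byte ranges to cut into the (start, end) ranges to keep."""
    keep, pos = [], 0
    for start, end in sorted(cuts):
        if start > pos:
            keep.append((pos, start))
        pos = max(pos, end)  # also handles overlapping cuts
    if pos < file_size:
        keep.append((pos, file_size))
    return keep

def render(src_path, cuts, out, bufsize=1 << 20):
    """Copy only the kept ranges of src_path to the binary stream `out`."""
    with open(src_path, "rb") as src:
        size = os.fstat(src.fileno()).st_size
        for start, end in kept_ranges(size, cuts):
            src.seek(start)
            remaining = end - start
            while remaining > 0:
                block = src.read(min(bufsize, remaining))
                if not block:
                    break
                out.write(block)
                remaining -= len(block)
```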
-
Re: inserting/deleting into/from the middle of large files?
>>... how to delete data from the middle of a large file ...
>>So basically I'm looking for functions to:
>>- insert a chunk into a file
>>- delete a chunk from a file
>>- move a chunk from one file into another
> I think you could do it by mmap()ping the file and then using memmove()
> to shift the part of the file after the chunk being inserted or deleted.
> However, with a 12 GB file this would only work in a 64-bit OS.
Using mmap + memmove [+ truncate] does save space in the filesystem.
However: there is added CPU and memory time (fetch+store, cache misses,
page faults) to perform the memmove(), the disk transfer burden is no less
than a series of 'cp' or 'dd' commands, and using memmove is much more
fragile in the face of power failure. If you avoid a journaling file
system, then current commodity SATA drives can deliver 30 to 60 MB/s,
so 12 GB in plus 12 GB out is around 8 to 15 minutes. This might be
less than watching the commercials once. ;-)
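The estimate can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 GB = 1000 MB) and that reading and writing share the quoted throughput one after the other; the function name is made up.

```python
def copy_minutes(size_gb, mb_per_s):
    """Minutes needed to read and rewrite size_gb gigabytes at mb_per_s."""
    total_mb = 2 * size_gb * 1000  # every byte is read once and written once
    return total_mb / mb_per_s / 60

for rate in (30, 60):
    print(rate, "MB/s:", round(copy_minutes(12, rate), 1), "minutes")
```

Under these assumptions the result is about 13.3 minutes at 30 MB/s and 6.7 minutes at 60 MB/s, the same ballpark as the figures quoted above.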
--
-
Re: inserting/deleting into/from the middle of large files?
Barry Margolin wrote:
> I think you could do it by mmap()ping the file and then
> using memmove() to shift the part of the file after the
> chunk being inserted or deleted. However, with a 12 GB file
> this would only work in a 64-bit OS.
Yes, but I doubt the OS or rather the FS will do anything
other than the usual copy operation. You cannot remove
something from the middle or insert into it this way (not
without copying).
---
Pascal Bourguignon wrote:
> ... For example, libdb (see dbopen(3)) implements various
> kinds of indexed record files. With this kind of files, you
> could more easily insert or remove blocs in the middle of
> the file. And of course, if libdb doesn't offer the exact
> features you need, there are several other such libraries,
> and you can implement your own.
>
> Of course, now the problem is to make the applications use
> these libraries, to structure their files in a meaningfull
> way.
Exactly. Those receivers usually have only limited
processing power. Since they do many things at once, it helps
a great deal to simply pass the MPEG2 data stream coming
from the satellite/cable straight on to the hard disk. Anyhow,
one would have to modify the entire system and the data format.
All programs relying on the current data format would therefore
cease to function...
> The main problem in your question is that these DVB-S files
> have probably a structure, and if this structure doesn't
> correspond to the blocks of the file system, it won't serve
> you to have the ability to insert or move chunks from under.
> You have to know the file format.
>
> In particular, even if the file is structured in some kind
> of blocks, there's no reason why the transitions from movie
> to ad or from ad to movie fall exactly on block frontiers.
> And there is no reason why the file should be still
> consistent after having removed some blocks in the middle:
> the file format may specify some offsets or indices in the
> file, and removing some blocs would invalidate these offsets
> rendering the file unusable in whole.
In the DVB case, it's a stream. Being able to "operate" on a
block-basis would already help.
The filesystems, however, also have means of dealing with
files smaller than the actual block-size. I doubt that they
currently have the means to have blocks with smaller content
IN BETWEEN the chains. However, I cannot see why implementing
it shouldn't be possible. If done, one could use the
functionality transparently - regardless of filetype.
Many applications could benefit from this...
pantel...@yahoo.com wrote:
> Wen removing _multiple_ sections from a video recording, the
> video editor just makes a list with start and endpoints of
> these sections.
> ... Only when all edits have been done (fast
> in a GUI) is the actual final output file created made up of
> all the pieces you selected. So your problem is no problem.
Yes, I'm very much aware of that. The actual problem is
nevertheless the "final stage". In case of my receiver there
is a script performing those operations. It is supposed to
be added to cron and run at nighttime (say 4 o'clock) as the
finalizing part can take several hours of copying on a slow
box. If there were support by the FS the same job could be
done in a few (milli)seconds. That is the problem... ;-)
Knowing what I know now (i.e., the functionality isn't there
yet) - thanks to you folks! - my question is probably better
placed in a group dealing with the actual implementation of
a filesystem such as ext3.
Again, thanks for your help! =)
Regards,
LC (myLC@gmx.de)
-
Re: inserting/deleting into/from the middle of large files?
On a sunny day (Fri, 06 Jul 2007 11:38:02 -0700) it happened LC
wrote in <1183747082.651425.12440@n60g2000hse.googlegroups.com>:
>pantel...@yahoo.com wrote:
>
>> Wen removing _multiple_ sections from a video recording, the
>> video editor just makes a list with start and endpoints of
>> these sections.
>> ... Only when all edits have been done (fast
>> in a GUI) is the actual final output file created made up of
>> all the pieces you selected. So your problem is no problem.
>
>Yes, I'm very much aware of that. The actual problem is
>nevertheless the "final stage". In case of my receiver there
>is a script performing those operations. It is supposed to
>be added to cron and run at nighttime (say 4 o'clock) as the
>finalizing part can take several hours of copying on a slow
>box. If there were support by the FS the same job could be
>done in a few (milli)seconds. That is the problem... ;-)
Well I dunno, I have been recording digital sat DVB-S for many
years, starting on a AMD K6, with a SkyStar1 PCI card with hardware
mpeg2 decoder....
DVB-S TV is about 2 GB max per hour, sometimes much less.
'The actual problem' is that you first need to understand the transport
stream format, then its contents (MP2 sound, AC3 sound, MPEG-2 video),
and how to cut those streams (sound does not run at the same pace as video).
Even on something as ancient as a K6, 'copying' takes just some minutes;
it only depends on the hard disk speed.
From your posting I do not get the impression that you work with HD material
(about 10 GB/hour).
There is something else about removing ads too.
I have stopped doing it, because while editing I saw those ads many, many times
over, much more than when just fast-forwarding the movie.
There are always issues with sound/video sync too when editing, so better
forget about it and just use fast-forward in playback.
I would leave the filesystems in one piece, they are really good.
These days I go even one step further, if I have a movie I want to keep,
say 2 hours or 4 GB, I just grab a DVD+R, and do this with the recorded .ts transport
stream:
growisofs -speed 16 -Z /dev/dvd=my_recording.ts
So now the DVD is an _image_ of the recording.
No authoring, no filesystem limits, no filesystem!!!, and play back like this:
cat /dev/dvd | mplayer -ao alsa:device=hw=1,0 -fs -cache 8192 -vop pp=0x20000 -
Cannot beat this for speed and reliability and efficiency, as it allows
4 700 000 000 bytes on a DVD, and no filesystem overhead.
Those who want to sing about wrong use of cat please do it in the bathroom.
EL Pante
By using above methods you agree to the small print.
-
Re: inserting/deleting into/from the middle of large files?
On a sunny day (Fri, 06 Jul 2007 11:38:02 -0700) it happened LC
wrote in <1183747082.651425.12440@n60g2000hse.googlegroups.com>:
---- replay previous text -------
I would leave the filesystems in one piece, they are really good.
These days I go even one step further, if I have a movie I want to keep,
say 2 hours or 4 GB, I just grab a DVD+R, and do this with the recorded .ts transport
stream:
growisofs -speed 16 -Z /dev/dvd=my_recording.ts
So now the DVD is an _image_ of the recording.
No authoring, no file system limits, no filesystem!!!, and play back like this:
cat /dev/dvd | mplayer -ao alsa:device=hw=1,0 -fs -cache 8192 -vop pp=0x20000 -
Cannot beat this for speed and reliability and efficiency, as it allows
4 700 000 000 bytes on a DVD, and no filesystem overhead.
Those who want to sing about wrong use of cat please do it in the bathroom.
---- end previous text --------
As a side note: why use this construct?
Now suppose you have the .ts recording running all night long, from 20:00 to 05:00.
Just to get all movies and check them out later: 9 hours is about 18 GB.
This does not fit on a DVD+R, so check quickly whereabouts the good stuff starts with
xine recording.ts
This gives you a time in minutes.
But now how to extract and burn the right stuff to DVD?
Say we have 1.8 GB / hour; then we have 0.9 GB / 30 minutes, or 90 MB / 3 minutes, i.e. 30 MB / minute.
So now we can test where the good part starts (the end is less important):
dd if=recording.ts bs=30000000 skip=MINUTES | mplayer -ao alsa:device=hw=0,0 -fs -cache 8192 -vop pp=0x20000 -
Just take a guess, and use successive approximation (say 10 tries) to quickly find
the exact start, then burn the stuff to DVD:
dd if=recording.ts bs=30000000 skip=START_MINUTES | growisofs -speed 16 -Z /dev/dvd=/dev/stdin
It will stop when the DVD is full....
You can get a finer granularity by reducing the 30000000 (and scaling the skip count to match).
So now we can write a simple script.....
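Such a script might look like the sketch below, here in Python rather than shell. It does the same job as the dd command above: skip to a start minute using the 30 MB/minute estimate, then copy until the source ends or the DVD capacity quoted earlier is reached. The function name and parameters are hypothetical, and the output stream stands in for whatever feeds the burner.

```python
DVD_CAPACITY = 4_700_000_000   # bytes, the figure quoted in the earlier post
BYTES_PER_MINUTE = 30_000_000  # 1.8 GB/hour, the estimate used above

def copy_from_minute(src_path, out, start_minute, limit=DVD_CAPACITY,
                     bytes_per_minute=BYTES_PER_MINUTE, bufsize=1 << 20):
    """Copy up to `limit` bytes of src_path to `out`, starting
    `start_minute` minutes into the recording. Returns the bytes copied."""
    copied = 0
    with open(src_path, "rb") as src:
        src.seek(start_minute * bytes_per_minute)
        while copied < limit:
            block = src.read(min(bufsize, limit - copied))
            if not block:
                break  # end of the recording
            out.write(block)
            copied += len(block)
    return copied
```

Since the bitrate of a real recording varies, the minute-to-offset mapping is only approximate, which is why the start still has to be found by trial and error.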