Disk IO caching?
How is disk caching done in most OSs?
I have this idea, but it seems so simple that I'd expect every OS to have some form of it.
Anyway:
Disk reads are given top priority. If a process requests to read a file, the read happens right then and there, as long as another process isn't also reading a file (in which case the request is just put on a queue).
If a process requests to write a file, the write goes into a memory cache. When the disk has some free time (as in, no process is trying to read from it), the cache is synced to the disk. This also means that reading from a freshly written file is very fast, because it is still in the cache.
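A minimal sketch of the write-back scheme described above, in Python for brevity. The `WriteBackCache` class, the dict standing in for the disk, and the `max_dirty` limit are all assumptions made for illustration, not how any particular OS does it.

```python
class WriteBackCache:
    """Writes land in memory first; the disk is updated later."""

    def __init__(self, disk, max_dirty=4):
        self.disk = disk            # hypothetical backing store: block number -> bytes
        self.cache = {}             # block number -> cached data
        self.dirty = set()          # blocks changed in memory but not yet on disk
        self.max_dirty = max_dirty  # assumed limit that forces a flush

    def read(self, block):
        if block not in self.cache:          # miss: go to the disk right away
            self.cache[block] = self.disk[block]
        return self.cache[block]

    def write(self, block, data):
        self.cache[block] = data             # fast path: memory only
        self.dirty.add(block)
        if len(self.dirty) >= self.max_dirty:
            self.flush()                     # cannot wait for idle time forever

    def flush(self):
        """Call when the disk is idle, or when too much dirty data piles up."""
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()


disk = {0: b"old"}
cache = WriteBackCache(disk)
cache.write(0, b"new")   # the disk still holds b"old" until flush() runs
```

A read of block 0 now returns the new data from memory, while the disk copy only changes once `flush()` is called.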
-
- Member
- Posts: 2566
- Joined: Sun Jan 14, 2007 9:15 pm
- Libera.chat IRC: miselin
- Location: Sydney, Australia (I come from a land down under!)
- Contact:
Re: Disk IO caching?
Quote: If a process requests to read a file, it happens right then and there as long as another process isn't also reading a file(in which it'd just have to be put on a queue)
You should be able to do multiple reads at once - there's nothing in the filesystem or disk drive stopping you.
Quote: If a file requests to write a file, then that gets put on a memory cache. When there is some free time for the disk(as in, no process trying to read from it) the cache is synched to the disk. Also this works so that when reading from a freshly written file, it is very fast as it is cached.
I'm personally a fan of reading into cache (where possible, you'll need to be able to dynamically resize your cache area if you run out of RAM) and then all future reads go from that cache, and then as soon as a write is performed it not only updates the cache but also writes to disk.
This means that a write will *always* go to disk, but a read can come from cache.
There are of course other, faster, ways to go about caching but this is probably the easiest to get your head around and requires minimal effort to implement.
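For comparison, here is a rough sketch of that write-through approach. `WriteThroughCache` and the dict used as a stand-in disk are made-up names for illustration only.

```python
class WriteThroughCache:
    def __init__(self, disk):
        self.disk = disk    # hypothetical backing store: block number -> bytes
        self.cache = {}     # block number -> cached data

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.disk[block]   # first read fills the cache
        return self.cache[block]

    def write(self, block, data):
        self.cache[block] = data    # keep the cache current
        self.disk[block] = data     # and always write to disk immediately


disk = {7: b"a"}
cache = WriteThroughCache(disk)
cache.write(7, b"b")    # cache and disk are both updated by this call
```

There is no dirty-block tracking at all here, which is what makes this variant easy to implement: the disk is never behind the cache.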
- salil_bhagurkar
- Member
- Posts: 261
- Joined: Mon Feb 19, 2007 10:40 am
- Location: India
Re: Disk IO caching?
Write caches are maintained until the system finds enough writes to a physically localised area of the hard disk that they can be done in the shortest possible time, with as few time-consuming disk head seeks as possible. Waiting until the system goes idle may not always work, as you don't always have enough RAM to store all those writes.
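One way to picture "enough writes to a localised space" is to group the pending block numbers into contiguous runs. This is only a sketch: `coalesce` is a hypothetical helper, and it assumes that adjacent block numbers are also physically adjacent on the disk.

```python
def coalesce(blocks):
    """Group pending block numbers into (start, length) runs of adjacent blocks."""
    runs = []
    for block in sorted(set(blocks)):
        if runs and block == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)   # extends the current run
        else:
            runs.append((block, 1))          # starts a new run
    return runs


pending = [12, 10, 11, 40, 41, 7]
runs = coalesce(pending)   # [(7, 1), (10, 3), (40, 2)]
```

Each run can then be written with a single seek instead of one seek per block.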
Re: Disk IO caching?
salil_bhagurkar wrote:Write caches are maintained until the system finds enough writes to a physically localised space on the hard disk so that they can be done in the shortest possible time with minimum possible time consuming disk head seeks. Waiting until the system goes idle, may not always work, as you don't always have enough ram to store all those writes.
Well, of course there would be a limit after which it starts flushing the cache.
Also, head seek time is negligible with almost any hard drive made within the last few years, and then there are thumb drives and SSDs. I don't think that kind of approach would work well on the new hardware people have.
-
- Member
- Posts: 93
- Joined: Mon Nov 24, 2008 9:13 am
Re: Disk IO caching?
salil_bhagurkar wrote:Write caches are maintained until the system finds enough writes to a physically localised space on the hard disk so that they can be done in the shortest possible time with minimum possible time consuming disk head seeks. Waiting until the system goes idle, may not always work, as you don't always have enough ram to store all those writes.
earlz wrote:well, of course there would be a limit until it starts flushing the cache.. Also, head seek time is negligible with most any harddrive made within the last few years.. and there there is thumbdrives and SSDs... I don't think that kind of approach would work well on the new hardware people have...
For SSDs there are other things you probably want to consider, e.g. erase block size. There is a T13 proposal related to that: Solid State Drive Identify Proposal for ATA8-ACS.
--TS
- salil_bhagurkar
- Member
- Posts: 261
- Joined: Mon Feb 19, 2007 10:40 am
- Location: India
Re: Disk IO caching?
salil_bhagurkar wrote:Write caches are maintained until the system finds enough writes to a physically localised space on the hard disk so that they can be done in the shortest possible time with minimum possible time consuming disk head seeks. Waiting until the system goes idle, may not always work, as you don't always have enough ram to store all those writes.
earlz wrote:well, of course there would be a limit until it starts flushing the cache.. Also, head seek time is negligible with most any harddrive made within the last few years.. and there there is thumbdrives and SSDs... I don't think that kind of approach would work well on the new hardware people have...
IMHO disk seek times are still not that negligible on newer hard drives. I don't know much about high-end systems, which might use better technology. Random reads/writes on disks still slow throughput down to a few MB per second, as against sequential reads/writes, which can reach 50-60 MB per second on a typical home computer.
As far as drives other than hds are concerned, you don't need to worry about data localisation.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: Disk IO caching?
It's also the head settling time that cuts you down here. If you read a disk sequentially, you read all the tracks in the same cylinder in succession, which means you only need one head seek. Once that's done, you can read out all the tracks in one go, the moment they pass under the head.
If you do random access, you'll end up moving the head around for each sector (or group of sectors), which means you pay the seek and settling time for each individual access instead of just once.
While HDs are the easiest to discuss, data locality is just as important on some non-HD media, like floppy drives, ZIP drives and CDs (where the delays are much higher), as well as tape drives (I so hope you are not going to seek on one of those), to name a few.
Only solid-state storage comes close to being as fast at random reads as at sequential reads.
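A back-of-the-envelope model of why per-access seek and settle time hurts so much. The numbers are assumptions picked for illustration (10 ms of positioning per random access, 60 MB/s sequential transfer), not measurements of any real drive, and `effective_rate` is a made-up helper.

```python
def effective_rate(request_bytes, positioning_s=0.010, sequential_bps=60e6):
    """Bytes per second when every request pays one seek-plus-settle delay."""
    transfer_s = request_bytes / sequential_bps
    return request_bytes / (positioning_s + transfer_s)


small = effective_rate(4 * 1024)      # many small random reads
large = effective_rate(1024 * 1024)   # fewer, larger reads
```

Under these assumptions the small random reads stay well below 1 MB/s, while the larger requests get much closer to the sequential rate, because the fixed positioning cost is spread over more data.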
Re: Disk IO caching?
Combuster wrote:Its also the head settling time that cuts you down here. If you read a disk sequentially, you read all the tracks in the same cylinder in succession. Which means that you only need one head seek, once that's done, you can chunk out all the tracks in one go the moment they pass under the head. If you do random access, you'll end up moving the head around for each (group of) sector, which means you have the seek and settling time for each individual access instead of just one.
So this seems to imply that if the next read is close enough (in terms of disk geography) to the new read, it may be better to continue reading and discard some data, as there's no disk up/down/settle, right?
JAL
Re: Disk IO caching?
Hi,
Combuster wrote:Its also the head settling time that cuts you down here. If you read a disk sequentially, you read all the tracks in the same cylinder in succession. Which means that you only need one head seek, once that's done, you can chunk out all the tracks in one go the moment they pass under the head. If you do random access, you'll end up moving the head around for each (group of) sector, which means you have the seek and settling time for each individual access instead of just one.
jal wrote:So this seems to imply that if the next read is close enough (in terms of disk geography) to the new read, it may be better to continue reading and discard some data, as there's no disk up/down/settle, right?
For hard drives, if the new read is on a different cylinder then you have to shift the disk heads to get to it, and continuing to read from the wrong cylinder won't help. If the data is on the same cylinder then you can access it without shifting the heads, and continuing to read still won't help.
For CD-ROMs the data is in a big spiral (and not in cylinders like on hard drives), so continuing to read can prevent the need to shift the head. However, I'd assume CD-ROM drives automatically do this for you (e.g. if you ask for a sector that's fairly close to where the head is, then it'll keep tracking the spiral instead of shifting the heads). Spirals suck for a different reason - reading from sector N and then asking for sector N-1 always involves shifting the heads.
Also note that if you always do reads/writes according to how close they are to the current head position, then you can have reads/writes at the start or end of the disk that never get done (if the heads are kept busy doing reads/writes in the middle of the disk). Even if these reads/writes at the start or end of the disk do get done, it's likely that they'll wait for an unfair amount of time before they're done. To prevent this it's better to do things in ascending order (e.g. do all reads/writes for cylinder 0, then cylinder 1, then cylinder 2, ..., then the last cylinder, then wrap back to cylinder 0 again). Fortunately, this works well for CD-ROMs too, and you can just do all reads/writes in order of their starting LBA address for both hard drives and CD-ROMs.
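The ascending-order-with-wrap idea above can be sketched in a few lines. `cscan_order` is a hypothetical helper, and it assumes the queue is a plain list of starting LBAs plus a known current head position.

```python
def cscan_order(pending_lbas, head_lba):
    """One-directional sweep: serve requests at or past the head in ascending
    order, then wrap around to the lowest LBA and continue ascending."""
    ahead = sorted(lba for lba in pending_lbas if lba >= head_lba)
    behind = sorted(lba for lba in pending_lbas if lba < head_lba)
    return ahead + behind


order = cscan_order([95, 180, 34, 119, 11, 123], head_lba=50)
# order == [95, 119, 123, 180, 11, 34]
```

Requests behind the head wait for the wrap instead of being passed over indefinitely, which is what avoids the starvation problem.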
The other problem is that some reads/writes are more important than others. For example, if you need to read a page from swap space and also need to read a page for defragmenting the file system in the background, and the second read is closer to the heads, then which read would you do first?
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- salil_bhagurkar
- Member
- Posts: 261
- Joined: Mon Feb 19, 2007 10:40 am
- Location: India
Re: Disk IO caching?
Brendan wrote: The other problem is that some reads/writes are more important than others. For example, if you need to read a page from swap space and also need to read a page for defragmenting the file system in the background, and the second read is closer to the heads, then which read would you do first?
That is a very interesting concept, and you can see very effectively how it works when you work with Vista and later OSes. They have an I/O priority mechanism which generally assigns a background priority to processes like defragmentation, so in this case, even though loading the page from swap space is expensive, it would be given more importance. This makes the system more responsive and more productive where it needs to be.
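A toy illustration of ordering by priority class first and by position second. The three class names and `service_order` are assumptions for the sake of the example, not a description of how Windows actually implements I/O priority.

```python
import heapq

SWAP, NORMAL, BACKGROUND = 0, 1, 2   # assumed priority classes; lower runs first


def service_order(requests):
    """requests: iterable of (priority, lba, label). Returns labels in the order
    served: most urgent class first, ascending LBA within a class."""
    heap = list(requests)
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


order = service_order([
    (BACKGROUND, 100, "defrag read"),   # closest to the heads, least urgent
    (SWAP, 9000, "swap page-in"),
    (NORMAL, 500, "file read"),
])
# order == ["swap page-in", "file read", "defrag read"]
```

Strict priority like this can starve background work if urgent requests never stop arriving; that trade-off is left out of the sketch.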
Re: Disk IO caching?
Hi,
salil_bhagurkar wrote:That is a very interesting concept and you can very effectively see how it works when you work with vista+ oses. They have this I/O priority which generally assigns a background priority for processes like defragmentation, due to which in this case, even if loading the page from the swap space is expensive, it would be given more importance. This makes the system more responsive and more productive where it needs to be.
Yes.
The other thing to consider is being able to cancel I/O requests. For example, imagine the application asks the VFS to read from a file, the VFS asks the file system to fetch some data for the file, the file system asks the storage device to load some sectors, and then the application terminates. In this case you'd want the VFS to tell the file system to cancel the read, and the file system to tell the storage device to cancel the read, and for everything to work right regardless of whether the read was canceled or not (because the cancel may have arrived too late).
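A small sketch of cancellation that tolerates arriving too late. `Request` and `run_queue` are hypothetical names, and the dict stands in for the storage device; a real driver would also need locking, which is left out here.

```python
class Request:
    def __init__(self, lba):
        self.lba = lba
        self.cancelled = False
        self.done = False

    def cancel(self):
        """Returns True if the request was stopped before it ran."""
        if self.done:
            return False          # too late: the read already completed
        self.cancelled = True
        return True


def run_queue(queue, disk):
    """Hypothetical driver loop: skips requests cancelled while still queued."""
    results = {}
    for req in queue:
        if req.cancelled:
            continue
        results[req.lba] = disk[req.lba]
        req.done = True
    return results


disk = {1: b"x", 2: b"y"}
a, b = Request(1), Request(2)
b.cancel()                        # application exited before the read started
results = run_queue([a, b], disk)
late = a.cancel()                 # arrives after completion, so it reports False
```

The caller has to cope with both outcomes of `cancel()`, which is the "work right regardless" part of the point above.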
Cheers,
Brendan
Re: Disk IO caching?
Brendan wrote: For CD-ROMs the data is in a big spiral (and not in cylinders like on hard drives)
Well, not exactly. For CD-Rs, there are two formatting "types" for blank media. There are Audio CDs, and Data CDs. 99% of what you will find in stores is Data CDs. You probably need to order Audio CDs directly from a distributor (and they cost a lot more). Data CDs are recorded in actual cylinders, not a helical spiral. Audio CDs are helical spirals. You can burn both audio and data to either type of CD -- the only real difference is that on an Audio CD the head does not need to be moved from track to track if you are reading the disc sequentially. So if you burn music to a data CD, then you are making the CD drive do a teeny bit of extra work, because at the end of each track it is expecting the head to automatically move to the next track -- but it doesn't -- so it must do one extra head movement on every rotation.
Re: Disk IO caching?
Hi,
Brendan wrote: For CD-ROMs the data is in a big spiral (and not in cylinders like on hard drives)
bewing wrote:Well, not exactly. For CD-Rs, there are two formatting "types" for blank media. There are Audio CDs, and Data CDs. 99% of what you will find in stores is Data CDs. You probably need to order Audio CDs directly from a distributor (and they cost a lot more). Data CDs are recorded in actual cylinders, not a helical spiral.
For CD-R, Wikipedia says:
"The blank disc has a pre-groove track onto which the data are written. The pre-groove track, which also contains timing information, ensures that the recorder follows the same spiral path as a conventional CD."
AFAIK all optical disks, including CD, CD-R, CD-RW, DVD, Blu-Ray, etc. are written in spirals. This is partly for compatibility with the original CD format (e.g. so no extra mechanics or control logic are needed in drives to support all formats), and mainly to avoid undesirable jumps/delays when reading data sequentially.
Cheers,
Brendan
Re: Disk IO caching?
I've been a (tiny) computer store owner for 12 years, and my info comes directly from my recordable media distributor (Horizon USA). So I'm willing to directly challenge wikipedia on this one. There are both "audio" and "data" blank CDs, they are different, and the difference is helical vs circular tracks.
- JackScott
- Member
- Posts: 1033
- Joined: Thu Dec 21, 2006 3:03 am
- Location: Hobart, Australia
- Mastodon: https://aus.social/@jackscottau
- GitHub: https://github.com/JackScottAU
- Contact:
Re: Disk IO caching?
<$0.02>
While I'm not confident either way, what bewing says does make sense in that I can remember back in the day when your old audio CD players couldn't play CD-Rs. My portable CD player doesn't read CD-RWs. Just anecdotal evidence, but it would make sense.
Although, it could have something to do with the reflective chemicals used?
</$0.02>