Monitoring memory access

onlyonemac
Member
Member
Posts: 1146
Joined: Sat Mar 01, 2014 2:59 pm

Monitoring memory access

Post by onlyonemac »

How can I get the kernel to "monitor" all memory accesses? Say a memory block is given out to a process, and the kernel needs to see when the process writes to that block so that it can perform some other action. In particular, I want the kernel to put the write somewhere else instead, so that the original memory block is unaltered. This is to implement an "overlay" system that is transparent to processes, while still allowing them to modify memory directly rather than having to call a kernel routine themselves.

The only approach that I can think of is to deny access to all segments/pages and then, when a segmentation fault or page fault occurs, have the kernel perform whatever memory access the process was attempting (or whatever other operation is required). However, that seems like it would have too big a performance hit. I imagine that something like this could be done on a page-by-page basis, with the kernel deciding which pages it would like to "monitor" and then getting page faults for only those pages. But then memory blocks would have to be multiples of 4 kilobytes, which would use far too much memory, as the design of my operating system involves processes using a lot of small memory blocks (on the order of a few bytes).

EDIT: I *can* require processes to call a kernel routine instead of writing directly, or at least to "tell" the kernel that they want to write so that the kernel can give them an alternative pointer to use. However, this is a problem for two reasons:
  1. It doesn't protect against buggy code which uses the pointer directly, or uses the wrong pointer.
  2. In the latter case, processes may say that they want to write when in fact they don't, leading to memory wastage as the kernel allocates more overlay blocks than necessary (these overlay blocks are also saved to disk in the end, so having too many of them seems like a very bad idea).
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.

Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing

Re: Monitoring memory access

Post by onlyonemac »

Never mind, I'll just have one function to get a read-only pointer and another function to get a writable pointer.
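For readers following along, here is a minimal sketch of what such a pair of functions might look like. The `struct block` layout and the names `block_read_ptr` and `block_write_ptr` are invented for illustration and are not taken from the poster's kernel; `malloc` stands in for whatever allocator the kernel would really use.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block descriptor: a base block plus an optional overlay. */
struct block {
    const void *base;     /* original data, never modified */
    void       *overlay;  /* private copy, created on first write request */
    size_t      size;     /* block size in bytes */
    int         writable; /* 0 = no write pointer is ever handed out */
};

/* Read pointer: the overlay if one exists, otherwise the untouched base. */
const void *block_read_ptr(const struct block *b)
{
    return b->overlay ? b->overlay : b->base;
}

/* Write pointer: copies the base into a fresh overlay on first use.
 * Returns NULL for read-only blocks or if the allocation fails. */
void *block_write_ptr(struct block *b)
{
    if (!b->writable)
        return NULL;
    if (!b->overlay) {
        b->overlay = malloc(b->size);
        if (!b->overlay)
            return NULL;
        memcpy(b->overlay, b->base, b->size);
    }
    return b->overlay;
}
```

Because the overlay is only allocated when a write pointer is requested, a process that only reads never costs an extra block. As noted in the question, nothing here stops a process from asking for a write pointer it never uses.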
SpyderTL
Member
Member
Posts: 1074
Joined: Sun Sep 19, 2010 10:05 pm

Re: Monitoring memory access

Post by SpyderTL »

Just a thought... you might be able to get what you want by enabling paging and marking all pages as not present. Then in your page fault handler, you could "emulate" the data access, and then return to the next instruction.

Probably way more trouble than it's worth, though.
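As a rough sketch of the first step in such a handler: on x86 the page fault error code says whether the page was not present and whether the access was a write, and CR2 holds the faulting address. The error-code bits below are the architectural ones; `struct monitored_range`, `struct access_info` and `classify_fault` are made-up names for this example, and the caller is assumed to have read CR2 already.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (pushed by the CPU on #PF). */
#define PF_PRESENT 0x1u /* 0 = page not present, 1 = protection violation */
#define PF_WRITE   0x2u /* 0 = read access, 1 = write access */
#define PF_USER    0x4u /* 0 = supervisor mode, 1 = user mode */

/* Hypothetical description of one monitored region: [start, end). */
struct monitored_range { uintptr_t start, end; };

struct access_info {
    bool      monitored; /* fault was a not-present hit inside the range */
    bool      is_write;  /* valid only if monitored */
    uintptr_t offset;    /* offset of the access within the range */
};

/* Classify a fault given the error code and the faulting address (CR2). */
struct access_info classify_fault(uint32_t err, uintptr_t cr2,
                                  const struct monitored_range *r)
{
    struct access_info a = { false, false, 0 };

    if (err & PF_PRESENT)                 /* protection fault, not ours */
        return a;
    if (cr2 < r->start || cr2 >= r->end)  /* outside the monitored area */
        return a;

    a.monitored = true;
    a.is_write  = (err & PF_WRITE) != 0;
    a.offset    = cr2 - r->start;
    return a;
}
```

Note that this yields the address and direction of the access but not its size, which is the gap the later replies try to close.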

I think Bochs will let you set breakpoints on memory accesses in certain ranges. I found this page online, but I've obviously never tried it.

https://www.hex-rays.com/products/ida/s ... 1648.shtml
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Monitoring memory access

Post by Brendan »

Hi,
onlyonemac wrote:How can I get the kernel to "monitor" all memory accesses? So like, say I have something where a memory block is given out to a process and then the kernel needs to see when the process writes to that memory block so that it can perform some other action. In particular, I want the kernel to put the write somewhere else instead, so that the original memory block is unaltered (this is to implement an "overlay" system that is transparent to processes, while still allowing them to perform direct memory modification operations rather than having to call a kernel routine themselves).
I spent some time trying to invent something similar.

More specifically; I want "virtualised device processes" that receive "read/write N bytes at offset X in device's memory mapped IO area" messages (and receive "read/write from device's IO port" messages and send "device generated an IRQ" messages); where these "virtualised device processes" can emulate a device that doesn't exist at all, or could be using "pass through" to a real device that does exist. The idea was that system emulators (e.g. Bochs, Qemu, etc) could just use these "virtualised device processes" and avoid writing device emulation code in every emulator, while also providing the ability to assign real devices to virtual machines. This is all relatively easy. Then I decided it'd be awesome if the kernel (without any virtual machine at all) was able to trick normal device drivers into using these "virtualised device processes", as this would make developing device drivers much easier. For example, you could have a minimal "virtualised device process" that uses "pass through" to the real device and does logging, restores the device's state if the device driver crashes, prevents the driver from doing insanely dodgy things, etc.

For this reason; I wanted the kernel to detect reads/writes that a device driver makes to a "fake memory mapped IO" area, and create/send those "read/write N bytes at offset X in device's memory mapped IO area" messages to a "virtualised device process" instead.

The scheme I came up with is:
  • All of the pages are marked as "not present"
  • When an access is made, a page fault occurs and the page fault handler knows whether it's a read or a write and knows the address that was accessed. The problem is that the page fault handler can't know the size (e.g. if it was trying to read one byte, or 2 bytes, or ...). To work around that, the page fault handler mapped a dummy page into the area that was accessed and configured the CPU's debugging hardware for 4 "data breakpoints" - one at "address + 1", one at "address + 2", one at "address + 5" and one at "address + 9". The page fault handler also enabled single-step debugging. Once that's done, the page fault handler returns to the code that caused the page fault, which repeats the previous access.
  • For one or more reasons (a data breakpoint, single-stepping, or both) the instruction that does the access causes a debug exception. The debug exception handler examines the breakpoints to determine the size of the access. If it was a 1-byte access none of the breakpoints would trigger, if it was a 2-byte access only the first breakpoint would trigger, and so on. At this point we'd know the address, whether it was a read or a write, and the size. If it was a write we'd also know the value being written, as it'd be stored in that dummy page. At this point reads and writes need different behaviour:
    • For writes; we send the "write N bytes at offset X in device's memory mapped IO area" message and set the page back to "not present" and disable the CPU's debugging stuff, and return from the debug exception like normal.
    • For reads; we send a "read N bytes at offset X in device's memory mapped IO area" message and wait for a reply message (containing the data that should be read), then put the data into the right place in the dummy page, set up the CPU's debugging for single-step only, and return from the first debug exception. When the second debug exception occurs (after the read has occurred) we set the page back to "not present" and disable the CPU's debugging stuff, and return from the debug exception like normal.
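The size-detection step in the scheme above can be sketched roughly as follows. The DR7 and DR6 bit positions are the architectural ones; the function names and the choice of 1-byte read/write breakpoints are assumptions made for this example, and loading DR0 to DR3 with the four probe addresses (a privileged operation) is left out.

```c
#include <stdint.h>

/* Offsets from the faulting address at which DR0..DR3 are placed,
 * as described in the post. */
static const unsigned probe_offset[4] = { 1, 2, 5, 9 };

/* Build a DR7 value enabling four local 1-byte read/write breakpoints.
 * DR7 layout: Ln at bit 2n, R/Wn at bits 16+4n, LENn at bits 18+4n. */
uint32_t dr7_size_probe(void)
{
    uint32_t dr7 = 0;
    for (unsigned n = 0; n < 4; n++) {
        dr7 |= 1u << (2 * n);        /* Ln: local enable */
        dr7 |= 3u << (16 + 4 * n);   /* R/Wn = 11b: break on read or write */
        /* LENn stays 00b: 1-byte breakpoint */
    }
    return dr7;
}

/* Map the B0..B3 hit bits in DR6 to an access size in bytes.
 * Returns 0 for a combination the probe layout should never produce. */
unsigned access_size_from_dr6(uint32_t dr6)
{
    switch (dr6 & 0xFu) {
    case 0x0: return 1;   /* nothing past the first byte was touched */
    case 0x1: return 2;   /* only address + 1 */
    case 0x3: return 4;   /* address + 1 and + 2 */
    case 0x7: return 8;   /* also address + 5 */
    case 0xF: return 16;  /* also address + 9 */
    default:  return 0;
    }
}
```

Other DR6 bits, such as the single-step flag, are masked off, so the same decoder works whichever event raised the debug exception. It only distinguishes the five sizes listed; anything else comes back as one of those sizes or as 0.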
The first problem is that you can only set up 4 data breakpoints, which means you can only distinguish 5 different sizes (e.g. 1, 2, 4, 8 and 16 bytes), but the CPU is capable of more than 5 access sizes (6 bytes with SIDT/SGDT, 32 bytes with either 256-bit AVX or the pushad/popad instructions, 64 bytes with AVX-512). For "100% robust" you can work around this using 2 or more debug exceptions - e.g. the first to determine "1, 2, 4 or 6 bytes or larger", and if it's larger, a second debug exception that determines "8, 16, 32 or 64 bytes".

The second problem I'm sure you've already noticed - it's insanely complicated! :-)

Sadly, I couldn't think of a simpler way. The only other way I could think of is for the page fault handler to examine the instruction that caused the page fault and emulate the instruction itself (e.g. with a huge "switch(opcodebyte1) { ....." mess), which would take a lot of time to write and test.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: Monitoring memory access

Post by onlyonemac »

@Brendan: Your idea sounds incredibly interesting and your implementation is very clever! Although I think that for my situation it'll be better to have a "read pointer" and a "write pointer". This also means that I can return a NULL pointer for things that I don't want processes writing to. All of this, of course, assumes that processes are decent enough to use the correct pointer; malicious code will always be malicious, and buggy code will always be buggy, and defending against them isn't high on my priority list at this stage and I can always implement some sort of memory protection later on if desired.
Antti
Member
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Re: Monitoring memory access

Post by Antti »

Brendan wrote:The only other way I could think of is for the page fault handler to examine the instruction that caused the page fault and emulate the instruction itself (e.g. with a huge "switch(opcodebyte1) { ....." mess), which would take a lot of time to write and test.
I would not totally discard this idea. If you filtered all the common cases this way, you would have a very good optimization. There is no need to emulate the instruction; you only need to find out the memory access size.
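A minimal sketch of such a "common case" filter, covering only the four basic register/memory MOV opcodes, might look like this. It assumes a 32-bit code segment and recognises only the 0x66 operand-size prefix; `struct mem_access` and `decode_mov_access` are names invented for the example. Anything it does not recognise would fall back to the slower breakpoint-based path.

```c
#include <stddef.h>
#include <stdint.h>

struct mem_access {
    unsigned size;     /* access size in bytes */
    int      is_write; /* 1 = store to memory, 0 = load from memory */
};

/* Returns 1 and fills *out if insn is a recognised MOV with a memory
 * operand, otherwise returns 0. Assumes 32-bit default operand size. */
int decode_mov_access(const uint8_t *insn, size_t len, struct mem_access *out)
{
    size_t   i = 0;
    unsigned opsize = 4;

    if (i < len && insn[i] == 0x66) {   /* operand-size override: 16-bit */
        opsize = 2;
        i++;
    }
    if (i + 1 >= len)                   /* need an opcode and a ModRM byte */
        return 0;

    uint8_t op    = insn[i];
    uint8_t modrm = insn[i + 1];

    if ((modrm >> 6) == 3)              /* mod = 11b: register to register */
        return 0;

    switch (op) {
    case 0x88: out->size = 1;      out->is_write = 1; return 1; /* MOV r/m8, r8 */
    case 0x89: out->size = opsize; out->is_write = 1; return 1; /* MOV r/m, r   */
    case 0x8A: out->size = 1;      out->is_write = 0; return 1; /* MOV r8, r/m8 */
    case 0x8B: out->size = opsize; out->is_write = 0; return 1; /* MOV r, r/m   */
    default:   return 0;                /* not handled by this fast path */
    }
}
```

For example, the bytes `89 03` (`mov [ebx], eax`) decode as a 4-byte write, while `66 8B 03` (`mov ax, [ebx]`) decodes as a 2-byte read.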

Re: Monitoring memory access

Post by Brendan »

Hi,
Antti wrote:
Brendan wrote:The only other way I could think of is for the page fault handler to examine the instruction that caused the page fault and emulate the instruction itself (e.g. with a huge "switch(opcodebyte1) { ....." mess), which would take a lot of time to write and test.
I would not totally discard this idea. If you filtered all the common cases with this, you would have a very good optimization. No need to emulate the instruction but just to find out the memory access size.
You're right. It would be possible/beneficial to optimise some common cases - e.g. the "mov" instruction, and all string instructions ("rep movs", "rep cmps", "rep scas", "rep ins", "rep outs") where a little work can avoid a huge number of exceptions. For these cases it'd be faster (and not much harder) to emulate the entire instruction (instead of having a "setup page fault", then allowing the CPU to execute that one instruction, then having a second "tear-down" debug exception).

I just wouldn't want to do this for all instructions (at least not for 80x86 where most instructions can access memory - for a more "RISC like" CPU that has a small number of load/store instructions it'd be much much easier).


Cheers,

Brendan
alexfru
Member
Member
Posts: 1112
Joined: Tue Mar 04, 2014 5:27 am

Re: Monitoring memory access

Post by alexfru »

If you decode an instruction to find out the size of its memory operand, then you can execute it directly by tweaking the operand encoding to become, say, byte/word/dword ptr [foo], with foo being the variable through which you exchange data with the virtual/emulated device. There are no extra page faults or debug exceptions to handle and no instructions to emulate. Just define a subset of the allowed instructions, decode them enough for this, and add a little glue code to set up the initial state of the general purpose registers and to retrieve their post-execution state. You may even decode a bit forward to see if you can execute several instructions in this fashion to reduce the overhead.
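The operand rewrite described here could be sketched like this for the simplest case. It assumes 32-bit protected mode (where ModRM mod=00b, rm=101b means a bare 32-bit absolute address), handles only MOV opcodes 0x88 to 0x8B with an optional 0x66 prefix, and uses an invented function name; the glue code that loads registers and runs the rewritten bytes is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Re-encode a register/memory MOV so its memory operand becomes the
 * absolute 32-bit address foo. Writes at most 7 bytes to out and returns
 * the new length, or 0 if the instruction is outside the supported subset. */
size_t rewrite_mov_to_abs32(const uint8_t *insn, size_t len,
                            uint32_t foo, uint8_t out[7])
{
    size_t i = 0, o = 0;

    if (i < len && insn[i] == 0x66)     /* keep the operand-size prefix */
        out[o++] = insn[i++];
    if (i + 1 >= len)                   /* need an opcode and a ModRM byte */
        return 0;

    uint8_t op    = insn[i];
    uint8_t modrm = insn[i + 1];

    if (op < 0x88 || op > 0x8B)         /* only the four basic MOV forms */
        return 0;
    if ((modrm >> 6) == 3)              /* no memory operand to redirect */
        return 0;

    out[o++] = op;
    /* Keep the reg field, force mod = 00b and rm = 101b: [disp32]. */
    out[o++] = (uint8_t)((modrm & 0x38) | 0x05);
    for (unsigned k = 0; k < 4; k++)    /* little-endian address of foo */
        out[o++] = (uint8_t)(foo >> (8 * k));
    return o;
}
```

With foo at 0x00401000, `89 03` (`mov [ebx], eax`) becomes `89 05 00 10 40 00`, which stores EAX to the exchange variable instead of the monitored address. Any SIB or displacement bytes of the original operand are simply dropped, since the new operand replaces them.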
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Monitoring memory access

Post by Combuster »

There are other ways to avoid the debug registers:
kernel documentation wrote:(...)
MMIO accesses are recorded via page faults. Just before __ioremap() returns,
the mapped pages are marked as not present. Any access to the pages causes a
fault. The page fault handler calls mmiotrace to handle the fault. Mmiotrace
marks the page present, sets TF flag to achieve single stepping and exits the
fault handler. The instruction that faulted is executed and debug trap is
entered. Here mmiotrace again marks the page as not present. The instruction
is decoded to get the type of operation (read/write), data width and the value
read or written. These are stored to the trace log.
(...)
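The fault/single-step cycle in the quoted documentation can be sketched as a small two-handler state machine. This is not mmiotrace's actual code: the structures and function names are invented, the PTE and EFLAGS are plain variables so the logic can be exercised outside a kernel, and the TLB invalidation a real kernel would need after clearing the present bit is only noted in a comment.

```c
#include <stdint.h>

#define PTE_PRESENT 0x001u  /* bit 0 of an x86 page-table entry */
#define EFLAGS_TF   0x100u  /* trap flag (single-step), bit 8 of EFLAGS */

/* Hypothetical saved register frame; only EFLAGS matters here. */
struct trap_frame { uint32_t eflags; };

/* Hypothetical per-page tracing state. */
struct traced_page {
    uint32_t *pte;   /* page-table entry of the monitored page */
    int       armed; /* 1 while one instruction is being single-stepped */
};

/* Page fault on a traced page: let exactly one instruction through. */
void trace_on_page_fault(struct traced_page *p, struct trap_frame *f)
{
    *p->pte   |= PTE_PRESENT;  /* make the page accessible */
    f->eflags |= EFLAGS_TF;    /* trap again after the next instruction */
    p->armed   = 1;
}

/* Debug trap after the stepped instruction: hide the page again.
 * Returns 1 if the trap belonged to the tracer, 0 otherwise. */
int trace_on_debug_trap(struct traced_page *p, struct trap_frame *f)
{
    if (!p->armed)
        return 0;
    *p->pte   &= ~PTE_PRESENT; /* a real kernel must also flush this page
                                  from the TLB (e.g. with invlpg) */
    f->eflags &= ~EFLAGS_TF;   /* stop single-stepping */
    p->armed   = 0;
    return 1;
}
```

The instruction decoding that the documentation mentions (operation type, width and value) would happen between these two steps and is not part of this sketch.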
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]