Hardware support for the hard real time

DavidCooper
Member
Posts: 1150
Joined: Wed Oct 27, 2010 4:53 pm
Location: Scotland

Re: Hardware support for the hard real time

Post by DavidCooper »

How long are the longest SMM interruptions that you (anyone) have been able to measure? I've often noticed visible delays when using Windows, but I don't know if that's the fault of SMM as they don't happen when running my own OS - it's hard to tell though because Windows is able to run the machine hotter, but if running the machine at a lower temperature eliminates the delays, that could be a way of enabling more rapid responses. Clearly I'm just judging by how responsive the machine is to inputs, so I might not notice an occasional delay of even a tenth of a second, but isn't that good enough for most robotic purposes?

We (humans) are capable of doing all manner of skillful things which depend on precision and precise timing, and we generally do pretty well despite making substantial errors. If you're designing something robotic that's going to race around amongst people at a hundred miles an hour it might be a different story, but unless its purpose is military, it's more likely to be doing things at a human speed so as not to be dangerous or scary. If it's a system for driving a car, again it only has to do a little better than a human driver, so if we can drive safely despite momentary lapses lasting for several tenths of a second, a machine controlled by a standard PC isn't likely to have a problem in terms of reaction time delays caused by SMM interruptions.

If it is a dangerous machine, it should be running multiple computer control systems with a voting system to decide on actions (to guard against one of the control systems going wrong) and this would reduce the odds of them all being interrupted at the same time. For the system combining the votes, you would want to use some kind of chip that doesn't suffer from unpredictable delays, but the rest of the system could all be x86. Using multi-core processors in each would again help to hide the delays as the SMM interrupts would be less likely to affect all of them at once, even though it's possible that there would be more SMM interrupts on such a system.
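The vote-combining step itself is trivial, by the way - for the usual three-unit case it reduces to a bitwise 2-out-of-3 majority (a sketch only, assuming each control computer delivers its proposed actuator command as an integer):

#include <stdint.h>

/* 2-out-of-3 majority vote: each bit of the result is whatever value at
   least two of the three control computers agree on, so a single stalled
   or faulty unit is simply outvoted.                                      */
uint32_t majority_vote(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}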

Guided missiles and rockets doubtless need guaranteed minimal delays between detecting the need for a correction and applying that correction, but if you're making something dull like a robotic vacuum cleaner, the only reason for rapid response might be in avoiding hoovering up a hamster, but that's only going to matter if it's capable of recognising a hamster in the first place. If your robot vacuum cleaner is going to be hamster friendly, it's probably going to be better off using a fast processor which suffers from occasional short unpredictable delays than a slow processor which is guaranteed to be free of delays but which is also incapable of performing the task required of it - correcting the steering of a missile is a simple process depending on fast reactions, but recognising a hamster is complex and will take so long that unexpected delays are unlikely to matter. Of course, it may be that a rare SMM interrupt could last so long that an accident results, but the same applies to people when they suffer from a long lapse of attention, such as a driver collapsing at the wheel - if you can produce something which is statistically safer under imperfect computer control than under even less perfect human control, your robotic device is viable.
Help the people of Laos by liking - https://www.facebook.com/TheSBInitiative/?ref=py_c

MSB-OS: http://www.magicschoolbook.com/computing/os-project - direct machine code programming
rdos
Member
Posts: 3297
Joined: Wed Oct 01, 2008 1:55 pm

Re: Hardware support for the hard real time

Post by rdos »

At least on Intel processors, it seems like actions taking place on other cores will affect performance on a core that runs some kind of polling loop, like the call gate benchmark. When performance figures differ by an order of magnitude between runs, there must be something strange going on in the background, like some type of power management or some other reason why the cores seem to affect each other. It is notable that I run those tests with the ACPI driver loaded, and the ACPI power management redirected to something other than SMI. Yet the processor still performs very unpredictably.

It actually works a lot better on AMD Phenom, and once my own power management has boosted the processor to its maximum sustainable frequency, the performance test gives very predictable results. However, the problem is that most embedded boards use an Intel Atom or the low-end, single-core AMD Geode rather than a multicore AMD part.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Hardware support for the hard real time

Post by Brendan »

Hi,
DavidCooper wrote:How long are the longest SMM interruptions that you (anyone) have been able to measure? I've often noticed visible delays when using Windows, but I don't know if that's the fault of SMM as they don't happen when running my own OS - it's hard to tell though because Windows is able to run the machine hotter, but if running the machine at a lower temperature eliminates the delays, that could be a way of enabling more rapid responses. Clearly I'm just judging by how responsive the machine is to inputs, so I might not notice an occasional delay of even a tenth of a second, but isn't that good enough for most robotic purposes?
I've never tried to measure delays caused by SMM (for any permutation of motherboard/chipset, firmware, RAM and CPU), but I'd estimate that for "expected worst case" you'd potentially be looking at several IO port accesses (maybe 1 us each for legacy/slow devices) plus a few thousand cycles. For a 1 GHz CPU that would imply the delay could be up to about 10 us. A human is unlikely to notice a delay less than about 20 ms, so this estimate would imply that you'd need about 2000 "worst case" SMIs to occur at almost the same time before any delay is noticeable. Basically, it's extremely unlikely that any noticeable delay is caused by SMI/SMM.
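If you wanted to actually measure it rather than estimate it, one rough way (a sketch only - it assumes ring 0 with interrupts disabled and an invariant TSC, so that any large gap between consecutive reads can be blamed on SMM) is to spin on the TSC and record the worst gap:

#include <stdint.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

uint64_t longest_gap(uint64_t iterations)
{
    uint64_t worst = 0;
    uint64_t prev = rdtsc();

    while (iterations--) {
        uint64_t now = rdtsc();
        if (now - prev > worst)
            worst = now - prev;   /* normal gaps are a handful of cycles;   */
        prev = now;               /* thousands of cycles points at an SMI   */
    }
    return worst;                 /* convert to time using the TSC rate     */
}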

If I had to guess, I'd assume the delay you've noticed is caused by software. Most likely causes would include antivirus software that decides to "scan the world" periodically, or a relatively large process written by someone that thought garbage collection is a good idea.
rdos wrote:At least on Intel processors, it seems like actions taking place on other cores will affect performance on a core that runs some kind of polling loop, like the call gate benchmark. When performance figures differ by an order of magnitude between runs, there must be something strange going on in the background, like some type of power management or some other reason why the cores seem to affect each other. It is notable that I run those tests with the ACPI driver loaded, and the ACPI power management redirected to something other than SMI.
Recent Intel and AMD CPUs have "TurboBoost", where cores will change speed depending on what the other cores are doing (typically in an attempt to improve performance for single-threaded software while keeping the chip within its intended thermal envelope). Recent Intel CPUs have hyper-threading too, where work done on one logical CPU affects the resources available to another. Recent AMD CPUs share some of the chip's resources between cores (e.g. the FPU pipeline), which ends up causing similar effects to hyper-threading for whichever resources are shared (e.g. if both cores are doing lots of floating point operations then they might fight for the same resources, but if one or both are doing integer operations then they might not). Finally there's contention in the memory system - CPUs competing for shared caches, competing for bus bandwidth, competing for RAM chip accesses, etc. This isn't limited to just CPUs - e.g. a PCI device doing bus mastering can affect the time a CPU takes to access RAM.
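If you want to take TurboBoost out of the equation while benchmarking, recent Intel CPUs let you disable it from ring 0 by setting bit 38 ("turbo mode disable") of the IA32_MISC_ENABLE MSR (0x1A0). A sketch only - check that your particular CPU actually implements that bit before relying on it:

#include <stdint.h>

#define IA32_MISC_ENABLE   0x1A0
#define TURBO_DISABLE_BIT  (1ULL << 38)   /* "turbo mode disable" */

static inline uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}

static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ __volatile__("wrmsr" : :
                         "c"(msr), "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

/* Ring 0 only: stop opportunistic frequency boosting so that timing
   runs on one core aren't perturbed by what the other cores are doing. */
void disable_turbo(void)
{
    wrmsr(IA32_MISC_ENABLE, rdmsr(IA32_MISC_ENABLE) | TURBO_DISABLE_BIT);
}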


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
rdos
Member
Posts: 3297
Joined: Wed Oct 01, 2008 1:55 pm

Re: Hardware support for the hard real time

Post by rdos »

Brendan wrote:Recent Intel and AMD CPUs have "TurboBoost", where cores will change speed depending on what the other cores are doing (typically in an attempt to improve performance for single-threaded software while keeping the chip within its intended thermal envelope).
Yes, but at least on AMD, there must be a processor driver which accesses certain registers in order to change voltage and operating frequency. The processor doesn't do this on its own. However, on Intel Atom I have no processor driver, and I assume it is therefore running at its maximum sustainable frequency, as this is the usual scenario for processors running on OSes that don't support power management.
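For what it's worth, on the AMD family 10h (Phenom) parts the core of such a driver is just writing the wanted P-state number to the P-state Control MSR and waiting for the P-state Status MSR to catch up. A sketch only, with the MSR numbers taken from the family 10h BKDG; a real driver also has to look at the P-state definition registers and the cooling/current limits:

#include <stdint.h>

#define AMD_PSTATE_CTRL    0xC0010062   /* P-state Control, family 10h */
#define AMD_PSTATE_STATUS  0xC0010063   /* P-state Status,  family 10h */

static inline uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}

static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ __volatile__("wrmsr" : :
                         "c"(msr), "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

/* Ring 0: request P-state 0 (the fastest) on the current core, then
   spin until the hardware reports that the transition has completed. */
void boost_to_p0(void)
{
    wrmsr(AMD_PSTATE_CTRL, 0);                      /* request P0          */
    while ((rdmsr(AMD_PSTATE_STATUS) & 0x7) != 0)   /* CurPstate, bits 2:0 */
        ;
}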
Brendan wrote:Recent Intel CPUs have hyper-threading too, where work done on one logical CPU affects the resources available to another.
Yes, the Intel Atom portable I'm using has two physical cores with hyperthreading, and ACPI thus claims it has 4 cores.

However, when I run the benchmark, I run it without other load, so I still fail to see how it could be related to hyperthreading.

To me it seems more likely that the operating point of the processor is set too high, and that overheating then causes an SMI to manipulate the processor's frequency and voltage. It is a mini-PC with only limited cooling facilities.

Edit: The behavior seems to be due to Intel's Thermal Monitor, which gates the processor clock when a critical temperature is reached and turns it back on when the temperature drops. Such a feature would be devastating for a realtime environment, especially if the motherboard has inadequate cooling for the operating frequency selected.
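One way to check that theory without an SMI trace is to look at the IA32_THERM_STATUS MSR (0x19C): bit 0 says the core is being throttled right now, and bit 1 is a sticky log bit that stays set if throttling has happened since it was last cleared. A sketch, reusing the ring-0 rdmsr() helper from the sketch above:

#include <stdint.h>

#define IA32_THERM_STATUS  0x19C

uint64_t rdmsr(uint32_t msr);   /* same ring-0 helper as in the sketch above */

/* Returns nonzero if the Thermal Monitor has throttled this core:
   bit 0 = throttling right now, bit 1 = sticky "has throttled" log bit. */
int core_has_thermally_throttled(void)
{
    return (rdmsr(IA32_THERM_STATUS) & 0x3) != 0;
}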
Rudster816
Member
Posts: 141
Joined: Thu Jun 17, 2010 2:36 am

Re: Hardware support for the hard real time

Post by Rudster816 »

rdos wrote:I'm in the middle of designing a PCB that has requirements that cannot easily be met with a standard OS running on x86 hardware. Initially, I pondered on using a multicore x86 processor, letting one core run without interrupts and SMI, but given the extreme slowness of the dual core Intel Atom processor, I've changed my mind. The gate call performance benchmarks also show very different results between runs, even if I don't do power-management on it. Those taken together made me change my mind about implementing it that way.

So, now I have instead integrated a Microchip PIC controller that will handle the realtime requirements much more cheaply and predictably than an x86 processor ever could. This processor can poll ports hundreds of thousands of times per second and never miss a deadline. Of course, it won't use interrupts, but will do busy-polling. All instruction timings are predictable, so it is always possible to calculate response times.

BTW, the application is to decode clock and data pulses from a magnetic card pulled by the end user. It should work even if the user pulls it very fast, something that cannot be handled predictably on our current hardware based on x86.
If you're using x86 hardware for embedded use, you're bound to run into all kinds of issues. Not even tablet/smartphone manufacturers have (as far as I know) decided to attempt to use Atoms in their devices.

Not to get on your or your employer's case, but it kind of sounds like an ARM SoC and embedded Linux would be more than suitable. What's the point of implementing a hard real time system when all failure means is that the customer has to slide their card through the machine again? It's going to happen a lot anyway (cards wear out, and so do the readers); there's no point in driving up the cost by introducing excessively complicated systems.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Hardware support for the hard real time

Post by Owen »

Rudster816 wrote:If you're using x86 hardware for embedded use, you're bound to run into all kinds of issues. Not even tablet/smartphone manufacturers have (as far as I know) decided to attempt to use Atoms in their devices.

Not to get on your or your employer's case, but it kind of sounds like an ARM SoC and embedded Linux would be more than suitable. What's the point of implementing a hard real time system when all failure means is that the customer has to slide their card through the machine again? It's going to happen a lot anyway (cards wear out, and so do the readers); there's no point in driving up the cost by introducing excessively complicated systems.
Reading a magnetic stripe is, to me, a soft realtime operation: if it fails, all you have is a mildly inconvenienced user who has to retry. It's also, however, one which requires precise timing, something a PC cannot offer.

Incidentally, you can now get an Atom-powered phone. x86-based embedded systems are very common; live TV shows, touring music acts, and nightclubs are these days all full of lighting desks which are little more than PCs, mostly running some form of Windows Embedded, occasionally Linux, with the controls and outputs little more than simple internal USB devices. Most modern digital sound desks are similar, except that the actual mixing is normally left to dedicated DSPs (because latency is extremely critical in this environment, and because PCs don't generally have the needed CPU power). Multi-track recorders are similar (likely some form of custom PCI-Express card connected to the audio interface).

In spite of this, when was the last time you saw a lighting or sound control failure during a live show? I've certainly only ever seen a couple; those were at my place of work, and they were related to bugs in the software running on the console rather than to the underlying system and hardware.

If there's a ready-built, heavily tested, well-understood system which meets your requirements, why not build upon it?
rdos
Member
Posts: 3297
Joined: Wed Oct 01, 2008 1:55 pm

Re: Hardware support for the hard real time

Post by rdos »

Rudster816 wrote:If you're using x86 hardware for embedded use, you're bound to run into all kinds of issues. Not even tablet/smartphone manufacturers have (as far as I know) decided to attempt to use Atoms in their devices.
Well, I've pretty much decided not to use the Atom, as it is a really lousy processor. We currently use the AMD Geode in our installations, and it seems to be a good choice for the future as well.
Rudster816 wrote:Not to get on your or your employer's case, but it kind of sounds like an ARM SoC and embedded Linux would be more than suitable.
I find it unlikely that we would switch to ARM, but slightly more likely that we will have switched to Linux within, say, 10 years' time.
Rudster816 wrote:What's the point of implementing a hard real time system when all failure means is that the customer has to slide their card through the machine again? It's going to happen a lot anyway (cards wear out, and so do the readers); there's no point in driving up the cost by introducing excessively complicated systems.
I know the problems of letting a standard processor handle both the card-reader hardware and a multithreaded application, so I won't repeat the mistake of assuming those can be combined in any way. You still need some kind of hardware interface to get the physical signals from the card reader, so it's just as easy to let a microcontroller decode them as well. Besides, I already have a functional microcontroller program that can decode the magnetic stripe in both directions, so there is no additional software development involved.
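For anyone curious, the decoding itself is simple: the reader presents a clock/strobe line and a data line, and the controller just samples the data pin on every active clock edge and shifts the bit into a buffer. A minimal busy-polling sketch in C - PORT_CLOCK and PORT_DATA are hypothetical memory-mapped input registers, not any particular PIC's register names:

#include <stdint.h>

/* PORT_CLOCK and PORT_DATA are hypothetical memory-mapped GPIO input
   registers - substitute the real pin definitions for whatever
   microcontroller ends up being used.                                 */
#define PORT_CLOCK   (*(volatile uint8_t *)0x40000000)
#define PORT_DATA    (*(volatile uint8_t *)0x40000004)

#define MAX_BITS     1024
#define IDLE_LIMIT   1000000    /* crude end-of-swipe timeout, in loop passes */

/* Busy-poll the reader: on every falling edge of the strobe, sample the
   data line and store one bit.  With fixed instruction timings the
   worst-case latency per edge can be calculated up front.               */
int read_stripe(uint8_t *bits)
{
    int count = 0;
    uint32_t idle = 0;
    uint8_t prev_clock = PORT_CLOCK & 1;

    while (count < MAX_BITS && idle < IDLE_LIMIT) {
        uint8_t clock = PORT_CLOCK & 1;
        if (prev_clock && !clock) {          /* falling edge of the strobe */
            bits[count++] = PORT_DATA & 1;   /* data is valid on this edge */
            idle = 0;
        } else {
            idle++;
        }
        prev_clock = clock;
    }
    return count;                            /* number of bits captured    */
}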