Stupid assembly question

BMW · Post by **BMW** » Thu Nov 21, 2013 8:52 pm

Is [ss:esp] the same as ss:[esp] in NASM syntax?

Brendan · Post by **Brendan** » Fri Nov 22, 2013 12:30 am

Hi,

BMW wrote:Is [ss:esp] the same as ss:[esp] in NASM syntax?

Yes. I think you can also do (e.g.) "ss mov eax,[esp]" if you want (but I wouldn't recommend this as it's less obvious what the "ss" is for).

Of course an SS segment override prefix is unnecessary when ESP or EBP is involved because SS is the default in those cases (in the same way that you could use "[esi]" instead of "[ds:esi]"). To avoid needing to remember the rules that determine which segment is the default segment you could just put segment override prefixes everywhere, or set SS=DS=ES and forget them.
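A minimal NASM sketch of the default-segment rule described above, assuming 32-bit code (`bits 32`). The register choices and the `+ 8` offset are illustrative only.

```nasm
bits 32

        mov eax, [esp]          ; ESP as base: SS is the default segment
        mov eax, [ebp + 8]      ; EBP as base: SS is the default segment
        mov eax, [esi]          ; any other base register: DS is the default

        mov eax, [ss:esp]       ; explicit override that matches the default (redundant)
        mov eax, [ds:esi]       ; likewise redundant
        mov eax, [es:esi]       ; override that actually changes the segment used
```

An explicit override is encoded as a one-byte prefix (0x36 for SS, 0x3E for DS, 0x26 for ES), so a redundant override makes the instruction one byte longer wherever the assembler emits it.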

Cheers,

Brendan

iansjack · Post by **iansjack** » Fri Nov 22, 2013 1:05 am

I agree with your self-assessment that this is a stupid question.

Stupid not because you should be expected to know the answer but because you could have tested it so easily. It would have taken no time to write the code, assemble it, and check the opcodes produced.

An unwillingness to do a little experimentation and test things out doesn't bode well for OS development. It's (IMO) similar to those who ask "what is wrong with this code" without first doing a little elementary debugging for themselves.

Antti · Post by **Antti** » Fri Nov 22, 2013 1:22 am

If we had to align code to a certain boundary, would it be better to use nops or unnecessary segment prefixes? This is a micro-optimization issue. For example, loop starts could always be "adjusted" to, say, a 16-byte boundary. Unnecessary segment prefixes could be one tool for providing some of that margin.

Whether this alignment has any impact on efficiency is another matter. In short: which does the CPU ignore more cheaply, a nop or an unnecessary segment prefix? Intuitively, it feels that nops are better.
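The two padding options in question can be sketched in NASM roughly as follows, assuming 32-bit code. The labels and the loop body are hypothetical, and the sketch says nothing about which variant is faster.

```nasm
bits 32

; Option 1: let the assembler pad up to the next 16-byte boundary.
; NASM's ALIGN macro fills with NOPs when no fill argument is given.
        align 16
loop_nop:
        mov eax, [esi]
        add esi, 4
        dec ecx
        jnz loop_nop

; Option 2: lengthen an instruction before the loop by one byte with a
; redundant DS override (DS is already the default for [esi]), which
; shifts everything that follows it by one byte.
        mov eax, [ds:esi]
loop_prefix:
        add esi, 4
        dec ecx
        jnz loop_prefix
```

With option 2 the number of redundant prefixes has to be chosen by hand (or by a macro) from the current offset, whereas `align` computes the padding itself.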

Kevin · Post by **Kevin** » Fri Nov 22, 2013 4:16 am

My intuition says otherwise, I believe prefixes should involve less overhead, especially in comparison to multi-byte nops, which are real full instructions that just happen to have no effect. But it's still just a guess.

In a quick look at the Intel Optimization Reference Manual I found their suggestions on what instructions to use for nop, and that prefixes tend to make things slower, but I couldn't find a comparison between the methods for padding. Perhaps they just didn't consider prefixes for padding?

BMW · Post by **BMW** » Fri Nov 22, 2013 4:26 am

Hi,

iansjack wrote:Stupid not because you should be expected to know the answer but because you could have tested it so easily. It would have taken no time to write the code, assemble it, and check the opcodes produced.

The only reason I labelled it stupid was to try to avoid responses like yours; it obviously hasn't worked.

I agree - it would take no time to write the code, assemble it, and check the opcodes produced, if I were running my development setup. However, I am not in my development setup (I am in Windows and do not have NASM installed). I'm sure asking a simple question on here and getting a decent answer from someone like Brendan would be quicker than finding my linux hard drive, rebooting and testing.

Also I was lucky enough for Brendan to provide an excellent answer which offers extra information, which would not have been gleaned from experimenting. Any other people who may not have known about the minor details of the NASM syntax in question and have read this thread will now also know, which is an added benefit.

Brendan · Post by **Brendan** » Fri Nov 22, 2013 4:59 am

Hi,

Kevin wrote:My intuition says otherwise, I believe prefixes should involve less overhead, especially in comparison to multi-byte nops, which are real full instructions that just happen to have no effect. But it's still just a guess.

In a quick look at the Intel Optimization Reference Manual I found their suggestions on what instructions to use for nop, and that prefixes tend to make things slower, but I couldn't find a comparison between the methods for padding. Perhaps they just didn't consider prefixes for padding?

Intel's latest optimisation guide suggests an operand size override prefix for a 2-byte NOP (but not for larger NOPs). AMD's optimisation manual recommends redundant operand size override prefixes or larger NOPs.

I'd be tempted to suspect that the fastest way (where possible) may be to increase the size of previous instructions (e.g. by using 8-bit or 32-bit "zero displacements") rather than inserting additional instructions of any kind; especially if the previous instructions aren't within a loop.
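A minimal sketch of the "zero displacement" idea, assuming 32-bit code and NASM's `byte`/`dword` effective-address hints. The instruction itself is just an example.

```nasm
bits 32

        mov eax, [esi]          ; 8B 06             (2 bytes, no displacement)
        mov eax, [byte esi]     ; 8B 46 00          (3 bytes, 8-bit zero displacement)
        mov eax, [dword esi]    ; 8B 86 00 00 00 00 (6 bytes, 32-bit zero displacement)
```

All three forms read the same dword at DS:ESI. Only the encoded length differs, so a preceding instruction can absorb 1 or 4 bytes of padding without any extra instruction being inserted.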

Cheers,

Brendan

Love4Boobies · Post by **Love4Boobies** » Fri Nov 22, 2013 5:05 am

While the first rule of assembly code is that it's inherently unportable across architectures, the first rule of assembly micro-optimization is that it's not portable across microarchitectures. Thus, such questions become meaningless in the general case. As far as coding goes, I would generally recommend not tuning for any particular microarchitecture, except perhaps as an aside to a generic version which the build system should link by default.

That being said, here's a summary of the situation so far:

- An instruction's encoding, except sometimes for prefixes (see below), has no effect on its execution time. Also true for prefix encodings.
- NOP's do take cycles to execute so they should be used when all else fails.
- The multibyte versions of NOP are preferred to multiple individual NOP's, on CPU's which support them.
- As long as the maximum instruction length of 15 bytes is not exceeded, there is no limit on the number of prefixes.
- Although not future-proof, meaningless segment override prefixes are currently ignored. Redundant ones are future-proof, though.
- Except for VEX, any prefix may be reused within a given instruction.
- Address size prefixes slow down the decoding process.
- AMD CPU's impose a performance penalty on instructions with more than 3 prefixes.
- Executing the instruction both with dummy prefixes, used for alignment, and without, by branching, may confuse the CPU's optimizer.
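For reference, the NOP forms mentioned in this thread can be written out as raw bytes. This is a sketch assuming 32-bit code; `db` is used so the encodings are explicit rather than left to the assembler.

```nasm
bits 32

        nop                             ; 1 byte:  90
        db 0x66, 0x90                   ; 2 bytes: operand-size prefix + NOP
        db 0x0F, 0x1F, 0x00             ; 3 bytes: NOP DWORD [EAX] (multibyte NOP)
        db 0x0F, 0x1F, 0x40, 0x00       ; 4 bytes: NOP DWORD [EAX + 0], 8-bit displacement
```

The 0F 1F forms are the multibyte NOPs referred to in the list. As noted there, they are only an option on CPU's which support them.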

Be aware that there are also rules for when certain prefixes can be used and how prefixes can be combined, but those are off-topic so I won't go into them here. And, as far as alignment goes, data must also be dealt with in various ways; this is almost always a much more significant micro-optimization to make.

BMW wrote:quicker than finding my linux hard drive, rebooting and testing.

But quicker than downloading around 1 MiB, which is the size of NASM's Windows port, which can be run out of the box?

iansjack · Post by **iansjack** » Fri Nov 22, 2013 5:19 am

BMW wrote:I'm sure asking a simple question on here and getting a decent answer from someone like Brendan would be quicker than finding my linux hard drive, rebooting and testing.

Or even taking the trouble to install the Windows version of NASM. I agree; it's much easier to just let someone else do the work for you. But is it a good grounding for OS development? - I don't think so.

In exactly the same way, as I said, it's easier to ask someone else to debug your code for you than to do the work yourself.

Combuster · Post by **Combuster** » Fri Nov 22, 2013 5:38 am

Love4Boobies wrote:NOP's do take cycles to execute so they should be used when all else fails.

As far as I recall, the one-byte NOP only gets to a special case in the decoder which in turn decodes it to zero µops, which means it might not even take a cycle.

Antti · Post by **Antti** » Fri Nov 22, 2013 6:28 am

```
entry1:
	; Code
	; Fall through
entry2:
	; Code
	; Fall through
entry3:
	; Code
	ret
```

If those entry points are nicely aligned, I think that optimization is portable. What the best way to achieve the alignment is remains a good question, and that is a micro-optimization issue.
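One way to sketch the aligned version of that layout in NASM, assuming 32-bit code and a 16-byte boundary. The `inc eax` lines are hypothetical placeholders for the real code.

```nasm
bits 32

        align 16
entry1:
        inc eax                 ; placeholder for real code
        ; Fall through: the padding emitted by the next ALIGN is executed here
        align 16
entry2:
        inc eax                 ; placeholder for real code
        ; Fall through: same again
        align 16
entry3:
        inc eax                 ; placeholder for real code
        ret
```

Because execution falls through from one entry point to the next, whatever is used as padding actually gets executed, which is why the choice between NOPs, prefixes and longer encodings matters in this layout.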

Brendan wrote:I'd be tempted to suspect that the fastest way (where possible) may be to increase the size of previous instructions

Perhaps the preferable one. At least more elegant.

bwat · Post by **bwat** » Fri Nov 22, 2013 7:02 am

BMW wrote:I'm sure asking a simple question on here and getting a decent answer from someone like Brendan would be quicker than finding my linux hard drive, rebooting and testing.

The difference between the two is one gives belief whereas the other gives knowledge.

Love4Boobies · Post by **Love4Boobies** » Fri Nov 22, 2013 8:25 am

Combuster wrote:
Love4Boobies wrote:NOP's do take cycles to execute so they should be used when all else fails.
As far as I recall, the one-byte NOP only gets to a special case in the decoder which in turn decodes it to zero µops, which means it might not even take a cycle.

There is indeed special support for single-byte NOP's (but not multibyte ones), but you're misremembering the details: it's only implemented on Intel CPU's, and what it does is remove the dependency on EAX in the pipeline, since NOP is really "xchg eax, eax". They still have 1 µop and, depending on the CPU, the latency is 0 to 1 cycles, and the throughput is always more than 0 cycles but never more than 1.
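The encoding relationship behind that remark can be sketched as follows, assuming 32-bit code. The comments give the architectural short-form opcodes; what a given assembler emits is an assumption to check in a listing.

```nasm
bits 32

        nop                     ; opcode 0x90
        xchg eax, eax           ; short form XCHG EAX, r32 is 0x90 + r, so also 0x90
        xchg eax, ebx           ; 0x93 in the same opcode family, a real exchange
```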

Minoto · Post by **Minoto** » Fri Nov 22, 2013 9:33 am

BMW wrote:Also I was lucky enough for Brendan to provide an excellent answer which offers extra information, which would not have been gleaned from experimenting. Any other people who may not have known about the minor details of the NASM syntax in question and have read this thread will now also know, which is an added benefit.

Except for the fact that Brendan's answer is incorrect. One of the forms you listed works; the other results in a syntax error.
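A small test file along these lines is enough to check the forms discussed in this thread. It is a sketch assuming 32-bit code and the syntax described in the NASM manual, where the segment override goes inside the brackets.

```nasm
bits 32

        mov eax, [ss:esp]       ; NASM form: override inside the brackets
        ss mov eax, [esp]       ; segment register written as an instruction prefix
;       mov eax, ss:[esp]       ; MASM-style form, commented out because NASM rejects it
```

Assembling with `nasm -f bin -l test.lst test.asm` produces a listing in which the emitted bytes for each line can be compared.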

Brendan · Post by **Brendan** » Fri Nov 22, 2013 10:27 am

Hi,

Minoto wrote:
BMW wrote:Also I was lucky enough for Brendan to provide an excellent answer which offers extra information, which would not have been gleaned from experimenting. Any other people who may not have known about the minor details of the NASM syntax in question and have read this thread will now also know, which is an added benefit.
Except for the fact that Brendan's answer is incorrect. One of the forms you listed works; the other results in a syntax error.

I tested, and you're right.

Cheers,

Brendan

OSDev.org
