Why little endian!

nexos · Post by **nexos** » Sat Dec 19, 2020 7:31 am

Hello,
One thing that has always confused me in low-level programming is endianness, so today I decided to research it. According to an IBM resource I read, in little endian everything is backwards! So, if I have a number 1234, little-endian machines store it as 4321, correct? And big endian stores it as 1234, also correct? Then I continued reading, and it states that data is always backwards at the byte level! Why is this?
Thanks,
nexos

bloodline · Post by **bloodline** » Sat Dec 19, 2020 8:41 am

nexos wrote:Hello,
One thing that has always confused me in low-level programming is endianness, so today I decided to research it. According to an IBM resource I read, in little endian everything is backwards! So, if I have a number 1234, little-endian machines store it as 4321, correct? And big endian stores it as 1234, also correct? Then I continued reading, and it states that data is always backwards at the byte level! Why is this?
Thanks,
nexos

It's just the bytes that are stored in the reverse of the order you would expect when reading and writing binary numbers.

So the uint32_t 0xDEADBEEF would be stored in memory as the byte sequence EF BE AD DE, lowest address first.

While I inherently prefer Big Endian, Little Endian is quite useful as you can access the same memory location with different type sizes and still read the same value, as long as it fits in the smaller type.
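A minimal sketch of how one might check the byte layout on a given machine, assuming a C99 compiler. The helper names `dump_bytes` and `host_is_little_endian` are made up for illustration. It copies the object representation of a `uint32_t` into a byte array, so the bytes come out in increasing address order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of `value` into `out`,
 * lowest address first. memcpy avoids any aliasing problems. */
static void dump_bytes(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: returns 1 if the least significant byte of a
 * multi-byte integer sits at the lowest address on this host. */
static int host_is_little_endian(void)
{
    unsigned char b[4];
    dump_bytes(1u, b);
    return b[0] == 1;
}
```

On a little-endian host, `dump_bytes(0xDEADBEEF, b)` should leave `b` holding EF BE AD DE; on a big-endian host it should hold DE AD BE EF.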

nullplan · Post by **nullplan** » Sat Dec 19, 2020 8:57 am

nexos wrote:Then I continued reading, and it states that data is always backwards at the byte level! Why is this?

Without context, this is going to be difficult to answer. IBM is a bit weird about bit counting as well: All PowerPC documentation has bit 0 as the most significant bit, so converting the manuals for PPC64 to PPC32 is not as simple as with the Intel manuals.

Originally, Little Endian and Big Endian were terms from Gulliver's Travels. Gulliver happened upon an island on which a fierce war was about to start over whether to open your boiled egg from the little end or the big end. Not sure what Swift wanted to tell us with that, maybe that some of our wars actually were that petty? Also, I have it on authority of Baron Münchausen that Swift was a liar, and the only thing in the South Sea is a tribe of people dancing the minuet in mid-air.

Anyway, yes, nowadays Little Endian and Big Endian refer to byte order. How do you save data in memory? Fun fact: If memory were only accessible in units of machine words, byte order would not matter. (Proof: Right now, memory is only accessible down to the byte level, and you never hear arguments about bit ordering, do you?) But oh well, that is not the world we live in. So if you have a data item that spans multiple bytes, do you store the bytes in ascending or descending order of significance? Then the PDP faction will pipe up and tell you to take a third option. You see, the PDP-11 was a 16-bit little-endian machine. So within a word, the less significant byte would come first, but for a double word, the more significant word would come first. So, numbering the bytes from 1 (most significant) to 4 (least significant), the byte order in memory would actually be 2143.

But anyway, apart from historical curiosities, all machines save stuff in little endian or big endian byte order. Both have counter-intuitive consequences. Yes, if you read a memory dump on a little endian machine, and you know a couple of bytes are actually forming a number, you have to invert their order to understand the number. That might seem daft, but it has the advantage that if you save a low value into that memory slot, you can read the value without error without knowing the exact type. This came in handy just a few minutes ago, when I was implementing a new multiboot loader. See, the multiboot 1 spec doesn't tell me very clearly how big the "type" in the memory map is supposed to be. That is actually a big problem with a lot of their listings. But the "type" is a value definitely below 256, so I can just declare it as a byte on x86 and get the correct value out. Won't help me on PowerPC, but then I will need an entirely different loader on PowerPC, so who cares?

Little endian also has the related advantage that if you increase the size of a field, if the new bytes added in are zero, then the value stays the same. That cannot be said for Big Endian systems. Big Endian has the advantage of making memory dumps more easily understandable, but that is not a very valuable trait of a computing system. It is also the network byte order, so badly written code will get a speedup from running on a Big Endian machine. Well-written code will perform the same on either.
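As a rough illustration of the "well-written code" point about network byte order, here is a sketch that serializes a 32-bit value in big-endian order using only shifts, so it behaves the same on any host. The names `put_be32` and `get_be32` are hypothetical, not taken from any particular library.

```c
#include <stdint.h>

/* Hypothetical helper: write `v` to `p` in big-endian (network) order.
 * Shifts operate on the value, not on memory, so the host's own byte
 * order never enters the picture. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24); /* most significant byte first */
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;         /* least significant byte last */
}

/* Hypothetical helper: read a big-endian 32-bit value back from `p`. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Code that instead copies the raw in-memory integer into the packet only happens to work on a big-endian host, which is presumably the "badly written" case meant above.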

foliagecanine · Post by **foliagecanine** » Sat Dec 19, 2020 10:30 am

Basically, little-endian specializes in typecasting sizes.

Take 0xDEADBEEF for example.
Now say you only wanted the low two bytes, 0xBEEF (which are the first two bytes in little-endian memory).
With little-endian you wouldn't have to change the address you were looking at at all, you would just have to change how much you look at. (hex, where V is the pointer location):

Code: Select all

V
EFBEADDE
V
EFBE (ignore anything past what we want)

----------------------------------------------------------------
Now consider big-endian. You have your stored value and you want to take the word value of it. You would have to increment the pointer by two bytes (or bit-shift/mask) to get to the value

Code: Select all

V
DEADBEEF
    V
DEADBEEF

Now of course, big-endian has the typecasting ease in the opposite way.
If you want to read the upper two bytes, 0xDEAD, you would just have to change how much you are reading, whereas in little-endian you'd have to do bit-shifting and masking and stuff (or increment the pointer by two).
However, from what I can tell, it is more common to need the lower bytes than the upper bytes.
Also, there's probably something in the electronics that makes it easier to build for. Idk about faster, but ¯\_(ツ)_/¯
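The pointer diagrams above can be sketched in C. This is only an illustration: the byte arrays stand in for how 0xDEADBEEF would sit in memory under each byte order, and `load_le16` / `load_be16` are hypothetical helpers that decode two bytes explicitly so the example does not depend on the host.

```c
#include <stdint.h>

/* 0xDEADBEEF as it would be laid out in memory, lowest address first. */
static const unsigned char le_image[4] = { 0xEF, 0xBE, 0xAD, 0xDE };
static const unsigned char be_image[4] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Hypothetical helper: decode two bytes at `p` as a little-endian word. */
static uint16_t load_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: decode two bytes at `p` as a big-endian word. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

With these, `load_le16(le_image)` gives 0xBEEF at offset 0, while the big-endian image needs `load_be16(be_image + 2)` to reach the same low half. The upper half, 0xDEAD, is the mirror case: offset 0 for big-endian, offset 2 for little-endian.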

Korona · Post by **Korona** » Sat Dec 19, 2020 2:02 pm

Exactly, foliagecanine is 100% correct. Think about it like this: little-endian is arguably the more natural representation for machines because digits with low significance are stored at low memory addresses.

zaval · Post by **zaval** » Sat Dec 19, 2020 5:09 pm

I like LE and also think it's the natural way. We spell numbers in BE and it's awkward. Europeans borrowed this system from the Arabs but got it wrong: Arabic script runs from right to left, so in a number like 123 the least significant decimal digit, 3, comes first. Europeans just copied this into their left-to-right writing system and forgot to reverse the digits, so when we write 123 the most significant digit comes first. It has become habitual and so seems "normal" to us, but in fact LE is the natural one. Imagine if we wrote words starting not with the letter for the first sound but with the letter for the last. BE does exactly that with numbers.

klange · Post by **klange** » Sat Dec 19, 2020 6:41 pm

I think looking at big numbers makes the benefits of little-endian systems much less apparent. Let's look at small numbers instead, say... 1.

If we represent 1 as a 32-bit value in hexadecimal, we get 0x00000001.
As a 16-bit value, it's 0x0001.
And as an 8-bit value it's 0x01.

If we store these in big endian, they all look different:
0x00 0x00 0x00 0x01
0x00 0x01
0x01

And if we store them in little-endian, notice the pattern:
0x01 0x00 0x00 0x00
0x01 0x00
0x01

Consider what happens if we want to cast a 32-bit value to a 16-bit value - we can use the same memory address to get both of those in little-endian, but in big-endian we need to shift our pointer.
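A small sketch of the same observation in C, under the assumption that we model memory as a plain byte array. The helpers `store_le`, `store_be`, `load_le` and `load_be` are invented for this example and take the field width in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the low `width` bytes of `v`, least
 * significant byte at the lowest address. */
static void store_le(unsigned char *p, uint32_t v, size_t width)
{
    for (size_t i = 0; i < width; i++) {
        p[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* Hypothetical helper: same, but most significant byte first. */
static void store_be(unsigned char *p, uint32_t v, size_t width)
{
    for (size_t i = width; i-- > 0; ) {
        p[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* Hypothetical helper: read `width` bytes at `p` as little-endian. */
static uint32_t load_le(const unsigned char *p, size_t width)
{
    uint32_t v = 0;
    for (size_t i = width; i-- > 0; )
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: read `width` bytes at `p` as big-endian. */
static uint32_t load_be(const unsigned char *p, size_t width)
{
    uint32_t v = 0;
    for (size_t i = 0; i < width; i++)
        v = (v << 8) | p[i];
    return v;
}
```

After storing 1 as a 4-byte little-endian value, reading 1, 2 or 4 bytes from the same address yields 1 each time. After storing it big-endian, a 1-byte or 2-byte read from the same address yields 0, and the pointer has to move to offset 3 or 2 respectively to find the 1.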

moonchild · Post by **moonchild** » Mon Dec 21, 2020 4:05 am

bloodline wrote:It's just the bytes that are stored in the reverse of the order you would expect when reading and writing binary numbers.

That's not entirely true.

The bits are probably also in little endian (though there is no guarantee of this).

But this is a CPU implementation detail. Since the byte is the smallest addressable unit, there's no way for us programmers to tell the difference.

moonchild · Post by **moonchild** » Mon Dec 21, 2020 4:08 am

@klange I do agree the little endian representation is superior, but I don't agree with your argument. Patterns can be offset by arbitrary numbers of bits, and a repeating sequence can be recognized regardless of endianness.

Moreover, imagine we're comparing two same-sized integers; say, two pointers. It's probably interesting if those pointers have the same high bits, but it's completely irrelevant if just their low bits are the same. So in that case recognizing the high bits is more important.

eekee · Post by **eekee** » Mon Dec 21, 2020 4:28 am

I know a very smart and passionate person who really hates big-endian! I don't know why, but I imagine nullplan's arguments probably cover it.

There's another argument for little-endian, but it doesn't apply to many OSs: In arbitrary-precision calculations, adding or subtracting numbers needs to start with the low digits for the carry to propagate correctly. (Yes, even though you borrow in the other direction when doing subtraction longhand. 2s complement is awesome like that.) I don't know about multiplication and division.
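To make the carry argument concrete, here is a minimal sketch of multi-word addition, assuming 32-bit "digits" (limbs) stored with index 0 as the least significant one. The function name `bn_add` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum = a + b for two n-limb numbers, where limb 0
 * is the least significant. Returns the carry out of the top limb.
 * The loop walks upward from limb 0 so each carry feeds the next limb. */
static uint32_t bn_add(uint32_t *sum, const uint32_t *a,
                       const uint32_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        sum[i] = (uint32_t)t;  /* low 32 bits are this limb's result */
        carry = t >> 32;       /* 0 or 1, carried into the next limb */
    }
    return (uint32_t)carry;
}
```

With limbs stored low-first, the loop index and the memory address increase together, which is the convenience being described.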

In human language, big-endian makes rounding more natural. That's valuable, I think.

klange · Post by **klange** » Mon Dec 21, 2020 4:52 am

moonchild wrote:@klange I do agree the little endian representation is superior, but I don't agree with your argument. Patterns can be offset by arbitrary numbers of bits, and a repeating sequence can be recognized regardless of endianness.

Moreover, imagine we're comparing two same-sized integers; say, two pointers. It's probably interesting if those pointers have the same high bits, but it's completely irrelevant if just their low bits are the same. So in that case recognizing the high bits is more important.

It's not about recognition, it's about casting between sizes. With a little-endian storage format, a small value stored in a large integer "starts" at the same location for smaller types.

And individual bytes do not have any concept of endianness; there is no meaningful 'ordering' of bits. They all exist together, equally. How that exists in hardware not only can not be determined by software, it has no meaning and can even vary at different parts of the pipeline. In some forms of memory, individual bits of a single byte may even live on different chips.

Solar · Post by **Solar** » Mon Dec 21, 2020 6:53 am

eekee wrote:In arbitrary-precision calculations [...] I don't know about multiplication and division.

As I happen to be engaged in just that @ PDCLib at the moment: Naive multiplication goes from least to most significant. A more advanced algorithm does a kind of divide-and-conquer with half the digits; if you do that recursively, LSB / MSB stops mattering at all.

Division "guesses" a result going by the most significant digits, and then adjusts that first guess when calculating the remainder reveals the guess being off by one.

Whether you store the digits LSB or MSB first doesn't really matter performance-wise, but I have to (reluctantly, as I was raised on MSB) admit that having data[0] being the least significant digit (i.e., LSB) made more sense to me.
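For illustration only, a sketch of the naive (schoolbook) multiplication mentioned above, with 32-bit limbs and `data[0]` as the least significant digit. This is not PDCLib's code; `bn_mul` is a made-up name and the layout is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out = a * b, schoolbook style.
 * a has na limbs, b has nb limbs, out must have room for na + nb limbs.
 * Limb 0 is the least significant in all three arrays. */
static void bn_mul(uint32_t *out, const uint32_t *a, size_t na,
                   const uint32_t *b, size_t nb)
{
    for (size_t i = 0; i < na + nb; i++)
        out[i] = 0;

    for (size_t i = 0; i < na; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < nb; j++) {
            /* Fits in 64 bits: (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1. */
            uint64_t t = (uint64_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        out[i + nb] = (uint32_t)carry; /* this limb is still zero here */
    }
}
```

Both loops run from the least significant limb upward, matching the "least to most significant" order described for the naive algorithm.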

nexos · Post by **nexos** » Mon Dec 21, 2020 7:04 am

moonchild wrote:
bloodline wrote:It's just the bytes that are stored in the reverse of the order you would expect when reading and writing binary numbers.
That's not entirely true.

The bits are probably also in little endian (though there is no guarantee of this).

But this is a CPU implementation detail. Since the byte is the smallest addressable unit, there's no way for us programmers to tell the difference.

Actually, when doing bitwise operations, bits being ordered in little endian can be quite confusing. (e.g., when masking off the top four bits of a 16-bit number, you must do "num & 0x0fff")

Korona · Post by **Korona** » Mon Dec 21, 2020 7:13 am

For arithmetic operations on words, endianness does not make a difference: even on a big endian system, you'd do "num & 0xFFF" to mask out the top bits.
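A tiny sketch of that point, assuming a 16-bit value as in the example being discussed. The function names are invented; the key observation is that masks and shifts act on the numeric value, so the same expressions give the same results on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Hypothetical helper: clear the four most significant bits of a
 * 16-bit value. The mask is written the same way on every host. */
static uint16_t clear_top4(uint16_t num)
{
    return (uint16_t)(num & 0x0FFFu);
}

/* Hypothetical helper: extract the four most significant bits. */
static unsigned top4(uint16_t num)
{
    return (num >> 12) & 0xFu;
}
```

For instance, `clear_top4(0xABCD)` is 0x0BCD and `top4(0xABCD)` is 0xA regardless of how the bytes of `num` are arranged in memory.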

astralorchid · Post by **astralorchid** » Mon Dec 21, 2020 3:23 pm

Little endian has made writing my assembler's assembled immediate words/bytes into memory very easy. I am glad it is used.

OSDev.org
