Understanding Big and Little Endian Byte Order

kalid · December 11, 2007, 2:58am

Question from Doran: "How can 2 chars allocate to an unsigned short? It just doesn’t make sense to me.
I’ve heard about NUXI a few times before, and I still can’t get it. Can you please explain it for me (even in C)?"

My reply:

On a typical 32-bit computer, a short is 16 bits (2 bytes). To set its value, you can write it as two bytes, which is 4 hex digits (in C):

short a = 0x1234;
short b = 0x5678;

So short a holds 0x1234 (4,660 decimal), and b is similar. Now, instead of using the hex digits “0-9” and “A-F”, let’s just use U, N, I and X to represent each byte. For example, U could be 0x12, N could be 0x34, I could be 0x56, and X could be 0x78.
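As a small sketch of how the two bytes U and N relate to the value 0x1234, the C functions below pull the high-order and low-order bytes out of a 16-bit value with shifts and masks, then put them back together. The names split_short and join_short are made up for this example, and it uses uint16_t from <stdint.h> instead of plain short (an assumption on my part) so the size is exactly 16 bits and no sign bit gets in the way.

```c
#include <stdint.h>

/* Split a 16-bit value into its high-order byte ("U" in the text)
   and its low-order byte ("N"). Shifts and masks operate on the
   value, not on memory, so the result is the same on any machine. */
void split_short(uint16_t value, uint8_t *high, uint8_t *low)
{
    *high = (uint8_t)(value >> 8);   /* most significant byte  */
    *low  = (uint8_t)(value & 0xFF); /* least significant byte */
}

/* Rebuild the 16-bit value from its two bytes. */
uint16_t join_short(uint8_t high, uint8_t low)
{
    return (uint16_t)((high << 8) | low);
}
```

For example, split_short(0x1234, &high, &low) sets high to 0x12 (U) and low to 0x34 (N). Since this works on the value rather than on its memory layout, big- and little-endian machines give the same answer.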

short a = 0xUN;
short b = 0xIX;

On either kind of machine, assume these shorts are stored consecutively in memory (as they would be in an array of two shorts): addresses 0 and 1 hold “a”, and addresses 2 and 3 hold “b”. (Again, each short takes up 2 bytes.)

On a big-endian machine, we store the biggest part of the number (the high-order byte) first, so the data would look like this:

Addr 0: U
Addr 1: N
Addr 2: I
Addr 3: X
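Here is a rough sketch of that big-endian layout in C: it writes an array of 16-bit values into a byte buffer, high-order byte first, no matter what the host machine does natively. The function name store_big_endian16 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `count` 16-bit values into `out` in big-endian order:
   the most significant byte goes to the lower address.
   `out` must have room for 2 * count bytes. */
void store_big_endian16(const uint16_t *values, size_t count, uint8_t *out)
{
    for (size_t i = 0; i < count; i++) {
        out[2 * i]     = (uint8_t)(values[i] >> 8);   /* U, then I */
        out[2 * i + 1] = (uint8_t)(values[i] & 0xFF); /* N, then X */
    }
}
```

Passing the values {0x1234, 0x5678} fills the buffer with 12 34 56 78, i.e. U N I X.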

On a little-endian machine, we store the smallest part of the number first. That is, in a = 0xUN, we store “N”, the low-order byte, first. So in memory it would look like this:

Addr 0: N
Addr 1: U
Addr 2: X
Addr 3: I
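The little-endian counterpart, again a hypothetical helper sketched only for illustration, simply writes the low-order byte first:

```c
#include <stddef.h>
#include <stdint.h>

/* Write `count` 16-bit values into `out` in little-endian order:
   the least significant byte goes to the lower address.
   `out` must have room for 2 * count bytes. */
void store_little_endian16(const uint16_t *values, size_t count, uint8_t *out)
{
    for (size_t i = 0; i < count; i++) {
        out[2 * i]     = (uint8_t)(values[i] & 0xFF); /* N, then X */
        out[2 * i + 1] = (uint8_t)(values[i] >> 8);   /* U, then I */
    }
}
```

With the same input {0x1234, 0x5678}, the buffer becomes 34 12 78 56, i.e. N U X I.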

Hence the “NUXI” problem: on a big-endian machine the data reads UNIX, and on a little-endian machine it reads NUXI. This isn’t a problem if you stay on the same machine (each machine reads its own data back in the order it wrote it), but it can be when you exchange binary data between machines with different byte orders.
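To find out which kind of machine you are on, one common trick is to put a known 16-bit value in memory and look at its first byte. The sketch below assumes the host is either big- or little-endian (it ignores rarer mixed orders) and copies the value into a byte array with memcpy to inspect it; host_is_little_endian is a made-up name.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the least significant byte first
   (little-endian), 0 otherwise. Assumes only big- or little-endian
   hosts. memcpy lets us look at the in-memory layout without
   breaking C's aliasing rules. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x1234;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x34; /* low-order byte at the lowest address? */
}
```

On x86 machines, which are little-endian, this returns 1.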

Hope this helps,

-Kalid

shaan · December 17, 2007, 7:17am

Well, the information is good!
I have one question: if Big-Endian has no real advantage over Little-Endian, then why is network byte order Big-Endian?

kalid · December 17, 2007, 7:19am

While there are advantages to each, I don’t think one is clearly better than the other. I think they just had to choose one or the other; Big Endian may have been a more popular format at the time :).
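Whatever the historical reasons, in practice network byte order means the most significant byte goes first on the wire. A minimal sketch, assuming a POSIX system where htons() and ntohs() are declared in <arpa/inet.h>; the wrapper names to_network_bytes16 and from_network_bytes16 are hypothetical:

```c
#include <arpa/inet.h>  /* htons, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Copy `value` into out[0..1] in network (big-endian) byte order.
   htons() leaves the value alone on big-endian hosts and swaps the
   two bytes on little-endian hosts, so out[0] always ends up holding
   the most significant byte. */
void to_network_bytes16(uint16_t value, unsigned char out[2])
{
    uint16_t net = htons(value);
    memcpy(out, &net, sizeof net);
}

/* Read two bytes in network byte order back into a host value. */
uint16_t from_network_bytes16(const unsigned char in[2])
{
    uint16_t net;
    memcpy(&net, in, sizeof net);
    return ntohs(net);
}
```

to_network_bytes16(0x1234, buf) leaves buf holding 12 34 on both big- and little-endian hosts.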

steve · December 20, 2007, 9:53pm

Thank you so much. I’ve been lucky thus far doing high-level coding, but now that I’ve rolled up my sleeves to start mucking about with bits and bytes, this has been very helpful indeed.

kudos!

kalid · December 20, 2007, 11:22pm

Thanks Steve, glad you found it useful! It’s fun to dip into bits & bytes every once in a while :).

Anonymous_User · December 28, 2007, 7:20am

That was a damn good explanation. Thanks a lot for the post.

kalid · December 28, 2007, 8:12pm

Hi Ramkumar, you’re welcome. Glad you liked it.

Anonymous_User · December 30, 2007, 1:12am

This is a great read.

kalid · December 30, 2007, 8:52am

Thanks Socal!

avvy · January 8, 2008, 10:21am

How can I change the byte order from Big Endian to Little Endian and vice versa without breaking a structure when I send it?

kalid · January 8, 2008, 9:38pm

Hi avvy, you can use the “host to network” and “network to host” functions to convert data (more info here: http://linux.die.net/man/3/htons). You’d have to convert each field in the structure separately.
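A sketch of the field-by-field approach, assuming a POSIX system (for htons/htonl/ntohs/ntohl) and a made-up struct message with one 16-bit and one 32-bit field. Each field is converted and copied to a fixed offset in a byte buffer, instead of sending the struct's raw bytes, which would carry the host's byte order plus any padding the compiler inserted between fields.

```c
#include <arpa/inet.h>  /* htons, htonl, ntohs, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical message, used only for illustration. */
struct message {
    uint16_t type;
    uint32_t length;
};

/* Assumed wire layout: type at offset 0, length at offset 2, no padding. */
#define MESSAGE_WIRE_SIZE 6

/* Serialize field by field, converting each to network byte order. */
void message_pack(const struct message *m, unsigned char out[MESSAGE_WIRE_SIZE])
{
    uint16_t type = htons(m->type);
    uint32_t length = htonl(m->length);
    memcpy(out, &type, sizeof type);
    memcpy(out + 2, &length, sizeof length);
}

/* Deserialize field by field, converting each back to host byte order. */
void message_unpack(const unsigned char in[MESSAGE_WIRE_SIZE], struct message *m)
{
    uint16_t type;
    uint32_t length;
    memcpy(&type, in, sizeof type);
    memcpy(&length, in + 2, sizeof length);
    m->type = ntohs(type);
    m->length = ntohl(length);
}
```

The 6-byte wire layout is an assumption of this example; a real protocol would define its own field offsets and sizes.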

shweta · January 17, 2008, 2:41pm

Hi Kalid,

I couldn’t find a better explanation of this topic anywhere I searched. I was looking up endianness in a hurry, hoping for something that would at least give the details in short, and I feel lucky to have found your post. It came like an angel, since I urgently needed to learn about endianness.
You told the whole story very briefly. Details perfect… Flow of explanation perfect! Kudos!
I’d appreciate it if you could drop me a short mail whenever you post stuff like this.

Thanks a ton!

kalid · January 17, 2008, 5:23pm

Hi Shweta, glad you liked the article! If you’d like to receive emails when new posts appear, just enter your email address in the “subscribe” form on the upper-right of the page. Thanks for the comment.

shweta · January 18, 2008, 11:06am

Hey, thanks for the info… done!
Hoping to see something informative soon.

Anonymous_User · March 15, 2008, 2:54am

You have a ‘locaiton’ in there, if you care to fix it.
I second everyone else and say that this is awesome. I didn’t even know this issue existed and now I understand it well (I think).

I think you should mention explicitly that if you store ‘UNIX’ in little-endian it will end up as NUXI in big-endian. Not strictly necessary, but I think it would Explain it Better. (Even Better.) Or perhaps lainExp it terBet. (enEv terBet.)
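Alrenous's point can be seen with a tiny sketch: swapping the two bytes inside every 16-bit word turns the string “UNIX” into “NUXI”, which is what happens when 16-bit data written in one byte order is read back in the other. swap_bytes16 is a hypothetical helper written for this illustration.

```c
#include <stddef.h>

/* Swap the two bytes of every 16-bit word in buf. If len is odd,
   the final byte is left alone. */
void swap_bytes16(char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

Calling it on char s[] = "UNIX" with len 4 leaves s as "NUXI"; calling it again restores "UNIX".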

Whenever I see the word endian I think first of Ender Wiggin. The enemy is down and so forth.

kalid · March 15, 2008, 3:10am

Thanks Alrenous, glad you’re enjoying it. Appreciate the tip – went through and cleaned up a bunch of typos (it’s a bit embarrassing how many were in there).

Good suggestion on the explanation, I’m always looking for ways to make things clearer (that’s why this isn’t best explained).

I hadn’t made the Ender Wiggin association, but I love Ender’s Game… maybe there’s a way to fit him and Bean into this article somewhere.

breeson · April 7, 2008, 5:48am

Great post. Really helpful.

kalid · April 29, 2008, 5:24am

Thanks Breeson, glad it was useful.

kumar · April 30, 2008, 7:08pm

Thanks a lot, Kalid, your explanation of endianness is awesome. I have one question that needs an answer: in a mixed binary file with 4-byte integers and single-byte characters, is there a way to tell whether a byte we read from the file is a character or part of a 4-byte integer?

It would be very helpful if you could answer.

kalid · April 30, 2008, 7:20pm

Hi Kumar, glad you enjoyed it. Offhand, I don’t think there’s a way, looking at the raw data, to tell whether it’s supposed to be a character or integer.

I think you’d need the file format spec to figure out the structure of the data – for example, the TCP header defines the byte ranges for each field.
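To illustrate that idea, here is a sketch of reading one record from a file whose spec (entirely hypothetical, made up for this example) says byte 0 is a character and bytes 1 to 4 are a big-endian 32-bit unsigned integer. The code knows which bytes form the integer only because the spec says so; the raw bytes alone would not tell you.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record format, assumed for illustration only:
     byte 0     : a single character
     bytes 1..4 : a 32-bit unsigned integer, big-endian */
#define RECORD_SIZE 5

struct record {
    char tag;
    uint32_t value;
};

/* Decode one record from buf. Returns 0 on success, or -1 if fewer
   than RECORD_SIZE bytes are available. */
int parse_record(const unsigned char *buf, size_t len, struct record *out)
{
    if (len < RECORD_SIZE)
        return -1;
    out->tag = (char)buf[0];
    /* Assemble the integer most significant byte first, as the spec says. */
    out->value = ((uint32_t)buf[1] << 24) |
                 ((uint32_t)buf[2] << 16) |
                 ((uint32_t)buf[3] << 8)  |
                  (uint32_t)buf[4];
    return 0;
}
```

For the bytes 'A', 0x00, 0x00, 0x01, 0x02 this yields tag 'A' and value 0x102 (258).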