Skip to Content

Gareth53.co.uk - the online home of Gareth Senior

How The sort() Method Of An Array Works

1:17p.m., Fri 2 Dec 2011

... or "What I What I Learned from the Exercise In Futility, Part 2". (This follows on from my earlier blog post - read part 1.)

If you're looking for a short answer: The JavaScript array sort() method, by default, works by converting everything to a string (using the toString() method) and sorts according to ASCII value. If you've ever compared strings using a '<' or a '>' - it's the same deal.

If you're looking for the long answer, complete with my workings-out, read on.... This all started when I was asked, in a job interview, how I might re-create arrays in JavaScript if they didn't exist. Here's what I learned about the sort method in doing so.

Some background first. The sort method is a mutator method. It sorts the values of an array and returns the array. By default it sorts the elements in a pseudo-alphabetical fashion, or you can pass a function as an argument of the sort method specifying how the sorting should be handled.

I only had a vague notion of how the default sorting behaviour worked. And my google-fu turned up no specification as to how sort() ordering *should* behave. I had no idea how it might handle types that weren't numbers or strings - how would it handle objects? undefineds? NaNs or functions?

Well, through trial and error I began observing the behaviour.

One of the oft-quoted oddities of the sort method is that it won't sort numbers in numerical order. For example, 10 will be sorted before 2. As a wider example....

>var myArray = [5, 7, 3, 12, 22, 31, 65]
[5, 7, 3, 12, 22, 31, 65]
>console.log(myArray.sort())
[12, 22, 3, 31, 5, 65, 7]

So, numbers are sorted alphabetically - the values compared character by character.

Howabout functions? Consider this:

>var bravo = function() {};
>var myArray = [function bravo(){}, function(abc){}, function alpha(){}, function(){}, function (xyz){}, bravo]
>console.log(myArray.sort())
[function () {}, function (){}, function (abc){}, function (xyz){}, function alpha(){}, function bravo(){}]

Hmm, anonymous functions are sorted first. then by their arguments, then named functions are sorted according to their name.

Howabout objects? Let's try some stuff:

>obj = { abc: "test", xyz: "testing" };
>obj2 = undefined;
>obj3 = true;
>obj4 = false;
>obj5 = { xyz:"test2", abc:"testing2" };
>myArray = ["aaa", obj, obj2, obj3, obj4, obj5, Infinity, null, "zzz", "AAA"];
>console.log(myArray.sort())
[
"AAA",
Infinity,
{ abc: "test", xyz: "testing" },
{ abc: "test2", xyz: "testing2" },
"aaa",
false,
null,
true,
"zzz",
undefined]

What's going on here? The answer is that the values are evaluated as strings and sorted alphabetically - in effect we're calling the toString() method on array entries before sorting them. So anonymous functions are sorted before named functions simple because an open bracket character is ordered before any letter characters. The 'undefined' comes last which is interesting - undefined.toString() returns a TypeError, yet null.toString() also returns a TypeError, yet null appears to be sorted alphabetically based on the string 'null'.

Here's some more evidence that array entries are sorted according to their toString() value. It's another oddity with numbers in arrays, observe this behaviour:

num_array = [
    0.000000001,
    0.0000000011
];
num_array.sort();

You might expect this array to be unaffected. However, calling toString() on a number this small converts the representation of the number to a shorthand.

> 0.00000001.toString()
'1e-8'
> 0.000000011.toString()
'1.1e-8'
> console.log(num_array);
[1.1e-8, 1e-8]
> typeof num_array[0]
'number'

Note - Here's the Internet Explorer gotcha (didn't you just know there'd be at least one ;) IE doesn't convert small (or large) numbers into a shorthand representation for comparison purposes.

 

Before I use the word "alphabetically" again, let's see how alphabetical the ordering is.

> var str = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> arr = str.split('');
> console.log(arr.sort().join(''));
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

So, a logical order, but not alphabetical - uppercase letters are always ranked before lowercase letters. This might feel familiar if you've ever tried comparing strings using 'greater-than' and or 'less-than'.

What about punctuation?

> var str = "!@#$%^&*()_+{};<>,.?/\\|±~§¡™£abcxyz0189ABCXYZ'`";
> arr = str.split('');
> console.log(arr.sort().join(''));
!##$%&'()*+,./0189;<>?@ABCXYZ\^_`abcxyz{|}~¡£§±™

So, some punctuation ranks before letters and numbers, some after, some characters appear top be ranked after uppercase letters but before lowercase. What's going on here?

The answer is that characters are being ordered according to the ASCII standard which maps characters to a numerical value commonly expressed as a hexidecimal. If we pass a string through javascript's escape() function (of the encodeURI function) we get an ASCII hexadecimal value:

> console.log(escape('!'))
%21
> console.log(escape('&'))
%26

There's a full list of values returned by the escape (and encodeURI) functions at the foot of this post.

When writing the default function for my recreation of an array object I didi initially consider using escape() to retrieve a hexadecimal value for comparison, but escape() was created with a mind to converting strings for use in URLs. To this end, it leaves certain characters unaffected, namely: the at symbol, asterisk, plus sign, full-stop, backslash and underscore (@ * + . / _). So, in the end, I have a much simpler method that handles all cases the same.

function default_sorter(a, b) {
    /*
    if a should be sorted before b return true
    if a and b are equal return false
    if b should be sorted before a, return false
    */
    // simplest case, a direct hit!
    if (a === b) return false;
    var a = to_string(a),
        b = to_string(b),
        a_lim = a.length,
        b_lim = b.length,
        limit = ((a_lim - b_lim) > 0) ? b_lim : a_lim;
    // if we get substrings, don't bother looping, just return the shortest string
    if (a.indexOf(b) === 0) return false;
    if (b.indexOf(a) === 0) return true;
    // we convert the strings, character by character to decimal numbers based on their ASCII number
    for (var charac=0; charac         if (a[charac] === b[charac]) continue;
        if (ascii_val(a[charac]) < ascii_val(b[charac])) return true;
        else return false;
    }
    // if we get this far, shortest string wins
    return (a_lim - b_lim > 0) ? false : true;

        function ascii_val(charac) {
            var ascii_order = "!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~",
            ascii_score = ascii_order.indexOf(charac);
            if (ascii_score === -1) return Infinity;
            return ascii_score;
        }
    
    function to_string(input) {
        /*
        null and undefined objects have no toString method, so let's handle that
        */
        if (typeof input === 'undefined') input = '~undefined';
        try {
            return input.toString();
        }
        catch (e) {
            return 'null';
        }
    }
};

                                                                       
CharescapeencodeURI
! %21 !
" %22 %22
# %23 #
$ %24 $
% %25 %25
& %26 &
' %27 '
( %28 (
) %29 )
* * *
+ + +
, %2C ,
- - -
. . .
/ / /
0 0 0
1 1 1
2 2 2
3 3 3
: %3A :
; %3B ;
< %3C %3C
= %3D =
> %3E %3E
? %3F ?
@ @ @
A A A
B B B
C C C
\ %5C %5C
^ %5E %5E
_ _ _
` %60 %60
a a a
b b b
c c c
{ %7B %7B
| %7C %7C
} %7D %7D
¡ %A1 %C2%A1
¢ %A2 %C2%A2
£ %A3 %C2%A3
¥ %A5 %C2%A5
§ %A7 %C2%A7
§ %A7 %C2%A7
¨ %A8 %C2%A8
© %A9 %C2%A9
ª %AA %C2%AA
« %AB %C2%AB
¬ %AC %C2%AC
± %B1 %C2%B1
µ %B5 %C2%B5
%B6 %C2%B6
ß %DF %C3%9F
å %E5 %C3%A5
æ %E6 %C3%A6
ç %E7 %C3%A7
÷ %F7 %C3%B7
œ %u0153 %C5%93
ƒ %u0192 %C6%92
˙ %u02D9 %CB%99
˚ %u02DA %CB%9A
Ω %u03A9 %CE%A9
%u2013%u2260 %E2%80%93%E2%89%A0
%u2018 %E2%80%98
%u2019 %E2%80%99
%u201C %E2%80%9C
%u201D %E2%80%9D
%u201D %E2%80%9D
%u2020 %E2%80%A0
%u2022 %E2%80%A2
%u2026 %E2%80%A6
%u20AC %E2%82%AC
%u2202 %E2%88%82
%u2206 %E2%88%86
%u2211 %E2%88%91
%u221A %E2%88%9A
%u221E %E2%88%9E
%u2248 %E2%89%88
%u2264 %E2%89%A4
%u2265 %E2%89%A5

Related Links - my code mimcking the array object

Latest Posts

Blog Categories