Saturday, December 12, 2009

[BLUG] Strange sort behavior fixed with LANG

I was sorting the /etc/passwd and /etc/shadow files from the same machine
so that I could combine them with the join command for an LDAP import. I
was puzzled when I ran into small inconsistencies in my sorted output,
like this:

excerpt of sorted /etc/passwd:

joe:x:821:821::/home/joe:/bin/bash
joebob:x:1192:1192::/home/joebob:/bin/bash

excerpt of sorted /etc/shadow:

joebob:$1$uyt4hg46$Gf1EAPxzZ/Tm7X8BEgyBe0:12687:0:99999:7:::
joe:$1$uyt4hg46$Gf1EAPxzZ/Tm7X8BEgyBe0:12687:0:99999:7:::


What? Why would it put joe before joebob in one file and joebob before
joe in the other? All I had run to generate the output was this:

sort /etc/passwd > passwd-sorted
sort /etc/shadow > shadow-sorted

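It turns out that sort doesn't compare plain bytes by default. It follows
the collation rules of your locale (en_US.UTF-8, for example), and many
locales give punctuation such as colons little or no weight. As a result,
lines that differ only around a colon don't always sort the way you'd
expect. To get a plain byte-by-byte sort, set LANG=C on the command line
like this: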

LANG=C sort filename
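Here is a minimal sketch of the whole sort-then-join workflow. It runs against two small made-up files that stand in for /etc/passwd and /etc/shadow. The sample lines and the HASH1/HASH2 placeholders are hypothetical.

The sketch differs from the command above in three ways:

* It uses LC_ALL=C instead of LANG=C. LC_ALL takes precedence over both LANG and LC_COLLATE, so it still works if either of those is set in your environment.
* It sorts on just the username field (-t: -k1,1), which keeps the sort order in step with what join actually compares.
* It runs join in the C locale too, because join expects its input to be sorted in the same collating order it uses itself.

```shell
#!/bin/sh
# Sketch only: sort passwd- and shadow-style files in the C locale, then
# join them on the username (field 1). The sample data is made up.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample input, written out of order on purpose.
printf '%s\n' \
  'joebob:x:1192:1192::/home/joebob:/bin/bash' \
  'joe:x:821:821::/home/joe:/bin/bash' > "$tmp/passwd"
printf '%s\n' \
  'joebob:HASH2:12687:0:99999:7:::' \
  'joe:HASH1:12687:0:99999:7:::' > "$tmp/shadow"

# LC_ALL=C overrides LANG and LC_COLLATE, forcing byte-by-byte comparison.
# -t: -k1,1 sorts on the username field only, which is what join compares.
LC_ALL=C sort -t: -k1,1 "$tmp/passwd" > "$tmp/passwd-sorted"
LC_ALL=C sort -t: -k1,1 "$tmp/shadow" > "$tmp/shadow-sorted"

# Run join in the same locale so it agrees with the sort order.
# Output: username, the rest of the passwd fields, then the shadow fields.
joined=$(LC_ALL=C join -t: "$tmp/passwd-sorted" "$tmp/shadow-sorted")
printf '%s\n' "$joined"
```

In the C locale, both files come out with joe ahead of joebob. This is because "joe" is a prefix of "joebob", and a byte-by-byte comparison puts the shorter key first.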

I can't believe I haven't run into this before. I hate to think how
many lists I may have built over the years that have small errors in them.

(And no, that isn't a real password hash I used above for this e-mail.)

--
Mark Krenz
Bloomington Linux Users Group
http://www.bloomingtonlinux.org/