[ Why does Perl's Inline::C sort 4.0e-5 after 4.4e-5? ]
I built a Perl Inline::C module, but there is some oddity with the sorting. Does anyone know why it would sort like this? Why is the 4.0e-5 is not first?
my $ref = [ 5.0e-5,4.2e-5,4.3e-5,4.4e-5,4.4e-5,4.2e-5,4.2e-5,4.0e-5];
use Inline C => <<'END_OF_C_CODE';
void test(SV* sv, ...) {
I32 i;
I32 arrayLen;
AV* data;
float retval;
SV** pvalue;
Inline_Stack_Vars;
data = SvUV(Inline_Stack_Item(0));
/* Determine the length of the array */
arrayLen = av_len(data);
// sort
sortsv(AvARRAY(data),arrayLen+1,Perl_sv_cmp_locale);
for (i = 0; i < arrayLen+1; i++) {
pvalue = av_fetch(data,i,0); /* fetch the scalar located at i .*/
retval = SvNV(*pvalue); /* dereference the scalar into a number. */
printf("%f \n",newSVnv(retval));
}
}
END_OF_C_CODE
test($ref);
0.000042
0.000042
0.000042
0.000043
0.000044
0.000044
0.000040
0.000050
Answer 1
Because you are sorting lexically, Try this code:
#!/usr/bin/perl
use strict;
use warnings;
my $ref = [ 5.0e-5,4.2e-5,4.3e-5,4.4e-5,4.4e-5,4.2e-5,4.2e-5,4.0e-5];
print "Perl with cmp\n";
for my $val (sort @$ref) {
printf "%f \n", $val;
}
print "Perl with <=>\n";
for my $val (sort { $a <=> $b } @$ref) {
printf "%f \n", $val;
}
print "C\n";
test($ref);
use Inline C => <<'END_OF_C_CODE';
void test(SV* sv, ...) {
I32 i;
I32 arrayLen;
AV* data;
float retval;
SV** pvalue;
Inline_Stack_Vars;
data = SvUV(Inline_Stack_Item(0));
/* Determine the length of the array */
arrayLen = av_len(data);
// sort
sortsv(AvARRAY(data),av_len(data)+1,Perl_sv_cmp_locale);
arrayLen = av_len(data);
for (i = 0; i < arrayLen+1; i++) {
pvalue = av_fetch(data,i,0); /* fetch the scalar located at i .*/
retval = SvNV(*pvalue); /* dereference the scalar into a number. */
printf("%f \n",newSVnv(retval));
}
}
END_OF_C_CODE
Of course, lexically 0.00040
is smaller than 0.00042
as well, but you aren't comparing 0.00040
to 0.00042
; you are comparing the number 0.00040
converted to a string with the number 0.00042
converted to a string. When a number gets too large or small, Perl's stringifying logic resorts to using scientific notation. So you are sorting the set of strings
"4.2e-05", "4.2e-05", "4.2e-05", "4.3e-05", "4.4e-05", "4.4e-05", "4e-05", "5e-05"
which are properly sorted. Perl happily turns those strings back into their numbers when you ask it to with the %f
format in printf
. You could stringify the numbers yourself, but since you have stated you want this to be faster, that would be a mistake. You should not to be trying to optimize the program before you know where it slow (premature optimization is the root of all evil*
). Write your code then run Devel::NYTProf against it to find where it is slow. If necessary, rewrite those portions in XS or Inline::C
(I prefer XS). You will find that you get more speed out of choosing the right data structure than micro-optimizations like this.
*
Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.
Answer 2
Perl_sv_cmp_locale
is your sorting function which I suspect is lexical comparison. Look for numeric sorting one or write your own.
Answer 3
Have an answer with help from the people over at http://www.perlmonks.org/?node_id=761015
I ran some profiling (DProf) and it's a 4x improvement in speed
Total Elapsed Time = 0.543205 Seconds
User+System Time = 0.585454 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
100. 0.590 0.490 100000 0.0000 0.0000 test_inline_c_pkg::percent2
Total Elapsed Time = 2.151647 Seconds
User+System Time = 1.991647 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
104. 2.080 1.930 100000 0.0000 0.0000 main::percent2
Here is the code
use Inline C => <<'END_OF_C_CODE';
#define SvSIOK(sv) ((SvFLAGS(sv) & (SVf_IOK|SVf_IVisUV)) == SVf_IOK)
#define SvNSIV(sv) (SvNOK(sv) ? SvNVX(sv) : (SvSIOK(sv) ? SvIVX(sv) : sv_2nv(sv)))
static I32 S_sv_ncmp(pTHX_ SV *a, SV *b) {
const NV nv1 = SvNSIV(a);
const NV nv2 = SvNSIV(b);
return nv1 < nv2 ? -1 : nv1 > nv2 ? 1 : 0;
}
void test(SV* sv, ...) {
I32 i;
I32 arrayLen;
AV* data;
float retval;
SV** pvalue;
Inline_Stack_Vars;
data = SvUV(Inline_Stack_Item(0));
/* Determine the length of the array */
arrayLen = av_len(data);
/* sort descending (send numerical sort function S_sv_ncmp) */
sortsv(AvARRAY(data),arrayLen+1, S_sv_ncmp);
for (i = 0; i < arrayLen+1; i++) {
pvalue = av_fetch(data,i,0); /* fetch the scalar located at i .*/
retval = SvNV(*pvalue); /* dereference the scalar into a number. */
printf("%f \n",newSVnv(retval));
}
}
END_OF_C_CODE