 |
Linux Format forums Help, discussion, magazine feedback and more
|
| Author |
Message |
gch15
Joined: Thu Jun 09, 2005 5:00 pm Posts: 39 Location: Norfolk, UK
|
Posted: Wed Jun 29, 2011 10:38 pm Post subject: Simple Perl and C comparison |
|
|
Hi,
I program quite a lot in Perl and find it to be fast enough for whatever I want to do. A few days back I thought of comparing Perl with C. Since I often read in lines of text from files, I thought I would compare the speed of doing this in Perl and C.
First I generate a text file to read using the BASH code below.
| Code: |
if [[ -e stuff ]]
then
    rm stuff
fi
for x in {1..5000}
do
    echo "This is line $x" >> stuff
done
|
If I need a longer test file I just change the 5000 to some bigger number.
Below are a Perl script and a C program. Both do the same thing, which is, read lines from the file (stuff, created above) and keep adding them to a string variable. When all lines have been read, the length of this string variable is printed. That is all.
| Code: |
$ gcc -o for_cmp for_cmp.c
$ time ./for_cmp
88893
real 0m1.126s
user 0m1.122s
sys 0m0.003s
$ time perl for_cmp.pl
88893
real 0m0.014s
user 0m0.006s
sys 0m0.007s
|
As you can see above, my C program is significantly slower than the Perl script. My C is very amateurish, so I believe there must be faster ways of doing this in C. I would greatly appreciate some example C code that is faster than (or as fast as) the Perl script at this simple task.
Thanks.
Here is the Perl code
| Code: |
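# begin perl script for_cmp.pl
use strict;
use warnings;
# stop with a message if the test file is missing
open(IN, "<", "stuff") or die "Cannot open stuff: $!";
my $growing = "";
while(<IN>) {
    $growing .= $_;    # append each line as it is read
}
close(IN);
print(length($growing), "\n");
# end perl script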
|
And here is the C code
| Code: |
/* begin C code for_cmp.c */
#define _GNU_SOURCE   /* needed for mempcpy */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (int argc, char *argv[]) {
    FILE *infile;
    const size_t mem_chunk = sizeof(char) * 1000 * 500;
    size_t allocd;
    char *growing = (char *) malloc(mem_chunk);
    char *moving = growing;
    allocd = mem_chunk;
    size_t initsize = 10000;
    char *line = (char *) malloc(sizeof(char) * initsize);
    infile = fopen("stuff", "r");
    if(growing == NULL || line == NULL || infile == NULL) {
        perror("for_cmp");
        exit(1);
    }
    growing[0] = '\0';   /* malloc does not zero memory; strlen needs a terminator */
    while((fgets(line, initsize, infile)) != NULL) {
        /* grow the buffer by another chunk when it is nearly full */
        if(strlen(growing) + strlen(line) + 100 > allocd) {
            growing = (char *) realloc(growing, allocd + mem_chunk);
            allocd += mem_chunk;
            moving = growing + strlen(growing);
        }
        moving = mempcpy(moving, line, strlen(line));
        *moving = '\0';   /* mempcpy does not terminate the string */
    }
    printf("%zu\n", strlen(growing));
    fclose(infile);
    exit(0);
}
/* end C code */
|
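One thing that stands out in the loop above is that strlen(growing) is called on every pass, so the whole accumulated string is re-scanned for each line read. The following is only a sketch of one way around that, assuming the same line-by-line approach with fgets: it keeps a running byte count and doubles the buffer when it fills. The function name append_lines, the 4096-byte starting size and the doubling policy are illustrative choices, not anything from the original program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append every line of `path` to one growing buffer,
 * keeping a running count of bytes used instead of re-measuring the
 * buffer with strlen. Returns the NUL-terminated buffer (the caller
 * frees it) and stores its length in *len_out; returns NULL on failure. */
char *append_lines(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096;            /* bytes allocated (arbitrary start size) */
    size_t used = 0;              /* bytes filled so far */
    char *growing = malloc(cap);
    char line[10000];             /* same line limit as the program above */

    if (growing == NULL) {
        fclose(fp);
        return NULL;
    }

    while (fgets(line, sizeof line, fp) != NULL) {
        size_t n = strlen(line);  /* only the short line is measured */
        if (used + n + 1 > cap) {
            while (used + n + 1 > cap)
                cap *= 2;         /* doubling keeps reallocations rare */
            char *tmp = realloc(growing, cap);
            if (tmp == NULL) {
                free(growing);
                fclose(fp);
                return NULL;
            }
            growing = tmp;
        }
        memcpy(growing + used, line, n);
        used += n;
    }
    fclose(fp);

    growing[used] = '\0';         /* terminate once, at the end */
    *len_out = used;
    return growing;
}
```

Calling append_lines("stuff", &len) and printing len with "%zu" should give the same figure as the program above; the returned buffer is released with free() when finished.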
|
|
|
 |
spaceyhase LXF regular
Joined: Mon Jun 30, 2008 1:07 pm Posts: 116
|
Posted: Wed Jul 06, 2011 10:42 pm Post subject: |
|
|
All the memory allocation and copying is killing the C performance. fgets isn't helping as it is a line-orientated read. What you should do is figure out the file size (using fseek and ftell, for instance) and allocate once. We know the length of the file now so the rest is artificial but... Fill the buffer (again, just a single read will do) and count its length (no need to pull in string.h then either). The best way would be to just keep track of how many bytes have been read as you go - there's no need to count 'em afterwards. Or, as we know the file size and expect to read the file size, if we do read 'the file size' that should suffice in confirming the length of the 'string'.
And then free the memory.
You can probably do something similar in Perl to make it even faster too.
Sorry it's all a bit vague. It shows the obvious differences between the two languages and that it isn't just a like-for-like comparison (who knows what perl's interpreter has done?; is 'while<in>' functionally the same as 'fgets'?; etc), even though the question itself is a fairly interesting one. |
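The steps described in this post (find the size with fseek and ftell, allocate once, fill the buffer with a single read, take the length from the byte count, then free) could be sketched roughly as below. This is a minimal illustration rather than a definitive answer: the function name slurp_file is made up for the example, and it assumes a regular file on a system where seeking to the end of the stream reports the file size, as it does on Linux.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole of `path` into one heap buffer.
 * On success returns the NUL-terminated buffer (the caller frees it)
 * and stores the number of bytes read in *len_out.
 * Returns NULL on any failure. */
char *slurp_file(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Find the file size: seek to the end and ask for the offset. */
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    /* One allocation, with one spare byte for a terminating NUL. */
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* One read. Its return value is the length, so no strlen is needed. */
    size_t nread = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    buf[nread] = '\0';

    *len_out = nread;
    return buf;
}
```

A caller would do something like slurp_file("stuff", &len), print len with "%zu", and then free() the returned pointer. Comparing len against the size reported by ftell is one way to confirm that the whole file was read.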
|
|
 |
johnhudson LXF regular
Joined: Wed Aug 03, 2005 2:37 pm Posts: 767
|
|
|
 |
Bazza LXF regular

Joined: Sat Mar 21, 2009 11:16 am Posts: 1381 Location: Loughborough
|
|
|
 |
gch15
Joined: Thu Jun 09, 2005 5:00 pm Posts: 39 Location: Norfolk, UK
|
Posted: Fri Jul 22, 2011 1:14 pm Post subject: |
|
|
Thanks for the response. I had guessed some of the issues you mention but not all of them, so I have learned something. As you say, "who knows what Perl's interpreter has done?" Still, it is good to know that whatever it is doing, it is pretty efficient!
| spaceyhase wrote: | All the memory allocation and copying is killing the C performance. fgets isn't helping as it is a line-orientated read. What you should do is figure out the file size (using fseek and ftell, for instance) and allocate once. We know the length of the file now so the rest is artificial but... Fill the buffer (again, just a single read will do) and count its length (no need to pull in string.h then either). The best way would be to just keep track of how many bytes have been read as you go - there's no need to count 'em afterwards. Or, as we know the file size and expect to read the file size, if we do read 'the file size' that should suffice in confirming the length of the 'string'.
And then free the memory.
You can probably do something similar in Perl to make it even faster too.
Sorry it's all a bit vague. It shows the obvious differences between the two languages and that it isn't just a like-for-like comparison (who knows what perl's interpreter has done?; is 'while<in>' functionally the same as 'fgets'?; etc), even though the question itself is a fairly interesting one. |
|
|
|
 |