Subject: Re: search.c clean-up



Julian Foad <julianfoad@xxxxxxxxxxxxxxx> wrote:
> Jim Meyering wrote:
>> Here's a little change I wrote months ago.
>> The only problem is that it relies on mempcpy which is not
>> always available, so this will have to wait until gnulib
>> support is brought up to date.
>
> Thank you for offering this patch. I hope you won't mind if I criticise it a
> bit.
>
> While I admire the spirit of your change, the use of non-standard function
> mempcpy is a bit of a problem in portable code,

Perhaps you didn't see the mention of mempcpy, above?
mempcpy is the right function to use, here. Just because it's not
part of every C library doesn't mean we shouldn't use it.
That's one of the big reasons for the existence of gnulib.
In case you'd rather not wait for the integration of gnulib's
framework into the grep package, I've included a copy of that
tiny function in the patch below.

> and it seems to call strlen()
> on constant strings even more times than it did before, which is only a small
> and constant cost but not very tasteful.

It is better practice to use strlen in such macros,
in case the caller passes non-literal strings.
I think of it as defensive programming. More about this, below.
The compiler we typically care about (gcc) can optimize away
calls to strlen with a literal argument.

> Also, defining a macro within the
> body of a function, and then using it outside that function is a bit odd
> (though there is nothing syntactically wrong with it).

It's easy to move it `up' and out of that function.
I've done so.

> Also, your code
> calculates and allocates an extra byte, but doesn't use it.

Good catch.
FYI, it allocates exactly the same amount of space as
the original. The difference is that with my patch
the pattern is no longer terminated with an unused NUL byte.
Though it might be nice (e.g. for debugging) to NUL-terminate
that string.
For now I've made it allocate one byte less.
Hmm... I've just noticed that my WRAP_PATTERN macro used the local
variable `size' when it should be using the corresponding macro
parameter Pat_size.

> The bigger benefit will be gained by factoring out almost the entire content
> of
> the two functions Ecompile and Gcompile. If you would like to do this partial
> factorisation first, then would you consider re-writing it without
> non-standard
> library functions, and minimising use of string functions[1], and the other
> points I mentioned.
>
> Thanks.
>
>
> [1] Instead of strlen, for the size of the macro arguments that are constant
> strings you could use "sizeof(Prefix) - 1", but that would silently go wrong
> if
> the macro were involed with a string pointer. So perhaps "const char *beg[] =
> Prefix; ...sizeof(beg)-1..." which would at least catch and disallow variable
> strings.

But, as you suggest, it would still not catch constant, non-literal string
arguments. For that reason alone, it is best not to use sizeof here.
Also, I think that using strlen makes the code more readable and hence
more maintainable.

> Note that some compiling environments might optimise away strlen
> calls on constant strings, but only if they know at compile time that they are
> going to be linked to a "strlen" that behaves like the standard version of
> that
> function. That's a dangerous assumption in some environments. The use of
> intermediate variables can much more easily be optimised away.

I see no problem with using strlen here.

2004-11-23 Jim Meyering <jim@xxxxxxxxxxxx>

* src/search.c (mempcpy): Define.
(WRAP_PATTERN): New macro to factor out duplication...
(Gcompile): ...here,
(Ecompile): ...and here.

Index: src/search.c
===================================================================
RCS file: /cvsroot/grep/grep/src/search.c,v
retrieving revision 1.29
diff -u -p -r1.29 search.c
--- src/search.c 22 Nov 2004 13:40:56 -0000 1.29
+++ src/search.c 22 Nov 2004 21:52:24 -0000
@@ -1,5 +1,5 @@
/* search.c - searching subroutines using dfa, kwset and regex for grep.
- Copyright 1992, 1998, 2000 Free Software Foundation, Inc.
+ Copyright 1992, 1998, 2000, 2004 Free Software Foundation, Inc.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -232,20 +232,25 @@ Gcompile (char const *pattern, size_t si
In the whole-line case, we use the pattern:
^\(userpattern\)$. */

- static char const line_beg[] = "^\\(";
- static char const line_end[] = "\\)$";
- static char const word_beg[] = "\\(^\\|[^[:alnum:]_]\\)\\(";
- static char const word_end[] = "\\)\\([^[:alnum:]_]\\|$\\)";
- char *n = xmalloc (sizeof word_beg - 1 + size + sizeof word_end);
- size_t i;
- strcpy (n, match_lines ? line_beg : word_beg);
- i = strlen (n);
- memcpy (n + i, pattern, size);
- i += size;
- strcpy (n + i, match_lines ? line_end : word_end);
- i += strlen (n + i);
- pattern = n;
- size = i;
+#define WRAP_PATTERN(Pattern, Pat_size, Prefix, Suffix)
\
+ do
\
+ {
\
+ size_t new_size = strlen (Prefix) + (Pat_size) + strlen (Suffix) + 1;
\
+ char *new_pat = xmalloc (new_size);
\
+ mempcpy (mempcpy (mempcpy (new_pat, Prefix, strlen (Prefix)),
\
+ Pattern, Pat_size),
\
+ Suffix, strlen (Suffix));
\
+ size = new_size - 1;
\
+ Pattern = new_pat;
\
+ }
\
+ while (0)
+
+ if (match_lines)
+ WRAP_PATTERN (pattern, size, "^\\(", "\\)$");
+ else
+ WRAP_PATTERN (pattern, size,
+ "\\(^\\|[^[:alnum:]_]\\)\\(",
+ "\\)\\([^[:alnum:]_]\\|$\\)");
}

dfacomp (pattern, size, &dfa, 1);
@@ -315,20 +320,12 @@ Ecompile (char const *pattern, size_t si
In the whole-line case, we use the pattern:
^(userpattern)$. */

- static char const line_beg[] = "^(";
- static char const line_end[] = ")$";
- static char const word_beg[] = "(^|[^[:alnum:]_])(";
- static char const word_end[] = ")([^[:alnum:]_]|$)";
- char *n = xmalloc (sizeof word_beg - 1 + size + sizeof word_end);
- size_t i;
- strcpy (n, match_lines ? line_beg : word_beg);
- i = strlen(n);
- memcpy (n + i, pattern, size);
- i += size;
- strcpy (n + i, match_lines ? line_end : word_end);
- i += strlen (n + i);
- pattern = n;
- size = i;
+ if (match_lines)
+ WRAP_PATTERN (pattern, size, "^(", ")$");
+ else
+ WRAP_PATTERN (pattern, size,
+ "(^|[^[:alnum:]_])(",
+ ")([^[:alnum:]_]|$)");
}

dfacomp (pattern, size, &dfa, 1);







Privacy