[Pmwiki-users] Smart Quotes

John Rankin john.rankin at affinity.co.nz
Wed May 28 16:34:24 CDT 2003


An algorithm emerges, blinking. JR
--

In that case, perhaps the best thing to do is to take care of the &lsquo;s
first (which are generally paired and do not stand alone), then convert
anything that remains to an &rsquo;, unless it's part of an HTML
tag attribute (finding this last part might be a bit tricky).  For this
last part, maybe something like:

   $InlineReplacements["/'([^>]*(<|\$))/"] = "'&rsquo;\$1'";

which says to replace any single quote that is not followed by a '>'
(the closing part of an HTML tag) up to a '<' or the end of the string--
i.e., quotes that aren't part of an HTML tag.  There might need to be
some global modifiers added to this pattern to deal with multiple
quotes outside of tags.
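
A minimal sketch of the same idea, written in Python rather than PmWiki's PHP. It is a variation, not Pm's exact pattern: a lookahead stands in for the capture group, so several quotes outside tags are handled in one pass. The names OUTSIDE_TAG_QUOTE and rsquo_outside_tags are made up for illustration, and the sketch assumes body text never contains a bare '<' or '>'.

```python
import re

# A quote counts as "outside a tag" when no '>' appears between it and the
# next '<' (or the end of the string). The lookahead consumes nothing, so
# every qualifying quote is replaced in a single pass.
OUTSIDE_TAG_QUOTE = re.compile(r"'(?=[^<>]*(?:<|\Z))")

def rsquo_outside_tags(html):
    """Replace single quotes outside HTML tags with &rsquo; (hypothetical helper)."""
    return OUTSIDE_TAG_QUOTE.sub("&rsquo;", html)

if __name__ == "__main__":
    print(rsquo_outside_tags("<a href='p.html'>Pm's page</a> isn't done"))
```

The quotes around the href value are left alone because a '>' follows them before any '<' does.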

Just a thought.  Interesting problem.

Pm
--

Step 1
Pass over any pattern of the form "/<.*?>/"

(I'm still working on this, having first concentrated on...)
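
One possible way to "pass over" tags, sketched in Python rather than PmWiki's PHP, is to split the text on the tag pattern and transform only the pieces between tags. This is an assumption about how step 1 could work, not the finished step 1; TAG and map_outside_tags are illustrative names. It would misfire on an attribute value that contains '>' and on tags that span lines.

```python
import re

# Same shape as the "/<.*?>/" pattern; the capture group makes split()
# keep the tags in the result list.
TAG = re.compile(r"(<.*?>)")

def map_outside_tags(html, transform):
    """Apply transform to text between tags; copy the tags through untouched."""
    parts = TAG.split(html)
    # With one capture group, odd indices are tags and even indices are text.
    return "".join(p if i % 2 else transform(p) for i, p in enumerate(parts))

if __name__ == "__main__":
    curl = lambda s: s.replace("'", "&rsquo;")
    print(map_outside_tags("it's <a href='x'>here</a>", curl))
```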


Step 2
Smarten patterns of the form "/.?['\"]+/"

Quotes have 2 properties: handedness (left or right) and type (single or
double) -- in fact the algorithm works just as well if, for example,
handedness is up or down.

The problem is to decide the handedness and type of a quote mark and
return &[l|r][s|d]quo;.

If the . character is null, a space or possibly an =, handedness starts
left, otherwise it's right.

If the . character is a backtick, set it to null (this takes care of '90s
written as `'90s).

smartstring = character

for each quote mark:

  If the i'th quote mark is the same type as the (i-1)'th quote mark,
      flip the handedness
      (2 successive singles come out as left right,
       a double followed by a single comes out as left left)

  smartstring .= &handednesstypequo;

return smartstring
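
The pseudocode above can be sketched as follows, in Python rather than PmWiki's PHP. This is one reading of step 2, not the code behind the test cases mentioned below. Two assumptions: handedness is decided before a leading backtick is dropped, and the leading character is never itself a quote (a small departure from ".?", so that text starting with several quote marks stays in one run). QUOTE_RUN, _smarten_run and smarten are illustrative names.

```python
import re

# Optional leading character (never a quote or newline), then a run of quotes.
QUOTE_RUN = re.compile(r"""([^'"\n]?)(['"]+)""")

def _smarten_run(match):
    lead, quotes = match.group(1), match.group(2)
    # Null, space or '=' before the run: start left; anything else: start right.
    hand = "l" if lead in ("", " ", "=") else "r"
    if lead == "`":
        lead = ""  # the backtick only forces a right quote, as in `'90s
    out = [lead]
    prev = None
    for q in quotes:
        kind = "s" if q == "'" else "d"
        if kind == prev:
            # Same type as the previous mark: flip the handedness.
            hand = "r" if hand == "l" else "l"
        out.append(f"&{hand}{kind}quo;")
        prev = kind
    return "".join(out)

def smarten(text):
    """Replace each run of dumb quotes with HTML quote entities."""
    return QUOTE_RUN.sub(_smarten_run, text)

if __name__ == "__main__":
    print(smarten("say 'hi'"))
```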

There are boundary issues to take care of, such as paragraphs that start
with multiple quote marks, and a stripslashes, but step 2 works correctly
for all the test cases I could think of.

Your approach has the benefit of starting with quote pairs, so you set
handedness at a global level, then tidy up the leftovers, which I can see
has a number of advantages. One possible defect with the algorithm I wrote
is that spacequotespace or spacequoteendofline comes out as an orphaned
right quote. Maybe it should leave it dumb?

There are issues with simply not smartening ='...' because = '...' is
valid HTML (I think, although pmwiki always writes ='...') and what if I
write ='This is Not a Pipe' in the body of a page?

Back to my day job while I mull step 1. Interestingly, Safari doesn't seem
to mind HTML entities for quote marks on attributes, but I can't imagine
all browsers are so forgiving.


JR






