Saturday, September 17, 2005

 

Musing about RegExp

I think I understand the general principles of regular expressions. I also can see how powerful they can be. But it shouldn't be so hard to work out what the various calls do and how to use them. Rather than address this abstractly, let's look at the specific need.

The text I'm processing contains many HTML tags calling for italics, superscript, etc. Because of the source of this data, it is OK to assume that the tags are well-formed, and I don't have to worry about nested tags either (although that assumption might need testing). My text is inside an InDesign story, so we have to get it out to an array of JavaScript strings upon which the RegExp searches will be performed.

But the changes made as a result of said searches must be applied to the story in InDesign, because the result will be to apply styling. So, I'm looking for a string like <i>some text</i> except that I have to allow for the case of the tags to be mixed. Seems to me that the simplest approach is to apply styling to the whole shebang and then, after the fact, use InDesign's Find/Change to get rid of the styled tags.
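To see what the two String calls actually hand back for a mixed-case tag, here is a small sketch in plain JavaScript with no InDesign objects involved. The sample string and the variable names are made up for illustration only.

```javascript
// Hypothetical sample text; the tags deliberately use mixed case.
var sample = "See <I>some text</i> and <i>more</I> here.";

// The "i" flag ignores case; ".*?" is non-greedy, so the match
// stops at the first closing tag rather than the last one.
var tagRE = new RegExp("<i>.*?</i>", "i");

// search() returns the offset of the first match, or -1 if there is none.
var firstIndex = sample.search(tagRE);

// match() without the "g" flag returns an array: element 0 is the
// matched text, and the "index" property is the offset where it starts.
var firstMatch = sample.match(tagRE);

// A string with no tags gives -1, which is why -1 (not 0 or 1)
// is the value to test for in a loop.
var noMatch = "no tags here".search(tagRE);
```

Note that a match at the very start of the string reports offset 0, so any loop condition has to treat 0 as a hit.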

So, the first thing we have to do is construct a RegExp object to seek out the strings that match our strings of interest and then apply it. As a test, I wrote this to operate on the last paragraph of the story only:
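// Clear any leftover Find/Change settings.
app.findPreferences = null;
app.changePreferences = null;
var myDoc = app.activeDocument;
var myStory = myDoc.stories[0];
// Test on the last paragraph of the story only.
var myText = myStory.paragraphs[-1].contents;
var myStyle = myDoc.characterStyles.item("Italic");
// "i" flag: match <i> and <I> alike; ".*?" stops at the first closing tag.
var myRE = new RegExp("<i>.*?</i>", "i");
var myMatch;
var myFind = myText.search(myRE);
// search() returns -1 when nothing matches; offset 0 is a valid hit.
while (myFind > -1) {
 myMatch = myText.match(myRE);
 // Style the whole run, tags included.
 myStory.paragraphs[-1].characters.itemByRange(myFind, myFind + myMatch[0].length - 1).appliedCharacterStyle = myStyle;
 // Overwrite the opening "<" so this run no longer matches on the next pass.
 myStory.paragraphs[-1].characters[myFind].contents = "*";
 myText = myStory.paragraphs[-1].contents;
 myFind = myText.search(myRE);
}
// Delete the marked opening tags, then the closing tags.
myStory.paragraphs[-1].search("*i>", false, false, "");
myStory.paragraphs[-1].search("</i>", false, false, "");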
And a loud voice is screaming that there's got to be a better way! But at least it works.
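One candidate for that better way, offered only as a sketch: let a RegExp with the "g" flag walk the string once and record where every tagged run starts and ends, instead of re-reading the paragraph after each change. The helper name findTagRanges and its return shape are my own invention, and I'm assuming the scripting engine supports exec() with the global flag the way standard JavaScript does. Only the string side is shown here; applying the ranges to the story is left out.

```javascript
// Hypothetical helper: collect the offsets of every <tag>...</tag> run in one pass.
// tagName is assumed to be a plain run of letters such as "i" or "sup".
function findTagRanges(text, tagName) {
    // "g" lets exec() resume after the previous match; "i" ignores case.
    var re = new RegExp("<" + tagName + ">(.*?)</" + tagName + ">", "gi");
    var ranges = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        ranges.push({
            start: m.index,                  // offset of the opening "<"
            end: m.index + m[0].length - 1,  // offset of the final ">"
            inner: m[1]                      // text between the two tags
        });
    }
    return ranges;
}

// Made-up sample string with mixed-case tags.
var ranges = findTagRanges("a <i>b</i> c <I>dd</I> e", "i");
```

The start and end values line up with the arguments passed to itemByRange in the script above. If the tags are going to be deleted as well, working through the ranges from last to first should keep the earlier offsets valid.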
