Is there a Regex way to find paragraph widows?

Alan
Posts: 260
Joined: Wed Jul 24, 2002 11:57 am
Location: TN USA

Is there a Regex way to find paragraph widows?

Post by Alan »

I have been updating the formatting of RTF books before a week's vacation, so that I can load the generated ePubs onto my Kobo Clara 2E reader. Most of the time goes into reducing the number of paragraph widows, which I find by visually scanning the whole book, as I prefer the look of paragraphs without them. Is there a way to write a regular expression that finds paragraph widows (where the last line of the paragraph holds only one word)?

Alan
Atlantis 4.4
Windows 10 Pro 64-bit
admin
Site Admin
Posts: 2824
Joined: Wed Jun 05, 2002 10:48 pm

Re: Is there a Regex way to find paragraph widows?

Post by admin »

You cannot search for automatic line breaks because they are not present within document text. They are automatically produced by the pagination routine.

You should not expect that your eBook would be paginated by your eReader in the same way it is paginated in Atlantis.
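
To illustrate the point outside Atlantis, here is a minimal Python sketch. It is not how Atlantis paginates; `has_widow` and the sample paragraph are hypothetical, and the standard `textwrap` module stands in for a layout routine. The same stored text has a one-word last line at one width and not at another, so the widow is a property of the layout, not of the text being searched.

```python
import textwrap


def has_widow(paragraph: str, width: int) -> bool:
    """Return True if the last wrapped line holds exactly one word.

    Assumption: fixed-width character wrapping approximates a page layout.
    A real word processor measures glyph widths, so results will differ.
    """
    lines = textwrap.wrap(paragraph, width=width)
    # A one-line paragraph cannot have a widow.
    return len(lines) > 1 and len(lines[-1].split()) == 1


paragraph = "The quick brown fox jumps over the lazy dog again"

# The stored text never changes; only the wrap width does.
for width in (40, 45):
    print(width, has_widow(paragraph, width))
```

Because the result changes with the width, no search over the stored text alone can report where a widow will appear.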
Alan
Posts: 260
Joined: Wed Jul 24, 2002 11:57 am
Location: TN USA

Re: Is there a Regex way to find paragraph widows?

Post by Alan »

Hi admin,
I do not concern myself with how the ePubs are paginated. My question only concerned the RTF document. Your answer tells me that the RTF paragraphs are seen as continuous strings of text with only the beginning and end known. Thanks.

Alan
Atlantis 4.4
Windows 10 Pro 64-bit
admin
Site Admin
Posts: 2824
Joined: Wed Jun 05, 2002 10:48 pm

Re: Is there a Regex way to find paragraph widows?

Post by admin »

You can then search for the last space character before each hard break, and replace it with a nonbreaking space:

Find what:

(^32)([!^w^p^l^n^m^45^~]{0,}[^p^l^n^m])
Replace with:

^s\2
And to deal with hyphenated “widows” (replace ordinary hyphens with nonbreaking ones):

Find what:

(^45)([!^w^p^l^n^m]{0,}[^p^l^n^m])
Replace with:

^~\2
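
For anyone who wants to try the same idea on plain text outside Atlantis, here is a rough Python sketch of the two replacements. It is an approximation, not a translation of the Atlantis patterns: it assumes paragraphs end with "\n" (or the end of the text), and the names `bind_last_word`, `LAST_SPACE` and `LAST_HYPHEN` are made up for illustration.

```python
import re

NBSP = "\u00a0"       # nonbreaking space
NB_HYPHEN = "\u2011"  # nonbreaking hyphen

# A space followed only by non-space, non-hyphen characters up to the line end.
LAST_SPACE = re.compile(r" ([^\s\-\u2011]*)$", re.MULTILINE)
# An ordinary hyphen followed only by non-space characters up to the line end.
LAST_HYPHEN = re.compile(r"-(\S*)$", re.MULTILINE)


def bind_last_word(text: str) -> str:
    """Tie the last word of each paragraph to the word before it.

    Assumption: each line of `text` is one paragraph, separated by "\n".
    """
    text = LAST_SPACE.sub(NBSP + r"\1", text)
    return LAST_HYPHEN.sub(NB_HYPHEN + r"\1", text)


sample = "It was a dark night.\nShe was well-known\nEnd"
print(repr(bind_last_word(sample)))
```

With the last space or hyphen made nonbreaking, a layout engine that honors those characters has to keep the final two words together, so the last line cannot hold a single word.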
Alan
Posts: 260
Joined: Wed Jul 24, 2002 11:57 am
Location: TN USA

Re: Is there a Regex way to find paragraph widows?

Post by Alan »

Thank you, admin, for the suggestions.

Alan
Atlantis 4.4
Windows 10 Pro 64-bit