WORD: Alliteration VBA script
Thread poster: DZiW (X)
DZiW (X)
DZiW (X)
Ukraine
English to Russian
+ ...
Apr 24, 2018

Dear colleagues,

I got a task to check text for two-letter alliterations (marking them light-green) and here is a hurry-skurry yet working example:

Code:
Sub Alliterations()
N = ThisDocument.Words.Count: If N < 10 Then Exit Sub' if very s... See more
Dear colleagues,

I got a task to check text for two-letter alliterations (marking them light-green) and here is a hurry-skurry yet working example:

Code:
Sub Alliterations()
N = ThisDocument.Words.Count: If N < 10 Then Exit Sub' if very short, then over
On Error Resume Next 'safety
Application.ScreenUpdating = False 'speeding up a little
i = 1

With ThisDocument
Do 'MAIN LOOP

Do 'seek the FIRST 2-letter pair at the beginning of the word
was = UCase(Left(.Words(i), 2))
i = i + 1
Loop Until (i < N) And (Left(was, 1) Like "[A-ZА-Я]")

it = UCase(Left(.Words(i), 2)) 'get the NEXT 2-letters from the NEXT word

While (it = was) And (i < N) 'and mark all matches
.Words(i).Characters(1).Font.ColorIndex = wdBrightGreen
.Words(i).Characters(2).Font.ColorIndex = wdBrightGreen
'.UndoClear
i = i + 1
DoEvents 'check for Ctrl+ScrollLock
was = UCase(Left(.Words(i), 2))
Wend

'If Application.NumLock = True Then Exit Sub 'a terminator flag
If i Mod 100 = 0 Then Application.StatusBar = i & "/" & N - i: DoEvents
Loop Until i > = N
End With 'ThisDocument

Beep
MsgBox "Done"
End Sub



The hardware is no issue--a Corei7 desktop with 8GB RAM running MS Word 2013 under W8.1 x64, with a peak load under 30% and no intense HDD activity.

The document is a 25K-word DOCX (OOXML).

I ran the script, which did some 5000 words promptly, then started to slow down on and off without apparent reasons, steadily taking about 20% CPU time. As far as neither old DOC, nor RTF type changed a thing, nor Affinity/Priority, I thought it was something to do with Undo (see remarked UndoClear), but seems a red-herring too. I tried other plain texts with the same result.

Of course, the script is far from perfection and non-optimized, but I'm really missing something important... Any ideas?
Even better, if you know a software to mark alliterations at the beginning)

Thank you
Collapse


 
John Fossey
John Fossey  Identity Verified
Canada
Local time: 09:40
Member (2008)
French to English
+ ...
DoEvents? Apr 25, 2018

Possibly the way DoEvents is being used?

See for example: http://www.fmsinc.com/microsoftaccess/modules/examples/avoiddoevents.asp


 
Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Regex? Apr 25, 2018

Not a software but a regex? ((aa)[a-z]*\s(aa)[a-z]*)|((bb)[a-z]*\s(bb)[a-z]*) etc.

Or: \b(\w)\1\w*\s(\w)\1\w*

https://regex101.com/r/fDuRWj/1

[Edited at 2018-04-25 04:56 GMT]


 
Rolf Keller
Rolf Keller
Germany
Local time: 14:40
English to German
String space is being filled up? Apr 25, 2018

DZiW wrote:

I'm really missing something important... Any ideas?

First of all you should delete the "On Error Resume Next" because this statement might hide some important error messages.

Maybe you use the string heap too extensively, so that VBA's garbage collector gets much work.
https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
https://stackoverflow.com/questions/19038350/when-should-an-excel-vba-variable-be-killed-or-set-to-nothing
Anyway, your loops put tens of thousands strings into the size-limited string heap. As soon as the heap gets full, you'll experience a slowdown.

VBA doesn't provide programmatic access to the garbage collector but there are some crutches:
https://blogs.msdn.microsoft.com/ericlippert/2004/04/28/when-are-you-required-to-set-objects-to-nothing/

In case VBA isn't able to do the job, you could use VSTO. Or you could rename .docx to .zip, unzip it, look for the .xml that contains the document's content and then process the .xml file programmatically. Both methods will require some time to get them working, though. And VSTO is not free, AFAIK.


 
DZiW (X)
DZiW (X)
Ukraine
English to Russian
+ ...
TOPIC STARTER
Hello Apr 25, 2018

John, I know it works faster without DoEvents, but retains the same pattern--doing first ~5K words at ease and hobbling down, let alone it doesn't break/exit the subroutine using Ctrl+ScrLk, alas.

Hans, it's interesting, yet seems MS Word uses some other RegEx, right?


 


To report site rules violations or get help, contact a site moderator:


You can also contact site staff by submitting a support request »

WORD: Alliteration VBA script






TM-Town
Manage your TMs and Terms ... and boost your translation business

Are you ready for something fresh in the industry? TM-Town is a unique new site for you -- the freelance translator -- to store, manage and share translation memories (TMs) and glossaries...and potentially meet new clients on the basis of your prior work.

More info »
Trados Business Manager Lite
Create customer quotes and invoices from within Trados Studio

Trados Business Manager Lite helps to simplify and speed up some of the daily tasks, such as invoicing and reporting, associated with running your freelance translation business.

More info »



Forums
  • All of ProZ.com
  • Term search
  • Jobs
  • Forums
  • Multiple search