How to do syntax highlighting
by Andrey B. Yastrebov

Have you seen how syntax highlighting works? It looks very easy, doesn't it?
However, it is a complicated process that involves several steps.
Generally, there are three steps:
1. parsing the text into separate pieces (tokens or words);
2. interpreting them and classifying them into categories;
3. highlighting, by choosing appropriate colors/fonts for different items.
This article discusses the latter two steps.
The syntax parser, which is implemented
entirely inside LEdit,
is discussed elsewhere.
LEdit's syntax interpreter does only part of the work an interpreter
has to do. LEdit does its best, but the information
it has is very limited, so most of the job has to be done by
the application. LEdit only takes the output of the
syntax parser as its input and performs some
very basic tasks:
it interprets only special items and numbers. Its
output is passed on to the application, which receives
information about every word.
The application receives a special message, EM_CTRLCOLOREX, for
each item produced by the syntax interpreter. Object-oriented wrappers
translate this message into an event called OnControlHighlight.
When the application receives the message or handles the event, it should
interpret the word further: determine whether it is a keyword, and so on.
After interpreting the word, the application may respond by changing
its text or background color. This way it can make keywords
green or red while keeping other words black or brown. This
step is very easy to implement once the interpretation is done.
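A minimal sketch of this interpret-then-color step follows, written in Python for brevity. It does not model the real EM_CTRLCOLOREX parameters or the OnControlHighlight signature; the function names classify and text_color_for, the keyword groups and the RGB values are all hypothetical. It only shows that, once a word has been classified, choosing its color is a simple table lookup.

```python
# Hypothetical keyword groups; a real application would load its own lists.
STATEMENTS = frozenset({"if", "else", "while", "for", "return"})
TYPES = frozenset({"int", "char", "void", "double"})

# Colors as (red, green, blue) triples.
GREEN = (0, 128, 0)
RED = (192, 0, 0)
BLACK = (0, 0, 0)


def classify(word):
    """Application-side interpretation: decide what kind of word this is."""
    if word in STATEMENTS:
        return "statement"
    if word in TYPES:
        return "type"
    return "plain"


# Category -> text color. Once classification is done, coloring is trivial.
COLORS = {"statement": GREEN, "type": RED, "plain": BLACK}


def text_color_for(word):
    """What a handler for the highlight message/event might compute.

    Hypothetical: the actual message parameters are not modelled here;
    this covers only the interpret-then-color step for one word.
    """
    return COLORS[classify(word)]


for w in ("while", "int", "count"):
    print(w, text_color_for(w))
```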
It looks like a very easy solution; however, there are some very important
things that any programmer must keep in mind when implementing syntax
highlighting with LEdit.
The application shouldn't rely on the order in which the messages arrive.
For example, if the text contains a line like the
following:
first-word second-word third-word
then these words may arrive in any order. third-word may
come first or last, and so may any other word, or it may arrive
anywhere in between. The application should know
how to highlight a word without any knowledge of where the
word was taken from. This is extremely important: it is
necessary for redrawing the window efficiently.
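The pitfall can be illustrated with a small, hypothetical Python sketch (all names and the keyword set are invented for illustration, using the example line above). The stateful highlighter colors a word according to the word it received just before it, which only works if words arrive in text order; the stateless function looks at nothing but the word it is given, so its answers are the same in any order.

```python
# Hypothetical keyword set used only for this illustration.
KEYWORDS = frozenset({"first-word"})


def color_for(word):
    """Correct: the color depends only on the word itself."""
    return "green" if word in KEYWORDS else "black"


class PreviousWordHighlighter:
    """Anti-pattern: the color depends on the previously received word."""

    def __init__(self):
        self.previous = None

    def color_for(self, word):
        color = "red" if self.previous == "first-word" else "black"
        self.previous = word
        return color


line = ["first-word", "second-word", "third-word"]
in_order = line
reordered = list(reversed(line))  # words may be delivered in any order

# Stateless: identical colors whichever order the words arrive in.
assert {w: color_for(w) for w in in_order} == {w: color_for(w) for w in reordered}

# Stateful: "second-word" is red in text order but black when reversed.
h1, h2 = PreviousWordHighlighter(), PreviousWordHighlighter()
stateful_in_order = {w: h1.color_for(w) for w in in_order}
stateful_reordered = {w: h2.color_for(w) for w in reordered}
print(stateful_in_order["second-word"], stateful_reordered["second-word"])  # red black
```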
Another extremely important thing is that the application has to
decide on the highlight color very fast. LEdit highlights thousands
of words per second, so if the application spends even a few
milliseconds deciding how to highlight a word, things
become extremely slow: at 3 ms per word, a screen of 1,000 words
would take three seconds to redraw. Fortunately, most of these tasks
can be done fast enough not to delay redrawing LEdit's screen.
So the procedure that determines the highlight color
has to be extremely efficient. Unfortunately, nowadays many
programmers aren't very aware of what efficient programming is.
The best tip for them is to use an efficient algorithm if they're
going to search through a large list of keywords. Comparing
every single word in the list with the word taken from the
syntax interpreter is extremely slow; with hundreds or thousands
of keywords it becomes very inefficient. A good solution is
to make several lists and to place into each list only the words
starting with the same letter. So you have a separate list for
words starting with A, a separate list for words starting
with B, and so on. When you get a word from the syntax interpreter,
it takes very little time to determine its first letter and to
choose the corresponding list of keywords. This can speed up the search
by roughly 10 to 15 times. If the lists for some letters are still large,
they may be divided further by the second letter, and
so on. That way you have a good chance of getting relatively fast syntax
highlighting.
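A minimal Python sketch of the per-letter lists follows, including the optional further split by second letter. The names build_index and is_keyword, the split_above threshold and the keyword list are hypothetical, keywords are assumed to be non-empty, and matching is case-sensitive. In Python a single set would already give near constant-time lookups, so the buckets here mainly show the structure of the technique, which carries over directly to languages without a built-in hash table.

```python
def build_index(keywords, split_above=8):
    """Group keywords into lists by first letter.

    A list longer than split_above (a hypothetical tuning knob) is divided
    further into smaller lists keyed by the second letter.
    """
    by_first = {}
    for kw in keywords:
        by_first.setdefault(kw[0], []).append(kw)

    index = {}
    for letter, words in by_first.items():
        if len(words) > split_above:
            by_second = {}
            for w in words:
                # One-letter keywords have no second letter; key them by "".
                by_second.setdefault(w[1:2], []).append(w)
            index[letter] = by_second
        else:
            index[letter] = words
    return index


def is_keyword(index, word):
    """Scan only the one small list that could contain `word`."""
    if not word:
        return False
    group = index.get(word[0])
    if group is None:
        return False  # no keyword starts with this letter
    if isinstance(group, dict):
        group = group.get(word[1:2], [])
    return word in group  # linear scan of a short list


C_KEYWORDS = ["break", "case", "char", "const", "continue", "default", "do",
              "double", "else", "enum", "for", "if", "int", "return",
              "short", "signed", "sizeof", "static", "struct", "switch",
              "void", "volatile", "while"]
index = build_index(C_KEYWORDS, split_above=4)
for word in ("static", "stat", "while", "x"):
    print(word, is_keyword(index, word))
```

With split_above=4, the six keywords starting with "s" are split by their second letter, so a lookup for "static" compares against only "static" and "struct".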
Some people think that syntax highlighting should include
changing fonts (or just making text bold, italic or underlined) for
some keywords, as is done in some IDEs. LEdit won't
do it, because changing the font changes the text metrics,
which makes drawing and navigation overcomplicated. Those who
want such a feature should use RTF controls (or even consider
embedding MS Word). LEdit won't move in that direction.
If you're interested in decorating LEdit even further, you
may want to know about colored bookmarks
or about drawing behind the text.
Enjoy!