Scan the active edge table. If there is any fill that cannot be handled by the engine itself, return true. Otherwise handle the fills and return false.
Check the active edge table for any entries that cannot be handled by the engine itself.
If there are any, return true. Otherwise, step the the edge to the next y value.
Insert the edge with the given index from the global edge table into the active edge table.
The edge has already been stepped to the initial yValue -- thus remainingLines and rasterX
are both set.
Check the global edge table for any entries that cannot be handled by the engine itself.
If there are any, return true. Otherwise, initialize the the edge and add it to the AET
Fill the span buffer between leftEdge and rightEdge using the given bits.
Note: We always start from zero - this avoids using huge bitmap buffers if the bitmap is to be displayed at the very far right hand side and also gives us a chance of using certain bitmaps (e.g., those with depth 32) directly.
This is the inner loop for solid color fills with anti-aliasing.
This loop has been unrolled for speed and quality into three parts:
a) copy all pixels that fall into the first full pixel.
b) copy aaLevel pixels between the first and the last full pixel
c) copy all pixels that fall in the last full pixel
Fill the span buffer from leftX to rightX with the given fill.
Clip before performing any operations. Return true if the fill must
be handled by some Smalltalk code.
Fill the span buffer from leftX to rightX with the given fill.
Clip before performing any operations. Return true if the fill must
be handled by some Smalltalk code.
Load a transformation from transformOop into the float array
defined by destPtr. The transformation is assumed to be either
an array or a FloatArray of length n.
Estimate the length of the vector described by deltaX and deltaY.
This method may be extremely inaccurate - use it only
if you know exactly that this doesn't matter. Otherwise
use #accurateLengthOf:width:
The following is a brief outline on how the engine works.
In general, we're using a pretty straight-forward active edge approach, e.g.,
we classify all edges into three different states:
a) Waiting for processing
b) Active (e.g., being processed)
c) Finished
Before the engine starts all edges are sorted by their y-value in a so-called
'global edge table' (furthermore referred to as GET) and processed in top
to bottom order (the edges are also sorted by x-value but this is only for
simplifying the insertion when adding edges).
Then, we start at the first visible scan line and execute the following steps:
1) Move all edges starting at the current scan line from state a) to state b)
This step requires the GET to be sorted so that we only need to check
the first edges of the GET. After the initial state of the edge (e.g., it's current
pixel value and data required for incremental updates) the edges are then
inserted in the 'active edge table' (called AET). The sort order in the AET is
defined by the pixel position of each edge at the current scan line and thus
edges are kept in increasing x-order.
This step does occur for every edge only once and is therefore not the most
time-critical part of the approach.
2) Draw the current scan line
This step includes two sub-parts. In the first part, the scan line is assembled.
This involves walking through the AET and drawing the pixels between
each two neighbour edges. Since each edge can have two associated fills
(a 'left' and a 'right' fill) we need to make sure that edges falling on the
same pixel position do not affect the painted image. This issue is discussed
in the aetScanningProblems documentation.
Wide edges (e.g., edges having an associated width) are also handled during
this step. Wide edges are always preferred over interior fills - this ensures
that the outline of an object cannot be overdrawn by any interior fill of
a shape that ends very close to the edge (for more information see wideEdges
documentation).
After the scan is assembled it is blitted to the screen. This only happens all
'aaLevel' scan lines (for further information see the antiAliasing documentation).
This second step is done at each scan line in the image, and is usually the most
time-critical part.
3) Update all currently active edges
Updating the active edges basically means either to remove the edge from the AET
(if it is at the end y value) or incrementally computing the pixel value for the
next scan line. Based on the information gathered in the first step, this part
should be executed as fast as possible - it happens for each edge in the AET
at each scan line and may be the bottleneck if many edges are involved in
the drawing operations (see the TODO list; part of it probably deals with the
issue).
The engine currently used a very simple, but efficient anti-aliasing scheme. It is based on a square unweighted filter of size 1, 2, or 4 resulting in three levels of anti-aliasing:
* No anti-aliasing (filter size 1)
This simply draws each pixel 'as is' on the screen
* Slight anti-aliasing (filter size 2)
Doubles the rasterization size in each direction and assembles the pixel value as the medium of the four sub-pixels falling into the full pixel
* Full anti-aliasing (filter size 4)
Quadruples the rasterization in each direction and assembles the pixel value as the medium of the sixteen sub-pixels falling into the full pixel
The reason for using these three AA levels is simply efficiency of computing. Since the above filters (1x1, 2x2, 4x4) have all power of two elements (1, 4, and 16) we can compute the weighted sum of the final pixel by computing
destColor _ destColor + (srcColor // subPixels)
And, since we're only working on 32bit destination buffer we do not need to compute the components of each color separately but can neatly put the entire color into a single formula:
with aaMask = 16rFFFFFFFF for aaLevel = 1, aaMask = 16rFCFCFCFC for aaLevel = 2, aaMask = 16rF0F0F0F0 for aaLevel = 4 and aaShift = 0, 2, or 4 for the different levels. However, while the above is efficient to compute, it also drops accuracy. So, for the 4x4 anti-aliasing we're effectively only using the high 4 bits of each color component. While is generally not a problem (we add 16 sub-pixels into this value) there is a simple arithmetic difficulty because the above cannot fill the entire range of values, e.g.,
16 * (255 // 16) = 16 * 15 = 240
and not 255 as expected. We solve this problem by replicating the top n (n=0, 2, 4) bits of each component as the low bits in an adjustment step before blitting to scan line to the screen. This has the nice effect that a zero pixel value (e.g., transparent) will remain zero, a white pixel (as computed above) will result in a value of 255 for each component (defining opaque white) and each color inbetween linearly mapped between 0 and 255.
Due to having two fill entries (one left and one right) there can be problems while scanning the active edge table. In general, the AET should look like the following (ri - regions, ei - edges, fi - fills):
However, due to integer arithmetic used during computations the AET may look like the following:
X
\| |
| \ |
| \ |
r1 | r2 \ r3 | r4
| \ |
e1 e2 e3
In this case, the starting point of e1 and e2 have the same x value at the first scan line but e2 has been sorted before e1 (Note: This can happen in *many* cases - the above is just a very simple example). Given the above outlined fill relations we have a problem. So, for instance, using the left/right fills as defined by the edges would lead to the effect that in the first scan line region r3 is actually filled with the right fill of e1 while it should actually be filled with the right fill of e2. This leads to noticable artifacts in the image and increasing resolution does not help.
What we do here is defining an arbitrary sort order between fills (you can think of it as a depth value but the only thing that matters is that you can order the fills by this number and that the empty fill is always sorted at the end), and toggle the fills between an 'active' and an 'inactive' state at each edge. This is done as follows:
For each edge ei in the AET do:
* if fLeft(ei) isActive then removeActive(fLeft(ei)) else addActive(fLeft(ei))
* if fRight(ei) isActive then removeActive(fRight(ei)) else addActive(fRight(ei))
* draw the span from ei to ei+1 with currentActive
where addActive adds the fill to the list of currently active fills, removeActive() removes the fill from the active list and currentActive returns the fill AS DEFINED BY THE SORT ORDER from the list of active fills. Note that this does not change anything in the first example above because the list will only contain one entry (besides the empty fill). In the second case however, it will lead to the following sequence:
* toggle fLeft(e2) = f(r2) = 'x'
- makes fLeft(e2) active
- activeList = 'x'
* toggle fRight(e2) = f(r3) = 'o'
- makes fRight(e2) active
- activeList = 'xo'
* draw span from e2 to e1
Depending on the sort order between 'x' and 'o' the region will be drawn with either one of the fills. It is significant to note here that the occurence of such a problem is generally only *very* few pixels large (in the above example zero pixels) and will therefore not be visually noticable. In any case, there is a unique decision for the fill to use here and that is what we need if the problem did not happen accidentally (e.g., someone has manually changed one fill of an edge but not the fill of the opposite edge).
* toggle fLeft(e1) = f(r1) = '-'
- makes fLeft(r1) visible
- activeList = 'xo-'
[Note: empty fills are a special case.
They can be ignored since they sort last
and the activeList can return the empty
fill if it is itself empty].
* toggle fRight(e1) = f(r2) = 'x'
- makes fRight(e1) invisible
- activeList = 'o-'
* draw span from e2 to e3
Since the active list contains (besides the empty fill) only one fill value this will be used. Fortunately, this is the correct fill because it is the fill we had initially defined for the region r2.
An interesting side effect of the above is that there is no such notion as a 'left' or 'right' fill anymore. Another (not-so-nice) side effect is that the entire AET has to be scanned from the beginning even if only the last few edges actually affect the visible region.
PS. I need to find a way of clipping the edges for this. More on it later...
BalloonEnginePlugin>>stepToFirstBezierIn:at:
1) Check if reducing maxSteps from 2*deltaY to deltaY
brings a *significant* performance improvement.
In theory this should make for double step performance
but will cost in quality. Might be that the AA stuff will
compensate for this - but I'm not really sure.
BalloonEngineBase>>dispatchOn:in:
1) Check what dispatches cost most and must be inlined
by an #inlinedDispatchOn:in: Probably this will be
stepping and eventually wide line stuff but we'll see.
BalloonEngineBase
1) Check which variables should become inst vars, if any.
This will remove an indirection during memory access
and might allow a couple of optimizations by the C compiler.
Anti-Aliasing:
1) Check if we can use a weighted 3x3 filter function of the form
1 2 1
2 4 2
1 2 1
Which should be *extremely* nice for fonts (it's sharpening
edges). The good thing about the above is that it sums up to
16 (as in the 4x4 case) but I don't know how to keep a history
without needing two extra scan lines.
2) Check if we can - somehow - integrate more general filters.
3) Unroll the loops during AA so we can copy and mask aaLevel pixels
in each step between start and end. This should speed up filling
by a factor of 2-4 (in particular for difficult stuff like radial gradients).
Clipping
1) Find a way of clipping edges left of the clip rectangle
or at least ignoring most of them after the first scan line.
The AET scanning problems discuss the issue but it should be
possible to keep the color list between spans (if not empty)
and speed up drawing at the very right (such as in the
Winnie Pooh example where a lot of stuff is between the
left border and the clipping rect.
2) Check if we can determine empty states of the color list and
an edge that is longer than anything left of it. This should
work in theory but might be relatively expensive to compute.