Find the character position using javaScript: FAST, BIG pages, ALL browsers, NO preprocessing.

During my masters year at University, I set out to experiment with a modeless editing model. I decided to use the web at my content environment: the challenge became to making the web editable. I did, for all browsers/OSs (tested and worked for IE 6.0+, All webkit:Chrome/Safari, FF 1.5+, Konqueror, Opera 9+, on apple, windows and linux). One of the interesting problems to solve was finding the nearest character location when a user clicks on the web page. This post explains my solution.

Building Blocks
Microsoft’s Internet Explorer 4 (1997) introduced attributes for all element nodes in the DHTML model that describes the physical dimensions of an element in a web page. These attributes are offsetParent; offsetWidth; offsetHeight;
offsetTop and offsetLeft. Most, if not all, web browsers followed Internet Explorer’s footsteps and the attributes are well supported in contemporary browsers. The W3C drafts for the upcoming CSSOM (CSS Object Model) specification have included these properties and thus must be supported by all modern web browsers in the future.

Figure 1

Figure 1: DHTML/CSSOM offset properties

Figure 1 displays the top part of a web page that contains a <p> element inside a <div> element. The offsetWidth and offsetHeight attributes refer to the dimensions of an element in pixels (excluding margins). The offsetTop and offsetLeft attributes are the distances from the offsetParent in pixels. The offsetParent is the closest positioned containing element. For example, in Figure 1 the offsetParent of the <p> element is the containing <div> element. Note that the hierarchical parent in the DOM tree is not always the same as offsetParent due to CSS positioning schemes such as relative positioning.

The offset attributes can be used to locate the position of an element relative to the document. This is achieved by summing the offsetTop and offsetLeft attributes of all the offsetParent ancestry up to the document’s <body> element. In some cases, depending on the browser, and whether it is in quirksmode or not, the document’s CSS border must be manually added (if one exists). To determine the position of an element relative to the window, the document’s horizontal and vertical scroll-bar positions are subtracted from the calculated x and y positions of the element relative to the document respectively.

Another widely supported method is elementFromPoint(), which is currently included in the CCSOM specification drafts. This method returns an element in a document at the given (x, y) pixel coordinates. These coordinates are relative to the browser window for Gecko and Trident based browsers, where other browsers use coordinates relative to the document. This method is not as well supported as the offset properties (for example Firefox versions below 3 do not support this), so an alternative JavaScript implementation was written for legacy browser support.

Total Isolation Approach (Preprocessing)

One solution for determining a cursor position from a mouse click is to encapsulate every character within an editable section in a <span> element. Thus, when a user clicks in an editable section, the elementFromPoint() method can be used to directly discover the clicked character. Since the characters are isolated in dedicated <span> elements, the position and size of the characters can also be determined using methods described in the previous section. <span> elements are chosen because they are the only element that can reside in any element where there is text (according to HTML 4.01 and XHTML 1.0 specifications). As simple as this approach appears, it has too many pitfalls to be regarded as a practical solution:

  • The pre-processing step to isolate every character with <span> elements takes noticeably long periods of computation time. During this period, the browser becomes unresponsive (except for Opera which runs JavaScript on a separate thread to the window event thread). The pre-processing phase must occur either when a page loads or on the first cursor placement. The former approach would thwart bursty/rapid navigation through seamlessly editable web pages. The latter approach would lose the illusion of direct manipulation since the response time for the first edit would be too long.
  • The amount of memory used is bloated because of the large amount of DOM tree nodes required.
  • The large amount of <span> elements create a large DOM tree. In general, the larger the DOM tree, the slower the performance of the browser. Manipulating the DOM for performing editing operations becomes more expensive.
  • The elementFromPoint() method becomes slower because there is a larger amount of nodes to search.
  • The approach increases complexity in the rest of the framework’s implementation. All operations which manipulate the DOM must ensure that all text nodes always have exactly one character that is encapsulated in a dedicated <span> element.

Rather than total isolation of every character, you can break down this character location problem into a smaller problem effeciently using a more coarse grained preprocessing method: splitting just the lines using <div> tags. Codemirror does this for there code editor. This is a great idea if you don’t want to wrap lines. However if you want to retain text wrapping then you need to use another method. Let’s look more closely at the search space we are scanning to locate the nearest character.

Search space

When a user clicks on a web page, the (x, y) coordinates of the mouse pointer are supplied by the DOM in a mouse event object. The elementFromPoint() method is used to get the element which the mouse cursor is pointing at (if not already supplied by the mouse event object). A list of all the text nodes within the element is collected by traversing the element’s DOM tree in-order. Text nodes where cursor placements cannot occur are excluded. For example, text nodes within <script> or <style> elements, or text nodes consisting purely of white-space for HTML source-code formatting purposes.

In a common case, the element which a user clicks on is a type of container element that is not fully visible by the user. These container elements are only partially visible due to the web browser’s scroll-bar state only revealing part of the element, or the container element happens to be larger than the web browser’s window. An extreme, but typical scenario where this occurs is when a user clicks near the edges of a web page just outside of a <p> element. The element actually clicked is the document <body> element, and therefore includes every text node in the entire web page as part of the search space. To minimise the search space to yield faster performance when searching for the nearest cursor position, the search space is capped to include only nodes that are visible to the user.

Dual Binary Search

The binary search algorithm is an efficient algorithm for locating an item in a sorted collection (on average O(Log N)). It turns out that a two-pass binary search can be used against the search space to discover the closest cursor position.

In the general case when performing an in-order traversal on a DOM tree,the nodes are visited such that each node is either visually below or to the right of the previously visited node. The search space is collected using an in-order traversal, thus forming a spatially sorted collection. Search spaces are readily organised in ascending order by the y coordinate in the web page, therefore the characters in the search space are grouped by the line where they reside.

Figure2

Figure 2: The dual binary algorithm’s perspective of the search space.

Figure 2 conveys this grouping by surrounding characters on the same line with a box in the search space view of a paragraph of text. Furthermore, the x coordinates of the characters are in ascending order within each line-group. For example, the first character in the last group (the right-most group with text “words.”) is at an x coordinate of 6 pixels, and each successive character’s x coordinate increases by roughly 6 pixels all the way up to the last character on the line at an x coordinate of 38 pixels.

Two separate binary searches are used to quickly locate the nearest cursor position to the target coordinate. The first search performs a vertical position based binary search over the whole (y-ordered) search space to locate the line group which the target coordinate resides. The second search performs a horizontal position based binary search over the (x-ordered) line group to locate the nearest character to the target coordinate.

Unfortunately the JavaScript code for this implementation is on CD at home (in another country). However I present to you the pseudo code to whci leaves you the fun of implementing it:

find-cursor-at(target-coordinate) {
  search-space = get-search-space(target-coordinate);
  start = measure(first character in the search-space );
  end = measure(last character in the search-space );
  sample-set = [start,end];
  current-closest = closest(start, end, target-coordinate);
  if (target-coordinate within best.bounds ) {
    return current-closest;
  }
  # First pass
  y-search(start,end);
  (lower, upper) = select-bounds(sample-set);
  # Second pass
  x-search(lower, upper);
  return current-closest;
}

y-search(lower, upper) {
  if (lower.abs-index) == (upper.abs-index - 1) {
    # Upper and lower bounds have met, no more characters to search
    return;
  }
  # Half way point between upper and lower samples
  current.abs-index = (lower-sample.abs-index + upper.abs-index)/2;
  # Get node and relative index from absolute index
  (current.node,current.rel-index) = get-rel-dom-position(current.abs-index);
  # Measure the sample
  (current.x, current.y, current.width, current.height) = measure(current.node, current.rel-index);
  # Store it in the sample set if doing a line search
  sample-set.add(current);
  if (current is closer to target than current-closest ) {
    current-closest = current;
  }
  if (current is same line as the target y ) {
    # Goto second pass: the x binary search
    return;
  } else if (current.y > target-y) {
    upper = current;
  } else {
    lower = current;
  }
  y-search(lower, upper);
}

x-search(upper,lower) {
  if (lower-sample.abs-index) == (upper.abs-index - 1) {
    # Upper and lower bounds have met, no more characters to search
    return;
  }
  # Half way point between upper and lower samples
  current.abs-index = (lower-sample.abs-index + upper.abs-index)/2
  # Determine node and relative index from absolute index
  (current.node,current.rel-index) = get-rel-dom-position(current.abs-index);
  # Measure the sample
  (current.x, current.y, current.width, current.height) = measure(current.node, current.rel-index);
  if (current is closer to target than current-closest ) {
    current-closest = current;
  }
  if(current on line above current-closest ) {
    lower = current;
  } else if (current on line below current-closest ) {
    upper = current
  } else {
    # Current is on same line as current-closest
    if (current.x > target.x) {
      upper = current;
    } else {
      lower = current;
    }
  }
  x-search(lower, upper);
}

That’s a lot of code! Let’s go through it. The search begins by measuring the first and last characters in the search space, becoming the lower and upper
bounds of the search space respectively. If one of the initial measurements is displayed directly at the target coordinate, the search is finished. The first binary search is then entered.

Once the first binary search, the y-search, finds a measurement on the same line as the target coordinate, the search ends. The upper and lower bounds are selected from measurements stored in a sample-set (gathered during the y-search) for the second search, the x-search. The upper and lower bounds for the second search are determined as follows: if the current closest measurement is visually positioned to the left of the target x coordinate, then it becomes the lower bound and the upper bound is set to the closest sample in the sample-set with a larger absolute index to the lower bound. Conversely, if the  current closest measurement lies to the right of the target x coordinate, then it becomes the upper bound and the lower bound is set to the closest sample in the sample-set with a smaller absolute index to the upper bound.

Even though the upper and lower bounds selected for the second pass may contain characters on different lines, the search space is still sorted by distance. The x-search pseudo code shows how samples found to be on different lines to the target y coordinate during the x-search are considered to be more distant regardless of their x coordinates compared to samples that lie on the same line at the target y coordinate.

Some white-space characters are not rendered due to the HTML layout engine collapsing the white-space. When a character is found to be not rendered (where the offset attributes are zero or non-existent), the algorithm sequentially searches in the left direction to discover the nearest occurring rendered character. If the search reaches the lower bound, then the sequential search switches direction and locates the nearest rendered character to the right of the sample. If no other characters are found that are rendered within the upper and lower search space bounds then the search is complete.

The dual binary search algorithm requires the search space to be sorted. There are exceptions to the in-order traversal method for creating a spatially sorted collection of DOM nodes. For example, CSS positioning allows elements to be manually positioned in a web page regardless of where they reside in the DOM tree. CSS Floats are another example that can lead to unordered collections. However positioning styles are usually used for the presentation and structure of a web page rather than actual content. Furthermore the algorithm works within manually positioned elements/floats since an in-order traversal within these elements is spatially ordered.

Homing Dual Binary Search

I had experimented with a homing dual binary search, which is the same as the binary search, except instead of halving the search space for each new sample, the search space is narrowed down to an estimation of the closest absolute index. The estimation step uses the container elements bounds together with previously measured characters to choose a more informed search space split. But this doesnt preform very well, since estimations can’t take in account text wrapping and whitespace collapsing (life would be easy if the characters where uniformly spread throughout the container!). What happens is that the estimations often slightly miss the actual character each split, which sometimes can great reduce the search space, or hardly reduce it at all. So best stick with your vanilla binary search 50/50 split.

Conclusion

You dont have to pre-process the DOM to quickly find a character position in a web page document. If you want a flexible cross browser solution, try using a dual binary search as described by this post. This solution was devised during the days of HTML 4.01 and CSS2.0. You may need to handle more complicated cases with CSS 3.0, I’d imagine it would work just fine with HTML 5, given its basic use of standard DOM features.

Advertisements

Inferring the Locale text direction with Javascript

Here’s a hack to detect which is the locale text direction (e.g. left-to-right for English or right-to-left for Hebrew) using JavaScript:

/**
 * Detects locale text direction...
 * @return {String} "ltr" or "rtl"
 */
function detectTextDir() {
   var container = document.createElement("p");
    container.style.margin = "0 0 0 0";
    container.style.padding = "0 0 0 0";
    // container.style.textAlign = "start"; // If CSS 3+ 
    container.style.textAlign = ""; // Explicitly override text align that might be assigned via style sheets
    
    var span = document.createElement("span");
    span.innerHTML = "X";
    
    container.appendChild(span);
    document.body.appendChild(container);
    
    // LTR if text position is nearer left of container, RTL if text position is nearer right of container
    var localeDirection = span.offsetLeft < (container.offsetWidth - (span.offsetLeft + span.offsetWidth)) ?
        "ltr" : 
        "rtl";
       
    document.body.removeChild(container);

    return localeDirection;
}

A limitation is that it doesn’t work if there is a style sheet present which explicitly assigns a text-align value other than “start” for paragraphs directly in the document body. You could probably tweak the idea to use something more obscure like an address block to reduce chances of this happening.

A Simple Javascript Model-View-Controller API

Model-View-Controller is such a useful and powerful design pattern. You can use MVC for more than just physical views (i.e. GUI’s), but abstract views as well (e.g. pure data structures). You can even use MVC as an event mechanism.
Continue reading