Squeak Class Documentation category index | class index  
 
HtmlTokenizer
  category: Network-HTML Tokenizer
  superclass: Stream
  subclasses:

This class takes a text stream and produces a sequence of HTML tokens.

It requires its source stream to support #peek.

instance methods
  private
  nextChar
peekChar

  private-initialization
  initialize:

  private-tokenizing
  nextAttributeValue
nextComment
nextName
nextSpaces
nextTag
nextTagOrComment
nextText
skipSpaces

  stream protocol
  atEnd

  tokenizing
  next

class methods
  initialization
  initialize

  instance creation
  on:

instance methods
  private top  
 

nextChar


 

peekChar


  private-initialization top  
 

initialize:


  private-tokenizing top  
 

nextAttributeValue

return the next sequence of alphanumeric characters; used to read in the value part of a tag's attribute, ie <tagname attribname=attribvalue>


 

nextComment

we've seen < and the next is a !. read until the whole comment is done


 

nextName

return the next sequence of alphanumeric characters


 

nextSpaces

read in as many consecutive space characters as possible


 

nextTag

we've seen a < and peek-ed something other than a !. Parse and return a tag


 

nextTagOrComment

next character is a $<. So read either a tag or a token


 

nextText

returns the next textual segment


 

skipSpaces

skip as many consecutive space characters as possible


  stream protocol top  
 

atEnd

are there any more tokens? This is equivalent to whether there is any more input


  tokenizing top  
 

next

return the next HtmlToken, or nil if there are no more


class methods
  initialization top  
 

initialize

HtmlTokenizer initialize


  instance creation top  
 

on: