Module: Lich::Util::TextStripper

Defined in: documented/util/textstripper.rb

Overview

Module for stripping formatting from text. It provides methods that remove HTML, XML, and Markdown formatting and return plain text.

Examples:

Stripping HTML from text

plain_text = Lich::Util::TextStripper.strip_html(html_text)

Defined Under Namespace

Modules: Mode

Constant Summary

MODE_TO_INPUT_FORMAT =

{
  Mode::HTML     => 'html',
  Mode::MARKUP   => 'GFM',
  Mode::MARKDOWN => 'GFM'
}.freeze

Class Method Summary

.entity_to_char(entity) ⇒ String

Converts an HTML entity to its corresponding character.
.extract_text(element) ⇒ String

Extracts plain text from a Kramdown element.
.extract_xml_text(element) ⇒ String

Extracts text content from an XML element.
.log_error(message, exception) ⇒ Object

Logs an error message along with the exception details.
.requires_kramdown?(mode) ⇒ Boolean

Checks if Kramdown is required for the given mode.
.smart_quote_to_char(quote_type) ⇒ String

Converts a smart quote type to its corresponding character.
.strip(text, mode) ⇒ String

Strips formatting from the given text based on the specified mode.
.strip_html(text) ⇒ String

Strips HTML tags from the given text.
.strip_markdown(text) ⇒ String

Strips Markdown formatting from the given text.
.strip_markup(text) ⇒ String

Strips Markdown/markup formatting from the given text.
.strip_with_kramdown(text, mode) ⇒ String

Strips formatting from text using Kramdown.
.strip_xml(text) ⇒ String

Strips XML tags from the given text.
.strip_xml_with_rexml(text) ⇒ String

Strips XML tags from the given text.
.validate_mode(mode) ⇒ Symbol

Validates the given mode and normalizes it to a symbol.

Class Method Details

.entity_to_char(entity) ⇒ `String`

Converts an HTML entity to its corresponding character

Examples:

Converting an entity

char = Lich::Util::TextStripper.entity_to_char(:nbsp) # => " "

Parameters:

entity (Symbol) —

The HTML entity to convert

Returns:

(String) —

The corresponding character

# File 'documented/util/textstripper.rb', line 268

def self.entity_to_char(entity)
  if entity.respond_to?(:char)
    entity.char
  else
    # Fallback for symbol entities
    case entity
    when :nbsp then ' '
    when :lt then '<'
    when :gt then '>'
    when :amp then '&'
    when :quot then '"'
    else entity.to_s
    end
  end
end
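
The two branches can be exercised in isolation. The sketch below reproduces the method inside a hypothetical `EntityDemo` module and uses a `FakeEntity` struct as a stand-in for an entity object; the only assumption about such objects is that they respond to `char`. Neither name is part of Lich.

```ruby
# Hypothetical stand-in for an entity object that knows its own character.
FakeEntity = Struct.new(:code_point) do
  def char
    [code_point].pack('U')
  end
end

module EntityDemo
  # Same logic as entity_to_char above, reproduced so the example runs alone.
  def self.entity_to_char(entity)
    return entity.char if entity.respond_to?(:char)

    case entity
    when :nbsp then ' '
    when :lt then '<'
    when :gt then '>'
    when :amp then '&'
    when :quot then '"'
    else entity.to_s
    end
  end
end

puts EntityDemo.entity_to_char(FakeEntity.new(0x3C)) # object branch: uses #char
puts EntityDemo.entity_to_char(:amp)                 # symbol fallback
puts EntityDemo.entity_to_char(:copy)                # unknown symbol falls through to to_s
```

Note that the fallback maps `:nbsp` to an ordinary space, and that an unrecognized symbol comes back as its name rather than as a character.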

.extract_text(element) ⇒ `String`

Extracts plain text from a Kramdown element

Examples:

Extracting text from Kramdown

text = Lich::Util::TextStripper.extract_text(kramdown_element)

Parameters:

element (Kramdown::Element) —

The Kramdown element to extract text from

Returns:

(String) —

The extracted plain text

# File 'documented/util/textstripper.rb', line 232

def self.extract_text(element)
  return '' if element.nil?

  case element.type
  when :text
    element.value
  when :entity
    # Convert HTML entities (e.g., &nbsp; -> space)
    entity_to_char(element.value)
  when :smart_quote
    # Convert smart quotes to regular quotes
    smart_quote_to_char(element.value)
  when :codeblock, :codespan
    # Return code content as plain text
    element.value
  when :br
    # Convert line breaks to newlines
    "\n"
  when :blank
    # Blank lines become newlines
    "\n"
  else
    # For all other elements (p, div, span, etc.), recursively process children
    if element.children
      element.children.map { |child| extract_text(child) }.join
    else
      ''
    end
  end
end
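
The traversal is easier to follow on a small hand-built tree. The following is a simplified sketch, not the module's code: `Node` is a hypothetical struct standing in for `Kramdown::Element` (only `type`, `value`, and `children` are modeled), and `flatten_text` omits the entity and smart-quote branches.

```ruby
# Hypothetical stand-in for Kramdown::Element: just type, value, and children.
Node = Struct.new(:type, :value, :children)

# Leaf types return their value, breaks become newlines, and every other
# type is flattened by concatenating the text of its children.
def flatten_text(node)
  return '' if node.nil?

  case node.type
  when :text, :codespan, :codeblock
    node.value
  when :br, :blank
    "\n"
  else
    (node.children || []).map { |child| flatten_text(child) }.join
  end
end

tree = Node.new(:root, nil, [
  Node.new(:p, nil, [
    Node.new(:text, 'Hello ', nil),
    Node.new(:strong, nil, [Node.new(:text, 'world', nil)]),
    Node.new(:br, nil, nil),
    Node.new(:codespan, 'x = 1', nil)
  ])
])

puts flatten_text(tree)
```

Container elements such as `:p` and `:strong` contribute nothing themselves; only their descendants produce output, which is why the formatting disappears.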

.extract_xml_text(element) ⇒ `String`

Extracts text content from an XML element

Examples:

Extracting XML text

text = Lich::Util::TextStripper.extract_xml_text(xml_element)

Parameters:

element (REXML::Element) —

The XML element to extract text from

Returns:

(String) —

The extracted text

# File 'documented/util/textstripper.rb', line 203

def self.extract_xml_text(element)
  return '' if element.nil?

  text_parts = []

  # Iterate through all child nodes
  element.each do |node|
    case node
    when REXML::Text
      # Regular text node
      text_parts << node.value
    when REXML::CData
      # CDATA section - extract the content
      text_parts << node.value
    when REXML::Element
      # Nested element - recursively extract text
      text_parts << extract_xml_text(node)
    end
    # Ignore other node types (comments, processing instructions, etc.)
  end

  text_parts.join
end
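
As a rough illustration of which node types contribute text, the sketch below runs an equivalent walker over a small document using only the REXML library. The helper name `xml_text` and the sample document are made up for this example.

```ruby
require 'rexml/document'

# Collect text from text nodes, CDATA sections, and nested elements.
# REXML::CData is a subclass of REXML::Text, so one branch covers both.
def xml_text(element)
  return '' if element.nil?

  parts = []
  element.each do |node|
    case node
    when REXML::Text
      parts << node.value
    when REXML::Element
      parts << xml_text(node)
    end
    # Comments and processing instructions are skipped.
  end
  parts.join
end

doc = REXML::Document.new(
  '<root>Tom &amp; <b>Jerry</b><!-- hidden --><![CDATA[<3]]></root>'
)
puts xml_text(doc.root)
```

The comment is dropped, the nested element contributes its inner text, and the CDATA content is passed through literally.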

.log_error(message, exception) ⇒ `Object`

Logs an error message along with the exception details

Examples:

Logging an error

Lich::Util::TextStripper.log_error("An error occurred", e)

Parameters:

message (String) —

The error message to log
exception (StandardError) —

The exception that occurred

# File 'documented/util/textstripper.rb', line 154

def self.log_error(message, exception)
  full_message = "TextStripper: #{message} (#{exception.class}: #{exception.message}). Returning original."
  respond(full_message)
  Lich.log(full_message)
end

.requires_kramdown?(mode) ⇒ `Boolean`

Checks if Kramdown is required for the given mode

Examples:

Checking Kramdown requirement

Lich::Util::TextStripper.requires_kramdown?(:markup) # => true

Parameters:

mode (Symbol) —

The mode to check

Returns:

(Boolean) —

True if Kramdown is required, false otherwise



# File 'documented/util/textstripper.rb', line 79

def self.requires_kramdown?(mode)
  MODE_TO_INPUT_FORMAT.key?(mode)
end
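
The check is a plain hash-key lookup, so any mode absent from `MODE_TO_INPUT_FORMAT` does not need Kramdown. The sketch below is self-contained and assumes the `Mode` constants are lowercase symbols such as `:html` and `:xml` (the `Mode` module body is not shown on this page); `DemoMode` and `DEMO_FORMATS` are hypothetical names.

```ruby
# Assumed Mode values; the real Mode module is not shown here.
module DemoMode
  HTML     = :html
  XML      = :xml
  MARKUP   = :markup
  MARKDOWN = :markdown
end

DEMO_FORMATS = {
  DemoMode::HTML     => 'html',
  DemoMode::MARKUP   => 'GFM',
  DemoMode::MARKDOWN => 'GFM'
}.freeze

def demo_requires_kramdown?(mode)
  DEMO_FORMATS.key?(mode)
end

DemoMode.constants.each do |name|
  mode = DemoMode.const_get(name)
  puts "#{mode}: #{demo_requires_kramdown?(mode)}"
end
```

Because the keys are symbols, a string such as `'html'` is not found; callers are expected to pass a mode that has already been normalized by `validate_mode`.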

.smart_quote_to_char(quote_type) ⇒ `String`

Converts a smart quote type to its corresponding character

Examples:

Converting a smart quote

char = Lich::Util::TextStripper.smart_quote_to_char(:ldquo) # => "\""

Parameters:

quote_type (Symbol) —

The type of smart quote

Returns:

(String) —

The corresponding character

# File 'documented/util/textstripper.rb', line 289

def self.smart_quote_to_char(quote_type)
  case quote_type
  when :lsquo, :rsquo then "'"
  when :ldquo, :rdquo then '"'
  else quote_type.to_s
  end
end

.strip(text, mode) ⇒ `String`

Strips formatting from the given text based on the specified mode

Examples:

Stripping text

plain_text = Lich::Util::TextStripper.strip(html_text, Lich::Util::TextStripper::Mode::HTML)

Parameters:

text (String) —

The text to be stripped
mode (Symbol) —

The mode to use for stripping

Returns:

(String) —

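The stripped text. Returns an empty string if text is nil or empty, and the original text if parsing fails or if Kramdown is required but not loaded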

Raises:

(ArgumentError) —

If the mode is invalid

# File 'documented/util/textstripper.rb', line 90

def self.strip(text, mode)
  return "" if text.nil? || text.empty?

  # Validate mode outside the begin/rescue block below.
  # A rescue attached directly to `def` would also cover this call and swallow
  # the ArgumentError, so an explicit block is needed for it to propagate.
  validated_mode = validate_mode(mode)

  # Check if kramdown is required and available
  if requires_kramdown?(validated_mode) && !KRAMDOWN_LOADED
    respond("Need to restart Lich5 in order to use this method.")
    return text
  end

  begin
    # Route to appropriate parsing method based on mode
    case validated_mode
    when Mode::XML
      strip_xml_with_rexml(text)
    else
      strip_with_kramdown(text, validated_mode)
    end
  rescue Kramdown::Error => e
    # Handle Kramdown parsing errors (HTML/MARKUP/MARKDOWN modes)
    log_error("Failed to parse #{validated_mode}", e)
    text
  rescue REXML::ParseException => e
    # Handle REXML parsing errors (XML mode)
    log_error("Failed to parse #{validated_mode}", e)
    text
  rescue StandardError => e
    # Catch any other unexpected errors during parsing
    log_error("Unexpected error during #{validated_mode} parsing", e)
    text
  end
end
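
The intended error contract (invalid modes raise, parse failures fall back to the input) can be sketched without Lich. Everything here is hypothetical: `DemoParseError` and `demo_parse` stand in for the real parsers, and the list of valid modes is assumed. In Ruby, a `rescue` attached directly to `def` covers the whole method body, so the sketch uses an explicit `begin` block to keep validation outside it.

```ruby
# Hypothetical parser error and toy parser, standing in for Kramdown/REXML.
class DemoParseError < StandardError; end

def demo_parse(text)
  raise DemoParseError, 'unbalanced tag' if text.count('<') != text.count('>')

  text.gsub(/<[^>]*>/, '')
end

def demo_strip(text, mode)
  return '' if text.nil? || text.empty?

  # Validation sits outside the begin block, so this error reaches the caller.
  raise ArgumentError, "Invalid mode: #{mode}" unless %i[html xml].include?(mode)

  begin
    demo_parse(text)
  rescue StandardError => e
    warn "demo_strip: #{e.class}: #{e.message}. Returning original."
    text # parse failures fall back to the untouched input
  end
end

puts demo_strip('<b>hi</b>', :html)
puts demo_strip('<b hi', :html)

begin
  demo_strip('anything', :pdf)
rescue ArgumentError => e
  puts "raised: #{e.message}"
end
```

The design choice is that a caller's programming error (a bad mode) surfaces immediately, while bad input data never interrupts the caller.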

.strip_html(text) ⇒ `String`

Strips HTML tags from the given text

Examples:

Stripping HTML

plain_text = Lich::Util::TextStripper.strip_html(html_text)

Parameters:

text (String) —

The HTML text to be stripped

Returns:

(String) —

The stripped text

Note:

If Kramdown is not loaded, no exception is raised. The method prints a restart notice via respond and returns the original text unchanged.

# File 'documented/util/textstripper.rb', line 303

def self.strip_html(text)
  unless KRAMDOWN_LOADED
    respond("Need to restart Lich5 in order to use this method.")
    return text
  end

  strip_with_kramdown(text, Mode::HTML)
end

.strip_markdown(text) ⇒ `String`

Strips Markdown formatting from the given text

Examples:

Stripping Markdown

plain_text = Lich::Util::TextStripper.strip_markdown(markdown_text)

Parameters:

text (String) —

The Markdown text to be stripped

Returns:

(String) —

The stripped text

Note:

If Kramdown is not loaded, no exception is raised. The method prints a restart notice via respond and returns the original text unchanged.

# File 'documented/util/textstripper.rb', line 342

def self.strip_markdown(text)
  unless KRAMDOWN_LOADED
    respond("Need to restart Lich5 in order to use this method.")
    return text
  end

  strip_with_kramdown(text, Mode::MARKDOWN)
end

.strip_markup(text) ⇒ `String`

Strips Markdown/markup formatting from the given text

Examples:

Stripping markup

plain_text = Lich::Util::TextStripper.strip_markup(markup_text)

Parameters:

text (String) —

The text to be stripped

Returns:

(String) —

The stripped text

Note:

If Kramdown is not loaded, no exception is raised. The method prints a restart notice via respond and returns the original text unchanged.

# File 'documented/util/textstripper.rb', line 327

def self.strip_markup(text)
  unless KRAMDOWN_LOADED
    respond("Need to restart Lich5 in order to use this method.")
    return text
  end

  strip_with_kramdown(text, Mode::MARKUP)
end

.strip_with_kramdown(text, mode) ⇒ `String`

Strips formatting from text using Kramdown

Examples:

Stripping with Kramdown

plain_text = Lich::Util::TextStripper.strip_with_kramdown(markdown_text, Lich::Util::TextStripper::Mode::MARKDOWN)

Parameters:

text (String) —

The text to be stripped
mode (Symbol) —

The mode to use for stripping

Returns:

(String) —

The stripped text

Note:

If Kramdown is not loaded, no exception is raised. The method prints a restart notice via respond and returns the original text unchanged.

# File 'documented/util/textstripper.rb', line 167

def self.strip_with_kramdown(text, mode)
  unless KRAMDOWN_LOADED
    respond("Need to restart Lich5 in order to use this method.")
    return text
  end

  input_format = MODE_TO_INPUT_FORMAT[mode]
  doc = Kramdown::Document.new(text, input: input_format)

  # Extract plain text from the parsed document by traversing the element tree
  extract_text(doc.root).strip
end

.strip_xml(text) ⇒ `String`

Strips XML tags from the given text. This is a convenience wrapper around strip_xml_with_rexml and does not depend on Kramdown

Examples:

Stripping XML

plain_text = Lich::Util::TextStripper.strip_xml(xml_text)

Parameters:

text (String) —

The XML text to be stripped

Returns:

(String) —

The stripped text



# File 'documented/util/textstripper.rb', line 317

def self.strip_xml(text)
  strip_xml_with_rexml(text)
end

.strip_xml_with_rexml(text) ⇒ `String`

Strips XML tags from the given text

Examples:

Stripping XML

plain_text = Lich::Util::TextStripper.strip_xml_with_rexml(xml_text)

Parameters:

text (String) —

The XML text to be stripped

Returns:

(String) —

The stripped text

# File 'documented/util/textstripper.rb', line 185

def self.strip_xml_with_rexml(text)
  # Try to parse as-is first (in case it's already well-formed XML)
  begin
    doc = REXML::Document.new("<root>#{text}</root>")
  rescue REXML::ParseException
    # If parsing fails due to unescaped characters, wrap in CDATA
    doc = REXML::Document.new("<root><![CDATA[#{text}]]></root>")
  end

  # Extract all text content from the document
  extract_xml_text(doc.root).strip
end
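
The parse-then-fallback step can be tried on its own with REXML. This is a minimal sketch; `parse_fragment` is a hypothetical helper name and the sample inputs are invented.

```ruby
require 'rexml/document'

# Parse text as an XML fragment; if that fails, retry with the text wrapped
# in CDATA so it is treated as literal character data.
def parse_fragment(text)
  REXML::Document.new("<root>#{text}</root>")
rescue REXML::ParseException
  REXML::Document.new("<root><![CDATA[#{text}]]></root>")
end

well_formed = parse_fragment('<b>bold</b> text')
malformed   = parse_fragment('<b>never closed')

puts well_formed.root.elements['b'].text   # parsed as a real element
puts malformed.root.children.first.class   # fallback produced a CDATA node
puts malformed.root.children.first.value   # original text kept literally
```

In the fallback case the tags are not removed, because the whole input is treated as text. Input that itself contains `]]>` would end the CDATA section early, so the fallback is not guaranteed to succeed for every string.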

.validate_mode(mode) ⇒ `Symbol`

Validates the given mode and normalizes it to a symbol

Examples:

Validating a mode

Lich::Util::TextStripper.validate_mode(:html) # => :html

Parameters:

mode (Symbol, String) —

The mode to validate

Returns:

(Symbol) —

The normalized mode

Raises:

(ArgumentError) —

If the mode is invalid

# File 'documented/util/textstripper.rb', line 130

def self.validate_mode(mode)
  # Ensure mode is a Symbol or String
  unless mode.is_a?(Symbol) || mode.is_a?(String)
    raise ArgumentError,
          "Mode must be a Symbol or String, got #{mode.class}"
  end

  # Normalize to symbol
  normalized_mode = mode.to_sym

  # Validate against allowed modes
  unless Mode.valid?(normalized_mode)
    raise ArgumentError,
          "Invalid mode: #{mode}. Use one of: #{Mode.list}"
  end

  normalized_mode
end
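
The two failure cases and the normalization step can be seen in a standalone sketch. `ModeSketch` is a hypothetical replacement for the `Mode` module: its mode list and the bodies of `valid?` and `list` are assumptions, since only their names appear in the source above.

```ruby
# Hypothetical Mode module; the real constants, valid?, and list are not shown here.
module ModeSketch
  ALL = %i[html xml markup markdown].freeze

  def self.valid?(mode)
    ALL.include?(mode)
  end

  def self.list
    ALL.join(', ')
  end
end

def validate_mode_sketch(mode)
  unless mode.is_a?(Symbol) || mode.is_a?(String)
    raise ArgumentError, "Mode must be a Symbol or String, got #{mode.class}"
  end

  normalized = mode.to_sym
  return normalized if ModeSketch.valid?(normalized)

  raise ArgumentError, "Invalid mode: #{mode}. Use one of: #{ModeSketch.list}"
end

p validate_mode_sketch('html') # a String is normalized to a Symbol

[42, :pdf].each do |bad|
  begin
    validate_mode_sketch(bad)
  rescue ArgumentError => e
    puts e.message
  end
end
```

The type check runs first so that non-string, non-symbol arguments get a specific message instead of failing inside `to_sym`.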
