Skip to content

Fix xpath parsing xpath that has string literal containing parentheses and brackets#318

Open
tompng wants to merge 1 commit into
ruby:masterfrom
tompng:xpath_paren_bracket_in_string
Open

Fix xpath parsing xpath that has string literal containing parentheses and brackets#318
tompng wants to merge 1 commit into
ruby:masterfrom
tompng:xpath_paren_bracket_in_string

Conversation

@tompng
Copy link
Copy Markdown
Member

@tompng tompng commented May 24, 2026

String literal in XPath 1.0 doesn't have escapes.
begin end while cond is not a normal modifier while, but a do-while loop. (I didn't know it). REXML uses this syntax in several files.

Removes preprocessing that removes spaces with gsub which may wrongly modifies string literal.
Also fixes error parsing //a[1] [2].
All test except exact-error-message test and the one I added in this PR passed with:

def parse path
  path = path.gsub('(', '( ').gsub(/[\)\[\]]/, ' \0 ') # Insert space around paren/bracket

REF: https://www.w3.org/TR/xpath-10/#exprlex

For readability, whitespace may be used in expressions even though not explicitly allowed by the grammar: ExprWhitespace may be freely added within patterns before or after any ExprToken.

Copilot AI review requested due to automatic review settings May 24, 2026 17:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves XPath parsing robustness around whitespace handling, especially when spaces appear inside string literals, and adds regression tests to cover these cases.

Changes:

  • Add tests validating that extra whitespace in XPath expressions is treated as ignorable where appropriate.
  • Update XPathParser#parse to strip ignorable spaces without altering whitespace inside quoted string literals.
  • Update group parsing to ignore parentheses/brackets that appear inside quoted string literals.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
test/xpath/test_base.rb Adds test coverage for whitespace normalization and for bracket/paren characters inside XPath string literals.
lib/rexml/parsers/xpathparser.rb Refactors space-stripping into a helper that respects string literals; improves grouping logic to ignore delimiters inside quotes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/rexml/parsers/xpathparser.rb Outdated
Comment thread lib/rexml/parsers/xpathparser.rb
Comment thread lib/rexml/parsers/xpathparser.rb Outdated
…s and brackets

Also fixes xpath that has space around closing paren and opening/closing brackets that removing spaces with gsub wrongly removes spaces within string literal.
@tompng tompng force-pushed the xpath_paren_bracket_in_string branch from f342f18 to 1578eb2 Compare May 27, 2026 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants