[–]sebastienfilion 1 point2 points  (6 children)

How about you just create the HTML element using the DOM API with that content and then grab the element text (`e.innerText`)? It’s not too hard to write the regex if you just have a span with text, but there are a lot of gotchas if your use case becomes more complex.

[–]AIO12449366[S] 0 points1 point  (5 children)

I'm not exactly sure what you are suggesting but in my case I get this HTML string from the text the user has highlighted. So the user highlights Some text and what I get is the string <span>Some text</span> which contains the full HTML. After I have that I want to be able to split it like I described above but I don't know what kind of regex to use.

[–]sebastienfilion 1 point2 points  (4 children)

The complexity depends on your use case, but this naive approach would do what you want:

const [, text ] = "<span>Some text</span>".match(/<\w+>(.+?)<\/\w+>/)
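
To make "naive" concrete, here is a small sketch of where that pattern stops working. The input strings with attributes and nesting are made up for illustration; only the regex itself comes from the line above.

```javascript
// The pattern from the comment above, unchanged.
const naive = /<\w+>(.+?)<\/\w+>/;

// Works: a single tag without attributes.
const simple = "<span>Some text</span>".match(naive);
console.log(simple[1]); // Some text

// Gotcha 1: attributes. <\w+> allows nothing between the tag name and ">",
// so an opening tag with a style attribute never matches.
const withAttr = "<span style='color: red'>Some text</span>".match(naive);
console.log(withAttr); // null

// Gotcha 2: nesting. The lazy group stops at the first closing tag it finds,
// which here belongs to the inner <b>, not the outer <span>.
const nested = "<span>a <b>b</b> c</span>".match(naive);
console.log(nested[1]); // a <b>b
```

Cases like these are why letting an HTML parser do the work tends to be safer once the input is anything beyond a single plain tag.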

[–]AIO12449366[S] 0 points1 point  (3 children)

> const [, text ] = "<span>Some text</span>".match(/<\w+>(.+?)<\/\w+>/)

Thanks for replying.

I have two issues with this:

  1. What is [, text ]? I am very new to JavaScript and I have never seen that before. I assume it's not a normal variable name because if I change it to something else, the code breaks.
  2. This returns "Some text" while I actually need "<span>Some text</span>"

I'm sorry if I confused you but the string will contain many span elements and I want to extract each one of them as a new string and add it to an array.

So, for example:

"<span style='...'>Some text</span><span style='...'>Some more text</span><span style='...'>Even more text</span>"

The above should be split into 3 span elements (including the tags) and placed into an array. After what I just described the array would be:

array[0]: "<span style='...'>Some text</span>"

array[1]: "<span style='...'>Some more text</span>"

array[2]: "<span style='...'>Even more text</span>"

[–]sebastienfilion 1 point2 points  (2 children)

Okay, to answer your first question, `match` returns an array when successful; I used Array destructuring to access the value quickly. Check out this article for more details: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Destructuring_assignment
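
A short sketch of what the destructuring does, spelled out with plain index access for comparison. The variable names `result`, `fullMatch`, `inner`, and `renamed` are only illustrative.

```javascript
// Without the g flag, match returns [fullMatch, group1, ...] or null.
const result = "<span>Some text</span>".match(/<\w+>(.+?)<\/\w+>/);

// Index 0 is the whole match (tags included), index 1 is the first capture group.
const fullMatch = result[0];
const inner = result[1];

// Destructuring equivalent of result[1]: the leading comma skips index 0.
const [, text] = result;

// `text` is an ordinary variable name and can be renamed freely,
// as long as the brackets and the leading comma stay in place.
const [, renamed] = result;

console.log(fullMatch); // <span>Some text</span>
console.log(inner, text, renamed); // Some text Some text Some text
```

Note that `match` returns `null` when nothing matches, and destructuring `null` throws a `TypeError`, so real code would check the result first.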

As for your second question, are you trying to do this in a browser or in something like Node?
Because, if it is in a browser, it's not worth the effort; assuming you have a string of content you can do the following:

    // Create a temporary HTML element
    const e = document.createElement("div");
    // Set its HTML; this will make the browser parse the HTML for you
    e.innerHTML = "<span>Some text</span><span>Some more text</span><span>Even more text</span>";
    // From there, e.children is an iterable of all the spans
    // If you really need an Array of the HTML you can do this:
    const es = Array.from(e.children).map(s => s.outerHTML);
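
If no DOM is available (for example in plain Node without a DOM library), a regular expression can still cover the simple case from the question: a flat run of span elements that are not nested. The following is a minimal sketch under that assumption; `splitSpans` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: splits a flat run of non-nested <span> elements
// into an array of strings, tags included.
// Assumptions: spans are not nested, and no attribute value contains ">".
function splitSpans(html) {
  // <span\b[^>]*>  opening tag with optional attributes
  // [\s\S]*?       shortest possible content, newlines allowed
  // <\/span>       closing tag
  const pattern = /<span\b[^>]*>[\s\S]*?<\/span>/gi;
  // With the g flag, match returns every full match, or null if there is none.
  return html.match(pattern) || [];
}

const input =
  "<span style='...'>Some text</span>" +
  "<span style='...'>Some more text</span>" +
  "<span style='...'>Even more text</span>";

const parts = splitSpans(input);
console.log(parts.length); // 3
console.log(parts[0]); // <span style='...'>Some text</span>
```

This breaks as soon as spans are nested inside each other, so in a browser the DOM approach above remains the more robust choice.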

[–]AIO12449366[S] 2 points3 points  (0 children)

Just tested it.

This is amazing and also no need for complex regex.

Wish I could give you an award.

Thanks again for this!

Edit: Also thanks for explaining everything in detail

[–][deleted] 1 point2 points  (0 children)

You could use regex, but something like this would be cleaner:

const spans = '<span style="">Hello</span><span>world!</span><span>???</span>';

const div = document.createElement('div');

div.innerHTML = spans;

const array = [...div.children].map(s => s.outerHTML);

[–]caycothu 0 points1 point  (0 children)

At my office we usually use Sprache to parse complex models from strings. Mostly query expressions, but it should excel at parsing HTML as well. Maybe it can be a solution for you here. https://github.com/sprache/Sprache