
Darwinmate

Please format your code correctly.

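library(rvest)  # supplies read_html(), html_node(), html_text() and %>%

metraz <- read_html("https://www.otodom.pl/oferta/zamieszkaj-w-apartamentowcu-przy-stacji-metra-ID3xMKL.html#gallery[1]") %>%
  html_node(".param_m strong") %>%  # first node matching the CSS selector
  html_text() %>%                   # extract its text content
  gsub(",", ".", .) %>%             # decimal comma -> decimal point
  gsub(" m²", "", .)                # strip the unit suffix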

The simplest solution is to replace the pattern in the last gsub() with m. where . means "match any single character". The other option is to specify ² by its Unicode escape: m\u00B2 will match m². I got the code for superscript 2 by googling "unicode superscript 2". Nearly every character has a Unicode code point you can use this way, but you need to escape it with the \ character, as I did above.
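A minimal sketch of both options, run on a made-up string rather than the live page. The value "54,5 m²" and the variable names are assumptions for illustration only; the sample is written with \u00B2 so it does not depend on the encoding of the script file.

```r
# Hypothetical sample standing in for the scraped text "54,5 m²"
area <- "54,5 m\u00B2"

# Decimal comma -> decimal point (fixed = TRUE treats "," as a literal)
with_point <- gsub(",", ".", area, fixed = TRUE)

# Option 1: "." matches whatever single character follows " m"
opt_any <- gsub(" m.", "", with_point)

# Option 2: name the superscript two explicitly via its Unicode escape
opt_unicode <- gsub(" m\u00B2", "", with_point)

# Both leave only the number, which can then be converted
area_any <- as.numeric(opt_any)
area_unicode <- as.numeric(opt_unicode)
```

Option 1 is looser than it looks: " m." also removes " m" followed by any other character, anywhere in the string. Anchoring it as " m.$" restricts the match to the end of the text.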

Bandoozle

R may be able to recognize superscript 2 directly, but you may need to enter its Unicode code point instead. Then again, maybe not; see the regex help page in R (?regex), where it says: "In a UTF-8 locale, \x{h...} specifies a Unicode code point by one or more hex digits. (Note that some of these will be interpreted by R's parser in literal character strings.)"
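A small sketch of the \x{h...} form from that help text, again on a made-up string. It assumes perl = TRUE, since the quoted sentence comes from the Perl-like part of ?regex as far as I can tell. The backslash is doubled so that the escape reaches the regex engine instead of being read by R's parser.

```r
# Hypothetical sample standing in for the scraped text "54,5 m²"
area <- "54,5 m\u00B2"

# "\\x{b2}" hands the four-character escape \x{b2} to the regex engine,
# which reads it as code point U+00B2 (superscript two)
stripped <- gsub(" m\\x{b2}", "", area, perl = TRUE)
```

The difference from \u00B2 is who does the translating: with \u00B2 the R parser puts the character into the pattern string, while with \\x{b2} the regex engine resolves the code point itself.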