I would like to scrape a series of html_nodes from a series of pages. The problem is that those elements are inside a list that has neither a class nor an id. I can't use an absolute XPath either, because the position of the desired elements differs from one page to another depending on the preceding content.
Detailed information:
Sample page: https://www.fablabs.io/machines/othermill
Target: I would like to scrape the names of all fablabs that use that specific machine. I would also like to know how this can be integrated with Apache Spark.
The HTML code (fragment) looks like this:
- The Beach Lab x Middle East ...
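Using an absolute XPath, I can get the list for this page:

```r
library(rvest)

url <- "https://www.fablabs.io/machines/othermill"
fablabs <- read_html(url) %>%
  html_nodes(xpath = '/html/body/div[2]/div[2]/div[2]/ul[3]/li/a') %>%
  html_text()
```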
Unfortunately, although this works for this page, it will not work on other pages, because the position of this list changes from page to page depending on the preceding content.
The only thing I know is that the elements I want to scrape are below the string "Available at". Is there any way to achieve that in R?
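One possible approach is to anchor the XPath on the text instead of on the position: find the element whose own text contains "Available at", then take the first `<ul>` that follows it in document order (XPath's `following::` axis). The sketch below assumes that "Available at" is the text of some element (a heading is used here) and that the wanted links sit in `<li><a>` items of the next `<ul>`. The inline HTML, the lab names in it, and the helper `links_after_label()` are hypothetical and only illustrate the idea; the real page markup is not confirmed.

```r
library(rvest)

# Hypothetical markup standing in for the real page (structure is an assumption).
html <- '
<html><body>
  <div>
    <h4>Some other section</h4>
    <ul><li><a href="/x">Not wanted</a></li></ul>
    <h4>Available at</h4>
    <ul>
      <li><a href="/labs/a">Lab A</a></li>
      <li><a href="/labs/b">Lab B</a></li>
    </ul>
  </div>
</body></html>'

# Hypothetical helper: return the link texts of the first <ul> that follows
# the element whose own text node contains `label`.
# Note: `label` must not contain a single quote, since it is pasted into XPath.
links_after_label <- function(doc, label = "Available at") {
  xp <- sprintf("//*[text()[contains(., '%s')]]/following::ul[1]/li/a", label)
  doc %>%
    html_nodes(xpath = xp) %>%
    html_text(trim = TRUE)
}

doc <- read_html(html)
links_after_label(doc)
# [1] "Lab A" "Lab B"
```

Because the expression depends only on the "Available at" label and not on `div[2]/ul[3]`-style positions, the same call could be applied to each machine page, e.g. `lapply(urls, function(u) links_after_label(read_html(u)))`, where `urls` is your own vector of page URLs. If the list is always a direct sibling of the label element, `following-sibling::ul[1]` is a stricter alternative to `following::ul[1]`.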
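For the Apache Spark part, one option from R is the sparklyr package (SparkR is another); choosing sparklyr here is an assumption, not something the question specifies. The sketch copies a scraped character vector into Spark as a table and runs a dplyr query on it. The `fablabs` vector below is a hypothetical placeholder for the output of the scraping step, and a local Spark installation (for example via `sparklyr::spark_install()`) is assumed.

```r
library(sparklyr)
library(dplyr)

# Hypothetical placeholder: in practice, the character vector returned by the
# rvest scraping step for one machine page.
fablabs <- c("The Beach Lab", "Example Lab")

fablabs_df <- data.frame(machine = "othermill", fablab = fablabs,
                         stringsAsFactors = FALSE)

# Connect to a local Spark instance (assumes Spark is installed locally).
sc <- spark_connect(master = "local")

# Copy the R data frame into Spark as a table named "fablabs".
fablabs_tbl <- copy_to(sc, fablabs_df, "fablabs", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL; collect() brings the result into R.
counts <- fablabs_tbl %>%
  count(machine) %>%
  collect()
print(counts)

spark_disconnect(sc)
```

Scraping itself still happens in the R session; Spark only receives the resulting data frame, so this mainly helps when the scraped results are combined with larger data already held in Spark.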