Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

html - returning character(0) when scraping with rvest

I'm trying to do some web scraping with rvest. I'm new to R, so I have a bit of a knowledge barrier. I want to scrape the following URL:

https://www.spa.gov.sa/search.php?lang=ar&search=%D8%AD%D9%83%D9%85

That directs to a website in Arabic, but I don't think you need to be able to read Arabic to advise me. Basically, this is the first results page for a specific search term on this website (which is not a search engine). What I want to do is use rvest to scrape this page and return a list of the titles of the hyperlinks returned by the search. Using SelectorGadget, I identified the CSS selector for the nodes containing those titles as ".h2NewsTitle". However, when I try to scrape those nodes using the code below, all I get in return is "character(0)":

library(tidyverse)
library(rvest)
read_html("https://www.spa.gov.sa/search.php?lang=ar&search=%D8%AD%D9%83%D9%85") %>%
    html_nodes(".h2NewsTitle") %>%
    html_text()

I don't think the issue here has to do with the Arabic text itself. I'm pretty sure everything is in UTF-8, and I can scrape other nodes on the same page and return Arabic text without issue. For example, the code below returns the Arabic text "??? ??????", which corresponds to the Arabic text in that node on the page itself:

read_html("https://www.spa.gov.sa/search.php?lang=ar&search=%D8%AD%D9%83%D9%85") %>%
    html_nodes("WeeklySearch") %>%
    html_text()

So I'm unsure why I just get character(0) in return when I try to scrape the ".h2NewsTitle" nodes. I wonder if some elements on the page are rendered with JavaScript, so that they are missing from the HTML that read_html() downloads. This is a bit outside my expertise, so any advice on how to proceed would be appreciated. I'd like to continue using R, but am open to switching to Python/Beautiful Soup or something else if it's better suited for this.
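
One way to check the JavaScript hypothesis is to test whether a selector matches anything in the static HTML that read_html() actually parses. The sketch below is only an illustration: it uses a small made-up HTML string rather than the real page, and the class name "staticNote" and the helper selector_in_static_html() are hypothetical names introduced here. It assumes the results container is empty in the static markup and would be filled in later by a script.

```r
library(rvest)

# Hypothetical stand-in for a page whose search results are injected by
# JavaScript: the static markup contains an empty container and a script,
# but no element with class "h2NewsTitle".
static_html <- paste0(
  "<html><body>",
  "<div class=\"staticNote\">text present in the static HTML</div>",
  "<div id=\"results\"></div>",
  "<script>/* results would be inserted here at runtime */</script>",
  "</body></html>"
)

page <- read_html(static_html)

# Hypothetical helper: TRUE if at least one node matches the CSS selector.
selector_in_static_html <- function(doc, css) {
  length(html_nodes(doc, css)) > 0
}

# A selector that exists in the static markup matches...
found_static <- selector_in_static_html(page, ".staticNote")

# ...while one that only exists after scripts run matches nothing.
found_titles <- selector_in_static_html(page, ".h2NewsTitle")

# With no matching nodes, html_text() returns an empty character vector.
titles <- page %>%
    html_nodes(".h2NewsTitle") %>%
    html_text()
```

If the same check on the real page shows that ".h2NewsTitle" matches nothing while other selectors do, that would be consistent with the titles being added after the initial page load, in which case rvest alone would not see them.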



1 Answer

Waiting for an expert to answer.
