Issue
I am working on a project right now that requires me to scrape information from this website:
I have already managed to scrape the table with RSelenium and rvest, but there are some details I would like to add to the data frame. They sit in an expandable (JavaScript?) object on each row. I have illustrated the object here:
Essentially, I need to expand ALL of them before scraping in order to include them. Is there an easy way to do this with code? Yesterday I had a script that clicked them one by one, which took hours to complete.
Is it possible to inject code into the website that expands them all, or to have RSelenium execute such code?
Solution
The script below scrapes the page without clicking anything: it submits the search form for a date range and a court, then follows the link stored on each expand button to fetch the details. Modify the query list with the date range and the court you need. I picked the first court in the list. You can find the court codes by inspecting the page and searching for FormData.Court.
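As an aside on finding the court codes, the lookup can also be done in R rather than in the browser inspector. The sketch below is only an illustration: it assumes the court picker is a select element named FormData.Court whose option values are the codes, and it runs on a made-up HTML fragment with hypothetical codes and court names, not on the real page.

```r
library(rvest)

# Hypothetical markup: assumes the court picker is a <select> named
# "FormData.Court". The codes and names below are invented for illustration.
page <- minimal_html('
  <select name="FormData.Court">
    <option value="CODE-A#EJBOrgUnit">Court A</option>
    <option value="CODE-B#EJBOrgUnit">Court B</option>
  </select>')

# Select every <option> inside that <select>
opts <- page %>%
  html_elements("select[name='FormData.Court'] option")

# One row per court: visible name and the code to put in the query
courts <- data.frame(
  name = html_text2(opts),
  code = html_attr(opts, "value")
)

print(courts)
```

If the real page is structured this way, replacing minimal_html(...) with read_html(url) should list the actual codes; if it is not, the selector will need adjusting.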
library(tidyverse)
library(httr)
library(rvest)
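#################################
## FUNCTION TO PROCESS DATA ##
#################################
# Takes one parsed details page and returns its <dt>/<dd> pairs plus its table.
parseDetails <- function(pg_dtls){
  # Field labels
  dt <- pg_dtls %>%
    html_elements('dt') %>%
    html_text()
  # Field values
  dd <- pg_dtls %>%
    html_elements('dd') %>%
    html_text()
  # First table on the details page
  tbl <- pg_dtls %>%
    html_element('table') %>%
    html_table()
  # One row per label, with the value in a single "Details" column
  df <- as.data.frame(dd, row.names = dt) %>% set_names("Details")
  # Return both pieces explicitly
  list(mainData = df, otherData = tbl)
}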
###############################
## SCRAPE WEB PAGE ##
###############################
url <- "https://www.domstol.no/enkelt-domstol/hoyesterett/saksliste/berammingsliste/"
# Form fields: date range (dd.mm.yyyy) and court code
query <- list('FormData.From'  = "25.02.2022",
              'FormData.To'    = "07.03.2022",
              'FormData.Court' = "AAAA2104220835148622091WAFLAU#EJBOrgUnit")
# Submit the search form
response <- POST(url, body = query)
# Each expand button stores the path to its details in data-action;
# build the full URL and download every details page
dtls <- content(response, "parsed") %>%
  html_elements("button") %>%
  html_attr("data-action") %>%
  na.omit() %>%
  paste0("https://www.domstol.no/", .) %>%
  map(read_html)
# Parse every details page
scrapedData <- map(dtls, parseDetails)
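The result is a list with one entry per case, each holding mainData and otherData. To get from there to a single data frame, something like the following might work. It is a minimal sketch that runs offline: the HTML fragment, its field names and its values are made up, and parse_fragment is a hypothetical stand-in that repeats the steps of parseDetails so the example is self-contained.

```r
library(rvest)

# Made-up details fragment standing in for one downloaded details page
fragment <- minimal_html('
  <dl><dt>Case number</dt><dd>22-000001</dd>
      <dt>Type</dt><dd>Civil</dd></dl>
  <table><tr><th>Role</th><th>Name</th></tr>
         <tr><td>Judge</td><td>A. Example</td></tr></table>')

# Hypothetical stand-in for parseDetails (same extraction steps)
parse_fragment <- function(pg) {
  labels <- pg %>% html_elements("dt") %>% html_text()
  values <- pg %>% html_elements("dd") %>% html_text()
  list(
    mainData  = data.frame(Details = values, row.names = labels),
    otherData = pg %>% html_element("table") %>% html_table()
  )
}

# Pretend two cases were scraped
parsed <- list(parse_fragment(fragment), parse_fragment(fragment))

# Stack the label/value pairs of all cases into one long data frame,
# keeping the position in the list as a case identifier
long <- do.call(rbind, lapply(seq_along(parsed), function(i) {
  d <- parsed[[i]]$mainData
  data.frame(case = i, field = rownames(d), value = d$Details)
}))

print(long)
```

With the real data, parsed would be scrapedData. The long layout is a deliberate choice here: it does not require every case to have the same set of fields.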
Answered By - Blue050205