read_html() usually returns all of the page's HTML for a given URL.
But when I try it on the URL below, I can see that only part of the page is returned.
Why is this (and more importantly, how do I fix it)?
    library(rvest)

    page_html <- "https://raw.githubusercontent.com/mjaniec2013/ExecutionTime/master/ExecutionTime.R" %>%
      read_html()

    page_html %>% html_text() %>% cat()
    # We can see not all the page html has been retrieved

    # And just to be sure
    page_html %>% as.character()

What I've tried so far:
- It looks like GitHub is okay with bots visiting, so I don't think the issue is on GitHub's side.
- I tried the same scrape with Ruby's Nokogiri library. It gives exactly the same result as read_html(), so it doesn't look like something specific to R.
This looks like a bug that is triggered when the text of the page contains an assignment operator (`<-`). Most likely the `<` is being treated as the start of a tag, so everything after it is dropped.

    fakepage <- "<html>the text after the assignment operator <- will be lost</html>"
    read_html(fakepage) %>% html_text()
    # [1] "the text after the assignment operator "
As the page you're after is a plain text file, you can use readr::read_file() in this instance.
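A minimal sketch of that approach, assuming the readr package is installed. To keep it runnable offline, it reads a temporary local file (a hypothetical stand-in for the raw script) rather than the GitHub URL. The variable names `tmp` and `script_text` are illustrative only.

```r
library(readr)

# Hypothetical stand-in for the raw script: a local file that contains `<-`.
tmp <- tempfile(fileext = ".R")
writeLines("x <- 1  # everything after the assignment operator is kept", tmp)

# read_file() returns the whole file as a single string, with no HTML parsing,
# so the `<` character is not interpreted as markup.
script_text <- read_file(tmp)
cat(script_text)

# read_file() also accepts a URL, so the raw file from the question
# could be read the same way:
# script_text <- read_file("https://raw.githubusercontent.com/mjaniec2013/ExecutionTime/master/ExecutionTime.R")
```

Because nothing is parsed as HTML here, the text comes back as a plain character string and can be split into lines with strsplit() if needed.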