Web Robots
Web robots that automate the process of obtaining full-text papers (and other interactions with the web).
These functions spawn a web browser to search external websites and retrieve papers and files for collation into an underlying document store. Developers using Alhazen must abide by data licensing requirements and the terms and conditions of third-party websites. Users of this code should ensure that they do not infringe upon third-party privacy or intellectual property rights through its use.
retrieve_pdf_from_doidotorg
retrieve_pdf_from_doidotorg (doi, base_dir, headless=False)
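A minimal usage sketch is shown below. The import path, the example DOI, and the assumption that the retrieved PDF is written under base_dir are illustrative, not taken from the source; only the signature above is given.

```python
# Minimal usage sketch; the import path is hypothetical and should be adjusted
# to the module where these web robot functions live.
from alhazen.utils.web_robots import retrieve_pdf_from_doidotorg  # hypothetical path

doi = "10.1101/2023.01.01.000001"      # illustrative DOI
base_dir = "/tmp/alhazen_papers"       # local directory for downloaded files

# Presumably follows the doi.org redirect in a spawned browser and saves any PDF
# it finds under base_dir; headless=True keeps the browser window from appearing.
retrieve_pdf_from_doidotorg(doi, base_dir, headless=True)
```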
retrieve_full_text_links_from_biorxiv
retrieve_full_text_links_from_biorxiv (doi, base_dir)
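A hedged usage sketch follows; the import path and the assumption that the function returns the collected links are not confirmed by the source.

```python
# Minimal usage sketch; import path and return value are assumptions.
from alhazen.utils.web_robots import retrieve_full_text_links_from_biorxiv  # hypothetical path

# Presumably visits the bioRxiv page for the preprint DOI and gathers links to
# its full-text files, staging any downloads under base_dir.
links = retrieve_full_text_links_from_biorxiv("10.1101/2023.01.01.000001", "/tmp/alhazen_papers")
print(links)
```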
execute_search_on_biorxiv
execute_search_on_biorxiv (search_term)
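A hedged usage sketch follows; the import path and the shape of the returned results are assumptions based only on the function name.

```python
# Minimal usage sketch; import path and result format are assumptions.
from alhazen.utils.web_robots import execute_search_on_biorxiv  # hypothetical path

# Runs the query on biorxiv.org in a spawned browser; the hits are presumably
# DOIs or result records that can be passed to the retrieval functions above.
results = execute_search_on_biorxiv("single-cell RNA sequencing")
for hit in results:
    print(hit)
```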
extract_reconstructed_nxml
extract_reconstructed_nxml (html)
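A hedged usage sketch follows; the import path is hypothetical, and the local HTML file is assumed to have been saved by an earlier retrieval step (for example, by get_html_from_pmc_doi).

```python
# Minimal usage sketch; import path and input file are assumptions.
from pathlib import Path
from alhazen.utils.web_robots import extract_reconstructed_nxml  # hypothetical path

# HTML of a PMC article page saved locally in an earlier step.
html = Path("/tmp/alhazen_papers/pmc_article.html").read_text()
nxml = extract_reconstructed_nxml(html)  # presumably returns a JATS-style NXML string
```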
get_html_from_pmc_doi
get_html_from_pmc_doi (doi, base_file_path)
Given a DOI, navigate to the PMC HTML page and reconstruct NXML from the page content.
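A hedged usage sketch follows; the import path is hypothetical, and the assumption that output is written under base_file_path is inferred from the parameter name rather than stated in the source.

```python
# Minimal usage sketch; import path and output location are assumptions.
from alhazen.utils.web_robots import get_html_from_pmc_doi  # hypothetical path

# Drives a browser to the PMC page for the DOI, saves what it retrieves under
# base_file_path, and reconstructs NXML from the page HTML (per the description above).
get_html_from_pmc_doi("10.1371/journal.pone.0012345", "/tmp/alhazen_papers/")
```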