Database for Scientific Knowledge
We define a local database for use with Alhazen that conforms to the following schema:
The key elements of the schema are:
Abstract Parent Class
- InformationContentEntity - a named piece of information (all other entities inherit from this)
Classes Denoting Extant Scientific Knowledge from External Sources
- ScientificKnowledgeCollection - a collection of scientific knowledge expressions
- ScientificKnowledgeExpression - an expression of scientific knowledge (e.g., a paper, a database record, a blog post, etc.)
- ScientificKnowledgeItem - the data item that manifests the expression (e.g., a pdf, an html data file, a yaml file, etc.)
- ScientificKnowledgeFragment - a relevant fragment of the item (e.g., a paragraph, a table, a figure, etc.)
Classes Denoting Information Generated by Alhazen
- Note - an annotation placed on any InformationContentEntity (e.g., a comment, a question, a link to another fragment, etc.)
Our definitions are inspired by FABIO, but we only differentiate between ‘expression’ (a reference to an underlying scientific work product, e.g., a paper or database record) and ‘item’ (the actual data that delivers that expression). The schema is defined in LinkML, from which we generate a PostgreSQL database and a SQLAlchemy ORM.
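To make the containment chain concrete (collection → expression → item → fragment, with notes attachable to any entity), here is a minimal sketch using plain Python dataclasses. It is not the LinkML-generated SQLAlchemy model: the relationship field names (`has_members`, `has_representation`, `has_part`, `is_about`, `file_format`, `content`) are hypothetical and chosen only for illustration, and the identifiers are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InformationContentEntity:
    """Abstract parent: a named piece of information."""
    id: str
    name: Optional[str] = None


@dataclass
class ScientificKnowledgeFragment(InformationContentEntity):
    """A relevant fragment of an item (paragraph, table, figure, ...)."""
    content: str = ""


@dataclass
class ScientificKnowledgeItem(InformationContentEntity):
    """The data that manifests an expression (PDF, HTML, YAML, ...)."""
    file_format: str = ""
    has_part: List[ScientificKnowledgeFragment] = field(default_factory=list)


@dataclass
class ScientificKnowledgeExpression(InformationContentEntity):
    """Reference to an underlying work product (paper, database record, ...)."""
    has_representation: List[ScientificKnowledgeItem] = field(default_factory=list)


@dataclass
class ScientificKnowledgeCollection(InformationContentEntity):
    """A collection of expressions (e.g., a corpus built from one query)."""
    has_members: List[ScientificKnowledgeExpression] = field(default_factory=list)


@dataclass
class Note(InformationContentEntity):
    """An annotation placed on any InformationContentEntity."""
    content: str = ""
    is_about: List[InformationContentEntity] = field(default_factory=list)


# Usage: one paper, delivered as one PDF, with one annotated paragraph.
frag = ScientificKnowledgeFragment(id="frag:1", content="Cells were cultured for 48 h.")
item = ScientificKnowledgeItem(id="item:1", file_format="pdf", has_part=[frag])
paper = ScientificKnowledgeExpression(id="doi:10.0000/placeholder", has_representation=[item])
corpus = ScientificKnowledgeCollection(id="coll:1", name="example corpus", has_members=[paper])
note = Note(id="note:1", content="Check the culture duration.", is_about=[frag])
```

Because every class inherits from `InformationContentEntity`, a `Note` can point at a fragment, an item, an expression, or a whole collection without needing a separate note type for each.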
list_databases
list_databases ()
List all CEIFNS databases in the postgres server
restore_ceifns_database
restore_ceifns_database (db_name, backup_file, verbose=False)
Restore postgres db from a file.
backup_ceifns_database
backup_ceifns_database (db_name, dest_file, verbose=False)
Back up postgres db to a local file. Note that this
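Backup and restore of a PostgreSQL database are commonly done with the standard `pg_dump` and `pg_restore` command-line tools. The sketch below only builds the argument lists (it does not run anything), assuming a custom-format archive; whether `backup_ceifns_database` and `restore_ceifns_database` actually use these tools or flags is an assumption, and the helper names are hypothetical.

```python
import shlex
from typing import List


def build_backup_command(db_name: str, dest_file: str, verbose: bool = False) -> List[str]:
    """Hypothetical helper: pg_dump arguments that write a custom-format archive."""
    cmd = ["pg_dump", "--format=custom", f"--file={dest_file}", f"--dbname={db_name}"]
    if verbose:
        cmd.append("--verbose")
    return cmd


def build_restore_command(db_name: str, backup_file: str, verbose: bool = False) -> List[str]:
    """Hypothetical helper: pg_restore arguments that load an archive into db_name."""
    cmd = ["pg_restore", f"--dbname={db_name}"]
    if verbose:
        cmd.append("--verbose")
    cmd.append(backup_file)  # the archive file is a positional argument
    return cmd


# Print the shell-equivalent commands; to execute them you would pass the lists
# to subprocess.run(cmd, check=True) on a machine with the PostgreSQL client tools.
print(shlex.join(build_backup_command("my_db", "my_db.dump", verbose=True)))
print(shlex.join(build_restore_command("my_db", "my_db.dump")))
```

Connection details (host, user, password) are not part of these lists; the tools read them from the usual libpq environment variables such as `PGHOST`, `PGUSER` and `PGPASSWORD`.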
drop_ceifns_database
drop_ceifns_database (db_name, backupFirst=True)
Use this function to delete a CEIFNS database from the local postgres server. Set the backupFirst flag to True (the default) to back up the database before deletion.
create_ceifns_database
create_ceifns_database (db_name)
Use this function to create a CEIFNS database within the local postgres server.
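Creating and dropping a database come down to `CREATE DATABASE` and `DROP DATABASE` statements, and the `backupFirst` behaviour is an ordering guarantee: back up, then drop. The sketch below shows that logic with the database connection and the backup routine passed in as plain callables, so it runs without a server; the function names are hypothetical and this is not the library's implementation.

```python
from typing import Callable, List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_database_sql(db_name: str) -> str:
    return f"CREATE DATABASE {quote_ident(db_name)}"


def drop_database_sql(db_name: str) -> str:
    return f"DROP DATABASE IF EXISTS {quote_ident(db_name)}"


def drop_with_optional_backup(
    db_name: str,
    execute: Callable[[str], None],
    backup: Callable[[str], None],
    backup_first: bool = True,
) -> None:
    """Back up db_name (if requested) before issuing DROP DATABASE."""
    if backup_first:
        backup(db_name)  # if this raises, nothing has been dropped yet
    execute(drop_database_sql(db_name))


# Usage with stand-in callables that only record what would happen.
log: List[str] = []
drop_with_optional_backup(
    "my_db",
    execute=log.append,
    backup=lambda name: log.append(f"backup {name}"),
)
print(log)
```

With a real driver, note that PostgreSQL does not allow `CREATE DATABASE` or `DROP DATABASE` inside a transaction block, so the connection must be in autocommit mode, and it must be connected to a different database (for example `postgres`) than the one being dropped.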
Ceifns_LiteratureDb
Ceifns_LiteratureDb (loc:str, name:str, engine:sqlalchemy.engine.base.Engine=None, session:sqlalchemy.orm.session.Session=None, sent_detector:nltk.tokenize.punkt.PunktSentenceTokenizer=None, embed_model:langchain_core.embeddings.embeddings.Embeddings=None)
This class runs a set of queries on external literature databases to build a local database of linked corpora and papers.
Functionality includes:
- Executes queries over European PMC
- Can run combinatorial sets of queries using a dataframe structure
- Requires a column of queries expressed in boolean logic
- Optionally accepts a secondary spreadsheet with a column of subqueries expressed in boolean logic
- Has capability to run boolean logic over sources (currently only European PMC, but possibly others)
- Builds a local PostgreSQL database with tables for collections, expressions, items, fragments, and notes.
- Provides an API for querying the database and returning results as SQLAlchemy objects.
- Lets the user download local copies of full-text papers in NXML (JATS), PDF, and HTML formats.
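The combinatorial query feature pairs each query in the main column with each subquery in the optional secondary column. A minimal sketch of that cross product, assuming the pair is joined with a boolean `AND` (Europe PMC search accepts `AND`/`OR`/`NOT` with parentheses); plain lists stand in for the dataframe columns, and `combine_queries` is a hypothetical name.

```python
from itertools import product
from typing import List, Optional


def combine_queries(queries: List[str], subqueries: Optional[List[str]] = None) -> List[str]:
    """Pair every query with every subquery as '(q) AND (sq)'.

    Without subqueries, each query is returned unchanged.
    """
    if not subqueries:
        return list(queries)
    return [f"({q}) AND ({sq})" for q, sq in product(queries, subqueries)]


# Two queries x two subqueries -> four combined searches.
queries = ['"single cell" OR scRNA-seq', '"spatial transcriptomics"']
subqueries = ['"mouse"', '"zebrafish"']
combined = combine_queries(queries, subqueries)
for q in combined:
    print(q)
```

Wrapping each operand in parentheses keeps an `OR` inside a query from binding across the `AND`, so every combined search means "this topic, restricted to this subtopic".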
read_information_content_entity_iri
read_information_content_entity_iri (ice, id_prefix)
Reads an identifier for a given prefix
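The docstring does not say how identifiers are stored. As one hypothetical reading, assume an entity carries identifiers as CURIEs (`prefix:local_id`, e.g. `doi:...`, `pmid:...`) and the function returns the local part for the requested prefix. The `Entity` class, its `xref` field, `read_iri_for_prefix`, and the identifier values below are all illustrative placeholders, not the library's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for an InformationContentEntity with CURIE identifiers."""
    id: str
    xref: List[str] = field(default_factory=list)


def read_iri_for_prefix(ice: Entity, id_prefix: str) -> Optional[str]:
    """Return the local part of the first identifier whose prefix matches id_prefix."""
    for curie in [ice.id, *ice.xref]:
        # partition splits at the first colon only, so DOIs keep any later colons
        prefix, sep, local = curie.partition(":")
        if sep and prefix.lower() == id_prefix.lower():
            return local
    return None  # no identifier with that prefix


paper = Entity(id="doi:10.1234/abcd", xref=["pmid:12345", "pmcid:PMC67890"])
print(read_iri_for_prefix(paper, "pmid"))
```

Matching prefixes case-insensitively is a design choice here, since sources disagree on `PMID` versus `pmid`.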