Database for Scientific Knowledge

A local PostgreSQL database of scientific content with associated information generated by the agent. The intent is to use this repository as ‘memory’ in a LangChain-enabled agent.

We define a local database for use with Alhazen that conforms to the following schema:

Schema

The key elements of the schema are:

  • Abstract Parent Class
  • Classes Denoting Extant Scientific Knowledge from External Sources
  • Classes Denoting Information Generated by Alhazen

Our definitions are inspired by FABIO, but only differentiate between ‘expression’ (the reference to an underlying scientific work product, e.g., a paper or database record) and ‘item’ (the actual data that delivers on that expression). The system is coded in LinkML and generated as a PostgreSQL database and a SQLAlchemy ORM.
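To make the expression/item distinction concrete, here is a minimal sketch using plain dataclasses. The class and field names (`Expression`, `Item`, `id`, `title`, `format`) are illustrative assumptions and are not taken from the generated LinkML or SQLAlchemy classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """Hypothetical stand-in: the actual data that delivers an expression."""
    id: str
    format: str  # e.g. "pdf", "nxml", "html"
    content: str = ""


@dataclass
class Expression:
    """Hypothetical stand-in: a reference to a work product such as a paper."""
    id: str
    title: str
    items: List[Item] = field(default_factory=list)


# One expression (the paper) can be delivered by several items (its files).
paper = Expression(id="doi:10.0000/example", title="An example paper")
paper.items.append(Item(id="item-1", format="pdf"))
paper.items.append(Item(id="item-2", format="nxml"))

formats = [item.format for item in paper.items]
```

The point of the split is that metadata about a paper lives on the expression once, while each downloaded representation is tracked separately as an item.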



list_databases

 list_databases ()

List all CEIFNS databases in the postgres server



restore_ceifns_database

 restore_ceifns_database (db_name, backup_file, verbose=False)

Restore postgres db from a file.



backup_ceifns_database

 backup_ceifns_database (db_name, dest_file, verbose=False)

Backup postgres db to a local file. Note that this



drop_ceifns_database

 drop_ceifns_database (db_name, backupFirst=True)

Use this function to delete a CEIFNS database from the local postgres server. Set the backupFirst flag to True to back up the database before deletion.
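The backup-then-drop flow described above could look roughly like the following sketch, which only builds the command lines rather than running them. It assumes the standard PostgreSQL client tools `pg_dump` and `dropdb` are used; the helper names, the default backup file name, and the choice of the custom dump format are assumptions for illustration, not the library's actual implementation.

```python
from typing import List, Optional


def build_backup_command(db_name: str, dest_file: str,
                         verbose: bool = False) -> List[str]:
    """Build a pg_dump command line (hypothetical helper)."""
    cmd = ["pg_dump", "--format=custom", "--file", dest_file]
    if verbose:
        cmd.append("--verbose")
    cmd.append(db_name)  # pg_dump takes the database name as a positional argument
    return cmd


def build_drop_commands(db_name: str, backup_first: bool = True,
                        dest_file: Optional[str] = None) -> List[List[str]]:
    """Return the commands to run, in order, to drop a database (hypothetical helper)."""
    commands = []
    if backup_first:
        # Assumed default file name; the real function may choose differently.
        target = dest_file or f"{db_name}.backup"
        commands.append(build_backup_command(db_name, target))
    commands.append(["dropdb", "--if-exists", db_name])
    return commands


plan = build_drop_commands("my_ceifns_db")
```

Each entry in `plan` could then be passed to `subprocess.run(cmd, check=True)`, so that a failed backup stops the drop from going ahead.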



create_ceifns_database

 create_ceifns_database (db_name)

Use this function to create a CEIFNS database within the local postgres server.



Ceifns_LiteratureDb

 Ceifns_LiteratureDb (loc:str, name:str,
                      engine:sqlalchemy.engine.base.Engine=None,
                      session:sqlalchemy.orm.session.Session=None,
                      sent_detector:nltk.tokenize.punkt.PunktSentenceTokenizer=None,
                      embed_model:langchain_core.embeddings.embeddings.Embeddings=None)

This class runs a set of queries on external literature databases to build a local database of linked corpora and papers.

Functionality includes:

  • Executes queries over European PMC
  • Can run combinatorial sets of queries using a dataframe structure
    • Requires a column of queries expressed in boolean logic
    • Optionally takes a secondary spreadsheet with a column of subqueries expressed in boolean logic
  • Has capability to run boolean logic over sources (currently only European PMC, but possibly others)
  • Builds a local PostgreSQL database with tables for collections, expressions, items, fragments, and notes.
  • Provides an API for querying the database and returning results as SQLAlchemy objects.
  • Lets the user download local copies of full-text papers in NXML (JATS), PDF, and HTML formats.
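As an illustration of the combinatorial query idea in the list above, the sketch below crosses a set of queries with a set of subqueries. It uses plain dictionaries instead of a dataframe to stay dependency-free, and it assumes each pair is joined with a boolean AND; the function name and the naming scheme for the combined queries are hypothetical.

```python
from itertools import product
from typing import Dict, Optional


def combine_queries(queries: Dict[str, str],
                    subqueries: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Cross every query with every subquery (hypothetical helper).

    Assumes each pair is combined as "(query) AND (subquery)".
    """
    if not subqueries:
        return {name: f"({q})" for name, q in queries.items()}
    return {
        f"{q_name}/{s_name}": f"({q}) AND ({s})"
        for (q_name, q), (s_name, s) in product(queries.items(), subqueries.items())
    }


queries = {
    "cryoet": '"cryo-electron tomography" OR cryoET',
    "xray": '"x-ray crystallography"',
}
subqueries = {
    "review": "review",
    "methods": "method OR protocol",
}

combined = combine_queries(queries, subqueries)
```

Two queries and two subqueries give four combined query strings, each of which could be sent to the literature source separately and stored as its own collection.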


read_information_content_entity_iri

 read_information_content_entity_iri (ice, id_prefix)

Reads an identifier for a given prefix
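A prefix-based lookup of this kind might work along the lines of the sketch below. It assumes the entity carries an `iri` attribute holding strings of the form `prefix:identifier`; that attribute name, the string format, and the helper name `read_identifier` are assumptions made for illustration and may differ from the actual schema.

```python
from types import SimpleNamespace
from typing import Optional


def read_identifier(ice, id_prefix: str) -> Optional[str]:
    """Return the first identifier whose prefix matches, or None (hypothetical helper)."""
    for iri in getattr(ice, "iri", None) or []:
        # Split only on the first colon so identifiers may contain colons themselves.
        prefix, _, local_id = iri.partition(":")
        if prefix == id_prefix and local_id:
            return local_id
    return None


# A stand-in object; the real entity would be an ORM instance.
paper = SimpleNamespace(iri=["doi:10.0000/example", "pmid:12345678"])

pmid = read_identifier(paper, "pmid")
doi = read_identifier(paper, "doi")
missing = read_identifier(paper, "pmcid")
```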