Building databases of published works

Pragmatic tools for constructing databases of scientific works from queries written in Boolean logic.

Tabulate queries in a spreadsheet, then generate a database from the records those queries return.

Example: define a dataframe with an ID column and a QUERY column, where each query is a search expressed in Boolean logic (| for OR, & for AND, parentheses for grouping):

ID  DISEASE NAME                                   MONDO_ID       QUERY
1   Adult Polyglucosan Body Disease                MONDO:0009897  adult polyglucosan body disease | adult polyglucosan body neuropathy
2   AGAT deficiency                                MONDO:0012996  “GATM deficiency” | “AGAT deficiency” | “arginine:glycine amidinotransferase deficiency” | “L-arginine:glycine amidinotransferase deficiency”
3   Guanidinoacetate methyltransferase deficiency  MONDO:0012999  “guanidinoacetate methyltransferase deficiency” | “GAMT deficiency”
4   CLOVES Syndrome                                MONDO:0013038  “CLOVES syndrome | (congenital lipomatous overgrowth) & (vascular malformation epidermal) & (nevi-spinal) & syndrome | (congenital lipomatous overgrowth) & (vascular malformations) & (Epidermal nevi) & ((skeletal|spinal) & abnormalities) | CLOVE syndrome | (congenital lipomatous overgrowth) & (vascular malformation) & (epidermal nevi)
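
As a rough illustration of how such a table might be loaded and prepared, the sketch below reads two rows of the table above from tab-separated text and rewrites the | and & operators as the OR and AND keywords that PubMed-style search boxes accept. It uses only the standard library; `pandas.read_csv(path, sep="\t")` would give the same rows as a dataframe. The names `QUERY_TSV`, `load_queries`, and `to_keyword_operators` are hypothetical helpers invented for this example, not part of the library documented here, and the translation is an assumption about how the operators map rather than a description of what the library does internally.

```python
import csv
import io
import re

# Two rows of the table above, stored tab-separated so that the '|' operators
# inside QUERY cannot be mistaken for a column delimiter.
# \u201c and \u201d are the curly quotes used in the spreadsheet.
QUERY_TSV = (
    "ID\tDISEASE NAME\tMONDO_ID\tQUERY\n"
    "1\tAdult Polyglucosan Body Disease\tMONDO:0009897\t"
    "adult polyglucosan body disease | adult polyglucosan body neuropathy\n"
    "3\tGuanidinoacetate methyltransferase deficiency\tMONDO:0012999\t"
    "\u201cguanidinoacetate methyltransferase deficiency\u201d | \u201cGAMT deficiency\u201d\n"
)


def load_queries(tsv_text):
    """Hypothetical helper: parse the query table into a list of row dicts."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def to_keyword_operators(query):
    """Hypothetical helper: straighten curly quotes and replace the
    '|' and '&' symbols with the OR and AND keywords."""
    query = query.replace("\u201c", '"').replace("\u201d", '"')
    query = re.sub(r"\s*\|\s*", " OR ", query)
    query = re.sub(r"\s*&\s*", " AND ", query)
    return query.strip()


rows = load_queries(QUERY_TSV)
for row in rows:
    print(row["ID"], row["MONDO_ID"], to_keyword_operators(row["QUERY"]))
```

Keeping the symbolic operators in the spreadsheet and translating them only at search time means the same QUERY column can be reused for repositories whose syntax differs.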

DashboardDb

 DashboardDb (catalog, database, loc)

This class builds a database of resources by running every combination of a list of queries and a list of subqueries against multiple online repositories.

Functionality includes:

  • Define a spreadsheet with a column of queries expressed in Boolean logic
  • Optional: define a secondary spreadsheet with a column of subqueries expressed in Boolean logic
  • Iterate over the sources (PubMed and Europe PMC) to execute all combinations of queries and subqueries
  • Store extended records for all papers, including full text where available from CZI’s internal data repo.
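
The combination step in the list above can be sketched as a plain cross product. The example below only builds the list of searches to run and makes no network calls. The source labels, the subquery values, and the names `combine` and `build_search_plan` are all assumptions made for illustration and say nothing about how `DashboardDb` is implemented.

```python
from itertools import product

# Hypothetical labels for the two repositories named above.
SOURCES = ["pubmed", "europe_pmc"]

# Query ID -> Boolean query, as it would come from the main spreadsheet.
queries = {
    "1": "adult polyglucosan body disease | adult polyglucosan body neuropathy",
    "3": '"GAMT deficiency"',
}

# Subquery ID -> Boolean subquery (illustrative values, not from the source).
subqueries = {
    "a": "therapy | treatment",
    "b": "diagnosis",
}


def combine(query, subquery=None):
    """AND a query with an optional subquery, parenthesising both sides so
    that '|' inside either part cannot leak across the '&'."""
    if subquery is None:
        return query
    return f"({query}) & ({subquery})"


def build_search_plan(queries, subqueries=None, sources=SOURCES):
    """Return one entry per (source, query, subquery) combination.
    When no subqueries are given, each query is run on its own."""
    sub_items = list(subqueries.items()) if subqueries else [(None, None)]
    plan = []
    for source, (q_id, q), (s_id, s) in product(sources, queries.items(), sub_items):
        plan.append(
            {
                "source": source,
                "query_id": q_id,
                "subquery_id": s_id,
                "search": combine(q, s),
            }
        )
    return plan


plan = build_search_plan(queries, subqueries)
for entry in plan:
    print(entry["source"], entry["query_id"], entry["subquery_id"], entry["search"])
```

Recording the query and subquery IDs alongside each search makes it possible to trace every stored paper back to the spreadsheet rows that retrieved it.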