Building databases of published works
Pragmatic tools for constructing databases of scientific works based on queries defined with Boolean Logic.
Tabulate queries in a spreadsheet and generate a database based on the data from those queries.
Example: Define a dataframe with an `id` column and a `query` column (expressing a search query in Boolean Logic):
ID | DISEASE NAME | MONDO_ID | QUERY |
---|---|---|---|
1 | Adult Polyglucosan Body Disease | MONDO:0009897 | adult polyglucosan body disease \| adult polyglucosan body neuropathy |
2 | AGAT deficiency | MONDO:0012996 | “GATM deficiency” \| “AGAT deficiency” \| “arginine:glycine amidinotransferase deficiency” \| “L-arginine:glycine amidinotransferase deficiency” |
3 | Guanidinoacetate methyltransferase deficiency | MONDO:0012999 | “guanidinoacetate methyltransferase deficiency” \| “GAMT deficiency” |
4 | CLOVES Syndrome | MONDO:0013038 | “CLOVES syndrome” \| (congenital lipomatous overgrowth) & (vascular malformation epidermal) & (nevi-spinal) & syndrome \| (congenital lipomatous overgrowth) & (vascular malformations) & (Epidermal nevi) & ((skeletal\|spinal) & abnormalities) \| CLOVE syndrome \| (congenital lipomatous overgrowth) & (vascular malformation) & (epidermal nevi) |
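
The tabulated queries above could be loaded and inspected along these lines. This is a minimal sketch using only the standard library, not the package's own loader: `SPREADSHEET`, `load_queries`, and `to_keyword_operators` are hypothetical names, the data is stored tab-separated so that the `|` operator inside a query does not clash with a column delimiter, and the quotes from the table are written as straight quotes.

```python
import csv
import io
import re

# Two rows from the table above, tab-separated (hypothetical storage format).
# Quotes are normalised to straight quotes for this illustration.
SPREADSHEET = (
    "ID\tDISEASE NAME\tMONDO_ID\tQUERY\n"
    "1\tAdult Polyglucosan Body Disease\tMONDO:0009897\t"
    "adult polyglucosan body disease | adult polyglucosan body neuropathy\n"
    "3\tGuanidinoacetate methyltransferase deficiency\tMONDO:0012999\t"
    '"guanidinoacetate methyltransferase deficiency" | "GAMT deficiency"\n'
)


def load_queries(text):
    """Return (id, query) pairs from tab-separated spreadsheet text."""
    # QUOTE_NONE keeps the double quotes in a query as literal characters.
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [(row["ID"], row["QUERY"]) for row in reader]


def to_keyword_operators(query):
    """Rewrite the '|' and '&' symbols as OR / AND keywords.

    Assumption: this only illustrates the notation; how the library
    itself translates queries for each source is not shown here.
    """
    spaced = query.replace("|", " OR ").replace("&", " AND ")
    return re.sub(r"\s+", " ", spaced).strip()


for query_id, query in load_queries(SPREADSHEET):
    print(query_id, to_keyword_operators(query))
```

The same rows can be held in a dataframe instead, for example by passing a list of dictionaries to `pandas.DataFrame`.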
DashboardDb
DashboardDb (catalog, database, loc)
This class builds a database of resources by combining a list of queries with a list of subqueries and running the combinations against multiple online repositories.
Functionality includes:
- Define a spreadsheet with a column of queries expressed in boolean logic
- Optional: Define a secondary spreadsheet with a column of subqueries expressed in boolean logic
- Iterate over different sources (PubMed + Europe PMC) to execute all combinations of queries and subqueries
- Store extended records for all papers, including full text where available from CZI’s internal data repo
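
The iteration over sources, queries, and subqueries listed above can be sketched as follows. This is an illustration under stated assumptions rather than the `DashboardDb` implementation: `build_corpus` and its `search` callable are hypothetical, the source labels are placeholders, and combining a query with a subquery through `&` is assumed from the Boolean notation used in the spreadsheet.

```python
from itertools import product


def build_corpus(queries, subqueries, sources, search):
    """Run every (source, query, subquery) combination and collect records.

    queries / subqueries: lists of (id, boolean_query) pairs; subqueries
    may be empty, since the secondary spreadsheet is optional.
    search: hypothetical callable (source, query) -> list of record dicts.
    """
    # With no subqueries, each query is executed once on its own.
    subs = subqueries or [(None, None)]
    corpus = []
    for source, (qid, query), (sid, sub) in product(sources, queries, subs):
        combined = query if sub is None else f"({query}) & ({sub})"
        for record in search(source, combined):
            # Keep provenance so each paper can be traced back to its query.
            corpus.append(
                {"source": source, "query_id": qid, "subquery_id": sid, **record}
            )
    return corpus


def demo_search(source, query):
    """Stand-in for a real repository client; returns one fake record."""
    return [{"title": f"result for {query!r} from {source}"}]


records = build_corpus(
    queries=[("1", "adult polyglucosan body disease")],
    subqueries=[("s1", "review"), ("s2", "clinical trial")],
    sources=["pubmed", "europe_pmc"],  # placeholder source labels
    search=demo_search,
)
print(len(records))
```

Passing the search function in as an argument keeps the combination logic independent of any particular repository client, so it can be exercised without network access.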