AstrodbKit2 Documentation

AstrodbKit2 is an astronomical database handler code built on top of SQLAlchemy. The goal behind this code is to provide SQLAlchemy’s powerful Object Relational Mapping (ORM) infrastructure to access astronomical database contents regardless of the underlying architecture.

Astrodbkit2 is inspired by the original astrodbkit, which is hardcoded for the SQLite BDNYC database.

Introduction

Astronomical databases tend to focus on targets or observations with ancillary data that support them. Astrodbkit2 is designed to work with these types of databases and adopts several principles for some of its more advanced features. These are:

  • There exists a primary table with object identifiers

  • There exists any number of supplementary tables that either refer back to the primary table or exist independently

For example, the SIMPLE database (which was the initial design for Astrodbkit2) contains:

  • the primary Sources table, with coordinate information for each target

  • several object data tables, like Photometry, Spectra, etc, that contain information for each target

  • reference tables, like Publications, Telescopes, etc, that list other information that is used throughout the database, but doesn’t refer to a particular target

The goal of Astrodbkit2 is to link the object tables together so that they can be expressed as a single entity, while still retaining the information in the other reference tables. Astrodbkit2 can read and write an entire target’s data as a single JSON file for ease of transport and version control. Reference tables are also written as JSON files, but organized differently: a single file per table with multiple records. An Astrodbkit2-supported database can thus be exported to two types of JSON files: individual target files and reference table files. If your database is constructed in a similar fashion, it will work well with Astrodbkit2. Other databases can still benefit from some of the functionality of Astrodbkit2, but they may not work properly when attempting to use the save/load methods.

Getting Started

To install Astrodbkit2, do:

pip install astrodbkit2

or directly via the GitHub repo:

pip install git+https://github.com/dr-rodriguez/AstrodbKit2

Creating a Database

To create a database from scratch, users will need a database schema coded with the SQLAlchemy ORM. An example schema is provided (see schema_example.py), but users can also refer to the SIMPLE schema. With that in hand, users should import their schema and prepare the database:

from astrodbkit2.astrodb import create_database
from simple.schema import *

connection_string = 'sqlite:///SIMPLE.db'  # connection string for a SQLite database named SIMPLE.db
create_database(connection_string)
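For reference, a minimal schema written with the plain SQLAlchemy ORM might look like the following sketch. The table and column names here are illustrative (loosely following the SIMPLE layout), and the tables are created directly with create_all only to show the schema shape:

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Publications(Base):
    """Reference table listing publications used throughout the database."""
    __tablename__ = "Publications"
    name = sa.Column(sa.String(30), primary_key=True, nullable=False)

class Sources(Base):
    """Primary table with one row per object identifier."""
    __tablename__ = "Sources"
    source = sa.Column(sa.String(100), primary_key=True, nullable=False)
    ra = sa.Column(sa.Float)
    dec = sa.Column(sa.Float)
    reference = sa.Column(sa.String(30), sa.ForeignKey("Publications.name"))

# Create the tables in an in-memory SQLite database
engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
print(sorted(Base.metadata.tables))  # → ['Publications', 'Sources']
```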

Accessing the Database

To start using the database, launch Python, import the module, then initialize the database with the astrodbkit2.astrodb.Database() class like so:

from astrodbkit2.astrodb import Database

connection_string = 'sqlite:///SIMPLE.db'  # SQLite connection string
db = Database(connection_string)

The database is now ready to be used. If the database is empty, see below for how to populate it.

Note

The astrodbkit2.astrodb.Database() class has many parameters that can be set to control the names of primary/reference tables. By default, these match the SIMPLE database, but users can configure them for their own needs and can pass them here or modify their __init__.py file.

Loading the Database

Astrodbkit2 contains methods to output the full contents of the database as a list of JSON files. It can likewise read in a directory of these files to populate the database. This is how SIMPLE is currently version controlled. To load a database of this form, do the following:

from astrodbkit2.astrodb import Database

connection_string = 'sqlite:///SIMPLE.db'  # SQLite connection string
db_dir = 'data'  # directory where JSON files are located
db = Database(connection_string)
db.load_database(db_dir)

Note

Database contents are cleared when loading from JSON files to ensure that the database only contains sources from on-disk files. We describe later how to use the save_db() method to produce JSON files from the existing database contents.

Loading SQLite databases with Windows

Loading a large database directly into a SQLite binary file can be significantly slow under Windows. To avoid this, one can build the database purely in memory, dump it to a file, and then connect to that file as usual. For example:

from astrodbkit2.astrodb import Database

connection_string = 'sqlite:///SIMPLE.db'  # SQLite connection string
db_dir = 'data'  # directory where JSON files are located

# Create a temporary in-memory database and load to it
db = Database('sqlite://')
db.load_database(db_dir)
# Dump in-memory database to file
db.dump_sqlite('SIMPLE.db')

# Connect to the newly created database as usual
db = Database(connection_string)
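This memory-to-disk dump can be reproduced generically with the standard library’s sqlite3 backup API. The following is a sketch of the idea, not AstrodbKit2’s exact implementation; table and file names are illustrative:

```python
import os
import sqlite3
import tempfile

# Build a database purely in memory
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE Sources (source TEXT PRIMARY KEY)")
mem.execute("INSERT INTO Sources VALUES ('2MASS J13571237+1428398')")
mem.commit()

# Dump the in-memory database to a file in one pass
path = os.path.join(tempfile.mkdtemp(), "SIMPLE.db")
disk = sqlite3.connect(path)
mem.backup(disk)
disk.close()

# Reconnect to the on-disk copy
check = sqlite3.connect(path)
rows = check.execute("SELECT source FROM Sources").fetchall()
print(rows)  # → [('2MASS J13571237+1428398',)]
```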

Querying the Database

Upon connecting to a database, Astrodbkit2 creates methods for each table defined in the schema. This allows for a more pythonic approach to writing queries. There are also methods to perform specialized queries.

Exploring the Schema

The database schema is accessible via the metadata attribute.

For example, to see the available tables users can do:

for table in db.metadata.tables:
    print(table)

And users can also examine column information for an existing table:

for c in db.metadata.tables['Sources'].columns:
    print(c.name, c.type, c.primary_key, c.foreign_keys, c.nullable)

# Example output
source VARCHAR(100) True set() False
ra FLOAT False set() True
dec FLOAT False set() True
shortname VARCHAR(30) False set() True
reference VARCHAR(30) False {ForeignKey('Publications.name')} False
comments VARCHAR(1000) False set() True
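The same kind of column information can also be pulled from any SQLAlchemy engine with the inspection API. A self-contained sketch against a throwaway in-memory table (names are illustrative):

```python
import sqlalchemy as sa

# Build a small throwaway table to inspect
engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
sa.Table(
    "Sources", metadata,
    sa.Column("source", sa.String(100), primary_key=True, nullable=False),
    sa.Column("ra", sa.Float),
)
metadata.create_all(engine)

# The inspector reads column details back from the live database
insp = sa.inspect(engine)
for col in insp.get_columns("Sources"):
    print(col["name"], col["type"], col["nullable"])
```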

Specialized Searches

General Queries

Frequently, users may wish to perform specialized queries against the full database. This can be done with the SQLAlchemy ORM, and a convenience method, query, exists for this purpose. For more details on how to use SQLAlchemy, refer to their documentation. Here are a few examples.

Query all columns for the table Sources and output in a variety of formats:

db.query(db.Sources).all()      # default SQLAlchemy output (list of named tuples)
db.query(db.Sources).astropy()  # Astropy Table output
db.query(db.Sources).table()    # equivalent to astropy
db.query(db.Sources).pandas()   # Pandas DataFrame

Example query for sources with declinations larger than 0:

db.query(db.Sources).filter(db.Sources.c.dec > 0).table()

Example query returning just a single column (source) and sorting sources by declination:

db.query(db.Sources.c.source).order_by(db.Sources.c.dec).table()

Example query joining the Sources and Publications tables and returning just a few of the columns:

db.query(db.Sources.c.source, db.Sources.c.reference, db.Publications.c.name)\
        .join(db.Publications, db.Sources.c.reference == db.Publications.c.name)\
        .table()

Example queries showing how to perform ANDs and ORs:

from sqlalchemy import and_, or_

# Query with AND
db.query(db.Sources).filter(and_(db.Sources.c.dec > 0, db.Sources.c.ra > 200)).all()

# Query with OR
db.query(db.Sources).filter(or_(db.Sources.c.dec < 0, db.Sources.c.ra > 200)).all()
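The same and_/or_ filters can be exercised with plain SQLAlchemy Core, which is where those helpers come from. A self-contained sketch against a throwaway in-memory table (table names and values are illustrative):

```python
import sqlalchemy as sa
from sqlalchemy import and_, or_

engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
sources = sa.Table(
    "Sources", metadata,
    sa.Column("source", sa.String, primary_key=True),
    sa.Column("ra", sa.Float),
    sa.Column("dec", sa.Float),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(sources.insert(), [
        {"source": "A", "ra": 210.0, "dec": 15.0},
        {"source": "B", "ra": 150.0, "dec": -30.0},
    ])

with engine.connect() as conn:
    # AND: dec > 0 and ra > 200 matches only source A
    both = conn.execute(
        sa.select(sources).where(and_(sources.c.dec > 0, sources.c.ra > 200))
    ).all()
    # OR: dec < 0 or ra > 200 matches both rows
    either = conn.execute(
        sa.select(sources).where(or_(sources.c.dec < 0, sources.c.ra > 200))
    ).all()

print(len(both), len(either))  # → 1 2
```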

In addition to the ORM, a sql_query() method exists to pass direct SQL queries to the database, for users who wish to write their own SQL statements:

results = db.sql_query('select * from sources', fmt='astropy')
print(results)

General Queries with Transformations

Astrodbkit2 can convert columns to special types. Currently, spectra transformations are implemented: the specified column is converted to a Spectrum1D object using the specutils package. To use this, supply the name of the column to convert (by default, none is converted, though .spectra assumes the column name is spectrum):

db.query(db.Spectra).astropy(spectra='spectrum')
db.query(db.Spectra).pandas(spectra=['spectrum'])
db.query(db.Spectra).spectra(fmt='astropy')

These three calls will return results from the Spectra table and will attempt to convert the spectrum column to a Spectrum1D object for each row. Multiple columns to convert can also be passed as a list. The parameter spectra_format can be specified if specutils is having trouble determining the type of spectrum.

Spectra need to be specified either as URLs or as paths relative to an environment variable, for example $ASTRODB_SPECTRA/infrared/myfile.fits. AstrodbKit2 will examine the environment variable $ASTRODB_SPECTRA and use it as part of the absolute path to the file.
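The environment-variable expansion can be reproduced with the standard library. A minimal sketch of the idea, using the $ASTRODB_SPECTRA variable from the example above (the store path is illustrative):

```python
import os

# Point the variable at a local spectra store (illustrative path)
os.environ["ASTRODB_SPECTRA"] = "/data/spectra"

# A database entry stored relative to the environment variable
stored = "$ASTRODB_SPECTRA/infrared/myfile.fits"

# Resolve it to an absolute path before reading the file
resolved = os.path.expandvars(stored)
print(resolved)  # → /data/spectra/infrared/myfile.fits
```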

Working With Views

If your database contains views, they will not be included in the inventory methods or in the output JSON files. However, you can still work with and query data in them. The best way to define them is in the schema file, for example, with something like:

SampleView = view(
    "SampleView",
    Base.metadata,
    sa.select(
        Sources.source.label("source"),
        Sources.ra.label("s_ra"),
        Sources.dec.label("s_dec"),
        SpectralTypes.spectral_type.label("spectral_type"),
    ).select_from(Sources).join(SpectralTypes, Sources.source == SpectralTypes.source),
)

When created, the database will contain these views and users can reflect them to access them:

import sqlalchemy as sa

# Inspect the database to get a list of available views
insp = sa.inspect(db.engine)
print(insp.get_view_names())

# Reflect a view and query it
SampleView = sa.Table('SampleView', sa.MetaData())
insp.reflect_table(SampleView, include_columns=None)
db.query(SampleView).table()

It is important that sa.MetaData() be used in the sa.Table() call as you don’t want to modify the metadata of the actual database. If you don’t do this and instead use db.metadata, you may end up with errors in other parts of AstrodbKit2 functionality as the view will be treated as a physical table.

Modifying Data

As a wrapper around standard SQLAlchemy calls, data can be added fairly simply.

Note

Primary and foreign keys, if present in the database, are verified when modifying data. This can prevent duplicate keys from being created and can propagate deletes or updates as specified in the database schema.

Adding Data

The simplest way to add data to an existing database is to construct a list of dictionaries and insert it into a table:

sources_data = [{'ra': 209.301675, 'dec': 14.477722,
                 'source': '2MASS J13571237+1428398',
                 'reference': 'Schm10',
                 'shortname': '1357+1428'}]
with db.engine.connect() as conn:
    conn.execute(db.Sources.insert().values(sources_data))
    conn.commit()  # sqlalchemy 2.0 does not autocommit

As a convenience, users can use the add_table_data() method to load user-supplied tables into database tables. If not loading the primary table, the code will first check for missing sources and print them out for the user to correct. Column names should match those in the database, but extra columns in the supplied table are ignored. Currently, csv-formatted data, astropy Tables, and pandas DataFrames are supported. For example:

db.add_table_data('my_file.csv', table='Photometry', fmt='csv')

Updating Data

Similarly, rows can be updated with standard SQLAlchemy calls. For simplicity, here we use SQLAlchemy’s “Begin Once” style which commits/rollbacks the transaction as needed; see https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#begin-once. This example sets the shortname for a row that matches a Source with source name of 2MASS J13571237+1428398:

with db.engine.begin() as conn:
    conn.execute(db.Sources.update()\
            .where(db.Sources.c.source == '2MASS J13571237+1428398')\
            .values(shortname='1357+1428'))

Deleting Data

Deleting rows can also be done. Here’s an example that deletes all photometry with band name of WISE_W1:

with db.engine.begin() as conn:
    conn.execute(db.Photometry.delete().where(db.Photometry.c.band == 'WISE_W1'))
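The update and delete patterns above can be exercised end to end with plain SQLAlchemy Core, outside of AstrodbKit2. A self-contained round trip against a throwaway in-memory table (table and values are illustrative):

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
phot = sa.Table(
    "Photometry", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("band", sa.String),
    sa.Column("magnitude", sa.Float),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(phot.insert(), [
        {"band": "WISE_W1", "magnitude": 10.5},
        {"band": "WISE_W2", "magnitude": 10.1},
    ])

# "Begin once" style: the transaction commits on success, rolls back on error
with engine.begin() as conn:
    conn.execute(phot.update().where(phot.c.band == "WISE_W2").values(magnitude=9.9))
    conn.execute(phot.delete().where(phot.c.band == "WISE_W1"))

with engine.connect() as conn:
    rows = conn.execute(sa.select(phot)).all()
print(rows)  # one remaining row: band WISE_W2 with magnitude 9.9
```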

Saving the Database

If users make changes to a database, they will want to output them to disk to be version controlled. Astrodbkit2 provides methods to save an individual source or reference table as well as the entire database. We recommend the latter to output the entire contents to disk:

# Save single object
db.save_json('2MASS J13571237+1428398', 'data')

# Save single reference table
db.save_reference_table('Publications', 'data')

# Save entire database to directory 'data'
db.save_database('data')

Note

To properly capture database deletes, the contents of the specified directory are first cleared before creating JSON files representing the current state of the database.

Using the SQLAlchemy ORM

The SQLAlchemy ORM (Object Relational Mapping) can be used for many of the examples provided above. This also allows for adding extra functionality to your schema, such as validation.

For example, the schema of your sources table could be written to validate RA/Dec as follows:

from sqlalchemy import Column, Float, String
from sqlalchemy.orm import validates

class Sources(Base):
    """ORM for the sources table. This stores the main identifiers for our objects along with ra and dec"""

    __tablename__ = "Sources"
    source = Column(String(100), primary_key=True, nullable=False)
    ra = Column(Float)
    dec = Column(Float)

    @validates("ra")
    def validate_ra(self, key, value):
        if value > 360 or value < 0:
            raise ValueError("RA not in allowed range (0..360)")
        return value

    @validates("dec")
    def validate_dec(self, key, value):
        if value > 90 or value < -90:
            raise ValueError("Dec not in allowed range (-90..90)")
        return value
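Because validation fires when attributes are set, the validators above can be exercised directly, without any database connection. A minimal self-contained sketch (the class here simply repeats the schema above so it runs on its own):

```python
from sqlalchemy import Column, Float, String
from sqlalchemy.orm import declarative_base, validates

Base = declarative_base()

class Sources(Base):
    """Minimal sources table with RA/Dec validation."""
    __tablename__ = "Sources"
    source = Column(String(100), primary_key=True, nullable=False)
    ra = Column(Float)
    dec = Column(Float)

    @validates("ra")
    def validate_ra(self, key, value):
        if value > 360 or value < 0:
            raise ValueError("RA not in allowed range (0..360)")
        return value

    @validates("dec")
    def validate_dec(self, key, value):
        if value > 90 or value < -90:
            raise ValueError("Dec not in allowed range (-90..90)")
        return value

# Valid coordinates pass through unchanged
ok = Sources(source="V4046 Sgr", ra=273.54, dec=-32.79)
print(ok.ra, ok.dec)  # → 273.54 -32.79

# Out-of-range values raise at object construction, before any SQL is issued
try:
    Sources(source="bad", ra=400.0, dec=0.0)
except ValueError as err:
    print(err)  # → RA not in allowed range (0..360)
```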

In your scripts, you can then create objects and populate them accordingly. For example:

from astrodbkit2.astrodb import Database
from schema import Sources

db = Database(connection_string)

# Create a new object and insert to database
s = Sources(source="V4046 Sgr", ra=273.54, dec=-32.79)
with db.session as session:
    session.add(s)
    session.commit()

If the RA or Dec fail the validation checks, the creation of the object will raise a ValueError exception.

One can also use session.add_all() to insert a list of table entries and session.delete() to delete a single one. These options can facilitate the creation of robust ingest scripts that are both intuitive (i.e., instantiating objects in a Pythonic way) and able to validate input values before they reach the database.
