Format

Each site has a somewhat different set of data that it exposes. Because of this, it is difficult to find a single database schema that would fit all of them. Instead, the source of truth is a set of JSON records, and those are later processed into database tables offline.

For every successful scrape, a record is created. The record contains all data that has been scraped, even if the same data was recorded previously.

Each record has the same top-level keys: v, t, s, k, d.

The v key contains a string identifying the version of the global format being used. The current version is “1”.

The t key contains a number that is the Unix time at which the record was created. This is also the time at which the data in the record was scraped.

The s key contains a string that is the identifier of the site from which the data was obtained.

The k key contains a string that identifies the type of entity that the record represents.

The d key contains an object that holds the scraped data. It must be interpreted by the appropriate processor, which is selected based on the s and k keys.
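As an illustration of the keys described so far, here is a minimal sketch of building a record and choosing a processor for it. The site identifier "example-shop", the entity type "product", the field names inside d, and the helper names make_record and process are all hypothetical; storing t as whole seconds is also an assumption, since the format only requires a number.

```python
import json
import time


def process_example_product(data):
    # Hypothetical processor: keeps only the fields a database table might need.
    return {"title": data["title"], "price": data["price"]}


# Hypothetical registry mapping (site, entity type) to a processor function.
PROCESSORS = {
    ("example-shop", "product"): process_example_product,
}


def make_record(site, kind, data):
    """Build a record with the five top-level keys v, t, s, k and d."""
    return {
        "v": "1",                # global format version
        "t": int(time.time()),   # Unix time of the scrape (whole seconds assumed)
        "s": site,               # site identifier
        "k": kind,               # type of entity
        "d": data,               # scraped data, opaque at this level
    }


def process(record):
    """Select a processor from the s and k keys and apply it to d."""
    processor = PROCESSORS.get((record["s"], record["k"]))
    if processor is None:
        raise LookupError(f"no processor for {record['s']!r}/{record['k']!r}")
    return processor(record["d"])


record = make_record(
    "example-shop",
    "product",
    {"v": "1", "title": "Kettle", "price": "19.99"},  # inner v: processor version
)
line = json.dumps(record)        # the JSON form that acts as the source of truth
row = process(json.loads(line))  # later, offline: turn the record into row data
```

Keeping d opaque at the top level means a new site only needs a new entry in the registry; the record envelope itself stays the same.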

The object inside the d key must itself contain a v key, which is a string identifying the version of the processor to be used. This is distinct from the global version: a site might change its format without the global format changing.
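The two version strings can be handled separately, as in the sketch below. It assumes a hypothetical site "example-site" whose "listing" pages changed layout once, so two processor versions exist while the global version stays at "1". The registry layout, the function names and the fields inside d are illustrative assumptions, not part of the format.

```python
def parse_listing_v1(data):
    # Hypothetical processor version "1": the site exposed a single name field.
    return {"name": data["name"]}


def parse_listing_v2(data):
    # Hypothetical processor version "2": the site split the name into two fields.
    return {"name": f"{data['first']} {data['last']}"}


# Hypothetical registry keyed by (site, entity type, processor version).
PROCESSORS = {
    ("example-site", "listing", "1"): parse_listing_v1,
    ("example-site", "listing", "2"): parse_listing_v2,
}

SUPPORTED_GLOBAL_VERSIONS = {"1"}


def process(record):
    """Check the global version, then dispatch on s, k and the version inside d."""
    if record["v"] not in SUPPORTED_GLOBAL_VERSIONS:
        raise ValueError(f"unsupported global format version {record['v']!r}")
    data = record["d"]
    key = (record["s"], record["k"], data["v"])
    if key not in PROCESSORS:
        raise LookupError(f"no processor for {key!r}")
    return PROCESSORS[key](data)


# Two records for the same entity, scraped before and after the site changed.
old = {"v": "1", "t": 1700000000, "s": "example-site", "k": "listing",
       "d": {"v": "1", "name": "Ada Lovelace"}}
new = {"v": "1", "t": 1700086400, "s": "example-site", "k": "listing",
       "d": {"v": "2", "first": "Ada", "last": "Lovelace"}}

rows = [process(old), process(new)]
```

Because old records keep their original d.v, they can still be reprocessed with the processor they were written for after the site changes again.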