Kdict: dict with multi-dimensional, sliceable keys¶
kdict is like dict for multi-dimensional keys. With kdict, you can easily filter and slice your dictionary by key dimensions.
Example: machine learning model evaluation. Suppose you’re evaluating several models on three cross validation folds, each with a training set and a test set.
Before kdict, you might store evaluation scores in a nested dictionary. But that’s cumbersome and error-prone. Here’s what it would take to get the mean accuracy for a particular model across all folds:
# To access inner nested data without kdict, you'd need to write iterators like this:
import numpy as np
np.mean(
[
data[fold_id][fold_label]["lasso"]
for fold_id in data.keys()
for fold_label in data[fold_id].keys()
]
)
kdict makes storing and accessing this type of data a breeze. No more nesting:
# Store data in a three-dimensional kdict.
# Dimensions: fold ID, fold label, model name
data = kdict(...)
# Slice the kdict to get lasso model's mean accuracy across all folds:
# data[:, :, 'lasso'] is a subset of the full dictionary
np.mean(list(data[:, :, 'lasso'].values()))
In this example, data
is a three-dimensional kdict that you can slice along any dimension. So how did we make this kdict?
from kdict import kdict
data = kdict() # make a blank kdict
for fold_id in range(3):
for fold_label in ['train', 'test']:
for model_name in ['lasso', 'randomforest']:
# add an entry for each fold ID, fold label, and model name
data[fold_id, fold_label, model_name] = get_model_score(
fold_id,
fold_label,
model_name
)
The syntax, in a nutshell:
Read or write a single element by accessing
[key_dimension_1, key_dimension_2]
and so on.Or get a subset of the dictionary by slicing, e.g.
[:, key_dimension_2]
.
Installation¶
pip install kdict
Usage¶
Create a kdict¶
Import: from kdict import kdict
Create a blank kdict: data = kdict()
. Or initialize from an existing dict: data = kdict(existing_dict)
. You can also use a dict comprehension there, such as:
data = kdict({
(fold_id, fold_label, model_name): get_model_score(fold_id, fold_label, model_name)
for model_name in ['lasso', 'randomforest']
for fold_label in ['train', 'test']
for fold_id in range(3)
})
Slice a kdict¶
Access an individual item with data[0, 'train', 'lasso']
.
Or get a subset of the dictionary with slices: data[0, :, :]
will have all items where the first dimension of the key is 0. This slice is also a kdict, so you can keep slicing and filtering further.
You can also iterate over specific key dimensions:
# get final dimension of the keys
available_models = data.keys(dimensions=2)
# or get all pairs of first two dimensions
for fold_id, fold_label in data.keys(dimensions=[0, 1]):
... # now do something with data[fold_id, fold_label, :]
Eject¶
A kdict behaves just like a dict, except all keys must have the same number of dimensions.
To get a raw dict back, call data.eject()
.
Development¶
Submit PRs against develop
branch, then make a release pull request to master
.
# Install requirements
pip install --upgrade pip wheel
pip install -r requirements_dev.txt
# Install local package
pip install -e .
# Install pre-commit
pre-commit install
# Run tests
make test
# Run lint
make lint
# bump version before submitting a PR against master (all master commits are deployed)
bump2version patch # possible: major / minor / patch
# also ensure CHANGELOG.md updated