Support axes labels
in progress
Seth Shelnutt
in progress
Fons de Leeuw
Good to hear this is already implemented for sparse arrays. It would be great to have this for dense arrays as well, so that it can be used for xarray/NetCDF-type data. In the meantime, I'll build my own mapping using array metadata: for example, defining an offset and multiplier for a numeric dimension, or storing dataframe column labels as a list of strings.
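A minimal sketch of that workaround, assuming a regularly spaced numeric axis and a metadata store that accepts numbers and strings. The `meta` dict and the key names (`x_offset`, `x_multiplier`, `columns`) are hypothetical stand-ins for array metadata, not part of any TileDB API.

```python
import json

# Hypothetical metadata; a plain dict stands in for the array's metadata store.
meta = {
    "x_offset": 10.0,       # coordinate of positional index 0
    "x_multiplier": 0.5,    # spacing between consecutive indices
    # Column labels are JSON-encoded, assuming the store only holds scalars/strings.
    "columns": json.dumps(["temp", "pressure", "humidity"]),
}

def coord_to_index(value, offset, multiplier):
    """Map a numeric coordinate to its positional index on a regular axis."""
    return round((value - offset) / multiplier)

def index_to_coord(index, offset, multiplier):
    """Inverse mapping: positional index back to the numeric coordinate."""
    return offset + index * multiplier

def column_index(label, encoded_columns):
    """Find the positional index of a column label stored as a JSON list."""
    return json.loads(encoded_columns).index(label)
```

With these values, the coordinate 12.5 maps to index 5 and the label "pressure" maps to column 1. This only works for evenly spaced axes; irregular coordinates would need an explicit lookup table instead.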
Nick Gates
Perhaps the API could accept a list of strings as the domain argument to an integer dimension?
e.g.
Dim(name='labelled', dtype=np.uint8, domain=['A', 'B', 'C'], tile=1)
Queries could then be mapped through the domain to find positional indices. Perhaps even using a Filter to do so?
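To illustrate how a query might be mapped through such a domain, here is a rough sketch in plain Python. `LabelledDim` is a hypothetical class made up for this example; it is not an existing TileDB class, and it assumes the label list is small enough to hold in memory.

```python
class LabelledDim:
    """Hypothetical stand-in for a dimension whose domain is a list of labels."""

    def __init__(self, name, labels):
        labels = list(labels)
        if len(set(labels)) != len(labels):
            raise ValueError("labels must be unique")
        self.name = name
        self.labels = labels
        # Reverse map: label -> positional index.
        self._pos = {label: i for i, label in enumerate(labels)}

    def index(self, label):
        """Positional index of a single label."""
        return self._pos[label]

    def label_slice(self, start, stop):
        """Positional slice covering start..stop, inclusive of both labels."""
        return slice(self._pos[start], self._pos[stop] + 1)


dim = LabelledDim("labelled", ["A", "B", "C"])
data = [10, 20, 30]                      # dense data along the labelled axis

point = data[dim.index("B")]             # point query by label
window = data[dim.label_slice("A", "B")] # inclusive range query by label
```

The inclusive range follows the convention used by label-based slicing in pandas and xarray; a real API might choose differently.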
Stavros Papadopoulos
Nick Gates: That would work if the domain label list is small. It'd be a good idea to expose this kind of API, but we should handle the more general case where the labels may not fit in main memory. Our current ideas are around storing the labels as a sparse array (which will allow very efficient lookup) and "attach" that sparse array to the "main" array. More on this soon.
Stavros Papadopoulos
We just announced TileDB 2.0, which adds support for string and heterogeneous dimensions in sparse arrays. Axes labels can now be implemented by "attaching" a sparse array (acting practically as a dataframe) that maps coordinates (e.g., string labels) to positional indices. Currently, the user must do this manually. We'd like to hear some feedback on what API you would like us to add to make axes labels more seamless with the array object.
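As a rough illustration of the manual approach, the sketch below uses a sorted in-memory index as a stand-in for the attached sparse array. `LabelIndex` and the sample labels are hypothetical and do not use the TileDB API; the point is only to show the two-step pattern of resolving labels to positional indices and then reading the main array by position.

```python
import bisect

class LabelIndex:
    """Hypothetical stand-in for a sparse array keyed by string label."""

    def __init__(self, labels):
        # Sort by label, keeping each label's position in the main array.
        pairs = sorted((label, pos) for pos, label in enumerate(labels))
        self._labels = [label for label, _ in pairs]
        self._positions = [pos for _, pos in pairs]

    def lookup(self, label):
        """Positional index for one label (binary search)."""
        i = bisect.bisect_left(self._labels, label)
        if i == len(self._labels) or self._labels[i] != label:
            raise KeyError(label)
        return self._positions[i]

    def range(self, lo, hi):
        """Positions of all labels with lo <= label <= hi, in ascending order."""
        i = bisect.bisect_left(self._labels, lo)
        j = bisect.bisect_right(self._labels, hi)
        return sorted(self._positions[i:j])


labels = ["gene_c", "gene_a", "gene_d", "gene_b"]  # label at each position
main = [3.0, 1.0, 4.0, 2.0]                        # the "main" array

idx = LabelIndex(labels)
value = main[idx.lookup("gene_a")]                        # step 1: label -> index, step 2: read
values = [main[p] for p in idx.range("gene_a", "gene_b")] # label range -> positions -> read
```

A more seamless API would presumably hide the first step, so that a query by label on the main array performs the lookup internally.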
Florian R. Hölzlwimmer
Wow, this would be super useful. E.g. indexing by string in conjunction with https://feedback.tiledb.com/tiledb-core/p/support-string-dimensions.
Integration into xarray in Python and feature parity with xarray in other languages would be the silver bullet.
Stavros Papadopoulos
planned