exetera.core package¶
Submodules¶
exetera.core.data_writer module¶
-
class
exetera.core.data_writer.DataWriter¶ Bases:
object
-
static
clear_dataset(parent_group, name)¶
-
static
create_group(parent_group, name, attrs)¶
-
static
flush(group)¶
-
static
write(group, name, field, count, dtype=None)¶
-
static
write_additional(group, name, field, count)¶
-
static
write_first(group, name, field, count, dtype=None)¶
exetera.core.dataset module¶
-
class
exetera.core.dataset.HDF5Dataset(session, dataset_path, mode, name)¶ Bases:
exetera.core.abstract_types.Dataset
Dataset is the means by which you interact with an ExeTera datastore. These are created and loaded through Session.open_dataset, rather than being constructed directly.
Datasets are composed of one or more DataFrame objects and are the means by which you interact with DataFrames.
For a detailed explanation of Dataset along with examples of its use, please refer to the wiki documentation at https://github.com/KCL-BMEIS/ExeTera/wiki/Dataset-API
- Parameters
session – The session instance to which this dataset belongs.
dataset_path – The path of the HDF5 file.
mode – the mode in which the dataset should be opened. This is one of “r”, “r+” or “w”.
name – the name that is associated with this dataset. This can be used to retrieve the dataset when calling get_dataset().
- Returns
An HDF5Dataset instance.
-
close()¶ Close the HDF5 file operations.
-
contains_dataframe(dataframe: exetera.core.abstract_types.DataFrame)¶ Check if a dataframe is contained in this dataset by the dataframe object itself.
- Parameters
dataframe – the dataframe object to check
- Returns
True if the dataframe is contained in this dataset, False otherwise.
-
copy(dataframe, name)¶ Add an existing dataframe (from another dataset) to this dataset, writing the existing group attributes and HDF5 datasets to this dataset.
- Parameters
dataframe – the dataframe to copy to this dataset.
name – optional - a new name for the dataframe.
- Returns
None if the operation is successful; otherwise an error is raised.
-
create_dataframe(name: str, dataframe: Optional[exetera.core.abstract_types.DataFrame] = None)¶ Create a new DataFrame object as a part of this Dataset.
- Parameters
name – name of the dataframe
dataframe – if set, this is a dataframe object whose contents are duplicated
- Returns
a dataframe object
-
create_group(name: str)¶ This method is a wrapper around create_dataframe(); use create_dataframe() instead.
-
delete_dataframe(dataframe: exetera.core.abstract_types.DataFrame)¶ Remove dataframe from this dataset by the dataframe object.
- Parameters
dataframe – The dataframe instance to delete.
- Returns
A boolean indicating whether the dataframe was deleted.
-
drop(name: str)¶
-
get_dataframe(name: str)¶ Get a dataframe by name, e.g. dataset.get_dataframe(dataframe_name).
- Parameters
name – The name of the dataframe.
- Returns
The dataframe. An error is raised if the name does not exist in this dataset.
-
items()¶ Return the (name, dataframe) tuples in this dataset.
-
keys()¶ Return all dataframe names in this dataset.
-
require_dataframe(name)¶ Get a dataframe, creating it if it doesn’t exist.
- Parameters
name – name of the dataframe
-
property
session¶ The session property interface.
- Returns
The _session instance.
-
values()¶ Return all dataframe instances in this dataset.
-
exetera.core.dataset.copy(dataframe: exetera.core.abstract_types.DataFrame, dataset: exetera.core.abstract_types.Dataset, name: str)¶ Copy a dataframe to another dataset via HDF5DataFrame.copy(ds1[‘df1’], ds2, ‘df1’).
- Parameters
dataframe – The dataframe to copy.
dataset – The destination dataset.
name – The name of dataframe in destination dataset.
-
exetera.core.dataset.move(dataframe: exetera.core.abstract_types.DataFrame, dataset: exetera.core.abstract_types.Dataset, name: str)¶ Move a dataframe to another dataset via HDF5DataFrame.move(ds1[‘df1’], ds2, ‘df1’). If the move is within the same dataset, e.g. HDF5DataFrame.move(ds1[‘df1’], ds1, ‘df2’), it functions as a rename for both the dataframe and the HDF5Group. However, to
- Parameters
dataframe – The dataframe to move.
dataset – The destination dataset.
name – The name of dataframe in destination dataset.
exetera.core.dataframe module¶
-
class
exetera.core.dataframe.HDF5DataFrame(dataset: exetera.core.abstract_types.Dataset, name: str, h5group: h5py._hl.group.Group)¶ Bases:
exetera.core.abstract_types.DataFrame
DataFrame is the means by which you interact with an ExeTera datastore. These are created and loaded through Dataset.create_dataframe and other methods, rather than being constructed directly.
DataFrames closely resemble Pandas DataFrames, but with a number of key differences:
1. Instead of Series, DataFrames are composed of Field objects.
2. DataFrames can store fields of differing lengths, although all fields must be of the same length when performing certain operations such as merges.
3. ExeTera DataFrames do not (yet) have the ability to create filtered views onto an underlying DataFrame, although this functionality will be added in upcoming releases.
For a detailed explanation of DataFrame along with examples of its use, please refer to the wiki documentation at https://github.com/KCL-BMEIS/ExeTera/wiki/DataFrame-API
- Parameters
name – name of the dataframe.
dataset – the dataset object to which this dataframe belongs.
h5group – the h5group object to store the fields. If the h5group is not empty, acquire data from the h5group object directly. The h5group structure is h5group<-h5group-dataset structure; the latter group has a ‘fieldtype’ attribute and only one dataset named ‘values’, so the structure is mapped to Dataframe<-Field-Field.data automatically.
dataframe – optional - replicate data from another dictionary of (name:str, field: Field).
-
add(field: exetera.core.abstract_types.Field)¶ Add a field to this dataframe as well as the HDF5 Group.
- Parameters
field – the field to add to this dataframe; its underlying dataset is copied
-
apply_filter(filter_to_apply, ddf=None)¶ Apply a filter to all fields in this dataframe, returns filtered dataframe (itself) or a new target (destination) dataframe
Example:
    df = ...  # df contains a field ('foo') with data: ["a", "b", "c", "d", "e", "f", "g"]

    # apply boolean filter to dataframe in place
    bfilter = np.array([0, 1, 0, 1, 0, 1, 1], dtype='bool')
    df.apply_filter(bfilter)
    print(df['foo'].data[:])  # prints ["b", "d", "f", "g"]

    # apply numeric filter to dataframe and store filtered result to designated dataframe
    # (starting again from the original seven values)
    nfilter = np.array([0, 1, 0, 1, 0, 1, 1])
    df.apply_filter(nfilter, ddf = df2)
    print(df2['foo'].data[0:10])  # prints ["b", "d", "f", "g"]
- Parameters
filter_to_apply – the filter to be applied to the source fields, a boolean array
ddf – optional - the destination data frame
- Returns
a dataframe containing all the filtered fields; self if ddf is not set
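The effect of a boolean filter on one field can be sketched in plain Python. This only illustrates the semantics described above and is not ExeTera's implementation; `apply_boolean_filter` is a hypothetical helper, and the length check is an assumption of the sketch.

```python
def apply_boolean_filter(values, keep):
    """Return the elements of `values` whose matching flag in `keep` is truthy."""
    # Assumption of this sketch: filter and data must be the same length.
    if len(values) != len(keep):
        raise ValueError("filter and data must have the same length")
    return [v for v, k in zip(values, keep) if k]


foo = ["a", "b", "c", "d", "e", "f", "g"]
bfilter = [False, True, False, True, False, True, True]
filtered = apply_boolean_filter(foo, bfilter)
print(filtered)  # ['b', 'd', 'f', 'g']
```

Applying the same filter to every field keeps the rows aligned across the dataframe.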
-
apply_index(index_to_apply, ddf=None)¶ Apply an index to all fields in this dataframe, returns the re-indexed dataframe (itself) or a new target (destination) dataframe
Example:
    df = ...  # df contains a field ('foo') with data: ["a", "b", "c", "d", "e"]

    # apply index and store new result to designated dataframe
    index = np.array([4, 3, 2, 1, 0])
    df.apply_index(index, ddf=df2)
    print(df2['foo'].data[0:10])  # prints ["e", "d", "c", "b", "a"]

    # apply index inplace
    df.apply_index(index)
    print(df['foo'].data[:])  # prints ["e", "d", "c", "b", "a"]
- Parameters
index_to_apply – the index to be applied to the fields, an ndarray of integers
ddf – optional- the destination data frame
- Returns
a dataframe containing all the re-indexed fields; self if ddf is not set
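Re-indexing can be read as a gather: element i of the result is the source element at position index[i]. A minimal plain-Python sketch of that reading follows; `apply_index` here is a hypothetical stand-alone function, not the ExeTera method.

```python
def apply_index(values, index):
    """Build a new list whose i-th element is values[index[i]]."""
    return [values[i] for i in index]


foo = ["a", "b", "c", "d", "e"]
reindexed = apply_index(foo, [4, 3, 2, 1, 0])
print(reindexed)  # ['e', 'd', 'c', 'b', 'a']
```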
-
property
columns¶ The columns property interface. Columns is a dictionary to store the fields by (field_name, field_object). The field_name is field.name without prefix ‘/’ and HDF5 group name.
-
contains_field(field)¶ Check whether this dataframe contains a field, given the field object.
- Parameters
field – the field object to check. Returns a tuple (bool, str), where the str is the name stored in the dataframe.
- Returns
bool value indicating whether this DataFrame contains a Field
-
create_categorical(name: str, nformat: int, key: dict, timestamp: Optional[str] = None, chunksize: Optional[int] = None)¶ Create a categorical type field. Please see https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#categoricalfield for a detailed description of categorical fields
- Parameters
name – name of field to be created
nformat – A numerical type in the set (int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32, float64). It is recommended to use ‘int8’.
timestamp – optional - If set, the timestamp that should be given to the new field.
chunksize – optional - If set, the chunksize that should be used to create the new field.
- Returns
a newly created categorical type field
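As a rough mental model, a categorical field stores small integer codes plus a key that relates labels to codes. The sketch below is plain Python and assumes the key maps label to code, which is the direction used by the `{"a": 1, "b": 2}` key in the remap example of this reference; the variable names are hypothetical.

```python
key = {"a": 1, "b": 2}   # label -> stored code (assumed direction)
codes = [1, 2, 1, 2]     # what the field's data would hold

# Invert the key to decode stored codes back into labels.
labels_by_code = {code: label for label, code in key.items()}
decoded = [labels_by_code[c] for c in codes]
print(decoded)  # ['a', 'b', 'a', 'b']
```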
-
create_fixed_string(name: str, length: int, timestamp: Optional[str] = None, chunksize: Optional[int] = None)¶ Create a fixed string type field. Please see https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#fixedstringfield for a detailed description of fixed string fields
- Parameters
name – name of field to be created
timestamp – optional - If set, the timestamp that should be given to the new field.
chunksize – optional - If set, the chunksize that should be used to create the new field.
- Returns
a newly created fixed string type field
-
create_group(name: str)¶ Create a group object in HDF5 file for field to use. Please note, this function is for backwards compatibility with older scripts and should not be used in the general case.
- Parameters
name – the name of the group and field
- Returns
a hdf5 group object
-
create_indexed_string(name: str, timestamp: Optional[str] = None, chunksize: Optional[int] = None)¶ Create an indexed string type field. Please see https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#indexedstringfield for a detailed description of indexed string fields
- Parameters
name – name of field to be created
timestamp – optional - If set, the timestamp that should be given to the new field.
chunksize – optional - If set, the chunksize that should be used to create the new field.
- Returns
a newly created indexed string type field
-
create_numeric(name: str, nformat: int, timestamp: Optional[str] = None, chunksize: Optional[int] = None)¶ Create a numeric type field. Please see https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#numericfield for a detailed description of numeric fields
- Parameters
name – name of field to be created
nformat – A numerical type in the set (int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32, float64). It is recommended to avoid uint64 as certain operations in numpy cause conversions to floating point values.
timestamp – optional - If set, the timestamp that should be given to the new field.
chunksize – optional - If set, the chunksize that should be used to create the new field.
- Returns
a newly created numeric type field
-
create_timestamp(name: str, timestamp: Optional[str] = None, chunksize: Optional[int] = None)¶ Create a timestamp type field. Please see https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#timestampfield for a detailed description of timestamp fields
- Parameters
name – name of field to be created
timestamp – optional - If set, the timestamp that should be given to the new field.
chunksize – optional - If set, the chunksize that should be used to create the new field.
- Returns
a newly created timestamp type field
-
property
dataset¶ The dataset property interface.
-
delete_field(field)¶ Remove a field from this dataframe, given the field object.
- Parameters
field – The field to delete from this dataframe.
- Returns
None.
-
describe(include=None, exclude=None, output='terminal')¶ Show the basic statistics of the data in each field.
Example:
    df = ...  # df contains three fields:
    # field "foo" with data [1, 0, 0, 1, 1]
    # field "bar" with data ["b", "b", "a", "a", "b"]
    # field "baz" with data [3.5, 6.0, 4.2, 7.2, 5.5]

    # Display statistics results in stdout by default,
    # and return a dataframe that contains the statistics results.
    result = df.describe()

    # Statistics results displayed
    #
    #  fields     foo     baz
    # ------------------------
    #  count        5       5
    #  mean      0.60    5.28
    #  std       0.49    1.31
    #  min       0.00    3.50
    #  25%       0.00    3.51
    #  50%       0.00    3.51
    #  75%       0.00    3.52
    #  max       1.00    7.20

    # Do not display statistics results
    result = df.describe(output='None')

    # Include multiple fields
    result = df.describe(include=['foo', 'bar', 'baz'])

    # Statistics results displayed
    #
    #  fields     foo     bar     baz
    # --------------------------------
    #  count        5       5       5
    #  unique     NaN       2     NaN
    #  top        NaN    b'b'     NaN
    #  freq       NaN       3     NaN
    #  mean      0.60     NaN    5.28
    #  std       0.49     NaN    1.31
    #  min       0.00     NaN    3.50
    #  25%       0.00     NaN    3.51
    #  50%       0.00     NaN    3.51
    #  75%       0.00     NaN    3.52
    #  max       1.00     NaN    7.20

    # Include multiple data types
    result = df.describe(include = [np.bytes_, np.float32])

    # Statistics results displayed
    #
    #  fields     bar     baz
    # ------------------------
    #  count        5       5
    #  unique       2     NaN
    #  top       b'b'     NaN
    #  freq         3     NaN
    #  mean       NaN    5.28
    #  std        NaN    1.31
    #  min        NaN    3.50
    #  25%        NaN    3.51
    #  50%        NaN    3.51
    #  75%        NaN    3.52
    #  max        NaN    7.20
- Parameters
include – The field name or data type or simply ‘all’ to indicate the fields included in the calculation.
exclude – The field name or data type to exclude in the calculation.
output – Display the result in stdout if set to terminal, otherwise silent.
- Returns
A dataframe containing the statistics results.
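The count, mean, std, min and max rows shown for "baz" in the example can be reproduced with Python's standard library. The printed std of 1.31 matches the population standard deviation (dividing by n); whether ExeTera computes it that way internally is an assumption here, inferred only from the example numbers.

```python
import statistics

baz = [3.5, 6.0, 4.2, 7.2, 5.5]

summary = {
    "count": len(baz),
    "mean": round(statistics.mean(baz), 2),
    # Population standard deviation: divides by n, not n - 1.
    "std": round(statistics.pstdev(baz), 2),
    "min": min(baz),
    "max": max(baz),
}
print(summary)
# {'count': 5, 'mean': 5.28, 'std': 1.31, 'min': 3.5, 'max': 7.2}
```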
-
drop(name: str)¶ Drop a field from this dataframe as well as the HDF5 Group
- Parameters
name – name of field to be dropped
-
drop_duplicates(by: Union[str, List[str]], ddf: exetera.core.abstract_types.DataFrame = None, hint_keys_is_sorted=False)¶ Removes duplicated values in a field or list of fields, returns a dataframe with distinct values.
Example:
    df = ...  # df contains two fields:
    # field "foo" with data [1, 0, 0, 1]
    # field "bar" with data ["b", "b", "a", "a"]

    # return distinct values of a single field
    df.drop_duplicates(by = 'foo', ddf = df2)
    print(df2["foo"].data[:])  # prints [0, 1]

    # return distinct values of multiple fields
    df.drop_duplicates(by = ['foo', 'bar'], ddf = df3)

    # print dataframe (df3) data:
    #
    # "foo", "bar"
    # -------------
    #   0     "a"
    #   0     "b"
    #   1     "a"
    #   1     "b"
- Parameters
by – Name (str) or list of names (str) of the fields whose distinct values are kept.
ddf – optional - the destination dataframe
- Returns
DataFrame with distinct values.
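The outputs in the example above correspond to taking the distinct tuples of the `by` columns in sorted order. A plain-Python sketch under that reading follows; `distinct_rows` is a hypothetical helper and says nothing about how ExeTera performs the operation.

```python
def distinct_rows(columns, by):
    """Return the sorted distinct tuples formed from the `by` columns."""
    rows = zip(*(columns[name] for name in by))
    return sorted(set(rows))


columns = {"foo": [1, 0, 0, 1], "bar": ["b", "b", "a", "a"]}
single = distinct_rows(columns, ["foo"])
multi = distinct_rows(columns, ["foo", "bar"])
print(single)  # [(0,), (1,)]
print(multi)   # [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]
```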
-
get_field(name)¶ Get a field stored by the field name.
- Parameters
name – the name of field to get.
- Returns
field to get.
-
groupby(by: Union[str, List[str]], hint_keys_is_sorted=False)¶ Group the DataFrame using a field or a list of fields, and return a groupby object.
Example:
    df = ...  # df contains two fields:
    # field "foo" with data [1, 0, 0, 1, 1]
    # field "bar" with data ["b", "b", "a", "a", "b"]

    # group by on single field, then compute max
    df.groupby(by = 'bar').max(target = 'foo', ddf = ddf)

    # print dataframe (ddf) data:
    #
    # "bar", "foo_max"
    # ----------------
    #  "a"      1
    #  "b"      1

    # group by on multiple fields, then compute count
    df.groupby(by = ['foo', 'bar']).count(ddf = ddf)

    # print dataframe (ddf) data:
    #
    # "foo", "bar", "count"
    # ----------------------
    #   0     "a"     1
    #   0     "b"     1
    #   1     "a"     1
    #   1     "b"     2
- Parameters
by – Name (str) or list of names (str) to group by.
hint_keys_is_sorted – an optional flag that users can set to skip the sorted check. Note that groupby runs faster and uses less memory when the dataframe is sorted, that is, when hint_keys_is_sorted=True.
- Returns
Returns a groupby object that contains information about the groups.
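The grouping in the example can be sketched in plain Python as a mapping from each distinct key tuple to the rows that carry it, after which an aggregate is computed per group. `group_rows` is a hypothetical helper used only for illustration.

```python
from collections import defaultdict


def group_rows(columns, by):
    """Map each distinct key tuple to the list of row indices that carry it."""
    groups = defaultdict(list)
    for i, key in enumerate(zip(*(columns[name] for name in by))):
        groups[key].append(i)
    return dict(sorted(groups.items()))


columns = {"foo": [1, 0, 0, 1, 1], "bar": ["b", "b", "a", "a", "b"]}

# count per (foo, bar) group
counts = {key: len(rows)
          for key, rows in group_rows(columns, ["foo", "bar"]).items()}
# max of foo per bar group
foo_max = {key[0]: max(columns["foo"][i] for i in rows)
           for key, rows in group_rows(columns, ["bar"]).items()}

print(counts)   # {(0, 'a'): 1, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 2}
print(foo_max)  # {'a': 1, 'b': 1}
```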
-
property
h5group¶ The h5group property interface, used to handle underlying storage.
-
items()¶ Return all the field names and their corresponding field values
-
keys()¶ Return all the field names
-
rename(field: Union[str, Mapping[str, str]], field_to: Optional[str] = None) → None¶ Rename provides you with the means to rename fields within a dataframe. You can specify either a single field to be renamed or you can provide a dictionary with a set of fields to be renamed.
Example:
    # rename a single field
    df.rename('old_field_name', 'new_field_name')

    # rename multiple fields
    df.rename({'old_field_name_a': 'new_field_name_a',
               'old_field_name_b': 'new_field_name_b'})
Field renaming can fail if the resulting set of renamed fields would have name clashes. If this is the case, none of the rename operations go ahead and the dataframe remains unmodified.
- Parameters
field – Either a string or a dictionary of name pairs, each of which is the existing field name and the destination field name
field_to – Optional parameter containing a string, if field is a string. If ‘field’ is a dictionary, parameter should not be set. Field references remain valid after this operation and reflect their renaming.
- Returns
None
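The all-or-nothing behaviour described above (no rename is applied if any clash would result) can be sketched by validating the full set of new names before changing anything. This is a plain-Python illustration with a hypothetical `rename_fields` helper over a dict of columns; the exception types are assumptions of the sketch.

```python
def rename_fields(fields, mapping):
    """Return a renamed copy of `fields`, or raise without changing anything."""
    missing = [old for old in mapping if old not in fields]
    if missing:
        raise KeyError(f"unknown fields: {missing}")
    new_names = [mapping.get(name, name) for name in fields]
    # Validate the complete result first, so a clash leaves `fields` untouched.
    if len(set(new_names)) != len(new_names):
        raise ValueError("rename would produce clashing field names")
    return dict(zip(new_names, fields.values()))


fields = {"a": [1], "b": [2], "c": [3]}
renamed = rename_fields(fields, {"a": "x", "b": "y"})
print(list(renamed))  # ['x', 'y', 'c']

try:
    rename_fields(fields, {"a": "c"})  # 'c' already exists
except ValueError as err:
    print(err)  # rename would produce clashing field names
```

Because the check runs on the final set of names, a swap such as `{"a": "b", "b": "a"}` passes in this sketch.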
-
sort_values(by: Union[str, List[str]], ddf: exetera.core.abstract_types.DataFrame = None, axis=0, ascending=True, kind='stable')¶ Sort one or multiple fields in dataframe (itself) or a new target (destination) dataframe
Example:
    df = ...  # df contains a field ('idx') with data: ["a", "c", "e", "g", "f", "b", "d"]

    # sort and store sorted value in designated dataframe
    df.sort_values(by = 'idx', ddf = df2)
    print(df2['idx'].data[:10])  # prints ["a", "b", "c", "d", "e", "f", "g"]

    # sort inplace
    df.sort_values(by = 'idx')
    print(df['idx'].data[:])  # prints ["a", "b", "c", "d", "e", "f", "g"]
- Parameters
by – Name (str) or list of names (str) to sort by.
ddf – optional - the destination data frame
axis – Axis to be sorted. Currently only supports 0
ascending – Sort ascending vs. descending. Currently only supports ascending=True.
kind – Choice of sorting algorithm. Currently only supports “stable”
- Returns
DataFrame with sorted values or None if ddf=None.
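One way to picture sorting a whole dataframe by one or more fields is to compute a stable permutation from the key fields and then apply it to every field. The plain-Python sketch below illustrates that idea; `sort_permutation` and the extra "val" column are hypothetical, and this is not a description of ExeTera's internals.

```python
def sort_permutation(columns, by):
    """Return row indices that order the `by` columns ascending (stable)."""
    n = len(columns[by[0]])
    # Python's sorted() is stable, so equal keys keep their original order.
    return sorted(range(n), key=lambda i: tuple(columns[name][i] for name in by))


columns = {
    "idx": ["a", "c", "e", "g", "f", "b", "d"],
    "val": [0, 1, 2, 3, 4, 5, 6],
}
order = sort_permutation(columns, ["idx"])
sorted_columns = {name: [data[i] for i in order] for name, data in columns.items()}
print(sorted_columns["idx"])  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(sorted_columns["val"])  # [0, 5, 1, 6, 2, 4, 3]
```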
-
to_csv(filepath: str, row_filter: Union[numpy.ndarray, exetera.core.abstract_types.Field] = None, column_filter: Union[str, List[str]] = None, chunk_row_size: int = 32768)¶ Write object to a comma-separated values (csv) file.
Example:
    # write to csv file
    df.to_csv(csv_file_name)

    # write to csv file with row_filter. Only select rows when filter value is True.
    df.to_csv(csv_file_name, row_filter=df['foo'])

    # write to csv file with selected columns defined in column_filter.
    df.to_csv(csv_file_name, column_filter=['foo', 'bar'])
- Parameters
filepath – File path.
row_filter – A boolean array / field. Only select rows when filter value is True
column_filter – A sequence of string names for the fields.
chunk_row_size – Rows are written in chunks of at most chunk_row_size rows. The default is 1<<15.
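The interaction of row_filter, column_filter and chunk_row_size can be sketched with the standard `csv` module. This is an illustrative stand-in operating on a dict of lists and an in-memory stream; `write_csv` is a hypothetical function, not the ExeTera method.

```python
import csv
import io


def write_csv(out, columns, row_filter=None, column_filter=None,
              chunk_row_size=1 << 15):
    """Write `columns` to the text stream `out`, one chunk of rows at a time."""
    if isinstance(column_filter, str):
        column_filter = [column_filter]
    names = list(column_filter) if column_filter else list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    for start in range(0, n_rows, chunk_row_size):
        stop = min(start + chunk_row_size, n_rows)
        for i in range(start, stop):
            # Keep a row only when no filter is given or its flag is truthy.
            if row_filter is None or row_filter[i]:
                writer.writerow([columns[name][i] for name in names])


columns = {"foo": [True, False, True], "bar": ["x", "y", "z"], "baz": [1, 2, 3]}
buffer = io.StringIO()
write_csv(buffer, columns, row_filter=columns["foo"],
          column_filter=["bar", "baz"], chunk_row_size=2)
print(buffer.getvalue())
```

Chunking bounds how many rows are handled at once, which matters when fields are too large to materialise in full.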
-
to_pandas(row_filter: List[bool] = None, col_filter: Union[str, List[str]] = None)¶ Convert an ExeTera dataframe to a Pandas DataFrame.
- Parameters
row_filter – A boolean array indicating which rows to export.
col_filter – A string or list of strings indicating which columns to export.
- Returns
A pandas dataframe.
Example:
pandas_df = df.to_pandas()
-
values()¶ Return all the field values
-
class
exetera.core.dataframe.HDF5DataFrameGroupBy(columns, by, sorted_index, spans)¶ Bases:
exetera.core.abstract_types.DataFrameGroupBy
-
count(ddf: exetera.core.abstract_types.DataFrame, write_keys=True) → exetera.core.abstract_types.DataFrame¶ Compute count of group values.
Example:
    df = ...  # df contains a field ("foo") with data: [1, 0, 0, 1, 1]

    # group by on single field, then compute count
    df.groupby(by = 'foo').count(ddf = ddf)

    # print dataframe (ddf) data:
    #
    # "foo", "count"
    # -------------
    #   0      2
    #   1      3
- Parameters
ddf – the destination data frame
write_keys – optional - write groupby keys to ddf only if write_keys=True. Default is True.
- Returns
dataframe with count of group values
-
distinct(ddf: exetera.core.abstract_types.DataFrame, write_keys=True) → exetera.core.abstract_types.DataFrame¶ Compute distinct values of a field or a list of fields
Example:
    df = ...  # df contains two fields:
    # field "foo" with data [1, 0, 0, 1, 1]
    # field "bar" with data ["b", "b", "a", "a", "b"]

    # group by on multiple fields, then compute distinct
    df.groupby(by = ['foo', 'bar']).distinct(ddf = ddf)

    # print dataframe (ddf) data:
    #
    # "foo", "bar"
    # -------------
    #   0     "a"
    #   0     "b"
    #   1     "a"
    #   1     "b"
- Parameters
ddf – the destination data frame
write_keys – optional - write groupby keys to ddf only if write_keys=True. Default is True.
- Returns
dataframe with distinct values of a field or a list of fields
-
first(target: Union[str, List[str]], ddf: exetera.core.abstract_types.DataFrame, write_keys=True) → exetera.core.abstract_types.DataFrame¶ Get first of group values.
Example:
    df = ...  # df contains three fields:
    # field "foo" with data [1, 0, 0, 1, 1]
    # field "bar" with data ["b", "b", "a", "a", "b"]
    # field "baz" with data [3.5, 6.0, 4.2, 7.2, 5.5]

    # group by on multiple fields, then compute first on a single target field
    df.groupby(by = ['foo', 'bar']).first(target = 'baz', ddf = ddf)

    # print dataframe (ddf) data:
    #
    # "foo", "bar", "baz_first"
    # -------------------------
    #   0     "a"     4.2
    #   0     "b"     6.0
    #   1     "a"     7.2
    #   1     "b"     3.5
- Parameters
target – Name (str) or list of names (str) to get first value.
ddf – the destination data frame
write_keys – optional - write groupby keys to ddf only if write_keys=True. Default is True.
- Returns
dataframe with first of group values
-
last(target: Union[str, List[str]], ddf: exetera.core.abstract_types.DataFrame, write_keys=True) → exetera.core.abstract_types.DataFrame¶ Get last of group values.
Example:
    df = ...  # df contains three fields:
    # field "foo" with data [1, 0, 0, 1, 1]
    # field "bar" with data ["b", "b", "a", "a", "b"]
    # field "baz" with data [3.5, 6.0, 4.2, 7.2, 5.5]

    # group by on multiple fields, then compute last on a single target field
    df.groupby(by = ['foo', 'bar']).last(target = 'baz', ddf = ddf)

    # print dataframe (ddf) data:
    #
    # "foo", "bar", "baz_last"
    # ------------------------
    #   0     "a"     4.2
    #   0     "b"     6.0
    #   1     "a"     7.2
    #   1     "b"     5.5
- Parameters
target – Name (str) or list of names (str) to get last value.
ddf – the destination data frame
write_keys – optional - write groupby keys to ddf only if write_keys=True. Default is True.
- Returns
dataframe with last of group values
-
max(target: Union[str, List[str]], ddf: exetera.core.abstract_types.DataFrame, write_keys=True) → exetera.core.abstract_types.DataFrame¶ Compute max of group values.
Example:
    df = ...  # df contains three fields:
    # field "foo" with data [1, 0, 0, 1, 1]
    # field "bar" with data ["b", "b", "a", "a", "b"]
    # field "baz" with data [3.5, 6.0, 4.2, 7.2, 5.5]

    # group by on a single field, then compute max on multiple target fields
    df.groupby(by = 'bar').max(target = ['foo','baz'], ddf = ddf)

    # print dataframe (ddf) data:
    #
    # "bar", "foo_max", "baz_max"
    # ---------------------------
    #  "a"      1         7.2
    #  "b"      1         6.0
- Parameters
target – Name (str) or list of names (str) to compute max.
ddf – the destination data frame
write_keys – optional - write groupby keys to ddf only if write_keys=True. Default is True.
- Returns
dataframe with max of group values
-
min(target: Union[str, List[str]], ddf: exetera.core.abstract_types.DataFrame, write_keys=True) → exetera.core.abstract_types.DataFrame¶ Compute min of group values.
Example:
    df = ...  # df contains two fields:
    # field "foo" with data [1, 0, 0, 1, 1]
    # field "bar" with data ["b", "b", "a", "a", "b"]

    # group by on a single field, then compute min on a single target field
    df.groupby(by = 'bar').min(target = 'foo', ddf = ddf)

    # print dataframe (ddf) data:
    #
    # "bar", "foo_min"
    # ----------------
    #  "a"      0
    #  "b"      0
- Parameters
target – Name (str) or list of names (str) to compute min.
ddf – the destination data frame
write_keys – optional - write groupby keys to ddf only if write_keys=True. Default is True.
- Returns
dataframe with min of group values
-
-
exetera.core.dataframe.copy(field: exetera.core.abstract_types.Field, ddf: exetera.core.abstract_types.DataFrame, name: str)¶ Copy a field to another dataframe as well as underlying dataset.
Example:
    # Copy a field ('foobar') of dataframe (df1) to another dataframe (df2) with new field name ('foo')
    dataframe.copy(df1['foobar'], df2, 'foo')
- Parameters
field – The source field to copy.
ddf – The destination dataframe to copy to.
name – The name of field under destination dataframe.
-
exetera.core.dataframe.merge(left: exetera.core.abstract_types.DataFrame, right: exetera.core.abstract_types.DataFrame, dest: exetera.core.abstract_types.DataFrame, left_on: Union[Tuple[Union[str, exetera.core.abstract_types.Field]], str, exetera.core.abstract_types.Field], right_on: Union[Tuple[Union[str, exetera.core.abstract_types.Field]], str, exetera.core.abstract_types.Field], left_fields: Optional[Sequence[str]] = None, right_fields: Optional[Sequence[str]] = None, left_suffix: str = '_l', right_suffix: str = '_r', how='left', hint_left_keys_ordered: Optional[bool] = None, hint_left_keys_unique: Optional[bool] = None, hint_right_keys_ordered: Optional[bool] = None, hint_right_keys_unique: Optional[bool] = None, chunk_size=1048576)¶ Merge ‘left’ and ‘right’ DataFrames into a destination dataframe. The merge is a database-style join operation, in any of the following modes (“left”, “right”, “inner”, “outer”). This method closely follows the Pandas ‘merge’ functionality.
The join is performed using the fields specified by ‘left_on’ and ‘right_on’; these can either be strings or fields. If they are strings, they refer to fields that must exist in the corresponding dataframe.
You can optionally set ‘left_fields’ and / or ‘right_fields’ if you want to have only a subset of fields joined from the left and right dataframes. If you don’t want any fields to be joined from a given dataframe, you can pass an empty list.
Fields are written to the destination dataframe. If the field names clash, they will get appended with the strings specified in ‘left_suffix’ and ‘right_suffix’ respectively.
- Parameters
left – The left dataframe
right – The right dataframe
dest – The destination dataframe
left_on – The field corresponding to the left key used to perform the join. This is either the name of the field, or a field object. If it is a field object, it can be from another dataframe but it must be the same length as the fields being joined. This can also be a tuple of such values when performing joins on compound keys
right_on – The field corresponding to the right key used to perform the join. This is either the name of the field, or a field object. If it is a field object, it can be from another dataframe but it must be the same length as the fields being joined. This can also be a tuple of such values when performing joins on compound keys
left_fields – Optional parameter listing which fields are to be joined from the left table. If this is not set, all fields from the left table are joined
right_fields – Optional parameter listing which fields are to be joined from the right table. If this is not set, all fields from the right table are joined
left_suffix – A string to be appended to fields from the left table if they clash with fields from the right table.
right_suffix – A string to be appended to fields from the right table if they clash with fields from the left table.
how – Optional parameter specifying the merge mode. It must be one of (‘left’, ‘right’, ‘inner’, ‘outer’ or ‘cross’). If not set, a ‘left’ join is performed.
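A database-style left join with clash suffixes can be sketched in plain Python over dicts of lists. This illustrates the general technique only: `left_join` is a hypothetical helper, `None` is a placeholder chosen here for unmatched right-hand values, and applying the suffix to the key fields as well is an assumption. None of these choices is taken from ExeTera's behaviour.

```python
def left_join(left, right, left_on, right_on, left_suffix="_l", right_suffix="_r"):
    """Left-join two column dicts; unmatched right values become None."""
    # Index the right table: key -> list of row positions holding that key.
    lookup = {}
    for j, key in enumerate(right[right_on]):
        lookup.setdefault(key, []).append(j)

    # Field names present on both sides get the respective suffix.
    clashes = set(left) & set(right)
    l_names = {n: n + left_suffix if n in clashes else n for n in left}
    r_names = {n: n + right_suffix if n in clashes else n for n in right}
    dest = {name: [] for name in list(l_names.values()) + list(r_names.values())}

    for i, key in enumerate(left[left_on]):
        # Every left row is kept; it is repeated once per matching right row.
        for j in lookup.get(key, [None]):
            for n in left:
                dest[l_names[n]].append(left[n][i])
            for n in right:
                dest[r_names[n]].append(None if j is None else right[n][j])
    return dest


left = {"id": [1, 2, 3], "val": ["a", "b", "c"]}
right = {"id": [3, 1, 1], "val": ["x", "y", "z"]}
dest = left_join(left, right, "id", "id")
print(dest["id_l"])   # [1, 1, 2, 3]
print(dest["val_l"])  # ['a', 'a', 'b', 'c']
print(dest["val_r"])  # ['y', 'z', None, 'x']
```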
-
exetera.core.dataframe.move(field: exetera.core.abstract_types.Field, ddf: exetera.core.abstract_types.DataFrame, name: str)¶ Move a field to another dataframe as well as underlying dataset.
Example:
    # Move a field ('foobar') of dataframe (df1) to another dataframe (df2) with new field name ('foo')
    dataframe.move(df1['foobar'], df2, 'foo')
- Parameters
src_df – The source dataframe where the field is located.
field – The field to move.
ddf – The destination dataframe to move to.
name – The name of field under destination dataframe.
exetera.core.exporter module¶
exetera.core.fields module¶
-
class
exetera.core.fields.CategoricalField(session, group, dataframe, write_enabled=False)¶ Bases:
exetera.core.fields.HDF5Field
-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
    field = ...  # field contains data [1, 2, 3, 4, 0, 5, 6, 7, 8]
    filter_to_apply = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)
    field.apply_filter(filter_to_apply, in_place=True)
    field.data[:]  # [2, 4, 5, 7]
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
    field = ...  # field contains data [1, 2, 3, 4, 0, 5, 6, 7, 8]
    index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
    field.apply_index(index_to_apply, target_field)
    target_field.data[:]  # [8, 1, 7, 2, 6, 3, 5, 4, 0]
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans (first). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans (last). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans (max). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans (min). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
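The four apply_spans_* variants can be pictured with plain NumPy. The sketch below is not ExeTera's implementation. It assumes that spans are an array of boundary offsets in which span i covers the half-open range [spans[i], spans[i + 1]), and the helper name `reduce_spans` is hypothetical.

```python
import numpy as np

def reduce_spans(data, spans, how):
    """Reduce each span of `data` to one value (illustrative helper, not ExeTera API).

    Assumes span i covers data[spans[i]:spans[i + 1]] and that no span is empty.
    """
    data = np.asarray(data)
    spans = np.asarray(spans)
    starts, ends = spans[:-1], spans[1:]
    if how == "first":
        return data[starts]
    if how == "last":
        return data[ends - 1]
    # reduceat reduces from each start up to the next start, so trim the
    # data at the final boundary to keep the last span well defined
    trimmed = data[:spans[-1]]
    if how == "min":
        return np.minimum.reduceat(trimmed, starts)
    if how == "max":
        return np.maximum.reduceat(trimmed, starts)
    raise ValueError(f"unknown reduction: {how}")

values = np.array([5, 3, 2, 9, 4, 7])
spans = np.array([0, 2, 5, 6])  # three spans: [5, 3], [2, 9, 4], [7]

print(reduce_spans(values, spans, "first").tolist())  # [5, 2, 7]
print(reduce_spans(values, spans, "last").tolist())   # [3, 4, 7]
print(reduce_spans(values, spans, "min").tolist())    # [3, 2, 7]
print(reduce_spans(values, spans, "max").tolist())    # [5, 9, 7]
```

Each result has one entry per span, so it is one element shorter than the span array itself.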
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
A new, empty field of the same type as this field
-
property
data¶ Get data.
-
get_spans()¶ Get spans of field.
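As a rough illustration of what a span array can look like, the sketch below derives boundaries for runs of equal adjacent values in a NumPy array. That spans mark such runs is an assumption made here for illustration, and `spans_from_runs` is a hypothetical helper rather than part of ExeTera.

```python
import numpy as np

def spans_from_runs(values):
    """Return boundary offsets of runs of equal adjacent values (illustrative only)."""
    values = np.asarray(values)
    if len(values) == 0:
        return np.array([0], dtype=np.int64)
    # a new run starts wherever a value differs from its predecessor
    changes = np.flatnonzero(values[1:] != values[:-1]) + 1
    return np.concatenate(([0], changes, [len(values)])).astype(np.int64)

print(spans_from_runs([1, 1, 2, 2, 2, 3]).tolist())  # [0, 2, 5, 6]
```

Under this convention, run i occupies positions spans[i] up to but not including spans[i + 1].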
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
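A sortedness check of this kind can be sketched in NumPy as a comparison of each element with its successor. The sketch assumes that "sorted" means non-decreasing order, which the documentation does not state, and `is_non_decreasing` is a hypothetical name.

```python
import numpy as np

def is_non_decreasing(values):
    """True if every element is <= the element that follows it (illustrative only)."""
    values = np.asarray(values)
    return bool(np.all(values[:-1] <= values[1:]))

print(is_non_decreasing([1, 2, 2, 5]))  # True
print(is_non_decreasing([1, 3, 2]))     # False
```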
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
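The membership test described above behaves like `numpy.isin`. The following sketch works on a plain NumPy array standing in for the field data; it illustrates the semantics only and makes no claim about how ExeTera implements the method.

```python
import numpy as np

field_data = np.array([1, 2, 3, 4, 0, 5])
test_elements = {2, 4, 5}

# np.isin treats a Python set as a single object, so convert it to a list first
mask = np.isin(field_data, list(test_elements))

print(mask.tolist())  # [False, True, False, True, False, True]
```

The resulting mask has one entry per element of the data and can be used directly as a boolean filter.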
-
property
keys¶ Get keys.
-
property
nformat¶ Get numeric format.
-
remap(key_map, new_key)¶ Remap the key names and key values.
- Parameters
key_map – The mapping rule used to convert old key values into new key values, given as (old value, new value) pairs.
new_key – The new key.
- Returns
A CategoricalMemField with the new key.
Example:
cat_field = df.create_categorical('cat', 'int32', {"a": 1, "b": 2})
cat_field.data.write([1,2,1,2])
newfield = cat_field.remap([(1, 4), (2, 5)], {"a": 4, "b": 5})
print(newfield.data[:]) # [4,5,4,5]
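The value substitution that remap performs on the stored data can be sketched with NumPy alone. This is an illustration of the mapping step under the assumption that each (old, new) pair replaces every occurrence of the old value; `remap_values` is a hypothetical helper and does not handle the key dictionary.

```python
import numpy as np

def remap_values(data, key_map):
    """Replace each old value with its new value (illustrative helper, not ExeTera API)."""
    data = np.asarray(data)
    result = data.copy()
    for old, new in key_map:
        # compare against the original data so that chained pairs such as
        # (1, 2), (2, 3) never remap the same element twice
        result[data == old] = new
    return result

remapped = remap_values([1, 2, 1, 2], [(1, 4), (2, 5)])
print(remapped.tolist())  # [4, 5, 4, 5]
```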
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of a CategoricalField. Returns the sorted unique elements of a CategoricalField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
return_index – boolean, if true returns index of unique elements
return_inverse – boolean, if true returns the indices of the unique array that reconstruct the input array
return_counts – boolean, if true returns counts of unique elements
- Returns
numpy array
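The three optional outputs share their names with the flags of `numpy.unique`, so their meaning can be shown on a plain array. The sketch below uses NumPy directly and is an illustration of the semantics, not of the field method itself.

```python
import numpy as np

data = np.array([2, 1, 2, 3, 1])
uniques, index, inverse, counts = np.unique(
    data, return_index=True, return_inverse=True, return_counts=True
)

print(uniques.tolist())  # [1, 2, 3]       sorted unique values
print(index.tolist())    # [1, 0, 3]       first position of each unique value
print(inverse.tolist())  # [1, 0, 1, 2, 0] positions in `uniques` for each input element
print(counts.tolist())   # [2, 2, 1]       occurrences of each unique value

# the index and inverse arrays relate the two arrays in both directions
assert (data[index] == uniques).all()
assert (uniques[inverse] == data).all()
```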
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.CategoricalMemField(session, nformat, keys)¶ Bases:
exetera.core.fields.MemoryField-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
field = ... # field contains data [1, 2, 3, 4, 0, 5, 6, 7, 8]
filter_to_apply = np.array([0, 2, 0, 1, 0, 1, 0, 1, 0])
field.apply_filter(filter_to_apply, in_place=True)
field.data[:] # prints [2, 4, 5, 7]
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
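The effect of a boolean filter on the underlying data corresponds to boolean-mask indexing in NumPy. The sketch below applies a mask to a plain array as a stand-in for the field data; it shows the selection semantics only.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 0, 5, 6, 7, 8])
filter_to_apply = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0])

# cast to bool so the array acts as a mask rather than as integer indices
filtered = data[filter_to_apply.astype(bool)]

print(filtered.tolist())  # [2, 4, 5, 7]
```

Without the cast, NumPy would interpret the integers as positions and return `data[0]` and `data[1]` repeatedly instead of filtering.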
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
field = ... # field contains data [1, 2, 3, 4, 0, 5, 6, 7, 8]
index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
field.apply_index(index_to_apply, target_field)
target_field.data[:] # [8, 1, 7, 2, 6, 3, 5, 4, 0]
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
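Applying an index corresponds to integer-array indexing in NumPy: element i of the result is the source element at position index[i]. The sketch below uses a plain array in place of the field data and adds a sort permutation from `numpy.argsort` as one common source of such an index.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 0, 5, 6, 7, 8])
index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)

# result[i] == data[index_to_apply[i]]
reindexed = data[index_to_apply]
print(reindexed.tolist())  # [8, 1, 7, 2, 6, 3, 5, 4, 0]

# a sort permutation is a typical index: applying it yields sorted data
order = np.argsort(data, kind="stable")
print(data[order].tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```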
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans (first). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans (last). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans (max). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans (min). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
A new, empty field of the same type as this field
-
property
data¶ Returns a memory field array with the values from this field.
- Returns
MemoryFieldArray
-
get_spans()¶ Get spans of field.
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
-
property
keys¶ Get keys.
-
remap(key_map, new_key)¶ Remap the key names and key values.
- Parameters
key_map – The mapping rule used to convert old key values into new key values, given as (old value, new value) pairs.
new_key – The new key.
- Returns
A CategoricalMemField with the new key.
Example:
cat_field = df.create_categorical('cat', 'int32', {"a": 1, "b": 2})
cat_field.data.write([1,2,1,2])
newfield = cat_field.remap([(1, 4), (2, 5)], {"a": 4, "b": 5})
print(newfield.data[:]) # [4,5,4,5]
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of a CategoricalMemField. Returns the sorted unique elements of a CategoricalMemField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
return_index – boolean, if true returns index of unique elements
return_inverse – boolean, if true returns result in reverse
return_counts – boolean, if true returns counts of unique elements
- Returns
numpy array
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.FieldDataOps¶ Bases:
object-
static
apply_filter_to_field(source, filter_to_apply, target=None, in_place=False)¶ Apply filter to field, either in place (itself) or a target (new) field
- Parameters
source – Field
filter_to_apply – a Field or numpy array that contains the indices to filter
target – optional Field; if set, the filtered data is written to this field
in_place – optional bool; if True, the data of the source field itself is changed
- Returns
Field with filter applied
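The relationship between ‘target’ and ‘in_place’ that runs through these operations can be sketched with Python lists standing in for fields. This is a minimal illustration of the three possible destinations; `apply_filter_sketch` is a hypothetical name, and raising `ValueError` for the conflicting combination is an assumption, since the exception type is not documented here.

```python
def apply_filter_sketch(source, mask, target=None, in_place=False):
    """Filter `source` by `mask`, writing to one of three destinations (illustrative only)."""
    if in_place and target is not None:
        # assumption: the conflicting combination is rejected with ValueError
        raise ValueError("'target' must be None when 'in_place' is True")
    result = [value for value, keep in zip(source, mask) if keep]
    if in_place:
        source[:] = result   # destructive: the source itself is overwritten
        return source
    if target is not None:
        target[:] = result   # the caller-supplied destination is overwritten
        return target
    return result            # neither set: a new object is returned

source = [1, 2, 3, 4]
new_list = apply_filter_sketch(source, [1, 0, 1, 0])
print(new_list, source)      # [1, 3] [1, 2, 3, 4]

destination = []
returned = apply_filter_sketch(source, [0, 1, 0, 1], target=destination)
print(returned is destination, destination)  # True [2, 4]
```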
-
static
apply_filter_to_indexed_field(source, filter_to_apply, target=None, in_place=False)¶
-
static
apply_index_to_field(source, index_to_apply, target=None, in_place=False)¶ Apply index to field, either in place (itself) or a target (new) field
- Parameters
source – Field
index_to_apply – a Field or numpy array that contains the indices
target – optional Field; if set, the reindexed data is written to this field
in_place – bool; if True, the data of the source field itself is changed
- Returns
Field with index
-
static
apply_index_to_indexed_field(source, index_to_apply, target=None, in_place=False)¶
-
static
apply_isin(source: exetera.core.abstract_types.Field, test_elements: Union[list, set, numpy.ndarray])¶ Apply isin operation for elements on Field
- Parameters
source – Field
test_elements – list, set or ndarray
- Returns
a boolean array of the same length as source
-
static
apply_spans_first(source: exetera.core.abstract_types.Field, spans: Union[exetera.core.abstract_types.Field, numpy.ndarray], target: Optional[exetera.core.abstract_types.Field] = None, in_place: bool = None) → exetera.core.abstract_types.Field¶ Apply spans first, either in place (itself) or a target (new) field
- Parameters
source – Field
spans – Field or ndarray
target – optional Field; if set, the result is written to this field
in_place – bool; if True, the data of the source field itself is changed
- Returns
Field
-
static
apply_spans_last(source: exetera.core.abstract_types.Field, spans: Union[exetera.core.abstract_types.Field, numpy.ndarray], target: Optional[exetera.core.abstract_types.Field] = None, in_place: bool = None) → exetera.core.abstract_types.Field¶ Apply spans last, either in place (itself) or a target (new) field
- Parameters
source – Field
spans – Field or ndarray
target – optional Field; if set, the result is written to this field
in_place – bool; if True, the data of the source field itself is changed
- Returns
Field
-
static
apply_spans_max(source: exetera.core.abstract_types.Field, spans: Union[exetera.core.abstract_types.Field, numpy.ndarray], target: Optional[exetera.core.abstract_types.Field] = None, in_place: bool = None) → exetera.core.abstract_types.Field¶ Apply spans max, either in place (itself) or a target (new) field
- Parameters
source – Field
spans – Field or ndarray
target – optional Field; if set, the result is written to this field
in_place – bool; if True, the data of the source field itself is changed
- Returns
Field
-
static
apply_spans_min(source: exetera.core.abstract_types.Field, spans: Union[exetera.core.abstract_types.Field, numpy.ndarray], target: Optional[exetera.core.abstract_types.Field] = None, in_place: bool = None) → exetera.core.abstract_types.Field¶ Apply spans min, either in place (itself) or a target (new) field
- Parameters
source – Field
spans – Field or ndarray
target – optional Field; if set, the result is written to this field
in_place – bool; if True, the data of the source field itself is changed
- Returns
Field
-
static
apply_unique(src: exetera.core.abstract_types.Field, return_index=False, return_inverse=False, return_counts=False) → numpy.ndarray¶ Find unique elements in field. Returns the sorted unique elements of a field. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
src – Field
return_index – boolean, if true returns index of unique elements
return_inverse – boolean, if true returns the indices of the unique array that reconstruct the input array
return_counts – boolean, if true returns counts of unique elements
- Returns
numpy array
-
static
categorical_field_create_like(source, group, name, timestamp)¶
- Parameters
group – h5py group
name – str
timestamp – timestamp
- Returns
CategoricalField or CategoricalMemField
-
classmethod
equal(session, first, second)¶
-
static
fixed_string_field_create_like(source, group, name, timestamp)¶
- Parameters
group – h5py group
name – str
timestamp – TimestampField, see https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#timestampfield
- Returns
FixedStringField or FixedStringMemField
-
classmethod
greater_than(session, first, second)¶
-
classmethod
greater_than_equal(session, first, second)¶
-
static
indexed_string_create_like(source, group, name, timestamp)¶
- Parameters
group – h5py group
name – Name of indexed string field
timestamp – timestamp, see: https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#timestampfield
- Returns
Indexed string field
-
classmethod
invert(session, first)¶
-
classmethod
less_than(session, first, second)¶
-
classmethod
less_than_equal(session, first, second)¶
-
classmethod
logical_not(session, first)¶
-
classmethod
not_equal(session, first, second)¶
-
classmethod
numeric_add(session, first, second)¶
-
classmethod
numeric_and(session, first, second)¶
-
classmethod
numeric_divmod(session, first, second)¶
-
static
numeric_field_create_like(source, group, name, timestamp)¶
- Parameters
group – h5py group
name – str
timestamp – TimestampField, see https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#timestampfield
- Returns
NumericField or NumericMemField
-
classmethod
numeric_floordiv(session, first, second)¶
-
classmethod
numeric_mod(session, first, second)¶
-
classmethod
numeric_mul(session, first, second)¶
-
classmethod
numeric_or(session, first, second)¶
-
classmethod
numeric_sub(session, first, second)¶
-
classmethod
numeric_truediv(session, first, second)¶
-
classmethod
numeric_xor(session, first, second)¶
-
static
timestamp_field_create_like(source, group, name, timestamp)¶
- Parameters
group – h5py group
name – str
timestamp – TimestampField, see https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#timestampfield
- Returns
TimestampField, see https://github.com/KCL-BMEIS/ExeTera/wiki/Datatypes#timestampfield
-
class
exetera.core.fields.FixedStringField(session, group, dataframe, write_enabled=False)¶ Bases:
exetera.core.fields.HDF5Field-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
field = ... # field contains data ['a', 'b', 'c', 'd', '', 'e', 'f', 'g', 'h']
filter_to_apply = np.array([0, 2, 0, 1, 0, 1, 0, 1, 0])
field.apply_filter(filter_to_apply, target_field)
target_field.data[:] # prints ['b', 'd', 'e', 'g']
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
field = ... # field contains data ['a', 'b', 'c', 'd', '', 'e', 'f', 'g', 'h']
index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
field.apply_index(index_to_apply, target_field)
target_field.data[:] # ['h', 'a', 'g', 'b', 'f', 'c', 'e', 'd', '']
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans (first). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans (last). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans (max). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans (min). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
A new, empty field of the same type as this field
-
property
data¶ Get data.
-
get_spans()¶ Get spans of field.
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of a FixedStringField. Returns the sorted unique elements of a FixedStringField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
return_index – boolean, if true returns index of unique elements
return_inverse – boolean, if true returns result in reverse
return_counts – boolean, if true returns counts of unique elements
- Returns
numpy array
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.FixedStringMemField(session, length)¶ Bases:
exetera.core.fields.MemoryField-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
field = ... # field contains data ['a', 'b', 'c', 'd', '', 'e', 'f', 'g', 'h']
filter_to_apply = np.array([0, 2, 0, 1, 0, 1, 0, 1, 0])
field.apply_filter(filter_to_apply, target_field)
target_field.data[:] # prints ['b', 'd', 'e', 'g']
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
field = ... # field contains data ['a', 'b', 'c', 'd', '', 'e', 'f', 'g', 'h']
index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
field.apply_index(index_to_apply, target_field)
target_field.data[:] # ['h', 'a', 'g', 'b', 'f', 'c', 'e', 'd', '']
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans (first). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None.
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans (last). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans (max). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans (min). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
A new, empty field of the same type as this field
-
property
data¶ Returns a memory field array with the values from this field.
- Returns
MemoryFieldArray
-
get_spans()¶ Get spans of field.
- Returns
Spans of field
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of a FixedStringMemField. Returns the sorted unique elements of a FixedStringMemField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
return_index – boolean, if true returns index of unique elements
return_inverse – boolean, if true returns result in reverse
return_counts – boolean, if true returns counts of unique elements
- Returns
numpy array
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.HDF5Field(session, group, dataframe, write_enabled=False)¶ Bases:
exetera.core.abstract_types.Field-
apply_filter(filter_to_apply, dstfld=None)¶ Apply filter on the field.
-
apply_index(index_to_apply, dstfld=None)¶ Apply index on the field.
-
property
chunksize¶ The chunksize for the field. This is not generally required for users, and may be ignored depending on the storage medium.
-
property
dataframe¶ The owning dataframe of this field, or None if the field is not owned by a dataframe.
- Returns
str or None
-
get_spans()¶ Get spans of the field.
-
property
indexed¶ Whether the field is an indexed field or not. Indexed fields store their data internally as index and value arrays for efficiency, as well as making it accessible through the data property.
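The index-and-value layout mentioned above can be illustrated with a small round trip in NumPy. The exact layout ExeTera uses is not specified here, so the sketch assumes one plausible form: a flat byte array of concatenated values plus an offset array in which entry i covers values[indices[i]:indices[i + 1]]. The helper names are hypothetical.

```python
import numpy as np

def encode_indexed(strings):
    """Pack strings into (indices, values) arrays (illustrative layout, not ExeTera API)."""
    encoded = [s.encode("utf-8") for s in strings]
    lengths = [len(b) for b in encoded]
    # offsets: one more entry than there are strings, starting at 0
    indices = np.concatenate(([0], np.cumsum(lengths))).astype(np.int64)
    values = np.frombuffer(b"".join(encoded), dtype=np.uint8)
    return indices, values

def decode_indexed(indices, values):
    """Rebuild the list of strings from the (indices, values) arrays."""
    raw = values.tobytes()
    return [raw[s:e].decode("utf-8") for s, e in zip(indices[:-1], indices[1:])]

indices, values = encode_indexed(["a", "bb", "", "ccc"])
print(indices.tolist())                 # [0, 1, 3, 3, 6]
print(values.tobytes())                 # b'abbccc'
print(decode_indexed(indices, values))  # ['a', 'bb', '', 'ccc']
```

An empty string appears as two equal consecutive offsets, so variable-length entries need no padding.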
-
property
name¶ The name of the field within a dataframe, if the field belongs to a dataframe.
- Returns
str
-
property
timestamp¶ The timestamp representing the field creation time. This is the time at which the data for this field was added to the dataset, rather than the point at which the field wrapper was created.
- Returns
timestamp
-
property
valid¶ Returns whether the field is a valid field object. Fields can become invalid as a result of certain operations, such as a field being moved from one dataframe to another. A field that is invalid will throw exceptions if any other operation is performed on it.
- Returns
bool
-
-
class
exetera.core.fields.IndexedStringField(session, group, dataframe, write_enabled=False)¶ Bases:
exetera.core.fields.HDF5Field-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', '', 'eeee', 'fff', 'gg', 'h']
    filter_to_apply = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)
    field.apply_filter(filter_to_apply, target_field)
    target_field.data[:] # ['bb', 'dddd', 'eeee', 'gg']
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
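The filtering semantics described above can be sketched in plain Python, independently of ExeTera. `filter_sketch` is a hypothetical helper that works on lists; it only illustrates which elements survive and is not the library's implementation.

```python
def filter_sketch(data, filter_to_apply):
    """Keep the elements of data whose corresponding filter entry is truthy."""
    if len(data) != len(filter_to_apply):
        raise ValueError("data and filter must have the same length")
    return [value for value, keep in zip(data, filter_to_apply) if keep]


data = ['a', 'bb', 'ccc', 'dddd', '', 'eeee', 'fff', 'gg', 'h']
keep_flags = [False, True, False, True, False, True, False, True, False]
filtered = filter_sketch(data, keep_flags)
print(filtered)  # ['bb', 'dddd', 'eeee', 'gg']
```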
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', '', 'eeee', 'fff', 'gg', 'h']
    index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
    field.apply_index(index_to_apply, target_field)
    target_field.data[:] # ['h', 'a', 'gg', 'bb', 'fff', 'ccc', 'eeee', 'dddd', '']
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
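Applying an index amounts to gathering elements in the order the index lists them. The sketch below shows this on a plain list; `index_sketch` is a hypothetical name used only for illustration.

```python
def index_sketch(data, index_to_apply):
    """Return data reordered so that result[i] == data[index_to_apply[i]]."""
    return [data[i] for i in index_to_apply]


data = ['a', 'bb', 'ccc', 'dddd', '', 'eeee', 'fff', 'gg', 'h']
order = [8, 0, 7, 1, 6, 2, 5, 3, 4]
reindexed = index_sketch(data, order)
print(reindexed)  # ['h', 'a', 'gg', 'bb', 'fff', 'ccc', 'eeee', 'dddd', '']
```

An index may repeat or omit positions, so the result does not have to be a permutation of the input.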
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the first element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', 'eeee', 'fff', 'gg', 'h']
    spans_to_apply = np.array([0, 2, 3, 6, 8], dtype=np.int32)
    field.apply_spans_first(spans_to_apply, target_field)
    target_field.data[:] # ['a', 'ccc', 'dddd', 'gg']
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the last element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', 'eeee', 'fff', 'gg', 'h']
    spans_to_apply = np.array([0, 2, 3, 6, 8], dtype=np.int32)
    field.apply_spans_last(spans_to_apply, target_field)
    target_field.data[:] # ['bb', 'ccc', 'fff', 'h']
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
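A minimal plain-Python sketch of the "first" and "last" span operations follows. It assumes that consecutive entries of the spans array delimit half-open ranges `[start, end)` over the data, which is consistent with the example outputs but is not stated explicitly in this reference. The helper names are hypothetical.

```python
def spans_first_sketch(data, spans):
    """Take the first element of each half-open range [spans[i], spans[i+1])."""
    return [data[start] for start in spans[:-1]]


def spans_last_sketch(data, spans):
    """Take the last element of each half-open range [spans[i], spans[i+1])."""
    return [data[end - 1] for end in spans[1:]]


data = ['a', 'bb', 'ccc', 'dddd', 'eeee', 'fff', 'gg', 'h']
spans = [0, 2, 3, 6, 8]
firsts = spans_first_sketch(data, spans)
lasts = spans_last_sketch(data, spans)
print(firsts)  # ['a', 'ccc', 'dddd', 'gg']
print(lasts)   # ['bb', 'ccc', 'fff', 'h']
```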
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the maximum element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', 'eeee', 'fff', 'gg', 'h']
    spans_to_apply = np.array([0, 2, 3, 6, 8], dtype=np.int32)
    field.apply_spans_max(spans_to_apply, in_place=True)
    field.data[:] # ['bb', 'ccc', 'fff', 'h']
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the minimum element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', 'eeee', 'fff', 'gg', 'h']
    spans_to_apply = np.array([0, 2, 3, 6, 8], dtype=np.int32)
    field.apply_spans_min(spans_to_apply, in_place=True)
    field.data[:] # ['a', 'ccc', 'dddd', 'gg']
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
Indexed string field
-
property
data¶ Returns an indexed, writable field array containing the values of this field.
- Returns
WriteableIndexedFieldArray
-
get_spans()¶ Get spans of field
-
property
indexed¶ Whether the field is an indexed field or not. Indexed fields store their data internally as index and value arrays for efficiency, as well as making it accessible through the data property.
-
property
indices¶ Get indices.
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
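The membership test described above can be sketched on a plain list as follows. `isin_sketch` is a hypothetical helper; it returns a Python list of booleans rather than the array the real method returns.

```python
def isin_sketch(data, test_elements):
    """Return one boolean per element of data: True if it is in test_elements."""
    lookup = set(test_elements)  # set lookup keeps each membership test cheap
    return [value in lookup for value in data]


data = ['a', 'bb', 'ccc', 'bb']
mask = isin_sketch(data, ['bb', 'zz'])
print(mask)  # [False, True, False, True]
```

The result has the same length as the input and can be used directly as a boolean filter.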
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of an IndexedStringField. Returns the sorted unique elements of an IndexedStringField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
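return_index – boolean, if true returns the indices of the input array that give the unique values
return_inverse – boolean, if true returns the indices of the unique array that reconstruct the input array
return_counts – boolean, if true returns the number of times each unique value occurs in the input array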
- Returns
numpy array
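The three optional outputs are easiest to understand side by side. The plain-Python sketch below follows the convention of `numpy.unique`, which the description above appears to mirror; `unique_sketch` is a hypothetical helper, and returning a tuple when any flag is set is an assumption taken from that convention.

```python
def unique_sketch(data, return_index=False, return_inverse=False,
                  return_counts=False):
    """Sorted unique values, plus optional first-index, inverse and counts."""
    uniques = sorted(set(data))
    position = {value: i for i, value in enumerate(uniques)}
    results = [uniques]
    if return_index:
        first_seen = {}
        for i, value in enumerate(data):
            first_seen.setdefault(value, i)  # remember the first occurrence only
        results.append([first_seen[value] for value in uniques])
    if return_inverse:
        # uniques[inverse[i]] == data[i] for every i
        results.append([position[value] for value in data])
    if return_counts:
        counts = [0] * len(uniques)
        for value in data:
            counts[position[value]] += 1
        results.append(counts)
    return results[0] if len(results) == 1 else tuple(results)


data = ['b', 'a', 'b', 'c', 'a', 'b']
uniques, first_index, inverse, counts = unique_sketch(data, True, True, True)
print(uniques)      # ['a', 'b', 'c']
print(first_index)  # [1, 0, 3]
print(inverse)      # [1, 0, 1, 2, 0, 1]
print(counts)       # [2, 3, 1]
```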
-
property
values¶ Get values.
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.IndexedStringMemField(session, chunksize=None)¶ Bases:
exetera.core.fields.MemoryField-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', '', 'eeee', 'fff', 'gg', 'h']
    filter_to_apply = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)
    field.apply_filter(filter_to_apply, target_field)
    target_field.data[:] # ['bb', 'dddd', 'eeee', 'gg']
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', '', 'eeee', 'fff', 'gg', 'h']
    index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
    field.apply_index(index_to_apply, target_field)
    target_field.data[:] # ['h', 'a', 'gg', 'bb', 'fff', 'ccc', 'eeee', 'dddd', '']
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the first element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', 'eeee', 'fff', 'gg', 'h']
    spans_to_apply = np.array([0, 2, 3, 6, 8], dtype=np.int32)
    field.apply_spans_first(spans_to_apply, target_field)
    target_field.data[:] # ['a', 'ccc', 'dddd', 'gg']
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the last element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', 'eeee', 'fff', 'gg', 'h']
    spans_to_apply = np.array([0, 2, 3, 6, 8], dtype=np.int32)
    field.apply_spans_last(spans_to_apply, target_field)
    target_field.data[:] # ['bb', 'ccc', 'fff', 'h']
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the maximum element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', 'eeee', 'fff', 'gg', 'h']
    spans_to_apply = np.array([0, 2, 3, 6, 8], dtype=np.int32)
    field.apply_spans_max(spans_to_apply, in_place=True)
    field.data[:] # ['bb', 'ccc', 'fff', 'h']
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the minimum element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
Example:
    field = ... # field contains data ['a', 'bb', 'ccc', 'dddd', 'eeee', 'fff', 'gg', 'h']
    spans_to_apply = np.array([0, 2, 3, 6, 8], dtype=np.int32)
    field.apply_spans_min(spans_to_apply, in_place=True)
    field.data[:] # ['a', 'ccc', 'dddd', 'gg']
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
Indexed string field
-
property
data¶ Returns an indexed, writable field array with the values from this field.
- Returns
WriteableIndexedFieldArray
-
get_spans()¶
- Returns
Spans of indices, as a list
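This reference does not define the layout of the spans that `get_spans()` returns. The sketch below assumes that spans mark the boundaries of runs of equal consecutive values, with a final entry equal to the length of the data; that reading is consistent with the `spans_to_apply` arrays used in the `apply_spans_*` examples, but it is an assumption. `get_spans_sketch` is a hypothetical helper operating on a plain list.

```python
def get_spans_sketch(data):
    """Return the start index of each run of equal values, plus len(data)."""
    spans = [i for i in range(len(data)) if i == 0 or data[i] != data[i - 1]]
    spans.append(len(data))
    return spans


data = ['a', 'a', 'b', 'c', 'c', 'c']
spans = get_spans_sketch(data)
print(spans)  # [0, 2, 3, 6]
```

Under this reading, spans are only meaningful for grouping when equal values are adjacent, for example after sorting the field.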
-
property
indexed¶ Whether the field is an indexed field or not. Indexed fields store their data internally as index and value arrays for efficiency, as well as making it accessible through the data property.
-
property
indices¶ Get the indices for this field.
- Returns
MemoryFieldArray(‘int64’)
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of an IndexedStringMemField. Returns the sorted unique elements of an IndexedStringMemField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
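return_index – boolean, if true returns the indices of the input array that give the unique values
return_inverse – boolean, if true returns the indices of the unique array that reconstruct the input array
return_counts – boolean, if true returns the number of times each unique value occurs in the input array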
- Returns
numpy array
-
property
values¶ Get the values for this field.
- Returns
MemoryFieldArray(‘8’)
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.MemoryField(session)¶ Bases:
exetera.core.abstract_types.Field-
apply_filter(filter_to_apply, dstfld=None)¶ Apply filter on the field.
-
apply_index(index_to_apply, dstfld=None)¶ Apply index on the field.
-
property
chunksize¶ The chunksize for the field. This is not generally required for users, and may be ignored depending on the storage medium.
-
property
dataframe¶ The owning dataframe of this field, or None if the field is not owned by a dataframe.
-
property
indexed¶ Whether the field is an indexed field or not. Indexed fields store their data internally as index and value arrays for efficiency, as well as making it accessible through the data property.
-
property
name¶ The name of the field within a dataframe, if the field belongs to a dataframe.
- Returns
str or None
-
property
timestamp¶ The timestamp representing the field creation time. This is the time at which the data for this field was added to the dataset, rather than the point at which the field wrapper was created.
-
property
valid¶ Returns whether the field is a valid field object. Fields can become invalid as a result of certain operations, such as a field being moved from one dataframe to another. A field that is invalid will throw exceptions if any other operation is performed on it.
- Returns
bool
-
-
class
exetera.core.fields.MemoryFieldArray(dtype)¶ Bases:
object-
clear()¶ Set the dataset to None.
- Returns
None
-
complete()¶ Mark writing completed, usually used after calling write_part.
-
property
dtype¶ - Returns
dtype of field
-
write(part)¶ Writes data to field and marks it as complete.
Example:
    part = np.array([97, 97, 100])
    field.write(part)
- Parameters
part – numpy array to write to field
- Returns
None
-
write_part(part, move_mem=False)¶ Writes a part of the data to the field. Call complete() once the final part has been written.
Example:
    part = np.array([97, 97, 100])
    field.write_part(part)
    field.complete()
- Parameters
part – numpy array to write to the field
move_mem – boolean, use part provided directly or make copy before writing.
- Returns
None
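The relationship between `write_part`, `complete` and `write` can be sketched with a small stand-in class. `PartBuffer` is hypothetical and stores plain lists; it only illustrates the calling pattern (several parts, then one `complete()`), not how `MemoryFieldArray` manages its memory.

```python
class PartBuffer:
    """Hypothetical stand-in for a field array that is written in parts."""

    def __init__(self):
        self._parts = []
        self.data = None  # populated once complete() is called

    def write_part(self, part):
        """Queue one part; the data is not final until complete() is called."""
        self._parts.append(list(part))

    def complete(self):
        """Concatenate all queued parts into the final data."""
        self.data = [value for part in self._parts for value in part]
        self._parts = []

    def write(self, part):
        """Write a single part and mark the data as complete in one call."""
        self.write_part(part)
        self.complete()


chunked = PartBuffer()
chunked.write_part([97, 97])
chunked.write_part([100])
chunked.complete()
print(chunked.data)  # [97, 97, 100]

single = PartBuffer()
single.write([97, 97, 100])
print(single.data)  # [97, 97, 100]
```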
-
-
class
exetera.core.fields.NumericField(session, group, dataframe, write_enabled=False)¶ Bases:
exetera.core.fields.HDF5Field-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
    field = ... # field contains data [1, 22, 333, 444, 0, 5555, 666, 77, 8]
    filter_to_apply = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)
    field.apply_filter(filter_to_apply, in_place=True)
    field.data[:] # [22, 444, 5555, 77]
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
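The ‘target’ and ‘in_place’ rules shared by the apply_* methods can be illustrated with a toy class. `ListField` is hypothetical and list-backed; the `ValueError` raised when both options are given is an assumption about how the conflict might be reported, since this reference only states that the two options are mutually exclusive.

```python
class ListField:
    """Hypothetical list-backed field illustrating the target/in_place rules."""

    def __init__(self, data=()):
        self.data = list(data)

    def apply_filter(self, filter_to_apply, target=None, in_place=False):
        if in_place and target is not None:
            raise ValueError("if 'in_place' is True, 'target' must be None")
        result = [v for v, keep in zip(self.data, filter_to_apply) if keep]
        if in_place:
            self.data = result   # destructive: this field is modified
            return self
        if target is not None:
            target.data = result  # this field is left untouched
            return target
        return ListField(result)  # default: a new field instance


keep_flags = [0, 1, 0, 1, 0, 1, 0, 1, 0]
source = ListField([1, 22, 333, 444, 0, 5555, 666, 77, 8])

fresh = source.apply_filter(keep_flags)
destination = ListField()
returned = source.apply_filter(keep_flags, target=destination)
print(fresh.data)        # [22, 444, 5555, 77]
print(len(source.data))  # 9

same = source.apply_filter(keep_flags, in_place=True)
print(source.data)       # [22, 444, 5555, 77]
```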
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
    field = ... # field contains data [1, 22, 333, 444, 0, 5555, 666, 77, 8]
    index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
    field.apply_index(index_to_apply, target_field)
    target_field.data[:] # [8, 1, 77, 22, 666, 333, 5555, 444, 0]
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the first element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the last element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the maximum value of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the minimum value of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
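For numeric data, the "max" and "min" span operations reduce each span to a single value. The plain-Python sketch below assumes that consecutive entries of the spans array delimit half-open ranges `[start, end)`; the helper name `spans_reduce_sketch` is hypothetical, and the data and spans are illustrative values rather than an example from the library.

```python
def spans_reduce_sketch(data, spans, reducer):
    """Apply reducer (e.g. max or min) to each range [spans[i], spans[i+1])."""
    return [
        reducer(data[start:end])
        for start, end in zip(spans[:-1], spans[1:])
    ]


data = [1, 22, 333, 444, 0, 5555, 666, 77, 8]
spans = [0, 2, 3, 6, 9]
span_max = spans_reduce_sketch(data, spans, max)
span_min = spans_reduce_sketch(data, spans, min)
print(span_max)  # [22, 333, 5555, 666]
print(span_min)  # [1, 333, 0, 8]
```

Each output has one element per span, so its length is always one less than the length of the spans array.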
-
astype(dtype: str, casting='unsafe')¶ Convert the field data type to dtype parameter given.
- Parameters
dtype – The new datatype, given as a str object. The dtype must be a subtype of np.number, e.g. int, float, etc.
casting – Similar to the casting parameter in numpy ndarray.astype, can be ‘no’, ‘equiv’, ‘safe’, ‘same_kind’, or ‘unsafe’.
- Returns
The field with new datatype.
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
Numeric field
-
property
data¶ Get data.
-
get_spans()¶ Get spans of field.
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
-
logical_not()¶
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of a NumericField. Returns the sorted unique elements of a NumericField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
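return_index – boolean, if true returns the indices of the input array that give the unique values
return_inverse – boolean, if true returns the indices of the unique array that reconstruct the input array
return_counts – boolean, if true returns the number of times each unique value occurs in the input array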
- Returns
numpy array
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.NumericMemField(session, nformat)¶ Bases:
exetera.core.fields.MemoryField-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
    field = ... # field contains data [1, 22, 333, 444, 0, 5555, 666, 77, 8]
    filter_to_apply = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)
    field.apply_filter(filter_to_apply, in_place=True)
    field.data[:] # [22, 444, 5555, 77]
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
    field = ... # field contains data [1, 22, 333, 444, 0, 5555, 666, 77, 8]
    index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
    field.apply_index(index_to_apply, target_field)
    target_field.data[:] # [8, 1, 77, 22, 666, 333, 5555, 444, 0]
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the first element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the last element of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the maximum value of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans to this field, keeping the minimum value of each span. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the resulting data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
Numeric field
-
property
data¶ Returns a memory field array with the values from this field.
- Returns
MemoryFieldArray
-
get_spans()¶ Get the spans of this field.
- Returns
Spans of field
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
-
logical_not()¶
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of a NumericMemField. Returns the sorted unique elements of a NumericMemField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
return_index – boolean, if true returns index of unique elements
return_inverse – boolean, if true returns the indices of the unique array that reconstruct the input array
return_counts – boolean, if true returns counts of unique elements
- Returns
numpy array
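The three optional outputs read like those of numpy.unique. The following is a minimal sketch on a plain NumPy array, assuming the field's values behave like a 1-D ndarray; the data is hypothetical and this is not the ExeTera implementation.

```python
import numpy as np

# Hypothetical stand-in for the values held by a field.
values = np.array([30, 10, 20, 10, 30, 30])

uniq, index, inverse, counts = np.unique(
    values, return_index=True, return_inverse=True, return_counts=True
)

# uniq    : the sorted unique values
# index   : position of the first occurrence of each unique value in `values`
# inverse : for each element of `values`, its position in `uniq`
# counts  : number of occurrences of each unique value
reconstructed = uniq[inverse]  # rebuilds the original array
```

Indexing the unique values with the inverse array is what "reconstruct the input array" refers to.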
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.ReadOnlyFieldArray(field, dataset_name)¶ Bases:
object-
clear()¶ Clear Field Array.
-
complete()¶ Mark writing completed, usually used after calling write_part.
-
property
dtype¶ Return datatype of field.
-
write(part)¶ Write data to field.
-
write_part(part)¶ Write data part to field.
-
-
class
exetera.core.fields.ReadOnlyIndexedFieldArray(field, indices, values)¶ Bases:
object-
clear()¶ Clears field array.
-
complete()¶ Mark writing completed, usually used after calling write_part.
-
property
dtype¶ Get datatype of field. Please note constructing a numpy array from IndexedString data can be very memory expensive.
-
write(part)¶ Writes data to field.
-
write_part(part)¶ Writes data part to field.
-
-
class
exetera.core.fields.TimestampField(session, group, dataframe, write_enabled=False)¶ Bases:
exetera.core.fields.HDF5Field-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
field = ... # field contains data [1, 22, 333, 444, 0, 5555, 666, 77, 8]
filter_to_apply = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)
field.apply_filter(filter_to_apply, in_place=True)
field.data[:] # prints [22, 444, 5555, 77]
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
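The return rule above (new object, ‘target’, or this field) can be sketched with plain NumPy. Holder and this apply_filter function are hypothetical stand-ins written for illustration; they are not ExeTera classes.

```python
import numpy as np


class Holder:
    """Hypothetical stand-in for a writable field: it only wraps an array."""

    def __init__(self, values):
        self.values = np.asarray(values)


def apply_filter(src, keep, target=None, in_place=False):
    # 'target' and 'in_place' are mutually exclusive, as the parameter notes state.
    if in_place and target is not None:
        raise ValueError("if 'in_place' is True, 'target' must be None")
    result = src.values[np.asarray(keep, dtype=bool)]
    if in_place:
        src.values = result      # destructive: the source is overwritten
        return src
    if target is not None:
        target.values = result   # written to the caller-supplied destination
        return target
    return Holder(result)        # default: a new object, source untouched


src = Holder([1, 22, 333, 444, 0, 5555, 666, 77, 8])
keep = [0, 1, 0, 1, 0, 1, 0, 1, 0]

fresh = apply_filter(src, keep)
dest = Holder([])
same_dest = apply_filter(src, keep, target=dest)
same_src = apply_filter(src, keep, in_place=True)
```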
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
field = ... # field contains data [1, 22, 333, 444, 0, 5555, 666, 77, 8]
index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
field.apply_index(index_to_apply, target_field)
target_field.data[:] # [8, 1, 77, 22, 666, 333, 5555, 444, 0]
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
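Applying an index corresponds to integer-array indexing in NumPy. A minimal sketch on a plain array, reusing the numbers from the example above (the field itself is replaced by an ndarray here, which is an assumption made for illustration):

```python
import numpy as np

data = np.array([1, 22, 333, 444, 0, 5555, 666, 77, 8])
index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)

# Element k of the result is data[index_to_apply[k]].
reindexed = data[index_to_apply]

# A common source of such an index is a sort order.
order = np.argsort(data, kind="stable")
sorted_data = data[order]
```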
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans (first). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans (last). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans (max). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans (min). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
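The first/last/min/max variants can be pictured as per-group reductions. The sketch below assumes, as an illustration only, that spans are stored as an array of boundary offsets in which span k covers [spans[k], spans[k+1]); it uses plain NumPy and hypothetical data rather than ExeTera fields.

```python
import numpy as np

keys = np.array([1, 1, 2, 2, 2, 3])     # sorted key column (hypothetical)
values = np.array([5, 9, 4, 7, 6, 8])   # column to reduce per span

# Boundaries where the key changes, plus the start and end of the array.
change = np.flatnonzero(keys[1:] != keys[:-1]) + 1
spans = np.concatenate(([0], change, [len(keys)]))

starts, ends = spans[:-1], spans[1:]
first = values[starts]                           # first value in each span
last = values[ends - 1]                          # last value in each span
smallest = np.minimum.reduceat(values, starts)   # minimum per span
largest = np.maximum.reduceat(values, starts)    # maximum per span
```

Each result has one entry per span, so it is shorter than the source whenever a key repeats.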
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
The newly created field, of the same type as this field
-
property
data¶ Get data.
-
get_spans()¶ Get spans of field.
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of a TimestampField. Returns the sorted unique elements of a TimestampField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
return_index – boolean, if true returns index of unique elements
return_inverse – boolean, if true returns the indices of the unique array that reconstruct the input array
return_counts – boolean, if true returns counts of unique elements
- Returns
numpy array
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.TimestampMemField(session)¶ Bases:
exetera.core.fields.MemoryField-
apply_filter(filter_to_apply, target=None, in_place=False)¶ Apply a boolean filter to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the filtered data is written to.
Example:
field = ... # field contains data [1, 22, 333, 444, 0, 5555, 666, 77, 8]
filter_to_apply = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)
field.apply_filter(filter_to_apply, in_place=True)
field.data[:] # prints [22, 444, 5555, 77]
- Parameters
filter_to_apply – a Field or numpy array that contains the boolean filter data
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The filtered field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_index(index_to_apply, target=None, in_place=False)¶ Apply an index to this field. This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the reindexed data is written to.
Example:
field = ... # field contains data [1, 22, 333, 444, 0, 5555, 666, 77, 8]
index_to_apply = np.array([8, 0, 7, 1, 6, 2, 5, 3, 4], dtype=np.int32)
field.apply_index(index_to_apply, target_field)
target_field.data[:] # [8, 1, 77, 22, 666, 333, 5555, 444, 0]
- Parameters
index_to_apply – a Field or numpy array that contains the indices
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The reindexed field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_first(spans_to_apply, target=None, in_place=False)¶ Apply spans (first). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_last(spans_to_apply, target=None, in_place=False)¶ Apply spans (last). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_max(spans_to_apply, target=None, in_place=False)¶ Apply spans (max). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
apply_spans_min(spans_to_apply, target=None, in_place=False)¶ Apply spans (min). This operation doesn’t modify the field on which it is called unless ‘in_place’ is set to True. The user can specify a ‘target’ field that the respanned data is written to.
- Parameters
spans_to_apply – a Field or numpy array that contains the spans
target – if set, this is the field that is written to. This field must be writable. If ‘target’ is set, ‘in_place’ must be False.
in_place – if True, perform the operation destructively on this field. This field must be writable. If ‘in_place’ is True, ‘target’ must be None
- Returns
The respanned field. This is a new field instance unless ‘target’ is set, in which case it is the target field, or unless ‘in_place’ is True, in which case it is this field.
-
create_like(group=None, name=None, timestamp=None)¶ Creates an empty field of the same type as this field.
- Parameters
group – h5group
name – Name of the new field
timestamp – optional - If set, the timestamp that should be given to the new field.
- Returns
The newly created field, of the same type as this field
-
property
data¶ Returns a memory field array with the values from this field.
- Returns
MemoryFieldArray
-
get_spans()¶ Get spans of field.
-
is_sorted()¶ Returns whether the data in this field is sorted.
- Returns
bool
-
isin(test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
-
unique(return_index=False, return_inverse=False, return_counts=False)¶ Find the unique elements of a TimestampMemField. Returns the sorted unique elements of a TimestampMemField. There are three optional outputs in addition to the unique elements: (1) the indices of the input array that give the unique values (2) the indices of the unique array that reconstruct the input array (3) the number of times each unique value comes up in the input array
- Parameters
return_index – boolean, if true returns index of unique elements
return_inverse – boolean, if true returns the indices of the unique array that reconstruct the input array
return_counts – boolean, if true returns counts of unique elements
- Returns
numpy array
-
writeable()¶ Indicates whether this field permits write operations. By default, dataframe fields are read-only in order to protect against accidental writes to datasets.
-
-
class
exetera.core.fields.WriteableFieldArray(field, dataset_name)¶ Bases:
object-
clear()¶ Replaces the current dataset with an empty dataset.
- Returns
None
-
complete()¶ Mark writing completed, usually used after calling write_part.
- Example
field.write_part(part)
field.complete()
- Returns
None
-
property
dtype¶ Returns the datatype for the dataset.
- Returns
dtype
-
write(part)¶ Writes data to field and marks it as complete.
- Example
part = np.array([97, 97, 100])
field.write(part)
- Parameters
part – numpy array to write to field
- Returns
None
-
write_part(part)¶ Writes a part of the data to the field. Call complete() after the final part has been written.
- Example
part = np.array([97, 97, 100])
field.write_part(part)
field.complete()
- Parameters
part – numpy array to write to field
- Returns
None
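The write_part/complete protocol lets data be streamed in pieces. The class below is a hypothetical in-memory stand-in that mimics the call sequence for illustration; it is not how WriteableFieldArray is implemented.

```python
import numpy as np


class BufferedWriter:
    """Hypothetical writer that mimics the write_part/complete call sequence."""

    def __init__(self):
        self._parts = []
        self.data = None  # populated by complete()

    def write_part(self, part):
        # Accumulate one piece; nothing is finalised yet.
        self._parts.append(np.asarray(part))

    def complete(self):
        # Finalise: join everything written so far.
        self.data = np.concatenate(self._parts) if self._parts else np.array([])
        self._parts = []

    def write(self, part):
        # One-shot convenience: a single part followed by completion.
        self.write_part(part)
        self.complete()


writer = BufferedWriter()
for start in range(0, 10, 4):  # stream 10 values in chunks of at most 4
    writer.write_part(np.arange(start, min(start + 4, 10)))
writer.complete()

one_shot = BufferedWriter()
one_shot.write(np.array([97, 97, 100]))
```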
-
-
class
exetera.core.fields.WriteableIndexedFieldArray(chunksize, indices, values)¶ Bases:
object-
clear()¶ Resets field, clears all indices and values.
- Returns
None
-
complete()¶ Mark writing completed, usually used after calling write_part.
- Example
field.write_part(part)
field.complete()
- Returns
None
-
property
dtype¶ Returns the datatype of the field. Please note that constructing a numpy array from IndexedString data can be very memory expensive.
- Returns
dtype
-
write(part)¶ Writes data to field and marks it as complete.
- Example
part = np.array([97, 97, 100])
field.write(part)
- Parameters
part – List of strings to write to field
- Returns
None
-
write_part(part)¶ Writes a part of the data to the field. Call complete() after the final part has been written.
- Example
part = np.array([97, 97, 100])
field.write_part(part)
field.complete()
- Parameters
part – List of strings to be written
- Returns
None
-
-
exetera.core.fields.argsort(field: exetera.core.abstract_types.Field, dtype: str = None)¶
-
exetera.core.fields.as_field(data, key=None)¶
-
exetera.core.fields.base_field_contructor(session, group, name, timestamp=None, chunksize=None)¶ Constructors are responsible for (1) creating the field (HDF5 group), (2) adding basic attributes such as chunksize, timestamp and field type, and (3) adding the dataset to the field (HDF5 group) under the name ‘values’.
-
exetera.core.fields.categorical_field_constructor(session, group, name, nformat, key, timestamp=None, chunksize=None)¶
-
exetera.core.fields.dtype_to_str(dtype)¶ Returns the string name for the given data type.
- Parameters
dtype – the given data type
- Returns
str
-
exetera.core.fields.fixed_string_field_constructor(session, group, name, length, timestamp=None, chunksize=None)¶
-
exetera.core.fields.indexed_string_field_constructor(session, group, name, timestamp=None, chunksize=None)¶
-
exetera.core.fields.isin(field: exetera.core.abstract_types.Field, test_elements: Union[list, set, numpy.ndarray])¶ Returns a boolean array of the same length as field that is True where an element of field is in test_elements and False otherwise.
- Parameters
field – The field to check.
test_elements – The values against which to test each value of field.
- Returns
a boolean array of the same length as field
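The membership test can be sketched with numpy.isin on a plain array. One detail worth noting: numpy.isin does not treat a Python set as a collection of values, so the sketch converts sets to lists first. The function name isin_sketch and the data are hypothetical; whether ExeTera performs the same conversion is not stated here.

```python
import numpy as np


def isin_sketch(values, test_elements):
    # np.isin would wrap a set in a one-element object array, so convert it.
    if isinstance(test_elements, set):
        test_elements = list(test_elements)
    return np.isin(values, test_elements)


values = np.array([3, 7, 7, 1, 9])
mask = isin_sketch(values, {7, 9})   # boolean array, same length as values
selected = values[mask]              # the mask can be used directly as a filter
```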
-
exetera.core.fields.numeric_field_constructor(session, group, name, nformat, timestamp=None, chunksize=None)¶
-
exetera.core.fields.timestamp_field_constructor(session, group, name, timestamp=None, chunksize=None)¶
exetera.core.journal module¶
-
exetera.core.journal.journal_table(session, schema, old_src, new_src, src_pk, result)¶
-
exetera.core.journal.journal_test_harness(session, schema, old_file, new_file, dest_file)¶
exetera.core.operations module¶
-
exetera.core.operations.apply_filter_to_index_values(index_filter, indices, values)¶
-
exetera.core.operations.apply_indices_to_index_values(indices_to_apply, indices, values)¶
-
exetera.core.operations.apply_spans_count(spans, dest_array=None)¶
-
exetera.core.operations.apply_spans_first(spans, src_array, dest_array=None)¶
-
exetera.core.operations.apply_spans_index_of_first(spans, dest_array=None)¶
-
exetera.core.operations.apply_spans_index_of_first_filter(spans, dest_array, filter_array)¶
-
exetera.core.operations.apply_spans_index_of_last(spans, dest_array=None)¶
-
exetera.core.operations.apply_spans_index_of_last_filter(spans, dest_array, filter_array)¶
-
exetera.core.operations.apply_spans_index_of_max(spans, src_array, dest_array=None)¶
-
exetera.core.operations.apply_spans_index_of_max_filter(spans, src_array, dest_array, filter_array)¶
-
exetera.core.operations.apply_spans_index_of_max_indexed(spans, src_indices, src_values, dest_array=None)¶
-
exetera.core.operations.apply_spans_index_of_min(spans, src_array, dest_array=None)¶
-
exetera.core.operations.apply_spans_index_of_min_filter(spans, src_array, dest_array, filter_array)¶
-
exetera.core.operations.apply_spans_index_of_min_indexed(spans, src_indices, src_values, dest_array=None)¶
-
exetera.core.operations.apply_spans_last(spans, src_array, dest_array=None)¶
-
exetera.core.operations.apply_spans_max(spans, src_array, dest_array=None)¶
-
exetera.core.operations.apply_spans_min(spans, src_array, dest_array=None)¶
-
exetera.core.operations.calculate_chunk_decomposition(s_start, s_end, indices, value_chunk_size, sub_chunks)¶
-
exetera.core.operations.categorical_transform(chunk, i_c, column_inds, column_vals, column_offsets, cat_keys, cat_index, cat_values)¶ Transform method for categorical importer in readerwriter.py
-
exetera.core.operations.check_if_sorted_for_multi_fields(fields_data)¶ Check if input fields data is sorted. Note that fields_data should be treated as a group key.
pre_row[j] < cur_row[j] means these two rows are sorted; move to the next row => i + 1
pre_row[j] = cur_row[j] means we need to check whether the next element is sorted => j + 1
pre_row[j] > cur_row[j] means the input data is not sorted
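The three cases above amount to a lexicographic comparison of consecutive rows. A minimal pure-Python sketch, assuming fields_data is a non-empty sequence of equal-length columns (the name is_sorted_multi is hypothetical and this is not the library's implementation):

```python
def is_sorted_multi(columns):
    """Return True if the rows formed across `columns` are in non-decreasing order."""
    n_rows = len(columns[0])
    for i in range(1, n_rows):
        for col in columns:
            if col[i - 1] < col[i]:
                break          # rows i-1 and i are ordered; move to the next row
            if col[i - 1] > col[i]:
                return False   # out of order
            # equal in this column: compare the next column
    return True


sorted_ok = is_sorted_multi([[1, 1, 2], [5, 6, 0]])
not_sorted = is_sorted_multi([[1, 1, 2], [6, 5, 0]])
all_equal = is_sorted_multi([[1, 1], [2, 2]])
```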
-
exetera.core.operations.chunked_copy(src_field, dest_field, chunksize=1048576)¶
-
exetera.core.operations.chunks(length, chunksize=1048576)¶
-
exetera.core.operations.compare_arrays(source[s1: s2], target[t1: t2])¶
-
exetera.core.operations.compare_indexed_rows_for_journalling(old_map, new_map, old_indices, old_values, new_indices, new_values, to_keep)¶
-
exetera.core.operations.compare_rows_for_journalling(old_map, new_map, old_field, new_field, to_keep)¶
-
exetera.core.operations.count_back(array)¶ This is a helper function that provides functionality specific to streaming ordered merges. It takes an array in sorted order and calculates a trimmed length that excludes the final sequence of equal values.
Example:
[10, 20, 30, 40, 50] -> 4 ([10, 20, 30, 40])
[10, 20, 30, 40, 40] -> 3 ([10, 20, 30])
[10, 20, 30, 30, 30] -> 2 ([10, 20])
[10, 20, 20, 20, 20] -> 1 ([10])
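The trimming rule in the examples above can be written in a few lines. This is a sketch of the described behaviour under the hypothetical name count_back_sketch, not the library's own code:

```python
def count_back_sketch(array):
    """Length of `array` after dropping its final run of equal values."""
    if len(array) == 0:
        return 0
    i = len(array) - 1
    # Walk left while the preceding element equals the last value.
    while i > 0 and array[i - 1] == array[-1]:
        i -= 1
    return i


results = [
    count_back_sketch([10, 20, 30, 40, 50]),
    count_back_sketch([10, 20, 30, 40, 40]),
    count_back_sketch([10, 20, 30, 30, 30]),
    count_back_sketch([10, 20, 20, 20, 20]),
]
```

Note that an array consisting of a single repeated value trims to length 0 under this rule.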
-
exetera.core.operations.data_iterator(data_field, chunksize=1048576)¶
-
exetera.core.operations.element_chunked_copy(src_elem, dest_elem, chunksize)¶
-
exetera.core.operations.filter_duplicate_fields(field)¶ DEPRECATED
-
exetera.core.operations.first_trimmed_chunk(field, chunk_size)¶
-
exetera.core.operations.first_untrimmed_chunk(field, chunk_size)¶
-
exetera.core.operations.fixed_string_transform(column_inds, column_vals, column_offsets, col_idx, written_row_count, strlen, memory)¶ Transform method for fixed string importer in field_importer.py
-
exetera.core.operations.foreign_key_is_in_primary_key(primary_key, foreign_key)¶ DEPRECATED
-
exetera.core.operations.generate_ordered_map_to_inner_both_unique_partial(left, i_max, right, j_max, l_result, r_result, i_off, j_off, i, j, r)¶
-
exetera.core.operations.generate_ordered_map_to_inner_both_unique_streamed(left: exetera.core.abstract_types.Field, right: exetera.core.abstract_types.Field, l_result: exetera.core.abstract_types.Field, r_result: exetera.core.abstract_types.Field, invalid: Union[numpy.int32, numpy.int64], chunksize: Optional[int] = 1048576, rdtype=<class 'numpy.int32'>)¶
-
exetera.core.operations.generate_ordered_map_to_inner_left_unique_partial(left, i_max, right, j_max, l_result, r_result, i_off, j_off, i, j, r)¶
-
exetera.core.operations.generate_ordered_map_to_inner_left_unique_streamed(left: exetera.core.abstract_types.Field, right: exetera.core.abstract_types.Field, l_result: exetera.core.abstract_types.Field, r_result: exetera.core.abstract_types.Field, invalid: Union[numpy.int32, numpy.int64], chunksize: Optional[int] = 1048576, rdtype=<class 'numpy.int32'>)¶
-
exetera.core.operations.generate_ordered_map_to_inner_partial(left, i_max, right, j_max, l_result, r_result, i_off, j_off, i, j, r, ii, jj, ii_max, jj_max, inner)¶ This function generates a mapping from a subset of a left key to a subset of a right key, writing the resulting mapping to a buffer, where both keys can contain repeated entries.
Example:
left = [10, 20, 30, 40, 40, 50, 50]
right = [20, 30, 30, 40, 40, 40, 60, 70]
i  j  op  r   lres  rres
0  0  <   0   0     INV
1  0  =   1   1     0
2  1  =   2   2     1
2  2      3   2     2
3  3      4   3     3
3  4      5   3     4
3  5      6   3     5
4  3      7   4     3
4  4      8   4     4
4  5      9   4     5
5  6      10  5     INV
6  6      11  6     INV
left_map = [0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6]
right_map = [INV, 0, 1, 2, 3, 4, 5, 3, 4, 5, INV, INV]
Everything about this function is optimised for performance under njit. It is effectively a finite state machine that iterates through left, right, and result arrays. The various…
i and i_max are used to track the index of the left source
j and j_max are used to track the index of the right source
-
exetera.core.operations.generate_ordered_map_to_inner_right_unique_partial(left, i_max, right, j_max, l_result, r_result, i_off, j_off, i, j, r)¶
-
exetera.core.operations.generate_ordered_map_to_inner_right_unique_streamed(left: exetera.core.abstract_types.Field, right: exetera.core.abstract_types.Field, l_result: exetera.core.abstract_types.Field, r_result: exetera.core.abstract_types.Field, invalid: Union[numpy.int32, numpy.int64], chunksize: Optional[int] = 1048576, rdtype=<class 'numpy.int32'>)¶
-
exetera.core.operations.generate_ordered_map_to_inner_streamed(left: exetera.core.abstract_types.Field, right: exetera.core.abstract_types.Field, l_result: exetera.core.abstract_types.Field, r_result: exetera.core.abstract_types.Field, chunksize: Optional[int] = 1048576, rdtype=<class 'numpy.int32'>)¶ This function performs the most generic type of left to right mapping calculation in which both key fields can have repeated key values. At its heart, the function generates a mapping from left to right that can then be used to map data in the right space to data in the left space. Note that this can also be used to generate the inverse mapping by simply flipping the left and right collections.
As the Fields left and right can contain arbitrarily long sequences of data, the data is streamed through the algorithm in a series of chunks. Similarly, the resulting map is written to a buffer that is written to the result field in chunks.
This streamed function makes a sequence of calls to a corresponding _partial function that does the heavy lifting. Inside the _partial function, a finite state machine (FSM) iterates over the data, performing the mapping. The _partial function call exits whenever any of the chunks (left_, right_ or result_) that it is passed becomes exhausted.
Please take a look at the documentation for the _partial function to understand the role that the various finite state machine parameters play.
- We have to make some adjustments to the finite state machine between calls to _partial:
if the call used all of the left_ data, add the size of that data chunk to i_off
if the call used all of the right_ data, add the size of that data chunk to j_off
write the accumulated result_ data to the result field, and reset r to 0
-
exetera.core.operations.generate_ordered_map_to_left_both_unique(first, second, result, invalid)¶
-
exetera.core.operations.generate_ordered_map_to_left_both_unique_partial(left, right, r_result, invalid, j_off, i, j, r)¶
-
exetera.core.operations.generate_ordered_map_to_left_both_unique_streamed(left: exetera.core.abstract_types.Field, right: exetera.core.abstract_types.Field, r_result: exetera.core.abstract_types.Field, invalid: Union[numpy.int32, numpy.int64], chunksize: Optional[int] = 1048576, rdtype=<class 'numpy.int32'>)¶
-
exetera.core.operations.generate_ordered_map_to_left_left_unique_partial(left, right, j_max, l_result, r_result, invalid, i_off, j_off, i, j, r)¶
-
exetera.core.operations.generate_ordered_map_to_left_left_unique_streamed(left: exetera.core.abstract_types.Field, right: exetera.core.abstract_types.Field, l_result: exetera.core.abstract_types.Field, r_result: exetera.core.abstract_types.Field, invalid: Union[numpy.int32, numpy.int64], chunksize: Optional[int] = 1048576, rdtype=<class 'numpy.int32'>)¶
-
exetera.core.operations.generate_ordered_map_to_left_partial(left, i_max, right, j_max, l_result, r_result, invalid, i_off, j_off, i, j, r, ii, jj, ii_max, jj_max, inner)¶ This function generates a mapping from a subset of a left key to a subset of a right key, writing the resulting mapping to a buffer, where both keys can contain repeated entries.
Example:
left = [10, 20, 30, 40, 40, 50, 50]
right = [20, 30, 30, 40, 40, 40, 60, 70]
i  j  op  r   lres  rres
0  0  <   0   0     INV
1  0  =   1   1     0
2  1  =   2   2     1
2  2      3   2     2
3  3      4   3     3
3  4      5   3     4
3  5      6   3     5
4  3      7   4     3
4  4      8   4     4
4  5      9   4     5
5  6      10  5     INV
6  6      11  6     INV
left_map = [0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6]
right_map = [INV, 0, 1, 2, 3, 4, 5, 3, 4, 5, INV, INV]
Everything about this function is optimised for performance under njit. It is effectively a finite state machine that iterates through left, right, and result arrays. The various…
i and i_max are used to track the index of the left source
j and j_max are used to track the index of the right source
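For intuition, the mapping in the example can be reproduced by a simple non-streamed, pure-Python routine. This is an illustrative sketch only: ordered_left_map and the INV sentinel value of -1 are hypothetical, and the real function is a resumable state machine that works on chunks.

```python
INV = -1  # hypothetical sentinel standing in for "no match"


def ordered_left_map(left, right, invalid=INV):
    """Map each left row to every matching right row; both inputs are sorted."""
    left_map, right_map = [], []
    j = 0
    for i, key in enumerate(left):
        # Skip right entries that are smaller than the current left key.
        while j < len(right) and right[j] < key:
            j += 1
        # Emit one output row per matching right entry. j itself is not
        # advanced, so a repeated left key revisits the same right run.
        jj = j
        while jj < len(right) and right[jj] == key:
            left_map.append(i)
            right_map.append(jj)
            jj += 1
        if jj == j:
            # No match: keep the left row and mark the right side invalid.
            left_map.append(i)
            right_map.append(invalid)
    return left_map, right_map


left = [10, 20, 30, 40, 40, 50, 50]
right = [20, 30, 30, 40, 40, 40, 60, 70]
left_map, right_map = ordered_left_map(left, right)
```

With the inputs from the example, left_map follows the lres column and right_map follows the rres column, with INV shown as -1.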
-
exetera.core.operations.generate_ordered_map_to_left_remaining(i_max, l_result, r_result, i_off, i, r, invalid)¶
-
exetera.core.operations.generate_ordered_map_to_left_right_unique(first, second, result, invalid)¶
-
exetera.core.operations.generate_ordered_map_to_left_right_unique_partial(left, i_max, right, r_result, invalid, j_off, i, j, r)¶
-
exetera.core.operations.generate_ordered_map_to_left_right_unique_partial_old(d_j, left, right, left_to_right, invalid)¶
- Returns
[0]: how many positions forward i moved
[1]: how many positions forward j moved
[2]: how many elements were written
-
exetera.core.operations.generate_ordered_map_to_left_right_unique_remaining(i_max, r_result, i, r, invalid)¶
-
exetera.core.operations.generate_ordered_map_to_left_right_unique_streamed(left: exetera.core.abstract_types.Field, right: exetera.core.abstract_types.Field, r_result: exetera.core.abstract_types.Field, invalid: Union[numpy.int32, numpy.int64], chunksize: Optional[int] = 1048576, rdtype=<class 'numpy.int32'>)¶
-
exetera.core.operations.generate_ordered_map_to_left_right_unique_streamed_old(left, right, left_to_right, invalid=- 1, chunksize=1048576)¶
-
exetera.core.operations.generate_ordered_map_to_left_streamed(left: exetera.core.abstract_types.Field, right: exetera.core.abstract_types.Field, l_result: exetera.core.abstract_types.Field, r_result: exetera.core.abstract_types.Field, invalid: Union[numpy.int32, numpy.int64], chunksize: Optional[int] = 1048576, rdtype=<class 'numpy.int32'>)¶ This function performs the most generic type of left to right mapping calculation, in which both key fields can have repeated key values. At its heart, the function generates a mapping from left to right that can then be used to map data in the right space to data in the left space. Note that this can also be used to generate the inverse mapping by simply flipping the left and right collections.
As the Fields left and right can contain arbitrarily long sequences of data, the data is streamed through the algorithm in a series of chunks. Similarly, the resulting map is written to a buffer that is written to the result field in chunks.
This streamed function makes a sequence of calls to a corresponding _partial function that does the heavy lifting. Inside the _partial function, a finite state machine (FSM) iterates over the data, performing the mapping. The _partial function call exits whenever any of the chunks that it is passed (left_, right_ or result_) becomes exhausted.
Please take a look at the documentation for the _partial function to understand the role that the various finite state machine parameters play.
- We have to make some adjustments to the finite state machine between calls to _partial:
if the call used all of the left_ data, add the size of that data chunk to i_off
if the call used all of the right_ data, add the size of that data chunk to j_off
write the accumulated result_ data to the result field, and reset r to 0
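The mapping described above can be illustrated with a small in-memory sketch. It is not the streamed implementation: there is no chunking and no finite state machine, and the function name `ordered_left_map_sketch` and its output convention (one entry per left/right pairing, `invalid` where a left key has no match) are assumptions made for illustration only.

```python
def ordered_left_map_sketch(left, right, invalid=-1):
    """Pair every left position with each matching right position.

    Assumes both key sequences are sorted and may contain repeated keys.
    Left keys with no match are paired with `invalid`.
    """
    l_result, r_result = [], []
    j = 0
    for i, key in enumerate(left):
        # Skip right keys that are smaller than the current left key.
        while j < len(right) and right[j] < key:
            j += 1
        # Find the end of the run of right keys equal to the left key.
        k = j
        while k < len(right) and right[k] == key:
            k += 1
        if k == j:
            l_result.append(i)
            r_result.append(invalid)
        else:
            for jj in range(j, k):
                l_result.append(i)
                r_result.append(jj)
        # j is left at the start of the run so that a repeated left key
        # is paired with the same right entries.
    return l_result, r_result


l_map, r_map = ordered_left_map_sketch([1, 2, 2, 4], [2, 2, 3, 4])
print(l_map)
print(r_map)
```

Swapping the two arguments gives the inverse mapping mentioned in the description.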
-
exetera.core.operations.get_byte_map(string_map)¶ Getting byte indices and byte values from categorical key-value pair
-
exetera.core.operations.get_indexed_string_unique(indices, values, unique_result, unique_index, unique_inverse, unique_counts)¶ Find the unique elements for indexed string field using njit function.
-
exetera.core.operations.get_map_datatype_based_on_lengths(left_len, right_len)¶
-
exetera.core.operations.get_map_subchunks_based_on_index_lengths(map_, invalid, chunksize)¶
-
exetera.core.operations.get_next_chunk(start: int, chunk_size: int, field: exetera.core.abstract_types.Field)¶ This is a helper function that provides functionality specific to streaming ordered merges. It assumes that field is in sorted order.
This function is used to fetch chunks of memory from a field to be consumed by streaming merges. It first fetches the chunk of a given chunk size, or the size of the remaining memory, whichever is smaller. It then ‘trims’ that memory by removing the last sequence of equal values from the valid range.
- Parameters
start – The start of the chunk to be returned
chunk_size – The size of the chunk to be considered. The returned chunk will always be shorter than this unless it is the final chunk of the field data
field – The field from which data should be fetched. This field must be in sorted order
- Returns
A tuple representing the range (inclusive, exclusive) and a numpy ndarray containing the data. Note, this is typically longer than the range returned, as we do not trim the data for performance reasons.
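The trimming step can be sketched on a plain Python list. This is only an illustration under stated assumptions: `trimmed_chunk_sketch` is a hypothetical name, a list stands in for the Field, and the case where an entire chunk holds a single repeated value is not handled.

```python
def trimmed_chunk_sketch(data, start, chunk_size):
    """Return ((start, trimmed_end), untrimmed_chunk) for sorted `data`."""
    end = min(start + chunk_size, len(data))
    chunk = data[start:end]
    # The final chunk is returned whole: no later chunk can continue its last run.
    if end == len(data) or not chunk:
        return (start, end), chunk
    # Otherwise drop the trailing run of equal values from the valid range,
    # because that run might continue into the next chunk.
    trimmed_end = end
    last_value = chunk[-1]
    while trimmed_end > start and data[trimmed_end - 1] == last_value:
        trimmed_end -= 1
    return (start, trimmed_end), chunk


data = [1, 1, 2, 2, 3, 3, 3, 4]
first_range, first_chunk = trimmed_chunk_sketch(data, 0, 5)
second_range, second_chunk = trimmed_chunk_sketch(data, first_range[1], 5)
print(first_range, first_chunk)
print(second_range, second_chunk)
```

The second call starts at the trimmed end of the first, so the run of 3s is processed in one piece.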
-
exetera.core.operations.get_spans_for_field(ndarray)¶
-
exetera.core.operations.get_valid_value_extents(chunk, start, end, invalid=- 1)¶
-
exetera.core.operations.is_ordered(field)¶
-
exetera.core.operations.isin_for_indexed_string_field(test_elements, indices, values)¶
-
exetera.core.operations.isin_indexed_string_speedup(test_elements, indices, values)¶
-
exetera.core.operations.leaky_categorical_transform(chunk, freetext_indices, freetext_values, i_c, column_inds, column_vals, column_offsets, cat_keys, cat_index, cat_values)¶ Transform method for categorical importer in readerwriter.py
-
exetera.core.operations.map_valid(data_field, map_field, result=None, invalid=- 1)¶
-
exetera.core.operations.merge_entries_segment(i_start, cur_old_start, old_map, new_map, to_keep, old_src, new_src, dest)¶ - Parameters
i_start – the initial value to apply to ‘i’
cur_old_start – the initial value to apply to ‘cur_old’
old_map – the map (in i-space) for the existing records
new_map – the map (in i-space) for the new records
to_keep – the flags (in i-space) indicating whether the new record should be kept
old_src – the source for the existing records
new_src – the source for the new records
dest – the sink for the merged sources
- Returns
-
exetera.core.operations.merge_indexed_journalled_entries(old_map, new_map, to_keep, old_src_inds, old_src_vals, new_src_inds, new_src_vals, dest_inds, dest_vals)¶
-
exetera.core.operations.merge_indexed_journalled_entries_count(old_map, new_map, to_keep, old_src_inds, new_src_inds)¶
-
exetera.core.operations.merge_journalled_entries(old_map, new_map, to_keep, old_src, new_src, dest)¶
-
exetera.core.operations.next_chunk(current: int, length: int, desired: int)¶ This is a helper function that can be used whenever you want to access a large sequence of data in chunks. It simply carries out the calculation that returns the extents of the next chunk, taking into account the length of the sequence. The sequence itself is not required here, only the length.
- Parameters
current – the starting point of the chunk
length – the length of the sequence being chunked
desired – the requested length of the chunk
- Returns
A tuple of the chunk extents. The first value is inclusive; the second is exclusive
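A minimal sketch of the calculation described above, assuming the extents are simply clamped to the sequence length. The name `next_chunk_sketch` is hypothetical and the real function may differ in detail.

```python
def next_chunk_sketch(current, length, desired):
    """Return (inclusive start, exclusive end) of the next chunk."""
    return current, min(current + desired, length)


# Walk a sequence of length 10 in chunks of up to 4 elements.
extents = []
position = 0
while position < 10:
    start, end = next_chunk_sketch(position, 10, 4)
    extents.append((start, end))
    position = end
print(extents)
```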
-
exetera.core.operations.next_map_subchunk(map_, sm, invalid, chunksize)¶
-
exetera.core.operations.next_trimmed_chunk(field, chunk, chunk_size)¶
-
exetera.core.operations.next_untrimmed_chunk(field, chunk, chunk_size)¶
-
exetera.core.operations.numeric_bool_transform(elements, validity, column_inds, column_vals, column_offsets, col_idx, written_row_count, invalid_value, validation_mode, field_name)¶ Transform method for numeric importer (bool) in readerwriter.py
-
exetera.core.operations.ordered_generate_journalling_indices(old, new)¶
-
exetera.core.operations.ordered_get_last_as_filter(field)¶
-
exetera.core.operations.ordered_inner_map(left, right, left_to_inner, right_to_inner)¶
-
exetera.core.operations.ordered_inner_map_both_unique(left, right, left_to_inner, right_to_inner)¶
-
exetera.core.operations.ordered_inner_map_left_unique(left, right, left_to_inner, right_to_inner)¶
-
exetera.core.operations.ordered_inner_map_left_unique_partial(d_i, d_j, left, right, left_to_inner, right_to_inner)¶ Returns: [0]: how many positions forward i moved [1]: how many positions forward j moved [2]: how many elements were written
-
exetera.core.operations.ordered_inner_map_left_unique_streamed(left, right, left_to_inner, right_to_inner, chunksize=1048576)¶
-
exetera.core.operations.ordered_inner_map_result_size(left, right)¶
-
exetera.core.operations.ordered_left_map_result_size(left, right)¶
-
exetera.core.operations.ordered_map_valid_indexed_partial(sm_values, sm_start, sm_end, indices, i_start, i_max, values, mv_start, result_indices, result_values, invalid, sm, ri, rv, ri_accum)¶
-
exetera.core.operations.ordered_map_valid_indexed_stream(data_field, map_field, result_field, invalid=- 1, chunksize=1048576, value_factor=8)¶
-
exetera.core.operations.ordered_map_valid_partial(values, map_values, sm_start, sm_end, d_start, result_data, invalid, invalid_value)¶
-
exetera.core.operations.ordered_map_valid_partial_old(d, data_field, map_field, result, invalid)¶
-
exetera.core.operations.ordered_map_valid_stream(data_field, map_field, result_field, invalid=- 1, chunksize=1048576)¶
- for each map chunk
  - calculate sub chunks based on indices
  - for each sub chunk
    - map indices for sub chunk
-
exetera.core.operations.ordered_map_valid_stream_old(data_field, map_field, result_field, invalid=- 1, chunksize=1048576)¶
-
exetera.core.operations.ordered_outer_map_result_size_both_unique(left, right)¶
-
exetera.core.operations.raiseNumericException(exception_message, exception_args)¶
-
exetera.core.operations.safe_map_indexed_values(data_indices, data_values, map_field, map_filter, empty_value=None)¶
-
exetera.core.operations.safe_map_values(data_field, map_field, map_filter, empty_value=None)¶
-
exetera.core.operations.str_to_dtype(str_dtype)¶
-
exetera.core.operations.streaming_sort_partial(in_chunk_indices, in_chunk_lengths, src_value_chunks, src_index_chunks, dest_value_chunk, dest_index_chunk)¶
-
exetera.core.operations.transform_float(column_inds, column_vals, column_offsets, col_idx, written_row_count, invalid_value, validation_mode, data_type, field_name)¶ Transform float method for numeric importer in field_importer.py
-
exetera.core.operations.transform_int(column_inds, column_vals, column_offsets, col_idx, written_row_count, invalid_value, validation_mode, data_type, field_name)¶ Transform int method for numeric importer in field_importer.py
-
exetera.core.operations.transform_to_values(column_inds, column_vals, column_offsets, col_idx, written_row_count)¶ Transform method for byte data from np.int to np.bytes_
-
exetera.core.operations.unique_for_indexed_string(indices, values, return_index, return_inverse, return_counts)¶ Find the unique elements for indexed string field.
exetera.core.regression module¶
exetera.core.session module¶
-
class
exetera.core.session.Session(chunksize: int = 1048576, timestamp: str = '2023-01-18 11:14:27.097526+00:00')¶ Bases:
exetera.core.abstract_types.AbstractSession
Session is the top-level object that is used to create and open ExeTera Datasets. It also provides operations that can be performed on Fields. For a more detailed explanation of Session and examples of its usage, please refer to https://github.com/KCL-BMEIS/ExeTera/wiki/Session-API
- Parameters
chunksize – Change the default chunksize that fields created with this dataset use. Note this is a hint parameter and future versions of Session may choose to ignore it if it is no longer required. In general, it should only be changed for testing.
timestamp – Set the official timestamp for the Session’s creation rather than taking the current date/time.
-
aggregate_count(index, dest=None)¶ Finds the number of entries within each sub-group of index.
Example:
Index:  a a a b b x a c c d d d
Result: 3 2 1 1 2 3
- Parameters
index – A numpy array or Field containing the index that defines the ranges over which count is applied.
dest – If set, a Field to which the resulting counts are written
- Returns
A numpy array containing the resulting values
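The example above counts contiguous runs of equal index values (note that the second run of `a` is counted separately). A pure-Python sketch of that behaviour, using a hypothetical helper name and plain lists rather than numpy arrays or Fields:

```python
from itertools import groupby


def aggregate_count_sketch(index):
    """Count the entries in each contiguous run of equal index values."""
    return [sum(1 for _ in run) for _, run in groupby(index)]


index = "a a a b b x a c c d d d".split()
counts = aggregate_count_sketch(index)
print(counts)
```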
-
aggregate_custom(predicate, index, target=None, dest=None)¶
-
aggregate_first(index, target=None, dest=None)¶ Finds the first entries within each sub-group of index.
Example:
Index:  a a a b b x a c c d d d
Target: 1 2 3 4 5 6 7 8 9 0 1 2
Result: 1 4 6 7 8 0
- Parameters
index – A numpy array or Field containing the index that defines the ranges over which the first entry is taken.
target – A numpy array to which the index is applied
dest – If set, a Field to which the resulting values are written
- Returns
A numpy array containing the resulting values
-
aggregate_last(index, target=None, dest=None)¶ Finds the last entries within each sub-group of index.
Example:
Index:  a a a b b x a c c d d d
Target: 1 2 3 4 5 6 7 8 9 0 1 2
Result: 3 5 6 7 9 2
- Parameters
index – A numpy array or Field containing the index that defines the ranges over which the last entry is taken.
target – A numpy array to which the index is applied
dest – If set, a Field to which the resulting values are written
- Returns
A numpy array containing the resulting values
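The first/last examples can be reproduced with a short sketch that locates each contiguous run in the index and picks the target value at its start or end. The helper names are hypothetical and plain lists stand in for numpy arrays or Fields.

```python
from itertools import groupby


def run_extents(index):
    """Return (start, end) pairs, end exclusive, for each run of equal values."""
    extents, start = [], 0
    for _, run in groupby(index):
        length = sum(1 for _ in run)
        extents.append((start, start + length))
        start += length
    return extents


def aggregate_first_sketch(index, target):
    return [target[start] for start, _ in run_extents(index)]


def aggregate_last_sketch(index, target):
    return [target[end - 1] for _, end in run_extents(index)]


index = "a a a b b x a c c d d d".split()
target = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
firsts = aggregate_first_sketch(index, target)
lasts = aggregate_last_sketch(index, target)
print(firsts)
print(lasts)
```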
-
aggregate_max(index, target=None, dest=None)¶ Finds the maximum value within each sub-group of index.
Example:
Index:  a a a b b x a c c d d d
Target: 1 2 3 5 4 6 7 8 9 2 1 0
Result: 3 5 6 7 9 2
- Parameters
index – A numpy array or Field containing the index that defines the ranges over which max is applied.
target – A numpy array to which the index is applied
dest – If set, a Field to which the resulting values are written
- Returns
A numpy array containing the resulting values
-
aggregate_min(index, target=None, dest=None)¶ Finds the minimum value within each sub-group of index.
Example:
Index:  a a a b b x a c c d d d
Target: 1 2 3 5 4 6 7 8 9 2 1 0
Result: 1 4 6 7 8 0
- Parameters
index – A numpy array or Field containing the index that defines the ranges over which min is applied.
target – A numpy array to which the index is applied
dest – If set, a Field to which the resulting values are written
- Returns
A numpy array containing the resulting values
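The max and min examples follow the same pattern: reduce the target values within each contiguous run of the index. A sketch with a hypothetical helper that takes the reducing function as an argument:

```python
from itertools import groupby


def aggregate_sketch(index, target, reducer):
    """Apply `reducer` to the target values in each contiguous run of `index`."""
    paired = zip(index, target)
    return [
        reducer(value for _, value in run)
        for _, run in groupby(paired, key=lambda pair: pair[0])
    ]


index = "a a a b b x a c c d d d".split()
target = [1, 2, 3, 5, 4, 6, 7, 8, 9, 2, 1, 0]
maxima = aggregate_sketch(index, target, max)
minima = aggregate_sketch(index, target, min)
print(maxima)
print(minima)
```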
-
apply_filter(filter_to_apply, src, dest=None)¶ Apply a filter to a src field. The filtered field is written to dest if it is set, and returned from the function call. If the field is an IndexedStringField, the indices and values are returned separately.
- Parameters
filter_to_apply – the filter to be applied to the source field, an array of booleans
src – the field to be filtered
dest – optional - a field to write the filtered data to
- Returns
the filtered values
-
apply_index(index_to_apply, src, dest=None)¶ Apply an index to a src field. The indexed field is written to dest if it is set, and returned from the function call. If the field is an IndexedStringField, the indices and values are returned separately.
- Parameters
index_to_apply – the index to be applied to the source field, must be one of Group, Field, or ndarray
src – the field to be indexed
dest – optional - a field to write the indexed data to
- Returns
the indexed values
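The difference between applying a filter and applying an index can be shown on plain lists. This sketch only illustrates the semantics for simple value sequences; the helper names are hypothetical, and the handling of dest and of IndexedStringField is not covered.

```python
def apply_filter_sketch(filter_to_apply, src):
    """Keep the entries of src whose corresponding filter flag is true."""
    return [value for keep, value in zip(filter_to_apply, src) if keep]


def apply_index_sketch(index_to_apply, src):
    """Gather entries of src in the order given by the index."""
    return [src[i] for i in index_to_apply]


src = [10, 20, 30, 40]
filtered = apply_filter_sketch([True, False, True, True], src)
reordered = apply_index_sketch([3, 0, 0, 2], src)
print(filtered)
print(reordered)
```

A filter can only drop entries, whereas an index can reorder and repeat them.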
-
apply_spans_concat(spans, target, dest, src_chunksize=None, dest_chunksize=None, chunksize_mult=None)¶
-
apply_spans_count(spans: numpy.ndarray, dest: exetera.core.abstract_types.Field = None)¶ Finds the number of entries within each span.
- Parameters
spans – the numpy array of spans to be applied
dest – if set, the field to which the results are written
- Returns
A numpy array containing the resulting values
-
apply_spans_first(spans: numpy.ndarray, target: numpy.ndarray, dest: exetera.core.abstract_types.Field = None)¶ Finds the first entry within each span on a target field.
- Parameters
spans – the numpy array of spans to be applied
target – the field to which the spans are applied
dest – if set, the field to which the results are written
- Returns
A numpy array containing the resulting values
-
apply_spans_index_of_first(spans: numpy.ndarray, dest: exetera.core.abstract_types.Field = None)¶ Finds the index of the first entry within each span.
- Parameters
spans – the numpy array of spans to be applied
dest – if set, the field to which the results are written
- Returns
A numpy array containing the resulting values
-
apply_spans_index_of_last(spans: numpy.ndarray, dest: exetera.core.abstract_types.Field = None)¶ Finds the index of the last entry within each span.
- Parameters
spans – the numpy array of spans to be applied
dest – if set, the field to which the results are written
- Returns
A numpy array containing the resulting values
-
apply_spans_index_of_max(spans: numpy.ndarray, target: numpy.ndarray, dest: exetera.core.abstract_types.Field = None)¶ Finds the index of the maximum value within each span on a target field.
- Parameters
spans – the numpy array of spans to be applied
target – the field to which the spans are applied
dest – if set, the field to which the results are written
- Returns
A numpy array containing the resulting values
-
apply_spans_index_of_min(spans: numpy.ndarray, target: numpy.ndarray, dest: exetera.core.abstract_types.Field = None)¶ Finds the index of the minimum value within each span on a target field.
- Parameters
spans – the numpy array of spans to be applied
target – the field to which the spans are applied
dest – if set, the field to which the results are written
- Returns
A numpy array containing the resulting values
-
apply_spans_last(spans: numpy.ndarray, target: numpy.ndarray, dest: exetera.core.abstract_types.Field = None)¶ Finds the last entry within each span on a target field.
- Parameters
spans – the numpy array of spans to be applied
target – the field to which the spans are applied
dest – if set, the field to which the results are written
- Returns
A numpy array containing the resulting values
-
apply_spans_max(spans: numpy.ndarray, target: numpy.ndarray, dest: exetera.core.abstract_types.Field = None)¶ Finds the maximum value within each span on a target field.
- Parameters
spans – the numpy array of spans to be applied
target – the field to which the spans are applied
dest – if set, the field to which the results are written
- Returns
A numpy array containing the resulting values
-
apply_spans_min(spans: numpy.ndarray, target: numpy.ndarray, dest: exetera.core.abstract_types.Field = None)¶ Finds the minimum value within each span on a target field.
- Parameters
spans – the numpy array of spans to be applied
target – the field to which the spans are applied
dest – if set, the field to which the results are written
- Returns
A numpy array containing the resulting values
-
chunks(length: int, chunksize: Optional[int] = None)¶ Note: this function is deprecated, and provided only for compatibility with existing scripts. It will be removed in a future version.
‘chunks’ is a convenience method that, given an overall length and a chunksize, will yield a set of ranges for the chunks in question. ie. chunks(1048576, 500000) -> (0, 500000), (500000, 1000000), (1000000, 1048576)
- Parameters
length – The range to be split into chunks
chunksize – Optional parameter detailing the size of each chunk. If not set, the chunksize that the Session was initialized with is used.
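The ranges in the example above can be reproduced with a small generator. This is a sketch of the described behaviour, not the Session method itself; `chunks_sketch` is a hypothetical name and the chunksize is required here rather than defaulting to the Session's value.

```python
def chunks_sketch(length, chunksize):
    """Yield (start, end) ranges, end exclusive, that cover range(length)."""
    current = 0
    while current < length:
        following = min(current + chunksize, length)
        yield current, following
        current = following


ranges = list(chunks_sketch(1048576, 500000))
print(ranges)
```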
-
close()¶ Close all open datasets.
- Returns
None
-
close_dataset(name: str)¶ Close the dataset with the given name. If there is no dataset with that name, do nothing.
- Parameters
name – The name of the dataset to be closed
- Returns
None
-
create_categorical(group, name, nformat, key, timestamp=None, chunksize=None)¶ Create a categorical field in the given DataFrame with the given name. This function also takes a numerical format for the numeric representation of the categories, and a key that maps numeric values to their string descriptions.
- Parameters
group – The group in which the new field should be created
name – The name of the new field
nformat – A numerical type in the set (int8, uint8, int16, uint16, int32, uint32, int64). It is recommended to use ‘int8’.
key – A dictionary that maps numerical values to their string representations
timestamp – If set, the timestamp that should be given to the new field. If not set datetime.now() is used.
chunksize – If set, the chunksize that should be used to create the new field. In general, this should not be set unless you are writing unit tests.
-
create_fixed_string(group, name, length, timestamp=None, chunksize=None)¶ Create a fixed string field in the given DataFrame, given name, and given max string length per entry.
- Parameters
group – The group in which the new field should be created
name – The name of the new field
length – The maximum length in bytes that each entry can have.
timestamp – If set, the timestamp that should be given to the new field. If not set datetime.now() is used.
chunksize – If set, the chunksize that should be used to create the new field. In general, this should not be set unless you are writing unit tests.
-
create_indexed_string(group, name, timestamp=None, chunksize=None)¶ Create an indexed string field in the given DataFrame with the given name.
- Parameters
group – The group in which the new field should be created
name – The name of the new field
timestamp – If set, the timestamp that should be given to the new field. If not set datetime.now() is used.
chunksize – If set, the chunksize that should be used to create the new field. In general, this should not be set unless you are writing unit tests.
-
create_like(field, dest_group, dest_name, timestamp=None, chunksize=None)¶ Create a field of the same type as an existing field, in the location and with the name provided.
Example:
with Session() as s:
    ...
    a = s.get(table_1['a'])
    b = s.create_like(a, table_2, 'a_times_2')
    b.data.write(a.data[:] * 2)
- Parameters
field – The Field whose type is to be copied
dest_group – The group in which the new field should be created
dest_name – The name of the new field
-
create_numeric(group, name, nformat, timestamp=None, chunksize=None)¶ Create a numeric field in the given DataFrame with the given name.
- Parameters
group – The group in which the new field should be created
name – The name of the new field
nformat – A numerical type in the set (int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32, float64). It is recommended to avoid uint64 as certain operations in numpy cause conversions to floating point values.
timestamp – If set, the timestamp that should be given to the new field. If not set datetime.now() is used.
chunksize – If set, the chunksize that should be used to create the new field. In general, this should not be set unless you are writing unit tests.
-
create_timestamp(group, name, timestamp=None, chunksize=None)¶ Create a timestamp field in the given group with the given name.
-
dataset_sort_index(sort_indices, index=None)¶ Generate a sorted index based on a set of fields upon which to sort and an optional index to apply to the sort_indices.
- Parameters
sort_indices – a tuple or list of indices that determine the sorted order
index – optional - the index by which the initial field should be permuted
- Returns
the resulting index that can be used to permute unsorted fields
-
distinct(field=None, fields=None, filter=None)¶ todo: confirm deprecated.
-
get(field: Union[exetera.core.abstract_types.Field, h5py._hl.group.Group])¶ Get a Field from a h5py Group.
Example:
# this code for context
with Session() as s:
    # open a dataset about wildlife
    src = s.open_dataset("/my/wildlife/dataset.hdf5", "r", "src")
    # fetch the group containing bird data
    birds = src['birds']
    # get the bird decibel field
    bird_decibels = s.get(birds['decibels'])
- Parameters
field – The Field or Group object to retrieve.
-
get_dataset(name: str)¶ Get the dataset with the given name. If there is no dataset with that name, raise a KeyError indicating that the dataset with that name is not present.
- Parameters
name – Name of the dataset to be fetched. This is the name that was given to it when it was opened through open_dataset().
- Returns
Dataset with that name.
-
get_index(target, foreign_key, destination=None)¶ Note: this function is deprecated, and provided only for compatibility with existing scripts. It will be removed in a future version.
Please make use of Dataframe.merge functionality instead. This method can be emulated by adding an index (via np.arange) to a dataframe, performing a merge and then fetching the mapped index field.
‘get_index’ maps a primary key (‘target’) into the space of a foreign key (‘foreign_key’).
-
get_or_create_group(group: Union[h5py._hl.group.Group, h5py._hl.files.File], name: str)¶ Note: this function is deprecated, and provided only for compatibility with existing scripts. It will be removed in a future version.
Create a shared index based on a tuple of numpy arrays containing keys. This function generates the sorted union of a tuple of key fields and then maps the individual arrays to their corresponding indices in the sorted union.
- Parameters
keys – a tuple of groups, fields or ndarrays whose contents represent keys
Example:
key_1 = ['a', 'b', 'e', 'g', 'i']
key_2 = ['b', 'b', 'c', 'c', 'e', 'g', 'j']
key_3 = ['a', 'c', 'd', 'e', 'g', 'h', 'h', 'i']
sorted_union = ['a', 'b', 'c', 'd', 'e', 'g', 'h', 'i', 'j']
key_1_index = [0, 1, 4, 5, 7]
key_2_index = [1, 1, 2, 2, 4, 5, 8]
key_3_index = [0, 2, 3, 4, 5, 6, 6, 7]
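The shared-index computation described in the text (sorted union of the keys, then each key array mapped to positions in that union) can be sketched as follows. The function name `shared_index_sketch` is hypothetical, and plain lists are used instead of groups, fields or ndarrays.

```python
def shared_index_sketch(keys):
    """Return the sorted union of all keys and each key list's indices into it."""
    sorted_union = sorted(set().union(*keys))
    position = {key: i for i, key in enumerate(sorted_union)}
    indices = [[position[key] for key in key_list] for key_list in keys]
    return sorted_union, indices


key_1 = ['a', 'b', 'e', 'g', 'i']
key_2 = ['b', 'b', 'c', 'c', 'e', 'g', 'j']
key_3 = ['a', 'c', 'd', 'e', 'g', 'h', 'h', 'i']
sorted_union, (key_1_index, key_2_index, key_3_index) = shared_index_sketch(
    (key_1, key_2, key_3)
)
print(sorted_union)
print(key_1_index, key_2_index, key_3_index)
```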
-
get_spans(field: Union[exetera.core.abstract_types.Field, numpy.ndarray] = None, dest: exetera.core.abstract_types.Field = None, **kwargs)¶ Calculate a set of spans that indicate contiguous equal values. The entries in the result array correspond to the inclusive start and exclusive end of the span (the ith span is represented by element i and element i+1 of the result array). The last entry of the result array is the length of the source field.
Only one of ‘field’ or ‘fields’ may be set. If ‘fields’ is used and more than one field specified, the fields are effectively zipped and the check for spans is carried out on each corresponding tuple in the zipped field.
Example:
field:  [1, 2, 2, 1, 1, 1, 3, 4, 4, 4, 2, 2, 2, 2, 2]
result: [0, 1, 3, 6, 7, 10, 15]
- Parameters
field – A Field or numpy array to be evaluated for spans
dest – A destination Field to store the result
**kwargs – See below. For parameters set in both argument and kwargs, use kwargs
- Keyword Arguments
field – Same as the field parameter, for when the user specifies field as a keyword
fields – A tuple of Fields or tuple of numpy arrays to be evaluated for spans
dest – Same as the dest parameter, for when the user specifies dest as a keyword
- Returns
The resulting set of spans as a numpy array
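A pure-Python sketch of the span calculation, matching the example above. The name `get_spans_sketch` is hypothetical, the input is assumed to be a non-empty list, and the multi-field case is emulated by zipping the fields into tuples first, as the description suggests.

```python
def get_spans_sketch(field):
    """Return span boundaries for contiguous runs of equal values.

    Assumes `field` is a non-empty sequence. Entry i and entry i + 1 of the
    result are the inclusive start and exclusive end of the ith span.
    """
    spans = [0]
    for i in range(1, len(field)):
        if field[i] != field[i - 1]:
            spans.append(i)
    spans.append(len(field))
    return spans


single = get_spans_sketch([1, 2, 2, 1, 1, 1, 3, 4, 4, 4, 2, 2, 2, 2, 2])
# Two fields are zipped so that a span ends whenever either value changes.
zipped = get_spans_sketch(list(zip([1, 1, 2], [5, 6, 6])))
print(single)
print(zipped)
```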
-
join(destination_pkey, fkey_indices, values_to_join, writer=None, fkey_index_spans=None)¶ This method is due for removal and should not be used. Please use the merge or ordered_merge functions instead.
-
list_datasets()¶ List the open datasets for this Session object. This is returned as a tuple of strings rather than the datasets themselves. The individual datasets can be fetched using get_dataset().
Example:
names = s.list_datasets()
datasets = [s.get_dataset(n) for n in names]
- Returns
A tuple containing the names of the currently open datasets for this Session object
-
merge_inner(left_on, right_on, left_fields=None, left_writers=None, right_fields=None, right_writers=None)¶ Note: this function is deprecated, and provided only for compatibility with existing scripts. It will be removed in a future version.
Please use DataFrame.merge instead.
Perform a database-style inner join on left_fields, outputting the result to left_writers, if set.
- Parameters
left_on – The key on the left hand side on which to perform the join
right_on – The key on the right hand side on which to perform the join
left_fields – The fields to be mapped from left to inner
left_writers – Optional parameter providing the fields to which the mapped data should be written. If this is not set, the mapped data is returned as numpy arrays and lists instead.
right_fields – The fields to be mapped from right to inner
right_writers – Optional parameter providing the fields to which the mapped data should be written. If this is not set, the mapped data is returned as numpy arrays and lists instead.
-
merge_left(left_on, right_on, right_fields=(), right_writers=None)¶ Note: this function is deprecated, and provided only for compatibility with existing scripts. It will be removed in a future version.
Please use DataFrame.merge instead.
Perform a database-style left join on right_fields, outputting the result to right_writers, if set.
- Parameters
left_on – The key on the left hand side on which to perform the join
right_on – The key on the right hand side on which to perform the join
right_fields – The fields to be mapped from right to left
right_writers – Optional parameter providing the fields to which the mapped data should be written. If this is not set, the mapped data is returned as numpy arrays and lists instead.
-
merge_right(left_on, right_on, left_fields=(), left_writers=None)¶ Note: this function is deprecated, and provided only for compatibility with existing scripts. It will be removed in a future version.
Please use DataFrame.merge instead.
Perform a database-style right join on left_fields, outputting the result to left_writers, if set.
- Parameters
left_on – The key on the left hand side on which to perform the join
right_on – The key on the right hand side on which to perform the join
left_fields – The fields to be mapped from left to right
left_writers – Optional parameter providing the fields to which the mapped data should be written. If this is not set, the mapped data is returned as numpy arrays and lists instead.
-
open_dataset(dataset_path: Union[str, IO[bytes]], mode: str, name: str)¶ Open a dataset with the given access mode.
- Parameters
dataset_path – the path to the dataset
mode – the mode in which the dataset should be opened. This is one of “r”, “r+” or “w”.
name – the name that is associated with this dataset. This can be used to retrieve the dataset when calling
get_dataset().
- Returns
The top-level dataset object
-
ordered_merge_inner(left_on, right_on, left_field_sources=(), left_field_sinks=None, right_field_sources=(), right_field_sinks=None, left_unique=False, right_unique=False)¶ Note: this function is deprecated, and provided only for compatibility with existing scripts. It will be removed in a future version.
Please use DataFrame.merge instead.
Generate the results of an inner join and apply it to the fields described in the tuple ‘right_field_sources’. If ‘right_field_sinks’ is set, the mapped values are written to the fields / arrays set there.
Note: in order to achieve best scalability, you should use groups / fields rather than numpy arrays and provide a tuple of groups/fields to right_field_sinks, so that the session can compute the merge and apply the mapping in a streaming fashion.
- Parameters
left_on – the group/field/numba array that contains the left key values
right_on – the group/field/numba array that contains the right key values
right_to_left_map – a group/field/numba array that the map is written to. If it is a numba array, it must be the size of the resulting merge
right_field_sources – a tuple of group/fields/numba arrays that contain the fields to be joined
right_field_sinks – optional - a tuple of group/fields/numba arrays that the mapped fields should be written to
left_unique – a hint to indicate whether the ‘left_on’ field contains unique values
right_unique – a hint to indicate whether the ‘right_on’ field contains unique values
- Returns
If right_field_sinks is not set, a tuple of the output fields is returned
-
ordered_merge_left(left_on, right_on, right_field_sources=(), left_field_sinks=None, left_to_right_map=None, left_unique=False, right_unique=False)¶ Note: this function is deprecated, and provided only for compatibility with existing scripts. It will be removed in a future version.
Please use DataFrame.merge instead.
Generate the results of a left join and apply it to the fields described in the tuple ‘left_field_sources’. If ‘left_field_sinks’ is set, the mapped values are written to the fields / arrays set there.
Note: in order to achieve best scalability, you should use groups / fields rather than numpy arrays and provide a tuple of groups/fields to left_field_sinks, so that the session can compute the merge and apply the mapping in a streaming fashion.
- Parameters
left_on – the group/field/numba array that contains the left key values
right_on – the group/field/numba array that contains the right key values
left_to_right_map – a group/field/numba array that the map is written to. If it is a numba array, it must be the size of the resulting merge
left_field_sources – a tuple of group/fields/numba arrays that contain the fields to be joined
left_field_sinks – optional - a tuple of group/fields/numba arrays that the mapped fields should be written to
left_unique – a hint to indicate whether the ‘left_on’ field contains unique values
right_unique – a hint to indicate whether the ‘right_on’ field contains unique values
- Returns
If left_field_sinks is not set, a tuple of the output fields is returned
-
ordered_merge_right(left_on, right_on, left_field_sources=(), right_field_sinks=None, right_to_left_map=None, left_unique=False, right_unique=False)¶ Note: this function is deprecated, and provided only for compatibility with existing scripts. It will be removed in a future version.
Please use DataFrame.merge instead.
Generate the results of a right join and apply it to the fields described in the tuple ‘right_field_sources’. If ‘right_field_sinks’ is set, the mapped values are written to the fields / arrays set there.
Note: in order to achieve best scalability, you should use groups / fields rather than numpy arrays and provide a tuple of groups/fields to right_field_sinks, so that the session can compute the merge and apply the mapping in a streaming fashion.
- Parameters
left_on – the group/field/numba array that contains the left key values
right_on – the group/field/numba array that contains the right key values
right_to_left_map – a group/field/numba array that the map is written to. If it is a numba array, it must be the size of the resulting merge
right_field_sources – a tuple of group/fields/numba arrays that contain the fields to be joined
right_field_sinks – optional - a tuple of group/fields/numba arrays that the mapped fields should be written to
left_unique – a hint to indicate whether the ‘left_on’ field contains unique values
right_unique – a hint to indicate whether the ‘right_on’ field contains unique values
- Returns
If right_field_sinks is not set, a tuple of the output fields is returned
-
set_timestamp(timestamp: str = '2023-01-18 11:14:27.097586+00:00')¶ Set the default timestamp to be used when creating fields without specifying an explicit timestamp.
- Parameters
timestamp – a string representing a valid Datetime
- Returns
None
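A string in the same form as the default shown in the signature can be produced with the standard library. This is a minimal sketch; `utc_timestamp_string` is a hypothetical helper, and it assumes that set_timestamp accepts any string in that form.

```python
from datetime import datetime, timezone

def utc_timestamp_string(now=None):
    """Return a timezone-aware UTC timestamp as a string.

    Hypothetical helper: str() on an aware datetime gives the
    'YYYY-MM-DD HH:MM:SS.ffffff+00:00' form seen in the default above.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return str(now)

fixed = datetime(2023, 1, 18, 11, 14, 27, 97586, tzinfo=timezone.utc)
ts = utc_timestamp_string(fixed)
# The string parses back to the same datetime, so it is a valid datetime string
round_trip = datetime.fromisoformat(ts)
```

The resulting string could then be passed as the timestamp argument of set_timestamp.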
-
sort_on(src_group: h5py._hl.group.Group, dest_group: h5py._hl.group.Group, keys: Union[tuple, list], timestamp=datetime.datetime(2023, 1, 18, 11, 14, 27, 97592, tzinfo=datetime.timezone.utc), write_mode='write', verbose=True)¶ Sort a group (src_group) of fields by the specified set of keys, and write the sorted fields to dest_group.
- Parameters
src_group – the group of fields that are to be sorted
dest_group – the group into which sorted fields are written
keys – fields to sort on
timestamp – optional - timestamp to write on the sorted fields
write_mode – optional - write mode to use if the destination fields already exist
- Returns
None
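Sorting a group of fields by several keys amounts to computing one permutation from the key fields and applying it to every field. The sketch below shows that idea with in-memory numpy arrays; it is not how sort_on is implemented, and `sort_fields_by_keys` is a hypothetical helper that works on a plain dict rather than an h5py group.

```python
import numpy as np

def sort_fields_by_keys(fields, keys):
    """Sort every array in `fields` by the arrays named in `keys`.

    Hypothetical helper: `fields` is a dict of equal-length arrays and
    `keys` lists field names in priority order (first key is primary).
    """
    # np.lexsort treats the LAST key as the primary one, so reverse the order
    order = np.lexsort([np.asarray(fields[k]) for k in reversed(keys)])
    # Apply the same permutation to every field, keys included
    return {name: np.asarray(values)[order] for name, values in fields.items()}

fields = {
    'id': [2, 1, 2, 1],
    'day': [5, 9, 3, 1],
    'val': [10, 20, 30, 40],
}
sorted_fields = sort_fields_by_keys(fields, ('id', 'day'))
```

Because a single permutation is applied to all fields, rows stay aligned across fields after sorting.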
-
temp_filename()¶
exetera.core.utils module¶
-
class
exetera.core.utils.Timer(start_msg, new_line=False, end_msg='completed in')¶ Bases:
object
-
exetera.core.utils.build_histogram(dataset, filtered_records=None, tx=None)¶
-
exetera.core.utils.check_input_lengths(names, fields)¶
-
exetera.core.utils.count_flag_empty(flags)¶
-
exetera.core.utils.count_flag_not_set(flags, flag_to_test)¶
-
exetera.core.utils.count_flag_set(flags, flag_to_test)¶
-
exetera.core.utils.datetime_to_seconds(dt)¶
-
exetera.core.utils.filter_field(fields, filter_list, f_missing, f_bad, is_type_fn, type_fn, valid_fn)¶
-
exetera.core.utils.find_longest_sequence_of(string, char)¶
-
exetera.core.utils.get_min_max(value_type)¶
-
exetera.core.utils.guess_encoding(filename)¶ Attempt to determine the encoding of the given text file by reading the byte order mark, defaulting to utf-8 if none is found.
- Parameters
filename – path to a text file containing possible UTF-8, UTF-16, or UTF-32 text
- Returns
encoding name, one of utf-8, utf-8-sig, utf-16, utf-32
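Byte order mark detection of this kind can be sketched with the standard `codecs` module. This is an illustration of the technique only, not the body of guess_encoding; `guess_encoding_sketch` is a hypothetical name, and the return values are assumed to match the four names listed above.

```python
import codecs

def guess_encoding_sketch(filename):
    """Guess a text file's encoding from its byte order mark (BOM).

    Hypothetical helper: returns 'utf-8' when no BOM is present.
    """
    with open(filename, 'rb') as f:
        head = f.read(4)  # the longest BOM (UTF-32) is four bytes
    # Test UTF-32 before UTF-16: the UTF-32 LE BOM starts with the UTF-16 LE BOM
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'utf-32'
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    return 'utf-8'
```

The returned name can be passed directly as the `encoding` argument of Python's built-in `open`, which strips the BOM for 'utf-8-sig', 'utf-16' and 'utf-32'.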
-
exetera.core.utils.map_between_categories(first_map, second_map)¶
-
exetera.core.utils.one_dim_data_to_indexed_for_test(data, field_size)¶
-
exetera.core.utils.string_to_datetime(field)¶
-
exetera.core.utils.timestamp_to_day(field)¶
-
exetera.core.utils.validate_file_exists(file_name)¶
exetera.core.validation module¶
-
exetera.core.validation.all_same_basic_type(name, fields)¶
-
exetera.core.validation.array_from_field_or_lower(name, field)¶
-
exetera.core.validation.array_from_parameter(session, name, field)¶
-
exetera.core.validation.ensure_valid_field(name, field)¶
-
exetera.core.validation.ensure_valid_field_like(name, field)¶
-
exetera.core.validation.field_from_parameter(session, name, field)¶
-
exetera.core.validation.is_field_parameter(field)¶
-
exetera.core.validation.raw_array_from_parameter(datastore, name, field)¶
-
exetera.core.validation.validate_all_field_length_in_df(df: exetera.core.abstract_types.DataFrame)¶
-
exetera.core.validation.validate_and_get_key_fields(side, df, key)¶
-
exetera.core.validation.validate_and_normalize_categorical_key(param_name, key)¶
-
exetera.core.validation.validate_boolean_row_filter(name, field)¶
-
exetera.core.validation.validate_chunk_size(chunk_size_name, chunk_size)¶
-
exetera.core.validation.validate_field_lengths(side, lens, df, names=None)¶
-
exetera.core.validation.validate_filter(filter_to_apply)¶
-
exetera.core.validation.validate_groupby_target(target, by, all)¶
-
exetera.core.validation.validate_key_field_consistency(lname, rname, lkey, rkey)¶
-
exetera.core.validation.validate_key_lengths(side, df, key)¶
-
exetera.core.validation.validate_require_key(context, key, dictionary)¶
-
exetera.core.validation.validate_selected_keys(by, all)¶