What is ExeTera

ExeTera is data curation and analysis software that enables users to work effectively with large, related tables of data. It focuses on scale and efficiency, which we consider essential to curating and analysing data effectively:

Scale

The Python software ecosystem provides excellent tools such as numpy and pandas that have made complex data analysis available to a wide audience. Countless individuals use these tools to analyse all manner of datasets, and their ease of use and integration with the wider Python ecosystem have made data science more accessible to all.

These tools come with a limitation, however, and that limitation is scale. They are typically designed to work with data that fits within a computer's RAM. For large datasets, this places an effective upper limit on what can be processed, and users typically have to resort to one or more of the following measures to work around the problem:

  • Cut down the dataset somehow so that the subset is manageable

  • Buy a machine with more memory

  • Install a relational database such as PostgreSQL (or another type of datastore) and learn its API and how to maintain it

The problem of scale is surmountable. All of the operations required to analyse relational tables, such as sorts, joins, and aggregations, have scalable implementations that can make use of a hard drive. If these operations are provided, it is possible to analyse data approaching terabyte scales on a typical laptop computer.
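
To make the idea of a disk-backed operation concrete, here is a minimal sketch of an external merge sort using only the Python standard library. It is an illustration of the general technique, not ExeTera's implementation or API: the function name `external_sort` and the `chunk_size` parameter are hypothetical, and the sketch assumes the values are integers stored one per line in temporary text files.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size=1000):
    """Yield integers from `values` in sorted order.

    Hypothetical helper for illustration only. During the first phase, at
    most `chunk_size` values are held in memory at once; each sorted chunk
    (a "run") is written to a temporary file on disk.
    """
    run_paths = []

    def flush(chunk):
        # Sort one in-memory chunk and write it to disk as a sorted run.
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        run_paths.append(path)

    # Phase 1: split the input into sorted runs on disk.
    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            flush(chunk)
            chunk = []
    if chunk:
        flush(chunk)

    # Phase 2: stream all runs back and merge them lazily.
    # heapq.merge keeps only one pending value per run in memory.
    files = [open(p) for p in run_paths]
    try:
        runs = [(int(line) for line in f) for f in files]
        yield from heapq.merge(*runs)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


# Usage: a tiny chunk size forces several on-disk runs.
print(list(external_sort([5, 3, 9, 1, 7], chunk_size=2)))  # [1, 3, 5, 7, 9]
```

The same two-phase pattern (process bounded chunks, spill to disk, then stream the results back) also underlies scalable joins and aggregations. A production implementation would typically use a binary on-disk format and limit the number of runs open at once.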