Thoughts about schema-on-write and schema-on-read

There are two approaches we can choose between when designing how data is stored: schema-on-read and schema-on-write.

The main difference between these approaches is the moment at which we impose structure on our data.

If we choose the schema-on-write approach, we have to design the data model in advance. That means we need to know the typical usage of the data and select an appropriate data model based on it.
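To make this concrete, here is a minimal sketch of schema-on-write in Python. The `PageView` record, its fields, and the `PageViewStore` class are hypothetical names chosen only for illustration; the point is that every record is checked against a fixed model before it is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    # Hypothetical data model, designed before any data is written.
    user_id: int
    url: str
    duration_ms: int


def parse_page_view(raw):
    """Coerce a raw dict into a PageView; raises if a field is missing or invalid."""
    return PageView(
        user_id=int(raw["user_id"]),
        url=str(raw["url"]),
        duration_ms=int(raw["duration_ms"]),
    )


class PageViewStore:
    """In-memory stand-in for a table with a fixed schema."""

    def __init__(self):
        self.rows = []

    def write(self, raw):
        # The structure is applied here, at write time.
        self.rows.append(parse_page_view(raw))


store = PageViewStore()
store.write({"user_id": "7", "url": "/home", "duration_ms": "1200"})
```

A record that does not fit the model, for example one without `duration_ms`, is rejected at write time and never reaches the store.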

If we choose the schema-on-read approach, we can store our data in an unstructured or minimally structured form. We still have to apply the structure at a later point, once we know the use case for the data.
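A small sketch of schema-on-read, again in Python and again with made-up names (`read_events`, the sample lines, the field names): the raw lines are kept exactly as they arrived, and each use case decides at read time which fields it needs.

```python
import json


def read_events(lines, fields):
    """Apply structure at read time: yield only records that have all requested fields."""
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            # Not valid JSON; skip it instead of failing the whole read.
            continue
        if isinstance(event, dict) and all(f in event for f in fields):
            yield {f: event[f] for f in fields}


# Raw data stored without any validation (hypothetical sample).
raw_lines = [
    '{"user_id": 7, "url": "/home", "duration_ms": 1200}',
    '{"user_id": 8, "referrer": "search"}',
    "not json at all",
]

# Two different use cases impose two different structures on the same data.
views = list(read_events(raw_lines, ["user_id", "url"]))
users = list(read_events(raw_lines, ["user_id"]))
```

Nothing was rejected when the data was stored; the decision about what counts as a valid record is made separately by every reader.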

The important thing is that we have to define the structure of the data either way.

With schema-on-write, saving the data is slower, because the structure has to be applied while writing. With schema-on-read, reading the data is slower, because the structure has to be applied while reading (for example, in the map step of the MapReduce paradigm).
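As an illustration of where the read-time cost comes from, here is a toy map and reduce pair in plain Python. It is not tied to any real MapReduce framework, and the `user,duration_ms` line format and function names are assumptions made up for this sketch. All parsing and validation happens in the map step, on every read.

```python
from collections import defaultdict


def map_step(line):
    """Parse one raw 'user,duration_ms' line into (key, value) pairs."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return []  # the line does not fit the expected structure
    user, duration = parts
    try:
        return [(user, int(duration))]
    except ValueError:
        return []  # the duration is not a number


def reduce_step(pairs):
    """Sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical raw input, stored as is.
raw_lines = ["alice,1200", "bob,300", "broken line", "alice,abc", "alice,800"]

pairs = [pair for line in raw_lines for pair in map_step(line)]
totals = reduce_step(pairs)
```

Every job that reads this data repeats the parsing work in `map_step`, which is the price paid for not having applied the structure once at write time.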

As usual, there is no free lunch and no silver bullet. We should know what we are doing when designing data processing applications.