Data Schema

Each database requires a configuration file that specifies the data schema: the type and structure of the data, plus any custom settings. If the data structure changes, the configuration must be updated accordingly.

Configuration File

Hyperspace requires the configuration to be provided in standard JSON (.json) format:
  • The data configuration is provided under a key named "configuration".
  • Lists, sets and dicts must be declared under a key named "struct_type".
  • Vectors are defined under their own key with type "dense_vector", including the dimension ("dim") and similarity metric ("metric").
  • Fields that are used for cardinality aggregation are marked with "cardinality_field".
  • Fields marked as "low_cardinality", "high_cardinality" or "cardinality_field" must set that key to a boolean value in the field definition.
  • Settings can be included under the key "settings".
{
  "configuration": {
    "series": {"type": "keyword"},
    "age_group": {"type": "keyword", "cardinality_field": true},
    "genres": {"struct_type": "list", "type": "keyword", "low_cardinality": true},
    "id": {"type": "integer"},
    "text embedding": {"dim": 1024, "metric": "IP", "type": "dense_vector"},
    "production_companies": {"struct_type": "list", "type": "keyword"},
    "production_countries": {"struct_type": "list", "type": "keyword"},
    "rating": {"type": "float"},
    "spoken_languages": {"struct_type": "list", "type": "keyword"},
    "title": {"type": "keyword"}
  },
  "settings": {"delimiter": "'"}
}
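
The rules listed above can be checked before a configuration is saved as a .json file. The following is a minimal Python sketch, not part of Hyperspace: the helper check_schema is hypothetical, the set of accepted "struct_type" values ("list", "set", "dict") is assumed from the wording of the rules (only "list" appears in the example), and the delimiter value is a best reading of the example. It builds a shortened version of the schema as a Python dict, checks it, and serializes it with the standard json module, which writes Python's True as JSON true.

```python
import json

# Shortened version of the example schema, written as a Python dict.
config = {
    "configuration": {
        "series": {"type": "keyword"},
        "age_group": {"type": "keyword", "cardinality_field": True},
        "genres": {"struct_type": "list", "type": "keyword", "low_cardinality": True},
        "id": {"type": "integer"},
        "text embedding": {"type": "dense_vector", "dim": 1024, "metric": "IP"},
        "rating": {"type": "float"},
    },
    "settings": {"delimiter": "'"},
}

# Assumption: accepted struct_type values; only "list" appears in the example.
STRUCT_TYPES = {"list", "set", "dict"}
# Keys that the rules say must carry a boolean value.
BOOLEAN_KEYS = ("low_cardinality", "high_cardinality", "cardinality_field")


def check_schema(cfg):
    """Hypothetical helper: return a list of rule violations (empty if none)."""
    fields = cfg.get("configuration")
    if not isinstance(fields, dict):
        return ["missing 'configuration' key"]
    problems = []
    for name, spec in fields.items():
        # Lists, sets and dicts are declared under 'struct_type'.
        if "struct_type" in spec and spec["struct_type"] not in STRUCT_TYPES:
            problems.append(f"{name}: unknown struct_type {spec['struct_type']!r}")
        # Vectors need a dimension and a similarity metric.
        if spec.get("type") == "dense_vector":
            for key in ("dim", "metric"):
                if key not in spec:
                    problems.append(f"{name}: dense_vector needs '{key}'")
        # Cardinality markers must be booleans.
        for key in BOOLEAN_KEYS:
            if key in spec and not isinstance(spec[key], bool):
                problems.append(f"{name}: '{key}' must be a boolean")
    return problems


problems = check_schema(config)
# json.dumps emits valid JSON: double-quoted keys and lowercase true/false.
json_text = json.dumps(config, indent=2)
```

If problems is empty, json_text can be written to a .json file as is. Building the schema as a dict and serializing it avoids hand-written quoting errors, such as mixed quote characters or a Python-style True in the file.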