Understand your data
Information on how your data is formatted for content coming from external data sources using APIs.
📘
Note
The code and sample data for this tutorial are available on GitHub.
Optimizely Graph lets you sync and query content coming from external data sources using APIs. This data can be loaded as a batch job. To demonstrate the use of external data, this tutorial uses the non-commercial datasets from IMDb.
Before you can start syncing other data sources to Optimizely Graph, you need to understand the data. This includes knowing the format, the fields used, and the properties of these fields.
- Format – The IMDb datasets are in TSV (tab-separated values) format, where the first row contains the column headers, each subsequent row is a record, and each column holds one field value.
- Field names – Headers of CSV and TSV files can be used as field names. This tutorial uses the headers as field names.
- Types of fields – Each column has a specific type. In this tutorial, a column is one of the following:
  - String
  - Integer
  - Float
  - Boolean
  - String array
  Other types that are currently supported but not used in this tutorial are Date and object types.
- Properties of fields – You can check whether any of these fields are useful for full-text search. If so, you can set them as searchable.
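As a rough way to get a first impression of the format, field names, and types described above, the following Python sketch reads the header of a TSV file and guesses a type for each column from a few sample rows. The function names `guess_type` and `describe_columns` are hypothetical, and the rules (for example, treating a column that contains only 0 and 1 as Boolean, or any value with a comma as a String array) are assumptions made for illustration. They are not how Optimizely Graph determines types.

```python
import csv
import io
import re

MISSING = r"\N"  # the IMDb sample data marks missing values with a literal \N
INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+\.\d+")


def guess_type(values):
    """Guess a column type from sample string values (heuristic only)."""
    present = [v for v in values if v != MISSING]
    if not present:
        return "Unknown"  # nothing but missing values in the sample
    if all(v in ("0", "1") for v in present):
        return "Boolean"  # assumption: 0/1-only columns are flags
    if all(INT_RE.fullmatch(v) for v in present):
        return "Integer"
    if all(INT_RE.fullmatch(v) or FLOAT_RE.fullmatch(v) for v in present):
        return "Float"
    if any("," in v for v in present):
        return "String array"  # assumption: commas separate array items
    return "String"


def describe_columns(tsv_text, sample_size=100):
    """Map each header name to a guessed type, using the first rows as a sample."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    rows = [row for _, row in zip(range(sample_size), reader)]
    return {name: guess_type([row[i] for row in rows]) for i, name in enumerate(header)}


sample = "\n".join([
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\tstartYear\tendYear\truntimeMinutes\tgenres",
    "tt0000001\tshort\tCarmencita\tCarmencita\t0\t1894\t\\N\t1\tDocumentary,Short",
    "tt0000002\tshort\tLe clown et ses chiens\tLe clown et ses chiens\t0\t1892\t\\N\t5\tAnimation,Short",
])

for name, kind in describe_columns(sample).items():
    print(f"{name}: {kind}")
```

A guess like this is only a starting point. A title that happens to contain a comma would be reported as a String array, and a small sample can hide values that appear later in the file, so review the result by hand before relying on it.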
This tutorial focuses on three of the IMDb datasets:
- Names
- Titles
- Ratings
When looking at the names, the primaryName and primaryProfession fields are useful for full-text search. To mark them as searchable while still using the headers as field names, append the suffix ___searchable to those column headers in the first row.
📘
Note
___searchable has three underscores.
nconst	primaryName___searchable	birthYear	deathYear	primaryProfession___searchable	knownForTitles
nm0000001	Fred Astaire	1899	1987	soundtrack,actor,miscellaneous	tt0050419,tt0031983,tt0053137,tt0072308
nm0000002	Lauren Bacall	1924	2014	actress,soundtrack	tt0075213,tt0037382,tt0117057,tt0038355
nm0000003	Brigitte Bardot	1934	\N	actress,soundtrack,music_department	tt0049189,tt0054452,tt0057345,tt0056404
nm0000004	John Belushi	1949	1982	actor,soundtrack,writer	tt0080455,tt0077975,tt0072562,tt0078723
nm0000005	Ingmar Bergman	1918	2007	writer,director,actor	tt0083922,tt0050976,tt0050986,tt0069467
nm0000006	Ingrid Bergman	1915	1982	actress,soundtrack,producer	tt0038787,tt0034583,tt0036855,tt0038109
nm0000007	Humphrey Bogart	1899	1957	actor,soundtrack,producer	tt0037382,tt0034583,tt0042593,tt0043265

When looking at the titles, primaryTitle and genres are useful for full-text search, so mark these as searchable.
tconst	titleType	primaryTitle___searchable	originalTitle	isAdult	startYear	endYear	runtimeMinutes	genres___searchable
tt0000001	short	Carmencita	Carmencita	0	1894	\N	1	Documentary,Short
tt0000002	short	Le clown et ses chiens	Le clown et ses chiens	0	1892	\N	5	Animation,Short
tt0000003	short	Pauvre Pierrot	Pauvre Pierrot	0	1892	\N	4	Animation,Comedy,Romance
tt0000004	short	Un bon bock	Un bon bock	0	1892	\N	12	Animation,Short

Finally, the ratings dataset contains integer and float values, and no changes to the headers are required.
tconst	averageRating	numVotes
tt0000001	5.7	1996
tt0000002	5.8	268
tt0000003	6.5	1885
tt0000004	5.5	177
tt0000005	6.2	2670
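The header changes described for the names and titles datasets can be scripted instead of edited by hand. The following Python sketch appends the ___searchable suffix to selected column headers in the first row and copies the remaining rows unchanged. The helper names `mark_searchable` and `rewrite_header` are hypothetical and are not part of any Optimizely tooling. The sketch assumes a tab-separated input whose first line is the header.

```python
import io
import shutil

SEARCHABLE_SUFFIX = "___searchable"  # three underscores


def mark_searchable(header_line, searchable_fields):
    """Return the header line with the suffix appended to the chosen columns."""
    columns = header_line.rstrip("\r\n").split("\t")
    unknown = set(searchable_fields) - set(columns)
    if unknown:
        # Fail loudly on a misspelled field name instead of silently skipping it.
        raise ValueError(f"columns not found in header: {sorted(unknown)}")
    return "\t".join(
        col + SEARCHABLE_SUFFIX if col in searchable_fields else col
        for col in columns
    )


def rewrite_header(src, dst, searchable_fields):
    """Copy a TSV text stream from src to dst, rewriting only the first row."""
    header = src.readline()
    dst.write(mark_searchable(header, searchable_fields) + "\n")
    shutil.copyfileobj(src, dst)  # data rows are streamed through unchanged


# Small in-memory example based on the first record of the names dataset.
src = io.StringIO(
    "nconst\tprimaryName\tbirthYear\tdeathYear\tprimaryProfession\tknownForTitles\n"
    "nm0000001\tFred Astaire\t1899\t1987\tsoundtrack,actor,miscellaneous\t"
    "tt0050419,tt0031983,tt0053137,tt0072308\n"
)
dst = io.StringIO()
rewrite_header(src, dst, {"primaryName", "primaryProfession"})
print(dst.getvalue())
```

Because `rewrite_header` works on text streams, the same call also works with files opened using `open(path, encoding="utf-8", newline="")`. Streaming the data rows avoids loading a large dataset into memory.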
