DEV Community
Akmal Chaudhri for SingleStore


AdTech using SingleStoreDB, Kafka and Metabase

Abstract

SingleStoreDB can ingest and process large quantities of streaming data with ease. In this article, we'll build an AdTech application that simulates millions of Ad Campaign events streaming from a Kafka broker into SingleStoreDB. We'll also use Metabase to build a dashboard to visualise the data in SingleStoreDB.

Introduction

In a previous article series, we saw how to use SingleStoreDB with Kafka hosted on the Confluent Cloud. The example simulated global sensors sending temperature readings that could be stored and analysed in SingleStoreDB. In this article, we'll use a more localised Kafka broker and create a visual dashboard to provide insights into and analysis of the streamed data.

The complete SQL code is listed in the Appendix.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use AdTech Demo Group as our Workspace Group Name and adtech-demo as our Workspace Name.

Once we've created our database in the following steps, we'll make a note of our password and host name.

Create a Database and Tables

In our SingleStoreDB Cloud account, we'll use the SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS adtech;

We'll also create two tables. First, the events table:

USE adtech;

CREATE TABLE events (
    user_id INT,
    event_name VARCHAR(128),
    advertiser VARCHAR(128),
    campaign INT(11),
    gender VARCHAR(128),
    income VARCHAR(128),
    page_url VARCHAR(128),
    region VARCHAR(128),
    country VARCHAR(128),
    SORT KEY adtmidx (user_id, event_name, advertiser, campaign),
    SHARD KEY user_id (user_id)
);

This table stores details of the advertiser and campaign, plus various demographic information about the user, such as gender and income.

Second, the campaigns table:

CREATE REFERENCE TABLE campaigns (
    campaign_id SMALLINT(6) NOT NULL DEFAULT '0',
    campaign_name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    PRIMARY KEY (campaign_id)
);

We have a small number of campaigns, and using REFERENCE will keep the table in memory and provide very fast access.

Next, we'll insert the details of the campaigns:

INSERT INTO campaigns VALUES
(1,'demand great'),
(2,'blackout'),
(3,'flame broiled'),
(4,'take it from a fish'),
(5,'thank you'),
(6,'designed by you'),
(7,'virtual launch'),
(8,'ultra light'),
(9,'warmth'),
(10,'run healthy'),
(11,'virtual city'),
(12,'online lifestyle'),
(13,'dream burger'),
(14,'super bowl tweet');
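Only the numeric campaign id appears in the raw event stream later, so it can be handy to keep this lookup at hand while reading pipeline rows. As a small illustrative aid (not part of the application itself), the same mapping can be sketched in Python:

```python
# Illustrative Python mirror of the campaigns reference table above.
campaigns = {
    1: "demand great", 2: "blackout", 3: "flame broiled",
    4: "take it from a fish", 5: "thank you", 6: "designed by you",
    7: "virtual launch", 8: "ultra light", 9: "warmth",
    10: "run healthy", 11: "virtual city", 12: "online lifestyle",
    13: "dream burger", 14: "super bowl tweet",
}

# Resolve a campaign id as seen in a raw event row.
print(campaigns[13])  # dream burger
```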

Create a Pipeline

Pipelines allow us to create streaming ingest feeds from various sources, such as Kafka, S3 and HDFS, using a single command. With pipelines, we can also perform ETL operations.

For our use case, we can create a simple pipeline in SingleStoreDB as follows:

CREATE PIPELINE events AS
LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/ad_events'
BATCH_INTERVAL 2500
INTO TABLE events
FIELDS TERMINATED BY '\t'
ENCLOSED BY ''
ESCAPED BY '\\'
LINES STARTING BY ''
(user_id, event_name, advertiser, campaign, gender, income, page_url, region, country);

We can control the rate of data ingestion using the BATCH_INTERVAL. Initially, we'll set this to 2500 milliseconds.

We can configure the pipeline to start from the earliest offset, as follows:

ALTER PIPELINE events SET OFFSETS EARLIEST;

and we can test the pipeline before we start running it, as follows:

TEST PIPELINE events LIMIT 1;

The pipeline test should result in an error, as the value for the campaign column is out of range:

ERROR 1264 ER_WARN_DATA_OUT_OF_RANGE: ... Out of range value for column 'campaign'

Let's look at the data in the ad_events topic from the Kafka broker and see if we can identify the problem. We'll install kcat (formerly kafkacat):

sudo apt install kafkacat

and then run it as follows:

kafkacat -C -b public-kafka.memcompute.com:9092 -t ad_events

Here are 10 rows of example output:

...
607982731 Downstream Conversion Verizon Wireless 5 9 Male 50k - 75k / Utah US
607982732 Impression AT&T Wireless 4 3 Female unknown /2014/04/make-spring-centerpiece-with-tulips.html Florida US
607982732 Click AT&T Wireless 4 3 Female unknown /2014/04/make-spring-centerpiece-with-tulips.html Florida US
607982732 Impression AT&T Wireless 5 9 Female unknown /category/31-day-drawing-challenge/ Florida US
607982732 Impression AT&T Wireless 5 7 Female unknown /2016/05/lars-book-club-may.html/ Florida US
607982732 Impression AT&T Wireless 5 7 Female unknown /2013/01/polka-dot-your-wrapping.html/ Florida US
607982732 Impression AT&T Wireless 13 3 Female unknown /2016/03/12-best-paper-flowers-volume-3.html/ Florida US
607982733 Impression Dominos Pizza 5 6 Female 25k and below /2014/07/make-the-midsummer-floral-crown.html/ Pennsylvania US
607982734 Impression Lowes Companies 5 13 Female unknown /2016/01/eyeball-drink-stirrers.html/ New York US
607982735 Impression Lowes Companies 4 6 Male unknown /2016/02/review-of-the-5-best-planners-of-2016.html/2/ California US
...

If we take the first row above and check for hidden characters, we'll see that the tab character (\t) separates the fields:

607982731<tab>Downstream<space>Conversion<tab>Verizon<space>Wireless<tab>5<space>9<tab>Male<tab>50k<space>-<space>75k<tab>/<tab>Utah<tab>US

Mapping the fields back to the table schema, we'll see that the campaign column is meant to be an integer, but the stream contains two integers separated by a space:

... <tab>integer<space>integer<tab> ...
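We can reproduce this diagnosis in plain Python. The sketch below is an illustration (using the field values from the first kcat row above): it splits the record on tab characters and shows that the fourth field holds two space-separated integers, which is exactly what the INT campaign column rejects:

```python
# Sample record from the ad_events topic, with the tab separators made explicit.
row = ("607982731\tDownstream Conversion\tVerizon Wireless\t5 9\t"
       "Male\t50k - 75k\t/\tUtah\tUS")

fields = row.split("\t")
print(len(fields))  # 9 fields, matching the events table schema
print(fields[3])    # '5 9' -- two integers where campaign expects one

try:
    int(fields[3])
except ValueError:
    # This is the same failure the pipeline hits: the field is not a single INT.
    print("campaign value is not a single integer")
```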

We can change the pipeline slightly to deal with this problem. First, we'll drop the pipeline:

DROP PIPELINE events;

Then we'll recreate the pipeline, adding the keyword IGNORE just before INTO TABLE, as follows:

CREATE PIPELINE events AS
LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/ad_events'
BATCH_INTERVAL 2500
IGNORE
INTO TABLE events
FIELDS TERMINATED BY '\t'
ENCLOSED BY ''
ESCAPED BY '\\'
LINES STARTING BY ''
(user_id, event_name, advertiser, campaign, gender, income, page_url, region, country);

We can configure the pipeline again to start from the earliest offset, as follows:

ALTER PIPELINE events SET OFFSETS EARLIEST;

and we can test the pipeline again before we start running it, as follows:

TEST PIPELINE events LIMIT 1;

This time the pipeline test should succeed, and we should see output similar to the following:

+-----------+------------+------------+----------+---------+---------+----------------------------------+--------+---------+
| user_id   | event_name | advertiser | campaign | gender  | income  | page_url                         | region | country |
+-----------+------------+------------+----------+---------+---------+----------------------------------+--------+---------+
| 605804911 | Impression | Aldi       |       13 | unknown | unknown | /2016/05/day-9-draw-a-rose.html/ | Nevada | US      |
+-----------+------------+------------+----------+---------+---------+----------------------------------+--------+---------+

We can now start the pipeline:

START PIPELINE events;

Install Metabase

To use Metabase, we'll follow the JAR file installation instructions and run the JAR file, as follows:

java -jar metabase.jar

Next, we'll open the following webpage:

http://localhost:3000/setup

We should see the following, shown in Figure 1.

Figure 1. Metabase setup.


We'll work through the various wizards to complete the language and user settings. To add our data, we'll select MySQL and fill in the details for our connection, as follows:

  • Display name: S2
  • Host: <host>
  • Port: 3306
  • Database name: adtech
  • Username: admin
  • Password: <password>

We'll replace the <host> and <password> with the values from our SingleStoreDB Cloud account.

We'll click the Connect database button. The connection should be successful.

Finally, we'll click Finish and then Take me to Metabase.

Next, we should see the following, as shown in Figure 2.

Figure 2. Metabase Home.


Create Dashboard

We are now ready to create our dashboard.

1. Total Number of Events

From the top right, we'll select + New and then SQL Query, as shown in Figure 3.

Figure 3. New SQL Query.


After selecting the S2 database, we'll enter the following into the SQL Editor in Metabase:

SELECT COUNT(*) FROM events;

We can run the query using the right arrow button (▶) in the bottom right and use the Save link in the top right to save this query. We'll call the query Total Number of Events. We'll create a new dashboard called AdTech and add the query to the dashboard, as shown in Figure 4.

Figure 4. AdTech Dashboard.


2. Events by Region

We'll repeat the previous process of creating an SQL Query and adding it to the AdTech dashboard for the following query (Events by Region):

SELECT events.country AS `events.country`,
       COUNT(events.country) AS 'events.countofevents'
FROM adtech.events AS events
GROUP BY 1;

With this query, we have the option of creating a Visualization. We'll create a pie chart.

3. Events by Top 5 Advertisers

We'll repeat the previous process of creating an SQL Query and adding it to the AdTech dashboard for the following query (Events by Top 5 Advertisers):

SELECT events.advertiser AS `events.advertiser`,
       COUNT(*) AS `events.count`
FROM adtech.events AS events
WHERE (events.advertiser LIKE '%Subway%'
    OR events.advertiser LIKE '%McDonals%'
    OR events.advertiser LIKE '%Starbucks%'
    OR events.advertiser LIKE '%Dollar General%'
    OR events.advertiser LIKE '%YUM! Brands%'
    OR events.advertiser LIKE '%Dunkin Brands Group%')
GROUP BY 1
ORDER BY `events.count` DESC;

With this query, we have the option of creating a Visualization. We'll create a bar chart.

4. Ad Visitors by Gender and Income

We'll repeat the previous process of creating an SQL Query and adding it to the AdTech dashboard for the following query (Ad Visitors by Gender and Income):

SELECT *
FROM (
    SELECT *,
           DENSE_RANK() OVER (ORDER BY xx.z___min_rank) AS z___pivot_row_rank,
           RANK() OVER (PARTITION BY xx.z__pivot_col_rank ORDER BY xx.z___min_rank) AS z__pivot_col_ordering,
           CASE WHEN xx.z___min_rank = xx.z___rank THEN 1 ELSE 0 END AS z__is_highest_ranked_cell
    FROM (
        SELECT *,
               MIN(aa.z___rank) OVER (PARTITION BY aa.`events.income`) AS z___min_rank
        FROM (
            SELECT *,
                   RANK() OVER (ORDER BY CASE WHEN bb.z__pivot_col_rank = 1 THEN (CASE WHEN bb.`events.count` IS NOT NULL THEN 0 ELSE 1 END) ELSE 2 END,
                                         CASE WHEN bb.z__pivot_col_rank = 1 THEN bb.`events.count` ELSE NULL END DESC,
                                         bb.`events.count` DESC,
                                         bb.z__pivot_col_rank,
                                         bb.`events.income`) AS z___rank
            FROM (
                SELECT *,
                       DENSE_RANK() OVER (ORDER BY CASE WHEN ww.`events.gender` IS NULL THEN 1 ELSE 0 END, ww.`events.gender`) AS z__pivot_col_rank
                FROM (
                    SELECT events.gender AS `events.gender`,
                           events.income AS `events.income`,
                           COUNT(*) AS `events.count`
                    FROM adtech.events AS events
                    WHERE (events.income <> 'unknown' OR events.income IS NULL)
                    GROUP BY 1, 2
                ) ww
            ) bb
            WHERE bb.z__pivot_col_rank <= 16384
        ) aa
    ) xx
) zz
WHERE (zz.z__pivot_col_rank <= 50 OR zz.z__is_highest_ranked_cell = 1)
  AND (zz.z___pivot_row_rank <= 500 OR zz.z__pivot_col_ordering = 1)
ORDER BY zz.z___pivot_row_rank;

With this query, we have the option of creating a Visualization. We'll create a table.

Final Dashboard

Figure 5 shows an example of the charts being sized and positioned on the AdTech dashboard. We'll set the auto-refresh option to 1 minute.

Figure 5. Final Dashboard.


Summary

In this article, we have created a pipeline in SingleStoreDB to connect to a Kafka broker. We have identified a problem with the incoming data and modified our pipeline to handle the issue. We have also used Metabase to create a quick visual dashboard to help us gain insights into our Ad Campaign. For additional details, please see the SingleStore documentation on how to Load and Analyze AdTech Data.

Appendix

CREATE DATABASE IF NOT EXISTS adtech;

USE adtech;

CREATE TABLE events (
    user_id INT,
    event_name VARCHAR(128),
    advertiser VARCHAR(128),
    campaign INT(11),
    gender VARCHAR(128),
    income VARCHAR(128),
    page_url VARCHAR(128),
    region VARCHAR(128),
    country VARCHAR(128),
    SORT KEY adtmidx (user_id, event_name, advertiser, campaign),
    SHARD KEY user_id (user_id)
);

CREATE REFERENCE TABLE campaigns (
    campaign_id SMALLINT(6) NOT NULL DEFAULT '0',
    campaign_name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    PRIMARY KEY (campaign_id)
);

INSERT INTO campaigns VALUES
(1,'demand great'),
(2,'blackout'),
(3,'flame broiled'),
(4,'take it from a fish'),
(5,'thank you'),
(6,'designed by you'),
(7,'virtual launch'),
(8,'ultra light'),
(9,'warmth'),
(10,'run healthy'),
(11,'virtual city'),
(12,'online lifestyle'),
(13,'dream burger'),
(14,'super bowl tweet');

CREATE PIPELINE events AS
LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/ad_events'
BATCH_INTERVAL 2500
INTO TABLE events
FIELDS TERMINATED BY '\t'
ENCLOSED BY ''
ESCAPED BY '\\'
LINES STARTING BY ''
(user_id, event_name, advertiser, campaign, gender, income, page_url, region, country);

ALTER PIPELINE events SET OFFSETS EARLIEST;

TEST PIPELINE events LIMIT 1;

DROP PIPELINE events;

CREATE PIPELINE events AS
LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/ad_events'
BATCH_INTERVAL 2500
IGNORE
INTO TABLE events
FIELDS TERMINATED BY '\t'
ENCLOSED BY ''
ESCAPED BY '\\'
LINES STARTING BY ''
(user_id, event_name, advertiser, campaign, gender, income, page_url, region, country);

ALTER PIPELINE events SET OFFSETS EARLIEST;

TEST PIPELINE events LIMIT 1;

START PIPELINE events;

-- Total Number of Events
SELECT COUNT(*) FROM events;

-- Events by Region
SELECT events.country AS `events.country`,
       COUNT(events.country) AS 'events.countofevents'
FROM adtech.events AS events
GROUP BY 1;

-- Events by Top 5 Advertisers
SELECT events.advertiser AS `events.advertiser`,
       COUNT(*) AS `events.count`
FROM adtech.events AS events
WHERE (events.advertiser LIKE '%Subway%'
    OR events.advertiser LIKE '%McDonals%'
    OR events.advertiser LIKE '%Starbucks%'
    OR events.advertiser LIKE '%Dollar General%'
    OR events.advertiser LIKE '%YUM! Brands%'
    OR events.advertiser LIKE '%Dunkin Brands Group%')
GROUP BY 1
ORDER BY `events.count` DESC;

-- Ad Visitors by Gender and Income
SELECT *
FROM (
    SELECT *,
           DENSE_RANK() OVER (ORDER BY xx.z___min_rank) AS z___pivot_row_rank,
           RANK() OVER (PARTITION BY xx.z__pivot_col_rank ORDER BY xx.z___min_rank) AS z__pivot_col_ordering,
           CASE WHEN xx.z___min_rank = xx.z___rank THEN 1 ELSE 0 END AS z__is_highest_ranked_cell
    FROM (
        SELECT *,
               MIN(aa.z___rank) OVER (PARTITION BY aa.`events.income`) AS z___min_rank
        FROM (
            SELECT *,
                   RANK() OVER (ORDER BY CASE WHEN bb.z__pivot_col_rank = 1 THEN (CASE WHEN bb.`events.count` IS NOT NULL THEN 0 ELSE 1 END) ELSE 2 END,
                                         CASE WHEN bb.z__pivot_col_rank = 1 THEN bb.`events.count` ELSE NULL END DESC,
                                         bb.`events.count` DESC,
                                         bb.z__pivot_col_rank,
                                         bb.`events.income`) AS z___rank
            FROM (
                SELECT *,
                       DENSE_RANK() OVER (ORDER BY CASE WHEN ww.`events.gender` IS NULL THEN 1 ELSE 0 END, ww.`events.gender`) AS z__pivot_col_rank
                FROM (
                    SELECT events.gender AS `events.gender`,
                           events.income AS `events.income`,
                           COUNT(*) AS `events.count`
                    FROM adtech.events AS events
                    WHERE (events.income <> 'unknown' OR events.income IS NULL)
                    GROUP BY 1, 2
                ) ww
            ) bb
            WHERE bb.z__pivot_col_rank <= 16384
        ) aa
    ) xx
) zz
WHERE (zz.z__pivot_col_rank <= 50 OR zz.z__is_highest_ranked_cell = 1)
  AND (zz.z___pivot_row_rank <= 500 OR zz.z__pivot_col_ordering = 1)
ORDER BY zz.z___pivot_row_rank;
