Author: bebok ze klopsztangi

  • Building a Marketing Data Platform

    How we connected 30 price comparison APIs, BigQuery, and Azure to create a unified marketing data hub

    In today’s performance marketing, data isn’t just numbers — it’s decisions, budgets, and actual revenue.
    But what if your cost data is scattered across dozens of different sources?

    In this project, we set out to solve that problem by building an end-to-end marketing data platform that connects cost data from 30+ price comparison websites, advertising platforms, CRM systems, and analytics exports into one unified ecosystem.


    🧩 Project Overview

    Our main goal was to create a central tool for marketing cost integration and attribution — a system that both marketing and category managers could rely on.

    We aimed to:

    • eliminate data silos between marketing channels,
    • automate data collection and processing,
    • and implement consistent attribution models (Last Click, Time Decay, U-Shape).

    ☁️ Architecture at a Glance

    Below is a simplified overview of the system architecture:

    1. Company Data Warehouse (Azure) – primary source for CRM and product data.
    2. Processing Layer – Databricks & Cloud Functions – collects and processes external API data.
    3. Storage (Staging) – Google Cloud Storage – intermediate Parquet data layer.
    4. BigQuery – main analytical data warehouse.
    5. Dataform – for data modeling and orchestration.
    6. Reporting Layer – Power BI & Looker Studio – final dashboards for business users.

    🖼️ [Placeholder: insert architecture diagram — e.g., GCP + Azure + Databricks + BI flow chart]


    🔄 From APIs to Insights — How the Data Flows

    1. Collecting Cost Data

    We pull cost and performance data from 30+ price comparison APIs, including Kelkoo, Zbozi, and Ceneo.
    Data ingestion and normalization happen in Databricks, allowing us to scale workloads and parallelize API calls efficiently.

    💻 [Placeholder: short code snippet — example Databricks notebook calling external APIs and writing to Parquet]
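
    As a rough illustration of the collection step, here is a minimal Python sketch of parallel fetching followed by normalization into one shared schema. Everything named here is hypothetical: the two fetchers (`fetch_source_a`, `fetch_source_b`), their field names, and the target schema are stand-ins, and the fetchers return canned rows instead of calling any real partner API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical fetchers: real ones would call each partner's HTTP API.
# They return canned rows here so the sketch runs offline.
def fetch_source_a() -> List[dict]:
    return [{"day": "2024-01-01", "clicks": "120", "cost": "35.50", "currency": "EUR"}]


def fetch_source_b() -> List[dict]:
    return [{"date": "2024-01-01", "click_count": 80, "spend": 410.0, "cur": "CZK"}]


FETCHERS: Dict[str, Callable[[], List[dict]]] = {
    "source_a": fetch_source_a,
    "source_b": fetch_source_b,
}

# Source field name -> shared field name (an assumed target schema).
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"day": "date", "clicks": "clicks", "cost": "cost", "currency": "currency"},
    "source_b": {"date": "date", "click_count": "clicks", "spend": "cost", "cur": "currency"},
}


def normalize(source: str, row: dict) -> dict:
    """Rename fields to the shared schema and coerce numeric types."""
    out = {target: row[field] for field, target in FIELD_MAPS[source].items()}
    out["clicks"] = int(out["clicks"])
    out["cost"] = float(out["cost"])
    out["source"] = source
    return out


def collect_all(fetchers: Dict[str, Callable[[], List[dict]]], max_workers: int = 8) -> List[dict]:
    """Run all fetchers concurrently, then normalize their rows."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        rows: List[dict] = []
        for name, future in futures.items():
            rows.extend(normalize(name, row) for row in future.result())
    return rows


rows = collect_all(FETCHERS)
```

    In a Databricks notebook, the resulting list could plausibly be turned into a DataFrame with `spark.createDataFrame(rows)` and written out with `.write.mode("append").parquet(path)`; the path and write mode would depend on the actual staging layout, which is not described here.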

    2. Storage & Integration

    All raw data is saved in Parquet format inside Google Cloud Storage (staging) for efficient querying and downstream use.

    Next, we load this data into BigQuery, where it’s joined with:

    • GA4 export tables,
    • Google Ads and Facebook Ads cost data,
    • and product data from Azure Data Warehouse.

    3. Data Modeling & Attribution

    Inside Dataform, we orchestrate and model the data transformations.
    This includes:

    • standardizing table structures,
    • unifying cost and revenue sources,
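    • and applying attribution models: Last Click (all credit to the final touchpoint), Time Decay (more credit to touchpoints closer to the conversion), and U-Shape (most credit to the first and last touchpoints).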

    💻 [Placeholder: short SQL or Dataform snippet — attribution logic or transformation model]
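
    To make the three attribution models concrete, the sketch below computes credit weights for a single conversion path. It is written in Python rather than Dataform SQL, and it is not the project's actual logic: the 7-day half-life for Time Decay and the 40/20/40 split for U-Shape are common defaults assumed here for illustration.

```python
from typing import List


def last_click(n_touchpoints: int) -> List[float]:
    """All credit goes to the final touchpoint."""
    if n_touchpoints < 1:
        raise ValueError("a path needs at least one touchpoint")
    return [0.0] * (n_touchpoints - 1) + [1.0]


def time_decay(days_before_conversion: List[float], half_life_days: float = 7.0) -> List[float]:
    """Credit halves for every `half_life_days` between touchpoint and conversion."""
    if not days_before_conversion:
        raise ValueError("a path needs at least one touchpoint")
    raw = [0.5 ** (days / half_life_days) for days in days_before_conversion]
    total = sum(raw)
    return [weight / total for weight in raw]


def u_shape(n_touchpoints: int, edge_share: float = 0.4) -> List[float]:
    """First and last touchpoints get `edge_share` each; the rest is split evenly."""
    if n_touchpoints < 1:
        raise ValueError("a path needs at least one touchpoint")
    if n_touchpoints == 1:
        return [1.0]
    if n_touchpoints == 2:
        return [0.5, 0.5]
    middle = (1.0 - 2 * edge_share) / (n_touchpoints - 2)
    return [edge_share] + [middle] * (n_touchpoints - 2) + [edge_share]


def allocate(revenue: float, weights: List[float]) -> List[float]:
    """Split one conversion's revenue across touchpoints by weight."""
    return [revenue * weight for weight in weights]


# Example path: touchpoints 14, 7, and 0 days before the conversion.
path_days = [14.0, 7.0, 0.0]
decay_weights = time_decay(path_days)
credited = allocate(210.0, decay_weights)
```

    Each function returns weights that sum to 1, so the same `allocate` step works for any model. In a warehouse setting the same arithmetic would typically be expressed with window functions over the ordered touchpoints of each conversion.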

    This gives us a unified, transparent view of marketing efficiency across all channels.

    4. Currency Exchange Module

    To ensure accuracy across multiple currencies, we built a Currency Exchange module using Google Cloud Functions.
    It periodically fetches exchange rates from the NBP (National Bank of Poland) API and updates BigQuery tables automatically.
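
    A minimal sketch of what such a module might do is shown below. It assumes the public NBP endpoint for table A (average rates) and its JSON shape with `effectiveDate`, `code`, and `mid` fields; both should be checked against the NBP API documentation. The row layout (`rate_date`, `currency`, `pln_per_unit`) and the sample rate values are invented for illustration, and the actual Cloud Function is not shown in this write-up.

```python
import json
from typing import List
from urllib.request import urlopen

# Assumed endpoint for NBP table A (average exchange rates against PLN).
NBP_TABLE_A_URL = "https://api.nbp.pl/api/exchangerates/tables/A?format=json"


def fetch_nbp_table(url: str = NBP_TABLE_A_URL) -> list:
    """Download the latest table; not called below so the sketch runs offline."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def rates_to_rows(payload: list) -> List[dict]:
    """Flatten the NBP payload into one row per currency (hypothetical layout)."""
    table = payload[0]
    return [
        {
            "rate_date": table["effectiveDate"],
            "currency": rate["code"],
            "pln_per_unit": rate["mid"],
        }
        for rate in table["rates"]
    ]


def to_pln(amount: float, currency: str, rows: List[dict]) -> float:
    """Convert an amount to PLN using the flattened rate rows."""
    if currency == "PLN":
        return round(amount, 2)
    lookup = {row["currency"]: row["pln_per_unit"] for row in rows}
    return round(amount * lookup[currency], 2)


# Made-up sample in the assumed response shape (rates are not real quotes).
SAMPLE_PAYLOAD = [
    {
        "table": "A",
        "effectiveDate": "2024-01-02",
        "rates": [
            {"currency": "euro", "code": "EUR", "mid": 4.30},
            {"currency": "korona czeska", "code": "CZK", "mid": 0.17},
        ],
    }
]

rate_rows = rates_to_rows(SAMPLE_PAYLOAD)
cost_in_pln = to_pln(100.0, "EUR", rate_rows)
```

    Keeping the download, the flattening, and the conversion in separate functions makes the parsing testable without network access; the flattened rows are what a scheduled function would then write to a rates table.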


    📊 Reporting & Business Value

    We separated the reporting layer by user needs:

    • Looker Studio → for marketing teams: performance dashboards, CPC, ROI, and attribution analysis.
    • Power BI → for category managers: product- and margin-oriented reports, combined with CRM insights.

    This architecture enables business teams to:

    • see complete marketing costs in one place,
    • make faster budget decisions,
    • and manage cross-channel attribution with confidence.

    🖼️ [Placeholder: dashboard screenshot — Looker Studio or Power BI view]


    🚀 What’s Next

    Future development areas include:

    • anomaly detection and alerting for cost spikes,
    • introducing a data-driven attribution model,
    • and deeper automation within Dataform and Databricks.

    💡 Takeaways

    This project shows how a modern, cloud-based data stack can unify scattered marketing sources into a single, actionable platform.
    Instead of juggling CSV exports and spreadsheets, we now have a scalable, automated, and transparent system that empowers teams to focus on insights — not data cleaning.

  • E-commerce web analytics solution

    How we unified GA4, Google Ads, and Search Console in Dataform for true product analytics

    In e-commerce, seeing beyond “traffic” means connecting data from every digital channel — not just knowing that a product is visible, but understanding how it performs in organic search (SEO), paid campaigns (SEM), and on-site conversion.
    This project was designed to break down the silos between countries and channels, enabling unified product analytics across SEO, SEM, and website behavior for all our countries at once.

    🧩 Project Overview

    The goal:
    Create an integrated analytics platform that joins Google Analytics 4 (GA4), Google Ads, and Search Console data for product-level insights, covering all markets.
    Key objectives:

    • Analyze product performance in organic search (Search Console), paid ads (Google Ads), and website behavior/conversions (GA4).
    • Enable merchandising, recommendations, and assortment decisions based on real data — not guesses.
    • Optimize ad budgets toward the products and categories with the highest ROI.

    ☁️ Architecture at a Glance

    Architecture stack overview:

    • BigQuery: All analytical tables (nested & repeated fields; ~35M rows) — product, event, stock & price history.
    • Dataform: Data modeling, orchestration, and incremental loads; custom scopes for events, items, and content grouping.
    • GA4 Export: Unified schema for all projects and countries, including e-commerce data and custom channel grouping.
    • Google Search Console: Visibility and click data, product-matched.
    • Google Ads: Campaign costs and conversions; work on full product-level matching is ongoing.
    • Dashboarding: Aggregated views for category managers and marketers.
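
    The nested and repeated fields mentioned for BigQuery can be pictured with a small Python sketch: flat rows with one record per product per day are folded into one record per product that carries its history as a list. The field names and sample values are hypothetical; in BigQuery the same shape is usually built with `ARRAY_AGG(STRUCT(...))`.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical flat layout: one row per product per day.
flat_rows = [
    {"product_id": "P1", "date": "2024-01-02", "price": 95.0, "stock": 10},
    {"product_id": "P1", "date": "2024-01-01", "price": 99.0, "stock": 12},
    {"product_id": "P2", "date": "2024-01-01", "price": 20.0, "stock": 3},
]


def nest_history(rows: List[dict]) -> List[dict]:
    """Group flat rows into one record per product with a nested history list."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["product_id"]].append(
            {"date": row["date"], "price": row["price"], "stock": row["stock"]}
        )
    return [
        {"product_id": product_id, "history": sorted(history, key=lambda h: h["date"])}
        for product_id, history in grouped.items()
    ]


nested = nest_history(flat_rows)
# Three flat rows become two nested records; the detail lives inside "history".
```

    The row count drops because the repeated key (here `product_id`) is stored once per product instead of once per day, while every daily observation is still available inside the nested list.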

    🔄 From Raw Sources to Unified Product Insights

    1. Data Ingestion & Storage
      • Weekly exports from GA4, Search Console, and Google Ads into BigQuery, using Dataform’s incremental logic to avoid duplicates and maximize query efficiency.
      • Product, stock, and price history stored as STRUCT/ARRAY types; nesting cuts the row count roughly fivefold (~35M rows vs. 175M in legacy designs).
        💻 [Placeholder: example Dataform SQL for GA4 incremental event load]
    2. Data Modeling & Transformation
      • Event and item scopes implemented in Dataform, allowing detailed analysis by channel, content group, and custom dimensions.
      • Unified product info combined with ad spend/clicks and SEO visibility.
      • Currency conversion and multi-country logic established; assertions handle duplicate transactions, bot traffic, and missing consent.
        💻 [Placeholder: Dataform assertion or transformation snippet — e.g. checking duplicate transactions]
    3. Quality & Observability
      • Dashboards include tracking quality checks: e-commerce event completeness, pages with high bounce rates or 404 errors, and data consistency between sources.
      • Automated tests verify item/event coverage across all domains each week.
        🖼️ [Placeholder: dashboard screen showing product performance across channels]
        🖼️ [Placeholder: observability panel (404 rates, event completeness)]
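
    The incremental load and the duplicate-transaction check from the steps above could be sketched as follows. This is plain Python over simplified, flattened event dictionaries, not the project's Dataform code: the field names loosely follow the GA4 export (`event_name`, `event_date` as `YYYYMMDD`, `transaction_id`), and the sample events are invented. A Dataform assertion works on the same principle, since it passes only when its query returns no rows.

```python
from collections import Counter
from typing import List

# Invented sample events in a simplified, flattened shape.
events = [
    {"event_date": "20240101", "event_name": "purchase", "transaction_id": "T1"},
    {"event_date": "20240102", "event_name": "purchase", "transaction_id": "T2"},
    {"event_date": "20240102", "event_name": "purchase", "transaction_id": "T2"},
    {"event_date": "20240103", "event_name": "page_view", "transaction_id": None},
]


def incremental_rows(new_events: List[dict], last_loaded_date: str) -> List[dict]:
    """Keep only events newer than the latest date already in the target table."""
    # YYYYMMDD strings sort chronologically, so string comparison is enough.
    return [e for e in new_events if e["event_date"] > last_loaded_date]


def find_duplicate_transactions(rows: List[dict]) -> List[str]:
    """Return transaction IDs that appear on more than one purchase event."""
    counts = Counter(
        e["transaction_id"]
        for e in rows
        if e["event_name"] == "purchase" and e["transaction_id"]
    )
    return sorted(tid for tid, n in counts.items() if n > 1)


to_load = incremental_rows(events, last_loaded_date="20240101")
duplicates = find_duplicate_transactions(to_load)
# A non-empty `duplicates` list is the failure condition of the check.
```

    Running the check on the incremental slice keeps it cheap, at the cost of missing duplicates that span two loads; a periodic check over the full table would cover that case.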

    📊 Reporting & Impact

    With integrated data, we gained:

    • A single view of product performance across SEO, SEM, and UX for all markets, not just one at a time.
    • The ability to identify top-converting items and shift ad budgets toward them with each weekly data refresh.
    • Improved merchandising — data-driven recommendations for assortment and promotions.
    • Greater tracking observability: faster detection of data gaps, broken pages, bot issues.

    🚀 What’s Next

    Planned improvements:

    • Full rollout of Search Console integration for all product SKUs.
    • Data-driven attribution enhancements using advanced models in Dataform.
    • More automated observability checks (consent, bot, currency anomalies).
    • Optimization of Google Ads transfer for deeper product-level analytics.

    💡 Takeaways

    This integrated stack now gives the team the ability to analyze and act on real product-level data from every major channel.
    Instead of manually switching queries and reconciling exports for each country, we have a unified, scalable solution — one that supports marketing, merchandising, and tracking teams across the business.
    It’s a major step forward for e-commerce analytics.

  • Bebok je richtig padnyty

    Close cooperation with cross-functional teams, including product development, IT, marketing, and content, to align data analysis efforts with business goals and priorities and to provide data-driven insights that support decision-making across departments.