About

Hi, I am a Data Engineer 🚀

Skills

“A reflection on my past accomplishments and experiments.”

Language
  • Python: PySpark, Pandas / Polars (robust single-node data processing), Django / Flask (backend systems), Streamlit
  • Scala: Spark, Spark Streaming
  • SQL
Database
  • Relational Database (MySQL, PostgreSQL)
  • NoSQL Database (MongoDB, Cassandra, Redis)
Data Visualization
  • Enterprise: Microsoft Power BI
  • Open-source: Superset, Metabase, Grafana, Plotly, Matplotlib (Experienced with server configuration & BI Dashboard embedding)
Data Processing
  • Apache Airflow (Orchestration), Airbyte (Data ingestion layer), dbt (data build tool) (Transformation layer)
  • Spark, Spark Streaming (Scala) - Distributed data processing
  • Flask backend API: Data-driven Application, Streamlit: Data product
  • Enterprise: Azure Data Factory, Databricks flow (Microsoft Azure)
Data Warehouse
  • ClickHouse (columnar data warehouse)
  • Hive Metastore on Databricks
  • Delta Lake (Lakehouse architecture)
Data Lake – Object Storage
  • Enterprise: Google Cloud Storage, Microsoft Azure Data Lake Storage
  • Open-source: MinIO
Cloud Service
  • Microsoft Azure (Databricks, Azure Data Lake, Azure Data Warehouse)
  • AWS Cloud (EC2, RDS, S3)
  • Google Cloud (Google Cloud Storage, Google BigQuery, Google Kubernetes Engine, Google Sheets API, Google Drive API) – Docker
  • Hosting: Cloudflare
Data Operations
  • Google Kubernetes Engine, Docker
  • Data Catalog: DataHub, dbdiagram, dbt documentation server
  • Data Quality: Great Expectations (dbt_expectations)

My Journey

Senior Data Engineer - FPT Software | May 2024 - Now

Updating…

Data Engineer - Advesa Digital & Breadstack Technologies Company | Oct 2022 - May 2024

  • Scope of work: Data Engineering, Data Architecture, Data Governance.
  • General responsibilities:
    • Build Data warehouse architectures and manage data model designs and data lineage
    • Develop ETL & ELT pipelines to ingest data from internal and external sources
    • Propose, develop, and manage a Data Catalog system & Metadata for all Data repositories and Data products to deliver analytics results to Business Users.
  1. Design and manage all data architecture for these projects:
    • Sales & Marketing Data warehouse project: ClickHouse on Google Kubernetes Engine, Apache Airflow + dbt + Polars (ELT system), Google Cloud Storage, MongoDB, PostgreSQL, Power BI / Metabase, Data Catalog system with OpenMetadata.
    • HR Data warehouse project: MongoDB, MongoDB Atlas, Airflow, Streamlit, MinIO, PostgreSQL
    • Data products: Streamlit + Google Drive API (Data Collector), Slackbot (using Flask for event handling)
  2. Data sources integrations:
    • Google Analytics, CRM, ERP system, Klaviyo, Social Platforms (Facebook / Instagram / Twitter / LinkedIn)
    • Company’s CRM software: Breadstack, Chatso.
    • Task management: Jira, Trello.

Technologies used: ClickHouse · Apache Superset · Data Warehousing · Data Engineering · MongoDB Atlas · MinIO · SQL · Streamlit · Python (Programming Language) · Data Modeling · Apache Airflow · PostgreSQL · MongoDB

Analytics Engineering Mentor - FUNiX Technologies School | Jul 2023 - Now

  • Empowered Future Data Professionals: Mentored at FUNiX School, providing practical guidance in data analytics and engineering.

ALM Specialist - TPBank | Jul 2022 - Oct 2022

  • General responsibilities:
    • Build Python modules to cleanse data, generate reports, and run ETL automatically
    • ALCO report / GAP report.
    • Daily FTP data management.
    • Coordinate implementation of liquidity management, optimizing cash flow at the balance sheet scale.
    • Prepare management reports as required and assigned.
    • Build automatic EDA with a Python engine and transform old reports into BI visualizations.

Technologies used: Python · Power BI · SQL · QlikView

Data Analyst - Hong Ngoc Group | Apr 2020 - May 2022

  • General responsibilities:
    • Manage service cost components and service pricing.
    • Work with BE team to improve the data system.
    • Build report dashboards and ad-hoc reports with MS Excel & Power BI
    • Cleanse data and analyze Revenue, Profit, ROS, and KPIs of branches.
    • Build an entirely new Report dashboard and improve the team's data processing methods.

Validation

IBM Data Engineering Professional Certificate

“For me, certificates are not an end in themselves. As far as I am concerned, it’s about the progress made to achieve all of that.” Please visit LinkedIn to see all my certificates.