About

Hi, I am a Data Engineer 🚀

Skills

“A reflection on my past accomplishments and experiments.”

Language
  • Python: PySpark, Pandas / Polars (robust single-node data processing), Django / Flask (backend systems), Streamlit
  • Scala: Spark, Spark Streaming
  • SQL
Database
  • Relational Database (MySQL, PostgreSQL)
  • NoSQL Database (MongoDB, Cassandra, Redis)
Data Visualization
  • Enterprise: Microsoft Power BI
  • Open-source: Superset, Metabase, Grafana, Plotly, Matplotlib (Experienced with server configuration & BI Dashboard embedding)
Data Processing
  • Apache Airflow (Orchestration), Airbyte (Data ingestion layer), dbt (data build tool; Transformation layer)
  • Spark, Spark Streaming (Scala) - Distributed data processing
  • Flask backend API: Data-driven Application, Streamlit: Data product
  • Enterprise: Azure Data Factory, Databricks flow (Microsoft Azure)
Data Warehouse
  • ClickHouse (columnar data warehouse)
  • Hive Metastore on Databricks
  • Delta Lake (Lakehouse architecture)
Data Lake – Object Storage
  • Enterprise: Google Cloud Storage, Microsoft Azure Data Lake Storage
  • Open-source: MinIO
Cloud Service
  • Microsoft Azure (Databricks, Azure Data Lake, Azure Data Warehouse)
  • AWS Cloud (EC2, RDS, S3)
  • Google Cloud (Google Cloud Storage, Google BigQuery, Google Kubernetes Engine, Google Sheets API, Google Drive API) – Docker
  • Hosting: Cloudflare
Data Operations
  • Google Kubernetes Engine, Docker
  • Data Catalog: DataHub, dbdiagram, dbt docs server
  • Data Quality: Great Expectations (dbt_expectations)

My Journey

Senior Data Engineer - FPT Software | Apr 2024 - Now

Updating…

Data Engineer - Advesa Digital & Breadstack Technologies Company | Oct 2022 - Apr 2024

  • Scope of work: Data Engineering, Data Architecture, Data Governance.
  • General responsibilities:
    • Build data warehouse architectures and manage data model designs and data lineage.
    • Develop ETL & ELT pipelines to ingest data from internal and external sources.
    • Propose, develop, and manage the Data Catalog system and metadata for all data repositories and data products that deliver analytics results to business users.
  1. Design and manage all data architecture for these projects:
    • Sales & Marketing Data warehouse project: ClickHouse on Google Kubernetes Engine, Apache Airflow + dbt + Polars (ELT system), Google Cloud Storage, MongoDB, PostgreSQL, Power BI / Metabase, Data Catalog system with OpenMetadata.
    • HR Data warehouse project: MongoDB, MongoDB Atlas, Airflow, Streamlit, MinIO, PostgreSQL
    • Data products: Streamlit + Google Drive API (Data Collector), Slackbot (using Flask for event handling)
  2. Data source integrations:
    • Google Analytics, CRM, ERP system, Klaviyo, Social Platforms (Facebook / Instagram / Twitter / LinkedIn)
    • Company's CRM software: Breadstack, Chatso.
    • Task management: Jira, Trello.

Technologies used: ClickHouse · Apache Superset · Data Warehousing · Data Engineering · MongoDB Atlas · MinIO · SQL · Streamlit · Python (Programming Language) · Data Modeling · Apache Airflow · PostgreSQL · MongoDB

Data Engineer & Analyst Mentor - FUNiX Technologies School | Jul 2023 - Now

  • Empowered Future Data Professionals: Mentored at FUNiX School, providing practical guidance in data analytics and engineering.

ALM Specialist - TPBank | Jul 2022 - Oct 2022

  • General responsibilities:
    • Build Python modules to automatically cleanse data, generate reports, and run ETL.
    • ALCO report / GAP report.
    • Daily FTP data management.
    • Coordinate the implementation of liquidity management, optimizing cash flow at the balance-sheet scale.
    • Produce management reports as required and assigned.
    • Build automated EDA in Python and migrate old reports to BI visualizations.

Technologies used: Python · Power BI · SQL · QlikView

Data Analyst - Hong Ngoc Group | Apr 2020 - May 2022

  • General responsibilities:
    • Manage service cost components and service pricing.
    • Work with the back-end team to improve the data system.
    • Build report dashboards and ad-hoc reports with MS Excel & Power BI.
    • Cleanse data and analyze revenue, profit, ROS, and KPIs of branches.
    • Build a completely new report dashboard and improve the team's data processing methods.

Validation

AWS Certified Data Engineer – Associate

IBM Data Engineering Professional Certificate

“For me, certificates are not an end in themselves. As far as I am concerned, it’s about the progress made in achieving them.” Please visit LinkedIn to see all my certificates.