About
Hi, I am a Data Engineer 🚀
Skills
“A reflection on my past accomplishments and experiments.”
Language
- Python: PySpark, Pandas / Polars (robust single-node data processing), Django / Flask (backend systems), Streamlit
- Scala: Spark, Spark Streaming
- SQL
Database
- Relational Database (MySQL, PostgreSQL)
- NoSQL Database (MongoDB, Cassandra, Redis)
Data Visualization
- Enterprise: Microsoft Power BI
- Open-source: Superset, Metabase, Grafana, Plotly, Matplotlib (Experienced with server configuration & BI Dashboard embedding)
Data Processing
- Apache Airflow (Orchestration), Airbyte (Data ingestion layer), dbt (data build tool) (Transformation layer)
- Spark, Spark Streaming (Scala) - Distributed data processing
- Flask backend API: Data-driven Application, Streamlit: Data product
- Enterprise: Azure Data Factory, Databricks flow (Microsoft Azure)
Data Warehouse
- ClickHouse (columnar data warehouse)
- Hive Metastore on Databricks
- Delta Lake (Lake house architecture)
Data Lake – Object Storage
- Enterprise: Google Cloud Storage, Microsoft Azure Data Lake Storage
- Open-source: MinIO
Cloud Service
- Microsoft Azure (Databricks, Azure Data Lake, Azure Data Warehouse)
- AWS Cloud (EC2, RDS, S3)
- Google Cloud (Google Cloud Storage, Google BigQuery, Google Kubernetes Engine, Google Sheets API, Google Drive API) – Docker
- Hosting: Cloudflare
Data Operations
- Google Kubernetes Engine, Docker
- Data Catalog: DataHub, dbdiagram, dbt docs server
- Data Quality: Great Expectations (dbt_expectations)
My Journey
Senior Data Engineer - FPT Software | Apr 2024 - Now
Updating…
Data Engineer - Advesa Digital & Breadstack Technologies Company | Oct 2022 - Apr 2024
- Scope of work: Data Engineering, Data Architecture, Data Governance.
- General responsibilities:
- Build data warehouse architectures and manage data model designs and data lineage.
- Develop ETL & ELT pipelines to ingest data from internal and external sources.
- Propose, develop, and manage the Data Catalog system & Metadata for all Data repositories and Data products to deliver analytics results to Business Users.
- Design and manage all data architecture for these projects:
- Sales & Marketing Data warehouse project: ClickHouse on Google Kubernetes Engine, Apache Airflow + dbt + Polars (ELT system), Google Cloud Storage, MongoDB, PostgreSQL, Power BI / Metabase, Data Catalog system with OpenMetadata.
- HR Data warehouse project: MongoDB, MongoDB Atlas, Airflow, Streamlit, MinIO, PostgreSQL
- Data products: Streamlit + Google Drive API (Data Collector), Slackbot (using Flask for event handling)
- Data sources integrations:
- Google Analytics, CRM, ERP system, Klaviyo, Social Platforms (Facebook / Instagram / Twitter / LinkedIn)
- Company’s CRM software: Breadstack, Chatso.
- Task management: Jira, Trello.
Technologies used: ClickHouse · Apache Superset · Data Warehousing · Data Engineering · MongoDB Atlas · MinIO · SQL · Streamlit · Python (Programming Language) · Data Modeling · Apache Airflow · PostgreSQL · MongoDB
Data Engineer & Analyst Mentor - FUNiX Technologies School | Jul 2023 - Now
- Empowered Future Data Professionals: Mentored at FUNiX School, providing practical guidance in data analytics and engineering.
ALM Specialist - TPBank | Jul 2022 - Oct 2022
- General responsibilities:
- Build Python modules to automatically cleanse data, generate reports, and run ETL jobs.
- ALCO report / GAP report.
- Daily FTP data management.
- Coordinate the implementation of liquidity management and optimize cash flow at the balance sheet scale.
- Prepare management reports as required and assigned.
- Build automated EDA with a Python engine and convert old reports to BI visualizations.
Technologies used: Python · Power BI · SQL · QlikView
Data Analyst - Hong Ngoc Group | Apr 2020 - May 2022
- General responsibilities:
- Manage service cost components and service pricing.
- Work with the BE team to improve the data system.
- Build report dashboards and ad-hoc reports with MS Excel & Power BI.
- Cleanse data and analyze revenue, profit, ROS, and KPIs of branches.
- Build an entirely new report dashboard and improve the team's data processing methods.
Validation
“For me, certificates are not an end in themselves. As far as I am concerned, it’s about the progress to achieve all of that”. Please visit