Daniel K Senior Software Infrastructure Engineer

Daniel K

  • Singapore
  • 4 years
  • Full-time (40 hrs/week)
Daniel is now available for hire.

About Me

I’m a seasoned infrastructure engineer with 3 years of experience. I have worked with teams at big tech companies like TikTok. Most recently, I worked at ArtaFinance, a financial technology startup that provides a RoboInvest service and private equity investment.

My expertise covers:

  • Kubernetes - I've spearheaded the design and automation of provisioning production-grade Kubernetes clusters, implementing network security policies, and hardening cluster security.
  • CI/CD Pipeline - I build and maintain build and release systems and continuous integration and continuous deployment pipelines using familiar tools: Jenkins, ArgoCD, GitHub Actions, GitLab CI runners, and Docker. For a modern CD stack, I usually go with ArgoCD, which implements GitOps concepts and plays well with Kubernetes.
  • Monitoring and Logging - I have built robust system monitoring and observability solutions using technologies such as Prometheus, Grafana, and PagerDuty, to help with fine-tuning application performance and proactive incident management.
  • IaC - I have expertise in infrastructure-as-code for provisioning and automating infrastructure management using Pulumi (TypeScript) and Terraform (HCL), with a focus on:
    • Significantly reducing infrastructure setup time with CI/CD automation.
    • Making provisioning repeatable and auditable.
    • Improving infrastructure reliability.

Skills and experiences


Senior Infrastructure Engineer

Trading Infra team, ArtaFinance Inc August 2022 - Present

Work scope and contribution:

  • Ledger System: Architected and developed a ledger system that serves as the source of truth for all transactions and enforces strict business rules governing all money movements, balance sheets, and account states. The ledger system uses strong cryptographic data protection and tamper-evident storage, backed by Google Spanner, to store PII and business-confidential information.

  • Trade Order Processor: Architected and built from the ground up an event-driven queue system for processing ad-hoc and block trading orders. The system can schedule future events as well as process events on demand. It also records the history of processed events and the causes of entity changes. Scaled and optimized the system for concurrent processing to handle up to 2,000 TPS.

  • Trading Readiness Check & Order Safeties: Designed and built a safeties rule engine that checks trade order readiness and enforces trade safeties for direct and block trade orders. The readiness check verifies that trades and accounts were reconciled the previous day. The safeties rule engine serves as the last line of defense against spurious trade orders being sent to our custodian in rapid succession, faster than human (FinOps) intervention could prevent costly errors. Modules are implemented as middleware and enforced on staging and production servers.

  • Trade Monitoring and Observability: Employed a metrics service (Prometheus) for monitoring trading service health and systemic risk. Integrated alerts for trade issues and risk-safety failures on quarantined trades with PagerDuty and a Slack channel.

Tech stack: Python, TypeScript, Apache Airflow, Apache Beam, Google Dataflow, Google BigQuery, Google Kubernetes Engine (GKE)

Senior Data Platform Engineer

ByteHouse team, TikTok Inc July 2020 - June 2022

Work scope and contribution:

  • Launched ByteHouse 1.0: A SaaS cloud-compute data warehousing platform by TikTok, ByteDance. Battle-tested by the TikTok Ads engineering team.

  • SQL Gateway Service: Architected and built from the ground up a SQL Gateway (TCP and HTTP) service that processes and routes client SQL queries to virtual warehouse clusters. Optimized data transfer throughput to reach 655 MB/s between client and data warehouse storage, in both directions. Integrated monitoring, service profiling, and distributed tracing for observability and SLO evaluation.

  • Data Express System: Architected and built from the ground up a data orchestration system that runs asynchronous data loading jobs, transferring data from customer data sources into ByteHouse storage. Supports data inflow and outflow via file upload, Kafka Connect, AWS S3, and Hive, in various data formats.

  • Virtual Warehouse Usage Billing: Architected and built data pipelines that capture usage metering from customers’ virtual warehouse clusters and store the metrics in InfluxDB. Serves usage data to the billing dashboard and for auto-recurring payment charges. Optimized the latency of time-series aggregate queries that summarize usage over time-window buckets, down to a P95 of 200 ms.

Tech stack: Golang, Jaeger Tracing, InfluxDB, Prometheus, Grafana, VictoriaMetrics, Apache Kafka, Apache Spark, ClickHouse

Machine Learning Engineer

KYC team, Gojek February 2020 - June 2020

Work scope and contribution:

  • OCR ML Model for KYC: Developed OCR models using plain OpenCV to parse Gojek drivers' identity cards as part of the KYC onboarding flow; improved model recall by 2% and improved latency to a P95 of 400 ms.

  • Object Detection for Go Screen Ads: Trained object detection models for pedestrian scenes and deployed them to LCD screen devices using ONNX and TorchScript. Benchmarked off-the-shelf models such as RetinaNet and YOLOv2, and employed Kalman filtering to track detected objects once bounding boxes are generated.

  • MLflow Wrapper (Merlin): Took part in the development and release of the Merlin Python 3 SDK for deploying ML instances to staging and production clusters in GCP. The SDK helps orchestrate model releases, deployment, and evaluation, and is built on top of the MLflow library.

Tech stack: C++, Python, MLflow, PyTorch, TensorFlow, GKE

Software Engineer

Automation team, Zendesk January 2019 - June 2019

Work scope and contribution:

  • Zopim Automation Framework: Developed and maintained Zendesk's end-to-end automation testing tools, which manage and run 60+ API contract tests and 100 UI test scenarios covering 6 Zendesk product suites. Employed the page-object model and factory pattern to keep test code clean and independent of UI changes: when a page changes, only its page object needs to change. Written using the Selenium framework.

  • Zopim Automation Dashboard (test reporting, troubleshooting, and observability): Spearheaded test reporting and troubleshooting by integrating Sauce Labs into the testing framework and configuring structured test logs, test metrics, page screenshots, and a test outcome reporting dashboard. Automated Slack notifications and JIRA ticket creation on test failures.

  • Jenkins Pipelines: Set up Jenkins in a Kubernetes cluster and orchestrated Jenkins build automation pipelines for end-to-end testing in staging and production pods. Configured administrative and third-party auth credentials, set up an identity-aware proxy for accessing the Jenkins dashboard, and employed HPA to scale Jenkins worker pods in and out, balancing on-demand usage spikes against cost.

Education and Certifications

Nanyang Technological University

Electrical and Electronic Engineering July 2016 - June 2020

Bachelor of Engineering in Electrical and Electronic Engineering, specializing in Signal Processing.

  • Final Thesis Title: Deep-Learning in Video Denoising with GAN
  • First Class Honours; Final CGPA of 4.73/5.00
