Omar Baig
Data Scientist
Summary Profile

I am a curious, solution-oriented Data Scientist seeking to support end-to-end decision making. After earning a degree in Computer Science at the University of Georgia, I began applying my programming skills at Griffin & Strong, where I work with expert lawyers and economists who specialize in Disparity Studies and automate their research methodology using Python. So far, I have built data pipelines to clean and analyze procurement data from three different governments. I am currently building a Natural Language Processing model that categorizes data and reduces the need for difficult manual work. My focus is always on making sure any software or analysis platform can be deployed on any computer, whether locally or in the cloud.

Work Experience
Griffin & Strong P.C., Data Analyst
Nov. 2018 - Current
  • Performs multifaceted analysis on financial data from public organizations
  • Translates research methodology into software and adapts models to each organization's circumstances
  • Isolates systematic problems related to data gaps and anomalies
  • Communicates with external organizations to collect data
  • Improves workflow by implementing solutions using new technologies

Work Category Text Classifier

An NLP machine learning model that classifies procurement records into one of five work categories. NLTK is used to process manually categorized records and train the classification model.

Pandas Profiler

A Python application that lets non-programmers use the open-source "pandas_profiling" library, which creates comprehensive HTML summaries of raw data files. The summary includes data warnings, missing values, and correlations. The front end was built with PyQt and can run on any operating system.

Equity Lib

A Python library containing pre- and post-processing algorithms for wrangling common problem patterns in procurement data. Its primary use is entity resolution across joined data systems.
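As a rough illustration of what entity resolution means for procurement records, the sketch below groups vendor names that likely refer to the same firm. It is not Equity Lib's code: the function names, the suffix list, and the 0.85 threshold are assumptions chosen for the example, and it relies only on the standard library's difflib.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing vendor names.
_SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "company",
             "incorporated", "corporation"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in _SUFFIXES)


def resolve_entities(names, threshold=0.85):
    """Greedily group names whose normalized forms are similar.

    Each name joins the first existing group whose canonical key has a
    similarity ratio at or above `threshold` (an assumed cutoff);
    otherwise it starts a new group.
    """
    clusters = []  # list of (canonical_key, [original names])
    for name in names:
        key = normalize(name)
        for canon, members in clusters:
            if SequenceMatcher(None, key, canon).ratio() >= threshold:
                members.append(name)
                break
        else:
            clusters.append((key, [name]))
    return [members for _, members in clusters]


vendors = [
    "Acme Construction, Inc.",
    "ACME Construction LLC",
    "Acme Constrction Inc",
    "Baker Supply Co.",
]
groups = resolve_entities(vendors)
```

With this input, the three Acme variants land in one group and Baker Supply in another. A greedy pass like this is order-dependent and compares every name against every group, so it only suits small vendor lists.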

Forecasting Avocado Prices

An analysis of historical avocado price data that uses the open-source Prophet library to predict future prices.

PASSNYC: Data Science for Good

An analysis of NYC public schools that recommends how PASSNYC should distribute its services to improve diversity at the specialized high schools.

Skills
Programming: Python, C, Java
Data Analysis: Pandas, Plotly, Dash, Altair, ipywidgets, Superset, Voilà
Database/Big Data: SQL, NoSQL, Spark
Machine Learning: Scikit-learn, NLTK, TensorFlow
Workflow: Linux, Emacs, Org-mode, Git
Cloud: AWS, GCP
DevOps: Docker, Kubernetes, Terraform

Education
University of Georgia
Bachelor of Science, Computer Science, 2018
  • Cumulative GPA: 3.55 (Cum Laude)