Omar Baig
Data Scientist
Summary Profile

I am a curious, solution-oriented Data Scientist who supports end-to-end decision making. After earning a degree in Computer Science at the University of Georgia, I began applying my programming skills at Griffin & Strong. There, I work with expert lawyers and economists who specialize in Disparity Studies and automate their research methodology using Python. So far, I have built custom data pipelines to clean and analyze procurement data for three different governments. I am currently building a Natural Language Processing model for categorizing data, which reduces the need for tedious manual work. My focus is always on making sure any software or analysis platform is unit tested and can be deployed on any computer, locally and in the cloud. Please visit my website to see examples of my work.

Work Experience
Griffin & Strong P.C., Data Analyst
Nov. 2018 - Current
  • Performs multifaceted analysis on collected data from governmental organizations
  • Translates research methodology into software and adapts models to each organization's constraints
  • Isolates data gaps and anomalies and cleans data for reproducible analysis
  • Communicates with clients to collect data and present findings 
  • Leads efforts to solve time-consuming and costly problems through automation


Projects
Disparity Analysis

Completed disparity analyses of procurement data for the City of Chattanooga, Cuyahoga County, and Mecklenburg. The analyses are meant to serve as a factual predicate for proposed policy changes and inclusion programs. The work covered everything from collecting, cleaning, and analyzing the data to presenting the findings.

Data Profiler

A Python application that lets non-programmers take advantage of the open-source "pandas_profiling" library, which creates comprehensive HTML summaries of raw data files. Each summary includes data warnings, missing values, and correlations. The front end was built using PyQt and can run on any operating system.

Equity Lib

A Python library containing pre- and post-processing algorithms for wrangling common problem patterns in procurement data. Its primary purpose is resolving entities across joined data systems.

Forecasting Avocado Prices

An analysis of historical avocado price data that uses the open-source Prophet library to forecast future prices.

PASSNYC: Data Science for Good

An analysis of NYC public schools that recommends how PASSNYC should distribute its services to improve diversity at the specialized high schools.

Tools
Programming: Python, R, C++, Java
Data Analysis: Pandas, Plotly, Dash, Altair, Seaborn, Superset, Voila, ggplot2
Database/Big Data: SQL, NoSQL, Spark
Machine Learning: Scikit-learn, NLTK, TensorFlow
Workflow: Jupyter, Emacs, Org-mode, Git, Unix
Cloud: AWS, GCP
DevOps: Docker, Kubernetes, Terraform

Education
University of Georgia
Bachelor of Science, Computer Science, 2018
• Cumulative GPA: 3.55 (Cum Laude)