Omar Baig
Data Scientist
Summary Profile

I am a curious, solution-oriented Data Scientist seeking to support end-to-end decision making. After earning a degree in Computer Science from the University of Georgia, I began applying my programming skills at Griffin & Strong, where I work with expert lawyers and economists who specialize in Disparity Studies and automate their research methodology using Python. So far, I have built custom data pipelines to clean and analyze the procurement data of three different governments. The final analysis now serves as a legal factual predicate. Please visit my website to see examples of my work.

Work Experience
Griffin & Strong P.C., Data Scientist
May 2020 - Current

  • Leads efforts to solve time-consuming and costly problems using automation and machine learning
  • Builds ETL pipelines for reproducible data transformations
  • Standardizes operating practices for efficient and validated research
  • Mentors junior analysts in improving their analysis processes

Griffin & Strong P.C., Data Analyst
Nov. 2018 - May 2020
  • Performed multifaceted analyses of data collected from government organizations
  • Translated research methodology into software and adapted models to each organization's constraints
  • Isolated data gaps and anomalies and cleaned data for reproducible analysis
  • Communicated with clients to collect data and present findings

Projects
Disparity Analysis

Completed disparity analyses of procurement data from the City of Chattanooga, Cuyahoga County, and Mecklenburg. Each analysis is meant to serve as a factual predicate for proposed policy changes and inclusion programs, and the work covered everything from data collection to analysis. The findings were then presented to stakeholders within these organizations.

Data Profiler

A Python application that lets non-programmers take advantage of the open-source "pandas_profiling" library, which creates comprehensive HTML summaries of raw data files. Each summary includes data warnings, missing values, and correlations. The front end was built with PyQt and is cross-platform.

Equity Lib

A Python library containing pre- and post-processing algorithms for wrangling common problem patterns in Excel files. The primary problems it solves are entity resolution across joined data systems and traceability of rule-based decisions.

Forecasting Avocado Prices

An analysis of historical avocado price data that uses the open-source Prophet library to forecast future prices.

PASSNYC: Data Science for Good

An analysis of NYC public schools that recommends how PASSNYC should distribute its services to improve the diversity of the specialized high schools.

Tools
Programming: Python, R, C++, Java
Data Analysis: Pandas, Plotly, Dash, Altair, Seaborn, Superset, ggplot2
Database/Big Data: SQL, NoSQL, Spark
Machine Learning: Scikit-learn, NLTK, TensorFlow
Workflow: Bash, Git, Jupyter, Org-mode, Unix, Emacs, Kedro
Cloud: AWS, GCP
DevOps: Docker, Kubernetes, Terraform
Education
University of Georgia
Bachelor of Science in Computer Science, 2018
• Cumulative GPA: 3.55 (Cum Laude)