Data Science Course Objectives
1. Assess, structure, manage and implement large data sets derived from real-time sources.
1. Mogi Lounge Analysis - I perform an extensive analysis of a large-scale data set from the 13th season of Mogi Lounge, a competitive Mario Kart Wii bracket. I use Python libraries such as matplotlib, seaborn, and NumPy, all in a Jupyter notebook.

2. WWMogi (SIP Project)
2. Compile data from multiple sources, including data selection, data scrubbing and feature engineering, with an emphasis on tidy data and the tidyverse.
1. Mogi tabling in SQL - In this video, I demonstrate SQL queries on a large-scale database (the same database as in objective one). For this demonstration I use SQLite.

2. GDP to Life Expectancy Analysis
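As a rough illustration of the kind of query a tabling demonstration might run, the sketch below loads a few rows into an in-memory SQLite database and totals points per player. The table name `races`, its columns, and the sample rows are hypothetical placeholders, not the actual Mogi Lounge schema or data.

```python
import sqlite3

def total_points_per_player(rows):
    """Load (player, points) rows into an in-memory SQLite table and
    return (player, total_points) pairs, highest total first."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, chosen only for this illustration
        conn.execute("CREATE TABLE races (player TEXT, points INTEGER)")
        conn.executemany("INSERT INTO races VALUES (?, ?)", rows)
        query = """
            SELECT player, SUM(points) AS total
            FROM races
            GROUP BY player
            ORDER BY total DESC, player ASC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()

# Placeholder rows for illustration only
sample_rows = [("PlayerA", 15), ("PlayerB", 12), ("PlayerA", 10), ("PlayerB", 7)]
totals = total_points_per_player(sample_rows)
```

Using `:memory:` keeps the example self-contained; pointing `sqlite3.connect` at a database file would run the same query against stored data.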
3. Apply statistical tests and tools appropriately to analyze data sets drawn from different types of sources (nature, humankind, organizations, etc.), making inferences and projections from the data.
1. GDP to life expectancy analysis - In this project I demonstrate my spreadsheet skills (Google Sheets). I take a dataset from the WHO (World Health Organization) and find the correlation between GDP and life expectancy worldwide. I also use data from 2000 to 2015 in Vietnam to project Vietnam's life expectancy in 2025.
GitHub link:
Video Link:
2. Mogi tabling in SQL
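The two calculations named in the GDP to life expectancy project, a correlation and a projection from a fitted trend line, can be sketched in Python as follows. This is a restatement under assumptions, not the spreadsheet formulas used in the project: it assumes a Pearson correlation and an ordinary least-squares line, and every number below is a made-up placeholder rather than WHO data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Placeholder values for illustration only, NOT real WHO figures
years = [2000, 2005, 2010, 2015]
life_expectancy = [70.0, 71.0, 72.0, 73.0]

slope, intercept = fit_line(years, life_expectancy)
projection_2025 = slope * 2025 + intercept  # extend the fitted line to 2025
```

A straight-line extrapolation like this assumes the past trend continues unchanged, so the projected value is only as good as that assumption.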
4. Create visualizations of large data sets in ways that clarify understanding of their meaning and implications.
1. WWMogi (SIP Project) - This is a web scraper that calls an API, gets a JSON file, converts the file to a dataframe, selects important columns from that dataframe, and creates a new JSON file to be displayed in a GUI.
Current Product
Future Improvements



Video Link:
2. Mogi Lounge Analysis
Credits:
Preston Chapman for helping me with the API call functionality
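The JSON-in, JSON-out step of the WWMogi pipeline described above can be sketched roughly as follows. This is not the project's code: the API call is left out (`raw_json` stands in for the response body), a plain list of dictionaries is swapped in for the dataframe so the example needs only the standard library, and the field names `name`, `mmr`, `id`, and `extra` are hypothetical.

```python
import json

def select_columns(raw_json, columns):
    """Parse a JSON array of records and keep only the requested fields.

    Returns a new JSON string that a GUI layer could load and display.
    """
    records = json.loads(raw_json)
    # Fields missing from a record become None so every output row has the same keys
    trimmed = [{col: rec.get(col) for col in columns} for rec in records]
    return json.dumps(trimmed, indent=2)

# Placeholder response body with hypothetical fields
raw_json = (
    '[{"name": "PlayerA", "mmr": 5000, "id": 1, "extra": "x"},'
    ' {"name": "PlayerB", "mmr": 4200, "id": 2}]'
)
display_json = select_columns(raw_json, ["name", "mmr"])
```

With a dataframe library, the same selection is typically a single column-indexing step; the dictionary comprehension here plays that role.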
5. Design and implement big data, artificial intelligence, and statistical and visual analysis solutions that provide people and organizations with understanding, guidance and options drawn from the data.
1. Sentiment Analysis - In this project, I create a synthetic data set of laptop reviews and perform sentiment analysis with an NLP (Natural Language Processing) library in Python. I derive sentiments from the reviews and categorize them as "positive", "neutral", or "negative". I then visualize the categories and highlight some instructive negative reviews to give constructive feedback.
GitHub link:
Video Link:
2. Employee Retention Analysis - This project analyzes an AI-enhanced dataset of employees at a company. The dataset contains employee information that can be related to retention. I identify key features and report the turnover percentage. I then offer guidance on how to address the turnover issues based on the analysis.
GitHub link:
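The three-way categorization described in the Sentiment Analysis project above can be illustrated with a minimal sketch. The project itself uses an NLP library; here a tiny hand-written word list is swapped in purely to show how a numeric score maps to "positive", "neutral", or "negative". The word lists and the sample reviews are invented for this example.

```python
import re

# Hand-picked illustrative lexicon, a stand-in for a real NLP library's scoring
POSITIVE_WORDS = {"great", "fast", "love", "excellent", "good"}
NEGATIVE_WORDS = {"slow", "broken", "bad", "terrible", "poor"}

def sentiment_score(review):
    """Count positive words minus negative words in a review."""
    words = re.findall(r"[a-z']+", review.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives

def categorize(review):
    """Map the score to one of the three categories."""
    score = sentiment_score(review)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Synthetic reviews written for this example
reviews = [
    "Great screen and fast boot time",
    "The battery is terrible and the fan is broken",
    "It arrived on Tuesday",
]
labels = [categorize(review) for review in reviews]
```

A word-count lexicon ignores negation and context ("not good" scores as positive), which is one reason a dedicated NLP library is the better choice for real reviews.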
6. Demonstrate best practices regarding digital privacy and ethical use of personal information.
1. Hashing Lab - For this objective, the GitHub link will point to a hashing lab that I completed as part of a digital forensics class. I showcase verifying file integrity through hash preservation.
GitHub link:
2. File Encryption Lab - The GitHub link will take you to a file encryption lab I did for a network security class. In this document I showcase using Wireshark (a packet-sniffing tool) for HTTPS scanning and packet sniffing, as well as encrypting a file.
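The file-integrity check described in the Hashing Lab above comes down to recording a file's hash and comparing it against a fresh hash later. A minimal sketch follows, assuming SHA-256 as the algorithm (the lab's actual algorithm and tooling are not stated here); the function names are illustrative.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks so
    large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unchanged(path, recorded_hash):
    """True if the file's current hash matches the hash recorded earlier."""
    return sha256_of_file(path) == recorded_hash.lower()
```

In use, `sha256_of_file` would be called once when the file is collected, the result stored alongside it, and `is_unchanged` called whenever the file needs to be verified; any change to the contents produces a different digest.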










