i.am.ai AI Expert Roadmap

2022. 11. 19. 19:54, Reference/Tech: Technology

https://i.am.ai/roadmap/#note

 

AI Roadmap

Follow these roadmaps to become an Artificial Intelligence expert.


 

Required for any path

Git 

Semantic Versioning 

Keep a Changelog 

 

Fundamentals - Basics 

Matrices & Linear Algebra Fundamentals 

Database Basics

    - Relational vs non-relational databases 

    - SQL + Joins (Inner, Outer, Cross, Theta Join) 

    - NoSQL 
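
To make the join types listed under Database Basics concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (authors, books), their rows, and the function name run_joins are made up for illustration and are not part of the roadmap. The theta join is shown as a join on a non-equality condition.

```python
import sqlite3


def run_joins():
    """Run inner, left outer, cross and theta joins on two toy tables."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Hypothetical example schema and data.
    cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
    )
    cur.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace"), (3, "Alan")],
    )
    cur.executemany(
        "INSERT INTO books VALUES (?, ?, ?)",
        [(10, 1, "Notes"), (11, 2, "Compilers")],
    )

    # Inner join: only authors that have a matching book.
    inner = cur.execute(
        "SELECT a.name, b.title FROM authors a "
        "INNER JOIN books b ON b.author_id = a.id ORDER BY a.id"
    ).fetchall()

    # Left outer join: every author, with NULL (None) where no book matches.
    left = cur.execute(
        "SELECT a.name, b.title FROM authors a "
        "LEFT JOIN books b ON b.author_id = a.id ORDER BY a.id"
    ).fetchall()

    # Cross join: every author paired with every book.
    cross_count = cur.execute(
        "SELECT COUNT(*) FROM authors CROSS JOIN books"
    ).fetchone()[0]

    # Theta join: the join condition is an arbitrary comparison, here "<".
    theta = cur.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON a.id < b.author_id ORDER BY a.id, b.id"
    ).fetchall()

    conn.close()
    return inner, left, cross_count, theta


if __name__ == "__main__":
    for result in run_joins():
        print(result)
```

The left outer join keeps the author without a book, which is the usual reason to prefer it over an inner join when missing matches carry information.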

Tabular Data 

DataFrame & Series 

Extract, Transform, Load (ETL)

Reporting vs BI vs Analytics 

Data Formats 

    - JSON / XML / CSV 

Regular Expressions (RegEx) 

 

Python Programming 

Python Basics

    - Expressions 

    - Variables 

    - Data Structures 

    - Functions 

    - Installing packages (via pip, conda or similar) 

    - Code style, e.g. PEP 8 

Important libraries 

    - NumPy 

    - pandas 

Virtual Environments 

Jupyter Notebooks / Lab 

 

Data Sources 

Data Mining 

Web Scraping

Awesome Public Datasets 

Kaggle

 

Exploratory Data Analysis / Data Munging / Wrangling 

Principal Component Analysis (PCA) 

Dimensionality & Numerosity Reduction 

Normalization 

Data Scrubbing, Handling Missing Values 

Unbiased Estimators 

Binning sparse values

Feature Extraction 

Denoising 

Sampling 

 

Data Science Roadmap 

Statistics 

Probability Theory

    - Randomness, random variable and random sample 

    - Probability distribution

    - Conditional probability and Bayes' theorem 

    - (Statistical) Independence 

    - iid 

    - cdf, pdf, pmf 

       - Cumulative distribution function (cdf)  

       - Probability density function (pdf)  

       - Probability mass function (pmf) 

Continuous distributions (pdf's) 

    - Normal / Gaussian 

    - Uniform (continuous)  

    - Beta  

    - Dirichlet 

    - Exponential  

    - χ² (Chi-squared) 

Discrete distributions (pmf's) 

    - Uniform (discrete) 

    - Binomial 

    - Multinomial 

    - Hypergeometric 

    - Poisson 

    - Geometric 

Summary statistics 

    - Expectation and mean

    - Variance, standard deviation (sd) 

    - Covariance and correlation 

    - Median, quartile 

    - Interquartile range 

    - Percentile / Quantile 

    - Mode 

Important Laws 

    - Law of large numbers (LLN)

    - Central limit theorem (CLT) 
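
As a rough illustration of the two laws above, the following sketch simulates fair die rolls with Python's standard library. The sample sizes, the seed, and the function names are arbitrary choices for this example. The sample mean should drift toward the expected value 3.5 as n grows (LLN), and the means of many samples of size 30 should spread out with a standard deviation close to sigma / sqrt(30), roughly 0.31 (CLT).

```python
import random
import statistics


def sample_mean(n, rng):
    """Mean of n rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


def simulate(seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable

    # LLN: the sample mean gets closer to 3.5 as the sample grows.
    lln = {n: sample_mean(n, rng) for n in (10, 1_000, 100_000)}

    # CLT: means of many size-30 samples cluster around 3.5 with
    # standard deviation about sqrt(35 / 12) / sqrt(30).
    means = [sample_mean(30, rng) for _ in range(2_000)]
    return lln, statistics.mean(means), statistics.stdev(means)


if __name__ == "__main__":
    lln, mean_of_means, sd_of_means = simulate()
    for n, value in lln.items():
        print(f"n={n:>7}: sample mean = {value:.4f}")
    print(f"mean of sample means = {mean_of_means:.4f}")
    print(f"sd of sample means   = {sd_of_means:.4f}")
```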

Estimation 

    - Maximum Likelihood Estimation (MLE) 

    - Kernel Density Estimation (KDE) 

Hypothesis Testing 

   - p-Value 

   - Chi² test 

   - F-test 

   - t-test 

Confidence Interval (CI) 

Monte Carlo Method 

 

Visualization 

Chart Suggestions: A Thought-Starter

Streamlit 

Python 

    - Matplotlib 

    - plotnine (like ggplot in R) 

    - Bokeh 

    - seaborn 

    - ipyvolume (3D data) 

Web 

    - Vega-Lite 

    - D3.js 

Dashboards 

    - Dash 

BI 

    - Tableau 

    - Power BI

 

Machine Learning Roadmap 

General 

Concepts, Input & Attributes 

    - Categorical Variables 

    - Ordinal Variables 

    - Numerical Variables 

Cost functions and gradient descent 

Overfitting / Underfitting 

Training, validation and test data 

Precision vs Recall 

Bias & Variance 

Lift 
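
For the "Precision vs Recall" item above, a minimal sketch in plain Python may help. The function name precision_recall and the label lists are invented for this example, and labels are assumed to be binary with 1 as the positive class.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
    precision, recall = precision_recall(y_true, y_pred)
    print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found, so a model can score well on one and poorly on the other.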

 

Methods 

Supervised Learning 

    - Regression 

       - Linear Regression 

       - Poisson Regression 

    - Classification 

       - Classification Rate 

       - Decision Trees 

       - Logistic Regression 

       - Naive Bayes Classifiers 

       - K-Nearest Neighbour 

       - SVM 

       - Gaussian Mixture Models 

Unsupervised Learning 

    - Clustering 

       - Hierarchical Clustering

       - K-Means Clustering 

       - DBSCAN 

       - HDBSCAN 

       - Fuzzy C-Means 

       - Mean Shift 

       - Agglomerative 

       - OPTICS 

    - Association Rule Learning 

       - Apriori Algorithm 

       - ECLAT algorithm 

       - FP Trees 

    - Dimensionality Reduction 

       - Principal Component Analysis (PCA) 

       - Random Projection 

       - NMF 

       - t-SNE 

       - UMAP 

Ensemble Learning 

    - Boosting 

    - Bagging 

    - Stacking 

Reinforcement Learning 

    - Q-Learning 

 

Use Cases 

Sentiment Analysis 

Collaborative Filtering 

Tagging 

Prediction 

 

Tools 

Important libraries 

    - scikit-learn 

    - spaCy (NLP) 

 

Deep Learning Roadmap 

Papers 

Deep Learning Papers Reading Roadmap 

Papers with code 

Papers with code - state of the art 

 

Neural Networks 

Understanding Neural Networks 

Loss Functions 

Activation Functions 

Weight Initialization 

Vanishing / Exploding Gradient Problem 

 

Architectures 

Feedforward neural network 

Autoencoder 

Convolutional Neural Network (CNN) 

    - Pooling 

Recurrent Neural Network (RNN) 

    - LSTM 

    - GRU 

Transformer  

    - Encoder  

    - Decoder  

    - Attention 

Siamese Network 

Generative Adversarial Network (GAN) 

Evolving Architectures / NEAT 

Residual Connections 

 

Training 

Optimizers 

    - SGD 

    - Momentum 

    - Adam 

    - AdaGrad 

    - AdaDelta 

    - Nadam 

    - RMSProp 

Learning Rate Schedule 

Batch Normalization 

Batch Size Effects 

Regularization 

    - Early Stopping 

    - Dropout 

    - Parameter Penalties 

    - Data Augmentation 

    - Adversarial Training 

Multitask Learning 

Transfer Learning  

Curriculum Learning 

 

Tools 

Important Libraries 

    - Awesome Deep Learning 

    - Huggingface Transformers

TensorFlow 

PyTorch 

TensorBoard 

MLflow 

 

Model optimization (advanced) 

Distillation 

Quantization 

Neural Architecture Search (NAS) 

 

Data Engineer Roadmap 

Summary of Data Formats

Data Discovery 

Data Source & Acquisition 

Data Integration 

Data Fusion 

Transformation & Enrichment 

Data Survey 

OpenRefine 

How much Data 

Using ETL 

Data Lake vs Data Warehouse 

Dockerize your Python Application 

 

Big Data Engineer Roadmap 

Big Data Architectures 

Architectural Patterns & Best Practices (video) 

 

Principles 

Horizontal vs vertical scaling 

Map Reduce 

Data Replication 

Name & Data Nodes 

Job & Task Tracker 

 

Tools 

Check the Awesome Big Data List 

Hadoop (large data) 

    - HDFS 

    - Loading data with Sqoop and Pig

    - Storm: Hadoop Realtime 

Spark (in memory) 

RAPIDS (on GPU) 

Flume, Scribe: for unstructured data  

Data Warehouse with Hive 

Elastic (ELK) Stack 

Avro  

Flink 

Dask 

Numba 

ONNX 

OpenVINO 

MLflow 

Kafka & KSQL 

Databases 

    - Cassandra 

    - MongoDB, Neo4j

Scalability 

    - Zookeeper 

    - Kubernetes 

Cloud Services 

    - AWS SageMaker  

    - Google ML Engine 

    - Microsoft Azure Machine Learning Studio 

Awesome Production ML