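Exploring my career felt like pioneering uncharted territory, and I had started to wonder whether I needed mentoring.
Then, while listening to instructor Park Jae-ho's live session on a Python learning roadmap,
it struck me: what I need is a roadmap.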
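The roadmap I'm about to post was published by AMAI GmbH, a German company specializing in AI (headquartered in Karlsruhe, Baden-Württemberg).
According to the AI Expert Roadmap page, they originally made this chart to turn new employees into AI experts, then shared it to help the community. (If that's the story, I guess it's worth trusting and following..)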
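According to Park Jae-ho, unless you are in a position to lead and write research papers yourself, the engineer path is likely to have better prospects than the researcher path. Whether writing papers is easy or hard is beside the point: whether you even get the chance depends on whether your company does publication-oriented research at all. Still, I suspect you need to be at least at the PhD level to really lead research.
(Before that, I think the more realistic route is to build up solid experience in data engineering first and then go for a master's or PhD.)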
From here on, I'll fill this blog with posts on each topic in the roadmap.
(I'll build it up one piece at a time, so if you like this post, please grow along with me and be a little patient..!)
Becoming an AI Expert
What everyone needs to know, whichever path you take
Fundamentals
Basics
- Linear algebra basics
- Database basics
- Relational vs. Non-relational database
- SQL + Joins (Inner, Outer, Cross, Theta Join)
- NoSQL
- Tabular Data
- Data Frames & Series
- Extract, Transform, Load (ETL)
- Reporting vs BI vs Analytics
- Data Formats
- JSON
- XML
- CSV
- Regular Expressions (RegEx)
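To make the SQL join types in the Basics list a bit more concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The employees/departments tables and their rows are made up for illustration. It sticks to INNER, LEFT OUTER and CROSS joins, which SQLite has long supported (RIGHT and FULL OUTER joins only arrived in newer SQLite versions). A theta join is simply a join whose ON condition is a comparison other than equality.

```python
import sqlite3

# Hypothetical toy tables, created in memory so nothing touches disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
CREATE TABLE departments (id INTEGER PRIMARY KEY, dept_name TEXT);
INSERT INTO employees VALUES (1, 'Kim', 10), (2, 'Lee', 20), (3, 'Park', NULL);
INSERT INTO departments VALUES (10, 'Data'), (30, 'Infra');
""")

# INNER JOIN: only rows that match on both sides.
inner = cur.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT OUTER JOIN: every employee; None where no department matches.
left_outer = cur.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    LEFT OUTER JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# CROSS JOIN: Cartesian product, 3 employees x 2 departments = 6 rows.
cross = cur.execute("""
    SELECT e.name, d.dept_name FROM employees e CROSS JOIN departments d
""").fetchall()

# Theta join: the join condition is an arbitrary comparison (here "<").
theta = cur.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    JOIN departments d ON e.dept_id < d.id
    ORDER BY e.id, d.id
""").fetchall()

print(inner)       # [('Kim', 'Data')]
print(left_outer)  # [('Kim', 'Data'), ('Lee', None), ('Park', None)]
print(len(cross))  # 6
print(theta)       # [('Kim', 'Infra'), ('Lee', 'Infra')]
conn.close()
```

Note how Park, whose dept_id is NULL, drops out of the inner and theta joins: in SQL a comparison with NULL is never true.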
Python Programming
- Python Basics
- Expressions
- Variables
- Data Structures
- Functions
- Install packages (via pip, conda, or similar)
- Codestyle, e.g. PEP8
- Important Libraries
- Numpy
- Pandas
- Virtual Environments
- Jupyter Notebooks / Lab
Data Sources
- Data Mining
- Web Scraping
- Awesome Public Datasets
- Kaggle
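Web scraping usually boils down to downloading HTML and pulling structured pieces out of it. Below is a minimal sketch with the standard library's html.parser. The page is a hard-coded string so the example runs offline, and LinkCollector is a hypothetical helper class, not part of any library. For real pages you would fetch the HTML first (for example with urllib.request) and check the site's robots.txt and terms of use. Third-party libraries such as Beautiful Soup make the parsing step more convenient.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A hard-coded page stands in for a downloaded one.
html = """
<html><body>
  <h1>Datasets</h1>
  <a href="https://www.kaggle.com/datasets">Kaggle</a>
  <a href="https://github.com/awesomedata/awesome-public-datasets">Awesome Public Datasets</a>
  <a name="no-href">anchor without a link</a>
</body></html>
"""

parser = LinkCollector()
parser.feed(html)
print(parser.links)
```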
Exploratory Data Analysis (EDA) / Data Munging / Wrangling
- Principal Component Analysis (PCA)
- Dimensionality & Numerosity Reduction
- Normalization
- Data Scrubbing, Handling Missing Values
- Unbiased Estimators
- Binning sparse values
- Feature Extraction
- Denoising
- Sampling
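Two of the EDA items above, handling missing values and normalization, fit in a few lines of plain Python. This sketch assumes a single numeric column in which None marks a missing entry. It fills the gaps with the mean of the observed values, then applies min-max scaling and z-score standardization.

```python
from statistics import mean, pstdev

# Hypothetical feature column; None marks a missing value.
raw = [12.0, None, 15.0, 9.0, None, 14.0]

# 1) Missing values: mean imputation over the observed entries.
observed = [x for x in raw if x is not None]
fill = mean(observed)
imputed = [fill if x is None else x for x in raw]

# 2) Min-max normalization to the [0, 1] range.
lo, hi = min(imputed), max(imputed)
min_max = [(x - lo) / (hi - lo) for x in imputed]

# 3) Z-score standardization: zero mean, unit population standard deviation.
mu, sigma = mean(imputed), pstdev(imputed)
z_scores = [(x - mu) / sigma for x in imputed]

print(imputed)  # [12.0, 12.5, 15.0, 9.0, 12.5, 14.0]
print(min_max)
print(z_scores)
```

Mean imputation is only one option: it is simple, but it shrinks the column's variance, so median imputation or dropping rows can be the better choice depending on the data. PCA, also on the list, is usually done with NumPy or scikit-learn rather than by hand.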
Two paths from here
- Data Scientist
- Machine Learning
- Deep Learning
- Data Engineer
- Big Data Engineer
Data Scientist
Statistics
- Probability Theory
- Continuous distributions
- Discrete distributions
- Summary statistics
- Important Laws
- Estimation
- Hypothesis Testing
- Confidence Interval (CI)
- Monte Carlo Method
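The Monte Carlo method and confidence intervals from the Statistics list can be combined in one small experiment: estimate π by sampling random points in the unit square, then wrap the estimate in a 95% confidence interval based on the normal approximation. The function name estimate_pi, the sample size and the seed are arbitrary choices for this sketch.

```python
import random
from statistics import NormalDist, mean, stdev


def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi with a normal-approximation 95% CI."""
    rng = random.Random(seed)
    # A uniform point in the unit square lands inside the quarter circle
    # of radius 1 with probability pi / 4.
    hits = [1.0 if rng.random() ** 2 + rng.random() ** 2 <= 1.0 else 0.0
            for _ in range(n_samples)]
    # 4 * hit is an unbiased estimator of pi; average it over all samples.
    samples = [4.0 * h for h in hits]
    est = mean(samples)
    # Standard error of the mean, then a z-based 95% interval.
    se = stdev(samples) / n_samples ** 0.5
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    return est, (est - z * se, est + z * se)


est, (low, high) = estimate_pi(100_000)
print(f"pi ≈ {est:.4f}, 95% CI [{low:.4f}, {high:.4f}]")
```

The interval narrows in proportion to 1/sqrt(n), so halving its width costs four times as many samples, a typical trade-off of Monte Carlo estimates.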
Visualization
- Chart Suggestions thought starter
- Python
- Web
- Dashboards
- BI
Machine Learning
- General
- Concepts, Inputs & Attributes
- Cost functions and gradient descent
- Overfitting / Underfitting
- Training, validation and test data
- Precision vs Recall
- Bias vs Variance
- Lift
- Methods
- Supervised Learning
- Unsupervised Learning
- Ensemble Learning
- Reinforcement Learning
- Use Cases
- Sentiment Analysis
- Collaborative Filtering
- Tagging
- Prediction
- Tools
- Important Libraries
- scikit-learn
- spaCy (NLP)
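Cost functions and gradient descent sit at the top of the Machine Learning list. The sketch below fits a one-variable linear model y ≈ w*x + b with batch gradient descent on the mean squared error J(w, b) = (1/n) * sum((w*x_i + b - y_i)^2). The learning rate, epoch count and toy data are illustrative assumptions. The data is noise-free, so the optimum (w = 2, b = 1) is known in advance.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of J(w, b) = (1/n) * sum(err^2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the model w*x + b."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3), mse(xs, ys, w, b))
```

In practice scikit-learn's LinearRegression solves this directly. Writing the loop by hand is mainly useful for seeing how the learning rate interacts with the gradient: on this data, lr=0.2 is already too large and the updates diverge.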
Deep Learning
- Papers
- Neural Networks
- Understanding Neural Networks
- Loss Functions
- Activation Functions
- Weight Initialization
- Vanishing / Exploding Gradient Problem
- Architectures
- Feedforward neural network
- Autoencoder
- Convolutional Neural Network (CNN)
- Pooling
- Recurrent Neural Network (RNN)
- LSTM
- GRU
- Transformer
- Encoder
- Decoder
- Attention
- Siamese Network
- Generative Adversarial Network (GAN)
- Evolving Architectures / NEAT
- Residual Connections
- Training
- Optimizers
- SGD, Momentum, Adam, AdaGrad, AdaDelta, Nadam, RMSProp
- Learning Rate Schedule
- Batch Normalization
- Batch Size Effects
- Regularization
- Early Stopping, Dropout, Parameter Penalties, Data Augmentation, Adversarial Training
- Multitask Learning
- Transfer Learning
- Curriculum Learning
- Tools
- Important Libraries
- Tensorflow
- PyTorch
- Pytorch Lightning
- TensorBoard / Weights & Biases
- MLFlow
- Model Optimization (advanced)
- Distillation
- Quantization
- Neural Architecture Search (NAS)
- Keep exploring and stay up-to-date (expert~!!)
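Several Deep Learning items (activation functions, loss functions, the vanishing gradient problem) can be illustrated without a framework. The sketch below implements a numerically stable softmax, a cross-entropy loss computed via log-sum-exp, and the sigmoid derivative. It then multiplies that derivative across ten layers to show how quickly gradients shrink. It is plain Python for readability; TensorFlow and PyTorch ship their own optimized versions of these functions.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy loss -log p(target) for one example, via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([2.0, 1.0, 0.1], target=0)

# Backprop through a chain of sigmoid layers multiplies their local
# gradients. Even in the best case (z = 0, all weights equal to 1) the
# product shrinks as 0.25 ** depth: one view of the vanishing gradient problem.
depth = 10
chain = math.prod(sigmoid_grad(0.0) for _ in range(depth))

print([round(p, 3) for p in probs])
print(round(loss, 3))
print(chain)
```

Because the sigmoid derivative never exceeds 0.25, stacking sigmoid layers tends to shrink gradients geometrically. This is one reason ReLU-family activations, careful weight initialization and residual connections, all on the roadmap, became standard.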
Data Engineer
- Summary of Data Formats
- Data Discovery
- Data Source & Acquisition
- Data Integration
- Data Fusion
- Transformation & Enrichment
- Data Survey
- OpenRefine
- How much Data
- Using ETL
- Data Lake vs Data Warehouse
- Dockerize your Python Application
- Keep exploring and stay up-to-date (expert~!!)
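ETL shows up both in the basics and in the data engineer track. Here is a minimal extract-transform-load sketch using only the standard library: extract rows from CSV text, clean and type them, then load them into a SQLite table that stands in for a warehouse. The column names, cleaning rules and sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Extract source: a CSV string stands in for a file or API export (hypothetical data).
RAW_CSV = """order_id,customer,amount,currency
1,kim,12.50,EUR
2,lee,,EUR
3,park,8.00,usd
"""


def extract(text):
    """Read CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without an amount, cast types, normalize names and currency codes."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue  # data scrubbing: skip incomplete records
        out.append((int(row["order_id"]), row["customer"].title(),
                    float(row["amount"]), row["currency"].upper()))
    return out


def load(records, conn):
    """Write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for a warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Real pipelines add scheduling, logging and incremental loads on top of this, but splitting the work into three functions keeps each stage testable on its own.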
Big Data Engineer
- Architectural Patterns & Best Practices (video)
- Principles
- Horizontal vs. vertical scaling
- MapReduce
- Data Replication
- Name & Data Nodes
- Job & Task Tracker
- Tools
- Check the Awesome Big Data List
- Hadoop (large data)
- HDFS
- Loading data with Sqoop and Pig
- Storm: Hadoop Realtime
- Spark (in memory)
- RAPIDS (on GPU)
- Flume, Scribe: For Unstructured Data
- Data Warehouse with Hive
- Elastic (ELK) Stack - to get data (e.g. logging), search, analyze and visualize it in realtime
- Avro
- Flink
- Dask
- Numba
- ONNX
- OpenVINO
- MLFlow
- Kafka & KSQL
- Databases
- Cassandra
- MongoDB, Neo4j
- Scalability
- ZooKeeper
- Kubernetes
- Cloud Services
- AWS SageMaker
- Google ML Engine
- Microsoft Azure Machine Learning Studio
- Awesome Production ML
- Keep exploring and stay up-to-date (expert~!!)
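MapReduce is listed under the big data principles. The classic word-count example can be simulated in a single Python process to show the three phases: map emits key/value pairs per input split, shuffle groups the values by key, and reduce aggregates each group. This only mimics the data flow. Hadoop or Spark would run the mappers and reducers on different nodes and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: combine the values for each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


# Each string plays the role of one input split handled by a separate mapper.
splits = ["Spark and Hadoop", "Hadoop MapReduce", "spark streaming"]

mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(sorted(counts.items()))
# [('and', 1), ('hadoop', 2), ('mapreduce', 1), ('spark', 2), ('streaming', 1)]
```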
Let's do this..!
Reference
AI Roadmap (i.am.ai): Follow these roadmaps to become an Artificial Intelligence expert.
GitHub - AMAI-GmbH/AI-Expert-Roadmap: Roadmap to becoming an Artificial Intelligence Expert in 2022
https://github.com/AMAI-GmbH/AI-Expert-Roadmap