[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]
@mdancho84
"🚨NEW: Awesome Generative AI Data Scientist GitHub Repo LangGraph Ecosystem: X. Prebuilt Agents X. AI Data Science Agents X. LangMem X. LangGraph Supervisor X. Open Deep Research X. LangGraph Reflection X. LangGraph Big Tool X. LangGraph CodeAct X. LangGraph Swarm XX. LangGraph MCP Adapters" @mdancho84 on X 2025-07-23 15:27:25 UTC 82K followers, 7091 engagements
"- Tree-Based Methods - SVMs - Deep Learning - Survival Analysis - Clustering & PCA Let's unpack a few of my favorites:" @mdancho84 on X 2025-07-25 15:35:30 UTC 82K followers, XXX engagements
"6. Clustering K-means and Hierarchical Clustering give way to unsupervised learning techniques that can be useful for uncovering groups in business data" @mdancho84 on X 2025-07-25 15:35:44 UTC 82K followers, XXX engagements
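To make the clustering post above concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The customer data, the `kmeans` helper, and the choice to seed centroids with the first `k` points are all assumptions made for illustration, not something taken from the post.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of (x, y) tuples. Returns (centroids, labels)."""
    # Assumption: seed with the first k points so the run is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels

# Hypothetical business data: (orders per month, average basket in dollars).
customers = [(1, 20), (9, 95), (2, 25), (10, 100), (1, 22), (8, 90)]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

In practice the features would usually be scaled first, since K-means relies on Euclidean distance and the larger-valued column would otherwise dominate.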
"1. Uber has recently released QueryGPT inside of their SQL operations: QueryGPT uses large language models (LLMs), vector databases, and similarity search to generate complex queries from English questions that are provided by the user as input" @mdancho84 on X 2025-07-25 11:18:42 UTC 82K followers, XXX engagements
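The similarity-search step mentioned in the post above can be sketched roughly as follows. This is not Uber's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `EXAMPLES` list stands in for a vector database, and the table names and SQL are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# Hypothetical store of (English question, SQL) pairs.
EXAMPLES = [
    ("how many trips were completed yesterday",
     "SELECT COUNT(*) FROM trips WHERE status = 'completed'"),
    ("average fare by city",
     "SELECT city, AVG(fare) FROM trips GROUP BY city"),
]

def retrieve(question, top_k=1):
    """Return the stored examples most similar to the question."""
    q = embed(question)
    ranked = sorted(EXAMPLES, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:top_k]

def build_prompt(question):
    """Assemble a few-shot prompt that an LLM could complete with SQL."""
    shots = "\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in retrieve(question))
    return f"{shots}\nQ: {question}\nSQL:"

print(build_prompt("what is the average fare in each city"))
```

The retrieved pairs act as few-shot examples, so the model sees SQL that already matches the shape of the user's question before it writes its own.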
"8. Loss Functions: Distributions will come up in Loss Functions in Machine Learning (e.g. XGBoost, LightGBM, CatBoost). Selecting the right Loss Function can often improve performance. Examples: - Poisson is used for count data. - Tweedie for mixed continuous data with many zeros, like intermittent demand forecasting problems" @mdancho84 on X 2025-07-26 15:41:39 UTC 82K followers, XXX engagements
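As a rough illustration of the two losses named in the post above, the sketch below computes the standard unit deviances for the Poisson and Tweedie distributions in plain Python. The function names, the Tweedie power of 1.5, and the demand and forecast numbers are assumptions chosen for the example. Gradient-boosting libraries expose comparable objectives, but their exact parameter names are not shown here.

```python
import math

def poisson_deviance(y, mu):
    """Unit Poisson deviance. y * log(y / mu) is taken as 0 when y == 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - y + mu)

def tweedie_deviance(y, mu, power=1.5):
    """Unit Tweedie deviance for 1 < power < 2 (compound Poisson-gamma)."""
    p = power
    return 2.0 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )

def mean_loss(loss, actual, predicted):
    """Average a unit loss over paired observations and predictions."""
    return sum(loss(y, mu) for y, mu in zip(actual, predicted)) / len(actual)

# Hypothetical intermittent demand: mostly zeros with occasional positive values.
demand = [0, 0, 3, 0, 5]
forecast = [0.5, 0.5, 2.5, 0.5, 4.0]

print(mean_loss(poisson_deviance, demand, forecast))
print(mean_loss(tweedie_deviance, demand, forecast))
```

Both deviances are zero when the prediction equals the observation and grow as they diverge. Predictions must be strictly positive, which is why models trained with these losses typically use a log link.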
"🚨NEW: Awesome Generative AI Data Scientist GitHub Repo Data Science and Analytics AI Agents: X. AI Data Science Team X. PandasAI X. Microsoft Data Formulator X. Jupyter Agent X. Jupyter AI X. WrenAI X. Google GenAI Toolbox X. Vanna AI" @mdancho84 on X 2025-07-24 11:30:26 UTC 82K followers, 4356 engagements
"9. PROBLEM: EVERY DATA SCIENTIST NEEDS TO LEARN AI IN 2025 XX% of them are overlooking AI. This is a massive opportunity for you. I'd like to help" @mdancho84 on X 2025-07-26 15:41:40 UTC 82K followers, XXX engagements
"8. Evaluation: Each PCA component accounts for a certain amount of the total variance in a dataset. The cumulative proportion of variance explained is just the cumulative sum of each PCA's variance explained. Often this is plotted on a Scree plot with Top N PCA components" @mdancho84 on X 2025-07-26 11:39:40 UTC 82K followers, XXX engagements
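The cumulative-sum calculation described in the post above can be sketched in a few lines. The component variances below are made-up values, and `components_needed` is a hypothetical helper for reading the result the way one would read a scree plot.

```python
from itertools import accumulate

def cumulative_variance_explained(variances):
    """Turn per-component variances into cumulative proportions of the total."""
    total = sum(variances)
    ratios = [v / total for v in variances]
    return list(accumulate(ratios))

def components_needed(variances, threshold):
    """Smallest number of leading components whose cumulative share meets the threshold."""
    for n, cum in enumerate(cumulative_variance_explained(variances), start=1):
        if cum >= threshold:
            return n
    return len(variances)

# Hypothetical variances of four principal components, largest first.
component_variances = [4.0, 2.0, 1.0, 1.0]

print(cumulative_variance_explained(component_variances))
print(components_needed(component_variances, 0.80))
```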
"My AI data science stack: X. Python ($0) X. Pandas ($0) X. Scikit Learn ($0) X. LangChain ($0) X. LangGraph ($0) X. OpenAI API ($1/month) You can grow into a $200000 career by spending less than $XX this year" @mdancho84 on X 2025-07-21 11:31:29 UTC 82K followers, 26.7K engagements
"4. Probability Mass Function (Discrete): Discrete distributions are described by a probability mass function which gives the probability that a discrete random variable is exactly equal to some value. In a graph a discrete distribution is often represented by a series of bars where each bar represents the probability of each discrete outcome" @mdancho84 on X 2025-07-26 15:41:34 UTC 82K followers, XXX engagements
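A small sketch of a probability mass function, using the binomial distribution as an assumed example (the post above does not name a specific distribution). The text bars mimic the bar-chart view it describes.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Assumed example: number of heads in 4 fair coin flips.
n, p = 4, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# One bar per discrete outcome; bar length is proportional to its probability.
for k, prob in enumerate(pmf):
    print(f"{k} heads: {'#' * round(prob * 40)} {prob:.4f}")
```

Because every outcome is listed, the bar heights sum to 1.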
"2. Easy to Install pip install sqlite-vec" @mdancho84 on X 2025-07-24 15:29:36 UTC 82K followers, XXX engagements
"EVERY DATA SCIENTIST NEEDS TO LEARN AI IN 2025. I want to help. This is how:" @mdancho84 on X 2025-07-25 15:35:45 UTC 82K followers, XXX engagements
"Every data scientist needs to learn AI in 2025. I want to help. This is how:" @mdancho84 on X 2025-07-25 11:18:47 UTC 82K followers, XXX engagements
"2. Operational Impact: Uber's data platform handles 1,200,000 queries each month. Before QueryGPT, Uber's Operations (3,000 employees) spent 10+ minutes per SQL query. After QueryGPT, average query is X minutes with fewer errors. 3X speedup" @mdancho84 on X 2025-07-25 11:18:44 UTC 82K followers, XXX engagements
"Data scientists are out. The Generative AI Data Scientist is in. Let me explain:" @mdancho84 on X 2025-07-20 11:26:25 UTC 82K followers, 15.9K engagements
"EVERY DATA SCIENTIST NEEDS TO LEARN AI IN 2025. XX% of data scientists are overlooking AI. I want to help. This is how:" @mdancho84 on X 2025-07-26 11:39:41 UTC 82K followers, XX engagements
"🚨Uber launches QueryGPT Natural Language to SQL Using Generative AI" @mdancho84 on X 2025-07-25 11:18:41 UTC 82K followers, 9370 engagements
"I'm excited to introduce my AI Exploratory Data Analysis (EDA) Agent that built EDA reports, performs correlation analysis, and missing data analysis in XX seconds. Today I'll share with you how to automate creating EDA reports with the AI EDA Agent, which is available on GitHub. We'll create an EDA Agent focusing on a Customer Churn Problem. I'll guide you through setting up the EDA Agent, creating EDA reports, and analyzing the findings. This AI agent is a huge time-saver. Table of Contents: 00:00 Introduction to EDA Tools Agent 01:20 Get the AI Data Science Team 04:49 Create the EDA Tools Agent" @mdancho84 on X 2025-07-20 18:13:10 UTC 82K followers, 9351 engagements
"2. How It Works Most existing vector databases store and query just the embeddings and their metadata. The actual data is stored elsewhere requiring you to manage its storage and versioning separately" @mdancho84 on X 2025-07-23 11:33:30 UTC 82K followers, XX engagements
"LanceDB supports storage of the actual data itself alongside the embeddings and metadata. You can persist your images, videos, text documents, audio files, and more in the Lance format, which provides automatic data versioning and blazing fast retrievals and filtering via LanceDB" @mdancho84 on X 2025-07-23 11:33:31 UTC 82K followers, XX engagements
"5. Probability Density Function (Continuous): Continuous distributions are described by a probability density function. The probability of the variable falling within a particular range is given by the area under the curve of the PDF within that range. In a graph a continuous distribution is usually represented by a smooth curve" @mdancho84 on X 2025-07-26 15:41:36 UTC 82K followers, XXX engagements
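The "area under the curve" idea in the post above can be checked numerically. The sketch below assumes a standard normal distribution (the post does not name one) and integrates its density with a simple trapezoid rule, then compares the result with the closed form based on `math.erf`.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def area_under_pdf(pdf, low, high, steps=1000):
    """Trapezoid-rule approximation of P(low <= X <= high)."""
    width = (high - low) / steps
    total = 0.5 * (pdf(low) + pdf(high))
    for i in range(1, steps):
        total += pdf(low + i * width)
    return total * width

# Probability that a standard normal variable falls within one standard deviation.
prob = area_under_pdf(normal_pdf, -1.0, 1.0)
exact = math.erf(1 / math.sqrt(2))
print(f"numeric: {prob:.4f}  closed form: {exact:.4f}")
```

Note that the density at a single point is not itself a probability. Only the area over a range is.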
"2. Linear Regression Linear regression is a foundational method. The starting point for statistical learning. ISLR helped me gain a deep understanding, which had compound effects when learning more complex models like GLMs, SVMs, and even Tree Based methods" @mdancho84 on X 2025-07-25 15:35:33 UTC 82K followers, XXX engagements
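For readers who want to see the method the post above refers to, here is a minimal sketch of simple (one-feature) linear regression fitted by ordinary least squares. The data and the `fit_line` helper are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The same idea of choosing coefficients that minimize a loss carries over to the more complex models the post lists.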
"Principal Component Analysis (PCA) is the gold standard in dimensionality reduction. But almost every beginner struggles understanding how it works (and why to use it). In X minutes I'll demolish your confusion:" @mdancho84 on X 2025-07-26 11:39:25 UTC 82K followers, 7176 engagements
"1. SQLite Vec sqlite-vec is an extremely small "fast enough" vector search SQLite extension that runs anywhere" @mdancho84 on X 2025-07-24 15:29:34 UTC 82K followers, XXX engagements
"I'm excited to introduce my AI Machine Learning Agent that built XX ML models in XX seconds. Today I'll share with you how to automate building 100s of ML models with the AI ML Agent, which is available on GitHub. We'll create an ML Agent focusing on a Customer Churn Problem. I'll guide you through setting up the ML Agent, creating dozens of ML models, and loading the best model for production. This AI is a huge time-saver. Table of Contents: 00:00 Introduction to my AI Data Science Team 02:56 Setting Up AI Data Science Team 04:48 Running the ML Agent Code 07:12 Create (and Run) the AI Machine" @mdancho84 on X 2025-07-21 16:10:18 UTC 82K followers, 12.9K engagements
"K-means is an essential algorithm for Data Science. But it's confusing for beginners. Let me demolish your confusion:" @mdancho84 on X 2025-07-17 15:29:23 UTC 81.9K followers, 34.6K engagements