so many options for processing data SQL, Pandas, Spark... #Shorts #itsallykrinsky
Added: 2026-05-16

548 views • 111:59 • itsallykrinsky • Original Release: 2026-05-11

When selecting a data pre-processing tool, the primary decision factor is data location and size: SQL is best for database or data warehouse data, Pandas works well for smaller datasets that fit in memory for easy column transformations and feature engineering, while Spark or other big data processing tools are necessary for large datasets that exceed memory capacity and cannot fit into Pandas dataframes.
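The rule of thumb in this summary can be written down as a small decision helper. This is only a sketch: the function name `choose_tool`, its parameters, and the idea of a fixed memory budget are hypothetical and do not come from the video.

```python
from typing import Literal

Tool = Literal["sql", "pandas", "spark"]


def choose_tool(in_database: bool, size_bytes: int, memory_budget_bytes: int) -> Tool:
    """Hypothetical helper that mirrors the summary's rule of thumb.

    Assumption: size_bytes is an estimate of the in-memory size of the data,
    which is often larger than the size of the file on disk.
    """
    if in_database:
        # Data already lives in a database or warehouse: transform it there.
        return "sql"
    if size_bytes <= memory_budget_bytes:
        # Small enough to load fully into memory: Pandas is convenient.
        return "pandas"
    # Too large for memory: use Spark or another big data processing tool.
    return "spark"


# Example: a 200 MiB extract on a machine with an 8 GiB memory budget.
print(choose_tool(in_database=False, size_bytes=200 * 1024**2, memory_budget_bytes=8 * 1024**3))
```

Real decisions involve more than these two inputs (team skills, cost, existing pipelines), so treat the helper as a way to make the two questions explicit rather than as a complete policy.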

[00:00:00]Should you use SQL or Python to pre-process your data? As a data scientist, you're constantly manipulating data and there are so so many options that you can use to actually achieve these transformations.

[00:00:10]So, let's talk about why you may use certain technologies or tools over others. SQL and Python are the two most common languages you'll find as a data scientist or really anyone working with data in general. That is because they're both very very capable and ideal for manipulating data of different varieties. So, how would you decide which one to use or when to use something completely different? The first question you should probably ask yourself is where is my data located?

[00:00:34]Where is it coming from? Is it living in a CSV file? Is it living in a database?

[00:00:39]In a data warehouse? In a data lake? A lot of data these days is stored in the cloud and it is stored in some sort of database or data lake. You probably are manipulating this data so that it can be training data for a machine learning model. And these models oftentimes require very large data sets. So, oftentimes you'll have to use SQL or another big data processing tool to manipulate these data sets as they're not going to fit into a Pandas DataFrame. Pandas can be a great option for a lot of data pre-processing, but it does have a limitation: the data has to be stored in memory. So, you'll need to be able to load that entire data set into the memory of whatever you're working with, whether that's your local computer, your virtual machine, or some other container. You will need to be able to load the entire data set. If you can't load the entire data set, Pandas is not an option for you.
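One way to read the point about SQL is that the heavy transformation runs where the data lives, and only a small result comes back to Python. The sketch below uses an in-memory SQLite database from the standard library as a stand-in for a real database or warehouse; the `events` table and its columns are made up for illustration.

```python
import sqlite3

# Stand-in for a real database or data warehouse connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# The aggregation runs inside the database; Python only receives one row per user.
rows = conn.execute(
    "SELECT user_id, COUNT(*) AS n_events, SUM(amount) AS total_amount "
    "FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()
conn.close()

print(rows)  # [(1, 2, 15.0), (2, 1, 7.5)]
```

With a real warehouse the table could hold billions of rows while the aggregated result stays small enough to load into a Pandas DataFrame afterwards.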

[00:01:27]Now, Pandas is a great option because it's so easy to work with. So, if your data does fit the size requirements, you might want to consider using Pandas because you can use really specific column transformations and you can apply a lot of them on text, for example. You can do a lot of your feature engineering, but for larger data sets, you are going to have to use a big data processing tool. If you're interested in topics like this or other data science topics, make sure you check out my data science guide of how to become a data scientist. The link is in my bio. Feel free to go check that out. Follow on for more content like this. Bye.
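As a rough illustration of the column transformations on text mentioned here, the following sketch derives a few simple features from a text column with Pandas string methods. The `review` column and the feature names are invented for the example and are not from the video.

```python
import pandas as pd

# Tiny made-up data set that easily fits in memory.
df = pd.DataFrame(
    {"review": ["Great product!", "bad", "Would buy again, great value"]}
)

# Normalize the text, then derive simple features column by column.
df["review_clean"] = df["review"].str.lower().str.strip()
df["n_chars"] = df["review_clean"].str.len()
df["n_words"] = df["review_clean"].str.split().str.len()
df["mentions_great"] = df["review_clean"].str.contains("great", regex=False)

# Approximate in-memory footprint in bytes, useful when judging whether data fits.
print(df.memory_usage(deep=True).sum())
print(df[["n_words", "mentions_great"]])
```

Each transformation is one readable line, which is the ease of use the speaker is pointing at; the same logic on data that does not fit in memory would need a tool such as Spark or SQL instead.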
