
The pandas apply() function applies a callable to each column (default) or each row of a DataFrame, or to each element of a Series. It is the standard tool for running custom logic that cannot be expressed as a vectorized pandas or NumPy operation. This tutorial covers the complete DataFrame.apply() parameter set, row-wise and column-wise usage, argument passing with args= and **kwargs, output shape control with result_type, element-wise application with DataFrame.map() (the pandas 2.1+ replacement for applymap()), and a structured comparison of apply() against map(), transform(), and vectorized operations.
- apply() applies a function along axis=0 (column-wise) or axis=1 (row-wise) on a DataFrame, or element-wise on a Series.
- Use axis=1 to access values from two or more columns within a single row in one function call.
- Pass extra arguments to the function with args= for positional parameters or **kwargs for keyword parameters.
- Use result_type='expand' to split a returned list or tuple into separate output columns.
- applymap() was deprecated in pandas 2.1 and replaced by DataFrame.map() for element-wise operations.
- The by_row parameter (pandas 2.1+) only has an effect when func is a list-like or dict-like of functions (and not a string). The default 'compat' first tries to translate each function into an equivalent pandas method; False passes the whole Series to each function.
- apply() iterates under the hood and is significantly slower than vectorized operations; use it only when the logic requires Python-level control flow.

This tutorial requires pandas 2.1 or later (needed for the DataFrame.map() coverage) and NumPy. Install or upgrade with:
pip install --upgrade pandas numpy
All output blocks in this tutorial were generated with pandas 3.0.2. On pandas 2.x, string Series display as dtype: object rather than dtype: str.
Verify installed versions:
import pandas as pd
import numpy as np
print(pd.__version__)
print(np.__version__)
Most examples in this tutorial use the following DataFrame:
# Sample DataFrame used throughout this tutorial
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'score': [85, 92, 78],
    'grade': ['B', 'A', 'C'],
    'bonus': [5.0, 8.0, 3.0]
})
print(df)
Output
    name  score grade  bonus
0  Alice     85     B    5.0
1    Bob     92     A    8.0
2  Carol     78     C    3.0
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', **kwargs)
| Parameter | Type | Default | Description |
|---|---|---|---|
| func | callable | required | Function applied to each column or row |
| axis | int or str | 0 | 0 or 'index' applies column-wise; 1 or 'columns' applies row-wise |
| raw | bool | False | When True, passes a raw NumPy array instead of a Series. Useful when func uses only NumPy operations; skipping the Series wrapper reduces overhead and can improve performance. |
| result_type | str or None | None | Controls output shape: 'expand', 'reduce', or 'broadcast' |
| args | tuple | () | Positional arguments forwarded to func after the Series |
| by_row | str or bool | 'compat' | Added in pandas 2.1. Only has an effect when func is a list-like or dict-like of functions (not a string). Default 'compat' first tries to translate each function into an equivalent pandas method; False passes the whole Series to each function. |
| **kwargs | dict | | Keyword arguments forwarded to func |
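The raw parameter is easiest to understand by checking what the function actually receives. The following is a minimal sketch, assuming a small numeric frame named scores and a helper named value_range (both hypothetical names, not part of the tutorial's dataset); it records the type of each argument and computes the spread of every column.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame mirroring the tutorial's score and bonus columns
scores = pd.DataFrame({'score': [85, 92, 78], 'bonus': [5.0, 8.0, 3.0]})

seen_types = []

def value_range(values):
    # With raw=True, `values` arrives as a plain NumPy array, not a Series
    seen_types.append(type(values))
    return values.max() - values.min()

ranges = scores.apply(value_range, raw=True)
print(ranges)
print(seen_types)
```

Because the function only calls ndarray methods, it works the same with raw=False; raw=True simply skips building a Series for each column.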
With the default axis=0, apply() passes each column to func as a Series object. With axis=1, it passes each row as a Series object, with column names as the index labels. Inside the function, you access individual values by label: row['column_name'].
Warning: Functions that mutate the Series passed to them are explicitly unsupported and can produce unexpected behavior or errors. Do not modify row or col in place inside the function. Return a new value instead.
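One way to follow this warning is to copy the incoming Series before changing anything. The sketch below assumes a hypothetical frame named scores and a hypothetical helper add_curve; it adjusts a copy of each row and returns it, leaving the original data untouched.

```python
import pandas as pd

# Hypothetical frame; column names mirror the tutorial's sample data
scores = pd.DataFrame({'score': [85, 92, 78], 'bonus': [5.0, 8.0, 3.0]})

def add_curve(row, points=5):
    # Copy first, then change the copy; the Series passed in is never modified
    curved_row = row.copy()
    curved_row['score'] = curved_row['score'] + points
    return curved_row

curved = scores.apply(add_curve, axis=1)
print(curved)
print(scores)  # the original frame still holds the unadjusted scores
```

Returning a Series from a row-wise function produces a DataFrame, so curved has the same columns as scores.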
# axis=0 (default): function receives each column as a Series
col_max = df[['score', 'bonus']].apply(max, axis=0)
print(col_max)
Output
score    92.0
bonus     8.0
dtype: float64
# axis=1: function receives each row as a Series
row_max = df[['score', 'bonus']].apply(max, axis=1)
print(row_max)
Output
0    85.0
1    92.0
2    78.0
dtype: float64
The naming confuses most people at first: axis=0 sounds like it targets rows but processes column by column. Think of it as “the function moves along the index axis, one column at a time.” axis=1 moves along the column axis, one row at a time. When in doubt, add print(type(x), x) inside the function and run it on a small DataFrame to confirm what arrives.
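The print-based check described above can be sketched as follows. The frame name scores and the function name probe are illustrative assumptions; the function prints the type, name, and index labels of whatever apply() hands it for each axis value.

```python
import pandas as pd

# Hypothetical two-column frame for inspecting what apply() passes in
scores = pd.DataFrame({'score': [85, 92, 78], 'bonus': [5.0, 8.0, 3.0]})

calls = []

def probe(x):
    # Record and print the name and index labels of the incoming Series
    calls.append((x.name, list(x.index)))
    print(type(x).__name__, x.name, list(x.index))
    return x.max()

by_column = scores.apply(probe, axis=0)  # x is a column; its index holds row labels
by_row = scores.apply(probe, axis=1)     # x is a row; its index holds column names
```

With axis=0 the Series name is the column label, and with axis=1 it is the row label, which is a quick way to tell the two cases apart.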
df['column_name'].apply(func) passes each element of that column as a scalar to func.
# Compute a letter grade from a numeric score using a named function
def letter_grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    else:
        return 'C'

df['computed_grade'] = df['score'].apply(letter_grade)
print(df[['name', 'score', 'computed_grade']])
Output
    name  score computed_grade
0  Alice     85              B
1    Bob     92              A
2  Carol     78              C
# Scale each score by 1.05 with a lambda, rounding to one decimal place
df['scaled_score'] = df['score'].apply(lambda x: round(x * 1.05, 1))
print(df[['name', 'score', 'scaled_score']])
Output
    name  score  scaled_score
0  Alice     85          89.2
1    Bob     92          96.6
2  Carol     78          81.9
With axis=1, the function receives each row as a Series. Use label-based indexing to access individual column values.
# Sum score and bonus per row using pandas apply row-wise
def total_score(row):
    return row['score'] + row['bonus']

df['total'] = df.apply(total_score, axis=1)
print(df[['name', 'score', 'bonus', 'total']])
Output
    name  score  bonus  total
0  Alice     85    5.0   90.0
1    Bob     92    8.0  100.0
2  Carol     78    3.0   81.0
# Classify each row based on multi-column thresholds
def classify(row):
    if row['score'] >= 90 and row['bonus'] >= 7:
        return 'High Performer'
    elif row['score'] >= 80:
        return 'On Track'
    else:
        return 'Needs Review'

df['status'] = df.apply(classify, axis=1)
print(df[['name', 'status']])
Output
    name          status
0  Alice        On Track
1    Bob  High Performer
2  Carol    Needs Review
Subset the DataFrame to the target columns before calling apply() to avoid unintended side effects on non-numeric or irrelevant columns.
# Apply numpy square root to numeric columns only
numeric_df = df[['score', 'bonus']]
sqrt_df = numeric_df.apply(np.sqrt)
print(sqrt_df)
Output
      score     bonus
0  9.219544  2.236068
1  9.591663  2.828427
2  8.831761  1.732051
# Scale score and bonus by each column's maximum, so the largest value becomes 1.0
def normalize_column(col):
    return col / col.max()

scaled = df[['score', 'bonus']].apply(normalize_column, axis=0)
print(scaled)
Output
      score  bonus
0  0.923913  0.625
1  1.000000  1.000
2  0.847826  0.375
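When the numeric columns are not known ahead of time, selecting by dtype is one alternative to listing column names by hand. This is a minimal sketch, assuming a hypothetical frame named people; it uses DataFrame.select_dtypes to keep only numeric columns before calling apply().

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one text column and two numeric columns
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'score': [85, 92, 78],
    'bonus': [5.0, 8.0, 3.0],
})

# Keep only numeric columns, then apply the function to each of them
numeric_only = people.select_dtypes(include='number')
roots = numeric_only.apply(np.sqrt)
print(roots)
```

The name column never reaches np.sqrt, so the call cannot fail on string data.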
The args= parameter accepts a tuple of positional values passed to func after the Series or element.
# Add a fixed offset and multiplier to each score using args= for positional arguments
def adjust_score(x, offset, multiplier):
    return (x + offset) * multiplier

adjusted = df['score'].apply(adjust_score, args=(5, 1.1))
print(adjusted)
Output
0     99.0
1    106.7
2     91.3
Name: score, dtype: float64
Keyword arguments are passed to apply() by name, after args= when it is present, and are forwarded unchanged to func.
# Clamp score values between a low and high threshold using keyword arguments
def clamp(x, low=0, high=100):
    return max(low, min(x, high))

clamped = df['score'].apply(clamp, low=80, high=90)
print(clamped)
Output
0    85
1    90
2    80
Name: score, dtype: int64
The result_type parameter controls how DataFrame.apply() structures its output for axis=1 when func returns a list-like (for example, a list or tuple). Use result_type='expand' to split list-like values into columns, 'reduce' to return a Series when possible, or 'broadcast' to broadcast results to the original shape.
result_type='expand' splits each returned list or tuple into a separate column.
# Return score and bonus as a list; expand into two output columns
def score_bonus_pair(row):
    return [row['score'], row['bonus']]

expanded = df.apply(score_bonus_pair, axis=1, result_type='expand')
expanded.columns = ['score_out', 'bonus_out']
print(expanded)
Output
   score_out  bonus_out
0       85.0        5.0
1       92.0        8.0
2       78.0        3.0
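For contrast with 'expand', the sketch below runs the same list-returning function with the default setting and with result_type='reduce'. The frame name scores is an illustrative assumption; the point is only to compare the container type each call returns.

```python
import pandas as pd

# Hypothetical frame; column names mirror the tutorial's sample data
scores = pd.DataFrame({'score': [85, 92, 78], 'bonus': [5.0, 8.0, 3.0]})

def score_bonus_pair(row):
    return [row['score'], row['bonus']]

# Default and 'reduce' keep each returned list as a single value per row
as_default = scores.apply(score_bonus_pair, axis=1)
as_reduce = scores.apply(score_bonus_pair, axis=1, result_type='reduce')

# 'expand' spreads each list across columns
as_expand = scores.apply(score_bonus_pair, axis=1, result_type='expand')

print(type(as_default).__name__, type(as_reduce).__name__, type(as_expand).__name__)
```

Passing 'reduce' explicitly documents the intent to keep one list per row, even though the default gives the same shape for a row-wise function that returns a list.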
result_type='broadcast' broadcasts a scalar return value back to all column positions in the row, preserving the original column labels.
# Broadcast the row mean back to all columns using result_type='broadcast'
broadcast_df = df[['score', 'bonus']].apply(
    lambda row: row.mean(), axis=1, result_type='broadcast'
)
print(broadcast_df)
Output
   score  bonus
0   45.0   45.0
1   50.0   50.0
2   40.5   40.5
DataFrame.map() applies a function to each individual element of a DataFrame. It replaces applymap() starting with pandas 2.1.
# Apply square root to every cell in a DataFrame using DataFrame.map() (pandas 2.1+)
import math
measurements = pd.DataFrame({'area': [4.0, 9.0, 16.0], 'length': [25.0, 36.0, 49.0]})
result = measurements.map(math.sqrt)
print(result)
Output
   area  length
0   2.0     5.0
1   3.0     6.0
2   4.0     7.0
Warning (pandas 2.1+): DataFrame.applymap() was deprecated in pandas 2.1.0 and removed in pandas 3.0. Replace all uses of df.applymap(func) with df.map(func). The function signature and behavior are identical; only the method name changed.
For users on pandas < 2.1:
# pandas < 2.1: DataFrame.map is unavailable; pandas 2.1–2.x: applymap works but is deprecated (FutureWarning)
import math
result = measurements.applymap(math.sqrt)
Use df.map(func) in place of df.applymap(func) on pandas 2.1+.
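Code that has to run on both sides of the rename can pick the method at runtime. The following is a sketch under that assumption; the helper name elementwise is hypothetical and is not part of pandas.

```python
import math
import pandas as pd

def elementwise(frame, func):
    """Apply func to every cell using whichever method this pandas version provides."""
    if hasattr(pd.DataFrame, 'map'):
        return frame.map(func)       # pandas 2.1 and later
    return frame.applymap(func)      # older pandas without DataFrame.map

measurements = pd.DataFrame({'area': [4.0, 9.0, 16.0], 'length': [25.0, 36.0, 49.0]})
roots = elementwise(measurements, math.sqrt)
print(roots)
```

Checking the class rather than the instance avoids confusion if a frame happens to contain a column named map.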
| Method | Scope | Input per call | Typical use case |
|---|---|---|---|
| DataFrame.apply(axis=0) | DataFrame | One column as Series | Custom column-level aggregations |
| DataFrame.apply(axis=1) | DataFrame | One row as Series | Row-level logic using multiple columns |
| Series.apply() | Series | One element (scalar) | Element-level transformations on one column |
| DataFrame.map() | DataFrame | One element (scalar) | Element-wise transforms across the full DataFrame (pandas 2.1+) |
| Series.map() | Series | One element (scalar) | Element-wise mapping or dict-based substitution |
| DataFrame.transform() | DataFrame | One column as Series | Group-aware transforms that preserve original shape and index |
| Vectorized (+, *, .str, np.*) | DataFrame or Series | Entire array | Arithmetic, string ops, NumPy ufuncs |
- Use apply(axis=1) when the logic requires values from two or more columns in the same row and cannot be expressed as a vectorized operation.
- Use apply(axis=0) for custom column-level aggregations not covered by built-in methods like sum() or mean().
- Use Series.apply() for element-level transformations on a single column that require Python control flow.
- Use DataFrame.map() (pandas 2.1+) for element-wise transformations across the full DataFrame.
- Use transform() with groupby() to apply a function within groups while preserving the original DataFrame index.

transform() fills every row with a group-level result while keeping the original shape. By contrast, apply() on a grouped object with an aggregating function returns one value per group, indexed by the group keys rather than by the original index.
# Fill each row with its department average salary using groupby().transform()
employees = pd.DataFrame({
    'dept': ['eng', 'eng', 'mkt', 'mkt', 'mkt'],
    'salary': [90000, 110000, 70000, 80000, 75000]
})
employees['dept_avg'] = employees.groupby('dept')['salary'].transform('mean')
print(employees)
Output
  dept  salary  dept_avg
0  eng   90000  100000.0
1  eng  110000  100000.0
2  mkt   70000   75000.0
3  mkt   80000   75000.0
4  mkt   75000   75000.0
Because transform() guarantees the output index matches the input, assigning the result directly back as a new column is safe. Use apply() on grouped data only when transform() cannot express the logic.
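The index guarantee can be checked directly by comparing transform() with an aggregation on the same groups. This sketch reuses the employees data from the example above; the variable names per_group and gap_from_avg are illustrative.

```python
import pandas as pd

employees = pd.DataFrame({
    'dept': ['eng', 'eng', 'mkt', 'mkt', 'mkt'],
    'salary': [90000, 110000, 70000, 80000, 75000]
})
grouped = employees.groupby('dept')['salary']

# Aggregation: one value per group, indexed by department
per_group = grouped.agg('mean')

# Transform with a custom function: one value per original row
gap_from_avg = grouped.transform(lambda s: s - s.mean())

print(per_group)
print(gap_from_avg)
print(gap_from_avg.index.equals(employees.index))
```

Because gap_from_avg shares the index of employees, it can be assigned back as a column without any merge step, while per_group would need to be joined on dept first.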
apply() executes Python-level loops and carries significant overhead compared to vectorized operations, which run at the C level across entire arrays. Use the following block to measure the gap on your own data:
# Benchmark apply(axis=1) vs vectorized on a 100,000-row DataFrame
import timeit
import numpy as np
import pandas as pd

np.random.seed(42)
big = pd.DataFrame({
    'score': np.random.randint(50, 100, 100_000),
    'bonus': np.random.uniform(1, 10, 100_000)
})
t_vec = timeit.timeit(
    lambda: np.where(big['score'] >= 80, 'pass', 'fail'), number=10
) / 10
t_apply = timeit.timeit(
    lambda: big.apply(lambda r: 'pass' if r['score'] >= 80 else 'fail', axis=1),
    number=3
) / 3
print(f"np.where: {t_vec * 1000:.1f} ms")
print(f"apply(axis=1): {t_apply * 1000:.1f} ms")
print(f"Ratio: ~{t_apply / t_vec:.0f}x")
Output
np.where: 0.8 ms
apply(axis=1): 445.7 ms
Ratio: ~525x
Exact timings vary by hardware and operation complexity; the ratio, not the absolute milliseconds, is the signal. The table below shows representative ranges; profile your specific function before optimizing.
| Method | Relative speed | Notes |
|---|---|---|
| NumPy vectorized (np.where, np.select) | 1x (baseline) | C-level array operation; fastest option |
| pandas vectorized (df['col'] * 2) | ~1-2x | Near-native; uses optimized pandas internals |
| Series.map() | ~10-50x slower | Python loop per element |
| apply(axis=0) | ~10-50x slower | Python loop per column |
| apply(axis=1) | ~100x-1,000x+ slower | Python loop per row; gap depends on operation complexity |
| iterrows() | ~500x-5,000x+ slower | Never use for transformations at scale |
Reserve apply() for logic that genuinely requires Python control flow. For everything else, the next section shows how to convert the two most common apply() patterns to vectorized equivalents.
Single condition: replace apply(axis=1) with np.where:
# Slower: apply() with a single conditional per row
df['result'] = df.apply(lambda row: 'pass' if row['score'] >= 80 else 'fail', axis=1)
# Faster: np.where performs the same logic on the full array at once
df['result'] = np.where(df['score'] >= 80, 'pass', 'fail')
Multiple conditions: replace apply(axis=1) with np.select. The assign_tier function from the Conditional Logic per Row example illustrates this. Its apply() form and the vectorized equivalent appear side by side:
# Slower: apply() with nested conditionals (same logic as the assign_tier function)
df['tier'] = df.apply(
    lambda row: 'Platinum' if (row['score'] >= 90 and row['bonus'] >= 7)
    else 'Gold' if row['score'] >= 80
    else 'Silver',
    axis=1
)
# Faster: np.select performs the same logic without a Python loop
conditions = [
    (df['score'] >= 90) & (df['bonus'] >= 7),
    df['score'] >= 80
]
df['tier'] = np.select(conditions, ['Platinum', 'Gold'], default='Silver')
If the logic is too complex to express as array operations (external API calls, stateful parsing, irregular branching), keep apply(). Use np.where and np.select for everything that fits a condition-to-value mapping.
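Before swapping an apply() call for np.select, it can help to confirm on a small sample that both produce the same labels. This is a minimal sketch, assuming a hypothetical four-row frame named scores that includes one row with a high score but a low bonus to exercise the fall-through branch.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the last row has a high score but a low bonus
scores = pd.DataFrame({'score': [85, 92, 78, 95], 'bonus': [5.0, 8.0, 3.0, 2.0]})

def assign_tier(row):
    if row['score'] >= 90 and row['bonus'] >= 7:
        return 'Platinum'
    elif row['score'] >= 80:
        return 'Gold'
    else:
        return 'Silver'

tier_apply = scores.apply(assign_tier, axis=1)

# np.select checks conditions in order and uses the first one that is True
conditions = [
    (scores['score'] >= 90) & (scores['bonus'] >= 7),
    scores['score'] >= 80,
]
tier_vectorized = np.select(conditions, ['Platinum', 'Gold'], default='Silver')

matches = list(tier_apply) == list(tier_vectorized)
print(matches)
```

The order of the conditions list matters in the same way the order of the if/elif branches does, so the most specific condition goes first.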
# Strip whitespace and apply title case to name strings
messy_names = pd.DataFrame({'name': [' alice ', 'BOB', ' carol ']})
clean_names = messy_names['name'].apply(lambda x: x.strip().title())
print(clean_names)
Output
0    Alice
1      Bob
2    Carol
Name: name, dtype: str
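For simple string cleanup like this, the .str accessor is a vectorized alternative worth comparing against apply(). The sketch below assumes the same messy_names data as the example above and checks that both approaches yield the same values.

```python
import pandas as pd

messy_names = pd.DataFrame({'name': [' alice ', 'BOB', ' carol ']})

# Element-wise with a Python lambda
via_apply = messy_names['name'].apply(lambda x: x.strip().title())

# Same cleanup using the pandas string accessor
via_str = messy_names['name'].str.strip().str.title()

print(via_apply.tolist() == via_str.tolist())
```

The accessor version also passes missing values through as missing, whereas the lambda would raise an AttributeError on a float NaN.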
# Assign a tier label based on score and bonus thresholds using axis=1
def assign_tier(row):
    if row['score'] >= 90 and row['bonus'] >= 7:
        return 'Platinum'
    elif row['score'] >= 80:
        return 'Gold'
    else:
        return 'Silver'

df['tier'] = df.apply(assign_tier, axis=1)
print(df[['name', 'score', 'bonus', 'tier']])
Output
    name  score  bonus      tier
0  Alice     85    5.0      Gold
1    Bob     92    8.0  Platinum
2  Carol     78    3.0    Silver
# Safely convert mixed-type string values to float; return NaN on failure
mixed = pd.DataFrame({'value': ['3.14', 'N/A', '2.71', '', '1.41']})

def safe_float(x):
    try:
        return float(x)
    except (ValueError, TypeError):
        return float('nan')

mixed['parsed'] = mixed['value'].apply(safe_float)
print(mixed)
Output
  value  parsed
0  3.14    3.14
1   N/A     NaN
2  2.71    2.71
3           NaN
4  1.41    1.41
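When the only goal is numeric parsing with NaN on failure, pandas.to_numeric with errors='coerce' is a built-in alternative to a hand-written try/except. This sketch assumes the same mixed data as the example above.

```python
import pandas as pd

mixed = pd.DataFrame({'value': ['3.14', 'N/A', '2.71', '', '1.41']})

# errors='coerce' turns anything that cannot be parsed into NaN
mixed['parsed'] = pd.to_numeric(mixed['value'], errors='coerce')
print(mixed)
```

A custom function with apply() remains the better fit when the fallback needs its own logic, such as logging the bad value or trying a second format.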
apply() works at the Series or DataFrame level. On a DataFrame, it passes each column (axis=0) or row (axis=1) as a full Series to the function. DataFrame.map() (pandas 2.1+) always operates element-wise, receiving one scalar at a time. Series.map() applies element-wise substitution or transformation on a single Series, and also accepts a dictionary for label-based mapping.
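The dictionary form of Series.map() mentioned above can be sketched as follows. The grades Series and the points mapping are hypothetical values chosen for illustration; the 'D' entry is deliberately left out of the mapping to show what happens to unmatched values.

```python
import pandas as pd

# Hypothetical letter grades and a hypothetical grade-point mapping
grades = pd.Series(['B', 'A', 'C', 'D'], name='grade')
points = {'A': 4.0, 'B': 3.0, 'C': 2.0}

# Each value is looked up in the dict; values without a key become NaN
grade_points = grades.map(points)
print(grade_points)
```

A dict lookup like this needs no function at all, which is why Series.map() is usually preferred over apply() for plain substitution.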
Use df['column_name'].apply(func) to call func on each element of that column:
# pandas apply function to column: element-wise
df['score'].apply(lambda x: x * 1.1)
To apply a function to the column as a whole Series (for aggregation or normalization), use df[['column_name']].apply(func) with axis=0.
Pass the lambda directly as the func argument:
# pandas apply lambda on a single column
df['score'].apply(lambda x: x + 10)
# pandas apply lambda row-wise using axis=1
df.apply(lambda row: row['score'] + row['bonus'], axis=1)
Yes. Use args= for positional arguments and named keyword arguments for keyword parameters:
def adjust(x, offset, multiplier=1.0):
    return (x + offset) * multiplier

# pandas apply with arguments: positional and keyword
df['score'].apply(adjust, args=(5,), multiplier=1.1)
axis=0 (default) applies the function to each column. The function receives one column as a Series per call. axis=1 applies the function to each row. The function receives one row as a Series per call, with column names as index labels.
apply() executes Python-level loops and is significantly slower than vectorized pandas or NumPy operations for simple transformations. Use vectorized expressions (+, *, np.where, str accessor methods) wherever possible. Use apply() only when the function requires multi-column access per row, complex branching, or external library calls that cannot be expressed as array operations.
DataFrame.applymap() was deprecated in pandas 2.1.0 and renamed to DataFrame.map(). The function signature and behavior are unchanged. Replace df.applymap(func) with df.map(func) in any codebase targeting pandas 2.1 or later. The method was fully removed in pandas 3.0.
Subset the DataFrame to the target columns, call apply() with axis=1, and set result_type='expand' to split the returned list into separate columns:
def extract_values(row):
    return [row['score'] * 1.1, row['bonus'] * 2]

result = df[['score', 'bonus']].apply(extract_values, axis=1, result_type='expand')
result.columns = ['adj_score', 'adj_bonus']
print(result)
Output
   adj_score  adj_bonus
0       93.5       10.0
1      101.2       16.0
2       85.8        6.0
This tutorial covered DataFrame.apply() from its core mechanics to practical usage patterns. You applied functions column-wise and row-wise using the axis parameter, passed positional and keyword arguments with args= and **kwargs, controlled output shape with result_type='expand' and result_type='broadcast', and applied element-wise transformations using DataFrame.map() (the pandas 2.1+ replacement for the deprecated applymap()).
With these patterns, you can implement custom data transformations across any DataFrame column or row without resorting to manual iteration using iterrows(). The performance comparison table provides a practical reference for choosing between apply(), map(), transform(), and vectorized operations based on the complexity of the logic and the size of the dataset.
For related topics, see the Python Pandas Module Tutorial for foundational DataFrame operations, and the official pandas apply() API documentation for the complete parameter reference and version history.