Pandas DataFrame apply(): Methods, Examples, and Uses

Updated on June 16, 2026

Introduction

The pandas apply() function applies a callable to each column (default) or each row of a DataFrame, or to each element of a Series. It is the standard tool for running custom logic that cannot be expressed as a vectorized pandas or NumPy operation. This tutorial covers the complete DataFrame.apply() parameter set, row-wise and column-wise usage, argument passing with args= and **kwargs, output shape control with result_type, element-wise application with DataFrame.map() (the pandas 2.1+ replacement for applymap()), and a structured comparison of apply() against map(), transform(), and vectorized operations.

Key Takeaways

  • apply() applies a function along axis=0 (column-wise) or axis=1 (row-wise) on a DataFrame, or element-wise on a Series.
  • Use axis=1 to access values from two or more columns within a single row in one function call.
  • Pass extra arguments to the function using args= for positional parameters or **kwargs for keyword parameters.
  • Use result_type='expand' to split a returned list or tuple into separate output columns.
  • applymap() was deprecated in pandas 2.1 and replaced by DataFrame.map() for element-wise operations.
  • The by_row parameter (pandas 2.1+) only has an effect when func is a list-like or dict-like of functions (and not a string). Default 'compat' first tries to translate each function into an equivalent pandas method; False passes the whole Series to each function.
  • apply() iterates under the hood and is significantly slower than vectorized operations; use it only when the logic requires Python-level control flow.

Prerequisites

Required Packages and Versions

  • Python 3.8 or later (pandas 2.1 and later require Python 3.9 or later)
  • pandas 1.5 or later (pandas 2.1+ recommended for DataFrame.map() coverage)
  • NumPy

Install or upgrade with:

pip install --upgrade pandas numpy

All output blocks in this tutorial were generated with pandas 3.0.2. On pandas 2.x, string Series display as dtype: object rather than dtype: str.

Verify installed versions:

import pandas as pd
import numpy as np

print(pd.__version__)
print(np.__version__)

Sample DataFrame Used in This Tutorial

Most examples in this tutorial use the following DataFrame:

# Sample DataFrame used throughout this tutorial
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name':  ['Alice', 'Bob', 'Carol'],
    'score': [85, 92, 78],
    'grade': ['B', 'A', 'C'],
    'bonus': [5.0, 8.0, 3.0]
})

print(df)
Output
    name  score grade  bonus
0  Alice     85     B    5.0
1    Bob     92     A    8.0
2  Carol     78     C    3.0

How Pandas DataFrame apply() Works

Function Signature and Parameters

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', **kwargs)

| Parameter | Type | Default | Description |
|---|---|---|---|
| func | callable | required | Function applied to each column or row |
| axis | int or str | 0 | 0 or 'index' applies column-wise; 1 or 'columns' applies row-wise |
| raw | bool | False | When True, passes a raw NumPy array instead of a Series. Useful when func uses only NumPy operations; skipping the Series wrapper reduces overhead and can improve performance. |
| result_type | str or None | None | Controls output shape: 'expand', 'reduce', or 'broadcast' |
| args | tuple | () | Positional arguments forwarded to func after the Series |
| by_row | str or bool | 'compat' | Added in pandas 2.1. Only has an effect when func is a list-like or dict-like of functions (not a string). Default 'compat' first tries to translate each function into an equivalent pandas method; False passes the whole Series to each function. |
| **kwargs | dict | | Keyword arguments forwarded to func |
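
The raw parameter is easiest to understand by checking what the function actually receives. The following is a minimal sketch, not part of the original tutorial: value_range and received_types are illustrative names, and the small numeric DataFrame is an assumption made to keep the block self-contained.

```python
import pandas as pd

df = pd.DataFrame({'score': [85, 92, 78], 'bonus': [5.0, 8.0, 3.0]})

received_types = set()  # illustrative: records the type handed to the function

def value_range(values):
    # Works for both a Series and an ndarray, since both have max() and min()
    received_types.add(type(values).__name__)
    return values.max() - values.min()

# Default raw=False: each row arrives as a pandas Series
range_series = df.apply(value_range, axis=1)
types_default = set(received_types)

received_types.clear()

# raw=True: each row arrives as a plain NumPy ndarray (no index labels)
range_raw = df.apply(value_range, axis=1, raw=True)
types_raw = set(received_types)

print(types_default, types_raw)
print(range_raw.tolist())
```

Because raw=True drops the index labels, a function that relies on row['score'] style lookups will not work with it; positional indexing such as values[0] is needed instead.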

What Gets Passed to the Function (Series Objects)

With the default axis=0, apply() passes each column to func as a Series object. With axis=1, it passes each row as a Series object, with column names as the index labels. Inside the function, you access individual values by label: row['column_name'].

Warning: Functions that mutate the Series passed to them are explicitly unsupported and can produce unexpected behavior or errors. Do not modify row or col in place inside the function. Return a new value instead.
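
To confirm what arrives in the function for each axis, one option is to record the type and index labels of every argument. This is a small sketch under assumptions: total, received, col_calls, and row_calls are hypothetical names, and the DataFrame is a numeric subset of the sample data. The function only reads its argument and returns a new value, in line with the warning above.

```python
import pandas as pd

df = pd.DataFrame({'score': [85, 92, 78], 'bonus': [5.0, 8.0, 3.0]})

received = []  # illustrative log of what apply() passes in

def total(s):
    # Record the argument's type and labels, then return a new value
    # without modifying s in place
    received.append((type(s).__name__, list(s.index)))
    return s.sum()

# axis=0: one call per column; the Series index is the row labels
col_totals = df.apply(total, axis=0)
col_calls = received.copy()

received.clear()

# axis=1: one call per row; the Series index is the column names
row_totals = df.apply(total, axis=1)
row_calls = received.copy()

print(col_calls[0])
print(row_calls[0])
```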

Axis=0 vs Axis=1 Explained with Examples

# axis=0 (default): function receives each column as a Series
col_max = df[['score', 'bonus']].apply(max, axis=0)
print(col_max)
Output
score    92.0
bonus     8.0
dtype: float64

# axis=1: function receives each row as a Series
row_max = df[['score', 'bonus']].apply(max, axis=1)
print(row_max)
Output
0    85.0
1    92.0
2    78.0
dtype: float64

The naming confuses most people at first: axis=0 sounds like it targets rows but processes column by column. Think of it as “the function moves along the index axis, one column at a time.” axis=1 moves along the column axis, one row at a time. When in doubt, add print(type(x), x) inside the function and run it on a small DataFrame to confirm what arrives.

Applying a Function to a Single Column

Using a Named Function

df['column_name'].apply(func) passes each element of that column as a scalar to func.

# Compute a letter grade from a numeric score using a named function
def letter_grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    else:
        return 'C'

df['computed_grade'] = df['score'].apply(letter_grade)
print(df[['name', 'score', 'computed_grade']])
Output
    name  score computed_grade
0  Alice     85              B
1    Bob     92              A
2  Carol     78              C

Using a Lambda Function

# Apply a pandas apply lambda to scale each score by 1.05
df['scaled_score'] = df['score'].apply(lambda x: round(x * 1.05, 1))
print(df[['name', 'score', 'scaled_score']])
Output
    name  score  scaled_score
0  Alice     85          89.2
1    Bob     92          96.6
2  Carol     78          81.9

Applying a Function Across Rows (axis=1)

Accessing Multiple Column Values per Row

With axis=1, the function receives each row as a Series. Use label-based indexing to access individual column values.

# Sum score and bonus per row using pandas apply row-wise
def total_score(row):
    return row['score'] + row['bonus']

df['total'] = df.apply(total_score, axis=1)
print(df[['name', 'score', 'bonus', 'total']])
Output
    name  score  bonus  total
0  Alice     85    5.0   90.0
1    Bob     92    8.0  100.0
2  Carol     78    3.0   81.0

Returning a Scalar Value per Row

# Classify each row based on multi-column thresholds
def classify(row):
    if row['score'] >= 90 and row['bonus'] >= 7:
        return 'High Performer'
    elif row['score'] >= 80:
        return 'On Track'
    else:
        return 'Needs Review'

df['status'] = df.apply(classify, axis=1)
print(df[['name', 'status']])
Output
    name          status
0  Alice        On Track
1    Bob  High Performer
2  Carol    Needs Review

Applying a Function to Multiple Columns

Subsetting the DataFrame Before apply()

Subset the DataFrame to the target columns before calling apply() to avoid unintended side effects on non-numeric or irrelevant columns.

# Apply numpy square root to numeric columns only
numeric_df = df[['score', 'bonus']]
sqrt_df = numeric_df.apply(np.sqrt)
print(sqrt_df)
Output
      score     bonus
0  9.219544  2.236068
1  9.591663  2.828427
2  8.831761  1.732051
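
When the numeric columns are not known in advance, listing them by hand can be avoided. The sketch below assumes that selecting by dtype with DataFrame.select_dtypes() is acceptable for your data; it is an alternative to the explicit column list above, not the tutorial's original approach.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name':  ['Alice', 'Bob', 'Carol'],
    'score': [85, 92, 78],
    'grade': ['B', 'A', 'C'],
    'bonus': [5.0, 8.0, 3.0]
})

# Keep only numeric columns, then apply the function column by column
numeric_df = df.select_dtypes(include='number')
sqrt_df = numeric_df.apply(np.sqrt)

print(list(sqrt_df.columns))
print(sqrt_df)
```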

Using apply() with Two Specific Columns

# Normalize score and bonus to a 0-1 range using pandas apply with multiple columns
def normalize_column(col):
    return col / col.max()

scaled = df[['score', 'bonus']].apply(normalize_column, axis=0)
print(scaled)
Output
      score  bonus
0  0.923913  0.625
1  1.000000  1.000
2  0.847826  0.375
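
This particular normalization can also be written without apply(), because dividing a DataFrame by a Series aligns the Series labels with the column names. The following sketch compares the two forms; the variable names via_apply and vectorized are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [85, 92, 78], 'bonus': [5.0, 8.0, 3.0]})
cols = df[['score', 'bonus']]

# apply() version: the function receives one column at a time
via_apply = cols.apply(lambda col: col / col.max(), axis=0)

# Vectorized version: cols.max() is a Series indexed by column name,
# so the division is aligned column by column
vectorized = cols / cols.max()

print(np.allclose(via_apply, vectorized))
print(vectorized)
```

Both forms should give the same values here; the apply() form becomes more useful when the per-column logic does not reduce to a single arithmetic expression.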

Passing Arguments to apply() Functions

Using args= for Positional Arguments

The args= parameter accepts a tuple of positional values passed to func after the Series or element.

# Add a fixed offset and multiplier to each score using args= for positional arguments
def adjust_score(x, offset, multiplier):
    return (x + offset) * multiplier

adjusted = df['score'].apply(adjust_score, args=(5, 1.1))
print(adjusted)
Output
0     99.0
1    106.7
2     91.3
Name: score, dtype: float64

Using **kwargs for Keyword Arguments

Any keyword arguments that apply() does not recognize as its own parameters are forwarded to func as keyword arguments.

# Clamp score values between a low and high threshold using keyword arguments
def clamp(x, low=0, high=100):
    return max(low, min(x, high))

clamped = df['score'].apply(clamp, low=80, high=90)
print(clamped)
Output
0    85
1    90
2    80
Name: score, dtype: int64
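
The same two mechanisms work with DataFrame.apply() and axis=1. The sketch below combines them in one call; weighted_total, weight, and cap are hypothetical names chosen for illustration. Keyword names that collide with apply()'s own parameters (such as axis, raw, or args) cannot be forwarded this way.

```python
import pandas as pd

df = pd.DataFrame({'score': [85, 92, 78], 'bonus': [5.0, 8.0, 3.0]})

def weighted_total(row, weight, cap=None):
    # row is the Series for one row; weight comes from args=, cap from a keyword
    total = row['score'] + weight * row['bonus']
    return total if cap is None else min(total, cap)

# args= supplies weight positionally; cap=100 is forwarded as a keyword argument
capped = df.apply(weighted_total, axis=1, args=(2,), cap=100)

print(capped.tolist())
```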

Controlling Output Shape with result_type

The result_type parameter controls how DataFrame.apply() structures its output for axis=1 when func returns a list-like (for example, a list or tuple). Use result_type='expand' to split list-like values into columns, 'reduce' to return a Series when possible, or 'broadcast' to broadcast results to the original shape.

result_type='expand' for Splitting Return Values

result_type='expand' splits each returned list or tuple into a separate column.

# Return score and bonus as a list; expand into two output columns
def score_bonus_pair(row):
    return [row['score'], row['bonus']]

expanded = df.apply(score_bonus_pair, axis=1, result_type='expand')
expanded.columns = ['score_out', 'bonus_out']
print(expanded)
Output
   score_out  bonus_out
0       85.0        5.0
1       92.0        8.0
2       78.0        3.0
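
For contrast, the following sketch shows what the same kind of function returns without result_type, and how returning a labeled Series names the output columns directly. The names scores, pair, and named_pair are illustrative, and the DataFrame is a numeric subset of the sample data.

```python
import pandas as pd

scores = pd.DataFrame({'score': [85, 92, 78], 'bonus': [5.0, 8.0, 3.0]})

def pair(row):
    return [row['score'], row['bonus']]

# Default result_type=None: a list return value stays a list,
# so the result is a Series whose elements are lists
as_lists = scores.apply(pair, axis=1)

# result_type='expand': each list is split into separate columns (0 and 1)
as_columns = scores.apply(pair, axis=1, result_type='expand')

def named_pair(row):
    # Returning a Series expands to columns named after the Series index
    return pd.Series({'score_out': row['score'], 'bonus_out': row['bonus']})

named = scores.apply(named_pair, axis=1)

print(type(as_lists).__name__, as_columns.shape, list(named.columns))
```

Returning a Series avoids the separate column-renaming step, at the cost of constructing one Series object per row.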

result_type='broadcast' for Preserving Shape

result_type='broadcast' broadcasts a scalar return value back to all column positions in the row, preserving the original column labels.

# Broadcast the row mean back to all columns using result_type='broadcast'
broadcast_df = df[['score', 'bonus']].apply(
    lambda row: row.mean(), axis=1, result_type='broadcast'
)
print(broadcast_df)
Output
   score  bonus
0   45.0   45.0
1   50.0   50.0
2   40.5   40.5

Applying Functions Element-Wise

Using DataFrame.map() (pandas 2.1+)

DataFrame.map() applies a function to each individual element of a DataFrame. It replaces applymap() starting with pandas 2.1.

# Apply square root to every cell in a DataFrame using DataFrame.map() (pandas 2.1+)
import math

measurements = pd.DataFrame({'area': [4.0, 9.0, 16.0], 'length': [25.0, 36.0, 49.0]})
result = measurements.map(math.sqrt)
print(result)
Output
   area  length
0   2.0     5.0
1   3.0     6.0
2   4.0     7.0

Note on applymap() Deprecation in pandas 2.1

Warning (pandas 2.1+): DataFrame.applymap() was deprecated in pandas 2.1.0 and removed in pandas 3.0. Replace all uses of df.applymap(func) with df.map(func). The function signature and behavior are identical; only the method name changed.

For users on pandas < 2.1:

# pandas < 2.1: DataFrame.map is unavailable; pandas 2.1–2.x: applymap works but is deprecated (FutureWarning)
import math
result = measurements.applymap(math.sqrt)

Use df.map(func) in place of df.applymap(func) on pandas 2.1+.
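
If the same code has to run on pandas versions both before and after 2.1, one possible approach is a small wrapper that picks whichever method exists. This is a sketch only: elementwise is a hypothetical helper, not a pandas function. It checks the DataFrame class rather than the instance so that a column that happens to be named map is not mistaken for the method.

```python
import math
import pandas as pd

def elementwise(frame, func):
    """Hypothetical helper: use DataFrame.map when available, else applymap."""
    if hasattr(pd.DataFrame, 'map'):
        # pandas 2.1+
        return frame.map(func)
    # pandas < 2.1
    return frame.applymap(func)

measurements = pd.DataFrame({'area': [4.0, 9.0, 16.0], 'length': [25.0, 36.0, 49.0]})
roots = elementwise(measurements, math.sqrt)
print(roots)
```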

apply() vs map() vs transform() vs Vectorized Operations

Comparison Table

| Method | Scope | Input per call | Typical use case |
|---|---|---|---|
| DataFrame.apply(axis=0) | DataFrame | One column as Series | Custom column-level aggregations |
| DataFrame.apply(axis=1) | DataFrame | One row as Series | Row-level logic using multiple columns |
| Series.apply() | Series | One element (scalar) | Element-level transformations on one column |
| DataFrame.map() | DataFrame | One element (scalar) | Element-wise transforms across the full DataFrame (pandas 2.1+) |
| Series.map() | Series | One element (scalar) | Element-wise mapping or dict-based substitution |
| DataFrame.transform() | DataFrame | One column as Series | Group-aware transforms that preserve original shape and index |
| Vectorized (+, *, str., np.*) | DataFrame or Series | Entire array | Arithmetic, string ops, NumPy ufuncs |

When to Use Each Method

  • Use apply(axis=1) when the logic requires values from two or more columns in the same row and cannot be expressed as a vectorized operation.
  • Use apply(axis=0) for custom column-level aggregations not covered by built-in methods like sum() or mean().
  • Use Series.apply() for element-level transformations on a single column that require Python control flow.
  • Use DataFrame.map() (pandas 2.1+) for element-wise transformations across the full DataFrame.
  • Use transform() with groupby() to apply a function within groups while preserving the original DataFrame index.
  • Use vectorized operations for any arithmetic, comparison, or string operation that can be expressed as a pandas or NumPy expression.

Using transform() with groupby()

transform() fills every row with a group-level result while keeping the original shape and index. apply() on a grouped object makes no such guarantee: depending on what the function returns, its result may be indexed by group key rather than by the original rows.

# Fill each row with its department average salary using groupby().transform()
employees = pd.DataFrame({
    'dept':   ['eng', 'eng', 'mkt', 'mkt', 'mkt'],
    'salary': [90000, 110000, 70000, 80000, 75000]
})
employees['dept_avg'] = employees.groupby('dept')['salary'].transform('mean')
print(employees)
Output
  dept  salary  dept_avg
0  eng   90000  100000.0
1  eng  110000  100000.0
2  mkt   70000   75000.0
3  mkt   80000   75000.0
4  mkt   75000   75000.0

Because transform() guarantees the output index matches the input, assigning the result directly back as a new column is safe. Use apply() on grouped data only when transform() cannot express the logic.
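
transform() also accepts a custom function, as long as it returns a result the same length as each group. The sketch below computes each salary's difference from its department mean; the column name diff_from_avg is an illustrative choice.

```python
import pandas as pd

employees = pd.DataFrame({
    'dept':   ['eng', 'eng', 'mkt', 'mkt', 'mkt'],
    'salary': [90000, 110000, 70000, 80000, 75000]
})

# The lambda receives one group's salaries as a Series and returns
# a Series of the same length, so the original index is preserved
diff = employees.groupby('dept')['salary'].transform(lambda s: s - s.mean())
employees['diff_from_avg'] = diff

print(employees)
```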

Performance Considerations

apply() executes Python-level loops and carries significant overhead compared to vectorized operations, which run at the C level across entire arrays. Use the following block to measure the gap on your own data:

# Benchmark apply(axis=1) vs vectorized on a 100,000-row DataFrame
import timeit
import numpy as np
import pandas as pd

np.random.seed(42)
big = pd.DataFrame({
    'score': np.random.randint(50, 100, 100_000),
    'bonus': np.random.uniform(1, 10, 100_000)
})

t_vec = timeit.timeit(
    lambda: np.where(big['score'] >= 80, 'pass', 'fail'), number=10
) / 10

t_apply = timeit.timeit(
    lambda: big.apply(lambda r: 'pass' if r['score'] >= 80 else 'fail', axis=1),
    number=3
) / 3

print(f"np.where:      {t_vec  * 1000:.1f} ms")
print(f"apply(axis=1): {t_apply * 1000:.1f} ms")
print(f"Ratio:         ~{t_apply / t_vec:.0f}x")
Output
np.where:      0.8 ms
apply(axis=1): 445.7 ms
Ratio:         ~525x

Exact timings vary by hardware and operation complexity; the ratio, not the absolute milliseconds, is the signal. The table below shows representative ranges; profile your specific function before optimizing.

| Method | Relative speed | Notes |
|---|---|---|
| NumPy vectorized (np.where, np.select) | 1x (baseline) | C-level array operation; fastest option |
| pandas vectorized (df['col'] * 2) | ~1-2x | Near-native; uses optimized pandas internals |
| Series.map() | ~10-50x slower | Python loop per element |
| apply(axis=0) | ~10-50x slower | Python loop per column |
| apply(axis=1) | ~100x–1,000x+ slower | Python loop per row; gap depends on operation complexity |
| iterrows() | ~500x–5,000x+ slower | Never use for transformations at scale |

Reserve apply() for logic that genuinely requires Python control flow. For everything else, the next section shows how to convert the two most common apply() patterns to vectorized equivalents.

Converting apply() to Vectorized Operations

Single condition: replace apply(axis=1) with np.where:

# Slower: apply() with a single conditional per row
df['result'] = df.apply(lambda row: 'pass' if row['score'] >= 80 else 'fail', axis=1)

# Faster: np.where performs the same logic on the full array at once
df['result'] = np.where(df['score'] >= 80, 'pass', 'fail')

Multiple conditions: replace apply(axis=1) with np.select. The example below uses the same tier logic as the assign_tier function in the Conditional Logic per Row section, showing the apply() version and its vectorized equivalent side by side:

# Slower: apply() with nested conditionals (same logic as assign_tier below)
df['tier'] = df.apply(
    lambda row: 'Platinum' if (row['score'] >= 90 and row['bonus'] >= 7)
    else 'Gold' if row['score'] >= 80
    else 'Silver',
    axis=1
)

# Faster: np.select performs the same logic without a Python loop
conditions = [
    (df['score'] >= 90) & (df['bonus'] >= 7),
    df['score'] >= 80
]
df['tier'] = np.select(conditions, ['Platinum', 'Gold'], default='Silver')

If the logic is too complex to express as array operations (external API calls, stateful parsing, irregular branching), keep apply(). Use np.where and np.select for everything that fits a condition-to-value mapping.

Common Patterns and Real-World Examples

String Normalization Across a Column

# Strip whitespace and apply title case to name strings
messy_names = pd.DataFrame({'name': ['  alice ', 'BOB', ' carol  ']})
clean_names = messy_names['name'].apply(lambda x: x.strip().title())
print(clean_names)
Output
0    Alice
1      Bob
2    Carol
Name: name, dtype: str
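
For simple cleanup like this, the .str accessor offers a vectorized alternative to apply(). The following sketch should produce the same values for this input; the variable name clean_vectorized is illustrative.

```python
import pandas as pd

messy_names = pd.DataFrame({'name': ['  alice ', 'BOB', ' carol  ']})

# Chain string methods through the .str accessor instead of a per-element lambda
clean_vectorized = messy_names['name'].str.strip().str.title()

print(clean_vectorized.tolist())
```

Unlike the lambda version, the .str methods pass missing values through as missing instead of raising an error, which may matter on real data.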

Conditional Logic per Row

# Assign a tier label based on score and bonus thresholds using axis=1
def assign_tier(row):
    if row['score'] >= 90 and row['bonus'] >= 7:
        return 'Platinum'
    elif row['score'] >= 80:
        return 'Gold'
    else:
        return 'Silver'

df['tier'] = df.apply(assign_tier, axis=1)
print(df[['name', 'score', 'bonus', 'tier']])
Output
    name  score  bonus      tier
0  Alice     85    5.0      Gold
1    Bob     92    8.0  Platinum
2  Carol     78    3.0    Silver

Parsing and Cleaning Mixed-Type Data

# Safely convert mixed-type string values to float; return NaN on failure
mixed = pd.DataFrame({'value': ['3.14', 'N/A', '2.71', '', '1.41']})

def safe_float(x):
    try:
        return float(x)
    except (ValueError, TypeError):
        return float('nan')

mixed['parsed'] = mixed['value'].apply(safe_float)
print(mixed)
Output
  value  parsed
0  3.14    3.14
1   N/A     NaN
2  2.71    2.71
3           NaN
4  1.41    1.41
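
When the only goal is numeric conversion with failures mapped to NaN, pd.to_numeric() with errors='coerce' is a built-in alternative to a hand-written safe_float. The sketch below assumes that coercing every unparseable value to NaN is the desired behavior; a custom function remains the better fit when failures need logging or special handling.

```python
import pandas as pd

mixed = pd.DataFrame({'value': ['3.14', 'N/A', '2.71', '', '1.41']})

# errors='coerce' turns values that cannot be parsed into NaN instead of raising
parsed = pd.to_numeric(mixed['value'], errors='coerce')

print(parsed.isna().tolist())
print(parsed.dropna().tolist())
```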

FAQ

What is the difference between apply() and map() in pandas?

apply() works at the Series or DataFrame level. On a DataFrame, it passes each column (axis=0) or row (axis=1) as a full Series to the function. DataFrame.map() (pandas 2.1+) always operates element-wise, receiving one scalar at a time. Series.map() applies element-wise substitution or transformation on a single Series, and also accepts a dictionary for label-based mapping.

How do I apply a function to a specific column in a pandas DataFrame?

Use df['column_name'].apply(func) to call func on each element of that column:

# pandas apply function to column: element-wise
df['score'].apply(lambda x: x * 1.1)

To apply a function to the column as a whole Series (for aggregation or normalization), use df[['column_name']].apply(func) with axis=0.

How do I use a lambda function with pandas apply()?

Pass the lambda directly as the func argument:

# pandas apply lambda on a single column
df['score'].apply(lambda x: x + 10)

# pandas apply lambda row-wise using axis=1
df.apply(lambda row: row['score'] + row['bonus'], axis=1)

Can I pass arguments to the function used in apply()?

Yes. Pass positional arguments as a tuple through args=, and pass keyword arguments directly to apply(), which forwards them to func:

def adjust(x, offset, multiplier=1.0):
    return (x + offset) * multiplier

# pandas apply with arguments: positional and keyword
df['score'].apply(adjust, args=(5,), multiplier=1.1)

What does axis=0 vs axis=1 mean in DataFrame.apply()?

axis=0 (default) applies the function to each column. The function receives one column as a Series per call. axis=1 applies the function to each row. The function receives one row as a Series per call, with column names as index labels.

Is apply() slow? When should I use vectorized operations instead?

apply() executes Python-level loops and is significantly slower than vectorized pandas or NumPy operations for simple transformations. Use vectorized expressions (+, *, np.where, str accessor methods) wherever possible. Use apply() only when the function requires multi-column access per row, complex branching, or external library calls that cannot be expressed as array operations.

What happened to applymap() in pandas 2.1?

DataFrame.applymap() was deprecated in pandas 2.1.0 and renamed to DataFrame.map(). The function signature and behavior are unchanged. Replace df.applymap(func) with df.map(func) in any codebase targeting pandas 2.1 or later. The method was fully removed in pandas 3.0.

How do I apply a function to multiple columns and return a DataFrame?

Subset the DataFrame to the target columns, call apply() with axis=1, and set result_type='expand' to split the returned list into separate columns:

def extract_values(row):
    return [row['score'] * 1.1, row['bonus'] * 2]

# Subset to the columns the function needs before applying it row-wise
result = df[['score', 'bonus']].apply(extract_values, axis=1, result_type='expand')
result.columns = ['adj_score', 'adj_bonus']
print(result)
Output
   adj_score  adj_bonus
0       93.5       10.0
1      101.2       16.0
2       85.8        6.0

Conclusion

This tutorial covered DataFrame.apply() from its core mechanics to practical usage patterns. You applied functions column-wise and row-wise using the axis parameter, passed positional and keyword arguments with args= and **kwargs, controlled output shape with result_type='expand' and result_type='broadcast', and applied element-wise transformations using DataFrame.map() (the pandas 2.1+ replacement for the deprecated applymap()).

With these patterns, you can implement custom data transformations across any DataFrame column or row without resorting to manual iteration using iterrows(). The performance comparison table provides a practical reference for choosing between apply(), map(), transform(), and vectorized operations based on the complexity of the logic and the size of the dataset.

For related topics, see the Python Pandas Module Tutorial for foundational DataFrame operations, and the official pandas apply() API documentation for the complete parameter reference and version history.


About the author(s)

Anish Singh Walia
Author
Sr Technical Content Strategist and Team Lead

Pankaj Kumar
Author

Vinayak Baranwal
Editor
Technical Writer II

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.