
You can compare strings in Python using the equality (==) and comparison (<, >, !=, <=, >=) operators. There are no special methods to compare two strings. The == operator is the right choice for almost every string comparison: it checks whether two strings contain the same characters, and it works the same way everywhere Python runs.
Python string comparison compares the characters in both strings one by one. When different characters are found, Python looks at each character’s Unicode code point, which is simply the number Unicode assigns to that character. The character with the lower number is considered to be smaller. The official Python language reference describes it like this: “Strings (instances of str) compare lexicographically using the numerical Unicode code points (the result of the built-in function ord()) of their characters.”
In this article, you’ll learn how each of the operators works when comparing strings. You’ll also learn how == differs from is, why is gives unreliable results for strings, and how to compare strings while ignoring upper and lower case using lower() and casefold(). Finally, you’ll cover the trickier cases: checking for substrings, measuring how similar two strings are with difflib, comparing multiline strings, and handling accented characters.
Key Takeaways:
- Use == and != to check whether two strings have the same value. This is the recommended way to compare strings in Python.
- Python compares strings character by character, using the Unicode code point of each character, which is the number the ord() function returns.
- Uppercase letters sort before lowercase letters because their code points are lower: ord('A') is 65 while ord('a') is 97.
- The is operator checks whether two names point to the exact same object in memory, not whether the values match, so it should never be used to compare string values.
- CPython's string interning can make is return True for some equal strings and False for others. You cannot rely on this behavior.
- For case-insensitive comparison, call .casefold() on both strings. It handles special characters, such as the German ß, that .lower() misses.
- Use the in operator to check whether one string appears inside another, and startswith() and endswith() to check how a string begins or ends, as recommended by PEP 8.
- Use difflib.SequenceMatcher to measure how similar two strings are, and unicodedata.normalize() before comparing text that may contain accented characters.
- The Python 2 cmp() function was removed in Python 3, so code that relies on it must use the comparison operators instead.

Declare the string variable:
```python
fruit1 = 'Apple'
```
The following table shows the results of comparing identical strings (Apple to Apple) using different operators.
| Operator | Code | Output |
|---|---|---|
| Equality | print(fruit1 == 'Apple') | True |
| Not equal to | print(fruit1 != 'Apple') | False |
| Less than | print(fruit1 < 'Apple') | False |
| Greater than | print(fruit1 > 'Apple') | False |
| Less than or equal to | print(fruit1 <= 'Apple') | True |
| Greater than or equal to | print(fruit1 >= 'Apple') | True |
The two strings are exactly the same, so == and the operators that include equality (<= and >=) return True, while !=, <, and > return False.
If you compare strings with different values, == returns False and != returns True, while the results of <, >, <=, and >= depend on which string comes first in Unicode code point order.
If one string is a prefix of the other, such as Apple and ApplePie, then the longer string is considered larger. This happens because all of the shared characters match and the shorter string runs out first, not because of the length itself. You can read more about how these operators behave with other data types in this Python operators reference.
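A quick sketch of the prefix rule, using illustrative strings. It also shows why length alone does not decide the order: digits are compared as characters, so '2' sorts after '11' unless you convert them to numbers first.

```python
# A prefix compares as smaller than the longer string that starts with it
print('Apple' < 'ApplePie')               # True

# Otherwise the first differing character decides: 'b' (98) < 'o' (111)
print('applebanana' < 'appleorange')      # True

# Digits are characters too: '2' (50) > '1' (49), so the longer string is not larger
print('2' < '11')                         # False
print(int('2') < int('11'))               # True: convert to numbers for numeric order
```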
This example code takes and compares input from the user. Then the program uses the results of the comparison to print additional information about the alphabetical order of the input strings. In this case, the program assumes that the smaller string comes before the larger string.
```python
fruit1 = input('Enter the name of the first fruit:\n')
fruit2 = input('Enter the name of the second fruit:\n')

if fruit1 < fruit2:
    print(fruit1 + " comes before " + fruit2 + " in the dictionary.")
elif fruit1 > fruit2:
    print(fruit1 + " comes after " + fruit2 + " in the dictionary.")
else:
    print(fruit1 + " and " + fruit2 + " are the same.")
```
Here’s an example of the potential output when you enter different values:
```
Enter the name of the first fruit:
Apple
Enter the name of the second fruit:
Banana
Apple comes before Banana in the dictionary.
```
Here’s an example of the potential output when you enter identical strings:
```
Enter the name of the first fruit:
Orange
Enter the name of the second fruit:
Orange
Orange and Orange are the same.
```
Note: For this example to work, both strings must use consistent casing throughout. For example, if the user enters the strings apple and Banana, then the output will be apple comes after Banana in the dictionary, which is incorrect.
This discrepancy occurs because every uppercase ASCII letter has a smaller Unicode code point than every lowercase ASCII letter: the value of a is 97 and the value of B is 66. You can test this yourself by using the ord() function to print the Unicode code point value of a character.
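One way to avoid the casing problem is to compare casefolded copies of the input while still printing the original spellings. The sketch below wraps the earlier logic in a hypothetical compare_fruits() function (the name is illustrative) so it can be tried without calling input().

```python
def compare_fruits(fruit1, fruit2):
    """Describe the alphabetical order of two names, ignoring case."""
    # casefold() removes case differences, so 'apple' and 'Banana' compare fairly
    key1, key2 = fruit1.casefold(), fruit2.casefold()
    if key1 < key2:
        return f"{fruit1} comes before {fruit2} in the dictionary."
    elif key1 > key2:
        return f"{fruit1} comes after {fruit2} in the dictionary."
    else:
        return f"{fruit1} and {fruit2} are the same."


print(compare_fruits("apple", "Banana"))  # apple comes before Banana in the dictionary.
print(compare_fruits("ORANGE", "orange"))  # ORANGE and orange are the same.
```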
Python compares strings lexicographically, which is a formal way of saying “like a dictionary orders words”. Python walks both strings from left to right, comparing one character at a time. The first position where the characters differ decides the result, based on the Unicode code point of each character. If one string runs out of characters first and all earlier characters matched, the shorter string is the smaller one.
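To make the rule concrete, here is a minimal, hypothetical lexicographic_compare() function that walks two strings the way this paragraph describes and returns -1, 0, or 1. It is a teaching sketch, not how CPython implements comparison internally; the loop checks its answers against the built-in operators.

```python
def lexicographic_compare(a, b):
    """Return -1, 0, or 1 following the same rules Python uses to order str values."""
    # Walk both strings in parallel until one of them runs out
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            # The first differing character decides, by Unicode code point
            return -1 if ord(char_a) < ord(char_b) else 1
    # All shared positions matched: the shorter string is the smaller one
    if len(a) < len(b):
        return -1
    if len(a) > len(b):
        return 1
    return 0


pairs = [("apple", "banana"), ("Apple", "ApplePie"), ("Zebra", "apple"), ("pear", "pear")]
for left, right in pairs:
    # (left > right) - (left < right) gives the built-in answer as -1, 0, or 1
    builtin = (left > right) - (left < right)
    print(left, right, lexicographic_compare(left, right), builtin)
```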
You can see the numbers behind the ordering by passing characters to ord():
```python
print(ord('A'), ord('Z'), ord('a'), ord('z'))
```
Running this prints the following output:
```
65 90 97 122
```
Uppercase A through Z use code points 65 through 90, while lowercase a through z use 97 through 122. Because every uppercase letter has a lower code point than every lowercase letter, "Zebra" < "apple" evaluates to True even though “apple” comes first alphabetically. The same rule explains how Python sorts lists of strings:
```python
print("apple" < "banana")
print("apple" == "Apple")
print("Zebra" < "apple")
print(sorted(["banana", "Apple", "cherry"]))
```
Running this prints the following output:
```
True
False
True
['Apple', 'banana', 'cherry']
```
The capitalized Apple sorts before the lowercase strings for the same code point reason. If you need true alphabetical sorting that ignores case, sort with a key function such as sorted(words, key=str.casefold). For text in languages with accented characters, the standard library locale module can sort text using the rules of a specific language, which gives more natural results than plain code point ordering.
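A short illustration of the key function approach, assuming a made-up word list in which one word is capitalized. key=str.casefold changes only how items are compared; the sorted list still contains the original strings.

```python
words = ["banana", "apple", "Cherry"]

# Default ordering: 'C' (67) sorts before every lowercase letter
print(sorted(words))                     # ['Cherry', 'apple', 'banana']

# Case-insensitive ordering: compare casefolded copies, keep original spellings
print(sorted(words, key=str.casefold))   # ['apple', 'banana', 'Cherry']
```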
== vs is when comparing strings
Use == when you want to know whether two strings contain the same characters, and save is for checks like x is None. The == operator compares values, while is compares identity: it returns True only when both names point to the exact same object in memory. Two strings can hold the same text while still being two separate objects.
The following example shows the difference:
```python
str1 = "hello"
str2 = "hello"
print(str1 == str2)
print(str1 is str2)

str3 = "".join(["he", "llo"])
print(str3 == str1)
print(str3 is str1)
```
Running this prints the following output:
```
True
True
True
False
```
All three variables hold the value hello, so == returns True in both comparisons. But is returns True for the first two strings and False for the string built with join(). The next section explains why.
CPython, the standard version of Python that most people install, saves memory by reusing some string objects instead of creating copies. This is called string interning. Strings written directly in your code (string literals), and short strings that look like variable names, are often interned. That is why str1 and str2 above end up pointing at the same object and is happens to return True.
Strings created while the program runs, such as the result of join(), joining variables with +, or user input, are usually not interned. That is why str3 is str1 returns False even though the values are equal.
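CPython also lets you request interning explicitly with sys.intern(), which returns a single shared object for each distinct string value. The sketch below is only meant to show what interning does; it is not a reason to compare strings with is.

```python
import sys

# Two equal strings built at run time
built1 = "".join(["he", "llo"])
built2 = "".join(["he", "llo"])
print(built1 == built2)          # True: same characters

# sys.intern() returns one canonical object per distinct value
interned1 = sys.intern(built1)
interned2 = sys.intern(built2)
print(interned1 is interned2)    # True: both names now refer to the same object
print(interned1 == "hello")      # True: == still compares values
```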
You cannot rely on these rules. They are internal details of CPython: they have changed between Python versions and work differently in other Python implementations such as PyPy. Code that uses is for string values may pass your tests with short literals and then fail later with real data. This is exactly why the pattern is a frequent source of beginner bugs and a recurring question on Stack Overflow.
When to use is
Python's official style guide, PEP 8, says is belongs to special one-of-a-kind objects like None. PEP 8 states: "Comparisons to singletons like None should always be done with is or is not, never the equality operators." In other words, value comparisons, including all string comparisons, belong to == and !=. Following this rule keeps your code correct no matter how interning behaves.
For everyday code, the speed difference between string comparison methods is tiny, so choose the method that is correct rather than the one that looks fastest in a benchmark. Under the hood, the == operator in CPython first checks whether both sides are the same object, then compares their lengths, and only then compares the actual characters. This means strings of different lengths are rejected without examining any characters, and the character check itself runs in optimized C code, so even long strings compare quickly.
The following benchmark uses the timeit module to compare three approaches on two equal 1,000-character strings:
```python
import timeit

str1 = "a" * 1000
str2 = "a" * 1000

equality_time = timeit.timeit(lambda: str1 == str2, number=100000)
identity_time = timeit.timeit(lambda: str1 is str2, number=100000)
casefold_time = timeit.timeit(lambda: str1.casefold() == str2.casefold(), number=100000)

print(f"Equality (==): {equality_time:.4f} seconds")
print(f"Identity (is): {identity_time:.4f} seconds")
print(f"Casefolded equality: {casefold_time:.4f} seconds")
```
Running this prints output similar to the following (exact timings vary by machine and Python version):
```
Equality (==): 0.0025 seconds
Identity (is): 0.0024 seconds
Casefolded equality: 0.0621 seconds
```
The is check is slightly faster than == because it only compares memory addresses, but it answers a different question and gives wrong results for equal strings that are separate objects, so the tiny saving is never worth it. The real cost in this benchmark is the case-insensitive comparison: calling casefold() creates two new strings on every comparison, making it roughly 25 times slower here. If you compare the same strings many times while ignoring case, casefold them once, store the result, and compare the stored values.
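One caveat about the benchmark above: "a" * 1000 is a constant expression, so CPython may fold both assignments into one shared string object at compile time. In that case str1 is str2 is True and == can return on its identity check without comparing any characters. The variant below assumes you want to time a full comparison of two separate objects, so it builds the second string at run time with join(). No timings are shown because they depend on your machine.

```python
import timeit

str1 = "a" * 1000
# join() creates a new object at run time: equal to str1, but not the same object
str2 = "".join(["a"] * 1000)

print(str1 == str2)   # True: same characters
print(str1 is str2)   # False: two separate objects

equality_time = timeit.timeit(lambda: str1 == str2, number=100000)
print(f"Equality (==) on separate objects: {equality_time:.4f} seconds")
```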
When comparing strings, you need to think about two things: case (the difference between uppercase and lowercase characters) and language-specific characters such as accented letters. The following best practices keep string comparisons accurate and efficient.
Ignoring case with lower()
To compare strings while ignoring upper and lower case, use the .lower() method to convert both strings to lowercase before comparing them. This approach is simple and works well for most cases. Here's an example:
```python
str1 = "Hello World"
str2 = "HELLO WORLD"
print(str1.lower() == str2.lower())
```
Running this prints the following output:
```
True
```
However, .lower() may not be enough for languages that have more complex case rules, such as German or Turkish.
Ignoring case with casefold()
For more advanced case handling, use the .casefold() method. The Python documentation describes it this way: "Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string." The classic example is the German lowercase letter ß, which is equivalent to ss when case is ignored. The .lower() method leaves ß unchanged, while .casefold() converts it to ss:
```python
str1 = "Straße"
str2 = "STRASSE"
print(str1.lower() == str2.lower())
print(str1.casefold() == str2.casefold())
```
Running this prints the following output:
```
False
True
```
The .lower() comparison fails because "Straße".lower() is "straße" while "STRASSE".lower() is "strasse". The .casefold() comparison succeeds because both strings casefold to "strasse". If your app handles text in many languages, prefer .casefold(); the Python documentation states that “casefolded strings may be used for caseless matching.”
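The same "casefold once" advice applies to lookups. In this sketch, the street names and the is_known_street() helper are hypothetical; the known values are casefolded a single time into a set, and each incoming value is casefolded once per lookup.

```python
# Hypothetical list of known street names, casefolded once up front
known_streets = ["Hauptstraße", "Bahnhofstraße", "Main Street"]
known_keys = {name.casefold() for name in known_streets}


def is_known_street(name):
    # Only the incoming value needs casefolding on each lookup
    return name.casefold() in known_keys


print(is_known_street("HAUPTSTRASSE"))  # True
print(is_known_street("main street"))   # True
print(is_known_street("Elm Street"))    # False
```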
When working with international text, you need to handle special characters and accents correctly. This includes characters like umlauts (ü), accents (é), and other accent marks. The tricky part is that the same visible character can be stored in different ways: é can be a single code point (U+00E9) or the plain letter e followed by a separate accent mark (U+0301). They look identical on screen, but the == operator sees them as different strings.
To compare such strings correctly, first convert both to the same standard form (this is called Unicode normalization) with the unicodedata module:
```python
import unicodedata

str1 = "caf\u00e9"   # "café" as a single precomposed code point
str2 = "cafe\u0301"  # "café" as "e" plus a combining accent
print(str1 == str2)

nfc1 = unicodedata.normalize("NFC", str1)
nfc2 = unicodedata.normalize("NFC", str2)
print(nfc1 == nfc2)
```
Running this prints the following output:
```
False
True
```
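Normalization and casefolding can be combined when both the accent encoding and the case might differ. The hypothetical caseless_key() helper below follows the NFD, casefold, NFD sequence that the Unicode Standard describes for canonical caseless matching; treat it as a sketch rather than a complete solution for every language.

```python
import unicodedata


def caseless_key(text):
    """Build a comparison key that ignores case and how accents are encoded."""
    # NFD first, so precomposed and decomposed accents look the same to casefold()
    decomposed = unicodedata.normalize("NFD", text)
    # casefold() can produce new character sequences, so normalize once more afterward
    return unicodedata.normalize("NFD", decomposed.casefold())


print(caseless_key("Caf\u00e9") == caseless_key("CAFE\u0301"))  # True
print(caseless_key("Straße") == caseless_key("STRASSE"))        # True
print(caseless_key("caf\u00e9") == caseless_key("cafe"))        # False: accents still count
```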
Beyond normalization, two more strategies help with international text:
Here’s an example that removes an accent before comparing:
```python
str7 = "café"  # String with an accent
str8 = "cafe"  # String without an accent

# Remove the accent by replacing é with a plain e (this handles only this one character)
preprocessed_str7 = str7.replace('é', 'e')
print(preprocessed_str7 == str8)
```
Running this prints the following output:
```
True
```
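The replace() call above handles only the single character é. A more general sketch, assuming you really do want accent-insensitive matching, decomposes the text with NFD and drops the combining marks. The strip_accents() name is hypothetical. Letters such as ø have no decomposition in Unicode, so they are left unchanged.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks after decomposing characters with NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a nonzero value for combining marks such as U+0301
    return "".join(char for char in decomposed if not unicodedata.combining(char))


print(strip_accents("caf\u00e9") == "cafe")  # True
print(strip_accents("Hëllo, Wørld!"))        # Hello, Wørld!
```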
By following these best practices, you can ensure that your string comparisons are accurate, efficient, and culturally sensitive, even when working with large strings and international text.
Checking for substrings with in, startswith(), and endswith()
Sometimes you don't need to compare whole strings. You only need to check whether one string appears inside, at the start of, or at the end of another. Use the in operator to check whether a string contains another string, and the startswith() and endswith() methods to check how a string begins or ends:
```python
message = "Hello, World!"
print("World" in message)
print(message.startswith("Hello"))
print(message.endswith("World!"))
```
Running this prints the following output:
```
True
True
True
```
All three checks care about case, so combine them with casefold() when case should not matter, for example "world" in message.casefold(). PEP 8 recommends these methods over cutting the string up with slicing: “Use ''.startswith() and ''.endswith() instead of string slicing to check for prefixes or suffixes.” They are clearer and avoid small indexing mistakes. For more complex patterns, such as “starts with a digit followed by a dash”, use regular expressions from the standard library re module instead.
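Two related tools, shown with made-up file names: startswith() and endswith() accept a tuple of alternatives, and re.match() handles patterns such as "starts with a digit followed by a dash" because it only matches at the beginning of the string.

```python
import re

filename = "3-steps.PDF"

# endswith() accepts a tuple and returns True if any of the suffixes matches
print(filename.casefold().endswith((".pdf", ".docx")))  # True

# re.match() is anchored at the start: here, a digit followed by a dash
print(re.match(r"\d-", filename) is not None)           # True
print(re.match(r"\d-", "steps-3.pdf") is not None)      # False
```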
Measuring string similarity with difflib
Sometimes you need to know how similar two strings are, not just whether they are exactly equal. For that, use the difflib module from the standard library. Its SequenceMatcher class calculates a similarity score between 0 and 1, where 1 means the strings are identical:
```python
from difflib import SequenceMatcher

str1 = "Hello, World!"
str2 = "Hello, Universe!"
print(SequenceMatcher(None, str1, str2).ratio())
```
Running this prints the following output:
```
0.6206896551724138
```
A score of about 0.62 tells you the two strings are mostly similar but not the same. This technique is useful for spotting near-duplicate records, suggesting corrections for misspelled input, or matching user-entered names against a known list. Pick a cutoff value that fits your data (0.8 is a common starting point) and treat anything above it as a likely match. For exact equality checks, stay with ==; SequenceMatcher is much slower and is meant for measuring similarity, not testing equality.
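Two follow-ups that often come up in practice, using the standard difflib API: casefolding before measuring similarity so that case differences do not lower the score, and get_close_matches(), which returns the best candidates from a list above a cutoff (0.6 by default). The word list is the example from the difflib documentation.

```python
from difflib import SequenceMatcher, get_close_matches

# Case differences lower the score, so casefold first if case should not count
print(SequenceMatcher(None, "Apple", "apple").ratio())                          # 0.8
print(SequenceMatcher(None, "Apple".casefold(), "apple".casefold()).ratio())    # 1.0

# Suggest corrections for a misspelled word, best match first
print(get_close_matches("appel", ["ape", "apple", "peach", "puppy"]))           # ['apple', 'ape']
```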
Multiline strings are compared exactly like single-line strings: == looks at every character, and that means spaces, newline characters, and indentation all count. Two strings that look the same when printed can still be unequal because one uses Unix-style \n line endings and the other uses Windows-style \r\n:
```python
text1 = "first line\nsecond line"
text2 = "first line\r\nsecond line"
print(text1 == text2)
print(text1.splitlines() == text2.splitlines())
```
Running this prints the following output:
```
False
True
```
The direct comparison fails because of the invisible \r character, while splitlines() splits on any kind of line ending and produces equal lists. When formatting differences should be ignored, clean both strings up before comparing: use strip() to remove spaces at the start and end, splitlines() to ignore line-ending differences, or " ".join(text.split()) to squeeze all repeated whitespace down to single spaces. The right cleanup depends on whether the whitespace carries meaning in your data.
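A minimal sketch of the whitespace cleanup described above, assuming that any run of spaces, tabs, or line endings should count as a single space. The same_ignoring_whitespace() name is hypothetical.

```python
def same_ignoring_whitespace(a, b):
    """Compare two strings, treating any run of whitespace as a single space."""
    # split() with no arguments splits on any whitespace and drops leading/trailing runs
    return " ".join(a.split()) == " ".join(b.split())


text1 = "first line\nsecond line"
text2 = "  first   line\r\n\tsecond line  "
print(text1 == text2)                          # False
print(same_ignoring_whitespace(text1, text2))  # True
```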
Python 3 has two different types for text-like data, str and bytes, and knowing which one you are comparing helps you avoid many bugs. The following subsections walk through each one.
Unicode strings are the standard way to represent text in Python. They are sequences of Unicode characters, which are represented by the str type. Unicode strings are the default string type in Python 3. They can contain characters from any language, including non-ASCII characters like accents, umlauts, and non-Latin scripts.
Here’s an example of creating a Unicode string in Python:
```python
unicode_str = "Hëllo, Wørld!"
print(unicode_str)
```
Running this prints the following output:
```
Hëllo, Wørld!
```
Notice how the string contains non-ASCII characters like the accented ‘e’ (ë) and the slashed ‘o’ (ø). These characters are correctly represented and can be manipulated like any other string in Python.
ASCII strings are a subset of Unicode strings that only contain characters from the ASCII character set. ASCII strings are typically used when working with legacy systems or when there’s a need to ensure compatibility with systems that only support ASCII characters.
In Python, ASCII strings are also represented by the str type, but they are limited to characters with ASCII code points (0-127). Here’s an example of creating an ASCII string in Python:
```python
ascii_str = "Hello, World!"
print(ascii_str)
```
Running this prints the following output:
```
Hello, World!
```
Notice how the string only contains characters from the ASCII character set.
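If you need to confirm that a str contains only ASCII characters, for example before sending it to a legacy system, str.isascii() (available since Python 3.7) answers that directly. The sketch also shows that the strict "ascii" codec rejects anything outside the 0-127 range.

```python
ascii_str = "Hello, World!"
unicode_str = "Hëllo, Wørld!"

# str.isascii() is True only if every character is in the range 0-127
print(ascii_str.isascii())     # True
print(unicode_str.isascii())   # False

# ASCII-only text encodes cleanly with the strict "ascii" codec; other text raises an error
print(ascii_str.encode("ascii"))
try:
    unicode_str.encode("ascii")
except UnicodeEncodeError:
    print("unicode_str contains non-ASCII characters")
```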
Byte strings, on the other hand, are sequences of bytes, which are represented by the bytes type in Python. Byte strings are typically used when working with binary data, such as reading or writing files, network communication, or cryptographic operations.
Here’s an example of creating a byte string in Python:
```python
byte_str = b"Hello, World!"
print(byte_str)
```
Running this prints the following output:
```
b'Hello, World!'
```
Notice the b prefix before the string literal, which indicates that it’s a byte string. Byte strings can be converted to Unicode strings using the decode() method, and vice versa using the encode() method.
For example, to convert a Unicode string to a byte string:
```python
unicode_str = "Hëllo, Wørld!"
byte_str = unicode_str.encode('utf-8')
print(byte_str)
```
Running this prints the following output:
```
b'H\xc3\xabllo, W\xc3\xb8rld!'
```
And to convert a byte string back to a Unicode string:
```python
byte_str = b'H\xc3\xabllo, W\xc3\xb8rld!'
unicode_str = byte_str.decode('utf-8')
print(unicode_str)
```
Running this prints the following output:
```
Hëllo, Wørld!
```
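Because str and bytes are different types, comparing them directly is a common source of silent bugs. This sketch shows that a bytes object never equals a str in Python 3, and that converting one side first fixes the comparison.

```python
text = "Hello, World!"
data = b"Hello, World!"

# A bytes object never equals a str object in Python 3, even with the same letters
print(data == text)                    # False

# Decode the bytes (or encode the str) so both sides have the same type
print(data.decode("utf-8") == text)    # True
print(data == text.encode("utf-8"))    # True
```

Running Python with the -b command-line option makes it emit a BytesWarning for comparisons like data == text, which can help you find these bugs.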
By understanding the differences between Unicode, ASCII, and byte strings in Python, you can effectively work with various types of text data and ensure that your applications handle text correctly, regardless of the language or character set used. If you want a refresher on string fundamentals first, see this article on working with strings in Python.
If you work in more than one programming language, Python’s approach is simple: one operator, ==, compares string values, and there is no easy way to accidentally compare memory addresses when you meant to compare values. The table below shows how the same task looks in other languages.
| Language | Value comparison | Identity/reference comparison | Notes |
|---|---|---|---|
| Python | a == b | a is b | is checks object identity and should not be used for value comparison |
| JavaScript | a === b | (Not separate for primitives) | === compares primitive string values without type coercion; == coerces types first |
| Java | a.equals(b) | a == b | == on String objects compares references, a classic source of bugs |
| C | strcmp(a, b) == 0 | a == b | == compares pointers; strcmp() returns negative, zero, or positive |
Java’s == trap is the closest match to misusing is in Python: both compare object references, and both sometimes appear to work because the language reuses string objects behind the scenes, then fail unpredictably. C’s strcmp() return style (negative, zero, or positive) is also what Python 2’s removed cmp() function copied. Keeping these differences in mind helps you avoid carrying habits from one language into another.
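If you are porting Python 2 code that called cmp(), the Python 3.0 release notes suggest the expression (a > b) - (a < b) as a replacement. The sketch below wraps it in a small helper (defined here, not a built-in) and uses functools.cmp_to_key() to sort with it.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Python 3 stand-in for the removed cmp(): returns -1, 0, or 1."""
    # True and False behave like 1 and 0, so the subtraction gives the sign
    return (a > b) - (a < b)


print(cmp("apple", "banana"), cmp("pear", "pear"), cmp("b", "a"))   # -1 0 1

# Old-style comparison functions can still drive sorting through cmp_to_key()
print(sorted(["pear", "Apple", "banana"], key=cmp_to_key(cmp)))      # ['Apple', 'banana', 'pear']
```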
The right comparison method depends on the question you are asking about the strings. This table brings together the recommendations from this tutorial:
| Method | Use it for | Notes |
|---|---|---|
| ==, != | Checking if two strings are equal or not equal | The default choice for comparing strings |
| <, >, <=, >= | Ordering strings | Based on Unicode code points; uppercase ASCII letters sort before lowercase |
| is, is not | Checking against special objects like None | Never use for string value comparison |
| casefold() + == | Equality while ignoring upper and lower case | Handles special characters such as the German ß that lower() misses; prefer it for international text |
| in | Checking if a string contains another string | Cares about case; combine with casefold() if needed |
| startswith(), endswith() | Checking how a string begins or ends | Recommended by PEP 8 over slicing |
| difflib.SequenceMatcher | Measuring how similar two strings are | Returns a score between 0 and 1; slower than == |
| unicodedata.normalize() + == | Text with accented characters | Convert both sides to the same form (NFC or NFD) first |
One more good habit is worth adopting: when a value might be None, check value is not None before comparing it to a string, for example if value is not None and value == "expected". Comparing None to a string with == is safe (it simply returns False), but a clear None check shows your intent and prevents errors from calls like value.casefold(), which raise an AttributeError when value is None.
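A small sketch of that habit, using a hypothetical matches_ignoring_case() helper: the is not None check runs first, so casefold() is only ever called on an actual string.

```python
def matches_ignoring_case(value, expected):
    """Return True if value is a string equal to expected, ignoring case."""
    # Check for None with "is not" first, so .casefold() is never called on None
    return value is not None and value.casefold() == expected.casefold()


print(matches_ignoring_case("YES", "yes"))  # True
print(matches_ignoring_case(None, "yes"))   # False
```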
The equality operator == is used to compare two strings in Python. It checks if the values of the strings are equal, character by character. This means that the comparison is done based on the actual characters in the strings, not their memory locations. For example:
```python
str1 = "Hello, World!"
str2 = "Hello, World!"
print(str1 == str2)
```
This prints True because both strings contain exactly the same characters.
What is the difference between == and is in Python string comparison?
The equality operator == is used to compare the values of two strings, while the identity operator is checks if both strings are the same object in memory. This distinction is important because two strings can have the same value but be different objects in memory. For example:
```python
str1 = "".join(["Hello, ", "World!"])
str2 = "Hello, World!"
print(str1 == str2)
print(str1 is str2)
```
This prints True and then False: the strings have the same value but are different objects in memory, so == returns True while is returns False. Always use == for string value comparison.
To compare strings while ignoring upper and lower case, you can use the .lower() method to convert both strings to lowercase before comparing, or the more thorough .casefold() method for international text. This way, the case of the characters does not affect the result. For example:
```python
str1 = "Hello, World!"
str2 = "HELLO, WORLD!"
print(str1.lower() == str2.lower())
```
This prints True. For strings that may contain non-ASCII characters, prefer str1.casefold() == str2.casefold().
You can use the .startswith() and .endswith() methods to check if a string starts or ends with a specific substring. These methods return True if the string starts or ends with the specified substring, and False otherwise. For example:
```python
str1 = "Hello, World!"
print(str1.startswith("Hello"))
print(str1.endswith("World!"))
```
Both checks print True. PEP 8 recommends these methods over string slicing because they are cleaner and less error prone.
You can use the == operator to compare multiple strings at once. This can be done by chaining multiple == operators together. For example:
```python
str1 = "Hello, World!"
str2 = "Hello, World!"
str3 = "Hello, World!"
print(str1 == str2 == str3)
```
This prints True only if every string in the chain is equal to the next one. Chained comparison is equivalent to str1 == str2 and str2 == str3.
The speed differences between string comparison methods in Python are very small for most use cases. The is operator is slightly faster than == because it only compares memory addresses rather than string contents, but it answers a different question and must not be used for value comparison. The slow operations are the ones that create new strings, such as calling .lower() or .casefold() on every comparison. If you compare the same values many times while ignoring case, casefold them once and reuse the result. The .startswith() and .endswith() methods are also clearer than slicing the string and comparing the pieces, and they avoid creating a temporary substring.
Yes, you can compare strings that came from different encodings. However, you must first convert both byte strings to regular str text with the .decode() method, decoding each one with the encoding it was created with, so you are comparing text with text. For example:
```python
str1 = b"Hello, World!".decode('utf-8')
str2 = b"Hello, World!".decode('utf-8')
print(str1 == str2)
```
This prints True. Comparing a bytes object directly to a str object always returns False in Python 3, so decode first and then compare.
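To see why decoding with the right encoding matters, the sketch below encodes the same text two different ways. The raw bytes differ, but each decodes back to equal str values when you use the matching encoding. The string is written with an escape (\u00e9) so the example does not depend on how your editor stores é.

```python
original = "caf\u00e9"  # "café"

# The same text encoded two different ways produces different bytes
utf8_bytes = original.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = original.encode("latin-1")  # b'caf\xe9'
print(utf8_bytes == latin1_bytes)          # False

# Decoding each with its own encoding gives back equal str values
print(utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1"))  # True
```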
You can use the difflib module to check if two strings are nearly identical or similar. The difflib.SequenceMatcher class provides a way to measure the similarity between two sequences, including strings. For example:
```python
from difflib import SequenceMatcher

str1 = "Hello, World!"
str2 = "Hello, Universe!"
print(SequenceMatcher(None, str1, str2).ratio())
```
This prints 0.6206896551724138. The ratio() method returns a measure of the sequences’ similarity as a float in the range [0, 1]. A ratio of 1 means the sequences are identical, and a ratio of 0 means they have nothing in common.
Lexicographic comparison means Python compares strings character by character, like a dictionary orders words, using the Unicode code point value of each character (the same value returned by ord()). The first character that differs decides the result, so "apple" < "banana" evaluates to True because a has a lower code point than b. Uppercase letters count as smaller than lowercase letters because their code points are lower, which is why "Zebra" < "apple" is also True.
Use the in operator: "world" in "hello world" returns True. The check cares about case, so casefold both sides when case should not matter. To check how a string begins or ends, use str.startswith() and str.endswith(), and to find the position of the substring use str.find() or str.index().
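A quick sketch of the substring tools mentioned in this answer, using the same example text: find() returns -1 when the substring is missing, while index() raises a ValueError.

```python
text = "hello world"

print("world" in text)        # True: simple containment check
print(text.find("world"))     # 6: index where the substring starts
print(text.find("moon"))      # -1: find() signals "not found" with -1

try:
    text.index("moon")        # index() raises instead of returning -1
except ValueError:
    print("'moon' not found")
```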
What is the difference between lower() and casefold() for string comparison?
The lower() method converts uppercase letters to lowercase using simple rules, while casefold() goes further and handles special characters that lower() misses. The difference shows up with characters such as the German ß: "Straße".lower() keeps the ß, but "Straße".casefold() converts it to "strasse", which correctly matches "STRASSE".casefold(). If your app handles text in many languages, prefer casefold().
In this article, you learned how to compare strings in Python using the equality (==) and comparison (<, >, !=, <=, >=) operators. You also learned why is checks identity rather than value and should be saved for checks like x is None. Finally, you covered the harder cases: ignoring case with casefold(), checking for substrings with in, startswith(), and endswith(), measuring similarity with difflib, and comparing accented text safely with Unicode normalization. This is a fundamental skill in Python programming, and mastering string comparison is essential for working with text data.
Java and Python Developer for 20+ years, Open Source Enthusiast, Founder of https://www.askpython.com/, https://www.linuxfordevices.com/, and JournalDev.com (acquired by DigitalOcean). Passionate about writing technical articles and sharing knowledge with others. Love Java, Python, Unix and related technologies. Follow my X @PankajWebDev
I help Businesses scale with AI x SEO x (authentic) Content that revives traffic and keeps leads flowing | 3,000,000+ Average monthly readers on Medium | Sr Technical Writer(Team Lead) @ DigitalOcean | Ex-Cloud Consultant @ AMEX | Ex-Site Reliability Engineer(DevOps)@Nutanix
With over 6 years of experience in tech publishing, Mani has edited and published more than 75 books covering a wide range of data science topics. Known for his strong attention to detail and technical knowledge, Mani specializes in creating clear, concise, and easy-to-understand content tailored for developers.