You can find a hundred Python interview question lists in about ten seconds. Most of them are the same: here's the question, here's the answer, memorize it, good luck. Final Round AI's popular roundup runs to 95 questions in exactly that shape. Those lists optimize for the wrong thing.
I've sat on the interviewing side of enough Python screens to know what actually moves a decision, and it's almost never whether the candidate could recite the definition of a decorator. It's whether they could read a stack trace without flinching, whether they reached for a list comprehension or a four-line loop, whether they knew when a Pandas operation was about to blow up memory. Those signals don't show up on a flashcard.
This guide does something different. For every question, you get a short version of the strong answer, then the part that matters: what the question actually predicts about you on the job, and a trivia tax flag when the question rewards memorization more than skill. Use it to spend your prep hours where they count.
Why most Python question lists waste your prep time
Python is everywhere in interviews because it's everywhere in work. In the 2025 Stack Overflow Developer Survey, 57.9 percent of developers reported using Python, up seven points in a single year, the largest jump of any major language. It sits behind only JavaScript, HTML/CSS, and SQL. If you're interviewing for software engineering, data science, ML, or analytics, a Python screen is close to guaranteed.
That ubiquity is also why generic question lists fail you. When a topic is this broad, a list of 95 questions has to stay shallow to cover the surface. You end up with fifteen variations on "what's the difference between a list and a tuple" and nothing on the questions that actually separate candidates: reading unfamiliar code, debugging under pressure, choosing the right data structure when it matters.
There's a second problem. Interviewers know these lists exist, and they've adjusted. In interviewing.io's 2025 survey of 67 interviewers (52 of them at FAANG companies), 81 percent suspected candidates of using AI to cheat and 75 percent believed AI assistance was letting weaker candidates pass interviews they'd otherwise fail. The response has been more follow-up questions, more "walk me through why you did that," more probing of whether you understand the code on the screen. A memorized answer survives the first question and falls apart on the second.
The goal is to study the questions that build transferable reasoning and to spot the pure trivia, so you can give the trivia five minutes instead of fifty.
How to read this list
Each question below carries two notes.
Signal is what a strong answer tells an interviewer about how you'd perform on the job. Data wrangling speed, debugging instinct, idiomatic style, library fluency, systems thinking. This is the reason the question gets asked, even when the interviewer couldn't articulate it.
Trivia tax is a flag for when a question mostly rewards having seen it before. These questions still get asked, so you should know the answers, but memorizing them teaches you nothing you'd use writing real code. Learn them fast and move on.
To be clear about method: the signal and trivia-tax calls here are editorial judgment from time spent on the interviewing side, not the output of a formal study. Where I cite numbers, they come from named public sources, linked inline. The example questions are drawn from real screens and from Four-Leaf's own practice question bank.
Core language and idioms
This is where interviewers check whether you write Python or whether you write some other language using Python syntax. The questions look basic. The signal is in how idiomatic your answer is.
What's the difference between a list and a tuple, and when would you use each?
Lists are mutable, tuples are immutable and hashable, so tuples can be dictionary keys and set members while lists can't. The "when" matters more than the "what": tuples signal a fixed record (a coordinate, a row), lists signal a growing collection.
Signal: whether you think about mutability as a design choice, not just a property.
Trivia tax: partial. The definition is rote, but the "when would you use each" turns it into a real question.
What does a list comprehension do, and when should you not use one?
It builds a list in a single expression like [x * 2 for x in nums if x > 0]. The strong answer includes the "not": skip comprehensions when the logic needs multiple statements or side effects, and skip building a full list when a generator expression would stream the values lazily.
Signal: idiomatic style plus judgment about memory. A candidate who knows comprehensions but never knows when to stop will write unreadable nested ones.
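As a rough illustration of the "when to stop" point, here is a minimal sketch contrasting a list comprehension with a generator expression. The `nums` data and variable names are made up for the example.

```python
nums = [3, -1, 4, -5, 9]

# List comprehension: builds the whole result list in memory.
doubled = [x * 2 for x in nums if x > 0]

# Generator expression: same filter and transform, but values are produced
# one at a time, so sum() never needs the intermediate list.
total = sum(x * 2 for x in nums if x > 0)
```

If all you need is an aggregate (a sum, a max, an `any`), the generator form is usually the better default.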
Explain *args and **kwargs.
*args collects extra positional arguments into a tuple, **kwargs collects extra keyword arguments into a dict. You use them to write functions that forward arguments or accept a flexible signature.
Signal: low on its own. It matters with the follow-up: write a decorator that works on any function, which forces real use of both.
Trivia tax: yes, in isolation. Know it cold, spend no real time on it.
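A quick sketch of both behaviors, collecting and forwarding. The function names `describe` and `call_with_prefix` are hypothetical, chosen only for this example.

```python
def describe(*args, **kwargs):
    # args arrives as a tuple, kwargs as a dict.
    return args, kwargs


def call_with_prefix(fn, *args, **kwargs):
    # Forward whatever we were given, adding one positional argument in front.
    return fn("prefix", *args, **kwargs)


collected = describe(1, 2, a=3)
forwarded = call_with_prefix(describe, 1, flag=True)
```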
What is a decorator? Write one.
A decorator is a function that takes a function and returns a new function, used to wrap behavior like timing, logging, or caching without touching the original. A clean answer uses functools.wraps to preserve the wrapped function's name and docstring.
import functools
import time

def timed(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper
Signal: high. Decorators sit at the intersection of closures, first-class functions, and *args/**kwargs. A candidate who writes one cleanly understands a lot of Python at once. The functools.wraps detail separates people who've shipped decorators from people who've only read about them.
What's the difference between is and ==?
== compares values; is compares identity (whether two names point to the same object). The trap is CPython's caching of small integers and interning of some strings, which is an implementation detail: a is b can be True for 256 but False for 257.
Signal: whether you understand that variables are references to objects. The interning trivia is a distraction.
Trivia tax: the interning edge case is pure trivia. The reference-model understanding underneath it is not.
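The reference model is easier to show with lists than with integers, because no caching is involved. A minimal sketch with made-up names:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate object
c = a           # a second name for the same object

equal_values = a == b   # True: same contents
same_object = a is b    # False: two distinct lists
alias = a is c          # True: one list, two names

c.append(4)             # mutating through one name is visible through the other
```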
How does Python handle default mutable arguments?
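The default is evaluated once, at function definition, so def f(x, acc=[]) shares one list across all calls. The fix is acc=None in the signature, then if acc is None: acc = [] inside the body. Avoid the shortcut acc = acc or [], which silently replaces an empty list the caller passed in on purpose.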
Signal: high, and underrated. This is a real bug that ships to production. A candidate who's hit it has written enough Python to have been burned, which is exactly the experience interviewers are probing for.
What are generators and why use them?
A generator produces values lazily with yield, holding only one value in memory at a time instead of building the whole sequence. You use them to process large or infinite streams without loading everything at once.
Signal: memory awareness and a grasp of laziness. Candidates who reach for generators on a "process this 10GB file" question are showing real instinct.
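Here is a small sketch of the lazy-streaming idea. The `error_lines` function is hypothetical, and `io.StringIO` stands in for a large log file so the example stays self-contained; a real file object iterates the same way.

```python
import io


def error_lines(lines):
    """Yield only the lines containing 'ERROR', one at a time."""
    for line in lines:
        if "ERROR" in line:
            yield line.rstrip("\n")


log = io.StringIO("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
matches = error_lines(log)   # nothing has been read yet
first = next(matches)        # reads only as far as the first match
rest = list(matches)         # consumes the remainder
```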
Explain how Python's GIL affects multithreading.
The Global Interpreter Lock means only one thread executes Python bytecode at a time, so threads don't speed up CPU-bound work. For I/O-bound work threads still help (the GIL releases during I/O waits), and for CPU-bound parallelism you use multiprocessing or native extensions.
Signal: high for backend roles. The follow-up that matters is "so when would you use threads at all," which separates people who memorized "GIL bad" from people who understand the I/O-bound case.
What's the difference between @staticmethod, @classmethod, and an instance method?
Instance methods take self, class methods take cls and can construct or configure the class, static methods take neither and are just namespaced functions. Class methods are the idiomatic way to write alternative constructors.
Signal: moderate. The alternative-constructor use of classmethod is the part that shows real OOP fluency.
Trivia tax: partial. The definitions are rote, the "when would you use a classmethod" is not.
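One way to show all three method kinds side by side is the sketch below. The `Point` class and its method names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float

    @classmethod
    def from_string(cls, text):
        """Alternative constructor: parse 'x,y' into a Point."""
        x, y = (float(part) for part in text.split(","))
        return cls(x, y)

    @staticmethod
    def is_valid(text):
        """Namespaced helper: needs neither the instance nor the class."""
        return text.count(",") == 1

    def norm_squared(self):
        """Instance method: works on this object's own data."""
        return self.x ** 2 + self.y ** 2


p = Point.from_string("3,4")
```

Because `from_string` calls `cls(...)` rather than `Point(...)`, a subclass that inherits it gets instances of the subclass back.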
What does if __name__ == "__main__": do?
It guards code that should run only when the file is executed directly, not when it's imported as a module. Without it, your script's side effects fire on import.
Signal: low. It's a useful idiom but knowing it predicts almost nothing about engineering ability.
Trivia tax: yes. One of the most over-asked Python questions. Know the one-sentence answer, move on.
What's the difference between shallow copy and deep copy?
A shallow copy duplicates the outer object but shares references to nested objects, so mutating a nested list shows up in both copies. copy.deepcopy recursively duplicates everything.
Signal: moderate. Connects to the reference model and to a real bug class. Candidates who've debugged a shared-nested-object bug answer this with conviction.
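A minimal sketch of the shared-nested-object behavior, using a made-up dict:

```python
import copy

original = {"tags": ["a", "b"]}
shallow = copy.copy(original)      # new outer dict, same inner list
deep = copy.deepcopy(original)     # new outer dict and new inner list

original["tags"].append("c")       # mutate the nested list through the original
```

After the append, `shallow["tags"]` sees the new element and `deep["tags"]` does not.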
Data structures and algorithms in Python
Here the language matters less than the reasoning, but Python-specific tools (dicts, sets, collections, heapq, slicing) are exactly what interviewers want to see you reach for. Solving with the right standard-library tool is itself a signal.
Sum all numbers in a nested list of arbitrary depth.
Recurse: if an item is a list, recurse into it, otherwise add it. Use isinstance(item, list) rather than type(item) == list so subclasses work too.
Signal: clean recursion and the isinstance detail. The detail is small but it's the kind of correctness instinct that shows up everywhere in real code.
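The recursion described above could look something like this. It assumes the input contains only numbers and (possibly nested) lists.

```python
def nested_sum(items):
    """Sum every number in a list nested to arbitrary depth."""
    total = 0
    for item in items:
        if isinstance(item, list):   # also true for list subclasses
            total += nested_sum(item)
        else:
            total += item
    return total
```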
Check whether a string's characters can be rearranged into a palindrome.
Count character frequencies; a palindrome allows at most one character with an odd count. collections.Counter plus a single pass over the counts does it.
Signal: whether you reach for Counter instead of building a frequency dict by hand. Reinventing Counter isn't a sin, but it tells the interviewer you don't know the standard library well.
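A short sketch of the Counter approach. The function name is hypothetical, and the check treats every character (including spaces and case) as significant.

```python
from collections import Counter


def can_form_palindrome(s):
    """True if the characters of s can be rearranged into a palindrome."""
    odd_counts = sum(1 for count in Counter(s).values() if count % 2)
    return odd_counts <= 1
```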
Find the minimum window in a string that contains all characters of a target string.
Sliding window with a "missing" counter: expand the right edge until the window is valid, then shrink from the left while tracking the best window seen. This is a hard question and interviewers know it.
Signal: high. Sliding window is one of the highest-value patterns to internalize because it transfers across dozens of problems. Getting the shrink condition right under pressure is a strong signal.
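For reference, here is one possible shape of the sliding-window solution with a "missing" counter. It is a sketch rather than the only correct formulation; the variable names are my own, and it returns an empty string when no window exists.

```python
from collections import Counter


def min_window(s, target):
    """Smallest substring of s containing every character of target (with counts)."""
    if not s or not target:
        return ""
    need = Counter(target)        # per-character deficit; negative means surplus
    missing = len(target)         # total characters still missing from the window
    best_start, best_end = 0, 0   # best window is s[best_start:best_end]
    left = 0
    for right, ch in enumerate(s, 1):   # right is the exclusive end of the window
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1
        if missing == 0:
            # Shrink from the left while the leftmost character is surplus.
            while need[s[left]] < 0:
                need[s[left]] += 1
                left += 1
            if best_end == 0 or right - left < best_end - best_start:
                best_start, best_end = left, right
            # Release the leftmost required character and keep searching.
            need[s[left]] += 1
            missing += 1
            left += 1
    return s[best_start:best_end]
```

The shrink condition (`need[s[left]] < 0`) is the part most people get wrong: only characters the window has in surplus may be dropped.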
Implement an LRU cache with O(1) get and put.
In Python the shortcut is collections.OrderedDict with move_to_end on access and popitem(last=False) on eviction. The deeper answer is a hash map plus a doubly linked list, which is what you'd write if asked to do it without the standard library.
Signal: high, and it's a great question precisely because it has two valid altitudes. Knowing the OrderedDict trick shows Python fluency; being able to drop to the linked-list version shows you understand why it's O(1).
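The OrderedDict version might be sketched as follows. The class name and the `default` parameter on `get` are choices made for this example, not part of any standard interface.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()   # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used
```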
Implement a topological sort.
Kahn's algorithm: compute in-degrees, start from the zero-in-degree nodes, and reduce neighbors' in-degrees as you remove nodes. If you can't process every node, there's a cycle.
Signal: graph reasoning and the cycle-detection insight. Comes up more than people expect because dependency ordering is a real problem (build systems, task schedulers).
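Kahn's algorithm as described above could be sketched like this. The graph representation (a dict mapping each node to the nodes that must come after it) is an assumption for the example.

```python
from collections import deque


def topological_sort(graph):
    """graph maps each node to its successors. Raises ValueError on a cycle."""
    in_degree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ in successors:
            in_degree[succ] = in_degree.get(succ, 0) + 1

    queue = deque(node for node, degree in in_degree.items() if degree == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in graph.get(node, ()):
            in_degree[succ] -= 1
            if in_degree[succ] == 0:
                queue.append(succ)

    # Any node never reaching in-degree zero is part of a cycle.
    if len(order) != len(in_degree):
        raise ValueError("graph contains a cycle")
    return order
```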
Count the number of islands in a 2D grid.
Scan the grid; on each unvisited land cell, increment the count and flood-fill (DFS or BFS) to mark the connected region. Mutating visited cells in place avoids a separate visited structure.
Signal: grid traversal and DFS, both extremely common. The in-place-visited trick is a small efficiency signal.
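A sketch of the scan-and-flood-fill approach, assuming the grid is a list of lists with 1 for land and 0 for water, and that islands connect in four directions. It uses an explicit stack instead of recursion to avoid hitting the recursion limit on large grids.

```python
def count_islands(grid):
    """Count 4-connected groups of 1s. Mutates grid: visited land becomes 0."""
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            count += 1
            grid[r][c] = 0
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] == 1:
                        grid[ny][nx] = 0          # mark visited in place
                        stack.append((ny, nx))
    return count
```

If the caller needs the grid afterwards, say so in the interview and copy it first; mutating input is a tradeoff, not a free win.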
How would you remove duplicates from a list while preserving order?
list(dict.fromkeys(items)), which works whenever the items are hashable. Dicts have preserved insertion order since Python 3.7, so this is both correct and idiomatic. The naive answer rebuilds the list with a seen-set, which works but is more code.
Signal: idiomatic Python. The dict.fromkeys answer reliably surprises interviewers in a good way.
Trivia tax: partial. Knowing the one-liner is a bit of a party trick, but the underlying insight (dicts are ordered, sets aren't) is real.
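Both versions side by side, as a small sketch with made-up data. Each assumes the items are hashable.

```python
items = ["b", "a", "b", "c", "a"]

# One-liner: dict keys are unique and keep insertion order.
deduped = list(dict.fromkeys(items))


def dedupe(seq):
    """The longer seen-set version, equivalent for hashable items."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```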
What's the time complexity of common Python operations?
List append (amortized) and indexing by position are O(1), list membership (x in list) and list.index are O(n), dict and set lookup are O(1) on average. The trap is x in some_list inside a loop, which quietly makes an algorithm O(n^2).
Signal: high. This is the single most practically useful complexity knowledge, because the list-membership trap shows up in real code constantly. Candidates who instinctively switch a list to a set for membership checks are showing exactly the right reflex.
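The list-membership trap and its fix, as a minimal sketch. The function names are hypothetical.

```python
def common_items_slow(a, b):
    # `x in b` scans the list every time: O(len(a) * len(b)) overall.
    return [x for x in a if x in b]


def common_items_fast(a, b):
    b_set = set(b)                        # one O(len(b)) pass up front
    return [x for x in a if x in b_set]   # O(1) average per lookup
```

Both return the same result; only the second stays fast as the inputs grow.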
Given a stream of numbers, return the k largest at any point.
Maintain a min-heap of size k with heapq: push each number, and pop the smallest whenever the heap exceeds k. The top of the heap is your kth largest.
Signal: whether you know heapq exists and when a heap beats sorting. Re-sorting everything seen so far costs O(n log n) per query; the heap costs O(log k) per incoming number, or O(n log k) across the whole stream, which matters at scale.
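A sketch of the size-k min-heap, assuming k is at least 1. The `KLargest` class and its method names are invented for the example.

```python
import heapq


class KLargest:
    def __init__(self, k):
        self.k = k
        self._heap = []   # min-heap holding the k largest values seen so far

    def add(self, value):
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, value)
        else:
            # Push the new value, then pop the smallest, keeping size at k.
            heapq.heappushpop(self._heap, value)

    def kth_largest(self):
        return self._heap[0]   # smallest of the k largest

    def largest(self):
        return sorted(self._heap, reverse=True)
```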
Libraries that actually come up
For data and backend roles, library fluency often matters more than raw algorithms. These questions test whether you've used the tools, not just read about them.
In Pandas, what's the difference between loc and iloc?
loc selects by label, iloc selects by integer position. The bug they're probing for is chained indexing like df[df.a > 0]['b'] = 1, which can silently fail; the fix is a single loc call.
Signal: real Pandas mileage. Anyone who's used Pandas seriously has been bitten by the SettingWithCopyWarning, and mentioning it unprompted is a strong tell.
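A small sketch of label versus position selection and the single-`loc` assignment, assuming pandas is installed. The DataFrame, its index labels, and the column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [-1, 2, 3], "b": [0, 0, 0]}, index=["x", "y", "z"])

by_label = df.loc["y", "a"]      # label-based: row "y", column "a"
by_position = df.iloc[1, 0]      # position-based: second row, first column

# One .loc call selects rows and column together, so the assignment
# lands on df itself rather than on a temporary intermediate object.
df.loc[df["a"] > 0, "b"] = 1
```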
Why is vectorized NumPy or Pandas code faster than a Python loop?
The operations run in compiled C over contiguous memory, avoiding Python's per-element interpreter overhead and object boxing. A loop over a DataFrame row by row can be hundreds of times slower than the vectorized equivalent.
Signal: high for data roles. The follow-up is usually "so how would you avoid iterating this DataFrame," and the strong answer reaches for vectorization, with .apply only as a last resort.
When would you use apply versus a vectorized operation in Pandas?
Prefer vectorized operations whenever they exist; apply runs a Python function per row or per group and loses the C-speed advantage. Reach for apply only when the logic genuinely can't be vectorized.
Signal: whether you treat apply as a convenience or a performance cliff. Candidates who reach for apply first are usually newer to Pandas.
How do you handle missing data in Pandas?
dropna removes it, fillna replaces it, and the real answer is "it depends on why it's missing." The strong candidate asks whether the data is missing at random before choosing, because filling with a mean can distort a model.
Signal: high for data science. This is where statistical thinking shows through a Pandas question. The mechanical answer is easy; the judgment is the signal.
What's a NumPy broadcasting rule?
NumPy stretches arrays of compatible shapes so element-wise operations work without copying, comparing dimensions from the right and treating size-1 dimensions as stretchable. Adding a shape (3,1) array to a shape (1,4) array yields (3,4).
Signal: real NumPy fluency. Broadcasting is the thing people either understand or fake, and a clean shape example is hard to fake.
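The shape example from the answer, written out as a sketch (assuming NumPy is installed; the array values are arbitrary):

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([[1, 2, 3, 4]])      # shape (1, 4)

# Size-1 dimensions stretch to match: (3, 1) + (1, 4) -> (3, 4).
grid = col + row
```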
Explain async/await and when it helps.
async/await lets a single thread handle many I/O-bound tasks by suspending one while it waits and running another. It helps for network calls, database queries, and file I/O; it does nothing for CPU-bound work.
Signal: high for backend roles. The discriminating follow-up is "would async speed up a heavy computation," and the right answer is no, because it doesn't add parallelism.
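A minimal sketch of overlapping I/O waits on one thread. `asyncio.sleep` stands in for a network call here, and the `fetch` coroutine is hypothetical.

```python
import asyncio


async def fetch(name, delay):
    # While this coroutine waits, the event loop runs the others.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # The three waits overlap, so total time is roughly one delay, not three.
    return await asyncio.gather(
        fetch("a", 0.05), fetch("b", 0.05), fetch("c", 0.05)
    )


results = asyncio.run(main())
```

Replace the sleep with a tight CPU loop and the overlap disappears, which is the point of the follow-up question.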
What does the requests library do, and how do you handle a failed request?
It's the standard HTTP client. The mature answer covers response.raise_for_status(), timeouts (always set one), and retry logic with backoff for transient failures.
Signal: production instinct. Junior answers stop at requests.get(url).json(). Senior answers mention the timeout unprompted, because they've had a request hang forever in production.
How do you write a test in pytest?
Write a function named test_* with a plain assert. Use fixtures for shared setup, parametrize to run one test over many inputs, and monkeypatch or mocking to isolate external calls.
Signal: whether testing is a habit or an afterthought. Mentioning parametrize and fixtures unprompted signals someone who actually writes tests, not someone who's heard tests are good.
What's a context manager and why use one?
An object that defines __enter__ and __exit__, used with with to guarantee cleanup (closing files, releasing locks) even if an exception fires. You can also write one with contextlib.contextmanager and a generator.
Signal: moderate to high. Knowing with open(...) is table stakes; being able to write your own context manager shows real depth.
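Writing your own with `contextlib.contextmanager` can be as short as the sketch below. The `tracked` name and the `events` list are illustrative only.

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    events.append(f"enter {name}")     # runs on entering the with block
    try:
        yield name
    finally:
        events.append(f"exit {name}")  # runs even if the body raises


try:
    with tracked("job"):
        raise RuntimeError("boom")
except RuntimeError:
    pass
```

The `try`/`finally` around the `yield` is what provides the cleanup guarantee; without it, an exception in the body would skip the exit step.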
How do you read a large file that doesn't fit in memory?
Iterate over the file object line by line (for line in f), which streams rather than loading everything, or read in fixed-size chunks. For structured data, Pandas read_csv with chunksize gives you an iterator of DataFrames.
Signal: high. Memory-aware file handling is a real-world skill that pure algorithm questions miss entirely.
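For the fixed-size-chunk variant, one possible sketch uses the two-argument form of `iter`. The `count_bytes` function is hypothetical, and `io.BytesIO` stands in for a large binary file; the tiny chunk size is only there to make the example visible.

```python
import io
from functools import partial


def count_bytes(stream, chunk_size=4):
    """Process a binary stream chunk by chunk without loading it all."""
    total = 0
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # until it returns b"", which signals end of file.
    for chunk in iter(partial(stream.read, chunk_size), b""):
        total += len(chunk)
    return total


total = count_bytes(io.BytesIO(b"0123456789"))
```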
Debugging and code-reading questions interviewers actually use
This is the section the competitor lists skip, and it's the one that predicts the job best. On the job you read and fix far more code than you write from scratch. Good interviewers know it, so they show you broken code and watch how you reason.
Here's a function that's slow. Make it faster.
The strong move is to profile before guessing: cProfile or even a few time.perf_counter() calls to find the actual hot spot. The most common real culprit is an O(n) membership test (x in list) inside a loop, fixable by switching to a set.
Signal: very high. Profiling before optimizing is the clearest separator between engineers who've worked on real performance problems and those who guess. Candidates who immediately start rewriting without measuring are showing you how they'd behave on the job.
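To make "profile before guessing" concrete, here is a sketch using `cProfile` and `pstats` from the standard library. The `find_known` function and the input sizes are made up; the report text will vary by machine and Python version.

```python
import cProfile
import io
import pstats


def find_known(queries, known):
    return [q for q in queries if q in known]


queries = list(range(2000))
known_list = list(range(0, 4000, 2))

profiler = cProfile.Profile()
profiler.enable()
result = find_known(queries, known_list)   # the suspect call
profiler.disable()

# Render the top entries by cumulative time into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
```

Once the report points at `find_known`, passing `set(known_list)` instead of the list is the one-line fix, and it returns the same result.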
This code throws a KeyError intermittently. How do you debug it?
Reproduce it, read the traceback to the exact line, then reason about why the key is sometimes absent (a race, a missing default, an assumption about input). Tools: dict.get with a default, collections.defaultdict, or a guard, depending on the cause.
Signal: high. Reading a traceback calmly and working from the bottom line up is a learnable skill that many candidates visibly lack. Watching someone debug is more informative than watching them code.
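The two dictionary tools mentioned above, in a short sketch with invented data. Which one fits depends on the cause you found; neither should be used to paper over a key that is supposed to exist.

```python
from collections import defaultdict

counts = {"ok": 3}
# .get returns the default instead of raising, and does not insert the key.
error_count = counts.get("error", 0)

# defaultdict creates the missing value on first access.
by_user = defaultdict(list)
for user, action in [("ann", "login"), ("bob", "login"), ("ann", "logout")]:
    by_user[user].append(action)
```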
What's wrong with this code?
def add_item(item, items=[]):
    items.append(item)
    return items
The mutable default argument is shared across calls, so the list accumulates across every call that doesn't pass its own list. Fix with items=None and items = items if items is not None else [].
Signal: high. This is the mutable-default bug in disguise, and recognizing it on sight tells the interviewer you've been bitten before, which means real experience.
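The fix described in the answer, written out as a sketch:

```python
def add_item(item, items=None):
    # None is the sentinel; a fresh list is created on each call that omits items.
    items = items if items is not None else []
    items.append(item)
    return items
```

Checking `is not None` rather than truthiness matters: a caller who passes their own empty list still gets that same list back, mutated.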
Read this comprehension out loud and tell me what it does.
result = [x for row in matrix for x in row if x > 0]
It flattens a 2D matrix and keeps positive values. The two for clauses read left to right like nested loops, which trips up people who only ever write single-level comprehensions.
Signal: code-reading fluency. Being able to parse dense Python you didn't write is a daily-work skill that whiteboard questions never touch.
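One way to check your reading is to expand the comprehension into the loops it is equivalent to. The `matrix` values here are made up.

```python
matrix = [[1, -2], [0, 3], [-4, 5]]

result = [x for row in matrix for x in row if x > 0]

# Same logic as nested loops: clauses appear in the same left-to-right order.
expanded = []
for row in matrix:
    for x in row:
        if x > 0:
            expanded.append(x)
```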
This test passes locally but fails in CI. What do you check?
Order-dependence between tests, shared mutable state, hardcoded paths, timezone or locale assumptions, and unpinned dependencies. The meta-signal is whether the candidate has a systematic checklist or just shrugs.
Signal: high for anyone past junior. Flaky-test debugging is a real and frustrating part of the job, and having a mental checklist is exactly the experience interviewers want.
Walk me through what happens when this code runs.
Interviewers increasingly hand you working code and ask you to trace it, specifically because tracing is hard to fake with AI. The strong answer narrates state changes step by step and flags any line that would surprise a reader.
Signal: high, and rising. Given that 81 percent of interviewers in interviewing.io's survey suspect AI-assisted cheating, expect more code-reading and fewer blank-page prompts. The skill being tested is genuine comprehension.
Data science and ML-flavored Python questions
For data science and ML roles, Python is the medium and the real questions are about statistics, modeling, and judgment. The interviewer wants to know you can turn a vague problem into clean code and defensible reasoning.
Explain the bias-variance tradeoff.
High bias means the model is too simple and underfits; high variance means it's too complex and overfits to noise. The tradeoff is choosing model complexity so test error is minimized, often with regularization to pull a complex model back.
Signal: foundational. Nearly every DS loop asks some version of this. A strong answer connects it to a concrete decision (why you'd add regularization), not just the textbook definition.
Trivia tax: partial. The definition is rote, but the "how would you act on it" is real.
What's the difference between L1 and L2 regularization?
L1 (Lasso) adds the sum of the absolute values of the coefficients to the loss, which drives some to exactly zero and performs feature selection. L2 (Ridge) adds the sum of the squared coefficients, which shrinks all of them smoothly without zeroing them out.
Signal: whether you understand the geometric reason L1 produces sparsity, not just that it does. The follow-up "why does L1 zero things out and L2 doesn't" separates memorizers from understanders.
How do you handle an imbalanced dataset?
Resampling (oversampling the minority, undersampling the majority, or SMOTE), class weights in the model, and crucially the right metric: accuracy is useless on a 99/1 split, so use precision, recall, F1, or AUC. The best answer starts with "what's the business cost of each error type."
Signal: high. This question rewards judgment over recipe. Candidates who jump straight to SMOTE without asking about the cost of false negatives are missing the point.
Explain precision versus recall and when you'd optimize for each.
Precision is the fraction of positive predictions that are correct; recall is the fraction of actual positives you caught. Optimize precision when false positives are costly (spam filtering), recall when false negatives are costly (cancer screening).
Signal: high. The concrete examples are what matter. A candidate who can map precision and recall onto a real decision understands the metrics; one who only recites the formulas usually doesn't.
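Computing both metrics by hand from labels makes the definitions concrete. This sketch assumes binary labels where 1 is the positive class; the function name and data are invented, and returning 0.0 for an empty denominator is a convention chosen for the example.

```python
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were caught
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Here there are 2 true positives, 1 false positive, and 2 false negatives, so precision is 2/3 and recall is 1/2.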
How does gradient descent work, and what's the difference between batch, mini-batch, and stochastic?
Gradient descent walks the parameters downhill along the loss gradient, scaled by a learning rate. Batch uses the whole dataset per step (stable, slow), stochastic uses one example (noisy, fast), mini-batch splits the difference and is the standard in practice.
Signal: whether you understand the speed-versus-stability tradeoff and the role of the learning rate. The learning-rate sensitivity is the part that shows real training experience.
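Learning-rate sensitivity is easy to see on a toy problem. The sketch below runs plain gradient descent on f(x) = (x - 3)^2, a function picked only for illustration; it does not model batching, just the effect of the step size.

```python
def gradient_descent(grad, start, learning_rate, steps):
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)   # step downhill, scaled by the learning rate
    return x


def grad(x):
    # Derivative of f(x) = (x - 3)^2, whose minimum is at x = 3.
    return 2 * (x - 3)


converged = gradient_descent(grad, start=0.0, learning_rate=0.1, steps=100)
diverged = gradient_descent(grad, start=0.0, learning_rate=1.1, steps=100)
```

With a learning rate of 0.1 the distance to the minimum shrinks by a factor of 0.8 each step; with 1.1 each step overshoots and the distance grows by a factor of 1.2, so the run blows up.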
How does a random forest work and when would you choose it?
It's an ensemble of decision trees trained on bootstrapped samples with random feature subsets, averaging their predictions to reduce variance. Choose it when you want a strong baseline with little tuning and some feature-importance insight, on tabular data.
Signal: moderate. Knowing the mechanism is table stakes; the "when would you choose it over gradient boosting" follow-up is where real modeling judgment shows.
Explain backpropagation in simple terms.
A forward pass computes the prediction and loss; the backward pass uses the chain rule to compute how much each weight contributed to the loss, and the weights update in the direction that reduces it. It's the chain rule applied systematically across layers.
Signal: high for ML roles. The chain-rule framing is the discriminator. Candidates who can explain it without hand-waving understand what their framework is doing under loss.backward().
You're given a messy dataset and asked to predict X. Walk me through your approach.
The strong answer is a process, not an algorithm: understand the target and the business question, explore and clean the data, establish a simple baseline, then iterate with better features and models while validating honestly. Mentioning a baseline first is the senior tell.
Signal: very high, and the most realistic question in any DS loop. It maps directly to the actual job. Candidates who jump to "I'd train XGBoost" without mentioning a baseline or validation are showing inexperience.
How would you design an A/B test, and how do you know when to stop it?
Define the metric and minimum detectable effect, compute the sample size for adequate power before you start, randomize properly, then run until you hit that sample size rather than peeking and stopping at the first significant result. Peeking inflates false positives.
Signal: high for product DS roles. The "don't peek" insight is the one that separates people who've actually run experiments from those who've only read about p-values.
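For the "compute the sample size before you start" step, one common approximation for comparing two conversion rates is the normal-approximation formula sketched below. The function name and default `alpha` and `power` values are assumptions for the example, and real experiment platforms may use a different formula.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_baseline, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for significance
    z_beta = NormalDist().inv_cdf(power)            # critical value for power
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = p_variant - p_baseline                 # minimum detectable effect
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


n = sample_size_per_group(0.10, 0.12)
```

Halving the detectable effect roughly quadruples the required sample, which is why the minimum detectable effect has to be agreed on before the test starts.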
What are word embeddings and why are they useful?
They map words to dense vectors where semantic similarity becomes geometric closeness, so "king" and "queen" sit near each other and analogies fall out of vector arithmetic. They let models transfer learned meaning instead of treating words as opaque IDs.
Signal: moderate for NLP-flavored roles. With LLMs now dominant, the more current follow-up is how embeddings relate to what a transformer learns, which tests whether you've kept up.
What to drill if your interview is in less than 7 days
You don't have time for all of this. Spend it where the signal density is highest.
- Core idioms that carry signal: decorators, generators, the mutable-default bug, list comprehensions, and the time complexity of dict, set, and list operations. These show up constantly and reveal fluency fast.
- Two algorithm patterns: sliding window and graph traversal (DFS and BFS). They cover a large share of medium questions and transfer across problems.
- One debugging rep per day: take a slow or broken snippet and fix it out loud. Profiling before optimizing and reading a traceback calmly are the highest-return skills you can build in a week.
- For data roles: Pandas loc/iloc and the SettingWithCopyWarning, vectorization versus apply, missing-data judgment, and precision and recall mapped to a real decision.
- Skip the pure trivia: if __name__ == "__main__", reversing a string, reciting *args and **kwargs. Know the one-line answers, spend nothing more.
How to practice so the answer comes out clean under pressure
Reading answers builds recognition. It does not build the ability to produce a clean answer while someone watches and the clock runs. Those are different skills, and the gap between knowing your answer and delivering it under pressure is where good candidates lose offers.
The fix is to practice out loud, under something like real conditions. Explain your reasoning as you go, because interviewers score your thinking as much as your code, and because narrating your approach is exactly what the rise in AI-cheating suspicion has made interviewers want to hear. Solve a problem you haven't seen, talk through the tradeoffs, and get feedback on where your explanation went fuzzy.
That's the gap Four-Leaf's voice mock interviews are built to close. You practice answering real questions out loud, get scored on substance and delivery, and drill the spots where you freeze, so the answer comes out clean when it counts. You can generate fresh Python questions by role and difficulty and run a full mock before your real one. The questions in this guide are a map of what gets tested. Practicing them out loud is how you turn the map into an offer.