A tuple is essentially K consecutive pointers pointing to values, so to store K columns from a database row would require K pointers (over the representation of the data).
A dictionary, on the other hand, has overhead for the array of buckets and the list of items in each bucket. Or maybe for a small dictionary, Python skips the buckets, but it still has to store the column headers (keys), the values, and the "next" pointers for the singly linked list. So, call it 3K pointers for a single row, minimum.
Multiply that by N rows for a large table, and things get exceed the available memory. I once tried to load all the diabetes data into python using dictionaries and Python just croaked.
Sure. That's an array of the command-line arguments.
For sure! It's the notion that if you incorporate the user-specified information into the query before the query is parsed, the user-specified information can have malicious intent.
Consider Mallory asking a naive person to find out whether they have the following file in their account:
; rm -r ~/
Mallory is hoping they will blindly copy/paste that "filename" after an "ls" to see if they have the file.
The %s is where the user-specified information will be inserted after the command is parsed.
Think of it as like an argument to a function.
Historical reasons. F-strings weren't even a gleam in Guido's eye
when PyMySQL first started. Even the format method might post-date it
(I don't know). But %s
has been in Python since the
beginning of time (because it was part of C, which is almost as old as I am).
Of course! Let's do that: prepared queries
Yes, they could still try to be malicious, but they won't succeed. To return to the ls
example above, suppose instead we did:
With the ls
Python function, the most nefarious filename in the world won't be parsed as a Unix command.
I didn't intend it to be a trick question. It was more about thoroughly reading the document, including the footnote.
Fair question. Python has limited control over the MySQL end of the transaction, so it can't pass both at once. Instead, it either has to do two network transactions, which is time-consuming, or sanitize the inputs locally. It chose the latter.