
Quiz
- Why do dictionary cursors take noticeably more memory than tuple cursors when working with large datasets, and is there a rule of thumb for when to prefer one over the other? / I am still confused about the section on tuples versus dictionaries could you explain more? Thank you
A tuple is essentially K consecutive pointers pointing to values, so to store K columns from a database row would require K pointers (over the representation of the data).
A dictionary, on the other hand, has overhead for the array of buckets and the list of items in each bucket. Or maybe for a small dictionary, Python skips the buckets, but it still has to store the column headers (keys), the values, and the "next" pointers for the singly linked list. So, call it 3K pointers for a single row, minimum.
Multiply that by N rows for a large table, and things get exceed the available memory. I once tried to load all the diabetes data into python using dictionaries and Python just croaked.
- Can you explain what 'sys.argv' means in the complete example of a prepared query?
Sure. That's an array of the command-line arguments.
- I think I'm still confused on what SQL injection is, could you go over it again in class?
For sure! It's the notion that if you incorporate the user-specified information into the query before the query is parsed, the user-specified information can have malicious intent.
Consider Mallory asking a naive person to find out whether they have the following file in their account:
; rm -r ~/
Mallory is hoping they will blindly copy/paste that "filename" after an "ls" to see if they have the file.
- I think I'm a little confused about prepared queries and the use of %s.
The %s is where the user-specified information will be inserted after the command is parsed.
Think of it as like an argument to a function.
- In PyMySQL prepared queries, why do we use %s as the placeholder instead of Python's .format() or f-strings?
Historical reasons. F-strings weren't even a gleam in Guido's eye when PyMySQL first started. Even the format method might post-date it (I don't know). But
%s
has been in Python since the beginning of time (because it was part of C, which is almost as old as I am). - can you go over what each line in the correct prepared query means?
Of course! Let's do that: prepared queries
- Also I dont quite understand how the placeholders being set later escapes the problem of SQL injection. Like for the example in the reading: curs.execute(sql, [year, month, date]) - couldnt year, month, and date have the injected untrustworthy information here too? Or i guess my question is where is this year month date sourced from? I guess I would love more clarity regarding how -> sql = tmpl.format(name) is bad when this is also filling in the placeholder of {}.
Yes, they could still try to be malicious, but they won't succeed. To return to the
ls
example above, suppose instead we did:With the
ls
Python function, the most nefarious filename in the world won't be parsed as a Unix command. - This isn't a question about the matieral, just a note: I'm not sure if Q4 is a trick question, since we are supposed to treat PyMySQL as though it does real prepared queries (but in reality, it doesn't).
I didn't intend it to be a trick question. It was more about thoroughly reading the document, including the footnote.
- Why must PyMySQL fake the prepared queries, why doesnt it do the argument passing described in the footnote?
Fair question. Python has limited control over the MySQL end of the transaction, so it can't pass both at once. Instead, it either has to do two network transactions, which is time-consuming, or sanitize the inputs locally. It chose the latter.
- None! / none. All is clear! / Everything makes sense so far!