Python Notes: Difference between revisions
(→Pandas) |
(→Pandas) |
||
Line 330: | Line 330: | ||
connecting pandas to mysql: | connecting pandas to mysql: | ||
< | <pre> | ||
import MySQLdb | import MySQLdb | ||
mysql_cn= MySQLdb.connect(host='myhost', | mysql_cn= MySQLdb.connect(host='myhost', | ||
Line 338: | Line 338: | ||
print 'loaded dataframe from MySQL. records:', len(df_mysql) | print 'loaded dataframe from MySQL. records:', len(df_mysql) | ||
mysql_cn.close() | mysql_cn.close() | ||
< | <pre> | ||
==== Series ==== | ==== Series ==== |
Revision as of 15:08, 18 April 2019
Start
Docs:
- Latest: docs.python.org
- Old release http://docs.python.org/release/2.2.1/lib/module-cgi.html
package management
use pip.
got yourself in a bind and get use network enabled pip ( cause you pooched your ssl libs? ).. fetch and "pip install -e ." -> https://pypi.org/simple/
testing
- in visual studio code et al
- https://code.visualstudio.com/docs/python/unit-testing
Basics
string encoding
at the top of the file ( second line ) :
(2.7.11 on freenas 9 this worked)
# coding=utf_8
more: https://www.python.org/dev/peps/pep-0263/
formatting
'{:>10}'.format('test') # padding
'{} {}'.format('one', 'two')
'{0!s} {0!r}'.format(Data()) # forms
'{:.5}'.format('xylophone') # truncate
'{:4d}'.format(42)
Functions
def functionname(argument1,argument2): """"Docstring can be Multiline """ dostuff() return(stuff)
ghetto command line argv
#!/usr/bin/python import sys print 'Number of arguments:', len(sys.argv), 'arguments.' print 'Argument List:', str(sys.argv)
Less ghetto argparse
import argparse parser = argparse.ArgumentParser() parser.add_argument('access_key', help='Access Key'); parser.add_argument('secret_key', help='Secret Key'); args = parser.parse_args() global access_key global secret_key access_key = args.access_key secret_key = args.secret_key
make bcrypt password
import bcrypt bcrypt.hashpw('myPlainPassword', bcrypt.gensalt())
what's in that object?
https://download.tuxfamily.org/jeremyblog/diveintopython-5.4/py/apihelper.py
then use it like this:
#!/usr/bin/env python from apihelper import info mything info(mything)
AutoVivification
import pprint class Vividict(dict): def __missing__(self, key): value = self[key] = type(self)() return value d = Vividict() d['foo']['bar'] d['foo']['baz'] d['fizz']['buzz'] d['primary']['secondary']['tertiary']['quaternary'] pprint.pprint(d)
vim tabs
in file:
# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4
in ~/.vimrc:
set tabstop=8 set expandtab set shiftwidth=4 set softtabstop=4
parse json
ref: https://docs.python.org/2/library/json.html
python -m json.tool my_json.json
parse yaml
python -c "from yaml import load, Loader; load(open('filename.yml'), Loader=Loader)"
parse lines
in the context of scapy packets:
from scapy.all import * from scapy.layers import http myhttp = pkt.getlayer(http.HTTPRequest).fields headers = myhttp['Headers'].splitlines() for header in headers: ( k, v ) = header.split(": ")
extract by character
useful for files with very long lines
first find the line that has _many_ characters and put it in it's own file:
sed -n '459p' postgresql-9.3-main.log > tmp
Then read the characters one by one..
#!/usr/local/bin/env python import sys with open('tmp') as f: for x in range(600): c = f.read(1) if not c: print "End of file" break sys.stdout.write(c) print
tuples
>>> x = [(1,2), (3,4), (5,6)] >>> for item in x: ... print "A tuple", item A tuple (1, 2) A tuple (3, 4) A tuple (5, 6) >>> for a, b in x: ... print "First", a, "then", b First 1 then 2 First 3 then 4 First 5 then 6
Exception handling
try: # rv, data = M.search(None, "ALL") # rv, data = M.search(None, 'SENTSINCE 1-Jan-2017 SENTBEFORE 31-Dec-2017') rv, data = M.search(None, 'SENTSINCE 1 Jan 2017') except Exception, e: print "M.search failed" print "Error %s" % M.error.message print "Exception is %s" % str(e)
also:
try: pass except Exception as e: # Just print(e) is cleaner and more likely what you want, # but if you insist on printing message specifically whenever possible... if hasattr(e, 'message'): print(e.message) else: print(e)
tempfile
temp file
f = NamedTemporaryFile(delete=False) f # <open file '<fdopen>', mode 'w+b' at 0x384698> f.name # '/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpG7V1Y0' f.write("Hello World!\n") f.close() os.unlink(f.name) os.path.exists(f.name) # False
Pretty Printer
import pprint pp = pprint.PrettyPrinter(indent=4) pp.pprint(stuff)
List Comprehension
[ <output expression> for <iterator variable> in <iterable> if <predicate expression> ].
Converting your brain:
new_things = []
for ITEM in old_things:
if condition_based_on(ITEM):
new_things.append("something with " + ITEM)
You can rewrite the above for loop as a list comprehension like this:
new_things = ["something with " + ITEM for ITEM in old_things if condition_based_on(ITEM)]
unconditionally:
doubled_numbers = [] for n in numbers: doubled_numbers.append(n * 2)
That same code written as a comprehension:
doubled_numbers = [n * 2 for n in numbers]
reference: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/
Tag extract
I have no idea why this works:
you get an instance from boto3 aws describe_instances and you want _one_ tag:
retention_days = [ int(t.get('Value')) for t in instance['Tags'] if t['Key'] == 'snapshot_retention'][0]
Libs of note
urllib
https://stackoverflow.com/questions/3238925/python-urllib-urllib2-post
come dave, get with the times:
https://urllib3.readthedocs.io/en/latest/
datetime
stuff
def last_day_of_month(any_day): next_month = any_day.replace(day=28) + datetime.timedelta(days=4) # this will never fail return next_month - datetime.timedelta(days=next_month.day)
Pandas
on chopping up data:
coloumns and rows.
https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c
connecting pandas to mysql:
import MySQLdb mysql_cn= MySQLdb.connect(host='myhost', port=3306,user='myusername', passwd='mypassword', db='information_schema') df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn) print 'loaded dataframe from MySQL. records:', len(df_mysql) mysql_cn.close() <pre> ==== Series ==== datacamp exercises: <pre> # Import pandas import pandas as pd # Import Twitter data as DataFrame: df df = pd.read_csv('tweets.csv') # Initialize an empty dictionary: langs_count langs_count = {} # Extract column from DataFrame: col col = df['lang'] # Iterate over lang column in DataFrame for entry in col: # If the language is in langs_count, add 1 if entry in langs_count.keys(): langs_count[entry] = langs_count[entry] + 1 # Else add the language to langs_count, set the value to 1 else: langs_count[entry] = 1 # Print the populated dictionary print(langs_count)
now as a fucntion
# Define count_entries() def count_entries(df, col_name): """Return a dictionary with counts of occurrences as value for each key.""" # Initialize an empty dictionary: langs_count langs_count = {} # Extract column from DataFrame: col col = df[col_name] # Iterate over lang column in DataFrame for entry in col: # If the language is in langs_count, add 1 if entry in langs_count.keys(): langs_count[entry] = langs_count[entry] + 1 # Else add the language to langs_count, set the value to 1 else: langs_count[entry] = 1 # Return the langs_count dictionary return(langs_count) # Call count_entries(): result result = count_entries(tweets_df,'lang') # Print the result print(result)
Matplotlib
Categorization
Ref: https://datascience.stackexchange.com/questions/14039/tool-to-label-images-for-classification
I just hacked together a very basic helper in python it requires that all images are stored in a python list allImages.
import matplotlib.pyplot as plt category=[] plt.ion() for i,image in enumerate(allImages): plt.imshow(image) plt.pause(0.05) category.append(raw_input('category: '))