Python Notes
Start
Docs:
- Latest: docs.python.org
- Old release http://docs.python.org/release/2.2.1/lib/module-cgi.html
package management
use pip.
Got yourself in a bind and can't use network-enabled pip (because you pooched your ssl libs?): fetch the package source and run "pip install -e ." -> https://pypi.org/simple/
testing
- in visual studio code et al
- https://code.visualstudio.com/docs/python/unit-testing
Basics
string encoding
at the top of the file ( second line ) :
(this worked with 2.7.11 on FreeNAS 9)
# coding=utf_8
more: https://www.python.org/dev/peps/pep-0263/
formatting
'{:>10}'.format('test') # padding
'{} {}'.format('one', 'two')
'{0!s} {0!r}'.format(Data()) # forms
'{:.5}'.format('xylophone') # truncate
'{:4d}'.format(42)
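Those specs combine into a runnable sketch; outputs shown in the comments:

```python
print('{:>10}'.format('test'))       # '      test' (right-aligned, width 10)
print('{} {}'.format('one', 'two'))  # 'one two'
print('{:.5}'.format('xylophone'))   # 'xylop' (string truncated to 5 chars)
print('{:4d}'.format(42))            # '  42' (int padded to width 4)
```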
Functions
def functionname(argument1, argument2):
    """Docstring
    can be multiline.
    """
    dostuff()
    return stuff
ghetto command line argv
#!/usr/bin/python
import sys

print 'Number of arguments:', len(sys.argv), 'arguments.'
print 'Argument List:', str(sys.argv)
Less ghetto argparse
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('access_key', help='Access Key')
parser.add_argument('secret_key', help='Secret Key')
args = parser.parse_args()
access_key = args.access_key
secret_key = args.secret_key
make bcrypt password
import bcrypt

bcrypt.hashpw('myPlainPassword', bcrypt.gensalt())  # Python 3: pass bytes, b'myPlainPassword'
what's in that object?
https://download.tuxfamily.org/jeremyblog/diveintopython-5.4/py/apihelper.py
then use it like this:
#!/usr/bin/env python
from apihelper import info

info(mything)  # mything = whatever object you're curious about
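If apihelper isn't handy, the stdlib answers the same question with dir(), vars(), and inspect; a sketch with a stand-in Thing class:

```python
import inspect

class Thing(object):
    """A stand-in object to inspect."""
    def __init__(self):
        self.x = 1
    def frob(self):
        """Frob the thing."""
        return self.x

t = Thing()
print([n for n in dir(t) if not n.startswith('_')])  # public names: ['frob', 'x']
print(vars(t))                                       # instance __dict__: {'x': 1}
print(inspect.getdoc(t.frob))                        # cleaned-up docstring
```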
AutoVivification
import pprint

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()
d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
pprint.pprint(d)
Join two lists
one is the keys, the other is the values:
use zip:
keys = ['a', 'b', 'c']
values = [1, 2, 3]
dictionary = dict(zip(keys, values))
print(dictionary)
# {'a': 1, 'b': 2, 'c': 3}
vim tabs
in file:
# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4
in ~/.vimrc:
set tabstop=8
set expandtab
set shiftwidth=4
set softtabstop=4
parse json
ref: https://docs.python.org/2/library/json.html
python -m json.tool my_json.json
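Programmatically, json.loads handles strings and json.load handles file objects; a small sketch:

```python
import json

raw = '{"name": "dave", "ports": [80, 443]}'
data = json.loads(raw)                             # parse a JSON string into a dict
print(data['ports'][1])                            # 443
print(json.dumps(data, indent=4, sort_keys=True))  # pretty-print, like json.tool
```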
parse yaml
python -c "from yaml import load, Loader; print(load(open('filename.yml'), Loader=Loader))"
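In a script, yaml.safe_load is the saner default since plain load can construct arbitrary Python objects; a sketch assuming PyYAML is installed:

```python
import yaml

doc = """
db:
  host: localhost
  port: 5432
"""
data = yaml.safe_load(doc)  # refuses arbitrary-object tags, unlike plain load
print(data['db']['port'])   # 5432
```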
parse lines
in the context of scapy packets:
from scapy.all import *
from scapy.layers import http

myhttp = pkt.getlayer(http.HTTPRequest).fields
headers = myhttp['Headers'].splitlines()
for header in headers:
    (k, v) = header.split(": ")
extract by character
useful for files with very long lines
first find the line that has _many_ characters and put it in its own file:
sed -n '459p' postgresql-9.3-main.log > tmp
Then read the characters one by one..
#!/usr/bin/env python
import sys

with open('tmp') as f:
    for x in range(600):
        c = f.read(1)
        if not c:
            print "End of file"
            break
        sys.stdout.write(c)
    print
Another example: rump is a redis sync tool that writes an "r" or "w" for each read or write operation.
Here is a Python script to count those operations:
file: status.py
#!/usr/bin/env python3
import sys

write = 0
read = 0
with open(sys.argv[1]) as f:
    while True:
        c = f.read(1)
        if c == "w":
            write = write + 1
        elif c == "r":
            read = read + 1
        else:
            break
print("read {} write {}".format(read, write))
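collections.Counter can do that tally in one pass; a sketch (assumes the same stream of "r"/"w" characters):

```python
from collections import Counter

def count_ops(text):
    """Count 'r' and 'w' characters in one pass."""
    counts = Counter(text)
    return counts['r'], counts['w']

read, write = count_ops("rrwrw")
print("read {} write {}".format(read, write))  # read 3 write 2
```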
tuples
>>> x = [(1,2), (3,4), (5,6)]
>>> for item in x:
...     print "A tuple", item
A tuple (1, 2)
A tuple (3, 4)
A tuple (5, 6)
>>> for a, b in x:
...     print "First", a, "then", b
First 1 then 2
First 3 then 4
First 5 then 6
Exception handling
try:
    # rv, data = M.search(None, "ALL")
    # rv, data = M.search(None, 'SENTSINCE 1-Jan-2017 SENTBEFORE 31-Dec-2017')
    rv, data = M.search(None, 'SENTSINCE 1 Jan 2017')
except Exception, e:
    print "M.search failed"
    print "Error %s" % M.error.message
    print "Exception is %s" % str(e)
also:
try:
    pass
except Exception as e:
    # Just print(e) is cleaner and more likely what you want,
    # but if you insist on printing message specifically whenever possible...
    if hasattr(e, 'message'):
        print(e.message)
    else:
        print(e)
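Narrowing the except clause to the specific exception type usually beats a blanket Exception catch; a minimal sketch with a hypothetical read_port helper:

```python
def read_port(s):
    """Parse a port number, catching only the error int() can raise."""
    try:
        port = int(s)
    except ValueError as e:
        print("not a number: {}".format(e))
        return None
    return port

print(read_port("8080"))  # 8080
print(read_port("oops"))  # None, after printing the ValueError
```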
tempfile
temp file
import os
from tempfile import NamedTemporaryFile

f = NamedTemporaryFile(delete=False)
f                         # <open file '<fdopen>', mode 'w+b' at 0x384698>
f.name                    # '/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpG7V1Y0'
f.write("Hello World!\n")
f.close()
os.unlink(f.name)
os.path.exists(f.name)    # False
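On Python 3 the same dance wants bytes and is tidier as a context manager; a sketch:

```python
import os
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(delete=False) as f:
    f.write(b"Hello World!\n")  # default mode is 'w+b', so write bytes
    name = f.name               # file persists after the with-block (delete=False)

os.unlink(name)
print(os.path.exists(name))  # False
```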
Pretty Printer
aka Object Dump
import pprint

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(stuff)
List Comprehension
[ <output expression> for <iterator variable> in <iterable> if <predicate expression> ]
Converting your brain:
new_things = []
for ITEM in old_things:
    if condition_based_on(ITEM):
        new_things.append("something with " + ITEM)
You can rewrite the above for loop as a list comprehension like this:
new_things = ["something with " + ITEM for ITEM in old_things if condition_based_on(ITEM)]
unconditionally:
doubled_numbers = []
for n in numbers:
    doubled_numbers.append(n * 2)
That same code written as a comprehension:
doubled_numbers = [n * 2 for n in numbers]
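The same pattern generalizes: dict and set comprehensions use braces instead of brackets:

```python
numbers = [1, 2, 2, 3]

squares = {n: n * n for n in numbers}      # dict comprehension
unique_doubles = {n * 2 for n in numbers}  # set comprehension (dedupes)

print(squares)         # {1: 1, 2: 4, 3: 9}
print(unique_doubles)  # {2, 4, 6} (order may vary)
```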
reference: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/
Tag extract
Why it works: you get an instance from boto3 describe_instances and want _one_ tag; the comprehension filters the Tags list to matching Keys and [0] takes the first hit:
retention_days = [int(t.get('Value')) for t in instance['Tags']
                  if t['Key'] == 'snapshot_retention'][0]
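A next() over a generator does the same filter-then-take-first without building the whole list, and lets you supply a default when the tag is missing; a sketch with a made-up instance dict (real ones come from boto3 describe_instances):

```python
# Hypothetical shape of a describe_instances record, for illustration only.
instance = {'Tags': [{'Key': 'Name', 'Value': 'web01'},
                     {'Key': 'snapshot_retention', 'Value': '14'}]}

retention_days = next((int(t['Value']) for t in instance['Tags']
                       if t['Key'] == 'snapshot_retention'), None)
print(retention_days)  # 14 (None if the tag isn't set, instead of an IndexError)
```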
Log file loop + re
#!/usr/bin/env python3
# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4
import sys
import re
from dateutil import parser

myfile = "/root/selenoid.log.Jan6"
perhour = {}

# Dec 22 09:44:10 chrome01 xvfb-run[34607]: 2019/12/22 09:44:10 [8157622] [TERMINATED_PROCESS] [41686] [0.02s]
# groups: 1 month, 2 date, 3 hour, 4 min, 5 sec, 6 hostname,
#         7 processname, 8 date2, 9 time2, 10 session, 11 verb
line_regex = re.compile(r"(\D*)\s(\d*)\s(\d*):(\d*):(\d*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\s\[(\S*)\]\s\[(\S*)\]\s")

with open(myfile) as f:
    for line in f:
        try:
            m = line_regex.search(line)
            if m:
                if m.group(11) == "LOCATING_SERVICE":
                    key = m.group(1) + m.group(2) + m.group(3)  # month + day + hour
                    if key in perhour:
                        perhour[key] = perhour[key] + 1
                    else:
                        perhour[key] = 1
        except StopIteration:
            print("StopIteration")
            sys.exit()

print("end of file")
parser.parse("Jan 01 00:00:25")
print("per hour")
for x in perhour:
    print("{}: {}".format(x, perhour[x]))
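Named groups can replace the m.group(11)-style bookkeeping; a sketch against the same log line shape (the pattern mirrors the one above):

```python
import re

line = "Dec 22 09:44:10 chrome01 xvfb-run[34607]: 2019/12/22 09:44:10 [8157622] [LOCATING_SERVICE] [41686]"
line_regex = re.compile(
    r"(?P<mon>\D*)\s(?P<day>\d*)\s(?P<hour>\d*):(?P<min>\d*):(?P<sec>\d*)"
    r"\s(?P<host>\S*)\s(?P<proc>\S*)\s(?P<date2>\S*)\s(?P<time2>\S*)"
    r"\s\[(?P<session>\S*)\]\s\[(?P<verb>\S*)\]")

m = line_regex.search(line)
if m and m.group('verb') == 'LOCATING_SERVICE':
    key = m.group('mon') + m.group('day') + m.group('hour')  # month + day + hour
    print(key)  # Dec2209
```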
Libs of note
urllib
https://stackoverflow.com/questions/3238925/python-urllib-urllib2-post
come dave, get with the times:
https://urllib3.readthedocs.io/en/latest/
datetime
stuff
import datetime

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)  # this will never fail
    return next_month - datetime.timedelta(days=next_month.day)
from datetime import datetime

t0 = datetime(1, 1, 1)
now = datetime.utcnow()
seconds = (now - t0).total_seconds()
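Quick check of the last_day_of_month helper (restated here so it runs standalone): day=28 plus 4 days always lands in the next month, because every month has at least 28 days:

```python
import datetime

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)  # always next month
    return next_month - datetime.timedelta(days=next_month.day)

print(last_day_of_month(datetime.date(2000, 2, 1)))   # 2000-02-29 (leap year)
print(last_day_of_month(datetime.date(2021, 12, 5)))  # 2021-12-31
```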
Pandas
on chopping up data (columns and rows):
https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c
connecting pandas to mysql:
import MySQLdb
import pandas as pd

mysql_cn = MySQLdb.connect(host='myhost', port=3306, user='myusername',
                           passwd='mypassword', db='information_schema')
df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn)
print 'loaded dataframe from MySQL. records:', len(df_mysql)
mysql_cn.close()
dialects in SQLAlchemy:
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('dialect://user:pass@host:port/schema', echo=False)
f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col='ID')
( reference: https://stackoverflow.com/questions/10065051/python-pandas-and-databases-like-mysql )
from sqlalchemy import create_engine

engine = create_engine('postgresql://user@localhost:5432/mydb')
Series
datacamp exercises:
# Import pandas
import pandas as pd

# Import Twitter data as DataFrame: df
df = pd.read_csv('tweets.csv')

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

# Iterate over lang column in DataFrame
for entry in col:
    # If the language is in langs_count, add 1
    if entry in langs_count.keys():
        langs_count[entry] = langs_count[entry] + 1
    # Else add the language to langs_count, set the value to 1
    else:
        langs_count[entry] = 1

# Print the populated dictionary
print(langs_count)
now as a function
# Define count_entries()
def count_entries(df, col_name):
    """Return a dictionary with counts of occurrences as value for each key."""
    # Initialize an empty dictionary: langs_count
    langs_count = {}

    # Extract column from DataFrame: col
    col = df[col_name]

    # Iterate over lang column in DataFrame
    for entry in col:
        # If the language is in langs_count, add 1
        if entry in langs_count.keys():
            langs_count[entry] = langs_count[entry] + 1
        # Else add the language to langs_count, set the value to 1
        else:
            langs_count[entry] = 1

    # Return the langs_count dictionary
    return langs_count

# Call count_entries(): result
result = count_entries(tweets_df, 'lang')

# Print the result
print(result)
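pandas can do that whole count in one call; a sketch with a made-up lang column:

```python
import pandas as pd

# Hypothetical stand-in for the tweets data.
df = pd.DataFrame({'lang': ['en', 'en', 'et', 'und', 'en']})

result = df['lang'].value_counts().to_dict()  # no loop needed
print(result)  # {'en': 3, 'et': 1, 'und': 1}
```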
Matplotlib
Categorization
Ref: https://datascience.stackexchange.com/questions/14039/tool-to-label-images-for-classification
I just hacked together a very basic helper in Python. It requires that all images are stored in a Python list, allImages.
import matplotlib.pyplot as plt

category = []
plt.ion()
for i, image in enumerate(allImages):
    plt.imshow(image)
    plt.pause(0.05)
    category.append(raw_input('category: '))