Python Notes

From Federal Burro of Information



Revision as of 17:09, 10 December 2018

Start

Docs:

package management

use pip.

Got yourself in a bind and can't use network-enabled pip (because you pooched your SSL libs)? Fetch the package by hand from https://pypi.org/simple/, unpack it, and run "pip install -e ." from the source directory.

testing

in visual studio code et al
https://code.visualstudio.com/docs/python/unit-testing

Basics

string encoding

at the top of the file (first or second line):

(this worked with 2.7.11 on FreeNAS 9)

# coding=utf_8

more: https://www.python.org/dev/peps/pep-0263/

formatting

https://pyformat.info/

'{:>10}'.format('test') # padding
'{} {}'.format('one', 'two')
'{0!s} {0!r}'.format(Data()) # forms
'{:.5}'.format('xylophone') # truncate
'{:4d}'.format(42)
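For reference, here are those same format specs with their verified outputs; the Data class is a stand-in defined just to show the !s / !r conversions:

```python
# str.format() mini-language examples, with their actual outputs
print('{:>10}'.format('test'))       # '      test'  (right-align, width 10)
print('{} {}'.format('one', 'two'))  # 'one two'
print('{:.5}'.format('xylophone'))   # 'xylop'  (truncate to 5 chars)
print('{:4d}'.format(42))            # '  42'   (pad integer to width 4)

# the !s / !r conversions call str() and repr() respectively
class Data:
    def __str__(self):
        return 'str'
    def __repr__(self):
        return 'repr'

print('{0!s} {0!r}'.format(Data()))  # 'str repr'
```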

Functions


def functionname(argument1, argument2):
    """Docstring can be
    multiline."""
    dostuff()
    return stuff

ghetto command line argv

#!/usr/bin/python

import sys

print 'Number of arguments:', len(sys.argv), 'arguments.'
print 'Argument List:', str(sys.argv)

Less ghetto argparse

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('access_key', help='Access Key')
parser.add_argument('secret_key', help='Secret Key')
args = parser.parse_args()
access_key = args.access_key
secret_key = args.secret_key
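A quick standalone check of that parser: parse_args() accepts an explicit argument list, which is handy for testing without touching sys.argv (the key values here are made up):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('access_key', help='Access Key')
parser.add_argument('secret_key', help='Secret Key')

# passing a list instead of letting argparse read sys.argv
args = parser.parse_args(['AKIA123', 'sekrit'])
print(args.access_key)  # AKIA123
print(args.secret_key)  # sekrit
```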

what's in that object?

https://download.tuxfamily.org/jeremyblog/diveintopython-5.4/py/apihelper.py


then use it like this:

#!/usr/bin/env python

from apihelper import info

mything = ['some', 'object']  # anything you want to introspect

info(mything)


AutoVivification

import pprint

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()

d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
pprint.pprint(d)
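Running that snippet prints nested empty dicts; here is a condensed standalone version (repeating the class so it runs on its own) with its verified output:

```python
import pprint

class Vividict(dict):
    def __missing__(self, key):
        # looking up a missing key creates and stores a new empty Vividict
        value = self[key] = type(self)()
        return value

d = Vividict()
d['foo']['bar']
d['primary']['secondary']['tertiary']
pprint.pprint(d)
# {'foo': {'bar': {}}, 'primary': {'secondary': {'tertiary': {}}}}
```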

vim tabs

in file:

# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4

in ~/.vimrc:

set tabstop=8
set expandtab
set shiftwidth=4
set softtabstop=4

parse json

ref: https://docs.python.org/2/library/json.html

python -m json.tool my_json.json
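From code rather than the shell, json.loads / json.dumps cover parsing and pretty-printing (the JSON string here is toy data):

```python
import json

# parse a JSON string into Python objects
data = json.loads('{"name": "dave", "tags": ["a", "b"]}')
print(data['name'])     # dave
print(data['tags'][1])  # b

# and back again, pretty-printed like json.tool does
print(json.dumps(data, indent=4, sort_keys=True))
```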

parse yaml

python -c "from yaml import load, Loader; load(open('filename.yml'), Loader=Loader)"

parse lines

in the context of scapy packets:

from scapy.all import *
from scapy.layers import http

myhttp = pkt.getlayer(http.HTTPRequest).fields
headers = myhttp['Headers'].splitlines()
for header in headers:
  ( k, v )  = header.split(": ")

extract by character

useful for files with very long lines

first find the line that has _many_ characters and put it in its own file:

sed -n '459p' postgresql-9.3-main.log > tmp

Then read the characters one by one:

#!/usr/bin/env python
import sys

with open('tmp') as f:
  for x in range(600):
    c = f.read(1)
    if not c:
      print "End of file"
      break
    sys.stdout.write(c)
  print

tuples

>>> x = [(1,2), (3,4), (5,6)]
>>> for item in x:
...     print "A tuple", item
A tuple (1, 2)
A tuple (3, 4)
A tuple (5, 6)
>>> for a, b in x:
...     print "First", a, "then", b
First 1 then 2
First 3 then 4
First 5 then 6

Exception handling

try:
  # rv, data = M.search(None, "ALL")
  # rv, data = M.search(None, 'SENTSINCE 1-Jan-2017 SENTBEFORE 31-Dec-2017')
  rv, data = M.search(None, 'SENTSINCE 1 Jan 2017')
except Exception, e:
  print "M.search failed"
  print "Error %s" % M.error.message
  print "Exception is %s" % str(e)

also:

try:
    pass
except Exception as e:
    # Just print(e) is cleaner and more likely what you want,
    # but if you insist on printing message specifically whenever possible...
    if hasattr(e, 'message'):
        print(e.message)
    else:
        print(e)

tempfile

from tempfile import NamedTemporaryFile
import os

f = NamedTemporaryFile(delete=False)
f
# <open file '<fdopen>', mode 'w+b' at 0x384698>
f.name
# '/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpG7V1Y0'
f.write("Hello World!\n")
f.close()
os.unlink(f.name)
os.path.exists(f.name)
# False
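A Python 3 variant of the same flow; with delete=False the file survives close() until you unlink it yourself, and mode='w' is needed so str writes work:

```python
import os
from tempfile import NamedTemporaryFile

f = NamedTemporaryFile(mode='w', delete=False)  # text mode for str writes in py3
f.write("Hello World!\n")
f.close()

# the file persists after close(), so we can reopen it by name
with open(f.name) as fh:
    content = fh.read()
print(content, end='')         # Hello World!

os.unlink(f.name)
print(os.path.exists(f.name))  # False
```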

Pretty Printer

import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(stuff)

List Comprehension

[ <output expression> for <iterator variable> in <iterable> if <predicate expression> ].

Converting your brain:

new_things = []
for ITEM in old_things:
    if condition_based_on(ITEM):
        new_things.append("something with " + ITEM)

You can rewrite the above for loop as a list comprehension like this:

new_things = ["something with " + ITEM for ITEM in old_things if condition_based_on(ITEM)]

unconditionally:

doubled_numbers = []
for n in numbers:
    doubled_numbers.append(n * 2)

That same code written as a comprehension:

doubled_numbers = [n * 2 for n in numbers]

reference: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/
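One related gotcha: a filtering if goes at the end of the comprehension, but an if/else (a conditional expression) belongs in the output position at the front:

```python
numbers = [1, 2, 3, 4, 5]

# filter: the if clause at the end drops items
evens = [n for n in numbers if n % 2 == 0]
print(evens)   # [2, 4]

# transform: if/else moves into the output expression at the front
labels = ['even' if n % 2 == 0 else 'odd' for n in numbers]
print(labels)  # ['odd', 'even', 'odd', 'even', 'odd']
```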

Tag extract

This is a filtering list comprehension: it collects the Value of every tag whose Key is 'snapshot_retention', and the trailing [0] takes the first match (raising IndexError if no tag matches).

you get an instance from boto3 aws describe_instances and you want _one_ tag:

        retention_days = [
            int(t.get('Value')) for t in instance['Tags']
            if t['Key'] == 'snapshot_retention'][0]
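A standalone sketch with a fake instance dict (the tag values are made up), plus a next() variant that takes a default instead of raising IndexError when the tag is absent:

```python
# fake instance shaped like boto3 describe_instances output (assumed values)
instance = {'Tags': [
    {'Key': 'Name', 'Value': 'db01'},
    {'Key': 'snapshot_retention', 'Value': '14'},
]}

retention_days = [
    int(t.get('Value')) for t in instance['Tags']
    if t['Key'] == 'snapshot_retention'][0]
print(retention_days)  # 14

# same idea with next() and a default, no IndexError when the tag is missing
retention_days = next(
    (int(t['Value']) for t in instance['Tags']
     if t['Key'] == 'snapshot_retention'),
    7)
print(retention_days)  # 14
```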

Libs of note

urllib

https://stackoverflow.com/questions/3238925/python-urllib-urllib2-post

come dave, get with the times:

https://urllib3.readthedocs.io/en/latest/

datetime

stuff

import datetime

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)  # this will never fail
    return next_month - datetime.timedelta(days=next_month.day)
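Why the day=28 trick never fails: every month has at least 28 days, so adding 4 days is guaranteed to land in the next month, and subtracting that date's day-of-month walks back to the last day of the original month. A quick check:

```python
import datetime

def last_day_of_month(any_day):
    # day 28 exists in every month; +4 days always reaches the next month
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)
    return next_month - datetime.timedelta(days=next_month.day)

print(last_day_of_month(datetime.date(2018, 2, 14)))  # 2018-02-28
print(last_day_of_month(datetime.date(2016, 2, 14)))  # 2016-02-29 (leap year)
print(last_day_of_month(datetime.date(2018, 12, 1)))  # 2018-12-31 (year rollover)
```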


Pandas

on chopping up data:

columns and rows.

https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c

Series

datacamp exercises:

# Import pandas
import pandas as pd

# Import Twitter data as DataFrame: df
df = pd.read_csv('tweets.csv')

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

# Iterate over lang column in DataFrame
for entry in col:

    # If the language is in langs_count, add 1
    if entry in langs_count.keys():
        langs_count[entry] = langs_count[entry] + 1
    # Else add the language to langs_count, set the value to 1
    else:
        langs_count[entry] = 1

# Print the populated dictionary
print(langs_count)

now as a function

# Define count_entries()
def count_entries(df, col_name):
    """Return a dictionary with counts of 
    occurrences as value for each key."""

    # Initialize an empty dictionary: langs_count
    langs_count = {}
    
    # Extract column from DataFrame: col
    col = df[col_name]
    
    # Iterate over lang column in DataFrame
    for entry in col:

        # If the language is in langs_count, add 1
        if entry in langs_count.keys():
            langs_count[entry] = langs_count[entry] + 1
        # Else add the language to langs_count, set the value to 1
        else:
            langs_count[entry] = 1

    # Return the langs_count dictionary
    return(langs_count)

# Call count_entries(): result
result = count_entries(tweets_df,'lang')

# Print the result
print(result)
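collections.Counter does the same counting in one line; shown here on a plain list so it runs without tweets.csv (a pandas Series iterates the same way):

```python
from collections import Counter

# stand-in for df['lang']; made-up values
langs = ['en', 'en', 'fr', 'en', 'und', 'fr']

langs_count = Counter(langs)
print(dict(langs_count))  # {'en': 3, 'fr': 2, 'und': 1}
```

Within pandas itself, df['lang'].value_counts() returns the same counts as a Series.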

Matplotlib

Categorization

Ref: https://datascience.stackexchange.com/questions/14039/tool-to-label-images-for-classification

I just hacked together a very basic helper in Python; it requires that all images are stored in a Python list, allImages.

import matplotlib.pyplot as plt
category=[]
plt.ion()

for i,image in enumerate(allImages):
    plt.imshow(image)
    plt.pause(0.05)
    category.append(raw_input('category: '))