Python Notes
Start

Docs:

package management

use pip.

Got yourself in a bind and can't use network-enabled pip (because you pooched your SSL libs)? Fetch the package from https://pypi.org/simple/ and run "pip install -e ." from the unpacked source.

testing

in visual studio code et al
https://code.visualstudio.com/docs/python/unit-testing

Basics

string encoding

at the top of the file (second line):

(this worked with Python 2.7.11 on FreeNAS 9)

# coding=utf_8

more: https://www.python.org/dev/peps/pep-0263/


Encoding

python 3

>>> b'0376'
b'0376'
>>> b'376'
b'376'
>>> str(b'0376')
"b'0376'"
>>> str(b'\376')
"b'\\xfe'"
>>> str(b'\0376')
"b'\\x1f6'"
>>> str(b'\76')
"b'>'"
>>> str(b'\276')
"b'\\xbe'"
>>> str(b'\176')
"b'~'"
>>> 
>>> str('\376')
'þ'
>>> str(b'\376')
"b'\\xfe'"
>>> 

wc(1) - word count. Useful: it counts lines, words, and _bytes_.

$ echo 'þ' | wc 
       1       1       3
  • 1 line
  • 1 word
  • 3 bytes!

od(1) - Octal dump.

$ echo 'þ' | od
0000000    137303  000012                                                
0000003

oops I meant -c for "ascii representation":

$ echo 'þ' | od -c
0000000    þ  **  \n                                                    
0000003
$ 

from od(1):

Multi-byte characters are displayed in the area corresponding to the first byte of the character. The remaining bytes are shown as `**'.

þ is 2 bytes + \n is 1 = 3 bytes.

>>> int.from_bytes(b'\x00\xfe', byteorder='big')
254
>>> int.from_bytes(b'\xfe', byteorder='big')
254
>>> 

old way in python3:

>>> print('%(x)o' % {'x':254} )
376
>>> 
  • "o" - octal format

new way in python 3:

>>> print('{!r}'.format(254) )
254
>>> print('{!a}'.format(254) )
254
>>> print('{!s}'.format(254) )
254
>>> print('{}'.format(oct(254)) )
0o376
>>>
  • r - repr()
  • a - ascii()
  • s - str()

__debug__

Use __debug__ in your code:

if __debug__:
    print('Debug ON')
else:
    print('Debug OFF')

Create a script abc.py with the above code, then run it with python abc.py and again with python -O abc.py, and observe the difference (-O turns __debug__ off).


formatting

https://pyformat.info/

'{:>10}'.format('test') # padding
'{} {}'.format('one', 'two')
'{0!s} {0!r}'.format(Data()) # forms
'{:.5}'.format('xylophone') # truncate
'{:4d}'.format(42)

Functions


def functionname(argument1, argument2):
    """Docstring can be
    multiline."""
    dostuff()
    return stuff

ghetto command line argv

#!/usr/bin/python

import sys

print 'Number of arguments:', len(sys.argv), 'arguments.'
print 'Argument List:', str(sys.argv)

Less ghetto argparse

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('access_key', help='Access Key')
parser.add_argument('secret_key', help='Secret Key')
args = parser.parse_args()
global access_key
global secret_key
access_key = args.access_key
secret_key = args.secret_key

make bcrypt password

import bcrypt
bcrypt.hashpw('myPlainPassword'.encode('utf-8'), bcrypt.gensalt())   # hashpw wants bytes on Python 3
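
To verify a password later, bcrypt.checkpw() compares a candidate against the stored hash; a minimal sketch (the variable names are just for illustration, and both calls want bytes on Python 3):

import bcrypt

# hash at signup time
hashed = bcrypt.hashpw('myPlainPassword'.encode('utf-8'), bcrypt.gensalt())

# verify at login time
print(bcrypt.checkpw('myPlainPassword'.encode('utf-8'), hashed))   # True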

what's in that object?

https://download.tuxfamily.org/jeremyblog/diveintopython-5.4/py/apihelper.py

/apihelper.py

/apihelper.py - short

then use it like this:

#!/usr/bin/env python

from apihelper import info

mything = []   # mything can be any object you want to inspect

info(mything)
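
If those apihelper links ever go stale, here is a rough stand-in built on dir() and getattr(); a sketch, not Dive Into Python's exact implementation:

def info(obj, spacing=16):
    """Print the callable attributes of obj and the first line of each docstring."""
    for name in dir(obj):
        attr = getattr(obj, name)
        if callable(attr):
            doc = (attr.__doc__ or "").strip().splitlines()
            print(name.ljust(spacing), doc[0] if doc else "")

info([])   # for example: a list's methods and what they do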


AutoVivification

import pprint

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()

d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
pprint.pprint(d)

Join two lists

one is the keys, the other is the values:

use zip:

keys = ['a', 'b', 'c']
values = [1, 2, 3]
dictionary = dict(zip(keys, values))
print(dictionary)
{'a': 1, 'b': 2, 'c': 3}

vim tabs

in file:

# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4

in ~/.vimrc:

set tabstop=8
set expandtab
set shiftwidth=4
set softtabstop=4

parse json

ref: https://docs.python.org/2/library/json.html

python -m json.tool my_json.json
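
The same thing from inside Python with the standard json module (a minimal sketch; my_json.json is just the filename used above):

import json

with open('my_json.json') as f:
    data = json.load(f)

print(json.dumps(data, indent=4, sort_keys=True))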

parse yaml

pip3 install pyyaml
python3 -c 'import yaml,sys;yaml.safe_load(sys.stdin)' < file.yml

or, reading the file directly:

python -c "from yaml import load, Loader; load(open('filename.yml'), Loader=Loader)"

parse lines

in the context of scapy packets:

from scapy.all import *
from scapy.layers import http

myhttp = pkt.getlayer(http.HTTPRequest).fields
headers = myhttp['Headers'].splitlines()
for header in headers:
  ( k, v )  = header.split(": ")
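
The loop above overwrites k and v on each pass; if you want all the headers at once, collect them into a dict (a sketch, assuming Headers is the newline-separated string the loop implies):

myhttp = pkt.getlayer(http.HTTPRequest).fields
# "Name: value" lines -> dict; maxsplit=1 keeps values that themselves contain ": "
headers = dict(line.split(": ", 1) for line in myhttp['Headers'].splitlines())
print(headers.get('Host'))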

extract by character

useful for files with very long lines

first find the line that has _many_ characters and put it in its own file:

sed -n '459p' postgresql-9.3-main.log > tmp

Then read the characters one by one..

#!/usr/bin/env python
import sys

with open('tmp') as f:
  for x in range(600):
    c = f.read(1)
    if not c:
      print "End of file"
      break
    sys.stdout.write(c)
  print

another example.

rump is a redis sync tool that writes a single "r" or "w" character for each read or write operation.

Here is a Python script to count those operations:

file: status.py

#!/usr/bin/env python3

import sys

filename = sys.argv[1] if len(sys.argv) > 1 else "mylog"
write = 0
read = 0
with open(filename) as f:
  while True:
    c = f.read(1)
    if c =="w":
      write = write + 1
    elif c =="r":
      read = read + 1
    else:
      break

print("read {} write {}".format(read,write))

tuples

>>> x = [(1,2), (3,4), (5,6)]
>>> for item in x:
...     print "A tuple", item
A tuple (1, 2)
A tuple (3, 4)
A tuple (5, 6)
>>> for a, b in x:
...     print "First", a, "then", b
First 1 then 2
First 3 then 4
First 5 then 6

Decorators

https://python-3-patterns-idioms-test.readthedocs.io/en/latest/PythonDecorators.html

https://realpython.com/primer-on-python-decorators/
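
A bare-bones example to go with those links; the timing decorator here is just an illustration, not taken from either article:

import functools
import time

def timed(func):
    """Decorator that reports how long the wrapped function took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print("{} took {:.3f}s".format(func.__name__, time.time() - start))
        return result
    return wrapper

@timed
def slow_add(a, b):
    time.sleep(0.1)
    return a + b

slow_add(1, 2)   # prints something like "slow_add took 0.100s"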

Exception handling

try:
  # rv, data = M.search(None, "ALL")
  # rv, data = M.search(None, 'SENTSINCE 1-Jan-2017 SENTBEFORE 31-Dec-2017')
  rv, data = M.search(None, 'SENTSINCE 1 Jan 2017')
except Exception as e:
  print("M.search failed")
  print("Error %s" % M.error.message)
  print("Exception is %s" % str(e))

also:

try:
    pass
except Exception as e:
    # Just print(e) is cleaner and more likely what you want,
    # but if you insist on printing message specifically whenever possible...
    if hasattr(e, 'message'):
        print(e.message)
    else:
        print(e)

tempfile

import os
from tempfile import NamedTemporaryFile

f = NamedTemporaryFile(delete=False)
f
# <open file '<fdopen>', mode 'w+b' at 0x384698>
f.name
# '/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpG7V1Y0'
f.write(b"Hello World!\n")   # mode is 'w+b', so write bytes
f.close()
os.unlink(f.name)
os.path.exists(f.name)
# False

Pretty Printer

aka Object Dump

import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(stuff)

List Comprehension

[ <output expression> for <iterator variable> in <iterable> if <predicate expression> ].

Converting your brain:

new_things = []
for ITEM in old_things:
    if condition_based_on(ITEM):
        new_things.append("something with " + ITEM)

You can rewrite the above for loop as a list comprehension like this:

new_things = ["something with " + ITEM for ITEM in old_things if condition_based_on(ITEM)]

unconditionally:

doubled_numbers = []
for n in numbers:
    doubled_numbers.append(n * 2)

That same code written as a comprehension:

doubled_numbers = [n * 2 for n in numbers]

reference: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/

Tag extract

I had no idea why this works at first, but it's just a list comprehension: you get an instance back from boto3 describe_instances, keep only the Tags whose Key is 'snapshot_retention', convert the matching Value to int, and take the first result with [0]:

        retention_days = [
            int(t.get('Value')) for t in instance['Tags']
            if t['Key'] == 'snapshot_retention'][0]
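
One caveat: the [0] raises an IndexError if the instance has no such tag. A gentler variant using next() with a default (the 0 fallback is arbitrary, pick whatever makes sense):

        retention_days = next(
            (int(t['Value']) for t in instance['Tags'] if t['Key'] == 'snapshot_retention'),
            0)   # 0 here means "tag not found"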

Log file loop + re

#!/usr/bin/env python3
# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4
import sys
import re
from dateutil import parser
myfile="/root/selenoid.log.Jan6"
perhour = {}

# Dec 22 09:44:10 chrome01 xvfb-run[34607]: 2019/12/22 09:44:10 [8157622] [TERMINATED_PROCESS] [41686] [0.02s]
#                          mon    day    hh  :min  :sec   hostname print pid
# earlier, simpler attempt (superseded by the fuller pattern below):
# line_regex = re.compile(r"(\D*)\s(\d*)\s(\d*):(\d*):(\d*)\s(\w)\s(\w)\[(\d*)\]")
line_regex = re.compile(r"(\D*)\s(\d*)\s(\d*):(\d*):(\d*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\s\[(\S*)\]\s\[(\S*)\]\s")

with open(myfile) as f:
    for line in f:
        try:
            # print ("line {}".format(line),)
            m = line_regex.search(line)
            if (m):
                # print (".",end='')
                #print ("zero {}".format(m.group(0)))
                #print ("month  {}".format(m.group(1)))
                #print ("date  {}".format(m.group(2)))
                #print ("hour  {}".format(m.group(3)))
                #print ("min  {}".format(m.group(4)))
                #print ("sec  {}".format(m.group(5)))
                #print ("hostname  {}".format(m.group(6)))
                #print ("processname  {}".format(m.group(7)))
                #print ("date2  {}".format(m.group(8)))
                #print ("time2  {}".format(m.group(9)))
                #print ("session  {}".format(m.group(10)))
                #print ("verb  {}".format(m.group(11)))
                if ( m.group(11) == "LOCATING_SERVICE" ) :
                    key = m.group(1)+m.group(2)+m.group(3)
                    if key in perhour:
                        perhour[key] = perhour[key] + 1
                    else:
                        perhour[key] = 1
            #else:
                # print("x",end='')
        except StopIteration:
            print("StopIteration")
            sys.exit()
    print("end of file")
    parser.parse("Jan 01 00:00:25")

print("per hour")
for x in perhour:
    print("{}: {}".format(x,perhour[x]))

Libs of note

urllib

https://stackoverflow.com/questions/3238925/python-urllib-urllib2-post

come dave, get with the times:

https://urllib3.readthedocs.io/en/latest/
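
A minimal POST sketch with urllib3 (httpbin.org is only a test endpoint; swap in your own URL and fields):

import urllib3

http = urllib3.PoolManager()
r = http.request("POST", "https://httpbin.org/post", fields={"key": "value"})   # form-encoded POST
print(r.status)
print(r.data.decode("utf-8"))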

datetime

stuff

import datetime

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)  # this will never fail: every month has a day 28
    return next_month - datetime.timedelta(days=next_month.day)


t0 = datetime.datetime(1, 1, 1)
now = datetime.datetime.utcnow()
seconds = (now - t0).total_seconds()
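
A quick sanity check of last_day_of_month (the date is arbitrary):

print(last_day_of_month(datetime.date(2020, 2, 10)))   # 2020-02-29 (leap year)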

Pandas

on chopping up data (columns and rows):

https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c
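
The gist of that article as a tiny sketch (the DataFrame here is made up):

import pandas as pd

df = pd.DataFrame({"name": ["ada", "bob", "cal"], "age": [36, 41, 29]},
                  index=["r1", "r2", "r3"])

print(df.loc["r2", "age"])      # label-based: row "r2", column "age"
print(df.iloc[0:2, 0])          # position-based: first two rows, first column
print(df[df["age"] > 30])       # boolean mask: rows where age > 30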

connecting pandas to mysql:

import MySQLdb
import pandas as pd

mysql_cn = MySQLdb.connect(host='myhost',
                port=3306, user='myusername', passwd='mypassword',
                db='information_schema')
df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn)
print('loaded dataframe from MySQL. records:', len(df_mysql))
mysql_cn.close()

dialects in SQLAlchemy:

from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('dialect://user:pass@host:port/schema', echo=False)
f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col = 'ID')

( reference: https://stackoverflow.com/questions/10065051/python-pandas-and-databases-like-mysql )


from sqlalchemy import create_engine
engine = create_engine('postgresql://user@localhost:5432/mydb')


Series

datacamp exercises:

# Import pandas
import pandas as pd

# Import Twitter data as DataFrame: df
df = pd.read_csv('tweets.csv')

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

# Iterate over lang column in DataFrame
for entry in col:

    # If the language is in langs_count, add 1
    if entry in langs_count.keys():
        langs_count[entry] = langs_count[entry] + 1
    # Else add the language to langs_count, set the value to 1
    else:
        langs_count[entry] = 1

# Print the populated dictionary
print(langs_count)

now as a function

# Define count_entries()
def count_entries(df, col_name):
    """Return a dictionary with counts of 
    occurrences as value for each key."""

    # Initialize an empty dictionary: langs_count
    langs_count = {}
    
    # Extract column from DataFrame: col
    col = df[col_name]
    
    # Iterate over lang column in DataFrame
    for entry in col:

        # If the language is in langs_count, add 1
        if entry in langs_count.keys():
            langs_count[entry] = langs_count[entry] + 1
        # Else add the language to langs_count, set the value to 1
        else:
            langs_count[entry] = 1

    # Return the langs_count dictionary
    return(langs_count)

# Call count_entries(): result
result = count_entries(tweets_df,'lang')

# Print the result
print(result)
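
For what it's worth, pandas can do the same count directly; a one-liner on the same DataFrame:

print(df['lang'].value_counts().to_dict())   # equivalent to count_entries(df, 'lang')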

Matplotlib

Categorization

Ref: https://datascience.stackexchange.com/questions/14039/tool-to-label-images-for-classification

I just hacked together a very basic helper in Python; it requires that all images are stored in a Python list, allImages (the raw_input call below is Python 2; use input() on Python 3).

import matplotlib.pyplot as plt
category=[]
plt.ion()

for i,image in enumerate(allImages):
    plt.imshow(image)
    plt.pause(0.05)
    category.append(raw_input('category: '))