Python Notes
Start

Docs:

package management

use pip.

Got yourself in a bind and can't use network-enabled pip (because you pooched your SSL libs)? Fetch the package from https://pypi.org/simple/ and run "pip install -e ." from the unpacked source.

testing

in visual studio code et al
https://code.visualstudio.com/docs/python/unit-testing

Basics

string encoding

at the top of the file (second line):

(this worked with Python 2.7.11 on FreeNAS 9)

# coding=utf_8

more: https://www.python.org/dev/peps/pep-0263/


Encoding

python 3

>>> b'0376'
b'0376'
>>> b'376'
b'376'
>>> str(b'0376')
"b'0376'"
>>> str(b'\376')
"b'\\xfe'"
>>> str(b'\0376')
"b'\\x1f6'"
>>> str(b'\76')
"b'>'"
>>> str(b'\276')
"b'\\xbe'"
>>> str(b'\176')
"b'~'"
>>> 
>>> str('\376')
'þ'
>>> str(b'\376')
"b'\\xfe'"
>>> 

wc(1) - word count. Useful: it counts lines, words, and _bytes_.

$ echo 'þ' | wc 
       1       1       3
  • 1 line
  • 1 word
  • 3 bytes!

od(1) - Octal dump.

$ echo 'þ' | od
0000000    137303  000012                                                
0000003

oops I meant -c for "ascii representation":

$ echo 'þ' | od -c
0000000    þ  **  \n                                                    
0000003
$ 

from od(1):

Multi-byte characters are displayed in the area corresponding to the first byte of the character. The remaining bytes are shown as `**'.

þ is 2 bytes + \n is 1 = 3 bytes.

>>> int.from_bytes(b'\x00\xfe', byteorder='big')
254
>>> int.from_bytes(b'\xfe', byteorder='big')
254
>>> 

old way in python3:

>>> print('%(x)o' % {'x':254} )
376
>>> 
  • "o" - octal format

new way in python 3:

>>> print('{!r}'.format(254) )
254
>>> print('{!a}'.format(254) )
254
>>> print('{!s}'.format(254) )
254
>>> print('{}'.format(oct(254)) )
0o376
>>>
  • r - repr()
  • a - ascii()
  • s - str()

__debug__

Use __debug__ in your code:

if __debug__:
    print('Debug ON')
else:
    print('Debug OFF')

Create a script abc.py with the above code, then run it with python abc.py and again with python -O abc.py, and observe the difference (-O turns __debug__ off).


formatting

https://pyformat.info/

'{:>10}'.format('test') # padding
'{} {}'.format('one', 'two')
'{0!s} {0!r}'.format(Data()) # forms
'{:.5}'.format('xylophone') # truncate
'{:4d}'.format(42)

Functions


def functionname(argument1, argument2):
    """Docstring can be
    multiline."""
    dostuff()
    return stuff

ghetto command line argv

#!/usr/bin/python

import sys

print 'Number of arguments:', len(sys.argv), 'arguments.'
print 'Argument List:', str(sys.argv)

Less ghetto argparse

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('access_key', help='Access Key')
parser.add_argument('secret_key', help='Secret Key')
args = parser.parse_args()
global access_key
global secret_key
access_key = args.access_key
secret_key = args.secret_key

make bcrypt password

import bcrypt
bcrypt.hashpw('myPlainPassword'.encode('utf-8'), bcrypt.gensalt())   # hashpw wants bytes on Python 3
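
To verify a password later, bcrypt.checkpw() compares a candidate against the stored hash; a minimal sketch (the variable names are just for illustration, and both calls want bytes on Python 3):

import bcrypt

# hash at signup time
hashed = bcrypt.hashpw('myPlainPassword'.encode('utf-8'), bcrypt.gensalt())

# verify at login time
print(bcrypt.checkpw('myPlainPassword'.encode('utf-8'), hashed))   # True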

what's in that object?

https://download.tuxfamily.org/jeremyblog/diveintopython-5.4/py/apihelper.py

/apihelper.py

/apihelper.py - short

then use it like this:

#!/usr/bin/env python

from apihelper import info

mything = []   # mything can be any object you want to inspect

info(mything)
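
If those apihelper links ever go stale, here is a rough stand-in built on dir() and getattr(); a sketch, not Dive Into Python's exact implementation:

def info(obj, spacing=16):
    """Print the callable attributes of obj and the first line of each docstring."""
    for name in dir(obj):
        attr = getattr(obj, name)
        if callable(attr):
            doc = (attr.__doc__ or "").strip().splitlines()
            print(name.ljust(spacing), doc[0] if doc else "")

info([])   # for example: a list's methods and what they do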


AutoVivification

import pprint

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

d = Vividict()

d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
pprint.pprint(d)

Join two lists

one is the keys, the other is the values:

use zip:

keys = ['a', 'b', 'c']
values = [1, 2, 3]
dictionary = dict(zip(keys, values))
print(dictionary)
{'a': 1, 'b': 2, 'c': 3}

vim tabs

in file:

# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4

in ~/.vimrc:

set tabstop=8
set expandtab
set shiftwidth=4
set softtabstop=4

parse json

ref: https://docs.python.org/2/library/json.html

python -m json.tool my_json.json
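
The same thing from inside Python with the standard json module (a minimal sketch; my_json.json is just the filename used above):

import json

with open('my_json.json') as f:
    data = json.load(f)

print(json.dumps(data, indent=4, sort_keys=True))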

parse yaml

pip3 install pyyaml
python3 -c 'import yaml,sys;yaml.safe_load(sys.stdin)' < file.yml

or, reading the file directly:

python -c "from yaml import load, Loader; load(open('filename.yml'), Loader=Loader)"

parse lines

in the context of scapy packets:

from scapy.all import *
from scapy.layers import http

myhttp = pkt.getlayer(http.HTTPRequest).fields
headers = myhttp['Headers'].splitlines()
for header in headers:
  ( k, v )  = header.split(": ")
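
The loop above overwrites k and v on each pass; if you want all the headers at once, collect them into a dict (a sketch, assuming Headers is the newline-separated string the loop implies):

myhttp = pkt.getlayer(http.HTTPRequest).fields
# "Name: value" lines -> dict; maxsplit=1 keeps values that themselves contain ": "
headers = dict(line.split(": ", 1) for line in myhttp['Headers'].splitlines())
print(headers.get('Host'))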

extract by character

useful for files with very long lines

first find the line that has _many_ characters and put it in its own file:

sed -n '459p' postgresql-9.3-main.log > tmp

Then read the characters one by one..

#!/usr/bin/env python
import sys

with open('tmp') as f:
  for x in range(600):
    c = f.read(1)
    if not c:
      print "End of file"
      break
    sys.stdout.write(c)
  print

another example.

rump is a redis sync tool that writes a single "r" or "w" character for each read or write operation.

Here is a Python script to count those operations:

file: status.py

#!/usr/bin/env python3

import sys

filename = sys.argv[1] if len(sys.argv) > 1 else "mylog"
write = 0
read = 0
with open(filename) as f:
  while True:
    c = f.read(1)
    if c =="w":
      write = write + 1
    elif c =="r":
      read = read + 1
    else:
      break

print("read {} write {}".format(read,write))

tuples

>>> x = [(1,2), (3,4), (5,6)]
>>> for item in x:
...     print "A tuple", item
A tuple (1, 2)
A tuple (3, 4)
A tuple (5, 6)
>>> for a, b in x:
...     print "First", a, "then", b
First 1 then 2
First 3 then 4
First 5 then 6

Decorators

https://python-3-patterns-idioms-test.readthedocs.io/en/latest/PythonDecorators.html

https://realpython.com/primer-on-python-decorators/
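
A bare-bones example to go with those links; the timing decorator here is just an illustration, not taken from either article:

import functools
import time

def timed(func):
    """Decorator that reports how long the wrapped function took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print("{} took {:.3f}s".format(func.__name__, time.time() - start))
        return result
    return wrapper

@timed
def slow_add(a, b):
    time.sleep(0.1)
    return a + b

slow_add(1, 2)   # prints something like "slow_add took 0.100s"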

Exception handling

try:
  # rv, data = M.search(None, "ALL")
  # rv, data = M.search(None, 'SENTSINCE 1-Jan-2017 SENTBEFORE 31-Dec-2017')
  rv, data = M.search(None, 'SENTSINCE 1 Jan 2017')
except Exception as e:
  print("M.search failed")
  print("Error %s" % M.error.message)
  print("Exception is %s" % str(e))

also:

try:
    pass
except Exception as e:
    # Just print(e) is cleaner and more likely what you want,
    # but if you insist on printing message specifically whenever possible...
    if hasattr(e, 'message'):
        print(e.message)
    else:
        print(e)

tempfile

import os
from tempfile import NamedTemporaryFile

f = NamedTemporaryFile(delete=False)
f
# <open file '<fdopen>', mode 'w+b' at 0x384698>
f.name
# '/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpG7V1Y0'
f.write(b"Hello World!\n")   # mode is 'w+b', so write bytes
f.close()
os.unlink(f.name)
os.path.exists(f.name)
# False

Pretty Printer

aka Object Dump

import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(stuff)

List Comprehension

[ <output expression> for <iterator variable> in <iterable> if <predicate expression> ].

Converting your brain:

new_things = []
for ITEM in old_things:
    if condition_based_on(ITEM):
        new_things.append("something with " + ITEM)

You can rewrite the above for loop as a list comprehension like this:

new_things = ["something with " + ITEM for ITEM in old_things if condition_based_on(ITEM)]

unconditionally:

doubled_numbers = []
for n in numbers:
    doubled_numbers.append(n * 2)

That same code written as a comprehension:

doubled_numbers = [n * 2 for n in numbers]

reference: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/

Tag extract

I had no idea why this works at first, but it's just a list comprehension: you get an instance back from boto3 describe_instances, keep only the Tags whose Key is 'snapshot_retention', convert the matching Value to int, and take the first result with [0]:

        retention_days = [
            int(t.get('Value')) for t in instance['Tags']
            if t['Key'] == 'snapshot_retention'][0]
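
One caveat: the [0] raises an IndexError if the instance has no such tag. A gentler variant using next() with a default (the 0 fallback is arbitrary, pick whatever makes sense):

        retention_days = next(
            (int(t['Value']) for t in instance['Tags'] if t['Key'] == 'snapshot_retention'),
            0)   # 0 here means "tag not found"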

Log file loop + re

#!/usr/bin/env python3
# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4
import sys
import re
from dateutil import parser
myfile="/root/selenoid.log.Jan6"
perhour = {}

# Dec 22 09:44:10 chrome01 xvfb-run[34607]: 2019/12/22 09:44:10 [8157622] [TERMINATED_PROCESS] [41686] [0.02s]
#                          mon    day    hh  :min  :sec   hostname print pid
# earlier, simpler attempt (superseded by the fuller pattern below):
# line_regex = re.compile(r"(\D*)\s(\d*)\s(\d*):(\d*):(\d*)\s(\w)\s(\w)\[(\d*)\]")
line_regex = re.compile(r"(\D*)\s(\d*)\s(\d*):(\d*):(\d*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\s\[(\S*)\]\s\[(\S*)\]\s")

with open(myfile) as f:
    for line in f:
        try:
            # print ("line {}".format(line),)
            m = line_regex.search(line)
            if (m):
                # print (".",end='')
                #print ("zero {}".format(m.group(0)))
                #print ("month  {}".format(m.group(1)))
                #print ("date  {}".format(m.group(2)))
                #print ("hour  {}".format(m.group(3)))
                #print ("min  {}".format(m.group(4)))
                #print ("sec  {}".format(m.group(5)))
                #print ("hostname  {}".format(m.group(6)))
                #print ("processname  {}".format(m.group(7)))
                #print ("date2  {}".format(m.group(8)))
                #print ("time2  {}".format(m.group(9)))
                #print ("session  {}".format(m.group(10)))
                #print ("verb  {}".format(m.group(11)))
                if ( m.group(11) == "LOCATING_SERVICE" ) :
                    key = m.group(1)+m.group(2)+m.group(3)
                    if key in perhour:
                        perhour[key] = perhour[key] + 1
                    else:
                        perhour[key] = 1
            #else:
                # print("x",end='')
        except StopIteration:
            print("StopIteration")
            sys.exit()
    print("end of file")
    parser.parse("Jan 01 00:00:25")

print("per hour")
for x in perhour:
    print("{}: {}".format(x,perhour[x]))

Libs of note

urllib

https://stackoverflow.com/questions/3238925/python-urllib-urllib2-post

come dave, get with the times:

https://urllib3.readthedocs.io/en/latest/
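
A minimal POST sketch with urllib3 (httpbin.org is only a test endpoint; swap in your own URL and fields):

import urllib3

http = urllib3.PoolManager()
r = http.request("POST", "https://httpbin.org/post", fields={"key": "value"})   # form-encoded POST
print(r.status)
print(r.data.decode("utf-8"))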

datetime

stuff

import datetime

def last_day_of_month(any_day):
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)  # this will never fail: every month has a day 28
    return next_month - datetime.timedelta(days=next_month.day)


t0 = datetime.datetime(1, 1, 1)
now = datetime.datetime.utcnow()
seconds = (now - t0).total_seconds()
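
A quick sanity check of last_day_of_month (the date is arbitrary):

print(last_day_of_month(datetime.date(2020, 2, 10)))   # 2020-02-29 (leap year)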

Pandas

on chopping up data (columns and rows):

https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c
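
The gist of that article as a tiny sketch (the DataFrame here is made up):

import pandas as pd

df = pd.DataFrame({"name": ["ada", "bob", "cal"], "age": [36, 41, 29]},
                  index=["r1", "r2", "r3"])

print(df.loc["r2", "age"])      # label-based: row "r2", column "age"
print(df.iloc[0:2, 0])          # position-based: first two rows, first column
print(df[df["age"] > 30])       # boolean mask: rows where age > 30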

connecting pandas to mysql:

import MySQLdb
import pandas as pd

mysql_cn = MySQLdb.connect(host='myhost',
                port=3306, user='myusername', passwd='mypassword',
                db='information_schema')
df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn)
print('loaded dataframe from MySQL. records:', len(df_mysql))
mysql_cn.close()

dialects in SQLAlchemy:

from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('dialect://user:pass@host:port/schema', echo=False)
f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col = 'ID')

( reference: https://stackoverflow.com/questions/10065051/python-pandas-and-databases-like-mysql )


from sqlalchemy import create_engine
engine = create_engine('postgresql://user@localhost:5432/mydb')


Series

datacamp exercises:

# Import pandas
import pandas as pd

# Import Twitter data as DataFrame: df
df = pd.read_csv('tweets.csv')

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

# Iterate over lang column in DataFrame
for entry in col:

    # If the language is in langs_count, add 1
    if entry in langs_count.keys():
        langs_count[entry] = langs_count[entry] + 1
    # Else add the language to langs_count, set the value to 1
    else:
        langs_count[entry] = 1

# Print the populated dictionary
print(langs_count)

now as a function

# Define count_entries()
def count_entries(df, col_name):
    """Return a dictionary with counts of 
    occurrences as value for each key."""

    # Initialize an empty dictionary: langs_count
    langs_count = {}
    
    # Extract column from DataFrame: col
    col = df[col_name]
    
    # Iterate over lang column in DataFrame
    for entry in col:

        # If the language is in langs_count, add 1
        if entry in langs_count.keys():
            langs_count[entry] = langs_count[entry] + 1
        # Else add the language to langs_count, set the value to 1
        else:
            langs_count[entry] = 1

    # Return the langs_count dictionary
    return(langs_count)

# Call count_entries(): result
result = count_entries(tweets_df,'lang')

# Print the result
print(result)
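
For what it's worth, pandas can do the same count directly; a one-liner on the same DataFrame:

print(df['lang'].value_counts().to_dict())   # equivalent to count_entries(df, 'lang')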

Matplotlib

Categorization

Ref: https://datascience.stackexchange.com/questions/14039/tool-to-label-images-for-classification

I just hacked together a very basic helper in Python; it requires that all images are stored in a Python list, allImages (the raw_input call below is Python 2; use input() on Python 3).

import matplotlib.pyplot as plt
category=[]
plt.ion()

for i,image in enumerate(allImages):
    plt.imshow(image)
    plt.pause(0.05)
    category.append(raw_input('category: '))