Mobile Malware Oct 15th 2015

From Federal Burro of Information
Revision as of 22:51, 30 October 2017 by David (talk | contribs)

malware in mobile

gtp-c
imsi (user), imei (device) and msisdn (phone number)

_______


ML making and breaking
@cchio
works at shape security

anomaly detection

ML has seen lots of dev.

why is ML not used a lot?

false positive and false negative tolerance

semantic gap.
 you get an alert and you don't know WHY it was flagged.
 for example you get an IP address, why is it bad? annotation?

evaluation problem.
 it's harder to make an eval system than it is to make the system.
 the classical sample test is crusty and old.

adversarial impact
 advanced actors will spend time to bypass.
 snowshoeing?
 it is still possible to circumvent.
 change model with time

how have AD (anomaly detection) systems failed in the past:
 model poisoning
  hard to find attack-free training data.
  
libs and tools
 sklearn
 often use default parameters
 esp with hard deadlines

is it hopeless
 find out by actually doing it.
 four steps:
   1. gen time series
   2. select rep features
   3. train for normality
   4. alert if incoming point deviates.
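
A minimal sketch of the four steps above, not the speaker's code. Assumptions: the series is synthetic, the features are a per-window mean and std, "normal" is a per-feature mean/std, and the z-score threshold of 4 is arbitrary. All function names are made up for illustration.

```python
import numpy as np

def make_series(n=500, seed=0):
    """Step 1: generate a time series (synthetic requests-per-minute)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    return 100 + 10 * np.sin(2 * np.pi * t / 60) + rng.normal(0, 2, n)

def features(series, window=10):
    """Step 2: pick representative features (mean and std of each window)."""
    usable = len(series) - len(series) % window
    windows = series[:usable].reshape(-1, window)
    return np.column_stack([windows.mean(axis=1), windows.std(axis=1)])

def train(X):
    """Step 3: learn what normal looks like (per-feature mean and std)."""
    return X.mean(axis=0), X.std(axis=0)

def is_anomaly(x, model, threshold=4.0):
    """Step 4: alert when any feature deviates too far from normal."""
    mean, std = model
    return bool(np.any(np.abs((x - mean) / std) > threshold))

X = features(make_series())
model = train(X)
typical = np.median(X, axis=0)              # a point in the middle of the training data
spike = np.array([400.0, X[:, 1].mean()])   # window mean far above anything seen
print(is_anomaly(typical, model), is_anomaly(spike, model))
```

The training data here is assumed clean, which is exactly the assumption the notes above flag as hard to meet in practice.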

example infrastructure - get a copy of the one from 5 years ago.

PCA anomaly detector
 builds a model.

manual validation is required, to cover false pos and false neg.

common techniques
 clustering
 svm
 neural nets
 subspace correlation based

how to build a model.
 using gets
 what are the X and Y
 selecting features, hardest part: use eye ball/ human
 PCA selects features automatically
 isn't it just a parameter optimization problem?

Question: what platform do you use for crunching numbers.
 distributed tools?

if the feature is hard to explain then it's hard to decompose the results and put them in context.

principal component analysis (PCA)
 auto selection of features.
 virtual features, synthetic features, compound features.
 purely statistical.

This guy is a pragmatist.
produces an ordered list of dimensions.

get the most from your data with the fewest dimensions.

with PCA you can find dimensions that are latent.
scree plot
the earlier the knee, the fewer dimensions required to cluster your data.
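
A rough sketch of the numbers behind a scree plot, assuming plain PCA via SVD on mean-centred data. The data is synthetic (two latent dimensions mixed into ten observed ones), and the 95% variance target is an arbitrary stand-in for eyeballing the knee. Function names are illustrative only.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component, largest first."""
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()

def components_needed(ratios, target=0.95):
    """Smallest number of leading components whose cumulative variance reaches target."""
    return int(np.searchsorted(np.cumsum(ratios), target) + 1)

rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))             # two hidden drivers
mixing = rng.normal(size=(2, 10))               # spread over ten observed features
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 10))

ratios = explained_variance_ratio(X)
k = components_needed(ratios)
print(np.round(ratios, 4), k)
```

Plotting `ratios` against component index gives the scree plot; with data like this the curve drops off sharply after the first couple of components.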

how to avoid common pitfalls
 understand your threat model.
 PCA is not enough on its own.
 keep detection scope narrow.
 close the semantic gap - how
 evaluate your AD: how well can you filter false pos?

how to filter true positives
 image recognition - DOD
 distinguish between tank and car?
 worked... but for the wrong reason
 Ha! : exactly, tracking on a correlated var, not a tank or car.
 speak to testing data. tanks on green , cars on roads... so tracked on road / green, not car or tank.

 how do we attack this problem?
  two ways:
    1. attack learning so it learns wrong as right
    2. degrade performance of the system to compromise reliability.
  Chaff - set to confuse, moves the cluster.
    directed chaff moves center of cluster.
    undirected chaff weakens the "strength" of the cluster.
 "boiling frog attack" - chaff volume versus period of time.

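A toy illustration of directed chaff and the boiling frog idea, not the attack from the talk. Assumptions: the detector is a simple centroid-plus-radius model that retrains on everything it has accepted, and each round the attacker adds a small batch of points just inside the current boundary, in the direction of the target. All names and constants are hypothetical.

```python
import numpy as np

def fit(data):
    """Centroid detector: centre is the mean, radius is mean distance plus 3 std."""
    center = data.mean(axis=0)
    dist = np.linalg.norm(data - center, axis=1)
    return center, dist.mean() + 3 * dist.std()

def accepts(point, model):
    center, radius = model
    return bool(np.linalg.norm(point - center) <= radius)

def boiling_frog(data, target, chaff_per_round=50, max_rounds=200):
    """Inject a little chaff per retraining period until the target looks normal."""
    model = fit(data)
    for rounds in range(1, max_rounds + 1):
        center, radius = model
        direction = (target - center) / np.linalg.norm(target - center)
        # Chaff sits inside the current boundary, so it is accepted as training data.
        chaff = np.tile(center + 0.9 * radius * direction, (chaff_per_round, 1))
        data = np.vstack([data, chaff])
        model = fit(data)
        if accepts(target, model):
            return model, rounds
    return model, max_rounds

rng = np.random.default_rng(1)
clean = rng.normal(size=(500, 2))
target = np.array([6.0, 0.0])        # the point the attacker wants accepted
before = accepts(target, fit(clean))
poisoned_model, rounds = boiling_frog(clean, target)
after = accepts(target, poisoned_model)
print(before, after, rounds)
```

Smaller `chaff_per_round` means more rounds but less visible movement per retraining period, which is the volume-versus-time trade-off noted above.
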
DT: connecting tdd to ml training - the behaviors that come from test suites could be used for calibration sets that protect against chaff attacks.

a second way: decision boundary ratio detection.
 slowly push points outward as a way to avoid the detection of chaff attacks.
 define decision boundary area.

DT: connect decision boundary to software versions maybe.

can ML be secure? it depends. the point is to slow them down, and detect.

how to defend against attacks on PCA:
 antidote
 principal component pursuit
 robust PCA
 uses median instead of mean.
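
A small sketch of why the median helps, covering only the centring step and not the full antidote or robust PCA algorithms. Assumptions: clean data is standard normal, and the attacker adds 10% chaff at a fixed far-away point.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
chaff = np.full((100, 3), 20.0)          # 10% poisoned points, far from normal
poisoned = np.vstack([clean, chaff])

# How far does each estimate of the "centre" move once the chaff is mixed in?
mean_shift = np.linalg.norm(poisoned.mean(axis=0) - clean.mean(axis=0))
median_shift = np.linalg.norm(np.median(poisoned, axis=0) - np.median(clean, axis=0))
print(round(mean_shift, 3), round(median_shift, 3))
```

The mean is dragged toward the chaff in proportion to how far away it is; the median only moves by a few percentiles of the clean data.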

laplacian, gaussian (gaussian is the dist), also poisson

my own tests:
 uses data set: free apache data sets.
 let PCA do all the work, to see how PCA worked.
  "projection into target flow" versus "??"

shows naive versus robust
injecting chaff shows some movement of the mean.
this is a decision boundary issue, not that naive PCA moved more than robust PCA.
simulated training periods

random detector - positive control for decision
even with robust pca, 38% evasion success using boiling frog attack

Data mining for cyber security meetup.

sparkML

sklearn
_________________

stealthier attacks

zero math zero crypto talk.
@synackpse #TLSFP

when fingerprinting:
drop random
length fields
session ID
deobfuscation
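
A hedged sketch of the "drop the volatile fields" idea above. This is not the FingerprinTLS format: the ClientHello is a made-up dict, the field names are hypothetical, and hashing with SHA-256 is just one convenient way to get a comparable string.

```python
import hashlib

# Fields that change per connection and so say nothing about the client software.
VOLATILE_FIELDS = {"random", "session_id", "length"}

def fingerprint(client_hello: dict) -> str:
    """Hash only the stable fields; list order (e.g. cipher suite order) is kept."""
    stable = {k: v for k, v in client_hello.items() if k not in VOLATILE_FIELDS}
    canonical = "|".join(f"{key}={stable[key]}" for key in sorted(stable))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

hello_a = {
    "version": "0x0303",
    "random": "11aa22bb",
    "session_id": "0001",
    "length": 180,
    "cipher_suites": ["0xc02f", "0xc030", "0x009e"],
    "extensions": ["server_name", "supported_groups", "signature_algorithms"],
}
# Same client, new connection: only the volatile fields differ.
hello_b = dict(hello_a, random="99ff88ee", session_id="0002", length=182)
# Different client: same suites, different preference order.
hello_c = dict(hello_a, cipher_suites=["0xc030", "0xc02f", "0x009e"])

print(fingerprint(hello_a) == fingerprint(hello_b))
print(fingerprint(hello_a) == fingerprint(hello_c))
```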

MLsec

fast flux a tls fingerprint

fdr: fingerprint defined routing - cool!

fingerprint canaries: keep wrong things open and alert on connect.
nation state attack: sigint.
karma police.

honey pots are useful,

tools:
 fingerprintls
 can take pcap
 server_name from extension

fingerprintout: another tool, to print out the actual fingerprint in different formats.
 c
 snort
 xkeyscore output

fingerprintDB
 github

leebrotherston/tls-fingerprint
 
openssl defaults to sslv2

nist curve highly used


Leader: 0178147 0188120

frank / ops owns cost
pricing meeting for what.
reboot monitoring

Kelley tirangalo
 under jason borne
Partner criteria
partner program - reseller motivation


______________

siem and the art of log management

config operating or using

security plan
 break nets into zone
  like users and like asset type
  what and where to monitor
  NO ONE USES ZONES ANY MORE.
 Comprehensive incident response plan.
   TONS of things to plan for.
   assets, zones, 

successful siem teams invest in the monitoring and alerting.

what the eff is a "siem team"

successful siem teams configure and craft alerts themselves.

one big failing point : failure to keep siem in sync with net.
 manual process?
 same with software "gonna wanna reconsider everything there"
monitoring ungroked data.
siem team needs to be on the change board.
ids on siem.
you need the right hardware / software for a siem
 given an env it generates storage. + growth
monitor your own siem
HA and failover
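
On the storage + growth point above, a back-of-envelope sizing sketch. Every input is an assumption to be replaced with measured numbers: event rate, average event size, retention window, and a flat annual growth rate. Compression, indexing overhead and replication for HA are ignored.

```python
def siem_storage_gb(events_per_second, avg_event_bytes, retention_days,
                    annual_growth=0.0, years=0):
    """Raw storage (decimal GB) needed to hold retention_days of events."""
    daily_bytes = events_per_second * avg_event_bytes * 86_400   # seconds per day
    grown_daily_bytes = daily_bytes * (1 + annual_growth) ** years
    return grown_daily_bytes * retention_days / 1e9

today = siem_storage_gb(1000, 500, 90)
in_two_years = siem_storage_gb(1000, 500, 90, annual_growth=0.2, years=2)
print(today, in_two_years)
```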

mss guys: don't know your business
mss guys: subject manager
poaching protection
training v 
"get rid of human resource problem."
MSS is responsible for data, usually they take care of it.
 what are the data retention , destruction , availability SLAs?
deployment time is small - due to SoC
how do you drop sensitive data?
can also get other related services.
what makes a study comprehensive.
 save 1/3 of siem costs by going mss
 
_____

cymon

mozdef - incident from 

ml
binary tree model
alexa good domain
dga tracking
cloud architecture -.... so awesome.
 worker tier
  get config from bucket
  connect to sqs queue
  dumps results to rds
 web tier
  
 data tier 
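
A local stand-in sketch of the worker tier flow above (config from a bucket, jobs from a queue, results to a database). Nothing here is cymon's code or the AWS API: a dict plays the bucket, `queue.Queue` plays SQS, in-memory `sqlite3` plays RDS, and the config key, table and verdict logic are invented for illustration.

```python
import json
import queue
import sqlite3

def load_config(bucket: dict) -> dict:
    """Stand-in for fetching the worker config object from a storage bucket."""
    return json.loads(bucket["worker-config.json"])

def run_worker(bucket: dict, jobs: queue.Queue, db: sqlite3.Connection) -> int:
    """Drain the job queue, classify each domain, and store the results."""
    config = load_config(bucket)
    known_good = set(config["known_good"])
    db.execute("CREATE TABLE IF NOT EXISTS results (domain TEXT, verdict TEXT)")
    processed = 0
    while True:
        try:
            domain = jobs.get_nowait()
        except queue.Empty:
            break
        verdict = "good" if domain in known_good else "unknown"
        db.execute("INSERT INTO results VALUES (?, ?)", (domain, verdict))
        processed += 1
    db.commit()
    return processed

bucket = {"worker-config.json": json.dumps({"known_good": ["example.com"]})}
jobs = queue.Queue()
for name in ["example.com", "xj3k9qpl.example.net"]:
    jobs.put(name)
db = sqlite3.connect(":memory:")
count = run_worker(bucket, jobs, db)
rows = db.execute("SELECT domain, verdict FROM results ORDER BY domain").fetchall()
print(count, rows)
```

Keeping the worker stateless like this (config and work both pulled from outside) is what lets the tier scale by adding more workers on the same queue.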

cymon interceptor - chrome plugin