Precision Time Protocol daemon
Overview and Notes
- Use to keep a site in lock step. Not useful for intersite work, or for internet sync.
- You still need to use NTP to get time from the internet.
- Had trouble with versions earlier than 2.2.0. Version 2.2.0 has been good.
Increasingly, in trading environments, NTP (Network Time Protocol) is not cutting it. I had a client "complain" about the large offsets between servers in the datacentre, as detected in his app logs. So I installed "check_ntp_peer" and started graphing. Offsets were in the 100s of ms (milliseconds). So we scratched our heads for a bit: update NIC drivers? Red Hat's "realtime" Linux, "MRG"? We considered a couple of options. The easiest one to try was replacing NTP with PTP (Precision Time Protocol). Well, not completely replacing it. You leave one NTP server in place in each DC to get time from an external source (NRC, BTW). That NTP server ALSO runs a PTP server. PTP runs over multicast. Then you drop the NTP clients on all your servers and replace them with PTP slaves.
Offsets went to 100s of nanoseconds: like orders of magnitude baby!
PTP is meant for "local" sync. As I understand it, it's not meant to run over the internet at large. Think: inside the DC.
So, hilariously, some chump invited me to a planning meeting for a new trading client, where I suggested that we demonstrate our expertise and pitch this ntp->ptp replacement. There was a lot of humming and hawing. But it was put in the project and given to someone else, and not really given any "Thornton Attention".
We're not finished OAT and the client says: "Oh by the way, we got a pair of these 'Solarflare SFN6322F' devices, we'll walk you through how they work, please install them".
Wowzers! I peeked round a corner ( again ).
The upshot is we get to play with some new hardware.
As I understand it, this device can do PTP over the conventional network. It can also run a "cut through" stack for a dedicated network that just does PTP. I'm not sure if "cut through" is the technical term, but basically it cuts out a pile of software between the network and the kernel, i.e. no full-blown network stack.
Additionally, with those crazy connectors you see, you can run a PPS (pulse per second) network. I'm not 100% sure how that works, but I think it's like an even simpler network that is really just pulses of electricity that act as the clock.
I've heard the term PPS used before to describe the signal you get from your GPS when you plug it into your serial port to be your local clock. Clearly I need to learn more about it, mainly cause it's sooper cool.
Units of Time
1 second      = 1000 milliseconds
1 millisecond = 1000 microseconds
1 microsecond = 1000 nanoseconds

1.000000000 - second
0.001000000 - millisecond
0.000001000 - microsecond
0.000000001 - nanosecond
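ptpd prints delays and offsets as seconds with nine decimal places, so it helps to be able to flip them into nanoseconds quickly. A minimal sketch of that conversion, assuming awk is available; the function name sec_to_ns is made up for this example.

```shell
#!/bin/sh
# Convert a value in seconds (e.g. 0.000000350) to whole nanoseconds.
# 1 second = 1000000000 nanoseconds, per the table above.
sec_to_ns() {
    awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 1000000000 }'
}

sec_to_ns 0.000000350   # prints 350        (350 nanoseconds)
sec_to_ns 0.150000000   # prints 150000000  (150 milliseconds)
```

The second call is the kind of offset we saw with NTP, the first is the kind we saw with PTP.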
Download and compile
cd /root/ptp
# -O saves the download under a clean name, so there is no need to mv a file whose name contains ? and &
wget -O ptpd-2.2.0.tar.gz "http://downloads.sourceforge.net/project/ptpd/ptpd/2.2.0/ptpd-2.2.0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fptpd%2Ffiles%2F&ts=1352085166&use_mirror=tenet"
tar zxvf ptpd-2.2.0.tar.gz
cd ptpd-2.2.0
cd src
make
Slave setup, susan
./ptpd2 -g -b eth3 -C
startup:
2012-11-04 22:51:13.608059, lstn_init 1
(ptpd info) 22:51:25.607385 (lstn_init) refreshed IGMP multicast memberships
(ptpd info) 22:51:25.607804 (lstn_init) now in state PTP_LISTENING
2012-11-04 22:51:25.608073, lstn_init 1
(ptpd info) 22:51:37.607403 (lstn_init) refreshed IGMP multicast memberships
(ptpd info) 22:51:37.607815 (lstn_init) now in state PTP_LISTENING
2012-11-04 22:51:37.608080, lstn_init 1
(ptpd info) 22:51:49.607391 (lstn_init) refreshed IGMP multicast memberships
(ptpd info) 22:51:49.607810 (lstn_init) now in state PTP_LISTENING
2012-11-04 22:51:49.608070, lstn_init 1
(ptpd info) 22:52:01.607394 (lstn_init) refreshed IGMP multicast memberships
(ptpd info) 22:52:01.607803 (lstn_init) now in state PTP_LISTENING
2012-11-04 22:52:01.608070, lstn_init 1
(ptpd info) 22:52:05.379091 (lstn_init) now in state PTP_SLAVE
2012-11-04 22:52:05.379149, slv 001cc0fffe5d6c1a(unknown)/01, 0.000000000, 0.000000000, 0.000000000, 0.000000000, 0, I
(ptpd notice) 22:52:06.378997 (slv) Received first Sync from Master
(ptpd notice) 22:52:06.379040 (slv) going to arm DelayReq timer for the first time, with initial rate: 0
Master setup, Athena
./ptpd2 -G -b eth0 -C
startup:
10:41:26 athena@athena ~/ptp/ptpd-2.2.0/src # ./ptpd2 -x -G -b eth0 -C
(ptpd info) 22:41:32.590792 (___) Info: Going to check lock /var/run/kernel_clock
(ptpd info) 22:41:32.608530 (___) Info: No ptpd daemons detected in parallel (as expected)
(ptpd info) 22:41:32.625876 (___) Info: 1 ntpd daemons detected in parallel (as expected)
(ptpd info) 22:41:32.625946 (___) Info: Going to check lock /var/run/kernel_clock
(ptpd info) 22:41:32.626125 (___) Info: Startup finished sucessfully
# Timestamp, State, Clock ID, One Way Delay, Offset From Master, Slave to Master, Master to Slave, Drift, Last packet Received
2012-11-04 22:41:32.626205, init
(ptpd info) 22:41:32.690552 (init) refreshed IGMP multicast memberships
(ptpd info) 22:41:32.691279 (init) now in state PTP_LISTENING
2012-11-04 22:41:32.691834, lstn_init 1
(ptpd info) 22:41:44.627144 (lstn_init) now in state PTP_MASTER
2012-11-04 22:41:44.627198, mst 001cc0fffe5d6c1a(unknown)/00
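The header row in the master output names the stats columns, and the "slv" line in the slave output is one such row. Here is a rough sketch for pulling "Offset From Master" out of slave stats and printing it in nanoseconds, handy for graphing. It assumes the layout matches the sample slave line in these notes, where the state and clock ID share one comma-separated field, which makes the offset the 4th field. The function name slave_offsets_ns is made up for this example; check the field position against your own output before trusting it.

```shell
#!/bin/sh
# Print "timestamp offset_ns" for every slave ("slv") stats line on stdin.
# Assumed comma-separated layout, taken from the sample line in these notes:
#   $1 timestamp, $2 "slv <clock id>", $3 one way delay, $4 offset from master
# Info/notice lines do not have "slv " at the start of field 2 and are skipped.
slave_offsets_ns() {
    awk -F', *' '$2 ~ /^slv / { printf "%s %.0f\n", $1, $4 * 1000000000 }'
}

# The slave line from the startup log above (offset still zero at that point):
echo '2012-11-04 22:52:05.379149, slv 001cc0fffe5d6c1a(unknown)/01, 0.000000000, 0.000000000, 0.000000000, 0.000000000, 0, I' | slave_offsets_ns
# prints: 2012-11-04 22:52:05.379149 0
```

Against a stats file written with -f, this would be run as: slave_offsets_ns < /var/log/ptpd2/stats.log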
Command line
I had some trouble with the lock file. Despite closing the service down "properly", the lock file was not removed. I ended up using -L to "ignore lock file". The problem is that I could end up with more than one process running. We can protect against that with "run" files created by the runlevel (init) script.
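One way the "run" file idea could look in an init script is sketched below. This is not what ptpd ships. The function names and the run file path are made up for this example, and it assumes the command stays in the foreground (as -C does in the examples here) so that the recorded PID is the real one. A daemonizing invocation forks away from that PID and would need a different check.

```shell
#!/bin/sh
# Succeeds if the run file names a PID that is still alive.
runfile_is_live() {
    [ -s "$1" ] || return 1                # missing or empty run file
    kill -0 "$(cat "$1")" 2>/dev/null      # signal 0 only tests for existence
}

# Start a command once: refuse if the run file is live, otherwise
# launch the command in the background and record its PID.
start_once() {
    runfile="$1"; shift
    if runfile_is_live "$runfile"; then
        echo "already running, pid $(cat "$runfile")" >&2
        return 1
    fi
    "$@" &
    echo "$!" > "$runfile"
}
```

Usage would be something like: start_once /var/run/ptpd2.run /usr/local/bin/ptpd2 -g -b eth2 -L -C -D -V 1 -S (the run file path is hypothetical). A stale run file left by a crashed process is simply overwritten, which is the behaviour the built-in lock file was missing.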
Manual Mode
master:
/usr/local/bin/ptpd2 -G -b eth2 -C -D -V 1 -S
slave:
/usr/local/bin/ptpd2 -g -b eth2 -C -D -V 1 -S
Service mode
master:
/usr/local/bin/ptpd2 -G -b eth2 -C -D -V 1 -S
slave:
/usr/local/bin/ptpd2 -g -b eth2 -L -f /var/log/ptpd2/stats.log -D -V 10 -S
tcpdump
/usr/sbin/tcpdump -i eth0 not tcp port 5668 and not tcp port 5666 and not port snmp and not tcp port ssh and not icmp and not arp
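The filter above works by excluding the noise. The other way round is to match only the PTP packets. PTP over UDP uses port 319 for event messages and port 320 for general messages, so a sketch could look like this. The variable and function names are made up for this example, and the interface default is just the one used above.

```shell
#!/bin/sh
# PTP over UDP: port 319 carries event messages (Sync, Delay_Req),
# port 320 carries general messages (Follow_Up, Delay_Resp, Announce).
PTP_FILTER='udp port 319 or udp port 320'

# Capture only PTP traffic on the given interface (default eth0). Needs root.
ptp_dump() {
    /usr/sbin/tcpdump -n -i "${1:-eth0}" "$PTP_FILTER"
}
```

Run it as ptp_dump eth2 on a slave. If Sync packets from the master show up every second or so, multicast is getting through.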