Real-time Datasets for Machine Learning and Data Science

Artificial Intelligence (AI) and Machine Learning (ML) datasets are used by developers and engineers to train models that make better predictions from the new facts and evidence found in the data. We provide live global retraining data for real-time machine learning models.

Subpico owns and operates a global network of internet POP gateway processors. Our country performance datasets provide live statistics for all devices in a country, measured relative to the POP gateway that observes them.

Applications include Real-time ML/Retraining, Data Science, Telecommunications OAM/Billing, and IT Security & Cyber Threat Intelligence.

Real-time Machine Learning Use Cases

  • Telecommunications & Network Performance

  • Government and Lawful Interception

  • Signal Identification, Classification and Processing

  • InfoSec & Cybersecurity

Global Telecommunications and Network Performance

The image below shows real-time performance metrics of the communication channels and devices physically based in the countries of interest. Statistics are calculated per flow, per device and aggregated to the country level.

The correct application of machine learning models to network traffic problems starts with understanding the DoD four-layer communications model. When two applications talk to each other over the network, they exchange TCP segments in a specific, application- and protocol-dependent way. The application protocol code uses the services of the OS TCP/IP stack to send and receive the data it needs to provide the user application logic.
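As a rough illustration of an application relying on the OS TCP/IP services, the Python sketch below times a TCP connect from the client side. It assumes that `socket.create_connection()` returns once the handshake has completed from the client's point of view (SYN sent, SYN-ACK received); the helper name `timed_connect` is hypothetical, and this client-side timing is not how a passive POP measures connect time. It runs against a throwaway listener on localhost, so it needs no network access.

```python
import socket
import time


def timed_connect(host, port, timeout=5.0):
    """Return (socket, seconds) for a blocking TCP connect (hypothetical helper)."""
    start = time.perf_counter()
    # Blocks until the kernel reports the connection as established.
    sock = socket.create_connection((host, port), timeout=timeout)
    elapsed = time.perf_counter() - start
    return sock, elapsed


# Self-contained demo: listen on an ephemeral localhost port and connect to it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

client, connect_seconds = timed_connect("127.0.0.1", port)
conn, _ = server.accept()

# Application data handed to the OS TCP/IP stack, read back on the other side.
client.sendall(b"hello")
received = b""
while len(received) < 5:
    chunk = conn.recv(5 - len(received))
    if not chunk:
        break
    received += chunk

print(f"connect took {connect_seconds * 1000:.3f} ms, received {received!r}")

for s in (client, conn, server):
    s.close()
```

On a loopback interface the measured time is tiny; across a real network it is dominated by one round trip between client and server.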

Real-world communication networks are based on the four DoD layers: 1) Data-link, 2) Internetworking, 3) Transport and 4) Application. The data-link layer consists of firmware, drivers and physical hardware. The internetworking layer is IP, which carries ICMP as well as TCP, UDP and other layer 4 transport protocols. Applications have programmable logic and timers that send to and receive from layer 4. This logic and timing is what we measure in real time per flow, device and country.
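To make the layering concrete, here is a minimal sketch that reads a raw IPv4 header and reports the TTL and the protocol it carries. It assumes a header of at least 20 bytes in network byte order; `parse_ipv4_header` and the layer labels in `PROTOCOL_LAYERS` are hypothetical names for illustration. The protocol numbers (1 for ICMP, 6 for TCP, 17 for UDP) are the standard IANA assignments, and the per-packet TTL is the kind of value a country-level field such as ttl_avg would average.

```python
import struct

# Standard IANA protocol numbers and the DoD layer each payload belongs to
# (the layer labels are illustrative).
PROTOCOL_LAYERS = {
    1: ("ICMP", "internetworking"),
    6: ("TCP", "transport"),
    17: ("UDP", "transport"),
}


def parse_ipv4_header(packet: bytes) -> dict:
    """Return TTL, carried protocol and header length from a raw IPv4 header."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (version_ihl,) = struct.unpack_from("!B", packet, 0)
    version = version_ihl >> 4
    if version != 4:
        raise ValueError(f"not an IPv4 packet (version {version})")
    header_len = (version_ihl & 0x0F) * 4  # IHL counts 32-bit words
    ttl, proto = struct.unpack_from("!BB", packet, 8)  # bytes 8 and 9
    name, layer = PROTOCOL_LAYERS.get(proto, (f"proto-{proto}", "unknown"))
    return {"ttl": ttl, "protocol": name, "layer": layer, "header_len": header_len}


# Sample header: version 4, IHL 5, TTL 64, protocol 6 (TCP),
# documentation-range addresses, checksum left at 0.
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
    bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]),
)
info = parse_ipv4_header(header)
print(info)
```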

Coding for per-layer state-machine transactions and messages allows a deeper and faster learning trajectory than logging endless gigabytes of useless network flow statistics, which do not support the lab results: real networks change the math.
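A per-layer state machine can be very small. The sketch below follows one flow's TCP three-way handshake as a passive observer would see it and records the time from the client SYN to the client's final ACK. The event format (timestamp, direction, set of flag names), the class names and the choice of SYN-to-ACK as "connect time" are assumptions for illustration, not Subpico's definition.

```python
from enum import Enum, auto


class HandshakeState(Enum):
    CLOSED = auto()
    SYN_SEEN = auto()
    SYN_ACK_SEEN = auto()
    ESTABLISHED = auto()


class HandshakeTracker:
    """Follow one flow's TCP three-way handshake from a passive vantage point."""

    def __init__(self):
        self.state = HandshakeState.CLOSED
        self.syn_time = None
        self.connect_time = None

    def observe(self, timestamp, direction, flags):
        # direction: "c2s" (client to server) or "s2c"; flags: set of TCP flag names.
        if self.state is HandshakeState.CLOSED and direction == "c2s" and flags == {"SYN"}:
            self.state = HandshakeState.SYN_SEEN
            self.syn_time = timestamp
        elif self.state is HandshakeState.SYN_SEEN and direction == "s2c" and flags == {"SYN", "ACK"}:
            self.state = HandshakeState.SYN_ACK_SEEN
        elif (self.state is HandshakeState.SYN_ACK_SEEN and direction == "c2s"
              and "ACK" in flags and "SYN" not in flags):
            self.state = HandshakeState.ESTABLISHED
            self.connect_time = timestamp - self.syn_time
        return self.state


tracker = HandshakeTracker()
events = [
    (0.000, "c2s", {"SYN"}),
    (0.012, "s2c", {"SYN", "ACK"}),
    (0.013, "c2s", {"ACK"}),
]
for ts, direction, flags in events:
    tracker.observe(ts, direction, flags)
print(tracker.state, tracker.connect_time)
```

A retransmitted SYN arriving in the SYN_SEEN state matches no transition, so the first SYN keeps setting the start time.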

At Subpico, we specialize in real-time unsupervised neural machine learning for telecoms and networks.

Total countries: 254.

Real-time Country Performance Data (CC Brochure)

Contact us for sample data in CSV format for format checks and testing.

Country Performance Metrics Fields:

Header: timestamp, node, cc, alpha_2, alpha_3, region_code, name, tcp_flows, ttl_avg, tcp_avg, pdu_avg, tcp_min, tcp_max, tcp_sdev, jitter, rtt_min, rtt_avg, rtt_max, rtt_mdev
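The tcp_* fields in the header are country-level summaries of per-flow measurements, in line with the statement above that statistics are calculated per flow and aggregated to the country level. Below is a minimal sketch of that aggregation step. It assumes each flow contributes one connect time, uses the population standard deviation for tcp_sdev (the estimator actually used is not stated), and treats units as milliseconds; `aggregate_by_country` and its input format are hypothetical.

```python
import statistics
from collections import defaultdict


def aggregate_by_country(flows):
    """Summarize per-flow TCP connect times per country.

    flows: iterable of (country_code, connect_time) pairs, one per flow.
    The input format is hypothetical; units are assumed to be milliseconds.
    """
    groups = defaultdict(list)
    for country, connect_time in flows:
        groups[country].append(connect_time)

    summary = {}
    for country, times in groups.items():
        summary[country] = {
            "tcp_flows": len(times),
            "tcp_avg": statistics.mean(times),
            "tcp_min": min(times),
            "tcp_max": max(times),
            # Population standard deviation (assumed estimator).
            "tcp_sdev": statistics.pstdev(times),
        }
    return summary


flows = [("US", 10.0), ("US", 20.0), ("US", 30.0), ("DE", 15.0)]
stats = aggregate_by_country(flows)
print(stats["US"])
```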

TABLE: Sample CSV record format

Field        Example value             Description
timestamp    2021-02-23 05:47:46       Timestamp at the local POP, adjusted to GMT
node         6                         POP node (point of measurement)
cc           840                       ISO 3166 numeric country code
alpha_2      US                        ISO 3166 alpha-2 country code
alpha_3      USA                       ISO 3166 alpha-3 country code
region_code  19                        Region code
name         United States of America  Country name
tcp_flows    327                       Number of unique TCP flows in the sample window
ttl_avg      67                        Average IP TTL (time-to-live) for the country
tcp_avg      168.235                   Average TCP connect time (three-way handshake, 3WHS) *
pdu_avg      915.080                   Average inter-arrival time (IAT) of content PDUs (PUSH)
tcp_min      11.023                    Minimum TCP connect time
tcp_max      1239.15                   Maximum TCP connect time
tcp_sdev     19.699                    Standard deviation of TCP/IP 3WHS time
jitter       29.174                    Variance in client connect time
rtt_min      8.926                     Minimum ICMP round trip time
rtt_avg      28.567                    Average ICMP round trip time
rtt_max      366.579                   Maximum ICMP round trip time
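A sketch of loading records in this format with Python's standard `csv` module. It assumes a comma-delimited feed with the header line listed above and the timestamp layout shown in the sample record; `parse_records` is a hypothetical helper. The sample row reuses the example values from the sample record, and rtt_mdev is left empty because no example value is given for it.

```python
import csv
import io
from datetime import datetime

HEADER = ("timestamp,node,cc,alpha_2,alpha_3,region_code,name,tcp_flows,ttl_avg,"
          "tcp_avg,pdu_avg,tcp_min,tcp_max,tcp_sdev,jitter,rtt_min,rtt_avg,rtt_max,rtt_mdev")

INT_FIELDS = {"node", "cc", "region_code", "tcp_flows"}
FLOAT_FIELDS = {"ttl_avg", "tcp_avg", "pdu_avg", "tcp_min", "tcp_max", "tcp_sdev",
                "jitter", "rtt_min", "rtt_avg", "rtt_max", "rtt_mdev"}


def parse_records(text):
    """Yield one typed dict per CSV row; empty numeric cells become None."""
    for row in csv.DictReader(io.StringIO(text)):
        record = {}
        for key, value in row.items():
            if key == "timestamp":
                # Timestamps are described as GMT-adjusted; kept naive here.
                record[key] = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
            elif key in INT_FIELDS:
                record[key] = int(value) if value else None
            elif key in FLOAT_FIELDS:
                record[key] = float(value) if value else None
            else:
                record[key] = value
        yield record


# Sample row built from the example values; rtt_mdev is left empty.
sample = HEADER + "\n" + (
    "2021-02-23 05:47:46,6,840,US,USA,19,United States of America,327,67,"
    "168.235,915.080,11.023,1239.15,19.699,29.174,8.926,28.567,366.579,\n"
)
records = list(parse_records(sample))
print(records[0]["name"], records[0]["tcp_avg"])
```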

Performance Measurements in context of the DoD 4 Layer Model

Packet size, inter-arrival times and arrival order of the packets in a traffic flow are initialized at the handshake and initial payload exchange of each layer-4 connection. This is because these metrics are driven by the application-to-transport interface: the application-layer state machine, the socket interface specification and the high-level user and application functional service requirements. What follows is the parallel execution of the application-layer protocol and the OS TCP/IP finite state machine. Depending on where and how the application invokes the kernel's Active/Passive Open and Close services, we can classify and measure user-level characteristics and performance in real time by mapping packets to the internetworking and transport layers.
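The inter-arrival time behind pdu_avg can be sketched as follows. The input format (timestamp in seconds, payload length, flag set, in arrival order) and the rule that a content PDU is a packet with the PSH flag and a non-empty payload are assumptions; `pdu_inter_arrival_times` is a hypothetical name.

```python
from statistics import mean


def pdu_inter_arrival_times(packets):
    """Inter-arrival times between successive content PDUs in one flow.

    packets: (timestamp_seconds, payload_len, flags) tuples in arrival order.
    A content PDU is assumed to be a packet with PSH set and a non-empty payload.
    """
    push_times = [ts for ts, size, flags in packets if "PSH" in flags and size > 0]
    return [later - earlier for earlier, later in zip(push_times, push_times[1:])]


# Hypothetical flow (both directions): handshake, then three data segments.
flow = [
    (0.000, 0, {"SYN"}),
    (0.020, 0, {"SYN", "ACK"}),
    (0.021, 0, {"ACK"}),
    (0.030, 517, {"PSH", "ACK"}),
    (0.055, 1400, {"PSH", "ACK"}),
    (0.095, 320, {"PSH", "ACK"}),
]
iats = pdu_inter_arrival_times(flow)
sizes = [size for _, size, flags in flow if "PSH" in flags]  # sizes in arrival order
print(iats, mean(iats), sizes)
```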

We deliver live data in CSV (default) and also support ASN.1.

We offer two subscription models: standard and premium (which includes support).

Subscription pricing options: monthly or yearly. Access details are provided on sign-up.

Contact us for advice or help on your ML data project scope and any special requirements.