recs-normalizetime
Help from: --help-basic: Usage: recs-normalizetime <args> [<files>] Given a single key field containing a date/time value this recs processor will construct a normalized version of the value and place this new value into a field named "n_<key>" (where <key> is the key field appearing in the args). Arguments: --key|-k <key> Single Key field containing the date/time may be a key spec, see '--help-keyspecs' for more info --epoch|-e Assumes date/time field is expressed in epoch seconds (optional, defaults to non-epoch) --threshold|-n <time range> Number of seconds in the range. May also be a duration string like '1 week' or '5 minutes', parsable by Date::Manip --strict|-s Apply strict normalization (defaults to non-strict) --filename-key|fk <keyspec> Add a key with the source filename (if no filename is applicable will put NONE) Help Options: --help-all Output all help for this script --help This help screen --help-full Indepth description of normalization alogrithm --help-keyspecs Help on keyspecs, a way to index deeply and with regexes Examples: # Tag records with normalized time in 5 minute buckets from the date field ... | recs-normalizetime --strict --key date -n 300 # Normalize time with fuzzy normalization into 1 minute buckets from the # epoch-relative 'time' field ... | recs-normalizetime --key time -e -n 60 #Get 1 week buckets ... | recs-normalizetime --key timestamp -n '1 week' Help from: --help-full: Full Help This recs processor will generate normalized versions of date/time values and add this value as another attribute to the record stream. Used in conjunction with recs-collate you can aggregate information over the normalized time. For example if you use recs-normalized -k date --n 1 | recs-collate -k n_date -a firstrec then this picks a single record from a stream to serve in placement of lots of records which are close to each other in time. The normalized time value generated depends on whether or not you are using strict normalization or not. The default is to use non-strict. The use of the optional --epoch argument indicates that the date/time values are expressed in epoch seconds. This argument both speeds up the execution of an invocation (due to avoiding the expensive perl Date:Manip executions) and is required for correctness when the values are epoch seconds. 1. When using strict normalization then time is chunked up into fixed segments of --threshold seconds in each segment with the first segment occurring on January 1st 1970 at 0:00. So if the threshold is 60 seconds then the following record stream would be produced date n_date 1:00:00 1:00:00 1:00:14 1:00:00 1:00:59 1:00:00 1:02:05 1:02:00 1:02:55 1:02:00 1:03:15 1:03:00 2. When not using strict normalization then the time is again chunked up into fixed segments however the actual segment assigned to a value depends on the segement chunk seen in the prior record. The logic used is the following: - a time is distilled down to a representative sample where the precision is defined by the --threshold. For example if you said that the threshold is 10 (seconds) then 10:22:01 and 10:22:09 would both become 10:22:00. 10:22:10 would be 10:22:10. - as you can tell the representative values is the first second within the range that you define, with one exception - if the representative value of the prior record is in the prior range to the current representative value then the prior record value will be used So if the threshold is 60 seconds then the following record stream would be produced date n_date 1:00:00 1:00:00 1:00:59 1:00:00 1:02:05 1:02:00 1:02:55 1:02:00 1:03:15 1:02:00 ** Note - still matches prior representative value ** 1:05:59 1:05:00 1:06:15 1:05:00 ** Note - matches prior entry ** 1:07:01 1:07:00 ** Note - since the 1:05 and 1:06 had the same representative ** ** value then this is considered a new representative time slice ** Basically a 60 second threshold will match the current minute and the next minute unless the prior minute was seen and then the 60 second threshold matches the current minute and the prior minute. Example usage: if you have log records for "out of memory" exceptions which may occur multiple times because of exception catching and logging then you can distill them all down to a single logical event and then count the number of occurrences for a host via: grep "OutOfMemory" logs | recs-frommultire --re 'host=@([^:]*):' --re 'date=^[A-Za-z]* (.*) GMT ' | recs-normalizetime --key date --threshold 300 | recs-collate --perfect --key n_date -a firstrec | recs-collate --perfect --key firstrec_host -a count=count Help from: --help-keyspecs: KEY SPECS A key spec is short way of specifying a field with prefixes or regular expressions, it may also be nested into hashes and arrays. Use a '/' to nest into a hash and a '#NUM' to index into an array (i.e. #2) An example is in order, take a record like this: {"biz":["a","b","c"],"foo":{"bar 1":1},"zap":"blah1"} {"biz":["a","b","c"],"foo":{"bar 1":2},"zap":"blah2"} {"biz":["a","b","c"],"foo":{"bar 1":3},"zap":"blah3"} In this case a key spec of 'foo/bar 1' would have the values 1,2, and 3 in the respective records. Similarly, 'biz/#0' would have the value of 'a' for all 3 records You can also prefix key specs with '@' to engage the fuzzy matching logic Fuzzy matching works like this in order, first key to match wins 1. Exact match ( eq ) 2. Prefix match ( m/^/ ) 3. Match anywehre in the key (m//) So, in the above example '@b/#2', the 'b' portion would expand to 'biz' and 2 would be the index into the array, so all records would have the value of 'c' Simiarly, @f/b would have values 1, 2, and 3 You can escape / with a \. For example, if you have a record: {"foo/bar":2} You can address that key with foo\/bar
To install App::RecordStream, copy and paste the appropriate command in to your terminal.
cpanm
cpanm App::RecordStream
CPAN shell
perl -MCPAN -e shell install App::RecordStream
For more information on module installation, please visit the detailed CPAN module installation guide.