Word2vec::Interface - Interface module for word2vec.pm, word2phrase.pm, xmltow2v.pm modules and associated utilities.
    use Word2vec::Interface;

    my $result = 0;

    # Compile a text corpus, execute word2vec training and compute cosine similarity of two words
    my $w2vinterface = Word2vec::Interface->new();

    my $xmlconv = $w2vinterface->GetXMLToW2VHandler();
    $xmlconv->SetWorkingDir( "Medline/XML/Directory/Here" );
    $xmlconv->SetSavePath( "textcorpus.txt" );
    $xmlconv->SetStoreTitle( 1 );
    $xmlconv->SetStoreAbstract( 1 );
    $xmlconv->SetBeginDate( "01/01/2004" );
    $xmlconv->SetEndDate( "08/13/2016" );
    $xmlconv->SetOverwriteExistingFile( 1 );

    # If Compound Word File Exists, Store It In Memory
    # And Create Compound Word Binary Search Tree Using The Compound Word Data
    $xmlconv->ReadCompoundWordDataFromFile( "compoundword.txt" );
    $xmlconv->CreateCompoundWordBST();

    # Parse XML Files or Directory Of Files
    $result = $xmlconv->ConvertMedlineXMLToW2V( "/xmlDirectory/" );

    # Check(s)
    print( "Error Parsing Medline XML Files\n" ) if ( $result == -1 );
    exit if ( $result == -1 );

    # Setup And Execute word2vec Training
    my $word2vec = $w2vinterface->GetWord2VecHandler();
    $word2vec->SetTrainFilePath( "textcorpus.txt" );
    $word2vec->SetOutputFilePath( "vectors.bin" );
    $word2vec->SetWordVecSize( 200 );
    $word2vec->SetWindowSize( 8 );
    $word2vec->SetSample( 0.0001 );
    $word2vec->SetNegative( 25 );
    $word2vec->SetHSoftMax( 0 );
    $word2vec->SetBinaryOutput( 0 );
    $word2vec->SetNumOfThreads( 20 );
    $word2vec->SetNumOfIterations( 12 );
    $word2vec->SetUseCBOW( 1 );
    $word2vec->SetOverwriteOldFile( 0 );

    # Execute word2vec Training
    $result = $word2vec->ExecuteTraining();

    # Check(s)
    print( "Error Training Word2vec On File: \"textcorpus.txt\"" ) if ( $result == -1 );
    exit if ( $result == -1 );

    # Read word2vec Training Data Into Memory And Store As A Binary Search Tree
    $result = $word2vec->ReadTrainedVectorDataFromFile( "vectors.bin" );

    # Check(s)
    print( "Error Unable To Read Word2vec Trained Vector Data From File\n" ) if ( $result == -1 );
    exit if ( $result == -1 );

    # Compute Cosine Similarity Between "respiratory" and "arrest"
    $result = $word2vec->ComputeCosineSimilarity( "respiratory", "arrest" );
    print( "Cosine Similarity Between \"respiratory\" and \"arrest\": $result\n" ) if defined( $result );
    print( "Error Computing Cosine Similarity\n" ) if !defined( $result );

    # Compute Cosine Similarity Between "respiratory arrest" and "heart attack"
    $result = $word2vec->ComputeMultiWordCosineSimilarity( "respiratory arrest", "heart attack" );
    print( "Cosine Similarity Between \"respiratory arrest\" and \"heart attack\": $result\n" ) if defined( $result );
    print( "Error Computing Cosine Similarity\n" ) if !defined( $result );

    undef( $w2vinterface );

    # or

    use Word2vec::Interface;

    my $result = 0;
    my $w2vinterface = Word2vec::Interface->new();
    $w2vinterface->XTWSetWorkingDir( "Medline/XML/Directory/Here" );
    $w2vinterface->XTWSetSavePath( "textcorpus.txt" );
    $w2vinterface->XTWSetStoreTitle( 1 );
    $w2vinterface->XTWSetStoreAbstract( 1 );
    $w2vinterface->XTWSetBeginDate( "01/01/2004" );
    $w2vinterface->XTWSetEndDate( "08/13/2016" );
    $w2vinterface->XTWSetOverwriteExistingFile( 1 );

    # If Compound Word File Exists, Store It In Memory
    # And Create Compound Word Binary Search Tree Using The Compound Word Data
    $w2vinterface->XTWReadCompoundWordDataFromFile( "compoundword.txt" );
    $w2vinterface->XTWCreateCompoundWordBST();

    # Parse XML Files or Directory Of Files
    $result = $w2vinterface->XTWConvertMedlineXMLToW2V( "/xmlDirectory/" );

    $result = $w2vinterface->W2VExecuteTraining( "textcorpus.txt", "vectors.bin", 200, 8, undef, 0.001, 25,
                                                 undef, 0, 0, 20, 15, 1, 0, undef, undef, undef, 1 );

    # Read word2vec Training Data Into Memory And Store As A Binary Search Tree
    $result = $w2vinterface->W2VReadTrainedVectorDataFromFile( "vectors.bin" );

    # Check(s)
    print( "Error Unable To Read Word2vec Trained Vector Data From File\n" ) if ( $result == -1 );
    exit if ( $result == -1 );

    # Compute Cosine Similarity Between "respiratory" and "arrest"
    $result = $w2vinterface->W2VComputeCosineSimilarity( "respiratory", "arrest" );
    print( "Cosine Similarity Between \"respiratory\" and \"arrest\": $result\n" ) if defined( $result );
    print( "Error Computing Cosine Similarity\n" ) if !defined( $result );

    # Compute Cosine Similarity Between "respiratory arrest" and "heart attack"
    $result = $w2vinterface->W2VComputeMultiWordCosineSimilarity( "respiratory arrest", "heart attack" );
    print( "Cosine Similarity Between \"respiratory arrest\" and \"heart attack\": $result\n" ) if defined( $result );
    print( "Error Computing Cosine Similarity\n" ) if !defined( $result );

    undef( $w2vinterface );
Word2vec::Interface is an interface module for utilization of word2vec, word2phrase, xmltow2v and their associated functions. This program houses a set of functions, modules and utilities for use with UMLS Similarity.

XmlToW2v Features:
    - Compilation of a text corpus from plain or gzipped Medline XML files.
    - Multi-threaded text corpus compilation support.
    - Include text corpus articles via date range.
    - Include text corpus articles via title, abstract or both.
    - Compoundifying on-the-fly while building text corpus given a compound word file.

Word2vec Features:
    - Word2vec training with user specified settings.
    - Manipulation of Word2vec word vectors. (Addition/Subtraction/Average)
    - Word2vec binary format to plain text file conversion.
    - Word2vec plain text to binary format file conversion.
    - Multi-word cosine similarity computation. (Pseudo-compound word cosine similarity)

Word2phrase Features:
    - Word2phrase training with user specified settings.

Interface Features:
    - Word Sense Disambiguation via trained word2vec data.
Description:
Returns a new "Word2vec::Interface" module object.

Note: Specifying no parameters implies default options.

Default Parameters:
    word2vecDir         = "../../External/word2vec"
    debugLog            = 0
    writeLog            = 0
    ignoreCompileErrors = 0
    ignoreFileChecks    = 0
    exitFlag            = 0
    workingDir          = ""
    word2vec            = Word2vec::Word2vec->new()
    word2phrase         = Word2vec::Word2phrase->new()
    xmltow2v            = Word2vec::Xmltow2v->new()
    util                = Word2vec::Interface()
    instanceAry         = ()
    senseAry            = ()
    instanceCount       = 0
    senseCount          = 0
Input:
$word2vecDir -> Specifies word2vec package source/executable directory.
$debugLog -> Instructs module to print debug statements to the console. ('1' = True / '0' = False)
$writeLog -> Instructs module to print debug statements to a log file. ('1' = True / '0' = False)
$ignoreCompileErrors -> Instructs module to ignore source code compilation errors. ('1' = True / '0' = False)
$ignoreFileChecks -> Instructs module to ignore file checks. ('1' = True / '0' = False)
$exitFlag -> In the event of a run-time check error, exitFlag is set to '1' which gracefully terminates the script.
$workingDir -> Specifies the current working directory.
$word2vec -> Word2vec::Word2vec object.
$word2phrase -> Word2vec::Word2phrase object.
$xmltow2v -> Word2vec::Xmltow2v object.
$interface -> Word2vec::Interface object.
$instanceAry -> Word Sense Disambiguation: Array of instances.
$senseAry -> Word Sense Disambiguation: Array of senses.
$instanceCount -> Number of Word Sense Disambiguation instances loaded in memory.
$senseCount -> Number of Word Sense Disambiguation senses loaded in memory.

Note: Specifying all new() parameters is not recommended, as this has not been thoroughly tested. The maximum recommended set of parameters to specify is: "word2vecDir, debugLog, writeLog, ignoreCompileErrors, ignoreFileChecks"
Output:
Word2vec::Interface object.
Example:
    use Word2vec::Interface;

    # Parameters: Word2Vec Directory = undef, DebugLog = True, WriteLog = False, IgnoreCompileErrors = False, IgnoreFileChecks = False
    my $interface = Word2vec::Interface->new( undef, 1, 0 );

    undef( $interface );

    # Or

    # Parameters: Word2Vec Directory = undef, DebugLog = False, WriteLog = False, IgnoreCompileErrors = False, IgnoreFileChecks = False
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();

    undef( $interface );
Removes member variables and file handle from memory.
None
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->DESTROY(); undef( $interface );
Runs word2vec file checks. Looks for the word2vec executable files; if they are not found, it looks for the source code and compiles it automatically, placing the executable files in the same directory. Errors out gracefully when the word2vec executable files are not present and the source files cannot be located.

Notes:
    - Word2vec executable file list: word2vec, word2phrase, word-analogy, distance, compute-accuracy.
    - This method is called automatically in the interface::new() function. It can be disabled by setting the _ignoreFileChecks new() parameter to 1.
$string -> Word2vec source/executable directory.
$value -> Returns '1' if checks passed and '0' if file checks failed.
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new( undef, 1, 0, 1, 1 );
    my $result = $interface->RunFileChecks();

    # A return value of '1' means the checks passed, '0' means they failed.
    print( "Passed Word2Vec File Checks!\n" ) if $result == 1;
    print( "Failed Word2Vec File Checks!\n" ) if $result == 0;

    undef( $interface );
Checks specified executable file exists in a given directory.
$filePath -> Executable file path
$fileName -> Executable file name
$value -> Returns '1' if file is found and '0' if otherwise.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->_CheckIfExecutableFileExists( "../../External/word2vec", "word2vec" ); print( "Executable File Exists!\n" ) if $result == 1; print( "Executable File Does Not Exist!\n" ) if $result == 0; undef( $interface );
Checks specified directory (string) for the filename (string). This ensures the specified files are of file type "text/cpp".
$value -> Returns '1' if file is found and of type "text/cpp" and '0' if otherwise.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->_CheckIfSourceFileExists( "../../External/word2vec", "word2vec" ); print( "Source File Exists!\n" ) if $result == 1; print( "Source File Does Not Exist!\n" ) if $result == 0; undef( $interface );
Compiles C++ source filename in a specified directory.
$filePath -> Source file path (string)
$fileName -> Source file name (string)
$value -> Returns '1' if successful and '0' if un-successful.
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    my $result = $interface->_CompileSourceFile( "../../External/word2vec", "word2vec" );

    print( "Compiled Source Successfully!\n" ) if $result == 1;
    print( "Source Compilation Attempt Unsuccessful!\n" ) if $result == 0;

    undef( $interface );
Checks file in given file path and if it exists, returns the file type.
$filePath -> File path
$string -> Returns file type (string).
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $fileType = $interface->GetFileType( "samples/textcorpus.txt" ); print( "File Type: $fileType\n" ); undef( $interface );
Returns current operating system (string).
$string -> Operating System Type. (String)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $os = $interface->GetOSType(); print( "Operating System: $os\n" ); undef( $interface );
Modifies the "word2vec.c" file for compilation under the Windows operating system.
$value -> '1' = Successful / '0' = Un-successful
This is a private function and should not be utilized.
Removes modification of "word2vec.c". Returns source file to its original state.
$value -> '1' = Successful / '0' = Un-successful.
Command-line Method: Computes cosine similarity between 'wordA' and 'wordB' using the specified 'filePath' for loading trained word2vec word vector data.
$filePath -> Word2Vec trained word vectors binary file path. (String)
$wordA -> First word for cosine similarity comparison.
$wordB -> Second word for cosine similarity comparison.
$value -> Cosine similarity value (float) or undefined.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->CLComputeCosineSimilarity( "../../samples/samplevectors.bin", "of", "the" ); print( "Cosine Similarity Between \"of\" and \"the\": $value\n" ) if defined( $value ); print( "Error: Cosine Similarity Could Not Be Computed\n" ) if !defined( $value ); undef( $interface );
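For readers unfamiliar with the metric, the following is a minimal sketch of cosine similarity over two plain Perl arrays. The helper name cosine_similarity and the sample vectors are illustrative assumptions; it is not the module's internal implementation and does not read a vector file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Interface): cosine similarity
# of two equal-length numeric vectors passed as array references.
sub cosine_similarity {
    my ( $vecA, $vecB ) = @_;
    return if !@{ $vecA } || @{ $vecA } != @{ $vecB };

    my ( $dot, $normA, $normB ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{ $vecA } ) {
        $dot   += $vecA->[$i] * $vecB->[$i];
        $normA += $vecA->[$i] ** 2;
        $normB += $vecB->[$i] ** 2;
    }

    # A zero vector has no direction, so the similarity is left undefined.
    return if $normA == 0 || $normB == 0;
    return $dot / ( sqrt( $normA ) * sqrt( $normB ) );
}

my $value = cosine_similarity( [ 1, 2, 3 ], [ 2, 4, 6 ] );
print( "Cosine Similarity: $value\n" ) if defined( $value );
print( "Error: Cosine Similarity Could Not Be Computed\n" ) if !defined( $value );
```

Returning undefined for mismatched or zero vectors mirrors the "float or undefined" contract described above.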
Command-line Method: Computes cosine similarity between 'phraseA' and 'phraseB' using the specified 'filePath' for loading trained word2vec word vector data. Note: Supports multiple words concatenated by ':' for each string.
$filePath -> Word2Vec trained word vectors binary file path. (String)
$phraseA -> First phrase for cosine similarity comparison. (String)
$phraseB -> Second phrase for cosine similarity comparison. (String)
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    my $value = $interface->CLComputeMultiWordCosineSimilarity( "../../samples/samplevectors.bin", "heart:attack", "myocardial:infarction" );

    print( "Cosine Similarity Between \"heart attack\" and \"myocardial infarction\": $value\n" ) if defined( $value );
    print( "Error: Cosine Similarity Could Not Be Computed\n" ) if !defined( $value );

    undef( $interface );
Command-line Method: Computes cosine similarity average of all words in 'phraseA' and 'phraseB', then takes cosine similarity between 'phraseA' and 'phraseB' average values using the specified 'filePath' for loading trained word2vec word vector data. Note: Supports multiple words concatenated by ':' for each string.
$filePath -> Word2Vec trained word vectors binary file path. (String)
$phraseA -> First phrase for cosine similarity comparison.
$phraseB -> Second phrase for cosine similarity comparison.
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    my $value = $interface->CLComputeAvgOfWordsCosineSimilarity( "../../samples/samplevectors.bin", "heart:attack", "myocardial:infarction" );

    print( "Cosine Similarity Between \"heart attack\" and \"myocardial infarction\": $value\n" ) if defined( $value );
    print( "Error: Cosine Similarity Could Not Be Computed\n" ) if !defined( $value );

    undef( $interface );
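The averaging step described for this method can be sketched in plain Perl as follows. The toy %vectors table and the helper names average_phrase_vector and cosine_similarity are assumptions made for illustration; real vectors would come from a trained word2vec file, and the module's own implementation may differ.

```perl
use strict;
use warnings;

# Toy vectors for illustration only; real values come from trained data.
my %vectors = (
    heart      => [ 1.0, 0.0 ],
    attack     => [ 0.0, 1.0 ],
    myocardial => [ 1.0, 1.0 ],
    infarction => [ 3.0, 3.0 ],
);

# Hypothetical helper: element-wise average of the vectors of a ':'-joined phrase.
sub average_phrase_vector {
    my ( $phrase, $table ) = @_;
    my @words = split( /:/, $phrase );
    return if !@words;

    my @sum;
    for my $word ( @words ) {
        my $vec = $table->{$word};
        return if !defined( $vec );    # Unknown word: no average can be formed
        $sum[$_] += $vec->[$_] for 0 .. $#{ $vec };
    }
    return [ map { $_ / @words } @sum ];
}

# Hypothetical helper: cosine similarity of two array references.
sub cosine_similarity {
    my ( $vecA, $vecB ) = @_;
    return if !defined( $vecA ) || !defined( $vecB ) || @{ $vecA } != @{ $vecB };
    my ( $dot, $normA, $normB ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{ $vecA } ) {
        $dot   += $vecA->[$i] * $vecB->[$i];
        $normA += $vecA->[$i] ** 2;
        $normB += $vecB->[$i] ** 2;
    }
    return if $normA == 0 || $normB == 0;
    return $dot / ( sqrt( $normA ) * sqrt( $normB ) );
}

my $avgA  = average_phrase_vector( "heart:attack", \%vectors );
my $avgB  = average_phrase_vector( "myocardial:infarction", \%vectors );
my $value = cosine_similarity( $avgA, $avgB );
print( "Cosine Similarity Of Averages: $value\n" ) if defined( $value );
```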
Command-line Method: Computes cosine similarity depending on user input given a vectorBinaryFile (string). Note: Words can be compounded by the ':' character.
$filePath -> Word2Vec trained word vectors binary file path. (String)
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->CLMultiWordCosSimWithUserInput( "../../samples/samplevectors.bin" );

    undef( $interface );
Command-line Method: Loads the specified word2vec trained binary data file, adds word vectors and returns the summed result.
$filePath -> Word2Vec trained word vectors binary file path. (String)
$wordDataA -> Word2Vec word data (String)
$wordDataB -> Word2Vec word data (String)
$vectorData -> Summed '$wordDataA' and '$wordDataB' vectors
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    my $wordVtr = $interface->CLAddTwoWordVectors( "../../samples/samplevectors.bin", "of", "the" );

    print( "Word Vector for \"of\" + \"the\": $wordVtr\n" ) if defined( $wordVtr );
    print( "Word Vector Cannot Be Computed\n" ) if !defined( $wordVtr );

    undef( $interface );
Command-line Method: Loads the specified word2vec trained binary data file, subtracts word vectors and returns the difference result.
$vectorData -> Difference of '$wordDataA' and '$wordDataB' vectors
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    my $wordVtr = $interface->CLSubtractTwoWordVectors( "../../samples/samplevectors.bin", "of", "the" );

    print( "Word Vector for \"of\" - \"the\": $wordVtr\n" ) if defined( $wordVtr );
    print( "Word Vector Cannot Be Computed\n" ) if !defined( $wordVtr );

    undef( $interface );
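The vector addition and subtraction performed by the two methods above are element-wise operations. The sketch below shows the arithmetic on plain Perl arrays; the helper names add_vectors and subtract_vectors are hypothetical, and the space-separated print format is an assumption rather than the module's documented output format.

```perl
use strict;
use warnings;

# Hypothetical helper: element-wise sum of two equal-length vectors.
sub add_vectors {
    my ( $vecA, $vecB ) = @_;
    return if @{ $vecA } != @{ $vecB };
    return [ map { $vecA->[$_] + $vecB->[$_] } 0 .. $#{ $vecA } ];
}

# Hypothetical helper: element-wise difference of two equal-length vectors.
sub subtract_vectors {
    my ( $vecA, $vecB ) = @_;
    return if @{ $vecA } != @{ $vecB };
    return [ map { $vecA->[$_] - $vecB->[$_] } 0 .. $#{ $vecA } ];
}

my $sum        = add_vectors( [ 1, 2 ], [ 3, 4 ] );
my $difference = subtract_vectors( [ 1, 2 ], [ 3, 4 ] );

print( "Sum: @{ $sum }\n" ) if defined( $sum );
print( "Difference: @{ $difference }\n" ) if defined( $difference );
```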
Command-line Method: Executes word2vec training given the specified options hash.
$hashRef -> Hash reference of word2vec options
$value -> Returns '0' = Successful / '-1' = Un-successful.
use Word2vec::Interface; my %options; $options{'-trainfile'} = "../../samples/textcorpus.txt"; $options{'-outputfile'} = "../../samples/tempvectors.bin"; my $interface = Word2vec::Interface->new(); my $result = $interface->CLStartWord2VecTraining( \%options ); print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Command-line Method: Executes word2phrase training given the specified options hash.
$hashRef -> Hash reference of word2phrase options.
use Word2vec::Interface; my %options; $options{'-trainfile'} = "../../samples/textcorpus.txt"; $options{'-outputfile'} = "../../samples/tempvectors.bin"; my $interface = Word2vec::Interface->new(); my $result = $interface->CLStartWord2PhraseTraining( \%options ); print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Command-line Method: Compiles a text corpus given the specified options hash.
$hashRef -> Hash reference of xmltow2v options.
use Word2vec::Interface; my %options; $options{'-workdir'} = "../../samples"; $options{'-savedir'} = "../../samples/textcorpus.txt"; my $interface = Word2vec::Interface->new(); my $result = $interface->CLCompileTextCorpus( \%options ); print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Command-line Method: Converts word2vec binary format to plain text word vector data.
$filePath -> Word2Vec binary file path
$savePath -> Path to save converted file
$value -> '0' = Successful / '-1' = Un-successful
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->CLConvertWord2VecVectorFileToText( "../../samples/samplevectors.bin", "../../samples/convertedvectors.bin" ); print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Command-line Method: Converts plain text word vector data to word2vec binary format.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->CLConvertWord2VecVectorFileToBinary( "../../samples/samplevectors.bin", "../../samples/convertedvectors.bin" ); print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Command-line Method: Converts plain text word vector data to sparse vector data format.
$filePath -> Vectors file path
$savePath -> Path to save converted file
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->CLConvertWord2VecVectorFileToSparse( "../../samples/samplevectors.bin", "../../samples/convertedvectors.bin" ); print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Command-line Method: Reads a specified plain text file at 'filePath' and 'compoundWordFile', then compoundifies and saves the file to 'savePath'.
$filePath -> Text file to compoundify
$savePath -> Path to save compoundified file
$compoundWordFile -> Compound word file path
$value -> Result '0' = Successful / '-1' = Un-successful
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->CLCompoundifyTextInFile( "../../samples/textcorpus.txt", "../../samples/compoundcorpus.txt", "../../samples/compoundword.txt" ); print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
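As a rough illustration of what "compoundifying" means, the sketch below joins known multi-word terms into single tokens in an in-memory string. It rests on assumptions not confirmed by this documentation: that compounds are joined with an underscore, and that longer compounds take precedence. The helper name compoundify is hypothetical, and no files are read or written.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces each known compound in $text with a single
# underscore-joined token. The underscore separator is an assumption.
sub compoundify {
    my ( $text, $compounds ) = @_;

    # Longest compounds first, so "heart attack risk" wins over "heart attack".
    for my $compound ( sort { length( $b ) <=> length( $a ) } @{ $compounds } ) {
        my @parts   = split( /\s+/, $compound );
        my $joined  = join( '_', @parts );
        my $pattern = join( '\s+', map { quotemeta( $_ ) } @parts );
        $text =~ s/\b$pattern\b/$joined/g;
    }
    return $text;
}

my @compoundWords = ( "heart attack", "respiratory arrest" );
print( compoundify( "patient had a heart attack today", \@compoundWords ), "\n" );
```

A linear scan over the compound list is enough for a sketch; the documentation above mentions a compound word binary search tree, which would matter for large compound word files.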
Reads a specified vector file into memory, sorts it alphanumerically and saves it to a file.
$hashRef -> Hash reference of parameters. (File path and overwrite parameters)
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();

    my %options;
    $options{ "-filepath" }  = "vectors.bin";
    $options{ "-overwrite" } = 1;

    # Pass the options as a hash reference
    my $result = $interface->CLSortVectorFile( \%options );

    print( "Success!\n" ) if $result == 0;
    print( "Failed!\n" ) if $result == -1;

    undef( $interface );
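The sorting step can be pictured with the following sketch, which orders plain-text vector lines by their leading word. It assumes the first line is a "vocabulary-size dimensions" header that should stay on top, which is how the word2vec text format is usually laid out; the helper name sort_vector_lines is hypothetical and the module's actual behavior may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: sorts vector lines by their first whitespace-delimited
# token, keeping the assumed header line in first position.
sub sort_vector_lines {
    my ( @lines ) = @_;
    return () if !@lines;

    my $header = shift( @lines );
    my @sorted = sort { ( split( ' ', $a ) )[0] cmp ( split( ' ', $b ) )[0] } @lines;
    return ( $header, @sorted );
}

my @vectorLines = ( "3 2", "the 0.1 0.2", "arrest 0.3 0.4", "of 0.5 0.6" );
print( "$_\n" ) for sort_vector_lines( @vectorLines );
```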
Fetches an array containing the nearest n terms using cosine similarity as the metric of determining similar terms.
$term -> Comparison term used to find similar terms.
$numberOfSimilarTerms -> Integer value used to limit the number of elements in array returned.
$value -> 'Array reference' = Successful / 'undef' = Un-successful
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    my $result = $interface->W2VReadTrainedVectorDataFromFile( "vectors.bin" );

    # Only query for similar terms when the vector data loaded successfully
    my $terms = undef;
    $terms = $interface->CLFindSimilarTerms( "cookie", 10 ) if $result == 0;

    print "Success\n" if defined( $terms );
    print "Error: No Elements Returned\n" if !defined( $terms );
    exit if !defined( $terms );

    for my $element ( @{ $terms } ) { print "$element\n"; }

    undef( $interface );
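A nearest-terms lookup of this kind can be sketched as a rank-by-cosine-similarity over an in-memory table. The toy %vectors data and the helper names find_similar_terms and cosine_similarity are assumptions for illustration only; the module's own search and return format may differ.

```perl
use strict;
use warnings;

# Toy vectors for illustration only.
my %vectors = (
    cookie  => [ 1.0, 0.0 ],
    biscuit => [ 0.9, 0.1 ],
    cake    => [ 0.5, 0.5 ],
    car     => [ 0.0, 1.0 ],
);

# Hypothetical helper: cosine similarity of two array references.
sub cosine_similarity {
    my ( $vecA, $vecB ) = @_;
    return if @{ $vecA } != @{ $vecB };
    my ( $dot, $normA, $normB ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{ $vecA } ) {
        $dot   += $vecA->[$i] * $vecB->[$i];
        $normA += $vecA->[$i] ** 2;
        $normB += $vecB->[$i] ** 2;
    }
    return if $normA == 0 || $normB == 0;
    return $dot / ( sqrt( $normA ) * sqrt( $normB ) );
}

# Hypothetical helper: returns an array reference of the $limit terms most
# similar to $term, or undefined when $term has no vector.
sub find_similar_terms {
    my ( $term, $limit, $table ) = @_;
    my $termVec = $table->{$term};
    return if !defined( $termVec );

    my %score;
    for my $other ( grep { $_ ne $term } keys( %{ $table } ) ) {
        my $value = cosine_similarity( $termVec, $table->{$other} );
        $score{$other} = $value if defined( $value );
    }

    # Highest similarity first; ties broken alphabetically for stable output.
    my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys( %score );
    splice( @ranked, $limit ) if @ranked > $limit;
    return \@ranked;
}

my $similar = find_similar_terms( "cookie", 2, \%vectors );
print( "$_\n" ) for @{ $similar };
```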
Cleans up C object and executable files in word2vec directory.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->CleanWord2VecDirectory(); print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Computes cosine similarity of average values for a list of specified word comparisons given a file. Note: Trained vector data must be loaded into memory before calling this method.
$filePath -> Text file with list of word comparisons by line.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->W2VReadTrainedVectorDataFromFile( "vectors.bin" ); $result = $interface->CLSimilarityAvg( "MiniMayoSRS.terms" ) if $result == 0; print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Computes cosine similarity values for a list of specified compound word comparisons given a file. Note: Trained vector data must be loaded into memory before calling this method.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->W2VReadTrainedVectorDataFromFile( "vectors.bin" ); $result = $interface->CLSimilarityComp( "MiniMayoSRS.terms" ) if $result == 0; print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Computes cosine similarity of summed values for a list of specified word comparisons given a file. Note: Trained vector data must be loaded into memory before calling this method.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->W2VReadTrainedVectorDataFromFile( "vectors.bin" ); $result = $interface->CLSimilaritySum( "MiniMayoSRS.terms" ) if $result == 0; print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Command-line Method: Assigns a particular sense to each instance using word2vec trained word vector data. Stop words are removed if a stoplist is specified before computing cosine similarity average of each instance and sense context.
$instanceFilePath -> WSD instance file path
$senseFilePath -> WSD sense file path
$vectorBinaryFile -> Word2vec trained word vector data file
$stopListFilePath -> Stop list file path
$value -> Returns '0' = Successful / '-1' = Un-successful
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->CLWordSenseDisambiguation( "ACE.instances.sval", "ACE.senses.sval", "vectors.bin", "stoplist" ); print( "Success!\n" ) if $result == 0; print( "Failed!\n" ) if $result == -1; undef( $interface );
Analyzes sense sval files for identification number mismatch and adjusts accordingly in memory.
Reads a WSD list when the '-list' parameter is specified.
$listPath -> WSD list file path
\%listOfFile -> List of files hash reference
Parses the specified list of files for Word Sense Disambiguation computation.
$listOfFilesHashRef -> Hash reference to a hash of file paths
$vectorBinaryFile -> Word2vec trained word vector data file
$stopListFilePath -> Stop list file path
Parses a specified file in SVL format and stores all context in memory. Utilized for Word Sense Disambiguation cosine similarity computation.
$filePath -> WSD instance or sense file path
$stopListRegex -> Stop list regex (automatically generated from the stop list file)
$arrayReference -> Array reference of WSD instances or WSD senses in memory.
For each instance stored in memory, this method computes an average cosine similarity for the context of each instance and sense with stop words removed via stop list regex. After average cosine similarity values are calculated for each instance and sense, the cosine similarity of each instance and sense is computed. The highest cosine similarity value of a given instance to a particular sense is assigned and stored.
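The assignment step described above, picking the sense whose averaged context vector is closest to the instance's averaged context vector, can be sketched as follows. The vectors are assumed to be already averaged, and the helper names assign_sense and cosine_similarity, along with the sense identifiers, are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: cosine similarity of two array references.
sub cosine_similarity {
    my ( $vecA, $vecB ) = @_;
    return if @{ $vecA } != @{ $vecB };
    my ( $dot, $normA, $normB ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{ $vecA } ) {
        $dot   += $vecA->[$i] * $vecB->[$i];
        $normA += $vecA->[$i] ** 2;
        $normB += $vecB->[$i] ** 2;
    }
    return if $normA == 0 || $normB == 0;
    return $dot / ( sqrt( $normA ) * sqrt( $normB ) );
}

# Hypothetical helper: returns the id of the sense with the highest cosine
# similarity to the instance vector, or undefined when nothing is comparable.
# $senses is a hash reference of sense id => averaged context vector.
sub assign_sense {
    my ( $instanceVec, $senses ) = @_;
    my ( $bestId, $bestValue );

    for my $senseId ( sort { $a cmp $b } keys %{ $senses } ) {
        my $value = cosine_similarity( $instanceVec, $senses->{$senseId} );
        next if !defined( $value );
        if ( !defined( $bestValue ) || $value > $bestValue ) {
            ( $bestId, $bestValue ) = ( $senseId, $value );
        }
    }
    return $bestId;
}

my %senses   = ( M1 => [ 1.0, 0.0 ], M2 => [ 0.0, 1.0 ] );
my $assigned = assign_sense( [ 0.2, 0.9 ], \%senses );
print( "Assigned Sense: $assigned\n" ) if defined( $assigned );
```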
Computes accuracy of assigned sense identification for each instance in memory.
$value -> Returns accuracy percentage (float) or '-1' if un-successful.
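A minimal sketch of the accuracy computation, assuming each instance records an expected and an assigned sense identifier. The helper name compute_accuracy and the hash keys are illustrative assumptions; the '-1' failure value follows the return contract stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of instances whose assigned sense matches
# the expected sense. Returns -1 when there is nothing to score.
sub compute_accuracy {
    my ( $instances ) = @_;
    return -1 if !defined( $instances ) || !@{ $instances };

    my $correct = grep { defined( $_->{assigned} ) && $_->{assigned} eq $_->{expected} } @{ $instances };
    return 100 * $correct / @{ $instances };
}

my @instances = (
    { expected => "M1", assigned => "M1" },
    { expected => "M2", assigned => "M2" },
    { expected => "M1", assigned => "M2" },
    { expected => "M2", assigned => "M2" },
);

my $accuracy = compute_accuracy( \@instances );
print( "Accuracy: $accuracy%\n" ) if $accuracy != -1;
```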
For each instance, this method prints standard information to the console window consisting of:
Note: Only prints to console if the '--debuglog' or '--writelog' option is passed.
Saves WSD results post sense identification assignment in the 'instanceFilePath' (string) location. Saved data consists of:
$instanceFilePath -> WSD instance file path
Fetches saved results for all instance files and stores accuracies for each in a text file.
$workingDirectory -> Directory of "*.results.txt" files
Generates and returns a stop list regex given a 'stopListFilePath' (string). Returns undefined in the event of an error.
$stopListFilePath -> WSD Stop list file path
$stopListRegex -> Returns stop list regex of the WSD stop list file.
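One way to build such a regex is to escape each stop word and join the words into a single alternation. The sketch below assumes a simple list of plain stop words, which may not match the actual stop list file format; the helper names build_stop_list_regex and remove_stop_words are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one case-insensitive regex matching any stop
# word as a whole word. Returns undefined when no usable words are given.
sub build_stop_list_regex {
    my ( @stopWords ) = @_;
    my @escaped = map { quotemeta( $_ ) } grep { defined( $_ ) && length( $_ ) } @stopWords;
    return if !@escaped;

    my $alternation = join( '|', @escaped );
    return qr/\b(?:$alternation)\b/i;
}

# Hypothetical helper: strips stop words and tidies the remaining whitespace.
sub remove_stop_words {
    my ( $text, $regex ) = @_;
    $text =~ s/$regex//g;
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

my $stopListRegex = build_stop_list_regex( qw( the of and ) );
print( remove_stop_words( "the anatomy of the heart", $stopListRegex ), "\n" ) if defined( $stopListRegex );
```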
Converts the passed string parameter to the current OS line ending format, i.e. DOS/Windows to Unix/Linux or Unix/Linux to DOS/Windows. Warning: The legacy MacOS format is not supported, so errors may occur with such input.
$string -> String to convert
$string -> Output data with target OS line endings.
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();

    my $tempStr = "samples text\r\n";
    $tempStr = $interface->ConvertStringLineEndingsToTargetOS( $tempStr );

    undef( $interface );
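The conversion itself can be sketched with two substitutions: normalize everything to "\n", then expand to "\r\n" when the target is Windows. The helper name convert_line_endings and its optional second argument are assumptions for illustration; Perl reports Windows as "MSWin32" in the $^O variable. Like the method above, the sketch does not handle legacy MacOS "\r" endings.

```perl
use strict;
use warnings;

# Hypothetical helper: converts DOS/Windows and Unix/Linux line endings to the
# format of the target OS (defaults to the OS Perl is running on).
sub convert_line_endings {
    my ( $str, $os ) = @_;
    $os = $^O if !defined( $os );

    $str =~ s/\r\n/\n/g;                          # Normalize to Unix endings first
    $str =~ s/\n/\r\n/g if $os eq 'MSWin32';      # Expand again for Windows targets
    return $str;
}

my $converted = convert_line_endings( "samples text\r\n" );
print( "Converted Length: ", length( $converted ), "\n" );
```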
Returns word2vec executable/source directory.
$string -> Word2vec file path
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $filePath = $interface->GetWord2VecDir(); print( "FilePath: $filePath\n" ); undef( $interface );
Returns the _debugLog member variable set during Word2vec::Interface object initialization in the new() function.
$value -> 0 = False, 1 = True
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $debugLog = $interface->GetDebugLog(); print( "Debug Logging Enabled\n" ) if $debugLog == 1; print( "Debug Logging Disabled\n" ) if $debugLog == 0; undef( $interface );
Returns the _writeLog member variable set during Word2vec::Interface object initialization in the new() function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $writeLog = $interface->GetWriteLog(); print( "Write Logging Enabled\n" ) if $writeLog == 1; print( "Write Logging Disabled\n" ) if $writeLog == 0; undef( $interface );
Returns the _ignoreCompileErrors member variable set during Word2vec::Interface object initialization in the new() function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $ignoreCompileErrors = $interface->GetIgnoreCompileErrors(); print( "Ignore Compile Errors Enabled\n" ) if $ignoreCompileErrors == 1; print( "Ignore Compile Errors Disabled\n" ) if $ignoreCompileErrors == 0; undef( $interface );
Returns the _ignoreFileChecks member variable set during Word2vec::Interface object initialization in the new() function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $ignoreFileChecks = $interface->GetIgnoreFileChecks(); print( "Ignore File Checks Enabled\n" ) if $ignoreFileChecks == 1; print( "Ignore File Checks Disabled\n" ) if $ignoreFileChecks == 0; undef( $interface );
Returns the _exitFlag member variable set during Word2vec::Interface object initialization in the new() function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $exitFlag = $interface->GetExitFlag(); print( "Exit Flag Set\n" ) if $exitFlag == 1; print( "Exit Flag Not Set\n" ) if $exitFlag == 0; undef( $interface );
Returns file handle used by WriteLog() method.
$fileHandle -> Returns file handle blob used by 'WriteLog()' function or undefined.
Returns the _workingDir member variable set during Word2vec::Interface object initialization in the new() function.
$string -> Returns working directory
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $dir = $interface->GetWorkingDirectory(); print( "Working Directory: $dir\n" ); undef( $interface );
Returns the _word2vec member variable set during Word2vec::Interface object initialization in the new() function. Note: If the member is not defined, this returns a new object with word2vec::_debugLog and word2vec::_writeLog parameters mirroring interface::_debugLog and interface::_writeLog.
Word2vec::Word2vec -> Returns 'Word2vec::Word2vec' object.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $word2vec = $interface->GetWord2VecHandler(); undef( $word2vec ); undef( $interface );
Returns the _word2phrase member variable set during Word2vec::Interface object initialization in the new() function. Note: If the member is not defined, this returns a new object with word2vec::_debugLog and word2vec::_writeLog parameters mirroring interface::_debugLog and interface::_writeLog.
Word2vec::Word2phrase -> Returns 'Word2vec::Word2phrase' object
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $word2phrase = $interface->GetWord2PhraseHandler(); undef( $word2phrase ); undef( $interface );
Returns the _xmltow2v member variable set during Word2vec::Interface object initialization in the new() function. Note: If the member is not defined, this returns a new object with word2vec::_debugLog and word2vec::_writeLog parameters mirroring interface::_debugLog and interface::_writeLog.
Word2vec::Xmltow2v -> Returns 'Word2vec::Xmltow2v' object
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $xmltow2v = $interface->GetXMLToW2VHandler(); undef( $xmltow2v ); undef( $interface );
Returns the _instanceAry member variable set during Word2vec::Interface object initialization in the new() function.
$instance -> Returns array reference of WSD instances.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $aryRef = $interface->GetInstanceAry(); my @instanceAry = @{ $aryRef }; undef( $interface );
Returns the _senseAry member variable set during Word2vec::Interface object initialization in the new() function.
$senses -> Returns array reference of WSD senses.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $aryRef = $interface->GetSensesAry(); my @sensesAry = @{ $aryRef }; undef( $interface );
Returns the _instanceCount member variable set during Word2vec::Interface object initialization in the new() function.
$value -> Returns number of stored WSD instances.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $count = $interface->GetInstanceCount(); print( "Stored WSD instances in memory: $count\n" ); undef( $interface );
Returns the _sensesCount member variable set during Word2vec::Interface object initialization in the new() function.
$value -> Returns number of stored WSD senses.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $count = $interface->GetSensesCount(); print( "Stored WSD senses in memory: $count\n" ); undef( $interface );
Sets word2vec executable/source file directory.
$string -> Word2Vec Directory
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SetWord2VecDir( "/word2vec" ); undef( $interface );
Instructs module to print debug statements to the console.
$value -> '1' = Print Debug Statements / '0' = Do Not Print Statements
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SetDebugLog( 1 ); undef( $interface );
Instructs module to print a log file.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SetWriteLog( 1 ); undef( $interface );
Instructs module to ignore compile errors when compiling source files.
$value -> '1' = Ignore warnings/errors, '0' = Display and process warnings/errors.
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->SetIgnoreCompileErrors( 1 );

    undef( $interface );
Instructs module to ignore file checking errors.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SetIgnoreFileCheckErrors( 1 ); undef( $interface );
Sets current working directory.
$path -> Working directory path (String)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SetWorkingDirectory( "my/new/working/directory" ); undef( $interface );
Sets the member instance array to the de-referenced contents of the passed array reference.
$arrayReference -> Array reference for Word Sense Disambiguation - Array of instances (Word2vec::Wsddata objects).
    use Word2vec::Interface;

    # This array would theoretically contain 'Word2vec::Wsddata' objects.
    my @instanceAry = ();
    my $interface = Word2vec::Interface->new();
    $interface->SetInstanceAry( \@instanceAry );
    undef( $interface );
Clears member instance array.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->ClearInstanceAry(); undef( $interface );
Sets the member sense array to the de-referenced contents of the passed array reference.
$arrayReference -> Array reference for Word Sense Disambiguation - Array of senses (Word2vec::Wsddata objects).
    use Word2vec::Interface;

    # This array would theoretically contain 'Word2vec::Wsddata' objects.
    my @senseAry = ();
    my $interface = Word2vec::Interface->new();
    $interface->SetSenseAry( \@senseAry );
    undef( $interface );
Clears member sense array.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->ClearSenseAry(); undef( $interface );
Sets member instance count variable to passed value (integer).
$value -> Integer (Positive)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SetInstanceCount( 12 ); undef( $interface );
Sets member sense count variable to passed value (integer).
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SetSenseCount( 12 ); undef( $interface );
Returns current time string in "Hour:Minute:Second" format.
$string -> XX:XX:XX ("Hour:Minute:Second")
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $time = $interface->GetTime(); print( "Current Time: $time\n" ) if defined( $time ); undef( $interface );
Returns current month, day and year string in "Month/Day/Year" format.
$string -> XX/XX/XXXX ("Month/Day/Year")
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $date = $interface->GetDate(); print( "Current Date: $date\n" ) if defined( $date ); undef( $interface );
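The "Hour:Minute:Second" and "Month/Day/Year" formats described above can be reproduced with Perl's core POSIX module. This is only a sketch of the formatting: the helper names format_time and format_date are hypothetical and not part of Word2vec::Interface, and the module's own implementation (for example, whether it zero-pads values) may differ.

```perl
use strict;
use warnings;
use POSIX qw( strftime );

# Hypothetical helpers (not part of Word2vec::Interface).
# Each accepts an epoch timestamp and defaults to the current time.
sub format_time
{
    my ( $epoch ) = @_;
    $epoch = time() if !defined( $epoch );
    return strftime( "%H:%M:%S", localtime( $epoch ) );    # "Hour:Minute:Second"
}

sub format_date
{
    my ( $epoch ) = @_;
    $epoch = time() if !defined( $epoch );
    return strftime( "%m/%d/%Y", localtime( $epoch ) );    # "Month/Day/Year"
}

print( "Current Time: " . format_time() . "\n" );
print( "Current Date: " . format_date() . "\n" );
```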
Prints passed string parameter to the console, log file or both depending on user options. Note: printNewLine parameter prints a new line character following the string if the parameter is undefined and does not if parameter is 0.
$string -> String to print to the console/log file. $value -> 0 = Do not print newline character after string, all else prints new line character including 'undef'.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->WriteLog( "Hello World" ); undef( $interface );
Given a path, returns a string specifying whether this path represents a file or directory.
$path -> String representing path to check
$string -> Returns "file", "dir" or "unknown".
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->IsFileOrDirectory( "../samples/stoplist" ); print( "Path Type Is A File\n" ) if $result eq "file"; print( "Path Type Is A Directory\n" ) if $result eq "dir"; print( "Path Type Is Unknown\n" ) if $result eq "unknown"; undef( $interface );
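A path check of this kind can be sketched with Perl's built-in file test operators. The helper name path_type is hypothetical; this illustrates the "file" / "dir" / "unknown" result described above and is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a path with the -f and -d file test operators.
sub path_type
{
    my ( $path ) = @_;
    return "unknown" if !defined( $path );
    return "file" if -f $path;    # plain file
    return "dir"  if -d $path;    # directory
    return "unknown";             # missing path or some other file type
}

# The current directory always exists, so this reports "dir".
print( "Path Type: " . path_type( "." ) . "\n" );
```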
Given a path and file tag string, returns a string of the file names in the specified path that contain the file tag string.
$path -> String representing path $fileTag -> String consisting of file tag to fetch.
$string -> Returns string of file names containing $fileTag.
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();

    # Looks in specified path for files including ".sval" in their file name.
    my $result = $interface->GetFilesInDirectory( "../samples/", ".sval" );
    print( "Found File Name(s): $result\n" ) if defined( $result );
    undef( $interface );
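The lookup described above can be approximated with opendir/readdir. In this sketch the helper name files_with_tag is hypothetical, and joining the names with a single space is an assumption; the separator used by the module itself is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: return a space-separated string of the plain files in
# $path whose names contain $fileTag, or undef if the directory is unreadable.
sub files_with_tag
{
    my ( $path, $fileTag ) = @_;
    opendir( my $dirHandle, $path ) or return undef;
    my @names = sort grep { -f "$path/$_" && index( $_, $fileTag ) >= 0 } readdir( $dirHandle );
    closedir( $dirHandle );
    return join( " ", @names );
}

my $found = files_with_tag( ".", ".sval" );
print( "Found File Name(s): $found\n" ) if defined( $found );
```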
Calculates Spearman's Rank Correlation Score between two data-sets.
$fileA -> Data set to compare $fileB -> Data set to compare $includeCountsInResults -> Specifies whether to return file counts in score. (undef = False / defined = True)
$value -> "undef" or Spearman's Rank Correlation Score
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $score = $interface->SpCalculateSpearmans( "samples/MiniMayoSRS.term.comp_results", "Similarity/MiniMayoSRS.terms.coders", undef ); print "Spearman's Rank Correlation Score: $score\n" if defined( $score ); print "Spearman's Rank Correlation Score: undef\n" if !defined( $score ); undef( $interface );
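For readers unfamiliar with the statistic, Spearman's rank correlation is the Pearson correlation of the two rank lists, with tied scores sharing their average rank. The sketch below works on two in-memory score lists; the helper names are hypothetical, and how the module parses and pairs entries from the two files is not shown here.

```perl
use strict;
use warnings;

# Assign 1-based ranks to a list of scores; tied scores share their average rank.
sub average_ranks
{
    my @scores = @_;
    my @order  = sort { $scores[$a] <=> $scores[$b] } 0 .. $#scores;
    my @ranks;
    my $i = 0;
    while( $i < @order )
    {
        # Extend $j to the end of the current run of tied scores.
        my $j = $i;
        $j++ while( $j + 1 < @order && $scores[ $order[ $j + 1 ] ] == $scores[ $order[$i] ] );
        my $rank = ( $i + $j ) / 2 + 1;
        $ranks[ $order[$_] ] = $rank for $i .. $j;
        $i = $j + 1;
    }
    return @ranks;
}

# Spearman's rho: Pearson correlation computed on the ranks.
# Returns undef for mismatched lengths, fewer than two items or constant input.
sub spearman
{
    my ( $aRef, $bRef ) = @_;
    my $n = scalar( @{ $aRef } );
    return undef if $n < 2 || $n != scalar( @{ $bRef } );

    my @rankA = average_ranks( @{ $aRef } );
    my @rankB = average_ranks( @{ $bRef } );

    my ( $meanA, $meanB ) = ( 0, 0 );
    $meanA += $_ for @rankA;
    $meanB += $_ for @rankB;
    $meanA /= $n;
    $meanB /= $n;

    my ( $cov, $varA, $varB ) = ( 0, 0, 0 );
    for my $k ( 0 .. $n - 1 )
    {
        my ( $dA, $dB ) = ( $rankA[$k] - $meanA, $rankB[$k] - $meanB );
        $cov  += $dA * $dB;
        $varA += $dA * $dA;
        $varB += $dB * $dB;
    }
    return undef if $varA == 0 || $varB == 0;
    return $cov / sqrt( $varA * $varB );
}

my $rho = spearman( [ 1, 2, 3, 4, 5 ], [ 2, 4, 6, 8, 10 ] );
printf( "Spearman's Rank Correlation Score: %.4f\n", $rho ) if defined( $rho );
```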
Determines if a file is composed of CUI or word terms by checking the first line.
$string -> File Path
$string -> "undef" = Unable to determine, "cui" = CUI Term File, "word" = Word Term File
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $isWordOrCuiFile = $interface->SpIsFileWordOrCUIFile( "samples/MiniMayoSRS.terms" ); print( "MiniMayoSRS.terms File Is A \"$isWordOrCuiFile\" File\n" ) if defined( $isWordOrCuiFile ); print( "Unable To Determine Type Of File\n" ) if !defined( $isWordOrCuiFile ); undef( $interface );
Returns the number of decimal places used to represent the Spearman's Rank Correlation Score.
$value -> Integer
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); print "Spearman's Precision: " . $interface->SpGetPrecision() . "\n"; undef( $interface );
Returns the variable indicating whether the files to be parsed are files consisting of words or CUI terms.
$value -> "undef" = Auto-Detect, 0 = CUI Terms, 1 = Word Terms
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $isFileOfWords = $interface->SpGetIsFileOfWords(); print "Is File Of Words?: $isFileOfWords\n" if defined( $isFileOfWords ); print "Is File Of Words?: undef\n" if !defined( $isFileOfWords ); undef( $interface );
Returns the variable indicating whether to print NValue.
$value -> "undef" = Do not print NValue, "defined" = Print NValue
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $printN = $interface->SpGetPrintN(); print "Print N\n" if defined( $printN ); print "Do Not Print N\n" if !defined( $printN ); undef( $interface );
Returns the non-negative count for file A.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); print "A Count: " . $interface->SpGetACount() . "\n"; undef( $interface );
Returns the non-negative count for file B.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); print "B Count: " . $interface->SpGetBCount() . "\n"; undef( $interface );
Returns the N value.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); print "N Value: " . $interface->SpGetNValue() . "\n"; undef( $interface );
Sets the number of decimal places used to represent the Spearman's Rank Correlation Score.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SpSetPrecision( 8 ); my $score = $interface->SpCalculateSpearmans( "samples/MiniMayoSRS.term.comp_results", "Similarity/MiniMayoSRS.terms.coders", undef ); print "Spearman's Rank Correlation Score: $score\n" if defined( $score ); print "Spearman's Rank Correlation Score: undef\n" if !defined( $score ); undef( $interface );
Specifies whether the main method auto-detects if a file consists of CUI or word terms, or uses a manual override set by the user.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SpSetIsFileOfWords( undef ); my $score = $interface->SpCalculateSpearmans( "samples/MiniMayoSRS.term.comp_results", "Similarity/MiniMayoSRS.terms.coders", undef ); print "Spearman's Rank Correlation Score: $score\n" if defined( $score ); print "Spearman's Rank Correlation Score: undef\n" if !defined( $score ); undef( $interface );
Specifies whether the main method prints _NValue after the Spearmans::CalculateSpearmans() function completes.
$value -> "undef" = Do Not Print _NValue, "defined" = Print _NValue
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->SpSetPrintN( 1 ); my $score = $interface->SpCalculateSpearmans( "samples/MiniMayoSRS.term.comp_results", "Similarity/MiniMayoSRS.terms.coders", undef ); print "Spearman's Rank Correlation Score: $score\n" if defined( $score ); print "Spearman's Rank Correlation Score: undef\n" if !defined( $score ); undef( $interface );
Executes word2vec training based on parameters. Parameter variables have higher precedence than member variables. Any parameter specified will override its respective member variable. Note: If no parameters are specified, this module executes word2vec training based on preset member variables. Returns string regarding training status.
$trainFilePath -> Specifies word2vec text corpus training file in a given path. (String)
$outputFilePath -> Specifies word2vec trained output data file name and save path. (String)
$vectorSize -> Size of word2vec word vectors. (Integer)
$windowSize -> Maximum skip length between words. (Integer)
$minCount -> Disregard words that appear fewer than $minCount times. (Integer)
$sample -> Threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled. (Float)
$negative -> Number of negative examples. (Integer)
$alpha -> Sets the starting learning rate. (Float)
$hs -> Hierarchical Soft-max (Integer)
$binary -> Save trained data in binary mode. (Integer)
$numOfThreads -> Number of word2vec training threads. (Integer)
$iterations -> Number of training iterations to run prior to completion of training. (Integer)
$useCBOW -> Enable Continuous Bag Of Words model or Skip-Gram model. (Integer)
$classes -> Output word classes rather than word vectors. (Integer)
$readVocab -> Read vocabulary from file path without constructing from training data. (String)
$saveVocab -> Save vocabulary to file path. (String)
$debug -> Set word2vec debug mode. (Integer)
$overwrite -> Instructs the module to either overwrite any existing text corpus files or append to the existing file. ( '1' = True / '0' = False )
Note: It is not recommended to specify all new() parameters, as it has not been thoroughly tested.
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->W2VSetTrainFilePath( "textcorpus.txt" );
    $interface->W2VSetOutputFilePath( "vectors.bin" );
    $interface->W2VSetWordVecSize( 200 );
    $interface->W2VSetWindowSize( 8 );
    $interface->W2VSetSample( 0.0001 );
    $interface->W2VSetNegative( 25 );
    $interface->W2VSetHSoftMax( 0 );
    $interface->W2VSetBinaryOutput( 0 );
    $interface->W2VSetNumOfThreads( 20 );
    $interface->W2VSetNumOfIterations( 15 );
    $interface->W2VSetUseCBOW( 1 );
    $interface->W2VSetOverwriteOldFile( 0 );
    $interface->W2VExecuteTraining();
    undef( $interface );

    # or

    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->W2VExecuteTraining( "textcorpus.txt", "vectors.bin", 200, 8, 5, 0.001, 25, 0.05, 0, 0, 20, 15, 1, 0, "", "", 2, 0 );
    undef( $interface );
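The precedence rule (a specified parameter overrides its member variable, an unspecified one falls back to it) can be pictured as a merge of defaults and overrides. The sketch below is hypothetical and not the module's code: build_word2vec_args is an invented helper, the defaults are the values quoted in this document's getter descriptions, and the flag names follow the original word2vec command-line tool.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Word2vec::Interface.
sub build_word2vec_args
{
    my ( $trainFile, $outputFile, %overrides ) = @_;

    # Stand-ins for the preset member variables.
    my %options = (
        "size"      => 100,
        "window"    => 5,
        "sample"    => 0.001,
        "hs"        => 0,
        "negative"  => 5,
        "threads"   => 12,
        "iter"      => 5,
        "min-count" => 5,
    );

    # A defined override takes precedence; an undefined one keeps the default.
    for my $name ( keys %overrides )
    {
        $options{ $name } = $overrides{ $name } if defined( $overrides{ $name } );
    }

    my @args = ( "-train", $trainFile, "-output", $outputFile );
    push( @args, "-$_", $options{ $_ } ) for sort keys %options;
    return @args;
}

my @args = build_word2vec_args( "textcorpus.txt", "vectors.bin", "size" => 200, "window" => 8, "iter" => undef );
print( join( " ", "word2vec", @args ) . "\n" );
```

Here "size" and "window" are overridden, while "iter" is passed as undef and therefore keeps its default of 5.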
$trainingStr -> String to train with word2vec.
$outputFilePath -> Specifies word2vec trained output data file name and save path. (String)
$vectorSize -> Size of word2vec word vectors. (Integer)
$windowSize -> Maximum skip length between words. (Integer)
$minCount -> Disregard words that appear fewer than $minCount times. (Integer)
$sample -> Threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled. (Float)
$negative -> Number of negative examples. (Integer)
$alpha -> Sets the starting learning rate. (Float)
$hs -> Hierarchical Soft-max (Integer)
$binary -> Save trained data in binary mode. (Integer)
$numOfThreads -> Number of word2vec training threads. (Integer)
$iterations -> Number of training iterations to run prior to completion of training. (Integer)
$useCBOW -> Enable Continuous Bag Of Words model or Skip-Gram model. (Integer)
$classes -> Output word classes rather than word vectors. (Integer)
$readVocab -> Read vocabulary from file path without constructing from training data. (String)
$saveVocab -> Save vocabulary to file path. (String)
$debug -> Set word2vec debug mode. (Integer)
$overwrite -> Instructs the module to either overwrite any existing text corpus files or append to the existing file. ( '1' = True / '0' = False )
Note: It is not recommended to specify all new() parameters, as it has not been thoroughly tested.
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->W2VSetOutputFilePath( "vectors.bin" );
    $interface->W2VSetWordVecSize( 200 );
    $interface->W2VSetWindowSize( 8 );
    $interface->W2VSetSample( 0.0001 );
    $interface->W2VSetNegative( 25 );
    $interface->W2VSetHSoftMax( 0 );
    $interface->W2VSetBinaryOutput( 0 );
    $interface->W2VSetNumOfThreads( 20 );
    $interface->W2VSetNumOfIterations( 15 );
    $interface->W2VSetUseCBOW( 1 );
    $interface->W2VSetOverwriteOldFile( 0 );
    $interface->W2VExecuteStringTraining( "string to train here" );
    undef( $interface );

    # or

    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->W2VExecuteStringTraining( "string to train here", "vectors.bin", 200, 8, 5, 0.001, 25, 0.05, 0, 0, 20, 15, 1, 0, "", "", 2, 0 );
    undef( $interface );
Computes cosine similarity between two words using trained word2vec vector data. Returns float value or undefined if one or more words are not in the dictionary. Note: Supports single words only and requires vector data to be in memory with W2VReadTrainedVectorDataFromFile() prior to function execution.
$string -> Single string word $string -> Single string word
$value -> Float or Undefined
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); print "Cosine similarity between words: \"of\" and \"the\": " . $interface->W2VComputeCosineSimilarity( "of", "the" ) . "\n"; undef( $interface );
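As background, cosine similarity is the dot product of two vectors divided by the product of their magnitudes. The following standalone sketch computes it for two numeric arrays; cosine_similarity is a hypothetical helper, and the module's handling of dictionary lookups is not reproduced.

```perl
use strict;
use warnings;

# Cosine similarity of two equal-length numeric vectors (array references).
# Returns undef for empty or mismatched vectors, or a zero-magnitude vector.
sub cosine_similarity
{
    my ( $aRef, $bRef ) = @_;
    return undef if @{ $aRef } == 0 || @{ $aRef } != @{ $bRef };

    my ( $dot, $normA, $normB ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{ $aRef } )
    {
        $dot   += $aRef->[$i] * $bRef->[$i];
        $normA += $aRef->[$i] ** 2;
        $normB += $bRef->[$i] ** 2;
    }
    return undef if $normA == 0 || $normB == 0;
    return $dot / ( sqrt( $normA ) * sqrt( $normB ) );
}

my $similarity = cosine_similarity( [ 1, 0, 1 ], [ 1, 1, 0 ] );
print( "Cosine similarity: $similarity\n" ) if defined( $similarity );
```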
Computes cosine similarity between two words or compound words using trained word2vec vector data. Returns float value or undefined. Note: Supports multiple words concatenated by ' ' and requires vector data to be in memory prior to method execution. This method will not error out when a word is not located within the dictionary. It averages the vectors of all found words for each parameter, then computes the cosine similarity of the two averaged vectors.
$string -> string of single or multiple words separated by ' ' (space). $string -> string of single or multiple words separated by ' ' (space).
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); print "Cosine similarity between words: \"heart attack\" and \"acute myocardial infarction\": " . $interface->W2VComputeAvgOfWordsCosineSimilarity( "heart attack", "acute myocardial infarction" ) . "\n"; undef( $interface );
Computes cosine similarity between two words or compound words using trained word2vec vector data. Note: Supports multiple words concatenated by ' ' (space) and requires vector data to be in memory prior to method execution. If $allWordsMustExist is set to true, this function will error out when a specified word is not found and return undefined.
$string -> string of single or multiple words separated by ' ' (space). $string -> string of single or multiple words separated by ' ' (space). $allWordsMustExist -> 1 = True, 0 or undef = False
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); print "Cosine similarity between words: \"heart attack\" and \"acute myocardial infarction\": " . $interface->W2VComputeMultiWordCosineSimilarity( "heart attack", "acute myocardial infarction" ) . "\n"; undef( $interface );
Computes cosine similarity between two word vectors. Returns float value or undefined if one or more words are not in the dictionary. Note: Function parameters require actual word vector data with words removed.
$string -> string of word vector representation data separated by ' ' (space). $string -> string of word vector representation data separated by ' ' (space).
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
    my $vectorAData = $interface->W2VGetWordVector( "heart" );
    my $vectorBData = $interface->W2VGetWordVector( "attack" );

    # Remove Words From Data
    $vectorAData = $interface->W2VRemoveWordFromWordVectorString( $vectorAData );
    $vectorBData = $interface->W2VRemoveWordFromWordVectorString( $vectorBData );

    print "Cosine similarity between words: \"heart\" and \"attack\": " . $interface->W2VComputeCosineSimilarityOfWordVectors( $vectorAData, $vectorBData ) . "\n";
    undef( $interface );
Computes cosine similarity between two words using trained word2vec vector data based on user input. Note: No compound word support. Warning: Requires vector data to be in memory prior to method execution.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); $interface->W2VCosSimWithUserInput(); undef( $interface );
Computes cosine similarity between two words or compound words using trained word2vec vector data based on user input. Note: Supports multiple words concatenated by ':'. Warning: Requires vector data to be in memory prior to method execution.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); $interface->W2VMultiWordCosSimWithUserInput(); undef( $interface );
Computes the average of the word vectors of all found words, given an array reference parameter of plain text words. Returns average values (string) or undefined. Warning: Requires vector data to be in memory prior to method execution.
$arrayReference -> Array reference of words
$string -> String of word2vec word average values
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); my @wordAry = qw( of the and ); my $data = $interface->W2VComputeAverageOfWords( \@wordAry ); print( "Computed Average Of Words: $data" ) if defined( $data ); undef( $interface );
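The averaging step can be illustrated without any trained data. In this sketch the dictionary is a toy hash with made-up 3-dimensional vectors, average_of_words is a hypothetical helper, and the space-separated output string is an assumption about the result format.

```perl
use strict;
use warnings;

# Toy dictionary with made-up 3-dimensional vectors, for illustration only.
my %toyVectors = (
    "heart"  => [ 0.5, 1.0, -0.5 ],
    "attack" => [ 1.5, 0.0,  0.5 ],
);

# Average the vectors of every word found in the dictionary, skipping unknown words.
# Returns a space-separated string of averaged values, or undef if no word is found.
sub average_of_words
{
    my ( $wordsRef, $vectorsRef ) = @_;
    my @sum;
    my $found = 0;
    for my $word ( @{ $wordsRef } )
    {
        my $vector = $vectorsRef->{ $word };
        next if !defined( $vector );
        $sum[$_] += $vector->[$_] for 0 .. $#{ $vector };
        $found++;
    }
    return undef if $found == 0;
    return join( " ", map { $_ / $found } @sum );
}

# "unknownword" is not in the toy dictionary, so only two vectors are averaged.
my $data = average_of_words( [ qw( heart attack unknownword ) ], \%toyVectors );
print( "Computed Average Of Words: $data\n" ) if defined( $data );    # prints "1 0.5 0"
```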
Adds two word vectors and returns the result. Warning: This method also requires vector data to be in memory prior to method execution.
$string -> Word to add $string -> Word to add
$string -> String of word2vec summed word values
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); my $data = $interface->W2VAddTwoWords( "heart", "attack" ); print( "Computed Sum Of Words: $data" ) if defined( $data ); undef( $interface );
Subtracts two word vectors and returns the result. Warning: This method also requires vector data to be in memory prior to method execution.
$string -> Word to subtract $string -> Word to subtract
$string -> String of word2vec difference between word values
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); my $data = $interface->W2VSubtractTwoWords( "king", "man" ); print( "Computed Difference Of Words: $data" ) if defined( $data ); undef( $interface );
Adds two vector data strings and returns the result. Warning: Text word must be removed from vector data prior to calling this method. This method also requires vector data to be in memory prior to method execution.
$string -> Word2vec word vector data (with string word removed) $string -> Word2vec word vector data (with string word removed)
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
    my $wordAData = $interface->W2VGetWordVector( "of" );
    my $wordBData = $interface->W2VGetWordVector( "the" );

    # Removing Words From Vector Data
    $wordAData = $interface->W2VRemoveWordFromWordVectorString( $wordAData );
    $wordBData = $interface->W2VRemoveWordFromWordVectorString( $wordBData );

    my $data = $interface->W2VAddTwoWordVectors( $wordAData, $wordBData );
    print( "Computed Sum Of Words: $data" ) if defined( $data );
    undef( $interface );
Subtracts two vector data strings and returns the result. Warning: Text word must be removed from vector data prior to calling this method. This method also requires vector data to be in memory prior to method execution.
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
    my $wordAData = $interface->W2VGetWordVector( "of" );
    my $wordBData = $interface->W2VGetWordVector( "the" );

    # Removing Words From Vector Data
    $wordAData = $interface->W2VRemoveWordFromWordVectorString( $wordAData );
    $wordBData = $interface->W2VRemoveWordFromWordVectorString( $wordBData );

    my $data = $interface->W2VSubtractTwoWordVectors( $wordAData, $wordBData );
    print( "Computed Difference Of Words: $data" ) if defined( $data );
    undef( $interface );
Computes the average of two vector data strings and returns the result. Warning: Text word must be removed from vector data prior to calling this method. This method also requires vector data to be in memory prior to method execution.
$string -> String of word2vec average between word values
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
    my $wordAData = $interface->W2VGetWordVector( "of" );
    my $wordBData = $interface->W2VGetWordVector( "the" );

    # Removing Words From Vector Data
    $wordAData = $interface->W2VRemoveWordFromWordVectorString( $wordAData );
    $wordBData = $interface->W2VRemoveWordFromWordVectorString( $wordBData );

    my $data = $interface->W2VAverageOfTwoWordVectors( $wordAData, $wordBData );
    print( "Computed Average Of Words: $data" ) if defined( $data );
    undef( $interface );
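Adding, subtracting and averaging two vector data strings are all element-wise operations over the space-separated values. A minimal sketch follows, assuming both strings already have the text word removed; combine_vector_strings is a hypothetical helper rather than part of the module.

```perl
use strict;
use warnings;

# Apply an element-wise operation to two space-separated vector data strings.
# Returns undef if the two vectors have different dimensions.
sub combine_vector_strings
{
    my ( $vectorA, $vectorB, $operation ) = @_;
    my @valuesA = split( ' ', $vectorA );
    my @valuesB = split( ' ', $vectorB );
    return undef if @valuesA != @valuesB;
    return join( " ", map { $operation->( $valuesA[$_], $valuesB[$_] ) } 0 .. $#valuesA );
}

my $sum        = combine_vector_strings( "1 2 3", "3 2 1", sub { $_[0] + $_[1] } );
my $difference = combine_vector_strings( "1 2 3", "3 2 1", sub { $_[0] - $_[1] } );
my $average    = combine_vector_strings( "1 2 3", "3 2 1", sub { ( $_[0] + $_[1] ) / 2 } );

print( "Computed Sum Of Words: $sum\n" );                  # 4 4 4
print( "Computed Difference Of Words: $difference\n" );    # -2 0 2
print( "Computed Average Of Words: $average\n" );          # 2 2 2
```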
Searches dictionary in memory for the specified string argument and returns the vector data. Returns undefined if not found. Warning: Requires vector data to be in memory prior to method execution.
$string -> Word to locate in word2vec vocabulary/dictionary
$string -> Found word2vec word + word vector data or undefined.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); my $wordData = $interface->W2VGetWordVector( "of" ); print( "Word2vec Word Data: $wordData\n" ) if defined( $wordData ); undef( $interface );
Checks to see if vector data has been loaded in memory.
$value -> '1' = True / '0' = False
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->W2VIsVectorDataInMemory(); print( "No vector data in memory\n" ) if $result == 0; print( "Yes vector data in memory\n" ) if $result == 1; $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); $result = $interface->W2VIsVectorDataInMemory(); print( "No vector data in memory\n" ) if $result == 0; print( "Yes vector data in memory\n" ) if $result == 1; undef( $interface );
Checks to see if vector data consists of word or CUI terms.
$string -> 'cui', 'word' or undef
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); my $isWordOrCUIData = $interface->W2VIsWordOrCUIVectorData(); print( "Vector Data Consists Of \"$isWordOrCUIData\" Terms\n" ) if defined( $isWordOrCUIData ); print( "Cannot Determine Type Of Terms\n" ) if !defined( $isWordOrCUIData ); undef( $interface );
Checks to see if vector data header is signed as sorted in memory.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); my $result = $interface->W2VIsVectorDataSorted(); print( "No, vector data is not sorted\n" ) if $result == 0; print( "Yes, vector data is sorted\n" ) if $result == 1; undef( $interface );
Checks specified file to see if vector data is in binary or plain text format. Returns 'text' for plain text and 'binary' for binary data.
$string -> File path
$string -> File Type ( "text" = Plain text file / "binary" = Binary data file )
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $fileType = $interface->W2VCheckWord2VecDataFileType( "samples/samplevectors.bin" ); print( "FileType: $fileType\n" ) if defined( $fileType ); undef( $interface );
Reads trained vector data from the file path into memory, or searches the file for a specific word vector. This function supports and automatically detects word2vec binary, plain text and sparse vector data formats. Note: If the search word is undefined, the entire vector file is loaded into memory. If a search word is defined, only that word's vector data is returned, or undef if it is not found.
$string -> Word2vec trained vector data file path $searchWord -> Searches trained vector data file for specific word vector
    # Loading data in memory
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    my $result = $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" );
    print( "Success Loading Data\n" ) if $result == 0;
    print( "Un-successful, Data Not Loaded\n" ) if $result == -1;
    undef( $interface );

    # or

    # Searching vector data file for a specific word vector
    use Word2vec::Interface;

    my $interface = Word2vec::Interface->new();
    my $result = $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin", "medical" );
    print( "Found Vector Data In File\n" ) if $result != -1;
    print( "Vector Data Not Found\n" ) if $result == -1;
    undef( $interface );
Saves trained vector data to the specified location in the specified format. Note: Leaving 'saveFormat' undefined will automatically save as plain text format.
$string -> Save Path $saveFormat -> Integer ( '0' = Save as plain text / '1' = Save data in word2vec binary format / '2' = Sparse vector data format ) Warning: If the vector data is stored as a binary search tree, this method will error out gracefully.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); $interface->W2VSaveTrainedVectorDataToFile( "samples/newvectors.bin" ); undef( $interface );
Compares two strings for equality, ignoring case. For example, "string" equals "StRiNg".
$string -> String to compare $string -> String to compare
$value -> '1' = Strings are equal / '0' = Strings are not equal
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->W2VStringsAreEqual( "hello world", "HeLlO wOrLd" ); print( "Strings are equal!\n" ) if $result == 1; print( "Strings are not equal!\n" ) if $result == 0; undef( $interface );
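A case-insensitive comparison of this kind amounts to lower-casing both strings before comparing them. The helper below is a hypothetical sketch of that idea, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when both strings match after lower-casing, otherwise 0.
sub strings_are_equal
{
    my ( $first, $second ) = @_;
    return 0 if !defined( $first ) || !defined( $second );
    return ( lc( $first ) eq lc( $second ) ) ? 1 : 0;
}

print( "Strings are equal!\n" ) if strings_are_equal( "hello world", "HeLlO wOrLd" ) == 1;
```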
Given a vector data string as input, removes the word from the string, returning only the vector data.
$string -> Vector word & data string.
$string -> Vector data string.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $str = "cookie 1 0.234 9 0.0002 13 0.234 17 -0.0023 19 1.0000"; my $vectorData = $interface->W2VRemoveWordFromWordVectorString( $str ); print( "Success!\n" ) if length( $vectorData ) < length( $str ); undef( $interface );
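Stripping the word amounts to splitting the string once at the first run of whitespace and keeping the remainder. A small standalone sketch, with a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: split off the leading word and return the remaining data.
# Returns undef when the string holds no data after the word.
sub remove_word_from_vector_string
{
    my ( $str ) = @_;
    return undef if !defined( $str );
    my ( $word, $data ) = split( ' ', $str, 2 );
    return $data;
}

my $vectorData = remove_word_from_vector_string( "cookie 1 0.234 9 0.0002" );
print( "Vector data: $vectorData\n" ) if defined( $vectorData );    # 1 0.234 9 0.0002
```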
Converts sparse vector string to a dense vector format data array.
$arrayReference -> Reference to array of vector data.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $str = "cookie 1 0.234 9 0.0002 13 0.234 17 -0.0023 19 1.0000"; my @vectorData = @{ $interface->W2VConvertRawSparseTextToVectorDataAry( $str ) }; print( "Data conversion successful!\n" ) if @vectorData > 0; print( "Data conversion un-successful!\n" ) if @vectorData == 0; undef( $interface );
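The sparse-to-dense conversion can be pictured as follows. This sketch rests on assumptions that the text above does not confirm: that the sparse string is a word followed by index/value pairs, that indices are zero-based, and that the dense vector length is supplied by the caller. The helper name sparse_to_dense is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumes "word index value index value ..." with
# zero-based indices; $size (the dense vector length) is supplied by the caller.
sub sparse_to_dense
{
    my ( $sparseStr, $size ) = @_;
    my ( $word, @pairs ) = split( ' ', $sparseStr );

    # Start with all zeros, then fill in the listed positions.
    my @dense = ( 0 ) x $size;
    while( @pairs >= 2 )
    {
        my ( $index, $value ) = splice( @pairs, 0, 2 );
        next if $index !~ /^\d+$/ || $index >= $size;    # skip malformed or out-of-range indices
        $dense[$index] = $value + 0;
    }
    return \@dense;
}

my $dense = sparse_to_dense( "cookie 1 0.234 9 0.0002 13 0.234 17 -0.0023 19 1.0000", 20 );
print( "Dense vector: @{ $dense }\n" );
```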
Converts sparse vector string to a dense vector format data hash.
$hashReference -> Reference to hash of vector data.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $str = "cookie 1 0.234 9 0.0002 13 0.234 17 -0.0023 19 1.0000"; my %vectorData = %{ $interface->W2VConvertRawSparseTextToVectorDataHash( $str ) }; print( "Data conversion successful!\n" ) if ( keys %vectorData ) > 0; print( "Data conversion un-successful!\n" ) if ( keys %vectorData ) == 0; undef( $interface );
Returns the _debugLog member variable set during Word2vec::Word2vec object initialization of new function.
$value -> '0' = False, '1' = True
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $debugLog = $interface->W2VGetDebugLog(); print( "Debug Logging Enabled\n" ) if $debugLog == 1; print( "Debug Logging Disabled\n" ) if $debugLog == 0; undef( $interface );
Returns the _writeLog member variable set during Word2vec::Word2vec object initialization of new function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $writeLog = $interface->W2VGetWriteLog(); print( "Write Logging Enabled\n" ) if $writeLog == 1; print( "Write Logging Disabled\n" ) if $writeLog == 0; undef( $interface );
Returns the _fileHandle member variable set during Word2vec::Word2vec object instantiation of new function. Warning: This is a private function. File handle is used by WriteLog() method. Do not manipulate this file handle as errors can result.
$fileHandle -> Returns file handle for WriteLog() method or undefined.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $fileHandle = $interface->W2VGetFileHandle(); undef( $interface );
Returns the _trainFilePath member variable set during Word2vec::Word2vec object instantiation of new function.
$string -> Returns word2vec training text corpus file path.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $filePath = $interface->W2VGetTrainFilePath(); print( "Training File Path: $filePath\n" ); undef( $interface );
Returns the _outputFilePath member variable set during Word2vec::Word2vec object instantiation of new function.
$string -> Returns post word2vec training output file path.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $filePath = $interface->W2VGetOutputFilePath(); print( "File Path: $filePath\n" ); undef( $interface );
Returns the _wordVecSize member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (integer) size of word2vec word vectors. Default value = 100
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetWordVecSize(); print( "Word Vector Size: $value\n" ); undef( $interface );
Returns the _windowSize member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (integer) word2vec window size. Default value = 5
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetWindowSize(); print( "Window Size: $value\n" ); undef( $interface );
Returns the _sample member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (float) word2vec sample threshold. Default value = 0.001
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetSample(); print( "Sample: $value\n" ); undef( $interface );
Returns the _hSoftMax member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (integer) word2vec HSoftMax value. Default = 0
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetHSoftMax(); print( "HSoftMax: $value\n" ); undef( $interface );
Returns the _negative member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (integer) word2vec negative value. Default = 5
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetNegative(); print( "Negative: $value\n" ); undef( $interface );
Returns the _numOfThreads member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (integer) word2vec number of threads to use during training. Default = 12
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetNumOfThreads(); print( "Number of threads: $value\n" ); undef( $interface );
Returns the _iterations member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (integer) word2vec number of word2vec iterations. Default = 5
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetNumOfIterations(); print( "Number of iterations: $value\n" ); undef( $interface );
Returns the _minCount member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (integer) word2vec min-count value. Default = 5
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetMinCount(); print( "Min Count: $value\n" ); undef( $interface );
Returns the _alpha member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (float) word2vec alpha value. Default = 0.05 for CBOW and 0.025 for Skip-Gram.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetAlpha(); print( "Alpha: $value\n" ); undef( $interface );
Returns the _classes member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (integer) word2vec classes value. Default = 0
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetClasses(); print( "Classes: $value\n" ); undef( $interface );
Returns the _debug member variable set during Word2vec::Word2vec object instantiation of new function. Note: 0 = No debug output, 1 = Enable debug output, 2 = Even more debug output
$value -> Returns (integer) word2vec debug value. Default = 2
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetDebugTraining(); print( "Debug: $value\n" ); undef( $interface );
Returns the _binaryOutput member variable set during Word2vec::Word2vec object instantiation of new function. Note: 1 = Save trained vector data in binary format, 0 = Save trained vector data in plain text format.
$value -> Returns (integer) word2vec binary flag. Default = 0
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetBinaryOutput(); print( "Binary Output: $value\n" ); undef( $interface );
Returns the _readVocab member variable set during Word2vec::Word2vec object instantiation of new function.
$string -> Returns (string) word2vec read vocabulary file name or empty string if not set.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $str = $interface->W2VGetReadVocabFilePath(); print( "Read Vocab File Path: $str\n" ); undef( $interface );
Returns the _saveVocab member variable set during Word2vec::Word2vec object instantiation of new function.
$string -> Returns (string) word2vec save vocabulary file name or empty string if not set.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $str = $interface->W2VGetSaveVocabFilePath(); print( "Save Vocab File Path: $str\n" ); undef( $interface );
Returns the _useCBOW member variable set during Word2vec::Word2vec object instantiation of new function. Note: 0 = Skip-Gram Model, 1 = Continuous Bag Of Words Model.
$value -> Returns (integer) word2vec Continuous-Bag-Of-Words flag. Default = 1
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetUseCBOW(); print( "Use CBOW?: $value\n" ); undef( $interface );
Returns the _workingDir member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (string) working directory path or current directory if not specified.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $str = $interface->W2VGetWorkingDir(); print( "Working Directory: $str\n" ); undef( $interface );
Returns the _word2VecExeDir member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns (string) word2vec executable directory path or empty string if not specified.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $str = $interface->W2VGetWord2VecExeDir(); print( "Word2Vec Executable File Directory: $str\n" ); undef( $interface );
Returns the _hashRefOfWordVectors member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns hash reference of vocabulary/dictionary words. (Word2vec trained data in memory)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $vocabularyHashRef = $interface->W2VGetVocabularyHash(); undef( $interface );
Returns the _overwriteOldFile member variable set during Word2vec::Word2vec object instantiation of new function.
$value -> Returns 1 = True or 0 = False.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $value = $interface->W2VGetOverwriteOldFile(); print( "Overwrite Existing File?: $value\n" ); undef( $interface );
Sets member variable to string parameter. Sets training file path.
$string -> Text corpus training file path
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetTrainFilePath( "samples/textcorpus.txt" ); undef( $interface );
Sets member variable to string parameter. Sets output file path.
$string -> Post word2vec training save file path
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetOutputFilePath( "samples/tempvectors.bin" ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec word vector size.
$value -> Word2vec word vector size
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetWordVecSize( 100 ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec window size.
$value -> Word2vec window size
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetWindowSize( 8 ); undef( $interface );
Sets member variable to float parameter. Sets word2vec sample size.
$value -> Word2vec sample size (Float)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetSample( 0.001 ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec HSoftMax value.
$value -> Word2vec HSoftMax value (0 = Disabled, non-zero = Enabled)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetHSoftMax( 1 ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec negative value.
$value -> Word2vec negative value
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetNegative( 12 ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec number of training threads to specified value.
$value -> Word2vec number of threads value
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetNumOfThreads( 12 ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec iterations value.
$value -> Word2vec number of iterations value
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetNumOfIterations( 12 ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec min-count value.
$value -> Word2vec min-count value
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetMinCount( 7 ); undef( $interface );
Sets member variable to float parameter. Sets word2vec alpha value.
$value -> Word2vec alpha value. (Float)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetAlpha( 0.0012 ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec classes value.
$value -> Word2vec classes value.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetClasses( 0 ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec debug parameter value.
$value -> Word2vec debug training value.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetDebugTraining( 0 ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec binary parameter value.
$value -> Word2vec binary output mode value. ( '1' = Binary Output / '0' = Plain Text )
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetBinaryOutput( 1 ); undef( $interface );
Sets member variable to string parameter. Sets word2vec save vocabulary file name.
$string -> Word2vec save vocabulary file name and path.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetSaveVocabFilePath( "samples/vocab.txt" ); undef( $interface );
Sets member variable to string parameter. Sets word2vec read vocabulary file name.
$string -> Word2vec read vocabulary file name and path.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetReadVocabFilePath( "samples/vocab.txt" ); undef( $interface );
Sets member variable to integer parameter. Sets word2vec CBOW parameter value.
$value -> Word2vec CBOW mode value.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetUseCBOW( 1 ); undef( $interface );
Sets member variable to string parameter. Sets working directory.
$string -> Working directory
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetWorkingDir( "/samples" ); undef( $interface );
Sets member variable to string parameter. Sets word2vec executable file directory.
$string -> Word2vec directory
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetWord2VecExeDir( "/word2vec" ); undef( $interface );
Sets vocabulary/dictionary hash reference to hash reference parameter. Warning: This will overwrite any existing vocabulary/dictionary data in memory.
$hashReference -> Vocabulary/Dictionary hash reference of word2vec word vectors.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VReadTrainedVectorDataFromFile( "samples/samplevectors.bin" ); my $vocabularyHashReference = $interface->W2VGetVocabularyHash(); $interface->W2VSetVocabularyHash( $vocabularyHashReference ); undef( $interface );
Clears vocabulary/dictionary hash.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VClearVocabularyHash(); undef( $interface );
Adds word vector string to vocabulary/dictionary.
$string -> Word2vec word vector string
use Word2vec::Interface;
my $interface = Word2vec::Interface->new();
# Note: This is representational data of word2vec's word vector format and not actual data.
$interface->W2VAddWordVectorToVocabHash( "of 0.4346 -0.1235 0.5789 0.2347 -0.0056 -0.0001" );
undef( $interface );
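To make the word vector string format concrete, here is a minimal standalone sketch that splits such a string into the word and its components. The helper name parse_word_vector and the %vocab hash are hypothetical and exist only for illustration; how W2VAddWordVectorToVocabHash actually stores the data internally is not shown in this documentation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Interface): split a
# "word v1 v2 ... vN" string into the word and an array reference
# holding its vector components.
sub parse_word_vector {
    my ( $line ) = @_;
    my ( $word, @components ) = split( ' ', $line );
    return ( $word, \@components );
}

# Representational data only, matching the example above.
my %vocab;
my ( $word, $vector ) = parse_word_vector( "of 0.4346 -0.1235 0.5789 0.2347 -0.0056 -0.0001" );
$vocab{ $word } = $vector;

print( "Word: $word, dimensions: " . scalar( @{ $vector } ) . "\n" );
```

The first whitespace-separated token is treated as the word and everything after it as vector components, which matches the representational string above.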
Sets member variable to integer parameter. Enables overwriting output file if one already exists.
$value -> '1' = Overwrite existing file / '0' = Graceful termination when file with same name exists
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2VSetOverwriteOldFile( 1 ); undef( $interface );
Executes word2phrase training based on parameters. Parameter variables have higher precedence than member variables. Any parameter specified will override its respective member variable. Note: If no parameters are specified, this module executes word2phrase training based on preset member variables. Returns string regarding training status.
$trainFilePath -> Training text corpus file path
$outputFilePath -> Vector binary file path
$minCount -> Minimum bi-gram frequency (Positive Integer)
$threshold -> Maximum bi-gram frequency (Positive Integer)
$debug -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
$overwrite -> Overwrites old training file when executing training. (0 = False / 1 = True)
use Word2vec::Interface;
my $interface = Word2vec::Interface->new();
$interface->W2PSetMinCount( 12 );
$interface->W2PSetThreshold( 20 );
$interface->W2PSetTrainFilePath( "textCorpus.txt" );
$interface->W2PSetOutputFilePath( "phraseTextCorpus.txt" );
$interface->W2PExecuteTraining();
undef( $interface );

# Or

use Word2vec::Interface;
my $interface = Word2vec::Interface->new();
$interface->W2PExecuteTraining( "textCorpus.txt", "phraseTextCorpus.txt", 12, 20, 2, 1 );
undef( $interface );
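The precedence rule described for W2PExecuteTraining (a specified parameter overrides its member variable) can be sketched with Perl's defined-or operator. This is a standalone illustration, not the module's code: the %member hash, its values, and the resolve_settings helper are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for preset member variables (values are illustrative).
my %member = ( minCount => 5, threshold => 100 );

# Hypothetical helper: a defined argument takes precedence over the
# stored member value; an undefined argument falls back to it.
sub resolve_settings {
    my ( $minCount, $threshold ) = @_;
    return (
        $minCount  // $member{ minCount },
        $threshold // $member{ threshold },
    );
}

# Only minCount is passed, so threshold falls back to the member value.
my ( $min, $max ) = resolve_settings( 12, undef );
print( "minCount: $min, threshold: $max\n" );
```

The defined-or operator (//) requires Perl 5.10 or later and, unlike ||, does not treat a legitimate value of 0 as missing.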
$trainingString -> String to train
$outputFilePath -> Vector binary file path
$minCount -> Minimum bi-gram frequency (Positive Integer)
$threshold -> Maximum bi-gram frequency (Positive Integer)
$debug -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
$overwrite -> Overwrites old training file when executing training. (0 = False / 1 = True)
use Word2vec::Interface;
my $interface = Word2vec::Interface->new();
$interface->W2PSetMinCount( 12 );
$interface->W2PSetThreshold( 20 );
$interface->W2PSetTrainFilePath( "large string to train here" );
$interface->W2PSetOutputFilePath( "phraseTextCorpus.txt" );
$interface->W2PExecuteTraining();
undef( $interface );

# Or

use Word2vec::Interface;
my $interface = Word2vec::Interface->new();
$interface->W2PExecuteTraining( "large string to train here", "phraseTextCorpus.txt", 12, 20, 2, 1 );
undef( $interface );
Returns the _debugLog member variable set during Word2vec::Interface object initialization of new function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $debugLog = $interface->W2PGetDebugLog(); print( "Debug Logging Enabled\n" ) if $debugLog == 1; print( "Debug Logging Disabled\n" ) if $debugLog == 0; undef( $interface );
Returns the _writeLog member variable set during Word2vec::Interface object initialization of new function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $writeLog = $interface->W2PGetWriteLog(); print( "Write Logging Enabled\n" ) if $writeLog == 1; print( "Write Logging Disabled\n" ) if $writeLog == 0; undef( $interface );
Returns file handle used by word2phrase::WriteLog() method.
<This should not be called.>
Returns (string) training file path.
$string -> word2phrase training file path
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $filePath = $interface->W2PGetTrainFilePath(); print( "Train File Path: $filePath\n" ) if defined( $filePath ); undef( $interface );
Returns (string) output file path.
$string -> word2phrase output file path
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $filePath = $interface->W2PGetOutputFilePath(); print( "Output File Path: $filePath\n" ) if defined( $filePath ); undef( $interface );
Returns (integer) minimum bi-gram range.
$value -> Minimum bi-gram frequency (Positive Integer)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $mincount = $interface->W2PGetMinCount(); print( "MinCount: $mincount\n" ) if defined( $mincount ); undef( $interface );
Returns (integer) maximum bi-gram range.
$value -> Maximum bi-gram frequency (Positive Integer)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $threshold = $interface->W2PGetThreshold(); print( "Threshold: $threshold\n" ) if defined( $threshold ); undef( $interface );
Returns word2phrase debug parameter value.
$value -> 0 = No debugging, 1 = Show debugging, 2 = Show even more debugging
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $interfacedebug = $interface->W2PGetW2PDebug(); print( "Word2Phrase Debug Level: $interfacedebug\n" ) if defined( $interfacedebug ); undef( $interface );
Returns (string) working directory path.
$string -> Current working directory path
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $workingDir = $interface->W2PGetWorkingDir(); print( "Working Directory: $workingDir\n" ) if defined( $workingDir ); undef( $interface );
Returns (string) word2phrase executable directory path.
$string -> Word2Phrase executable directory path
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $exeDir = $interface->W2PGetWord2PhraseExeDir(); print( "Word2Phrase Executable Directory: $exeDir\n" ) if defined( $exeDir ); undef( $interface );
Returns the current value of the overwrite training file variable.
$value -> 1 = True/Overwrite or 0 = False/Append to current file
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $overwrite = $interface->W2PGetOverwriteOldFile(); if ( defined( $overwrite ) ) { print( "Overwrite Old File: " ); print( "Yes\n" ) if $overwrite == 1; print( "No\n" ) if $overwrite == 0; } undef( $interface );
Sets training file path.
$string -> Training file path
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2PSetTrainFilePath( "filePath" ); undef( $interface );
Sets word2phrase output file path.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2PSetOutputFilePath( "filePath" ); undef( $interface );
Sets minimum range value.
$value -> Minimum frequency value (Positive integer)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2PSetMinCount( 1 ); undef( $interface );
Sets maximum range value.
$value -> Maximum frequency value (Positive integer)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2PSetThreshold( 100 ); undef( $interface );
Sets word2phrase debug parameter.
$value -> word2phrase debug parameter (0 = No debug info, 1 = Show debug info, 2 = Show more debug info.)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2PSetW2PDebug( 2 ); undef( $interface );
Sets working directory path.
$string -> Current working directory path.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2PSetWorkingDir( "filePath" ); undef( $interface );
Sets word2phrase executable file directory path.
$string -> Word2Phrase executable directory path.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2PSetWord2PhraseExeDir( "filePath" ); undef( $interface );
Enables overwriting word2phrase output file if one already exists with the same output file name.
$value -> Integer: 1 = Overwrite old file, 0 = Do not overwrite old file.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->W2PSetOverwriteOldFile( 1 ); undef( $interface );
Parses the specified Medline XML file or directory of files, creating a text corpus. Returns 0 if successful or -1 if an error occurs. Note: Supports plain Medline XML or gzipped XML files.
$filePath -> XML file path to parse. (This can be a single file or directory of XML/XML.gz files).
$value -> '0' = Successful / '-1' = Un-Successful
use Word2vec::Interface;
my $interface = Word2vec::Interface->new();
# Note: Specifying no parameters implies default settings
$interface->XTWSetSavePath( "testCorpus.txt" );
$interface->XTWSetStoreTitle( 1 );
$interface->XTWSetStoreAbstract( 1 );
$interface->XTWSetBeginDate( "01/01/2004" );
$interface->XTWSetEndDate( "08/13/2016" );
$interface->XTWSetOverwriteExistingFile( 1 );
$interface->XTWConvertMedlineXMLToW2V( "/xmlDirectory/" );
undef( $interface );
Creates a binary search tree from the compound word data in memory and stores its root node. The compound word array is deleted upon completion, as it is no longer needed. Warning: The compound word file must be loaded into memory using XTWReadCompoundWordDataFromFile() prior to calling this method.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWReadCompoundWordDataFromFile( "samples/compoundword.txt" ); $interface->XTWCreateCompoundWordBST(); undef( $interface );
Compoundifies string parameter based on compound word data in memory using the compound word binary search tree. Warning: Compound word file must be loaded into memory using XTWReadCompoundWordDataFromFile() prior to calling this method.
$string -> String to compoundify
$string -> Compounded string or "(null)" if string parameter is not defined.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWReadCompoundWordDataFromFile( "samples/compoundword.txt" ); $interface->XTWCreateCompoundWordBST(); my $compoundedString = $interface->XTWCompoundifyString( "String to compoundify" ); print( "Compounded String: $compoundedString\n" ); undef( $interface );
Reads compound word file and stores in memory. $autoSetMaxCompWordLength parameter is not required to be set. This parameter instructs the method to auto set the maximum compound word length dependent on the longest compound word found. Note: $autoSetMaxCompWordLength options: defined = True and Undefined = False.
$filePath -> Compound word file path
$autoSetMaxCompWordLength -> Maximum length of a given compoundified phrase the module's compoundify algorithm will permit. Note: Calling this method with $autoSetMaxCompWordLength defined will automatically set the maxCompoundWordLength variable to the longest compound phrase.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWReadCompoundWordDataFromFile( "samples/compoundword.txt", 1 ); undef( $interface );
Saves compound word data in memory to a specified file location.
$savePath -> Path to save compound word list to file.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWReadCompoundWordDataFromFile( "samples/compoundword.txt" ); $interface->XTWSaveCompoundWordDataFromFile( "samples/newcompoundword.txt" ); undef( $interface );
Reads a plain text file with utf8 encoding into memory. Returns string data if successful and "(null)" if unsuccessful.
$filePath -> Text file to read into memory
$string -> String data if successful or "(null)" if un-successful.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $textData = $interface->XTWReadTextFromFile( "samples/textcorpus.txt" ); print( "Text Data: $textData\n" ); undef( $interface );
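The documented behavior (read a utf8 text file, return "(null)" on failure) can be approximated with core Perl alone. The sketch below is an assumption about how such a reader might look, not the module's implementation; read_text_from_file is a hypothetical name, and a temporary file is created only so the example can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper (not the module's code): slurp a UTF-8 text file
# and return "(null)" if the path is undefined or cannot be opened.
sub read_text_from_file {
    my ( $filePath ) = @_;
    return "(null)" if !defined( $filePath );
    open( my $fh, '<:encoding(UTF-8)', $filePath ) or return "(null)";
    local $/ = undef;    # slurp mode: read the whole file at once
    my $data = <$fh>;
    close( $fh );
    return defined( $data ) ? $data : "(null)";
}

# Create a small temporary file so the sketch is self-contained.
my ( $tmpFh, $tmpPath ) = tempfile( UNLINK => 1 );
binmode( $tmpFh, ':encoding(UTF-8)' );
print {$tmpFh} "Heart attack\n";
close( $tmpFh );

my $text = read_text_from_file( $tmpPath );
print( "Text Data: $text" );
```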
Saves a plain text file with utf8 encoding in a specified location.
$savePath -> Path to save string data.
$string -> String to save
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $result = $interface->XTWSaveTextToFile( "text.txt", "Hello world!" ); print( "File saved\n" ) if $result == 0; print( "File unable to save\n" ) if $result == -1; undef( $interface );
Reads an XML file from a specified location. Returns string in memory if successful and "(null)" if unsuccessful.
$filePath -> File to read given path
Warning: This is a private function and is called by XML::Twig parsing functions. It should not be called outside of xmltow2v module.
Saves text corpus data to specified file path. This method will append to any existing file if $appendToFile parameter is defined or "overwrite" option is disabled. Enabling "overwrite" option will overwrite any existing files.
$savePath -> Path to save the text corpus
$appendToFile -> Specifies whether the module will overwrite any existing data or append to existing text corpus data. Note: Leaving this variable undefined will fetch the "Overwrite" member variable and set the value to this parameter.
Checks to see if $date is within $beginDate and $endDate range. Returns 1 if true and 0 if false. Note: Date Format: XX/XX/XXXX (Month/Day/Year)
$date -> Date to check against minimum and maximum date range. (String)
$beginDate -> Minimum date range (String)
$endDate -> Maximum date range (String)
$value -> '1' = True/Date is within specified range Or '0' = False/Date is not within specified range.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); print( "Is \"01/01/2004\" within the date range: \"02/21/1985\" to \"08/13/2016\"?\n" ); print( "Yes\n" ) if $interface->XTWIsDateInSpecifiedRange( "01/01/2004", "02/21/1985", "08/13/2016" ) == 1; print( "No\n" ) if $interface->XTWIsDateInSpecifiedRange( "01/01/2004", "02/21/1985", "08/13/2016" ) == 0; undef( $interface );
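A range check over Month/Day/Year strings can be sketched by converting each date to a sortable integer. This is a standalone illustration under stated assumptions: the helper names date_key and is_date_in_range are hypothetical, the bounds are treated as inclusive, and malformed dates are treated as out of range; the module's own handling of those cases is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "MM/DD/YYYY" into the sortable integer
# YYYYMMDD, or return undef if the string does not match the format.
sub date_key {
    my ( $date ) = @_;
    return undef unless defined( $date ) && $date =~ m{^(\d{2})/(\d{2})/(\d{4})$};
    return $3 * 10000 + $1 * 100 + $2;
}

# Returns 1 if $date lies within [$beginDate, $endDate], otherwise 0.
# Assumption: both bounds are inclusive and malformed input yields 0.
sub is_date_in_range {
    my ( $date, $beginDate, $endDate ) = @_;
    my ( $key, $low, $high ) = map { date_key( $_ ) } ( $date, $beginDate, $endDate );
    return 0 if grep { !defined( $_ ) } ( $key, $low, $high );
    return ( $key >= $low && $key <= $high ) ? 1 : 0;
}

print( "In range\n" )     if is_date_in_range( "01/01/2004", "02/21/1985", "08/13/2016" ) == 1;
print( "Not in range\n" ) if is_date_in_range( "01/01/1980", "02/21/1985", "08/13/2016" ) == 0;
```

Comparing YYYYMMDD integers avoids the pitfall of comparing the original strings directly, where "02/21/1985" would sort after "01/01/2004".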
Checks to see if specified path is a file or directory.
$path -> File or directory path. (String)
$string -> Returns: "file" = file, "dir" = directory and "unknown" if the path is not a file or directory (undefined).
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $path = "path/to/a/directory"; print( "Is \"$path\" a file or directory? " . $interface->XTWIsFileOrDirectory( $path ) . "\n" ); $path = "path/to/a/file.file"; print( "Is \"$path\" a file or directory? " . $interface->XTWIsFileOrDirectory( $path ) . "\n" ); undef( $interface );
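The three return values described above map naturally onto Perl's built-in file test operators. The following is a minimal sketch of that idea, with the hypothetical helper name file_or_directory; it is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a path using the -f (plain file) and
# -d (directory) file test operators.
sub file_or_directory {
    my ( $path ) = @_;
    return "unknown" if !defined( $path );
    return "file"    if -f $path;
    return "dir"     if -d $path;
    return "unknown";
}

# The current directory always exists, so this reports a directory.
print( "Current directory is a: " . file_or_directory( "." ) . "\n" );
```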
Removes special characters from string parameter, removes extra spaces and converts text to lowercase. Note: This method is called when parsing and compiling Medline title/abstract data.
$string -> String passed to remove special characters from and convert to lowercase.
$string -> String with all special characters removed and converted to lowercase.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $str = "Heart Attack is$ an!@ also KNOWN as an Acute MYOCARDIAL inFARCTion!"; print( "Original String: $str\n" ); $str = $interface->XTWRemoveSpecialCharactersFromString( $str ); print( "Modified String: $str\n" ); undef( $interface );
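The three steps named above (strip special characters, collapse extra spaces, lowercase) can be sketched with a few substitutions. Which characters the module treats as special is not specified here, so this sketch assumes that only ASCII letters, digits, and whitespace are kept; clean_text is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code).
# Assumption: "special characters" means anything other than ASCII
# letters, digits, and whitespace.
sub clean_text {
    my ( $text ) = @_;
    $text = lc( $text );            # convert to lowercase
    $text =~ s/[^a-z0-9\s]//g;      # drop special characters
    $text =~ s/\s+/ /g;             # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;        # trim leading and trailing spaces
    return $text;
}

my $original = 'Heart Attack is$ an!@ also KNOWN as an Acute MYOCARDIAL inFARCTion!';
my $cleaned  = clean_text( $original );
print( "Modified String: $cleaned\n" );
```

Under that assumption the sample sentence becomes "heart attack is an also known as an acute myocardial infarction".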
Returns file data type (string).
$filePath -> File to check located at file path
$string -> File type
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $fileType = $interface->XTWGetFileType( "samples/textcorpus.txt" ); undef( $interface );
Checks specified begin and end date strings for formatting and logic errors.
$value -> "0" = Passed Checks / "-1" = Failed Checks
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); print "Passed Date Checks\n" if ( $interface->_DateCheck() == 0 ); print "Failed Date Checks\n" if ( $interface->_DateCheck() == -1 ); undef( $interface );
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $debugLog = $interface->XTWGetDebugLog(); print( "Debug Logging Enabled\n" ) if $debugLog == 1; print( "Debug Logging Disabled\n" ) if $debugLog == 0; undef( $interface );
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $writeLog = $interface->XTWGetWriteLog(); print( "Write Logging Enabled\n" ) if $writeLog == 1; print( "Write Logging Disabled\n" ) if $writeLog == 0; undef( $interface );
Returns the _storeTitle member variable set during Word2vec::Interface object instantiation of new function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $storeTitle = $interface->XTWGetStoreTitle(); print( "Store Title Option: Enabled\n" ) if $storeTitle == 1; print( "Store Title Option: Disabled\n" ) if $storeTitle == 0; undef( $interface );
Returns the _storeAbstract member variable set during Word2vec::Interface object instantiation of new function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $storeAbstract = $interface->XTWGetStoreAbstract(); print( "Store Abstract Option: Enabled\n" ) if $storeAbstract == 1; print( "Store Abstract Option: Disabled\n" ) if $storeAbstract == 0; undef( $interface );
Returns the _quickParse member variable set during Word2vec::Interface object instantiation of new function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $quickParse = $interface->XTWGetQuickParse(); print( "Quick Parse Option: Enabled\n" ) if $quickParse == 1; print( "Quick Parse Option: Disabled\n" ) if $quickParse == 0; undef( $interface );
Returns the _compoundifyText member variable set during Word2vec::Interface object instantiation of new function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $compoundify = $interface->XTWGetCompoundifyText(); print( "Compoundify Text Option: Enabled\n" ) if $compoundify == 1; print( "Compoundify Text Option: Disabled\n" ) if $compoundify == 0; undef( $interface );
Returns the _storeAsSentencePerLine member variable set during Word2vec::Xmltow2v object instantiation of new function.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $storeAsSentencePerLine = $interface->XTWGetStoreAsSentencePerLine(); print( "Store As Sentence Per Line: Enabled\n" ) if $storeAsSentencePerLine == 1; print( "Store As Sentence Per Line: Disabled\n" ) if $storeAsSentencePerLine == 0; undef( $interface );
Returns the _numOfThreads member variable set during Word2vec::Interface object instantiation of new function.
$value -> Number of threads
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $numOfThreads = $interface->XTWGetNumOfThreads(); print( "Number of threads: $numOfThreads\n" ); undef( $interface );
Returns the _workingDir member variable set during Word2vec::Interface object instantiation of new function.
$string -> Working directory string
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $workingDirectory = $interface->XTWGetWorkingDir(); print( "Working Directory: $workingDirectory\n" ); undef( $interface );
Returns the _saveDir member variable set during Word2vec::Interface object instantiation of new function.
$string -> Save directory string
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $savePath = $interface->XTWGetSavePath(); print( "Save Directory: $savePath\n" ); undef( $interface );
Returns the _beginDate member variable set during Word2vec::Interface object instantiation of new function.
$date -> Beginning date range - Format: XX/XX/XXXX (Mon/Day/Year)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $date = $interface->XTWGetBeginDate(); print( "Date: $date\n" ); undef( $interface );
Returns the _endDate member variable set during Word2vec::Interface object instantiation of new function.
$date -> End date range - Format: XX/XX/XXXX (Mon/Day/Year).
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $date = $interface->XTWGetEndDate(); print( "Date: $date\n" ); undef( $interface );
Returns the XML data (string) to be parsed.
Returns the _xmlStringToParse member variable set during Word2vec::Interface object instantiation of new function.
$string -> Medline XML data string
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $xmlStr = $interface->XTWGetXMLStringToParse(); print( "XML String: $xmlStr\n" ); undef( $interface );
Returns the _textCorpusStr member variable set during Word2vec::Interface object instantiation of new function.
$string -> Text corpus string
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $str = $interface->XTWGetTextCorpusStr(); print( "Text Corpus: $str\n" ); undef( $interface );
Returns the _fileHandle member variable set during Word2vec::Interface object instantiation of new function. Warning: This is a private function. File handle is used by 'xmltow2v::WriteLog()' method. Do not manipulate this file handle as errors can result.
$fileHandle -> Returns file handle for WriteLog() method.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $fileHandle = $interface->XTWGetFileHandle(); undef( $interface );
Returns XML::Twig handler.
Returns the _twigHandler member variable set during Word2vec::Interface object instantiation of new function. Warning: This is a private function and should not be called or manipulated.
$twigHandler -> XML::Twig handler.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $xmlHandler = $interface->XTWGetTwigHandler(); undef( $interface );
Returns the _parsedCount member variable set during Word2vec::Interface object instantiation of new function.
$value -> Number of parsed Medline articles.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $numOfParsed = $interface->XTWGetParsedCount(); print( "Number of parsed Medline articles: $numOfParsed\n" ); undef( $interface );
Returns the _tempStr member variable set during Word2vec::Interface object instantiation of new function. Warning: This is a private function and should not be called or manipulated. Used by module as a temporary storage location for parsed Medline 'Title' and 'Abstract' flag string data.
$string -> Temporary string storage location.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $tempStr = $interface->XTWGetTempStr(); print( "Temp String: $tempStr\n" ); undef( $interface );
Returns the _tempDate member variable set during Word2vec::Interface object instantiation of new function. Used by module as a temporary storage location for parsed Medline 'DateCreated' flag string data.
$date -> Date string - Format: XX/XX/XXXX (Mon/Day/Year).
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $date = $interface->XTWGetTempDate(); print( "Temp Date: $date\n" ); undef( $interface );
Returns the _compoundWordAry member array reference set during Word2vec::Interface object instantiation of new function. Warning: Compound word data must be loaded in memory first via XTWReadCompoundWordDataFromFile().
$arrayReference -> Compound word array reference.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $arrayReference = $interface->XTWGetCompoundWordAry(); my @compoundWord = @{ $arrayReference }; print( "Compound Word Array: @compoundWord\n" ); undef( $interface );
Returns the _compoundWordBST member variable set during Word2vec::Interface object instantiation of new function.
$bst -> Compound word binary search tree.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $bst = $interface->XTWGetCompoundWordBST(); undef( $interface );
Returns the _maxCompoundWordLength member variable set during Word2vec::Interface object instantiation of new function. Note: If not defined, it is automatically set to 20 and that value is returned.
$value -> Maximum number of compound words in a given phrase.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $compoundWordLength = $interface->XTWGetMaxCompoundWordLength(); print( "Maximum Compound Word Length: $compoundWordLength\n" ); undef( $interface );
Returns the _overwriteExisitingFile member variable set during Word2vec::Interface object instantiation of new function. Enables overwriting of existing text corpus if set to '1' or appends to the existing text corpus if set to '0'.
$value -> '1' = Overwrite existing file / '0' = Append to existing file.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); my $overwriteExistingFile = $interface->XTWGetOverwriteExistingFile(); print( "Overwrite Existing File? YES\n" ) if ( $overwriteExistingFile == 1 ); print( "Overwrite Existing File? NO\n" ) if ( $overwriteExistingFile == 0 ); undef( $interface );
Sets member variable to passed integer parameter. Instructs module to store article title if true or omit if false.
$value -> '1' = Store Titles / '0' = Omit Titles
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetStoreTitle( 1 ); undef( $interface );
Sets member variable to passed integer parameter. Instructs module to store article abstracts if true or omit if false.
$value -> '1' = Store Abstracts / '0' = Omit Abstracts
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetStoreAbstract( 1 ); undef( $interface );
Sets member variable to passed string parameter. Represents the working directory.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetWorkingDir( "/samples/" ); undef( $interface );
Sets member variable to passed string parameter. Represents the text corpus save path.
$string -> Text corpus save path
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetSavePath( "samples/textcorpus.txt" ); undef( $interface );
Sets member variable to passed integer parameter. Instructs module to utilize quick parse routines to speed up text corpus compilation. This method is somewhat less accurate due to its non-exhaustive nature.
$value -> '1' = Enable Quick Parse / '0' = Disable Quick Parse
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetQuickParse( 1 ); undef( $interface );
Sets member variable to passed integer parameter. Instructs module to utilize 'compoundify' option if true. Warning: This requires compound word data to be loaded into memory with XTWReadCompoundWordDataFromFile() method prior to executing text corpus compilation.
$value -> '1' = Compoundify text / '0' = Do not compoundify text
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetCompoundifyText( 1 ); undef( $interface );
Sets member variable to passed integer parameter. Instructs module to utilize 'storeAsSentencePerLine' option if true.
$value -> '1' = Store as sentence per line / '0' = Do not store as sentence per line
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetStoreAsSentencePerLine( 1 ); undef( $interface );
Sets member variable to passed integer parameter. Sets the requested number of threads to parse Medline XML files and compile the text corpus.
$value -> Integer (Positive value)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetNumOfThreads( 4 ); undef( $interface );
Sets member variable to passed string parameter. Sets beginning date range for earliest articles to store, by 'DateCreated' Medline tag, within the text corpus during compilation. Note: Expected format - "XX/XX/XXXX" (Mon/Day/Year)
$string -> Date string - Format: "XX/XX/XXXX"
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetBeginDate( "01/01/2004" ); undef( $interface );
Sets member variable to passed string parameter. Sets ending date range for latest article to store, by 'DateCreated' Medline tag, within the text corpus during compilation. Note: Expected format - "XX/XX/XXXX" (Mon/Day/Year)
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetEndDate( "08/13/2016" ); undef( $interface );
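As a rough illustration of how a begin/end date range in the "XX/XX/XXXX" (Mon/Day/Year) format could be applied to an article date, the sketch below converts each date into a sortable YYYYMMDD key and compares the keys. The helpers DateToKey and IsWithinRange are hypothetical names introduced for this example; they are not part of Word2vec::Interface, and the module's own comparison logic may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Interface):
# convert "MM/DD/YYYY" into a sortable integer key of the form YYYYMMDD.
# Returns undef if the string does not match the expected format.
sub DateToKey
{
    my ( $date ) = @_;
    return undef unless ( defined( $date ) && $date =~ m{^(\d{2})/(\d{2})/(\d{4})$} );
    return $3 * 10000 + $1 * 100 + $2;
}

# Hypothetical helper: returns 1 if $date lies within [$begin, $end] inclusive,
# 0 otherwise (including when any of the three dates is malformed).
sub IsWithinRange
{
    my ( $date, $begin, $end ) = @_;
    my $dateKey  = DateToKey( $date );
    my $beginKey = DateToKey( $begin );
    my $endKey   = DateToKey( $end );
    return 0 unless ( defined( $dateKey ) && defined( $beginKey ) && defined( $endKey ) );
    return ( $dateKey >= $beginKey && $dateKey <= $endKey ) ? 1 : 0;
}

print( IsWithinRange( "06/15/2010", "01/01/2004", "08/13/2016" ), "\n" );   # inside the range
print( IsWithinRange( "12/31/2003", "01/01/2004", "08/13/2016" ), "\n" );   # before the begin date
```

Comparing integer keys avoids the pitfall of comparing "XX/XX/XXXX" strings directly, where the month would dominate the ordering instead of the year.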
Sets member variable to passed string parameter. This string normally consists of Medline XML data to be parsed for text corpus compilation. Warning: This is a private function and should not be called or manipulated.
$string -> String
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetXMLStringToParse( "Hello World!" ); undef( $interface );
Sets member variable to passed string parameter. Overwrites any stored text corpus data in memory to the string parameter. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetTextCorpusStr( "Hello World!" ); undef( $interface );
Sets member variable to passed string parameter. Appends string parameter to text corpus string in memory. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWAppendStrToTextCorpus( "Hello World!" ); undef( $interface );
Clears text corpus data in memory. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWClearTextCorpus(); undef( $interface );
Sets member variable to passed string parameter. Sets temporary member string to passed string parameter. (Temporary placeholder for Medline Title and Abstract data). Note: This removes special characters and converts all characters to lowercase. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetTempStr( "Hello World!" ); undef( $interface );
Appends string parameter to temporary member string in memory. Note: This removes special characters and converts all characters to lowercase. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWAppendToTempStr( "Hello World!" ); undef( $interface );
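The normalization mentioned in the notes above (special characters removed, all characters lowercased) might look roughly like the following. This is only a sketch: the documentation does not say which characters the module treats as special, so the example assumes that anything other than letters, digits and whitespace is dropped. NormalizeStr is a hypothetical name, not a function of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Interface).
# Assumption: "special characters" means anything that is not a letter,
# a digit or whitespace; the module's actual rules may differ.
sub NormalizeStr
{
    my ( $str ) = @_;
    $str = lc( $str );                 # convert to lowercase
    $str =~ s/[^a-z0-9\s]+/ /g;        # replace runs of special characters with a space
    $str =~ s/\s+/ /g;                 # collapse repeated whitespace
    $str =~ s/^\s+|\s+$//g;            # trim leading and trailing whitespace
    return $str;
}

print( NormalizeStr( "Hello World!" ), "\n" );
```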
Clears the temporary string storage in memory. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWClearTempStr(); undef( $interface );
Sets member variable to passed string parameter. Sets temporary date string to passed string. Note: Date Format - "XX/XX/XXXX" (Mon/Day/Year). Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetTempDate( "08/13/2016" ); undef( $interface );
Clears the temporary date storage location in memory. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWClearTempDate(); undef( $interface );
Sets member variable to de-referenced passed array reference parameter. Stores compound word array by de-referencing array reference parameter. Note: Clears previous data if existing. Warning: This is a private function and should not be called or manipulated.
$arrayReference -> Array reference of compound words
use Word2vec::Interface; my @compoundWordAry = ( "big dog", "respiratory failure", "seven large masses" ); my $interface = Word2vec::Interface->new(); $interface->XTWSetCompoundWordAry( \@compoundWordAry ); undef( $interface );
Clears compound word array in memory. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWClearCompoundWordAry(); undef( $interface );
Sets member variable to passed Word2vec::Bst parameter. Sets compound word binary search tree to passed binary tree parameter. Note: Un-defines previous binary tree if existing. Warning: This is a private function and should not be called or manipulated.
Word2vec::Bst -> Binary Search Tree
use Word2vec::Interface; use Word2vec::Bst; my @compoundWordAry = ( "big dog", "respiratory failure", "seven large masses" ); @compoundWordAry = sort( @compoundWordAry ); my $arySize = @compoundWordAry; my $bst = Word2vec::Bst->new(); $bst->CreateTree( \@compoundWordAry, 0, $arySize, undef ); my $interface = Word2vec::Interface->new(); $interface->XTWSetCompoundWordBST( $bst ); undef( $interface );
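The example above sorts the compound word array before building the tree. To show why sorted input is useful, here is a generic sketch that builds a balanced binary search tree from a sorted array and looks up a phrase in it. It uses plain hash references and does not reproduce the internals of Word2vec::Bst, which are not shown in this documentation; BuildTree and Contains are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: build a balanced tree from the sorted slice [$low, $high]
# by making the middle element the root and recursing on both halves.
sub BuildTree
{
    my ( $aryRef, $low, $high ) = @_;
    return undef if ( $low > $high );
    my $mid = int( ( $low + $high ) / 2 );
    my %node = (
        value => $aryRef->[ $mid ],
        left  => BuildTree( $aryRef, $low, $mid - 1 ),
        right => BuildTree( $aryRef, $mid + 1, $high ),
    );
    return \%node;
}

# Hypothetical helper: returns 1 if $target is stored in the tree, 0 otherwise.
sub Contains
{
    my ( $node, $target ) = @_;
    while ( defined( $node ) )
    {
        my $cmp = $target cmp $node->{ value };
        return 1 if ( $cmp == 0 );
        $node = ( $cmp < 0 ) ? $node->{ left } : $node->{ right };
    }
    return 0;
}

my @compoundWordAry = sort( "big dog", "respiratory failure", "seven large masses" );
my $root = BuildTree( \@compoundWordAry, 0, $#compoundWordAry );

print( Contains( $root, "respiratory failure" ), "\n" );
print( Contains( $root, "small cat" ), "\n" );
```

Because the array is sorted, picking the middle element as the root at each step keeps the tree balanced, so a lookup visits a number of nodes logarithmic in the number of stored phrases.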
Clears/Un-defines existing compound word binary search tree from memory. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWClearCompoundWordBST(); undef( $interface );
Sets member variable to passed integer parameter. Sets maximum number of words in a phrase that will be considered for compounding. For example, "medical campus of Virginia Commonwealth University" can be interpreted as a compound word of 6 words. Setting this variable to 3 will only attempt to compoundify a maximum of three words at a time. The result would be "medical_campus_of virginia commonwealth university", even though an exact representation of the full compounded string may exist. Setting this variable to 6 will result in compounding all six words if they exist in the compound word array/bst. Warning: This is a private function and should not be called or manipulated.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetMaxCompoundWordLength( 8 ); undef( $interface );
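The effect of the maximum compound word length can be sketched with a greedy longest-match pass over the words of a phrase. This is an illustration under stated assumptions, not the module's implementation: it uses a plain hash lookup in place of the compound word array/BST, it lowercases the text, and Compoundify is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Interface): joins known compound
# phrases with underscores, considering at most $maxLength words at a time.
sub Compoundify
{
    my ( $text, $compoundWordsRef, $maxLength ) = @_;

    # A hash stands in for the compound word array/BST in this sketch
    my %known = ();
    $known{ lc( $_ ) } = 1 foreach ( @{ $compoundWordsRef } );

    my @words  = split( /\s+/, lc( $text ) );
    my @result = ();
    my $i      = 0;

    while ( $i < @words )
    {
        my $matched = 0;
        my $limit   = @words - $i;
        $limit = $maxLength if ( $maxLength < $limit );

        # Try the longest candidate phrase first, then shorter ones
        for ( my $len = $limit; $len >= 2; $len-- )
        {
            my @candidate = @words[ $i .. $i + $len - 1 ];
            if ( exists( $known{ join( " ", @candidate ) } ) )
            {
                push( @result, join( "_", @candidate ) );
                $i += $len;
                $matched = 1;
                last;
            }
        }

        # No compound starts at this word; keep it unchanged
        if ( !$matched )
        {
            push( @result, $words[ $i ] );
            $i++;
        }
    }

    return join( " ", @result );
}

my @compoundWords = ( "medical campus of", "medical campus of virginia commonwealth university" );
my $text = "medical campus of Virginia Commonwealth University";

print( Compoundify( $text, \@compoundWords, 3 ), "\n" );   # only the three-word phrase can match
print( Compoundify( $text, \@compoundWords, 6 ), "\n" );   # the full six-word phrase can match
```

A smaller limit bounds the number of candidate phrases tried at each position, at the cost of missing longer compounds that are present in the compound word data.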
Sets member variable to passed integer parameter. Sets option to overwrite existing text corpus during compilation if 1 or append to existing text corpus if 0.
$value -> '1' = Overwrite existing text corpus / '0' = Append to existing text corpus during compilation.
use Word2vec::Interface; my $interface = Word2vec::Interface->new(); $interface->XTWSetOverWriteExistingFile( 1 ); undef( $interface );
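The difference between the two settings corresponds to the usual write and append file modes. The sketch below shows that distinction with Perl's built-in open; WriteCorpus is a hypothetical helper, and the way the module actually writes the text corpus may differ.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper (not part of Word2vec::Interface): writes $text to $path,
# truncating the file first if $overwrite is 1 and appending if it is 0.
# Returns 0 on success and -1 if the file cannot be opened.
sub WriteCorpus
{
    my ( $path, $text, $overwrite ) = @_;
    my $mode = ( $overwrite == 1 ) ? ">" : ">>";
    open( my $fh, $mode, $path ) or return -1;
    print( $fh $text );
    close( $fh );
    return 0;
}

# Use a temporary file so the example does not touch real data
my ( $tmpFh, $path ) = tempfile( UNLINK => 1 );
close( $tmpFh );

WriteCorpus( $path, "first line\n", 1 );    # overwrite: file holds only "first line"
WriteCorpus( $path, "second line\n", 0 );   # append: file now holds both lines
WriteCorpus( $path, "third line\n", 1 );    # overwrite: earlier lines are discarded

open( my $in, "<", $path ) or die( "Cannot read $path\n" );
print( <$in> );
close( $in );
```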
Clint Cuffy, Virginia Commonwealth University
Copyright (c) 2016
Bridget T McInnes, Virginia Commonwealth University btmcinnes at vcu dot edu Clint Cuffy, Virginia Commonwealth University cuffyca at vcu dot edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.