liblevenshtein 4.0.0
A library for generating Finite State Transducers based on Levenshtein Automata.
Loading...
Searching...
No Matches
liblevenshtein Namespace Reference

Various utilities regarding Levenshtein transducers. More...

Namespaces

namespace  demo
 
namespace  distance
 Memoized, recursive distance metrics typically used to evaluate the correctness of Levenshtein automata.
 

Classes

class  Dawg
 A Directed Acyclic Word Graph (DAWG) maps sequences of characters to form words; the collection of words is known as a dictionary. More...
 
class  DawgIterator
 Iterates over all the terms in a DAWG dictionary. More...
 
class  DawgNode
 Represents a position within one or more terms of a DAWG dictionary. More...
 
class  Intersection
 Represents an Intersection between a dictionary automaton and Levenshtein automaton, guided by the query term. More...
 
class  LazyIterator
 Lazily traverses the intersection between the dictionary automaton and Levenshtein automaton, guided by the query term, and yields each spelling candidate as it is matched. More...
 
class  LazyQuery
 Lazily traverses the intersection between the dictionary automaton and Levenshtein automaton, guided by the query term, and yields each spelling candidate as it is matched. More...
 
class  Position
 Represents a location within the Levenshtein automaton. More...
 
class  Prefix
 Represents the prefix of a dictionary term. More...
 
class  SortedDawg
 A specific type of Dawg that is constructed over lexicographically sorted terms. More...
 
class  State
 Consists of a closure of Position nodes within the Levenshtein automaton. More...
 
class  StateIterator
 Iterates over the Position nodes in the linked-list of a Levenshtein State. More...
 
class  StateTransition
 Transitions Levenshtein States given a characteristic vector. More...
 
class  Transducer
 Constructs a Levenshtein Transducer that, when given a dictionary automaton, query term, and maximum edit distance, n, is able to match all spelling candidates in the dictionary automaton for the query term whose edit distances are no larger than n. More...
 
class  Transition
 Represents an edge from one DawgNode to another, annotated with a character label from the current position of the dictionary term. More...
 
class  UnsubsumeFn
 Removes (unsubsumes) all Positions from a State that are subsumed by another Position within the same State. More...
 

Typedefs

using TransitionFn = std::function<State *(State *, std::vector<bool> &)>
 Returns the Levenshtein State mapped-to by another State and characteristic vector.
 
using DistanceFn = std::function<std::size_t(State *, std::size_t)>
 Infers the Levenshtein distance from a final Levenshtein State and the length of the query term.
 
using Comparator = std::function<int(Position *, Position *)>
 
using PositionTransitionFn
 Transitions Position nodes.
 
using CompareFn = std::function<int(Position *, Position *)>
 Compares Position nodes.
 
using MergeFn = std::function<void(State *, std::vector<Position *>)>
 Merges Position nodes into a State.
 
using Candidate = std::pair<std::string, std::size_t>
 Spelling candidate that includes both the dictionary term and its edit distance from the query term.
 
using SubsumesFn = std::function<bool(Position *, Position *, std::size_t)>
 Determines whether one Position subsumes another given the maximum edit distance to consider.
 

Enumerations

enum class  Algorithm { STANDARD , TRANSPOSITION , MERGE_AND_SPLIT }
 Enumerates the available Levenshtein distance algorithms. More...
 

Functions

auto operator<< (std::ostream &out, const Dawg &dawg) -> std::ostream &
 
auto operator<< (std::ostream &out, const DawgNode &node) -> std::ostream &
 
auto operator<< (std::ostream &out, const Prefix &prefix) -> std::ostream &
 
template<class IterType >
auto sorted_dawg (IterType iter, IterType end) -> SortedDawg *
 Factory function that initializes a new SortedDawg using the terms yielded from an iterator.
 
template auto sorted_dawg (std::list< std::string >::iterator iter, std::list< std::string >::iterator end) -> SortedDawg *
 
template auto sorted_dawg (std::vector< std::string >::iterator iter, std::vector< std::string >::iterator end) -> SortedDawg *
 
template auto sorted_dawg (std::set< std::string >::iterator iter, std::set< std::string >::iterator end) -> SortedDawg *
 
auto serialize_protobuf (Dawg *dawg, const fs::path &path) -> bool
 Serializes the given Dawg to protobuf at the given path.
 
auto serialize_protobuf (Dawg *dawg, const std::string &path) -> bool
 Serializes the given Dawg to protobuf at the given path.
 
auto serialize_protobuf (Dawg *dawg, const char *path) -> bool
 Serializes the given Dawg to protobuf at the given path.
 
auto serialize_protobuf (Dawg *dawg, std::ostream &output) -> bool
 Serializes the given Dawg to protobuf into the given output stream.
 
auto deserialize_protobuf (const fs::path &path) -> Dawg *
 Deserializes the protobuf containing a Dawg at the given path or returns nullptr if none exists.
 
auto deserialize_protobuf (const std::string &path) -> Dawg *
 Deserializes the protobuf containing a Dawg at the given path or returns nullptr if none exists.
 
auto deserialize_protobuf (const char *path) -> Dawg *
 Deserializes the protobuf containing a Dawg at the given path or returns nullptr if none exists.
 
auto deserialize_protobuf (std::istream &input) -> Dawg *
 Deserializes the protobuf containing a Dawg from the given input stream or returns nullptr if none exists.
 
void collect_nodes (DawgNode *source, std::set< uint64_t > &node_ids, std::set< uint64_t > &final_node_ids)
 Collects the DawgNode IDs and final DawgNode IDs of all nodes reachable from the source.
 
void collect_edges (DawgNode *source, std::map< std::pair< uint64_t, char >, uint64_t > &edges)
 Collects the transitions from each source to its destination, and the respective character labels.
 
auto to_protobuf (Dawg *dawg) -> llp::Dictionary *
 Serializes a Dawg to its protobuf equivalent.
 
auto from_protobuf (const llp::Dictionary &dict_proto) -> Dawg *
 Deserializes a Dawg from its protobuf equivalent.
 
template<>
auto compare< Algorithm::STANDARD > (Position *lhs, Position *rhs) -> int
 Compares two Positions for the standard Levenshtein transducer.
 
template<>
auto compare< Algorithm::TRANSPOSITION > (Position *lhs, Position *rhs) -> int
 Compares two Positions for the standard Levenshtein transducer extended with transposition.
 
template<>
auto compare< Algorithm::MERGE_AND_SPLIT > (Position *lhs, Position *rhs) -> int
 Compares two Positions for the standard Levenshtein transducer extended with merge and split.
 
template<Algorithm Type>
auto compare (Position *lhs, Position *rhs) -> int
 Compares two Positions within the Levenshtein transducer.
 
template<>
auto compare< Algorithm::STANDARD > (Position *lhs, Position *rhs) -> int
 Compares two Positions for the standard Levenshtein transducer.
 
template<>
auto compare< Algorithm::TRANSPOSITION > (Position *lhs, Position *rhs) -> int
 Compares two Positions for the standard Levenshtein transducer extended with transposition.
 
template<>
auto compare< Algorithm::MERGE_AND_SPLIT > (Position *lhs, Position *rhs) -> int
 Compares two Positions for the standard Levenshtein transducer extended with merge and split.
 
template<>
auto distance< Algorithm::STANDARD > (State *state, std::size_t query_length) -> std::size_t
 Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance, which includes the elementary operations of insertion, deletion, and substitution.
 
template<>
auto distance< Algorithm::TRANSPOSITION > (State *state, std::size_t query_length) -> std::size_t
 Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance extended with the elementary operation of transposition.
 
template<>
auto distance< Algorithm::MERGE_AND_SPLIT > (State *state, std::size_t query_length) -> std::size_t
 Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance extended with the elementary operations of merge and split.
 
template<Algorithm Type>
auto distance (State *state, std::size_t query_length) -> std::size_t
 Infers the Levenshtein distance from the given State and query length.
 
template<>
auto distance< Algorithm::STANDARD > (State *state, std::size_t query_length) -> std::size_t
 Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance, which includes the elementary operations of insertion, deletion, and substitution.
 
template<>
auto distance< Algorithm::TRANSPOSITION > (State *state, std::size_t query_length) -> std::size_t
 Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance extended with the elementary operation of transposition.
 
template<>
auto distance< Algorithm::MERGE_AND_SPLIT > (State *state, std::size_t query_length) -> std::size_t
 Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance extended with the elementary operations of merge and split.
 
auto operator<< (std::ostream &out, const Intersection &intersection) -> std::ostream &
 
void insert_after (State *state, Position *curr, Position *next)
 Inserts one Position after another.
 
template<>
void merge< Algorithm::STANDARD > (State *state, const std::vector< Position * > &positions)
 
template<>
void merge< Algorithm::TRANSPOSITION > (State *state, const std::vector< Position * > &positions)
 
template<>
void merge< Algorithm::MERGE_AND_SPLIT > (State *state, const std::vector< Position * > &positions)
 
template<Algorithm Type>
void merge (State *state, const std::vector< Position * > &positions)
 Merges a list of Positions into the Levenshtein state.
 
auto index_of (std::vector< bool > &characteristic_vector, std::size_t k, std::size_t i) -> std::size_t
 Returns the first index of the desired character in the characteristic vector, beginning at the offset \(i\).
 
template<>
auto position_transition< Algorithm::STANDARD > (std::size_t n, Position *position, std::vector< bool > &characteristic_vector, std::size_t offset) -> std::vector< Position * >
 Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.
 
template<>
auto position_transition< Algorithm::TRANSPOSITION > (std::size_t n, Position *position, std::vector< bool > &characteristic_vector, std::size_t offset) -> std::vector< Position * >
 Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.
 
template<>
auto position_transition< Algorithm::MERGE_AND_SPLIT > (std::size_t n, Position *position, std::vector< bool > &characteristic_vector, std::size_t offset) -> std::vector< Position * >
 Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.
 
template<Algorithm Type>
auto position_transition (std::size_t n, Position *position, std::vector< bool > &characteristic_vector, std::size_t offset) -> std::vector< Position * >
 Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.
 
template<>
auto position_transition< Algorithm::STANDARD > (std::size_t n, Position *position, std::vector< bool > &characteristic_vector, std::size_t offset) -> std::vector< Position * >
 Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.
 
template<>
auto position_transition< Algorithm::TRANSPOSITION > (std::size_t n, Position *position, std::vector< bool > &characteristic_vector, std::size_t offset) -> std::vector< Position * >
 Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.
 
template<>
auto position_transition< Algorithm::MERGE_AND_SPLIT > (std::size_t n, Position *position, std::vector< bool > &characteristic_vector, std::size_t offset) -> std::vector< Position * >
 Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.
 
template<>
auto subsumes< Algorithm::STANDARD > (Position *lhs, Position *rhs, std::size_t n) -> bool
 Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance, which consists of the elementary operations of insertion, deletion, and substitution.
 
template<>
auto subsumes< Algorithm::TRANSPOSITION > (Position *lhs, Position *rhs, std::size_t n) -> bool
 Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance extended with the elementary operation of transposition.
 
template<>
auto subsumes< Algorithm::MERGE_AND_SPLIT > (Position *lhs, Position *rhs, std::size_t n) -> bool
 Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance extended with the elementary operations of merge and split.
 
template<Algorithm Type>
auto subsumes (Position *lhs, Position *rhs, std::size_t n) -> bool
 Returns whether Position lhs subsumes Position rhs, given the maximum edit distance n.
 
template<>
auto subsumes< Algorithm::STANDARD > (Position *lhs, Position *rhs, std::size_t n) -> bool
 Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance, which consists of the elementary operations of insertion, deletion, and substitution.
 
template<>
auto subsumes< Algorithm::TRANSPOSITION > (Position *lhs, Position *rhs, std::size_t n) -> bool
 Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance extended with the elementary operation of transposition.
 
template<>
auto subsumes< Algorithm::MERGE_AND_SPLIT > (Position *lhs, Position *rhs, std::size_t n) -> bool
 Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance extended with the elementary operations of merge and split.
 

Detailed Description

Various utilities regarding Levenshtein transducers.

The classes of interest, to most, will be the various dictionaries, transducer implementations, and serializers. The remaining classes are likely of little interest to anyone but maintainers and the curiously minded.

Typedef Documentation

◆ Candidate

using liblevenshtein::Candidate = std::pair<std::string, std::size_t>

Spelling candidate that includes both the dictionary term and its edit distance from the query term.

Definition at line 17 of file transducer.h.

◆ Comparator

Definition at line 12 of file state.h.

◆ CompareFn

Compares Position nodes.

Definition at line 20 of file state_transition.h.

◆ DistanceFn

using liblevenshtein::DistanceFn = std::function<std::size_t(State *, std::size_t)>

Infers the Levenshtein distance from a final Levenshtein State and the length of the query term.

Definition at line 23 of file lazy_query.h.

◆ MergeFn

using liblevenshtein::MergeFn = std::function<void(State *, std::vector<Position *>)>

Merges Position nodes into a State.

Definition at line 23 of file state_transition.h.

◆ PositionTransitionFn

Initial value:
std::function<std::vector<Position *>(std::size_t, Position *,
std::vector<bool>&, std::size_t)>

Transitions Position nodes.

Definition at line 15 of file state_transition.h.

◆ SubsumesFn

using liblevenshtein::SubsumesFn = std::function<bool(Position *, Position *, std::size_t)>

Determines whether one Position subsumes another given the maximum edit distance to consider.

Definition at line 14 of file unsubsume.h.

◆ TransitionFn

using liblevenshtein::TransitionFn = std::function<State *(State *, std::vector<bool> &)>

Returns the Levenshtein State mapped-to by another State and characteristic vector.

Definition at line 19 of file lazy_query.h.

Enumeration Type Documentation

◆ Algorithm

Enumerates the available Levenshtein distance algorithms.

Enumerator
STANDARD 

Standard Levenshtein distance, including the elementary operations of insertion, deletion, and substitution.

TRANSPOSITION 

Standard Levenshtein distance extended with the elementary operation of transposition.

MERGE_AND_SPLIT 

Standard Levenshtein distance extended with the elementary operations of merge and split.

Definition at line 9 of file algorithm.h.

9 {
10
14
18
22};
@ MERGE_AND_SPLIT
Standard Levenshtein distance extended with the elementary operations of merge and split.
@ STANDARD
Standard Levenshtein distance, including the elementary operations of insertion, deletion,...
@ TRANSPOSITION
Standard Levenshtein distance extended with the elementary operation of transposition.

Function Documentation

◆ collect_edges()

void liblevenshtein::collect_edges ( DawgNode * source,
std::map< std::pair< uint64_t, char >, uint64_t > & edges )

Collects the transitions from each source to its destination, and the respective character labels.

Parameters
sourceInitial node for the transition.
edgesTransition mapping of source DawgNode IDs and labels to their respective target DawgNode IDs.

Definition at line 82 of file serializer.cpp.

83 {
84 auto source_id = reinterpret_cast<uint64_t>(source);
85 source->for_each_edge([&](char label, DawgNode *target) {
86 std::pair<uint64_t, char> key = std::make_pair(source_id, label);
87 if (!edges.contains(key)) {
88 auto target_id = reinterpret_cast<uint64_t>(target);
90 collect_edges(target, edges);
91 }
92 });
93}
Represents a position within one or more terms of a DAWG dictionary.
Definition dawg_node.h:20
void for_each_edge(const std::function< void(char, DawgNode *)> &fn) const
Iterates over each outgoing edge of this node and invokes a callback function with each edge's charac...
Definition dawg_node.cpp:27
void query(ll::Dawg *dawg, const std::string &query_term, std::size_t max_distance)
Definition main.cpp:25
void collect_edges(DawgNode *source, std::map< std::pair< uint64_t, char >, uint64_t > &edges)
Collects the transitions from each source to its destination, and the respective character labels.

References collect_edges(), liblevenshtein::DawgNode::for_each_edge(), and query().

Referenced by collect_edges(), and to_protobuf().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ collect_nodes()

void liblevenshtein::collect_nodes ( DawgNode * source,
std::set< uint64_t > & node_ids,
std::set< uint64_t > & final_node_ids )

Collects the DawgNode IDs and final DawgNode IDs of all nodes reachable from the source.

Parameters
sourceDawgNode whose ID is to be collected.
node_idsDawgNode IDs reachable from the source.
final_node_idsFinal DawgNode IDs reachable from the source.

Definition at line 68 of file serializer.cpp.

69 {
70 auto source_id = reinterpret_cast<uint64_t>(source);
71 if (!node_ids.contains(source_id)) {
72 node_ids.insert(source_id);
73 if (source->is_final()) {
75 }
76 source->for_each_edge([&](char label, DawgNode *target) {
77 collect_nodes(target, node_ids, final_node_ids);
78 });
79 }
80}
void is_final(bool is_final)
Specifies whether this node represents a word boundary, or immediately follows an edge having the fin...
Definition dawg_node.cpp:19

References collect_nodes(), liblevenshtein::DawgNode::for_each_edge(), liblevenshtein::DawgNode::is_final(), and query().

Referenced by collect_nodes(), and to_protobuf().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ compare()

template<Algorithm Type>
auto liblevenshtein::compare ( Position * lhs,
Position * rhs ) -> int

Compares two Positions within the Levenshtein transducer.

Parameters
lhsFirst Position to compare.
rhsSecond Position to compare.
Returns
A negative value if \(lhs < rhs\), a positive value if \(lhs > rhs\), or zero if \(lhs = rhs\).

Referenced by liblevenshtein::State::merge(), liblevenshtein::State::merge_sort(), liblevenshtein::StateTransition::operator()(), and liblevenshtein::State::sort().

Here is the caller graph for this function:

◆ compare< Algorithm::MERGE_AND_SPLIT >() [1/2]

Compares two Positions for the standard Levenshtein transducer extended with merge and split.

Parameters
lhsFirst Position to compare.
rhsSecond Position to compare.
Returns
A negative value if \(lhs < rhs\), a positive value if \(lhs > rhs\), or zero if \(lhs = rhs\).

Definition at line 53 of file comparator.cpp.

53 {
54 if (lhs->term_index() < rhs->term_index()) {
55 return -1;
56 }
57
58 if (lhs->term_index() > rhs->term_index()) {
59 return 1;
60 }
61
62 if (lhs->num_errors() < rhs->num_errors()) {
63 return -1;
64 }
65
66 if (lhs->num_errors() > rhs->num_errors()) {
67 return 1;
68 }
69
70 return static_cast<int>(lhs->is_special()) -
71 static_cast<int>(rhs->is_special());
72}

References query().

Here is the call graph for this function:

◆ compare< Algorithm::MERGE_AND_SPLIT >() [2/2]

Compares two Positions for the standard Levenshtein transducer extended with merge and split.

Parameters
lhsFirst Position to compare.
rhsSecond Position to compare.
Returns
A negative value if \(lhs < rhs\), a positive value if \(lhs > rhs\), or zero if \(lhs = rhs\).

Definition at line 53 of file comparator.cpp.

53 {
54 if (lhs->term_index() < rhs->term_index()) {
55 return -1;
56 }
57
58 if (lhs->term_index() > rhs->term_index()) {
59 return 1;
60 }
61
62 if (lhs->num_errors() < rhs->num_errors()) {
63 return -1;
64 }
65
66 if (lhs->num_errors() > rhs->num_errors()) {
67 return 1;
68 }
69
70 return static_cast<int>(lhs->is_special()) -
71 static_cast<int>(rhs->is_special());
72}

References query().

Here is the call graph for this function:

◆ compare< Algorithm::STANDARD >() [1/2]

Compares two Positions for the standard Levenshtein transducer.

Parameters
lhsFirst Position to compare.
rhsSecond Position to compare.
Returns
A negative value if \(lhs < rhs\), a positive value if \(lhs > rhs\), or zero if \(lhs = rhs\).

Definition at line 10 of file comparator.cpp.

10 {
11 if (lhs->term_index() < rhs->term_index()) {
12 return -1;
13 }
14
15 if (lhs->term_index() > rhs->term_index()) {
16 return 1;
17 }
18
19 if (lhs->num_errors() < rhs->num_errors()) {
20 return -1;
21 }
22
23 if (lhs->num_errors() > rhs->num_errors()) {
24 return 1;
25 }
26
27 return 0;
28}

References query().

Here is the call graph for this function:

◆ compare< Algorithm::STANDARD >() [2/2]

Compares two Positions for the standard Levenshtein transducer.

Parameters
lhsFirst Position to compare.
rhsSecond Position to compare.
Returns
A negative value if \(lhs < rhs\), a positive value if \(lhs > rhs\), or zero if \(lhs = rhs\).

Definition at line 10 of file comparator.cpp.

10 {
11 if (lhs->term_index() < rhs->term_index()) {
12 return -1;
13 }
14
15 if (lhs->term_index() > rhs->term_index()) {
16 return 1;
17 }
18
19 if (lhs->num_errors() < rhs->num_errors()) {
20 return -1;
21 }
22
23 if (lhs->num_errors() > rhs->num_errors()) {
24 return 1;
25 }
26
27 return 0;
28}

References query().

Here is the call graph for this function:

◆ compare< Algorithm::TRANSPOSITION >() [1/2]

Compares two Positions for the standard Levenshtein transducer extended with transposition.

Parameters
lhsFirst Position to compare.
rhsSecond Position to compare.
Returns
A negative value if \(lhs < rhs\), a positive value if \(lhs > rhs\), or zero if \(lhs = rhs\).

Definition at line 31 of file comparator.cpp.

31 {
32 if (lhs->term_index() < rhs->term_index()) {
33 return -1;
34 }
35
36 if (lhs->term_index() > rhs->term_index()) {
37 return 1;
38 }
39
40 if (lhs->num_errors() < rhs->num_errors()) {
41 return -1;
42 }
43
44 if (lhs->num_errors() > rhs->num_errors()) {
45 return 1;
46 }
47
48 return static_cast<int>(lhs->is_special()) -
49 static_cast<int>(rhs->is_special());
50}

References query().

Here is the call graph for this function:

◆ compare< Algorithm::TRANSPOSITION >() [2/2]

Compares two Positions for the standard Levenshtein transducer extended with transposition.

Parameters
lhsFirst Position to compare.
rhsSecond Position to compare.
Returns
A negative value if \(lhs < rhs\), a positive value if \(lhs > rhs\), or zero if \(lhs = rhs\).

Definition at line 31 of file comparator.cpp.

31 {
32 if (lhs->term_index() < rhs->term_index()) {
33 return -1;
34 }
35
36 if (lhs->term_index() > rhs->term_index()) {
37 return 1;
38 }
39
40 if (lhs->num_errors() < rhs->num_errors()) {
41 return -1;
42 }
43
44 if (lhs->num_errors() > rhs->num_errors()) {
45 return 1;
46 }
47
48 return static_cast<int>(lhs->is_special()) -
49 static_cast<int>(rhs->is_special());
50}

References query().

Here is the call graph for this function:

◆ deserialize_protobuf() [1/4]

auto liblevenshtein::deserialize_protobuf ( const char * path) -> Dawg *

Deserializes the protobuf containing a Dawg at the given path or returns nullptr if none exists.

Parameters
pathPath to the serialization file.
Returns
A pointer to the deserialized Dawg or nullptr if none exists.

Definition at line 52 of file serializer.cpp.

52 {
53 std::fstream input(path, std::fstream::in | std::fstream::binary);
55 input.close();
56 return dawg;
57}
A Directed Acyclic Word Graph (DAWG) maps sequences of characters to form words; the collection of wo...
Definition dawg.h:23
auto deserialize_protobuf(const fs::path &path) -> Dawg *
Deserializes the protobuf containing a Dawg at the given path or returns nullptr if none exists.

References deserialize_protobuf(), and query().

Here is the call graph for this function:

◆ deserialize_protobuf() [2/4]

auto liblevenshtein::deserialize_protobuf ( const fs::path & path) -> Dawg *

Deserializes the protobuf containing a Dawg at the given path or returns nullptr if none exists.

Parameters
pathPath to the serialization file.
Returns
The deserialized Dawg or nullptr if none exists.

Definition at line 44 of file serializer.cpp.

44 {
45 return deserialize_protobuf(path.generic_string());
46}

References deserialize_protobuf(), and query().

Referenced by deserialize_protobuf(), deserialize_protobuf(), deserialize_protobuf(), and main().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ deserialize_protobuf() [3/4]

auto liblevenshtein::deserialize_protobuf ( const std::string & path) -> Dawg *

Deserializes the protobuf containing a Dawg at the given path or returns nullptr if none exists.

Parameters
pathPath to the serialization file.
Returns
The deserialized Dawg or nullptr if none exists.

Definition at line 48 of file serializer.cpp.

48 {
49 return deserialize_protobuf(path.c_str());
50}

References deserialize_protobuf(), and query().

Here is the call graph for this function:

◆ deserialize_protobuf() [4/4]

auto liblevenshtein::deserialize_protobuf ( std::istream & input) -> Dawg *

Deserializes the protobuf containing a Dawg from the given input stream or returns nullptr if none exists.

Parameters
pathinput Input stream containing the serialized Dawg.
Returns
A pointer to the deserialized Dawg or nullptr if none exists.

Definition at line 59 of file serializer.cpp.

59 {
60 Dawg *dawg = nullptr;
61 llp::Dictionary dict_proto;
62 if (dict_proto.ParseFromIstream(&input)) {
64 }
65 return dawg;
66}
auto from_protobuf(const llp::Dictionary &dict_proto) -> Dawg *
Deserializes a Dawg from its protobuf equivalent.

References from_protobuf(), and query().

Here is the call graph for this function:

◆ distance()

template<Algorithm Type>
auto liblevenshtein::distance ( State * state,
std::size_t query_length ) -> std::size_t

Infers the Levenshtein distance from the given State and query length.

Parameters
stateLevenshtein transducer final State from which we may infer the respective distance.
query_lengthLength of the query term which is necessary for inferring the distance from the given State.
Returns
The Levenshtein distance from the given State and query length.

References distance().

Referenced by liblevenshtein::LazyQuery< Result >::advance(), liblevenshtein::distance::MergeAndSplitDistance::between(), liblevenshtein::distance::StandardDistance::between(), liblevenshtein::distance::TranspositionDistance::between(), distance(), distance< Algorithm::MERGE_AND_SPLIT >(), distance< Algorithm::STANDARD >(), distance< Algorithm::TRANSPOSITION >(), liblevenshtein::distance::MemoizedDistance::get(), liblevenshtein::distance::MemoizedDistance::set(), and liblevenshtein::LazyQuery< Result >::update_candidate().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ distance< Algorithm::MERGE_AND_SPLIT >() [1/2]

template<>
auto liblevenshtein::distance< Algorithm::MERGE_AND_SPLIT > ( State * state,
std::size_t query_length ) -> std::size_t

Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance extended with the elementary operations of merge and split.

Parameters
stateLevenshtein transducer final State from which we may infer the respective distance.
query_lengthLength of the query term which is necessary for inferring the distance from the given State.
Returns
The Levenshtein distance from the given State and query length.

Definition at line 42 of file distance.cpp.

44 {
45 std::size_t min_distance = SIZE_MAX;
46 for (Position *position : *state) {
47 if (!position->is_special()) {
48 std::size_t i = position->term_index();
49 std::size_t e = position->num_errors();
50 std::size_t distance = query_length - i + e;
51 if (distance < min_distance) {
52 min_distance = distance;
53 }
54 }
55 }
56 return min_distance;
57}
Represents a location within the Levenshtein automaton.
Definition position.h:11
auto distance(State *state, std::size_t query_length) -> std::size_t
Infers the Levenshtein distance from the given State and query length.

References distance(), and query().

Here is the call graph for this function:

◆ distance< Algorithm::MERGE_AND_SPLIT >() [2/2]

template<>
auto liblevenshtein::distance< Algorithm::MERGE_AND_SPLIT > ( State * state,
std::size_t query_length ) -> std::size_t

Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance extended with the elementary operations of merge and split.

Parameters
stateLevenshtein transducer final State from which we may infer the respective distance.
query_lengthLength of the query term which is necessary for inferring the distance from the given State.
Returns
The Levenshtein distance from the given State and query length.

Definition at line 42 of file distance.cpp.

44 {
45 std::size_t min_distance = SIZE_MAX;
46 for (Position *position : *state) {
47 if (!position->is_special()) {
48 std::size_t i = position->term_index();
49 std::size_t e = position->num_errors();
50 std::size_t distance = query_length - i + e;
51 if (distance < min_distance) {
52 min_distance = distance;
53 }
54 }
55 }
56 return min_distance;
57}

References distance(), and query().

Here is the call graph for this function:

◆ distance< Algorithm::STANDARD >() [1/2]

template<>
auto liblevenshtein::distance< Algorithm::STANDARD > ( State * state,
std::size_t query_length ) -> std::size_t

Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance, which includes the elementary operations of insertion, deletion, and substitution.

Parameters
stateLevenshtein transducer final State from which we may infer the respective distance.
query_lengthLength of the query term which is necessary for inferring the distance from the given State.
Returns
The Levenshtein distance from the given State and query length.

Definition at line 10 of file distance.cpp.

11 {
12 std::size_t min_distance = SIZE_MAX;
13 for (Position *position : *state) {
14 std::size_t i = position->term_index();
15 std::size_t e = position->num_errors();
16 std::size_t distance = query_length - i + e;
17 if (distance < min_distance) {
18 min_distance = distance;
19 }
20 }
21 return min_distance;
22}

References distance(), and query().

Here is the call graph for this function:

◆ distance< Algorithm::STANDARD >() [2/2]

template<>
auto liblevenshtein::distance< Algorithm::STANDARD > ( State * state,
std::size_t query_length ) -> std::size_t

Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance, which includes the elementary operations of insertion, deletion, and substitution.

Parameters
stateLevenshtein transducer final State from which we may infer the respective distance.
query_lengthLength of the query term which is necessary for inferring the distance from the given State.
Returns
The Levenshtein distance from the given State and query length.

Definition at line 10 of file distance.cpp.

11 {
12 std::size_t min_distance = SIZE_MAX;
13 for (Position *position : *state) {
14 std::size_t i = position->term_index();
15 std::size_t e = position->num_errors();
16 std::size_t distance = query_length - i + e;
17 if (distance < min_distance) {
18 min_distance = distance;
19 }
20 }
21 return min_distance;
22}

References distance(), and query().

Here is the call graph for this function:

◆ distance< Algorithm::TRANSPOSITION >() [1/2]

template<>
auto liblevenshtein::distance< Algorithm::TRANSPOSITION > ( State * state,
std::size_t query_length ) -> std::size_t

Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance extended with the elementary operation of transposition.

Parameters
stateLevenshtein transducer final State from which we may infer the respective distance.
query_lengthLength of the query term which is necessary for inferring the distance from the given State.
Returns
The Levenshtein distance from the given State and query length.

Definition at line 25 of file distance.cpp.

26 {
27 std::size_t min_distance = SIZE_MAX;
28 for (Position *position : *state) {
29 if (!position->is_special()) {
30 std::size_t i = position->term_index();
31 std::size_t e = position->num_errors();
32 std::size_t distance = query_length - i + e;
33 if (distance < min_distance) {
34 min_distance = distance;
35 }
36 }
37 }
38 return min_distance;
39}

References distance(), and query().

Here is the call graph for this function:

◆ distance< Algorithm::TRANSPOSITION >() [2/2]

template<>
auto liblevenshtein::distance< Algorithm::TRANSPOSITION > ( State * state,
std::size_t query_length ) -> std::size_t

Infers the Levenshtein distance from the given State and query length for the standard Levenshtein distance extended with the elementary operation of transposition.

Parameters
stateLevenshtein transducer final State from which we may infer the respective distance.
query_lengthLength of the query term which is necessary for inferring the distance from the given State.
Returns
The Levenshtein distance from the given State and query length.

Definition at line 25 of file distance.cpp.

26 {
27 std::size_t min_distance = SIZE_MAX;
28 for (Position *position : *state) {
29 if (!position->is_special()) {
30 std::size_t i = position->term_index();
31 std::size_t e = position->num_errors();
32 std::size_t distance = query_length - i + e;
33 if (distance < min_distance) {
34 min_distance = distance;
35 }
36 }
37 }
38 return min_distance;
39}

References distance(), and query().

Here is the call graph for this function:

◆ from_protobuf()

auto liblevenshtein::from_protobuf ( const llp::Dictionary & dict_proto) -> Dawg *

Deserializes a Dawg from its protobuf equivalent.

Parameters
dict_protoProtobuf containing the serialized information necessary for reconstructing the Dawg.
Returns
A pointer to the deserialized Dawg.

Definition at line 128 of file serializer.cpp.

128 {
129 std::set<uint64_t> final_node_ids;
130 for (int index = 0; index < dict_proto.final_node_id_size(); index += 1) {
131 uint64_t final_node_id = dict_proto.final_node_id(index);
133 }
134
135 std::map<uint64_t, DawgNode *> nodes;
136 for (int index = 0; index < dict_proto.node_id_size(); index += 1) {
137 uint64_t node_id = dict_proto.node_id(index);
138 if (!nodes.contains(node_id)) {
139 bool is_final = final_node_ids.contains(node_id);
140 auto *node = new DawgNode(is_final);
141 nodes[node_id] = node;
142 }
143 }
144
145 for (int index = 0; index < dict_proto.edge_size(); index += 1) {
146 const llp::Dictionary_Edge &edge_proto = dict_proto.edge(index);
147 uint64_t source_id = edge_proto.source_id();
148 char label = (char)edge_proto.label();
149 uint64_t target_id = edge_proto.target_id();
150 DawgNode *source = nodes[source_id];
151 DawgNode *target = nodes[target_id];
152 source->add_edge(label, target);
153 }
154
155 DawgNode *root = nodes[dict_proto.root_id()];
156 auto *dawg = new SortedDawg(root, dict_proto.size());
157 return dawg;
158}

References liblevenshtein::DawgNode::add_edge(), liblevenshtein::Dawg::contains(), and query().

Referenced by deserialize_protobuf().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ index_of()

auto liblevenshtein::index_of ( std::vector< bool > & characteristic_vector,
std::size_t k,
std::size_t i ) -> std::size_t

Returns the first index of the desired character in the characteristic vector, beginning at the offset \(i\).

The characteristic vector represents a sort of moving window over the characters of the query term, and flags the locations within that window whose corresponding query term characters match the desired character from the dictionary term.

Parameters
characteristic_vectorMoving window over the which query term characters match the desired one from the dictionary term.
kUpper bound of the characteristic vector.
iBeginning index (offset) to seek the desired character.
Returns
The first index from the characteristic vector matching the desired character.

Definition at line 6 of file position_transition.cpp.

7 {
8 for (std::size_t j = 0; j < k; j += 1) {
9 if (characteristic_vector[i + j]) {
10 return j;
11 }
12 }
13 return SIZE_MAX;
14}

References query().

Referenced by position_transition< Algorithm::STANDARD >(), and position_transition< Algorithm::TRANSPOSITION >().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ insert_after()

void liblevenshtein::insert_after ( State * state,
Position * curr,
Position * next )

Inserts one Position after another.

Parameters
stateLevenshtein State whose positions are being merged.
currPosition whose follow Position is being updated.
nextPosition to insert after curr.

Definition at line 8 of file merge.cpp.

8 {
9 if (curr == nullptr) {
10 state->head(next);
11 }
12 else {
13 state->insert_after(curr, next);
14 }
15}
void insert_after(Position *curr, Position *next)
Inserts a new Position into the linked-list following an existing member.
Definition state.cpp:48
void head(Position *head)
Sets the first element in the linked-list of Position nodes.
Definition state.cpp:26

References liblevenshtein::State::head(), liblevenshtein::State::insert_after(), and query().

Referenced by merge< Algorithm::MERGE_AND_SPLIT >(), merge< Algorithm::STANDARD >(), and merge< Algorithm::TRANSPOSITION >().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ merge()

template<Algorithm Type>
void liblevenshtein::merge ( State * state,
const std::vector< Position * > & positions )

Merges a list of Positions into the Levenshtein state.

This operation greatly speeds up the matching of spelling candidates.

Parameters
stateLevenshtein State whose positions are to be merged with the others.
positionsPositions to merge into the State.

Referenced by liblevenshtein::State::merge_sort(), and liblevenshtein::StateTransition::operator()().

Here is the caller graph for this function:

◆ merge< Algorithm::MERGE_AND_SPLIT >()

template<>
void liblevenshtein::merge< Algorithm::MERGE_AND_SPLIT > ( State * state,
const std::vector< Position * > & positions )

Definition at line 108 of file merge.cpp.

109 {
110 for (Position *a : positions) {
111 std::size_t i = a->term_index();
112 std::size_t e = a->num_errors();
113 bool s = a->is_special();
114
115 StateIterator iter = state->begin();
116 StateIterator iter_end = state->end();
117 Position *p = nullptr;
118
119 while (iter != iter_end) {
120 Position *b = *iter;
121 std::size_t j = b->term_index();
122 std::size_t f = b->num_errors();
123 bool t = b->is_special();
124
125 if (e < f || e == f && (i < j || i == j && !s && t)) {
126 p = b;
127 ++iter;
128 }
129 else {
130 break;
131 }
132 }
133
134 if (iter != iter_end) {
135 Position *b = *iter;
136 ++iter;
137
138 std::size_t j = b->term_index();
139 std::size_t f = b->num_errors();
140 bool t = b->is_special();
141
142 if (j != i || f != e || t != s) {
143 insert_after(state, p, a);
144 }
145 else {
146 delete a;
147 }
148 }
149 else {
150 insert_after(state, p, a);
151 }
152 }
153}
auto term_index() const -> std::size_t
Returns the current position in the dictionary term.
Definition position.cpp:23
Iterates over the Position nodes in the linked-list of a Levenshtein State.
auto begin() -> StateIterator
Returns an iterator over the Positions in this State, beginning at the head and traversing to the tai...
Definition state.cpp:134
auto end() -> StateIterator
Returns an iterator representing the end of the linked-list of Positions.
Definition state.cpp:138
void insert_after(State *state, Position *curr, Position *next)
Inserts one Position after another.
Definition merge.cpp:8

References liblevenshtein::State::begin(), liblevenshtein::State::end(), insert_after(), query(), and liblevenshtein::Position::term_index().

Here is the call graph for this function:

◆ merge< Algorithm::STANDARD >()

template<>
void liblevenshtein::merge< Algorithm::STANDARD > ( State * state,
const std::vector< Position * > & positions )

Definition at line 18 of file merge.cpp.

19 {
20 for (Position *a : positions) {
21 std::size_t i = a->term_index();
22 std::size_t e = a->num_errors();
23
24 StateIterator iter = state->begin();
25 StateIterator iter_end = state->end();
26 Position *p = nullptr;
27
28 while (iter != iter_end) {
29 Position *b = *iter;
30 std::size_t j = b->term_index();
31 std::size_t f = b->num_errors();
32
33 if ((e < f) || (e == f) && (i < j)) {
34 p = b;
35 ++iter;
36 } else {
37 break;
38 }
39 }
40
41 if (iter != iter_end) {
42 Position *b = *iter;
43 std::size_t j = b->term_index();
44 std::size_t f = b->num_errors();
45
46 if ((j != i) || (f != e)) {
47 insert_after(state, p, a);
48 }
49 else {
50 delete a;
51 }
52 }
53 else {
54 insert_after(state, p, a);
55 }
56 }
57}

References liblevenshtein::State::begin(), liblevenshtein::State::end(), insert_after(), query(), and liblevenshtein::Position::term_index().

Here is the call graph for this function:

◆ merge< Algorithm::TRANSPOSITION >()

template<>
void liblevenshtein::merge< Algorithm::TRANSPOSITION > ( State * state,
const std::vector< Position * > & positions )

Definition at line 60 of file merge.cpp.

61 {
62 for (Position *a : positions) {
63 std::size_t i = a->term_index();
64 std::size_t e = a->num_errors();
65 bool s = a->is_special();
66
67 StateIterator iter = state->begin();
68 StateIterator iter_end = state->end();
69 Position *p = nullptr;
70
71 while (iter != iter_end) {
72 Position *b = *iter;
73 std::size_t j = b->term_index();
74 std::size_t f = b->num_errors();
75 bool t = b->is_special();
76
77 if (e < f || e == f && (i < j || i == j && !s && t)) {
78 p = b;
79 ++iter;
80 }
81 else {
82 break;
83 }
84 }
85
86 if (iter != iter_end) {
87 Position *b = *iter;
88 ++iter;
89
90 std::size_t j = b->term_index();
91 std::size_t f = b->num_errors();
92 bool t = b->is_special();
93
94 if (j != i || f != e || t != s) {
95 insert_after(state, p, a);
96 }
97 else {
98 delete a;
99 }
100 }
101 else {
102 insert_after(state, p, a);
103 }
104 }
105}

References liblevenshtein::State::begin(), liblevenshtein::State::end(), insert_after(), query(), and liblevenshtein::Position::term_index().

Here is the call graph for this function:

◆ operator<<() [1/4]

auto liblevenshtein::operator<< ( std::ostream & out,
const Dawg & dawg ) -> std::ostream &

Definition at line 67 of file dawg.cpp.

67 {
68 out << "Dawg{size=" << dawg._size << ", root=" << dawg._root << "}";
69 return out;
70}

◆ operator<<() [2/4]

auto liblevenshtein::operator<< ( std::ostream & out,
const DawgNode & node ) -> std::ostream &

Definition at line 77 of file dawg_node.cpp.

77 {
78 out << "DawgNode{is_final=" << (node.is_final() ? "true" : "false") << ", edges={";
79 int index = 0;
80 node.for_each_edge([&](char label, DawgNode *target) {
81 if (index > 0) {
82 out << ", ";
83 }
84 out << "'" << label << "':DawgNode{...}";
85 index += 1;
86 });
87 out << "}}";
88 return out;
89}

References query().

Here is the call graph for this function:

◆ operator<<() [3/4]

auto liblevenshtein::operator<< ( std::ostream & out,
const Intersection & intersection ) -> std::ostream &

Definition at line 31 of file intersection.cpp.

32 {
33 if (intersection._parent != nullptr) {
34 out << *(intersection._parent) << intersection._label;
35 }
36 return out;
37}

◆ operator<<() [4/4]

auto liblevenshtein::operator<< ( std::ostream & out,
const Prefix & prefix ) -> std::ostream &

Definition at line 40 of file prefix.cpp.

40 {
41 if (prefix._parent != nullptr) {
42 out << *(prefix._parent) << prefix._label;
43 }
44 return out;
45}

◆ position_transition()

template<Algorithm Type>
auto liblevenshtein::position_transition ( std::size_t n,
Position * position,
std::vector< bool > & characteristic_vector,
std::size_t offset ) -> std::vector< Position * >

Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.

Parameters
nMaximum edit distance to consider when matching spelling candidates.
positionCurrent Position within the Levenshtein automaton.
characteristic_vectorMoving window over matching characters of the query term.
offsetWhere to begin examining the chracteristic_vector.
Returns
The closure over the next Positions reachable from the current Position by advancing the location in the dictionary by one.

◆ position_transition< Algorithm::MERGE_AND_SPLIT >() [1/2]

template<>
auto liblevenshtein::position_transition< Algorithm::MERGE_AND_SPLIT > ( std::size_t n,
Position * position,
std::vector< bool > & characteristic_vector,
std::size_t offset ) -> std::vector< Position * >

Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.

This covers transitions within the standard Levenshtein distance, extended with the elementary operations of merge and split.

Parameters
nMaximum edit distance to consider when matching spelling candidates.
positionCurrent Position within the Levenshtein automaton.
characteristic_vectorMoving window over matching characters of the query term.
offsetWhere to begin examining the chracteristic_vector.
Returns
The closure over the next Positions reachable from the current Position by advancing the location in the dictionary by one.

Definition at line 232 of file position_transition.cpp.

234 {
235
236 std::size_t i = position->term_index();
237 std::size_t e = position->num_errors();
238 bool s = position->is_special();
239 std::size_t h = i - offset;
240 std::size_t w = characteristic_vector.size();
241
242 if (e == 0 && 0 < n) {
243 if (h + 2 <= w) {
244 if (characteristic_vector[h]) {
245 return {
246 new Position(i + 1, e)
247 };
248 }
249
250 return {
251 new Position(i, e + 1),
252 new Position(i, e + 1, true),
253 new Position(i + 1, e + 1),
254 new Position(i + 2, e + 1)
255 };
256 }
257
258 if (h + 1 == w) {
259 if (characteristic_vector[h]) {
260 return {
261 new Position(i + 1, e)
262 };
263 }
264
265 return {
266 new Position(i, e + 1),
267 new Position(i, e + 1, true),
268 new Position(i + 1, e + 1)
269 };
270 }
271
272 return {
273 new Position(i, e + 1)
274 };
275 }
276
277 if (e < n) {
278 if (h + 2 <= w) {
279 if (!s) {
280 if (characteristic_vector[h]) {
281 return {
282 new Position(i + 1, e)
283 };
284 }
285
286 return {
287 new Position(i, e + 1),
288 new Position(i, e + 1, true),
289 new Position(i + 1, e + 1),
290 new Position(i + 2, e + 1)
291 };
292 }
293
294 return {
295 new Position(i + 1, e)
296 };
297 }
298
299 if (h + 1 == w) {
300 if (!s) {
301 if (characteristic_vector[h]) {
302 return {
303 new Position(i + 1, e)
304 };
305 }
306
307 return {
308 new Position(i, e + 1),
309 new Position(i, e + 1, true),
310 new Position(i + 1, e + 1)
311 };
312 }
313
314 return {
315 new Position(i + 1, e)
316 };
317 }
318
319 return {
320 new Position(i, e + 1)
321 };
322 }
323
324 if (h + 1 <= w) {
325 if (!s) {
326 if (characteristic_vector[h]) {
327 return {
328 new Position(i + 1, n)
329 };
330 }
331
332 return {};
333 }
334
335 return {
336 new Position(i + 1, e)
337 };
338 }
339
340 return {};
341}

References query(), and liblevenshtein::Position::term_index().

Here is the call graph for this function:

◆ position_transition< Algorithm::MERGE_AND_SPLIT >() [2/2]

template<>
auto liblevenshtein::position_transition< Algorithm::MERGE_AND_SPLIT > ( std::size_t n,
Position * position,
std::vector< bool > & characteristic_vector,
std::size_t offset ) -> std::vector< Position * >

Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.

This covers transitions within the standard Levenshtein distance, extended with the elementary operations of merge and split.

Parameters
nMaximum edit distance to consider when matching spelling candidates.
positionCurrent Position within the Levenshtein automaton.
characteristic_vectorMoving window over matching characters of the query term.
offsetWhere to begin examining the chracteristic_vector.
Returns
The closure over the next Positions reachable from the current Position by advancing the location in the dictionary by one.

Definition at line 232 of file position_transition.cpp.

234 {
235
236 std::size_t i = position->term_index();
237 std::size_t e = position->num_errors();
238 bool s = position->is_special();
239 std::size_t h = i - offset;
240 std::size_t w = characteristic_vector.size();
241
242 if (e == 0 && 0 < n) {
243 if (h + 2 <= w) {
244 if (characteristic_vector[h]) {
245 return {
246 new Position(i + 1, e)
247 };
248 }
249
250 return {
251 new Position(i, e + 1),
252 new Position(i, e + 1, true),
253 new Position(i + 1, e + 1),
254 new Position(i + 2, e + 1)
255 };
256 }
257
258 if (h + 1 == w) {
259 if (characteristic_vector[h]) {
260 return {
261 new Position(i + 1, e)
262 };
263 }
264
265 return {
266 new Position(i, e + 1),
267 new Position(i, e + 1, true),
268 new Position(i + 1, e + 1)
269 };
270 }
271
272 return {
273 new Position(i, e + 1)
274 };
275 }
276
277 if (e < n) {
278 if (h + 2 <= w) {
279 if (!s) {
280 if (characteristic_vector[h]) {
281 return {
282 new Position(i + 1, e)
283 };
284 }
285
286 return {
287 new Position(i, e + 1),
288 new Position(i, e + 1, true),
289 new Position(i + 1, e + 1),
290 new Position(i + 2, e + 1)
291 };
292 }
293
294 return {
295 new Position(i + 1, e)
296 };
297 }
298
299 if (h + 1 == w) {
300 if (!s) {
301 if (characteristic_vector[h]) {
302 return {
303 new Position(i + 1, e)
304 };
305 }
306
307 return {
308 new Position(i, e + 1),
309 new Position(i, e + 1, true),
310 new Position(i + 1, e + 1)
311 };
312 }
313
314 return {
315 new Position(i + 1, e)
316 };
317 }
318
319 return {
320 new Position(i, e + 1)
321 };
322 }
323
324 if (h + 1 <= w) {
325 if (!s) {
326 if (characteristic_vector[h]) {
327 return {
328 new Position(i + 1, n)
329 };
330 }
331
332 return {};
333 }
334
335 return {
336 new Position(i + 1, e)
337 };
338 }
339
340 return {};
341}

References query(), and liblevenshtein::Position::term_index().

Here is the call graph for this function:

◆ position_transition< Algorithm::STANDARD >() [1/2]

template<>
auto liblevenshtein::position_transition< Algorithm::STANDARD > ( std::size_t n,
Position * position,
std::vector< bool > & characteristic_vector,
std::size_t offset ) -> std::vector< Position * >

Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.

This covers transitions within the standard Levenshtein distance, which includes the elementary operations of insertion, deletion, and substituion.

Parameters
nMaximum edit distance to consider when matching spelling candidates.
positionCurrent Position within the Levenshtein automaton.
characteristic_vectorMoving window over matching characters of the query term.
offsetWhere to begin examining the chracteristic_vector.
Returns
The closure over the next Positions reachable from the current Position by advancing the location in the dictionary by one.

Definition at line 17 of file position_transition.cpp.

19 {
20
21 std::size_t i = position->term_index();
22 std::size_t e = position->num_errors();
23 std::size_t h = i - offset;
24 std::size_t w = characteristic_vector.size();
25
26 if (e < n) {
27 if (h + 2 <= w) {
28 std::size_t a = (n - e < SIZE_MAX)
29 ? n - e + 1
30 : SIZE_MAX;
31 std::size_t b = w - h;
32 std::size_t k = (a < b) ? a : b;
33 std::size_t j = index_of(characteristic_vector, k, h);
34
35 if (j == 0) {
36 return {
37 new Position(i + 1, e)
38 };
39 }
40
41 if (j != SIZE_MAX) {
42 return {
43 new Position(i, e + 1),
44 new Position(i + 1, e + 1),
45 new Position(i + j + 1, e + j)
46 };
47 }
48
49 return {
50 new Position(i, e + 1),
51 new Position(i + 1, e + 1)
52 };
53 }
54
55 if (h + 1 == w) {
56 if (characteristic_vector[h]) {
57 return {
58 new Position(i + 1, e)
59 };
60 }
61
62 return {
63 new Position(i, e + 1),
64 new Position(i + 1, e + 1)
65 };
66 }
67
68 return {
69 new Position(i, e + 1)
70 };
71 }
72
73 if (e == n && h + 1 <= w && characteristic_vector[h]) {
74 return {
75 new Position(i + 1, n)
76 };
77 }
78
79 return {};
80}
auto index_of(std::vector< bool > &characteristic_vector, std::size_t k, std::size_t i) -> std::size_t
Returns the first index of the desired character in the characteristic vector, beginning at the offse...

References index_of(), and query().

Here is the call graph for this function:

◆ position_transition< Algorithm::STANDARD >() [2/2]

template<>
auto liblevenshtein::position_transition< Algorithm::STANDARD > ( std::size_t n,
Position * position,
std::vector< bool > & characteristic_vector,
std::size_t offset ) -> std::vector< Position * >

Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.

This covers transitions within the standard Levenshtein distance, which includes the elementary operations of insertion, deletion, and substituion.

Parameters
nMaximum edit distance to consider when matching spelling candidates.
positionCurrent Position within the Levenshtein automaton.
characteristic_vectorMoving window over matching characters of the query term.
offsetWhere to begin examining the chracteristic_vector.
Returns
The closure over the next Positions reachable from the current Position by advancing the location in the dictionary by one.

Definition at line 17 of file position_transition.cpp.

19 {
20
21 std::size_t i = position->term_index();
22 std::size_t e = position->num_errors();
23 std::size_t h = i - offset;
24 std::size_t w = characteristic_vector.size();
25
26 if (e < n) {
27 if (h + 2 <= w) {
28 std::size_t a = (n - e < SIZE_MAX)
29 ? n - e + 1
30 : SIZE_MAX;
31 std::size_t b = w - h;
32 std::size_t k = (a < b) ? a : b;
33 std::size_t j = index_of(characteristic_vector, k, h);
34
35 if (j == 0) {
36 return {
37 new Position(i + 1, e)
38 };
39 }
40
41 if (j != SIZE_MAX) {
42 return {
43 new Position(i, e + 1),
44 new Position(i + 1, e + 1),
45 new Position(i + j + 1, e + j)
46 };
47 }
48
49 return {
50 new Position(i, e + 1),
51 new Position(i + 1, e + 1)
52 };
53 }
54
55 if (h + 1 == w) {
56 if (characteristic_vector[h]) {
57 return {
58 new Position(i + 1, e)
59 };
60 }
61
62 return {
63 new Position(i, e + 1),
64 new Position(i + 1, e + 1)
65 };
66 }
67
68 return {
69 new Position(i, e + 1)
70 };
71 }
72
73 if (e == n && h + 1 <= w && characteristic_vector[h]) {
74 return {
75 new Position(i + 1, n)
76 };
77 }
78
79 return {};
80}

References index_of(), and query().

Here is the call graph for this function:

◆ position_transition< Algorithm::TRANSPOSITION >() [1/2]

template<>
auto liblevenshtein::position_transition< Algorithm::TRANSPOSITION > ( std::size_t n,
Position * position,
std::vector< bool > & characteristic_vector,
std::size_t offset ) -> std::vector< Position * >

Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.

This covers transitions within the standard Levenshtein distance extended with the elementary operation, transposition.

Parameters
nMaximum edit distance to consider when matching spelling candidates.
positionCurrent Position within the Levenshtein automaton.
characteristic_vectorMoving window over matching characters of the query term.
offsetWhere to begin examining the chracteristic_vector.
Returns
The closure over the next Positions reachable from the current Position by advancing the location in the dictionary by one.

Definition at line 84 of file position_transition.cpp.

86 {
87
88 std::size_t i = position->term_index();
89 std::size_t e = position->num_errors();
90 bool t = position->is_special();
91 std::size_t h = i - offset;
92 std::size_t w = characteristic_vector.size();
93
94 if (e == 0 && 0 < n) {
95 if (h + 2 <= w) {
96 std::size_t a = (n - e < SIZE_MAX)
97 ? n - e + 1
98 : SIZE_MAX;
99 std::size_t b = w - h;
100 std::size_t k = (a < b) ? a : b;
101 std::size_t j = index_of(characteristic_vector, k, h);
102
103 switch (j) {
104 case 0:
105 return {
106 new Position(i + 1, 0)
107 };
108 case 1:
109 return {
110 new Position(i, 1),
111 new Position(i, 1, true),
112 new Position(i + 1, 1),
113 new Position(i + 2, 1)
114 };
115 case SIZE_MAX:
116 return {
117 new Position(i, 1),
118 new Position(i + 1, 1)
119 };
120 default:
121 return {
122 new Position(i, 1),
123 new Position(i + 1, 1),
124 new Position(i + j + 1, j)
125 };
126 }
127 }
128
129 if (h + 1 == w) {
130 if (characteristic_vector[h]) {
131 return {
132 new Position(i + 1, 0)
133 };
134 }
135
136 return {
137 new Position(i, 1),
138 new Position(i + 1, 1)
139 };
140 }
141
142 return {
143 new Position(i, 1)
144 };
145 }
146
147 if (1 <= e && e < n) {
148 if (h + 2 <= w) {
149 if (!t) {
150 std::size_t a = (n - e < SIZE_MAX)
151 ? n - e + 1
152 : SIZE_MAX;
153 std::size_t b = w - h;
154 std::size_t k = (a < b) ? a : b;
155 std::size_t j = index_of(characteristic_vector, k, h);
156
157 switch (j) {
158 case 0:
159 return {
160 new Position(i + 1, e)
161 };
162 case 1:
163 return {
164 new Position(i, e + 1),
165 new Position(i, e + 1, true),
166 new Position(i + 1, e + 1),
167 new Position(i + 2, e + 1)
168 };
169 case SIZE_MAX:
170 return {
171 new Position(i, e + 1),
172 new Position(i + 1, e + 1)
173 };
174 default:
175 return {
176 new Position(i, e + 1),
177 new Position(i + 1, e + 1),
178 new Position(i + j + 1, e + j)
179 };
180 }
181 }
182
183 if (characteristic_vector[h]) {
184 return {
185 new Position(i + 2, e)
186 };
187 }
188
189 return {};
190 }
191
192 if (h + 1 == w) {
193 if (characteristic_vector[h]) {
194 return {
195 new Position(i + 1, e)
196 };
197 }
198
199 return {
200 new Position(i, e + 1),
201 new Position(i + 1, e + 1)
202 };
203 }
204
205 return {
206 new Position(i, e + 1)
207 };
208 }
209
210 if (h + 1 <= w && !t) {
211 if (characteristic_vector[h]) {
212 return {
213 new Position(i + 1, n)
214 };
215 }
216
217 return {};
218 }
219
220 if (h + 2 <= w && t && characteristic_vector[h]) {
221 return {
222 new Position(i + 2, n)
223 };
224 }
225
226 return {};
227}

References index_of(), query(), and liblevenshtein::Position::term_index().

Here is the call graph for this function:

◆ position_transition< Algorithm::TRANSPOSITION >() [2/2]

template<>
auto liblevenshtein::position_transition< Algorithm::TRANSPOSITION > ( std::size_t n,
Position * position,
std::vector< bool > & characteristic_vector,
std::size_t offset ) -> std::vector< Position * >

Returns the closure over possible next Positions that are reachable from the current Position by advancing the location in the dictionary by one.

This covers transitions within the standard Levenshtein distance extended with the elementary operation, transposition.

Parameters
nMaximum edit distance to consider when matching spelling candidates.
positionCurrent Position within the Levenshtein automaton.
characteristic_vectorMoving window over matching characters of the query term.
offsetWhere to begin examining the chracteristic_vector.
Returns
The closure over the next Positions reachable from the current Position by advancing the location in the dictionary by one.

Definition at line 84 of file position_transition.cpp.

86 {
87
88 std::size_t i = position->term_index();
89 std::size_t e = position->num_errors();
90 bool t = position->is_special();
91 std::size_t h = i - offset;
92 std::size_t w = characteristic_vector.size();
93
94 if (e == 0 && 0 < n) {
95 if (h + 2 <= w) {
96 std::size_t a = (n - e < SIZE_MAX)
97 ? n - e + 1
98 : SIZE_MAX;
99 std::size_t b = w - h;
100 std::size_t k = (a < b) ? a : b;
101 std::size_t j = index_of(characteristic_vector, k, h);
102
103 switch (j) {
104 case 0:
105 return {
106 new Position(i + 1, 0)
107 };
108 case 1:
109 return {
110 new Position(i, 1),
111 new Position(i, 1, true),
112 new Position(i + 1, 1),
113 new Position(i + 2, 1)
114 };
115 case SIZE_MAX:
116 return {
117 new Position(i, 1),
118 new Position(i + 1, 1)
119 };
120 default:
121 return {
122 new Position(i, 1),
123 new Position(i + 1, 1),
124 new Position(i + j + 1, j)
125 };
126 }
127 }
128
129 if (h + 1 == w) {
130 if (characteristic_vector[h]) {
131 return {
132 new Position(i + 1, 0)
133 };
134 }
135
136 return {
137 new Position(i, 1),
138 new Position(i + 1, 1)
139 };
140 }
141
142 return {
143 new Position(i, 1)
144 };
145 }
146
147 if (1 <= e && e < n) {
148 if (h + 2 <= w) {
149 if (!t) {
150 std::size_t a = (n - e < SIZE_MAX)
151 ? n - e + 1
152 : SIZE_MAX;
153 std::size_t b = w - h;
154 std::size_t k = (a < b) ? a : b;
155 std::size_t j = index_of(characteristic_vector, k, h);
156
157 switch (j) {
158 case 0:
159 return {
160 new Position(i + 1, e)
161 };
162 case 1:
163 return {
164 new Position(i, e + 1),
165 new Position(i, e + 1, true),
166 new Position(i + 1, e + 1),
167 new Position(i + 2, e + 1)
168 };
169 case SIZE_MAX:
170 return {
171 new Position(i, e + 1),
172 new Position(i + 1, e + 1)
173 };
174 default:
175 return {
176 new Position(i, e + 1),
177 new Position(i + 1, e + 1),
178 new Position(i + j + 1, e + j)
179 };
180 }
181 }
182
183 if (characteristic_vector[h]) {
184 return {
185 new Position(i + 2, e)
186 };
187 }
188
189 return {};
190 }
191
192 if (h + 1 == w) {
193 if (characteristic_vector[h]) {
194 return {
195 new Position(i + 1, e)
196 };
197 }
198
199 return {
200 new Position(i, e + 1),
201 new Position(i + 1, e + 1)
202 };
203 }
204
205 return {
206 new Position(i, e + 1)
207 };
208 }
209
210 if (h + 1 <= w && !t) {
211 if (characteristic_vector[h]) {
212 return {
213 new Position(i + 1, n)
214 };
215 }
216
217 return {};
218 }
219
220 if (h + 2 <= w && t && characteristic_vector[h]) {
221 return {
222 new Position(i + 2, n)
223 };
224 }
225
226 return {};
227}

References index_of(), query(), and liblevenshtein::Position::term_index().

Here is the call graph for this function:

◆ serialize_protobuf() [1/4]

auto liblevenshtein::serialize_protobuf ( Dawg * dawg,
const char * path ) -> bool

Serializes the given Dawg to protobuf at the given path.

Parameters
dawgDawg to serialize.
pathPath to the serialization file.
Returns
Whether the Dawg was successfully serialized to the given path.

Definition at line 29 of file serializer.cpp.

29 {
30 std::fstream output(path, std::fstream::out | std::fstream::trunc |
31 std::fstream::binary);
33 output.close();
34 return serialized;
35}
auto serialize_protobuf(Dawg *dawg, const fs::path &path) -> bool
Serializes the given Dawg to protobuf at the given path.

References query(), and serialize_protobuf().

Here is the call graph for this function:

◆ serialize_protobuf() [2/4]

auto liblevenshtein::serialize_protobuf ( Dawg * dawg,
const fs::path & path ) -> bool

Serializes the given Dawg to protobuf at the given path.

Parameters
dawgDawg to serialize.
pathPath to the serialization file.
Returns
Whether the Dawg was successfully serialized to the given path.

Definition at line 21 of file serializer.cpp.

21 {
22 return serialize_protobuf(dawg, path.generic_string());
23}

References query(), and serialize_protobuf().

Referenced by serialize_protobuf(), serialize_protobuf(), and serialize_protobuf().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ serialize_protobuf() [3/4]

auto liblevenshtein::serialize_protobuf ( Dawg * dawg,
const std::string & path ) -> bool

Serializes the given Dawg to protobuf at the given path.

Parameters
dawgDawg to serialize.
pathPath to the serialization file.
Returns
Whether the Dawg was successfully serialized to the given path.

Definition at line 25 of file serializer.cpp.

25 {
26 return serialize_protobuf(dawg, path.c_str());
27}

References query(), and serialize_protobuf().

Here is the call graph for this function:

◆ serialize_protobuf() [4/4]

auto liblevenshtein::serialize_protobuf ( Dawg * dawg,
std::ostream & output ) -> bool

Serializes the given Dawg to protobuf into the given output stream.

Parameters
dawgDawg to serialize.
outputOutput stream to write the serialized protobuf.
Returns
Whether the Dawg was successfully serialized to the output stream.

Definition at line 37 of file serializer.cpp.

37 {
38 llp::Dictionary *dict_proto = to_protobuf(dawg);
39 bool serialized = dict_proto->SerializeToOstream(&output);
40 delete dict_proto;
41 return serialized;
42}
auto to_protobuf(Dawg *dawg) -> llp::Dictionary *
Serializes a Dawg to its protobuf equivalent.

References query(), and to_protobuf().

Here is the call graph for this function:

◆ sorted_dawg() [1/4]

template<class IterType >
auto liblevenshtein::sorted_dawg ( IterType iter,
IterType end ) -> SortedDawg *

Factory function that initializes a new SortedDawg using the terms yielded from an iterator.

Parameters
iterIterator that yields lexicographically sorted terms.
endIterator representing the end of the terms.
Returns
A new SortedDawg containing all the terms from the iterator.

Definition at line 124 of file sorted_dawg.cpp.

124 {
125 auto *dawg = new SortedDawg();
126 dawg->init();
127
128 while (iter != end) {
129 const std::string &term = *iter;
130 if (!dawg->add(term)) {
131 delete dawg;
132 return nullptr;
133 }
134 ++iter;
135 }
136
137 dawg->finish();
138 dawg->clean_up();
139 return dawg;
140}
A specific type of Dawg that is constructed over lexicographically sorted terms.
Definition sorted_dawg.h:20

References query().

Referenced by main().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ sorted_dawg() [2/4]

template auto liblevenshtein::sorted_dawg ( std::list< std::string >::iterator iter,
std::list< std::string >::iterator end ) -> SortedDawg *

◆ sorted_dawg() [3/4]

template auto liblevenshtein::sorted_dawg ( std::set< std::string >::iterator iter,
std::set< std::string >::iterator end ) -> SortedDawg *

◆ sorted_dawg() [4/4]

template auto liblevenshtein::sorted_dawg ( std::vector< std::string >::iterator iter,
std::vector< std::string >::iterator end ) -> SortedDawg *

◆ subsumes()

template<Algorithm Type>
auto liblevenshtein::subsumes ( Position * lhs,
Position * rhs,
std::size_t n ) -> bool

Returns whether Position lhs subsumes Position rhs, given the maximum edit distance n.

We say that lhs subsumes rhs if the set of spelling candidates reachable from rhs is a subset of the spelling candidates reachable from lhs. As such, if lhs subsumes rhs, then rhs is redundant and may be removed from the set of candidate Positions. In fact, it must be removed for the current set of algorithms to perform correctly.

Parameters
lhsPosition to compare against rhs.
rhsPosition to compare against lhs.
nMaximum edit distance to consider when matching spelling candidates.
Returns
Whether lhs subsumes rhs, given the maximum edit distance n.

◆ subsumes< Algorithm::MERGE_AND_SPLIT >() [1/2]

template<>
auto liblevenshtein::subsumes< Algorithm::MERGE_AND_SPLIT > ( Position * lhs,
Position * rhs,
std::size_t n ) -> bool

Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance extended with the elementary operations of merge and split.

Parameters
lhsPosition to compare against rhs.
rhsPosition to compare against lhs.
nMaximum edit distance to consider when matching spelling candidates.
Returns
Whether lhs subsumes rhs, given the maximum edit distance n.

Definition at line 42 of file subsumes.cpp.

43 {
44 std::size_t i = lhs->term_index();
45 std::size_t e = lhs->num_errors();
46 bool s = lhs->is_special();
47 std::size_t j = rhs->term_index();
48 std::size_t f = rhs->num_errors();
49 bool t = lhs->is_special();
50
51 if (s && !t) {
52 return false;
53 }
54
55 return ((i < j) ? (j - i) : (i - j)) <= (f - e);
56}

References query().

Here is the call graph for this function:

◆ subsumes< Algorithm::MERGE_AND_SPLIT >() [2/2]

template<>
auto liblevenshtein::subsumes< Algorithm::MERGE_AND_SPLIT > ( Position * lhs,
Position * rhs,
std::size_t n ) -> bool

Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance extended with the elementary operations of merge and split.

Parameters
lhsPosition to compare against rhs.
rhsPosition to compare against lhs.
nMaximum edit distance to consider when matching spelling candidates.
Returns
Whether lhs subsumes rhs, given the maximum edit distance n.

Definition at line 42 of file subsumes.cpp.

43 {
44 std::size_t i = lhs->term_index();
45 std::size_t e = lhs->num_errors();
46 bool s = lhs->is_special();
47 std::size_t j = rhs->term_index();
48 std::size_t f = rhs->num_errors();
49 bool t = lhs->is_special();
50
51 if (s && !t) {
52 return false;
53 }
54
55 return ((i < j) ? (j - i) : (i - j)) <= (f - e);
56}

References query().

Here is the call graph for this function:

◆ subsumes< Algorithm::STANDARD >() [1/2]

template<>
auto liblevenshtein::subsumes< Algorithm::STANDARD > ( Position * lhs,
Position * rhs,
std::size_t n ) -> bool

Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance, which consists of the elementary operations of insertion, deletion, and substitution.

Parameters
lhsPosition to compare against rhs.
rhsPosition to compare against lhs.
nMaximum edit distance to consider when matching spelling candidates.
Returns
Whether lhs subsumes rhs, given the maximum edit distance n.

Definition at line 6 of file subsumes.cpp.

7 {
8 std::size_t i = lhs->term_index();
9 std::size_t e = lhs->num_errors();
10 std::size_t j = rhs->term_index();
11 std::size_t f = rhs->num_errors();
12 return ((i < j) ? (j - i) : (i - j)) <= (f - e);
13}

References query().

Here is the call graph for this function:

◆ subsumes< Algorithm::STANDARD >() [2/2]

template<>
auto liblevenshtein::subsumes< Algorithm::STANDARD > ( Position * lhs,
Position * rhs,
std::size_t n ) -> bool

Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance, which consists of the elementary operations of insertion, deletion, and substitution.

Parameters
lhsPosition to compare against rhs.
rhsPosition to compare against lhs.
nMaximum edit distance to consider when matching spelling candidates.
Returns
Whether lhs subsumes rhs, given the maximum edit distance n.

Definition at line 6 of file subsumes.cpp.

7 {
8 std::size_t i = lhs->term_index();
9 std::size_t e = lhs->num_errors();
10 std::size_t j = rhs->term_index();
11 std::size_t f = rhs->num_errors();
12 return ((i < j) ? (j - i) : (i - j)) <= (f - e);
13}

References query().

Here is the call graph for this function:

◆ subsumes< Algorithm::TRANSPOSITION >() [1/2]

template<>
auto liblevenshtein::subsumes< Algorithm::TRANSPOSITION > ( Position * lhs,
Position * rhs,
std::size_t n ) -> bool

Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance extended with the elementary operation of transposition.

Parameters
lhsPosition to compare against rhs.
rhsPosition to compare against lhs.
nMaximum edit distance to consider when matching spelling candidates.
Returns
Whether lhs subsumes rhs, given the maximum edit distance n.

Definition at line 16 of file subsumes.cpp.

17 {
18 std::size_t i = lhs->term_index();
19 std::size_t e = lhs->num_errors();
20 bool s = lhs->is_special();
21
22 std::size_t j = rhs->term_index();
23 std::size_t f = rhs->num_errors();
24 bool t = lhs->is_special();
25
26 if (s) {
27 if (t) {
28 return (i == j);
29 }
30
31 return (f == n) && (i == j);
32 }
33
34 if (t) {
35 return ((j < i) ? (i - j - 1) : (j - i + 1)) <= (f - e);
36 }
37
38 return ((i < j) ? (j - i) : (i - j)) <= (f - e);
39}

References query().

Here is the call graph for this function:

◆ subsumes< Algorithm::TRANSPOSITION >() [2/2]

template<>
auto liblevenshtein::subsumes< Algorithm::TRANSPOSITION > ( Position * lhs,
Position * rhs,
std::size_t n ) -> bool

Returns whether lhs subsumes rhs according to the subsumption rules of standard Levenshtein distance extended with the elementary operation of transposition.

Parameters
lhsPosition to compare against rhs.
rhsPosition to compare against lhs.
nMaximum edit distance to consider when matching spelling candidates.
Returns
Whether lhs subsumes rhs, given the maximum edit distance n.

Definition at line 16 of file subsumes.cpp.

17 {
18 std::size_t i = lhs->term_index();
19 std::size_t e = lhs->num_errors();
20 bool s = lhs->is_special();
21
22 std::size_t j = rhs->term_index();
23 std::size_t f = rhs->num_errors();
24 bool t = lhs->is_special();
25
26 if (s) {
27 if (t) {
28 return (i == j);
29 }
30
31 return (f == n) && (i == j);
32 }
33
34 if (t) {
35 return ((j < i) ? (i - j - 1) : (j - i + 1)) <= (f - e);
36 }
37
38 return ((i < j) ? (j - i) : (i - j)) <= (f - e);
39}

References query().

Here is the call graph for this function:

◆ to_protobuf()

auto liblevenshtein::to_protobuf ( Dawg * dawg) -> llp::Dictionary *

Serializes a Dawg to its protobuf equivalent.

Parameters
dawgDawg to serialize.
Returns
A pointer to the serialized Dawg.

Definition at line 95 of file serializer.cpp.

95 {
96 auto *dict_proto = new llp::Dictionary();
97 DawgNode *root = dawg->root();
98
99 std::set<uint64_t> node_ids;
100 std::set<uint64_t> final_node_ids;
102 for (const uint64_t &node_id : node_ids) {
103 dict_proto->add_node_id(node_id);
104 }
105 for (const uint64_t &final_node_id : final_node_ids) {
106 dict_proto->add_final_node_id(final_node_id);
107 }
108
109 std::map<std::pair<uint64_t, char>, uint64_t> edges;
110 collect_edges(root, edges);
111 for (const auto &edge : edges) {
112 uint64_t source_id = edge.first.first;
113 char label = edge.first.second;
114 uint64_t target_id = edge.second;
115 llp::Dictionary_Edge *edge_proto = dict_proto->add_edge();
116 edge_proto->set_source_id(source_id);
117 edge_proto->set_label(label);
118 edge_proto->set_target_id(target_id);
119 }
120
121 auto root_id = reinterpret_cast<uint64_t>(root);
122 dict_proto->set_root_id(root_id);
123 dict_proto->set_size(dawg->size());
124
125 return dict_proto;
126}
void collect_nodes(DawgNode *source, std::set< uint64_t > &node_ids, std::set< uint64_t > &final_node_ids)
Collects the DawgNode IDs and final DawgNode IDs of all nodes reachable from the source.

References collect_edges(), collect_nodes(), and query().

Referenced by serialize_protobuf().

Here is the call graph for this function:
Here is the caller graph for this function: