liblevenshtein 4.0.0
A library for generating Finite State Transducers based on Levenshtein Automata.
Loading...
Searching...
No Matches
liblevenshtein::SortedDawg Class Reference

A specific type of Dawg that is constructed over lexicographically sorted terms. More...

#include <sorted_dawg.h>

Inheritance diagram for liblevenshtein::SortedDawg:
Collaboration diagram for liblevenshtein::SortedDawg:

Public Member Functions

 ~SortedDawg () override
 Cleans up any residual intermediate members used to construct the Dawg.
 
auto add (const std::string &term) -> bool override
 Adds a new term to this dictionary.
 
auto remove (const std::string &term) -> bool override
 Removes a term from this dictionary.
 
void init ()
 Prepares this dictionary for initialization.
 
void clean_up ()
 Cleans up any intermediate members used to initialize this Dawg.
 
void finish ()
 Specifies the dictionary has been fully initialized.
 
void minimize (std::size_t lower_bound)
 Minimizes the suffixes of the unchecked paths until there are no more than lower_bound unchecked transitions.
 
auto minimized_node (DawgNode *key) const -> DawgNode *
 Returns a DawgNode whose following suffix is equivalent to the following suffix of the given DawgNode, or nullptr if none exist.
 
 Dawg (DawgNode *root, std::size_t size)
 Constructs a new DAWG with the given root node and number of dictionary terms (size).
 
 Dawg ()
 Constructs an empty DAWG with a default root node (there are no outgoing edges or reachable terms from the root, not even the empty string).
 
auto contains (const std::string &term) const -> bool
 Determines whether the given term is contained within this dictionary.
 
auto root () const -> DawgNode *
 Returns a pointer to the root node of this dictionary from which all its terms may be determined.
 
auto size () const -> size_t
 Returns the number of terms in this dictionary.
 
auto begin () const -> DawgIterator
 Returns an iterator pointing to the first term in this dictionary, from which all terms may be iterated over.
 
auto end () const -> DawgIterator
 Returns an iterator representing the boundary following the final term in this dictionary.
 
auto operator== (const Dawg &other) const -> bool
 Determines whether another DAWG is equivalent to this one.
 

Protected Member Functions

auto all_nodes () const -> std::unordered_set< DawgNode * >
 Returns a set of all nodes reachable from the root, including the root.
 

Protected Attributes

DawgNode_root
 Root node of this DAWG from which all its terms may be reached.
 
std::size_t _size
 Number of terms reachable from the root node.
 

Private Attributes

std::stack< Transition > * unchecked_transitions = nullptr
 Transitions whose redundancies have not been checked.
 
std::unordered_map< DawgNode, DawgNode * > * minimized_nodes = nullptr
 Mapping of suffixes to DawgNodes that have been minimized.
 
std::unordered_set< DawgNode * > * floating_nodes = nullptr
 Nodes that have not been added to this Dawg.
 
std::string _prev_term
 Previous term inserted into this Dawg.
 

Detailed Description

A specific type of Dawg that is constructed over lexicographically sorted terms.

The precondition that the terms are sorted, lexigraphically, in ascending order is a requirement by the algorithm used to construct this type of Dawg.

Definition at line 20 of file sorted_dawg.h.

Constructor & Destructor Documentation

◆ ~SortedDawg()

liblevenshtein::SortedDawg::~SortedDawg ( )
override

Cleans up any residual intermediate members used to construct the Dawg.

The super destructor is subsequently called to clean up the rest of its nodes.

Definition at line 11 of file sorted_dawg.cpp.

11 {
12 clean_up();
13}
void clean_up()
Cleans up any intermediate members used to initialize this Dawg.

References clean_up().

Here is the call graph for this function:

Member Function Documentation

◆ add()

auto liblevenshtein::SortedDawg::add ( const std::string & term) -> bool
overridevirtual

Adds a new term to this dictionary.

The term must be lexicographically larger than the previously added term.

Parameters
termA term to add to this dictionary.
Returns
Wether the term was successfully added to this dictionary.

Implements liblevenshtein::Dawg.

Definition at line 15 of file sorted_dawg.cpp.

15 {
16 if (term < _prev_term) {
17 return false;
18 }
19
20 if (term.empty()) {
21 _root->is_final(true);
23 _size += 1;
24 return true;
25 }
26
27 std::size_t upper_bound = std::min(term.length(), _prev_term.length());
28
29 // Find the length of the longest common prefix between term and prev_term
30 std::size_t i = 0;
31 while (i < upper_bound && term[i] == _prev_term[i]) {
32 i += 1;
33 }
34
35 // Check the unchecked nodes for redundancy from the last one down to
36 // the common prefix size. Then, truncate the list at that point.
37 minimize(i);
38
39 // Add the suffix starting from the correct node min-way through the graph
40 DawgNode *source = unchecked_transitions->empty()
41 ? _root
42 : unchecked_transitions->top().target();
43
44 for (std::size_t k = term.length(); i < k; i += 1) {
45 auto *target = new DawgNode();
46 floating_nodes->insert(target);
47 unchecked_transitions->emplace(term[i], source, target);
48 source = target;
49 }
50
51 source->is_final(true);
53 _size += 1;
54 return true;
55}
void is_final(bool is_final)
Specifies whether this node represents a word boundary, or immediately follows an edge having the fin...
Definition dawg_node.cpp:19
std::size_t _size
Number of terms reachable from the root node.
Definition dawg.h:129
DawgNode * _root
Root node of this DAWG from which all its terms may be reached.
Definition dawg.h:126
std::unordered_set< DawgNode * > * floating_nodes
Nodes that have not been added to this Dawg.
Definition sorted_dawg.h:95
std::string _prev_term
Previous term inserted into this Dawg.
Definition sorted_dawg.h:98
void minimize(std::size_t lower_bound)
Minimizes the suffixes of the unchecked paths until there are no more than lower_bound unchecked tran...
std::stack< Transition > * unchecked_transitions
Transitions whose redundancies have not been checked.
Definition sorted_dawg.h:89
void query(ll::Dawg *dawg, const std::string &query_term, std::size_t max_distance)
Definition main.cpp:25

References liblevenshtein::DawgNode::is_final(), and query().

Here is the call graph for this function:

◆ all_nodes()

auto liblevenshtein::Dawg::all_nodes ( ) const -> std::unordered_set<DawgNode *>
protectedinherited

Returns a set of all nodes reachable from the root, including the root.

Returns
The closure of nodes reachable from the root, including the root.

Definition at line 28 of file dawg.cpp.

28 {
29 std::unordered_set<DawgNode *> nodes;
30 std::queue<DawgNode *> pending;
31 pending.push(_root);
32 while (!pending.empty()) {
33 DawgNode *node = pending.front();
34 pending.pop();
35 nodes.insert(node);
36 node->for_each_edge([&](char label, DawgNode *target) {
37 pending.push(target);
38 });
39 }
40 return nodes;
41}

References liblevenshtein::Dawg::_root, and query().

Referenced by clean_up(), and liblevenshtein::Dawg::~Dawg().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ begin()

auto liblevenshtein::Dawg::begin ( ) const -> DawgIterator
inherited

Returns an iterator pointing to the first term in this dictionary, from which all terms may be iterated over.

Returns
An iterator pointing to the first term in this dictionary.

Definition at line 59 of file dawg.cpp.

59 {
60 return {_root};
61}

References liblevenshtein::Dawg::_root.

◆ clean_up()

void liblevenshtein::SortedDawg::clean_up ( )

Cleans up any intermediate members used to initialize this Dawg.

Definition at line 98 of file sorted_dawg.cpp.

98 {
99 if (unchecked_transitions != nullptr) {
101 unchecked_transitions = nullptr;
102 }
103
104 if (minimized_nodes != nullptr) {
105 delete minimized_nodes;
106 minimized_nodes = nullptr;
107 }
108
109 if (floating_nodes != nullptr) {
110 for (DawgNode *node : all_nodes()) {
111 floating_nodes->erase(node);
112 }
113
114 for (DawgNode *node : *floating_nodes) {
115 delete node;
116 }
117
118 delete floating_nodes;
119 floating_nodes = nullptr;
120 }
121}
auto all_nodes() const -> std::unordered_set< DawgNode * >
Returns a set of all nodes reachable from the root, including the root.
Definition dawg.cpp:28
std::unordered_map< DawgNode, DawgNode * > * minimized_nodes
Mapping of suffixes to DawgNodes that have been minimized.
Definition sorted_dawg.h:92

References liblevenshtein::Dawg::all_nodes(), floating_nodes, minimized_nodes, and unchecked_transitions.

Referenced by ~SortedDawg().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ contains()

auto liblevenshtein::Dawg::contains ( const std::string & term) const -> bool
inherited

Determines whether the given term is contained within this dictionary.

Parameters
termQuery term whose membership in this dictionary must be determined.
Returns
Whether the term is contained within this dictionary.

Definition at line 43 of file dawg.cpp.

43 {
44 DawgNode* node = _root;
45 for (int i = 0; i < term.length() && node != nullptr; i += 1) {
46 node = node->transition(term[i]);
47 }
48 return node != nullptr && node->is_final();
49}
auto transition(char label) const -> DawgNode *
Returns the target node, following this one, along the edge annotated with the given label.
Definition dawg_node.cpp:33

References liblevenshtein::DawgNode::is_final(), query(), and liblevenshtein::DawgNode::transition().

Referenced by liblevenshtein::from_protobuf().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ Dawg() [1/2]

liblevenshtein::Dawg::Dawg ( )

Constructs an empty DAWG with a default root node (there are no outgoing edges or reachable terms from the root, not even the empty string).

Definition at line 41 of file dawg.cpp.

18 : _root(new DawgNode()),
19 _size(0)
20{}

◆ Dawg() [2/2]

liblevenshtein::Dawg::Dawg ( DawgNode * root,
std::size_t size )

Constructs a new DAWG with the given root node and number of dictionary terms (size).

The number of terms reachable by combining the subsequent outgoing edges from the root must be equal to the size.

Parameters
rootNode from which all dictionary terms may be reached by subsequently traversing its outgoing edges.
sizeNumber of terms in the dictionary.

Definition at line 35 of file dawg.cpp.

13 : _root(root),
14 _size(size)
15{}
auto root() const -> DawgNode *
Returns a pointer to the root node of this dictionary from which all its terms may be determined.
Definition dawg.cpp:51
auto size() const -> size_t
Returns the number of terms in this dictionary.
Definition dawg.cpp:55

References query().

Here is the call graph for this function:

◆ end()

auto liblevenshtein::Dawg::end ( ) const -> DawgIterator
inherited

Returns an iterator representing the boundary following the final term in this dictionary.

The value of the iterator is not a term, but represent a boundary that must not be crossed.

Returns
An iterator pointing to the boundary following the final term in this dictionary.

Definition at line 63 of file dawg.cpp.

63 {
64 return {_size};
65}

References liblevenshtein::Dawg::_size.

◆ finish()

void liblevenshtein::SortedDawg::finish ( )

Specifies the dictionary has been fully initialized.

This is only called during initialization.

Definition at line 57 of file sorted_dawg.cpp.

57 {
58 minimize(0);
59}

References minimize().

Here is the call graph for this function:

◆ init()

void liblevenshtein::SortedDawg::init ( )

Prepares this dictionary for initialization.

This function is only called during initialization.

Definition at line 92 of file sorted_dawg.cpp.

92 {
93 unchecked_transitions = new std::stack<Transition>();
94 minimized_nodes = new std::unordered_map<DawgNode, DawgNode *>();
95 floating_nodes = new std::unordered_set<DawgNode *>();
96}

References floating_nodes, minimized_nodes, and unchecked_transitions.

◆ minimize()

void liblevenshtein::SortedDawg::minimize ( std::size_t lower_bound)

Minimizes the suffixes of the unchecked paths until there are no more than lower_bound unchecked transitions.

This is only called during initialization.

Parameters
lower_boundDesired size of remaining unchecked transitions once the other have been minimized.

Definition at line 69 of file sorted_dawg.cpp.

69 {
70 // proceed from the leaf up to a certain point
71 for (std::size_t j = unchecked_transitions->size(); j > lower_bound; j -= 1) {
72 Transition transition = unchecked_transitions->top();
74 char label = transition.label();
75 DawgNode* source = transition.source();
76 DawgNode* target = transition.target();
77 DawgNode* existing = minimized_node(target);
78 if (existing != nullptr) {
79 source->add_edge(label, existing);
80 }
81 else {
82 source->add_edge(label, target);
83 (*minimized_nodes)[*target] = target;
84 }
85 }
86}
auto minimized_node(DawgNode *key) const -> DawgNode *
Returns a DawgNode whose following suffix is equivalent to the following suffix of the given DawgNode...

References liblevenshtein::DawgNode::add_edge(), liblevenshtein::Transition::label(), minimized_node(), query(), liblevenshtein::Transition::source(), liblevenshtein::Transition::target(), and unchecked_transitions.

Referenced by finish().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ minimized_node()

auto liblevenshtein::SortedDawg::minimized_node ( DawgNode * key) const -> DawgNode *

Returns a DawgNode whose following suffix is equivalent to the following suffix of the given DawgNode, or nullptr if none exist.

This is used to determine whether the given DawgNode is redundant and may be discarded.

Parameters
keyDawgNode whose redundancy must be determined.
Returns
Either an existing DawgNode having an equivalent suffix to the key or nullptr if the key is thus far unique.

Definition at line 61 of file sorted_dawg.cpp.

61 {
62 auto iter = minimized_nodes->find(*key);
63 if (iter != minimized_nodes->end()) {
64 return iter->second;
65 }
66 return nullptr;
67}

References query().

Referenced by minimize().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ operator==()

auto liblevenshtein::Dawg::operator== ( const Dawg & other) const -> bool
inherited

Determines whether another DAWG is equivalent to this one.

Parameters
otherDictionary (DAWG) to compare to this one.
Returns
Whether the other dictionary is equivalent to this one.

Definition at line 72 of file dawg.cpp.

72 {
73 return size() == other.size() && *root() == *other.root();
74}

References query().

Here is the call graph for this function:

◆ remove()

auto liblevenshtein::SortedDawg::remove ( const std::string & term) -> bool
overridevirtual

Removes a term from this dictionary.

Currently, this is just a placeholder and performs no action.

Parameters
Theterm to remove from this dictionary.
Returns
Whether the term was removed from this dictionary.

Implements liblevenshtein::Dawg.

Definition at line 88 of file sorted_dawg.cpp.

88 {
89 return false;
90}

◆ root()

auto liblevenshtein::Dawg::root ( ) const -> DawgNode *
inherited

Returns a pointer to the root node of this dictionary from which all its terms may be determined.

Returns
A pointer to the root node of this dictionary.

Definition at line 51 of file dawg.cpp.

51 {
52 return _root;
53}

References liblevenshtein::Dawg::_root.

Referenced by query().

Here is the caller graph for this function:

◆ size()

auto liblevenshtein::Dawg::size ( ) const -> size_t
inherited

Returns the number of terms in this dictionary.

Returns
The size of this dictionary.

Definition at line 55 of file dawg.cpp.

55 {
56 return _size;
57}

References liblevenshtein::Dawg::_size.

Member Data Documentation

◆ _prev_term

std::string liblevenshtein::SortedDawg::_prev_term
private

Previous term inserted into this Dawg.

Definition at line 98 of file sorted_dawg.h.

◆ _root

DawgNode* liblevenshtein::Dawg::_root
protectedinherited

Root node of this DAWG from which all its terms may be reached.

Definition at line 126 of file dawg.h.

Referenced by liblevenshtein::Dawg::all_nodes(), liblevenshtein::Dawg::begin(), and liblevenshtein::Dawg::root().

◆ _size

std::size_t liblevenshtein::Dawg::_size
protectedinherited

Number of terms reachable from the root node.

Definition at line 129 of file dawg.h.

Referenced by liblevenshtein::Dawg::end(), and liblevenshtein::Dawg::size().

◆ floating_nodes

std::unordered_set<DawgNode *>* liblevenshtein::SortedDawg::floating_nodes = nullptr
private

Nodes that have not been added to this Dawg.

Definition at line 95 of file sorted_dawg.h.

Referenced by clean_up(), and init().

◆ minimized_nodes

std::unordered_map<DawgNode, DawgNode *>* liblevenshtein::SortedDawg::minimized_nodes = nullptr
private

Mapping of suffixes to DawgNodes that have been minimized.

Definition at line 92 of file sorted_dawg.h.

Referenced by clean_up(), and init().

◆ unchecked_transitions

std::stack<Transition>* liblevenshtein::SortedDawg::unchecked_transitions = nullptr
private

Transitions whose redundancies have not been checked.

Definition at line 89 of file sorted_dawg.h.

Referenced by clean_up(), init(), and minimize().


The documentation for this class was generated from the following files: