liblevenshtein 4.0.0
A library for generating Finite State Transducers based on Levenshtein Automata.
Loading...
Searching...
No Matches
liblevenshtein::Dawg Class Referenceabstract

A Directed Acyclic Word Graph (DAWG) maps sequences of characters to form words; the collection of words is known as a dictionary. More...

#include <dawg.h>

Inheritance diagram for liblevenshtein::Dawg:
Collaboration diagram for liblevenshtein::Dawg:

Public Member Functions

 Dawg (DawgNode *root, std::size_t size)
 Constructs a new DAWG with the given root node and number of dictionary terms (size).
 
 Dawg ()
 Constructs an empty DAWG with a default root node (there are no outgoing edges or reachable terms from the root, not even the empty string).
 
virtual ~Dawg ()
 Deletes all the nodes associated with this dictionary.
 
virtual auto add (const std::string &term) -> bool=0
 Adds a new term to this dictionary by adding missing sequential edges leading outward from the root node.
 
virtual auto remove (const std::string &term) -> bool=0
 Removes a term from this dictionary and cleans up its path.
 
auto contains (const std::string &term) const -> bool
 Determines whether the given term is contained within this dictionary.
 
auto root () const -> DawgNode *
 Returns a pointer to the root node of this dictionary from which all its terms may be determined.
 
auto size () const -> size_t
 Returns the number of terms in this dictionary.
 
auto begin () const -> DawgIterator
 Returns an iterator pointing to the first term in this dictionary, from which all terms may be iterated over.
 
auto end () const -> DawgIterator
 Returns an iterator representing the boundary following the final term in this dictionary.
 
auto operator== (const Dawg &other) const -> bool
 Determines whether another DAWG is equivalent to this one.
 

Protected Member Functions

auto all_nodes () const -> std::unordered_set< DawgNode * >
 Returns a set of all nodes reachable from the root, including the root.
 

Protected Attributes

DawgNode_root
 Root node of this DAWG from which all its terms may be reached.
 
std::size_t _size
 Number of terms reachable from the root node.
 

Friends

class std::hash< Dawg >
 Specifies the hash function for this DAWG may view its membership.
 
auto operator<< (std::ostream &out, const Dawg &dawg) -> std::ostream &
 Specifies the output streaming operator for this DAWG may view its membership.
 

Detailed Description

A Directed Acyclic Word Graph (DAWG) maps sequences of characters to form words; the collection of words is known as a dictionary.

As its name implies, it is a directed, acyclic automaton having a single root node from which all terms in the dictionary are reachable. It is important that only those terms in the dictionary may be formed by traversing the edges of the DAWG, no terms that are not in the dictionary may be formed by joining subsequent outgoing edges from the root. Nodes denoting word boundaries are flagged as being final.

Definition at line 23 of file dawg.h.

Constructor & Destructor Documentation

◆ Dawg() [1/2]

liblevenshtein::Dawg::Dawg ( DawgNode * root,
std::size_t size )

Constructs a new DAWG with the given root node and number of dictionary terms (size).

The number of terms reachable by combining the subsequent outgoing edges from the root must be equal to the size.

Parameters
rootNode from which all dictionary terms may be reached by subsequently traversing its outgoing edges.
sizeNumber of terms in the dictionary.

Definition at line 12 of file dawg.cpp.

13 : _root(root),
14 _size(size)
15{}
std::size_t _size
Number of terms reachable from the root node.
Definition dawg.h:129
auto root() const -> DawgNode *
Returns a pointer to the root node of this dictionary from which all its terms may be determined.
Definition dawg.cpp:51
DawgNode * _root
Root node of this DAWG from which all its terms may be reached.
Definition dawg.h:126
auto size() const -> size_t
Returns the number of terms in this dictionary.
Definition dawg.cpp:55

◆ Dawg() [2/2]

liblevenshtein::Dawg::Dawg ( )

Constructs an empty DAWG with a default root node (there are no outgoing edges or reachable terms from the root, not even the empty string).

Definition at line 17 of file dawg.cpp.

18 : _root(new DawgNode()),
19 _size(0)
20{}

◆ ~Dawg()

liblevenshtein::Dawg::~Dawg ( )
virtual

Deletes all the nodes associated with this dictionary.

Definition at line 22 of file dawg.cpp.

22 {
23 for (DawgNode* node : all_nodes()) {
24 delete node;
25 }
26}
auto all_nodes() const -> std::unordered_set< DawgNode * >
Returns a set of all nodes reachable from the root, including the root.
Definition dawg.cpp:28

References all_nodes().

Here is the call graph for this function:

Member Function Documentation

◆ add()

virtual auto liblevenshtein::Dawg::add ( const std::string & term) -> bool
pure virtual

Adds a new term to this dictionary by adding missing sequential edges leading outward from the root node.

Parameters
termThe term to add to this dictionary.
Returns
Whether the term was successfully added to this dictionary.

Implemented in liblevenshtein::SortedDawg.

◆ all_nodes()

auto liblevenshtein::Dawg::all_nodes ( ) const -> std::unordered_set<DawgNode *>
protected

Returns a set of all nodes reachable from the root, including the root.

Returns
The closure of nodes reachable from the root, including the root.

Definition at line 28 of file dawg.cpp.

28 {
29 std::unordered_set<DawgNode *> nodes;
30 std::queue<DawgNode *> pending;
31 pending.push(_root);
32 while (!pending.empty()) {
33 DawgNode *node = pending.front();
34 pending.pop();
35 nodes.insert(node);
36 node->for_each_edge([&](char label, DawgNode *target) {
37 pending.push(target);
38 });
39 }
40 return nodes;
41}
void query(ll::Dawg *dawg, const std::string &query_term, std::size_t max_distance)
Definition main.cpp:25

References _root, and query().

Referenced by liblevenshtein::SortedDawg::clean_up(), and ~Dawg().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ begin()

auto liblevenshtein::Dawg::begin ( ) const -> DawgIterator

Returns an iterator pointing to the first term in this dictionary, from which all terms may be iterated over.

Returns
An iterator pointing to the first term in this dictionary.

Definition at line 59 of file dawg.cpp.

59 {
60 return {_root};
61}

References _root.

◆ contains()

auto liblevenshtein::Dawg::contains ( const std::string & term) const -> bool

Determines whether the given term is contained within this dictionary.

Parameters
termQuery term whose membership in this dictionary must be determined.
Returns
Whether the term is contained within this dictionary.

Definition at line 43 of file dawg.cpp.

43 {
44 DawgNode* node = _root;
45 for (int i = 0; i < term.length() && node != nullptr; i += 1) {
46 node = node->transition(term[i]);
47 }
48 return node != nullptr && node->is_final();
49}
auto transition(char label) const -> DawgNode *
Returns the target node, following this one, along the edge annotated with the given label.
Definition dawg_node.cpp:33

References liblevenshtein::DawgNode::is_final(), query(), and liblevenshtein::DawgNode::transition().

Referenced by liblevenshtein::from_protobuf().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ end()

auto liblevenshtein::Dawg::end ( ) const -> DawgIterator

Returns an iterator representing the boundary following the final term in this dictionary.

The value of the iterator is not a term, but represent a boundary that must not be crossed.

Returns
An iterator pointing to the boundary following the final term in this dictionary.

Definition at line 63 of file dawg.cpp.

63 {
64 return {_size};
65}

References _size.

◆ operator==()

auto liblevenshtein::Dawg::operator== ( const Dawg & other) const -> bool

Determines whether another DAWG is equivalent to this one.

Parameters
otherDictionary (DAWG) to compare to this one.
Returns
Whether the other dictionary is equivalent to this one.

Definition at line 72 of file dawg.cpp.

72 {
73 return size() == other.size() && *root() == *other.root();
74}

References query().

Here is the call graph for this function:

◆ remove()

virtual auto liblevenshtein::Dawg::remove ( const std::string & term) -> bool
pure virtual

Removes a term from this dictionary and cleans up its path.

Parameters
termThe term to remove from this dictionary.
Returns
Whether the term was removed from this dictionary.

Implemented in liblevenshtein::SortedDawg.

◆ root()

auto liblevenshtein::Dawg::root ( ) const -> DawgNode *

Returns a pointer to the root node of this dictionary from which all its terms may be determined.

Returns
A pointer to the root node of this dictionary.

Definition at line 51 of file dawg.cpp.

51 {
52 return _root;
53}

References _root.

Referenced by query().

Here is the caller graph for this function:

◆ size()

auto liblevenshtein::Dawg::size ( ) const -> size_t

Returns the number of terms in this dictionary.

Returns
The size of this dictionary.

Definition at line 55 of file dawg.cpp.

55 {
56 return _size;
57}

References _size.

Friends And Related Symbol Documentation

◆ operator<<

auto operator<< ( std::ostream & out,
const Dawg & dawg ) -> std::ostream &
friend

Specifies the output streaming operator for this DAWG may view its membership.

Definition at line 67 of file dawg.cpp.

67 {
68 out << "Dawg{size=" << dawg._size << ", root=" << dawg._root << "}";
69 return out;
70}

◆ std::hash< Dawg >

friend class std::hash< Dawg >
friend

Specifies the hash function for this DAWG may view its membership.

Definition at line 113 of file dawg.h.

Member Data Documentation

◆ _root

DawgNode* liblevenshtein::Dawg::_root
protected

Root node of this DAWG from which all its terms may be reached.

Definition at line 126 of file dawg.h.

Referenced by all_nodes(), begin(), and root().

◆ _size

std::size_t liblevenshtein::Dawg::_size
protected

Number of terms reachable from the root node.

Definition at line 129 of file dawg.h.

Referenced by end(), and size().


The documentation for this class was generated from the following files: