Class CasCompare

java.lang.Object
org.apache.uima.cas.impl.CasCompare

public class CasCompare extends Object
Used by tests for Binary Compressed de/serialization code. Used by test app: XmiCompare. Compare 2 CASes, with perhaps different type systems. If the type systems are different, construct a type mapper and use that to selectively ignore types or features not in other type system The Mapper is from CAS1 -> CAS2 When computing the things to compare from CAS1, filter to remove feature structures not reachable via indexes or refs The index definitions are not compared. The indexes are used to locate the FSs to be compared. Reports are produced to System.out and System.err as a side effect System.out: status messages, type system comparison System.err: mismatch comparison information Usage: Use the static compareCASes method for default comparisons Use the multi-step approach for more complex comparisons: - Make an instance of this class, passing in the two CASes. - Set any additional configuration cc.compareAll(true) - continue comparing if mismatch found cc.compardIds(true) - compare ids (require ids to be ==) - Do any transformations needed on the CASes to account for known but allowed differences: -- These are transformations done on the CAS Feature Structures outside of this routine -- example: for certain type:feature string values, normalize to the same canonical value -- example: for certain type:feature string arrays, where the order is not important, sort them -- example: for certain type:feature FSArrays, where the order is not important, sort them --- using the sortFSArray method - Do any configuration to specify congruence sets for String values -- example: addStringCongruenceSet( type, feature, set-of-strings, -1 or int index if array) -- these are specific to type / feature specs -- range can be string or string array - if string array, the spec includes the index or -1 to indicate all indexes How it works Prepare arrays of all the FSs in the CAS - for each of 2 CASes to be compared - 2 arrays: -- all FSs in any index in any view -- the above, plus all FSs reachable via references -- but omit some types: only of interest when reached via ref, e.g. String/Int/Float/Boolean arrays The comparison of FSs is done, one FS at a time. - in order to determine the right FSs to compare with each other, the FSs for each CAS are sorted. The sort and the CAS compare both use a Compare method. - sorting skips items not in the other type system, including features - (only possible if comparing two CASes with different type systems, of course) Compare - used for two purposes: a) sorting FSs belonging to one CAS - can be used by caller to pre-sort any array values where the compare should be for set equality (in other words, ignore the order) b) comparing a FS in one CAS with a FS in the other CAS sort keys, in order: 1) type 2) if primitive array: sort based on - size - iterating thru all array items 3) All the features, considered in an order where non-refs are sorted before refs. comparing values: primitives - value comparison refs - compare the ref'd FS, while recording reference paths - stop when reach a compare point where the pair being compared has been seen - stop at any point if the two FSs compare unequal - at the stop point, if compare is equal, check the reference paths, and report unequal reference paths (different cycle lengths, or different total lengths, see the Prev data structure) Context information, reused across compares: prevCompare - if a particular pair of FSs compared equal -- used to speed up comparison -- used to stop recursive loops of references prev1, prev2 - reset for each top level FS compare - not reset for inner FS compares of fs-reference values) holds information about the reference path for chains of FS references
  • Field Details

    • IS_DEBUG_STOP_ON_MISCOMPARE

      private static final boolean IS_DEBUG_STOP_ON_MISCOMPARE
      See Also:
    • IS_MEAS_LIST_2_ARRAY

      private static final boolean IS_MEAS_LIST_2_ARRAY
      See Also:
    • BLANKS_89

      private static final String BLANKS_89
    • IS_SHOW_PROGRESS

      private static boolean IS_SHOW_PROGRESS
    • IS_CANONICAL_EMPTY_LISTS

      private static final boolean IS_CANONICAL_EMPTY_LISTS
      See Also:
    • removed_list_marker

      private static final CommonList removed_list_marker
    • map_e_to_a_list

      private final Int2ObjHashMap<ArrayList<CommonList>,ArrayList<CommonList>> map_e_to_a_list
      key = _id, value = arraylist holding well-formed list with this node in it
    • non_linear_list_elements

      private final PositiveIntSet non_linear_list_elements
      set of list elements determined to be members of a non-linear structure
    • node_indexes

      private final Obj2IntIdentityHashMap<CommonList> node_indexes
      a map from list nodes which might be removed, to their place in the fss array list The index is 1 more, to avoid colliding with the 0 value, used for missing value
    • list_successor_seen

      private final PositiveIntSet list_successor_seen
    • type2featLists

      private final Map<TypeImpl,CasCompare.FeatLists> type2featLists
    • c1

      private final CASImpl c1
    • c2

      private final CASImpl c2
    • ts1

      private final TypeSystemImpl ts1
    • ts2

      private final TypeSystemImpl ts2
    • isCompareAll

      private boolean isCompareAll
      if true, continues comparison and reporting after finding the first miscompare
    • isCompareIds

      private boolean isCompareIds
    • leafErrorReported

      private Pair<TOP,TOP> leafErrorReported
    • excludedRootNames

      private final Set<String> excludedRootNames
    • includedTypeNames

      private final Set<String> includedTypeNames
    • prevCompare

      private final Map<Pair<TOP,TOP>,Integer> prevCompare
      This is used - to speed up the compare - avoid comparing the same things multiple times, instead just use previous result - when doing the comparison to break recursion if asked to compare the same two things while comparing them. value is the result of previous comparison.
    • prevReport

      private final Set<Pair<TOP,TOP>> prevReport
    • prev1

      private final CasCompare.Prev prev1
    • prev2

      private final CasCompare.Prev prev2
    • isSrcCas

      private boolean isSrcCas
    • mismatchSb

      private final StringBuilder mismatchSb
    • inSortContext

      private boolean inSortContext
    • isSkipMismatch

      private boolean isSkipMismatch
    • isTypeMapping

      private boolean isTypeMapping
    • typeMapper

      private final CasTypeSystemMapper typeMapper
    • c1FoundFSs

      private ArrayList<TOP> c1FoundFSs
    • c2FoundFSs

      private ArrayList<TOP> c2FoundFSs
    • stringCongruenceSets

      private final Map<CasCompare.ScsKey,String[]> stringCongruenceSets
    • isUsingStringCongruenceSets

      private boolean isUsingStringCongruenceSets
    • maxId1

      private int maxId1
    • maxId2

      private int maxId2
    • miscompare_index

      private int miscompare_index
    • s1maxLen

      private int s1maxLen
    • working_on

      private static int working_on
  • Constructor Details

    • CasCompare

      public CasCompare(CASImpl c1, CASImpl c2)
      Make an instance of this class to set up a compare operation, and optionally use to configure the compare.
      Parameters:
      c1 - one CAS to compare
      c2 - the other CAS to compare
  • Method Details

    • compareCASes

      public static boolean compareCASes(CASImpl c1, CASImpl c2)
      Compare 2 CASes, with perhaps different type systems. - using default configuration.
      Parameters:
      c1 - CAS to compare
      c2 - CAS to compare
      Returns:
      true if equal (for types / features in both)
    • compareAll

      public void compareAll(boolean v)
      Continues the comparison after a miscompare (or not). This is useful when you want to see all of the miscompares.
      Parameters:
      v - defaults to false, set to true to continue the comparison after a miscompare
    • compareIds

      public void compareIds(boolean v)
      Normally, compares ignore the Feature Structure ID when comparing.
      Parameters:
      v - defaults to false, set to true to include the Feature Structure ID in the compare.
    • applyToBoth

      public void applyToBoth(Consumer<CASImpl> c)
      Many times some customation needs to be applied to both CASs being compared. This routine does that
      Parameters:
      c - the customization to be applied to both CASs
    • applyToTypeFeature

      public void applyToTypeFeature(String typeName, String featureBaseName, Consumer2<TOP,Feature> c)
      Before comparing, you can adjust specific features of specific types, arbitrarily. This routine applies the adjustments to both CASs.
      Parameters:
      typeName - the fully qualified name of the type
      featureBaseName - the short feature name to adjust
      c - a function to do the adjustment
    • applyToTypeFeature_inner

      private void applyToTypeFeature_inner(CASImpl cas, String typeName, String featureBaseName, Consumer2<TOP,Feature> c)
    • type_feature_to_runnable

      public List<Runnable> type_feature_to_runnable(String typeName, String featureBaseName, BiFunction<TOP,Feature,Runnable> c)
      Before comparing, you can create pending values for specific types / features, and return a list of runnables, which when run, plug in those pending values.
      Parameters:
      typeName - the type
      featureBaseName - the feature of the type
      c - the code to run for this type and feature
      Returns:
      a list of runnables, for both CASs
    • type_feature_to_runnable

      private List<Runnable> type_feature_to_runnable(CASImpl cas, String typeName, String featureBaseName, BiFunction<TOP,Feature,Runnable> c)
    • canonicalizeString

      public void canonicalizeString(String typeName, String featureBaseName, String[] items_to_change, String canonical_value)
      Before comparing, you can, for a selected type and feature which has a string value belonging to one of a set of strings, change the value to another (fixed) string which will of course compare equal. Use this to ignore selected string-valued features having particular values.
      Parameters:
      typeName - the fully qualified type name
      featureBaseName - the feature
      items_to_change - an array of strings to change if matched to one of these
      canonical_value - the new value
    • sortFSArray

      public List<Runnable> sortFSArray(String typeName, String featureBaseName)
    • sort_dedup_FSArray

      public List<Runnable> sort_dedup_FSArray(String typeName, String featureBaseName)
    • sortStringArray

      public List<Runnable> sortStringArray(String typeName, String featureBaseName)
    • excludeRootTypesFromIndexes

      public void excludeRootTypesFromIndexes(Set<String> excluded_typeNames)
      The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above. It sometimes helps to exclude miscompares of FeatureStructure like StringArrays which (for some reason) are indexed, in favor of finding these only via refs. You can exclude these from being found via indexes by setting types here. They could still be found via refs from other Feature Structures. Calling this disables any includeOnlyTheseTypesFromIndexes call;
      Parameters:
      excluded_typeNames - type names to exclude
    • excludeCollectionsTypesFromIndexes

      public void excludeCollectionsTypesFromIndexes()
      The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above. It sometimes helps to exclude miscompares of FeatureStructure like StringArrays which (for some reason) are indexed, in favor of finding these only via refs. Call this to exclude the array types: boolean, byte, short, integer, long, float, double, string and fs arrays from being found via indexes. They could still be found via refs from other Feature Structures. Calling this disables any includeOnlyTheseTypesFromIndexes call;
    • excludeListTypesFromIndexes

      public void excludeListTypesFromIndexes()
      The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above. It sometimes helps to exclude miscompares of List FeatureStructures like StringLists which (for some reason) are indexed, in favor of finding these only via refs. Call this to exclude the list types non-empty Float/Integer/String list elements from being found in the index. They could still be found via refs from other Feature Structures. Calling this disables any includeOnlyTheseTypesFromIndexes call;
    • includeOnlyTheseTypesFromIndexes

      public void includeOnlyTheseTypesFromIndexes(List<String> includedTypeNames)
      The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above. It sometimes helps to exclude all types except for a few selected ones which are indexed, in favor of finding these only via refs. Calling this disables any excludeXXXTypesFromIndexes calls;
      Parameters:
      includedTypeNames - fully qualified type names to include when finding Feature Structures to compare via the indexes.
    • addStringCongruenceSet

      public void addStringCongruenceSet(String typeName, String featureBaseName, String[] set_of_strings_that_are_equivalent, int index)
      Add a set of strings that should be considered equal when doing string comparisons. This is conditioned on the typename and feature name
      Parameters:
      typeName - the fully qualified type name
      featureBaseName - the feature short name
      set_of_strings_that_are_equivalent - a set of strings that should compare equal, if testing the type / feature
      index - if the item being compared is a reference to a string array, which index should be compared. Use -1 if not applicable.
    • showProgress

      public static void showProgress()
      call this to show progress of the compare - useful for long compares
    • compareCASes

      public boolean compareCASes()
      This does the actual comparison operation of the previously specified CASes
      Returns:
      true if compare is OK
    • sortFSArray

      public Runnable sortFSArray(FSArray<?> fsArray)
      This is an optional pre-compare operation. Somtimes, when comparing FSArrays, the order of the elements is not significant, and the compare should be done ignoring order differences. This is accomplished by sorting the elements, before the compare is done, using this method. The sort order is not significant; it just needs to be the same order for otherwise equal FSArrays. Use this routine to accomplish the sort, on particular FSArrays you designate. Call it for each one you want to sort. During the sort, links are followed. The sorting is done in a clone of the array, and the original array is not updated. Instead, a Runnable is returned, which may be invoked later to update the original array with the sorted copy. This allows sorting to be done on the original item values (in case the links refer back to the originals)
      Parameters:
      fsArray - the array to be sorted
      Returns:
      a runnable, which (when invoked) updates the original array with the sorted result.
    • sort_dedup_FSArray

      public Runnable sort_dedup_FSArray(TOP fs, Feature feat)
      This is an optional pre-compare operation. It is identical to the method above, except that after sorting, it removes duplicates.
      Parameters:
      fs - the feature structure having the fsarray feature
      feat - the feature having the fsarray
      Returns:
      a runnable, which (when invoked) updates the original array with the sorted result.
    • sortStringArray

      public Runnable sortStringArray(StringArray stringArray)
      This is an optional pre-compare operation. Somtimes, when comparing StringArrays, the order of the elements is not significant, and the compare should be done ignoring order differences. This is accomplished by sorting the elements, before the compare is done, using this method. Use this routine to accomplish the sort, on particular StringArrays you designate. Call it for each one you want to sort. The sorting is done in a clone of the array, and the original array is not updated. Instead, a Runnable is returned, which may be invoked later to update the original array with the sorted copy. This allows sorting to be done while keeping the original values until a later time
      Parameters:
      stringArray - the array to be sorted
      Returns:
      null or a runnable, which (when invoked) updates the original array with the sorted result. callers should insure the runnable is garbage collected after use
    • convert_linear_lists_to_arrays

      private void convert_linear_lists_to_arrays(ArrayList<TOP> fss)
    • convert_to_array

      private void convert_to_array(ArrayList<CommonList> al, ArrayList<TOP> fss, CASImpl view, TypeSystemImpl tsi)
      Convert an array list to a uima array (int, float, fs, string) - add to fss - go thru fss and null out list elements
      Parameters:
      al - -
      fss - -
    • addSuccessors

      private boolean addSuccessors(CommonList node, ArrayList<CommonList> al)
      walk down list, adding successors, looking for loops - each element is added to the array list, and also to the map from id -> array list - if loop found, stop and return false - before adding element, see if already in map from id -> array list -- if so, couple the array lists
      Parameters:
      node - -
      al - -
      Returns:
      false if loop found
    • couple_array_lists

      private void couple_array_lists(ArrayList<CommonList> a1, ArrayList<CommonList> a2, CommonList commonNode)
      merge a2 to follow a1, starting from position where commonNode is in a2
      Parameters:
      a1 - -
      a2 - -
      commonNode - -
    • move_to_non_linear

      private void move_to_non_linear(ArrayList<CommonList> al)
    • clearPrevFss

      private void clearPrevFss()
    • compareFss

      private int compareFss(TOP fs1, TOP fs2, TypeImpl callerTi, FeatureImpl callerFi)
      To work with Java sort, must implement the comparator contract: - compare(x, y) must be opposite compare(y, x) - compare(x, y) invalid input: '<' 0 invalid input: '&'invalid input: '&' compare(y, z) invalid input: '<' 0 implies compare(x, z) invalid input: '<' 0 - compare(x, y) == 0 implies compare(x, z) same as compare(y, z) for any z Inner part of compareRefs; that other method adds: null-check type-mapping skips loop determination If not in a sort context, a miscompare generates messaging information.
      Parameters:
      callerTi - - the type of another FS referencing this one, or null, used in congruence set testing
      callerFi - - the feature of the another FS referencing this one, or null, used in congruence set testing
      Returns:
      the compare result
    • compareFeature

      private int compareFeature(TOP fs1, TOP fs2, TypeImpl ti1, FeatureImpl fi1)
    • computeFeatLists

      private CasCompare.FeatLists computeFeatLists(TypeImpl ti)
      called during sort phase
      Parameters:
      ti - - type being sorted
      Returns:
      the feature lists for that type
    • compareFssArray

      private int compareFssArray(TOP fs1, TOP fs2, TypeImpl callerTi, FeatureImpl callerFi)
    • compareSlot

      private int compareSlot(TOP fs1, TOP fs2, FeatureImpl fi1, FeatureImpl fi2, TypeImpl ti1)
    • compareRefs

      private int compareRefs(TOP rfs1, TOP rfs2, TypeImpl callerTi, FeatureImpl callerFi)
      Two uses cases supported: - comparing for sorting (within on type system) -- goal is to be able to compare two CASes --- ordering must guarantee that the equal FSs appear in the --- same order - comparing two FSs (maybe from different CASes) -- supporting missing types and features -- happens when the two type systems are different -- the missing types and features are ignored in the comparison Different reference chains This compare routine may be called recursively - use case: FS(a) has slot which is ref to FS(b) which has slot which is ref to FS(c) -- the type of a, b, c may all be different. Two reference chains for the two arguments may have different structures - Difference in two ways: -- length of unique (same fs_id) FSs -- length of loop (if one exists at the end reached so far) IMPORTANT: the following 2 chains have different lengths, but this won't be discovered if the recursive descent stops too soon: - a -> b -> c ( -> b ) - c -> b ( -> c) At the 2nd recursion, we have b vs b, but haven't explored the chain deeply enough to know the first one has length 3, and the 2nd length 2. Meaning of comparision of two refs: - recursively defined - captures notion of reference chain differences -- means if two refs compare 0, the result may still be non-0 if the reference chains to get to these are different -- first compare on length of unique FSs -- if ==, compare on length of loop - if comparing (use case 2, two different type systems) with type not existing in other type system, skip (treat as 0). If comparing two FSs in 1 CAS, where there is type mapping, if the mapping to the other CAS is null, change the value of the FS to null to match the sort order the other CAS will haveand that mapping is to null (because the type is missing), use null for the argument(s). Complexities: the type rfs1 may not be in the target type system. For this case - treat rfs2 == null as "equal", rfs2 != null as not equal (always gt) Is assymetrical (?) - same logic isn't applied for reverse case.
      Parameters:
      rfs1 - -
      rfs2 - -
      fi - -
      Returns:
      -
    • compareRefResult

      private int compareRefResult(TOP rfs1, TOP rfs2)
      Returning because recursion detected a loop.
      Parameters:
      rfs1 - -
      rfs2 - -
      Returns:
      - -1 if ref chain 1 length invalid input: '<' ref chain 2 length or is the same length but loop length 1 invalid input: '<' 2 1 if ref chain 1 length > ref chain 2 length or is the same length but loop length 1 > 2 0 if ref chain lengths are the same and loop length is the same Exception: if one of the items is a canonical "empty" list element, and the other is a non-canonical one - treat as equal.
    • compareAllArrayElements

      private int compareAllArrayElements(TOP fs1, TOP fs2, int len, IntUnaryOperator c, TypeImpl callerTi, FeatureImpl callerFi)
    • compareStringsWithNull

      private int compareStringsWithNull(String s1, String s2, TypeImpl t, FeatureImpl f, int index)
    • mismatchFsDisplay

      private void mismatchFsDisplay()
    • mismatchFs

      private void mismatchFs(TOP fs1, TOP fs2, TypeImpl callerTi, FeatureImpl callerFi)
    • mismatchFs

      private void mismatchFs(TOP fs1, TOP fs2, Feature fi, Feature fi2)
    • mismatchFs

      private void mismatchFs(TOP fs1, TOP fs2, String msg, TypeImpl callerTi, FeatureImpl callerFi)
    • sort

      private void sort(List<TOP> fss)
      called to sort all the FSs before doing the equality compares
    • sortCompare

      private int sortCompare(TOP scFs1, TOP scFs2)
      Used for sorting within one type system, for two instances of the same type Uses field isSrcCas (boolean) to differentiate when being used to sort for srcCas vs tgtCas When sorting where type mapping is happening between source and target CASs, skip compares for features which are not in the opposite CAS.
      Parameters:
      scFs1 - -
      scFs2 - -
      Returns:
      -
    • isTypeInTgt

      private boolean isTypeInTgt(TOP fs)
    • ps

      private String ps(TOP fs)
    • compareNumberOfFSsByType

      public static StringBuilder compareNumberOfFSsByType(CAS cas1, CAS cas2)
      Counts and compares the number of Feature Structures, by type, and generates a report
      Parameters:
      cas1 - first CAS to compare
      cas2 - second CAS to compare
      Returns:
      a StringBuilder with a report