GITBOOK-608: add docs for AI clustering

This commit is contained in:
Mike Solomon
2024-08-13 15:58:11 +00:00
committed by gitbook-bot
parent dbab3a34ef
commit af4c619715
21 changed files with 64 additions and 9 deletions

View File

@@ -26,6 +26,7 @@
* [How to track migration status with Moderne](user-documentation/moderne-platform/how-to-guides/track-migrations.md)
* [How to upgrade transitive dependencies](user-documentation/moderne-platform/how-to-guides/transitive-dependencies.md)
* [How to find method invocations based on a pattern](user-documentation/moderne-platform/how-to-guides/how-to-find-method-invocations-based-on-a-pattern.md)
* [How to gain a high-level overview of your codebase using clustering](user-documentation/moderne-platform/how-to-guides/how-to-gain-a-high-level-overview-of-your-codebase-using-clustering.md)
* [References](user-documentation/moderne-platform/references/README.md)
* [Moderne tokens](user-documentation/moderne-platform/references/moderne-tokens.md)
* [Creating SCM access tokens](user-documentation/moderne-platform/references/create-scm-access-tokens.md)

View File

@@ -24,7 +24,7 @@ To download audit logs, use the "Export to CEF" button: ![](<../../../.gitbook/a
To access non-audit-log reports, navigate to `https://<TENANT>.moderne.io/admin/reports`.
<figure><img src="../../../.gitbook/assets/image (2) (1) (1) (1) (1).png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../../.gitbook/assets/image (2) (1) (1) (1) (1) (1).png" alt=""><figcaption></figcaption></figure>
These reports can be downloaded using the download button: ![](<../../../.gitbook/assets/image (3) (1) (1) (1) (1).png>)

View File

@@ -1,12 +1,12 @@
# Platform changelog
### UI v10.139.2 (2024/08/09)
- Bug fixes and other improvements.
* Bug fixes and other improvements.
### UI v10.139.1 (2024/08/09)
- Bug fixes and other improvements.
* Bug fixes and other improvements.
### UI v10.139.0 (2024/08/07)
@@ -58,7 +58,7 @@ In this release we have made various improvements to the new builder to increase
* We received feedback that the options were hard to discover. We have begun to address this by making the options panel auto open when the selected recipe has options and also auto expand to fit more options before having to scroll. Note the options have moved to the lower right now:\
![](<../.gitbook/assets/image (2) (4).png>)
* We found that some users were not aware of the recipe menu and the options available there so we have made the button more visible by adding a label:\
<img src="../.gitbook/assets/image (1) (1) (3).png" alt="" data-size="original">
<img src="../.gitbook/assets/image (1) (1) (3) (1).png" alt="" data-size="original">
### UI v10.131.0 (2024/07/22)
@@ -1126,7 +1126,7 @@ Now you can see the latest version number of the CLI before downloading.
<div align="left" data-full-width="false">
<figure><img src="../.gitbook/assets/image (2) (1) (1) (1) (1) (1).png" alt="" width="176"><figcaption></figcaption></figure>
<figure><img src="../.gitbook/assets/image (2) (1) (1) (1) (1) (1) (1).png" alt="" width="176"><figcaption></figcaption></figure>
</div>

View File

@@ -26,7 +26,7 @@ This opens a small menu which allows you to do three things:
For more information about creating search recipes using the Moderne plugin, check out our [recipe creation guide](creating-recipes.md).
{% endhint %}
<figure><img src="../../../.gitbook/assets/image (2) (1) (1).png" alt="" width="563"><figcaption><p><code>Run Find Recipe</code> kicks off a recipe run using OpenRewrite's <code>Find method usages</code>.</p></figcaption></figure>
<figure><img src="../../../.gitbook/assets/image (2) (1) (1) (1).png" alt="" width="563"><figcaption><p><code>Run Find Recipe</code> kicks off a recipe run using OpenRewrite's <code>Find method usages</code>.</p></figcaption></figure>
If you choose to initiate the search via `Run Find Recipe`, you will immediately see a new Usages window open in the IDE, and a progress bar that shows which repository in the multi-repo the recipe is currently running on. Amazingly, the results of this OpenRewrite recipe have been brought directly back into the IDE and surfaced in the Usages view that engineers are already familiar with.

View File

@@ -30,7 +30,7 @@ Please note, though, that this AI recipe is designed to find results even if the
3. Press `Dry Run` to kick off the recipe.
4. On the recipe results page, you can click on any repository to see the code that matches the method you specified. For instance, if you searched for `Java.util.List add(..)` you might see results like:
<figure><img src="../../../.gitbook/assets/image.png" alt="" width="563"><figcaption><p>Find method usages results</p></figcaption></figure>
<figure><img src="../../../.gitbook/assets/image (2).png" alt="" width="563"><figcaption><p>Find method usages results</p></figcaption></figure>
### Find method invocations that resemble a pattern
@@ -51,7 +51,7 @@ The AI search is aware of the arguments inside the method invocation. If you are
4. With those two options specified, press `Dry Run` to kick off the recipe.
5. On the recipe results page, you can click on any repository to see the code that matches the method you described. For instance, if you searched for `HTTP Request` - you might see results like:
<figure><img src="../../../.gitbook/assets/image (1).png" alt="" width="563"><figcaption><p>Find method invocations that resemble a pattern results</p></figcaption></figure>
<figure><img src="../../../.gitbook/assets/image (1) (1).png" alt="" width="563"><figcaption><p>Find method invocations that resemble a pattern results</p></figcaption></figure>
## Gain insight from the results

View File

@@ -0,0 +1,54 @@
---
coverY: 0
---
# How to gain a high-level overview of your codebase using clustering
<figure><img src="../../../.gitbook/assets/clustering_methods.gif" alt=""><figcaption></figcaption></figure>
## Why is this useful?
A visualization that clusters all the method declarations in a codebase is particularly useful for understanding the overall structure and organization of the code. By grouping related methods or classes together, it allows developers to quickly grasp how different parts of the codebase are connected. This, in turn, makes the codebase easier to navigate and comprehend, especially in large projects.
Furthermore, this type of visualization can reveal patterns that indicate potential “code smells” or areas where refactoring might be needed. For example, by highlighting clusters of methods that naturally belong together but are dispersed across different classes, the visualization can help identify opportunities for refactoring to enhance code cohesion and modularity.
It also allows developers to quickly spot method names that don't follow established naming conventions, such as discovering a method named `fetchData` in a cluster where the convention is to use `retrieveData`.
## Concepts and terminology
### Embeddings
Embeddings are numerical representations of data that AI models can operate on. The data can be an image, a word, a document, a chunk of a document, or even a method declaration. Because embeddings are vectors of floating-point numbers, you can do arithmetic on them, such as measuring the distance between two of them. If the embeddings of two entities are numerically close, the entities themselves are likely to be similar. For example, you might expect “love” and “hate” to be far from each other, but they tend to have similar embeddings: both are emotions that people use to describe their relationship to something or someone else.
<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXd_nMwI-2zzXP2owUu25QhcKu267qySjyS1kaUfjmyb72Y1mVm8jtGrMowh7j9W2KFCfNtrRZ-iJDuePzzkXkGUhxxoFaOM4rHu3C3GHAGyjnqWA4A3Jq4qUgE3qRxepLkzs0hUFKqWNkvI289VsVRWNL_d?key=0rfyGw4SLZE5ORet2TfwJg" alt=""><figcaption></figcaption></figure>
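As a rough illustration of what "numerically close" means, the sketch below compares a few embeddings using cosine similarity, one common closeness measure. The three-dimensional vectors are invented for this example (real embeddings come from a model and usually have hundreds of dimensions), and nothing here is meant to describe how Moderne computes similarity.

```python
import math

# Made-up 3-dimensional vectors for illustration only; real embeddings
# are produced by a model and are much larger.
toy_embeddings = {
    "love": [0.90, 0.80, 0.10],
    "hate": [0.80, 0.90, 0.20],
    "adore": [0.95, 0.70, 0.10],
    "table": [0.10, 0.20, 0.90],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


love_hate = cosine_similarity(toy_embeddings["love"], toy_embeddings["hate"])
love_table = cosine_similarity(toy_embeddings["love"], toy_embeddings["table"])

# With these toy vectors, "love" scores much closer to "hate" than to "table".
print(f"love vs hate:  {love_hate:.3f}")
print(f"love vs table: {love_table:.3f}")
```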
### Clustering
Clustering is the process of grouping similar objects together based on their features. It involves dividing a dataset into clusters, where objects within the same cluster are more similar to each other than to those in other clusters. Using the example above, we could see two clusters: one containing “hate”, “love” and “adore”, and another containing “table”.
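To make the idea concrete, here is a minimal sketch of one very simple way to group items: a greedy, single-pass grouping by a cosine-similarity threshold. The toy vectors, the `group_by_similarity` helper, and the `0.9` threshold are all assumptions made for this illustration; this is not the algorithm the visualization uses.

```python
import math

# Made-up vectors for illustration only.
toy_embeddings = {
    "love": [0.90, 0.80, 0.10],
    "hate": [0.80, 0.90, 0.20],
    "adore": [0.95, 0.70, 0.10],
    "table": [0.10, 0.20, 0.90],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_by_similarity(embeddings, threshold):
    """Greedy grouping: an item joins the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of names; the first name is its seed
    for name, vector in embeddings.items():
        for cluster in clusters:
            seed_vector = embeddings[cluster[0]]
            if cosine_similarity(vector, seed_vector) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([name])
    return clusters


clusters = group_by_similarity(toy_embeddings, threshold=0.9)
print(clusters)
```

With these toy vectors the three emotion words end up in one group and “table” in another. Production clustering algorithms are more robust than this single pass, but the principle of grouping by closeness is the same.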
## How to produce the visualization
Before starting the visualization, you must first run a recipe which collects the embeddings for each method or class (depending on your preference). Select `methods` or `classes` depending on which one you want to analyze.
<figure><img src="../../../.gitbook/assets/image.png" alt=""><figcaption></figcaption></figure>
Once the recipe has finished running, click on the visualization tab and run the "clustering code snippets" visualization.
<figure><img src="../../../.gitbook/assets/image (1).png" alt=""><figcaption></figcaption></figure>
The result is a 2D scatter plot in which each dot represents either a method or a class, depending on what you selected. You can hover over any dot to reveal the name of the method or class. The closer two dots are, the more likely their contents are similar. For instance, you can expect to find methods that read, write, or delete files near each other.
<figure><img src="../../../.gitbook/assets/Screenshot 2024-08-12 at 5.09.12PM.png" alt=""><figcaption></figcaption></figure>
## What information can you extract from this?
* You could find methods that do similar things but have different names, which could be refactored into a more cohesive design, such as creating a class that extends another to group similar functionality.
* You can identify inconsistencies in naming conventions across the codebase, such as methods that perform similar actions but are named differently, which could be standardized for clarity.
* You can see which methods (or classes) are most closely related, potentially revealing opportunities for optimizing the codebase by improving modularity or reducing dependencies.
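As one way to act on the naming point above, the sketch below flags method names in a cluster whose leading verb differs from the most common one. The list of names is hypothetical (imagine it was copied by hand from the dots in one cluster), and `leading_verb` and `find_naming_outliers` are helper names invented for this example rather than part of any Moderne API.

```python
import re
from collections import Counter

# Hypothetical method names read off a single cluster in the plot.
cluster_method_names = ["retrieveData", "retrieveUser", "retrieveOrders", "fetchData"]


def leading_verb(method_name):
    """Return the lowercase prefix of a camelCase name, e.g. 'retrieve' for 'retrieveData'."""
    match = re.match(r"[a-z]+", method_name)
    return match.group(0) if match else method_name


def find_naming_outliers(method_names):
    """Return names whose leading verb differs from the cluster's most common verb."""
    verb_counts = Counter(leading_verb(name) for name in method_names)
    dominant_verb, _ = verb_counts.most_common(1)[0]
    return [name for name in method_names if leading_verb(name) != dominant_verb]


outliers = find_naming_outliers(cluster_method_names)
print(outliers)
```

Here `fetchData` is reported because the other names in the cluster start with `retrieve`. A flagged name is only a candidate for review, since a different verb can also reflect a real difference in behavior.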
## Gotchas & tips and tricks
* Note that the methods are deduplicated, so only one dot represents a method even if it appears multiple times in the codebase. The deduplication is based on the full method declaration, not just the name or signature.
* Running the visualization on too many repositories with too many methods or classes can make the information overwhelming and difficult to digest, so it's best to narrow the scope when possible.
* The embeddings used for clustering are based on the content of the methods (or classes), not just their names, which helps in grouping methods (or classes) that perform similar tasks even if they have different names.
* While clusters can provide useful information about related methods, the relative positions of the dots and clusters within the visualization also hold significant insights. A dot may sit exactly between two clusters, in which case its position conveys more information than the cluster it was assigned to.
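
The last point can be illustrated with a small sketch: a dot is assigned to whichever cluster center is nearest, but comparing its distances to both centers shows how borderline that assignment is. The coordinates and the `closeness_ratio` measure are made up for this example and are not values or metrics produced by the visualization.

```python
import math

# Made-up 2D plot coordinates for two clusters and one dot between them.
cluster_a = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2)]
cluster_b = [(5.0, 1.0), (5.2, 0.8), (4.8, 1.2)]
in_between_dot = (3.1, 1.0)


def centroid(points):
    """Return the mean position of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


dist_a = math.dist(in_between_dot, centroid(cluster_a))
dist_b = math.dist(in_between_dot, centroid(cluster_b))

# A hard assignment picks the nearest center and hides how close the call was.
assigned = "A" if dist_a < dist_b else "B"

# A ratio near 1.0 means the dot is almost equally close to both clusters.
closeness_ratio = min(dist_a, dist_b) / max(dist_a, dist_b)

print(f"assigned to cluster {assigned}, closeness ratio {closeness_ratio:.2f}")
```

In this toy layout the dot is assigned to cluster B, yet it is nearly as close to cluster A, which is exactly the kind of detail the cluster label alone would not show.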