chore: refactor

guojianzhu 2024-08-19 22:17:07 +08:00
parent 43f78039dd
commit 5c2cd63937
13 changed files with 39 additions and 33 deletions

app.py

@@ -244,7 +244,7 @@ with gr.Blocks(theme=gr.themes.Soft(font=[gr.themes.GoogleFont("Plus Jakarta San
     flag_relative_input = gr.Checkbox(value=True, label="relative motion")
     flag_remap_input = gr.Checkbox(value=True, label="paste-back")
     flag_stitching_input = gr.Checkbox(value=True, label="stitching")
-    animation_region = gr.Radio(["exp", "pose", "lip", "eyes", "all"], value="exp", label="animation region")
+    animation_region = gr.Radio(["exp", "pose", "lip", "eyes", "all"], value="all", label="animation region")
     driving_option_input = gr.Radio(['expression-friendly', 'pose-friendly'], value="expression-friendly", label="driving option (i2v)")
     driving_multiplier = gr.Number(value=1.0, label="driving multiplier (i2v)", minimum=0.0, maximum=2.0, step=0.02)
     driving_smooth_observation_variance = gr.Number(value=3e-7, label="motion smooth strength (v2v)", minimum=1e-11, maximum=1e-2, step=1e-8)
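The only behavioral change in this hunk is the `animation_region` default (`"exp"` → `"all"`). Isolated from the full `app.py` layout, the changed control amounts to the following standalone sketch (not the real interface, just the one widget on its own):

```python
import gradio as gr

# Standalone sketch of the changed control; app.py embeds it in a larger layout.
with gr.Blocks() as demo:
    animation_region = gr.Radio(
        ["exp", "pose", "lip", "eyes", "all"],
        value="all",  # new default introduced by this commit
        label="animation region",
    )

if __name__ == "__main__":
    demo.launch()
```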

assets/docs/changelog/2024-08-19.md

@@ -1,49 +1,55 @@
## Image Driven and Regional Control

You can now **use an image as a driving signal** to drive the source image or video! Additionally, we **have refined the driving options to support expressions, pose, lips, eyes, or all** (`all` is consistent with the previous default method), which we call regional control. The control is becoming more and more precise! 🎯

> Please note that image-based driving or regional control may not perform well in certain cases. Feel free to try different options, and be patient. 😊

### CLI Usage

It's very simple to use an image as a driving reference. Just set the `-d` argument to the driving image:

```bash
python inference.py -s assets/examples/source/s5.jpg -d assets/examples/driving/d30.jpg
```
You can also upload the driving image to the corresponding location in the Gradio interface.
To change the animation region, set the `--animation_region` argument to `exp`, `pose`, `lip`, `eyes`, or `all`. For example, to drive only the lip region, run:
```bash
# only driving the lip region
python inference.py -s assets/examples/source/s5.jpg -d assets/examples/driving/d0.mp4 --animation_region lip
```
### Gradio Interface
<p align="center">
  <img src="../image-driven-portrait-animation-2024-08-19.jpg" alt="LivePortrait" width="960px">
  <br>
  <strong>Image-driven Portrait Animation and Regional Control</strong>
  <br><br>
  <img src="../image-driven-portrait-video-editing-2024-08-19.jpg" alt="LivePortrait" width="960px">
  <br>
  Image-driven portrait video editing
</p>
### More Detailed Explanation

**flag_relative_motion**:
When using an image as the driving input, setting `--flag_relative_motion` to true will apply the motion deformation between the driving image and its canonical form. If set to false, the absolute motion of the driving image is used, which may amplify expression driving strength but could also cause identity leakage. This option corresponds to the `relative motion` toggle in the Gradio interface. Additionally, if both source and driving inputs are images, the output will be an image. If the source is a video and the driving input is an image, the output will be a video, with each frame driven by the image's motion. The Gradio interface automatically saves and displays the output in the appropriate format.
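The image-driven case can be pictured as follows. This is a minimal sketch: `canonical_motion` and the array stand-ins are hypothetical, not the actual LivePortrait internals.

```python
import numpy as np

def image_driving_motion(source_motion: np.ndarray,
                         driving_motion: np.ndarray,
                         canonical_motion: np.ndarray,
                         relative: bool = True) -> np.ndarray:
    """Sketch of relative vs. absolute driving with a single driving image.

    The arrays are illustrative stand-ins for the model's motion parameters.
    """
    if relative:
        # Relative: transfer only the driving image's deformation away from
        # its canonical (neutral) form onto the source motion.
        return source_motion + (driving_motion - canonical_motion)
    # Absolute: use the driving image's motion directly; stronger expressions,
    # but with a risk of identity leakage.
    return driving_motion
```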
**animation_region**:
This argument offers five options:

- `exp`: Only the expression of the driving input influences the source.
- `pose`: Only the head pose drives the source.
- `lip`: Only lip movement drives the source.
- `eyes`: Only eye movement drives the source.
- `all`: All motions from the driving input are applied.

You can also select these options directly in the Gradio interface. A minimal sketch of how region selection can work is shown after this list.
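The mapping below is hypothetical (`REGION_KEYS` and plain dicts standing in for the model's motion tensors); it only illustrates the idea of copying a subset of the driving motion onto the source:

```python
# Hypothetical motion dict; the real pipeline uses model-specific tensors.
REGION_KEYS = {
    "exp":  ["expression"],
    "pose": ["rotation", "translation"],
    "lip":  ["lip"],
    "eyes": ["eyes"],
    "all":  ["expression", "rotation", "translation", "lip", "eyes"],
}

def apply_region(source: dict, driving: dict, region: str = "all") -> dict:
    """Copy only the selected region's motion from driving onto source."""
    out = dict(source)
    for key in REGION_KEYS[region]:
        out[key] = driving[key]
    return out
```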
**Editing the Lip Region of the Source Video to a Neutral Expression**:
In response to requests for a more neutral lip region in the `Retargeting Video` of the Gradio interface, we've added a `keeping the lip silent` option. When selected, the animated video's lip region will adopt a neutral expression. However, this may cause inter-frame jitter or identity leakage, as it uses a mode similar to absolute driving. Note that the neutral expression may sometimes feature a slightly open mouth.
**Others**:
When both source and driving inputs are videos, the output motion may be a blend of both, due to the default setting of `--flag_relative_motion`. This option uses relative driving, where the motion offset of the current driving frame relative to the first driving frame is added to the source frame's motion. In contrast, `--no_flag_relative_motion` applies the driving frame's motion directly as the final driving motion.
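The video case composes motion the same way as the image case above, except the first driving frame plays the role of the canonical reference. Again a hedged sketch with illustrative arrays, not the repo's actual implementation:

```python
import numpy as np

def video_driving_motion(source_frames: np.ndarray,
                         driving_frames: np.ndarray,
                         relative: bool = True) -> np.ndarray:
    """Per-frame motion under relative vs. absolute video driving.

    Arrays have shape (num_frames, dim) and are illustrative stand-ins.
    """
    if relative:
        # Relative (default): add each driving frame's offset from the
        # first driving frame onto the corresponding source frame.
        return source_frames + (driving_frames - driving_frames[0])
    # Absolute (--no_flag_relative_motion): use the driving motion directly.
    return driving_frames
```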
For CLI usage, to retain only the driving video's motion in the output, use:
```bash
python inference.py --no_flag_relative_motion
```
In the Gradio interface, simply uncheck the `relative motion` option. Note that absolute driving may cause jitter or identity leakage in the animated video.

9 binary image files changed (previews not shown): 7 replaced, mostly recompressed to smaller sizes (e.g., 1.5 MiB → 544 KiB, 777 KiB → 97 KiB), and 2 deleted (1.1 MiB, 210 KiB).

README.md

@@ -38,8 +38,8 @@
 ## 🔥 Updates
-- **`2024/08/19`**: 🖼️ We support **image driven and regional control**, insipred by [ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait). See [**here**](./assets/docs/changelog/2024-08-19.md).
-- **`2024/08/06`**: 🎨 We support **precise portrait editing** in the Gradio interface, insipred by [ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait). See [**here**](./assets/docs/changelog/2024-08-06.md).
+- **`2024/08/19`**: 🖼️ We support **image driven mode** and **regional control**. For details, see [**here**](./assets/docs/changelog/2024-08-19.md).
+- **`2024/08/06`**: 🎨 We support **precise portrait editing** in the Gradio interface, inspired by [ComfyUI-AdvancedLivePortrait](https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait). See [**here**](./assets/docs/changelog/2024-08-06.md).
 - **`2024/08/05`**: 📦 Windows users can now download the [one-click installer](https://huggingface.co/cleardusk/LivePortrait-Windows/blob/main/LivePortrait-Windows-v20240806.zip) for Humans mode and **Animals mode** now! For details, see [**here**](./assets/docs/changelog/2024-08-05.md).
 - **`2024/08/02`**: 😸 We released a version of the **Animals model**, along with several other updates and improvements. Check out the details [**here**](./assets/docs/changelog/2024-08-02.md)!
 - **`2024/07/25`**: 📦 Windows users can now download the package from [HuggingFace](https://huggingface.co/cleardusk/LivePortrait-Windows/tree/main). Simply unzip and double-click `run_windows.bat` to enjoy!

src/config/argument_config.py

@@ -34,7 +34,7 @@ class ArgumentConfig(PrintableConfig):
     driving_multiplier: float = 1.0 # be used only when driving_option is "expression-friendly"
     driving_smooth_observation_variance: float = 3e-7 # smooth strength scalar for the animated video when the input is a source video, the larger the number, the smoother the animated video; too much smoothness would result in loss of motion accuracy
     audio_priority: Literal['source', 'driving'] = 'driving' # whether to use the audio from source or driving video
-    animation_region: Literal["exp", "pose", "lip", "eyes", "all"] = "exp" # the region where the animation was performed, "exp" means the expression, "pose" means the head pose
+    animation_region: Literal["exp", "pose", "lip", "eyes", "all"] = "all" # the region where the animation was performed, "exp" means the expression, "pose" means the head pose, "all" means all regions
     ########## source crop arguments ##########
     det_thresh: float = 0.15 # detection threshold
     scale: float = 2.3 # the ratio of face area is smaller if scale is larger
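For reference, typed config fields like these are typically surfaced as CLI flags such as `--animation_region` and `--no_flag_relative_motion`. The sketch below is illustrative only, using plain `argparse` with explicitly defined flags rather than the repo's actual CLI wiring:

```python
# Illustrative only: a minimal argparse front-end mirroring two of the
# ArgumentConfig fields above (the real repo wires its own CLI layer).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--animation_region",
                    choices=["exp", "pose", "lip", "eyes", "all"],
                    default="all",  # matches the new default in this commit
                    help="which region of the driving motion to apply")
parser.add_argument("--flag_relative_motion", dest="flag_relative_motion",
                    action="store_true", default=True)
parser.add_argument("--no_flag_relative_motion", dest="flag_relative_motion",
                    action="store_false",
                    help="use absolute driving instead of relative driving")

args = parser.parse_args()
print(args.animation_region, args.flag_relative_motion)
```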