LivePortrait/assets/docs/changelog/2024-08-19.md
Jianzhu Guo a19c3e15fb
feat: support image driven mode and regional control (#336)
* feat: update

* feat: update

* feat: update

* feat: update

* feat: update

* feat: update

* feat: image driven, regional animation

* feat: image driven, regional animation

* feat: image driven, regional animation, doc

* feat: image driven, regional animation

* feat: image driven, regional animation

* feat: image driven, regional animation

* chore: refactor

* doc: update readme

* doc: update changelog

* feat: image driven, regional control

---------

Co-authored-by: zhangdingyun <zhangdingyun@kuaishou.com>
2024-08-19 23:08:57 +08:00

60 lines
4.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

## Image Driven and Regional Control
You can now **use an image as a driving signal** to drive the source image or video! Additionally, we **have refined the driving options to support expressions, pose, lips, eyes, or all** (all is consistent with the previous default method), which we name it regional control. The control is becoming more and more precise! 🎯
> Please note that image-based driving or regional control may not perform well in certain cases. Feel free to try different options, and be patient. 😊
> [!Note]
> We recognize that the project now offers more options, which have become increasingly complex, but due to our limited team capacity and resources, we havent fully documented them yet. We ask for your understanding and will work to improve the documentation over time. Contributions via PRs are welcome! If anyone is considering donating or sponsoring, feel free to leave a message in the GitHub Issues or Discussions. We will set up a payment account to reward the team members or support additional efforts in maintaining the project. 💖
### CLI Usage
It's very simple to use an image as a driving reference. Just set the `-d` argument to the driving image:
```bash
python inference.py -s assets/examples/source/s5.jpg -d assets/examples/driving/d30.jpg
```
To change the `animation_region` option, you can use the `--animation_region` argument to `exp`, `pose`, `lip`, `eyes`, or `all`. For example, to only drive the lip region, you can run by:
```bash
# only driving the lip region
python inference.py -s assets/examples/source/s5.jpg -d assets/examples/driving/d0.mp4 --animation_region lip
```
### Gradio Interface
<p align="center">
<img src="../image-driven-portrait-animation-2024-08-19.jpg" alt="LivePortrait" width="960px">
<br>
<strong>Image-driven Portrait Animation and Regional Control</strong>
</p>
### More Detailed Explanation
**flag_relative_motion**:
When using an image as the driving input, setting `--flag_relative_motion` to true will apply the motion deformation between the driving image and its canonical form. If set to false, the absolute motion of the driving image is used, which may amplify expression driving strength but could also cause identity leakage. This option corresponds to the `relative motion` toggle in the Gradio interface. Additionally, if both source and driving inputs are images, the output will be an image. If the source is a video and the driving input is an image, the output will be a video, with each frame driven by the image's motion. The Gradio interface automatically saves and displays the output in the appropriate format.
**animation_region**:
This argument offers five options:
- `exp`: Only the expression of the driving input influences the source.
- `pose`: Only the head pose drives the source.
- `lip`: Only lip movement drives the source.
- `eyes`: Only eye movement drives the source.
- `all`: All motions from the driving input are applied.
You can also select these options directly in the Gradio interface.
**Editing the Lip Region of the Source Video to a Neutral Expression**:
In response to requests for a more neutral lip region in the `Retargeting Video` of the Gradio interface, we've added a `keeping the lip silent` option. When selected, the animated video's lip region will adopt a neutral expression. However, this may cause inter-frame jitter or identity leakage, as it uses a mode similar to absolute driving. Note that the neutral expression may sometimes feature a slightly open mouth.
**Others**:
When both source and driving inputs are videos, the output motion may be a blend of both, due to the default setting of `--flag_relative_motion`. This option uses relative driving, where the motion offset of the current driving frame relative to the first driving frame is added to the source frame's motion. In contrast, `--no_flag_relative_motion` applies the driving frame's motion directly as the final driving motion.
For CLI usage, to retain only the driving video's motion in the output, use:
```bash
python inference.py --no_flag_relative_motion
```
In the Gradio interface, simply uncheck the relative motion option. Note that absolute driving may cause jitter or identity leakage in the animated video.