
Segmentation fault applying LoRA to Flux model in GPU-accelerated program #729

@mimi3421

Description


Applying a LoRA to Flux-dev or Flux-Kontext-dev causes a segmentation fault at the moment the LoRA tensors finish loading in GPU-accelerated builds (both Vulkan and CUDA), on both Ubuntu and Windows. The CPU build runs normally.

The program is invoked like this:

# https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-0739361
# https://civitai.com/models/1737381/deblur
sd --diffusion-model  "./model/flux1-kontext-dev-Q3_K_M.gguf" --vae "./model/ae.safetensors" --clip_l "./model/clip_l.safetensors" \
    --t5xxl "./model/t5xxl_fp8_e4m3fn.safetensors" --lora-model-dir "./model" \
    -H 512 -W 512 --vae-tiling --diffusion-fa --clip-on-cpu --cfg-scale 1.0 --sampling-method euler -v \
    -r "flux1-dev-q8_0.png" -p "deblur<lora:fluxKontextDeblur.HwEy:1>"

The output of the failing run (under gdb) is:

gdb --args sd --threads 20 --diffusion-model  "flux1-kontext-dev-Q3_K_M.gguf" --vae "ae.safetensors" --clip_l "clip_l.safetensors" --t5xxl "t5xxl_fp8_e4m3fn.safetensors" --lora-model-dir "./model" -H 640 -W 768 --vae-tiling --diffusion-fa --clip-on-cpu --cfg-scale 1.0 --sampling-method euler -v -r "flux1-dev-q8_0.png" -p "deblur<lora:fluxKontextDeblur.HwEy:1>"

(gdb) r
Starting program: sd --threads 20 --diffusion-model flux1-kontext-dev-Q3_K_M.gguf --vae ae.safetensors --clip_l clip_l.safetensors --t5xxl t5xxl_fp8_e4m3fn.safetensors --lora-model-dir model -H 640 -W 768 --vae-tiling --diffusion-fa --clip-on-cpu --cfg-scale 1.0 --sampling-method euler -v -r flux1-dev-q8_0.png -p deblur\<lora:fluxKontextDeblur.HwEy:1\>
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Option:
    n_threads:         20
    mode:              img_gen
    model_path:
    wtype:             unspecified
    clip_l_path:       clip_l.safetensors
    clip_g_path:
    t5xxl_path:        t5xxl_fp8_e4m3fn.safetensors
    diffusion_model_path:   flux1-kontext-dev-Q3_K_M.gguf
    vae_path:          ae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    stacked_id_embed_dir:
    input_id_images_path:
    style ratio:       20.00
    normalize input image :  false
    output_path:       output.png
    init_img:
    mask_img:
    control_image:
    ref_images_paths:
        flux1-dev-q8_0.png
    clip on cpu:       true
    controlnet cpu:    false
    vae decoder on cpu:false
    diffusion flash attention:true
    strength(control): 0.90
    prompt:            deblur<lora:fluxKontextDeblur.HwEy:1>
    negative_prompt:
    min_cfg:           1.00
    cfg_scale:         1.00
    img_cfg_scale:     1.00
    slg_scale:         0.00
    guidance:          3.50
    eta:               0.00
    clip_skip:         -1
    width:             768
    height:            640
    sample_method:     euler
    schedule:          default
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        true
    upscale_repeats:   1
    chroma_use_dit_mask:   true
    chroma_use_t5_mask:    false
    chroma_t5_mask_pad:    1
System Info:
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:145  - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3080 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
[New Thread 0x7ffff766c700 (LWP 2612895)]
[New Thread 0x7fffe79d0700 (LWP 2612896)]
[New Thread 0x7fffe6fc3700 (LWP 2612897)]
[New Thread 0x7fffe6342700 (LWP 2612899)]
[INFO ] stable-diffusion.cpp:198  - loading diffusion model from 'flux1-kontext-dev-Q3_K_M.gguf'
[INFO ] model.cpp:995  - load flux1-kontext-dev-Q3_K_M.gguf using gguf format
[DEBUG] model.cpp:1012 - init from 'flux1-kontext-dev-Q3_K_M.gguf'
[INFO ] stable-diffusion.cpp:207  - loading clip_l from 'clip_l.safetensors'
[INFO ] model.cpp:998  - load clip_l.safetensors using safetensors format
[DEBUG] model.cpp:1073 - init from 'clip_l.safetensors'
[INFO ] stable-diffusion.cpp:223  - loading t5xxl from 't5xxl_fp8_e4m3fn.safetensors'
[INFO ] model.cpp:998  - load t5xxl_fp8_e4m3fn.safetensors using safetensors format
[DEBUG] model.cpp:1073 - init from 't5xxl_fp8_e4m3fn.safetensors'
[INFO ] stable-diffusion.cpp:230  - loading vae from 'ae.safetensors'
[INFO ] model.cpp:998  - load ae.safetensors using safetensors format
[DEBUG] model.cpp:1073 - init from 'ae.safetensors'
[INFO ] stable-diffusion.cpp:242  - Version: Flux
[INFO ] stable-diffusion.cpp:276  - Weight type:                 q3_K
[INFO ] stable-diffusion.cpp:277  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:278  - Diffusion model weight type: q3_K
[INFO ] stable-diffusion.cpp:279  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:281  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:322  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:325  - CLIP: Using CPU backend
[INFO ] stable-diffusion.cpp:329  - Using flash attention in the diffusion model
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[INFO ] flux.hpp:1096 - Flux blocks: 19 double, 38 single
[DEBUG] ggml_extend.hpp:1193 - clip params backend buffer size =  307.44 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1193 - t5 params backend buffer size =  9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1193 - flux params backend buffer size =  5119.74 MB(VRAM) (780 tensors)
[DEBUG] ggml_extend.hpp:1193 - vae params backend buffer size =  160.00 MB(VRAM) (244 tensors)
[DEBUG] stable-diffusion.cpp:458  - loading weights
[DEBUG] model.cpp:1868 - loading tensors from flux1-kontext-dev-Q3_K_M.gguf
  |===========================>                      | 780/1439 - 166.67it/s
[DEBUG] model.cpp:1868 - loading tensors from clip_l.safetensors
  |=================================>                | 976/1439 - 0.00it/s
[DEBUG] model.cpp:1868 - loading tensors from t5xxl_fp8_e4m3fn.safetensors
  |=========================================>        | 1195/1439 - 1.42it/s
[DEBUG] model.cpp:1868 - loading tensors from ae.safetensors
  |==================================================| 1439/1439 - 0.00it/s
[INFO ] stable-diffusion.cpp:542  - total params memory size = 14670.95MB (VRAM 5279.74MB, RAM 9391.21MB): clip 9391.21MB(RAM), unet 5119.74MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:561  - loading model from '' completed, taking 37.50s
[INFO ] stable-diffusion.cpp:587  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:647  - finished loaded file
[DEBUG] stable-diffusion.cpp:1877 - generate_image 768x640
[INFO ] stable-diffusion.cpp:2007 - TXT2IMG
[INFO ] stable-diffusion.cpp:2015 - EDIT mode
[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 976.50 MB(VRAM)
[New Thread 0x7fffd557c700 (LWP 2613509)]
[New Thread 0x7fffd4d7b700 (LWP 2613510)]
[New Thread 0x7ffd708d7700 (LWP 2613511)]
[New Thread 0x7ffd6bfff700 (LWP 2613512)]
[New Thread 0x7ffd6b7fe700 (LWP 2613513)]
[New Thread 0x7ffd6affd700 (LWP 2613514)]
[New Thread 0x7ffd6a7fc700 (LWP 2613515)]
[New Thread 0x7ffd69ffb700 (LWP 2613516)]
[Thread 0x7ffd6bfff700 (LWP 2613512) exited]
[New Thread 0x7ffd697fa700 (LWP 2613517)]
[New Thread 0x7ffd68ff9700 (LWP 2613518)]
[New Thread 0x7ffd47fff700 (LWP 2613519)]
[Thread 0x7fffd557c700 (LWP 2613509) exited]
[Thread 0x7ffd708d7700 (LWP 2613511) exited]
[New Thread 0x7ffd477fe700 (LWP 2613520)]
[New Thread 0x7ffd46ffd700 (LWP 2613521)]
[New Thread 0x7ffd467fc700 (LWP 2613522)]
[Thread 0x7fffd4d7b700 (LWP 2613510) exited]
[New Thread 0x7ffd45ffb700 (LWP 2613523)]
[Thread 0x7ffd46ffd700 (LWP 2613521) exited]
[Thread 0x7ffd477fe700 (LWP 2613520) exited]
[Thread 0x7ffd47fff700 (LWP 2613519) exited]
[Thread 0x7ffd68ff9700 (LWP 2613518) exited]
[Thread 0x7ffd697fa700 (LWP 2613517) exited]
[Thread 0x7ffd69ffb700 (LWP 2613516) exited]
[Thread 0x7ffd6a7fc700 (LWP 2613515) exited]
[Thread 0x7ffd6affd700 (LWP 2613514) exited]
[Thread 0x7ffd6b7fe700 (LWP 2613513) exited]
[Thread 0x7ffd45ffb700 (LWP 2613523) exited]
[Thread 0x7ffd467fc700 (LWP 2613522) exited]
[DEBUG] stable-diffusion.cpp:1181 - computing vae [mode: ENCODE] graph completed, taking 0.66s
[INFO ] stable-diffusion.cpp:2050 - encode_first_stage completed, taking 0.68s
[DEBUG] stable-diffusion.cpp:1552 - lora fluxKontextDeblur.HwEy:1.00
[DEBUG] stable-diffusion.cpp:1556 - prompt after extract and remove lora: "deblur"
[WARN ] stable-diffusion.cpp:719  - In quantized models when applying LoRA, the images have poor quality.
[INFO ] stable-diffusion.cpp:737  - Attempting to apply 1 LoRAs
[INFO ] model.cpp:998  - load fluxKontextDeblur.HwEy.safetensors using safetensors format
[DEBUG] model.cpp:1073 - init from 'fluxKontextDeblur.HwEy.safetensors'
[INFO ] lora.hpp:117  - loading LoRA from 'fluxKontextDeblur.HwEy.safetensors'
[DEBUG] model.cpp:1868 - loading tensors from fluxKontextDeblur.HwEy.safetensors
[DEBUG] ggml_extend.hpp:1193 - lora params backend buffer size =  292.32 MB(VRAM) (610 tensors)
[DEBUG] model.cpp:1868 - loading tensors from fluxKontextDeblur.HwEy.safetensors
  |==================================================| 610/610 - 0.00it/s
[DEBUG] lora.hpp:160  - lora type: ".lora_down"/".lora_up"
[DEBUG] lora.hpp:162  - finished loaded lora
[DEBUG] lora.hpp:831  - (610 / 610) LoRA tensors applied successfully
[DEBUG] ggml_extend.hpp:1145 - lora compute buffer size: 757.69 MB(VRAM)
[DEBUG] lora.hpp:831  - (610 / 610) LoRA tensors applied successfully

Thread 1 "sd" received signal SIGSEGV, Segmentation fault.
0x00000000007541c4 in ggml_vk_build_graph(ggml_backend_vk_context*, ggml_tensor*, int, ggml_tensor*, int, bool, bool, bool, bool) ()

(gdb) bt
#0  0x00000000007541c4 in ggml_vk_build_graph(ggml_backend_vk_context*, ggml_tensor*, int, ggml_tensor*, int, bool, bool, bool, bool) ()
#1  0x000000000075a714 in ggml_backend_vk_graph_compute(ggml_backend*, ggml_cgraph*) ()
#2  0x0000000000770f20 in ggml_backend_graph_compute ()
#3  0x0000000000509db6 in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) ()
#4  0x00000000005908ce in StableDiffusionGGML::apply_lora(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, float) ()
#5  0x0000000000599709 in generate_image_internal(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, sd_guidance_params_t, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t const*, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, ggml_tensor*, ggml_tensor*) ()
#6  0x000000000059ccae in generate_image ()
#7  0x000000000046d9f2 in main ()
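Since the crash is inside the Vulkan backend and the backtrace above has no source locations, rebuilding with debug symbols would make the report more actionable. A minimal sketch, assuming a typical stable-diffusion.cpp CMake setup (the `SD_VULKAN` option name is an assumption; verify against the repo's README):

```shell
# Sketch: rebuild with debug info so the backtrace resolves to file:line.
# SD_VULKAN is assumed to be the Vulkan toggle in stable-diffusion.cpp's CMake;
# check the repository's build instructions for the exact flag names.
cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo -DSD_VULKAN=ON
cmake --build build -j
# Then re-run the same gdb command; `bt full` will show source lines and locals.
```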

The output of the successful run with the CPU build is:

sd --threads 20 --diffusion-model  "flux1-kontext-dev-Q3_K_M.gguf" --vae "ae.safetensors" --clip_l "clip_l.safetensors" --t5xxl "t5xxl_fp8_e4m3fn.safetensors" --lora-model-dir "./model" --steps 6 --lora-model-dir "./model" -H 640 -W 768 --vae-tiling --diffusion-fa --clip-on-cpu --cfg-scale 1.0 --sampling-method euler -v -r "ph372_c2_5.png" -p "deblur<lora:fluxKontextDeblur.HwEy:1>" -o ph37_test_cpu.png
Option:
    n_threads:         20
    mode:              img_gen
    model_path:
    wtype:             unspecified
    clip_l_path:       clip_l.safetensors
    clip_g_path:
    t5xxl_path:        t5xxl_fp8_e4m3fn.safetensors
    diffusion_model_path:   flux1-kontext-dev-Q3_K_M.gguf
    vae_path:          ae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    stacked_id_embed_dir:
    input_id_images_path:
    style ratio:       20.00
    normalize input image :  false
    output_path:       ph37_test_cpu.png
    init_img:
    mask_img:
    control_image:
    ref_images_paths:
        ph372_c2_5.png
    clip on cpu:       true
    controlnet cpu:    false
    vae decoder on cpu:false
    diffusion flash attention:true
    strength(control): 0.90
    prompt:            deblur<lora:fluxKontextDeblur.HwEy:1>
    negative_prompt:
    min_cfg:           1.00
    cfg_scale:         1.00
    img_cfg_scale:     1.00
    slg_scale:         0.00
    guidance:          3.50
    eta:               0.00
    clip_skip:         -1
    width:             768
    height:            640
    sample_method:     euler
    schedule:          default
    sample_steps:      6
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        true
    upscale_repeats:   1
    chroma_use_dit_mask:   true
    chroma_use_t5_mask:    false
    chroma_t5_mask_pad:    1
System Info:
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:167  - Using CPU backend
[INFO ] stable-diffusion.cpp:198  - loading diffusion model from 'flux1-kontext-dev-Q3_K_M.gguf'
[INFO ] model.cpp:995  - load flux1-kontext-dev-Q3_K_M.gguf using gguf format
[DEBUG] model.cpp:1012 - init from 'flux1-kontext-dev-Q3_K_M.gguf'
[INFO ] stable-diffusion.cpp:207  - loading clip_l from 'clip_l.safetensors'
[INFO ] model.cpp:998  - load clip_l.safetensors using safetensors format
[DEBUG] model.cpp:1073 - init from 'clip_l.safetensors'
[INFO ] stable-diffusion.cpp:223  - loading t5xxl from 't5xxl_fp8_e4m3fn.safetensors'
[INFO ] model.cpp:998  - load t5xxl_fp8_e4m3fn.safetensors using safetensors format
[DEBUG] model.cpp:1073 - init from 't5xxl_fp8_e4m3fn.safetensors'
[INFO ] stable-diffusion.cpp:230  - loading vae from 'ae.safetensors'
[INFO ] model.cpp:998  - load ae.safetensors using safetensors format
[DEBUG] model.cpp:1073 - init from 'ae.safetensors'
[INFO ] stable-diffusion.cpp:242  - Version: Flux
[INFO ] stable-diffusion.cpp:276  - Weight type:                 q3_K
[INFO ] stable-diffusion.cpp:277  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:278  - Diffusion model weight type: q3_K
[INFO ] stable-diffusion.cpp:279  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:281  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:329  - Using flash attention in the diffusion model
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[INFO ] flux.hpp:1096 - Flux blocks: 19 double, 38 single
[DEBUG] ggml_extend.hpp:1193 - clip params backend buffer size =  307.44 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1193 - t5 params backend buffer size =  9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1193 - flux params backend buffer size =  5119.74 MB(RAM) (780 tensors)
[DEBUG] ggml_extend.hpp:1193 - vae params backend buffer size =  160.00 MB(RAM) (244 tensors)
[DEBUG] stable-diffusion.cpp:458  - loading weights
[DEBUG] model.cpp:1868 - loading tensors from flux1-kontext-dev-Q3_K_M.gguf
  |===========================>                      | 780/1439 - 83.33it/s
[DEBUG] model.cpp:1868 - loading tensors from clip_l.safetensors
  |=================================>                | 976/1439 - 1000.00it/s
[DEBUG] model.cpp:1868 - loading tensors from t5xxl_fp8_e4m3fn.safetensors
  |=========================================>        | 1195/1439 - 1.28it/s
[DEBUG] model.cpp:1868 - loading tensors from ae.safetensors
  |==================================================| 1439/1439 - 0.00it/s
[INFO ] stable-diffusion.cpp:542  - total params memory size = 14670.95MB (VRAM 0.00MB, RAM 14670.95MB): clip 9391.21MB(RAM), unet 5119.74MB(RAM), vae 160.00MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:561  - loading model from '' completed, taking 51.13s
[INFO ] stable-diffusion.cpp:587  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:647  - finished loaded file
[DEBUG] stable-diffusion.cpp:1877 - generate_image 768x640
[INFO ] stable-diffusion.cpp:2007 - TXT2IMG
[INFO ] stable-diffusion.cpp:2015 - EDIT mode
[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 1825.31 MB(RAM)
[DEBUG] stable-diffusion.cpp:1181 - computing vae [mode: ENCODE] graph completed, taking 17.44s
[INFO ] stable-diffusion.cpp:2050 - encode_first_stage completed, taking 17.55s
[DEBUG] stable-diffusion.cpp:1552 - lora fluxKontextDeblur.HwEy:1.00
[DEBUG] stable-diffusion.cpp:1556 - prompt after extract and remove lora: "deblur"
[WARN ] stable-diffusion.cpp:719  - In quantized models when applying LoRA, the images have poor quality.
[INFO ] stable-diffusion.cpp:737  - Attempting to apply 1 LoRAs
[INFO ] model.cpp:998  - load fluxKontextDeblur.HwEy.safetensors using safetensors format
[DEBUG] model.cpp:1073 - init from 'fluxKontextDeblur.HwEy.safetensors'
[INFO ] lora.hpp:117  - loading LoRA from 'fluxKontextDeblur.HwEy.safetensors'
[DEBUG] model.cpp:1868 - loading tensors from fluxKontextDeblur.HwEy.safetensors
[DEBUG] ggml_extend.hpp:1193 - lora params backend buffer size =  292.32 MB(RAM) (610 tensors)
[DEBUG] model.cpp:1868 - loading tensors from fluxKontextDeblur.HwEy.safetensors
  |==================================================| 610/610 - 0.00it/s
[DEBUG] lora.hpp:160  - lora type: ".lora_down"/".lora_up"
[DEBUG] lora.hpp:162  - finished loaded lora
[DEBUG] lora.hpp:831  - (610 / 610) LoRA tensors applied successfully
[DEBUG] ggml_extend.hpp:1145 - lora compute buffer size: 757.69 MB(RAM)
[DEBUG] lora.hpp:831  - (610 / 610) LoRA tensors applied successfully

############ the GPU version crashes here #############

[INFO ] stable-diffusion.cpp:714  - lora 'fluxKontextDeblur.HwEy' applied, taking 63.81s
[INFO ] stable-diffusion.cpp:1561 - apply_loras completed, taking 63.81s
[DEBUG] conditioner.hpp:1060 - parse 'deblur' to [['deblur', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] t5.hpp:398  - token length: 256
[DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1145 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1145 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1175 - computing condition graph completed, taking 6257 ms
[INFO ] stable-diffusion.cpp:1695 - get_learned_condition completed, taking 6266 ms
[INFO ] stable-diffusion.cpp:1718 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1767 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:864  - Sample
[DEBUG] ggml_extend.hpp:1145 - flux compute buffer size: 793.17 MB(RAM)
  |==================================================| 6/6 - 225.86s/it
[INFO ] stable-diffusion.cpp:1805 - sampling completed, taking 1354.66s
[INFO ] stable-diffusion.cpp:1813 - generating 1 latent images completed, taking 1356.29s
[INFO ] stable-diffusion.cpp:1816 - decoding 1 latents
[DEBUG] ggml_extend.hpp:623  - tile work buffer size: 0.81 MB
[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
[INFO ] ggml_extend.hpp:637  - processing 30 tiles
  |=>                                                | 1/30 - 0.00it/s[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |=>                                                | 1/30 - 5.12s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |===>                                              | 2/30 - 5.24s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |=====>                                            | 3/30 - 5.02s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |======>                                           | 4/30 - 5.04s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |========>                                         | 5/30 - 5.02s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |==========>                                       | 6/30 - 4.82s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |===========>                                      | 7/30 - 4.77s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |=============>                                    | 8/30 - 4.82s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |===============>                                  | 9/30 - 4.70s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |================>                                 | 10/30 - 4.73s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |==================>                               | 11/30 - 4.65s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |====================>                             | 12/30 - 4.68s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |=====================>                            | 13/30 - 4.77s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |=======================>                          | 14/30 - 5.03s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |=========================>                        | 15/30 - 4.95s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |==========================>                       | 16/30 - 4.99s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |============================>                     | 17/30 - 4.94s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |==============================>                   | 18/30 - 4.87s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |===============================>                  | 19/30 - 4.84s/it[DEBUG] ggml_extend.hpp:1145 - vae compute buffer size: 416.00 MB(RAM)
  |==================================================| 30/30 - 4.82s/it
[DEBUG] stable-diffusion.cpp:1181 - computing vae [mode: DECODE] graph completed, taking 102.86s
[INFO ] stable-diffusion.cpp:1826 - latent 1 decoded, taking 102.86s
[INFO ] stable-diffusion.cpp:1830 - decode_first_stage completed, taking 102.86s
[INFO ] stable-diffusion.cpp:2078 - generate_image completed in 1546.81s
save result PNG image to 'ph37_test_cpu.png'
